On Thursday, April 28, 2022 at 10:12:12 AM UTC+1, Theo wrote:
jkn <[email protected]> wrote:
So a few thoughts:
- any idea if this is related to my recent upgrade? i.e. a new feature in [K]Ubuntu 22.04?
- is this likely to be a real issue, or an over-zealous warning?
I checked my NVMe and I don't have an 'error log' section. I don't know
what yours means. You could try nvme-cli (that's the package in Ubuntu) eg:
$ sudo nvme error-log /dev/nvme0n1
$ sudo nvme smart-log /dev/nvme0n1
(other *-log commands available)
and see if it reports anything interesting. eg for me smart-log says:
$ sudo nvme smart-log /dev/nvme1n1
Smart Log for NVME device:nvme1n1 namespace-id:ffffffff
critical_warning : 0
temperature : 25 C
available_spare : 100%
available_spare_threshold : 5%
percentage_used : 10%
endurance group critical warning summary: 0
so I seem to have used 10% of my write endurance (I think).
It is possible doing an upgrade has eaten some of your available writes and pushed it over some threshold.
I am thinking of doing two things: buying a new/larger (1TB) M.2 drive and dd'ing everything over; and upgrading the firmware on this Crucial M.2 drive.
- Is it a particularly risky operation to update the M.2 firmware without backing up the drive first?
In theory it shouldn't be a risk to update the firmware (it happens in production all the time), but if the drive is exhibiting failure signs I'd want to make a backup first just in case.
Theo
(who hadn't come across nvme-cli before and thinks it could be a useful way of using cheaper NVMe in servers and replacing drives when they start
running out of writes)
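For the replace-when-worn idea above, a minimal shell sketch of a wear check might look like this. It assumes the "name : value" smart-log layout shown earlier in the thread; the function name wear_check and the default limit of 90 are made up for illustration and are not part of nvme-cli.

```shell
# wear_check: hypothetical helper, NOT an nvme-cli command.
# Reads `nvme smart-log` text on stdin and compares percentage_used
# against a limit given as $1 (default 90, an arbitrary choice).
# Returns 0 while below the limit, 1 at or above it, 2 if the field is missing.
wear_check() {
    limit=${1:-90}
    # Find the percentage_used line and keep only the digits of its value.
    used=$(awk -F':' '
        { name = $1; gsub(/[ \t]/, "", name) }
        name == "percentage_used" { v = $2; gsub(/[^0-9]/, "", v); print v; exit }
    ')
    if [ -z "$used" ]; then
        echo "percentage_used not found" >&2
        return 2
    fi
    if [ "$used" -ge "$limit" ]; then
        echo "worn: ${used}% used (limit ${limit}%)"
        return 1
    fi
    echo "ok: ${used}% used (limit ${limit}%)"
}

# Possible usage (needs root and a real device):
#   sudo nvme smart-log /dev/nvme0n1 | wear_check 80
```

The non-zero exit status on a worn drive is only there so the check could be dropped into a cron job or monitoring script; what limit makes sense depends on the drive and how cautious you want to be.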
Thanks a lot Theo, very useful.
I installed nvme-cli and get this:
{{{ $ sudo nvme error-log /dev/nvme0n1
Error Log Entries for device:nvme0n1 entries:16
.................
Entry[ 0]
.................
error_count : 275
sqid : 0
cmdid : 0x1008
status_field : 0x2002(INVALID_FIELD: A reserved coded value or an unsupported value in a defined field)
phase_tag : 0x1
parm_err_loc : 0x28
lba : 0
nsid : 0
vs : 0
trtype : The transport type is not indicated or the error is not transport related.
cs : 0
trtype_spec_info: 0
# (all other log entries seem 'empty')
}}}
{{{ $ sudo nvme smart-log /dev/nvme0n1
Smart Log for NVME device:nvme0n1 namespace-id:ffffffff
critical_warning : 0
temperature : 31 C (304 Kelvin)
available_spare : 100%
available_spare_threshold : 5%
percentage_used : 0%
endurance group critical warning summary: 0
data_units_read : 1,140,151
data_units_written : 2,357,758
host_read_commands : 14,833,879
host_write_commands : 25,231,374
controller_busy_time : 11,687
power_cycles : 236
power_on_hours : 8,834
unsafe_shutdowns : 38
media_errors : 0
num_err_log_entries : 275
Warning Temperature Time : 0
Critical Composite Temperature Time : 0
Thermal Management T1 Trans Count : 0
Thermal Management T2 Trans Count : 0
Thermal Management T1 Total Time : 0
Thermal Management T2 Total Time : 0
}}}
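As an aside on reading those numbers: the NVMe spec defines the data_units_read/data_units_written counters in units of 1000 512-byte blocks, i.e. 512,000 bytes per unit. A rough shell sketch for turning them into bytes follows. It assumes the "name : value" layout of the output above; smart_field and data_units_to_bytes are hypothetical helper names, not nvme-cli commands.

```shell
# smart_field: hypothetical helper. Prints the value of one field from
# `nvme smart-log` text on stdin, assuming "name : value" lines.
smart_field() {
    awk -F':' -v key="$1" '
        { name = $1; gsub(/^[ \t]+|[ \t]+$/, "", name) }
        name == key { val = $2; gsub(/^[ \t]+|[ \t]+$/, "", val); print val; exit }
    '
}

# data_units_to_bytes: hypothetical helper. Converts a data-units counter
# to bytes (one data unit = 1000 * 512 bytes = 512,000 bytes).
data_units_to_bytes() {
    # Strip the thousands separators nvme-cli prints, e.g. "2,357,758".
    units=$(printf '%s' "$1" | tr -d ',')
    echo $(("$units" * 512000))
}

# Possible usage (needs root and a real device):
#   w=$(sudo nvme smart-log /dev/nvme0n1 | smart_field data_units_written)
#   data_units_to_bytes "$w"
```

With the figures above, data_units_written of 2,357,758 works out to 1,207,172,096,000 bytes, so roughly 1.2 TB written in total, and data_units_read of 1,140,151 to about 0.58 TB.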
It is possible doing an upgrade has eaten some of your available writes and pushed it over some threshold.
That is a good thought...
I think I will press on with buying a new 1TB M.2 drive (I was thinking of doing
that anyway, as it happens), and updating the firmware on this one
only after I have dd'd everything over and swapped to the new one.
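For the dd step, a sketch of what the copy might look like is below. clone_drive is a made-up wrapper name, and the source and destination are whatever you pass in; the device names in the usage comment are placeholders only. It assumes GNU dd (as shipped with Ubuntu), and it overwrites everything on the destination, so the names need double-checking with lsblk first.

```shell
# clone_drive: hypothetical wrapper around dd, for illustration only.
# $1 = source device or image, $2 = destination. DESTINATION IS OVERWRITTEN.
clone_drive() {
    src=$1
    dst=$2
    if [ -z "$src" ] || [ -z "$dst" ]; then
        echo "usage: clone_drive SRC DST" >&2
        return 2
    fi
    if [ "$src" = "$dst" ]; then
        echo "source and destination are the same" >&2
        return 1
    fi
    # bs=4M    : copy in large blocks for throughput
    # conv=fsync: flush data to the destination before dd exits
    dd if="$src" of="$dst" bs=4M conv=fsync
}

# Possible usage, run from a live USB so the source is not mounted
# (placeholder device names, check yours with lsblk):
#   sudo sh -c '. ./clone.sh; clone_drive /dev/nvme0n1 /dev/nvme1n1'
```

This copies the partition table and partitions as-is, so on a larger drive the extra space stays unallocated until a partition is grown afterwards.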
Any recommendations for a decent M.2 1TB drive? I see a lot of slagging off
on Amazon on the Crucial P2 I have here...
Thanks, J^n