Forum: >>> Magnum BBS <<<

I/O errors during RAID check but no SMART errors

From Jochen Spieker@21:1/5 to All on Tue Oct 8 17:00:01 2024

Hey,

please forgive me for posting a question that is not Debian-specific,
but maybe somebody here can explain this to me. Ten years ago I would
have posted to Usenet instead.

I have two disks in a RAID-1:

| $ cat /proc/mdstat
| Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
| md0 : active raid1 sdb1[2] sdc1[0]
| 5860390400 blocks super 1.2 [2/2] [UU]
| bitmap: 5/44 pages [20KB], 65536KB chunk
|
| unused devices: <none>

During the latest monthly check I got kernel messages like this:

| Oct 06 00:57:01 jigsaw kernel: md: data-check of RAID array md0
| Oct 06 14:27:11 jigsaw kernel: ata3.00: exception Emask 0x0 SAct 0x4000000 SErr 0x0 action 0x0
| Oct 06 14:27:11 jigsaw kernel: ata3.00: irq_stat 0x40000008
| Oct 06 14:27:11 jigsaw kernel: ata3.00: failed command: READ FPDMA QUEUED
| Oct 06 14:27:11 jigsaw kernel: ata3.00: cmd 60/80:d0:80:74:f9/08:00:2d:02:00/40 tag 26 ncq dma 1114112 in
| res 41/40:00:50:77:f9/00:00:2d:02:00/00 Emask 0x409 (media error) <F>
| Oct 06 14:27:11 jigsaw kernel: ata3.00: status: { DRDY ERR }
| Oct 06 14:27:11 jigsaw kernel: ata3.00: error: { UNC }
| Oct 06 14:27:11 jigsaw kernel: ata3.00: configured for UDMA/133
| Oct 06 14:27:11 jigsaw kernel: sd 2:0:0:0: [sdb] tag#26 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=7s
| Oct 06 14:27:11 jigsaw kernel: sd 2:0:0:0: [sdb] tag#26 Sense Key : Medium Error [current]
| Oct 06 14:27:11 jigsaw kernel: sd 2:0:0:0: [sdb] tag#26 Add. Sense: Unrecovered read error - auto reallocate failed
| Oct 06 14:27:11 jigsaw kernel: sd 2:0:0:0: [sdb] tag#26 CDB: Read(16) 88 00 00 00 00 02 2d f9 74 80 00 00 08 80 00 00
| Oct 06 14:27:11 jigsaw kernel: I/O error, dev sdb, sector 9361257600 op 0x0:(READ) flags 0x0 phys_seg 150 prio class 3
| Oct 06 14:27:11 jigsaw kernel: ata3: EH complete

The sector number mentioned at the bottom is increasing during the
check.

The way I understand these messages is that some sectors cannot be read
from sdb at all and the disk is unable to reallocate the data somewhere
else (probably because it doesn't know what the data should be in the
first place).
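
As a cross-check on that reading, the hex dumps in the log can be decoded by hand. The sketch below is not part of the original post. It assumes the standard SCSI READ(16) CDB layout (opcode 0x88, 8-byte big-endian LBA in bytes 2 to 9, 4-byte transfer length in bytes 10 to 13) and the register order libata uses when it prints its "res" line. The function names are made up for illustration.

```python
def decode_read16_cdb(cdb_hex: str) -> tuple[int, int]:
    """Return (start LBA, sector count) from a SCSI READ(16) CDB hex dump."""
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a 16-byte READ(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")     # bytes 2..9: logical block address
    count = int.from_bytes(cdb[10:14], "big")  # bytes 10..13: transfer length
    return lba, count


def decode_taskfile_lba(fields: str) -> int:
    """Return the 48-bit LBA from a libata 'res' (or 'cmd') register dump.

    Assumed field order:
    status/error:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
    """
    _status, low, high, _device = fields.split("/")
    _error, _nsect, lbal, lbam, lbah = low.split(":")
    _hob_feature, _hob_nsect, hob_lbal, hob_lbam, hob_lbah = high.split(":")
    # Most significant byte first: the three "hob" bytes, then the low three.
    return int(hob_lbah + hob_lbam + hob_lbal + lbah + lbam + lbal, 16)


# Values copied from the kernel log quoted above.
start, count = decode_read16_cdb("88 00 00 00 00 02 2d f9 74 80 00 00 08 80 00 00")
failed = decode_taskfile_lba("41/40:00:50:77:f9/00:00:2d:02:00/00")
print(start, count, count * 512, failed, failed - start)
```

For the block above this gives a request of 2176 sectors starting at LBA 9361257600, which matches both the "I/O error, dev sdb, sector 9361257600" line and the 1114112-byte DMA size (2176 * 512). The LBA the drive reported back is 9361258320, that is 720 sectors into the request, so the "sector" in the I/O error line is the start of the failed request rather than the exact bad sector.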

The disk has been running continuously for seven years now and I am
running out of space anyway, so I already ordered a replacement. But I
do not fully understand what is happening.

Two of these message blocks end with this:

| Oct 07 10:26:12 jigsaw kernel: md/raid1:md0: sdb1: rescheduling sector 10198068744

What does that mean for the other instances of this error? The data
is still readable from the other disk in the RAID, right? Why doesn't md
mention it? Why is the RAID still considered healthy? At some point I
would expect the disk to be kicked from the RAID.
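
For reference, the "healthy" state in question is what /proc/mdstat reports as [2/2] [UU]. The following is a minimal sketch, not from the original post, of how that status line could be read programmatically. The function name and the returned dictionary keys are illustrative, and the regular expressions only cover the simple layout shown in this thread.

```python
import re

SAMPLE = """\
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 sdb1[2] sdc1[0]
      5860390400 blocks super 1.2 [2/2] [UU]
      bitmap: 5/44 pages [20KB], 65536KB chunk

unused devices: <none>
"""


def parse_mdstat(text: str) -> dict[str, dict]:
    """Extract member list and [n/m] [UU] status for each md array."""
    arrays: dict[str, dict] = {}
    current = None
    for raw in text.splitlines():
        line = raw.lstrip("| ")  # tolerate the "| " quoting used in this thread
        head = re.match(r"(md\d+) : (\w+) (\w+) (.*)$", line)
        if head:
            current = head.group(1)
            arrays[current] = {
                "state": head.group(2),
                "level": head.group(3),
                "members": head.group(4).split(),
            }
            continue
        status = re.search(r"\[(\d+)/(\d+)\] \[([U_]+)\]", line)
        if status and current:
            flags = status.group(3)
            arrays[current].update(
                configured=int(status.group(1)),
                working=int(status.group(2)),
                flags=flags,
                degraded="_" in flags,  # "_" marks a missing or failed member
            )
    return arrays


print(parse_mdstat(SAMPLE)["md0"])
```

Note that this status only says whether a member has been failed out of the array. It does not record read errors that were handled without failing the member, which is why it can still show [UU] here.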

I unmounted the filesystem and performed a bad blocks scan (fsck.ext4
-fcky) that did not find anything of importance (only "Inode x extent
tree (at level 1) could be shorter/narrower"), and it also did not yield
any of the above kernel messages. But another RAID check triggers these
messages again, just with different sector numbers. The RAID is still
healthy, though.

Should this tell me that new sectors are dying all the time, or
should this lead me to believe that a cable / the SATA controller is at
fault? I don't even see any errors with smartctl:

| # smartctl -a /dev/sdb
| smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-25-amd64] (local build)
| Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org
|
| === START OF INFORMATION SECTION ===
| Model Family: Western Digital Red
| Device Model: WDC WD60EFRX-68L0BN1
| Serial Number: WD-xxxxxxxxxxxx
| LU WWN Device Id: 5 0014ee 263faee8c
| Firmware Version: 82.00A82
| User Capacity: 6,001,175,126,016 bytes [6.00 TB]
| Sector Sizes: 512 bytes logical, 4096 bytes physical
| Rotation Rate: 5700 rpm
| Device is: In smartctl database 7.3/5319
| ATA Version is: ACS-2, ACS-3 T13/2161-D revision 3b
| SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
| Local Time is: Tue Oct 8 15:15:22 2024 CEST
| SMART support is: Available - device has SMART capability.
| SMART support is: Enabled
|
| === START OF READ SMART DATA SECTION ===
| SMART overall-health self-assessment test result: PASSED
|
| General SMART Values:
| Offline data collection status: (0x85) Offline data collection activity
| was aborted by an interrupting command from host.
| Auto Offline Data Collection: Enabled.
| Self-test execution status: ( 245) Self-test routine in progress...
| 50% of test remaining.
| Total time to complete Offline
| data collection: ( 1904) seconds.
| Offline data collection
| capabilities: (0x7b) SMART execute Offline immediate.
| Auto Offline data collection on/off support.
| Suspend Offline collection upon new
| command.
| Offline surface scan supported.
| Self-test supported.
| Conveyance Self-test supported.
| Selective Self-test supported.
| SMART capabilities: (0x0003) Saves SMART data before entering
| power-saving mode.
| Supports SMART auto save timer.
| Error logging capability: (0x01) Error logging supported.
| General Purpose Logging supported.
| Short self-test routine
| recommended polling time: ( 2) minutes.
| Extended self-test routine
| recommended polling time: ( 673) minutes.
| Conveyance self-test routine
| recommended polling time: ( 5) minutes.
| SCT capabilities: (0x303d) SCT Status supported.
| SCT Error Recovery Control supported.
| SCT Feature Control supported.
| SCT Data Table supported.
|
| SMART Attributes Data Structure revision number: 16
| Vendor Specific SMART Attributes with Thresholds:
| ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
| 1 Raw_Read_Error_Rate 0x002f 199 169 051 Pre-fail Always - 81
| 3 Spin_Up_Time 0x0027 198 197 021 Pre-fail Always - 9100
| 4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 83
| 5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
| 7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
| 9 Power_On_Hours 0x0032 016 016 000 Old_age Always - 61794
| 10 Spin_Retry_Count 0x0032 100 253 000 Old_age Always - 0
| 11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0
| 12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 82
| 192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 54
| 193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 2219
| 194 Temperature_Celsius 0x0022 119 116 000 Old_age Always - 33
| 196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
| 197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
| 198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 0
| 199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
| 200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 43
|
| SMART Error Log Version: 1
| No Errors Logged
|
| SMART Self-test log structure revision number 1
| Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
| # 1 Short offline Completed without error 00% 61789 -
| # 2 Short offline Completed without error 00% 61758 -
| # 3 Short offline Completed without error 00% 61752 -
| # 4 Extended offline Completed without error 00% 61726 -
| # 5 Short offline Completed without error 00% 61710 -
| # 6 Short offline Completed without error 00% 61686 -
| # 7 Short offline Completed without error 00% 61662 -
| # 8 Short offline Completed without error 00% 61638 -
| # 9 Short offline Completed without error 00% 61615 -
| #10 Short offline Completed without error 00% 61591 -
| #11 Short offline Completed without error 00% 61567 -
| #12 Extended offline Completed without error 00% 61559 -
| #13 Short offline Completed without error 00% 61543 -
| #14 Short offline Completed without error 00% 61519 -
| #15 Short offline Completed without error 00% 61495 -
| #16 Short offline Completed without error 00% 61471 -
| #17 Short offline Completed without error 00% 61447 -
| #18 Short offline Completed without error 00% 61423 -
| #19 Short offline Completed without error 00% 61399 -
| #20 Extended offline Completed without error 00% 61391 -
| #21 Short offline Completed without error 00% 61375 -
|
| SMART Selective self-test log data structure revision number 1
| SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
| 1 0 0 Not_testing
| 2 0 0 Not_testing
| 3 0 0 Not_testing
| 4 0 0 Not_testing
| 5 0 0 Not_testing
| Selective self-test flags (0x0):
| After scanning selected spans, do NOT read-scan remainder of disk.
| If Selective self-test is pending on power-up, resume after 0 minute delay.

I am still waiting for the result of a long self-test.

Do you think I should remove the drive from the RAID immediately? Or
should I suspect something else is at fault? I prefer not to run the
risk of losing the RAID completely when I keep on running on one disk
while the new one is being shipped. I do have backups, but it would be
great if I didn't need to restore.

Regards,
Jochen
--
I see weapons of mass destruction as shameful but necessary.
[Agree] [Disagree]
<http://archive.slowlydownward.com/NODATA/data_enter2.html>

---

From Dan Ritter@21:1/5 to Jochen Spieker on Tue Oct 8 17:50:01 2024

Jochen Spieker wrote:

I have two disks in a RAID-1:

| $ cat /proc/mdstat
| Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
| md0 : active raid1 sdb1[2] sdc1[0]
| 5860390400 blocks super 1.2 [2/2] [UU]
| bitmap: 5/44 pages [20KB], 65536KB chunk
|
| unused devices: <none>

During the latest monthly check I got kernel messages like this:

| Oct 06 00:57:01 jigsaw kernel: md: data-check of RAID array md0
| Oct 06 14:27:11 jigsaw kernel: ata3.00: exception Emask 0x0 SAct 0x4000000 SErr 0x0 action 0x0
| Oct 06 14:27:11 jigsaw kernel: ata3.00: irq_stat 0x40000008
| Oct 06 14:27:11 jigsaw kernel: ata3.00: failed command: READ FPDMA QUEUED
| Oct 06 14:27:11 jigsaw kernel: ata3.00: cmd 60/80:d0:80:74:f9/08:00:2d:02:00/40 tag 26 ncq dma 1114112 in
| res 41/40:00:50:77:f9/00:00:2d:02:00/00 Emask 0x409 (media error) <F>
| Oct 06 14:27:11 jigsaw kernel: ata3.00: status: { DRDY ERR }
| Oct 06 14:27:11 jigsaw kernel: ata3.00: error: { UNC }
| Oct 06 14:27:11 jigsaw kernel: ata3.00: configured for UDMA/133
| Oct 06 14:27:11 jigsaw kernel: sd 2:0:0:0: [sdb] tag#26 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=7s
| Oct 06 14:27:11 jigsaw kernel: sd 2:0:0:0: [sdb] tag#26 Sense Key : Medium Error [current]
| Oct 06 14:27:11 jigsaw kernel: sd 2:0:0:0: [sdb] tag#26 Add. Sense: Unrecovered read error - auto reallocate failed
| Oct 06 14:27:11 jigsaw kernel: sd 2:0:0:0: [sdb] tag#26 CDB: Read(16) 88 00 00 00 00 02 2d f9 74 80 00 00 08 80 00 00
| Oct 06 14:27:11 jigsaw kernel: I/O error, dev sdb, sector 9361257600 op 0x0:(READ) flags 0x0 phys_seg 150 prio class 3
| Oct 06 14:27:11 jigsaw kernel: ata3: EH complete

If this happens once, it's just a thing that happened.

If it happens multiple times, it means that there's a hardware
error: sometimes a cable, rarely the SATA port, often the drive.

The sector number mentioned at the bottom is increasing during the
check.

So it repeats, and it's contiguous. That suggests a flaw in the
drive itself.

The way I understand these messages is that some sectors cannot be read
from sdb at all and the disk is unable to reallocate the data somewhere
else (probably because it doesn't know what the data should be in the
first place).

Yes.

The disk has been running continuously for seven years now and I am
running out of space anyway, so I already ordered a replacement. But I
do not fully understand what is happening.

The drive is dying, slowly. In this case it's starting with a
bad patch on a platter.

Two of these message blocks end with this:

| Oct 07 10:26:12 jigsaw kernel: md/raid1:md0: sdb1: rescheduling sector 10198068744

What does that mean for the other instances of this error? The data
is still readable from the other disk in the RAID, right? Why doesn't md
mention it? Why is the RAID still considered healthy? At some point I
would expect the disk to be kicked from the RAID.

md will eventually do that, but not until it gets bad enough.
That could be quite noticeable.

I unmounted the filesystem and performed a bad blocks scan (fsck.ext4
-fcky) that did not find anything of importance (only "Inode x extent
tree (at level 1) could be shorter/narrower"), and it also did not yield
any of the above kernel messages. But another RAID check triggers these
messages again, just with different sector numbers. The RAID is still
healthy, though.

I don't think it is.

Should this tell me that new sectors are dying all the time, or
should this lead me to believe that a cable / the SATA controller is at
fault? I don't even see any errors with smartctl:

If the sectors were effectively random, a cable fault would be
likely. If the sectors are contiguous or nearly-so, that's
definitely the disk.

| SMART Attributes Data Structure revision number: 16
| Vendor Specific SMART Attributes with Thresholds:
| ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
| 1 Raw_Read_Error_Rate 0x002f 199 169 051 Pre-fail Always - 81
| 3 Spin_Up_Time 0x0027 198 197 021 Pre-fail Always - 9100
| 4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 83
| 5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
| 7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
| 9 Power_On_Hours 0x0032 016 016 000 Old_age Always - 61794
| 10 Spin_Retry_Count 0x0032 100 253 000 Old_age Always - 0
| 11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0
| 12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 82
| 192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 54
| 193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 2219
| 194 Temperature_Celsius 0x0022 119 116 000 Old_age Always - 33
| 196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
| 197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
| 198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 0
| 199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
| 200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 43

This looks like a drive which is old and starting to wear out
but is not there yet. The raw read error rate is starting to
creep up but isn't at a threshold.
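
To make "not at a threshold" concrete, the normalized WORST column can be compared against THRESH for each attribute. This is a minimal sketch added for illustration, not something posted in the thread. It assumes the ten-column layout of the smartctl attribute table quoted above, and the function name and dictionary keys are made up.

```python
SAMPLE = """\
| ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
| 1 Raw_Read_Error_Rate 0x002f 199 169 051 Pre-fail Always - 81
| 5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
| 197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
| 198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 0
"""


def parse_smart_attributes(text: str) -> dict[str, dict]:
    """Parse smartctl attribute rows into {name: {...}}; the header is skipped."""
    rows: dict[str, dict] = {}
    for line in text.splitlines():
        parts = line.lstrip("| ").split(None, 9)
        if len(parts) == 10 and parts[0].isdigit():
            rows[parts[1]] = {
                "id": int(parts[0]),
                "value": int(parts[3]),
                "worst": int(parts[4]),
                "thresh": int(parts[5]),
                "raw": parts[9],
            }
    return rows


attrs = parse_smart_attributes(SAMPLE)
for name, a in attrs.items():
    # Normalized values count down towards THRESH; a larger margin is better.
    print(f"{name}: worst={a['worst']} thresh={a['thresh']} "
          f"margin={a['worst'] - a['thresh']} raw={a['raw']}")
```

With the values quoted above, Raw_Read_Error_Rate has dipped to a worst value of 169 against a threshold of 51, so it is still far from tripping, and the reallocated, pending and offline-uncorrectable raw counts are all 0.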

I am still waiting for the result of a long self-test.

Do you think I should remove the drive from the RAID immediately? Or
should I suspect something else is at fault? I prefer not to run the
risk of losing the RAID completely when I keep on running on one disk
while the new one is being shipped. I do have backups, but it would be
great if I didn't need to restore.

If the disk is a few days away from being replaced, I would not
bother shutting it off, but I would assume that it is not a full
mirror and somehow having the good disk fail would be bad.

-dsr-

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Andy Smith@21:1/5 to Jochen Spieker on Tue Oct 8 20:50:02 2024

Hi,

On Tue, Oct 08, 2024 at 04:58:46PM +0200, Jochen Spieker wrote:

The way I understand these messages is that some sectors cannot be read
from sdb at all and the disk is unable to reallocate the data somewhere
else (probably because it doesn't know what the data should be in the
first place).

When MD receives a read error it does read the mirrored data and write
it back. If it can't do that it fails the disk, so you are not getting
there yet.

Two of these message blocks end with this:

| Oct 07 10:26:12 jigsaw kernel: md/raid1:md0: sdb1: rescheduling sector 10198068744

What does that mean for the other instances of this error?

I expect you probably have either no TLER value set or it's set higher
than the kernel's own timeout. By default consumer drives try very hard
to read data, taking a long time doing so when there's issues. The
kernel SCSI layer will try several times, so the drive's timeout is
multiplied. Only if this ends up exceeding 30s will you get a read
error, and the message from MD about rescheduling the sector.

The data is still readable from the other disk in the RAID, right? Why doesn't md mention it?

I suspect that the times you saw an error from the SCSI layer but not
from MD, were times that the SCSI layer retried and got the data out eventually.

When the SCSI layer times out of all its retries it actually resets the
drive and then the whole bus, and that often causes MD to drop the disk.
You haven't mentioned any messages about resetting the bus so I think
you are not having that many retries.

The fact that you are having any is bad, though.

Why is the RAID still considered healthy? At some point I
would expect the disk to be kicked from the RAID.

This will happen when/if MD can't compensate by reading data from other
mirrors and writing it back. If a write fails, or a disk drops
out entirely, then MD will fail the device.

Hopefully the results of your SMART long self-test will help clear this
up. These things can be hard to track down though.

After you do resolve this you should set TLER to some sensible value
like 7 seconds. That is not your biggest concern right now though.
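
As an illustration of the check described here, the sketch below parses the text printed by "smartctl -l scterc /dev/sdX" and compares the drive's error recovery time with the kernel's per-command timeout (readable from /sys/block/sdX/device/timeout, 30 seconds by default). It is an added example under those assumptions, not code from the linked article, and the function names are hypothetical.

```python
import re

SAMPLE_SCTERC = """\
SCT Error Recovery Control:
           Read:     70 (7.0 seconds)
          Write:     70 (7.0 seconds)
"""


def parse_scterc(text: str):
    """Return (read_s, write_s) in seconds, or None if no numeric limits are shown.

    smartctl prints the limits in tenths of a second, e.g. "Read: 70 (7.0 seconds)".
    """
    values = {}
    for kind, deciseconds in re.findall(r"(Read|Write):\s+(\d+)\s+\(", text):
        values[kind] = int(deciseconds) / 10
    if "Read" in values and "Write" in values:
        return values["Read"], values["Write"]
    return None


def erc_fits_kernel_timeout(erc, kernel_timeout_s: float = 30.0) -> bool:
    """True if the drive gives up on a bad sector before the kernel gives up on the drive."""
    return erc is not None and max(erc) < kernel_timeout_s


erc = parse_scterc(SAMPLE_SCTERC)
print(erc, erc_fits_kernel_timeout(erc))
```

If the drive reports no limit, the usual options are to set one with "smartctl -l scterc,70,70 /dev/sdX" or, for drives that do not support it, to raise the kernel timeout instead.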

Here is a thing I wrote about it quite some time ago:

https://strugglers.net/~andy/mothballed-blog/2015/11/09/linux-software-raid-and-drive-timeouts/#how-to-check-set-drive-timeouts

Do you think I should remove the drive from the RAID immediately? Or
should I suspect something else is at fault?

The fact that you have no reallocated sectors and no pending sectors
and apparently all your writes are working makes me think there probably
isn't a fault with the drive but in some ways that is worse as it's easy
to replace a drive, not so easy to diagnose bad cables and marginal power
supplies etc etc.

I probably wouldn't remove it because it's better than nothing. I
probably would try the easy fix of replacing the drive first, if I could
afford that.

I prefer not to run the risk of losing the RAID completely when I keep
on running on one disk while the new one is being shipped.

I would make sure the timeouts are set correctly because if you do get
into the situation where the kernel is resetting the bus, that can
temporarily take away both drives at once which can cause MD to fail
both out and mark the array as faulty. It's relatively easy to do the
manual intervention required to start it up again but it is stressful.

Thanks,
Andy

--
https://bitfolk.com/ -- No-nonsense VPS hosting

From Jochen Spieker@21:1/5 to All on Tue Oct 8 22:10:01 2024

Dan Ritter:

Jochen Spieker wrote:

The sector number mentioned at the bottom is increasing during the
check.

So it repeats, and it's contiguous. That suggests a flaw in the
drive itself.

It definitely looks like that:

| Oct 06 14:27:11 jigsaw kernel: I/O error, dev sdb, sector 9361257600 op 0x0:(READ) flags 0x0 phys_seg 150 prio class 3
| Oct 06 14:27:30 jigsaw kernel: I/O error, dev sdb, sector 9361275264 op 0x0:(READ) flags 0x4000 phys_seg 161 prio class 3
| Oct 06 14:27:37 jigsaw kernel: I/O error, dev sdb, sector 9361277696 op 0x0:(READ) flags 0x0 phys_seg 71 prio class 3
| Oct 06 14:28:02 jigsaw kernel: I/O error, dev sdb, sector 9361283584 op 0x0:(READ) flags 0x0 phys_seg 160 prio class 3
| Oct 06 14:28:09 jigsaw kernel: I/O error, dev sdb, sector 9361284864 op 0x0:(READ) flags 0x4000 phys_seg 160 prio class 3
| Oct 06 14:34:03 jigsaw kernel: I/O error, dev sdb, sector 9400838400 op 0x0:(READ) flags 0x0 phys_seg 168 prio class 3
| Oct 06 14:34:17 jigsaw kernel: I/O error, dev sdb, sector 9400841088 op 0x0:(READ) flags 0x0 phys_seg 153 prio class 3
| Oct 06 14:34:24 jigsaw kernel: I/O error, dev sdb, sector 9400842496 op 0x0:(READ) flags 0x4000 phys_seg 138 prio class 3
| Oct 06 14:34:31 jigsaw kernel: I/O error, dev sdb, sector 9400845056 op 0x0:(READ) flags 0x0 phys_seg 44 prio class 3
| Oct 06 14:34:39 jigsaw kernel: I/O error, dev sdb, sector 9400846464 op 0x0:(READ) flags 0x0 phys_seg 16 prio class 3
| Oct 06 14:34:46 jigsaw kernel: I/O error, dev sdb, sector 9400846592 op 0x0:(READ) flags 0x0 phys_seg 4 prio class 3
| Oct 06 14:34:53 jigsaw kernel: I/O error, dev sdb, sector 9400846848 op 0x0:(READ) flags 0x4000 phys_seg 59 prio class 3
| Oct 06 14:35:00 jigsaw kernel: I/O error, dev sdb, sector 9400849408 op 0x0:(READ) flags 0x0 phys_seg 27 prio class 3
| Oct 06 14:35:11 jigsaw kernel: I/O error, dev sdb, sector 9400850944 op 0x0:(READ) flags 0x4000 phys_seg 160 prio class 3
| Oct 06 14:35:19 jigsaw kernel: I/O error, dev sdb, sector 9400852224 op 0x0:(READ) flags 0x4000 phys_seg 160 prio class 3
| Oct 06 14:35:26 jigsaw kernel: I/O error, dev sdb, sector 9400853504 op 0x0:(READ) flags 0x4000 phys_seg 160 prio class 3
| Oct 06 14:35:37 jigsaw kernel: I/O error, dev sdb, sector 9400855040 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 3
| Oct 06 14:35:45 jigsaw kernel: I/O error, dev sdb, sector 9400855296 op 0x0:(READ) flags 0x0 phys_seg 160 prio class 3
| Oct 06 14:35:52 jigsaw kernel: I/O error, dev sdb, sector 9400856576 op 0x0:(READ) flags 0x4000 phys_seg 160 prio class 3
| Oct 06 14:35:59 jigsaw kernel: I/O error, dev sdb, sector 9400857856 op 0x0:(READ) flags 0x4000 phys_seg 159 prio class 3
| Oct 06 14:36:14 jigsaw kernel: I/O error, dev sdb, sector 9400859392 op 0x0:(READ) flags 0x4000 phys_seg 160 prio class 3
| Oct 06 14:36:21 jigsaw kernel: I/O error, dev sdb, sector 9400860672 op 0x0:(READ) flags 0x4000 phys_seg 160 prio class 3
| Oct 06 14:36:28 jigsaw kernel: I/O error, dev sdb, sector 9400861952 op 0x0:(READ) flags 0x4000 phys_seg 160 prio class 3
| Oct 06 14:36:41 jigsaw kernel: I/O error, dev sdb, sector 9400863488 op 0x0:(READ) flags 0x0 phys_seg 160 prio class 3
| Oct 06 14:36:48 jigsaw kernel: I/O error, dev sdb, sector 9400864768 op 0x0:(READ) flags 0x4000 phys_seg 168 prio class 3
| Oct 06 14:37:00 jigsaw kernel: I/O error, dev sdb, sector 9400867584 op 0x0:(READ) flags 0x0 phys_seg 160 prio class 3
| Oct 06 14:37:07 jigsaw kernel: I/O error, dev sdb, sector 9400868864 op 0x0:(READ) flags 0x4000 phys_seg 160 prio class 3
| Oct 06 14:37:20 jigsaw kernel: I/O error, dev sdb, sector 9400871680 op 0x0:(READ) flags 0x0 phys_seg 160 prio class 3

… and so on. On the second RAID check, the numbers are not the same, but
in the same range.
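
One way to put a number on "in the same range" is to group the reported sectors into clusters wherever the gap to the previous error is small. The sketch below is an added illustration using the first nine sector numbers from the list above. The function name and the 1,000,000-sector gap threshold are arbitrary choices for the example.

```python
def cluster_sectors(sectors, max_gap):
    """Group sector numbers into runs where neighbours are at most max_gap apart."""
    clusters = []
    for sector in sorted(sectors):
        if clusters and sector - clusters[-1][-1] <= max_gap:
            clusters[-1].append(sector)
        else:
            clusters.append([sector])
    return clusters


# First nine "I/O error ... sector N" values from the log excerpt above.
SECTORS = [
    9361257600, 9361275264, 9361277696, 9361283584, 9361284864,
    9400838400, 9400841088, 9400842496, 9400845056,
]

for group in cluster_sectors(SECTORS, max_gap=1_000_000):
    span = group[-1] - group[0]
    # 512-byte logical sectors, as reported by smartctl for this drive.
    print(f"{len(group)} errors, LBA {group[0]}..{group[-1]}, "
          f"span {span} sectors ({span * 512 / 2**20:.1f} MiB)")
```

On this subset the errors fall into two tight groups, each spanning only a few MiB, which is the pattern that points at a localized area of the disk rather than at random transfer errors.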

If the disk is a few days away from being replaced, I would not
bother shutting it off, but I would assume that it is not a full
mirror and somehow having the good disk fail would be bad.

Thanks a lot for your input, that is exactly the kind of advice that I
was looking for.

J.
--
Thy lyrics in pop songs seem to describe my life uncannily accurately.
[Agree] [Disagree]
<http://archive.slowlydownward.com/NODATA/data_enter2.html>

From Jochen Spieker@21:1/5 to All on Tue Oct 8 22:30:01 2024

Andy Smith:

On Tue, Oct 08, 2024 at 04:58:46PM +0200, Jochen Spieker wrote:

The way I understand these messages is that some sectors cannot be read
from sdb at all and the disk is unable to reallocate the data somewhere
else (probably because it doesn't know what the data should be in the
first place).

When MD receives a read error it does read the mirrored data and write
it back. If it can't do that it fails the disk, so you are not getting
there yet.

Okay, that's good, I guess.

Two of these message blocks end with this:

| Oct 07 10:26:12 jigsaw kernel: md/raid1:md0: sdb1: rescheduling sector 10198068744

What does that mean for the other instances of this error?

I expect you probably have either no TLER value set

Thanks a lot, I had never heard of that before. But by chance my WD REDs actually seem to come with a default of 7 seconds:

| /dev/sdb:
| smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-25-amd64] (local build)
| Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org
|
| SCT Error Recovery Control:
| Read: 70 (7.0 seconds)
| Write: 70 (7.0 seconds)

or it's set higher
than the kernel's own timeout. By default consumer drives try very hard
to read data, taking a long time doing so when there's issues. The
kernel SCSI layer will try several times, so the drive's timeout is multiplied. Only if this ends up exceeding 30s will you get a read
error, and the message from MD about rescheduling the sector.

That makes sense. And might also explain why the disk does not report
any reallocated sectors (yet).

Hopefully the results of your SMART long self-test will help clear this
up. These things can be hard to track down though.

10% remaining … "long" is really long.

After you do resolve this you should set TLER to some sensible value
like 7 seconds. That is not your biggest concern right now though.

Here is a thing I wrote about it quite some time ago:

https://strugglers.net/~andy/mothballed-blog/2015/11/09/linux-software-raid-and-drive-timeouts/#how-to-check-set-drive-timeouts

Thanks a lot again.

Do you think I should remove the drive from the RAID immediately? Or
should I suspect something else is at fault?

The fact that you have no reallocated sectors and no pending sectors
and apparently all your writes are working makes me think there probably
isn't a fault with the drive but in some ways that is worse as it's easy
to replace a drive, not so easy to diagnose bad cables and marginal power
supplies etc etc.

See my other reply, the sector numbers do not appear to be random, so I
hope that it is actually the disk.

I prefer not to run the risk of losing the RAID completely when I keep
on running on one disk while the new one is being shipped.

I would make sure the timeouts are set correctly because if you do get
into the situation where the kernel is resetting the bus, that can
temporarily take away both drives at once which can cause MD to fail
both out and mark the array as faulty. It's relatively easy to do the
manual intervention required to start it up again but it is stressful.

I guess if that really happens I will strongly consider just restoring
from backup. I just need to think hard about the things that I have
excluded from backup deliberately. ^^ But the new disk is expected to be
delivered tomorrow, so I keep my fingers crossed. I mean, that is why I
am using RAID1 in the first place.

J.
--
I use a Playstation to block out the existence of my partner.
[Agree] [Disagree]
<http://archive.slowlydownward.com/NODATA/data_enter2.html>

---

From [email protected]@21:1/5 to Jochen Spieker on Tue Oct 8 22:40:01 2024

On 10/8/24 16:07, Jochen Spieker wrote:

| Oct 06 14:27:11 jigsaw kernel: I/O error, dev sdb, sector 9361257600 op 0x0:(READ) flags 0x0 phys_seg 150 prio class 3
| Oct 06 14:27:30 jigsaw kernel: I/O error, dev sdb, sector 9361275264 op 0x0:(READ) flags 0x4000 phys_seg 161 prio class 3
| Oct 06 14:27:37 jigsaw kernel: I/O error, dev sdb, sector 9361277696 op 0x0:(READ) flags 0x0 phys_seg 71 prio class 3
| Oct 06 14:28:02 jigsaw kernel: I/O error, dev sdb, sector 9361283584 op 0x0:(READ) flags 0x0 phys_seg 160 prio class 3
| Oct 06 14:28:09 jigsaw kernel: I/O error, dev sdb, sector 9361284864 op etc.

Those aren't sequential, or even exhibiting the same interval from one to
the next. Am I misinterpreting the data? Ten of the errors are 1280
sectors after the previous error and five more pairs are 1536 sectors apart; maybe that's significant?

--
I was 21 years when I wrote this song
I'm 22 now, but I won't be for long.
Time hurries on / and the leaves that are green turn to brown.
-- S&G, "Leaves that are Green"

From Michael Kjörling@21:1/5 to All on Tue Oct 8 23:20:01 2024

On 8 Oct 2024 11:29 -0400, from [email protected] (Dan Ritter):

The disk has been running continuously for seven years now and I am
running out of space anyway, so I already ordered a replacement. But I
do not fully understand what is happening.

The drive is dying, slowly. In this case it's starting with a
bad patch on a platter.

That would be my take too. The LBA sectors reported in a different
post in this thread being as close as they appear to be would also
corroborate the platter issue theory.

| SMART Attributes Data Structure revision number: 16
| Vendor Specific SMART Attributes with Thresholds:
| ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
| 1 Raw_Read_Error_Rate 0x002f 199 169 051 Pre-fail Always - 81
| 3 Spin_Up_Time 0x0027 198 197 021 Pre-fail Always - 9100
| 4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 83
| 5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
| 7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
| 9 Power_On_Hours 0x0032 016 016 000 Old_age Always - 61794
| 10 Spin_Retry_Count 0x0032 100 253 000 Old_age Always - 0
| 11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0
| 12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 82
| 192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 54
| 193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 2219
| 194 Temperature_Celsius 0x0022 119 116 000 Old_age Always - 33
| 196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
| 197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
| 198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 0
| 199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
| 200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 43

This looks like a drive which is old and starting to wear out
but is not there yet. The raw read error rate is starting to
creep up but isn't at a threshold.

I agree. The almost 62000 hours is well over 7 years of run time, and
based on the start/stop count and power cycle count it's been running
basically continuously for that time (which is generally good for
longevity, as long as it's not subjected to excessive heat). It's
entirely possible that the mechanical components are degrading; which
in turn might also be interfering with the physical properties of data
storage. Yes, servo tracks and such things are supposed to catch and
compensate for that; but it might not be quite that bad yet.
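As a quick sanity check on those figures, here is a minimal Python sketch that converts the raw values from the SMART table into years of run time and an average uptime per power cycle. Only the two raw values (61794 power-on hours, 82 power cycles) come from the table; the helper names and the 365.25-day year are assumptions made for illustration.

```python
# Raw values taken from the SMART table quoted in this thread.
POWER_ON_HOURS = 61794     # attribute 9, Power_On_Hours
POWER_CYCLE_COUNT = 82     # attribute 12, Power_Cycle_Count

# Assumption: an average year of 365.25 days.
HOURS_PER_YEAR = 24 * 365.25


def power_on_years(hours: int) -> float:
    """Convert power-on hours into years of cumulative run time."""
    return hours / HOURS_PER_YEAR


def hours_per_power_cycle(hours: int, cycles: int) -> float:
    """Average number of running hours between two power cycles."""
    return hours / cycles


years = power_on_years(POWER_ON_HOURS)
per_cycle = hours_per_power_cycle(POWER_ON_HOURS, POWER_CYCLE_COUNT)

print(f"run time: {years:.2f} years")
print(f"average uptime per power cycle: {per_cycle / 24:.1f} days")
```

That works out to a little over seven years of spinning, with roughly a month of uptime between power cycles on average, which fits the "basically continuously" reading.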

Sometimes HDDs fail with a bang, and sometimes they fail with a
whimper.

Also note that some disks actually lie in SMART data. I don't know if
yours does, but I would definitely question a value of 0 for failed
(current pending and offline uncorrectable) _and_ reallocated sectors
for a disk that's reporting I/O errors, for example. _At least_ one of
those should be >0 for a truthful storage device in that situation.
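The check described above can be sketched in a few lines of Python. This is only an illustration, assuming the ten-column attribute table layout quoted in this thread (as printed by `smartctl -A`); the function name and the sample text are hypothetical, with the three sample rows copied from that table.

```python
# Three rows copied from the attribute table quoted earlier in the thread.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
"""

# Attributes that would normally move when a disk hits unreadable sectors.
WATCHED = {5: "reallocated", 197: "pending", 198: "offline uncorrectable"}


def watched_raw_values(report: str) -> dict:
    """Return {attribute id: raw value} for the watched attributes.

    Assumes ten whitespace-separated columns per attribute row, with the
    raw value in the last one.
    """
    values = {}
    for line in report.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[0].isdigit():
            attr_id = int(fields[0])
            if attr_id in WATCHED:
                values[attr_id] = int(fields[9])
    return values


values = watched_raw_values(SAMPLE)
if values and not any(values.values()):
    print("all sector counters are 0; suspicious if the kernel logs media errors")
else:
    print("non-zero counters:", {WATCHED[k]: v for k, v in values.items() if v})
```

With the values from this disk, all three counters are zero, which is exactly the mismatch with the kernel log being questioned here.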

What I would not do at this point is subject it to more physical
stress than unavoidable. Unless you absolutely must, do not physically
unplug or remove that disk before the RAID array has resilvered onto
the new disk. It's currently providing value being a second source of
truth about what's stored; you don't want to remove it and then find
during the resilver that the other current disk has a problem.

--
Michael Kjörling 🔗 https://michael.kjorling.se “Remember when, on the Internet, nobody cared that you were a dog?”


From Jochen Spieker@21:1/5 to All on Wed Oct 9 15:00:01 2024

[email protected]:

On 10/8/24 16:07, Jochen Spieker wrote:

| Oct 06 14:27:11 jigsaw kernel: I/O error, dev sdb, sector 9361257600 op 0x0:(READ) flags 0x0 phys_seg 150 prio class 3
| Oct 06 14:27:30 jigsaw kernel: I/O error, dev sdb, sector 9361275264 op 0x0:(READ) flags 0x4000 phys_seg 161 prio class 3
| Oct 06 14:27:37 jigsaw kernel: I/O error, dev sdb, sector 9361277696 op 0x0:(READ) flags 0x0 phys_seg 71 prio class 3
| Oct 06 14:28:02 jigsaw kernel: I/O error, dev sdb, sector 9361283584 op 0x0:(READ) flags 0x0 phys_seg 160 prio class 3
| Oct 06 14:28:09 jigsaw kernel: I/O error, dev sdb, sector 9361284864 op etc.

Those aren't sequential, or even exhibiting the same interval from one to
the next. Am I misinterpreting the data?

No, but the numbers are close to each other and the errors did not
happen sporadically throughout the runtime of the md check, but only
within a specific timeframe.

Ten of the errors are 1280 sectors after the previous error and five
more pairs are 1536 sectors apart; maybe that's significant?

That may have something to do with the physical layout on the platters.
If there is a "bad patch" on one of them, I would expect something like
this. Maybe one rotation is 1280 sectors apart at some point, and 1536 a
little bit further from the center.
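For anyone who wants to look at the spacing themselves, here is a minimal Python sketch that computes the gaps between the five sector numbers quoted above. Only those five values come from the kernel log; the names are made up for illustration. Judging by the first log entry, the logged sector is the start of the failed read request (it matches the LBA in the Read(16) CDB), not necessarily the exact unreadable sector.

```python
# Sector numbers from the five "I/O error, dev sdb" lines quoted above.
ERROR_SECTORS = [
    9361257600,
    9361275264,
    9361277696,
    9361283584,
    9361284864,
]


def gaps(sectors):
    """Distance in sectors between each error and the one before it."""
    return [later - earlier for earlier, later in zip(sectors, sectors[1:])]


sector_gaps = gaps(ERROR_SECTORS)
span = ERROR_SECTORS[-1] - ERROR_SECTORS[0]

for start, gap in zip(ERROR_SECTORS[1:], sector_gaps):
    print(f"sector {start}: +{gap} sectors after the previous error")
print(f"total span: {span} sectors")
```

In this short excerpt only the last gap is 1280 sectors; the counts of ten and five refer to the longer list posted earlier in the thread. The whole excerpt spans 27264 sectors, which supports the point that the errors are clustered in one small region of the disk.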

J.
--
I think the environment will be okay.
[Agree] [Disagree]
<http://archive.slowlydownward.com/NODATA/data_enter2.html>



From Jochen Spieker@21:1/5 to All on Wed Oct 9 15:20:01 2024

Michael Kjörling:

On 8 Oct 2024 11:29 -0400, from [email protected] (Dan Ritter):

This looks like a drive which is old and starting to wear out
but is not there yet. The raw read error rate is starting to
creep up but isn't at a threshold.

I agree. The almost 62000 hours is well over 7 years of run time, and
based on the start/stop count and power cycle count it's been running
basically continuously for that time (which is generally good for
longevity, as long as it's not subjected to excessive heat).

It is exactly that. It has been running mostly uninterrupted in my
basement. Max temp from the past 12 months (as far as I can tell by
looking at aggregated data in munin) is 36°C. That should be fairly
ideal.

Also note that some disks actually lie in SMART data. I don't know if
yours does, but I would definitely question a value of 0 for failed
(current pending and offline uncorrectable) _and_ reallocated sectors
for a disk that's reporting I/O errors, for example. _At least_ one of
those should be >0 for a truthful storage device in that situation.

That is exactly what was confusing me here.

What I would not do at this point is subject it to more physical
stress than unavoidable. Unless you absolutely must, do not physically
unplug or remove that disk before the RAID array has resilvered onto
the new disk. It's currently providing value being a second source of
truth about what's stored; you don't want to remove it and then find
during the resilver that the other current disk has a problem.

Helpful advice, thanks. Unfortunately, I cannot hotplug into this
system. Thinking of this, the errors came shortly after a (long overdue)
reboot, so it will have to survive at least another shutdown to provide
some redundancy.

Just for completeness, the long self-check did not report any issues and
the SMART values also stayed the same. Nothing to see here, move along. ^^

The new disk is already sitting on my desk.

J.
--
I cannot comprehend the idea of chemical and biological weapons.
[Agree] [Disagree]
<http://archive.slowlydownward.com/NODATA/data_enter2.html>


From Andy Smith@21:1/5 to Franco Martelli on Wed Oct 9 21:00:01 2024

Hi,

On Wed, Oct 09, 2024 at 08:41:38PM +0200, Franco Martelli wrote:

Do you know whether MD is clever enough to send an email to root when it
fails the device? Or have I to keep an eye on /proc/mdstat?

For more than a decade mdadm has shipped with a service that runs in
monitor mode to do this.

https://manpages.debian.org/bookworm/mdadm/mdadm.8.en.html#MONITOR_MODE

There are also plugins for every Linux monitoring system out there to
read /proc/mdstat.
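To give an idea of what such a plugin looks at, here is a minimal Python sketch that scans /proc/mdstat-style text for degraded arrays. It is not how mdadm's monitor mode works internally; the function name is hypothetical, the sample text is the mdstat output from the first post, and the only thing relied on is the `[n/m] [UU]` status field, where an underscore marks a missing member.

```python
import re

# /proc/mdstat contents as posted at the start of this thread.
SAMPLE = """\
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 sdb1[2] sdc1[0]
      5860390400 blocks super 1.2 [2/2] [UU]
      bitmap: 5/44 pages [20KB], 65536KB chunk

unused devices: <none>
"""

ARRAY_LINE = re.compile(r"^(md\S+)\s*:")
STATUS_FIELD = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def degraded_arrays(mdstat_text: str) -> list:
    """Return the names of arrays with fewer active members than expected."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY_LINE.match(line)
        if header:
            current = header.group(1)
            continue
        status = STATUS_FIELD.search(line)
        if status and current:
            expected, active = int(status.group(1)), int(status.group(2))
            if active < expected or "_" in status.group(3):
                degraded.append(current)
    return degraded


if __name__ == "__main__":
    # On a real system, read the text from /proc/mdstat instead of SAMPLE.
    print(degraded_arrays(SAMPLE) or "all arrays have their full member count")
```

A healthy array shows `[2/2] [UU]` as above; once md fails a member, the same line would read `[2/1] [U_]` and the array name would be reported.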

Thanks,
Andy

--
https://bitfolk.com/ -- No-nonsense VPS hosting


From Jochen Spieker@21:1/5 to All on Wed Oct 9 21:20:02 2024

Andy Smith:

Hi,

On Wed, Oct 09, 2024 at 08:41:38PM +0200, Franco Martelli wrote:

Do you know whether MD is clever enough to send an email to root when it
fails the device? Or have I to keep an eye on /proc/mdstat?

For more than a decade mdadm has shipped with a service that runs in
monitor mode to do this.

https://manpages.debian.org/bookworm/mdadm/mdadm.8.en.html#MONITOR_MODE

And this is configured here:

# grep -B1 MAILADDR /etc/mdadm/mdadm.conf
# instruct the monitoring daemon where to send mail alerts
MAILADDR root

J.
--
Hell will have perfume.
[Agree] [Disagree]
<http://archive.slowlydownward.com/NODATA/data_enter2.html>


