On Monday 29 Feb 2016 10:28, Janis Papanagnou conveyed the following to comp.unix.admin...
On 28.02.2016 09:57, Aragorn wrote:
On the other hand, there's also a chance ─ given that you're alluding
to hot-swap drive bays ─ that it could be the backplane itself which
is faulty, or the cable for that one particular drive bay. And in
that case, the only thing you can do is replace the cable (which is
cheapest) or the backplane (which will cost you more).
So I would advise first checking the cable, see whether it's well-
seated, try with another cable for a while, and then see whether the
problem persists. With a bit of luck, it's only the cable which is
faulty. ;)
Thanks for your suggestions, Aragorn!
Sadly, while I was making a plan to follow your suggestions for
localizing the problem, my system seems to have decided to fool me.
Without my changing anything, the ZFS file system again became
'degraded'; but this time (and for the first time) it is another
device, /dev/sdc, that became inaccessible.
Does that now, in your experience, change the diagnosis of the
problem?
Well, there is now something else that pops into my mind, and from
reading your contributions to comp.unix.shell, I'd imagine you to be a
professional and thus not to make the mistake I'm about to expound on,
but there _is_ always the chance that the hard disks in your array are
actually not RAID-certified.
It is not uncommon for consumer-grade SATA disks ─ and most notably
those made by Western Digital ─ to be a little slow in handling certain
status polls from the RAID controller ─ whether hardware or software ─
with the result that the controller may falsely detect a degraded
state. For this purpose, Western Digital has released "RAID-certified"
disks, which have a different timing setup and report faster to status
polls, so that they wouldn't be marked as defective by software or
hardware RAID setups when they are in fact still functioning normally
but busy executing other instructions.
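
One thing that can be looked at on the Linux side is how long the
kernel itself waits for a disk to answer a command before it gives up.
The sketch below only reads that value from sysfs, assuming the usual
/sys/block/<device>/device/timeout file (in seconds) is present for
your disks. The function name and its parameter are mine, for
illustration only. It does not show the drive's own timing setup, only
the host side of it.

```python
import os


def read_command_timeouts(sys_block="/sys/block"):
    """Return {device name: kernel command timeout in seconds}.

    Assumption: each disk exposes <sys_block>/<dev>/device/timeout.
    Devices without that file (loop devices, for instance) are skipped.
    """
    timeouts = {}
    if not os.path.isdir(sys_block):
        return timeouts
    for dev in sorted(os.listdir(sys_block)):
        path = os.path.join(sys_block, dev, "device", "timeout")
        try:
            with open(path) as fh:
                timeouts[dev] = int(fh.read().strip())
        except (OSError, ValueError):
            # No timeout file, or unexpected content: ignore this device.
            continue
    return timeouts


if __name__ == "__main__":
    for dev, seconds in read_command_timeouts().items():
        print(f"{dev}: {seconds} s")
```

A disk that regularly takes longer to answer than the value shown here
would be a candidate for being dropped even though it is healthy.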
A second possibility is the following... Since you enumerate the
devices as /dev/sdc and /dev/sdd, that tells me that you're running a
GNU/Linux system. And then there are a few questions that pop up,
because then more information is needed...
1. Do you ever power the machine down, and if so, did you power down
between your previous report on the issue and the report that I'm
now replying to?
2. What distribution are you running on your system?
3. Are the devices mounted by UUID or LABEL, or do you mount them
by way of their Linux-specific /dev/sd? designations?
The thing is that the /dev/sd? designations are not guaranteed to be
persistent across reboots. The udev device manager was supposed to
provide for some consistency in that regard, but it doesn't do that
particular job very well either. Therefore, when using multiple disks
in the same machine, it is best to give the individual partitions a
unique LABEL and mount them while using that, or to mount them by way of
their unique UUID. (For the root filesystem, this does require booting
with an initrd/initramfs, as the kernel itself does not resolve
filesystem LABELs and UUIDs at boot time.)
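
To see which physical disk currently sits behind a given /dev/sd?
name, the persistent symlinks that udev creates can be resolved. The
following is a minimal sketch, assuming your distribution populates
/dev/disk/by-id (most do); the function name is made up for this
example. Run it before and after a reboot and compare the output.

```python
import os


def map_persistent_names(by_dir="/dev/disk/by-id"):
    """Return {resolved device path: [persistent symlink names]}.

    Assumption: by_dir holds symlinks such as the ones udev creates
    under /dev/disk/by-id. Entries that are not symlinks are ignored.
    """
    mapping = {}
    if not os.path.isdir(by_dir):
        return mapping
    for name in sorted(os.listdir(by_dir)):
        link = os.path.join(by_dir, name)
        if not os.path.islink(link):
            continue
        # Follow the symlink to the kernel name, e.g. /dev/sdc.
        target = os.path.realpath(link)
        mapping.setdefault(target, []).append(name)
    return mapping


if __name__ == "__main__":
    for device, names in sorted(map_persistent_names().items()):
        print(device, "<-", ", ".join(names))
```

If the by-id name that used to point at /dev/sdd now points at
/dev/sdc, then it is the same physical disk under a new letter.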
If you have indeed rebooted the machine, then it is possible that the
faulty drive /dev/sdd of the last time has now become /dev/sdc. If you
have not rebooted your machine, then you may safely discard this section
of my reply. ;)
If it's neither of the above, then I suspect there to be a problem with
your hot-swap backplane, as I wrote in my previous reply. It could just
be an intermittent problem ─ e.g. a contact issue ─ or it could be
permanent, but I have insufficient experience with such backplanes, so
I don't really know which ones are high quality and which ones are
prone to failure.
Hope this helps. ;)
--
= Aragorn =
http://www.linuxcounter.net - registrant #223157