Sunday, December 23, 2012

SMART errors and software RAID

For a while now I have been having an intermittent problem with some of my drives. They seem to be working but stop responding to SMART. If I reboot they don't show up any more but if I power off and back on they work fine.

I have two 640 GB WD Green drives in a software RAID1 array, plus a separate WD Blue drive. After a suspend/resume, sometimes one of the drives is MIA. The kernel thinks it is still there, but my SMART tools can't talk to it and start to complain.

The Blue drive has a bad (unreadable) sector but, touch wood, that has not caused a problem yet. SMART knows about this and tells me about it frequently. However, this does not appear to be the problem. There is a telling message in dmesg:

sd 0:0:0:0: [sda] START_STOP FAILED
sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
PM: Device 0:0:0:0 failed to resume: error 262144

So for some reason, the disk is not responding when the computer resumes. I guess it is a timeout (and I wonder whether I can extend it). Anyway, I now have a disk that is not working even though I know there is nothing actually wrong with it.
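The number in the PM line is not random. Assuming it is the raw SCSI result word from the failed START_STOP command (Linux packs the status, message, host and driver bytes into one 32-bit value, lowest byte first), a few lines of Python split 262144 (0x40000) back into those bytes. The host byte comes out as 4, which is DID_BAD_TARGET in the kernel's include/scsi/scsi.h, matching the "Result:" line above. The function name is my own and the name table is only a subset.

```python
# Subset of host-byte names from the Linux kernel's include/scsi/scsi.h.
HOST_BYTES = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
    0x04: "DID_BAD_TARGET",
    0x05: "DID_ABORT",
    0x06: "DID_PARITY",
    0x07: "DID_ERROR",
    0x08: "DID_RESET",
}


def decode_scsi_result(result: int) -> dict:
    """Split a 32-bit SCSI result word into its four component bytes.

    Assumption: the value printed by the PM core is this raw result word.
    """
    host = (result >> 16) & 0xFF
    return {
        "status": result & 0xFF,          # SCSI status byte from the device
        "msg": (result >> 8) & 0xFF,      # message byte
        "host": HOST_BYTES.get(host, hex(host)),  # host adapter / transport verdict
        "driver": (result >> 24) & 0xFF,  # driver byte (0 == DRIVER_OK)
    }


if __name__ == "__main__":
    print(decode_scsi_result(262144))
    # {'status': 0, 'msg': 0, 'host': 'DID_BAD_TARGET', 'driver': 0}
```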

Luckily I am using software RAID and the other disk is working, so I can carry on without crashing or losing data. After poking and prodding a few different things, I have worked out a solution:


  • Hot-remove the device (from the kernel)
echo 1 > /sys/block/sda/device/delete
  • Rescan for the device to hot-add it to the kernel
echo "- - -" > /sys/class/scsi_host/host0/scan
  • Add the 'failed' drive back into the RAID set
mdadm /dev/md127 --re-add /dev/sda2

You must remove the existing (sda) device first or the disk will be re-detected and added with a new name (sde in my case).
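The three steps can also be scripted. This is a minimal sketch in Python, not a tested tool: the function names are hypothetical, it defaults to a dry run that only prints what it would do, and it needs root to do anything real. It looks up the SCSI host from the disk's sysfs link instead of hard-coding host0; that lookup has to happen before the delete, because the link disappears with the device. The wait for the partition node to reappear before calling mdadm is my own assumption, added as a precaution.

```python
import os
import re
import subprocess
import time
from pathlib import Path


def host_from_path(resolved: str) -> str:
    """Extract 'hostN' from a resolved /sys/block/<disk>/device path."""
    match = re.search(r"/(host\d+)/", resolved)
    if match is None:
        raise ValueError(f"no SCSI host found in {resolved!r}")
    return match.group(1)


def scsi_host_of(disk: str) -> str:
    """Look up the SCSI host a disk hangs off. Call BEFORE deleting the disk."""
    return host_from_path(os.path.realpath(f"/sys/block/{disk}/device"))


def recovery_plan(disk: str, host: str, md: str, member: str) -> list:
    """Ordered actions: hot-remove, rescan, wait for the node, re-add."""
    return [
        ("write", f"/sys/block/{disk}/device/delete", "1"),
        ("write", f"/sys/class/scsi_host/{host}/scan", "- - -"),
        ("wait", member, None),
        ("mdadm", md, member),
    ]


def execute(plan: list, dry_run: bool = True, timeout: float = 30.0) -> None:
    for kind, target, value in plan:
        if dry_run:
            print(kind, target, value or "")
        elif kind == "write":
            # Same effect as `echo VALUE > TARGET`.
            Path(target).write_text(value + "\n")
        elif kind == "wait":
            # Poll until udev recreates the partition node (assumed precaution).
            deadline = time.monotonic() + timeout
            while not Path(target).exists():
                if time.monotonic() > deadline:
                    raise TimeoutError(f"{target} did not reappear")
                time.sleep(0.5)
        elif kind == "mdadm":
            subprocess.run(["mdadm", target, "--re-add", value], check=True)


if __name__ == "__main__":
    # Values from this post; on a live system use scsi_host_of("sda") for the host.
    execute(recovery_plan("sda", "host0", "/dev/md127", "/dev/sda2"))
```

Keeping the plan as plain data separate from execution makes the dry run trivially safe and the order of operations easy to check.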

Because I have a write-intent bitmap, the RAID set knows what has changed since the drive failed, so only the changed regions must be re-synced, which is quite fast.
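Before re-adding, it is worth checking that the array really has a bitmap and which member md has marked as failed. /proc/mdstat shows both: failed members carry an (F) suffix, and arrays with a bitmap have a "bitmap:" line. A rough parser for that text might look like this; the sample below is illustrative rather than copied from my machine, and the parser only handles the common layout.

```python
import re

# Illustrative /proc/mdstat contents (hypothetical, not from the author's system).
SAMPLE = """\
Personalities : [raid1]
md127 : active raid1 sdb2[1] sda2[0](F)
      625000000 blocks super 1.2 [2/1] [_U]
      bitmap: 2/5 pages [8KB], 65536KB chunk

unused devices: <none>
"""


def parse_mdstat(text: str) -> dict:
    """Map each md array to its failed members and whether it has a bitmap."""
    arrays = {}
    current = None
    for line in text.splitlines():
        header = re.match(r"^(md\d+) : ", line)
        if header:
            current = header.group(1)
            # Members look like 'sda2[0]', or 'sda2[0](F)' once marked faulty.
            failed = re.findall(r"(\w+)\[\d+\]\(F\)", line)
            arrays[current] = {"failed": failed, "bitmap": False}
        elif current and line.strip().startswith("bitmap:"):
            arrays[current]["bitmap"] = True
        elif not line.strip():
            current = None  # a blank line ends the array's block
    return arrays


if __name__ == "__main__":
    print(parse_mdstat(SAMPLE))
    # {'md127': {'failed': ['sda2'], 'bitmap': True}}
```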

There seems to be a 'vibe' that green drives are not good for RAID. I don't think the problem is that the drive is green; I think it is that the driver is not trying hard enough to restart the disk.

So in the end this was not a SMART problem after all. Not that there are no bugs to fix there, particularly in udisks-helper-ata-smart-collect, which keeps running and locking up, sending the load average into the hundreds. For a tool designed to detect error conditions, it probably needs a bit more work.

My next job is to select a replacement drive for the faulty WD Blue...
