Sunday, December 23, 2012

SMART errors and software RAID

For a while now I have been having an intermittent problem with some of my drives. They seem to be working but stop responding to SMART. If I reboot they don't show up any more but if I power off and back on they work fine.

I have two 640 GB WD Green drives in a software RAID1, plus a WD Blue drive. Sometimes after a suspend/resume one of the drives is MIA: the kernel thinks it is still there, but my SMART tools can't talk to it and start to complain.

The Blue drive has a bad (unreadable) sector but, touch wood, that has not caused a problem yet. SMART knows about it and tells me about it frequently. However, this does not appear to be the problem. There is a telling message in dmesg:

sd 0:0:0:0: [sda] START_STOP FAILED
sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
PM: Device 0:0:0:0 failed to resume: error 262144

So for some reason the disk is not responding when the computer resumes. I guess it is a timeout (and I wonder whether I can extend it). Anyway, I now have a disk that is not working even though I know there is nothing actually wrong with it.
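The number in the PM line looks like a raw SCSI result word. Assuming the usual Linux packing (driver byte in bits 24-31, host byte in bits 16-23, message byte in bits 8-15, status byte in bits 0-7), a small bash helper can split it back into its parts. The function name is just for illustration.

```shell
#!/usr/bin/env bash
# Sketch: split a raw SCSI result word (as printed in "failed to resume: error N")
# into its four bytes. Assumes the usual Linux packing:
#   driver << 24 | host << 16 | message << 8 | status
decode_scsi_result() {
    local result=$1
    printf 'driver=0x%02x host=0x%02x msg=0x%02x status=0x%02x\n' \
        "$(( (result >> 24) & 0xff ))" \
        "$(( (result >> 16) & 0xff ))" \
        "$(( (result >> 8) & 0xff ))" \
        "$(( result & 0xff ))"
}

decode_scsi_result 262144   # prints driver=0x00 host=0x04 msg=0x00 status=0x00
```

262144 is 0x40000, so the host byte is 0x04, which is DID_BAD_TARGET in the kernel's SCSI headers and matches the Result line above. For comparison, DID_TIME_OUT is 0x03.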

Luckily I am using software RAID and the other disk is working, so I can carry on without crashing or losing data. After poking and prodding a few different things, I have worked out a solution:


  • Hot-remove the device (from the kernel)
echo 1 > /sys/block/sda/device/delete
  • Rescan for the device to hot-add it to the kernel
echo "- - -" > /sys/class/scsi_host/host0/scan
  • Add the 'failed' drive back into the RAID set
mdadm /dev/md127 --re-add /dev/sda2
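The three steps above can be wrapped in a small bash function, sketched below. `readd_disk`, its argument order and the `SYSFS` override are hypothetical (the override only exists so the function can be exercised without touching the real /sys), and the values to pass are the ones from this post, which will differ on other machines. I have also added `udevadm settle`, which waits for udev to finish creating the new device node before mdadm goes looking for it.

```shell
#!/usr/bin/env bash
# Sketch: re-attach a RAID1 member whose disk did not come back after resume.
# Hypothetical helper; SYSFS defaults to /sys and is overridable for testing.
SYSFS=${SYSFS:-/sys}

# Usage: readd_disk DISK HOST ARRAY MEMBER
#   e.g. readd_disk sda host0 /dev/md127 /dev/sda2   (run as root)
readd_disk() {
    if [ "$#" -ne 4 ]; then
        echo "usage: readd_disk DISK HOST ARRAY MEMBER" >&2
        return 2
    fi
    local disk=$1 host=$2 array=$3 member=$4

    # 1. Hot-remove the stale device so the kernel forgets it.
    echo 1 > "$SYSFS/block/$disk/device/delete" || return

    # 2. Rescan every channel, target and LUN on the SCSI host.
    echo "- - -" > "$SYSFS/class/scsi_host/$host/scan" || return

    # Wait until udev has processed the new device events.
    udevadm settle

    # 3. Put the partition back into the array (the bitmap keeps the resync short).
    mdadm "$array" --re-add "$member"
}
```

Source the file as root and call `readd_disk sda host0 /dev/md127 /dev/sda2`. Requiring all four arguments, rather than defaulting to sda, makes it harder to delete the wrong disk by accident.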

You must remove the existing (sda) device first or the disk will be re-detected and added with a new name (sde in my case).
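To see which name the kernel actually gave the disk after a rescan, you can look it up by its SCSI address (0:0:0:0 in the dmesg lines above). A minimal sketch, assuming the sysfs layout /sys/bus/scsi/devices/&lt;addr&gt;/block/&lt;name&gt; used by kernels of this era; `disk_for_scsi_addr` and the `SYSFS` override are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: print the block device name attached to a SCSI address such as
# 0:0:0:0 (host:channel:target:lun). Assumes the sysfs layout
# /sys/bus/scsi/devices/<addr>/block/<name>; SYSFS is overridable for testing.
SYSFS=${SYSFS:-/sys}

disk_for_scsi_addr() {
    local addr=$1 dir
    for dir in "$SYSFS/bus/scsi/devices/$addr/block/"*; do
        # If the glob matched nothing, bash leaves the pattern as-is.
        [ -e "$dir" ] || return 1
        basename "$dir"
        return 0
    done
    return 1   # reached only if nullglob is set and nothing matched
}
```

After the rescan, `disk_for_scsi_addr 0:0:0:0` should print sda if the old device was deleted first, or a new name such as sde if it was not.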

Because I have a write-intent bitmap, the RAID set knows what has changed since the drive failed, so only those changes need to be re-synced, which is quite fast.
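To confirm an array actually has a write-intent bitmap, look for the `bitmap:` line in its stanza of /proc/mdstat. A minimal sketch, assuming the usual /proc/mdstat layout where each array's block starts with `mdNNN :` and ends at a blank line; `has_bitmap` and the `MDSTAT` override are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: succeed if the named md array has a write-intent bitmap, judged by
# a "bitmap:" line inside its /proc/mdstat stanza. MDSTAT is overridable
# (e.g. with a saved copy of the file) for testing.
MDSTAT=${MDSTAT:-/proc/mdstat}

has_bitmap() {
    local md=$1   # array name as shown in /proc/mdstat, e.g. md127
    awk -v md="$md" '
        $2 == ":"          { inblk = ($1 == md); next }  # header line of a stanza
        NF == 0            { inblk = 0 }                 # blank line ends a stanza
        inblk && /bitmap:/ { found = 1 }
        END                { exit !found }
    ' "$MDSTAT"
}
```

If `has_bitmap md127` fails, an internal bitmap can be added to a running array with `mdadm --grow --bitmap=internal /dev/md127`.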

There seems to be a 'vibe' that Green drives are not good for RAID. I don't really think the problem is that the drive is a Green; I think it is that the driver is not trying hard enough to restart the disk.

So in the end this was not a SMART problem after all. Not that there are no bugs to fix there, particularly in udisks-helper-ata-smart-collect, which keeps running and locking up, sending the load average into the hundreds. For a tool designed to detect error conditions, it probably needs a bit more work.

My next job is to select a replacement drive for the faulty WD Blue...

Wednesday, December 19, 2012

I2C Analog to Digital Converter

The first device I hooked up to my Raspberry Pi is based on the PCF8591 Analog to Digital Converter (ADC). This chip has four analog inputs (ADC) and one analog output, a Digital to Analog Converter (DAC).

I am using a pre-assembled board from Deal Extreme which comes with the chip, a temperature sensor, a light sensor, a variable resistor and an LED. This makes a simple showcase for the chip and, more importantly, it has a light sensor, which is important to my project. The board was only a few dollars (http://dx.com/p/pcf8591-8-bit-a-d-d-a-converter-module-150190), and there are other similar boards on the site too.

PCF8591 demo board. GPIO pins are visible on the right.


The first step is to physically hook up the board. Mine came with the required cables (often called Dupont cables), which are also a handy way to start. The cables must be connected to the Raspberry Pi GPIO pins nominated for I2C. These pins already have the required pull-up resistors installed (these are what make the wires operate like a bus). The pins are:
  • P1-01 +3.3v (VCC)
  • P1-03 Data (SDA)
  • P1-05 Clock (SCL)
  • P1-09 Ground (GND)
Raspberry Pi showing GPIO cables connected.


My demo board has a red power indicator LED which came on once I powered up.

The next big test is to see whether the I2C driver can talk to your chip. The Raspberry Pi actually comes configured with two I2C buses and, for reasons unknown, on my system the bus labelled I2C0 is allocated the Linux device i2c-1.

Scanning both buses won't hurt.

jnewbigin@raspberrypi:~$ i2cdetect 1
WARNING! This program can confuse your I2C bus, cause data loss and worse!
I will probe file /dev/i2c-1.
I will probe address range 0x03-0x77.
Continue? [Y/n] y
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00:          -- -- -- -- -- -- -- -- -- -- -- -- --
10: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
20: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
30: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
40: -- -- -- -- -- -- -- -- 48 -- -- -- -- -- -- --
50: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
60: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
70: -- -- -- -- -- -- -- --                         


You can see here that a device has been detected at address 0x48. This is the expected address for my chip so that means we are in business.
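If you would rather not eyeball the grid, the detected addresses can be pulled out of the i2cdetect table with a little awk. A sketch, assuming i2cdetect's usual layout (row labels 00: to 70:, lowercase hex for devices that answered); `i2c_found_addrs` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: list the addresses i2cdetect reported as present on a bus.
# Reads an i2cdetect table on stdin; cells showing "--" (no reply) and
# "UU" (claimed by a kernel driver) are skipped.
i2c_found_addrs() {
    awk '
        $1 ~ /^[0-7]0:$/ {                        # table rows: 00:, 10:, ... 70:
            for (i = 2; i <= NF; i++)
                if ($i ~ /^[0-9a-f][0-9a-f]$/)    # an address that answered
                    print "0x" $i
        }
    '
}
```

To scan both buses in one go: `for bus in 0 1; do echo "bus $bus: $(i2cdetect -y "$bus" | i2c_found_addrs)"; done`. With the board above this should report 0x48 on bus 1.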

Reading from and writing to the chip is quite straightforward, but the chip does have a few nuances. The first read after power-on will return 0x80. The analog to digital conversion is performed when you make a read request, but the read returns the previous sample, so it is always one sample behind. This is not too confusing unless you are switching to read a different input.

I will show how to read the data using the command line tools i2cget and i2cset. In another blog post I will show how you can interface with the chip from C code.

All these commands take the parameter 1 to specify the I2C bus, and I pass -y, which skips the safety warning. (We know what we are doing, so this is OK. On other hardware, such as your PC, things can and do go wrong when you use these commands.)

The most basic read is the default channel (input 0).
jnewbigin@raspberrypi:~$ i2cget -y 1 0x48
0x80
That is the power-on status code. Now we read again:

jnewbigin@raspberrypi:~$ i2cget -y 1 0x48
0xd2
That is the value that was sampled when we made our first read (the one that returned 0x80).
Now cover up the light sensor and read again:

jnewbigin@raspberrypi:~$ i2cget -y 1 0x48
0xd2
Yep, no change. The new value has now been sampled, so we read it:
jnewbigin@raspberrypi:~$ i2cget -y 1 0x48
0xeb
Now we get the new value.

Now switch to reading another input, input number 1:
jnewbigin@raspberrypi:~$ i2cset -y 1 0x48 0x01
jnewbigin@raspberrypi:~$ i2cget -y 1 0x48
0xeb
First we get an old value.
jnewbigin@raspberrypi:~$ i2cget -y 1 0x48
0xcf
Then the new value.

We can repeat to select channel 2 and 3.
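The select, discard, read sequence above can be wrapped in a function so that you always get a fresh sample from the input you asked for. A sketch assuming the board sits at 0x48 on bus 1 as in this post; `pcf8591_read`, `BUS` and `ADDR` are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: read one PCF8591 input, hiding the "one sample behind" behaviour.
# BUS and ADDR are hypothetical variables defaulting to the values above.
BUS=${BUS:-1}
ADDR=${ADDR:-0x48}

pcf8591_read() {
    local channel=$1
    case $channel in
        [0-3]) ;;
        *) echo "channel must be 0-3" >&2; return 2 ;;
    esac
    i2cset -y "$BUS" "$ADDR" "$(printf '0x%02x' "$channel")"  # select input
    i2cget -y "$BUS" "$ADDR" > /dev/null   # stale sample from before the switch
    i2cget -y "$BUS" "$ADDR"               # fresh sample of the chosen input
}
```

Usage: `for ch in 0 1 2 3; do echo "input $ch: $(pcf8591_read "$ch")"; done`. Because this control byte leaves bit 0x40 clear, selecting an input this way should also switch the analog output off.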

We can enable the analog output by adding bit 0x40 to the control byte in the set command (0x41 below is 0x40 plus input 1) and then specifying a value for the DAC:
jnewbigin@raspberrypi:~$ i2cset -y 1 0x48 0x41 0xff

And the indicator LED turns on.

jnewbigin@raspberrypi:~$ i2cset -y 1 0x48 0x41 0x00
And the indicator LED turns off. You can, of course, set it to any value between 0x00 and 0xff and watch the LED dim and turn off. (You can also see why LEDs don't make good analog indicators.)
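The control byte is just the input number in the low two bits plus 0x40 when the analog output should be on, so 0x41 means "output on, input 1". Here is a sketch of a helper that builds it, plus a loop that fades the LED from full brightness to off. The function names and the `BUS`/`ADDR` variables are hypothetical, and the sketch leaves the chip's other control bits at zero.

```shell
#!/usr/bin/env bash
# Sketch: build a PCF8591 control byte and fade the DAC (and LED) down.
# Bits used: 0x40 = analog output enable, low two bits = input channel.
BUS=${BUS:-1}
ADDR=${ADDR:-0x48}

pcf8591_ctrl() {
    local channel=$1 dac_on=$2   # channel 0-3, dac_on 0 or 1
    printf '0x%02x\n' $(( (dac_on ? 0x40 : 0) | (channel & 0x03) ))
}

fade_led() {
    local v
    # 255, 240, ..., 15, 0: full brightness down to off in 18 steps.
    for (( v = 255; v >= 0; v -= 15 )); do
        i2cset -y "$BUS" "$ADDR" "$(pcf8591_ctrl 0 1)" "$(printf '0x%02x' "$v")"
        sleep 0.1
    done
}
```

`pcf8591_ctrl 1 1` gives 0x41, the value used above, and `fade_led` ends by writing 0x00, leaving the LED off.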


Tuesday, December 4, 2012

FlashCache

For some time I have been using FlashCache on my PC at home. Of course I did it the hard way because I wanted it seamlessly integrated at boot time so I could use the cache on my root filesystem.

My hard work seems to have paid off, and I have now officially published the results at http://chrysocome.net/dracut-flashcache

Now CentOS-6 users can get started with FlashCache with (hopefully) a minimum of fuss.

As soon as I get time to upgrade my machine at work, I will be running it there too.