Sunday, December 23, 2012

SMART errors and software RAID

For a while now I have been having an intermittent problem with some of my drives. They seem to be working but stop responding to SMART. If I reboot they don't show up any more but if I power off and back on they work fine.

I have two 640Gig WD Green drives in a software RAID1 and another WD Blue drive. After a suspend/resume, sometimes one of the drives is MIA. The kernel thinks it is still there but my SMART tools can't talk to it and start to complain.

The Blue drive has a bad (unreadable) sector but, touch wood, that has not caused a problem yet. SMART knows about this and tells me about it frequently. This however does not appear to be the problem. There is a telling message in dmesg:

sd 0:0:0:0: [sda] START_STOP FAILED
sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
PM: Device 0:0:0:0 failed to resume: error 262144

So for some reason, the disk is not responding when the computer resumes. I guess it is a timeout (and I wonder if I can extend it?) Anyway, now I have the situation where my disk is not working even though I know there is nothing actually wrong with it.
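
As an aside, the error number in that last dmesg line looks like the SCSI result word printed in decimal. Here is a minimal sketch that splits it into bytes, assuming the usual Linux layout of driver, host, message and status bytes. The function name decode_scsi_result is made up for illustration.

```shell
#!/bin/sh
# Split a SCSI result word into its four bytes.
# Assumed layout: driver<<24 | host<<16 | msg<<8 | status.
decode_scsi_result() {
    result="$1"
    printf 'driver=0x%02x host=0x%02x msg=0x%02x status=0x%02x\n' \
        $(( (result >> 24) & 0xff )) \
        $(( (result >> 16) & 0xff )) \
        $(( (result >> 8) & 0xff )) \
        $(( result & 0xff ))
}
```

Running decode_scsi_result 262144 gives a host byte of 0x04, which is the value the kernel headers assign to DID_BAD_TARGET. So the PM line appears to be repeating the hostbyte from the line above rather than reporting anything new.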

Luckily I am using software RAID and the other disk is working so I can continue about my business without crashing or losing data. After poking and prodding a few different things I have worked out a solution:

  • Hot-remove the device (from the kernel)
echo 1 > /sys/block/sda/device/delete
  • Rescan for the device to hot-add it to the kernel
echo "- - -" > /sys/class/scsi_host/host0/scan
  • Add the 'failed' drive back into the RAID set
mdadm /dev/md127 --re-add /dev/sda2

You must remove the existing (sda) device first or the disk will be re-detected and added with a new name (sde in my case).
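
The three steps above can be wrapped in one function. This is only a sketch: the function name recover_disk and the SYSFS_ROOT, MDADM and SETTLE_SECS overrides are hypothetical additions so it can be dry-run against a scratch directory, and the two second pause for udev is a guess rather than a measured value.

```shell
#!/bin/sh
# Sketch: drop a stale SCSI disk, rescan its host, then re-add it to the array.
# Needs root when pointed at the real /sys.
recover_disk() {
    disk="$1"; host="$2"; array="$3"; part="$4"
    sysfs="${SYSFS_ROOT:-/sys}"        # hypothetical override for dry runs
    mdadm_cmd="${MDADM:-mdadm}"        # hypothetical override for dry runs

    # Hot-remove the stale device first so the disk keeps its old name.
    echo 1 > "$sysfs/block/$disk/device/delete" || return 1
    # Wildcard rescan of every channel, target and LUN on that host.
    echo "- - -" > "$sysfs/class/scsi_host/$host/scan" || return 1
    # Give udev a moment to recreate the device nodes (assumed delay).
    sleep "${SETTLE_SECS:-2}"
    # Put the partition back into the RAID set.
    $mdadm_cmd "$array" --re-add "$part"
}
```

With the names from this post it would be called as: recover_disk sda host0 /dev/md127 /dev/sda2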

Because I have a write-intent bitmap, the RAID set knows what has changed since the drive was failed and only the changes must be re-synced which is quite fast.
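
To check whether an array actually has a write-intent bitmap, one option is to look for the bitmap line in /proc/mdstat. The helper below is a rough sketch (has_bitmap is a made-up name) and assumes the array names look like md127.

```shell
#!/bin/sh
# Succeeds if the named md array shows a "bitmap:" line in mdstat text on stdin.
has_bitmap() {
    awk -v md="$1" '
        /^md[0-9]+ :/ { current = $1 }              # start of an array section
        current == md && /bitmap:/ { found = 1 }    # bitmap line inside our array
        END { exit(found ? 0 : 1) }
    '
}
```

Usage would be something like: has_bitmap md127 < /proc/mdstat && echo "bitmap present". If there is none, mdadm --grow /dev/md127 --bitmap=internal should add one.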

There seems to be a 'vibe' that green drives are not good for RAID. I don't really think this is a problem because the drive is green; I think it is a problem because the driver is not trying hard enough to restart the disk.

So in the end this was not a SMART problem after all. Not that there are no bugs to fix there, particularly in udisks-helper-ata-smart-collect, which keeps running and locking up, sending the load average into the hundreds. For a tool designed to detect error conditions it probably needs a bit more work.

My next job is to select a replacement drive for the faulty WD Blue...

Wednesday, December 19, 2012

I2C Analog to Digital Converter

The first device I hooked to my Raspberry Pi is based on the PCF8591 Analog to Digital Converter (ADC). This chip has 4 analog inputs (ADC) and one analog output or Digital to Analog Converter (DAC).

I am using a pre-assembled board from Deal Extreme which comes with the chip, a temperature sensor, light sensor, variable resistor and LED. This provides a simple showcase for the chip and, more importantly, it has a light sensor, which I need for my project. The board was only a few dollars, and there are other similar boards on the site.

PCF8591 demo board. GPIO pins are visible on the right.

The first step is to physically hook up the board. Mine came with the required cables (often called dupont cables) which is also a handy way to start. The cables must be connected to the Raspberry Pi GPIO pins nominated for I2C. These have the required 'pull up resistors' already installed (these are what make the wires operate like a bus). The pins are:
  • P1-01 +3.3v (VCC)
  • P1-03 Data (SDA)
  • P1-05 Clock (SCL)
  • P1-09 Ground (GND)
Raspberry Pi showing GPIO cables connected.

My demo board has a red power indicator LED which came on once I powered up.

The next big test is to see if the i2c driver can talk to your chip. The Raspberry Pi actually comes configured with two I2C buses and for reasons unknown, on my system the bus labelled I2C0 is allocated the Linux device i2c-1.

Scanning both buses won't hurt.

jnewbigin@raspberrypi:~$ i2cdetect 1
WARNING! This program can confuse your I2C bus, cause data loss and worse!
I will probe file /dev/i2c-1.
I will probe address range 0x03-0x77.
Continue? [Y/n] y
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00:          -- -- -- -- -- -- -- -- -- -- -- -- --
10: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
20: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
30: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
40: -- -- -- -- -- -- -- -- 48 -- -- -- -- -- -- --
50: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
60: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
70: -- -- -- -- -- -- -- --                         

You can see here that a device has been detected at address 0x48. This is the expected address for my chip so that means we are in business.

Reading and writing to the chip is quite straightforward but the chip does have a few nuances. The first read after power on will return 0x80. The analog to digital conversion is performed when you make a read request but the read will return the previous sample so it is always one sample behind. This is not too confusing unless you are switching to read a different input.

I will show how to read the data using the command line tools i2cget and i2cset. In another blog post I will show how you can interface with the chip from C code.

All these commands take a parameter 1 to specify which i2c bus and I pass -y which skips the safety warning. (We know what we are doing so this is OK. On other hardware such as your PC, things can and do go wrong by using these commands).

The most basic read is the default channel (input 0).
jnewbigin@raspberrypi:~$ i2cget -y 1 0x48
That is the power on status code. Now we read again

jnewbigin@raspberrypi:~$ i2cget -y 1 0x48
That is the value that was sampled when we made our first read (the one that returned 0x80).
Now cover up the light sensor and read again

jnewbigin@raspberrypi:~$ i2cget -y 1 0x48
Yep, no change. The new value has been sampled so now we read it
jnewbigin@raspberrypi:~$ i2cget -y 1 0x48
Now we get the new value.

Now, switch to read another input, input number 1
jnewbigin@raspberrypi:~$ i2cset -y 1 0x48 0x01
jnewbigin@raspberrypi:~$ i2cget -y 1 0x48
First we get an old value.
jnewbigin@raspberrypi:~$ i2cget -y 1 0x48
Then the new value

We can repeat to select channel 2 and 3.
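
Because every read is one sample behind, a small wrapper that selects a channel, throws away the stale read and returns the next one can save some confusion. This is just a sketch: adc_read is a made-up name, and the defaults of bus 1 and address 0x48 are simply the values from my setup.

```shell
#!/bin/sh
# Return a fresh sample from one PCF8591 input (0x00 to 0x03).
adc_read() {
    channel="$1"; bus="${2:-1}"; addr="${3:-0x48}"
    # Select the input by writing the control byte.
    i2cset -y "$bus" "$addr" "$channel" || return 1
    # This read returns the old sample and starts a new conversion.
    i2cget -y "$bus" "$addr" > /dev/null || return 1
    # This read returns the sample taken during the previous read.
    i2cget -y "$bus" "$addr"
}
```

For example, adc_read 0x01 should print the current value of input 1. Note that writing a plain channel number as the control byte also clears the 0x40 bit that enables the analog output.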

We can enable the analog output by adding bit 0x40 to the set command and then specify a value for the DAC
jnewbigin@raspberrypi:~$ i2cset -y 1 0x48 0x41 0xff

And the indicator LED turns on

jnewbigin@raspberrypi:~$ i2cset -y 1 0x48 0x41 0x00
And the indicator LED turns off. You can of course set it to any value between 0x00 and 0xff and see the LED dim and turn off. (You can also see why LEDs don't make good analog indicators).
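
The control byte used in these commands is just the channel number with 0x40 added when the analog output should be enabled. A tiny helper makes that explicit. It is a sketch covering only those two fields (pcf8591_control is a made-up name) and it ignores the other control bits the chip supports.

```shell
#!/bin/sh
# Build a PCF8591 control byte from a channel (0-3) and an "on"/"off" DAC flag.
pcf8591_control() {
    channel="$1"; dac="$2"
    value=$(( channel & 0x03 ))          # low two bits select the input
    if [ "$dac" = "on" ]; then
        value=$(( value | 0x40 ))        # bit 0x40 enables the analog output
    fi
    printf '0x%02x\n' "$value"
}
```

So pcf8591_control 1 on prints 0x41, the value used above, and a half-brightness LED would be something like: i2cset -y 1 0x48 "$(pcf8591_control 1 on)" 0x80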

Tuesday, December 4, 2012


For some time I have been using FlashCache on my PC at home. Of course I did it the hard way because I wanted it seamlessly integrated at boot time so I could use the cache on my root filesystem.

My hard work seems to have paid off and I have now officially published the fruits of it at

Now CentOS-6 users can get started with FlashCache with (hopefully) a minimum of fuss.

Just as soon as I get time to upgrade my machine at work I will be running it there too.

Monday, November 19, 2012

Raspberry Pi I2C

I have a Raspberry Pi and let's face it, who doesn't?

I have played with Linux on many architectures before including PPC, Hitachi, MIPS, PA-RISC and Sparc so I figured I had better have a go at ARM too.

Apart from playing around, I plan to create a light controller module for my garden lights. This will require some hardware hacking which is always a bit of fun but my main plan is to bring it together with some fancy software.

In previous projects I have interfaced with GPIO and I2C to run door controllers and read swipe cards (Mostly on the WRT54G).

I could not find accurate instructions for getting I2C going on the rpi so here are my instructions for Raspbian users:

Install some tools
# apt-get install i2c-tools

edit  /etc/modprobe.d/raspi-blacklist.conf and comment out the line


I don't know why it comes as blacklisted.

edit /etc/modules and add the lines
This will make sure the drivers are loaded during the boot.

create a file /etc/udev/rules.d/99-i2c.rules and add the line
SUBSYSTEM=="i2c-dev", MODE="0666"
This will give all users access to the i2c devices. You could instead set the owner or group but the rpi is not normally used as a multi-user device.

Now you can test these changes without a reboot:
modprobe i2c-bcm2708
modprobe i2c-dev
udevadm trigger
ls -l /dev/i2c*

And you should see output like this (Your date will be different):
crw-rw-rwT 1 root i2c 89, 0 Nov 18 22:36 /dev/i2c-0
crw-rw-rwT 1 root i2c 89, 1 Nov 18 22:36 /dev/i2c-1

If that works, reboot and run the ls again. The devices should be there and have world read/write permissions.

Now, to connect up some hardware and show that it works. Look for a new blog soon.

Saturday, November 10, 2012

Samba guest access

I was trying to share some photos for my wife to download from my Linux desktop machine (CentOS 6). I had ~10Gig of photos but she only wanted a handful. I thought the best option would be to share out the folder using Samba so she could use Windows Explorer to pick the ones she wants.

Well it seems simple now but it took a few tries to get it working.

I use security=server which is not the most secure method but it is normally easy & convenient. It also seems to be poorly documented, particularly for more recent samba releases and versions of Windows.

My wife has an account on the password server but not on my new desktop. I thought guest access would be a simple solution here, but not so. The problem is that guest access will not work by default when using security=server (despite what the man page says). There is a new setting called map to guest which defaults to Never. I had to change this to Bad User to get it to work.

These are the relevant parts from my working smb.conf:
security = server
password server = myserver
map to guest = Bad User

comment = John's photos
path = /home/jnewbigin/photos
guest ok = yes
writable = no
force user = jnewbigin

So now it is working and I am happy and my wife is happy too.

Wednesday, October 31, 2012

Puppet DNS lookup

From time to time NetworkManager breaks my /etc/resolv.conf. I normally turn that off using puppet but when /etc/resolv.conf is broken, puppet won't run so I have to fix it manually :-(

Well, not any more. Now I add a static /etc/hosts entry for the puppet server and puppet will run even when /etc/resolv.conf is broken.

The special part about this is that I look up the IP address using a template so I don't have to hard code the value in my manifest. Google said it could not be done without writing a custom function. As always, I did not believe it...

Assuming you have set $puppetserver to contain the fqdn of your puppet server:

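host { 'hostsconf-puppet':
        ensure => present,
        # Resolve the server name when the catalog is compiled. The resolver call was
        # missing from this listing; Resolv from the Ruby standard library is assumed.
        ip => inline_template("<% _erbout.concat(Resolv::DNS.open.getaddress('$puppetserver').to_s) %>"),
        name => $puppetserver,
        target => '/etc/hosts',
}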

Tuesday, October 23, 2012

KVM file permissions

Recently I have been testing CentOS 6 with libvirt & KVM. My goal is to set up a cluster of servers and enable migration between them without shared storage.

This seems possible but I hit a roadblock with file permissions. I am using a directory pool but any newly created file is assigned the permissions 0600 and owned by root:cso (cso is the group that all the sysadmins are in). The XML schema for libvirt pools allows me to specify a mode, owner & group but they don't seem to be honoured when the file is created.

My workaround to this problem was to create a hook which runs when a virtual machine is started. This gives me a chance to change the permissions to the correct values. The libvirt hooks are not widely publicised but at least on CentOS 6, you create the file and it just works. See

The basic skeleton for my hook is:

#!/bin/bash
if [ "$2" = "prepare" -a "$3" = "begin" ] ; then
   : # Fix the permissions here
fi
exit 0

The next headache is that I don't know which disks are needed for this virtual machine. This info is provided on stdin but it is in XML which is not easy for bash to process.

My solution to this was to use XSLT to transform the XML into a bash script.
I have never used XSLT before so there was a fair amount of guessing involved. The output of xsltproc always has an XML header which I strip off with grep. The output is logged to syslog with the logger command. The final hook script looks like this:
#!/bin/bash
if [ "$2" = "prepare" -a "$3" = "begin" ] ; then
   /usr/bin/xsltproc /etc/libvirt/hooks/qemu.hook.xsl - | \
      grep -v '?xml' | \
      sh -x | logger
fi
exit 0

As for the XSL, I am no expert but I got it working. I could not work out how to correctly escape the values of the file names so there could be a nasty surprise in there if you don't trust your users.

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<xsl:for-each select="domain/devices/disk/source">
chown qemu:cso '<xsl:value-of select="@file"/>'
chmod 0660 '<xsl:value-of select="@file"/>'
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
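
On the escaping worry mentioned above: the usual shell trick is to wrap a value in single quotes and replace every embedded single quote with '\''. Below is a minimal sketch of that idea in plain shell. The name shell_quote is made up, and it is not wired into the XSLT, so treat it as a starting point only.

```shell
#!/bin/sh
# Print the argument as one safely single-quoted shell word.
shell_quote() {
    # Close the quote, emit an escaped quote, reopen it: ' becomes '\''
    printf "'%s'\n" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}
```

A file name such as it's.img comes out as 'it'\''s.img', which sh reads back as the original name. It may be simpler still to have the stylesheet emit one path per line and loop over them with while IFS= read -r, so that no shell code is generated from guest XML at all.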

Monday, October 8, 2012

Mobile Internet

I recently got a new work phone, the Samsung Galaxy Nexus. This is my first 'smart' phone and it seems to be a good fit with my Linux interests.

The problem is that work will not pay for mobile data so I can't make use of it on the train (some may argue that is a good thing). My workaround was to get a portable 3G to WIFI router (aka MIFI, Pocket WIFI, etc.).

I had already researched phone plans and worked out that the phone 'cap' seems to be one of the best marketing tricks ever schemed up. The only plan of interest to me is the TPG $1 per month pay as you go plan. Being an existing TPG customer I also get 150MB of free data per month.

So the next question was which router to get? Most comparisons seemed to be caught up on the price of the data but I wanted to know which one works the best. In the end I went for the Vodafone Pocket WIFI. This is a Huawei E585 V2. It comes with a Vodafone SIM which you must activate and install yourself. On special for $40 I thought I would give it a go. After several hours trying to activate the SIM and then a few more trying to make the phone get an address via DHCP I was on the internet. (The activation problem was because the Vodafone web site was asking for the wrong ID numbers. The DHCP is still an issue and is presumably a bug in ICS).

So, once it was all working I tried it out on the train and was most unimpressed. Although it would say connected it rarely managed to download as much as a web page. There are people complaining about this on the Vodafone forums and they insist that if there is a problem it will be fixed real soon now. Luckily, Vodafone did not factor in my plan so once my free month was up I unlocked the router and switched to TPG (which uses Optus 3G).

This is much more useful on the train though still does not knock my socks off. I have come to the realisation that the term 'Mobile Broadband' is a misnomer and should really be called 'Portable Broadband'. You can use it in one spot. You can take it and use it somewhere else, but try and use it on the way and it will not work very well at all.

Perhaps other networks work better. I see lots of new towers on the side of the train line. Perhaps other routers work better. The software quality on these things is never very impressive. Perhaps the whole thing is a joke and handing off to the next tower will never work (it does not work for voice, why would it work for data?).

Perhaps one day work will pay for my data and I can compare to the built in functionality of the phone.

Finally, it turns out that using facebook on the train is not as exciting as I was led to believe and my need for data is not that great anyway.

Thursday, September 13, 2012

Adding Drupal nodes with a specific ID

As part of a Drupal upgrade/migration I had a requirement to create a few thousand nodes with specific IDs. The internet said it could not be done but I did not believe it.

After diving into the bowels of the drupal code I found the spot which did the creating and it seemed to be doing something quite reasonable with the new node ID.

With just a small amount of convincing I managed to get it to do what I wanted.

diff -ru vanilla/drupal-7.12/modules/node/node.module web/drupal/modules/node/node.module
--- vanilla/drupal-7.12/modules/node/node.module        2012-02-02 09:03:14.000000000 +1100
+++ web/drupal/modules/node/node.module     2012-05-28 16:37:36.827171000 +1000
@@ -1095,6 +1095,12 @@
     if ($node->is_new) {
       // For new nodes, save new records for both the node itself and the node
       // revision.
+       echo "Requested save new node with nid {$node->request_nid}\n";
+       $node->nid = $node->request_nid;
+       //print_r($node);
       drupal_write_record('node', $node);
       _node_save_revision($node, $user->uid);
       $op = 'insert';

Creating a node can now be done like this:
$node = new stdClass();
$node->type = ...;
// fill in node details here...
$node->request_nid = $my_nid;
// node_save() is the function patched above; it now uses request_nid as the nid
node_save($node);

That seems quite trivial now. I don't know why so many people say it can't be done.

Friday, September 7, 2012

/usr/bin/ld: cannot find -lm

I was updating some of my software to run on CentOS6 and I had an unexpected error:
/usr/bin/ld: cannot find -lm
collect2: ld returned 1 exit status

The tool I was building uses -static and it turns out that static libraries are no longer shipped in glibc-devel. They are in a new package called glibc-static.

yum install glibc-static
solved the problem. My program now compiles and thanks to puppet, glibc-static is now installed on all my systems.
package { 'glibc-static': ensure => present }

Wednesday, August 29, 2012

Virtual Volumes and Dokan

It has been a holy grail for many years now and it is very close: the ability to assign a drive letter to your Linux filesystems under Windows.

I am doing some final tests on my first release of Virtual Volumes which includes support for Dokan. Dokan is the Windows equivalent of FUSE under Linux. This means you can implement a filesystem in a userspace application.

There are some known issues but my Windows XP install test was successful and I am currently doing some large file testing. If all goes well I might get a beta version released this weekend.

Tuesday, August 14, 2012

The blog is back

After some time (4 years) I have a blog again. I have been doing a lot with CentOS 6 and I need somewhere to post my results so this will be it. Probably with some general Linux & PC things, filesystems, networks and even the odd opinion on something non-technical.

I found my old blog here
I'll try and syndicate that over here once I work out how blogger works.