Thursday, July 18, 2013

Sparse files on Windows

Once again I have been drawn away from Linux to solve a Windows problem. This time the culprit is Hyper-V, which (as always) gives a cryptic error message about 'cannot open attachment' and 'incorrect file version'.

I eventually tracked the error down to the file being flagged as sparse.

What is a sparse file?
Under UNIX/Linux, a sparse file is a file where not all of the storage for the file has been allocated. Sparse files are normally handled transparently, but some tools, such as file copy and backup programs, can handle them more efficiently if they know where the holes are. Getting that information can be tricky.
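On Linux (3.1 and later), one way to locate the holes is lseek() with SEEK_DATA and SEEK_HOLE, which Python exposes as os.SEEK_DATA and os.SEEK_HOLE. The following is a minimal sketch, assuming Python 3.3+ on Linux; data_extents is just an illustrative name. A filesystem that doesn't track holes reports the whole file as data, which is still correct, just not very informative. (Comparing st_blocks * 512 with st_size roughly tells you whether a file has holes, but not where they are.)

```python
import errno
import os


def data_extents(path):
    """Return (offset, length) pairs for the regions of a file that hold data.

    Anything between the returned extents (and after the last one) is a hole.
    On a filesystem without hole support the whole file is reported as data.
    """
    extents = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        pos = 0
        while pos < size:
            try:
                start = os.lseek(fd, pos, os.SEEK_DATA)  # next byte of real data
            except OSError as exc:
                if exc.errno == errno.ENXIO:  # only a hole remains after pos
                    break
                raise
            end = os.lseek(fd, start, os.SEEK_HOLE)  # where that data run stops
            extents.append((start, end - start))
            pos = end
    finally:
        os.close(fd)
    return extents


if __name__ == "__main__":
    import sys
    for offset, length in data_extents(sys.argv[1]):
        print(f"data at {offset}, {length} bytes")
```

Each pair of lseek() calls skips one hole and measures one run of data, so the loop touches only the extent boundaries rather than reading the file.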

In contrast, under Windows, a sparse file is a file which has the sparse flag set. Presumably the flag is set because not all of the storage for the file has been allocated (much like under Linux). Interestingly, the sparse flag may still be set even when all of the storage is allocated. (It seems the flag indicates the potential to be sparse rather than actually being sparse; there is a separate API to find out which parts really are sparse.)
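On NTFS, the API that reports which parts of a file actually have storage is the FSCTL_QUERY_ALLOCATED_RANGES control code, sent with DeviceIoControl. Here is a sketch that calls it from Python through ctypes, assuming Python 3 on Windows; allocated_ranges and holes_from_ranges are made-up names and error handling is minimal. holes_from_ranges simply inverts the allocated list, so a file that has the sparse flag but no real holes comes back as an empty list.

```python
import ctypes
import os
import sys

# Values from the Windows SDK headers (winioctl.h, winnt.h, winerror.h).
FILE_DEVICE_FILE_SYSTEM = 0x09
METHOD_NEITHER = 3
FILE_READ_DATA = 0x1
GENERIC_READ = 0x80000000
FILE_SHARE_READ = 0x1
FILE_SHARE_WRITE = 0x2
OPEN_EXISTING = 3
ERROR_MORE_DATA = 234


def ctl_code(device_type, function, method, access):
    """Python version of the CTL_CODE macro."""
    return (device_type << 16) | (access << 14) | (function << 2) | method


FSCTL_QUERY_ALLOCATED_RANGES = ctl_code(FILE_DEVICE_FILE_SYSTEM, 51, METHOD_NEITHER, FILE_READ_DATA)


class FILE_ALLOCATED_RANGE_BUFFER(ctypes.Structure):
    _fields_ = [("FileOffset", ctypes.c_int64), ("Length", ctypes.c_int64)]


def holes_from_ranges(ranges, size):
    """Turn allocated (offset, length) ranges into the unallocated gaps of a file."""
    holes, pos = [], 0
    for offset, length in sorted(ranges):
        if offset > pos:
            holes.append((pos, offset - pos))
        pos = max(pos, offset + length)
    if pos < size:
        holes.append((pos, size - pos))
    return holes


def allocated_ranges(path, batch=64):
    """Ask the file system which byte ranges of path have storage (Windows only)."""
    if sys.platform != "win32":
        raise OSError("FSCTL_QUERY_ALLOCATED_RANGES is only available on Windows")
    k32 = ctypes.WinDLL("kernel32", use_last_error=True)
    k32.CreateFileW.argtypes = [ctypes.c_wchar_p, ctypes.c_uint32, ctypes.c_uint32,
                                ctypes.c_void_p, ctypes.c_uint32, ctypes.c_uint32, ctypes.c_void_p]
    k32.CreateFileW.restype = ctypes.c_void_p
    k32.DeviceIoControl.argtypes = [ctypes.c_void_p, ctypes.c_uint32, ctypes.c_void_p, ctypes.c_uint32,
                                    ctypes.c_void_p, ctypes.c_uint32, ctypes.POINTER(ctypes.c_uint32),
                                    ctypes.c_void_p]
    k32.DeviceIoControl.restype = ctypes.c_int
    k32.CloseHandle.argtypes = [ctypes.c_void_p]

    handle = k32.CreateFileW(path, GENERIC_READ, FILE_SHARE_READ | FILE_SHARE_WRITE,
                             None, OPEN_EXISTING, 0, None)
    if handle is None or handle == ctypes.c_void_p(-1).value:  # INVALID_HANDLE_VALUE
        raise ctypes.WinError(ctypes.get_last_error())
    size = os.path.getsize(path)
    query = FILE_ALLOCATED_RANGE_BUFFER(0, size)   # the byte range to ask about
    out = (FILE_ALLOCATED_RANGE_BUFFER * batch)()  # room for up to `batch` answers
    returned = ctypes.c_uint32()
    ranges = []
    try:
        while query.Length > 0:
            ok = k32.DeviceIoControl(handle, FSCTL_QUERY_ALLOCATED_RANGES,
                                     ctypes.byref(query), ctypes.sizeof(query),
                                     out, ctypes.sizeof(out), ctypes.byref(returned), None)
            if not ok:
                err = ctypes.get_last_error()
                if err != ERROR_MORE_DATA:
                    raise ctypes.WinError(err)
            count = returned.value // ctypes.sizeof(FILE_ALLOCATED_RANGE_BUFFER)
            ranges.extend((r.FileOffset, r.Length) for r in out[:count])
            if ok or count == 0:
                break
            # ERROR_MORE_DATA: resume the query just after the last range returned.
            end = out[count - 1].FileOffset + out[count - 1].Length
            query.FileOffset, query.Length = end, size - end
    finally:
        k32.CloseHandle(handle)
    return ranges
```

Usage on Windows would be something like holes_from_ranges(allocated_ranges(path), os.path.getsize(path)).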

The problem started when I happened to download a Hyper-V virtual machine using BitTorrent. While the files are being downloaded, not all of the content exists yet, so they genuinely are sparse. Once all the content has arrived, the file is (to my mind anyway) no longer sparse. However, it seems that under Windows, once a sparse file, always a sparse file.

Microsoft provide a tool to check and set the sparse flag:
fsutil sparse queryflag <filename>
fsutil sparse setflag <filename>
Note 1: Have they not heard of get and set?
Note 2: You can't use a wildcard for <filename>.
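Because fsutil won't take a wildcard, a small script is handy for checking many files at once. A sketch, assuming Python 3.5+: on Windows, os.stat() fills in st_file_attributes, and the sparse bit is FILE_ATTRIBUTE_SPARSE_FILE (0x200). The function names are illustrative, and this only reads the flag; it doesn't set it.

```python
import glob
import os
import stat


def sparse_flag_from_attributes(attributes):
    """True if FILE_ATTRIBUTE_SPARSE_FILE (0x200) is set in a Windows attribute mask."""
    return bool(attributes & stat.FILE_ATTRIBUTE_SPARSE_FILE)


def query_sparse_flags(pattern):
    """Yield (path, flag) for each regular file matching a wildcard pattern.

    flag is None on platforms where os.stat() has no st_file_attributes
    (that field only exists on Windows).
    """
    for path in sorted(glob.glob(pattern)):
        if not os.path.isfile(path):
            continue
        attributes = getattr(os.stat(path), "st_file_attributes", None)
        yield path, None if attributes is None else sparse_flag_from_attributes(attributes)


if __name__ == "__main__":
    import sys
    labels = {True: "sparse", False: "not sparse", None: "unknown"}
    for path, flag in query_sparse_flags(sys.argv[1]):
        print(f"{path}: {labels[flag]}")
```

For example, running it with a pattern such as "D:\VMs\*.vhd" (a hypothetical path) lists each matching file with its flag.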
The amazing thing to note about fsutil is that there is no clearflag option. This might lead you to believe that the flag cannot be cleared. In fact it can. For users in a pickle, there is a program called Far Manager which can (among other things) clear the flag. Far Manager is open source, and a quick peek at the code shows that it does this with a standard IOCTL named FSCTL_SET_SPARSE.
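To clear the flag, FSCTL_SET_SPARSE is sent with a FILE_SET_SPARSE_BUFFER whose single BOOLEAN member, SetSparse, is FALSE. The sketch below does that from Python through ctypes, assuming Python 3 on Windows and write access to the file; set_sparse is a hypothetical name, not Far Manager's or unsparse's code, and behaviour on older Windows releases may differ, so check the FSCTL_SET_SPARSE documentation before relying on it.

```python
import ctypes
import sys

# Values from the Windows SDK headers (winioctl.h, winnt.h).
FILE_DEVICE_FILE_SYSTEM = 0x09
METHOD_BUFFERED = 0
FILE_SPECIAL_ACCESS = 0  # same value as FILE_ANY_ACCESS
GENERIC_READ = 0x80000000
GENERIC_WRITE = 0x40000000
FILE_SHARE_READ = 0x1
OPEN_EXISTING = 3


def ctl_code(device_type, function, method, access):
    """Python version of the CTL_CODE macro."""
    return (device_type << 16) | (access << 14) | (function << 2) | method


FSCTL_SET_SPARSE = ctl_code(FILE_DEVICE_FILE_SYSTEM, 49, METHOD_BUFFERED, FILE_SPECIAL_ACCESS)


class FILE_SET_SPARSE_BUFFER(ctypes.Structure):
    _fields_ = [("SetSparse", ctypes.c_ubyte)]  # a Windows BOOLEAN is one byte


def set_sparse(path, sparse):
    """Set (sparse=True) or clear (sparse=False) the sparse flag on a file."""
    if sys.platform != "win32":
        raise OSError("FSCTL_SET_SPARSE is only available on Windows")
    k32 = ctypes.WinDLL("kernel32", use_last_error=True)
    k32.CreateFileW.argtypes = [ctypes.c_wchar_p, ctypes.c_uint32, ctypes.c_uint32,
                                ctypes.c_void_p, ctypes.c_uint32, ctypes.c_uint32, ctypes.c_void_p]
    k32.CreateFileW.restype = ctypes.c_void_p
    k32.DeviceIoControl.argtypes = [ctypes.c_void_p, ctypes.c_uint32, ctypes.c_void_p, ctypes.c_uint32,
                                    ctypes.c_void_p, ctypes.c_uint32, ctypes.POINTER(ctypes.c_uint32),
                                    ctypes.c_void_p]
    k32.DeviceIoControl.restype = ctypes.c_int
    k32.CloseHandle.argtypes = [ctypes.c_void_p]

    handle = k32.CreateFileW(path, GENERIC_READ | GENERIC_WRITE, FILE_SHARE_READ,
                             None, OPEN_EXISTING, 0, None)
    if handle is None or handle == ctypes.c_void_p(-1).value:  # INVALID_HANDLE_VALUE
        raise ctypes.WinError(ctypes.get_last_error())
    try:
        buf = FILE_SET_SPARSE_BUFFER(1 if sparse else 0)
        returned = ctypes.c_uint32()
        # No output buffer is needed for this control code.
        if not k32.DeviceIoControl(handle, FSCTL_SET_SPARSE, ctypes.byref(buf),
                                   ctypes.sizeof(buf), None, 0, ctypes.byref(returned), None):
            raise ctypes.WinError(ctypes.get_last_error())
    finally:
        k32.CloseHandle(handle)
```

Calling set_sparse(path, False) is then the missing "clearflag".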

With that knowledge, it is actually quite easy to make a file non-sparse again, so I wrote a program called unsparse to do it.

As well as clearing the sparse flag on a single file, the tool can recursively process a directory and unsparse every sparse file it finds, which makes it perfect for fixing up a Hyper-V download.
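The recursive part of a tool like this is mostly a directory walk. The sketch below is not the author's unsparse program, whose source isn't shown here; unsparse_tree is a hypothetical name, and it takes the flag check and the flag-clearing step as callables so the walk itself can be tried on any platform. On Windows, those callables could wrap a check of FILE_ATTRIBUTE_SPARSE_FILE in os.stat().st_file_attributes and the FSCTL_SET_SPARSE call mentioned above.

```python
import os
from typing import Callable, List


def unsparse_tree(root: str,
                  is_sparse: Callable[[str], bool],
                  clear_sparse: Callable[[str], None]) -> List[str]:
    """Walk root recursively and clear the sparse flag on every file that has it.

    Returns the paths that were cleared, in the order they were visited.
    """
    cleared = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if is_sparse(path):
                clear_sparse(path)
                cleared.append(path)
    return cleared
```

Keeping the Windows-specific calls out of the walk means a failure on one file surfaces as an exception from clear_sparse, which the caller can catch or let stop the run.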

Look for the program soon on my chrysocome website http://www.chrysocome.net/unsparse