
NTFS compression minutiae

The past few weeks have been a rather productive sprint of enhancements to DiskDigger, specifically solidifying its support for parsing NTFS filesystems and handling the finer details and edge cases found in the wild.

First, individual files in NTFS can be compressed, which a user may choose to do to conserve disk space. This works by compressing the file data with a variant of LZ77 (commonly called LZNT1). DiskDigger could already recover such compressed files, but I have now improved the speed of its decompression routine.
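NTFS’s LZ77 variant is described in Microsoft’s [MS-XCA] specification under the name LZNT1. Data is split into chunks, each decompressing to at most 4096 bytes and each starting with a 2-byte header: bit 15 set means the chunk is compressed, and the low 12 bits hold the chunk’s total size (header included) minus 3. Inside a compressed chunk, a flag byte precedes each group of up to eight tokens; each token is either a literal byte or a 16-bit back-reference whose split between offset bits and length bits shifts as more output is produced. The Python sketch below is written from that public description, not taken from DiskDigger; it assumes well-formed input, does no bounds checking, and `lznt1_decompress` is a hypothetical name.

```python
import struct


def lznt1_decompress(data: bytes) -> bytes:
    """Decompress an LZNT1 buffer (sketch: assumes well-formed input)."""
    out = bytearray()
    pos = 0
    while pos + 2 <= len(data):
        (header,) = struct.unpack_from("<H", data, pos)
        pos += 2
        if header == 0:  # a zero header marks the end of the data
            break
        size = (header & 0x0FFF) + 1  # bytes of chunk data after the header
        chunk = data[pos:pos + size]
        pos += size
        if header & 0x8000:  # bit 15 set: compressed chunk
            out += _decompress_chunk(chunk)
        else:  # stored chunk: copy as-is
            out += chunk
    return bytes(out)


def _decompress_chunk(chunk: bytes) -> bytearray:
    """Decode one compressed chunk; back-references never cross chunks."""
    out = bytearray()
    i = 0
    while i < len(chunk):
        flags = chunk[i]  # one flag bit per following token, LSB first
        i += 1
        for bit in range(8):
            if i >= len(chunk):
                break
            if not (flags >> bit) & 1:
                out.append(chunk[i])  # literal byte
                i += 1
                continue
            (token,) = struct.unpack_from("<H", chunk, i)
            i += 2
            # Offset bits grow (and length bits shrink) as the output grows:
            # 4 offset bits for the first 16 bytes, then one more per doubling.
            offset_bits = 4
            p = len(out) - 1
            while p >= 0x10:
                offset_bits += 1
                p >>= 1
            length_bits = 16 - offset_bits
            length = (token & ((1 << length_bits) - 1)) + 3
            offset = (token >> length_bits) + 1
            for _ in range(length):  # byte by byte, so overlapping copies work
                out.append(out[-offset])
    return out
```

For example, the 8-byte input `05 B0 08 61 62 63 03 20` (one compressed chunk: three literals, then a back-reference with offset 3 and length 6) decompresses to `abcabcabc`.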

However, there is a second way that files in NTFS can be compressed. Starting with Windows 10, a system process runs in the background, looks for system files and program files that are seldom used, and compresses them differently: it creates an alternate data stream called WofCompressedData and writes the compressed data to it, using either the Xpress or LZX algorithm (Xpress seems to be the default). The original data stream is turned into a sparse stream, and a reparse point is added to the file. This is a bit confusing, because such files are not shown as “compressed” in Windows Explorer, even though they are compressed at the filesystem level. You can see which files are compressed this way by running the compact command at the command line.
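The layout of the WofCompressedData stream itself isn’t covered here, so the following is an assumption based on third-party reverse-engineering of the format (for example, the documentation accompanying the ntfs-3g system compression plugin): the stream starts with a table of chunk offsets, followed by independently compressed chunks of 4, 8, or 16 KiB for the Xpress variants or 32 KiB for LZX, and a chunk whose stored size equals its uncompressed size is stored raw. This sketch only splits the stream into chunks using the hypothetical helper `split_wof_chunks`; decompressing each chunk with Xpress or LZX is out of scope.

```python
import struct


def split_wof_chunks(stream: bytes, uncompressed_size: int, chunk_size: int = 4096):
    """Split a WofCompressedData stream into (data, uncompressed_len, stored_raw) tuples.

    Assumed layout: a table of (num_chunks - 1) little-endian chunk offsets,
    measured from the end of the table, followed by the chunks themselves.
    """
    num_chunks = (uncompressed_size + chunk_size - 1) // chunk_size
    if num_chunks == 0:
        return []
    # Assumption: 8-byte table entries only for files larger than 4 GiB.
    entry_size, fmt = (8, "<Q") if uncompressed_size > 0xFFFFFFFF else (4, "<I")
    table_len = (num_chunks - 1) * entry_size
    # The first chunk starts right after the table, so its offset is implicit.
    offsets = [0]
    for i in range(num_chunks - 1):
        offsets.append(struct.unpack_from(fmt, stream, i * entry_size)[0])
    offsets.append(len(stream) - table_len)  # the last chunk runs to the end
    chunks = []
    for i in range(num_chunks):
        data = stream[table_len + offsets[i]:table_len + offsets[i + 1]]
        # Every chunk holds chunk_size bytes except possibly the last one.
        expected = min(chunk_size, uncompressed_size - i * chunk_size)
        # A chunk that didn't shrink is assumed to be stored uncompressed.
        chunks.append((data, expected, len(data) == expected))
    return chunks
```

Each non-raw chunk would then go to an Xpress or LZX decoder, and the decoded chunks are concatenated in order to rebuild the file’s contents.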

Anyway, DiskDigger now supports this second type of compression in NTFS filesystems and will transparently decompress such files when they’re recovered. Of course, this also means that these files can only be recovered at the filesystem level (“Dig Deep” mode) and cannot be carved heuristically (in “Dig Deeper” mode), since their contents on disk are compressed and no longer resemble the original file format.

As a side note, several types of compression are available through Windows APIs or the .NET framework itself:

  • .NET offers the System.IO.Compression namespace, which includes DeflateStream and GZipStream (both implementing the DEFLATE algorithm, using zlib under the hood), as well as ZipArchive for dealing with actual Zip files. Many file formats use DEFLATE compression, but it doesn’t help us with compressed NTFS files.
  • Windows itself provides the little-known Compression API (via cabinet.dll), which supports the Xpress, LZMS, and MSZIP algorithms. This API is easy to P/Invoke from .NET, but since it’s part of Windows, it can’t be used on other platforms. It covers half of what’s needed for decompressing WofCompressedData streams, since those use Xpress compression by default.
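To illustrate that DeflateStream and GZipStream share one algorithm and differ only in framing: DeflateStream works with raw DEFLATE data (RFC 1951), while GZipStream wraps DEFLATE data in a gzip header and trailer (RFC 1952). The sketch below uses Python’s standard zlib and gzip modules rather than .NET, selecting the framing with zlib’s `wbits` parameter; the helper names are hypothetical.

```python
import gzip
import zlib


def deflate_raw(data: bytes) -> bytes:
    """Raw DEFLATE (RFC 1951): no header or checksum, like .NET's DeflateStream."""
    compressor = zlib.compressobj(wbits=-15)  # negative wbits = raw stream
    return compressor.compress(data) + compressor.flush()


def inflate_raw(data: bytes) -> bytes:
    return zlib.decompress(data, wbits=-15)


def gunzip(data: bytes) -> bytes:
    """gzip framing (RFC 1952): wbits=31 tells zlib to expect a gzip header."""
    return zlib.decompress(data, wbits=31)


payload = b"NTFS compression minutiae " * 40
raw = deflate_raw(payload)
# DEFLATE data wrapped in a gzip header plus a trailer (CRC-32 and length)
wrapped = gzip.compress(payload)

print(len(payload), len(raw), len(wrapped))
```

Neither framing can read NTFS’s LZNT1 data or WOF’s Xpress/LZX chunks, which is why those need separate handling.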

However, the above APIs are the extent of what the system offers, which means that algorithms like LZ77 and LZX must be implemented manually or taken from third-party libraries. I opted for the former in DiskDigger, to keep my dependencies to a minimum.

Euler walks

Here are the first 10,000 digits of e (the base of the natural logarithm), interpreted as a spiral walk in which each step is determined by the next digit.
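How the digits were obtained isn’t stated. As one possibility, the digits of e can be generated with a classic spigot algorithm: write e − 2 = 1/2 (1 + 1/3 (1 + 1/4 (1 + …))) as a mixed-radix fraction whose digits are all 1, with place values 1/2, 1/(2·3), 1/(2·3·4), and so on. Multiplying that fraction by 10 and carrying from the least significant place toward the most significant one yields one decimal digit at a time. This Python sketch (`e_digits` is a hypothetical name) uses exact integer arithmetic and is quadratic in the number of digits.

```python
import math


def e_digits(n: int) -> str:
    """First n decimal digits of e (the leading 2 included), via a spigot."""
    if n <= 0:
        return ""
    # e - 2 = 1/2 + 1/(2*3) + 1/(2*3*4) + ...: a mixed-radix fraction whose
    # digits are all 1, with radices 2, 3, 4, ... Keep enough terms that the
    # dropped tail is far smaller than 10**-n.
    terms = 10
    while math.lgamma(terms + 2) / math.log(10) < n + 5:
        terms += 10
    a = [1] * terms  # a[i] is the mixed-radix digit whose radix is i + 2
    digits = ["2"]
    for _ in range(n - 1):
        carry = 0
        # Multiply the fraction by 10, carrying from the least significant place.
        for i in range(terms - 1, -1, -1):
            x = a[i] * 10 + carry
            a[i] = x % (i + 2)
            carry = x // (i + 2)
        digits.append(str(carry))  # the integer part is the next decimal digit
    return "".join(digits)


print(e_digits(30))
```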


And here is a similar interpretation for γ (the Euler-Mascheroni constant):


Pi walk

Here are the first 10,000 digits of π, interpreted as a spiral walk in which each step is determined by the next digit. In other words, if the first digits are “3.1415…”, then we walk up 3 pixels, then left 1 pixel, then down 4 pixels, then right 1 pixel, then up 5 pixels, and so on, painting each step of the walk with a different random color.
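The walk itself is simple to express. Here is a minimal Python sketch under a few assumptions not spelled out above: screen coordinates (y grows downward, so “up” is negative y), non-digit characters such as the decimal point are skipped, and a 0 digit is a zero-length step that still advances the turn. The `digit_walk` name is hypothetical, and this is not the original implementation.

```python
import random

# Direction cycle from the description: up, left, down, right.
# Screen coordinates: y grows downward, so "up" is negative y.
DIRECTIONS = [(0, -1), (-1, 0), (0, 1), (1, 0)]


def digit_walk(digits: str, seed=None):
    """Return a list of (start, end, color) segments, one per digit."""
    rng = random.Random(seed)
    x = y = 0
    segments = []
    steps = [int(c) for c in digits if c.isdigit()]  # skip the decimal point
    for i, n in enumerate(steps):
        dx, dy = DIRECTIONS[i % 4]  # turn after every step, even a 0-length one
        end = (x + dx * n, y + dy * n)
        color = tuple(rng.randrange(256) for _ in range(3))  # random RGB
        segments.append(((x, y), end, color))
        x, y = end
    return segments
```

For “3.1415” the segment endpoints are (0, -3), (-1, -3), (-1, 1), (0, 1), (0, -4). Drawing each segment as a line of pixels in its color is left to whatever canvas or imaging library is at hand.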


Ulam’s spiral in your browser

My day-to-day work is focused mostly on Android and Windows development, so I often find myself a bit disconnected from web development. I thought I’d go through a few random exercises in JavaScript, and simultaneously bring some of my oldie-but-goodie projects “up to date,” as it were. A long time ago I made a Windows application that displays the Ulam prime number spiral, but there’s no reason it can’t be done in the browser today, so here we go:


The above picture is dynamically generated in your browser. Go ahead and interact with it: use your mouse scroll wheel to zoom in and out, and use the checkboxes to highlight special types of primes (twin primes and Mersenne primes).
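The page itself runs JavaScript; for illustration, here is an independent Python sketch (not the code behind the picture) of the two ingredients: laying integers out along the square spiral, and sieving for primes, twin primes (primes with another prime 2 away), and Mersenne primes (primes of the form 2^k − 1). The spiral’s orientation and the helper names `ulam_positions`, `sieve`, and `classify` are assumptions.

```python
import math


def ulam_positions(count: int):
    """(x, y) of 1..count on the square spiral; entry n - 1 belongs to n."""
    positions = [(0, 0)]  # 1 sits at the center
    x = y = 0
    dx, dy = 1, 0  # assumed: first step to the right, y axis pointing up
    run = 1
    while len(positions) < count:
        for _ in range(2):  # each run length is used for two legs
            for _ in range(run):
                if len(positions) >= count:
                    break
                x, y = x + dx, y + dy
                positions.append((x, y))
            dx, dy = -dy, dx  # turn 90 degrees counterclockwise
        run += 1
    return positions[:count]


def sieve(limit: int):
    """is_prime[k] for 0 <= k <= limit (Sieve of Eratosthenes)."""
    is_prime = [False, False] + [True] * max(0, limit - 1)
    for p in range(2, math.isqrt(limit) + 1):
        if is_prime[p]:
            for m in range(p * p, limit + 1, p):
                is_prime[m] = False
    return is_prime[:limit + 1]


def classify(limit: int):
    """Return (primes, twin primes, Mersenne primes) up to limit."""
    is_prime = sieve(limit)
    primes = {k for k in range(limit + 1) if is_prime[k]}
    # A twin prime has another prime 2 away; partners beyond limit aren't checked.
    twins = {p for p in primes if p - 2 in primes or p + 2 in primes}
    # Mersenne primes are primes of the form 2**k - 1.
    mersenne = set()
    k = 2
    while (1 << k) - 1 <= limit:
        if is_prime[(1 << k) - 1]:
            mersenne.add((1 << k) - 1)
        k += 1
    return primes, twins, mersenne
```

A renderer would then walk `ulam_positions(limit)` and color the pixel for each n according to which of the returned sets contains it.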

Send me all your floppies! They nourish me.

Recently I had another data recovery case that involved a comically large number of floppy disks, as in… more than five hundred (split evenly between 3.5” and 5.25” disks). We’re talking several large USPS boxes packed to the brim with floppies.

Of the numerous 3.5” floppies, only about 10% had one or more bad sectors, and none of them were completely unreadable. The same was true for the 5.25” floppies, even though some of them were physically bent or warped, to the point where I had to cut them open and transplant the disk itself into a new container. Some of the oldest files on these disks dated all the way back to 1986!

The recovery was performed using two older PCs, each of which has both 3.5” and 5.25” internal floppy drives, which allowed the reading to be done somewhat in parallel.

There are actually plenty of cheapo floppy drives that connect over USB and can still be purchased for as little as $15, but these drives are not, I repeat, not suitable for recovering data from actual old floppy disks. Such disks must be read with a proper original floppy drive, preferably one from the same era as the disks themselves.

Anyway, when floppy disks were in widespread use in the 1980s and 1990s, they weren’t really intended or marketed as a long-term storage solution, but they’re proving to be quite resilient as time goes by. I’m not nearly as optimistic that today’s USB flash drives or SD cards will be readable in 30 years.

To be fair, these old disks have a much lower data density than modern storage media, so it makes sense that they would be more resilient to wear and tear. Still, it’s impressive that even floppy disks of seemingly mediocre quality hold up to this day.

Despite these excellent outcomes, this case underscores how important it is to recover such data now, rather than waiting any longer and risking the disks developing more bad sectors. So, let this be a call to action: if you have any old floppies lying around (or old tapes, Zip disks, Jaz disks, or anything else!), contact me for details on how to send them over, and I’ll recover the data from them for a fraction of the cost of other companies.