NTFS compression minutiae

It’s been a rather productive sprint of making enhancements to DiskDigger in recent weeks, specifically in solidifying its support for parsing NTFS filesystems, and supporting the finer details and edge cases that are found in the wild.

Firstly, individual files in NTFS can be compressed, which the user can choose to do to conserve disk space. This is done by taking the file data and compressing it using a variant of LZ77. (DiskDigger has been able to recover compressed files, but I have now improved the speed of the decompression routine.)

However, there is a second way that files in NTFS can be compressed. Starting with Windows 10, there is a system process that runs in the background and looks for system files and program files that are seldom used, and compresses them in a different way: it creates an alternate data stream called WofCompressedData and writes the compressed data to it, using either the Xpress or LZX algorithms (Xpress seems to be the default). The original data stream is turned into a sparse stream, and the file gets a reparse point added to it. This is a bit confusing because these types of files are not shown as “compressed” files in Windows Explorer, and yet they are compressed at the level of the filesystem. You can see which files are compressed this way by running the compact command at the command line.

Anyway, DiskDigger now supports this second type of compression in NTFS filesystems, and will transparently decompress the file when it’s recovered. Of course this also means that these types of files can only be recovered at the filesystem level (“Dig Deep” mode) and cannot be carved heuristically (in “Dig Deeper” mode).

As a side note, various different types of compression are available via Windows APIs or the .NET framework itself:

  • .NET offers the System.IO.Compression namespace which has DeflateStream and GZipStream, which implement the DEFLATE algorithm (using zlib under the hood), and also ZipArchive for dealing with actual Zip files. Many file formats use DEFLATE compression, but it doesn’t help us for compressed NTFS files.
  • Windows itself provides the little-known Compression API (via cabinet.dll) which lets us use the Xpress, LZMS, and MSZIP algorithms. This API can be easily P/Invoked from .NET, but unfortunately this means it can’t be used outside of Windows. This API fulfills half of the requirements for decompressing WofCompressedData streams, which use Xpress compression by default.

However, the above APIs are the extent of what’s offered to us by the system, which means that algorithms like LZ77 and LZX must be implemented manually, or using other third-party libraries. I opted for the former in DiskDigger, to keep my number of dependencies to a minimum.