Updates to DiskDigger and FileSystemAnalyzer, October 2023

Usually I post updates about DiskDigger on its own website, but my most recent round of updates merits a slight technical digression.

Previous versions of DiskDigger and FileSystemAnalyzer already had basic support for 4K-native disk drives, i.e. drives that have 4 KiB sectors instead of the usual 512 bytes. However, only recently have I been able to test this support more thoroughly, fixing a few bugs along the way. 4K-native drives have been around for a while, and in fact most modern drives already use 4K sectors under the hood, but simply emulate 512-byte sectors to the outside world. Increasingly, though, we’re seeing drives that no longer emulate 512-byte sectors and instead expose the native 4K sectors to the operating system, as well as users who opt to reconfigure their drive’s firmware to use 4K sectors instead of 512-byte emulation. DiskDigger and FileSystemAnalyzer can now handle all of these cases when mounting and searching file systems that might be present on such disks (FAT, NTFS, ext4, etc.).
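
In practice, supporting 4Kn drives mostly comes down to never assuming a 512-byte sector when translating sector numbers (LBAs) into byte offsets. As a minimal illustrative sketch (not DiskDigger’s actual code), where the sector size is queried from the drive rather than hardcoded:

#include <stdint.h>

/* Translate a sector number (LBA) into a byte offset using the drive's
   reported logical sector size. For example, the GPT header lives at LBA 1,
   which is byte offset 512 on a 512-byte-sector drive but 4096 on a 4Kn drive. */
static uint64_t lba_to_byte_offset(uint64_t lba, uint32_t sector_size)
{
    return lba * (uint64_t)sector_size;
}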

I did most of my testing and experimenting using a real 4Kn drive, but I also did some testing with emulated disk images. Here is how you can configure qemu to treat a disk image as a 4Kn drive:

qemu-system-x86_64.exe -machine q35 -m 8G -boot d -cdrom "linux.iso" -drive file=mydisk.vdi,if=none,format=vdi,id=D24 -device nvme,drive=D24,serial=1234,logical_block_size=4096,physical_block_size=4096

The above example boots qemu from an ISO file (which can be a Linux live DVD) and attaches the disk image as an NVMe device, which lets us configure its physical and logical block sizes, both set here to 4096. Linux should detect the NVMe device automatically, and you can then create partitions and file systems on it for experimentation.
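
To confirm from inside the Linux guest that the emulated drive really reports 4096-byte sectors, you can check /sys/block/<device>/queue/logical_block_size, or query the block device directly. Here’s a small sketch of the latter (the device name /dev/nvme0n1 is just an example and may differ on your system):

/* Query the logical and physical sector sizes of a block device on Linux. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>

int main(void)
{
    int fd = open("/dev/nvme0n1", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    int logical = 0;
    unsigned int physical = 0;
    ioctl(fd, BLKSSZGET, &logical);    /* logical sector size, in bytes */
    ioctl(fd, BLKPBSZGET, &physical);  /* physical sector size, in bytes */
    printf("logical: %d, physical: %u\n", logical, physical);

    close(fd);
    return 0;
}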


The other interesting update has to do with ancient retro file systems that are supported by FileSystemAnalyzer (and by extension DiskDigger). By coincidence, I’ve been contacted by multiple people in a short span of time regarding recovering data from Xenix file systems which they’ve saved as binary disk images. One image is from an Intel System 320 Multibus System owned by Herb Johnson of retrotechnology.com, and another is from an owner of an Altos 586 system in New Zealand.


Each of these images used a slightly different version of the Xenix file system, each with a differently structured superblock (and each different from the Xenix/SysV support that’s built into the current version of the Linux kernel). This took a bit of effort to reverse-engineer, but ultimately wasn’t too difficult to crack and integrate into FileSystemAnalyzer. The nice thing about dealing with very old data formats is that they’re usually very simple, not to say primitive. Best of all, these Xenix images contain C header files that actually describe their own filesystem structure (can I call them eigenheaders?), which I was able to use to refine and solidify support for these file systems.


I even learned something new along the way: in addition to little-endian and big-endian byte orders, there’s also something called “middle-endian” or “PDP-11-endian” order, where 16-bit values are stored little-endian as usual, but 32-bit long integers are composed of two 16-bit words arranged in big-endian order (while each 16-bit half is itself still little-endian). This was the encoding used by the PDP-11, and apparently also by the Altos 586 system that was running this version of Xenix. All of these variations are now supported in FileSystemAnalyzer.
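
To make the layout concrete, here’s a minimal sketch (not taken from FileSystemAnalyzer itself) of decoding a 32-bit value stored in PDP-11 order. The value 0x0A0B0C0D would be stored on disk as the bytes 0B 0A 0D 0C:

#include <stdint.h>

/* Decode a 32-bit value stored in PDP-11 ("middle-endian") order:
   the two 16-bit halves appear in big-endian order relative to each other,
   but each half is itself little-endian. */
static uint32_t read_u32_pdp(const uint8_t *p)
{
    uint16_t hi = (uint16_t)(p[0] | (p[1] << 8));  /* high-order word, little-endian */
    uint16_t lo = (uint16_t)(p[2] | (p[3] << 8));  /* low-order word, little-endian */
    return ((uint32_t)hi << 16) | lo;
}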

Brain dump, September 2023

I finally did something I’ve been meaning to do for a long time: get the final version of the ftape driver to work on a Linux distro that I can use in my data recovery workstations. This is for the purpose of using Linux to dump the contents of QIC-80 and similar tapes, using “floppy tape” drives, i.e. tape drives that connect to the floppy disk controller on the motherboard.

Up until this point, I’ve been using an old version of Ubuntu that has ftape pre-packaged into the kernel. The problem is that this version of ftape is not the latest: development of ftape seemed to continue independently of the version included with the kernel, and the “last” available version (4.04a, from around July 2000) contains many enhancements over the in-kernel version (which appears to be 3.04), notably compatibility with parallel-port tape drives such as the Iomega Ditto 2GB.

This meant that I needed to compile the driver from source. Sounds simple enough; the driver is just a couple of loadable kernel modules. However, I would need to compile it for a version of the kernel that can boot nicely on my workstation. Browsing the source code, I could see that the driver is meant to be compiled for kernel version 2.4.x. As an amateur kernel hacker in a previous job, I knew that even patch-version changes (the third version number) in the kernel can break compilation of custom kernel modules. So, I tried to find a Linux distro that uses the earliest possible patch version of the 2.4 kernel and still runs well on my workstation.


CentOS 3.5 to the rescue! I was able to find ISO installation media and install CentOS 3.5 flawlessly onto my recovery workstation. It uses kernel version 2.4.21, which still turned out to be “too new” for compiling ftape successfully. I got a number of compilation errors, but thankfully they were all comprehensible and easy for an amateur to remedy. After just a few hacky modifications, I got the driver to compile into a loadable module!

And would you look at that – it’s able to communicate successfully with all of my floppy tape drives, as well as my parallel port Ditto 2GB drive!


Here’s my repository on GitHub that has the source code for the ftape driver, with my modifications for getting it to build in CentOS 3.5.


In other news, I found and restored an old ThinkPad X131e, which came to me as a Chromebook, i.e. with ChromeOS installed. In order to remove ChromeOS and install a regular Linux distro, I had to overwrite the stock firmware with custom firmware that allows other operating systems to be installed. And in order to overwrite the firmware, I had to disassemble the laptop and flip a physical write-protect switch that allows the firmware to be written. Why do they do this?! Anyway, with the latest version of the lightweight Xubuntu installed, this tiny thing works beautifully, and can now have a second life.


Brain dump, February 2023

As a software archaeologist, I often find myself trying out old software that I never used in my own career. I think this can be very instructive, since old software often has good ideas built into it: ideas that may have been forgotten, but that we can still draw on when building today’s software.

Recently I played around with Microsoft QuickC for Windows 3.1, which was a C development environment (IDE) targeted at individual developers, with a rather modest set of features compared to enterprise-caliber IDEs of the era. Nevertheless, my existing knowledge of Windows programming, coming from Windows 9x development and onward, transferred fairly easily to QuickC, and I was able to develop a sample app fairly quickly.


It’s a Mandelbrot viewer/explorer app, which is one of my favorite “sample” apps to build in a new environment. It runs in any version of Windows 3.x, has no dependencies, and weighs in at 20KB. Here is the source code, if you like!

What struck me about using QuickC is its simplicity and efficiency. Even though it still has the familiar issues of native Windows programming (many screenfuls of boilerplate code, and having to manually handle message loops and drawing subroutines), once that was out of the way, it was smooth sailing.
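
For anyone who hasn’t had the pleasure, the boilerplate in question looks roughly like this. This is a condensed, hypothetical sketch of a Win16-style skeleton, not the actual source of my app; the interesting work (computing and drawing the fractal) would live in the WM_PAINT handler:

#include <windows.h>

LRESULT CALLBACK WndProc(HWND hwnd, UINT msg, WPARAM wParam, LPARAM lParam)
{
    switch (msg) {
    case WM_PAINT: {
        PAINTSTRUCT ps;
        HDC hdc = BeginPaint(hwnd, &ps);
        /* ... draw the Mandelbrot set into hdc here ... */
        EndPaint(hwnd, &ps);
        return 0;
    }
    case WM_DESTROY:
        PostQuitMessage(0);
        return 0;
    }
    return DefWindowProc(hwnd, msg, wParam, lParam);
}

int PASCAL WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance, LPSTR lpCmdLine, int nCmdShow)
{
    MSG msg;
    HWND hwnd;

    if (!hPrevInstance) {               /* in Win16, register the class only once */
        WNDCLASS wc = {0};
        wc.lpfnWndProc   = WndProc;
        wc.hInstance     = hInstance;
        wc.hCursor       = LoadCursor(NULL, IDC_ARROW);
        wc.hbrBackground = (HBRUSH)(COLOR_WINDOW + 1);
        wc.lpszClassName = "MandelWnd";
        RegisterClass(&wc);
    }

    hwnd = CreateWindow("MandelWnd", "Mandelbrot", WS_OVERLAPPEDWINDOW,
                        CW_USEDEFAULT, CW_USEDEFAULT, 400, 300,
                        NULL, NULL, hInstance, NULL);
    ShowWindow(hwnd, nCmdShow);
    UpdateWindow(hwnd);

    /* the message loop */
    while (GetMessage(&msg, NULL, 0, 0)) {
        TranslateMessage(&msg);
        DispatchMessage(&msg);
    }
    return (int)msg.wParam;
}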

Today I make Android apps for a living, and I can’t help but compare the experience of building an Android app (using Android Studio) to the experience of building old-school Windows apps, specifically in terms of efficiency. The compilation time of my QuickC app was no more than a few seconds (inside an emulator simulating a 50 MHz PC). Compare this with building a similar Android app, where kicking off a clean Gradle build is a cue to take a coffee break, even on the most modern hardware. Of course the Gradle build process has gotten faster over the years, and the Android folks at Google are quick to award themselves a medal for improving build speeds by a few seconds. Still, it’s only very recently that Gradle has gotten fast enough to build a Hello World app in under a minute. I won’t even get into the sizes, now measured in gigabytes, that modern IDEs require to make themselves at home on our workstations, whereas the entirety of QuickC fit on three floppy disks.

Is this level of efficiency and streamlining squarely in the distant past of software tools, or can we, in the present day, take steps to get back to that spirit?

ECC RAM should be a human right

I am now a staunch advocate for ECC RAM, after the events of last week. You see, over the last several weeks my main desktop workstation has been misbehaving, with occasional freezing and crashing. After some diagnostics I began to suspect a faulty RAM module, and sure enough, a quick run of memtest86 lit up the screen with a multitude of bit flip errors at numerous memory locations, indicating that something was seriously wrong with the RAM.

Within a day or two I scrambled to replace the RAM modules with new ones, and once this was done the problems resolved themselves and everything was stable again. However, there was another, more sinister side effect that I discovered shortly afterwards: some of my data was corrupted! That’s right, it was the worst-case scenario for RAM failure: bit flip errors that get written back to the disk. I discovered that several video files that I had been editing had corrupted bits, and were no longer usable. Fortunately I still have the original source materials, which I can use to recreate the final videos. It’s an unfortunate waste of time, but it could have been a lot worse if I’d let the RAM failure go on any longer. There doesn’t seem to be any other corruption in my personal data, and just to be on the safe side I performed a clean install of Windows, to ensure that no system or program files were left corrupted.

The point of the story is that the data corruption was completely preventable, if only my RAM had had ECC built into it. But because it didn’t, these kinds of bit flip events go completely undetected, and proceed to wreak havoc on the integrity of our data, right under our noses.

Memory manufacturers assure us that desktop RAM is so reliable that it doesn’t need ECC, and that the probability of bit flip events is so low that it’s not worth the extra “cost” of ECC. Chip manufacturers (e.g. Intel) produce consumer CPUs that don’t even support ECC memory. Users are expected to upgrade to server-grade components just to get access to ECC memory.

Let’s quickly review the reasons why server-class machines are deemed to be “deserving” of ECC memory, while desktop machines are not:

On a personal desktop computer, your data is stored permanently on a disk, whether that’s a spinning hard drive, an SSD, a memory card, or something else. When you want to do something with your data (e.g. write a document, edit a photo, etc.), the data is loaded into RAM, and when you’re finished modifying your data, it’s written back to the disk.

On a server machine, however, the situation is different: since disk access is much slower than RAM access, the server must keep as much data as possible in RAM, so that the data is instantly available to clients who request it. This means that the data ends up sitting in RAM for extended periods of time. If the RAM were to experience bit-flip errors that went undetected, the server would serve incorrect data, or worse, would end up writing incorrect data back to the disk. Therefore, the server’s RAM has ECC, so that it will correct itself in case of an occasional bit flip.

This is oversimplifying a bit, but the difference between a server and a desktop, for this exercise, is simply the amount of time that data is made to sit in RAM. So then, are we supposed to accept that if our data doesn’t remain in RAM for very long, it doesn’t need ECC at all?!

By the way, you’d better believe that your disk(s) have all kinds of error correction schemes built into them, which work automatically and transparently. It’s completely normal for data written to a physical medium to be imperfect, and those imperfections will be corrected by the firmware of the disk.

Well guess what? RAM is a physical medium, and yet we’re simply asked to take the manufacturers’ word that our RAM is reliable enough to never need ECC for the use cases of a desktop workstation. Well, I’m here to say that these practices are reckless, and represent a ticking time bomb for anyone who uses non-ECC memory for anything nontrivial. And it seems I’m not the only one.

(discussion on HackerNews)

Software update round-up

It’s high time to give some love to a few of my older and less-maintained software projects, and bring them up to date with a few much-needed and requested features!

DiskImager

DiskImager is a tool that I’ve used “internally” for a few years now to read and write raw disk images. I’ve simply never found the time to polish it up and make it production-ready, until now. This is a small standalone tool that will dump the contents of any drive connected to your PC to a file on another (larger) drive. It can also write a disk image file to a physical disk. Furthermore, when selecting a disk image to write to a physical disk, you can choose from several types of image formats (besides raw images) including VDI, VMDK, VHD, and E01.
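
At its core, dumping a physical drive on Windows boils down to opening the raw device and copying it out in sector-aligned chunks. Here’s a bare-bones sketch of that idea (not DiskImager’s actual code; the drive number and output path are just examples, error handling is minimal, and the program needs administrator privileges):

#include <windows.h>
#include <stdio.h>

int main(void)
{
    /* Open the raw physical drive for reading. */
    HANDLE disk = CreateFileA("\\\\.\\PhysicalDrive1", GENERIC_READ,
                              FILE_SHARE_READ | FILE_SHARE_WRITE, NULL,
                              OPEN_EXISTING, 0, NULL);
    if (disk == INVALID_HANDLE_VALUE) return 1;

    FILE *out = fopen("disk.img", "wb");
    if (!out) return 1;

    /* Reads from a raw device must be a multiple of the sector size;
       1 MiB satisfies both 512-byte and 4K sectors. */
    static BYTE buf[1024 * 1024];
    DWORD bytesRead;
    while (ReadFile(disk, buf, sizeof(buf), &bytesRead, NULL) && bytesRead > 0) {
        fwrite(buf, 1, bytesRead, out);
    }

    fclose(out);
    CloseHandle(disk);
    return 0;
}

Writing an image back to a physical disk is the same idea in reverse, with the extra care of making sure no volumes on the target disk are mounted or in use while writing.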

FileSystemAnalyzer

The FileSystemAnalyzer tool has gotten a huge number of bug fixes, as well as these enhancements:

  • Improved compatibility with FAT, exFAT, NTFS, ext4, and UDF filesystems in various states of corruption.
  • Improved previews and metadata for more file types.
  • The main file tree view now has columns with file size and date, similar to Windows Explorer. The columns are clickable, to sort the file list in ascending or descending order by that column.
  • Directories can now be saved from the file tree view (recursively), in addition to individual files.

Outlook PST viewer

The PST viewer tool has been updated to be more compatible with a wider range of PST files from different versions of Outlook, and to be more forgiving of corrupted PST files. There is also a new option to save individual messages as .MSG files.