ECC RAM should be a human right

I am now a staunch advocate for ECC RAM, after the events of last week. You see, over the last several weeks my main desktop workstation has been misbehaving, with occasional freezing and crashing. After some diagnostics I began to suspect a faulty RAM module, and sure enough, upon performing a quick run of memtest86, it lit up the screen with a multitude of bit flip errors, at numerous memory locations, indicating that something was seriously wrong with the RAM.

Within a day or two I scrambled to replace the RAM modules with new ones, and when this was done the problems resolved themselves and everything was stable again. However, there was another more sinister side effect that I discovered shortly afterwards: Some of my data was corrupted! That’s right, it was the worst-case scenario for RAM failure: bit flip errors that get written back to the disk. I discovered that several video files that I had been editing had corrupted bits, and were no longer usable. Fortunately I still have the original source materials for the videos which I can use to recreate the final videos. It’s an unfortunate waste of time, but it could have been a lot worse if I’d let the RAM failure go on even longer. There doesn’t seem to be any further corruption in any more of my personal data, and just to be on the safe side I performed a clean install of Windows, to ensure that no system files or program files are corrupted.

The point of the story is that the data corruption was completely preventable, if only my RAM had ECC built into it. But because it doesn’t, these kinds of bit flip events go completely undetected, and proceed to wreak havoc on the integrity of our data, right under our noses.

Memory manufacturers assure us that desktop RAM is so reliable that it doesn’t need ECC, that the probability of bit flip events is so low that it’s not worth the extra “cost” of ECC. Chip manufacturers (i.e. Intel) produce CPUs that don’t even support ECC memory. Users are expected to upgrade to server-grade components just to get access to ECC memory.

Let’s quickly review the reasons why server-class machines are deemed to be “deserving” of ECC memory, while desktop machines are not:

On a personal desktop computer, your data is stored permanently on a disk, whether it’s a spinning hard drive, SSD drive, memory card, and so on. When you want to do something with your data (e.g. write a document, edit a photo, etc), the data is loaded into RAM, and when you’re finished modifying your data, it’s written back to the disk.

On a server machine, however, the situation is different: since disk access is much slower than RAM access, the server must keep as much data as possible in RAM, so that the data is instantly available to clients who request it. This means that the data ends up sitting in RAM for extended periods of time. If the RAM were to experience bit-flip errors that went undetected, the server would serve incorrect data, or worse, would end up writing incorrect data back to the disk. Therefore, the server’s RAM has ECC, so that it will correct itself in case of an occasional bit flip.

This is oversimplifying a bit, but the difference between a server and a desktop, for this exercise, is simply the amount of time that data is made to sit in RAM. So then, are we supposed to accept that if our data doesn’t remain in RAM for very long, it doesn’t need ECC at all?!

By the way, you’d better believe that your disk(s) have all kinds of error correction schemes built into them, which work automatically and transparently. It’s completely normal for data written to a physical medium to be imperfect, and those imperfections will be corrected by the firmware of the disk.

Well guess what? RAM is a physical medium, and yet we’re simply asked to take the manufacturers’ word that our RAM is reliable enough to never need ECC for the use cases of a desktop workstation. Well, I’m here to say that these practices are reckless, and represent a ticking time bomb for anyone who uses non-ECC memory for anything nontrivial. And it seems I’m not the only one.

Software update round-up

It’s high time to give some love to a few of my older and less-maintained software projects, and bring them up to date with a few much-needed and requested features!

DiskImager

DiskImager is a tool that I’ve used “internally” for a few years now to read and write raw disk images. I’ve simply never found the time to polish it up and make it production-ready, until now. This is a small standalone tool that will dump the contents of any drive connected to your PC to a file on another (larger) drive. It can also write a disk image file to a physical disk. Furthermore, when selecting a disk image to write to a physical disk, you can choose from several types of image formats (besides raw images) including VDI, VMDK, VHD, and E01.

FileSystemAnalyzer

The FileSystemAnalyzer tool has gotten a huge number of bug fixes, as well as these enhancements:

  • Improved compatibility with FAT, exFAT, NTFS, ext4, and UDF filesystems in various states of corruption.
  • Improved previews and metadata for more file types.
  • The main file tree view now has columns with file size and date, similar to Windows Explorer. These columns are clickable to sort the file list by ascending or descending order of the column type.
  • Directories can now be saved from the file tree view (recursively), in addition to individual files.

Outlook PST viewer

The PST viewer tool has been updated to be more compatible with a wider range of PST files from different versions of Outlook, and to be more forgiving of corrupted PST files. There is also a new option to save individual messages as .MSG files.

Minimalist programming, Android edition

Suppose you open Android Studio and create a new project from a template, say, a blank Activity with a button. Then build the project, and look at the APK file that is produced. You’ll notice that the APK is over 3 MB in size. These days we don’t bat an eye at these kinds of numbers, and indeed this size is pretty modest in the grand scheme of today’s software ecosystem. However, objectively speaking, 3 MB is a lot! Let’s take a deep dive into what these 3 MB actually consist of, and see how much we can reduce that size while maintaining the same functionality.

The app I’ve built for this exercise is slightly more complex than “hello world”; it’s an app that could actually be minimally useful: a simple tip calculator.

It’s literally a single input field (an EditText component) where the user enters a number, followed by a few lines of text that tell the user different percentages of that number — 15%, 18%, and 20%, which are the most common tipping percentages in the U.S.

Once again, with the default project settings generated by Android Studio, this app comes out to about 3.2 MB when it’s built. Let’s examine the generated APK file and see what’s taking up all that space:

Right away we can see that the heaviest dependency by far is the AndroidX library, followed by the Kotlin standard library and the Material library (under the com.google package). In fact the code that actually belongs to our package (com.dmitrybrant.tipcalc) is a mere 76 KB, dwarfed by the library dependencies that it’s referencing.

To be fair it’s possible to reduce the size of the APK by a good amount by using the minifyEnabled directive, which is not enabled by default. In fact it would probably optimize away most of the “kotlin” dependency and much of the “androidx” dependency. However, even with minifyEnabled our APK size would still be on the order of megabytes. For the purpose of this exercise I left minifyEnabled off, so that we can see exactly which packages are contributing to the code sizes in our APK.

In any case, let’s start whittling away at this extra weight, and see how lean we can get.

The Kotlin tax

As we can see, merely using Kotlin in our app causes the Kotlin standard library to get bundled into the APK. If we don’t want this library to get bundled, we must no longer use Kotlin. (Although I’ll repeat that if we use minifyEnabled, then Kotlin would pretty much be optimized away, so this is more of an observation than a “tax.”)

After converting the code to plain Java and rebuilding the app, our APK is now 2.7 MB:

That’s a little better! But now can we remove the bulkiest dependency, namely the androidx library?

The AndroidX tax

AndroidX is a fabulous library that ensures your app will run consistently on a huge range of different devices (but not all of them!) and different versions of the Android OS. It makes perfect sense that AndroidX is used by default for new projects, and I’m not saying that you should reject it when building your next app. Buuuut… could we actually get away with not using it? How would our app look and run without it? And would our app still run on the same range of devices?

Getting rid of AndroidX means that our app will rely solely on the SDK libraries that are part of the operating system on the user’s device itself. To get rid of AndroidX in our project, we need to do the following:

  • Our Activity can no longer inherit from AppCompatActivity, and will now simply inherit from the standard Activity class from the SDK.
  • We can no longer use fancy things like ConstraintLayout, and will be limited to using basic components like LinearLayout.
  • Our theme definitions can no longer inherit from predefined Material themes. We will need to apply any color and style overrides ourselves.

After these modifications are all done, here’s what our APK looks like:

That’s right, you’re not dreaming, the app is now 88 KB. That’s kilobytes! Now we’re getting somewhere. And if we look closely at those numbers, we see that the bulk of the size is now taken up by the resources that are bundled in the app. What are those resources, you ask?

Launcher icons

By default Android Studio generates a launcher icon for our app that takes several forms: a mipmap resource, which is a series of PNG files at different scales, which will be chosen by the launcher to match the pixel density and resolution of the device, and also a vector resource that will be used instead of the mipmap on newer devices (Android 8 and higher).

This is all very useful stuff if you need your launcher icon to appear pixel-perfect across all devices. But since our goal is minimalism, we can dispense with all of these things, and instead use just a single PNG file as our launcher icon. I created a 32×32 icon and saved it as a 4-bit PNG file, making it take up a total of 236 bytes. It doesn’t look perfect, but it gets the point across:

So, after getting rid of all that extraneous baggage, how are we looking now?

We’ve arrived at 10.5 KB! This is more like it. It may be possible to squeeze it down even further, but that would necessitate doing even more hacky and inconvenient things, such as removing all XML resources and creating layouts programmatically in our code. While I’m going for minimalism, I do still want the app to be straightforward to develop further, so I’m happy to make this a good stopping point.

This is definitely closer to the size that I would “expect” a tip calculator app for a mobile device to be. Speaking of mobile devices, which devices will this app be able to run on?

Compatibility

By default Android Studio sets our minimum SDK to 21, making our app compatible with Lollipop and above. There are plenty of good reasons to set your minimum SDK to 21, but now that we’ve removed our dependency on AndroidX, as well as our dependency on vector graphics, there’s nothing stopping us from reducing our minimum SDK even lower. How much lower? How about… 1? That’s right, we can set our minimum SDK version to 1. This would make our app compatible with literally every Android device ever made.

I don’t own any devices that actually run Android 1.0, but here is my Tip Calculator app running on the oldest device I own, a Samsung Galaxy Ace from 2011, running Android 2.3 (API 9):

And here is the same app running on my current personal phone in 2021, a Google Pixel 3 XL running Android 11 (API 30):

Takeaways

Aside from being an interesting random exercise, there’s a point I hope to convey here:

As time goes by, software seems to be getting more and more bloated. I believe this might be because developers aren’t always cognizant of the cost of the dependencies they’re using in their projects, whether it’s third-party libraries that provide some kind of convenience over standard functionality, or even the standard libraries of their chosen programming environment that the developer has gotten used to relying upon.

As with anything in life, there should be a balance here — a balance between convenience offered by libraries that might add bloat, and lower-level optimization and active reduction of bloat. However, it feels like this balance is currently not in a healthy place. The overwhelming emphasis seems to be on convenience and abstraction ad infinitum, and virtually no emphasis on stepping back and taking account of the costs that these conveniences incur.

Android is far from the worst offender in the world of bloat, and even though a 3 MB binary may be totally acceptable, it doesn’t have to be that way. Even though bulky standard libraries should be used in the majority of cases, they don’t need to be used all the time, and there may even be cases where the app would benefit from not using them. If only developers would maintain a better sense of how their dependencies are impacting the size of their apps, or indeed what dependencies they’re even using in the first place, we can begin to restore the balance of bloat in our lives.

Reverse engineering a 25-year-old Visual Basic app

Following up from last week’s misadventures with the Avant Stellar keyboard (trying and failing to extract macro information from the keyboard’s internal memory), there was another glimmer of hope:  my friend found a backup file that possibly contains all the macros that were saved to the keyboard.  If I could just reverse-engineer this backup, we could extract the macros directly from the file.  It is a 2 KB file with a .KBD extension, unrecognizable as any binary format I’ve seen to date. Here is a partial hex dump of the file:

It’s pretty clear that the file contains a key mapping, as evidenced by the list of incrementing 32-bit numbers at the beginning, up to offset 0x210.  There are roughly 120 increasing numbers, which is roughly the number of keys on the keyboard, so we can safely assume that this is the key mapping.  After the key mapping, I presume, comes the macro information, and this is where things get tricky, since there’s virtually no way to tell how the macros are encoded in the file. The data simply looks too general to make sense of.

An obvious possibility would be to “load” the backup file into the Avant software tool that came with the keyboard, and visually inspect the macro(s) assigned to each key.  But no matter what I tried, the software would not load the file.  Or rather, it loaded the key mapping, but not the macros.  Time to think about the nuclear option: disassemble the Avant software and see how it’s actually processing the backup file.

Looking at the folder contents of the Avant software tool, I immediately notice a dead giveaway: VBRUN300.DLL, which means this tool was written in Visual Basic 3.0.  This makes our job much easier, because there are actually ready-made tools for decompiling Visual Basic executables. (If you recall, Visual Basic compiles executables into p-code instead of native machine code, which makes them much more straightforward to decompile.)  All of this took me quite a while to remember, because I hadn’t used these tools since my early, early hacking days, and it took a little while longer to find them in my archives!  The go-to utility for performing this task was literally called VB3 Decompiler, and the way to find this tool on the web today is… outside the scope of this post.

The decompilation basically results in several Visual Basic source files, in which the original function names are intact, but the local and global variables are changed to generic identifiers, since those names are not stored in the compiled code. It takes a little bit of further massaging to get these files to actually build within Visual Basic, but after that, it’s almost as if you have the original source code of the program at your fingertips.

There was one other minor hurdle because the Avant software uses custom UI components (.VBX files) that don’t allow themselves to be used in Design mode (as part of a copy-protection or licensing mechanism), but this is bypassable using another utility in the decompiler suite that “fools” Visual Basic into loading the components anyway.

With the source code buildable and debuggable, we can now easily run the program and load the .KBD backup file, and trace through where it processes the data in the file:

Even though the variable names aren’t very descriptive in the above screenshot, it’s easy enough to spot the loop that deserializes the keyboard macros, and how each macro is composed.  Not only that, but we can determine what was preventing it from displaying the macros in the first place – it turned out that it expects the keyboard to be physically connected while running, and while I’m pretty sure that we tried loading the backup with the keyboard attached, it wasn’t working anyway, probably because the keyboard is malfunctioning and no longer able to communicate properly.  But at last, with this requirement bypassed, the macros that were loaded from the backup file finally reveal themselves:

Recovering data from QIC-80 tapes: another case study

In another of my recent data recovery cases, the patient was a QIC-80 (DC 2000 mini cartridge) tape that was in pretty bad shape. From the outside I could already see that it would need physical repairs, so I opened it up and found a harrowing sight:

The problem with QIC cartridges in general is that they use a tension band to drive the tape spools. If you look at a QIC cartridge, it’s completely enclosed, except for a plastic wheel that sticks out and makes contact with the capstan mechanism when it’s inserted into the tape drive.  The flexible tension band is tightened in such a way that it hugs the tape spools and drives them using physical friction.

This tension band is a major point of weakness for these types of tapes, because the lifespan of the band is very much finite.  When the tape sits unused for many years, the band can stiffen or lose its friction against the tape spools, which can result in one of two scenarios:

The tension band can break, which would make the cartridge unusable and would require opening the cartridge and replacing the band. This is actually not the worst possible outcome because replacing the band, if done properly, isn’t too disruptive of the tape medium itself, and usually doesn’t result in any data loss.

A much worse scenario is if the tension band becomes weakened over time, such that it no longer grips the spools properly, so that the next time you attempt to read the cartridge, it will spin the spools inconsistently (or cause one of the spools to stall entirely), which will cause the tape to bunch up between the spools, or bend and crease, creating a sort of “tape salad” inside the cartridge, all of which can be catastrophic for the data on the tape. In this kind of case, the cartridge would need to be disassembled and the tape manually rewound onto the spools, being extremely careful to undo any folds or creases (and of course replace the band with a new one, perhaps from a donor tape). This will almost certainly result in loss of data, but depends greatly on the degree to which the tape was unwound and deformed.

Note that the tape drive that is reading the tape is relatively “dumb” with respect to the physical state of the tape. It has no way of knowing if the tension band is broken, or if the tape isn’t wound or tensioned properly, or if what it’s doing is damaging the tape even further. Great care must be taken to examine the integrity of the tape before attempting to read it.

With this cartridge, it’s clear that the tension band has failed (but didn’t break). The tape has obviously bunched up very badly on both spools.  Less obviously, the white plastic wheels at the bottom show evidence that the tension band has degraded, with bits of residue from the black band being stuck on the wheels. The fix for this cartridge was to remove the bad tension band, clean the white plastic wheels, respool and tighten the tape, and install a new band from a donor tape. After the procedure was complete, more than 99% of the data was recovered. The tape header was readable, as were the volume tables. Only a few KB of the file contents were lost.

Therefore, when recovering data from very old QIC tapes, it’s probably a good idea to replace the tension band preemptively with a known-good band, to minimize the chance of breakage and damage to the tape. This is why I keep a small stockpile of new(er) tapes from which I can harvest the tension band when needed. At the very least, it’s a good idea to open up the cartridge and examine the band before making any attempts to read data from it.