Brain dump, September 2023

I finally did something I’ve been meaning to do for a long time: get the final version of the ftape driver to work on a Linux distro that I can use in my data recovery workstations. This is for the purpose of using Linux to dump the contents of QIC-80 and similar tapes, using “floppy tape” drives, i.e. tape drives that connect to the floppy disk controller on the motherboard.

Up until this point, I’ve been using an old version of Ubuntu that has ftape pre-packaged into the kernel. The problem is that the bundled version of ftape is not the latest: development of ftape seemed to continue independently of the version included with the kernel, and the “last” available version (4.04a, from around July 2000) contains many enhancements over the in-kernel version (apparently 3.04), most notably compatibility with parallel port tape drives such as the Iomega Ditto 2GB.

This meant that I needed to compile the driver from source. Sounds simple enough; the driver is just a couple of loadable kernel modules. However, I would need to compile it for a version of the kernel that can boot nicely on my workstation. Browsing the source code of the driver, it appears to be intended to be compiled for kernel version 2.4.x. As an amateur kernel hacker in a previous job, I knew that even patch version changes (the third version number) in the kernel can break compilation of custom kernel modules. So, I tried to find a Linux distro that uses the earliest possible patch version of the 2.4 kernel, and still runs well on my workstation.

CentOS 3.5 to the rescue! I found ISO installation media and used it to install CentOS 3.5 flawlessly onto my recovery workstation. It uses kernel version 2.4.21, which still turned out to be “too new” for compiling ftape successfully. I got a number of compilation errors, but thankfully they were all comprehensible and easy for an amateur to remedy. After just a few hacky modifications, I got the driver to compile into a loadable module!

And would you look at that – it’s able to communicate successfully with all of my floppy tape drives, as well as my parallel port Ditto 2GB drive!

Here’s my repository on GitHub that has the source code for the ftape driver, with my modifications for getting it to build in CentOS 3.5.


In other news, I found and restored an old ThinkPad X131e, which came to me as a Chromebook, i.e. with ChromeOS installed. In order to remove ChromeOS and install a regular Linux distro, I had to replace its firmware with custom firmware that allows other operating systems to be installed. And in order to overwrite the firmware, I had to disassemble the laptop and flip a physical write-protect switch that allows the firmware to be written. Why do they do this?! Anyway, with the latest version of the lightweight Xubuntu installed, this tiny thing works beautifully, and can now have a second life.

Brain dump, February 2023

As a software archaeologist I often find myself trying out old software that I never used in my own career. I think this can be very instructive, since old software often has good ideas built into it: ideas that might have been forgotten, but from which we can still draw when building today’s software.

Recently I played around with Microsoft QuickC for Windows 3.1, which was a C development environment (IDE) targeted at individual developers, with a rather modest set of features compared to enterprise-caliber IDEs of the era. Nevertheless, my existing knowledge of Windows programming, coming from Windows 9x development and onward, transferred fairly easily to QuickC, and I was able to develop a sample app fairly quickly:

It’s a Mandelbrot viewer/explorer app, which is one of my favorite “sample” apps to build in a new environment. It runs in any version of Windows 3.x, has no dependencies, and weighs in at 20 KB. Here is the source code, if you like!

What struck me about using QuickC is its simplicity and efficiency. It still has the familiar issues of native Windows programming (many screenfuls of boilerplate code, and having to handle message loops and drawing subroutines manually), but once this was out of the way, it was smooth sailing.

Today I make Android apps for a living, and I can’t help but compare the experience of building an Android app (using Android Studio) to the experience of building old-school Windows apps, specifically in terms of efficiency. The compilation time of my QuickC app was no more than a few seconds (in an emulator of a 50 MHz PC). Compare this with building a similar Android app, where kicking off a clean Gradle build is a cue to take a coffee break, even on the most modern hardware. Of course the Gradle build process has gotten faster over the years, and the Android folks at Google are quick to award themselves a medal for improving build speeds by a few seconds. Still, it’s only very recently that Gradle has gotten fast enough to finish building a Hello World app in under a minute. I won’t even get into the sizes, now measured in gigabytes, that modern IDEs require to make themselves at home on our workstations, whereas the entirety of QuickC fit on three floppy disks.

Is this level of efficiency and streamlining squarely in the distant past of software tools, or can we take steps in the present day to get back to that spirit?

ECC RAM should be a human right

I am now a staunch advocate for ECC RAM, after the events of last week. You see, over the last several weeks my main desktop workstation had been misbehaving, with occasional freezing and crashing. After some diagnostics I began to suspect a faulty RAM module, and sure enough, a quick run of memtest86 lit up the screen with a multitude of bit flip errors at numerous memory locations, indicating that something was seriously wrong with the RAM.

Within a day or two I scrambled to replace the RAM modules with new ones, and once this was done the problems went away and everything was stable again. However, there was another, more sinister side effect that I discovered shortly afterwards: some of my data was corrupted! That’s right, it was the worst-case scenario for RAM failure: bit flip errors that get written back to the disk. I discovered that several video files I had been editing had corrupted bits and were no longer usable. Fortunately I still have the original source materials for the videos, which I can use to recreate them. It’s an unfortunate waste of time, but it could have been a lot worse if I’d let the RAM failure go on even longer. There doesn’t seem to be any corruption in the rest of my personal data, and just to be on the safe side I performed a clean install of Windows, to ensure that no corrupted system or program files remain.

The point of the story is that the data corruption was completely preventable, if only my RAM had had ECC built into it. Without ECC, these kinds of bit flip events go completely undetected, and proceed to wreak havoc on the integrity of our data, right under our noses.

Memory manufacturers assure us that desktop RAM is so reliable that it doesn’t need ECC, that the probability of bit flip events is so low that it’s not worth the extra “cost” of ECC. Chip manufacturers (i.e. Intel) produce CPUs that don’t even support ECC memory. Users are expected to upgrade to server-grade components just to get access to ECC memory.

Let’s quickly review the reasons why server-class machines are deemed to be “deserving” of ECC memory, while desktop machines are not:

On a personal desktop computer, your data is stored permanently on a disk, whether that’s a spinning hard drive, an SSD, a memory card, and so on. When you want to do something with your data (e.g. write a document, edit a photo, etc.), the data is loaded into RAM, and when you’re finished modifying it, it’s written back to the disk.

On a server machine, however, the situation is different: since disk access is much slower than RAM access, the server must keep as much data as possible in RAM, so that the data is instantly available to clients who request it. This means that the data ends up sitting in RAM for extended periods of time. If the RAM were to experience bit-flip errors that went undetected, the server would serve incorrect data, or worse, would end up writing incorrect data back to the disk. Therefore, the server’s RAM has ECC, so that it will correct itself in case of an occasional bit flip.

This is oversimplifying a bit, but the difference between a server and a desktop, for this exercise, is simply the amount of time that data is made to sit in RAM. So then, are we supposed to accept that if our data doesn’t remain in RAM for very long, it doesn’t need ECC at all?!
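As an aside, the mechanism behind ECC isn’t magic. Here’s a toy sketch in Java, using a tiny Hamming(7,4) code for illustration (real ECC modules use a wider code, typically 8 check bits protecting every 64 bits of data, but the principle is the same), showing how a few extra parity bits let the hardware detect and silently repair a single flipped bit:

```java
// Toy illustration of single-bit error correction, Hamming(7,4) style.
public class EccDemo {

    // Encode 4 data bits (lowest 4 bits of 'data') into a 7-bit codeword.
    // Parity bits sit at positions 1, 2 and 4 (1-indexed).
    static int encode(int data) {
        int d1 = data & 1, d2 = (data >> 1) & 1, d3 = (data >> 2) & 1, d4 = (data >> 3) & 1;
        int p1 = d1 ^ d2 ^ d4;   // checks positions 1,3,5,7
        int p2 = d1 ^ d3 ^ d4;   // checks positions 2,3,6,7
        int p3 = d2 ^ d3 ^ d4;   // checks positions 4,5,6,7
        // Codeword layout (position 1 = least significant bit): p1 p2 d1 p3 d2 d3 d4
        return p1 | (p2 << 1) | (d1 << 2) | (p3 << 3) | (d2 << 4) | (d3 << 5) | (d4 << 6);
    }

    // Decode a 7-bit codeword, repairing a single flipped bit if one is found.
    static int decode(int code) {
        int[] bit = new int[8];   // bit[1..7], 1-indexed to match the layout above
        for (int i = 1; i <= 7; i++) bit[i] = (code >> (i - 1)) & 1;
        int s1 = bit[1] ^ bit[3] ^ bit[5] ^ bit[7];
        int s2 = bit[2] ^ bit[3] ^ bit[6] ^ bit[7];
        int s3 = bit[4] ^ bit[5] ^ bit[6] ^ bit[7];
        int errorPos = s1 | (s2 << 1) | (s3 << 2);   // 0 means no error detected
        if (errorPos != 0) bit[errorPos] ^= 1;       // flip the offending bit back
        return bit[3] | (bit[5] << 1) | (bit[6] << 2) | (bit[7] << 3);
    }

    public static void main(String[] args) {
        int data = 0b1011;
        int stored = encode(data);
        int corrupted = stored ^ (1 << 4);            // simulate a bit flip in "RAM"
        System.out.println(decode(corrupted) == data); // prints "true": the flip was repaired
    }
}
```

An ECC memory controller performs this kind of check transparently on every access, which is exactly the safety net my workstation didn’t have.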

By the way, you’d better believe that your disk(s) have all kinds of error correction schemes built into them, which work automatically and transparently. It’s completely normal for data written to a physical medium to be imperfect, and those imperfections will be corrected by the firmware of the disk.

Well guess what? RAM is a physical medium, and yet we’re simply asked to take the manufacturers’ word that our RAM is reliable enough to never need ECC for the use cases of a desktop workstation. Well, I’m here to say that these practices are reckless, and represent a ticking time bomb for anyone who uses non-ECC memory for anything nontrivial. And it seems I’m not the only one.

(discussion on HackerNews)

Software update round-up

It’s high time to give some love to a few of my older and less-maintained software projects, and bring them up to date with a few much-needed and requested features!

DiskImager

DiskImager is a tool that I’ve used “internally” for a few years now to read and write raw disk images. I’ve simply never found the time to polish it up and make it production-ready, until now. This is a small standalone tool that will dump the contents of any drive connected to your PC to a file on another (larger) drive. It can also write a disk image file to a physical disk. Furthermore, when selecting a disk image to write to a physical disk, you can choose from several types of image formats (besides raw images) including VDI, VMDK, VHD, and E01.

FileSystemAnalyzer

The FileSystemAnalyzer tool has gotten a huge number of bug fixes, as well as these enhancements:

  • Improved compatibility with FAT, exFAT, NTFS, ext4, and UDF filesystems in various states of corruption.
  • Improved previews and metadata for more file types.
  • The main file tree view now has columns with file size and date, similar to Windows Explorer. Clicking a column header sorts the file list by that column, in ascending or descending order.
  • Directories can now be saved from the file tree view (recursively), in addition to individual files.

Outlook PST viewer

The PST viewer tool has been updated to be more compatible with a wider range of PST files from different versions of Outlook, and to be more forgiving of corrupted PST files. There is also a new option to save individual messages as .MSG files.

Minimalist programming, Android edition

Suppose you open Android Studio and create a new project from a template, say, a blank Activity with a button. Then build the project, and look at the APK file that is produced. You’ll notice that the APK is over 3 MB in size. These days we don’t bat an eye at these kinds of numbers, and indeed this size is pretty modest in the grand scheme of today’s software ecosystem. However, objectively speaking, 3 MB is a lot! Let’s take a deep dive into what these 3 MB actually consist of, and see how much we can reduce that size while maintaining the same functionality.

The app I’ve built for this exercise is slightly more complex than “hello world”; it’s an app that could actually be minimally useful: a simple tip calculator.

It’s literally a single input field (an EditText component) where the user enters a number, followed by a few lines of text that tell the user different percentages of that number — 15%, 18%, and 20%, which are the most common tipping percentages in the U.S.
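The core logic is nothing more than a bit of arithmetic on whatever the user typed. As a rough sketch (the names below are illustrative, not necessarily what the published app uses):

```java
// Illustrative sketch of the tip math; class and method names are mine, not the app's.
public class TipMath {

    static String formatTips(String input) {
        final double bill;
        try {
            bill = Double.parseDouble(input.trim());
        } catch (NumberFormatException e) {
            return "";   // ignore empty or partially typed input
        }
        return String.format("15%%: %.2f%n18%%: %.2f%n20%%: %.2f",
                bill * 0.15, bill * 0.18, bill * 0.20);
    }

    public static void main(String[] args) {
        System.out.println(formatTips("42.50"));
    }
}
```

In the app, this just needs to be wired up to the EditText so that the three lines of text update as the user types.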

Once again, with the default project settings generated by Android Studio, this app comes out to about 3.2 MB when it’s built. Let’s examine the generated APK file and see what’s taking up all that space:

Right away we can see that the heaviest dependency by far is the AndroidX library, followed by the Kotlin standard library and the Material library (under the com.google package). In fact the code that actually belongs to our package (com.dmitrybrant.tipcalc) is a mere 76 KB, dwarfed by the library dependencies that it’s referencing.

To be fair it’s possible to reduce the size of the APK by a good amount by using the minifyEnabled directive, which is not enabled by default. In fact it would probably optimize away most of the “kotlin” dependency and much of the “androidx” dependency. However, even with minifyEnabled our APK size would still be on the order of megabytes. For the purpose of this exercise I left minifyEnabled off, so that we can see exactly which packages are contributing to the code sizes in our APK.

In any case, let’s start whittling away at this extra weight, and see how lean we can get.

The Kotlin tax

As we can see, merely using Kotlin in our app causes the Kotlin standard library to get bundled into the APK. If we don’t want this library to get bundled, we must no longer use Kotlin. (Although I’ll repeat that if we use minifyEnabled, then Kotlin would pretty much be optimized away, so this is more of an observation than a “tax.”)

After converting the code to plain Java and rebuilding the app, our APK is now 2.7 MB:

That’s a little better! But can we now remove the bulkiest dependency, namely the AndroidX library?

The AndroidX tax

AndroidX is a fabulous library that ensures your app will run consistently on a huge range of different devices (but not all of them!) and different versions of the Android OS. It makes perfect sense that AndroidX is used by default for new projects, and I’m not saying that you should reject it when building your next app. Buuuut… could we actually get away with not using it? How would our app look and run without it? And would our app still run on the same range of devices?

Getting rid of AndroidX means that our app will rely solely on the SDK libraries that are part of the operating system on the user’s device itself. To get rid of AndroidX in our project, we need to do the following:

  • Our Activity can no longer inherit from AppCompatActivity, and will now simply inherit from the standard Activity class from the SDK (see the sketch after this list).
  • We can no longer use fancy things like ConstraintLayout, and will be limited to using basic components like LinearLayout.
  • Our theme definitions can no longer inherit from predefined Material themes. We will need to apply any color and style overrides ourselves.
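
After those changes, the Activity skeleton ends up looking roughly like this (a hand-written sketch of the shape of the change, not a verbatim copy of the project):

```java
import android.app.Activity;   // framework Activity, no AppCompat
import android.os.Bundle;

public class MainActivity extends Activity {   // was: extends AppCompatActivity

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        // activity_main.xml now uses a plain LinearLayout instead of ConstraintLayout,
        // and the theme it's displayed with inherits from a stock platform theme
        // (e.g. android:Theme.Light) rather than a Material theme.
        setContentView(R.layout.activity_main);
    }
}
```

Nothing about the app’s logic has to change; only the ancestry of the Activity and the flavor of its layout and theme resources.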

After these modifications are all done, here’s what our APK looks like:

That’s right, you’re not dreaming, the app is now 88 KB. That’s kilobytes! Now we’re getting somewhere. And if we look closely at those numbers, we see that the bulk of the size is now taken up by the resources that are bundled in the app. What are those resources, you ask?

Launcher icons

By default Android Studio generates a launcher icon for our app that takes several forms: a mipmap resource, which is a series of PNG files at different scales, from which the launcher chooses the one that matches the pixel density and resolution of the device; and a vector resource that is used instead of the mipmap on newer devices (Android 8 and higher).

This is all very useful stuff if you need your launcher icon to appear pixel-perfect across all devices. But since our goal is minimalism, we can dispense with all of these things, and instead use just a single PNG file as our launcher icon. I created a 32×32 icon and saved it as a 4-bit PNG file, making it take up a total of 236 bytes. It doesn’t look perfect, but it gets the point across:

So, after getting rid of all that extraneous baggage, how are we looking now?

We’ve arrived at 10.5 KB! This is more like it. It may be possible to squeeze it down even further, but that would necessitate doing even more hacky and inconvenient things, such as removing all XML resources and creating layouts programmatically in our code. While I’m going for minimalism, I do still want the app to be straightforward to develop further, so I’m happy to make this a good stopping point.
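
For the curious, “creating layouts programmatically” would mean building the view hierarchy in code instead of inflating an XML resource, something like the following sketch (which, again, is not what the published app actually does):

```java
import android.app.Activity;
import android.os.Bundle;
import android.text.Editable;
import android.text.InputType;
import android.text.TextWatcher;
import android.widget.EditText;
import android.widget.LinearLayout;
import android.widget.TextView;

public class MainActivity extends Activity {

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);

        // Build the UI entirely in code: a vertical list with an input field and a text label.
        LinearLayout root = new LinearLayout(this);
        root.setOrientation(LinearLayout.VERTICAL);

        final EditText amountField = new EditText(this);
        amountField.setInputType(InputType.TYPE_CLASS_NUMBER | InputType.TYPE_NUMBER_FLAG_DECIMAL);
        root.addView(amountField);

        final TextView tipText = new TextView(this);
        root.addView(tipText);

        amountField.addTextChangedListener(new TextWatcher() {
            @Override public void beforeTextChanged(CharSequence s, int start, int count, int after) { }
            @Override public void onTextChanged(CharSequence s, int start, int before, int count) { }
            @Override public void afterTextChanged(Editable s) {
                // recompute and display 15%, 18% and 20% of the entered amount here
            }
        });

        setContentView(root);
    }
}
```

It works, but every tweak to the UI now means touching code, which is exactly the kind of inconvenience I’d rather avoid.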

This is definitely closer to the size that I would “expect” a tip calculator app for a mobile device to be. Speaking of mobile devices, which devices will this app be able to run on?

Compatibility

By default Android Studio sets our minimum SDK to 21, making our app compatible with Lollipop and above. There are plenty of good reasons to set your minimum SDK to 21, but now that we’ve removed our dependency on AndroidX, as well as our dependency on vector graphics, there’s nothing stopping us from reducing our minimum SDK even lower. How much lower? How about… 1? That’s right, we can set our minimum SDK version to 1. This would make our app compatible with literally every Android device ever made.

I don’t own any devices that actually run Android 1.0, but here is my Tip Calculator app running on the oldest device I own, a Samsung Galaxy Ace from 2011, running Android 2.3 (API 9):

And here is the same app running on my current personal phone in 2021, a Google Pixel 3 XL running Android 11 (API 30):

Takeaways

Aside from being an interesting random exercise, there’s a point I hope to convey here:

As time goes by, software seems to be getting more and more bloated. I believe this might be because developers aren’t always cognizant of the cost of the dependencies they’re using in their projects, whether it’s third-party libraries that provide some kind of convenience over standard functionality, or even the standard libraries of their chosen programming environment that the developer has gotten used to relying upon.

As with anything in life, there should be a balance here: a balance between the convenience offered by libraries that might add bloat, and lower-level optimization and active reduction of bloat. However, it feels like this balance is currently not in a healthy place. The overwhelming emphasis seems to be on convenience and abstraction ad infinitum, with virtually no emphasis on stepping back and taking account of the costs that these conveniences incur.

Android is far from the worst offender in the world of bloat, and even though a 3 MB binary may be totally acceptable, it doesn’t have to be that way. Bulky standard libraries may be the right choice in the majority of cases, but they don’t need to be used all the time, and there may even be cases where an app would benefit from not using them. If developers maintained a better sense of how their dependencies impact the size of their apps, or indeed of which dependencies they’re even using in the first place, we could begin to restore the balance of bloat in our lives.