Blog

Where did we go wrong?

I started programming seriously in the late 1990s, when the concept of “visual” IDEs was really starting to take shape. In one of my first jobs I was fortunate enough to work with Borland Delphi, as well as Borland C++Builder, creating desktop applications for Windows 95.  At that time I did not yet appreciate how ahead of their time these tools really were, but boy oh boy, it’s a striking contrast with the IDEs that we use today.

Take a look: I double-click the icon to launch Delphi, and it launches in a fraction of a second:

But it also does something else: it automatically starts a new project, and takes me directly to the workflow of designing my window (or “Form” in Borland terms), and writing my code that will handle events that come from the components in the window. At any time I can click the “Run” button, which will compile and run my program (again, in a fraction of a second).

Think about this for a bit. The entire workflow, from zero to building a working Windows application, is literally less than a second, and literally two clicks away.  In today’s world, in the year 2020, this is unheard of.  Show me a development environment today that can boast this level of friendliness and efficiency!

The world of software seems to be regressing: our hardware has been getting faster and faster, and our storage capacity larger and larger, and yet our software has been getting… slower. Think about it another way: if we suppose that our hardware has gotten faster by two orders of magnitude over the last 20 years, and we observe that our software is noticeably slower than it was 20 years ago, then our software has become less efficient by at least two orders of magnitude! Is this… acceptable? What on earth is going on?

Laziness

Engineers like to reuse and build upon existing solutions, and I totally understand the impulse to take an existing tool and repurpose it in a clever way, making it do something for which it wasn’t originally intended. But what we often fail to take into account is the cost of repurposing existing tools, and all the baggage, in terms of performance and size, that they bring along and force us to inherit.

Case in point: suppose that the only language you know is JavaScript, and suppose that you wanted to start building desktop applications, but didn’t want to learn the languages and tools normally associated with desktop development, e.g. C++, C#, etc. What can you do? Well, one option would be to build a compiler from scratch, which would actually compile JavaScript into native machine code. But that would be hard. How about a simpler solution: take a full-blown web browser, and literally bundle it as the engine that will run your desktop app, with the logic of your app being in JavaScript, and the “window” of your app becoming a web page that is run by the bundled browser! This is, of course, the idea behind Electron, an alarmingly popular framework for building desktop apps today.

But what about the cost of using Electron? What is the cost of bundling all of Chromium just to make your crappy desktop app appear on the screen? Just to take an example, let’s look at an app called Etcher, which is a tool for writing disk images onto a USB drive. (Etcher is actually recommended by the Raspberry Pi documentation for copying the operating system onto an SD card.)

We know how large these types of tools are “supposed” to be (i.e. tools that write disk images to USB drives), because there are other tools that do the same thing, namely Rufus and Universal USB Installer, both of which are less than 2 MB in size, and ship as a single executable with no dependencies. And how large is Etcher by comparison? Well, the downloadable installer is 130 MB, and the final install folder weighs in at… 250 MB. There’s your two-orders-of-magnitude regression! Looking inside the install folder of Etcher is just gut-wrenching:

Why is there a DLL for both OpenGL and DirectX in there? Apparently we need a GPU to render a simple window for our app. The “balenaEtcher” executable is nearly 100 MB itself. But do you see that “resources” folder? That’s another 110 MB! And do you see the “locales” folder? You might think that those are different language translations of the text used in the app. Nope — it’s different language translations of Chromium. None of it is used by the app itself. And it’s another 5 MB. And of course when Etcher is running it uses 250+ MB of RAM, and a nonzero amount of CPU time while idle. What is it doing?!

As engineers, this is the kind of thing that should make our skin crawl. So why are we letting this happen? Why are we letting software get bloated beyond all limits and rationalize it by assuming that our hardware will make up for the deficiencies of our software?

The web

The bloat that has been permeating the modern web is another story entirely. At the time of this writing, the New York Times website loads nearly 10 MB of data on a fresh load, spread over 110 requests. This is quite typical of today’s news websites, to the point where we don’t really bat an eye at these numbers, when in fact we should be appalled. If you look at the “source” of these web pages, it’s tiny bits of actual content buried in a sea of <script> tags that are doing… something? Fuck if I know.

The bloat seen on the web, by the way, is being driven by more nefarious forces than sheer laziness. In addition to building a website using your favorite unnecessary “framework” that you can choose willy-nilly (which varies with every web developer you ask, and then has to be hosted on a separate CDN because your web server can’t handle the load), you also have to integrate analytics packages into your website, as requested by your marketing department, and another analytics package requested by your user research team, and another analytics package requested by your design team, etc. If one of the analytics tools goes out of fashion, leave the old code in! Who knows, we might need to switch back to it someday. It doesn’t seem to be impacting load speeds… much… on my latest MacBook Pro. The users won’t even notice.

And of course, ads. Ads everywhere. Ads that are basically free to load whatever arbitrary code they like, and are totally out of the control of the developer. Oh, you say the users are starting to use ad blockers? Let’s add more code that detects ad blockers and forces users to disable them!

The web, in other words, has become a dumpster fire. It’s a dumpster fire of epic proportions, and it’s not getting better.

What to do?

What we need is for more engineers to start looking at the bigger picture, start thinking about the long term, and not be blinded by the novelty of the latest contraption without understanding its costs. Hear me out for a second:

  • Not everything needs to be a “framework” or “library.” Not everything needs to be abstracted for all possible use cases you can dream of. If you need code to do something specific, sometimes it’s OK to borrow and paste just the code you need from another source, or god forbid, write the code yourself, rather than depending on a new framework. Yes, you can technically use a car compactor to crack a walnut, but a traditional nutcracker will do just fine.
  • Something that is clever isn’t necessarily scalable or sustainable. I already gave the example of Electron above, but another good example is Node.js, whose package management system is a minor dumpster fire of its own, and whose dependency cache is the butt of actual jokes.
  • Sometimes software needs to be built from scratch, instead of built on top of libraries and frameworks that are already bloated and rotting. Building something from the ground up shouldn’t be intimidating to you, because you’re an engineer, capable of great deeds.
  • Of course, something that is new and shiny isn’t necessarily better, either. In fact, “new” things are often created by fresh and eager engineers who might not have the experience of developing a product that stands the test of time. Treat such things with a healthy bit of skepticism, and hold them to the same high standards as we hold mature products.
  • Start calling out software that is bad, and don’t use it until it’s better. As an engineer you can tell when your fellow engineers can do a better job, so why not encourage them to do better?
  • Learn to say No! When the newest JavaScript framework starts making its rounds, or when the latest “cross-platform” app development framework is unveiled, or when everyone starts talking about microservices, it’s OK to say “No!” “No, thank you!” “Not until we understand how this will be beneficial to us five years from now.” “Not until we understand the costs, in terms of space, performance, and sanity, of adopting this new thing.”

I suppose that with this rant I’m adding my voice to a growing number of voices that have similarly identified the problem and laid it out in even greater detail and eloquence than I have. I wish that more developers would write rants like this. I wish that this was required training at universities. I have a sinking feeling, however, that these rants are falling on deaf ears, which is why I’ll add one more suggestion that we, as engineers, can do to raise awareness of the issue:

Educate regular users about how great software can be. Tell your parents, your friends, your classmates, that a web page shouldn’t actually need ten seconds to load fully. Or that they shouldn’t need to purchase a new generation of laptop every two years, just to keep up with how huge and slow the software is becoming. Or that the software they install could be one tenth of its size, freeing up that much more space for their photos or documents, for example. That way, regular users can be as fed up as we are about the current state of software, and finally start demanding that we do better.

Home security with Raspberry Pi

The versatility of the Raspberry Pi seems to know no bounds. For a while I’ve been wanting to set up a DIY home security system in my house, and it turns out that the Raspberry Pi is the perfect choice for this task, and more. (The desire for a security system isn’t because we live in a particularly unsafe neighborhood or anything like that, but just because it’s an interesting technical challenge that provides a little extra peace of mind in the end.)

Camera integration

I began with a couple of IP cameras, namely the Anpviz Bullet 5MP cameras, which I mounted on the outside of the house, next to the front door and side door.  The cameras use PoE (power over Ethernet), so I only needed to route an Ethernet cable from the cameras to my PoE-capable switch sitting in a closet in the basement.

At first I assumed that I would need to configure my Raspberry Pi (3) to subscribe to the video streams from the two cameras, do the motion detection on each one, re-encode the video onto disk, and then upload the video to cloud storage.  And in fact this is how the first iteration of my setup worked, using the free MotionEye software.  However, the whole thing was very sluggish, since the RPi doesn’t quite have the horsepower to be doing decoding, encoding, and motion detection of multiple streams at once (and I didn’t want to compromise by decreasing the video quality coming from the cameras), so my final output video was less than 1 frame per second, with my RPi running at full load and getting quite warm. Definitely not a sustainable solution.

But then I realized that a much simpler solution is possible. The Anpviz cameras are actually pretty versatile themselves, and can perform their own motion detection. Furthermore, they can write the video stream directly onto a shared NFS folder!  Therefore, all I need to do is set up the RPi to be an NFS server, and direct the cameras to write to the NFS share whenever motion is detected.

And that’s exactly what I did, with a little twist:  I attached two 16 GB USB flash drives to the RPi, with each USB drive becoming an NFS share for each respective camera. That way I’ll get the maximum throughput of data from the cameras directly to USB storage. With this completed setup, the Raspberry Pi barely reaches 1% CPU load, and stays completely cool.

I wrote a Python script that runs continuously in the background and checks for any new video files being written onto the USB drives. If it detects a new file, it automatically uploads it to my Google Drive account, using the Google Drive API which turned out to be fairly easy to work with, once I got the hang of it. The script automatically creates subfolders in Google Drive corresponding to the current day of the week, and which camera the video is from. It also automatically purges videos that are more than a week old.
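The polling part of a script like this can be sketched roughly as follows. This is a minimal illustration and not the actual script: the function names, the 10-second settling delay, and the `upload` callback (which would wrap the Google Drive API calls) are all assumptions, and the purge is shown on the local drive for simplicity.

```python
import time
from pathlib import Path

WEEK_SECONDS = 7 * 24 * 60 * 60


def find_new_files(folder, seen, min_age=10.0, now=None):
    """Return files under `folder` that are not in `seen` and have not
    been modified during the last `min_age` seconds."""
    now = time.time() if now is None else now
    ready = []
    for path in sorted(Path(folder).rglob("*")):
        if not path.is_file() or path in seen:
            continue
        # Skip files that the camera may still be writing to.
        if now - path.stat().st_mtime < min_age:
            continue
        ready.append(path)
    return ready


def purge_old_files(folder, max_age=WEEK_SECONDS, now=None):
    """Delete files under `folder` older than `max_age` seconds and
    return the list of removed paths."""
    now = time.time() if now is None else now
    removed = []
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file() and now - path.stat().st_mtime > max_age:
            path.unlink()
            removed.append(path)
    return removed


def watch(folders, upload, interval=30.0):
    """Poll `folders` forever. `upload` is a caller-supplied callable
    (hypothetical) that sends one file to cloud storage."""
    seen = set()
    while True:
        for folder in folders:
            for path in find_new_files(folder, seen):
                upload(path)
                seen.add(path)
            purge_old_files(folder)
        time.sleep(interval)
```

Waiting for a file to stop changing before uploading it is a cheap way to avoid sending a clip that is still being recorded.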

I have to heap some more praise onto the cameras for supporting H.265 encoding, which compresses the video files very nicely. All in all, with the amount of motion that is typical on a given day, I’m averaging about 1 GB per day of video being recorded (at 1080p resolution!), which makes 7 GB in a rolling week’s worth of video, which is small enough to fit comfortably in my free Google Drive account, without needing to upgrade to a paid tier of storage.

Water sensor

Since my Raspberry Pi had nearly all of its processing power left over, I decided to give it some more responsibility.

About a month ago the sewer drain in the house became clogged, which caused it to back up and spill out into the basement.  Fortunately I was in the basement while this was happening and caught it before it could do much more damage. An emergency plumber was called, and the drain was snaked successfully (turned out to be old tree roots).  However, from now on I wanted to be warned immediately in case this kind of thing happens again.

So I built a very simple water sensor and connected it to the Raspberry Pi. In fact “very simple” is an understatement: the sensor is literally two wires, close together, which will short out if they come into contact with water. I used some very cheap speaker wire, and routed it from the RPi to the drain where water could potentially spill out.

On the Raspberry Pi, one wire is connected to ground, and the other is connected to a GPIO pin with a pull-up resistor enabled. This means that if the wires are shorted out, the GPIO input will go from HIGH to LOW, and this will be an indication that water is present. The sensor is being monitored by the same Python script that monitors and uploads the camera footage, and will automatically send me an email when the sensor is triggered.
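The HIGH-to-LOW logic can be sketched like this, using the RPi.GPIO library. Treat it as an illustration under assumptions: the pin number (BCM 17), the function names, and the idea of requiring several consecutive LOW readings (to ignore brief glitches on a long wire) are not taken from the actual script.

```python
import time


def wait_for_water(read_pin, samples=5, interval=0.2, sleep=time.sleep):
    """Block until `read_pin()` returns 0 (LOW) for `samples`
    consecutive reads, then return True."""
    low_count = 0
    while low_count < samples:
        # Any HIGH reading resets the counter.
        low_count = low_count + 1 if read_pin() == 0 else 0
        sleep(interval)
    return True


def monitor(pin=17):
    """Watch one sensor on the given BCM pin (the pin is an assumption)."""
    # Imported here so the rest of the file can be loaded off the Pi.
    import RPi.GPIO as GPIO

    GPIO.setmode(GPIO.BCM)
    # The internal pull-up keeps the input HIGH while the wires are dry.
    GPIO.setup(pin, GPIO.IN, pull_up_down=GPIO.PUD_UP)
    try:
        wait_for_water(lambda: GPIO.input(pin))
        print("Water detected!")
    finally:
        GPIO.cleanup()
```

Calling `monitor()` on the Pi blocks until the wires are bridged. In a combined script, the `print` would be replaced by whatever sends the notification.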

For good measure, I installed a second water sensor next to our hot water tank, since these have also been known to fail and leak at the most inconvenient times.

And that’s all for now. The Raspberry Pi still has plenty of GPIO pins left over, so I’ll be able to expand it with additional sensors and other devices in the future.

Notes

Here are just a few random notes related to getting this kind of system up and running:

Enable shared NFS folder(s)

Install the necessary NFS components:
$ sudo apt-get install nfs-kernel-server portmap nfs-common
Add one or more lines to the file /etc/exports:
/folder/path_to_share *(rw,all_squash,insecure,async,no_subtree_check,anonuid=1000,anongid=1000)
And then run the following:
$ sudo exportfs -ra
For good measure, restart the NFS service:
$ sudo /etc/init.d/nfs-kernel-server restart

Run script(s) on startup

  • Add line(s) to /etc/rc.local
  • If it’s a long-running script, or continuously-running, then make sure to put an ampersand at the end of the line, so that the boot process can continue.

Automatically mount USB drive(s) on boot

When the Raspberry Pi is configured to boot into the desktop GUI, it will auto-mount USB drives, mounting them into the /media/pi directory, with the mount points named after the volume label of the drive. However, if the Pi is configured to boot into the console only (not desktop), then it will not auto-mount USB drives, and they will need to be added to /etc/fstab:
/dev/sda1 /media/mount_path vfat defaults,auto,users,rw,nofail,umask=000 0 0

(The umask=000 parameter enables write access to the entire disk.)

Set the network interface to a static IP

Edit the file /etc/dhcpcd.conf. The file contains commented-out example lines for setting a static IP, gateway, DNS server, etc.

And lastly, here are a couple of Gists for sending an email from within Python, and uploading files to a specific folder on Google Drive.
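For a rough idea of the email half, here is a minimal sketch using only the Python standard library. It is not the Gist itself: the function names are made up for illustration, and the host, port, and credentials are placeholders that depend on your mail provider.

```python
import smtplib
from email.message import EmailMessage


def build_alert(sender, recipient, subject, body):
    """Create a plain-text email message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_alert(msg, host, port, username, password):
    """Send `msg` over an SSL-wrapped SMTP connection.
    All four connection parameters are provider-specific placeholders."""
    with smtplib.SMTP_SSL(host, port) as server:
        server.login(username, password)
        server.send_message(msg)
```

Keeping message construction separate from sending makes it easy to check the message contents without touching the network.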

Proper wiping of free space on your Android device

A while ago I introduced a feature into DiskDigger which wipes the free space on your Android device. This is useful for ensuring that no further data could be recovered from the device, even using tools like DiskDigger.

However, some users have been reporting that the wipe feature doesn’t seem to be working properly: certain bits of data still seem to be recoverable even after wiping free space. This is partially my fault for not sufficiently clarifying how to use the feature most effectively, and partially Android’s fault for making its storage and security system exceedingly complex.

Let me clarify a few things about how this feature works, and emphasize some precautions you should take before proceeding with the wiping.

  • In order for the wipe to work correctly, the files that you need to get wiped must be deleted from the filesystem. The wipe can only cover the free (unallocated) space on the device’s storage. If the files still “exist” in the filesystem, they will not be wiped.
  • In addition to deleting files you would like to be wiped, recall that many apps maintain a local cache of data that the app deems necessary to store. This could include things like thumbnails of photos (or even full-resolution copies of photos), draft copies of documents, logs of conversations and activity, etc. To delete all of these things, you would need to go to Settings -> Apps, and for each app, go into “Storage” and tap “Clear cache.”
  • Even worse, some apps break the rules and store data outside of their designated cache folder. Therefore you may even need to tap the “Clear storage” button for each app, instead of just “Clear cache,” or even use a file-manager app to navigate your device’s internal storage and manually delete unwanted files, and only afterwards go back to DiskDigger and wipe free space.

Of course the best way to make sure the data on your device is no longer recoverable is to perform a factory reset of the device, and then use DiskDigger to wipe the free space. But even then, certain portions of the data could be recoverable, depending on the manufacturer’s definition of “factory reset,” or how the given version of Android handles resetting.

All of this is to say, there is unfortunately no single guaranteed fool-proof way to permanently wipe your Android device, but DiskDigger’s “Wipe free space” function goes a long way if used properly.
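To see why only deleted files are affected, it helps to look at how free-space wiping generally works: write filler data until the storage is full, then delete the filler. The sketch below shows that general technique in Python. It is an assumption-laden illustration, not DiskDigger's actual implementation; the file name `wipe.tmp` and the `max_bytes` cap (handy for trying it out safely) are made up.

```python
import errno
import os


def wipe_free_space(directory, max_bytes=None, block_size=1024 * 1024):
    """Overwrite free space under `directory` with zeros by growing a
    filler file, then delete it. Returns the number of bytes written."""
    filler = os.path.join(directory, "wipe.tmp")  # hypothetical name
    block = bytes(block_size)
    written = 0
    try:
        # Unbuffered, so a "disk full" error surfaces on write().
        with open(filler, "wb", buffering=0) as f:
            try:
                while max_bytes is None or written < max_bytes:
                    want = block_size
                    if max_bytes is not None:
                        want = min(block_size, max_bytes - written)
                    written += f.write(block[:want])
            except OSError as exc:
                if exc.errno != errno.ENOSPC:
                    raise
                # ENOSPC: no free space left to overwrite.
            os.fsync(f.fileno())
    finally:
        if os.path.exists(filler):
            os.remove(filler)
    return written
```

The filler file can only ever occupy blocks that the filesystem considers unallocated, which is why anything that still exists as a file, including app caches, survives the wipe.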

NTFS compression minutiae

It’s been a rather productive sprint of making enhancements to DiskDigger in recent weeks, specifically in solidifying its support for parsing NTFS filesystems, and supporting the finer details and edge cases that are found in the wild.

Firstly, individual files in NTFS can be compressed, which the user can choose to do to conserve disk space. This is done by taking the file data and compressing it using a variant of LZ77. (DiskDigger has been able to recover compressed files, but I have now improved the speed of the decompression routine.)
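Microsoft documents this LZ77 variant under the name LZNT1. As a rough illustration of what a decompression routine for it looks like, here is a minimal Python sketch. It is not DiskDigger's code, it does no validation of malformed input, and it decodes a plain LZNT1 buffer without handling the NTFS-specific layout of compression units on disk.

```python
def lznt1_decompress(data):
    """Decompress an LZNT1 buffer: a sequence of chunks, each of which
    expands to at most 4096 bytes."""
    out = bytearray()
    pos = 0
    while pos + 2 <= len(data):
        header = int.from_bytes(data[pos:pos + 2], "little")
        pos += 2
        if header == 0:
            break  # end-of-data marker
        # Low 12 bits hold the chunk's data length minus one.
        chunk_end = pos + (header & 0x0FFF) + 1
        if not header & 0x8000:
            # Chunk is stored uncompressed.
            out += data[pos:chunk_end]
            pos = chunk_end
            continue
        chunk_start = len(out)
        while pos < chunk_end:
            flags = data[pos]
            pos += 1
            for bit in range(8):
                if pos >= chunk_end:
                    break
                if not (flags >> bit) & 1:
                    out.append(data[pos])  # literal byte
                    pos += 1
                    continue
                token = int.from_bytes(data[pos:pos + 2], "little")
                pos += 2
                # The offset/length bit split depends on how far
                # into the current chunk the output has advanced.
                shift = 12
                span = len(out) - chunk_start - 1
                while span >= 0x10:
                    shift -= 1
                    span >>= 1
                offset = (token >> shift) + 1
                length = (token & ((1 << shift) - 1)) + 3
                # Copy byte by byte: source and target may overlap.
                for _ in range(length):
                    out.append(out[-offset])
    return bytes(out)
```

The sliding split between offset bits and length bits is the unusual part: early in a chunk there is little history to point back to, so more bits are given to the match length.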

However, there is a second way that files in NTFS can be compressed. Starting with Windows 10, there is a system process that runs in the background and looks for system files and program files that are seldom used, and compresses them in a different way: it creates an alternate data stream called WofCompressedData and writes the compressed data to it, using either the Xpress or LZX algorithms (Xpress seems to be the default). The original data stream is turned into a sparse stream, and the file gets a reparse point added to it. This is a bit confusing because these types of files are not shown as “compressed” files in Windows Explorer, and yet they are compressed at the level of the filesystem. You can see which files are compressed this way by running the compact command at the command line.

Anyway, DiskDigger now supports this second type of compression in NTFS filesystems, and will transparently decompress the file when it’s recovered. Of course this also means that these types of files can only be recovered at the filesystem level (“Dig Deep” mode) and cannot be carved heuristically (in “Dig Deeper” mode).

As a side note, various different types of compression are available via Windows APIs or the .NET framework itself:

  • .NET offers the System.IO.Compression namespace which has DeflateStream and GZipStream, which implement the DEFLATE algorithm (using zlib under the hood), and also ZipArchive for dealing with actual Zip files. Many file formats use DEFLATE compression, but it doesn’t help us for compressed NTFS files.
  • Windows itself provides the little-known Compression API (via cabinet.dll) which lets us use the Xpress, LZMS, and MSZIP algorithms. This API can be easily P/Invoked from .NET, but unfortunately this means it can’t be used outside of Windows. This API fulfills half of the requirements for decompressing WofCompressedData streams, which use Xpress compression by default.

However, the above APIs are the extent of what’s offered to us by the system, which means that algorithms like LZ77 and LZX must be implemented manually, or using other third-party libraries. I opted for the former in DiskDigger, to keep my number of dependencies to a minimum.

Euler walks

These are the first 10,000 digits of e (the base of the natural logarithm), as interpreted by a spiral walk determined by each successive digit:

Zoom:

And here is a similar interpretation for γ (the Euler-Mascheroni constant):

Zoom:
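For anyone who wants to experiment with walks like these, here is a minimal Python sketch. The exact rule used for the images above isn't spelled out, so the mapping here is purely an assumption: each digit d is read as a heading of d × 36 degrees, and the walk takes one unit step in that direction. The digits of e are computed with integer arithmetic from the series 1/0! + 1/1! + 1/2! + …

```python
import math


def e_digits(n):
    """Return the first n decimal digits of e (including the leading 2)
    as a list of ints, using the series sum of 1/k!."""
    scale = 10 ** (n + 10)  # ten guard digits absorb truncation error
    total, term, k = 0, scale, 0
    while term:
        total += term
        k += 1
        term //= k  # term is now scale // k!
    return [int(c) for c in str(total)[:n]]


def digit_walk(digits):
    """Turn digits into a path of (x, y) points. Assumed rule: digit d
    means a heading of d * 36 degrees, one unit step per digit."""
    x = y = 0.0
    points = [(x, y)]
    for d in digits:
        angle = math.radians(36 * d)
        x += math.cos(angle)
        y += math.sin(angle)
        points.append((x, y))
    return points
```

Passing `digit_walk(e_digits(10000))` to any plotting library draws the path. Swapping in a different rule, such as turning relative to the previous heading, only requires changing how `angle` is computed.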