Reverse-engineering the QICStream tape backup format

TLDR: I developed an open-source tool to read tape backup images that were made using the QICStream tool, and extract the original files from them.

During a recent data recovery contract, I needed to recover files from some old QIC tapes. However, after reading the raw data from the tapes, I couldn’t recognize the format in which the backup was encoded, and none of the usual software I use to read the backups seemed to be compatible with it.

A brief look at the backup in a hex editor showed that it was, fortunately, neither compressed nor encrypted, and there were signs that it had been made with a tool called QICStream. There doesn’t seem to be any documentation for this utility (or for the backup format it writes) on the web. It’s easy enough to find the tool itself on ancient DOS download sites, and it might have been an interesting project to set up an emulated DOS environment in which QICStream reads the backup from an emulated tape drive, but it turned out to be much easier to reverse-engineer the backup structure and decode the files directly from the raw data.

The binary format of the backup is very simple, once you realize one important thing:  every block of 0x8000 bytes ends with 0x402 bytes of extra data (that’s right, 0x402 bytes, not 0x400). In other words, for every successive block of 0x8000 bytes, only the first 0x7BFE bytes are useful data, and the last 0x402 bytes are some kind of additional data, possibly for parity checking or some other form of error-correcting logic. (I did not reverse-engineer the true purpose of these bytes; they did not turn out to be important in the end.)
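As a rough illustration of that block layout (a minimal sketch, not the code from the actual tool), the extra bytes can be stripped like this. The names `strip_block_trailers`, `BLOCK_SIZE`, and `TRAILER_SIZE` are made up for this example, and the sketch assumes the blocks are aligned to the start of the raw image:

```python
BLOCK_SIZE = 0x8000                        # size of each raw block on the tape image
TRAILER_SIZE = 0x402                       # extra bytes at the end of every block
PAYLOAD_SIZE = BLOCK_SIZE - TRAILER_SIZE   # 0x7BFE bytes of useful data


def strip_block_trailers(raw: bytes) -> bytes:
    """Keep only the first 0x7BFE bytes of every 0x8000-byte block.

    Assumption: block boundaries start at offset 0 of the image. A short
    final block is kept up to the payload size.
    """
    out = bytearray()
    for offset in range(0, len(raw), BLOCK_SIZE):
        block = raw[offset:offset + BLOCK_SIZE]
        out += block[:PAYLOAD_SIZE]
    return bytes(out)
```

The result is one contiguous stream of payload bytes, which is what the file headers and control codes are then read from.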

Other than that, the format is very straightforward, and basically consists of a sequence of files and directories arranged one after the other, with a short header in front of each file, and “control codes” that determine whether to descend into a subdirectory or to navigate back out of it.

Anyway, I put all of these findings into a small open-source utility that we can now use to extract the original files from QICStream backups. Feel free to look through my code for additional details of the structure of the file headers, and how the large-scale structure of the backup is handled.

When you need some perspective

Whenever I get too caught up in reading the news, or sucked into hopelessly unproductive political discussions, I like to relax with some images from the Hubble Space Telescope and similar telescopic wonders, which give us a glimpse into the wider universe outside the minuscule mote of dust on which we live.

Take, for example, the Ultra Deep Field image, which contains roughly ten thousand galaxies within an area of the sky that is about 2×2 arcminutes. Do you know how small a portion of the sky that is? If you take a 1×1 millimeter square, hold it at arm’s length, and peer through it, that’s the area of the sky captured in the Ultra Deep Field. The UDF was taken in a deliberately ordinary and uninteresting portion of the sky, which implies that for every 1×1 mm portion of the sky, we could get a similarly breathtaking image of thousands of galaxies. Nearly every bright spot in the image is a galaxy, and some of the galaxies are over 12 billion years old.

How about some images that are impressive not in their depth, but in their resolution? Here is the sharpest ever image of the Andromeda galaxy (a composite of numerous other images), where we can zoom in to see individual stars in a galaxy outside of our own. If you look closely, you can also find globular clusters and nebulas, all in a different galaxy. There is a similarly high-resolution image of the Triangulum galaxy, as well as a super high-resolution image of the Orion nebula.

Or how about some images of gravitational lenses, where gargantuan clusters of galaxies create such strong curvature in spacetime that the light from more distant galaxies bends around them and becomes distorted, or even splits into multiple images of the same galaxies in different spots?

And of course there’s the recent image of an actual black hole (or rather an image of matter accreting around it) at the center of Messier 87, an extremely massive galaxy with a central black hole of roughly 6.5 billion solar masses, making it easier to observe than the black hole in our own galaxy.

All of this provides a pleasant counterbalance to our daily political bickering, which all seems laughably “local” by contrast.  To channel Christopher Hitchens for a moment, take some time to let these awe-inspiring images sink in, and then compare them to the story of Moses and his “burning bush.”

Digital hoarding

I have a confession to make: I’m a hoarder. Not a hoarder of material possessions, mind you; my house is almost entirely free of unnecessary stuff. I take pride in actively reducing the amount of physical crap that I own, and donate items I no longer need. My family and I have even made a gift-giving agreement among ourselves where any gifts must either be consumable (specialty foods, restaurant gift cards, etc.) or experiences (tickets to a show, subscription to a service, etc.).

My hoarding, on the other hand, is of the digital variety. My “collection” only spans a few external hard drives of 1 TB each, occasionally backed up or synced to a duplicate set of external hard drives. The physical space occupied by these drives is less than one cubic foot, but the vastness of the digital stuff that they contain is… considerable. Just to give you a rough idea of what I’m dealing with:

  • Archives of old emails and correspondence from previous jobs and contracts.
  • Archives of Instant Messenger conversations with old friends and ex-girlfriends, dating back to 1997.
  • A collection of viruses and trojans for MS-DOS from the 80s and 90s, originally for research purposes.
  • A huge library of shareware games and programs from the MS-DOS era.
  • An archive of articles, papers, and textbooks (in PDF form) relating to computer science, mathematics, and physics.
  • An archive of high-resolution NASA imagery of planets, nebulas, and galaxies.
  • An extensive library of file formats (i.e. sample files saved by all kinds of different software), along with documentation of each format’s specification.
  • Emulators and system images of virtually every computer system and game console ever built.
  • My complete genome, which I had sequenced a few years ago.
  • And of course, my personal photo and video library, from my birth to the present day, and also photos from my parents’ and grandparents’ old albums that I have digitized and saved.

So yeah… recently I’ve been asking myself whether digital hoarding is a problem of the same magnitude as physical hoarding. On the surface, one might ask “What’s the problem?” These things aren’t taking up any physical space, and you don’t have to give them another thought after you save them to the disk. And yet, perhaps there is an emotional toll that comes with the mere knowledge that all of this old data still exists, and remains your responsibility. If anything, this is surely at odds with my attitude towards physical possessions, which is quite minimalist.

A basic litmus test for hoarding behavior consists of a simple question: How would you feel about throwing away any random item that you see around you? Will you use this twisty-tie for anything? How about this pen cap? Do you really need three different cheese graters? How about this pile of old magazines? A hoarder will answer these questions with something like, “You never know when it will come in handy.” And if I’m being honest, that’s exactly how I feel about all the digital items I listed above. I feel the same hesitation about deleting any of them as a “physical” hoarder might feel about donating old clothes, or throwing away expired spices from the pantry.

When I look at my enormous pile of digital junk, I see in myself all the symptoms of real hoarding, albeit confined to the digital realm, which is likely why it’s been able to go on for so long. I’m also reminded of how freeing and cathartic it feels to let go of unnecessary possessions, and I theorize that a similar feeling of freedom will result from permanently letting go of digital baggage.

Therefore, I have resolved to stop being hypocritical in this regard, and start practicing digitally what I practice physically.  Many of the items I mentioned in my list are actually replaceable (easily found on the web, or generated with minimal effort).  Some of the items are technically “irreplaceable,” such as my old emails and IM archives, but represent unnecessary cognitive and emotional baggage, and have no real nostalgic value.  The only things that seem to have actual meaning, and are objectively worth keeping, are my personal photos and videos, and even those can probably be trimmed down a bit.

It’s time to let the past go, and embrace the future without anything weighing you down. Let the cleaning begin.

A quick utility for SQLite forensics

When performing forensics on SQLite database files, it’s simple enough to browse through the database directly using a tool like sqlitebrowser, which provides a nice visual interface for exploring the data. However, I wanted a tool that goes one step further: one that shows the contents of unallocated or freed blocks within the database, making it possible to see data from rows that once existed but were later deleted. This can be used, for example, to recover deleted text messages from an Android device, which usually stores SMS messages in a SQLite database file.

This new utility, which I’ll tentatively call SqliteCarve, is a minimal solution to this task: it loads a SQLite file and parses its pages and B-tree structure. While doing this, it detects the portions of the structure that contain unallocated bytes. It then reads these bytes and extracts any strings from them.
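To give a feel for the idea, here is a minimal Python sketch (not the code of SqliteCarve itself). It relies on the documented SQLite file format: a 100-byte file header with the page size at offset 16, and a B-tree page header that records the cell count and the start of the cell content area. The function names are hypothetical, and the sketch makes simplifying assumptions: it only looks at the gap between the cell pointer array and the cell content area, ignores freeblocks and freelist pages, decodes only printable ASCII, and may misread an overflow page whose first byte happens to look like a B-tree page type.

```python
import re
import struct
from typing import List, Optional, Tuple

# B-tree page type byte -> size of the page header (interior pages carry
# an extra 4-byte right-most pointer).
BTREE_HEADER_SIZE = {0x02: 12, 0x05: 12, 0x0A: 8, 0x0D: 8}


def page_size(file_header: bytes) -> int:
    """Read the page size from the 100-byte SQLite file header."""
    if file_header[:16] != b"SQLite format 3\x00":
        raise ValueError("not a SQLite 3 database")
    (size,) = struct.unpack(">H", file_header[16:18])
    return 65536 if size == 1 else size  # the value 1 encodes 65536


def unallocated_span(page: bytes, page_number: int) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the gap between the cell pointer array and
    the cell content area, or None if this is not a B-tree page."""
    base = 100 if page_number == 1 else 0  # page 1 begins with the file header
    if len(page) < base + 8:
        return None
    header_size = BTREE_HEADER_SIZE.get(page[base])
    if header_size is None:
        return None
    cell_count, content_start = struct.unpack(">HH", page[base + 3:base + 7])
    if content_start == 0:
        content_start = 65536
    start = base + header_size + 2 * cell_count  # 2-byte pointer per cell
    end = min(content_start, len(page))
    return (start, end) if start < end else None


def printable_strings(data: bytes, min_len: int = 4) -> List[str]:
    """Extract runs of printable ASCII at least min_len bytes long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.group().decode("ascii") for m in re.finditer(pattern, data)]


def carve_strings(path: str, min_len: int = 4) -> List[Tuple[int, str]]:
    """Return (page number, string) pairs found in unallocated gaps."""
    with open(path, "rb") as f:
        data = f.read()
    size = page_size(data[:100])
    found = []
    for number, offset in enumerate(range(0, len(data), size), start=1):
        page = data[offset:offset + size]
        span = unallocated_span(page, number)
        if span:
            chunk = page[span[0]:span[1]]
            found += [(number, s) for s in printable_strings(chunk, min_len)]
    return found
```

Calling `carve_strings("some.db")` on a database file (the file name here is just a placeholder) returns the candidate strings together with the page they were found on. Whether deleted rows are still recoverable depends on how the database was configured and how much it has been rewritten since.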

The tool presents all the strings found in the unallocated space visually, with a quick way to search for keywords within the strings.

A couple of TODOs for this utility:

  • Support strings encoded with UTF-16LE and UTF-16BE, in addition to the default UTF-8.
  • Make better inferences about the type of content present in the unallocated areas, to be able to extract strings more precisely.

New tool for analysis of Outlook PST files

I’ve been slowly working towards a utility to analyze .PST files from Microsoft Outlook and Exchange, and examine their contents. A .PST file is the database in which Outlook stores your email locally on your PC. When recovering data from your own PC, or when performing forensic analysis of another PC, it’s often useful to view the contents of .PST files, thereby viewing sent, received, and deleted emails.

OutlookMailViewer (download it!) allows you to open a .PST file (without requiring Outlook to be installed) and examine its contents in an intuitive way, very similarly to the way Outlook itself displays your email. This tool is entirely read-only, meaning that you can be sure that the .PST file won’t be modified in any way.

This software is very much experimental/alpha, and needs a bit more work to be as powerful as possible, but it can still be quite useful as it is:

  • Supports .PST files from nearly all previous versions of Outlook, as well as the latest Outlook 2016 (both ANSI and Unicode .PST files).
  • Displays plain-text, HTML, and RTF versions of email messages.
  • Displays absolutely all properties associated with each email message (more properties than Outlook itself shows).
  • Allows saving of attachments from messages.

Some to-do items for a future version:

  • Scan the .PST file for orphan messages (i.e. messages that still exist after being emptied from Deleted Items, but before the database is compacted).
  • Filtering and searching of messages.
  • Exporting messages in different formats.

Try it out! If you’re using the current Outlook 2016, you can usually find your .PST file in [My Documents]\Outlook Files\[your account].pst.