It’s that semiannual time when I revise my E-Discovery Workbook in advance of the Georgetown Law Center eDiscovery Training Academy. That means forgoing sunny Spring days in The Big Easy to pore over 500 pages of content and exercises to make them as durable and endurable as I can. More and more, I find I’m adding historical perspectives. It’s a fair criticism that, with so much to cover, I should restrict my focus to contemporary technologies and leave the trips down memory lane to my dotage.

I can’t help myself. Though we’ve come far and fast, the information technologies of my youth are lurking just beneath the slick surfaces of the latest big thing. The punch card storage and tabulation technologies Herman Hollerith (1860-1929) used to revolutionize the 1890 U.S. census are just a hair’s breadth behind the IBM card technologies that dominated data processing for much of the 20th century and cousin to the oily, yellow perforated paper tape that Bill Gates and I used on opposite coasts to learn to program mainframe computers via a teletype terminal in the 1970s. The encoding schemes of those obsolete media differ from those we use today principally in speed and scale. The binary fundamentals are still…fundamental, and connect our toil in e-discovery and computer forensics to the likes of Charles Babbage, Alan Turing, Ada Lovelace, John von Neumann, Robert Noyce and both Steves (Wozniak and Jobs).

In the space of one generation, we have come very far indeed.

The IBM punched cards that dominated digital storage for most of the twentieth century held 80 columns of 12 punch positions, or 960 bits. Nominally, that’s 120 bytes, but because eight columns weren’t always used for data storage, the capacity was closer to 864 bits or 108 bytes. In practice it was less still: each column was typically dedicated to a single character, so the practical capacity of a punch card was 80 characters/80 bytes or less.[1]
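
A few lines of Python make the card arithmetic concrete (a back-of-the-envelope sketch; the eight reserved columns and the one-character-per-column convention are the assumptions described above):

```python
# Punch card capacity, per the figures above
COLUMNS, ROWS = 80, 12           # 80 columns x 12 punch positions
nominal_bits = COLUMNS * ROWS    # 960 bits
print(nominal_bits / 8)          # 120.0 bytes, nominally

data_columns = COLUMNS - 8       # 8 columns often reserved for other uses
print(data_columns * ROWS / 8)   # 108.0 bytes

# In practice, one character per column:
print(COLUMNS)                   # 80 characters, i.e., roughly 80 bytes
```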

Using the 108-byte value, the formatted “1.44 MB” 3.5-inch floppy disks commonly used from the mid-1980s to the early 2000s held 1,474,560 bytes, so a floppy disk could store the same amount of data as about 13,653 IBM cards, i.e., seven 2,000-card boxes or, at 143 cards to the inch, an eight-foot stack. That’s a common ceiling height, and taller than anyone who ever played in the NBA.
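
Here’s the floppy-to-card conversion worked out (a sketch using the 108-byte card and the 143-cards-per-inch stacking density cited above):

```python
FLOPPY_BYTES = 1_474_560     # formatted 3.5-inch "1.44 MB" floppy
BYTES_PER_CARD = 108
CARDS_PER_INCH = 143
CARDS_PER_BOX = 2_000

cards = FLOPPY_BYTES / BYTES_PER_CARD
print(round(cards))                   # ~13,653 cards
print(cards / CARDS_PER_BOX)          # ~6.8 boxes, so seven boxes
print(cards / CARDS_PER_INCH / 12)    # ~8.0 feet of cards
```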

The prim Eisenhower-era programmer in the photo below is steadying 62,500 punched cards said to hold the five megabytes of program instructions for the massive SAGE (Semi-Automatic Ground Environment) military computing network (80 bytes per card).

Five megabytes happened to be about a megabyte larger than the capacity of the first (and hence, the largest) commercial hard drive of the era. Introduced in 1956, the IBM 350 Disk Storage Unit pictured (child not included) was 60 inches long, 68 inches high and 29 inches deep (so it could fit through a door). Called the RAMAC (for Random Access Method of Accounting and Control), it held fifty 24-inch magnetic disks storing 50,000 sectors in all, each sector holding 100 alphanumeric (6-bit) characters. Thus, it held about 3.75 megabytes, the size of one or two cellphone snapshots today. It weighed a ton (literally), and users paid $3,200.00 per month to rent it. That’s about $30,000.00 now.
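
Both capacity claims check out with simple arithmetic (a sketch; the six-bit character width is what reconciles the RAMAC’s five million characters with 3.75 megabytes):

```python
# SAGE: 62,500 cards at 80 bytes apiece
print(62_500 * 80)        # 5,000,000 bytes = 5 MB

# RAMAC / IBM 350: 50,000 sectors of 100 six-bit characters
chars = 50_000 * 100      # 5,000,000 characters
print(chars * 6 / 8)      # 3,750,000 bytes = 3.75 MB
```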

Fast forward to today’s capacious hard drives: a fifty-dollar terabyte drive holds 1,099,511,627,776 bytes. That’s over ten billion IBM cards (10,180,663,220, to be precise). Now, our stack of cards stretches 1,123 miles, roughly the driving distance between Washington, D.C. and New Orleans.

So, the 30 TB (compressed) capacity of an LTO-8 backup tape cartridge works out to something like 305 billion IBM cards: a stack 33,709 miles tall that would handily circle the globe at the Equator.
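
The same cards-per-byte conversion covers both the terabyte drive and the tape cartridge (a back-of-the-envelope sketch; the byte counts use binary powers of two, matching the figures above):

```python
BYTES_PER_CARD = 108
CARDS_PER_INCH = 143

def card_stack_miles(n_bytes: int) -> tuple[int, float]:
    """Return (card count, stack height in miles) for a given byte count."""
    cards = n_bytes // BYTES_PER_CARD
    miles = cards / CARDS_PER_INCH / 12 / 5_280
    return cards, miles

print(card_stack_miles(2**40))       # 1 TB: ~10.18 billion cards, ~1,123 miles
print(card_stack_miles(30 * 2**40))  # LTO-8: ~305 billion cards, ~33,709 miles
```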

These are hypothetical extrapolations, not real-world metrics, because much storage capacity is lost to file system overhead. If you used a warehouse for physical storage, you’d sacrifice space for shelving and aisles, and you’d likely find that not everything you store fits perfectly wall-to-wall and floor-to-ceiling. Similarly, digital storage sacrifices capacity to file tables and wastes space by using fixed cluster sizes. If a file is smaller than the clusters allocated to its storage, the bytes between the end of the file and the end of the last cluster are wasted “slack space.”
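
To make slack space concrete, here’s a sketch assuming a 4 KiB cluster size (a common NTFS default; the 10,000-byte file is a made-up example):

```python
import math

CLUSTER = 4_096   # assumed 4 KiB cluster size (common NTFS default)

def slack_bytes(file_size: int, cluster: int = CLUSTER) -> int:
    """Bytes wasted between end-of-file and the end of its last cluster."""
    allocated = math.ceil(file_size / cluster) * cluster
    return allocated - file_size

print(slack_bytes(10_000))  # 10,000-byte file occupies 12,288 bytes: 2,288 bytes of slack
```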

[1] After hours researching the capacity question, I couldn’t arrive at a definitive answer because capacity varied according, inter alia, to the type of information being stored (binary versus ASCII) and to operators’ reluctance to punch out too many adjacent perforations, lest the card become a “lace card” too fragile to use.