How many boxes of documents would fit into a terabyte hard drive?

L Snider

Nov 12, 2013, 12:59:15 PM
to digital-...@googlegroups.com
Hi All,

I know this sounds like an SAT question, but I got thinking yesterday (always dangerous!)...

If a donor gave me a hard drive with a terabyte of data and all the files were average Word-like docs, what would that be in paper equivalent? I am trying to figure out how many boxes/folders that would be, and how many linear feet.

Same question, but what if all those files were JPG photos? This latter question is tough because I know it depends on compression, size, etc.

This may sound like a weird question, but I am trying to show non-digital archivists the quantity of material I work with...and I want to know myself as well.

Cheers

Lisa

Lisa Snider
Electronic Records Archivist
Harry Ransom Center
The University of Texas at Austin
P.O. Box 7219
Austin, Texas 78713-7219
P: 512-232-4616
www.hrc.utexas.edu


adam brin

Nov 12, 2013, 1:06:44 PM
to digital-...@googlegroups.com
Hi Lisa,
I think the answer is likely "it depends on who the donor is." The average file size for Word documents would be different for someone who's constantly writing books vs. someone who's a graphic designer. It reminds me a bit of this question ( http://what-if.xkcd.com/63/ ). This, if accurate, is also interesting: http://www.ask.com/question/how-many-pages-of-text-is-a-terabyte (though some of this would depend on other factors too).


--
You received this message because you are subscribed to the Google Groups "Digital Curation" group.
To unsubscribe from this group and stop receiving emails from it, send an email to digital-curati...@googlegroups.com.
To post to this group, send email to digital-...@googlegroups.com.
Visit this group at http://groups.google.com/group/digital-curation.
For more options, visit https://groups.google.com/groups/opt_out.

Simon Spero

Nov 12, 2013, 1:40:17 PM
to digital-...@googlegroups.com

You can fit a lot of hard drives in a single Hollinger box. Compression is a bad idea even if the lid won't close.

Word documents are tricky to estimate, as they may contain different versions. 

If we assume 500 8-character words per page, we have 4000 B/page.

Assuming 800 GB of text, that's 200 million page sides, or 100 million double-sided sheets. At 1600 sheets per linear foot, that gives 62,500 linear feet, a bit under 100 linear furlongs.

Simon
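
Simon's back-of-the-envelope estimate above can be reproduced with a short script. This is only a sketch: the function name `paper_equivalent` and its parameter names are made up for illustration, and the default figures (500 words per page, 8 bytes per word, 1600 sheets per linear foot, double-sided printing, 800 GB of actual text) are the assumptions stated in his message, not measured values. The only outside fact used is that a furlong is 660 feet.

```python
FEET_PER_FURLONG = 660  # standard definition of a furlong


def paper_equivalent(text_bytes, words_per_page=500, bytes_per_word=8,
                     sheets_per_linear_foot=1600, double_sided=True):
    """Estimate the paper volume of plain text (hypothetical helper).

    All defaults are the rough assumptions from the thread; swap in
    your own figures for a particular donor or collection.
    """
    bytes_per_page = words_per_page * bytes_per_word   # 4000 B with the defaults
    page_sides = text_bytes / bytes_per_page           # printed sides of paper
    sheets = page_sides / 2 if double_sided else page_sides
    linear_feet = sheets / sheets_per_linear_foot
    return {
        "page_sides": page_sides,
        "sheets": sheets,
        "linear_feet": linear_feet,
        "furlongs": linear_feet / FEET_PER_FURLONG,
    }


# 800 GB of text on a 1 TB drive, as assumed in the message above
estimate = paper_equivalent(800 * 10**9)
for name, value in estimate.items():
    print(f"{name}: {value:,.1f}")
```

Changing `text_bytes` or the per-page assumptions shows how sensitive the answer is to the kind of donor, which is the point Adam raises earlier in the thread.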

danielle plumer

Nov 12, 2013, 1:46:16 PM
to digital-...@googlegroups.com
Adam,

Funny that you thought of the What If question ("If all digital data were stored on punch cards, how big would Google's data warehouse be?"). I immediately thought of that, too!

I also thought about the posts by Library of Congress staff over the years on the digital equivalents of the Library of Congress. I think this post, by Leslie Johnston, is the most recent: http://blogs.loc.gov/digitalpreservation/2012/03/how-many-libraries-of-congress-does-it-take/

The Wikipedia article on "Unusual units of measure" (http://en.wikipedia.org/wiki/List_of_unusual_units_of_measurement#Data_volume) referenced by Leslie defines a "Library of Congress" as:

The term Library of Congress is often used as an unusual unit of measurement to represent an impressively large quantity of data when discussing digital storage or networking technologies. It refers to the US Library of Congress. Information researchers have estimated that the entire print collections of the Library of Congress represent roughly 10 terabytes of uncompressed textual data.

So you could impress your colleagues by saying that a one terabyte hard drive holds as much information as one-tenth of the Library of Congress! Of course, that would be an incorrect statement (as Leslie and Wikipedia both go on to say, it totally ignores non-textual collections), but it is impressive to non-techies.

Danielle Cunniff Plumer
dcplumer associates

Johnston, Leslie

Nov 12, 2013, 3:20:15 PM
to digital-...@googlegroups.com

It is one of the joys in my life to keep track of the "holds a Library of Congress worth of stuff" references.  Also explaining just what the Library of Congress collections are, what they include, and how quickly the digital (and print!) collections grow.  A lot of press inquiries get sent my way. 

 

Maybe I should start preparing another blog post. If anyone has references or stories, send them my way.  I collect them  year round from any and all sources.

 

Leslie

------------

Leslie Johnston

Chief of Repository Development

Library of Congress

les...@loc.gov

Simon Spero

Nov 12, 2013, 5:24:34 PM
to digital-...@googlegroups.com

Maybe it's time to move from "a Library of Congress" to "a Librarian of Congress" as a unit of measure. The obvious choice would be the number of bits that can be stored on a stack of MAM Gold DVDs whose weight, when measured on the ground floor of the Jefferson Building, is equal to that of the Librarian of Congress.
This amount of information would equal one metric Billingtonne.

I need to make some calls to Gaithersburg.  

L Snider

Nov 13, 2013, 8:44:24 AM
to digital-...@googlegroups.com
Hi Everyone,

Thanks for the great replies! I like the LOC reference (Simon, you gave me a good laugh); I must use it!

Cheers

Lisa



Johnston, Leslie

Nov 13, 2013, 11:36:18 AM
to digital-...@googlegroups.com

This made me laugh out loud for a long time.  I wonder how I can find out what he weighs…

 

llj

 


Levy, Michael

Nov 13, 2013, 4:40:29 PM
to digital-...@googlegroups.com
This is answering a different question, but according to Russell Seitz, as reported by Robert Krulwich, the Internet weighs approximately two ounces.