Why Do We Save Disk Images


Nathan Tallman

Mar 5, 2014, 11:03:18 AM
to digital-...@googlegroups.com, Digital Preservation Outreach and Education Program
[Apologies for cross-posting]

Recently, when a colleague asked me why we save disk images of legacy media, as opposed to just copying the file structure, I was hard pressed to provide a definitive answer. After some gesticulations about a bit-level copy of the original media and file headers, I couldn't come up with much.

Can someone please elucidate more articulately on this topic? I'd like to provide a more cogent argument than, "because we're supposed to!"

Many Thanks,
Nathan

Trevor Owens

Mar 5, 2014, 11:08:56 AM
to digital-...@googlegroups.com, Digital Preservation Outreach and Education Program
For the same reason that archivists don't just photocopy things that come in and save the photocopies. The bit-level information on the disk is the actual digital artifact you want to preserve. When possible, it is in your best interest to get as close as you can to acquiring the thing itself instead of a surrogate of the thing. For more on this, you might enjoy these two blog posts I wrote a while back: http://blogs.loc.gov/digitalpreservation/2012/10/the-is-of-the-digital-object-and-the-is-of-the-artifact/ and http://blogs.loc.gov/digitalpreservation/2013/06/respect-des-bits-archival-theory-encounters-digital-objects-media/



Seth Shaw

Mar 5, 2014, 11:31:59 AM
to digital-...@googlegroups.com, Digital Preservation Outreach and Education Program
There are a few practical things at play:
1) Due to variances in the metadata & naming constraints supported by various file systems (and operating systems) direct copying usually leads to alteration/loss of metadata. This concern may be countered by harvesting the filesystem metadata (e.g. run fiwalk on the disk itself) and separately storing that metadata outside the filesystem, resulting in individual files + file metadata, but a disk image provides a single original representation. (Not to mention "hidden" content.)
2) Disk images are more efficient at preserving associated representation information (if the software is also included on the disk) resulting in greater ease of emulation (generally speaking).
3) Disk images (as containers) are less likely to experience accidental loss of association and, in my experience, are usually easier and more efficient to manage as a unit. E.g. accidentally deleting or moving a file from a folder after a direct copy v. modifying a disk image to remove a file (subject, of course, to the various tools and controls put in place). Or validating checksums for a single image v. each file. (Okay, so maybe not a huge difference if managed correctly.)
4) Fragile media may only have a single read left in them and a disk image is the most effective way to ensure you get everything you may need for further analysis if necessary.
n...) probably more that I can't think of off the top of my head.
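
The checksum comparison in point 3 can be sketched in Python. This is only an illustration under stated assumptions: the function names `sha256_of_file` and `manifest_for_tree` are hypothetical, SHA-256 is an arbitrary choice of algorithm, and nothing here reflects a particular repository's tooling.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so a multi-gigabyte image never sits in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest_for_tree(root):
    """Return {relative_path: sha256} for every regular file under root."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of_file(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }
```

Fixity for a disk image is then a single value from `sha256_of_file`, while a direct copy needs the whole manifest, and a renamed or missing file shows up only if the manifest's paths are tracked as well as its digests.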

Be sure to read Kirschenbaum, Matthew, Richard Ovenden, and Gabriela Redwine. Digital Forensics and Born-Digital Content in Cultural Heritage Collections. Washington, D.C.: Council on Library and Information Resources, 2010 for more information on this topic.

Jackson, Andrew

Mar 5, 2014, 11:53:00 AM
to digital-...@googlegroups.com, Digital Preservation Outreach and Education Program

I would add that low level copying also copes better in two important cases:

1) You don’t know how to interpret the filesystem.
2) You think you know how to interpret the filesystem, or that the filesystem metadata is unimportant, but you’re wrong.

If you can make a bit-level copy, you can be reasonably sure that you won’t have to go back to the original media and try to copy it again if you made a mistake or if you aren’t able to determine/handle the partition/volume formats. It allows you to decouple the process of ‘making a safe copy of the data’ from ‘interpreting the data correctly’, so that the latter can be done at your leisure and doesn’t become a workflow bottleneck.
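
A minimal sketch of that decoupling, in Python: the copy step below reads raw bytes and never parses a partition table or filesystem, so interpretation can happen later against the image. The function name `copy_raw` and the block size are assumptions for illustration, and the sketch has no handling for read errors or bad sectors, which a real imaging workflow with a dedicated tool and a write blocker would need.

```python
import hashlib


def copy_raw(source_path, image_path, block_size=512 * 1024):
    """Copy source byte-for-byte into image_path; return (bytes_copied, sha256)."""
    digest = hashlib.sha256()
    copied = 0
    # source_path may be a regular file or, on many systems, a device node;
    # either way it is treated as an opaque stream of bytes.
    with open(source_path, "rb") as source, open(image_path, "wb") as image:
        while True:
            block = source.read(block_size)
            if not block:
                break
            image.write(block)
            digest.update(block)  # hash while copying, so the media is read once
            copied += len(block)
    return copied, digest.hexdigest()
```

Recording the returned digest at capture time gives a reference value for checking the image later, without going back to the original media.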

 

Best,

Andy


 

Tom Creighton

Mar 5, 2014, 12:02:41 PM
to digital-...@googlegroups.com
As a counterpoint to some of the points made, maintaining only the disk image means that file format deprecation might be harder to deal with. Obviously, maintaining a disk image over time requires maintaining the software that is capable of reading the image. That is exactly the same issue faced by all emulation approaches. One can't assume that the software that originally rendered the file content contained in the image is necessarily part of the image. Not only that, but even if it is, one can't assume that maintaining the disk image in and of itself ensures viability of that software. In other words, software that enables reading of a particular disk image does not necessarily support emulation of programs that are stored on said image.

My point here is not to derail the discussion.  All the points made are valid.  But there are more issues to consider.  One way to think of it is that capturing a disk image is necessary but not sufficient with respect to long term preservation of the artifacts contained within that image.

Chris Prom

Mar 5, 2014, 1:12:40 PM
to digital-...@googlegroups.com
I've enjoyed reading this thread, but my perspective here is a little bit different if the word "Save" from Nathan's original question is meant to mean "save permanently."

Let me preface this by saying that creating a disk image is an essential part of the workflow for capturing, appraising, and processing records, for the reasons indicated. At Illinois, we capture a disk image for all of these reasons, whenever possible, and those can be articulated to donors. So, I appreciate the value of disk imaging (in spite of some skepticism that I had a few years ago).

However, I believe there are legitimate cases where a repository may decide to discard the disk image after completing the archival 'business process' of capture, appraise, arrange, describe, store, leading to the generation of the 'archival information package'. For example:

  • Case one: You have a disk image for a 3 TB hard drive that includes 20 GB of files, with the remainder marked as deleted space. Why store 3 TB of nothing if your storage infrastructure cannot handle it?
  • Case two: A disk image contains many files that are marked deleted but recoverable using forensics tools, and the donor did not agree to preserve the deleted files. Yes, it's nice to think about saving deleted files, but does your donor agree with this? If not, you may be setting yourself up for a huge breach of trust.
  • Case three: Related to Tom's point: a disk image where the files have been processed and migrated to a preservation format and you don't care about emulating or going back to the originals.
  • Case four: A disk image which contains numerous files that, based on consultation with the donor and standard archival appraisal, have no continuing value.
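
For case one, 20 GB of files on a 3 TB drive is under 1 percent of the capacity. How much of the rest is genuinely empty can be estimated before deciding what to keep. The Python sketch below is only an illustration: the function name `zero_block_fraction` and the 4096-byte block size are assumptions, and it expects a raw (uncompressed) image file. It counts blocks that are entirely zero bytes; space that is merely marked deleted usually still holds old data and will not count as empty.

```python
def zero_block_fraction(image_path, block_size=4096):
    """Return the fraction of fixed-size blocks in the image that are all zero bytes."""
    total = 0
    zero = 0
    with open(image_path, "rb") as image:
        while True:
            block = image.read(block_size)
            if not block:
                break
            total += 1
            # A block is "empty" only if every byte in it is zero.
            if block.count(0) == len(block):
                zero += 1
    return zero / total if total else 0.0
```

A high fraction suggests the image would compress well or could be stored sparsely; a low fraction on a mostly "empty" drive suggests recoverable deleted content, which bears on case two.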

My point here is not to bash disk imaging, just to say that repositories should be very intentional in considering the role disk imaging will play in the overall digital curation program, by balancing resources, technology, and institutional capacity in a way that makes most sense for the records, donor, and user community.

Thanks,

Chris Prom
University of Illinois at Urbana-Champaign


L Snider

Mar 5, 2014, 3:23:33 PM
to digital-...@googlegroups.com
One word: authenticity. It helps prove what we did and how we did it (if we also properly document all steps, IMO).

Cheers

Lisa

-- 
Lisa Snider
Electronic Records Archivist
Harry Ransom Center
The University of Texas at Austin
P.O. Box 7219
Austin, Texas 78713-7219
P: 512-232-4616
www.hrc.utexas.edu

Nathan Tallman

Mar 6, 2014, 9:20:36 AM
to digital-...@googlegroups.com, Digital Preservation Outreach and Education Program
Thank you, everyone, for the on and off list replies. They are very helpful and more articulate than I could have been.

Nathan


Porter Olsen

Mar 6, 2014, 10:16:09 AM
to digital-...@googlegroups.com
I just want to second what Chris says here. Obviously as part of the BitCurator team I feel disk imaging is an important tool in long-term preservation of born-digital content. But as Chris points out, there may be cases where keeping the disk image doesn't make sense. I will say, however, that working with disk images as you make the evaluations Chris outlines is a much safer practice than working with the original media. So even if you don't plan on preserving the disk image itself, you may find disk imaging an important part of your workflow.

Porter