Subject: [digital-curation] Digest for digital-...@googlegroups.com - 7 Messages in 1 Topic
Date: March 6, 2014 at 2:23:09 AM PST
To: Digest Recipients <digital-...@googlegroups.com>
Reply-To: digital-...@googlegroups.com
Group: http://groups.google.com/group/digital-curation/topics
- Why Do We Save Disk Images [7 Updates]
Nathan Tallman <ntal...@gmail.com> Mar 05 11:03AM -0500
[Apologies for cross-posting]
Recently, when a colleague asked me why we save disk images of legacy
media, as opposed to just copying the file structure, I was hard pressed to
provide a definitive answer. After some gesticulations about a bit-level
copy of the original media and file headers, I couldn't come up with much.
Can someone please elucidate more articulately on this topic? I'd like to
provide a more cogent argument than, "because we're supposed to!"
Many Thanks,
Nathan
Trevor Owens <trevor.j...@gmail.com> Mar 05 11:08AM -0500
For the same reason that archivists don't just photocopy things that come
in and save the photo copies. The bit level information on the disk is the
actual digital artifact you want to preserve. When possible, it is in your
best interest to get as close to acquiring the thing itself instead of a
surrogate of the thing. For some more on this you might enjoy these two
blog posts I wrote a while back
http://blogs.loc.gov/digitalpreservation/2012/10/the-is-of-the-digital-object-and-the-is-of-the-artifact/ &
http://blogs.loc.gov/digitalpreservation/2013/06/respect-des-bits-archival-theory-encounters-digital-objects-media/
Seth Shaw <seth....@gmail.com> Mar 05 11:31AM -0500
There are a few practical things at play:
1) Due to variances in the metadata & naming constraints supported by
various file systems (and operating systems) direct copying usually leads
to alteration/loss of metadata. This concern may be countered by harvesting
the filesystem metadata (e.g. run fiwalk on the disk itself) and separately
storing that metadata outside the filesystem, resulting in individual files
+ file metadata, but a disk image provides a single original
representation. (Not to mention "hidden" content.)
2) Disk images are more efficient at preserving associated representation
information (if the software is also included on the disk) resulting in
greater ease of emulation (generally speaking).
3) Disk images (as containers) are less likely to experience accidental loss
of association and, in my experience, are usually easier and more efficient to
manage as a unit. E.g. accidentally deleting or moving a file from a folder
after a direct copy v. modifying a disk image to remove a file (subject, of
course, to the various tools and controls put in place). Or validating
checksums for a single image v. each file. (Okay, so maybe not a huge
difference if managed correctly.)
4) Fragile media may only have a single read left in them and a disk image
is the most effective way to ensure you get everything you may need for
further analysis if necessary.
n...) probably more that I can't think of off the top of my head.
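A minimal Python sketch of point 1, using a throwaway file instead of real legacy media (the file names and the 2001 timestamp are made up for illustration): a plain `shutil.copy` gives the copy a fresh modification time, while `shutil.copy2` tries to carry timestamps over. Even `copy2` cannot reproduce everything; on Linux, for example, `st_ctime` is set by the kernel when the copy is written, and neither function sees deleted files or other "hidden" content that only a sector-level image captures.

```python
import os
import shutil
import tempfile

# Hypothetical "original" modification time (2001-09-09 01:46:40 UTC),
# standing in for a timestamp recorded on legacy media.
ORIGINAL_MTIME = 1_000_000_000

def copied_mtimes(workdir: str) -> dict:
    """Copy one file two ways and report the modification time each copy ends up with."""
    src = os.path.join(workdir, "letter.txt")
    with open(src, "w") as fh:
        fh.write("Dear donor...")
    # Backdate the source file as if it had come off an old floppy.
    os.utime(src, (ORIGINAL_MTIME, ORIGINAL_MTIME))

    plain = os.path.join(workdir, "copy_plain.txt")
    meta = os.path.join(workdir, "copy_meta.txt")
    shutil.copy(src, plain)   # copies content and permission bits only
    shutil.copy2(src, meta)   # also tries to copy timestamps and flags

    return {
        "source": os.stat(src).st_mtime,
        "shutil.copy": os.stat(plain).st_mtime,
        "shutil.copy2": os.stat(meta).st_mtime,
    }

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for method, mtime in copied_mtimes(d).items():
            print(f"{method:13} {mtime}")
```

The point is not that `copy2` is "good enough", but that which metadata survives depends on the copy method and the target filesystem, whereas an image sidesteps the question.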
Be sure to read Kirschenbaum, Matthew, Richard Ovenden, and Gabriela
Redwine. *Digital Forensics and Born-Digital Content in Cultural Heritage
Collections*. Washington, D.C.: Council on Library and Information
Resources, 2010 for more information on this topic.
"Jackson, Andrew" <Andrew....@bl.uk> Mar 05 04:53PM
I would add that low level copying also copes better in two important cases:
1) You don’t know how to interpret the filesystem.
2) You think you know how to interpret the filesystem, or that the filesystem metadata is unimportant, but you’re wrong.
If you can make a bit-level copy, you can be reasonably sure that you won’t have to go back to the original media and try to copy it again if you made a mistake or if you aren’t able to determine/handle the partition/volume formats. It allows you to decouple the process of ‘making a safe copy of the data’ from ‘interpreting the data correctly’, so that the latter can be done at your leisure and doesn’t become a workflow bottleneck.
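To illustrate the split between "making a safe copy" and "interpreting the data", here is a rough Python sketch, not a substitute for a real imaging tool. `image_device` copies and hashes every byte without understanding any of them; `mbr_partitions` is a separate, later step that reads the classic MBR partition table from the image. The function names and the fake disk are hypothetical, and real tools also handle read errors, write blocking, and far larger media. Note that the 0x55AA signature at offset 510 does not by itself prove a partition table exists: a FAT volume boot sector, common on floppies, ends with the same two bytes. That is exactly the kind of misreading a kept image lets you correct later.

```python
import hashlib
import os
import struct
import tempfile

def image_device(source_path: str, image_path: str, block_size: int = 64 * 1024) -> str:
    """Step 1: copy every byte of the source into an image file and return its SHA-256.

    No filesystem knowledge is used; the bytes are copied as-is.
    """
    digest = hashlib.sha256()
    with open(source_path, "rb") as src, open(image_path, "wb") as dst:
        while block := src.read(block_size):
            dst.write(block)
            digest.update(block)
    return digest.hexdigest()

def mbr_partitions(image_path: str) -> list:
    """Step 2 (any time later): list the non-empty entries of a classic MBR table."""
    with open(image_path, "rb") as fh:
        sector0 = fh.read(512)
    # 0x55 0xAA at offsets 510-511 is the boot signature; a FAT boot sector
    # carries it too, so its presence alone does not prove an MBR.
    if len(sector0) < 512 or sector0[510:512] != b"\x55\xaa":
        return []
    parts = []
    for i in range(4):  # four 16-byte entries starting at offset 446
        entry = sector0[446 + 16 * i : 446 + 16 * (i + 1)]
        ptype = entry[4]  # partition type byte; 0 means the slot is unused
        start_lba, sectors = struct.unpack("<II", entry[8:16])  # little-endian
        if ptype != 0:
            parts.append({"type": ptype, "start_lba": start_lba, "sectors": sectors})
    return parts

def demo() -> dict:
    """Build a tiny fake 'disk' with one partition entry, image it, then parse the image."""
    sector0 = bytearray(512)
    sector0[446 + 4] = 0x0C  # type 0x0C: FAT32 (LBA)
    struct.pack_into("<II", sector0, 446 + 8, 2048, 100)  # start LBA, sector count
    sector0[510:512] = b"\x55\xaa"
    disk = bytes(sector0) + bytes(4096)
    with tempfile.TemporaryDirectory() as d:
        src, img = os.path.join(d, "fake_disk.bin"), os.path.join(d, "fake_disk.img")
        with open(src, "wb") as fh:
            fh.write(disk)
        sha = image_device(src, img)
        with open(img, "rb") as fh:
            identical = fh.read() == disk
        return {
            "identical": identical,
            "sha256_ok": sha == hashlib.sha256(disk).hexdigest(),
            "partitions": mbr_partitions(img),
        }

if __name__ == "__main__":
    print(demo())
```

Because step 2 only reads the image, it can be rerun, corrected, or replaced with a better parser without going back to the original media.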
Best,
Andy
--
You received this message because you are subscribed to the Google Groups "Digital Curation" group.
To unsubscribe from this group and stop receiving emails from it, send an email to digital-curati...@googlegroups.com<mailto:digital-curati...@googlegroups.com>.
To post to this group, send email to digital-...@googlegroups.com<mailto:digital-...@googlegroups.com>.
Visit this group at http://groups.google.com/group/digital-curation.
For more options, visit https://groups.google.com/groups/opt_out.
Tom Creighton <nt.cre...@gmail.com> Mar 05 10:02AM -0700
As a counterpoint to some of the points made, maintaining only the disk image means
that file format deprecation might be harder to deal with. Obviously,
maintaining a disk image over time requires maintaining the software that
is capable of reading the image. That is exactly the same issue faced by
all emulation approaches. One can't assume that the software that
originally rendered the file content contained in the image is necessarily
part of the image. Not only that, but even if it is, one can't assume that
maintaining the disk image in and of itself ensures viability of that
software. In other words, software that enables reading of a particular
disk image does not necessarily support emulation of programs that are
stored on said image.
My point here is not to derail the discussion. All the points made are
valid. But there are more issues to consider. One way to think of it is
that capturing a disk image is necessary but not sufficient with respect to
long term preservation of the artifacts contained within that image.
Chris Prom <chris...@gmail.com> Mar 05 12:12PM -0600
I've enjoyed reading this thread, but my perspective here is a little bit different if the word "Save" from Nathan's original question is meant to mean "save permanently."
Let me preface this by saying that creating a disk image is an essential part of the workflow for capturing, appraising, and processing records, for the reasons indicated. At Illinois, we capture a disk image for all of these reasons whenever possible, and those reasons can be articulated to donors. So, I appreciate the value of disk imaging (in spite of some skepticism that I had a few years ago).
However, I believe there are legitimate cases where a repository may decide to discard the disk image after completing the archival 'business process' of capture, appraisal, arrangement, description, and storage that leads to the generation of the 'archival information package'. For example:
Case one: You have a disk image of a 3 TB hard drive that holds 20 GB of files, with the remainder marked as deleted space. Why store 3 TB of nothing if your storage infrastructure cannot handle it?
Case two: A disk image contains many files that are marked as deleted but are recoverable using forensic tools, and the donor did not agree to preserve the deleted files. Yes, it's nice to think about saving deleted files, but does your donor agree with this? If not, you may be setting yourself up for a huge breach of trust.
Case three: Related to Tom's point--A disk image where the files have been processed and migrated to a preservation format and you don't care about emulating or going back to the originals.
Case four: A disk image that contains numerous files which, based on consultation with the donor and standard archival appraisal, have no continuing value.
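For Case one, the arithmetic is stark: 20 GB is about 0.67% of 3 TB (decimal units). A quick way to see how much of an image is truly empty is to count all-zero blocks; the Python sketch below does that, with a hypothetical 4096-byte block size. A caveat that ties back to Case two: space marked as deleted is often not zero-filled and may still hold remnants of old files, so a low zero fraction on a mostly "empty" drive is itself a signal worth noting before deciding what to keep.

```python
import os
import tempfile

def zero_block_fraction(image_path: str, block_size: int = 4096) -> float:
    """Return the fraction of blocks in a raw image that contain only zero bytes.

    A high fraction hints that a sparse or compressed copy could be much
    smaller; a low fraction means the "empty" space still holds data.
    """
    zero = bytes(block_size)
    total = zeros = 0
    with open(image_path, "rb") as fh:
        while block := fh.read(block_size):
            total += 1
            # Slice the reference so a short final block is compared fairly.
            if block == zero[: len(block)]:
                zeros += 1
    return zeros / total if total else 0.0

if __name__ == "__main__":
    # Hypothetical stand-in for a large image: three empty blocks, one with data.
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "example.img")
        with open(path, "wb") as fh:
            fh.write(bytes(3 * 4096) + b"\x01" * 4096)
        print(f"{zero_block_fraction(path):.0%} of blocks are all zero")
```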
My point here is not to bash disk imaging, just to say that repositories should be very intentional in considering the role disk imaging will play in the overall digital curation program, by balancing resources, technology, and institutional capacity in a way that makes most sense for the records, donor, and user community.
Thanks,
Chris Prom
University of Illinois at Urbana-Champaign
chris...@gmail.com
L Snider <lsn...@gmail.com> Mar 05 02:23PM -0600
One word: authenticity. It helps prove what we did and how we did it (if
we also properly document all steps, IMO).
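Authenticity claims of this kind rest on the documentation actually existing. One minimal way to record each step, sketched in Python under the assumption of a simple append-only log of JSON lines: the field names and file names are hypothetical, only loosely inspired by the idea of preservation event records such as PREMIS events, and are not any standard's schema.

```python
import hashlib
import json
import os
import tempfile
from datetime import datetime, timezone

def sha256_of(path: str, block_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large disk images need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(block_size), b""):
            h.update(chunk)
    return h.hexdigest()

def log_event(log_path: str, event: str, target: str, agent: str, note: str = "") -> dict:
    """Append one JSON line recording what was done, to what, by whom, and when."""
    record = {
        "when": datetime.now(timezone.utc).isoformat(),
        "event": event,               # e.g. "disk image created", "fixity check"
        "target": target,
        "sha256": sha256_of(target),
        "agent": agent,               # person and/or software that did the work
        "note": note,
    }
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record

def unchanged_since_first_event(log_path: str, target: str) -> bool:
    """True if the target's current SHA-256 matches the first value logged for it."""
    with open(log_path, encoding="utf-8") as fh:
        for line in fh:
            rec = json.loads(line)
            if rec["target"] == target:
                return rec["sha256"] == sha256_of(target)
    return False  # nothing logged for this file, so it cannot be verified

def demo() -> tuple:
    with tempfile.TemporaryDirectory() as d:
        img = os.path.join(d, "floppy_001.img")  # hypothetical file name
        log = os.path.join(d, "events.jsonl")
        with open(img, "wb") as fh:
            fh.write(bytes(1024))
        log_event(log, "disk image created", img, agent="archivist + imaging tool")
        before = unchanged_since_first_event(log, img)
        with open(img, "ab") as fh:
            fh.write(b"accidental edit")
        after = unchanged_since_first_event(log, img)
        return before, after
```

`demo()` returns `(True, False)`: the logged checksum confirms the untouched image, then exposes the later change.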
Cheers
Lisa
--
Lisa Snider
Electronic Records Archivist
Harry Ransom Center
The University of Texas at Austin
P.O. Box 7219
Austin, Texas 78713-7219
P: 512-232-4616
www.hrc.utexas.edu