We've created a disk image, now what?


Carol Kussmann

Aug 6, 2014, 3:26:15 PM
to bitcurat...@googlegroups.com
Hello,

We are exploring the idea of creating disk images as part of our archival workflow.  We have made disk images, run reports, and played with the other programs available via the BitCurator environment.

I know that there are many different ways to work with disk images (the reasons you do what you do with them) and was hoping you could share your experiences with our group. 

Specific questions I have include:
- What do you do with the disk images after you create them?  Do you save them in dark storage?  Make them accessible (to who)?  Discard them?  And why?

- What do you do with the reports BitCurator creates?  What do you find most useful?

- In the end, what do you make available to users?  Individual files/folders? The disk image?  How?


I know these questions are not simple ones; we are just looking for some guidance.  Anything helps.

Best,
Carol




--


Carol Kussmann
Digital Preservation Analyst
Digital Preservation and Repository Technologies | University of Minnesota Libraries
499 Wilson Library, 309 19th Avenue South, Minneapolis, MN 55455

Matthew Kirschenbaum

Aug 7, 2014, 9:18:53 AM
to bitcurat...@googlegroups.com
Hi Carol,

I'm going to weigh in here, but wearing my researcher/patron/user cap
rather than my BitCurator badge.

From a patron or researcher's standpoint, I believe you will see two
types predominate:

First and most often, folks who just want you to give them the good
stuff. In other words, they want the content of the disk. They'll want
to know what files are on it and they'll want to be able to see copies
of them. If it's a Word doc, they'll want the text. If there are
images, they'll want to look at them. Item-level metadata and access
paths to the files or content independently of the disk image will be
key here.

Second, and a smaller group, will be folks (like me) who would want to
inspect the actual disk image as a complete digital artifact.
Certainly they may be interested in "content," but they'll also want
to do things like look for pieces of files outside of the file system
or other material traces of the disk's history. They may want to see
content recreated in its original environment, which would involve
mounting the disk image in a compatible operating system or accessing
it via emulation. These are the folks who come into the rare book room
to look at bindings and signatures and gatherings and imperfections in
the type. Again, a smaller subset of users, but I believe we will
start to see them too. Here the big question would be whether you want
to give a researcher unrestricted access to a disk image. At present
BC does not allow you to redact content within the image itself, so
it's an all or nothing proposition.

Hope this helps--Best, Matt
> --
> You received this message because you are subscribed to the Google Groups
> "BitCurator Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to bitcurator-use...@googlegroups.com.
> To post to this group, send email to bitcurat...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/bitcurator-users/CAALj18g8tvz6afEB%3DGD5Mv6VvZVk5ZwwqOGxnNcjRGFu3hVK1Q%40mail.gmail.com.
> For more options, visit https://groups.google.com/d/optout.



--
Matthew Kirschenbaum
Associate Professor of English
Associate Director, Maryland Institute for Technology in the Humanities (MITH)
University of Maryland
301-405-8505 or 301-314-7111 (fax)
http://mkirschenbaum.net and @mkirschenbaum on Twitter

Track Changes tumblr: http://trackchangesbook.tumblr.com/

Julie

Aug 7, 2014, 10:41:03 AM
to bitcurat...@googlegroups.com
Carol,

As you probably expect, the answer is: "it depends".  But I'll try to give you some generic answers that point out options.  I am going to assume that you are talking about both disk images from manuscript collections (presumably from a person outside your institution, such as an author) and from electronic records capture inside your institution, meaning they are institutional records (such as if you imaged the retiring university president's hard drive.) 

- What do you do with the disk images after you create them?  Do you save them in dark storage?  Make them accessible (to who)?  Discard them?  And why?

For manuscripts, this will depend on the terms of your donor agreement.  If a donor intended to donate only a selection of files, and not the disk image itself, then you can't keep the disk image.  For institutional records, it depends on policy.

Typically, institutions are storing disk images in dark storage, and only providing access to select files.  That is largely because a tech-savvy user can find content on a disk image that the donor and/or university official never intended anyone to see, such as deleted files.  (Another note: your donor agreement and institutional records policy should cover whether or not you are permitted to recover and/or provide access to deleted files.)  On the other hand, a donor might explicitly **want** people to be able to see the whole hard drive as it was.  I believe that this is what Emory University is doing by emulating Salman Rushdie's older laptops so that end users can see the computer as he used it when he wrote a particular novel.  (I am not sure if they are doing anything to redact deleted files from the image.  If you're not familiar with the Rushdie files, there is more info here: http://www.emory.edu/EMORY_MAGAZINE/2010/winter/authors.html)

In general, if you can afford it and you have permission, it is a good idea to keep the disk images.  As Matt mentioned, those who study technology and its use would appreciate being able to see the files in their natural state - this is very much in line with the archival imperative to maintain context by maintaining provenance and original order.  The disk is the original order of the files.  The disk is also a good safety net.  If something ever happens to the individual files you've extracted, you can return to the disk image (make a copy first!  never touch the original!) and extract the files again.  In the future, there may be even better tools for doing this, which is another reason to hold onto the image if you can.  
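The "make a copy first" advice can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`digest_of`, `make_working_copy`) and the choice of SHA-256 are hypothetical, not something BitCurator prescribes. The idea is to copy the image into a separate working folder and confirm the copy is bit-identical before anyone opens it.

```python
import hashlib
import shutil
from pathlib import Path


def digest_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_working_copy(original: Path, work_dir: Path) -> Path:
    """Copy a disk image into work_dir and verify the copy matches.

    The original is only ever opened for reading. If the digests differ,
    the bad copy is deleted and an error is raised.
    """
    work_dir.mkdir(parents=True, exist_ok=True)
    copy_path = work_dir / original.name
    shutil.copy2(original, copy_path)  # copy2 also preserves timestamps
    if digest_of(copy_path) != digest_of(original):
        copy_path.unlink()
        raise IOError(f"Working copy of {original} does not match the original")
    return copy_path
```

Extraction or mounting would then be pointed at the returned path, never at the stored original.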

- What do you do with the reports BitCurator creates?  What do you find most useful?

I don't know if you are familiar with Chris Prom's Do-It-Yourself Repository: http://e-records.chrisprom.com/recommendations/implement-a-trustworthy-digital-repository/diy-tdr/.  He talks about how you can follow the OAIS model without any fancy preservation software layer.  One part of preservation could be to take the BitCurator reports and put them in a folder with other metadata.  As an example, you could have a folder named "MSS2014_046" (or whatever your naming convention is).  Then, inside that folder, you would have a folder called "data", where you would park the disk image and any extracted files.  (And in the best scenario, the extracted files would be in both their original format (like .docx) and a long-term preservation format (like PDF/A).)  There would be another folder inside "MSS2014_046" called "metadata".  In there, you would store the BitCurator reports and anything else that is data about the contents of the 'data' folder.  (See Chris Prom's post for more info.  That post is a gem.)  Even if you do use a preservation software layer, like Archivematica, you can add the BitCurator reports to the metadata folder that is created by Archivematica.
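For anyone who wants to script that layout, here is a minimal Python sketch. The folder names "data" and "metadata" and the example accession ID come from the description above; the function name `build_package` and its parameters are hypothetical, and the sketch does not claim to reproduce Chris Prom's or Archivematica's exact structure.

```python
import shutil
from pathlib import Path


def build_package(root: Path, accession_id: str, data_files, metadata_files) -> Path:
    """Create <root>/<accession_id>/data and /metadata, then copy files in.

    data_files: the disk image and any extracted files.
    metadata_files: BitCurator reports and other files describing the data.
    Sources are copied, not moved, so the inputs are left untouched.
    """
    package = root / accession_id
    data_dir = package / "data"
    metadata_dir = package / "metadata"
    data_dir.mkdir(parents=True, exist_ok=True)
    metadata_dir.mkdir(parents=True, exist_ok=True)

    for src in data_files:
        shutil.copy2(src, data_dir / Path(src).name)
    for src in metadata_files:
        shutil.copy2(src, metadata_dir / Path(src).name)
    return package
```

A call such as `build_package(Path("repository"), "MSS2014_046", [image_path], [report_path])` would produce the two-folder package described above. Note that files are copied by name only, so two sources with the same filename would overwrite each other.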

The searches for personally identifiable information are really useful, because they can help you assess what sort of access you may provide.  The checksums are of course useful for fixity. I plan on keeping all reports, even if they are not immediately useful to me.   This is one of those scenarios where present decisions will influence future possibilities, and I'd rather keep the reports than try to figure out that information again sometime in the future.
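As a rough illustration of how checksums support fixity, the Python sketch below records a SHA-256 checksum for every file in a "data" folder and later reports any file that has changed or gone missing. It is an assumption-laden example: the function names, the manifest filename, and the "digest, two spaces, path" line format are all hypothetical, and this is separate from (not a replacement for) the checksums in BitCurator's own reports.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(data_dir: Path, manifest_path: Path) -> dict:
    """Record a checksum for every file under data_dir.

    Keep manifest_path outside data_dir (for example in the metadata
    folder) so the manifest does not end up listing itself.
    """
    checksums = {}
    for path in sorted(data_dir.rglob("*")):
        if path.is_file():
            checksums[path.relative_to(data_dir).as_posix()] = sha256_of(path)
    lines = [f"{digest}  {name}\n" for name, digest in checksums.items()]
    manifest_path.write_text("".join(lines), encoding="utf-8")
    return checksums


def verify_manifest(data_dir: Path, manifest_path: Path) -> list:
    """Return the names of files that are missing or no longer match."""
    failures = []
    for line in manifest_path.read_text(encoding="utf-8").splitlines():
        digest, name = line.split("  ", 1)
        target = data_dir / name
        if not target.is_file() or sha256_of(target) != digest:
            failures.append(name)
    return failures
```

Running `verify_manifest` on a schedule and getting back an empty list is the signal that nothing in the folder has silently changed.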

Then, you would explain in a finding aid what is in this 'series'.  (For an example of a finding aid that includes both paper and digital records, see section IX of this one: http://oasis.lib.harvard.edu/oasis/deliver/findingAidDisplay?_collection=oasis&inoid=1454&histno=0.)  Also, I like the wording in the "Processing Information" and "Access Restrictions" sections, which give researchers information about the digital content and how it is accessed.

- In the end, what do you make available to users?  Individual files/folders? The disk image?  How?

What you make available will again depend on agreements and policies.  For individual files/folders, you can just put a copy in a folder on a computer in your reading room.  Depending on restrictions and policies on copying, you may need to disable all the USB and other ports, and disable the disc-writer and so on, so no one can create digital copies of the content without explicit authorization and help from your staff.  For viewing the entire disk image, you may need to use some specialized tools, but this is something you can do on a computer in your reading room.  If you've created an .E01 image in BitCurator, for instance, you could use FTK Imager Lite (the free version) to look at the contents of the disk image.  (Look under FTK Imager, and choose the Lite version: http://www.accessdata.com/support/product-downloads)  Alternatively, you could use something like Mount Image Pro: http://www.mountimage.com/.  This allows the E01 file to be mounted like a hard drive, so that the researcher can navigate through it as though it were a real drive on the computer.  (Always do this with a copy of the disk image, not the original.)  To avoid having to buy 500 different software programs to read files, you may want to also get Avantstar's QuickView Plus: http://www.avantstar.com/metro/home/Products/QuickViewPlusStandardEdition

So, I would conclude by saying that you'll need to decide whether you want to provide access to some files or the entire disk image.  And then you'll have to make sure that your donor agreements have the appropriate wording, and that you follow what the donor has allowed you to do.  For institutional records, such as the president's hard drive, you'll need to consider institutional policy and records transfer agreements.  In those cases, presumably you have access restrictions on records anyway, and you'll need to store the disk image for a long time before most researchers can access it.  Presumably it may be accessed on occasion by approved people, like members of your Board, if there are reasons of business continuity and/or legal issues that need to be addressed.  But in general, you may be parking the image and derivative files for 50 or more years before they can be used by researchers. In those cases, I think it's a really good idea to hold on to the disk image.  At some point in the future, you may decide that you don't need to keep it, because we might find, 20 years from now, that there is some really good preservation system that makes the disk image unnecessary, especially if you never intend to provide access to the raw image.  But if you decide now not to keep the image, you limit what you can do in the future.  If you decide now to keep the image, your future possibilities will be much wider.

I hope that helps.  Please feel free to contact me on- or off-list if you have more questions. 

Julie C. Swierczek

L Snider

Aug 7, 2014, 2:10:05 PM
to bitcurat...@googlegroups.com
I also wanted to add something about redaction. I don't have answers, since redaction has no real best practices or how-tos in an archival sense, yet. However, I just did a lot of in-depth research about it, so I thought I would share...

I was looking at how we can deal with disk images that need redacting (specifically email acquisitions). If we redact, how do we note this to researchers who may care about the disk image? Do we then have two 'originals' and how do we note that in a finding aid? Do we note it? Will anyone care? Do we restrict the disk image? Then the real headache was, how do we redact?

This was the only archival redaction article that I could find in my extensive research (doesn't mean something else isn't out there, but I didn't find it):
https://techknowhow.library.emory.edu/blogs/branker/2009/06/03/redaction-software-recommendation-marbl-digital-archiving

There is one archival-based redaction script out there (there are non-archival ones, but they tend to be $5000-$10,000), called iredact.py. However, it is very buggy and I am looking for something else to use. If you are curious, it is here:
https://github.com/simsong/dfxml/tree/master/python.

Cheers

Lisa

-- 
Lisa Snider
Electronic Records Archivist
Harry Ransom Center
The University of Texas at Austin
P.O. Box 7219
Austin, Texas 78713-7219
P: 512-232-4616
www.hrc.utexas.edu





L Snider

Aug 7, 2014, 2:11:56 PM
to bitcurat...@googlegroups.com
Oh and QuickView Plus is an excellent program, but it doesn't read disk images and doesn't do audio or moving image. It also messes up most website files.

Cheers

Lisa