The idea of movage echoes the kind of architecture we are using for the
Oxford archive: every layer/column of the storage platform is very loosely
coupled to the other parts, as we plan for parts to be superseded and for
content to migrate naturally and as needed - note 'needed' rather than
'planned' migration. It's our guess that things need to be kept 'in motion' -
checked, characterised, always ready to be moved.
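To make the 'checked' part concrete, here is a minimal sketch of a fixity audit: record a checksum per file, then re-check later. The function names (sha256_of, record_fixity, audit) are made up for illustration and are not what the Oxford archive actually runs.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_fixity(root: Path) -> dict[str, str]:
    """Map each file under root (by relative path) to its current checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def audit(root: Path, recorded: dict[str, str]) -> list[str]:
    """List files that are missing or whose checksum no longer matches."""
    problems = []
    for name, expected in recorded.items():
        path = root / name
        if not path.is_file():
            problems.append(f"missing: {name}")
        elif sha256_of(path) != expected:
            problems.append(f"changed: {name}")
    return problems
```

Running audit() on a schedule, and again after every copy or move, is one way to keep content 'in motion' without losing track of whether it is still intact.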
We are aiming to preserve access to materials by storing a canonical
version, and by maintaining a dissemination version (or more as required).
The canonical version is as similar to the original as possible - archiving
the explicit knowledge (TIFF images of pages, PDFs, LaTeX, audio files)
alongside the implicit (file format characterisation, who/what/where an
image was taken, text/data-mined metadata, etc).
The dissemination version(s) are user-driven - they can be as simple as a
verbatim document viewer, a "page turning app" or whatever, but can be more
useful - OCR'd text from a scanned page, rendered to the user with key
sentences and keywords drawn in a larger font, or direct links to eJournals
inserted on top of where citations are referenced in a journal article. The
SPIDER project (http://imageweb.zoo.ox.ac.uk/wiki/index.php/Spider_Project)
shows what can be achieved by hand, and IMO there is good scope for
automated improvements to paginated content.
As this is still early days, I am not sure whether or not to keep
dissemination copies. If they are referenced, are we obligated to maintain
them? Or should we take the OS versioning strategy and keep the last known
good versions of the numbered releases - only keeping the older
disseminations if a substantial change was made for a new one? [A point
release (0.3 -> 0.4) is an indication that something was fixed or an
addition to the viewing capabilities was made, but a full increment (1.2 ->
2.0) indicates that the new dissemination is substantially different from
the last.]
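One reading of that retention rule, as a rough sketch: keep only the newest point release within each major version. It assumes simple "major.minor" version strings, and the names parse and versions_to_keep are made up for illustration.

```python
def parse(version: str) -> tuple[int, int]:
    """Split a 'major.minor' string into two integers."""
    major, minor = version.split(".")
    return int(major), int(minor)


def versions_to_keep(versions: list[str]) -> list[str]:
    """Keep only the newest point release within each major version."""
    latest = {}
    for v in versions:
        major, minor = parse(v)
        if major not in latest or minor > parse(latest[major])[1]:
            latest[major] = v
    return [latest[m] for m in sorted(latest)]


print(versions_to_keep(["0.3", "0.4", "1.0", "1.2", "2.0"]))
# ['0.4', '1.2', '2.0']
```

Here 0.3 and 1.0 are dropped because a later point release supersedes them, while 0.4 and 1.2 survive because the next release was a full increment.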
In some cases, the dissemination copies can be made on the fly and cached
(TIFF/JPEG 2000 -> lower-res JPEGs/PNGs), as the ratio of total imagery
stored to the amount that will ever be used is very top-heavy :)
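A minimal sketch of the make-on-the-fly-and-cache idea, assuming the actual conversion is supplied as a function (render here is a stand-in for whatever image library does the TIFF -> JPEG work; get_dissemination and its parameters are hypothetical names):

```python
from pathlib import Path
from typing import Callable


def get_dissemination(
    source: Path,
    cache_dir: Path,
    suffix: str,
    render: Callable[[bytes], bytes],
) -> Path:
    """Return a cached derivative of source, rendering it on first request."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached = cache_dir / (source.stem + suffix)
    # Re-render if there is no cached copy or the canonical file is newer.
    if not cached.exists() or cached.stat().st_mtime < source.stat().st_mtime:
        cached.write_bytes(render(source.read_bytes()))
    return cached
```

Nothing is derived until somebody asks for it, and the cache directory can be thrown away at any time because the canonical file is the only thing that has to be preserved.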
In other cases, where the main use is for serendipitous reuse, such as text-
and data-mining, the benefit comes from allowing users immediate access to
this body of derived information, rather than generating it on demand.
Apologies for the inarticulate post, but hey, it's xmas, and I really just
wanted to add that preservation is an extremely active process, exactly as
was stated in the previous email - your system has to be ready to change at
the drop of a hat.
Oh and one last thing - PREMIS and standardising METS profiles for
interchange of digital items are all well and good, but the elephant in the
room is the legal and financial side - legal depts will disallow transfers
based on fear of infringement and accountants will want to charge someone
somehow. So, my yuletide message is: standardise for yourselves, write down
all the implicit information and try not to tie too much information into
the packaging. My acid test: just pass the files for an item in a zip
archive to a colleague/peer, someone with little idea of what you are
dealing with or the standards involved. If they can work out what all the
bits are with the help of Google, then you've been successful.
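As a rough sketch of what passing that acid test might look like, here is a zip with a plain-text README written alongside the files, so the implicit information travels with them rather than living in the packaging. The function name package_item and the README layout are made up for illustration.

```python
import zipfile
from pathlib import Path


def package_item(files: dict[Path, str], archive: Path) -> None:
    """Zip the files with a plain-text README saying what each one is."""
    lines = ["Contents of this item", ""]
    for path, description in files.items():
        lines.append(f"{path.name} ({path.stat().st_size} bytes): {description}")
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as bundle:
        bundle.writestr("README.txt", "\n".join(lines) + "\n")
        for path in files:
            bundle.write(path, arcname=path.name)
```

The descriptions are whatever a human would say about the file ("scan of page 1, 600dpi TIFF"), which is exactly the sort of thing a colleague cannot recover from the bytes alone.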
Merry Xmas all!
Ben O'Steen
2008/12/23 Ed Summers <ed.summ...@gmail.com>:
> We recently had a barcamp style event [1] here at the Library of
> Congress where Ryan McKinley [2] (Solr developer/hacker
> extraordinaire) showed up. Some of us working on digital preservation/
> curation/repository stuff at LC got talking with him over beers, not
> about search, but about what it is our group is trying to do at LC.
> Somewhere in the conversation he mentioned that a friend of his had
> written recently about something he called 'movage' [3].
> """
> The only way to archive digital information is to keep it moving. I
> call this movage instead of storage. Proper movage means transferring
> the material to current platforms on a regular basis -- that is,
> before the old platform completely dies, and it becomes hard to do.
> This movic rythym of refreshing content should be as smooth as a
> respiratory cycle -- in, out, in, out. Copy, move, copy, move.
> """
> It seems to be a counter-intuitive idea, that moving data could lead
> to better preservation. The common wisdom I've heard is that the more
> we touch stuff, the more likely we are to corrupt it, and thwart
> preservation. So, as a consequence we should try not to touch stuff.
> In some ways I guess Kevin's term 'movage' conflates moving with
> format migration...so maybe these ideas aren't really juxtaposed?
> The really fascinating thing about this piece are actually the
> comments. One that stood out was Jim Thomas' analogy between 'movage'
> and preservation of life forms in gene-banks:
> """
> Heres a thought: Your distinction of storage and movage echoes a live
> debate in the world of genetic conservation. Institutional attempts to
> conserve genetic diversity (eg by the gene-banks of the CGIAR) often
> revolve around so called ex-situ conservation - basically storage of
> seeds in large freezers. Farmers have argued that the only way to
> conserve seeds is to use and replicate them in the field every year
> (in situ conservation) - movage if you like.
> Sure enough a worrying proportion of seeds in gene banks lose their
> viability and won't plant out after a few years. One reason may be
> because the environment itself is changing (just as the computer/
> software environment is changing for digital media) but also because
> of physical degredation (just like those CD's).
> """
> However it seems more than 'movage' here Jim is talking about actual
> 'usage'. I'm kind of new to the digital preservation/curation arena,
> and I was wondering if anyone has written about the connection between
> digital preservation and usage before. By usage I don't mean format
> migration, but actual people using the bits that are attempting to be
> preserved.
> //Ed
> [1] http://barcamp.org/SearchCampDC
> [2] http://www.squid-labs.com/people/ryan.html
> [3] http://www.kk.org/thetechnium/archives/2008/12/movage.php