I wrote up a post about versioning bags, riffing pretty liberally on
CDL's ReDD spec. Some of the major changes to that spec:
1. reverse-deltas/ is a directory at the same level as the data/
directory in a bag
2. Delta directories are named with a timestamp rather than a version number
3. Each reverse delta acts like a bag, also (so changes are checksummed)
4. Namaste removed
The post is here:
http://davidbrunton.org/2010/03/versioning-bags_18.html
I'm curious, in particular, whether anyone sees this as a perversion of
the BagIt spec. It's a lot of other stuff to stick in that top-level bag
directory, but doing so also gains a great deal of simplicity, by my
reckoning.
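A rough sketch of the layout described in the numbered list above, for anyone who wants to poke at it. This is not the code from the post: the function names, the timestamp format, and the choice to store the previous copy of each changed file as the delta's payload are all assumptions made for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def write_manifest(bag_dir: Path) -> None:
    """Write a BagIt-style manifest-md5.txt covering only files under data/."""
    lines = []
    for path in sorted((bag_dir / "data").rglob("*")):
        if path.is_file():
            digest = hashlib.md5(path.read_bytes()).hexdigest()
            lines.append(f"{digest}  {path.relative_to(bag_dir).as_posix()}")
    (bag_dir / "manifest-md5.txt").write_text("\n".join(lines) + "\n")


def make_bag(bag_dir: Path, files: dict) -> None:
    """Create a minimal bag: bagit.txt, a data/ payload, and a manifest."""
    (bag_dir / "data").mkdir(parents=True, exist_ok=True)
    for name, content in files.items():
        target = bag_dir / "data" / name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content)
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 0.96\nTag-File-Character-Encoding: UTF-8\n"
    )
    write_manifest(bag_dir)


def add_reverse_delta(bag_dir: Path, timestamp: str, old_files: dict) -> Path:
    """Store the previous content of changed files under
    reverse-deltas/<timestamp>/, which is itself laid out as a small bag.

    Assumption: a delta holds full old copies of the changed files; added
    and deleted files are not handled in this sketch.
    """
    delta_dir = bag_dir / "reverse-deltas" / timestamp
    make_bag(delta_dir, old_files)
    return delta_dir


# Example: the current payload is "version 2"; the delta keeps "version 1".
root = Path(tempfile.mkdtemp()) / "mybag"
make_bag(root, {"report.txt": "version 2\n"})
delta = add_reverse_delta(root, "2010-03-18T120000Z", {"report.txt": "version 1\n"})
```

The top-level manifest only covers data/, so a tool that knows nothing about reverse-deltas/ still sees the same payload it always did.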
-db.
I think it's a cool idea, and would have very real uses. I especially
like the nice backwards compatibility with existing bags and tools.
Brian
bag_v01
bag_v02
But this is very wasteful of storage when we're just making a small
change to a many-gigabyte or terabyte bag.
I'd be interested to know if there's anything in the BagIt spec as it
stands that would prevent such an approach.
//Ed
I think the only thing we have right now is the implicit restriction on
directories other than data/ in the top level of the bag. We're planning on making that an
explicit restriction with the next 0.96 release, just for clarification,
but I am thinking we should change it for 0.97. There are too many cool
use cases we're inhibiting with that restriction.
Brian
- http://tonluong.com/projects/digital-packaging-format
BagIt has a lot of good points. My thought was to extend it by loosely coupling it with other great ideas like JSON, distributed versioning (Git/Mercurial), parity files, and CouchDB (a document-based database that never overwrites committed data), reaping the benefits of this software while keeping everything simple and easy to integrate.
At the same time, each piece is optional; each one provides added value without adding complexity.
The base format (would be similar to the BagIt spec with some changes):
========================================================
- manifest/metadata files (in JSON)
Distributed Versioning (Git/Mercurial) (Optional - add if versioning is a requirement)
=================================================================
- wide array of transport options (SSH, HTTP/HTTPS, local file system, rsync)
- all the benefits of versioning (checksums, revisions, etc)
Parity Files (PAR) (Optional - add if data recovery is a requirement)
===================================================
- provide different data recovery levels
CouchDB (Optional)
================
- maps one-to-one with JSON manifest/metadata
- http://couchdb.apache.org/docs/overview.html
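To make the "manifest/metadata files (in JSON)" item in the base format a bit more concrete, here is a minimal sketch of what generating such a manifest might look like. The file name `manifest.json`, the field names (`path`, `size`, `sha256`), and the overall structure are hypothetical; nothing here is defined by BagIt or by the linked project page.

```python
import hashlib
import json
from pathlib import Path


def build_json_manifest(package_dir: Path) -> dict:
    """Return a dict listing each file under data/ with its size and SHA-256.

    Files are read fully into memory, which keeps the sketch short but is
    not suitable for very large payloads.
    """
    entries = []
    for path in sorted((package_dir / "data").rglob("*")):
        if path.is_file():
            content = path.read_bytes()
            entries.append({
                "path": path.relative_to(package_dir).as_posix(),
                "size": len(content),
                "sha256": hashlib.sha256(content).hexdigest(),
            })
    return {"files": entries}


def write_json_manifest(package_dir: Path, filename: str = "manifest.json") -> dict:
    """Serialize the manifest next to data/ and return it."""
    manifest = build_json_manifest(package_dir)
    (package_dir / filename).write_text(
        json.dumps(manifest, indent=2, sort_keys=True)
    )
    return manifest
```

Because the manifest is an ordinary JSON document, the same dict could in principle be stored as a CouchDB document, which is the one-to-one mapping mentioned above.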
This feels like it will definitely break the BagIt spec, so I am thinking of a different packaging spec that at its core resembles BagIt and, by design, provides additional functionality for various use cases; personally, I have a need for versioning and data recovery. I would like to hear your thoughts before moving in that direction.
Best,
Ton
> --
> You received this message because you are subscribed to the Google Groups "Digital Curation" group.
> To post to this group, send email to digital-...@googlegroups.com.
> To unsubscribe from this group, send email to digital-curati...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/digital-curation?hl=en.
>
+1
I agree there are too many cool use cases to restrict the bag to only
including data/. If the bag spec were changed to allow directories
other than data/, it might be a good idea to add something normative
that says "when you put stuff in directories other than data/, that
stuff is not part of the bag, and bag tools are only required to grab
stuff they find in the manifest."
What I'm proposing is that yes, we would allow other directories, but
no, they are not "part of the bag" (other than being conveniently
located in the same parent directory).
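As a sketch of what "only required to grab stuff they find in the manifest" could mean for a tool, the check below walks the manifest entries and never looks at anything else in the bag directory. The function names are made up for illustration, and the parsing assumes simple "checksum, whitespace, path" manifest lines.

```python
import hashlib
from pathlib import Path


def read_manifest(bag_dir: Path, algorithm: str = "md5") -> dict:
    """Parse manifest-<algorithm>.txt into {relative_path: checksum}."""
    entries = {}
    manifest = bag_dir / f"manifest-{algorithm}.txt"
    for line in manifest.read_text().splitlines():
        if line.strip():
            checksum, rel_path = line.split(None, 1)
            entries[rel_path.strip()] = checksum.lower()
    return entries


def verify_listed_files(bag_dir: Path, algorithm: str = "md5") -> list:
    """Return manifest paths that are missing or fail their checksum.

    Only manifest entries are inspected; other top-level directories
    (for example reverse-deltas/) are ignored entirely.
    """
    problems = []
    for rel_path, expected in read_manifest(bag_dir, algorithm).items():
        target = bag_dir / rel_path
        if not target.is_file():
            problems.append(rel_path)
            continue
        actual = hashlib.new(algorithm, target.read_bytes()).hexdigest()
        if actual != expected:
            problems.append(rel_path)
    return problems
```

An empty list means every file the manifest names is present and intact, regardless of what else sits alongside data/.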
Does that make sense to anyone else? My general idea there is that
other specs could take advantage of the same structure (e.g. enclose
the information that is of curatorial interest in data/, but use the
surrounding directory for other important stuff) as BagIt without
needing to encumber the BagIt spec.
-db.
Yes, that makes sense to me. I think it would be nice if a Bag were
comprised of only the files and directories that the BagIt spec
enumerates. Additional files and directories would not be considered part
of the Bag. We should bear in mind, though, that additional files and
directories could still invalidate the Bag, as happens when extra files
are dropped into the data directory.
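The data directory case can be illustrated with a small check. This is only a sketch with a hypothetical function name, and it assumes simple "checksum, whitespace, path" manifest lines: it reports payload files that exist on disk but are not listed in the manifest.

```python
from pathlib import Path


def unlisted_payload_files(bag_dir: Path, algorithm: str = "md5") -> list:
    """Return files under data/ that the manifest does not mention."""
    manifest = bag_dir / f"manifest-{algorithm}.txt"
    listed = set()
    for line in manifest.read_text().splitlines():
        if line.strip():
            # Each line is "<checksum> <path>"; keep only the path.
            listed.add(line.split(None, 1)[1].strip())
    on_disk = {
        path.relative_to(bag_dir).as_posix()
        for path in (bag_dir / "data").rglob("*")
        if path.is_file()
    }
    return sorted(on_disk - listed)
```

Files outside data/ never show up in the result, which matches the idea that extra top-level directories sit beside the Bag while stray files inside data/ change the Bag itself.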
Generally I'm curious to know if anyone thinks we're straying by
considering the Bag as a unit for digital preservation. Some of us at
LC have this mental shorthand of thinking of Bags as not only useful
for helping bits travel in space (e.g. CDL -> LC) but also in time
(03/23/2010 -> 03/23/2020). So the ability to layer administrative
stuff into the bag, without making it an invalid bag, becomes
important.
I was also wondering if the California Digital Library folks have
moved away from talking about Bags and towards talking about D-flats
[1] for a similar reason. It seems like much of the functionality of
Bags has been subsumed by D-flat and Checkm [3] in that each version
directory in a d-flat is effectively a Bag.
It seems like we here at LC are arriving at our notion of a D-flat,
but don't want it to interfere with our notion of a Bag.
//Ed
[1] http://www.cdlib.org/services/uc3/curation/storage.html
[2] https://confluence.ucop.edu/display/Curation/D-flat
[3] https://confluence.ucop.edu/display/Curation/Checkm