Jerome,
> I don't know if this is a huge problem for us; grabbing the content
> and preserving it is our first concern. But if at some point 10 years
> from now someone asks why the file in our Bag doesn't match the file
> sitting at that URI on the web, or when exactly things changed, we
> won't really have any information to hand them. Archive.org's URIs
> contain date stamps within themselves, but the URIs I'll be using
> don't, for the most part.
I see this as a limitation directly imposed by the originally transient
nature of the fetch.txt. Section 4.1 is pretty clear that the fetch.txt
has two purposes:
1) To assist in the transfer of an unwieldy bag; and
2) To permit the transfer of a bag where the pieces are located in
different locations, such as often occurs on a distributed file system.
In both cases, it's a transfer-time artifact. I feel like the confusion
results from trying to apply new semantics (long-term identifiers) to
the already-established semantics of the fetch.txt. As you point out,
how do you know when you've switched meanings - especially in ten years?
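To make the transfer-time point concrete: under the BagIt spec, each fetch.txt line carries only three fields. These are URL, LENGTH (in octets, or "-" if unknown) and FILENAME. Nothing in the line records when a URL actually served a given file. Below is a minimal Python sketch of reading those lines. The type and function names are hypothetical and not taken from any BagIt library.

```python
from typing import List, NamedTuple, Optional

class FetchEntry(NamedTuple):
    url: str
    length: Optional[int]  # None when LENGTH is "-" (unknown)
    filename: str

def parse_fetch_line(line: str) -> FetchEntry:
    # Split into at most three fields so a filename containing spaces survives.
    url, length, filename = line.rstrip("\r\n").split(None, 2)
    return FetchEntry(url, None if length == "-" else int(length), filename)

def parse_fetch_txt(text: str) -> List[FetchEntry]:
    # Skip blank lines; every other line must have all three fields.
    return [parse_fetch_line(line) for line in text.splitlines() if line.strip()]

# Note what is missing: no field says when the URL was known to serve this file.
```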
Perhaps a better approach is to leverage some as-yet nonexistent bag
extension mechanism to store the URIs? This would be the second
scenario I've heard so far that would potentially benefit from such a
mechanism (forward error correction being the other).
Brian
"""
TargetFileOrURL is a secondary location for the content that
applications would use as necessary. For instance, a transfer tool
that also renames files could use this token as the destination name.
"""
It's not entirely clear to me whether checkm manifests replace bagit
manifests in the context of dflat. Perhaps someone from CDL on the
list has a better idea of that.
//Ed
[1] http://www.cdlib.org/inside/diglib/dflat/dflatspec.pdf
[2] http://www.cdlib.org/inside/diglib/checkm/checkmspec.html
On 09/16/2009 02:49 PM, Jerome wrote:
>
> I have since my first post thought of one little problem with our
> approach. Our fetch.txt file does, in its own way, constitute a set
> of assertions, that particular URIs represent resources that
> correspond to particular files contained within our Bag. While that
> is true at the time we construct the Bag, we all know that the content
> found at a particular URI can change, and there's no time/date stamp
> associated with my link between a URI and a file in our Bag to say
> when exactly that URI corresponded with this resource.
>
> I don't know if this is a huge problem for us; grabbing the content
> and preserving it is our first concern. But if at some point 10 years
> from now someone asks why the file in our Bag doesn't match the file
> sitting at that URI on the web, or when exactly things changed, we
> won't really have any information to hand them. Archive.org's URIs
> contain date stamps within themselves, but the URIs I'll be using
> don't, for the most part.
This describes how NC State uses BagIt. I don't normally archive
data in web-accessible locations, so when I want to transfer data I point
Apache at directories in my storage array and put those URIs in my fetch
file. Directly following notification that the receiving institution
has validated the data, I discontinue access to the objects.
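Here is a rough Python sketch of that workflow. It assumes, hypothetically, that Apache exposes a local payload directory at some base URL and that the files belong under the bag's data/ directory. The function names and example URL are made up for illustration. Each generated line follows the fetch.txt "URL LENGTH FILENAME" layout.

```python
import os
from typing import List
from urllib.parse import quote

def fetch_line(base_url: str, rel_path: str, size: int) -> str:
    # URL-encode the relative path (quote keeps "/") and append it to the base URL.
    url = base_url.rstrip("/") + "/" + quote(rel_path)
    # fetch.txt line: URL, length in octets, path inside the bag.
    return f"{url} {size} data/{rel_path}"

def build_fetch_lines(data_dir: str, base_url: str) -> List[str]:
    # Hypothetical: data_dir is the directory Apache serves at base_url.
    lines = []
    for root, dirs, files in os.walk(data_dir):
        dirs.sort()  # walk subdirectories in a stable order
        for name in sorted(files):
            path = os.path.join(root, name)
            # Relative path with forward slashes, as used in URLs and bag paths.
            rel = os.path.relpath(path, data_dir).replace(os.sep, "/")
            lines.append(fetch_line(base_url, rel, os.path.getsize(path)))
    return lines
```

Once the receiving institution confirms validation and access is withdrawn, those URLs stop resolving. The generated fetch.txt is therefore only meaningful during the transfer.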
So, you aren't using BagIt strictly for file transfer, is that right?
Jim