BagIt Profile Specification - request for comments/participation

Mark Jordan

Nov 15, 2012, 5:21:51 PM
to digital-curation
Hi,

At the Access 2012 Hackfest in early October, a group of us spent the day hashing out a simple specification for expressing machine-readable BagIt profiles. We've made the proposed spec available at https://github.com/ruebot/bagit-profiles for your reading pleasure.

We're hoping this draft spec will generate some discussion among BagIt implementers and that the community will help develop the spec to a state where we can start sharing profiles for Bags created by or ingested by common repository platforms and for other shared applications.

Thanks, looking forward to the discussion,

Mark

Mark Jordan


Mark A. Matienzo

Nov 15, 2012, 10:48:35 PM
to digital-...@googlegroups.com
Hi Mark - I'm really excited to see this, because I've been working
with Bagger profiles and wondering what the best way would be to extend them
into something that might be actionable outside of the Bagger
application. One of the profiles that we've developed that's currently
in use can be found at [0]. A couple of quick thoughts follow based on
what my needs are.

* Ideally I would like to see any profile implementation have some
(optional?) identifying information within it, e.g. a brief textual
description, a listing of the maintainer/maintainers, etc.

* Does the profile itself need to contain the URI that identifies it?
I can imagine cases where I may like to store a copy of the profile
locally which would then be used to create/validate Bags using that
profile.

* Would it be of value to add versioning information to the profile,
e.g. a version number, or information about if a given version profile
has been superseded, etc.?

* I'd be interested in seeing a mapping against the Bagger profile
implementation.

* Eventually I'd also like to see Bagger support this emerging
profile spec once it's set.

[0] https://github.com/yalemssa/bagger-profiles/blob/master/YUL_DISKIMG_ACCN_SIP_0.2-profile.json

Cheers,

Mark A. Matienzo <ma...@matienzo.org>
Digital Archivist, Manuscripts and Archives, Yale University Library
Technical Architect, ArchivesSpace

Mark Jordan

Nov 16, 2012, 12:19:26 AM
to digital-...@googlegroups.com
Mark,

----- Original Message -----
> Hi Mark - I'm really excited to see this, because I've been working
> with Bagger profiles and wondering what the best way would be to extend them
> into something that might be actionable outside of the Bagger
> application. One of the profiles that we've developed that's currently
> in use can be found at [0]. A couple of quick thoughts follow based on
> what my needs are.
>
> * Ideally I would like to see any profile implementation have some
> (optional?) identifying information within it, e.g. a brief textual
> description, a listing of the maintainer/maintainers, etc.

Great idea, something like this?

"bagit-profile-info": {
"description": "The best damn BagIt profile out there",
"maintainer": "Foo University Archives",
"version": "1.3",
"uri": "http://foo.edu/bagitprofiles/bar.json"
}

Actually, we may want to consider using the 'reserved metadata element names' identified in section 2.2.2 of the BagIt spec to describe the profile, as long as they are wrapped in a specific JSON field, like 'bagit-profile-info' above.

>
> * Does the profile itself need to contain the URI that identifies it?
> I can imagine cases where I may like to store a copy of the profile
> locally which would then be used to create/validate Bags using that
> profile.

We require a BagIt bag-info.txt tag 'Bag-Profile' containing the URI, but having the URI in the profile file itself makes a lot of sense. How about a new required field 'uri' (see example above).

>
> * Would it be of value to add versioning information to the profile,
> e.g. a version number, or information about if a given version profile
> has been superseded, etc.?

Yes - also see above, but please suggest more detail if you want.

>
> * I'd be interested in seeing a mapping against the Bagger profile
> implementation.
>
> * Eventually I'd also like to see Bagger support this emerging
> profile spec once it's set.

I'm not that familiar with Bagger, but if we are going to use Bag profiles, ideally we should make them work with as many tools as possible.

Thanks for the feedback, keep it coming! Also, thanks for the formatting improvements to the spec document, I've merged them in.

Mark

Mark A. Matienzo

Nov 16, 2012, 9:19:35 AM
to digital-...@googlegroups.com
On Fri, Nov 16, 2012 at 12:19 AM, Mark Jordan <mjo...@sfu.ca> wrote:
>> * Ideally I would like to see any profile implementation have some
>> (optional?) identifying information within it, e.g. a brief textual
>> description, a listing of the maintainer/maintainers, etc.
>
> Great idea, something like this?
>
> "bagit-profile-info": {
> "description": "The best damn BagIt profile out there",
> "maintainer": "Foo University Archives",
> "version": "1.3",
> "uri": "http://foo.edu/bagitprofiles/bar.json"
> }
>
> Actually, we may want to consider using the 'reserved metadata element names' identified in section 2.2.2 of the BagIt spec to describe the profile, as long as they are wrapped in a specific JSON field, like 'bagit-profile-info' above.

I like this idea, so perhaps something like the following:

"bagit-profile-info": {
"Source-Organization": "Yale University",
"Contact-Name": "Mark Matienzo",
"External-Description": "BagIt profile for packaging disk images",
"Version": "0.3"
}
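Since these names double as bag-info.txt tags, a tool can tell which keys in 'bagit-profile-info' are BagIt reserved element names and which are profile-specific. A minimal Python sketch, working on a cleaned-up copy of the snippet above (closing quote added to the key, trailing comma dropped, so it parses as JSON). The RESERVED_NAMES set is transcribed by hand from the section 2.2.2 list and should be checked against the BagIt draft you target; split_profile_info is a hypothetical helper.

```python
import json

# Reserved bag-info.txt element names, transcribed by hand from section 2.2.2
# of the BagIt draft (an assumption; verify against the version you target).
RESERVED_NAMES = {
    "Source-Organization", "Organization-Address", "Contact-Name",
    "Contact-Phone", "Contact-Email", "External-Description",
    "Bagging-Date", "External-Identifier", "Bag-Size", "Payload-Oxum",
    "Bag-Group-Identifier", "Bag-Count", "Internal-Sender-Identifier",
    "Internal-Sender-Description",
}

PROFILE = """
{
  "bagit-profile-info": {
    "Source-Organization": "Yale University",
    "Contact-Name": "Mark Matienzo",
    "External-Description": "BagIt profile for packaging disk images",
    "Version": "0.3"
  }
}
"""


def split_profile_info(profile_text):
    """Return (reserved, other) dicts built from the 'bagit-profile-info' object."""
    info = json.loads(profile_text)["bagit-profile-info"]
    reserved = {k: v for k, v in info.items() if k in RESERVED_NAMES}
    other = {k: v for k, v in info.items() if k not in RESERVED_NAMES}
    return reserved, other


reserved, other = split_profile_info(PROFILE)
print(sorted(reserved))  # ['Contact-Name', 'External-Description', 'Source-Organization']
print(other)             # {'Version': '0.3'}
```

Going by that list, 'Version' is not a reserved name, so it lands in the second dict.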

>> * Does the profile itself need to contain the URI that identifies it?
>> I can imagine cases where I may like to store a copy of the profile
>> locally which would then be used to create/validate Bags using that
>> profile.
>
> We require a BagIt bag-info.txt tag 'Bag-Profile' containing the URI, but having the URI in the profile file itself makes a lot of sense. How about a new required field 'uri' (see example above).

That sounds good, but if we're thinking about adapting existing
reserved tags from the BagIt spec, would "External-Identifier" work?

>>
>> * Would it be of value to add versioning information to the profile,
>> e.g. a version number, or information about if a given version profile
>> has been superseded, etc.?
>
> Yes - also see above, but please suggest more detail if you want.

I may need to ruminate on this and get back to you.

>> * I'd be interested in seeing a mapping against the Bagger profile
>> implementation.
>>
>> * Eventually I'd also like to see Bagger support this emerging
>> profile spec once it's set.
>
> I'm not that familiar with Bagger, but if we are going to use Bag profiles, ideally we should make them work with as many tools as possible.

Great, thanks!

Mark

Mark Jordan

Nov 16, 2012, 10:19:55 AM
to digital-...@googlegroups.com
Hi Mark,

I'm cutting some of the email text to reduce static; I hope the context isn't geopardized too much.

----- Original Message -----
>
> "bagit-profile-info": {
> "Source-Organization": "Yale University",
> "Contact-Name": "Mark Matienzo",
> "External-Description": "BagIt profile for packaging disk images",
> "Version": "0.3"
> }

Yes, this is good.

> >
> > We require a BagIt bag-info.txt tag 'Bag-Profile' containing the
> > URI, but having the URI in the profile file itself makes a lot of
> > sense. How about a new required field 'uri' (see example above).
>
> That sounds good, but if we're thinking about adapting existing
> reserved tags from the BagIt spec, would "External-Identifier" work?
>

I'd prefer to keep the two tags the same (rationale being that they should always have the same value), e.g., either 'Bag-Profile' in both the bag-info.txt file and the bagit-profile-info JSON field, or 'External-Identifier' in both places; however, 'External-Identifier' in bag-info.txt already has a reserved meaning in the BagIt spec. What about 'Bag-Profile-Identifier' in both places?
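If both places must always carry the same value, that is easy to check mechanically. A minimal Python sketch, assuming the 'Bag-Profile-Identifier' name proposed here, a JSON profile with a 'bagit-profile-info' object, and a simplified reading of bag-info.txt's "Label: Value" syntax; parse_bag_info and profile_identifier_matches are hypothetical helpers, not part of any existing BagIt library.

```python
import json

TAG = "Bag-Profile-Identifier"  # tag name proposed in this thread


def parse_bag_info(text):
    """Parse bag-info.txt "Label: Value" lines into {label: [values]}.

    Labels may repeat, so each maps to a list. A line starting with a space
    or tab is folded into the previous value as a continuation line.
    """
    tags = {}
    last = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t" and last is not None:
            tags[last][-1] += " " + line.strip()
            continue
        label, sep, value = line.partition(":")  # split at the first colon only
        if not sep:
            raise ValueError(f"malformed bag-info.txt line: {line!r}")
        last = label.strip()
        tags.setdefault(last, []).append(value.strip())
    return tags


def profile_identifier_matches(bag_info_text, profile_text):
    """True if bag-info.txt declares exactly one identifier, equal to the profile's own."""
    declared = parse_bag_info(bag_info_text).get(TAG, [])
    profile_id = json.loads(profile_text).get("bagit-profile-info", {}).get(TAG)
    return profile_id is not None and declared == [profile_id]


bag_info = (
    "Source-Organization: Foo University Archives\n"
    "Bag-Profile-Identifier: http://foo.edu/bagitprofiles/bar.json\n"
)
profile = '{"bagit-profile-info": {"Bag-Profile-Identifier": "http://foo.edu/bagitprofiles/bar.json"}}'
print(profile_identifier_matches(bag_info, profile))  # True
```

Requiring exactly one declared identifier is a design choice of the sketch; a spec could instead allow repeats as long as they agree.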

Mark

Mark Jordan

Nov 16, 2012, 10:21:57 AM
to digital-...@googlegroups.com


----- Original Message -----
> geopardized

jeopardized #notenoughcoffeeyet

Mark A. Matienzo

Nov 16, 2012, 12:30:29 PM
to digital-...@googlegroups.com
On Fri, Nov 16, 2012 at 10:19 AM, Mark Jordan <mjo...@sfu.ca> wrote:
>> That sounds good, but if we're thinking about adapting existing
>> reserved tags from the BagIt spec, would "External-Identifier" work?
>>
>
> I'd prefer to keep the two tags the same (rationale being that they should always have the same value), e.g., either 'Bag-Profile' in both the bag-info.txt file and the bagit-profile-info JSON field, or 'External-Identifier' in both places; however, 'External-Identifier' in bag-info.txt already has a reserved meaning in the BagIt spec. What about 'Bag-Profile-Identifier' in both places?

Works for me!

Mark

Mark Jordan

Nov 16, 2012, 12:41:46 PM
to digital-...@googlegroups.com
OK, some time before noon PST I'll update the spec on github to include what we've discussed so far, so everyone can have a fresh copy for further discussion.

Mark

Mark Jordan

Nov 16, 2012, 3:27:55 PM
to digital-curation
Hello,

I've incorporated the following changes into the draft spec at https://github.com/ruebot/bagit-profiles :

1) Mark M's suggestion that we add a 'bagit-profile-info' section to the profile file;

2) Clarification that Bags adhering to a profile must contain a bag-info.txt file; previously, the text read 'Assumes presence of bag-info.txt' (see commit at https://github.com/ruebot/bagit-profiles/commit/23a3a22cc462b5f38756b0b08c56f8f249e7ccbf for details).

3) Changed the name of the 'accept-version' field to 'accept-bagit-version', to be more descriptive.
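Changes 2) and 3) translate into two simple checks. A minimal Python sketch, assuming 'accept-bagit-version' holds a list of version strings and that the bag's version is read from the 'BagIt-Version' line of bagit.txt; check_profile_basics and read_bagit_version are hypothetical, and the version numbers in the usage are example values only.

```python
import tempfile
from pathlib import Path


def read_bagit_version(bag_dir):
    """Return the BagIt-Version value from bag_dir/bagit.txt, or None if absent."""
    bagit_txt = Path(bag_dir) / "bagit.txt"
    if not bagit_txt.is_file():
        return None
    for line in bagit_txt.read_text(encoding="utf-8").splitlines():
        label, sep, value = line.partition(":")
        if sep and label.strip() == "BagIt-Version":
            return value.strip()
    return None


def check_profile_basics(bag_dir, profile):
    """Return a list of problems (empty if the bag passes both checks).

    `profile` is the parsed profile JSON; treating 'accept-bagit-version'
    as a list of version strings is an assumption of this sketch.
    """
    problems = []
    if not (Path(bag_dir) / "bag-info.txt").is_file():
        problems.append("bag-info.txt is missing")
    accepted = profile.get("accept-bagit-version", [])
    version = read_bagit_version(bag_dir)
    if version not in accepted:
        problems.append(f"BagIt-Version {version!r} is not in {accepted}")
    return problems


with tempfile.TemporaryDirectory() as bag:
    Path(bag, "bagit.txt").write_text(
        "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )
    print(check_profile_basics(bag, {"accept-bagit-version": ["0.96", "0.97"]}))
    # ['bag-info.txt is missing']
```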

Suggestions on the draft so far have been excellent (thanks Mark). I would like to pose the following questions:

1) In the newly-added 'bagit-profile-info' section of the spec, any objections/problems with requiring some tags and making others (specifically, Contact-Name, Contact-Phone and Contact-Email) 'recommended'?

2) Any objections to making all the tags introduced by the profile spec use the same capitalization convention as the tags defined in the BagIt spec? Or better to leave them all lower-case to distinguish them from BagIt bag-info.txt tags?

3) The Access Hackfest group discussed allowing regular expressions in the 'value' field of the profile tags; for example, this would be useful in constraining the format of values in "Bagging-Date". If putting anything but literals in the "value" field is not a good idea, we could introduce a third field in the profile field definitions, "pattern". Any thoughts?
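To make the 'value' versus 'pattern' option concrete, here is a minimal Python sketch of how a validator might treat a field definition carrying either key. The definition shape (a dict with optional 'value', 'pattern' and 'required' keys) and tag_value_ok are hypothetical and illustrative only.

```python
import re


def tag_value_ok(value, definition):
    """Check a bag-info.txt value against one (hypothetical) field definition.

    If the definition has a "pattern", the regex must match the whole value;
    otherwise a literal "value", if present, must match exactly. A definition
    with neither key constrains nothing.
    """
    if "pattern" in definition:
        return re.fullmatch(definition["pattern"], value) is not None
    if "value" in definition:
        return value == definition["value"]
    return True


bagging_date = {"required": True, "pattern": r"[0-9]{4}-[0-9]{2}-[0-9]{2}"}
source_org = {"required": True, "value": "Foo University Archives"}

print(tag_value_ok("2012-11-16", bagging_date))              # True
print(tag_value_ok("Nov 16, 2012", bagging_date))            # False
print(tag_value_ok("Foo University Archives", source_org))   # True
```

Keeping 'pattern' separate from 'value' means a literal such as a URL, whose '.' characters are regex metacharacters, is never accidentally interpreted as a regular expression.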

Mark


Nick Ruest

Nov 19, 2012, 11:50:38 PM
to digital-...@googlegroups.com
Responses below.

Mark, Mark, thanks for jumping on this while I was away. This is looking
great!

On 12-11-16 03:27 PM, Mark Jordan wrote:
> Hello,
>
> I've incorporated the following changes into the draft spec at https://github.com/ruebot/bagit-profiles :
>
> 1) Mark M's suggestion that we add a 'bagit-profile-info' section to the profile file;
>
> 2) Clarification that Bags adhering to a profile must contain a bag-info.txt file; previously, the text read 'Assumes presence of bag-info.txt' (see commit at https://github.com/ruebot/bagit-profiles/commit/23a3a22cc462b5f38756b0b08c56f8f249e7ccbf for details).
>
> 3) Changed the name of the 'accept-version' field to 'accept-bagit-version', to be more descriptive.
>
> Suggestions on the draft so far have been excellent (thanks Mark). I would like to pose the following questions:
>
> 1) In the newly-added 'bagit-profile-info' section of the spec, any objections/problems with requiring some tags and making others (specifically, Contact-Name, Contact-Phone and Contact-Email) 'recommended'?

No objections here. The required tags make sense, as do the fields that
are recommended. Given that folks can move around a bit, contact-name,
phone, and email might be worthless in some cases.

>
> 2) Any objections to making all the tags introduced by the profile spec use the same capitalization convention as the tags defined in the BagIt spec? Or better to leave them all lower-case to distinguish them from BagIt bag-info.txt tags?

I would leave them using the capitalization convention. Since the values
will probably be the same, best to keep them the same to avoid confusion.

>
> 3) The Access Hackfest group discussed allowing regular expressions in the 'value' field of the profile tags; for example, this would be useful in constraining the format of values in "Bagging-Date". If putting anything but literals in the "value" field is not a good idea, we could introduce a third field in the profile field definitions, "pattern". Any thoughts?

Instead of venturing down the road of date regular expression hell, why
couldn't we state in the spec that the date value is to be a specific
format?

-nruest

Mark A. Matienzo

Nov 20, 2012, 12:09:05 AM
to digital-...@googlegroups.com
On Mon, Nov 19, 2012 at 11:50 PM, Nick Ruest <rue...@gmail.com> wrote:
>> 1) In the newly-added 'bagit-profile-info' section of the spec, any
>> objections/problems with requiring some tags and making others
>> (specifically, Contact-Name, Contact-Phone and Contact-Email) 'recommended'?
>
> No objections here. The required tags make sense, as do the fields that are
> recommended. Given that folks can move around a bit, contact-name, phone,
> and email might be worthless in some cases.

Perhaps - maybe a generic/shared email address is preferable in this case.

>> 2) Any objections to making all the tags introduced by the profile spec
>> use the same capitalization convention as the tags defined in the BagIt
>> spec? Or better to leave them all lower-case to distinguish them from BagIt
>> bag-info.txt tags?
>
> I would leave them using the capitalization convention. Since the values
> will probably be the same, best to keep them the same to avoid confusion.

+1.

> Instead of venturing down the road of date regular expression hell, why
> couldn't we state in the spec that the date value is to be a specific
> format?

In the case of 'Bagging-Date', the BagIt spec already declares
YYYY-MM-DD as the preferred format.
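Given that, a validator can check Bagging-Date against YYYY-MM-DD directly, with no profile-level regex needed. A minimal Python sketch; is_bagging_date is a hypothetical helper.

```python
import re
from datetime import datetime


def is_bagging_date(value):
    """True if value is a real calendar date written as YYYY-MM-DD.

    The regex enforces the zero-padded shape (strptime alone would also accept
    "2012-1-5"); strptime then rejects impossible dates such as "2012-02-30".
    """
    if re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", value) is None:
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    return True


for v in ("2012-11-16", "2012-1-5", "2012-02-30", "16/11/2012"):
    print(v, is_bagging_date(v))
```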

Mark

Nick Ruest

Nov 20, 2012, 12:19:23 AM
to digital-...@googlegroups.com

On 12-11-20 12:09 AM, Mark A. Matienzo wrote:
> On Mon, Nov 19, 2012 at 11:50 PM, Nick Ruest <rue...@gmail.com> wrote:
>>> 1) In the newly-added 'bagit-profile-info' section of the spec, any
>>> objections/problems with requiring some tags and making others
>>> (specifically, Contact-Name, Contact-Phone and Contact-Email) 'recommended'?
>> No objections here. The required tags make sense, as do the fields that are
>> recommended. Given that folks can move around a bit, contact-name, phone,
>> and email might be worthless in some cases.
> Perhaps - maybe a generic/shared email address is preferable in this case.
+1
>
>>> 2) Any objections to making all the tags introduced by the profile spec
>>> use the same capitalization convention as the tags defined in the BagIt
>>> spec? Or better to leave them all lower-case to distinguish them from BagIt
>>> bag-info.txt tags?
>> I would leave them using the capitalization convention. Since the values
>> will probably be the same, best to keep them the same to avoid confusion.
> +1.
>
>> Instead of venturing down the road of date regular expression hell, why
>> couldn't we state in the spec that the date value is to be a specific
>> format?
> In the case of 'Bagging-Date', the BagIt spec already declares
> YYYY-MM-DD as the preferred format.
+1

Guess I should have read the spec again ;)

-nruest
>
> Mark
>

Mark Jordan

Nov 20, 2012, 12:45:06 AM
to digital-...@googlegroups.com
Hi guys,

----- Original Message -----
> On Mon, Nov 19, 2012 at 11:50 PM, Nick Ruest <rue...@gmail.com> wrote:
> >> 1) In the newly-added 'bagit-profile-info' section of the spec, any
> >> objections/problems with requiring some tags and making others
> >> (specifically, Contact-Name, Contact-Phone and Contact-Email)
> >> 'recommended'?
> >
> > No objections here. The required tags make sense, as do the fields
> > that are recommended. Given that folks can move around a bit,
> > contact-name, phone, and email might be worthless in some cases.
>
> Perhaps - maybe a generic/shared email address is preferable in this
> case.

Preferable, but unless we want to suggest using a non-personal email address, the original question remains: should we make Contact-Name, Contact-Phone, and Contact-Email required in the bagit-profile-info section, or make them optional?

>
> >> 2) Any objections to making all the tags introduced by the profile
> >> spec use the same capitalization convention as the tags defined in
> >> the BagIt spec? Or better to leave them all lower-case to distinguish
> >> them from BagIt bag-info.txt tags?
> >
> > I would leave them using the capitalization convention. Since the
> > values will probably be the same, best to keep them the same to avoid
> > confusion.
>
> +1.

Looks like we have a consensus here, at least so far. Unless anyone objects, I'll update the spec draft to capitalize the tags.

>
> > Instead of venturing down the road of date regular expression hell,
> > why couldn't we state in the spec that the date value is to be a
> > specific format?
>
> In the case of 'Bagging-Date', the BagIt spec already declares
> YYYY-MM-DD as the preferred format.
>

Yes, an oversight. We should follow the BagIt spec here. Can anyone think of any other types of tag values that might be valid use cases for regex validation? If not, let's drop the question.

Mark J.

Nick Ruest

Nov 20, 2012, 12:51:51 AM
to digital-...@googlegroups.com

On 12-11-20 12:45 AM, Mark Jordan wrote:
> Hi guys,
>
> ----- Original Message -----
>> On Mon, Nov 19, 2012 at 11:50 PM, Nick Ruest <rue...@gmail.com> wrote:
>>>> 1) In the newly-added 'bagit-profile-info' section of the spec, any
>>>> objections/problems with requiring some tags and making others
>>>> (specifically, Contact-Name, Contact-Phone and Contact-Email)
>>>> 'recommended'?
>>> No objections here. The required tags make sense, as do the fields
>>> that are recommended. Given that folks can move around a bit,
>>> contact-name, phone, and email might be worthless in some cases.
>> Perhaps - maybe a generic/shared email address is preferable in this
>> case.
> Preferable, but unless we want to suggest using a non-personal email address, the original question remains: should we make Contact-Name, Contact-Phone, and Contact-Email required in the bagit-profile-info section, or make them optional?

I still prefer the original language to keep the contact info as
recommended. Maybe *strongly recommended* ;)
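One way a validator could respect the required/recommended split is to report missing recommended tags as warnings rather than errors. A minimal Python sketch: the three Contact-* tags come from this discussion, but the REQUIRED tuple is an assumption for illustration, and check_profile_info is hypothetical.

```python
# Contact-* tags are "recommended" per this thread; the REQUIRED tuple is an
# assumption made for illustration only.
REQUIRED = ("Source-Organization", "External-Description", "Version", "Bag-Profile-Identifier")
RECOMMENDED = ("Contact-Name", "Contact-Phone", "Contact-Email")


def check_profile_info(info):
    """Return (errors, warnings) for a parsed 'bagit-profile-info' dict.

    Missing or empty required tags are errors; missing recommended tags only
    produce warnings, so the profile remains usable without them.
    """
    errors = [f"missing required tag {t}" for t in REQUIRED if not info.get(t)]
    warnings = [f"missing recommended tag {t}" for t in RECOMMENDED if not info.get(t)]
    return errors, warnings


info = {
    "Source-Organization": "Yale University",
    "Contact-Name": "Mark Matienzo",
    "External-Description": "BagIt profile for packaging disk images",
    "Version": "0.3",
    "Bag-Profile-Identifier": "http://foo.edu/bagitprofiles/bar.json",
}
errors, warnings = check_profile_info(info)
print(errors)    # []
print(warnings)  # ['missing recommended tag Contact-Phone', 'missing recommended tag Contact-Email']
```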

>
>>>> 2) Any objections to making all the tags introduced by the profile
>>>> spec use the same capitalization convention as the tags defined in
>>>> the BagIt spec? Or better to leave them all lower-case to distinguish
>>>> them from BagIt bag-info.txt tags?
>>> I would leave them using the capitalization convention. Since the
>>> values will probably be the same, best to keep them the same to avoid
>>> confusion.
>> +1.
> Looks like we have a consensus here, at least so far. Unless anyone objects, I'll update the spec draft to capitalize the tags.
>
>>> Instead of venturing down the road of date regular expression hell,
>>> why couldn't we state in the spec that the date value is to be a
>>> specific format?
>> In the case of 'Bagging-Date', the BagIt spec already declares
>> YYYY-MM-DD as the preferred format.
>>
> Yes, an oversight. We should follow the BagIt spec here. Can anyone think of any other types of tag values that might be valid use cases for regex validation? If not, let's drop the question.

I can't think of another tag that would need regex. So, let's drop the
question.

-nruest

>
> Mark J.
>

Nick Ruest

Dec 11, 2012, 10:52:58 PM
to digital-...@googlegroups.com
Possibly a silly logistics question: it looks like we have what appears
to be a consensus on the BagIt Profile spec (if we don't, please let me
know!). Does this spec stand alone, or does it need to be
incorporated into the BagIt spec?

I would assume it stands alone, since it came ex post facto. I would love to
hear what others think.

cheers!

-nruest