We've been working on a protocol for serving bagged content RESTfully since a small meeting in December. Now that the spec and example are reasonably well fleshed out, I wanted to open this up for wider feedback:
The main goal here is to provide a simple path to content replication without tackling larger problems such as extended metadata, specific storage or versioning strategies, etc. and staying as close as possible to common web practices. We're using the project's issue tracker to collect problems and remaining work; use-cases or substantial proposed changes are solicited in the form of pull requests - for example we're currently tracking one proposal for handling content versioning in a branch[1] so we can cleanly maintain the specification and example.
My next steps are implementing a test client and a simple reference server. If anyone's interested in collaborating, both of those will also be done on GitHub, and there's been talk of taking the reference server in more interesting directions (e.g. storing each bag in a Git repository and/or adding cloud storage backends).
On 02/22/2011 10:03 PM, Chris Adams wrote:
> The main goal here is to provide a simple path to content replication
> without tackling larger problems such as extended metadata, specific
> storage or versioning strategies, etc. and staying as close as
> possible to common web practices. We're using the project's issue
> tracker to collect problems and remaining work; use-cases or
> substantial proposed changes are solicited in the form of pull
> requests - for example we're currently tracking one proposal for
> handling content versioning in a branch[1] so we can cleanly maintain
> the specification and example.
Glad you wrote this up, Chris. I would just add one thing: the "we" Chris references is an ad hoc, open group of folks from all over. So if you deal in bags and care about replication, you're welcome to join.
Matt Schultz from the Educopia Institute has been running our monthly calls and keeping momentum going -- if you're interested in participating in this effort, I'd wager Matt (matt.schu...@metaarchive.org) would love to hear from you & add you to the list of folks who receive email on this effort.
Our next call will be on Friday, 3/25 at 3pm ET (1-270-400-2000, 282929#).
Below is a loose agenda for the next all-groups meeting on the
development of a RESTful Bag Server (https://github.com/acdha/restful-bag-server)
scheduled for Friday, March 25th at 3pm ET/2pm CT/12pm PT.
Call-in Information is: 1-270-400-2000, 282929#
Tentative Agenda (feel free to suggest changes):
1. Brief review of updated use cases - Archivematica
2. Continue discussions on open issues with spec
* Small file transfers - use of keep-alive? tgz/zip?
* Version handling - use of courtesy URL?
* Validation history - URI or metadata?
* Handling manifest changes - final match on PUT? acceptable
manifests?
* Other?
3. Update on testing suite and reference server
4. Identifying practical next steps
5. Other?
Looking forward to talking with those who can join.
All best,
--
Matt Schultz
Collaborative Services Librarian
Educopia Institute, MetaArchive Cooperative
http://www.metaarchive.org matt.schu...@metaarchive.org
616-566-3204
On Feb 23, 11:33 am, "Michael J. Giarlo" <mich...@psu.edu> wrote:
> > The main goal here is to provide a simple path to content replication
> > without tackling larger problems such as extended metadata, specific
> > storage or versioning strategies, etc. and staying as close as
> > possible to common web practices. We're using the project's issue
> > tracker to collect problems and remaining work; use-cases or
> > substantial proposed changes are solicited in the form of pull
> > requests - for example we're currently tracking one proposal for
> > handling content versioning in a branch[1] so we can cleanly maintain
> > the specification and example.
> Glad you wrote this up, Chris. I would just add one thing: the "we"
> Chris references is an ad hoc, open group of folks from all over. So if
> you deal in bags and care about replication, you're welcome to join.
> Matt Schultz from the Educopia Institute has been running our monthly
> calls and keeping momentum going -- if you're interested in
> participating in this effort, I'd wager Matt
> (matt.schu...@metaarchive.org) would love to hear from you & add you to
> the list of folks who receive email on this effort.
> Our next call will be on Friday, 3/25 at 3pm ET (1-270-400-2000, 282929#).
The next all-groups meeting scheduled for Friday, April 29th at 3-4pm
ET has been re-scheduled for Friday, May 13th at 3-4pm ET. Call-in
info is: 1-270-400-2000, 282929#.
This call is open for new individuals and groups to attend to discuss
on-going development of a specification for managing and tracking the
availability of Bag defined data for the purposes of replication with
a view toward preservation. See the Github site for the current spec
definition: https://github.com/acdha/restful-bag-server.
There will be a request to add agenda items one week before the call.
The final agenda will be posted a few days prior. Due to the limited time on
the call to address open issues and progress toward development,
please make every effort to review the Github site prior to the call,
add an issue, or respond to the request for an added agenda item.
Notes from our previous meeting are below. Look forward to catching up
in a couple of weeks.
RESTful-Bag-Server Meeting 3
Minutes
03/25/2011
Attendees
1. Chris Adams (LoC)
2. Mike Burek (Chronopolis)
3. Mike Giarlo (Penn State)
4. John Kunze (CDL)
5. Matt Schultz (Educopia)
6. Mike Smorul (Chronopolis)
7. Don Sutton (Chronopolis)
8. Peter Van Garderen (Artefactual)
9. Charles Blair (University of Chicago)
o Peter van Garderen clarified his written use case:
https://github.com/acdha/restful-bag-server/blob/master/Use%20Cases/A...
o Simple submission/dissemination internal repository exchange
(SIP/AIP/DIP transformations) of Bag-based data
o Peter and Chris Adams commented that the current spec is very well
targeted towards this type of use case
2. Discussed Open Issues
A. Small file transfers – Chris suggested tabling this issue because
the range of potential solutions could probably be handled through
some added nomenclature on good http citizenship.
o Server should support things like keep-alive, pipelining, etc.
o May want to consider embedding some links in this section to
educate on the principles of RESTful Architecture
B. Version handling – Chris confirmed that the best approach to handle
the awareness of most current version of an uploaded bag would be to
dedicate a symbolic link location (/version/latest).
o Chris asked if there were any strong objections to creating a new
branch to bake in the versioning proposal that was approved on the
02/18 call – no objections. See here:
https://github.com/acdha/restful-bag-server/tree/versioned-example
o Folks are encouraged to look this over before merging
C. Validation history – very briefly discussed the implementation of a
resource for exposing details of last full bag validity check (pass/
fail). Mike Smorul had a question about making this available in the
metadata
o No clear determination on this was recorded on the call – may need
to revisit briefly on 05/13
D. Handling manifest changes – Chris and Mike Smorul suggested making
the spec flexible enough to allow people to query for supported
manifests and to re-upload files as needed.
o Agreed also to accept any additional manifests but require at least
one supported format and report a 409 conflict if the listed files
don't match the standard md5 or sha-256.
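As a rough illustration of the manifest rule agreed under item D, here is a minimal Python sketch of the check a server might run when a bag is committed. It is not taken from the spec: the function names, the in-memory dictionaries, the 200 success status, and the choice to also answer 409 when no supported manifest is present are all assumptions made for the example. Only the "at least one supported manifest" rule and the 409 on mismatch come from the notes above.

```python
import hashlib

# Algorithms named in the minutes; the dictionary keys are an assumed naming.
SUPPORTED = {"md5": hashlib.md5, "sha256": hashlib.sha256}


def parse_manifest(text):
    """Parse BagIt-style manifest lines ("<checksum> <path>") into {path: checksum}."""
    entries = {}
    for line in text.splitlines():
        if line.strip():
            checksum, path = line.split(None, 1)
            entries[path.strip()] = checksum.lower()
    return entries


def commit_status(manifests, files):
    """Return a hypothetical HTTP status code for a commit request.

    manifests: {algorithm name: manifest text}
    files: {path: file content as bytes}
    """
    # Manifests in unsupported formats are accepted but not checked.
    supported = [name for name in manifests if name in SUPPORTED]
    if not supported:
        return 409  # assumption: no supported manifest is also a conflict
    for name in supported:
        expected = parse_manifest(manifests[name])
        if set(expected) != set(files):
            return 409  # listed files and uploaded files differ
        for path, checksum in expected.items():
            if SUPPORTED[name](files[path]).hexdigest() != checksum:
                return 409  # content does not match the manifest
    return 200  # assumption: the spec may define a different success code
```

A bag with one valid sha-256 manifest plus an extra manifest in an unknown format would pass this check, which matches the "accept any additional manifests" wording.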
3. Update on Testing Suite & Reference Server
o Chris got started on a Python-based testing suite that conforms to
the spec: https://github.com/acdha/restful-bag-server/blob/test-suite/tests.py
o Chris indicated that he wanted to work on the reporting
o Was planning on starting with a simple read-only server that would
be a good proof of concept for folks developing custom clients for
their environments that would want to interface
o The reference server would eventually be driven by Python and
should be flexible and extensible
4. Discussed Practical Next Steps
o Chris invited folks to feel free to issue a pull request on the
Github site if they are interested in lending him a hand:
https://github.com/acdha/restful-bag-server
o Matt inquired about promoting this work and the most reasonable
time frames/venues - Consensus was that June-August might be most
appropriate once tests of the spec have revealed themselves - Mike
Giarlo mentioned Open Repositories and Curate Camp as potential venues
o Questions about licensing of the spec arose – Chris from Library
of Congress would have to inquire (perhaps BSD or GPL)
o Agreed to check-in prior to the next call on May 13th on progress
toward test suite implementation & reporting and implementation of the
reference server
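For the /version/latest location discussed under item 2.B, a small Python sketch of how a server or client might resolve "latest" to a concrete version. The list-of-identifiers model (oldest first) and the `<bag_url>/version/<identifier>` layout are assumptions for illustration; the versioned-example branch is the place to check the actual layout.

```python
def resolve_version(versions, requested):
    """Map a requested version segment to a concrete version identifier.

    versions: identifiers in upload order, oldest first (an assumed model).
    requested: either a concrete identifier or the alias "latest".
    """
    if not versions:
        raise LookupError("bag has no committed versions")
    if requested == "latest":
        return versions[-1]
    if requested in versions:
        return requested
    raise LookupError("unknown version: %s" % requested)


def version_url(bag_url, versions, requested="latest"):
    """Build the URL of a version under a hypothetical <bag_url>/version/<id> layout."""
    identifier = resolve_version(versions, requested)
    return "%s/version/%s" % (bag_url.rstrip("/"), identifier)
```

A server could answer a request for the alias with a redirect to the URL this returns, so that clients always end up at a stable, version-specific location.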
On Mar 23, 2:07 pm, "matt.schu...@metaarchive.org"
<matt.schu...@metaarchive.org> wrote:
> Hi Everybody,
> Below is a loose agenda for the next all-groups meeting on the
> development of a RESTful Bag Server (https://github.com/acdha/restful-bag-server)
> scheduled for Friday, March 25th at 3pm ET/2pm CT/12pm PT.
> Call-in Information is: 1-270-400-2000, 282929#
> Tentative Agenda (feel free to suggest changes):
> 1. Brief review of updated use cases - Archivematica
> 2. Continue discussions on open issues with spec
> * Small file transfers - use of keep-alive? tgz/zip?
> * Version handling - use of courtesy URL?
> * Validation history - URI or metadata?
> * Handling manifest changes - final match on PUT? acceptable
> manifests?
> * Other?
> 3. Update on testing suite and reference server
> 4. Identifying practical next steps
> 5. Other?
> Looking forward to talking with those who can join.
> All best,
> --
> Matt Schultz
> Collaborative Services Librarian
> Educopia Institute, MetaArchive Cooperative
> http://www.metaarchive.org matt.schu...@metaarchive.org
> 616-566-3204
Below is a starter agenda for the next all-groups meeting on the
development of a RESTful Bag Server (https://github.com/acdha/restful-bag-server)
scheduled for Friday, May 13th at 3pm ET/2pm CT/12pm PT.
Call-in Information is: 1-270-400-2000, 282929#.
This will be a great call for groups or individuals who have not yet
participated but are interested in this set of work to drop in and say
hi.
Starter Agenda (feel free to suggest additional items on the call):
1. Brief welcome and catch-up for new callers
2. Overview of a Java Bag Server - Mike Smorul
3. Scheduling future calls
Looking forward to talking with those who can join.
All best,
Matt Schultz
Collaborative Services Librarian
Educopia Institute, MetaArchive Cooperative
http://www.metaarchive.org matt.schu...@metaarchive.org
616-566-3204
On Apr 27, 7:51 am, "matt.schu...@metaarchive.org"
<matt.schu...@metaarchive.org> wrote:
> Hi Everybody,
> The next all-groups meeting scheduled for Friday, April 29th at 3-4pm
> ET has been re-scheduled for Friday, May 13th at 3-4pm ET. Call-in
> info is: 1-270-400-2000, 282929#.
> This call is open for new individuals and groups to attend to discuss
> on-going development of a specification for managing and tracking the
> availability of Bag defined data for the purposes of replication with
> a view toward preservation. See the Github site for the current spec
> definition: https://github.com/acdha/restful-bag-server.
> There will be a request to add agenda items one week before the call.
> Final agenda will posted a few days prior. Due to the limited time on
> the call to address open issues and progress toward development,
> please make every effort to review the Github site prior to the call,
> add an issue, or respond to request for an added agenda item.
> Notes from our previous meeting are below. Look forward to catching up
> in a couple of weeks.
> Attendees
> 1. Chris Adams (LoC)
> 2. Mike Burek (Chronopolis)
> 3. Mike Giarlo (Penn State)
> 4. John Kunze (CDL)
> 5. Matt Schultz (Educopia)
> 6. Mike Smorul (Chronopolis)
> 7. Don Sutton (Chronopolis)
> 8. Peter Van Garderen (Artefactual)
> 9. Charles Blair (University of Chicago)
> o Peter van Garderen clarified his written use case:
> https://github.com/acdha/restful-bag-server/blob/master/Use%20Cases/A...
> o Simple submission/dissemination internal repository exchange (SIP/
> AIP/DIP transformations) of Bag-based data
> o Peter and Chris Adams commented that the current spec is very well
> targeted towards this type of use case
> 2. Discussed Open Issues
> A. Small file transfers – Chris suggested tabling this issue because
> the range of potential solutions could probably be handled through
> some added nomenclature on good http citizenship.
> o Server should support things like keep-alive, pipelining, etc.
> o May want to consider embedding some links in this section to
> educate on the principles of RESTful Architecture
> B. Version handling – Chris confirmed that the best approach to handle
> the awareness of most current version of an uploaded bag would be to
> dedicate a symbolic link location (/version/latest).
> o Chris asked if there were any strong objections to creating a new
> branch to bake in the versioning proposal that was approved on the
> 02/18 call – no objections. See here:
> https://github.com/acdha/restful-bag-server/tree/versioned-example
> o Folks are encouraged to look this over before merging
> C. Validation history – very briefly discussed the implementation of a
> resource for exposing details of last full bag validity check (pass/
> fail). Mike Smorul had a question about making this available in the
> metadata
> o No clear determination on this was recorded on the call – may need
> to revisit briefly on 05/13
> D. Handling manifest changes – Chris and Mike Smorul suggested making
> the spec flexible enough to allow people to query for supported
> manifests and to re-upload files as needed.
> o Agreed also to accept any additional manifests but require at least
> one supported format and report a 409 conflict if the listed files
> don't match the standard md5 or sha-256.
> 3. Update on Testing Suite & Reference Server
> o Chris got started on a Python-based testing suite that conforms to
> the spec: https://github.com/acdha/restful-bag-server/blob/test-suite/tests.py
> o Chris indicated that he wanted to work on the reporting
> o Was planning on starting with a simple read-only server that would
> be a good proof of concept for folks developing custom clients for
> their environments that would want to interface
> o The reference server would eventually be driven by Python and
> should be flexible and extensible
> 4. Discussed Practical Next Steps
> o Chris invited folks to feel free to issue a pull request on the
> Github site if they are interested in lending him a hand:
> https://github.com/acdha/restful-bag-server
> o Matt inquired about promoting this work and the most reasonable
> time frames/venues - Consensus was that June-August might be most
> appropriate once tests of the spec have revealed themselves - Mike
> Giarlo mentioned Open Repositories and Curate Camp as potential venues
> o Questions about licensing of the spec arose – Chris from Library
> of Congress would have to inquire (perhaps BSD or GPL)
> o Agreed to check-in prior to the next call on May 13th on progress
> toward test suite implementation & reporting and implementation of the
> reference server
Unfortunately it looks like I'm going to be talking to my mortgage rep (buying a place) a bit after 3pm. I'll dial in if she's delayed but it looks like that's the best time.
Key notes from me:
* limited progress on test suite
* I intend to go with the proposed versioning scheme so that branch of the spec will be merged in soon
* Mike proposed relaxing the upload ordering constraints to only require the manifest and files be complete by the commit. Feedback from potential implementors welcome on this point or related.
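To make the relaxed-ordering idea concrete for potential implementors, here is a minimal Python sketch of a server-side staging object that accepts the manifest and files in any order and only checks completeness at commit time. The class name, method names, and the choice of sha-256 are hypothetical and are not taken from the spec.

```python
import hashlib


class PendingBag:
    """Collects uploads in any order; all checks are deferred to commit time."""

    def __init__(self):
        self.manifest = None  # {path: sha-256 hex digest}; may arrive first or last
        self.files = {}       # {path: file content as bytes}

    def put_file(self, path, data):
        self.files[path] = data

    def put_manifest(self, entries):
        self.manifest = dict(entries)

    def commit_problems(self):
        """Return the reasons the bag cannot be committed (empty list if none)."""
        if self.manifest is None:
            return ["manifest not uploaded"]
        listed, uploaded = set(self.manifest), set(self.files)
        problems = ["missing file: " + p for p in sorted(listed - uploaded)]
        problems += ["file not in manifest: " + p for p in sorted(uploaded - listed)]
        for path in sorted(listed & uploaded):
            digest = hashlib.sha256(self.files[path]).hexdigest()
            if digest != self.manifest[path]:
                problems.append("checksum mismatch: " + path)
        return problems
```

With this shape a client can upload payload files before the manifest, or re-upload a single file after a failed commit, and the server only has to reject the commit while the list of problems is non-empty.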
Chris
On May 12, 2011, at 4:18 PM, "matt.schu...@metaarchive.org"
<matt.schu...@metaarchive.org> wrote:
> Hi Everybody,
> Below is a starter agenda for the next all-groups meeting on the
> development of a RESTful Bag Server (https://github.com/acdha/restful-bag-server)
> scheduled for Friday, May 13th at 3pm ET/2pm CT/12pm PT.
> Call-in Information is: 1-270-400-2000, 282929#.
> This will be a great call for groups or individuals who have not yet
> participated but are interested in this set of work to drop in and say
> hi.
> Starter Agenda (feel free to suggest additional items on the call):
> 1. Brief welcome and catch-up for new callers
> 2. Overview of a Java Bag Server - Mike Smorul
> 3. Scheduling future calls
> Looking forward to talking with those who can join.
> All best,
> Matt Schultz
> Collaborative Services Librarian
> Educopia Institute, MetaArchive Cooperative
> http://www.metaarchive.org
> matt.schu...@metaarchive.org
> 616-566-3204
A big thanks to Mike Smorul for tapping me on the shoulder to inquire
about this month's scheduled call to discuss the RESTful Bag Server
(https://github.com/acdha/restful-bag-server) development.
This month's call is slated for next week Friday, June 24th at 3pm ET.
Call-in info is 1-270-400-2000, 282929#.
Consider this an open call for agenda items. There are a couple of
relevant threads that have started (see below) since the previous call
that was held on Friday, May 13th. We can definitely follow up on
those. Notes from the May 13th call will be available shortly -
apologies for the delay.
<matt.schu...@metaarchive.org> wrote:
> Hi Everybody,
> Below is a starter agenda for the next all-groups meeting on the
> development of aRESTfulBagServer (https://github.com/acdha/restful-bag-server) scheduled for Friday, May 13th at 3pm ET/2pm CT/12pm PT.
> Call-in Information is: 1-270-400-2000, 282929#.
> This will be a great call for groups or individuals who have not yet
> participated but are interested in this set of work to drop in and say
> hi.
> Starter Agenda (feel free to suggest additional items on the call):
> 1. Brief welcome and catch-up for new callers
> 2. Overview of a JavaBagServer - Mike Smorul
> 3. Scheduling future calls
> Looking forward to talking with those who can join.
> All best,
> Matt Schultz
> Collaborative Services Librarian
> Educopia Institute, MetaArchive Cooperativehttp://www.metaarchive.org > matt.schu...@metaarchive.org
> 616-566-3204
> On Apr 27, 7:51 am, "matt.schu...@metaarchive.org"
> <matt.schu...@metaarchive.org> wrote:
> > Hi Everybody,
> > The next all-groups meeting scheduled for Friday, April 29th at 3-4pm
> > ET has been re-scheduled for Friday, May 13th at 3-4pm ET. Call-in
> > info is: 1-270-400-2000, 282929#.
> > This call is open for new individuals and groups to attend to discuss
> > on-going development of a specification for managing and tracking the
> > availability ofBagdefined data for the purposes of replication with
> > a view toward preservation. See the Github site for the current spec
> > definition:https://github.com/acdha/restful-bag-server.
> > There will be a request to add agenda items one week before the call.
> > Final agenda will posted a few days prior. Due to the limited time on
> > the call to address open issues and progress toward development,
> > please make every effort to review the Github site prior to the call,
> > add an issue, or respond to request for an added agenda item.
> > Notes from our previous meeting are below. Look forward to catching up
> > in a couple of weeks.
> > o Peter van Garderen clarified his written use casehttps://github.com/acdha/restful-bag-server/blob/master/Use%20Cases/A...
> > o Simple submission/dissemination internal repository exchange (SIP/
> > AIP/DIP transformations) ofBag-based data
> > o Peter and Chris Adams commented that the current spec is very well
> > targeted towards this type of use case
> > 2. Discussed Open Issues
> > A. Small file transfers – Chris suggested tabling this issue because
> > the range of potential solutions could probably be handled through
> > some added nomenclature on good http citizenship.
> > o Server should support things like keep-alive, pipelining, etc.
> > o May want to consider embedding some links in this section to
> > educate on the principles of RESTful Architecture
> > B. Version handling – Chris confirmed that the best approach to handle
> > the awareness of most current version of an uploaded bag would be to
> > dedicate a symbolic link location (/version/latest).
> > o Chris asked if there were any strong objections to creating a new
> > branch to bake in the versioning proposal that was approved on the
> > 02/18 call – no objections. See here: https://github.com/acdha/restful-bag-server/tree/versioned-example
> > o Folks are encouraged to look this over before merging
> > C. Validation history – very briefly discussed the implementation of a
> > resource for exposing details of last full bag validity check (pass/
> > fail). Mike Smorul had a question about making this available in the
> > metadata
> > o No clear determination on this was recorded on the call – may need
> > to revisit briefly on 05/13
> > D. Handling manifest changes – Chris and Mike Smorul suggested making
> > the spec flexible enough to allow people to query for supported
> > manifests and to re-upload files as needed.
> > o Agreed also to accept any additional manifests but require at least
> > one supported format and report a 409 conflict if the listed files
> > don't match the standard md5 or sha-256.
> > 3. Update on Testing Suite & Reference Server
> > o Chris got started on a Python-based testing suite that conforms to
> > the spec: https://github.com/acdha/restful-bag-server/blob/test-suite/tests.py
> > o Chris indicated that he wanted to work on the reporting
> > o Was planning on starting with a simple read-only server that would
> > be a good proof of concept for folks developing custom clients for
> > their environments that would want to interface
> > o The reference server would eventually be driven by Python and
> > should be flexible and extensible
> > 4. Discussed Practical Next Steps
> > o Chris invited folks to feel free to issue a pull request on the
> > Github site if they are interested in lending him a hand: https://github.com/acdha/restful-bag-server
> > o Matt inquired about promoting this work and the most reasonable
> > time frames/venues - Consensus was that June-August might be most
> > appropriate once tests of the spec have revealed themselves - Mike
> > Giarlo mentioned Open Repositories and Curate Camp as potential venues
> > o Questions about licensing of the spec arose – Chris from Library
> > of Congress would have to inquire (perhaps BSD or GPL)
> > o Agreed to check-in prior to the next call on May 13th on progress
> > toward test suite implementation & reporting and implementation of the
> > reference server
> > On Mar 23, 2:07 pm, "matt.schu...@metaarchive.org"
> > <matt.schu...@metaarchive.org> wrote:
> > > Hi Everybody,
> > > Below is a loose agenda for the next all-groups meeting on the
> > > development of a RESTful Bag Server (https://github.com/acdha/restful-bag-server) scheduled for Friday, March 25th at 3pm ET/2pm CT/12pm PT.
> > > Call-in Information is: 1-270-400-2000, 282929#
> > > Tentative Agenda (feel free to suggest changes):
> > > 1. Brief review of updated use cases - Archivematica
> > > 2. Continue discussions on open issues with spec
> > > * Small file transfers - use of keep-alive? tgz/zip?
> > > * Version handling - use of courtesy URL?
> > > * Validation history - URI or metadata?
> > > * Handling manifest changes - final match on PUT? acceptable
> > > manifests?
> > > * Other?
> > > 3. Update on testing suite and reference server
> > > 4. Identifying practical next steps
> > > 5. Other?
> > > Looking forward to talking with those who can join.
> > > On Feb 23, 11:33 am, "Michael J. Giarlo" <mich...@psu.edu> wrote:
> > > > On 02/22/2011 10:03 PM, Chris Adams wrote:
> > > > > The main goal here is to provide a simple path to content replication
> > > > > without tackling larger problems such as extended metadata, specific
> > > > > storage or versioning strategies, etc. and staying as close as
> > > > > possible to common web practices. We're using the project's issue
> > > > > tracker to collect problems and remaining work; use-cases or
> > > > > substantial proposed changes are solicited in the form of pull
> > > > > requests - for example we're currently tracking one proposal for
> > > > > handling content versioning in a branch[1] so we can cleanly maintain
> > > > > the specification and example.
> > > > Glad you wrote this up, Chris. I would just add one thing: the "we"
> > > > Chris references is an ad hoc, open group of folks from all over. So if
> > > > you deal in bags and care about replication, you're welcome to join.
> > > > Matt Schultz from the Educopia Institute has been running our monthly
> > > > calls and keeping momentum going -- if you're interested in
> Below is a starter agenda for the next all-groups meeting on the development of a RESTful Bag Server (https://github.com/acdha/restful-bag-server) scheduled for Friday, May 13th at 3pm ET/2pm CT/12pm PT. Call-in Information is: 1-270-400-2000, 282929#.
> This will be a great call for groups or individuals who have not yet participated but are interested in this set of work to drop in and say hi.
> Starter Agenda (feel free to suggest additional items on the call):
> 1. Brief welcome and catch-up for new callers
> 2. Overview of a Java Bag Server - Mike Smorul
> 3. Scheduling future calls
> Looking forward to talking with those who can join.
> All best,
> Matt Schultz
> Collaborative Services Librarian
> Educopia Institute, MetaArchive Cooperative
> http://www.metaarchive.org
> matt.schu...@metaarchive.org
> 616-566-3204
> On Apr 27, 7:51 am, "matt.schu...@metaarchive.org" <matt.schu...@metaarchive.org> wrote:
> > Hi Everybody,
> > The next all-groups meeting scheduled for Friday, April 29th at 3-4pm ET has been re-scheduled for Friday, May 13th at 3-4pm ET. Call-in info is: 1-270-400-2000, 282929#.
> > This call is open for new individuals and groups to attend to discuss on-going development of a specification for managing and tracking the availability of Bag defined data for the purposes of replication with a view toward preservation. See the Github site for the current spec definition: https://github.com/acdha/restful-bag-server.
> > There will be a request to add agenda items one week before the call. Final agenda will be posted a few days prior. Due to the limited time on the call to address open issues and progress toward development, please make every effort to review the Github site prior to the call, add an issue, or respond to request for an added agenda item.
> > Notes from our previous meeting are below. Look forward to catching up in a couple of weeks.
> > o Peter van Garderen clarified his written use case: https://github.com/acdha/restful-bag-server/blob/master/Use%20Cases/A...
> > o Simple submission/dissemination internal repository exchange (SIP/AIP/DIP transformations) of Bag-based data
> > o Peter and Chris Adams commented that the current spec is very well targeted towards this type of use case
> > 2. Discussed Open Issues
> > A. Small file transfers – Chris suggested tabling this issue because the range of potential solutions could probably be handled through some added nomenclature on good HTTP citizenship.
> > o Server should support things like keep-alive, pipelining, etc.
> > o May want to consider embedding some links in this section to educate on the principles of RESTful Architecture
> > B. Version handling – Chris confirmed that the best approach to handle the awareness of most current version of an uploaded bag would be to dedicate a symbolic link location (/version/latest).
> > o Chris asked if there were any strong objections to creating a new branch to bake in the versioning proposal that was approved on the 02/18 call – no objections. See here: https://github.com/acdha/restful-bag-server/tree/versioned-example
> > o Folks are encouraged to look this over before merging
> > C. Validation history – very briefly discussed the implementation of a resource for exposing details of last full bag validity check (pass/fail). Mike Smorul had a question about making this available in the metadata
> > o No clear determination on this was recorded on the call – may need to revisit briefly on 05/13
> > D. Handling manifest changes – Chris and Mike Smorul suggested making the spec flexible enough to allow people to query for supported manifests and to re-upload files as needed.
> > o Agreed also to accept any additional manifests but require at least one supported format and report a 409 conflict if the listed files don't match the standard MD5 or SHA-256.
> > 3. Update on Testing Suite & Reference Server
> > o Chris got started on a Python-based testing suite that conforms to the spec: https://github.com/acdha/restful-bag-server/blob/test-suite/tests.py
> > o Chris indicated that he wanted to work on the reporting
> > o Was planning on starting with a simple read-only server that would be a good proof of concept for folks developing custom clients for their environments that would want to interface
> > o The reference server would eventually be driven by Python and should be flexible and extensible
> > 4. Discussed Practical Next Steps
> > o Chris invited folks to feel free to issue a pull request on the Github site if they are interested in lending him a hand: https://github.com/acdha/restful-bag-server
> > o Matt inquired about promoting this work and the most reasonable time frames/venues - Consensus was that June-August might be most appropriate once tests of the spec have revealed themselves - Mike Giarlo mentioned Open Repositories and Curate Camp as potential venues
> > o Questions about licensing of the spec arose – Chris from Library of Congress would have to inquire (perhaps BSD or GPL)
> > o Agreed to check-in prior to the next call on May 13th on progress toward test suite implementation & reporting and implementation of the reference server
> > On Mar 23, 2:07 pm, "matt.schu...@metaarchive.org" <matt.schu...@metaarchive.org> wrote:
> > > Hi Everybody,
> > > Below is a loose agenda for the next all-groups meeting on the development of a RESTful Bag Server (https://github.com/acdha/restful-bag-server) scheduled for Friday, March 25th at 3pm ET/2pm CT/12pm PT. Call-in Information is: 1-270-400-2000, 282929#
> > > Tentative Agenda (feel free to suggest changes):
> > > 1. Brief review of updated use cases - Archivematica
> > > 2. Continue discussions on open issues with spec
> > > * Small file transfers - use of keep-alive? tgz/zip?
> > > * Version handling - use of courtesy URL?
> > > * Validation history - URI or metadata?
> > > * Handling manifest changes - final match on PUT? acceptable manifests?
> > > * Other?
> > > 3. Update on testing suite and reference server
> > > 4. Identifying practical next steps
> > > 5. Other?
> > > Looking forward to talking with those who can join.
> > > On Feb 23, 11:33 am, "Michael J. Giarlo" <mich...@psu.edu> wrote:
> > > > On 02/22/2011 10:03 PM, Chris Adams wrote:
> > > > > The main goal here is to provide a simple path to content replication without tackling larger problems such as extended metadata, specific storage or versioning strategies, etc. and staying as close as possible to common web practices. We're using the project's issue tracker to collect problems and remaining work; use-cases or substantial proposed changes are solicited in the form of pull requests - for example we're currently tracking one proposal for handling content versioning in a branch[1] so we can cleanly maintain the specification and example.
> > > > Glad you wrote this up, Chris. I would just add one thing: the "we" Chris references is an ad hoc, open group of folks from all over. So if you deal in bags and care about replication, you're welcome to join.
> > > > Matt Schultz from the Educopia Institute has been running our monthly calls and keeping momentum going -- if you're interested in participating in this effort, I'd wager Matt (matt.schu...@metaarchive.org) would love to hear from you & add you to the list of folks who receive email on this effort.
> > > > Our next call will be on Friday, 3/25 at 3pm ET (1-270-400-2000, 282929#).
> > > > -Mike
> -- > You received this message because you are subscribed to the Google Groups > "Digital Curation" group. > To post to this group, send email to digital-curation@googlegroups.com. > To unsubscribe from this group, send email to > digital-curation+unsubscribe@googlegroups.com. > For more options, visit this group at > http://groups.google.com/group/digital-curation?hl=en.
-- "If small things get to you, it is because you need to be bigger than all of that!!"
I would be very interested in participating in a discussion of this topic, either via the googlegroup or call in.
I am beginning a project to survey potential technical approaches to this topic, which will be published as a Tech Watch Report by the Digital Preservation Coalition (UK). I am currently reviewing literature and will be posting notes on my blog as I go along. Information about my project is here:
If anyone is currently undertaking an email preservation project, or contemplating one, I would appreciate the chance to touch base as soon as possible, either via this list or off list, so that information about your project can help shape my report, and vice versa.
Thanks,
Chris
____
Chris Prom
Assistant University Archivist
University of Illinois at Urbana-Champaign
chris.p...@gmail.com
On Jun 16, 2011, at 6:01 AM, Brenda C. B. Rocco wrote:
Hi Everybody,
Good morning!
What do you think about also discussing the preservation of e-mails, both repositories for them and the management of these messages?
Count me in as one interested in discussing this subject.
Chris, I will be on a records management project looking at email preservation and retention for state government agencies into a state archive. I am not sure how well a records management context applies to your interest, but I would be happy to share info with your project where I can.
On Thu, Jun 16, 2011 at 11:02 AM, Chris Prom <chris.p...@gmail.com> wrote:
> Thanks Walker! Do you mind sharing either with me or the list which state archives you work for?
I'm from Brazil's National Archive and am currently working on the management of e-mails. I'm searching for literature and best practices.
I think it's a good debate, covering both how to manage e-mails and how to preserve them, since a lot of information is recorded in this medium. Much of it is official information that produces official documents. Our project on the management of e-mails is just beginning.
>> On Jun 16, 2011, at 10:54 AM, Walker Sampson wrote:
>> Count me in as one interested in discussing this subject.
>> Chris, I will be on a records management project looking at email preservation and retention for state government agencies into a state archive. I am not sure how well a records management context applies to your interest, but I would be happy to share info with your project where I can.
>> Best,
>> Walker
>> --
>> Electronic Records Manager
>> Mississippi Department of Archives and History
>> http://wsampson.wordpress.com
-- "If small things get to you, it is because you need to be bigger than all of that!!"
> I would be very interested in participating in a discussion of this topic, either via the googlegroup or call in.
> I am beginning a project to survey potential technical approaches to this topic, which will be published as a Tech Watch Report by the Digital Preservation Coalition (UK). I am currently reviewing literature and will be posting notes on my blog as I go along. Information about my project is here:
> If anyone is currently undertaking an email preservation project, or contemplating one, I would appreciate the chance to touch base as soon as possible, either via this list or off list, so that information about your project can help shape my report, and vice versa.
Thanks. FYI, there is a report out from iPRES last year, here is my zotero entry for it:
Reshaping the Repository: The Challenge of Email Archiving
Type: Conference Paper
Author: Andrea Goethals
Author: Wendy Gogel
Abstract: Because of the historical value of email in the late 20th and 21st centuries, Harvard University Libraries began planning for an email archiving project in early 2007. A working group comprised of University archivists, curators, records managers, librarians and technologists studied the problem and recommended the undertaking of a pilot email archiving project at the University Library. This two-year pilot would implement a system for ingest, processing, preservation, and eventual end user delivery of email, in anticipation of it becoming an ongoing central service at the University after the pilot. This paper describes some of the unexpected challenges encountered during the pilot project and how they were addressed by design decisions. Key challenges included the requirement to design the system so that it could handle other types of born digital content in the future, and the effect of archiving email with sensitive data to Harvard’s preservation repository, the Digital Repository Service (DRS).
Date: 2010
Proceedings Title: 7th International Conference on Preservation of Digital Objects (iPRES2010)
Place: Vienna, Austria
URL: http://www.ifs.tuwien.ac.at/dp/ipres2010/schedule.html
Accessed: Wed Jun 1 19:00:00 2011
Date Added: Thu Jun 2 13:05:45 2011
Modified: Thu Jun 2 13:10:34 2011
You may want to contact Wendy Gogel and/or Andrea Goethals for more information.
--sla
The most recent Sedona Conference guidelines specifically related to email policy are from 2007 (see the attached BibTeX entry for Allman:2007).
Thoughts -
Ingest
There are several ways to handle ingest of email. One useful way is to provide an archival email server supporting a standard protocol (e.g. IMAP); content that the recipient considers archival can be dragged and dropped into that server using regular email clients. This form of ingestion is supported by some content/records management systems (an OSS example is Alfresco). Regular IMAP servers (an OSS example is Cyrus imapd) store messages in unmangled, ingest-friendly form. Access controls can be set to allow for messages to be added, but not altered or deleted. Cyrus uses one file per message; bagging seems called for.
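To make the "one file per message, so bag it" idea concrete, here is a minimal sketch in Python. It assumes a spool directory holding one file per message and writes a BagIt-style layout (bagit.txt, a data/ payload directory, and a SHA-256 manifest). The function name bag_messages and the directory names are hypothetical, and this is not what Cyrus, Alfresco, or any existing bagging tool does internally.

```python
import hashlib
import shutil
from pathlib import Path


def bag_messages(message_dir: Path, bag_dir: Path) -> Path:
    """Copy one-file-per-message mail into a minimal BagIt-style bag."""
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True)

    manifest_lines = []
    for src in sorted(p for p in message_dir.iterdir() if p.is_file()):
        dest = data_dir / src.name
        shutil.copy2(src, dest)  # the message bytes are copied untouched
        digest = hashlib.sha256(dest.read_bytes()).hexdigest()
        manifest_lines.append(f"{digest}  data/{src.name}\n")

    # Bag declaration; the version number is an assumption, adjust it
    # to whichever BagIt revision you are targeting.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (bag_dir / "manifest-sha256.txt").write_text(
        "".join(manifest_lines), encoding="utf-8"
    )
    return bag_dir
```

A call such as bag_messages(Path("spool/user/INBOX"), Path("bags/inbox-2011-06")) would leave the original spool alone and produce a bag that can be validated or replicated later.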
Proprietary storage formats may take more work to process, especially if header and body information is separated.
One interesting approach to ingest would be to set up an archival relay server in front of the operational servers; such a server would route incoming email into the ingest process. Such a server would be mostly transparent to users, and could be operated under the auspices of Archives and Records management business units, allowing for cleaner separation of duties (SoD) from IT departments.
Note that this layer must be interposed even for messages between internal users, especially for systems such as Exchange that allow senders to remotely delete messages from recipients' mailboxes.
Formats
In general, if it weren't for those darn attachments (giant asterisk), email would be one of the simplest formats for preservation. The reason is that in order to be useful, email messages had to be capable of passing through a variety of intermediate systems, each determined to wreak as much havoc on the content as possible, so mutated formats were strongly selected against. The fact that MIME attachments are still usually encapsulated using base64 is one example of this evolutionary history.
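As a rough illustration of how little work the message format itself demands, the sketch below uses only Python's standard email package to pull the attachments out of a raw message and undo the base64 transfer encoding. The function name list_attachments is made up for this example; a real ingest pipeline would go on to identify and normalize each payload.

```python
from email import message_from_bytes, policy


def list_attachments(raw_message: bytes):
    """Return (filename, content_type, decoded_bytes) for each attachment."""
    msg = message_from_bytes(raw_message, policy=policy.default)
    found = []
    for part in msg.iter_attachments():
        # decode=True reverses the Content-Transfer-Encoding
        # (base64 or quoted-printable) and returns the original bytes.
        payload = part.get_payload(decode=True)
        found.append((part.get_filename(), part.get_content_type(), payload))
    return found
```

The declared content type is only what the sending client claimed, so it is a hint for later format identification rather than a guarantee.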
Ah, but those attachments...
Even though any real content management system will automatically separate out any attached files and process them using whatever mechanisms are available for dealing with files of that type, if those files are proprietary and undocumented, or just not supported, they remain unhelpful buckets of bits (though at least they might have a useful mime type).
Versioning
One problem that comes to light with emailed attachments is if email is used for collaborative development of content; if this content is not stored in a version controlled system, envelope metadata (timestamps, message-ids, and forwarded bodies) may be the only way to sequence and assign responsibility for various changes. In addition, a great deal of storage may be wasted on 20K copies of the PDF of the annual report that could be better wasted on more online remote replicas.
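One common way to avoid storing thousands of identical copies is content addressing: name each stored attachment by a hash of its bytes, so that a repeat of the same file maps to the same location. The sketch below is only an illustration of that idea; store_attachment and the store directory are hypothetical names, and the per-message record of which attachment belonged where (needed to reconstruct the sequence of changes) is not shown.

```python
import hashlib
from pathlib import Path


def store_attachment(store_dir: Path, payload: bytes) -> Path:
    """Store payload once, under a name derived from its SHA-256 digest."""
    store_dir.mkdir(parents=True, exist_ok=True)
    target = store_dir / hashlib.sha256(payload).hexdigest()
    if not target.exists():
        # First time this exact content is seen; later copies are skipped.
        target.write_bytes(payload)
    return target
```

Each message record would then keep the returned path (or just the digest) alongside its own envelope metadata.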
A similar problem occurs when email refers to external content (e.g. via links). Unless the content can be captured at the time of sending (or receiving depending on context), essential parts of the context ( = meaning) of the communication may be lost. Attachmemento?
Authenticity
Because much of the metadata in an email envelope is generated or relied on as part of the message transport process, it is generally reliable-unless-tampered-with. Some of this metadata can still be forged (take a look at the full headers in your spam folder and see if you can spot where false headers were injected).
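For anyone who wants to try the spam-folder exercise programmatically, here is a small sketch that lists the Received headers of a raw message in the order the hops happened. Each relay prepends its own Received line, so the topmost header is the most recent hop and the list has to be reversed. The function name received_chain is hypothetical, and the code only displays the chain; judging which entries are forged is still up to the reader.

```python
from email import message_from_bytes, policy


def received_chain(raw_message: bytes):
    """Return the Received headers oldest-first, with whitespace collapsed."""
    msg = message_from_bytes(raw_message, policy=policy.default)
    hops = msg.get_all("Received", [])
    # Headers appear newest-first in the message, so reverse them.
    return [" ".join(str(hop).split()) for hop in reversed(hops)]
```

Entries below the first server you actually trust were supplied by the sender's side and are the ones that can be invented freely.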
In general, email messages are not inherently more or less trustworthy than paper ones. Email can be self-authenticating if signed using a public key; however, this is relatively uncommon outside DoD, the IC and associated entities.
One unsolved cryptographic problem that has especial significance for long term preservation and archival uses is the current lack of protocols providing Perfect Forward Secrecy (PFS). With PFS, if a key is compromised at some point, it cannot be used to attack messages sent before the compromise occurs. Although this property is relatively easy to provide for online communications, it is much harder to provide for digital signatures.
In addition, many hash algorithms (esp. MD5, and probably SHA-1) are considered broken, so messages signed using those older algorithms need to be migrated. It's almost as if preservation were an active continuing process.
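At the plain fixity level (stored checksums, not signatures), the "active continuing process" can be sketched as follows: confirm the content still matches the old MD5 value, and only then record a SHA-256 value for it. The function name migrate_digest is hypothetical, and this does not address re-signing messages, which also involves keys and trusted timestamps.

```python
import hashlib


def migrate_digest(payload: bytes, recorded_md5: str) -> str:
    """Verify payload against its recorded MD5, then return a SHA-256 digest."""
    if hashlib.md5(payload).hexdigest() != recorded_md5.lower():
        raise ValueError("payload no longer matches the recorded MD5 digest")
    # The old digest vouches for the bytes one last time; from here on
    # the stronger digest is the one to store and check.
    return hashlib.sha256(payload).hexdigest()
```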
Simon
@article{Allman:2007,
  Editor = {Allman, Thomas Y.},
  Group = {Digital Preservation},
  Journal = {The Sedona Conference\textregistered\ Journal},
  Pages = {239--250},
  Title = {{The Sedona Conference\textregistered\ commentary on email management: Guidelines for the selection of retention policy}},
  Url = {http://www.thesedonaconference.org/dltForm?did=Commentary_on_Email_Ma...}
}
On Thu, Jun 16, 2011 at 2:52 PM, Chris Prom <chris.p...@gmail.com> wrote:
> Thanks. FYI, there is a report out from iPRES last year, here is my zotero entry for it:
> Reshaping the Repository: The Challenge of Email Archiving
> Type Conference Paper Author Andrea Goethals Author Wendy Gogel
In the February 0.7-alpha release we convert PST to MBOX. In the 0.7.1-alpha release due out next week we also identify attachments and convert them to their designated preservation and access copy formats. The access format for individual email messages is HTML. We should have a screencast up for the 0.7.1 release next week which demonstrates this functionality. We'll post a link to this group.
We are counting on the Archivematica early implementers to test this functionality and get back to us with critiques and further suggestions. This thread is also very helpful.
> The most recent Sedona Conference guidelines specifically related to email > policy are from 2007 (see attached bibtex entry for Allman:2007.
> Thoughts -
> Ingest
> There are several ways to handle ingest of email. One useful way is to > provide an archival email server supporting a standard protocol (e.g. IMAP); > content that the recipient considers archival can be dragged and dropped into > that server using regular email clients. This form of ingestion is supported > by some content/records management systems (OSS e.g - Alfresco). Regular IMAP > servers (OSS e.g. cyrus imapd) store message in unmangled, ingest friendly > form. Access controls can be set to allow for messages to be added, but not > altered or deleted.Cyrus uses one file per message; bagging seems called for.
> Proprietary storage formats may take more work to process, especially if > header and body information is separated.
> One interesting approach to ingest would to set up an archival relay server > in front of the operational servers; such a server would route incoming email > into the ingest process. Such a server would be mostly transparent to users, > and could be operated under the auspices of Archives and Records management > business units allowing for cleaner SoD from IT departments.
> Note that this layer must be interposed even for messages between internal > users, especially for systems such as exchange that allow for senders to > remotely delete messages from recipients mailboxes.
> Formats
> In general, if it weren't for those darn attachments (giant asterisk), email would be one of the simplest formats for preservation. The reason is that, in order to be useful, email messages had to be capable of passing through a variety of intermediate systems, each determined to wreak as much havoc on the content as possible, so mutated formats were strongly selected against. The fact that MIME attachments are still usually encapsulated using base64 is one example of this evolutionary history.
> Ah, but those attachments...
> Even though any real content management system will automatically separate out any attached files and process them using whatever mechanisms are available for dealing with files of that type, if those files are proprietary and undocumented, or just not supported, they remain unhelpful buckets of bits (though at least they might have a useful MIME type).
> Versioning
> One problem that comes to light with emailed attachments arises when email is used for collaborative development of content; if this content is not stored in a version-controlled system, envelope metadata (timestamps, message-ids, and forwarded bodies) may be the only way to sequence and assign responsibility for various changes. In addition, a great deal of storage may be wasted on 20K copies of the PDF of the annual report that could be better wasted on more online remote replicas.
> A similar problem occurs when email refers to external content (e.g. via links). Unless the content can be captured at the time of sending (or receiving, depending on context), essential parts of the context (= meaning) of the communication may be lost. Attachmemento?
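As a rough illustration of the storage point above (many copies of the same attachment), the following Python sketch groups attachments by a SHA-256 digest of their decoded bytes, so identical files need only be stored once while the Message-IDs that carried them are kept for sequencing. It uses only the standard library; `index_attachments` and `sample_message` are hypothetical names, not part of any tool mentioned in this thread.

```python
import hashlib
from email import message_from_bytes, policy
from email.message import EmailMessage


def index_attachments(raw_messages):
    """Map SHA-256 digest -> list of (Message-ID, filename) that carried it."""
    store = {}
    for raw in raw_messages:
        msg = message_from_bytes(raw, policy=policy.default)
        for part in msg.iter_attachments():
            # decode=True undoes the base64 (or quoted-printable) transfer encoding
            payload = part.get_payload(decode=True) or b""
            digest = hashlib.sha256(payload).hexdigest()
            store.setdefault(digest, []).append(
                (str(msg.get("Message-ID", "")), part.get_filename())
            )
    return store


def sample_message(message_id, data, filename="report.pdf"):
    """Build a small test message (hypothetical data, for illustration only)."""
    msg = EmailMessage()
    msg["Message-ID"] = message_id
    msg["Subject"] = "Annual report"
    msg.set_content("Draft attached.")
    msg.add_attachment(data, maintype="application", subtype="pdf", filename=filename)
    return msg.as_bytes()
```

Each digest in the returned dictionary stands for one distinct file; the list under it records which messages carried that version, which is the envelope information needed to put the versions in order.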
> Authenticity
> Because much of the metadata in an email envelope is generated or relied on as part of the message transport process, it is generally reliable-unless-tampered-with. Some of this metadata can still be forged (take a look at the full headers in your spam folder and see if you can spot where false headers were injected).
> In general, email messages are not inherently more or less trustworthy than paper ones. Email can be self-authenticating if signed using a public key; however, this is relatively uncommon outside DoD, the IC and associated entities.
> One unsolved cryptographic problem that has especial significance for long-term preservation and archival uses is the current lack of protocols providing Perfect Forward Secrecy (PFS). With PFS, if a key is compromised at some point, it cannot be used to attack messages sent before the compromise occurs. Although this property is relatively easy to provide for online communications, it is much harder to provide for digital signatures.
> In addition, many hash algorithms (esp. MD5, and probably SHA-1) are considered broken, so messages signed using those older algorithms need to be migrated. It's almost as if preservation were an active continuing process.
> Simon
> @article{Allman:2007,
>   Editor = {Allman, Thomas Y.},
>   Group = {Digital Preservation},
>   Journal = {The Sedona Conference\textregistered\ Journal},
>   Pages = {239--250},
>   Title = {{The Sedona Conference\textregistered\ commentary on email management: Guidelines for the selection of retention policy}},
>   Url = {http://www.thesedonaconference.org/dltForm?did=Commentary_on_Email_Ma...},
>   Volume = {8},
>   Year = {2007}
> }
> On Thu, Jun 16, 2011 at 2:52 PM, Chris Prom <chris.p...@gmail.com> wrote:
>> Thanks. FYI, there is a report out from iPRES last year; here is my Zotero entry for it:
>> Reshaping the Repository: The Challenge of Email Archiving
>> Type: Conference Paper
>> Author: Andrea Goethals
>> Author: Wendy Gogel
> -- > You received this message because you are subscribed to the Google Groups > "Digital Curation" group. > To post to this group, send email to digital-curation@googlegroups.com. > To unsubscribe from this group, send email to > digital-curation+unsubscribe@googlegroups.com. > For more options, visit this group at > http://groups.google.com/group/digital-curation?hl=en.
My concern is with e-mails that are archival documents: how their integrity and authenticity should be maintained, and how they should be managed over the long term. Beyond the technological issues, of course!
Brenda Rocco
2011/6/17 Peter Van Garderen <vangarderen.pe...@gmail.com>
> In the February 0.7-alpha release we convert PST to MBOX. In the 0.7.1-alpha release due out next week we also identify attachments and convert them to their designated preservation and access copy formats. The access format for individual email messages is HTML. We should have a screencast up for the 0.7.1 release next week which demonstrates this functionality. We'll post a link to this group.
> We are counting on the Archivematica early implementers to test this functionality and get back to us with critiques and further suggestions. This thread is also very helpful.
> Cheers,
> --peter
> Peter Van Garderen
> Archivematica Project Manager
-- "If small things get to you, it is because you need to be bigger than all of this!!"
Have you come to any conclusion as to which parser you will use? The biggest issue I see is that, aside from Aid4Mail, none of the programs can deal with a diversity of formats on the input side to get them into mbox. There seems to be a reasonable amount of support for PST, but not so much beyond that. Aid4Mail is nice because the latest version includes a scripting language, so you can export in any format you want. But it is proprietary and Windows-only, so obviously that rules it out for Archivematica.
I'm also doing an "environmental scan" on the state of email archiving and preservation for the state university libraries of Florida. I had a long discussion with the programmer at Harvard who is implementing their EASI project. They've settled on MBOX as their normalized format, which they then convert to the CERP XML schema for preservation. He said he did an extensive analysis of conversion tools and settled on emailchemy. It converts the largest number of formats reliably, and does the best job of preserving metadata in the conversion, including folder hierarchy and header tags.
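For readers who want to check what survives such a conversion, here is a minimal Python sketch that walks a directory of mbox files and reports the folder path and header names for each message. It assumes the converter wrote one `.mbox` file per mail folder under a root directory, which is a common layout but is not confirmed anywhere in this thread; `walk_mbox_tree` is a hypothetical helper, and nothing here reflects how emailchemy or the EASI project actually work.

```python
import mailbox
from pathlib import Path


def walk_mbox_tree(root):
    """Yield one summary dict per message found in *.mbox files under root."""
    root = Path(root)
    for path in sorted(root.rglob("*.mbox")):
        # Assumption: the relative file path mirrors the original folder hierarchy.
        folder = path.relative_to(root).with_suffix("").as_posix()
        box = mailbox.mbox(str(path), create=False)
        try:
            for msg in box:
                yield {
                    "folder": folder,
                    "message_id": msg.get("Message-ID"),
                    "date": msg.get("Date"),
                    "subject": msg.get("Subject"),
                    # Header names present, to compare against the source mailbox
                    "header_names": sorted(set(msg.keys())),
                }
        finally:
            box.close()
```

Comparing the `folder` and `header_names` values against the source mailbox gives a quick, if crude, check on how much metadata a given converter kept.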
> Have you come to any conclusion as to which parser you will use? The biggest issue I see is that, aside from Aid4Mail, none of the programs can deal with a diversity of formats on the input side to get them into mbox. There seems to be a reasonable amount of support for PST, but not so much beyond that. Aid4Mail is nice because the latest version includes a scripting language, so you can export in any format you want. But it is proprietary and Windows-only, so obviously that rules it out for Archivematica.
> Best,
> Chris
> ____
> Chris Prom
> chris.p...@gmail.com
Actually, there's quite a bit more; Harvard is worth talking to. Once they convert to MBOX, they parse it using Mime4j, which does a good job of decoding the attachments, and index them in Solr. They have also done a lot of work on security, because mail contents can be so sensitive.
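The parse-then-index step can be sketched roughly as below. This is not Harvard's code: it swaps Python's standard `email` package in for Mime4j, and it only builds a plain dictionary of the kind one might send to a search index, without calling Solr. The sample message, the field names, and `to_index_document` are all assumptions made for illustration.

```python
from email import message_from_bytes, policy

# Hypothetical sample: a text body plus one base64-encoded attachment.
SAMPLE = (
    b"Message-ID: <1@example.org>\r\n"
    b"From: a@example.org\r\n"
    b"Subject: Annual report\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/mixed; boundary="b"\r\n'
    b"\r\n"
    b"--b\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"\r\n"
    b"see attached\r\n"
    b"--b\r\n"
    b"Content-Type: application/pdf\r\n"
    b'Content-Disposition: attachment; filename="report.pdf"\r\n'
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    b"JVBERi0=\r\n"
    b"--b--\r\n"
)


def to_index_document(raw):
    """Parse one raw message into a flat dict suitable for a search index."""
    msg = message_from_bytes(raw, policy=policy.default)
    body = msg.get_body(preferencelist=("plain",))
    attachments = []
    for part in msg.iter_attachments():
        data = part.get_payload(decode=True) or b""  # base64 decoded here
        attachments.append({
            "filename": part.get_filename(),
            "content_type": part.get_content_type(),
            "size": len(data),
        })
    return {
        "id": str(msg.get("Message-ID", "")),
        "from": str(msg.get("From", "")),
        "subject": str(msg.get("Subject", "")),
        "body": body.get_content() if body is not None else "",
        "attachments": attachments,
    }
```

Calling `to_index_document(SAMPLE)` returns the headers, the plain-text body, and one attachment entry whose decoded size and MIME type can then be handed to whatever identification and indexing tools are in use.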
> Thanks Priscilla, that is helpful. I had planned to contact Harvard, but it sounds like you provided a good summary of their project.
> Chris
> ____
> Chris Prom
> chris.p...@gmail.com
> On Jun 17, 2011, at 9:35 AM, Priscilla Caplan wrote:
> I'm also doing an "environmental scan" on the state of email archiving > and preservation for the state university libraries of Florida. I had > a long discussion with the programmer at Harvard who is implementing > their EASI project. They've settled on MBOX as their normalized > format, which they then convert to the CERP XML schema for > preservation. He said he did an extensive analysis of conversion > tools and settled on emailchemy. It converts the largest number of > formats reliably, and does the best job of preserving metadata in the > conversion, including folder hierarchy and header tags.
> p
> On 6/17/2011 10:21 AM, Chris Prom wrote: >> Thanks Peter, the cites are useful.
>> Have you come to any conclusion as to which parser you will use? The >> biggest issue I see is that, aside from Aid4Mail, none of the >> programs are able to deal with a diversity of formats on the input >> sid, to get it into mbox. There seems to be a reasonable amount of >> support for pst, but not so much beyond that. Aid4Mail is nice >> because the latest version includes a scripting language, so you can >> export in any format you want. But, it is proprietary and Windows >> only, so obviously that rules it out for Archivematica.
>> Best,
>> Chris >> ____
>> Chris Prom >> chris.p...@gmail.com <mailto:chris.p...@gmail.com>
>> On Jun 17, 2011, at 8:52 AM, Peter Van Garderen wrote:
>> In the February 0.7-alpha release we convert PST to MBOX. In the >> 0.7.1-alpha release due out next week we also identify attachments >> and convert them to their designated preservation and access copy >> formats. The access format for individual email messages is HTML. We >> should have a screencast up for the 0.7.1 release next week which >> demonstrates this functionality. We'll post a link to this group.
>> We are counting on the Archivematica early implementers to test this >> functionality and get back to us with critiques and further >> suggestions. This thread is also very helpful.
>> Cheers,
>> --peter
>> Peter Van Garderen >> Archivematica Project Manager
>> On 06/16/2011 03:44 PM, Simon Spero wrote: >>> The most recent Sedona Conference guidelines specifically related to >>> email policy are from 2007 (see attached bibtex entry for Allman:2007.
>>> Thoughts -
>>> Ingest
>>> There are several ways to handle ingest of email. One useful way is >>> to provide an archival email server supporting a standard protocol >>> (e.g. IMAP); content that the recipient considers archival can be >>> dragged and dropped into that server using regular email clients. >>> This form of ingestion is supported by some content/records >>> management systems (OSS e.g - Alfresco). Regular IMAP servers (OSS >>> e.g. cyrus imapd) store message in unmangled, ingest friendly form. >>> Access controls can be set to allow for messages to be added, but >>> not altered or deleted.Cyrus uses one file per message; bagging >>> seems called for.
>>> Proprietary storage formats may take more work to process, >>> especially if header and body information is separated.
>>> One interesting approach to ingest would to set up an archival >>> relay server in front of the operational servers; such a server >>> would route incoming email into the ingest process. Such a server >>> would be mostly transparent to users, and could be operated under >>> the auspices of Archives and Records management business units >>> allowing for cleaner SoD from IT departments.
>>> Note that this layer must be interposed even for messages between >>> internal users, especially for systems such as exchange that allow >>> for senders to remotely delete messages from recipients mailboxes.
>>> Formats
>>> In general, if it weren't for those darn attachments(giant >>> asterisk), email would be one of the simplest formats for >>> preservation. The reason is that in order to be useful, email >>> messages had to be capable of passing through a variety of >>> intermediate systems, each determined to wreak as much havoc on the >>> content as possible, so mutated formats were strongly selected >>> against. The fact that MIME attachments are still usually >>> encapsulated in using base64 is one example of this evolutionary >>> history.
>>> Ah, but those attachments...
>>> Even though any real content management system will automatically >>> separate out any attached files and process them using whatever >>> mechanisms are available for dealing with files of that type, if >>> those files are proprietary and undocumented, or just not supported, >>> they remain unhelpful buckets of bits (though at least they might >>> have a useful mime type).
>>> Versioning
>>> One problem that comes to light with emailed attachments is if email >>> is used for collaborative development of content; if this content is >>> not stored in a version controlled system, envelope metadata >>> (timestamps, message-ids, and forwarded bodies) may be the only way >>> to sequence and assign responsibility for various changes. In >>> addition, a great deal of storage may be wasted on 20K copies of >>> the PDF of the annual report that could be better wasted on more >>> online remote replicas.
>>> A similar problem occurs when email refers to external content (e.g. >>> via links). Unless the content can be captured at the time of >>> sending (or receiving depending on context), essential parts of the >>> context ( = meaning) of the communication may be lost. Attachmemento?
>>> Authenticity
>>> Because much of the metadata in an email envelope is generated or >>> relied on as part of the message transport process, it is generally >>> reliable-unless-tampered-with. Some of this metadata can still be >>> forged (take a look at the full headers in your spam folder and >>> see if you can spot where false headers were injected).
>>> In general, email messages are not inherently more or less >>> trustworthy than paper ones. Email can be self authenticating if >>> signed using a public key; however, this is relatively uncommon >>> outside DoD, the IC and associated entities.
>>> One unsolved cryptographic problem that has especial significance >>> for long term preservation and archival uses is the current lack of >>> protocols providing Perfect Forward Secrecy (PFS). With PFS, if a >>> key is compromised at some point, it cannot be used to attack >>> messages sent before the compromise occurs. Although this property >>> is relatively easy to provide for online communications, it is much >>> harder to provide for digital signatures.
>>> In addition, many hash algorithms (esp. MD5, and probably SHA-1) are >>> considered broken, so messages signed using those older algorithms >>> need to be migrated. It's almost as if preservation were an active >>> continuing process.
>>> Simon
>>> @article{Allman:2007, >>> Editor = {Allman, Thomas Y.}, >>> Group = {Digital Preservation}, >>> Journal = {The Sedona Conference\textregistered\ Journal}, >>> Pages = {239--250}, >>> Title = {{The Sedona Conference\textregistered\ commentary on email >>> management: Guidelines for the selection of retention policy}}, >>> Url = >>> {http://www.thesedonaconference.org/dltForm?did=Commentary_on_Email_Ma...}, >>> Volume = {8}, >>> Year = {2007} >>> }
>>> On Thu, Jun 16, 2011 at 2:52 PM, Chris Prom <chris.p...@gmail.com> wrote:
>>> Thanks. FYI, there is a report out from iPRES last year, here >>> is my zotero entry for it:
>>> Reshaping the Repository: The Challenge of Email Archiving
>>> Type Conference Paper >>> Author Andrea Goethals >>> Author Wendy Gogel
>>> -- >>> You received this message because you are subscribed to the Google >>> Groups "Digital Curation" group. >>> To post to this group, send email to digital-curation@googlegroups.com. >>> To unsubscribe from this group, send email to >>> digital-curation+unsubscribe@googlegroups.com. >>> For more options, visit this group at >>> http://groups.google.com/group/digital-curation?hl=en.
> Actually, there's quite a bit more; Harvard is worth talking to. Once they convert to MBOX they parse using Mime4j, which does a good job of decoding the attachments, and index them in Solr. They have also done a lot of work on security, because mail contents can be so sensitive.
> p
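As a rough illustration of the MBOX-then-parse step described above, here is a minimal sketch using Python's standard `mailbox` and `email` modules. It is not Harvard's pipeline (they use Mime4j, a Java library), and the function names, addresses, and demo message are all assumptions made for the example.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def list_attachments(mbox_path):
    """Return (subject, filename, content_type, size_in_bytes) per attachment."""
    found = []
    box = mailbox.mbox(mbox_path)
    try:
        for message in box:
            for part in message.walk():
                filename = part.get_filename()
                if filename is None:
                    continue  # body text or multipart container, not an attachment
                # decode=True undoes the base64 / quoted-printable transfer encoding
                payload = part.get_payload(decode=True)
                found.append(
                    (message["Subject"], filename, part.get_content_type(), len(payload))
                )
    finally:
        box.close()
    return found


def build_demo_mbox(path):
    """Write a one-message mbox with a small fake PDF attachment (demo data only)."""
    msg = EmailMessage()
    msg["From"] = "sender@example.org"
    msg["To"] = "archive@example.org"
    msg["Subject"] = "Annual report"
    msg.set_content("Report attached.")
    msg.add_attachment(
        b"%PDF-1.4 demo bytes",
        maintype="application",
        subtype="pdf",
        filename="report.pdf",
    )
    box = mailbox.mbox(path)
    try:
        box.add(msg)
        box.flush()
    finally:
        box.close()


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "demo.mbox")
    build_demo_mbox(demo_path)
    for row in list_attachments(demo_path):
        print(row)
```

The decoded payload is what would be handed to format identification or indexing; the sketch only reports its size.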
> On 6/17/2011 10:41 AM, Chris Prom wrote:
> Thanks Priscilla, that is helpful. I had planned to contact Harvard, but > it sounds like you provided a good summary of their project.
> On Jun 17, 2011, at 9:35 AM, Priscilla Caplan wrote:
> I'm also doing an "environmental scan" on the state of email archiving and > preservation for the state university libraries of Florida. I had a long > discussion with the programmer at Harvard who is implementing their EASI > project. They've settled on MBOX as their normalized format, which they > then convert to the CERP XML schema for preservation. He said he did an > extensive analysis of conversion tools and settled on emailchemy. It > converts the largest number of formats reliably, and does the best job of > preserving metadata in the conversion, including folder hierarchy and header > tags.
> p
> On 6/17/2011 10:21 AM, Chris Prom wrote:
> Thanks Peter, the cites are useful.
> Have you come to any conclusion as to which parser you will use? The biggest issue I see is that, aside from Aid4Mail, none of the programs are able to deal with a diversity of formats on the input side, to get it into mbox. There seems to be a reasonable amount of support for pst, but not so much beyond that. Aid4Mail is nice because the latest version includes a scripting language, so you can export in any format you want. But it is proprietary and Windows only, so obviously that rules it out for Archivematica.
> In the February 0.7-alpha release we convert PST to MBOX. In the > 0.7.1-alpha release due out next week we also identify attachments and > convert them to their designated preservation and access copy formats. The > access format for individual email messages is HTML. We should have a > screencast up for the 0.7.1 release next week which demonstrates this > functionality. We'll post a link to this group.
> We are counting on the Archivematica early implementers to test this > functionality and get back to us with critiques and further suggestions. > This thread is also very helpful.
> Cheers,
> --peter
> Peter Van Garderen > Archivematica Project Manager
> On 06/16/2011 03:44 PM, Simon Spero wrote:
> The most recent Sedona Conference guidelines specifically related to email policy are from 2007 (see attached bibtex entry for Allman:2007).
> Thoughts -
> Ingest
> There are several ways to handle ingest of email. One useful way is to provide an archival email server supporting a standard protocol (e.g. IMAP); content that the recipient considers archival can be dragged and dropped into that server using regular email clients. This form of ingestion is supported by some content/records management systems (OSS e.g. Alfresco). Regular IMAP servers (OSS e.g. cyrus imapd) store messages in unmangled, ingest-friendly form. Access controls can be set to allow for messages to be added, but not altered or deleted. Cyrus uses one file per message; bagging seems called for.
> Proprietary storage formats may take more work to process, especially if > header and body information is separated.
> One interesting approach to ingest would be to set up an archival relay server in front of the operational servers; such a server would route incoming email into the ingest process. Such a server would be mostly transparent to users, and could be operated under the auspices of Archives and Records management business units, allowing for cleaner separation of duties (SoD) from IT departments.
> Note that this layer must be interposed even for messages between internal users, especially for systems such as Exchange that allow senders to remotely delete messages from recipients' mailboxes.
> Formats
> In general, if it weren't for those darn attachments (giant asterisk), email would be one of the simplest formats for preservation. The reason is that in order to be useful, email messages had to be capable of passing through a variety of intermediate systems, each determined to wreak as much havoc on the content as possible, so mutated formats were strongly selected against. The fact that MIME attachments are still usually encapsulated using base64 is one example of this evolutionary history.
> Ah, but those attachments...
> Even though any real content management system will automatically > separate out any attached files and process them using whatever mechanisms > are available for dealing with files of that type, if those files are > proprietary and undocumented, or just not supported, they remain unhelpful > buckets of bits (though at least they might have a useful mime type).
> Versioning
> One problem that comes to light with emailed attachments is if email is > used for collaborative development of content; if this content is not stored > in a version controlled system, envelope metadata (timestamps, message-ids, > and forwarded bodies) may be the only way to sequence and assign > responsibility for various changes. In addition, a great deal of storage may > be wasted on 20K copies of the PDF of the annual report that could be > better wasted on more online remote replicas.
> A similar problem occurs when email refers to external content (e.g. via > links). Unless the content can be captured at the time of sending (or > receiving depending on context), essential parts of the context ( = meaning) > of the communication may be lost. Attachmemento?
> Authenticity
> Because much of the metadata in an email envelope is generated or relied > on as part of the message transport process, it is generally > reliable-unless-tampered-with. Some of this metadata can still be forged > (take a look at the full headers in your spam folder and see if you can > spot where false headers were injected).
> In general, email messages are not inherently more or less trustworthy > than paper ones. Email can be self authenticating if signed using a public > key; however, this is relatively uncommon outside DoD, the IC and associated > entities.
> One unsolved cryptographic problem that has especial significance for > long term preservation and archival uses is the current lack of protocols > providing Perfect Forward Secrecy (PFS). With PFS, if a key is compromised > at some point, it cannot be used to attack messages sent before the > compromise occurs. Although this property is relatively easy to provide for > online communications, it is much harder to provide for digital signatures.
> In addition, many hash algorithms (esp. MD5, and probably SHA-1) are > considered broken, so messages signed using those older algorithms need to > be migrated. It's almost as if preservation were an active continuing > process.
> Simon
> @article{Allman:2007, > Editor = {Allman, Thomas Y.}, > Group = {Digital Preservation}, > Journal = {The Sedona Conference\textregistered\ Journal}, > Pages = {239--250}, > Title = {{The Sedona Conference\textregistered\ commentary on email > management: Guidelines for the selection of retention policy}}, > Url = { > http://www.thesedonaconference.org/dltForm?did=Commentary_on_Email_Ma... > }, > Volume = {8}, > Year = {2007} > }
> On Thu, Jun 16, 2011 at 2:52 PM, Chris Prom <chris.p...@gmail.com> wrote:
>> Thanks. FYI, there is a report out from iPRES last year, here is my >> zotero entry for it:
>> Reshaping the Repository: The Challenge of Email Archiving Type Conference >> Paper Author Andrea Goethals Author Wendy Gogel
Below is a starter agenda for the next all-groups meeting on the
development of a RESTful Bag Server (https://github.com/acdha/restful-bag-server) scheduled for Friday, June 24th at 3pm ET/2pm CT/12pm PT.
Call-in Information is: 1-270-400-2000, 282929#.
Feel free to suggest additional items before or on the call:
On Jun 15, 4:11 pm, "matt.schu...@metaarchive.org"
<matt.schu...@metaarchive.org> wrote:
> Hi Everybody,
> A big thanks to Mike Smorul for tapping me on the shoulder to inquire
> about this month's scheduled call to discuss the RESTful Bag Server
> (https://github.com/acdha/restful-bag-server) development.
> This month's call is slated for next week Friday, June 24th at 3pm ET.
> Call-in info is 1-270-400-2000, 282929#.
> Consider this an open call for agenda items. There are a couple of
> relevant threads that have started (see below) since the previous call
> that was held on Friday, May 13th. We can definitely follow up on
> those. Notes from the May 13th call will be available shortly -
> apologies for the delay.
> I'll shoot out a tentative agenda by COB next week Wednesday, June
> 22nd once we've had a chance to hear from folks. Look forward to
> catching up.
> All best,
> Matt Schultz
> Collaborative Services Librarian
> Educopia Institute, MetaArchive Cooperative
> http://www.metaarchive.org matt.schu...@metaarchive.org
> 616-566-3204
> On May 12, 4:17 pm, "matt.schu...@metaarchive.org"
> <matt.schu...@metaarchive.org> wrote:
> > Hi Everybody,
> > Below is a starter agenda for the next all-groups meeting on the
> > development of a RESTful Bag Server (https://github.com/acdha/restful-bag-server) scheduled for Friday, May 13th at 3pm ET/2pm CT/12pm PT.
> > Call-in Information is: 1-270-400-2000, 282929#.
> > This will be a great call for groups or individuals who have not yet
> > participated but are interested in this set of work to drop in and say
> > hi.
> > Starter Agenda (feel free to suggest additional items on the call):
> > 1. Brief welcome and catch-up for new callers
> > 2. Overview of a Java Bag Server - Mike Smorul
> > 3. Scheduling future calls
> > Looking forward to talking with those who can join.
> > On Apr 27, 7:51 am, "matt.schu...@metaarchive.org"
> > <matt.schu...@metaarchive.org> wrote:
> > > Hi Everybody,
> > > The next all-groups meeting scheduled for Friday, April 29th at 3-4pm
> > > ET has been re-scheduled for Friday, May 13th at 3-4pm ET. Call-in
> > > info is: 1-270-400-2000, 282929#.
> > > This call is open for new individuals and groups to attend to discuss
> > > on-going development of a specification for managing and tracking the
> > > availability of Bag-defined data for the purposes of replication with
> > > a view toward preservation. See the Github site for the current spec
> > > definition: https://github.com/acdha/restful-bag-server.
> > > There will be a request to add agenda items one week before the call.
> > > The final agenda will be posted a few days prior. Due to the limited time on
> > > the call to address open issues and progress toward development,
> > > please make every effort to review the Github site prior to the call,
> > > add an issue, or respond to the request for added agenda items.
> > > Notes from our previous meeting are below. Look forward to catching up
> > > in a couple of weeks.
> > > o Peter van Garderen clarified his written use case: https://github.com/acdha/restful-bag-server/blob/master/Use%20Cases/A...
> > > o Simple submission/dissemination internal repository exchange (SIP/AIP/DIP transformations) of Bag-based data
> > > o Peter and Chris Adams commented that the current spec is very well
> > > targeted towards this type of use case
> > > 2. Discussed Open Issues
> > > A. Small file transfers – Chris suggested tabling this issue because
> > > the range of potential solutions could probably be handled through
> > > some added language on good HTTP citizenship.
> > > o Server should support things like keep-alive, pipelining, etc.
> > > o May want to consider embedding some links in this section to
> > > educate on the principles of RESTful architecture
> > > B. Version handling – Chris confirmed that the best approach to handle
> > > the awareness of the most current version of an uploaded bag would be to
> > > dedicate a symbolic link location (/version/latest).
> > > o Chris asked if there were any strong objections to creating a new
> > > branch to bake in the versioning proposal that was approved on the
> > > 02/18 call – no objections. See here: https://github.com/acdha/restful-bag-server/tree/versioned-example
> > > o Folks are encouraged to look this over before merging
> > > C. Validation history – very briefly discussed the implementation of a
> > > resource for exposing details of the last full bag validity check (pass/fail).
> > > Mike Smorul had a question about making this available in the
> > > metadata
> > > o No clear determination on this was recorded on the call – may need
> > > to revisit briefly on 05/13
> > > D. Handling manifest changes – Chris and Mike Smorul suggested making
> > > the spec flexible enough to allow people to query for supported
> > > manifests and to re-upload files as needed.
> > > o Agreed also to accept any additional manifests but require at least
> > > one supported format and report a 409 conflict if the listed files
> > > don't match the standard md5 or sha-256.
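The manifest rule in item D can be sketched roughly as follows. This is only an illustration of the idea recorded in the notes, not the behaviour defined by the spec: the function names, the in-memory `files` mapping, and the 400 status for "no supported manifest at all" are assumptions made for the example; only the 409 for mismatched files comes from the notes.

```python
import hashlib

# Algorithms this hypothetical server can verify (the set is an assumption).
SUPPORTED = {"md5": hashlib.md5, "sha256": hashlib.sha256}


def parse_manifest(text):
    """Parse BagIt-style manifest lines ("<checksum> <path>") into {path: checksum}."""
    entries = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        checksum, path = line.split(None, 1)
        entries[path.strip()] = checksum.lower()
    return entries


def check_manifests(manifests, files):
    """Check uploaded files against every manifest the server can verify.

    manifests: {algorithm name: manifest text}
    files:     {path: file content as bytes}
    Returns (status, problems). Manifests in unsupported formats are
    accepted but not checked, as long as one supported manifest exists.
    """
    verifiable = [name for name in manifests if name in SUPPORTED]
    if not verifiable:
        # Assumed status; the notes only say one supported format is required.
        return 400, ["no supported manifest format supplied"]
    problems = []
    for name in verifiable:
        for path, expected in parse_manifest(manifests[name]).items():
            if path not in files:
                problems.append(f"{name}: {path} is listed but was not uploaded")
            elif SUPPORTED[name](files[path]).hexdigest() != expected:
                problems.append(f"{name}: checksum mismatch for {path}")
    # 409 Conflict when listed files do not match, per the notes above.
    return (409, problems) if problems else (200, [])
```

A client that receives a 409 would re-upload the files named in `problems` and try again.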
> > > 3. Update on Testing Suite & Reference Server
> > > o Chris got started on a Python-based testing suite that conforms to
> > > the spec: https://github.com/acdha/restful-bag-server/blob/test-suite/tests.py
> > > o Chris indicated that he wanted to work on the reporting
> > > o Was planning on starting with a simple read-only server that would
> > > be a good proof of concept for folks developing custom clients for
> > > their environments that would want to interface
> > > o The reference server would eventually be driven by Python and
> > > should be flexible and extensible
> > > 4. Discussed Practical Next Steps
> > > o Chris invited folks to feel free to issue a pull request on the
> > > Github site if they are interested in lending him a hand: https://github.com/acdha/restful-bag-server
> > > o Matt inquired about promoting this work and the most reasonable
> > > time frames/venues - Consensus was that June-August might be most
> > > appropriate once tests of the spec have revealed themselves - Mike
> > > Giarlo mentioned Open Repositories and Curate Camp as potential venues
> > > o Questions about licensing of the spec arose – Chris from Library
> > > of Congress would have to inquire (perhaps BSD or GPL)
> > > o Agreed to check-in prior to the next call on May 13th on progress
> > > toward test suite implementation & reporting and implementation of the
> > > reference server
> > > On Mar 23, 2:07 pm, "matt.schu...@metaarchive.org"
> > > > Below is a loose agenda for the next all-groups meeting on the
> > > > development of a RESTful Bag Server (https://github.com/acdha/restful-bag-server) scheduled for Friday, March 25th at 3pm ET/2pm CT/12pm PT.
> > > > Call-in Information is: 1-270-400-2000, 282929#
Notes from our previous all groups RESTful Bag Server Meeting held on
Friday, May 13th are below. Apologies once again for the delay on
these. For those who were on the call, if I missed anything please
feel free to add to these notes.
Look forward to catching up tomorrow (Friday, 06/24) at 3pm ET,
1-270-400-2000, 282929#.
RESTful-Bag-Server Meeting 4
Minutes
05/13/2011
Attendees
1. Mike Burek (Chronopolis)
2. Esme Cowles (UCSD Libraries)
3. Declan Fleming (UCSD Libraries)
4. Mike Giarlo (Penn State)
5. Matt Schultz (Educopia)
6. Mike Smorul (Chronopolis)
7. Don Sutton (Chronopolis)
Minutes
The May call was brief and intended primarily as an opportunity for
new and interested groups or individuals to catch up on the
developments thus far. The call also provided an opportunity to check
in on Mike Smorul’s work to develop a useful tool that would layer
with the spec.
1. Brief Welcome & Catch-Up for New Callers
a. We were newly joined by Declan Fleming & Esme Cowles from the UCSD
Libraries
b. Matt provided a brief recap of the development work since last
December – goal has been to produce a spec for managing and tracking
the replication of BagIt-based data within and between preservation
repositories
c. Declan & Esme explained that they were interested in the
development work as collaborators with Chronopolis and perhaps for
other UCSD applications
d. Matt invited them to feel free to thoroughly review the github site
(https://github.com/acdha/restful-bag-server) and add a use case or
issue as desired
2. Overview of Java Bag Server
a. Mike Smorul gave a brief overview of his Java Bag Server – a set of
Java libraries and tools that provides a straightforward endpoint
directory setup for both creating and pushing/pulling bags
b. Mike mentioned that this will likely have direct applications for
Chronopolis, but it can be embedded in other local applications,
and he would encourage other groups on this project to play with it and
provide feedback (now available here:
http://adaptvm01.umiacs.umd.edu:8080/jenkins/job/Chronopolis%20Ingest...
source code available here: https://subversion.umiacs.umd.edu/ingestion/trunk/)
• Mike Giarlo (Penn State) mentioned that he had spent some brief time
with it prior to the call and though he couldn’t comment on the Java
framework or Penn’s implementation he thought it would be useful
c. Mike Smorul would especially appreciate Chris Adams’s feedback in
terms of its integrations with the proposed spec
d. Matt inquired about the timeline for its dissemination, and Mike
requested time to do some debugging and think about licensing (now
available here: http://adaptvm01.umiacs.umd.edu:8080/jenkins/job/Chronopolis%20Ingest...
source code available here: https://subversion.umiacs.umd.edu/ingestion/trunk/)
3. Additional Items Covered
a. Chris Adams was unable to attend but requested feedback from the
group on Mike Smorul’s proposal to relax the upload ordering
constraints to only require the manifest and files be “committed” upon
a final PUT – until then changes should probably be allowed (see Issue
#20: https://github.com/acdha/restful-bag-server/issues/20)
• Mike Smorul expanded on this discussion in regards to flexibility of
DELETE (see Issue #22: https://github.com/acdha/restful-bag-server/issues/22)
• Mike Smorul documented a proposed implementation (see Issue #23:
https://github.com/acdha/restful-bag-server/issues/23)
b. Ed Summers mentioned on the call that the BagIt spec has its own
notion of “commit” in terms of final validation of a manifest – should
revisit – may or may not have bearing
• Followed up after the call, based on Mike Smorul's comments, by posting an
Issue to the github site (see Issue #24: https://github.com/acdha/restful-bag-server/issues/24)
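To make the relaxed-ordering idea in item 3a easier to discuss, here is a minimal in-memory sketch: files and the manifest may be uploaded, replaced, or deleted in any order, and nothing is validated until a final commit. The class and method names, the use of SHA-256, and the choice to raise an exception after commit are all assumptions made for illustration; none of this is taken from the spec or from Issues #20, #22, or #23.

```python
import hashlib


class BagCommittedError(Exception):
    """Raised when a change is attempted after the bag has been committed."""


class StagedBag:
    """Hypothetical bag that accepts changes in any order until commit()."""

    def __init__(self):
        self.files = {}      # path -> content bytes
        self.manifest = {}   # path -> expected SHA-256 hex digest
        self.committed = False

    def _require_open(self):
        if self.committed:
            raise BagCommittedError("bag is committed; no further changes allowed")

    def put_file(self, path, content):
        self._require_open()
        self.files[path] = content  # re-uploading simply replaces the content

    def delete_file(self, path):
        self._require_open()
        self.files.pop(path, None)

    def put_manifest(self, entries):
        self._require_open()
        self.manifest = dict(entries)

    def commit(self):
        """Validate files against the manifest; commit only if they agree.

        Returns a list of problems. An empty list means the bag is committed.
        """
        self._require_open()
        problems = []
        for path, expected in self.manifest.items():
            if path not in self.files:
                problems.append(f"missing file: {path}")
            elif hashlib.sha256(self.files[path]).hexdigest() != expected:
                problems.append(f"checksum mismatch: {path}")
        for path in self.files:
            if path not in self.manifest:
                problems.append(f"file not listed in manifest: {path}")
        if not problems:
            self.committed = True
        return problems
```

In this model a failed commit leaves the bag open, so the client can fix the reported files and commit again.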
4. Scheduling Future Calls
a. Matt checked in with the group on the monthly meeting schedule –
whether we needed to change dates/times or frequency of the calls?
• Agreed that for now the last Friday of each month at 3pm ET was
still working out
b. June call was scheduled for Friday, 06/24 at 3pm ET
(1-270-400-2000, 282929#)
All best,
Matt Schultz
Collaborative Services Librarian
Educopia Institute, MetaArchive Cooperative
http://www.metaarchive.org matt.schu...@metaarchive.org
616-566-3204
On Jun 22, 3:15 pm, "matt.schu...@metaarchive.org"
<matt.schu...@metaarchive.org> wrote:
> Hi Everybody,
> Below is a starter agenda for the next all-groups meeting on the
> development of a RESTful Bag Server (https://github.com/acdha/restful-bag-server) scheduled for Friday, June 24th at 3pm ET/2pm CT/12pm PT.
> Call-in Information is: 1-270-400-2000, 282929#.
> Feel free to suggest additional items before or on the call:
> Looking forward to talking with those who can join.
> All best,
> Matt Schultz
> Collaborative Services Librarian
> Educopia Institute, MetaArchive Cooperative
> http://www.metaarchive.org matt.schu...@metaarchive.org
> 616-566-3204
> On Jun 15, 4:11 pm, "matt.schu...@metaarchive.org"
> <matt.schu...@metaarchive.org> wrote:
> > Hi Everybody,
> > A big thanks to Mike Smorul for tapping me on the shoulder to inquire
> > about this month's scheduled call to discuss the RESTful Bag Server
> > (https://github.com/acdha/restful-bag-server) development.
> > This month's call is slated for next week Friday, June 24th at 3pm ET.
> > Call-in info is 1-270-400-2000, 282929#.
> > Consider this an open call for agenda items. There are a couple of
> > relevant threads that have started (see below) since the previous call
> > that was held on Friday, May 13th. We can definitely follow up on
> > those. Notes from the May 13th call will be available shortly -
> > apologies for the delay.
> > I'll shoot out a tentative agenda by COB next week Wednesday, June
> > 22nd once we've had a chance to hear from folks. Look forward to
> > catching up.
> > On May 12, 4:17 pm, "matt.schu...@metaarchive.org"
> > <matt.schu...@metaarchive.org> wrote:
> > > Hi Everybody,
> > > Below is a starter agenda for the next all-groups meeting on the
> > > development of a RESTful Bag Server (https://github.com/acdha/restful-bag-server) scheduled for Friday, May 13th at 3pm ET/2pm CT/12pm PT.
> > > Call-in Information is: 1-270-400-2000, 282929#.
> > > This will be a great call for groups or individuals who have not yet
> > > participated but are interested in this set of work to drop in and say
> > > hi.
> > > Starter Agenda (feel free to suggest additional items on the call):
> > > 1. Brief welcome and catch-up for new callers
> > > 2. Overview of a Java Bag Server - Mike Smorul
> > > 3. Scheduling future calls
> > > Looking forward to talking with those who can join.
> > > > The next all-groups meeting scheduled for Friday, April 29th at 3-4pm
> > > > ET has been re-scheduled for Friday, May 13th at 3-4pm ET. Call-in
> > > > info is: 1-270-400-2000, 282929#.
> > > > This call is open for new individuals and groups to attend to discuss
> > > > on-going development of a specification for managing and tracking the
> > > > availability of Bag-defined data for the purposes of replication with
> > > > a view toward preservation. See the Github site for the current spec
> > > > definition: https://github.com/acdha/restful-bag-server.
> > > > There will be a request to add agenda items one week before the call.
> > > > The final agenda will be posted a few days prior. Due to the limited time on
> > > > the call to address open issues and progress toward development,
> > > > please make every effort to review the Github site prior to the call,
> > > > add an issue, or respond to the request for added agenda items.
> > > > Notes from our previous meeting are below. Look forward to catching up
> > > > in a couple of weeks.