HathiTrust Ingest Tools

Jeremy York

Feb 12, 2013, 5:14:42 PM
to hathitru...@googlegroups.com
Summary of Meeting on next steps for HathiTrust Ingest Tools, 1/15/13

Introduction
The initial goal of the ingest tools was to provide access to the core system used for digital object ingest at the University of Michigan (on behalf of HathiTrust) by distributing the same code, so that institutions with some programmer resources could use the tools to prepare content for ingest into HathiTrust. We would like to learn what institutions' experiences have been, what issues they have encountered, and what would be useful going forward.

Challenges encountered
  1. Validation: files fail to validate against HathiTrust specifications; the problems have had to do with vendor-produced files.
  2. Lack of resources (e.g., time) or other issues (e.g., installation difficulty, system compatibility) are impediments to exploring or using the tools.

Suggestions
  1. HathiTrust should make specific commands (e.g., for creating conforming JP2s with Kakadu) available along with the ingest tools.
  2. To the extent HathiTrust knows the common pitfalls vendors encounter in meeting the specifications, make this information available to vendors.
  3. Keep the bar low for institutions using the tools.
  4. Create out-of-the-box support for institutions to convert the outputs of common vendors, such as Kirtas, to HathiTrust-compliant files and volumes.
  5. Create a graphical user interface for the tools.
  6. Create a single-image validator where institutions can upload an image and get a report on where the file fails validation.
  7. Develop a cloud-based service where institutions can upload entire volumes, which would be validated for compliance with HathiTrust specifications; a log containing the validation errors would be generated.
    1. This would require institutions to use a standard submission package for the cloud-based system, which would specify, for instance, the directory structure.

We discussed the relative importance of developing tools for institutions to use versus producing well-documented specifications that institutions could use to build their own tools. The sense was that both are important, and that the specifications are particularly important for vendors.

There was further discussion to clarify and get feedback on suggestions 6 and 7 above. Everyone agreed that both would be great to have. Some participants said there would be disappointment if only #7 were offered and not #6.

Outcomes
Based on the meeting and after further discussion, staff at Michigan are pursuing the following course:

  1. Posting additional image specifications as soon as possible, including specific instructions for creating conforming JP2 images using Kakadu.
  2. Proceeding with developing a cloud-based service where entire volumes or single images (JP2 or TIFF) can be uploaded and logs received on compliance with HathiTrust specifications. The whole-volume validation will encompass validation of volume structure, which single-image validation will not.
  3. In some cases (e.g., for well-known vendors such as Kirtas), it may be possible to offer remediation through the cloud-based service, so that files could be submitted by institutions, run through validation and remediation, and then submitted directly for ingest into HathiTrust. Michigan will explore implementing this option, initially targeting Kirtas-digitized content.
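
To illustrate the distinction in item 2 between single-image and whole-volume validation, here is a minimal sketch in Python. It is not the HathiTrust validator and does not inspect image contents. The eight-digit file naming rule, the accepted extensions, the function names, and the log format are all assumptions made for illustration.

```python
import re
from pathlib import Path

# Hypothetical rules for illustration only; not the HathiTrust specification.
IMAGE_EXTENSIONS = {".jp2", ".tif"}
NAME_PATTERN = re.compile(r"^\d{8}$")  # assumed: eight-digit sequence names


def check_image_name(path):
    """Single-file check: return a list of error strings for one file."""
    path = Path(path)
    errors = []
    if path.suffix.lower() not in IMAGE_EXTENSIONS:
        errors.append(f"{path.name}: unexpected extension {path.suffix!r}")
    if not NAME_PATTERN.match(path.stem):
        errors.append(f"{path.name}: name is not an eight-digit sequence number")
    return errors


def check_volume(directory):
    """Whole-volume check: per-file checks plus gaps in the page sequence."""
    files = sorted(p for p in Path(directory).iterdir() if p.is_file())
    log = []
    for p in files:
        log.extend(check_image_name(p))

    # Structural check that a single-image validator cannot perform:
    # pages are assumed to be numbered consecutively starting at 1.
    numbers = sorted({int(p.stem) for p in files if NAME_PATTERN.match(p.stem)})
    if not numbers:
        log.append("volume: no page images found")
        return log
    for n in sorted(set(range(1, numbers[-1] + 1)) - set(numbers)):
        log.append(f"volume: missing page {n:08d}")
    return log
```

Calling `check_volume` on a directory returns the log as a list of strings, with an empty list meaning no errors were found under these assumed rules. A real service would add format-level checks on each JP2 or TIFF file.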

When we released the ingest tools, we created a discussion list for issues surrounding HathiTrust ingest. If you have not already done so, please join this group to facilitate notifications, discussion, and feedback as we move forward with the steps above: https://groups.google.com/forum/?fromgroups#!forum/hathitrust-ingest. We will be in touch via the list with progress updates, and we welcome additional input and feedback at any time.
Please feel free to post to the list, or email lit-cs...@umich.edu for help with ingest.