Welcome

Nathan Tallman

Jul 8, 2024, 9:17:27 AM
to DART User Group
Dear DART Users,

Welcome to the DART User Group! We’re excited to have you join this community dedicated to using and enhancing the DART tool for packaging and transferring digital content for preservation.

As we kick off, please note that the group may be a bit quiet for the next few months as we are in the process of hiring and onboarding our new Lead Developer. In the meantime, we encourage you to use this mailing list as a resource. If you have questions, challenges, or tips related to DART, please don’t hesitate to share them here. Your contributions will help fellow users and foster a collaborative environment where we can all learn and grow together.

Thank you for being part of this community. We look forward to your active participation and the exciting developments ahead!

Best wishes,
Nathan Tallman
Executive Director, APTrust

Michael J Dulock

Jul 15, 2024, 4:27:15 PM
to DART User Group

Hi folks,

I guess I have sort of an opinion, or “is anyone doing this?”, question. I hope it fits the intent of this list. If it doesn’t, apologies, and I’ll find another venue to ask.

Background: We’re using DART for all of our packaging and validating when prepping deposits to APTrust, but not for all of our uploads. I haven’t had great luck with larger bags (say, 500 GB+), so for big bags I’ve taken to shutting off the upload targets in DART, bagging and validating, then using Cyberduck to upload.

As our footprint grows, and as I eye future space needs for both our backlog and our ongoing production, I’ve been thinking about compression.

Real question: Is anyone using gzip to compress their bags before upload to save space? As far as I can see, it’s lossless compression, so it should not compromise the content. And it seems like an easy/obvious way to cut back on storage footprint by 70-90% (as advertised – I have yet to test). But I haven’t heard it mentioned in these circles in the couple of years we’ve been onboard, and I’m always wary of “easy/obvious” solutions to problems. 😊

In another life I used it all the time working tech support for a long-since-gobbled-up server & solutions company to send patches to customers, with no ill effects that I ever saw. Just wondering if there’s something in this arena I’m not thinking of (related to preservation practice, or some technical gotcha) or if I should just start doing it and see how much space it saves.

 

Thanks,

Michael

_________________________________________________________________________
Michael Dulock
(he/him/his)
Associate Professor
Digital Asset Librarian
Lead, Digital Asset Management and Production Services

Digitization, Description, & Discovery Services
Research & Innovation Strategies
University of Colorado Boulder Libraries

184 UCB, 1720 Pleasant Street
Boulder, Colorado 80309-0184

Phone: 303-492-5518
E-mail: michael...@colorado.edu
http://www.colorado.edu/libraries/
_________________________________________________________________________

Nathan Tallman

Jul 15, 2024, 4:54:54 PM
to DART User Group, Michael J Dulock
Hi Michael,

Thanks for sending this message; it's good for this list. Just remember that this list is for DART users generally and not all list members are APTrust members. 

If you send a GZIP-compressed bag (.tar.gz or .tgz) to APTrust, it will be rejected. That's because we're not set up to detect whether a TAR is compressed; every TAR is assumed to be uncompressed. Right now, ingest services can't handle a compressed TAR or even tell what it is. We've discussed using GZIP on TARs before, but not in several years. If you want APTrust to be able to accept GZIPped TARs, we can talk about that in Advisory.

Of course, you're free to compress individual data/payload files before bagging, but they will then be preserved in that compressed form in APTrust.
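For depositors who want to confirm that a TAR file has not been GZIP-compressed before uploading it, a minimal Python sketch follows. It only looks at the first two bytes of the file, which are `1f 8b` for any gzip stream. The function name `is_gzip_compressed` is hypothetical, and this is a local pre-upload check under that assumption, not a description of how APTrust ingest inspects files.

```python
GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of every gzip stream


def is_gzip_compressed(path):
    """Return True if the file at `path` starts with the gzip magic number.

    Hypothetical helper for a local pre-upload check; it does not
    validate the rest of the file or the bag inside it.
    """
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC
```

Calling `is_gzip_compressed("my_bag.tar")` on a plain, uncompressed TAR should return `False`; a `True` result suggests the file was gzipped at some point, whatever its extension says.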

Thanks,
Nathan

--
Nathan Tallman
Executive Director, APTrust


Joshua Allan Westgard

Jul 17, 2024, 12:40:15 PM
to Nathan Tallman, DART User Group, Michael J Dulock
Hi Michael,

We don't compress our bags, but we do have a workflow that involves bagging locally and uploading in a separate step. We do this with DART Runner (with a configuration file set for "no upload"). This bag > pause > upload workflow allows us to do any last checks before we transfer the bag into the receiving bucket. In our case that includes comparing the contents of the manifest to a previously created batch inventory file to ensure everything is complete and correct. We use the APTrust Partner Tools command-line script to do the final upload.
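A manifest-versus-inventory check of the kind described above could be sketched in Python roughly as follows. This is an illustration, not the actual workflow from this message. It assumes a BagIt payload manifest with one `<checksum> <path>` entry per line and a hypothetical inventory file that lists one expected payload path per line; the function names are made up for the example.

```python
def read_manifest_paths(manifest_path):
    """Collect payload paths from a BagIt manifest ("<checksum> <path>" per line)."""
    paths = set()
    with open(manifest_path, encoding="utf-8") as handle:
        for line in handle:
            parts = line.strip().split(None, 1)  # split on first whitespace run
            if len(parts) == 2:
                paths.add(parts[1])
    return paths


def read_inventory_paths(inventory_path):
    """Collect expected paths from the (assumed) one-path-per-line inventory."""
    with open(inventory_path, encoding="utf-8") as handle:
        return {line.strip() for line in handle if line.strip()}


def compare_manifest_to_inventory(manifest_path, inventory_path):
    """Report paths that appear in only one of the two files."""
    in_manifest = read_manifest_paths(manifest_path)
    in_inventory = read_inventory_paths(inventory_path)
    return {
        "missing_from_bag": sorted(in_inventory - in_manifest),
        "unexpected_in_bag": sorted(in_manifest - in_inventory),
    }
```

Two empty lists in the result would mean the bag's manifest and the inventory list the same payload paths. The sketch compares paths only; comparing checksums as well would need an inventory that records them.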

My question regarding compression is what sorts of media files are in your bags, and whether you've done a real-world comparison to see how much space you'd actually save. In my experience, the space savings from compression vary a lot depending on the files.
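One way to run such a comparison without writing any compressed files to disk is sketched below in Python. It walks a directory, gzip-compresses each file in memory in chunks, and totals original and compressed sizes per file extension. The function names are hypothetical, and compressing files one at a time only approximates what gzipping a whole TAR would yield. Formats that are already compressed (JPG, MP3, MP4, and similar) typically shrink very little.

```python
import zlib
from collections import defaultdict
from pathlib import Path


def gzip_compressed_size(path, chunk_size=1 << 20):
    """Return the size in bytes the file would have after gzip compression."""
    # wbits=31 asks zlib for gzip framing (header and trailer included).
    compressor = zlib.compressobj(level=6, wbits=31)
    total = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            total += len(compressor.compress(chunk))
    return total + len(compressor.flush())


def savings_by_extension(root):
    """Map each file extension under `root` to (original_bytes, gzip_bytes)."""
    totals = defaultdict(lambda: [0, 0])
    for path in Path(root).rglob("*"):
        if path.is_file():
            ext = path.suffix.lower() or "(none)"
            totals[ext][0] += path.stat().st_size
            totals[ext][1] += gzip_compressed_size(path)
    return {ext: (orig, comp) for ext, (orig, comp) in totals.items()}


def percent_saved(original, compressed):
    """Percentage of space saved; negative if gzip made the data larger."""
    return 0.0 if original == 0 else 100.0 * (1 - compressed / original)
```

Running `savings_by_extension` over the `data` directory of a representative bag and passing each pair to `percent_saved` gives a per-format picture of whether compression would be worth the effort.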

Josh

--
Joshua A. Westgard, MLS, PhD (he/him/his) | Systems Librarian
Digital Programs and Initiatives | University of Maryland Libraries
Affiliate Faculty | College of Information Studies (iSchool)
McKeldin Library | 7649 Library Ln | College Park, MD 20742-7011
www.lib.umd.edu | west...@umd.edu | +1-301-405-9136 office

Michael J Dulock

Jul 25, 2024, 12:55:02 PM
to Joshua Allan Westgard, Nathan Tallman, DART User Group

Hi Josh,

Sorry for the late response.

I haven’t done any testing yet; I figured I’d ask about feasibility first, just to see if there was a relatively easy way to save space over the long term. Gzip “advertises” 70-90% compression, but even at 50% that’s a significant chunk for some bigger bags, especially media.

Our bags run the gamut: garden-variety TIFF/JPG image files, PDFs, and quite a lot of time-based formats like WAV/MP3 and MOV/MP4. We started digitizing film a few years ago, and of course those files are big – demand for film has been consistent, and I’d say growing. (So few facilities can do it anymore.) There are some other odds & ends sometimes, but the stuff I’ve listed is the lion’s share.

If it became an option, I would do some testing to see if it was worth the effort. For now I’ll just not worry about it. 😊


Thanks,
Michael

 

From: Joshua Allan Westgard <west...@umd.edu>
Sent: Wednesday, July 17, 2024 10:40 AM
To: Nathan Tallman <nathan....@aptrust.org>; DART User Group <dart-...@aptrust.org>
Cc: Michael J Dulock <Michael...@Colorado.EDU>
Subject: Re: [dart-users] APT deposits & gzip

 
