Hi folks,
I have a question that’s part opinion poll, part “is anyone doing this?” I hope it fits the intent of this list; if not, apologies, and I’ll find another venue to ask.
Background: We’re using DART for all of our packaging & validating when prepping deposits to APTrust, but not for all of our uploads. I haven’t had great luck with larger bags (say, 500GB+), so I’ve taken to shutting off upload targets in DART for big bags, bagging & validating, then using CyberDuck to upload.
As our footprint increases and I eye future space needs for both our backlog and our ongoing production, I’ve been thinking about compression.
Real question: Is anyone using gzip to compress bags before upload to save space? As far as I can see, it’s lossless compression, so it should not compromise the content. And it seems like an easy/obvious way to cut back on storage footprint by 70-90% (as advertised – I have yet to test). But I haven’t heard it mentioned in these circles in the couple of years we’ve been onboard, and I’m always wary of “easy/obvious” solutions to problems. 😊
In another life I used it all the time working tech support for a long-since-gobbled-up server & solutions company to send patches to customers, with no ill effects that I ever saw. Just wondering if there’s something in this arena I’m not thinking of (related to preservation practice, or some technical gotcha) or if I should just start doing it and see how much space it saves.
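On the lossless point, here is a minimal sketch of how the claim could be checked rather than taken on faith: compress some bytes with gzip, decompress them, and compare SHA-256 checksums before and after. The function names and the sample payload are made up for illustration. It only shows that the bytes survive a round trip; it says nothing about whether a repository will accept a compressed bag.

```python
import gzip
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def gzip_round_trip_ok(payload: bytes) -> bool:
    """Compress and decompress payload, then compare checksums."""
    compressed = gzip.compress(payload)
    restored = gzip.decompress(compressed)
    return sha256_hex(restored) == sha256_hex(payload)


# Hypothetical stand-in for a file from a bag's payload.
sample = b"example bag payload\n" * 1000
print(gzip_round_trip_ok(sample))
```

For real files, the same comparison could be made against the checksums already recorded in the bag's manifests.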
Thanks,
Michael
_________________________________________________________________________
Michael Dulock
(he/him/his)
Associate Professor
Digital Asset Librarian
Lead, Digital Asset Management and Production Services
Digitization, Description, & Discovery Services
Research & Innovation Strategies
University of Colorado Boulder Libraries
184 UCB, 1720 Pleasant Street
Boulder, Colorado 80309-0184
Phone: 303-492-5518
E-mail: michael...@colorado.edu
http://www.colorado.edu/libraries/
_________________________________________________________________________
--
You received this message because you are subscribed to the Google Groups "DART User Group" group.
To view this discussion on the web visit https://groups.google.com/a/aptrust.org/d/msgid/dart-users/SJ0PR03MB5886A24F77C778AD621468E8E4A12%40SJ0PR03MB5886.namprd03.prod.outlook.com.
Hi Josh,
Sorry for the late response.
I haven’t done any testing yet; I figured I’d ask about feasibility first, just to see if there was a relatively easy way to save space over the long term. Gzip “advertises” 70-90% compression, but even 50% would be a significant chunk for some bigger bags, especially media.
Our bags run the gamut from garden-variety TIFF/JPG image files and PDFs to quite a few time-based formats like WAV/MP3 & MOV/MP4. We started digitizing film a few years ago, and of course those files are big – demand for film has been consistent, and I’d say growing. (So few facilities can do it anymore.) There are some other odds & ends sometimes, but the stuff I’ve listed is the lion’s share.
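Since the savings will depend heavily on that mix of formats, a rough test along these lines might help. It streams a file through gzip and reports compressed size relative to original size. The function name, compression level, and the two synthetic test files are assumptions for illustration. Formats that are already compressed (JPG, MP3, MP4) typically shrink very little, while uncompressed TIFF or WAV may do better, so measuring a sample of each type would be more telling than the advertised figure.

```python
import os
import tempfile
import zlib


def gzip_ratio(path: str, chunk_size: int = 1 << 20) -> float:
    """Stream a file through gzip; return compressed size / original size."""
    # wbits=31 asks zlib to produce a gzip container.
    compressor = zlib.compressobj(6, zlib.DEFLATED, 31)
    original = compressed = 0
    with open(path, "rb") as fh:
        while chunk := fh.read(chunk_size):
            original += len(chunk)
            compressed += len(compressor.compress(chunk))
    compressed += len(compressor.flush())
    return compressed / original if original else 1.0


# Synthetic stand-ins: repetitive bytes compress well; random bytes
# (similar in effect to already-compressed media) barely compress.
samples = {
    "repetitive": b"\x00\x10\x20\x30" * 250_000,
    "random-like": os.urandom(1_000_000),
}
with tempfile.TemporaryDirectory() as tmp:
    for label, blob in samples.items():
        path = os.path.join(tmp, label)
        with open(path, "wb") as fh:
            fh.write(blob)
        print(f"{label}: {gzip_ratio(path):.1%} of original size")
```

Pointing `gzip_ratio` at a few representative files from each format would give a per-format estimate without compressing a whole bag.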
If it became an option, I would do some testing to see if it was worth the effort. For now I’ll just not worry about it. 😊
Thanks,
Michael
From: Joshua Allan Westgard <west...@umd.edu>
Sent: Wednesday, July 17, 2024 10:40 AM
To: Nathan Tallman <nathan....@aptrust.org>; DART User Group <dart-...@aptrust.org>
Cc: Michael J Dulock <Michael...@Colorado.EDU>
Subject: Re: [dart-users] APT deposits & gzip