--
You received this message because you are subscribed to the Google Groups "Digital Curation" group.
To unsubscribe from this group and stop receiving emails from it, send an email to digital-curati...@googlegroups.com.
To post to this group, send email to digital-...@googlegroups.com.
Visit this group at http://groups.google.com/group/digital-curation.
For more options, visit https://groups.google.com/groups/opt_out.
Hi Adam,
What is your “original JPG”? If it’s a compressed JPEG, why would you further compress it? If it’s uncompressed, then is it already a JPEG2000?
Ricky
Ricky Erway
Senior Program Officer
OCLC Research
San Mateo, CA USA
Hi Ricky,
In our case, "original JPG" means that a user has provided a compressed JPG when submitting files. If most archives are not converting these files because they're in an open format (regardless of lossiness), that's useful to know. We've mainly been following the guidance of the LOC (digitalpreservation.gov) and others, which shows a strong preference against using JPG as an archival format. Are most folks now using JPEG2k?
Gzip recovery tool: https://github.com/arenn/gzrt
It suggests using GNU cpio to extract data from recovered .tgz streams, since cpio will skip over unrecovered bytes.
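As a rough illustration of the simplest kind of recovery, the Python sketch below decodes a damaged or truncated gzip stream up to the first error and keeps whatever came out. The helper name `salvage_gzip_prefix` is hypothetical, and this is much cruder than gzrt: it does not try to resynchronize past the damaged region, so everything after the first bad byte is lost.

```python
import gzip
import zlib


def salvage_gzip_prefix(data: bytes, chunk_size: int = 4096) -> bytes:
    """Decode a gzip stream until the first error; return the bytes recovered.

    Hypothetical helper: unlike gzrt, it never resynchronizes after damage.
    """
    d = zlib.decompressobj(wbits=31)  # wbits=31: expect gzip header/trailer
    out = bytearray()
    try:
        for i in range(0, len(data), chunk_size):
            out += d.decompress(data[i:i + chunk_size])
        out += d.flush()  # tolerates a truncated stream, returns what is left
    except zlib.error:
        pass  # keep whatever was decoded before the damaged chunk
    return bytes(out)


if __name__ == "__main__":
    original = b"".join(b"record %d\n" % i for i in range(20000))
    packed = gzip.compress(original)
    partial = salvage_gzip_prefix(packed[: len(packed) // 2])  # simulate truncation
    print(len(partial), original.startswith(partial))
```

Smaller `chunk_size` values lose less data at the point of failure, because output from the chunk that raises the error is discarded.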
FEC (forward error correction) worth looking at: RaptorQ, which has some nice theoretical properties.
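RaptorQ itself is far too involved for a short example. As a much simpler stand-in that shows the basic FEC idea, the sketch below appends a single XOR parity block to k data blocks, so any one lost block can be rebuilt from the survivors. The functions `encode` and `recover` are hypothetical, and unlike a fountain code such as RaptorQ, single parity cannot repair more than one lost block.

```python
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-length byte blocks together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal blocks (zero-padded) and append one parity block."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    blocks = [padded[i * size:(i + 1) * size] for i in range(k)]
    return blocks + [xor_blocks(blocks)]


def recover(blocks: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one missing block (marked None) from the survivors."""
    missing = [i for i, b in enumerate(blocks) if b is None]
    if len(missing) > 1:
        raise ValueError("single parity can only repair one lost block")
    repaired = list(blocks)
    if missing:
        # The XOR of every surviving block (data + parity) equals the lost one.
        repaired[missing[0]] = xor_blocks([b for b in blocks if b is not None])
    return repaired
```

The caller has to remember the original length to strip the zero padding after reassembling the first k blocks.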
Access vs. compression & archiving:
Zip files do allow random access; however, all compression is done on a per-file basis, which means that for big collections of small files, the compression algorithm is barely getting started before each file is over. This becomes significant if the files have a lot in common (e.g. individual XML or JSON records, or emails stored in individual files).
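To make the per-file versus whole-stream difference concrete, the sketch below packs the same small, similar JSON records two ways: as a ZIP archive, where each member is deflated on its own, and as a gzipped tar, where one compression stream runs across all members. The record contents and helper names are made up for illustration; exact sizes depend on the zlib build, but the solid archive should come out noticeably smaller.

```python
import io
import json
import tarfile
import zipfile


def make_records(n):
    """Hypothetical small JSON records that share most of their structure."""
    return [
        (f"rec{i:05d}.json",
         json.dumps({"id": i, "collection": "digital-curation",
                     "format": "application/json", "status": "ingested"}).encode())
        for i in range(n)
    ]


def zip_size(records):
    """Size of a ZIP archive in which every member is deflated independently."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in records:
            zf.writestr(name, data)
    return len(buf.getvalue())


def solid_tgz_size(records):
    """Size of a .tar.gz, where one gzip stream runs across all members."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tf:
        for name, data in records:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    return len(buf.getvalue())


if __name__ == "__main__":
    recs = make_records(2000)
    print("zip (per-file):", zip_size(recs))
    print("tar.gz (solid):", solid_tgz_size(recs))
```

The trade-off is the one described above: the ZIP keeps cheap random access to each record, while the solid stream has to be decoded from the start.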
Bundling small files together still saves space even without compression, because partially used blocks make up a larger share of a small file's on-disk size. Also, SAM-FS (and other nearline systems that keep part of the file data around) is happier when it has fewer files to migrate.
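The block-rounding point is simple arithmetic. The sketch below assumes a hypothetical 4 KiB allocation block and ignores inode, directory, and tar header overhead, so it only shows the slack from partially used blocks.

```python
BLOCK = 4096  # hypothetical allocation block size in bytes


def allocated(size: int, block: int = BLOCK) -> int:
    """Bytes reserved on disk when space is handed out in whole blocks."""
    return -(-size // block) * block  # round up to a multiple of the block size


sizes = [300] * 1000  # hypothetical: 1,000 small records of 300 bytes each
separate = sum(allocated(s) for s in sizes)  # one file per record
bundled = allocated(sum(sizes))              # all records in one bundle file
print(separate, bundled)  # 4096000 303104
```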
Compressed streams only allow random access at the start of compressed blocks, which can be fairly large. If you record the offset of the desired file, you can seek to the start of its compressed block and decode sequentially from there. Given how much more expensive seeks are than sequential reads (on HDDs, at least), this may not be too big a penalty.
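One way to get the behaviour described above is to write the bundle as a series of independent gzip members and keep an index of where each file lives. The sketch below uses a hypothetical layout, not a standard archive format: the index maps each name to the byte offset of its member, its offset inside the decompressed member, and its length. Reading a file costs one seek plus sequential decoding of at most one member, so `per_member` trades compression ratio against how much has to be decoded per lookup.

```python
import gzip
import io
import zlib


def write_bundle(files, per_member=64):
    """Write (name, data) pairs as independent gzip members; return (blob, index).

    index: name -> (member_offset, offset_in_member, length). Hypothetical layout.
    """
    out = io.BytesIO()
    index = {}
    for start in range(0, len(files), per_member):
        member_offset = out.tell()  # where this compressed block begins
        payload = bytearray()
        for name, data in files[start:start + per_member]:
            index[name] = (member_offset, len(payload), len(data))
            payload += data
        out.write(gzip.compress(bytes(payload)))
    return out.getvalue(), index


def read_file(f, index, name):
    """Seek to the start of the file's member and decode sequentially from there."""
    member_offset, inner, length = index[name]
    f.seek(member_offset)
    d = zlib.decompressobj(wbits=31)  # gzip framing; stops at the member's end
    buf = bytearray()
    while len(buf) < inner + length and not d.eof:
        chunk = f.read(8192)
        if not chunk:
            break
        buf += d.decompress(chunk)
    return bytes(buf[inner:inner + length])


if __name__ == "__main__":
    files = [(f"f{i}", f"content of file {i}\n".encode()) for i in range(300)]
    blob, idx = write_bundle(files, per_member=50)
    print(read_file(io.BytesIO(blob), idx, "f123"))
```

Because concatenated gzip members form a valid gzip file, `gzip.decompress` (or `zcat`) can still read the whole blob front to back without the index.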