[ANN] rat, an extension to the tar archive, allowing random file access

Máximo Cuadros

Apr 21, 2015, 6:21:52 AM
to golan...@googlegroups.com
Hi folks,

I want to share this small library:
rat is an extension to the classic tar archive, focused on allowing
constant-time random file access at the cost of a linear increase in
memory consumption. tar (tape archive) was originally developed to
write and read streamed sources, which makes random access to its
content very inefficient.
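
To illustrate the general idea of indexed access to a tar, here is a minimal sketch using only Go's standard library. It is not rat's on-disk format or API: the names `entry`, `buildIndex`, `readFile`, `lookup` and `demoArchive` are made up for this example, and the index lives only in memory. It scans the archive once, records where each regular file's data starts, and then reads a single member through an `io.SectionReader` without touching the entries before it.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io"
)

// entry records where one member's bytes live inside the archive.
type entry struct {
	Offset int64 // byte offset of the file data within the tar
	Size   int64 // length of the file data
}

// buildIndex scans the archive once. archive/tar does not buffer ahead, so
// after tr.Next returns, the underlying reader sits at the start of the
// member's data and its current position can be recorded.
func buildIndex(rs io.ReadSeeker) (map[string]entry, error) {
	idx := make(map[string]entry)
	tr := tar.NewReader(rs)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return idx, nil
		}
		if err != nil {
			return nil, err
		}
		if hdr.Typeflag != tar.TypeReg {
			continue // only index regular files in this sketch
		}
		off, err := rs.Seek(0, io.SeekCurrent)
		if err != nil {
			return nil, err
		}
		idx[hdr.Name] = entry{Offset: off, Size: hdr.Size}
	}
}

// readFile fetches one member directly by offset, skipping everything else.
func readFile(ra io.ReaderAt, idx map[string]entry, name string) ([]byte, error) {
	e, ok := idx[name]
	if !ok {
		return nil, fmt.Errorf("%s: not in index", name)
	}
	return io.ReadAll(io.NewSectionReader(ra, e.Offset, e.Size))
}

// lookup is a convenience wrapper for the demo: index, then read one member.
// A real tool would build the index once and reuse it.
func lookup(data []byte, name string) (string, error) {
	idx, err := buildIndex(bytes.NewReader(data))
	if err != nil {
		return "", err
	}
	b, err := readFile(bytes.NewReader(data), idx, name)
	return string(b), err
}

// demoArchive builds a small in-memory tar so the sketch is self-contained.
func demoArchive() []byte {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	for _, f := range []struct{ name, body string }{
		{"a.txt", "alpha"},
		{"b.txt", "bravo!"},
	} {
		hdr := &tar.Header{Name: f.name, Typeflag: tar.TypeReg, Mode: 0o644, Size: int64(len(f.body))}
		if err := tw.WriteHeader(hdr); err != nil {
			panic(err)
		}
		if _, err := tw.Write([]byte(f.body)); err != nil {
			panic(err)
		}
	}
	if err := tw.Close(); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

func main() {
	body, err := lookup(demoArchive(), "b.txt")
	if err != nil {
		panic(err)
	}
	fmt.Println(body)
}
```

The map holds one small record per member, which matches the trade-off described above: lookups take constant time, while memory grows linearly with the number of files.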

Based on our benchmarks, we found that rat is 4x to 60x faster than
the classic tar file, on both SSD and HDD, when reading a single file
from a tar archive.

https://github.com/mcuadros/go-rat

Any feedback is more than welcome!

Best regards

Tamás Gulácsi

Apr 21, 2015, 7:40:43 AM
to golan...@googlegroups.com
Hi,

As I see it, you've replicated the functionality of zip, without its standardness. What are the use cases?

Máximo Cuadros

Apr 21, 2015, 8:05:50 AM
to Tamás Gulácsi, golan...@googlegroups.com
To be honest, I didn't know that zip has a feature for random access to
files in an archive without reading the whole archive.
Reading about it now, I see that my implementation is very similar to
the central directory. Thanks!
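
For comparison, this is roughly what random access through the zip central directory looks like with Go's `archive/zip`. It is a sketch under assumptions: the archive is built in memory, and `demoZip` and `readOne` are hypothetical helper names. `zip.NewReader` takes an `io.ReaderAt` and populates `File` from the central directory; a member's data is only read when `Open` is called on it.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
)

// demoZip builds a small in-memory zip archive for the example.
func demoZip() []byte {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	for _, f := range []struct{ name, body string }{
		{"a.txt", "alpha"},
		{"b.txt", "bravo!"},
	} {
		w, err := zw.Create(f.name)
		if err != nil {
			panic(err)
		}
		if _, err := io.WriteString(w, f.body); err != nil {
			panic(err)
		}
	}
	if err := zw.Close(); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

// readOne opens a single member by name. The list in zr.File comes from the
// central directory, so other members' data is not read to find this one.
func readOne(data []byte, name string) (string, error) {
	zr, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return "", err
	}
	for _, f := range zr.File {
		if f.Name != name {
			continue
		}
		rc, err := f.Open()
		if err != nil {
			return "", err
		}
		defer rc.Close()
		b, err := io.ReadAll(rc)
		return string(b), err
	}
	return "", fmt.Errorf("%s: not found in central directory", name)
}

func main() {
	body, err := readOne(demoZip(), "b.txt")
	if err != nil {
		panic(err)
	}
	fmt.Println(body)
}
```

The loop over `zr.File` is linear in the number of members; a program doing many lookups could copy the entries into a map once.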

BTW, we have about 10TB of tars, so we cannot simply move to zip.

Best regards

Gulácsi Tamás

Apr 21, 2015, 8:15:03 AM
to Máximo Cuadros, golan...@googlegroups.com
http://www.artpol-software.com/ZipArchive/KB/0610051629.aspx
Go supports Zip64 nicely, so 10TB is not an excuse :)

Máximo Cuadros

Apr 21, 2015, 8:19:44 AM
to Gulácsi Tamás, golan...@googlegroups.com
10TB spread across millions of tars, so I won't move everything to zip. :D

Brandon Philips

Apr 23, 2015, 10:27:31 PM
to Máximo Cuadros, golan...@googlegroups.com, Gulácsi Tamás

We have been thinking about this problem in the application container spec [1]. The other issue is random access into the gzip-compressed stream of a tar.gz.

To that end we have two tools:

github.com/coreos/gzran
github.com/vbatts/tar-split

The idea is to have a secondary index alongside the tar.gz.

Brandon

[1] github.com/appc/spec

andrewc...@gmail.com

Apr 23, 2015, 10:45:59 PM
to golan...@googlegroups.com
How is this implemented? Is the first entry an offset to a special index file or something?

Brandon Philips

Apr 24, 2015, 1:49:21 AM
to andrewc...@gmail.com, golan...@googlegroups.com
The indexes would be external in our initial use case:
https://github.com/coreos/rkt/issues/544
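
As a rough picture of what an external index could look like, the sketch below serializes a list of member offsets to JSON and loads it back into a map. Everything here is an assumption made for illustration: the `Entry` type, its field names, and the JSON layout are hypothetical and are not the format used by gzran, tar-split, or rkt. For a tar.gz, offsets into the uncompressed stream would not be enough on their own, since the index would also need a way to resume decompression partway through the gzip stream.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Entry is a hypothetical index record: where a member's data starts in the
// uncompressed tar stream and how long it is.
type Entry struct {
	Name   string `json:"name"`
	Offset int64  `json:"offset"`
	Size   int64  `json:"size"`
}

// encodeIndex serializes the index so it can be stored next to the archive,
// leaving the tar itself untouched.
func encodeIndex(entries []Entry) ([]byte, error) {
	return json.Marshal(entries)
}

// decodeIndex loads a stored index into a name-keyed map for direct lookup.
func decodeIndex(raw []byte) (map[string]Entry, error) {
	var entries []Entry
	if err := json.Unmarshal(raw, &entries); err != nil {
		return nil, err
	}
	byName := make(map[string]Entry, len(entries))
	for _, e := range entries {
		byName[e.Name] = e
	}
	return byName, nil
}

func main() {
	raw, err := encodeIndex([]Entry{
		{Name: "a.txt", Offset: 512, Size: 5},
		{Name: "b.txt", Offset: 1536, Size: 6},
	})
	if err != nil {
		panic(err)
	}
	idx, err := decodeIndex(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(idx["b.txt"].Offset, idx["b.txt"].Size)
}
```

Keeping the index in a separate file means the archive stays a plain tar that any existing tool can read.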

Gomgoru Koee

Apr 26, 2015, 1:39:08 PM
to golan...@googlegroups.com, tgula...@gmail.com, mcua...@gmail.com
Brandon, have you looked at using the ECMA 208 standard (it's an ISO standard as well)? It does most of what you're trying to get done right now. And it's easily extended, too.

There are a lot of defined standards that are being "reinvented" because no one is checking for them... sigh.

Brandon Philips

Apr 30, 2015, 11:36:28 AM
to Gomgoru Koee, Máximo Cuadros, Gulácsi Tamás, golan...@googlegroups.com

There are tons of little-known archive formats like ECMA 208 or dar.

But we decided we would rather tell the user to just use tar, which has bindings and implementations everywhere, and we will do the fancy stuff out of band in tooling.

Thanks,

Brandon
