archive struct clone support


ben.tra...@gmail.com

Jan 28, 2018, 10:14:14 PM
to libarchive-discuss
Hi all,

To support random reads, the ability to deep clone an archive struct seems like it would be helpful. The client application could then generate cloned archive and archive entry pairs at interesting positions in the source archive, decompressing the data at these positions on demand multiple times in the future using archive_read_data. Does this feature seem like a useful addition?

Cheers,
Ben

Tim Kientzle

Jan 28, 2018, 10:19:33 PM
to ben.tra...@gmail.com, libarchiv...@googlegroups.com


> On Jan 28, 2018, at 2:19 PM, ben.tra...@gmail.com wrote:
>
> Hi all,
>
> To support random reads, the ability to deep clone an archive struct seems like it would be helpful. The client application could then generate cloned archive and archive entry pairs at interesting positions in the source archive, decompressing the data at these positions on demand multiple times in the future using archive_read_data. Does this feature seem like a useful addition?

It's an interesting approach, and one I hadn't considered.

Have you tried implementing it?

I presume that you would not try to support reading an archive from stdin.

I'm curious how you would handle reading a tar.gz archive from a file?

Tim



ben.tra...@gmail.com

Jan 30, 2018, 10:01:52 PM
to libarchive-discuss
I started looking into this today. It looks like we could clone struct archive_read, although that would require adding a few size members so that existing arrays whose lengths are not currently recorded can be copied. At what point do the functions that pull data off the disk get executed? Is a default filter created at the end of the chain to perform this step? The layout of the private struct archive_read_disk appears to differ from that of archive_read, yet both use the same public struct archive. How does this part work?
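To illustrate the "size members" point, here is a minimal sketch of deep-copying an array once its length is stored next to it. The names sketch_node, sketch_reader, node_count and sketch_reader_copy_nodes are hypothetical and chosen for illustration; they are not libarchive's actual fields or functions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical element type; stands in for whatever the array holds. */
struct sketch_node {
    long long begin_position;
    long long total_size;
};

/* Hypothetical owner struct: node_count is the added size member. */
struct sketch_reader {
    struct sketch_node *nodes;
    size_t node_count;
};

/* Copy src's array into dst. Returns 0 on success, -1 on failure. */
int sketch_reader_copy_nodes(struct sketch_reader *dst,
                             const struct sketch_reader *src)
{
    assert(dst != NULL && src != NULL);
    dst->nodes = NULL;
    dst->node_count = 0;
    if (src->node_count == 0)
        return 0;
    /* Guard against overflow in the size computation. */
    if (src->node_count > SIZE_MAX / sizeof(*src->nodes))
        return -1;
    dst->nodes = malloc(src->node_count * sizeof(*src->nodes));
    if (dst->nodes == NULL)
        return -1;
    memcpy(dst->nodes, src->nodes, src->node_count * sizeof(*src->nodes));
    dst->node_count = src->node_count;
    return 0;
}
```

A flat memcpy like this is only enough when the elements hold no pointers of their own; elements with nested pointers would need a per-element copy.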

ben.tra...@gmail.com

Jan 31, 2018, 10:06:56 AM
to libarchive-discuss
Never mind, I see how the data is pulled off the disk now. It looks like the system does use a filter at the end of the chain, but the read function it uses does not depend on the code that requires the archive struct to have the archive_read_disk layout.

What if we cloned archive_read? Assuming that the file descriptors for the contained structs were correctly duplicated, it looks like things should work as expected. Cloning archive_read_filter.data and archive_read_data_node.data is tricky because they point to structs that contain nested pointers, file descriptors and other things that need to be cloned carefully.

If, instead of using a void pointer in these places, we used a pointer to a new 'cloneable' struct with a vtable initialized according to the actual data, we could write a generic clone function for archive_read_filter. Once archive_read_filter is cloneable, we could clone archive_read and then archive. Does this sound feasible?
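A rough sketch of the 'cloneable' idea, to make the proposal concrete. Everything here is hypothetical: cloneable, cloneable_vtable, sketch_filter and buffer_state are names invented for illustration, and sketch_filter only stands in for archive_read_filter with its void pointer replaced by a typed one. It is not how libarchive is actually laid out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct cloneable;

/* Per-type operations, chosen when the concrete state is created. */
struct cloneable_vtable {
    struct cloneable *(*clone)(const struct cloneable *self);
    void (*destroy)(struct cloneable *self);
};

/* Base struct; must be the first member of every concrete state. */
struct cloneable {
    const struct cloneable_vtable *vt;
};

/* Stand-in for a filter: typed pointer instead of void *data. */
struct sketch_filter {
    struct cloneable *data;
    struct sketch_filter *upstream;
};

/* Example concrete state with a nested heap pointer. */
struct buffer_state {
    struct cloneable base;
    unsigned char *buf;
    size_t len;
    size_t pos;
};

struct cloneable *buffer_state_clone(const struct cloneable *self);
void buffer_state_destroy(struct cloneable *self);

static const struct cloneable_vtable buffer_state_vt = {
    .clone = buffer_state_clone,
    .destroy = buffer_state_destroy
};

struct cloneable *buffer_state_new(const void *bytes, size_t len, size_t pos)
{
    struct buffer_state *s = malloc(sizeof(*s));
    if (s == NULL)
        return NULL;
    s->buf = malloc(len > 0 ? len : 1);
    if (s->buf == NULL) {
        free(s);
        return NULL;
    }
    if (len > 0)
        memcpy(s->buf, bytes, len);
    s->len = len;
    s->pos = pos;
    s->base.vt = &buffer_state_vt;
    return &s->base;
}

/* Deep copy: the clone gets its own buffer, not a shared pointer. */
struct cloneable *buffer_state_clone(const struct cloneable *self)
{
    const struct buffer_state *src = (const struct buffer_state *)self;
    return buffer_state_new(src->buf, src->len, src->pos);
}

void buffer_state_destroy(struct cloneable *self)
{
    struct buffer_state *s = (struct buffer_state *)self;
    free(s->buf);
    free(s);
}

/* Free a whole filter chain, including each filter's state. */
void sketch_filter_free(struct sketch_filter *f)
{
    while (f != NULL) {
        struct sketch_filter *next = f->upstream;
        if (f->data != NULL)
            f->data->vt->destroy(f->data);
        free(f);
        f = next;
    }
}

/* Generic clone: walks the chain and defers to each state's vtable. */
struct sketch_filter *sketch_filter_clone(const struct sketch_filter *src)
{
    struct sketch_filter *head = NULL, **tail = &head;
    for (; src != NULL; src = src->upstream) {
        struct sketch_filter *f = calloc(1, sizeof(*f));
        if (f == NULL)
            goto fail;
        *tail = f;
        tail = &f->upstream;
        if (src->data != NULL) {
            f->data = src->data->vt->clone(src->data);
            if (f->data == NULL)
                goto fail;
        }
    }
    return head;
fail:
    sketch_filter_free(head);
    return NULL;
}
```

The generic clone function never needs to know what a filter's state contains; each state type supplies its own deep copy. One caveat for state that owns a file descriptor: a descriptor duplicated with dup() shares its file offset with the original, so a clone function for that kind of state would probably need to reopen the file or use positioned reads to keep the clones independent.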