Alexandru <alexandr...@meshparts.de> wrote:
> I have a procedure that unpacks files given by a list of file paths from an archive like this:
>
> proc ::meshparts::AssemblyArchiveUnpack {zipfile {paths {}} {targetpaths {}}} {
You are creating confusion for yourself in the future with the name above.
A zip file is not a tar file, and a tar file is not a zip file (zip and tar
are two very different formats). Naming the variable 'zipfile' implies,
at first glance, a "zip" file, not a "tar" file.
>     set f [open $zipfile rb]
>     fconfigure $f -encoding binary -translation lf -eofchar {}
>     zlib push gunzip $f
>     if {[llength $paths]==0} {
>         set result [tar::untar $f -chan]
>     } else {
>         foreach path $paths targetpath $targetpaths {
>             set dir [file dirname $targetpath]
>             set code [catch {file mkdir $dir} err]
>             if {$code} {
>                 ::meshparts::message "*** [mc {%1$s} $err]" -errorlog 0
>                 continue
>             }
>             set result [tar::untar $f -file $path -dir $dir -chan]
>             seek $f 0
>         }
>     }
>     close $f
>     return 1
> }
If your tar file is indeed gzipped, as this line implies:
> zlib push gunzip $f
then simply doing this:
> seek $f 0
will not work, because seeking back to the beginning does not reset the
gunzip state to what it was when the file was first opened. That is
most likely why things are failing for you.
Try closing and reopening the file inside the loop. If that works,
then this was the cause.
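For reference, a minimal sketch of that close-and-reopen loop. It assumes Tcl 8.6 (for zlib push and try) and a Tcllib tar package that supports -chan, as your code already does. The proc name untarReopening and its argument layout are hypothetical, and errors simply propagate instead of going through ::meshparts::message:

```tcl
package require Tcl 8.6
package require tar

# Hypothetical helper: extract each archive path into the matching
# directory, opening the .tar.gz afresh for every entry so that each
# tar::untar call starts with a brand-new gunzip transform instead of
# whatever decompression state the previous call left behind.
proc untarReopening {tgzfile paths dirs} {
    foreach path $paths dir $dirs {
        file mkdir $dir
        set f [open $tgzfile rb]
        try {
            zlib push gunzip $f
            # With -chan, tar::untar reads from the already open channel
            # and leaves closing it to the caller.
            tar::untar $f -file $path -dir $dir -chan
        } finally {
            # Closing the channel also discards the gunzip transform
            # stacked on top of it.
            close $f
        }
    }
}
```

Each call still decompresses the archive from the beginning, which is the repeated-scan cost discussed in option 2.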
> For me, it looks like the untar procedure has a bug.
It looks to me like you are creating the problem by trying to seek around
inside gzipped data. To make that work, you would also have to reset the
gunzip decompression state to exactly the state it was in at that file
offset.
If you can't formulate a glob pattern for the set of files you want to
extract, then you'll have to do one of four things:
1) unpack the entire tar file into a temporary location, then move out
the files of interest and delete the unwanted files
2) close and reopen the file inside the loop around tar::untar. But you
are still left with scanning all of the preceding tar data up to the
file of interest on every pass, which puts you quite close to O(N^2)
complexity here.
3) Create your own 'untar' by making calls into the tar module's
internals to read file headers, decide whether each header is for a
file of interest, and extract the file if so. This does mean you are
calling procs that are not documented as part of the tar module's
public API, so should the internals change, your code would break until
you adapted it. This method does, however, give you the most efficient
extraction, because only a single pass over the tar file is needed.
4) Extend the tar module's untar proc to take an additional parameter,
a list of filenames to match tar entries against, extracting each one
when found, and consider contributing the changes back to Tcllib.
This has the same benefits as #3, with the added benefit that, if
accepted, your change becomes part of the documented API and so is less
likely to change "out from under you" in the future.
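A rough sketch of the single-pass idea behind options 3 and 4. To avoid depending on the tar module's undocumented internals, this hypothetical untarSinglePass proc swaps in a different technique: it parses the 512-byte ustar headers itself (Tcl 8.6 only, no Tcllib). It is a deliberate simplification under stated assumptions. It handles plain regular-file entries only and ignores checksums, GNU long names, pax headers, links, permissions and mtimes:

```tcl
package require Tcl 8.6

# Hypothetical single-pass extractor for a gzipped tar. It parses the
# ustar headers directly (not via tar module internals) and handles only
# plain regular files: no checksum check, long names, pax, links or modes.
proc untarSinglePass {tgzfile wanted targetdir} {
    foreach name $wanted { set pending($name) 1 }
    set extracted {}
    set f [open $tgzfile rb]
    try {
        zlib push gunzip $f
        # Read strictly forward; stop once every requested name was seen.
        while {[array size pending] > 0} {
            set header [read $f 512]
            if {[string length $header] < 512} break
            # An all-zero header block marks the end of the archive.
            if {[string trim $header \x00] eq ""} break
            set name [string trimright [string range $header 0 99] \x00]
            set prefix [string trimright [string range $header 345 499] \x00]
            if {$prefix ne ""} { set name $prefix/$name }
            # The size field is an octal number padded with spaces/NULs.
            set sizeField [string trim [string range $header 124 135] " \x00"]
            set size [expr {$sizeField eq "" ? 0 : [scan $sizeField %lo]}]
            set type [string index $header 156]
            # Entry data is padded to a multiple of 512 bytes; reading past
            # it (rather than seeking) keeps the gunzip stream in step.
            set data [read $f [expr {($size + 511) / 512 * 512}]]
            if {($type eq "0" || $type eq "\x00") && [info exists pending($name)]} {
                set out [file join $targetdir $name]
                file mkdir [file dirname $out]
                set o [open $out wb]
                puts -nonewline $o [string range $data 0 $size-1]
                close $o
                lappend extracted $name
                unset pending($name)
            }
        }
    } finally {
        close $f
    }
    return $extracted
}
```

Because it only ever reads forward, the gunzip transform is never asked to rewind. The result lists names in archive order, not in the order they were requested.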