Alexandru <alexandr...@meshparts.de> wrote:
> Hi,
>
> It seems, that there is no encoding option for the command
> tar:encoding.
There is also no command tar::encoding.
Tar (the archive format) is so old that it has no notion of an
'encoding'. It just stores bytes, and higher-level code has to decide
what to do with those bytes.
> I use this command to create an archive of multiple files:
>
> set fd [open $zipfile wb]
> zlib push gzip $fd -level 9
> tar::create $fd $paths -chan
> close $fd
>
> Now I realize, all files with umlauts in the path/name are wrongly
> encoded when unpacking the archive with the Windows program 7z.
The issue here could be Tcllib's tar, or it could be 7z. Right now you
don't know, and the classic Tar header format has no field for a flag
that says "filenames herein are UTF-8 (or any other encoding)".
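One way to find out which side is at fault is to look at the bytes that
actually ended up in the header. The following is only a minimal sketch:
the proc name tarNameHex is made up for illustration, it assumes an
uncompressed tar file (gunzip the archive first), and it assumes the
classic layout where the member name is the NUL-padded first 100 bytes of
the 512-byte header block.

```tcl
# Return the raw bytes of the first member's name field as a hex string.
# Assumes the classic tar layout: the name is the NUL-padded first
# 100 bytes of the 512-byte header block.
proc tarNameHex {tarfile} {
    set fd [open $tarfile rb]
    set header [read $fd 512]
    close $fd
    # Keep only the name field and cut it at the first NUL byte.
    set name [string range $header 0 99]
    set nul [string first \x00 $name]
    if {$nul >= 0} {
        set name [string range $name 0 [expr {$nul - 1}]]
    }
    binary scan $name H* hex
    return $hex
}
```

If an "ä" in the name shows up as c3a4, the name was stored as UTF-8; if
it shows up as a single e4, it was stored as ISO 8859-1 / cp1252. What 7z
then assumes when reading those bytes is a separate question.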
> What could be the solution to this issue?
Several:
1) (easiest, but may not be practical) -- don't use umlauts (or other
non-ASCII characters) in filenames.
2) If you look through the source of Tcllib's tar, you will find that
it inserts the filenames into the tar header block using binary format
"a" (which simply inserts each character's codepoint value modulo 256,
and that will only be correct for an 8-bit fixed-length encoding). That
likely means the breakage happens during tar::create.
If you look further up the call chain, you find that directories are
resolved to lists of filenames via glob, and the proc which writes each
tar component is fed a filename to work with.
So, you could use Tcllib's fileutil::find to pre-acquire the filenames
you want to pack into the Tar file, pre-encode them into the appropriate
encoding using 'encoding convertto', and output the tar file by calling
'formatHeader' with the 'encoded' name, and fcopying the file contents
yourself.
3) You could patch Tcllib's tar to convert filenames to a chosen
encoding (with that encoding selectable via an option to tar::create),
and then contribute the patches back to Tcllib so everyone benefits.
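
To illustrate the point in 2) about binary format "a", here is a small
sketch (hexOf is just a helper name made up for this example) comparing
the bytes written for an unencoded name with those written after
pre-encoding it via 'encoding convertto'. UTF-8 is used purely as an
example target; which encoding 7z expects is something you would still
have to determine.

```tcl
# Hex dump helper: shows the bytes that "binary format a*" produces.
proc hexOf {str} {
    binary scan [binary format a* $str] H* hex
    return $hex
}

set name "Gr\u00fc\u00dfe.txt"   ;# "Grüße.txt"

# Unencoded: every character becomes exactly one byte.
puts [hexOf $name]                              ;# 4772fcdf652e747874
# Pre-encoded as UTF-8: each non-ASCII character becomes two bytes.
puts [hexOf [encoding convertto utf-8 $name]]   ;# 4772c3bcc39f652e747874
```

The first form is only right if the reader assumes an 8-bit encoding such
as ISO 8859-1; the second is what a reader expecting UTF-8 needs.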