Output binary files


Steve Checkoway

May 14, 2009, 4:42:47 AM
I've been trying to coax TeX into writing a binary file purely for its
own sake. I thought something like the following would work.

\catcode`\^^00=12
\newwrite\file
\immediate\openout\file=foo.txt
\immediate\write\file{^^00}
\immediate\closeout\file
\end

I would expect TeX to output the two bytes 0x00 0x0a. Instead of the
0, it outputs the three bytes ^^@. The same thing happens using
category code 11. No other category code seems appropriate and I can't
think of another way to write data to a file.

Is this doomed to failure or is there a way to output binary data from
TeX?

zappathustra

May 14, 2009, 4:29:17 PM

TeX has the following mechanism: given the 128 ASCII characters, if a
character X has number n, then ^^X represents the character that has
number n-64 or n+64; for instance, carriage return has number 13 and M
has number 77, so ^^M denotes carriage return.

Another mechanism is: given a 256 character table, ^^xy, where x and y
are 0-9, a-f, denotes the character whose number is xy in hexadecimal.

So, when you say \catcode`\^^00=12, you give category 12 to the null
character (since ^^00 is obviously 0). When you ask TeX to write this
character to an external file, it uses the first process, i.e. it
denotes the null character with ^^@, since @ has number 64. (Why TeX
doesn't use the second process, I don't know.)
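
The two notations can be checked with a few lines of plain TeX. This is
only a sketch, not part of the original post; it assumes the default
category code of ^ (superscript), which is what makes TeX's input
scanner convert the ^^ forms in the first place.

```tex
% Plain TeX sketch: print the character code behind each notation.
\message{[\number`\^^M]}  % M is 77, and 77-64 = 13 (carriage return)
\message{[\number`\^^0d]} % hexadecimal 0d = 13
\message{[\number`\^^@]}  % @ is 64, and 64-64 = 0 (null)
\message{[\number`\^^00]} % hexadecimal 00 = 0
\bye
```

The first two lines should report the same number, and so should the
last two, since each pair names the same character in two ways.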

Turning to category 11 doesn't change anything, since the only
difference between 11 and 12 is the ability to form control sequences.

Now, I really can't see what you were trying to do. I'm no computer
scientist, so you should explain more slowly... Anyway, TeX doesn't
know anything about bytes, it only knows characters.

Best,

Paul

Dan Luecking

May 14, 2009, 5:41:24 PM
On Thu, 14 May 2009 01:42:47 -0700 (PDT), Steve Checkoway
<schec...@gmail.com> wrote:

>I've been trying to coax TeX into writing a binary file purely for its
>own sake. I thought something like the following would work.
>
>\catcode`\^^00=12
>\newwrite\file
>\immediate\openout\file=foo.txt
>\immediate\write\file{^^00}
>\immediate\closeout\file
>\end
>
>I would expect TeX to output the two bytes 0x00 0x0a. Instead of the
>0, it outputs the three bytes ^^@.

I assume you meant 4 bytes: those three and 0x0a.

This is built into TeX by default: non-printable ASCII
characters are written to files (and the screen) in the above
control character format. Reading it back into TeX will result in
a token with character code 0x00.

Some (perhaps most) versions of tex have a command line
option to write such things as single bytes. In my TeX
Live 2008 it is
tex --8bit texfile
(There's also a "--translate-file" option to read in a
code page translation file (*.tcx) that makes only selected
characters writable as single bytes.)

Unfortunately, this state is not testable within a TeX
document, except by writing a file and examining it later
after changing the category of "^" to 11.
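
A rough plain TeX sketch of that write-and-examine test follows. The
names \probe, \probein, \probeline, \expected and the scratch file
probe.tmp are hypothetical, chosen only for this example, and it reads
the file back within the same run instead of in a later one. It assumes
the scratch file can be written to and read from the current directory.

```tex
% Plain TeX sketch; all names below are made up for illustration.
\newwrite\probe
\newread\probein

% Step 1: write a single null character to a scratch file.
\begingroup
  \catcode0=12
  \immediate\openout\probe=probe.tmp
  \immediate\write\probe{^^@}%
  \immediate\closeout\probe
\endgroup

% Step 2: read the line back with the caret as a letter (category 11),
% so a literal caret sequence in the file is not turned into a null.
\begingroup
  \catcode`\^=11 \catcode0=12
  \gdef\expected{^^@}% three tokens here: two carets and an at sign
  \openin\probein=probe.tmp
  \endlinechar=-1 \global\read\probein to\probeline
  \closein\probein
\endgroup

% Step 3: compare what was read with the three-character form.
\ifx\probeline\expected
  \message{(null is written in caret notation)}
\else
  \message{(null is not written in caret notation)}
\fi
\bye
```

Setting \endlinechar to -1 just before \read keeps an end-of-line
character from being appended to the line that is read, so \probeline
holds only what the file contains.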


Dan
To reply by email, change LookInSig to luecking

Heiko Oberdiek

May 14, 2009, 4:37:41 PM
Steve Checkoway <schec...@gmail.com> wrote:

It depends on your TeX engine. For instance, pdfTeX knows option
"-8bit":

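\newwrite\file
\immediate\openout\file=test.data\relax
\begingroup
  \catcode`\@=11 % allow @ in control sequence names (\count@)
  \catcode0=12 % make the null character an ordinary "other" character
  \def\x{}% accumulator for the 256 characters
  \count@=0 %
  \loop
  \ifnum\count@<256 %
    \lccode0=\count@ % \lowercase will map character 0 to character \count@
    \lowercase{%
      \edef\x{\x^^@}% the null token becomes character \count@; append it
    }%
    \advance\count@ by 1 %
  \repeat
  \immediate\write\file{\x}%
\endgroup
\immediate\closeout\file
\csname @@end\endcsname\end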

$ pdftex -8bit test.tex
=> Result is file "test.data" with bytes 0..255 and an end of line.

TeX compilers based on Web2C know option -translate-file,
thus you can write a file byte.tcx with
0x00 0x00 %
0x01 0x01 %
...
0xff 0xff %

$ tex -translate-file=byte.tcx test.tex

Yours sincerely
Heiko <ober...@uni-freiburg.de>

Dan

May 14, 2009, 11:40:48 PM
On May 14, 3:37 pm, Heiko Oberdiek <oberd...@uni-freiburg.de> wrote:

> Steve Checkoway <schecko...@gmail.com> wrote:
> > I've been trying to coax TeX into writing a binary file purely for its
> > own sake. I thought something like the following would work.
>
> > \catcode`\^^00=12
> > \newwrite\file
> > \immediate\openout\file=foo.txt
> > \immediate\write\file{^^00}
> > \immediate\closeout\file
> > \end
>
> > I would expect TeX to output the two bytes 0x00 0x0a. Instead of the
> > 0, it outputs the three bytes ^^@. The same thing happens using
> > category code 11. No other category code seems appropriate and I can't
> > think of another way to write data to a file.
>
> > Is this doomed to failure or is there a way to output binary data from
> > TeX?
>
> It depends on your TeX engine. For instance, pdfTeX knows option
> "-8bit":

Actually, I tried --8bit with both tex and pdftex and both
worked as expected (TeX Live 2008). Similar options were
available in emTeX (before pdftex existed) and MiKTeX
(certainly before pdftex was common). This depends
more on the TeX distribution than the engine.

> TeX compilers based on Web2C know option -translate-file,
> thus you can write a file byte.tcx with
> 0x00 0x00 %
> 0x01 0x01 %
> ...
> 0xff 0xff %

That would almost be natural.tcx, which already exists in
TL2008 (as well as some previous TLs). Unfortunately, that
file seems to have a bug: ^^7f is omitted, while comments in
that file indicate it shouldn't be.

[The file cp227.tcx contains the comment that TeX makes
codes 32 through 127 printable. In fact 127 is not.]


Dan

Heiko Oberdiek

May 15, 2009, 3:40:41 AM
Dan <luec...@uark.edu> wrote:

> On May 14, 3:37 pm, Heiko Oberdiek <oberd...@uni-freiburg.de> wrote:
> > It depends on your TeX engine. For instance, pdfTeX knows option
> > "-8bit":
>
> Actually, I tried --8bit with both tex and pdftex and both
> worked as expected (TeX Live 2008).

It also depends on the version:
tex (Web2C 7.5.2) does not know -8bit, whereas
tex (Web2C 7.5.7) supports it.

Yours sincerely
Heiko <ober...@uni-freiburg.de>

Steve Checkoway

May 15, 2009, 1:56:49 PM
On May 15, 12:40 am, Heiko Oberdiek <oberd...@uni-freiburg.de> wrote:
> Dan <lueck...@uark.edu> wrote:

On May 14, 2:41 pm, Dan Luecking <LookIn...@uark.edu> wrote:
> On Thu, 14 May 2009 01:42:47 -0700 (PDT), Steve Checkoway
>
> >I would expect TeX to output the two bytes 0x00 0x0a. Instead of the
> >0, it outputs the three bytes ^^@.
>
> I assume you meant 4 bytes: those three and 0x0a.
>

Instead of the 0, it outputs 3 bytes, so 4 bytes in total.

On May 14, 1:37 pm, Heiko Oberdiek <oberd...@uni-freiburg.de> wrote:
> > Is this doomed to failure or is there a way to output binary data from
> > TeX?
>

> It depends on your TeX engine. For instance, pdfTeX knows option
> "-8bit":

Great, thanks! I realize this is perhaps a strange thing to do.

- S
