I haven't been able to find a way that will read in
more than the first 160 bytes (of a 910 byte file).
I've tried using each_byte and looping with getc, as
well as storing the results in a string or an array.
I've never had this sort of problem working with text
files. Is there something else I have to do to be able
to work with binary data?
-Morgan.
If you're on a Windows platform this can happen if you have a ^Z in
your file.
If this is your problem then binmode may help:
[mike@ratdog mike]$ ri binmode
This is a test 'ri'. Please report errors and omissions
on http://www.rubygarden.org/ruby?RIOnePointEight
------------------------------------------------------------- IO#binmode
ios.binmode -> ios
------------------------------------------------------------------------
Puts ios into binary mode. This is useful only in
MS-DOS/Windows environments. Once a stream is in binary mode, it cannot
be reset to nonbinary mode.
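In Ruby that looks like the sketch below (file name and byte layout
are illustrative, chosen to mirror the 160-byte symptom above):

```ruby
# Build 910 bytes with a ^Z (0x1A) at offset 160, mimicking the
# original poster's file, and write it out in binary mode.
bytes = ("A" * 160) + 26.chr + ("B" * 749)   # 910 bytes total
File.open("demo.bin", "wb") { |f| f.write(bytes) }

# Switch the stream to binary mode *before* the first read.
data = File.open("demo.bin") do |f|
  f.binmode          # no-op on Unix, essential on DOS/Windows
  f.read
end
puts data.length     # 910 -- without binmode, Windows text mode
                     # would stop reading at byte 160
```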
Hope this helps,
Mike
--
mi...@stok.co.uk | The "`Stok' disclaimers" apply.
http://www.stok.co.uk/~mike/ | GPG PGP Key 1024D/059913DA
mi...@exegenix.com | Fingerprint 0570 71CD 6790 7C28 3D60
http://www.exegenix.com/ | 75D2 9EC4 C1C0 0599 13DA
Did you try IO#binmode ? Unix guys (like me) typically miss that
when working in a Windows environment.
Cheers,
-- Heinz
Are you using the "b" modifier on the open call?
>I've never had this sort of problem working with text
>files. Is there something else I have to do to be able
>to work with binary data?
>
Are you on a Windows box and are you opening the file in binary mode? I had this same problem recently with the same results. The problem was a Ctrl-Z some way into the file that was being interpreted as EOF. The fix was to add a 'b' to the open flags:
File.open("foo.db", "rb") do ...
and then all was right with the world.
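Spelled out a little more fully (the file name and contents here are
just illustrative), the whole fix is the "b" in the mode string:

```ruby
# Create a small binary file with an embedded ^Z (0x1A):
File.open("foo.db", "wb") { |f| f.write("header" + 26.chr + "payload") }

# "rb" = read, binary: the ^Z is just another byte, not EOF.
data = File.open("foo.db", "rb") { |f| f.read }
puts data.length   # 14 -- the whole file, on any platform
```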
--
Dean saor, dean saor an spiorad. Is seinn d'orain beo.
http://www.joeygibson.com
http://www.joeygibson.com/blog/life/Wisdom.html
> Did you try IO#binmode ? Unix guys (like me)
> typically miss that
> when working in a Windows environment.
Nope, hadn't seen that. And that fixes it.
I don't particularly understand *why* though. Those
first 160 bytes were being read properly, what stops
it from getting the rest? Why 160? Is it going to
suddenly require something else to be done when I go
over to the 1.73mb file that the program is designed
to process? (I wouldn't *think* so, but then I didn't
expect this 160 thing either...)
Anyway, thanks for the help.
*checks the file* Next byte was hex 1A, which... is
Ctrl-Z. Well, that answers the other questions...
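A self-contained version of that check, for anyone repeating it (file
name and offset are illustrative; IO#getbyte is Ruby 1.9+, where getc
returns a one-character string rather than an integer):

```ruby
# Reproduce a file whose read stops at offset 160, then peek at
# the byte that caused it.
File.open("t.bin", "wb") { |f| f.write("x" * 160 + 26.chr + "y" * 100) }

byte = File.open("t.bin", "rb") do |f|
  f.seek(160)        # jump to where the short read ended
  f.getbyte          # on Ruby 1.8, f.getc returns the integer directly
end
printf("%02x\n", byte)   # 1a -- Ctrl-Z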
You misunderstand the 160-byte barrier as being related to Ruby.
It's a Win32/DOS issue.
<story-mode>
Way back in PC-/MS-DOS days, it was decided that the
non-printable ASCII-26 (^Z) character would mark the end of a
textmode file. The difference between textmode and binmode of a
DOS file is important, though it need not ever have become an
issue.
The requirement for ^Z to terminate a textfile has since been
changed. However, (for backward compatibility?) when the ^Z *is*
encountered in a textmode file, DOS (and subsequently, Windows)
still set the EOF flag and stop reading.
</story-mode>
This becomes more of an issue because the default file open mode on
DOS/Win is text mode, creating the need for a completely new
function call almost exclusively for DOS/Win platforms; in this
case, binmode(), which explicitly sets the file read mode to
binary, preventing the OS from stopping at the first ^Z (and from
changing line endings, blah, blah...).
As someone else mentioned above, this isn't an issue on Unix or
many other systems, since EOF on these OSes isn't determined by
file contents. I'm not sure if it's an issue for Macs, as they also
historically used different line endings. That may have changed
with OS X; anyone know?
The moral of the story is:
Always call fh.binmode() before reading
any non-text file on non-Unix platforms.
HTH,
Tim Hammerquist
--
scanf() is evil.
True, but let's be fair.
MSDOS stole many things from Unix, such as the notion of a
hierarchical directory structure and the use of < > | at the
shell level. (Many things were incompletely stolen, unfortunately.)
The binmode/textmode distinction came from Unix. At that time
Unix had an EOF character of control-D (which explains the ^D we
still type occasionally at the terminal).
So historically Unix's behavior with respect to ^D was the same as
DOS's with respect to ^Z. But Unix/Linux moved beyond that, and
DOS/Windows never did.
Hal
No. Unix never distinguished between text and binary files. Unix did
(and does) interpret ASCII EOT (ctrl-d) as an end-of-input indicator for
terminal devices, but it never used any in-band character to mark the
end of a file. The EOT never got past the terminal driver, and was never
delivered to an application.
Steve
If Unix never distinguished between text and binary files, what
was the binary mode flag for?
Hal
For CR-LF, maybe... just a wild guess.
Hal> Steven Jenkins wrote:
>> Hal Fulton wrote:
>>> True, but let's be fair.
>>>
>>> MSDOS stole many things from Unix, such as the notion of a
>>> hierarchical directory structure and the use of < > | at the
>>> shell level. (Many things were incompletely stolen,
>>> unfortunately.)
>>>
>>> The binmode/textmode distinction came from Unix. At that time
>>> Unix had an EOF character of control-D (which explains the ^D
>>> we still type occasionally at the terminal).
>>
>>
>> No. Unix never distinguished between text and binary
>> files. Unix did (and does) interpret ASCII EOT (ctrl-d) as an
>> end-of-input indicator for terminal devices, but it never used
>> any in-band character to mark the end of a file. The EOT never
>> got past the terminal driver, and was never delivered to an
>> application.
Hal> If Unix never distinguished between text and binary files,
Hal> what was the binary mode flag for?
I recall that the whole ^Z terminator came from CP/M, where file
sizes were always multiples of 128 bytes (saving 7 bits in a size
field being important at the time), so a text file needed a special
character to mark the end of the text. MSDOS carried that "tradition"
on, to ease porting of CP/M applications to DOS, and, well, saving bits
was important at that time, at least it *seemed* to be important!
d.k.
--
Daniel Kelley - San Jose, CA
For email, replace the first dot in the domain with an at.
Don't equate unix and C, for the 2 are not the same thing.
My old K&R C book says of this, "...SOME systems distinguish between
text and binary files; for the latter, a "b" must be appended to the
mode string." (emphasis mine).
Or were you referring to something else?
> If Unix never distinguished between text and binary files, what
> was the binary mode flag for?
For ANSI-C compliance. From fopen(3) of FreeBSD 4.8-RELEASE:
The mode string can also include the letter ``b'' either as a third
character or as a character between the characters in any of the
two-character strings described above. This is strictly for
compatibility with ISO/IEC 9899:1990 (``ISO C89'') and has no
effect; the ``b'' is ignored.
I believe most Unix-like platforms take a similar position.
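A quick way to see that from Ruby on a POSIX system (file name is
illustrative): the "b" makes no difference to the bytes returned,
even with a ^Z in the file.

```ruby
# Write a file containing an embedded ^Z (0x1A).
File.open("t2.bin", "wb") { |f| f.write("abc" + 26.chr + "def") }

# Read it back both ways and compare the raw bytes.
text = File.open("t2.bin", "r")  { |f| f.read }
bin  = File.open("t2.bin", "rb") { |f| f.read }
puts text.bytes == bin.bytes   # true on POSIX; Windows text mode
                               # would truncate `text` at the ^Z
```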
--
kj...@dm4lab.to September 20, 2003
A man is known by the company he keeps.
> If Unix never distinguished between text and binary files, what
> was the binary mode flag for?
Unix originally didn't have one. Only has it now for compatibility.
--
-- Jim Weirich jwei...@one.net http://onestepback.org
-----------------------------------------------------------------
"Beware of bugs in the above code; I have only proved it correct,
not tried it." -- Donald Knuth (in a memo to Peter van Emde Boas)
I'll have to assume you're correct, as I can't prove my position.
But I definitely remember being led to believe that EOT was an
end-of-file marker. And I remember wondering how it worked for
binary files, did it store the length in the inode or what?
This was System III, around 1980 (out of date even then).
I'll have to dig into the old kernel to see how it actually
worked. I only have it in hardcopy, though.
Hal
The 'b' modifier was added to ANSI C to support non-Unix execution
environments that distinguish between text and binary files. It didn't
exist in Unix until ANSI C required it; since then, it's been a no-op.
http://www.lysator.liu.se/c/rat/d9.html#4-9-2
Steve
Thanks, Steve.
Very, very frustrating to me when the facts don't fit my memories...
I still think I was misinformed at some point about EOT and such.
I'm not sure when ANSI C came along, but I think it was *after* I
learned C and Unix, and *after* the introduction of the IBM PC.
Have to go look up Tobin Maginnis and see what he says...
Hal
fopen(3) on my Debian box adds:
[...]; the ``b'' is ignored on all POSIX conforming systems,
including Linux. (Other systems may treat text files and
binary files differently, and adding the ``b'' may be a good
idea if you do I/O to a binary file and expect that your
program may be ported to non-Unix environments.)
If the POSIX standard dictates that 'b' is ignored, it must also
dictate that binary and text files are treated identically, no?
[...checking...]
Correct. From the POSIX spec
<http://www.opengroup.org/onlinepubs/007904975/toc.htm>:
The character 'b' shall have no effect, but is allowed for
ISO C standard conformance.
(That "shall" is normative language, so any implementation that is
affected by the 'b' flag is _not_ POSIX compliant.)
And from dmr's "The UNIX Time-Sharing System" (C)1974
<http://cm.bell-labs.com/cm/cs/who/dmr/cacm.html>:
III. THE FILE SYSTEM
<snip>
3.1 Ordinary files
A file contains whatever information the user places on it,
for example, symbolic or binary (object) programs. No
particular structuring is expected by the system. A file of
text consists simply of a string of characters, with lines
demarcated by the newline character. Binary programs are
sequences of words as they will appear in core memory when
the program starts executing. A few user programs manipulate
files with more structure; for example, the assembler
generates, and the loader expects, an object file in a
particular format. However, the structure of files is
controlled by the programs that use them, not by the system.
This seems to be a CP/M/DOS/Win32 issue.
Cheers,
Tim Hammerquist
--
Did I mention that I can't tell you how to get rich?
If I could, I'd be rich, and not here.
-- Martien Verbruggen in comp.lang.perl.misc
Your batting average is still pretty good.
> I'm not sure when ANSI C came along, but I think it was *after* I
> learned C and Unix, and *after* the introduction of the IBM PC.
The standard was published in 1989. K&R second edition (1988) mentions
the "b" modifier to fopen() (the standard was nearing ratification at
the time), but the first edition (1978) doesn't.
I remember all this because I was trying to write code in the mid-80s to
run on both BSD Unix and MS-DOS. Turbo C required the "b" modifier or
some library function (binmode()?), but the (pre-gcc) Unix C compiler
didn't allow them. I had to do it with preprocessor conditionals. Yuck.
I grumble (quietly) when other people carry on off-topic discussions,
and now I'm doing it. We return now to ruby-talk, already in progress.
Steve
...or, better yet, when any text character whose encoding happens to
include a 0x1a is encountered! My, that *was* annoying.
> As someone else mentioned above, this isn't an issue on Unix or
> many other systems,
Or indeed on Windows, provided you avoid ruby :I
> The moral of the story is:
>
> Always call fh.binmode() before reading
> any non-text file on non-Unix platforms.
You have to call it before reading *any* file, unless you just know
that only ASCII was used, for the reason above. I think ruby is the
only software I've ever used that has this issue. I suppose ruby must
check for the 0x1a *before* allowing for the encoding system.
Or C, or perl, or a host of other languages. It's not ruby specific.
(Has perl "magiced" around this? I haven't used it much since 4.0x)
Multi-byte encodings weren't recognized as "text files" by
DOS/Win until fairly recently, if currently (I don't use them),
so I probably wouldn't have tried to get away with it.
>> As someone else mentioned above, this isn't an issue on Unix
>> or many other systems,
>
> Or indeed on Windows, provided you avoid ruby :I
No, Perl has a binmode() function as well, for exactly this
issue. This question's asked on c.l.p.m almost weekly.
>> The moral of the story is:
>>
>> Always call fh.binmode() before reading
>> any non-text file on non-Unix platforms.
>
> You have to call it before reading *any* file, unless you just
> know that only ASCII was used, for the reason above. I think
> ruby is the only software I've ever used that has this issue.
> I suppose ruby must check for the 0x1a *before* allowing for
> the encoding system.
AFAIK, Ruby just calls the system read calls, as do all the other
scripting languages. If there's any "magic" in the Ruby
implementation, that would be interesting to see.
How about:
| Always call fh.binmode() before reading any non-7-bit-clean
| file on non-Unix platforms.
Cheers,
Tim Hammerquist
--
It's there as a sop to former Ada programmers. :-)
-- Larry Wall regarding 10_000_000 in <11...@jpl-devvax.JPL.NASA.GOV>
> True, but let's be fair.
>
> MSDOS stole many things from Unix, such as the notion of a
> hierarchical directory structure and the use of < > | at the
> shell level. (Many things were incompletely stolen, unfortunately.)
>
> The binmode/textmode distinction came from Unix. At that time
> Unix had an EOF character of control-D (which explains the ^D we
> still type occasionally at the terminal).
>
> So historically Unix's behavior with respect to ^D was the same as
> DOS's with respect to ^Z. But Unix/Linux moved beyond that, and
> DOS/Windows never did.
>
> Hal
I thought that the Control-Z usage was borrowed from CP/M in order to
make it easier to port programs from that to MS-DOS.
http://www.finseth.com/~fin/craft/Chapter-5.html
--
http://thispaceavailable.uxb.net/blog/index.html
The use of COBOL cripples the mind; its teaching should, therefore, be regarded as a criminal offense. -- E. W. Dijkstra