open() in binary vs. text mode

Bob Roberts

Mar 20, 2003, 1:03:17 PM

I just finished tracking down a cross-platform bug. The problem was
that I didn't open() a file in binary ("rb") mode. What exactly does
the binary flag do on Windows? What is its purpose?

Syver Enstad

Mar 20, 2003, 1:50:51 PM

bobn...@byu.edu (Bob Roberts) writes:

On windows text files are written with \r\n (carriage return and
linefeed) as the end of line marker. On Macintosh \r is the end of
line marker, and on *nix systems just \n is the end of line marker.

When opening a file in 'r' mode under Windows, the library translates
all the \r\n in your file into \n. When you write to a file with 'w'
under Windows, the library translates all \n in your data to \r\n in the
file.

To avoid this automatic translation, you can specify b ('rb' or 'wb') to
indicate that the file library should not translate (useful for binary
files, hence the name binary mode).
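
A minimal sketch of the translation described above, written for present-day Python 3 (an assumption; the thread predates it). The `newline` argument of `open()` is used to emulate the Windows text-mode behaviour on any platform, and the helper name `write_text_then_read_raw` is made up for illustration.

```python
import os
import tempfile


def write_text_then_read_raw(text, newline):
    """Write text in text mode with the given newline policy, then return the raw bytes on disk."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Text mode: each "\n" written is translated to the `newline` string.
        with open(path, "w", encoding="ascii", newline=newline) as f:
            f.write(text)
        # Binary mode: the bytes come back exactly as stored, no translation.
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


# Emulates Windows text mode: "\n" becomes "\r\n" in the file.
windows_style = write_text_then_read_raw("a\nb\n", "\r\n")
# Emulates Unix text mode: "\n" is written unchanged.
untranslated = write_text_then_read_raw("a\nb\n", "\n")
```

Reading the file back in binary mode is what makes the extra carriage returns visible; a text-mode read on Windows would hide them again.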

Hope this clears up things.

--

Kind regards

Syver Enstad

Jp Calderone

Mar 20, 2003, 1:57:19 PM

In text mode, \n's written to disk are translated to \r\n's before they
are actually written. This translation is reversed when reading. Binary
files have no translations performed on them.

Jp

--
Examinations are formidable even to the best prepared, for
even the greatest fool may ask more than the wisest man can answer.
-- C.C. Colton
--
up 13:58, 4 users, load average: 0.55, 0.57, 0.47

Hal Wine

Mar 20, 2003, 10:03:15 PM

As others explained, sometimes the binary flag causes different
behavior w.r.t. line endings. However, it can (should) always
serve as documentation that the file is not a text file native to
the current platform. (I use the rule "all files are binary
unless provably text".)

Since you mention cross platform work, let me clarify some
terminology from one of the other posts. It's a bit pedantic,
but a distinction that will never steer you wrong.

'\n', '\r' and the like are not characters. They are escape
sequences (meta characters) whose bit pattern is defined by the C
compiler used to generate python. (Same issue in Perl, Tcl etc.)

If you find yourself writing code that cares about the bit value
representation of an escape character, you are dealing with a
binary file, and should not use escape characters.

If you need to refer to specific bit patterns in your strings,
use a constant you define, e.g.
CRLF = '\x0d\x0a'

--Hal (who learned this the hard way years ago on a platform
where different compiler vendors had different ideas of the
internal bit pattern of \n ...)

P.S. if you know the file is a text file (but perhaps from
another platform), you can normalize the input string thusly:
    contents = open("foo", "rb").read()
    # replace() returns a new string, so the result must be rebound
    contents = contents.replace('\x0d\x0a', '\n')
    contents = contents.replace('\r', '\n')
Now contents looks like a native platform text string. (I _think_
all the platforms that used LFCR as a separator are long dead now...)
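
For readers on Python 3 (an assumption; the snippet above is from the Python 2 era), a file opened with "rb" yields bytes, so the same normalization needs bytes literals. A hedged sketch, with `normalize_newlines` being a made-up helper name:

```python
def normalize_newlines(data: bytes) -> bytes:
    """Convert CRLF and bare CR line endings to LF, using explicit byte values."""
    # CRLF must be handled first; otherwise each CRLF would turn into two LFs.
    data = data.replace(b"\x0d\x0a", b"\x0a")
    return data.replace(b"\x0d", b"\x0a")


normalized = normalize_newlines(b"dos\r\nmac\runix\n")
```

The order of the two replacements matters, which is the same reason the original snippet handles '\x0d\x0a' before '\r'.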

Bob Roberts

Mar 21, 2003, 2:45:05 PM

Syver Enstad <syver-e...@online.no> wrote in message news:<uadfpg...@online.no>...

> bobn...@byu.edu (Bob Roberts) writes:
>
> > I just finished tracking down a cross-platform bug. The problem was
> > that I didn't open() a file in binary ("rb") mode. What exactly does
> > the binary flag do on Windows? What is its purpose?
>
> On windows text files are written with \r\n (carriage return and
> linefeed) as the end of line marker. On Macintosh \r is the end of

....

This problem does not appear to have anything to do with the \r\n vs.
\r vs \n problem.

When in windows, reading in text mode, if it came across ASCII
character 26, it would quit and not read any more of the file. This
does not happen on other platforms or on windows when reading in
binary mode.

Why would a specific character cause this behavior?

Tim Peters

Mar 21, 2003, 3:04:54 PM

[Bob Roberts]

> This problem does not appear to have anything to do with the \r\n vs.
> \r vs \n problem.
>
> When in windows, reading in text mode, if it came across ASCII
> character 26, it would quit and not read any more of the file. This
> does not happen on other platforms or on windows when reading in
> binary mode.
>
> Why would a specific character cause this behavior?

Backward compatibility with DOS 1.0, which didn't save the sizes of files in
the system file directory. They needed some other way to recognize the end
of a file, and picked on Ctrl+Z (chr(26)) to mean EOF. This is specific to
CPM and DOS derivatives (like Windows), AFAIK.

It's fine by the ANSI C standard, BTW: the distinction between text mode
and binary mode is part of standard C, and I/O on text mode files is
essentially undefined if you print any characters outside of printable 7-bit
ASCII, tab, and newline. The idea that text mode and binary mode are the
same is universal across Unix variants, but I think historically rare before
Unix.

Grant Edwards

Mar 21, 2003, 4:38:07 PM

In article <mailman.1048277318...@python.org>, Tim Peters wrote:

>> When in windows, reading in text mode, if it came across ASCII
>> character 26, it would quit and not read any more of the file.
>> This does not happen on other platforms or on windows when
>> reading in binary mode.
>>
>> Why would a specific character cause this behavior?
>
> Backward compatibility with DOS 1.0, which didn't save the
> sizes of files in the system file directory. They needed some
> other way to recognize the end of a file, and picked on Ctrl+Z
> (chr(26)) to mean EOF. This is specific to CPM and DOS
> derivatives (like Windows), AFAIK.

Which means it's probably ultimately a feature left over from a
DEC filesystem used by either RSX-11 or RSTS back in the days
of the PDP-11...

--
Grant Edwards (grante at visi.com)
Yow! I've got to get these SNACK CAKES to NEWARK by DAWN!!

Jon Nicoll

Mar 21, 2003, 6:47:40 PM

Hi Bob

[...]

> This problem does not appear to have anything to do with the \r\n vs.
> \r vs \n problem.
>
> When in windows, reading in text mode, if it came across ASCII
> character 26, it would quit and not read any more of the file. This
> does not happen on other platforms or on windows when reading in
> binary mode.
>
> Why would a specific character cause this behavior?

Because 26 is 0x1A, or Control-Z, and in ye olde days of DOS 2.x, this
byte value was used as a marker value for 'End of file'. You still
occasionally see this as some form of option on Text editors wishing
to preserve backwards compatibility.

Jon N

John Machin

Mar 21, 2003, 8:27:46 PM

bobn...@byu.edu (Bob Roberts) wrote in message news:<c4e6b17d.03032...@posting.google.com>...

>
> When in windows, reading in text mode, if it came across ASCII
> character 26, it would quit and not read any more of the file. This
> does not happen on other platforms or on windows when reading in
> binary mode.
>
> Why would a specific character cause this behavior?

Ctrl-Z is treated as end-of-file. The behaviour is inherited from CP/M
via MS-DOS, as was use of CRLF as line terminator. CP/M files were a
whole number of 128-byte sectors. The convention was that in files
containing text, the actual text was terminated by ctrl-Z, and the
remainder of the sector (usually) padded out with NULs. The "stdio"
kits for C compilers on CP/M, MS-DOS & Windows treat input ctrl-Z as
EOF. I.e. this is not a Python-only feature.
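
The convention described above can be sketched in a few lines of Python. This is only an emulation for illustration, not what any particular C library does internally; `store_cpm_style` and `read_text_mode` are hypothetical names, and the NUL padding follows the "usually" in the description.

```python
SECTOR_SIZE = 128   # CP/M files were a whole number of 128-byte sectors
CTRL_Z = b"\x1a"    # chr(26), the end-of-text marker


def store_cpm_style(text: bytes) -> bytes:
    """Terminate the text with Ctrl-Z and pad with NULs to a whole number of sectors."""
    data = text + CTRL_Z
    padding = -len(data) % SECTOR_SIZE
    return data + b"\x00" * padding


def read_text_mode(data: bytes) -> bytes:
    """Emulate a text-mode read that treats the first Ctrl-Z as end-of-file."""
    end = data.find(CTRL_Z)
    return data if end == -1 else data[:end]


stored = store_cpm_style(b"hello\r\n")
recovered = read_text_mode(stored)
# A stray Ctrl-Z in the middle of the data silently hides everything after it.
truncated = read_text_mode(b"abc\x1adef")
```

The last line shows the failure mode from the original question: everything after the first chr(26) is lost to a text-mode reader.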

Unfortunately many applications don't apply elementary validations
(like "names shouldn't contain control characters"), so one can be
supplied with files with embedded ctrl-Zs (typically a typo for
shift-Z). Consequently one needs to be ctrl-Z-aware; paranoid
programmers read data files in binary mode and validate their
contents.
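
One way to be "ctrl-Z-aware" along these lines is to read the bytes in binary mode and report any unexpected control characters before treating the data as text. A rough sketch under the assumption that tab, LF and CR are the only control bytes you want to allow; `find_control_bytes` and `ALLOWED_CONTROLS` are illustrative names:

```python
ALLOWED_CONTROLS = {0x09, 0x0A, 0x0D}  # tab, LF, CR


def find_control_bytes(data: bytes):
    """Return (offset, byte value) pairs for control bytes outside ALLOWED_CONTROLS."""
    return [
        (offset, value)
        for offset, value in enumerate(data)
        if value < 0x20 and value not in ALLOWED_CONTROLS
    ]


# A Ctrl-Z typed instead of shift-Z, embedded in otherwise ordinary text.
sample = b"name: Bob\x1aRoberts\r\n"
problems = find_control_bytes(sample)
```

In practice `data` would come from `open(path, "rb").read()`, so the check sees every byte regardless of platform.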

John Machin

Mar 21, 2003, 8:35:49 PM

Hal Wine <hal_...@yahoo.com> wrote in message news:<3E7A80F3...@yahoo.com>...

>
> P.S. if you know the file is a text file (but perhaps from
> another platform), you can normalize the input string thusly:
> contents = open( "foo", "rb").read()
> contents = contents.replace( '\x0d\x0a', '\n' )
> contents = contents.replace( '\r', '\n' )

Or just [no pun intended] use the new (in Python 2.3) "rU" (universal
text format) ...
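
For anyone reading this with Python 3 (an assumption; the post is about 2.3), universal-newline translation is the default for text-mode reads, and the "U" flag itself is no longer accepted in recent releases. A small sketch using a temporary file:

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    # Write raw bytes with DOS, old Mac and Unix line endings mixed together.
    with open(path, "wb") as f:
        f.write(b"dos\r\nmac\runix\n")
    # newline=None (the default) maps "\r\n", "\r" and "\n" to "\n" on reading.
    with open(path, "r", encoding="ascii", newline=None) as f:
        text = f.read()
finally:
    os.remove(path)

lines = text.split("\n")
```

Passing `newline=""` instead would keep the original line endings intact while still reading in text mode.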

> Now contents looks like a native platform text string. (I _think_
> all the platforms that used LFCR as a separator are long dead now...)

What platforms used LFCR? I had to deal once upon a time with a data
source that provided files that used LFCR but I just assumed that they
were crazy ...