http://docs.python.org/tutorial/inputoutput.html#reading-and-writing-files
"Windows makes a distinction between text and binary files;
"the end-of-line characters in text files are automatically altered
"slightly when data is read or written.
I don't see any obvious way at docs.python.org to get that corrected: is
there some standard procedure?
Steve
What's wrong with it?
Carl Banks
Perhaps because it's unclear whether it is Windows, or Python, or both,
which is automatically altering the data.
As for getting the docs changed, you can submit a bug request at the bug
tracker:
--
Steven
If it's the former, just look up the function in the reference
documentation (eg. the chm file in a Windows installation).
The way to control the behavior is with the 'mode' parameter to open().
If mode has a 'b' in it, the file is considered binary, which means no
translation is done. If the mode has a 'U' in it, or neither 'b' nor
'U', then some translation is done. The purpose of the translation is
to let the program always use \n to mean end of line, for code that'll
be portable between the various operating system conventions. Windows
typically does text files with \r\n at the end of each line. Some Macs
do just a \r, and Unix and Linux use a \n.
One reason a programmer has to be aware of it is that he/she may be
reading or writing a file from a different operating environment, for
example, a script that'll be uploaded to a web server running a
different OS.
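
The newline conventions listed above can be illustrated with a minimal
sketch. It assumes Python 3, where the io module exposes a `newline`
argument (the thread itself is about Python 2.x, so this is an
illustration rather than the behaviour under discussion). The sample
byte strings and the helper name `read_as_text` are made up for the
example.

```python
import io

# Hypothetical sample data: the same two lines in three newline conventions.
samples = {
    "windows": b"one\r\ntwo\r\n",
    "old_mac": b"one\rtwo\r",
    "unix": b"one\ntwo\n",
}


def read_as_text(raw):
    """Decode raw bytes as text with universal newlines enabled.

    newline=None asks the text layer to translate "\r\n" and "\r"
    to "\n" on input, whatever platform the code runs on.
    """
    wrapper = io.TextIOWrapper(io.BytesIO(raw), encoding="ascii", newline=None)
    return wrapper.read()


decoded = {name: read_as_text(raw) for name, raw in samples.items()}

for name, text in decoded.items():
    print(name, repr(text))
```

With translation enabled, the program only ever sees "\n" as the line
terminator, which is the portability point made above.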
1) Windows does not make a distinction between text and binary files.
2) end-of-line characters in text files are not automatically altered by
Windows.
(david)
Thanks, we've submitted a bug request,
http://bugs.python.org/issue6301
Steve
The Windows implementation of the C standard makes the distinction. E.g. using
stdio to write out "foo\nbar\n" in a file opened in text mode will result in
"foo\r\nbar\r\n" in the file. Reading such a file in text mode will result in
"foo\nbar\n" in memory. Reading such a file in binary mode will result in
"foo\r\nbar\r\n". In your bug report, you point out several proprietary APIs
that do not make such a distinction, but that does not remove the
implementations of the standard APIs that do make such a distinction.
http://msdn.microsoft.com/en-us/library/yeby3zcb.aspx
Perhaps it's a bit dodgy to blame "Windows" per se rather than its C runtime,
but I think it's a reasonable statement on the whole.
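
The round trip described above can be imitated on any platform with a
short sketch. It assumes Python 3 and uses `newline="\r\n"` to stand in
for Windows text mode; it does not call the C runtime's stdio, so it only
mirrors the described translation rather than demonstrating the MSVC
implementation itself. The temporary file is just scaffolding.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    # Text-mode write: each "\n" is translated to "\r\n" on the way out.
    with open(path, "w", encoding="ascii", newline="\r\n") as f:
        f.write("foo\nbar\n")

    # Text-mode read with universal newlines: "\r\n" comes back as "\n".
    with open(path, "r", encoding="ascii", newline=None) as f:
        as_text = f.read()

    # Binary-mode read: the bytes on disk, untranslated.
    with open(path, "rb") as f:
        as_bytes = f.read()
finally:
    os.remove(path)

print(repr(as_text))
print(as_bytes)
```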
--
Robert Kern
"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
> 1) Windows does not make a distinction between text and binary files.
Of course it does.
Python 2.6.2 (r262:71605, Apr 14 2009, 22:40:02) [MSC v.1500 32 bit
(Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> f = open("test", "wb")
>>> f.write("abc\x1Adef")
>>> f.close()
>>> f = open("test", "r") # read as text
>>> f.read()
'abc'
>>> f.close()
>>> f = open("test", "rb") # read as binary
>>> f.read()
'abc\x1adef'
>>> f.close()
>>>
--
Steven
Which is where I came in: I was looking for simple file IO in the tutorial.
The tutorial tells me something false about Windows, rather than something
true about Python.
I'm looking at a statement that is clearly false (for anyone who knows
anything about Windows file systems and Windows file io), which leaves the
Python behaviour completely undefined (for anyone who knows nothing about
Python).
I understand that many of you don't really have any understanding of
Windows, much less any background with Windows, and I'm here to help. That
part was simple.
The next part is where I can't help: What is the behaviour of Python?
I'm sure you don't think that tutorial is only for readers who can guess
that they have to extrapolate from the behaviour of the Visual C library in
order to work out what Python does.
Steve
Ok, Python makes a distinction between text and binary files.
Steve.
> "Steven D'Aprano" <ste...@REMOVE.THIS.cybersource.com.au> wrote in
> message news:pan.2009.06...@REMOVE.THIS.cybersource.com.au...
>> On Thu, 18 Jun 2009 10:36:01 +1000, steve wrote:
>>
>>> 1) Windows does not make a distinction between text and binary files.
>>
>> Of course it does.
...
> Ok, Python makes a distinction between text and binary files.
Microsoft have reported a bug where cmd.exe fails to recognise EOF in a
text file:
http://support.microsoft.com/kb/156258
The behaviour of reading past the \x1A character is considered a bug,
which says that cmd.exe at least (and by extension Windows apps in
general) is expected to stop reading at \x1A for text files.
Technically, the Windows file systems record the length of text files, so
an explicit EOF character is redundant; nevertheless, the behaviour of
stopping the read at \x1A is expected. Whether you want to claim it is
"Windows" or "the Windows shell" or something else is a fine distinction
that makes little difference in practice.
Anyway, here's Raymond Chen of Microsoft explaining more:
http://blogs.msdn.com/oldnewthing/archive/2004/03/16/90448.aspx
--
Steven
If you're pleased to be learning something about Windows, then
I'm pleased for you.
The reason that I didn't give a full discussion about the history of
DOS and Microsoft C was that I didn't think it was relevant to a
Python newsgroup.
My Bad. I didn't think anyone would care about the behaviour of
copy vs xcopy in DOS 6-.
I'd like to see the Tutorial corrected so that it gives some useful
information about the behaviour of Python. As part of that, I'd like
to see it corrected so that it doesn't include patently false information,
but only because the patently false information about Windows
obscures the message about Python.
Believe me, I really don't care what myths you believe about
Windows, or why you believe them. I've got a full and interesting
life of my own.
I'm only interested in getting the Python tutorial corrected so that it
gives some sensible information to someone who hasn't already had
the advantage of learning what the popular myths represent to the
Python community.
So far I've been pointed to a discussion of C, a discussion of DOS,
and a discussion of Windows NT 4.
Great. Glad to see that you know how to use the Internet.
I'll give you that if you already have a meaning to assign to those
meaningless words, you know more Python than I do.
And I'll give you that if you already have a meaning to assign to
those meaningless words, you know more Visual C than I do.
Is that all there is? You're going to leave the tutorial because
you can mount an obscure justification and it makes sense to
someone who already knows what it means?
Tell me it isn't so :~(
> So far I've been pointed to a discussion of C, a discussion of DOS,
> and a discussion of Windows NT 4. Great. Glad to see that you know how
> to use the Internet.
Says the person who doesn't want to attach an identity to his messages.
(Yes, that's ad hominem if used to dismiss your argument; but it's *you*
that is raising “know how to use the internet”, so at that point you
become fair game, IMO.)
> Is that all there is? You're going to leave the tutorial because you
> can mount an obscure justification and it makes sense to someone who
> already knows what it means? Tell me it isn't so :~(
Who is “you”? Someone who knows how to use the internet surely can tell
that comp.lang.python isn't the place to come expecting *changes* in the
Python tutorial.
You started out asking how to *interpret* it, which is fine for this
forum; but discussing it here isn't going to lead automatically to any
*midification* to a document developed within the core of Python.
--
\ “Whatever you do will be insignificant, but it is very |
`\ important that you do it.” —Mahatma Gandhi |
_o__) |
Ben Finney
I really loved CP/M in its day but isn't it time we let go?
--
D'Arcy J.M. Cain <da...@druid.net> | Democracy is three wolves
http://www.druid.net/darcy/ | and a sheep voting on
+1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner.
>http://support.microsoft.com/kb/156258
That says that Windows NT 3.5 and NT 4 couldn't make
a distinction between text and binary files. I don't think
that advances your case.
If they had changed the Windows behaviour, yes, but
Windows 7 seems to be compatible with NT 3.5 rather
than with DOS.
Peter Bell.
I don't think it's false. I think it's a fair statement given the Windows
implementation of the C standard library. Such things are frequently considered
to be part of the OS. This isn't just some random API; it's the implementation
of the C standard.
> I'm looking at a statement that is clearly false (for anyone who knows
> anything about Windows file systems and Windows file io), which leaves the
> Python behaviour completely undefined (for anyone who knows nothing about
> Python).
>
> I understand that many of you don't really have any understanding of
> Windows, much less any background with Windows, and I'm here to help. That
> part was simple.
>
> The next part is where I can't help: What is the behaviour of Python?
The full technical description is where it belongs, in the reference manual
rather than a tutorial:
http://docs.python.org/library/functions.html#open
> I'm sure you don't think that tutorial is only for readers who can guess
> that they have to extrapolate from the behaviour of the Visual C library in
> order to work out what Python does.
All a tutorial level documentation needs to know is what is described: when a
file is opened in text mode, the actual bytes written to a file for a newline
may be different depending on the platform. The reason that it does not explain
the precise behavior on each and every platform is because it *is* undefined.
Python 2.x does whatever the C standard library implementation for stdio does.
It mentions Windows as a particularly common example of a difference between
text mode and binary mode.
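
A minimal sketch of that platform dependence, assuming Python 3 (whose io
module does its own newline translation instead of delegating to C stdio
as Python 2.x does): a file written in default text mode ends each line
with `os.linesep`, whatever that is on the machine running the code.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    # Default text mode: "\n" is translated to the platform's line separator.
    with open(path, "w", encoding="ascii") as f:
        f.write("line\n")

    # Binary mode shows what actually landed on disk.
    with open(path, "rb") as f:
        on_disk = f.read()
finally:
    os.remove(path)

expected = ("line" + os.linesep).encode("ascii")
print(repr(os.linesep), on_disk, on_disk == expected)
```

The printed bytes differ between Windows and Unix-like systems, but the
comparison against `os.linesep` holds on both.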
I definitely want to see how the Python docs would be midified; last time
I checked, MIDI cannot play spoken words. I don't know whether there is a
text-to-speech sound font, though ;)
I will freely admit to having no idea of just how many Pythonistas have
good Windows experience/background, but how about you give us the
benefit of the doubt and tell us exactly which languages/routines you
play with *in windows* that fail to make a distinction between text and
binary?
>>> "Steven D'Aprano" <ste...@REMOVE.THIS.cybersource.com.au>
>PB> writes
>>> http://support.microsoft.com/kb/156258
>PB> That says that Windows NT 3.5 and NT 4 couldn't make
>PB> a distinction between text and binary files. I don't think
>PB> that advances your case.
And that was a bug apparently (euphemistically called a `problem').
>PB> If they had changed the Windows behaviour, yes, but
>PB> Windows 7 seems to be compatible with NT 3.5 rather
>PB> than with DOS.
If that is true then they may still be `researching this problem'. :=(
--
Piet van Oostrum <pi...@cs.uu.nl>
URL: http://pietvanoostrum.com [PGP 8DAE142BE17999C4]
Private email: pi...@vanoostrum.org
'Windows', in its broad sense of Windows system, includes the standards
and protocols mandated by its maker, Microsoft Corporation, and
implemented in its C compiler, which it uses to compile the software
that others interact with. I am pretty sure that WinXP Notepad *still*
requires \r\n in text files, even though Wordpad does not. Don't know
about Haste (la Vista) and the upcoming Win7.
It is a common metaphor in English to ascribe agency to products and
blame them for the sins (or virtues) of their maker.
'Unix' and 'Linux' are also used in the double meaning of OS core and OS
system that includes core, language tools, and standard utilities.
>>>> 2) end-of-line characters in text files are not automatically
>>>> altered by
>>>> Windows.
>>>
>>> The Windows implementation of the C standard makes the distinction.
>>> E.g. using stdio to write out "foo\nbar\n" in a file opened in text
>>> mode will result in "foo\r\nbar\r\n" in the file. Reading such a file
>>> in text mode will result in "foo\nbar\n" in memory. Reading such a
>>> file in binary mode will result in "foo\r\nbar\r\n". In your bug
>>> report, you point out several proprietary APIs that do not make such
>>> a distinction, but that does not remove the implementations of the
>>> standard APIs that do make such a distinction.
>>>
>>> http://msdn.microsoft.com/en-us/library/yeby3zcb.aspx
>>>
>>> Perhaps it's a bit dodgy to blame "Windows" per se rather than its C
>>> runtime, but I think it's a reasonable statement on the whole.
I agree. There are much worse sins in the docs to be fixed.
Hmmm. "Bill Gates, his successors, and minions, still require, after 28
years, that we jump through artificial hoops, confuse ourselves, and
waste effort, by differentiating text and binary files and fiddling with
line endings."
More accurate, perhaps, but probably less wise than the current text.
Terry Jan Reedy
+1 QOTW
--
Aahz (aa...@pythoncraft.com) <*> http://www.pythoncraft.com/
"as long as we like the same operating system, things are cool." --piranha
I agree with Robert Kern, it isn't necessary to distinguish between
Windows OS and a particular Windows runtime library for the purposes
of a tutorial.
Carl Banks