ifstream text and binary!


Sebastien Alix

Jan 22, 1999
Hi,

I have a problem with fstream. I simply (and dumbly) want to read each
char of a text file in sequential order, so I want to use an
ifstream to handle the file buffer. To be sure the stream will send me
every char, I open it in binary mode. Here's the code:

#include <fstream.h>
#include <iostream.h>

//This file is a plain code test file (it can be this source)
const char *TESTFILE = "..\\test\\Test1.bin";

void main()
{
    ifstream theFile;
    theFile.open(TESTFILE, ios::binary);

#ifdef (MSVC)
    theFile.setmode(filebuf::binary);
#endif
    while (!theFile.eof())
    {
        unsigned char c;
        theFile >> c;
        if (c == ' ' || c == '\n')
            cout << "Ho Yeah, a white space or a return..." << endl;
        cout << c;
    }
}

My problem is that white space and carriage returns are never given
to me. I am in binary mode, so why in the world is it not sending me
the right thing?

I have tried this with these compilers and I always get the same result:
- MS-Visual C++ 6.0
- MS-Visual C++ 5.0
- Borland C++ 5.02

Thanks a lot for your help

Sebastien Alix

[ Send an empty e-mail to c++-...@netlab.cs.rpi.edu for info ]
[ about comp.lang.c++.moderated. First time posters: do this! ]

Paul Lutus

Jan 22, 1999
Use this method instead:

#include <fstream.h>
#include <iostream.h>

const char *filename = "yourfilename";

int main()
{
    ifstream theFile;
    theFile.open(filename, ios::in | ios::binary);
    char c;
    while ((theFile.get(c)).good())
    {
        if (c == ' ' || c == '\n')
            cout << "Ho Yeah, a white space or a return..." << endl;
        cout << c;
    }

    cout << endl;
    return 0;
}

Paul Lutus

Sebastien Alix wrote in message <36a82bf1...@news.videotron.ca>...

Ash

Jan 22, 1999
On 22 Jan 1999 00:46:45 -0500, ala...@videotron.ca (Sebastien Alix)
wrote:

>Hi,
>
> I have problem with fstream. I simply (and dumbly) want to read each
>char of a text file in a sequential order. So I want to use an
>ifstream to handle the file buffer. To be sure the stream will send me
>all char I open in binary mode. Here's the code:

code snipped...

>My problem is that all white space and carriage return are never given
>to me. I am in binary mode so why in this world is it not sending the
>right thing to me....

You cannot use the >> operator for this. It performs formatted (text)
extraction and skips white space regardless of the mode you open the
file in. With fstreams, unformatted (binary) I/O goes through member
functions such as read() and write().
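
To make the difference concrete, here is a minimal sketch (not code from the thread) that pushes the same bytes through operator>> and through get(). It assumes a std::istringstream can stand in for the file so the example is self-contained; the names readFormatted and readUnformatted are invented for illustration.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Collects characters with operator>>, which skips white space by default.
std::string readFormatted(std::istream& in)
{
    std::string out;
    char c;
    while (in >> c)
        out += c;
    return out;
}

// Collects characters with get(), which does no white-space skipping.
std::string readUnformatted(std::istream& in)
{
    std::string out;
    char c;
    while (in.get(c))
        out += c;
    return out;
}
```

Given the five characters "a b\nc", readFormatted hands back only "abc", while readUnformatted keeps the space and the newline. The open mode plays no part in that difference.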

ashley

Francis Glassborow

Jan 22, 1999
In article <36a82bf1...@news.videotron.ca>, Sebastien Alix
<ala...@videotron.ca> writes

>Hi,
>
> I have problem with fstream. I simply (and dumbly) want to read each
>char of a text file in a sequential order. So I want to use an
>ifstream to handle the file buffer. To be sure the stream will send me
>all char I open in binary mode. Here's the code:
>
>#include <fstream.h>
>#include <iostream.h>
>
>//This file is a plain code test file (it can be this source)
>const char *TESTFILE = "..\\test\\Test1.bin";
>
>void main()
>{
> ifstream theFile;
> theFile.open(TESTFILE, ios::binary);
>
>#ifdef (MSVC)
> theFile.setmode(filebuf::binary);
>#endif
> while (!theFile.eof())
> {
> unsigned char c;
> theFile >> c;

But you are using the wrong method (one that skips all sorts of things).
Check out the various versions of the get member function.

> if (c == ' ' || c == '\n')
> cout << "Ho Yeah, a white space or a return..." << endl;
> cout << c;
> }
>}
>

>My problem is that all white space and carriage return are never given
>to me. I am in binary mode so why in this world is it not sending the
>right thing to me....
>

>I have tried this on these compiler I allways get the same result:
> - MS-Visual C++ 6.0
> - MS-Visual C++ 5.0
> - Borland C++ 5.02
>
>Thanks a lot for your help
>
>Sebastien Alix
>


Francis Glassborow Chair of Association of C & C++ Users
64 Southfield Rd
Oxford OX4 1PA +44(0)1865 246490
All opinions are mine and do not represent those of any organisation

Thiemo Seufer

Jan 22, 1999
Sebastien Alix wrote in message <36a82bf1...@news.videotron.ca>...
>Hi,
>
> I have problem with fstream. I simply (and dumbly) want to read each
>char of a text file in a sequential order. So I want to use an
>ifstream to handle the file buffer. To be sure the stream will send me
>all char I open in binary mode. Here's the code:

[snip]


>My problem is that all white space and carriage return are never given
>to me. I am in binary mode so why in this world is it not sending the
>right thing to me....


Below is a modified and commented version which solves the problem
(and some others :-)

// Use standard includes with namespace std
#include <fstream>
#include <iostream>

// for ws check
#include <cctype>

// reserve all-uppercase names for macros
const char *Testfile = "test.cpp";

// main is required to return int
int main()
{
    std::ifstream theFile;
    theFile.open(Testfile, std::ios::binary);

    // this prevents skipping ws, which is standard behaviour
    theFile >> std::noskipws;

    // MSVC not defined, no need for setmode()
    //#ifdef (MSVC)
    // theFile.setmode(filebuf::binary);
    //#endif

    // works now for non-eof errors also: the extraction itself is
    // tested, so c is only used after a successful read
    unsigned char c;
    while (theFile >> c)
    {
        // takes now all ws ('\n' is a ws also)
        // if (c == ' ' || c == '\n')
        if (std::isspace(c))
            std::cout << "Ho Yeah, a white space!" << std::endl;
        std::cout << c;
    }

    // To satisfy 'int main()'
    return 0;
}
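
A small sketch of the same idea in isolation, assuming a std::istringstream in place of the file: with std::noskipws set, operator>> delivers every character, and std::istreambuf_iterator gets the same result without formatted input at all. The function names are made up for this example.

```cpp
#include <ios>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// operator>> with skipping switched off: white space is delivered too.
std::string readWithNoskipws(std::istream& in)
{
    in >> std::noskipws;
    std::string out;
    char c;
    while (in >> c)
        out += c;
    return out;
}

// The same characters taken straight from the stream buffer.
std::string readWithIterator(std::istream& in)
{
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

Both functions return the input unchanged, leading space, tab and newline included.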


Thiemo Seufer

Steve Clamage

Jan 23, 1999
"Paul Lutus" <nos...@nosite.com> writes:

>Use this method instead:

> ifstream theFile;
> theFile.open(filename,ios::in | ios::binary);
> char c;
> while ((theFile.get(c)).good())
> {

> if (c == ' ' || c == '\n')

> cout << "Ho Yeah, a white space or a return..." << endl;
> cout << c;
> }

If the code works, it will be by accident. When you open
a file in binary mode, you usually disable the conversion
of end-of-line to '\n'. If the end-of-line marker in the
external file is the same as '\n' in the program, you will
find the line ends. Otherwise, you will get extra garbage,
and perhaps never find an end-of-line.

On Unix systems, the end-of-line marker is usually the same
as '\n' (that's not an accident). On other systems, the
end-of-line marker is usually NOT the same as '\n'.
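
As a rough illustration of what a binary-mode reader sees, the sketch below counts '\n' and '\r' bytes separately. It assumes the raw file contents are supplied through a std::istream (a std::istringstream works for experiments); ByteCounts and countLineBytes are hypothetical names, not anything from the posts.

```cpp
#include <istream>
#include <sstream>
#include <string>

struct ByteCounts {
    int lf;  // number of '\n' bytes seen
    int cr;  // number of '\r' bytes seen
};

// Reads every byte with get() and counts LF and CR separately,
// the way a program reading in binary mode would meet them.
ByteCounts countLineBytes(std::istream& in)
{
    ByteCounts n = {0, 0};
    char c;
    while (in.get(c)) {
        if (c == '\n')
            ++n.lf;
        else if (c == '\r')
            ++n.cr;
    }
    return n;
}
```

For "one\ntwo\n" the counts are 2 LF and 0 CR; for "one\r\ntwo\r\n" they are 2 and 2, so each line carries an extra '\r'; for "one\rtwo\r" a test for '\n' alone never fires.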

David

Jan 24, 1999
You shouldn't be in binary mode. That's for data (binary) files. Use
ios::in instead. If, for some reason, the extraction does work, it would
not be surprising if it ignored the control characters.

David

Sebastien Alix wrote:

> Hi,
>
> I have problem with fstream. I simply (and dumbly) want to read each
> char of a text file in a sequential order. So I want to use an
> ifstream to handle the file buffer. To be sure the stream will send me
> all char I open in binary mode. Here's the code:
>

> #include <fstream.h>
> #include <iostream.h>
>
> //This file is a plain code test file (it can be this source)
> const char *TESTFILE = "..\\test\\Test1.bin";
>
> void main()
> {
> ifstream theFile;
> theFile.open(TESTFILE, ios::binary);
>
> #ifdef (MSVC)
> theFile.setmode(filebuf::binary);
> #endif
> while (!theFile.eof())
> {

> unsigned char c;
> theFile >> c;

> if (c == ' ' || c == '\n')
> cout << "Ho Yeah, a white space or a return..." << endl;
> cout << c;
> }
> }
>

> My problem is that all white space and carriage return are never given
> to me. I am in binary mode so why in this world is it not sending the
> right thing to me....
>

> I have tried this on these compiler I allways get the same result:
> - MS-Visual C++ 6.0
> - MS-Visual C++ 5.0
> - Borland C++ 5.02
>
> Thanks a lot for your help


Paul Lutus

Jan 24, 1999
<< If the code works, it will be by accident. >>

Nonsense. The original poster said "To be sure the stream will send me all
char I open in binary mode." The main problem with his code was he only had
the binary flag, no other.

He wants to examine each character without any filtering or conversion. This
code provides that, and its functioning is not an accident.

Also, do you really think "theFile.get(c)" cares if there is an end-of-line
character? It doesn't -- it simply reads the file, character by character,
until it runs out of characters -- just as the original poster requested.

<< On Unix systems, the end-of-line marker is usually the same
as '\n' (that's not an accident). On other systems, the
end-of-line marker is usually NOT the same as '\n'. >>

UNIX: "\n"

Windows/DOS: "\r\n"

Either line ending will be detected by this program, even though that is
not the purpose -- the purpose is to read all the characters without
filtering or conversion.

Paul Lutus

Steve Clamage wrote in message <78da28$796$1...@engnews1.eng.sun.com>...


>"Paul Lutus" <nos...@nosite.com> writes:
>
>>Use this method instead:
>
>> ifstream theFile;
>> theFile.open(filename,ios::in | ios::binary);
>> char c;
>> while ((theFile.get(c)).good())
>> {

>> if (c == ' ' || c == '\n')
>> cout << "Ho Yeah, a white space or a return..." << endl;
>> cout << c;
>> }
>

>If the code works, it will be by accident. When you open
>a file in binary mode, you usually disable the conversion
>of end-of-line to '\n'. If the end-of-line marker in the
>external file is the same as '\n' in the program, you will
>find the line ends. Otherwise, you will get extra garbage,
>and perhaps never find an end-of-line.
>
>On Unix systems, the end-of-line marker is usually the same
>as '\n' (that's not an accident). On other systems, the
>end-of-line marker is usually NOT the same as '\n'.


Steve Clamage

Jan 25, 1999
On 24 Jan 1999 13:54:21 -0500, "Paul Lutus" <nos...@nosite.com> wrote:

><< If the code works, it will be by accident. >>
>
>Nonsense. The original poster said "To be sure the stream will send me all

>char I open in binary mode." ...


>He wants to examine each character without any filtering or conversion. This
>code provides that, and its functioning is not an accident.

I didn't say you can't read a text file in binary mode.
The article to which I was responding had the following code, after
opening the file in binary mode:


while ((theFile.get(c)).good())
{
if (c == ' ' || c == '\n')
cout << "Ho Yeah, a white space or a return..." << endl;
cout << c;
}

On systems where end-of-line is not '\n', the end-of-line will not be
identified as whitespace. This code will work only in the case where
end-of-line is '\n', which is what I did say.

><< On Unix systems, the end-of-line marker is usually the same
>as '\n' (that's not an accident). On other systems, the
>end-of-line marker is usually NOT the same as '\n'. >>
>
>UNIX: "\n"
>
>Windows/DOS: "\r\n"
>
>Either line ending will be detected by this program, even thought that is
>not the purpose -- the purpose is to read all the characters without
>filtering or conversion.

But the '\r' will not be detected as whitespace by the program, and
will likely be treated as garbage by the program, since it doesn't
seem to be prepared to accept a '\r'. (But we don't really know.)

On some systems, the end-of-line is a CR without an LF. On those
systems, testing for '\n' will never find an end of line if the file
is opened in binary mode.
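
One way to cope with this in a program that insists on reading raw bytes is to accept all three common conventions. The following is only a sketch under that assumption: splitLines is a made-up helper, it treats "\n", "\r\n" and a lone "\r" as end-of-line, and it does nothing for record-oriented files.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Splits raw bytes into lines, accepting "\n", "\r\n" and a lone "\r"
// as end-of-line. A final line without a terminator is kept as well.
std::vector<std::string> splitLines(std::istream& in)
{
    std::vector<std::string> lines;
    std::string current;
    bool pending = false;  // true while current holds an unfinished line
    char c;
    while (in.get(c)) {
        if (c == '\r') {
            // Swallow the LF of a CR LF pair, if there is one.
            if (in.peek() == '\n')
                in.get(c);
            lines.push_back(current);
            current.clear();
            pending = false;
        } else if (c == '\n') {
            lines.push_back(current);
            current.clear();
            pending = false;
        } else {
            current += c;
            pending = true;
        }
    }
    if (pending)
        lines.push_back(current);
    return lines;
}
```

Fed the bytes "a\r\nb\nc\rd", this yields the four lines a, b, c and d. The price is that the program, not the runtime library, now owns the end-of-line rules.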

On VAX/VMS, a text file has no end-of-line character. A text file is a
file of variable-length records. If you open a text file in text mode,
the runtime system translates the end-of-record to '\n' on input, and
does the reverse on output. If you open a text file in binary mode,
you read the inter-record markers as bytes. This is seldom a good
idea.

Another item of concern is detecting the end of the data. Often the
end of the data is the end of the file (including the final
end-of-line as data). But on MSDOS, a control-Z (hex 1A) in a text
file is interpreted as the end of the data. A file might have more
bytes in it, but the runtime system won't read them if the file is
opened in text mode. If you open the file in binary mode and test only
for EOF as reported by the stream, you might wind up processing
characters that are not really part of the file, but are just leftover
junk.
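
A program reading such a file in binary mode can apply the control-Z rule itself. A minimal sketch, assuming the MSDOS convention described above and using an invented helper name, readUntilCtrlZ:

```cpp
#include <istream>
#include <sstream>
#include <string>

// Reads raw bytes up to, but not including, the first control-Z (0x1A),
// or to the end of the stream if there is none.
std::string readUntilCtrlZ(std::istream& in)
{
    const char ctrlZ = '\x1A';
    std::string text;
    char c;
    while (in.get(c) && c != ctrlZ)
        text += c;
    return text;
}
```

Anything after the marker is simply left unread. Whether that is the right policy depends on whether the file really follows the text-file convention, which the bytes alone cannot tell you.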

My point is that if you open a text file in binary mode and then
proceed to read characters, you must also be prepared to deal with a
variety of end-of-line and end-of-data conventions.

Carl Barron

Jan 25, 1999
Paul Lutus <nos...@nosite.com> wrote:


> Also, do you really think "theFile.get(c)" cares if there is an end-of-line
> character? It doesn't -- it simply reads the file, character by character,
> until it runs out of characters -- just as the original poster requested.
>
>

Yes, it does care: in text mode the system end-of-line is converted to
the standard end-of-line before any user-level function can access the
input. If you don't want end-of-line conversion and wish to be portable,
then you must open with a mode or'ed with ios::binary.

Paul Lutus

Jan 26, 1999
<< But on MSDOS, a control-Z (hex 1A) in a text
file is interpreted as the end of the data. A file might have more
bytes in it, but the runtime system won't read them if the file is
opened in text mode. >>

Wait a minute -- you were just arguing that opening the file in binary mode
was a mistake, that this would result in "extra garbage." Please make up
your mind.

<< If you open the file in binary mode and test only
for EOF as reported by the stream, you might wind up processing
characters that are not really part of the file, but are just leftover
junk. >>

This will never happen unless an originating program opens a file in
random-access mode, writes to it, attaches a text-mode end-of-file marker
before the end of the original file's data, and closes it. This is a source
problem, and it is completely outside the matter under discussion.

In fact, if you open a file in binary mode, the provided program will read
each and every character in the file, not one less, not one more. That is
the purpose -- the original poster wanted to read and examine all the
characters in the file. His purpose was to circumvent any filtering that the
OS might otherwise do.

<< My point is that if you open a text file in binary mode and then
proceed to read characters, you must also be prepared to deal with a
variety of end-of-line and end-of-data conventions. >>

If you are reading and examining each character with a specific purpose in
mind, then the interpretation is left up to the program -- it doesn't have
to know anything about the internal data representation, or for that matter
the nature of the platform. If the line endings are not unix-style, it
doesn't matter. The program will proceed to read all the characters in the
file, every one of them, and there will be no "extra garbage," to use your
phrase, unless the originator of the file placed that garbage there.

<< On VAX/VMS, a text file has no end-of-line character. A text file is a
file of variable-length records. If you open a text file in text mode,
the runtime systems translates the end-of-record to '\n' on input, and
does the reverse on output. If you open a text file in binary mode,
you read the inter-record markers as bytes. This is seldom a good
idea. >>

If this is actually true, one may want to overcome the operating system's
obfuscating behavior, and process the file's contents oneself. It is easy to
imagine a student wishing to do just this, and posting a message similar to
the one to which I responded.

BTW, if I personally witnessed an OS do this to data, I would promptly
upgrade to a civilized OS -- DOS's behavior is bad enough. In general, the
idea of two modes for files (text/binary) is just uncivilized, and unix
knows better.

But your objection makes no sense. First you objected to the idea that I
would open a file in binary mode, saying this would lead to "extra garbage,"
then you make the entirely correct point that if a hex 1A happens to be in a
file and the file is not a text file, you will lose valid data by opening it
in text mode under DOS/Windows.

This happens to be an almost daily inquiry from students (I had one this
morning) -- "Why can't I read the entire length of this binary file?" But
this particular correspondent used:

theFile.setmode(filebuf::binary);

, thus losing any chance of reading the file at all. My original post was
meant to show him how to open a file in binary mode, his expressed purpose
and intent.

Paul Lutus

Steve Clamage wrote in message <36acb982....@news.earthlink.net>...


Steve Clamage

Jan 27, 1999
On 26 Jan 1999 01:52:24 -0500, "Paul Lutus" <nos...@nosite.com> wrote:

><< But on MSDOS, a control-Z (hex 1A) in a text
>file is interpreted as the end of the data. A file might have more
>bytes in it, but the runtime system won't read them if the file is
>opened in text mode. >>
>
>Wait a minute -- you were just arguing that opening the file in binary mode
>was a mistake, that this would result in "extra garbage." Please make up
>your mind.

That is not what I said. Here's what I actually did say:

- If the code works, it will be by accident. When you open
- a file in binary mode, you usually disable the conversion
- of end-of-line to '\n'. If the end-of-line marker in the
- external file is the same as '\n' in the program, you will
- find the line ends. Otherwise, you will get extra garbage,
- and perhaps never find an end-of-line.

I went on to give examples of what would happen with the sample
program under different end-of-line conventions. In some of those
examples, end-of-line would not be detected, and characters that were
never knowingly written to the file would be retrieved. That is, by
definition in C and C++, when you write the '\n' character to a text
file, it is converted by the runtime system to whatever the local
convention is for end-of-line. If you then read back the file in
binary mode, you might not get back what you wrote; you can get
different characters, unrelated in representation and number to what
was written. That's what I meant by extra garbage; it would certainly
be considered garbage by a naive programmer who didn't get back what
was written.
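
The effect can be observed with a small experiment: write lines through a text-mode stream, then count what get() delivers in each mode. This is a sketch only; writeTextFile and countChars are hypothetical helpers, and the binary-mode count depends on the platform.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <string>

// Writes `lines` copies of "x\n" through a text-mode stream.
void writeTextFile(const std::string& path, int lines)
{
    std::ofstream out(path.c_str());  // text mode is the default
    for (int i = 0; i < lines; ++i)
        out << "x\n";
}

// Counts the characters delivered by get() for the given open mode.
std::size_t countChars(const std::string& path, std::ios_base::openmode mode)
{
    std::ifstream in(path.c_str(), mode);
    std::size_t n = 0;
    char c;
    while (in.get(c))
        ++n;
    return n;
}
```

Reading back with std::ios::in returns the two characters per line that were written. Reading with std::ios::in | std::ios::binary returns the same count where end-of-line is a single '\n' on disk, and a larger one where the runtime stored "\r\n".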

><< If you open the file in binary mode and test only
>for EOF as reported by the stream, you might wind up processing
>characters that are not really part of the file, but are just leftover
>junk. >>
>
>This will never happen unless an originating program opens a file in
>random-access mode, writes to it, attaches a text-mode end-of-file marker
>before the end of the original file's data, and closes it. This is a source
>problem, and it is completely outside the matter under discussion.

Not so. The MSDOS/Windows rule, which was one of my examples, is that
a control-Z marks the end of data in a text file. Any text-generating
program, such as a text editor, is free to mark the end of data in a
buffer and write the whole buffer to disk. It was common in earlier
days for text files to have leftover junk in them following the
control-Z, because few programs bothered to truncate files to the
exact size of the text. I don't know whether that is still common, but
it doesn't matter: the definition of a text file is that a control-Z
marks the end of the data, and anything that follows should be ignored
on input.

(The rule has some history to it. MSDOS was originally designed to be
compatible with the older CP/M OS. CP/M kept track of file size as the
number of disk blocks. There was no way to know where the real data
ended in a file. Text files used control-Z to mark the end. To read a
binary file, you needed to know the convention used by the originating
program to mark the actual data.)

>
>In fact, if you open a file in binary mode, the provided program will read
>each and every character in the file, not one less, not one more. That is
>the purpose -- the original poster wanted to read and examine all the
>characters in the file. His purpose was to circumvent any filtering that the
>OS might otherwise do.

That isn't what the original poster said. Here is what he did say:
- I simply (and dumbly) want to read each char of a text file in a
- sequential order.
He will read every character that is in the disk file, whether they
have any meaning in the text file or not. Special markers for
end-of-line and end-of-data would be read, as well as leftover stuff
following the end-of-data. I wouldn't consider those "chars in the
text file". They are artifacts of the OS, and the results he gets will
depend on the OS and the conventions of the program that created the
file.

It's possible that a programmer might want to do that -- examine the
actual bytes in the disk file, whether they are part of the text or not --
but it isn't obvious to me that the original poster had that in mind.

Paul Lutus

Jan 28, 1999
This is really quite unbelievable. You have succeeded in objecting to
opening a file in binary mode --

<< If the code works, it will be by accident. When you open
a file in binary mode, you usually disable the conversion
of end-of-line to '\n'. If the end-of-line marker in the
external file is the same as '\n' in the program, you will
find the line ends. Otherwise, you will get extra garbage,
and perhaps never find an end-of-line. >>

-- and you have objected to opening a file in text mode --

<< But on MSDOS, a control-Z (hex 1A) in a text
file is interpreted as the end of the data. A file might have more
bytes in it, but the runtime system won't read them if the file is
opened in text mode. >>

Since a file can only be opened to read its contents in one of two ways, and
since you have successfully and logically objected to both methods, I can
only say "You're right -- both are incorrect approaches, and both can, in
sufficiently bizarre circumstances, lead to the reading of bytes not
formally part of the file as intended by the originating program."

Meanwhile, in the real world, as to my suggestion to open a file in binary
mode and read all the bytes, making whatever interpretation one cares to
make, your objection -- "If the code works, it will be by accident" -- is
not correct.

Also, for the record, this remark --

<< Any text-generating program, such as a text editor, is free to mark the
end of data in a buffer and write the whole buffer to disk. >>

-- begs the original question, because it requires the cooperation of a
buggy originating program to create illogical file contents. In normal
practice in a conforming operating system, a file, let us say originally
16,000 bytes in length, that is closed, reopened and written to either
sequentially or by way of block writes (not by way of random access
methods), will have only the bytes written to it, no more. The remaining
storage will be returned to the storage device's free space.

Or, to avoid the pitfalls of English, please compile and run this test
program:

#include <stdio.h>
#include <stdlib.h>

void fileWrite(const char *name, long length)
{
    FILE *fp = fopen(name, "wb");
    long i;
    int c;
    if (fp == NULL) { /* could not create the file */
        perror(name);
        exit(EXIT_FAILURE);
    }
    srand(1); /* set the random number sequence */
    for (i = 0; i < length; i++) {
        c = rand() & 255;
        putc(c, fp);
    }
    fclose(fp);
}

void fileRead(const char *name, long length)
{
    FILE *fp = fopen(name, "rb");
    int c, cc; /* int, so EOF is distinct from every byte value */
    long bytesRead = 0;
    if (fp == NULL) { /* could not open the file */
        perror(name);
        exit(EXIT_FAILURE);
    }
    srand(1); /* set the random number sequence */
    while ((c = getc(fp)) != EOF) {
        cc = rand() & 255;
        if (c == cc) { /* if the byte just read is the same */
            bytesRead++;
        }
    }
    if (bytesRead != length) {
        printf("Error: %ld bytes read, expected %ld\n", bytesRead, length);
    }
    else {
        printf("Successfully read %ld bytes.\n", bytesRead);
    }

    fclose(fp);
}

int main()
{
    fileWrite("test.bin", 100000);
    fileRead("test.bin", 100000);
    fileWrite("test.bin", 1000);
    fileRead("test.bin", 1000);
    fileWrite("test.bin", 100);
    fileRead("test.bin", 100);
    remove("test.bin");
    return 0;
}


I think you can imagine the outcome, but please run it anyway -- it
contradicts your assertion:

<< Special markers for
end-of-line and end-of-data would be read, as well as leftover stuff
following the end-of-data. I wouldn't consider those "chars in the
text file". They are artifacts of the OS, and the results he gets will
depend on the OS and the conventions of the program that created the
file. >>

In a conforming file system, these statements about "artifacts of the OS"
and "leftover stuff" are simply not true.

Opening a file in binary mode is a logical procedure with very predictable
consequences. The interpretation of the data is up to the programmer, and a
conforming OS will only provide data provided to it by the originating
program, nothing less, nothing more.

In a case where this is not true, the operating system has a flaw that
prevents it from conforming to the language standard behavior.

Paul Lutus

Steve Clamage wrote in message <36ade27f...@news.earthlink.net>...


>On 26 Jan 1999 01:52:24 -0500, "Paul Lutus" <nos...@nosite.com> wrote:
>
>><< But on MSDOS, a control-Z (hex 1A) in a text
>>file is interpreted as the end of the data. A file might have more
>>bytes in it, but the runtime system won't read them if the file is
>>opened in text mode. >>
>>
>>Wait a minute -- you were just arguing that opening the file in binary mode
>>was a mistake, that this would result in "extra garbage." Please make up
>>your mind.
>

>That is not what I said. Here's what I actually did say:
>
> - If the code works, it will be by accident. When you open
> - a file in binary mode, you usually disable the conversion
> - of end-of-line to '\n'. If the end-of-line marker in the
> - external file is the same as '\n' in the program, you will
> - find the line ends. Otherwise, you will get extra garbage,
> - and perhaps never find an end-of-line.
>
>I went on to give examples of what would happen with the sample
>program under different end-of-line conventions. In some of those
>examples, end-of-line would not be detected, and characters that were
>never knowingly written to the file would be retrieved. That is, by
>definition in C and C++, when you write the '\n' character to a text
>file, it is converted by the runtime system to whatever the local
>convention is for end-of-line. If you then read back the file in
>binary mode, you might not get back what you wrote; you can get
>different characters, unrelated in representation and number to what
>was written. That's what I meant by extra garbage; it would certainly
>be considered garbage by a naive programmer who didn't get back what
>was written.
>

>><< If you open the file in binary mode and test only
>>for EOF as reported by the stream, you might wind up processing
>>characters that are not really part of the file, but are just leftover
>>junk. >>
>>
>>This will never happen unless an originating program opens a file in
>>random-access mode, writes to it, attaches a text-mode end-of-file marker
>>before the end of the original file's data, and closes it. This is a source
>>problem, and it is completely outside the matter under discussion.
>

>Not so. The MSDOS/Windows rule, which was one of my examples, is that
>a control-Z marks the end of data in a text file. Any text-generating
>program, such as a text editor, is free to mark the end of data in a
>buffer and write the whole buffer to disk. It was common in earlier
>days for text files to have leftover junk in them following the
>control-Z, because few programs bothered to truncate files to the
>exact size of the text. I don't know whether that is still common, but
>it doesn't matter: the definition of a text file is that a control-Z
>marks the end of the data, and anything that follows should be ignored
>on input.
>
>(The rule has some history to it. MSDOS was originally designed to be
>compatible with the older CP/M OS. CP/M kept track of file size as the
>number of disk blocks. There was no way to know where the real data
>ended in a file. Text files used control-Z to mark the end. To read a
>binary file, you needed to know the convention used by the originating
>program to mark the actual data.)
>
>>

>>In fact, if you open a file in binary mode, the provided program will read
>>each and every character in the file, not one less, not one more. That is
>>the purpose -- the original poster wanted to read and examine all the
>>characters in the file. His purpose was to circumvent any filtering that the
>>OS might otherwise do.
>

>That isn't what the original poster said. Here is what he did say:
> - I simply (and dumbly) want to read each char of a text file in a
> - sequential order.
>He will read every character that is in the disk file, whether they
>have any meaning in the text file or not. Special markers for
>end-of-line and end-of-data would be read, as well as leftover stuff
>following the end-of-data. I wouldn't consider those "chars in the
>text file". They are artifacts of the OS, and the results he gets will
>depend on the OS and the conventions of the program that created the
>file.
>
>It's possible that a programmer might want to do that -- examine the
>actual bytes in disk file, whether they are part of the text or not --
>but it isn't obvious to me that the original poster had that in mind.


Steve Clamage

Jan 30, 1999
"Paul Lutus" <nos...@nosite.com> writes:

>This is really quite unbelievable. You have succeeded in objecting to
>opening a file in binary mode --

><< If the code works, it will be by accident. When you open
>a file in binary mode, you usually disable the conversion
>of end-of-line to '\n'. If the end-of-line marker in the
>external file is the same as '\n' in the program, you will
>find the line ends. Otherwise, you will get extra garbage,
>and perhaps never find an end-of-line. >>

>-- and you have objected to opening a file in text mode --

><< But on MSDOS, a control-Z (hex 1A) in a text
>file is interpreted as the end of the data. A file might have more
>bytes in it, but the runtime system won't read them if the file is
>opened in text mode. >>

>Since a file can only be opened to read its contents in one of two ways, and
>since you have successfully and logically objected to both methods,

I did no such thing. Please cease misquoting me and attributing things
to me that I did not say.

I said: If you have a text file and open it in text mode, you get
back what you wrote. If you have a binary file and open it in
binary mode, you get back what you wrote.

I also said: If you have a text file and open it in binary mode,
or a binary file and open it in text mode, what you get back depends
on the OS. You cannot in either case depend on getting back what
you wrote, and you can make no portable assumptions about what
you will get back. In both cases, you might get back things you
didn't knowingly write, or fail to get back things you did
intentionally write, or both.

I provided examples of all of those possibilities.

If you can provide a factual refutation of any statement I
actually did make, feel free to do so.


><< Special markers for
>end-of-line and end-of-data would be read, as well as leftover stuff
>following the end-of-data. I wouldn't consider those "chars in the
>text file". They are artifacts of the OS, and the results he gets will
>depend on the OS and the conventions of the program that created the
>file. >>

>In a conforming file system, these statements about "artifacts of the OS"
>and "leftover stuff" are simply not true.

There is no notion of a "conforming file system" in C or C++.
The standards instead explicitly recognize the variability of
file system conventions, and make no promises at all about what
you get when you mix file modes. A C or C++ implementation is
obliged to convert between file system conventions and C/C++
language conventions when you use a file in a single mode.
Beyond that, you are not entitled (by the language standards)
to have any particular expectations. (Some operating systems
have standards of their own that might provide additional
guarantees.)

If you can point out in either language standard something that
contradicts what I have said, I will be happy to be corrected.

--
Steve Clamage, stephen...@sun.com

Omry Yadan

Jan 30, 1999
Sebastien Alix wrote in message <36a82bf1...@news.videotron.ca>...
>Hi,
>
>#include <fstream.h>
>#include <iostream.h>
>
>//This file is a plain code test file (it can be this source)
>const char *TESTFILE = "..\\test\\Test1.bin";
>
>void main()
>{
> ifstream theFile;
> theFile.open(TESTFILE, ios::binary);
>
>#ifdef (MSVC)
> theFile.setmode(filebuf::binary);
>#endif
> while (!theFile.eof())
> {
> unsigned char c;
> theFile >> c;
> if (c == ' ' || c == '\n')
> cout << "Ho Yeah, a white space or a return..." << endl;
> cout << c;
> }
>}
>
>My problem is that all white space and carriage return are never given
>to me. I am in binary mode so why in this world is it not sending the
>right thing to me....

here is your loop, fixed. (I think)

    unsigned char c;

    // << -- Use this: read() as the loop test, so the body only runs
    // after a byte has actually been read.
    while (theFile.read(reinterpret_cast<char *>(&c), sizeof(unsigned char)))
    {
        if (c == ' ' || c == '\n')
            cout << "Ho Yeah, a white space or a return..." << endl;
        cout << c;
    }


--
Omry Yadan
Israel.
