I am writing a program to verify whether a file is an RTF file. An RTF file
starts with {\rtf<N>.
My code is as follows:
int main(int argc, char **argv){
    char str[5], c;
    FILE *fstream;
    //code for connecting to file stream
    c=getc(fstream);
    if(c!='{'){
        // An rtf file should start with `{\rtf'
        fprintf(stderr, "invalid rtf file\n");
    }else if(fscanf(fstream, "%4s", &str), strcmp(str, "\rtf")!=0 ){
        fprintf(stderr, "rtf version unspecified.\n");
        // check: if str prints "\rtf", why strcmp returns non-zero value?
        printf("%s\n", str);
    }else{
        // stuff to do
    }
    return 0;
}
I tried to check the code against an rtf file. The "\rtf" tag check fails.
In a stream of characters, \rtf is 4 characters.
In C, "\rtf" is 3 characters.
Phil
Hey,
'\r' is an escape sequence for the character with ASCII code 13
(Carriage Return). To put a literal backslash in your string, use '\\'.
Hope this helps,
Vlad
"\rtf" is 3 characters long: { '\r', 't', 'f' }, where '\r' is a
return character. Try "\\rtf".
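The difference between the two literals can be checked directly. This is only a small sketch; the helper names `len_unescaped`, `len_escaped` and `matches_tag` are made up for illustration and are not part of the posted program.

```c
#include <string.h>

/* Hypothetical helpers, for illustration only. */

/* "\rtf" is the escape '\r' followed by 't' and 'f': 3 characters. */
size_t len_unescaped(void) { return strlen("\rtf"); }

/* "\\rtf" is a literal backslash followed by 'r', 't', 'f': 4 characters. */
size_t len_escaped(void) { return strlen("\\rtf"); }

/* Returns 1 if text read from a file (backslash, r, t, f) matches the tag. */
int matches_tag(const char *from_file)
{
    return strcmp(from_file, "\\rtf") == 0;
}
```

The four characters read from the file compare equal to "\\rtf" and unequal to "\rtf", which is why the original check always failed.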
Incidentally, I'm not sure why you read the '{' and the following
"\\rtf" in two separate steps.
> printf("%s\n", str);
> }else{
> // stuff to do
> }
> return 0;
> }
>
> I tried to check the code against an rtf file. The "\rtf" tag check fails
--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
The return type of getc() is int, not char. You need to check for EOF.
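A minimal sketch of that point, assuming a hypothetical helper named `starts_with_brace` (not in the original program): the result of getc() is kept in an int so that EOF can be told apart from every valid character.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if the next character is '{',
 * 0 if it is something else, and -1 on end-of-file or read error. */
int starts_with_brace(FILE *fp)
{
    int c = getc(fp);   /* int, so EOF is distinguishable from any char */

    if (c == EOF)
        return -1;
    return c == '{';
}
```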
> if(c!='{'){
> // An rtf file should start with `{\rtf'
> fprintf(stderr, "invalid rtf file\n");
> }else if(fscanf(fstream, "%4s", &str), strcmp(str, "\rtf")!=0 ){
'&str' is wrong; use '&str[0]' or 'str'.
Using the comma operator here is terrible for readability, and wrong: in case
of a read error you invoke UB.
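One way to avoid both problems is to test the return value of fscanf() before looking at the buffer. The following is only a sketch; the function name `next_token_is_rtf_tag` is an assumption made for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads up to 4 non-whitespace characters and
 * compares them with the tag. Returns 1 on a match, 0 otherwise. */
int next_token_is_rtf_tag(FILE *fp)
{
    char str[5];                          /* 4 characters + '\0' */

    if (fscanf(fp, "%4s", str) != 1)      /* str, not &str; check the result */
        return 0;                         /* nothing was stored in str */
    return strcmp(str, "\\rtf") == 0;
}
```

Here str is only examined after fscanf() reports that one item was stored, so the comparison never sees an unset buffer.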
> fprintf(stderr, "rtf version unspecified.\n");
> // check: if str prints "\rtf", why strcmp returns non-zero value?
others have answered that...
> printf("%s\n", str);
> }else{
> // stuff to do
> }
> return 0;
> }
>
> I tried to check the code against an rtf file. The "\rtf" tag check fails
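#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/**
 * check if file has prefix "{\rtf"
 */
int is_rtf(FILE *f)
{
    char buf[6] = {0};
    const char *prefix = "{\\rtf";
    /* fgets() reads at most sizeof buf - 1 = 5 characters */
    if (NULL == fgets(buf, sizeof buf, f)) {
        perror("is_rtf()");
        exit(EXIT_FAILURE);
    }
    return 0 == strcmp(prefix, buf);
}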
--
Tor <echo bwz...@wvtqvm.vw | tr i-za-h a-z>
>> }else if(fscanf(fstream, "%4s", &str), strcmp(str, "\rtf")!=0 ){
>
> '&str' is wrong, could use '&str[0]' or 'str'.
>
> using comma operator here is terrible for readability and wrong, in case
> of read error you invoke UB.
I take the UB part back, but the content of 'str' may be anything and
the test can thus succeed even if the file doesn't have a valid RTF
header. Unlikely, but nevertheless possible.
[...]
> Thank you for the explanation Tor.
You are welcome. :) One thing you might want to consider, if you use the
is_rtf() function and want to analyze the file content further, is to add a
couple of rewind() calls, one before the fgets() call and one after.
That shouldn't matter much performance-wise, and it makes the check
function more robust.
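A sketch of what the rewind() suggestion could look like. The name `is_rtf_rewound` and the -1 return value (used here in place of the exit() call) are assumptions made for illustration, not part of the posted function.

```c
#include <stdio.h>
#include <string.h>

/* Sketch: returns 1 if the stream starts with "{\rtf", 0 if it does not,
 * and -1 if nothing could be read (empty file or read error). */
int is_rtf_rewound(FILE *f)
{
    char buf[6] = {0};
    const char *prefix = "{\\rtf";
    int result;

    rewind(f);                          /* always check from the beginning */
    if (fgets(buf, sizeof buf, f) == NULL)
        result = -1;
    else
        result = (strcmp(prefix, buf) == 0);
    rewind(f);                          /* leave the stream at the start */
    return result;
}
```

Note that rewind() also clears the stream's error and end-of-file indicators, so a caller that needs them has to test before the second rewind() runs.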
Ahhh, it's not UB there in the strcmp because he's comparing with a
three-character string ('\r', 't', 'f'), but later he prints str as if
it's a string, so if fscanf failed, it might not be null-terminated, and
thus invoke UB.
So if I call is_rtf with a file that has fewer than five characters, the
program will exit?!? Shouldn't this function merely return 0 in that case?
No, it exits in case of a read error, or if the file is empty. If at least
1 char could be read before EOF, there is no exit.
> Shouldn't this function merely return 0 in that case?
Well, it really depends; often error recovery isn't needed, just
fail-safe behavior. Propagating error codes all over the place makes the code
hard to read; just look at how easy Stevens' C code in "UNIX Network
Programming" is to read.
In case fgets() returns NULL above, I don't think you can safely answer
the question "Does the file have RTF format?" with a NO.
--
pete
OK, I see now that fgets returns non-null if at least one character was
read, but why should it exit the program on an empty file? Imagine the
program is an interactive word processor and the user has several
documents open, and goes to open another one that happens to be an empty
file. He would be pleased if it said "not an RTF file", and furious if the
program abruptly exited.
> > Shouldn't this function merely return 0 in that case?
>
> Well, it really depends, often error recovery isn't needed, just
> fail-safe. Propagating error codes all over the place makes the code
> hard to read, just look at how easy Stevens C code in "UNIX Network
> Programmer" is to read.
>
> In case fgets() returns NULL above, I don't think you can safely answer
> the question "Do the file have RTF format?" with a NO.
And neither can you safely answer "there was a read error". fgets is
inappropriate to use here.
Use ferror() or feof() to tell the two cases apart.
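A sketch of how ferror() can separate an empty file from a read error once fgets() has returned NULL. The enum `rtf_check` and the function `check_rtf` are hypothetical names introduced only for this example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical result codes, not part of the posted function. */
enum rtf_check { RTF_NO = 0, RTF_YES = 1, RTF_EMPTY = 2, RTF_IO_ERROR = 3 };

enum rtf_check check_rtf(FILE *f)
{
    char buf[6] = {0};

    if (fgets(buf, sizeof buf, f) == NULL) {
        if (ferror(f))
            return RTF_IO_ERROR;   /* the read itself failed */
        return RTF_EMPTY;          /* end-of-file before any character */
    }
    return strcmp(buf, "{\\rtf") == 0 ? RTF_YES : RTF_NO;
}
```

The caller then decides whether an empty file counts as "not RTF" or as something worth reporting separately.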
> Tor Rustad wrote:
<snip>
>> In case fgets() returns NULL above, I don't think you can safely
>> answer the question "Do the file have RTF format?" with a NO.
>
> And neither can you safely answer "there was a read error". fgets
> is inappropriate to use here.
Why inappropriate? RTF is a text format, after all. Sure, you have
to know what you're doing - but, if you /do/ know what you're
doing, fgets is a perfectly reasonable way to get the data into
memory.
--
Richard Heathfield <http://www.cpax.org.uk>
[...]
> Imagine the
> program is an interactive word processor and the user has several
> documents open, and goes to open another one that happens to be an empty
> file. He would be pleased if it said "not an RTF file", and furious if the
> program abruptly exited.
Not the hardest thing in the world to replace the exit() call with some
function showing a message dialog box instead. However in a batch type
of application, that is not what you want, so it depends.
>> In case fgets() returns NULL above, I don't think you can safely answer
>> the question "Do the file have RTF format?" with a NO.
>
> And neither can you safely answer "there was a read error". fgets is
> inappropriate to use here.
There is a HUGE difference between giving no answer and stop processing,
and giving the WRONG answer and to continue processing.
Using fgets() is OK, and testing it for NULL is correct too. For info on
how to handle EOF and IO errors, see FAQ 16.8
But even a user of a batch type application wants to know the difference
between an I/O error and an error due to the input file having zero bytes.
Since the program might report "not an RTF file" for non-RTF files of one
or more bytes, he will probably not conclude that a different error,
"is_rtf()", is due to him feeding the program a zero-byte file.
> >> In case fgets() returns NULL above, I don't think you can safely answer
> >> the question "Do the file have RTF format?" with a NO.
> >
> > And neither can you safely answer "there was a read error". fgets is
> > inappropriate to use here.
>
> There is a HUGE difference between giving no answer and stop processing,
> and giving the WRONG answer and to continue processing.
>
> Using fgets() is OK, and testing it for NULL is correct too. For info on
> how to handle EOF and IO errors, see FAQ 16.8
Apparently I'm just failing to communicate here. To a user of a program,
it should not report the same error for an empty file as for an I/O error.
The original function lumped these two together. I'm not asking how to do
this correctly; I'm pointing out that the posted function did not do them
correctly.
perror() gives diagnostics.
> Since the program might report "not an RTF file" for non-RTF files of one
> or more bytes, he will probably not conclude that a different error,
> "is_rtf()", is due to him feeding the program a zero-byte file.
I have no clue what you are trying to say here.
>>>> In case fgets() returns NULL above, I don't think you can safely answer
>>>> the question "Do the file have RTF format?" with a NO.
>>> And neither can you safely answer "there was a read error". fgets is
>>> inappropriate to use here.
>> There is a HUGE difference between giving no answer and stop processing,
>> and giving the WRONG answer and to continue processing.
>>
>> Using fgets() is OK, and testing it for NULL is correct too. For info on
>> how to handle EOF and IO errors, see FAQ 16.8
>
> Apparently I'm just failing to communicate here. To a user of a program,
> it should not report the same error for an empty file as for an I/O error.
> The original function lumped these two together. I'm not asking how to do
> this correctly; I'm pointing out that the posted function did not do them
> correctly.
The point is that
1. an empty file may or may not be an error
2. on error, it may or may not be OK to call exit()
It simply depends on the application at hand. If one wants to silently
accept empty files, by all means do that via ferror().
If calling exit() on error isn't acceptable, FCOL just replace it with a
custom-made error handler!
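One possible shape for such a handler, as a sketch only. The callback type `rtf_error_handler`, the function `is_rtf_with_handler`, and the sample handler `remember_error` are all hypothetical names, not anything from the posted code.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical handler type: the application decides what a failure means. */
typedef void (*rtf_error_handler)(const char *message);

/* Returns 1 if the stream starts with "{\rtf", otherwise 0. When nothing
 * could be read, the handler (if any) is called before returning 0. */
int is_rtf_with_handler(FILE *f, rtf_error_handler on_error)
{
    char buf[6] = {0};

    if (fgets(buf, sizeof buf, f) == NULL) {
        if (on_error != NULL)
            on_error(ferror(f) ? "read error" : "empty file");
        return 0;
    }
    return strcmp(buf, "{\\rtf") == 0;
}

/* Sample handler: just remembers the last message it was given. */
const char *last_rtf_error = NULL;
void remember_error(const char *message) { last_rtf_error = message; }
```

A batch tool could pass a handler that prints the message and calls exit(), while an interactive program could pass one that shows it to the user and carries on.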
blargg originally said:
"OK, I see now that fgets returns non-null if at least one character
was read, but why should it exit the program on an empty file?"
> >>> Imagine the
> >>> program is an interactive word processor and the user has several
> >>> documents open, and goes to open another one that happens to be an empty
> >>> file. He would be pleased if it said "not an RTF file", and furious if the
> >>> program abruptly exited.
an argument against a library program aborting the program.
(I don't think this is always the wrong answer).
> >> Not the hardest thing in the world to replace the exit() call with some
> >> function showing a message dialog box instead.
this seems just as bad. Now your library breaks my toaster program
as I don't know where to display the dialog box (burn it
on to the toast?)
> >>However in a batch type
> >> of application, that is not what you want, so it depends.
so you call an application specific error handler so it can abort,
display a dialog box, abend or launch the missiles as appropriate
> > But even a user of a batch type application wants to know the difference
> > between an I/O error and an error due to the input file having zero bytes.
>
> perror() give diagnostics
this makes no sense without some context. Writing to stderr
is a null op on some well known graphical user interface.
The Standard makes no mention that fgets() sets errno so perror()
may not do anything useful. (Admittedly, most implementations do
set errno to something sane).
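Given that, a cautious caller can clear errno first and only quote it when it was actually set. This is a sketch under that assumption; `read_prefix` and its message strings are hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads up to size-1 characters into buf.
 * Returns 1 on success; on failure fills msg and returns 0. */
int read_prefix(FILE *f, char *buf, int size, char *msg, size_t msg_size)
{
    errno = 0;                      /* so a stale value is not reported */
    if (fgets(buf, size, f) != NULL)
        return 1;
    if (ferror(f) && errno != 0)
        snprintf(msg, msg_size, "read error: %s", strerror(errno));
    else if (ferror(f))
        snprintf(msg, msg_size, "read error (errno not set)");
    else
        snprintf(msg, msg_size, "end of file before any data");
    return 0;
}
```

Writing the message into a caller-supplied buffer rather than to stderr leaves it to the application to decide where the diagnostic goes.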
> > Since the program might report "not an RTF file" for non-RTF files of one
> > or more bytes, he will probably not conclude that a different error,
> > "is_rtf()", is due to him feeding the program a zero-byte file.
>
> I have no clue what you are trying to say here.
>
> >>>> In case fgets() returns NULL above, I don't think you can safely answer
> >>>> the question "Do the file have RTF format?" with a NO.
nor a "YES". "A Suffusion of Yellow", perhaps.
> >>> And neither can you safely answer "there was a read error". fgets is
> >>> inappropriate to use here.
no
> >> There is a HUGE difference between giving no answer and stop processing,
> >> and giving the WRONG answer and to continue processing.
I'm with you here!
> >> Using fgets() is OK, and testing it for NULL is correct too. For info on
> >> how to handle EOF and IO errors, see FAQ 16.8
yes
> > Apparently I'm just failing to communicate here. To a user of a program,
> > it should not report the same error for an empty file as for an I/O error.
hence the FAQ reference...
> > The original function lumped these two together. I'm not asking how to do
> > this correctly; I'm pointing out that the posted function did not do them
> > correctly.
>
> The point is that
>
> 1. empty file may or may not be an error
well it ain't an RTF file
> 2. on error, it may or may not be OK to call exit()
yes
> it simply depend on the application at hand. If one want to silently
> accept empty files, by all means do that via ferror().
>
> If calling exit() on error isn't acceptable, FCOL just replace it with a
> custom made error handler!
you may be agreeing with each other
--
Nick Keighley
The fscanf equivalent of fgets is so simple
that it can be used inline whenever needed:
/* NN stands for a literal decimal width; %1[\n] stores a char plus '\0' */
char s[NN + 1] = "", nl[2];
int rc = fscanf(fp, "%NN[^\n]%1[\n]", s, nl);
if (rc == 1) fscanf(fp, "%*[^\n]%*c");
if (rc == 0) getc(fp);
(Dan Pop, comp.lang.c)
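A concrete version of the idiom, as a sketch with NN fixed at 15. The wrapper name `read_line15`, the two-byte `nl` buffer for the `%1[\n]` conversion, and the explicit EOF test are additions made for this example.

```c
#include <stdio.h>

/* Sketch with NN fixed at 15. Reads one line into s (at most 15
 * characters), discards any excess, and consumes the newline.
 * Returns 1 if a line (possibly empty) was read, 0 at end-of-file. */
int read_line15(FILE *fp, char s[16])
{
    char nl[2];                        /* %1[\n] stores one char plus '\0' */
    int rc;

    s[0] = '\0';
    rc = fscanf(fp, "%15[^\n]%1[\n]", s, nl);
    if (rc == EOF)
        return 0;
    if (rc == 1)                       /* line too long, or no final newline */
        fscanf(fp, "%*[^\n]%*c");      /* skip the rest of the line */
    if (rc == 0)                       /* empty line: only '\n' is pending */
        getc(fp);
    return 1;
}
```

Unlike fgets(), this variant never stores the newline in s, and an over-long line is truncated rather than returned in pieces.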