
rtf file verification


Cross

Jun 11, 2009, 4:51:43 PM
Hello

I am writing a program to check whether a given file is an RTF file. An RTF
file starts with {\rtf<N>.

My code is as follows:
int main(int argc, char **argv){
    char str[5], c;
    FILE *fstream;

    //code for connecting to file stream

    c=getc(fstream);
    if(c!='{'){
        // An rtf file should start with `{\rtf'
        fprintf(stderr, "invalid rtf file\n");
    }else if(fscanf(fstream, "%4s", &str), strcmp(str, "\rtf")!=0 ){
        fprintf(stderr, "rtf version unspecified.\n");
        // check: if str prints "\rtf", why strcmp returns non-zero value?
        printf("%s\n", str);
    }else{
        // stuff to do
    }
    return 0;
}

I tested the code against an RTF file, and the "\rtf" tag check fails.

Phil Carmody

Jun 11, 2009, 4:57:51 PM

In a stream of characters, \rtf is 4 characters.
In C, "\rtf" is 3 characters.

Phil
--
Marijuana is indeed a dangerous drug.
It causes governments to wage war against their own people.
-- Dave Seaman (sci.math, 19 Mar 2009)

Vlad Dogaru

Jun 11, 2009, 5:04:35 PM

Hey,

'\r' is an escape sequence for the character with ASCII code 13
(Carriage Return). To put a literal backslash in your string, use '\\'.
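
A small sketch to make the difference concrete (not from the original post; the helper name is_rtf_keyword is made up for illustration). It compares against the correctly escaped literal, so it only matches the four characters that actually appear in the file:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns 1 if s is exactly the four characters
 * backslash, 'r', 't', 'f', as they appear in an RTF file. */
int is_rtf_keyword(const char *s)
{
    assert(s != NULL);
    /* "\\rtf" is '\\', 'r', 't', 'f'  (4 characters).
     * "\rtf"  is '\r', 't', 'f'       (3 characters). */
    return strcmp(s, "\\rtf") == 0;
}
```

With this, strlen("\rtf") is 3 while strlen("\\rtf") is 4, which is why the original comparison could never succeed.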

Hope this helps,
Vlad

Keith Thompson

Jun 11, 2009, 5:05:42 PM
Cross <X...@X.tv> writes:
> I am writing a program to verify if it is an rtf file or not. Now,
> an rtf file starts with {\rtf<N>.
>
> My code is as follows:
> int main(int argc, char **argv){
> char str[5], c;
> FILE *fstream
>
> //code for connecting to file stream
>
> c=getc(fstream);
> if(c!='{'){
> // An rtf file should start with `{\rtf'
> fprintf(stderr, "invalid rtf file\n");
> }else if(fscanf(fstream, "%4s", &str), strcmp(str, "\rtf")!=0 ){
> fprintf(stderr, "rtf version unspecified.\n");
> // check: if str prints "\rtf", why strcmp returns non-zero value?

"\rtf" is 3 characters long: { '\r', 't', 'f' }, where '\r' is a
return character. Try "\\rtf".

Incidentally, I'm not sure why you read the '{' and the following
"\\rtf" in two separate steps.

> printf("%s\n", str);
> }else{
> // stuff to do
> }
> return 0;
> }
>
> I tried to check the code against an rtf file. The "\rtf" tag check fails

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Cross

Jun 11, 2009, 5:05:52 PM
Thanks, Phil. Sorry for being n00bish.

Cross

Jun 11, 2009, 5:07:08 PM
Thanks everyone. I am clear now.

Tor Rustad

Jun 11, 2009, 7:37:10 PM
Cross wrote:
> Hello
>
> I am writing a program to verify if it is an rtf file or not. Now, an rtf file
> starts with {\rtf<N>.
>
> My code is as follows:
> int main(int argc, char **argv){
> char str[5], c;
> FILE *fstream
>
> //code for connecting to file stream
>
> c=getc(fstream);

The return type of getc() is int, not char. You need an int so that you can check for EOF.
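
A minimal sketch of that point, assuming a hypothetical helper named first_char_is_brace (the name and the -1 return convention are illustrative, not from the posted code):

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if the first byte is '{', 0 if it is
 * something else, and -1 if nothing could be read (EOF or error). */
int first_char_is_brace(FILE *f)
{
    int c;                      /* int, so EOF is distinguishable */

    assert(f != NULL);
    c = getc(f);
    if (c == EOF)
        return -1;
    return c == '{';
}
```

Storing the result in a char would either make a legitimate byte look like EOF or make EOF undetectable, depending on whether char is signed.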

> if(c!='{'){
> // An rtf file should start with `{\rtf'
> fprintf(stderr, "invalid rtf file\n");
> }else if(fscanf(fstream, "%4s", &str), strcmp(str, "\rtf")!=0 ){

'&str' is wrong, could use '&str[0]' or 'str'.

Using the comma operator here is terrible for readability, and wrong: in
case of a read error you invoke UB.
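
A sketch of the same read with the fscanf() return value checked, so that str is never examined when nothing was converted. The helper name next_token_is_rtf is hypothetical, and the literal is written with the backslash escaped:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads up to 4 non-whitespace characters and
 * compares them with backslash-r-t-f. Returns 1 on match, 0 otherwise. */
int next_token_is_rtf(FILE *f)
{
    char str[5] = "";

    assert(f != NULL);
    if (fscanf(f, "%4s", str) != 1)   /* pass str, not &str */
        return 0;                     /* nothing converted */
    return strcmp(str, "\\rtf") == 0;
}
```

Note that %s skips leading whitespace, so this is slightly more lenient than a byte-for-byte prefix check.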

> fprintf(stderr, "rtf version unspecified.\n");
> // check: if str prints "\rtf", why strcmp returns non-zero value?

others have answered that...

> printf("%s\n", str);
> }else{
> // stuff to do
> }
> return 0;
> }
>
> I tried to check the code against an rtf file. The "\rtf" tag check fails

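#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/**
 * check if file has prefix "{\rtf"
 */
int is_rtf(FILE *f)
{
    char buf[6] = {0};
    const char *prefix = "{\\rtf";

    /* fgets() returns NULL on a read error, or on EOF with nothing read */
    if (NULL == fgets(buf, sizeof buf, f)) {
        perror("is_rtf()");
        exit(EXIT_FAILURE);
    }

    return 0 == strcmp(prefix, buf);
}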

--
Tor <echo bwz...@wvtqvm.vw | tr i-za-h a-z>

Tor Rustad

Jun 11, 2009, 7:46:50 PM
Tor Rustad wrote:
> Cross wrote:


>> }else if(fscanf(fstream, "%4s", &str), strcmp(str, "\rtf")!=0 ){
>
> '&str' is wrong, could use '&str[0]' or 'str'.
>
> using comma operator here is terrible for readability and wrong, in case
> of read error you invoke UB.

I take the UB part back, but the content of 'str' may be anything and
the test can thus succeed even if the file does not have a valid RTF
header. Unlikely, but nevertheless possible.

Cross

Jun 12, 2009, 7:35:22 AM
Thank you for the explanation Tor.

zorro

Jun 12, 2009, 8:59:12 AM
On 12 Jun, 13:35, Cross <X...@X.tv> wrote:

[...]

> Thank you for the explanation Tor.

You are welcome. :) If you use the is_rtf() function and want to analyze
the file content further, consider adding a couple of rewind() calls, one
before the fgets() call and one after. That shouldn't matter much
performance-wise, and it makes the check function more robust.
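
A rough sketch of that suggestion, assuming a seekable stream. The name is_rtf_rewound is made up here, and unlike the posted is_rtf() this variant returns 0 instead of exiting when nothing can be read:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Sketch: same prefix test, but the stream position is reset to the
 * start both before and after the read. Returns 1 for an RTF prefix,
 * 0 otherwise. Only meaningful for seekable streams. */
int is_rtf_rewound(FILE *f)
{
    char buf[6] = {0};
    int ok;

    assert(f != NULL);
    rewind(f);                                  /* start from byte 0 */
    ok = fgets(buf, sizeof buf, f) != NULL
         && strcmp(buf, "{\\rtf") == 0;
    rewind(f);                                  /* leave f at byte 0 */
    return ok;
}
```

Keep in mind that rewind() also clears the stream's error indicator, so a caller that wants to inspect ferror() has to do so before the second rewind().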

Cross

Jun 12, 2009, 10:31:35 AM
zorro wrote:
> You are welcome. :) One thing you might want to consider if using the
> is_rtf() function and want to analyze file content more, is to add a
> couple of rewind() calls, one before the fgets() call and one after.
> That shouldn't matter much performance wise, and makes the check
> function more robust.
I will keep the warning in mind.

blargg

Jun 13, 2009, 3:24:14 PM
Tor Rustad wrote:
> Tor Rustad wrote:
> > Cross wrote:
> >> char str[5], c;
[...no initialization of str...]

> >> }else if(fscanf(fstream, "%4s", &str), strcmp(str, "\rtf")!=0 ){
> >
> > '&str' is wrong, could use '&str[0]' or 'str'.
> >
> > using comma operator here is terrible for readability and wrong, in case
> > of read error you invoke UB.
>
> I take the UB part back, but the content of 'str' may be anything and
> the test can thus succeed, even if the file isn't having a valid RTF
> header. Unlikely, but nevertheless so.

Ahhh, it's not UB there in the strcmp because he's comparing with a
three-character string ('\r', 't', 'f'), but later he prints str as if
it's a string, so if fscanf failed, it might not be null-terminated, and
thus invoke UB.

blargg

Jun 13, 2009, 3:26:08 PM
Tor Rustad wrote:
[...]

> /**
> * check if file has prefix "{\rtf"
> */
> int is_rtf(FILE *f)
> {
> char buf[6]={0};
> const char *prefix = "{\\rtf";
>
> if (NULL == fgets(buf, sizeof buf, f)) {
> perror("is_rtf()");
> exit(EXIT_FAILURE);
> }
>
> return 0==strcmp(prefix, buf);
> }

So if I call is_rtf with a file that has fewer than five characters, the
program will exit?!? Shouldn't this function merely return 0 in that case?

Tor Rustad

Jun 13, 2009, 6:55:44 PM
blargg wrote:
> Tor Rustad wrote:
> [...]
>> /**
>> * check if file has prefix "{\rtf"
>> */
>> int is_rtf(FILE *f)
>> {
>> char buf[6]={0};
>> const char *prefix = "{\\rtf";
>>
>> if (NULL == fgets(buf, sizeof buf, f)) {
>> perror("is_rtf()");
>> exit(EXIT_FAILURE);
>> }
>>
>> return 0==strcmp(prefix, buf);
>> }
>
> So if I call is_rtf with a file that has fewer than five characters, the
> program will exit?!?

No, it exits in case of a read error, or if the file is empty. If at least
1 char could be read before EOF, there is no exit.

> Shouldn't this function merely return 0 in that case?

Well, it really depends. Often error recovery isn't needed, just
fail-safe behavior. Propagating error codes all over the place makes the
code hard to read; just look at how easy Stevens' C code in "UNIX Network
Programming" is to read.

In case fgets() returns NULL above, I don't think you can safely answer
the question "Does the file have RTF format?" with a NO.

pete

Jun 13, 2009, 7:51:07 PM


--
pete

blargg

Jun 15, 2009, 11:48:23 AM
Tor Rustad wrote:
> blargg wrote:
> > Tor Rustad wrote:
> > [...]
> >> /**
> >> * check if file has prefix "{\rtf"
> >> */
> >> int is_rtf(FILE *f)
> >> {
> >> char buf[6]={0};
> >> const char *prefix = "{\\rtf";
> >>
> >> if (NULL == fgets(buf, sizeof buf, f)) {
> >> perror("is_rtf()");
> >> exit(EXIT_FAILURE);
> >> }
> >>
> >> return 0==strcmp(prefix, buf);
> >> }
> >
> > So if I call is_rtf with a file that has fewer than five characters, the
> > program will exit?!?
>
> No, it exit in case of read error, or if the file was empty. If 1 char
> could be read before EOF, there should be no exit.

OK, I see now that fgets returns non-null if at least one character was
read, but why should it exit the program on an empty file? Imagine the
program is an interactive word processor and the user has several
documents open, and goes to open another one that happens to be an empty
file. He would be pleased if it said "not an RTF file", and furious if the
program abruptly exited.

> > Shouldn't this function merely return 0 in that case?
>
> Well, it really depends, often error recovery isn't needed, just
> fail-safe. Propagating error codes all over the place makes the code
> hard to read, just look at how easy Stevens C code in "UNIX Network
> Programmer" is to read.
>
> In case fgets() returns NULL above, I don't think you can safely answer
> the question "Do the file have RTF format?" with a NO.

And neither can you safely answer "there was a read error". fgets is
inappropriate to use here.
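
For comparison, one possible alternative to fgets() is sketched below: read exactly five bytes with fread() and compare them with memcmp(). The helper name starts_with_rtf and the 1/0/-1 return convention are assumptions made for illustration, not something proposed in the thread:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the stream starts with the RTF
 * prefix, 0 if it does not (including files shorter than five bytes),
 * and -1 if the read itself failed. */
int starts_with_rtf(FILE *f)
{
    static const char prefix[] = "{\\rtf";  /* 5 characters + '\0' */
    char buf[sizeof prefix - 1];
    size_t n;

    assert(f != NULL);
    n = fread(buf, 1, sizeof buf, f);
    if (n != sizeof buf)
        return ferror(f) ? -1 : 0;          /* error vs. short file */
    return memcmp(buf, prefix, sizeof buf) == 0;
}
```

An empty file simply yields 0 here, and only a genuine read error yields -1, so the caller can decide what to do in each case.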

Nick Keighley

Jun 16, 2009, 4:12:27 AM

Use ferror() or feof() to find out why fgets() returned NULL.
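
A minimal sketch of that check, meant to be called right after fgets() has returned NULL. The helper name why_fgets_failed and its message strings are hypothetical:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: call after fgets() has returned NULL to find
 * out why. Returns a short description; never NULL. */
const char *why_fgets_failed(FILE *f)
{
    assert(f != NULL);
    if (ferror(f))
        return "read error";
    if (feof(f))
        return "end of file";
    return "unknown";
}
```

For an empty file this reports "end of file", which lets the caller treat it differently from a real I/O failure.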

Richard Heathfield

Jun 16, 2009, 5:27:59 AM
blargg said:

> Tor Rustad wrote:

<snip>

>> In case fgets() returns NULL above, I don't think you can safely
>> answer the question "Do the file have RTF format?" with a NO.
>
> And neither can you safely answer "there was a read error". fgets
> is inappropriate to use here.

Why inappropriate? RTF is a text format, after all. Sure, you have
to know what you're doing - but, if you /do/ know what you're
doing, fgets is a perfectly reasonable way to get the data into
memory.

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
Forged article? See
http://www.cpax.org.uk/prg/usenet/comp.lang.c/msgauth.php
"Usenet is a strange place" - dmr 29 July 1999

Tor Rustad

Jun 16, 2009, 7:11:42 AM
blargg wrote:
> Tor Rustad wrote:

[...]

> Imagine the
> program is an interactive word processor and the user has several
> documents open, and goes to open another one that happens to be an empty
> file. He would be pleased if it said "not an RTF file", and furious if the
> program abruptly exited.

It is not the hardest thing in the world to replace the exit() call with
some function showing a message dialog box instead. However, in a
batch-type application, that is not what you want, so it depends.


>> In case fgets() returns NULL above, I don't think you can safely answer
>> the question "Do the file have RTF format?" with a NO.
>
> And neither can you safely answer "there was a read error". fgets is
> inappropriate to use here.

There is a HUGE difference between giving no answer and stopping
processing, and giving the WRONG answer and continuing to process.

Using fgets() is OK, and testing it for NULL is correct too. For info on
how to handle EOF and I/O errors, see FAQ 16.8.

blargg

Jun 16, 2009, 3:14:53 PM
Tor Rustad wrote:
> blargg wrote:
> > Tor Rustad wrote:
>
> [...]
>
> > Imagine the
> > program is an interactive word processor and the user has several
> > documents open, and goes to open another one that happens to be an empty
> > file. He would be pleased if it said "not an RTF file", and furious if the
> > program abruptly exited.
>
> Not the hardest thing in the world to replace the exit() call with some
> function showing a message dialog box instead. However in a batch type
> of application, that is not what you want, so it depends.

But even a user of a batch type application wants to know the difference
between an I/O error and an error due to the input file having zero bytes.
Since the program might report "not an RTF file" for non-RTF files of one
or more bytes, he will probably not conclude that a different error,
"is_rtf()", is due to him feeding the program a zero-byte file.

> >> In case fgets() returns NULL above, I don't think you can safely answer
> >> the question "Do the file have RTF format?" with a NO.
> >
> > And neither can you safely answer "there was a read error". fgets is
> > inappropriate to use here.
>
> There is a HUGE difference between giving no answer and stop processing,
> and giving the WRONG answer and to continue processing.
>
> Using fgets() is OK, and testing it for NULL is correct too. For info on
> how to handle EOF and IO errors, see FAQ 16.8

Apparently I'm just failing to communicate here. To a user of a program,
it should not report the same error for an empty file as for an I/O error.
The original function lumped these two together. I'm not asking how to do
this correctly; I'm pointing out that the posted function did not do them
correctly.

Tor Rustad

Jun 18, 2009, 2:15:24 PM
blargg wrote:
> Tor Rustad wrote:
>> blargg wrote:
>>> Tor Rustad wrote:
>> [...]
>>
>>> Imagine the
>>> program is an interactive word processor and the user has several
>>> documents open, and goes to open another one that happens to be an empty
>>> file. He would be pleased if it said "not an RTF file", and furious if the
>>> program abruptly exited.
>> Not the hardest thing in the world to replace the exit() call with some
>> function showing a message dialog box instead. However in a batch type
>> of application, that is not what you want, so it depends.
>
> But even a user of a batch type application wants to know the difference
> between an I/O error and an error due to the input file having zero bytes.

perror() gives diagnostics.

> Since the program might report "not an RTF file" for non-RTF files of one
> or more bytes, he will probably not conclude that a different error,
> "is_rtf()", is due to him feeding the program a zero-byte file.

I have no clue what you are trying to say here.

>>>> In case fgets() returns NULL above, I don't think you can safely answer
>>>> the question "Do the file have RTF format?" with a NO.
>>> And neither can you safely answer "there was a read error". fgets is
>>> inappropriate to use here.
>> There is a HUGE difference between giving no answer and stop processing,
>> and giving the WRONG answer and to continue processing.
>>
>> Using fgets() is OK, and testing it for NULL is correct too. For info on
>> how to handle EOF and IO errors, see FAQ 16.8
>
> Apparently I'm just failing to communicate here. To a user of a program,
> it should not report the same error for an empty file as for an I/O error.
> The original function lumped these two together. I'm not asking how to do
> this correctly; I'm pointing out that the posted function did not do them
> correctly.

The point is that

1. an empty file may or may not be an error
2. on error, it may or may not be OK to call exit()

It simply depends on the application at hand. If one wants to silently
accept empty files, by all means do that via ferror().

If calling exit() on error isn't acceptable, FCOL, just replace it with a
custom-made error handler!
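
One way such a handler could be wired in is sketched below. The callback type rtf_error_fn, the function is_rtf_cb and the example handler count_errors are all hypothetical names, and the message strings are placeholders:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical callback type: the application decides what an
 * unreadable file means (abort, dialog box, log entry, ...). */
typedef void (*rtf_error_fn)(const char *msg, void *ctx);

/* Sketch: returns 1 for an RTF prefix, 0 otherwise. If nothing could
 * be read, on_error (if non-NULL) is called before returning 0. */
int is_rtf_cb(FILE *f, rtf_error_fn on_error, void *ctx)
{
    char buf[6] = {0};

    assert(f != NULL);
    if (fgets(buf, sizeof buf, f) == NULL) {
        if (on_error != NULL)
            on_error(ferror(f) ? "read error" : "empty file", ctx);
        return 0;
    }
    return strcmp(buf, "{\\rtf") == 0;
}

/* Example handler: counts calls instead of exiting. */
void count_errors(const char *msg, void *ctx)
{
    (void)msg;
    ++*(int *)ctx;
}
```

A batch program could pass a handler that prints the message and calls exit(), while an interactive one could pass a handler that only records it.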

Nick Keighley

Jun 19, 2009, 4:10:39 AM
On 18 June, 19:15, Tor Rustad <bwz...@wvtqvm.vw> wrote:
> blargg wrote:
> > Tor Rustad wrote:
> >> blargg wrote:
> >>> Tor Rustad wrote:

blargg originally said:

"OK, I see now that fgets returns non-null if at least one character
was read, but why should it exit the program on an empty file?"

> >>> Imagine the
> >>> program is an interactive word processor and the user has several
> >>> documents open, and goes to open another one that happens to be an empty
> >>> file. He would be pleased if it said "not an RTF file", and furious if the
> >>> program abruptly exited.

an argument against a library program aborting the program.
(I don't think this is always the wrong answer).

> >> Not the hardest thing in the world to replace the exit() call with some
> >> function showing a message dialog box instead.

this seems just as bad. Now your library breaks my toaster program
as I don't know where to display the dialog box (burn it
on to the toast?)

> >> However in a batch type
> >> of application, that is not what you want, so it depends.

so you call an application specific error handler so it can abort,
display a dialog box, abend or launch the missiles as appropriate

> > But even a user of a batch type application wants to know the difference
> > between an I/O error and an error due to the input file having zero bytes.
>
> perror() give diagnostics

this makes no sense without some context. Writing to stderr
is a null op on some well known graphical user interface.

The Standard makes no mention that fgets() sets errno so perror()
may not do anything useful. (Admittedly, most implementations do
set errno to something sane).


> > Since the program might report "not an RTF file" for non-RTF files of one
> > or more bytes, he will probably not conclude that a different error,
> > "is_rtf()", is due to him feeding the program a zero-byte file.
>
> I have no clue what you are trying to say here.
>
> >>>> In case fgets() returns NULL above, I don't think you can safely answer
> >>>> the question "Do the file have RTF format?" with a NO.

nor a "YES". "A Suffusion of Yellow", perhaps.

> >>> And neither can you safely answer "there was a read error". fgets is
> >>> inappropriate to use here.

no

> >> There is a HUGE difference between giving no answer and stop processing,
> >> and giving the WRONG answer and to continue processing.

I'm with you here!

> >> Using fgets() is OK, and testing it for NULL is correct too. For info on
> >> how to handle EOF and IO errors, see FAQ 16.8

yes

> > Apparently I'm just failing to communicate here. To a user of a program,
> > it should not report the same error for an empty file as for an I/O error.

hence the FAQ reference...


> > The original function lumped these two together. I'm not asking how to do
> > this correctly; I'm pointing out that the posted function did not do them
> > correctly.
>
> The point is that
>
> 1. empty file may or may not be an error

well it ain't an RTF file

> 2. on error, it may or may not be OK to call exit()

yes

> it simply depend on the application at hand. If one want to silently
> accept empty files, by all means do that via ferror().
>
> If calling exit() on error isn't acceptable, FCOL just replace it with a
> custom made error handler!

you may be agreeing with each other


--
Nick Keighley

The fscanf equivalent of fgets is so simple
that it can be used inline whenever needed:-

char s[NN + 1] = "", c;
int rc = fscanf(fp, "%NN[^\n]%1[\n]", s, &c);
if (rc == 1) fscanf(fp, "%*[^\n]%*c");
if (rc == 0) getc(fp);

(Dan Pop comp.lang.c)

