
ignoring \0 within char


sebastian

May 5, 2011, 12:35:24 PM
Hello,

I am trying to read information from a file. The data I read will
sometimes contain a charendterminator ('\0'). Is there a possibility
to ignore it and step to the next character?

gcc options

Target: x86_64-linux-gnu
Configured with: ../src/configure -v
  --with-pkgversion='Ubuntu/Linaro 4.4.4-14ubuntu5'
  --with-bugurl=file:///usr/share/doc/gcc-4.4/README.Bugs
  --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr
  --program-suffix=-4.4 --enable-shared --enable-multiarch
  --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib
  --without-included-gettext --enable-threads=posix
  --with-gxx-include-dir=/usr/include/c++/4.4 --libdir=/usr/lib
  --enable-nls --with-sysroot=/ --enable-clocale=gnu
  --enable-libstdcxx-debug --enable-objc-gc --disable-werror
  --with-arch-32=i686 --with-tune=generic --enable-checking=release
  --build=x86_64-linux-gnu --host=x86_64-linux-gnu
  --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.4.5 (Ubuntu/Linaro 4.4.4-14ubuntu5)

input line

T^@e^@s^@t

code example

long lReadBytes = 0;
size_t t = 100;
char *buffer = (char *) malloc(sizeof(char*));
buffer = NULL;

while ( (lReadBytes = getline( &buffer, &t, pdfstream)) > 0 )
{
    int pos = 0;
    for (pos = 0; pos <= lReadBytes; pos++)
    {
        printf("%c\n", buffer[pos]);
    }
}


thanks.
--
comp.lang.c.moderated - moderation address: cl...@plethora.net -- you must
have an appropriate newsgroups line in your header for your mail to be seen,
or the newsgroup name in square brackets in the subject line. Sorry.

Keith Thompson

May 5, 2011, 5:04:29 PM
sebastian <cup...@gmx.de> writes:
> i try to read out information from a file. there will be sometime
> within the read information a charendterminator. is there a possibilty
> to ignore this and step to the next character?
>
> gcc options
>
> Target: x86_64-linux-gnu
> Configured with:
[snip]
> input line
>
> T^@e^@s^@t

That looks like text encoded with UTF-16. Ignoring the null characters
might work if all the represented characters are in the range 0..255,
but you'd be better off either converting it to another form or figuring
out how to read it as UTF-16 (which includes getting the endianness
right).
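
As an illustration of the "read it as UTF-16" option, here is a minimal sketch that decodes a byte buffer of UTF-16 into UTF-8. The function name utf16_to_utf8, its parameters, and the choice of UTF-8 as the target are assumptions made for this example; nothing in the original post confirms what the file actually contains. The caller states the byte order explicitly, since no BOM detection is attempted.

```c
#include <stddef.h>
#include <stdint.h>

/* Read one 16-bit code unit in the requested byte order. */
static uint16_t read_unit(const unsigned char *p, int big_endian)
{
    return big_endian ? (uint16_t)((p[0] << 8) | p[1])
                      : (uint16_t)((p[1] << 8) | p[0]);
}

/*
 * Hypothetical helper: decode `len` bytes of UTF-16 from `in` into UTF-8.
 * `out` must have room for 3 * (len / 2) + 1 bytes. A trailing odd byte
 * is ignored. Returns the number of UTF-8 bytes written, not counting
 * the terminating '\0'.
 */
size_t utf16_to_utf8(const unsigned char *in, size_t len,
                     int big_endian, char *out)
{
    size_t i = 0, o = 0;

    while (i + 1 < len) {
        uint32_t cp = read_unit(in + i, big_endian);
        i += 2;

        /* Combine a high surrogate with the low surrogate after it. */
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < len) {
            uint16_t lo = read_unit(in + i, big_endian);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                i += 2;
            }
        }
        if (cp >= 0xD800 && cp <= 0xDFFF)
            cp = 0xFFFD;               /* unpaired surrogate */

        if (cp < 0x80) {
            out[o++] = (char)cp;
        } else if (cp < 0x800) {
            out[o++] = (char)(0xC0 | (cp >> 6));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out[o++] = (char)(0xE0 | (cp >> 12));
            out[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        } else {
            out[o++] = (char)(0xF0 | (cp >> 18));
            out[o++] = (char)(0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        }
    }
    out[o] = '\0';
    return o;
}
```

The T^@e^@s^@t dump has the low byte first, which suggests little-endian data, so big_endian would be 0 for that input. Note that splitting UTF-16 data into lines with a byte-oriented reader is fragile: a newline is the two bytes 0x0A 0x00 in little-endian form, so the second byte ends up at the start of the next line. Reading the whole file with fread() and an explicit length avoids that.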

> code example
>
> long lReadBytes = 0;
> size_t t = 100;
> char *buffer = (char *) malloc(sizeof(char*));

The cast is unnecessary and potentially harmful.

Why are you allocating sizeof(char*) bytes? If getline() works the way
I think it does, the number of bytes in the initial allocation probably
doesn't matter (for that matter, I think you can start with a null
pointer), but allocating sizeof(char*) bytes for a char array doesn't
make sense.

> buffer = NULL;
>
> while ( (lReadBytes = getline( &buffer, &t, pdfstream)) > 0 )
> {
> int pos = 0;
> for (pos = 0; pos <= lReadBytes; pos++)
> {
> printf("%c\n", buffer[pos]);

If you want to ignore null characters, just ignore them:

    if (buffer[pos] != '\0')
    {
        /* whatever */
    }

> }
> }
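
A minimal sketch of that test applied to a whole buffer, assuming the caller already knows how many bytes were read (for example from the return value of getline()). The helper name strip_nuls is made up for this example. It compacts the buffer in place and reports how many bytes remain.

```c
#include <stddef.h>

/*
 * Hypothetical helper: remove every '\0' from the first `len` bytes of
 * `buf`, in place. Returns the number of bytes kept. The result is NOT
 * null-terminated by this function; the caller tracks the length.
 */
size_t strip_nuls(char *buf, size_t len)
{
    size_t pos;
    size_t kept = 0;

    /* Note the bound: pos < len, so index len itself is never touched. */
    for (pos = 0; pos < len; pos++) {
        if (buf[pos] != '\0')
            buf[kept++] = buf[pos];
    }
    return kept;
}
```

The loop runs with pos < len rather than pos <= len as in the quoted code, because the byte at index len is the terminator that getline() appends, not part of the data.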

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Fred

May 5, 2011, 5:03:44 PM

buffer holds only one byte?

>         buffer = NULL;

Now buffer doesn't point to anything at all

>
>         while ( (lReadBytes = getline( &buffer, &t, pdfstream)) > 0 )

What is getline() ?

>         {
>                 int pos = 0;
>                 for (pos = 0; pos <= lReadBytes; pos++)
>                 {
>                         printf("%c\n", buffer[pos]);
>                 }
>         }
>

Show us a complete, compilable program.
--
Fred K

Barry Schwarz

May 9, 2011, 2:38:27 AM
On Thu, 5 May 2011 11:35:24 -0500 (CDT), sebastian <cup...@gmx.de>
wrote:

>Hello,
>
>i try to read out information from a file. there will be sometime
>within the read information a charendterminator. is there a possibilty
>to ignore this and step to the next character?

The standard string functions will stop processing data at the first
'\0'. However, in code you write, you are free to do whatever you
want when you encounter such a character. You do need to make sure
that you have some other way of determining the end of the data since
you are not following string conventions.
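
To make the difference concrete, here is a small sketch contrasting strlen(), which stops at the first '\0', with memchr(), which takes an explicit length and can therefore look past embedded null characters. The helper name count_nuls is an assumption for this example.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: count the '\0' bytes among the first `len` bytes
 * of `data`. The explicit length, not a terminator, marks the end.
 */
size_t count_nuls(const char *data, size_t len)
{
    size_t count = 0;
    const char *p = data;
    const char *end = data + len;

    while (p < end) {
        /* memchr() searches exactly (end - p) bytes, nulls included. */
        const char *hit = memchr(p, '\0', (size_t)(end - p));
        if (hit == NULL)
            break;
        count++;
        p = hit + 1;
    }
    return count;
}
```

For the seven data bytes of "T\0e\0s\0t", strlen() reports 1 while count_nuls(data, 7) finds 3 null characters, which is why the length has to be carried separately.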

--
Remove del for email

Jasen Betts

May 9, 2011, 2:38:42 AM
On 2011-05-05, Fred <fred.l.kl...@boeing.com> wrote:

>>         char *buffer = (char *) malloc(sizeof(char*));

> buffer holds only one byte?

Pointers are at least 2 bytes in size, according to the standard
(8 in this specific case).

>>         while ( (lReadBytes = getline( &buffer, &t, pdfstream)) > 0 )
>
> What is getline() ?

An Open Group extension to the C library. It combines the behaviour of fgets() with
that of realloc() to read a whole line whatever the size.
http://pubs.opengroup.org/onlinepubs/9699919799/functions/getline.html
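
For readers without getline(), the idea of "read a line, growing the buffer as needed" can be sketched in standard C roughly as follows. This is a hypothetical stand-in named read_line, not the POSIX implementation, and it uses fgetc() rather than fgets() so that embedded '\0' bytes are counted correctly. The initial capacity of 128 and the doubling policy are arbitrary choices for this example.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for getline(): reads up to and including the
 * next '\n' from `stream` into *lineptr, growing it with realloc().
 * *cap holds the current capacity. Returns the number of bytes read,
 * or -1 at end of file with nothing read or on allocation failure.
 */
long read_line(char **lineptr, size_t *cap, FILE *stream)
{
    size_t len = 0;
    int c;

    if (*lineptr == NULL || *cap < 2) {
        char *tmp = realloc(*lineptr, 128);   /* realloc(NULL, n) acts as malloc(n) */
        if (tmp == NULL)
            return -1;
        *lineptr = tmp;
        *cap = 128;
    }

    while ((c = fgetc(stream)) != EOF) {
        if (len + 2 > *cap) {                 /* room for this byte and '\0' */
            size_t newcap = *cap * 2;
            char *tmp = realloc(*lineptr, newcap);
            if (tmp == NULL)
                return -1;
            *lineptr = tmp;
            *cap = newcap;
        }
        (*lineptr)[len++] = (char)c;
        if (c == '\n')
            break;
    }

    if (len == 0)
        return -1;
    (*lineptr)[len] = '\0';
    return (long)len;
}
```

As with getline(), the caller starts with a null pointer and a capacity of 0, reuses the same buffer on every call, and frees it once at the end.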

--
⚂⚃ 100% natural

James Kuyper

May 18, 2011, 1:13:26 PM
On 05/09/2011 02:38 AM, Jasen Betts wrote:
> On 2011-05-05, Fred <fred.l.kl...@boeing.com> wrote:
>
>>> char *buffer = (char *) malloc(sizeof(char*));
>
>> buffer holds only one byte?
>
> pointers are atleast 2 bytes in size, according to the standard
> (8 in this specific case).

In comp.lang.c.moderated, unless otherwise specified, "the standard"
normally refers to the C standard, which imposes no such requirement. I
presume you're referring to some other standard?
--
James Kuyper

Dag-Erling Smørgrav

May 18, 2011, 1:13:41 PM
sebastian <cup...@gmx.de> writes:
> i try to read out information from a file. there will be sometime
> within the read information a charendterminator. is there a possibilty
> to ignore this and step to the next character?

What is a charendterminator?

> size_t t = 100;
> char *buffer = (char *) malloc(sizeof(char*));

Why malloc? You need a char *, and buffer is already a char *.

Furthermore, a foo * is a pointer to a foo or to an array of foo, so the
size passed to malloc() should be a multiple of the sizeof(foo). It
just so happens that *everything* is a multiple of sizeof(char), but
this assignment is still a conceptual mistake.

Here is a common malloc() idiom:

foo *p = malloc(n * sizeof(*p));

This is particularly useful if p is declared somewhere else, e.g.

foo *p;

/* 100 lines of code */

p = malloc(n * sizeof(*p));

because you do not need to change the malloc() call (which is easy to
forget) if you decide to change the type of p.
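
A compilable sketch of the idiom, with a made-up element type struct sample and a made-up wrapper alloc_samples standing in for foo and the surrounding code. The overflow check is an addition for this example and is not part of the idiom itself.

```c
#include <stdlib.h>

/* Hypothetical element type; any object type works the same way. */
struct sample {
    double value;
    int flags;
};

/*
 * Allocate an array of n elements. The size is written in terms of *p,
 * so changing the type of p needs no change to the malloc() call.
 * Returns NULL on failure or if n * sizeof *p would overflow size_t.
 */
struct sample *alloc_samples(size_t n)
{
    struct sample *p;

    if (n != 0 && n > (size_t)-1 / sizeof *p)
        return NULL;

    p = malloc(n * sizeof *p);   /* no cast needed in C */
    return p;
}
```

The caller checks the result for NULL before use and passes it to free() when done.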

> buffer = NULL;

The memory you malloc()ed is orphaned here, so the (incorrect) malloc()
call is completely wasted.

> while ( (lReadBytes = getline( &buffer, &t, pdfstream)) > 0 )

On the first iteration, since buffer is NULL, the value of t is not
used; a new buffer is allocated, and its size is stored in t. Thus,
there is no need to initialize t. It is generally not a good idea to
initialize a variable to a non-zero value unless the initial value is
actually used, as it may give the reader the impression that the value
is meaningful and cause him to waste time trying to figure out where it
is used. In this piece of code, the only initialization needed is that
of buffer to NULL.

BTW, you should free(buffer) after the loop.
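
Putting these points together, a corrected version of the loop might look like the sketch below. It assumes a POSIX.1-2008 system that provides getline(). The function name copy_without_nuls and the decision to write the remaining bytes to a second stream are assumptions for this example, not something the original post asked for.

```c
#define _POSIX_C_SOURCE 200809L   /* requests getline(); assumes POSIX.1-2008 */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/*
 * Hypothetical helper: copy `in` to `out` line by line, dropping every
 * '\0' byte. Returns the number of bytes written.
 */
long copy_without_nuls(FILE *in, FILE *out)
{
    char *buffer = NULL;   /* getline() allocates because this is NULL */
    size_t size = 0;       /* set by getline(); no meaningful initial value */
    ssize_t nread;
    long written = 0;

    while ((nread = getline(&buffer, &size, in)) > 0) {
        ssize_t pos;
        for (pos = 0; pos < nread; pos++) {   /* '<', not '<=' */
            if (buffer[pos] != '\0') {
                fputc(buffer[pos], out);
                written++;
            }
        }
    }

    free(buffer);          /* one free() after the loop */
    return written;
}
```

The buffer is reused and possibly reallocated by getline() on each iteration, so a single free() after the loop releases it.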

DES
--
Dag-Erling Smørgrav - d...@des.no

Francis Glassborow

May 18, 2011, 1:13:56 PM
On 09/05/2011 07:38, Jasen Betts wrote:
> On 2011-05-05, Fred<fred.l.kl...@boeing.com> wrote:
>
>>> char *buffer = (char *) malloc(sizeof(char*));
>
>> buffer holds only one byte?
>
> pointers are atleast 2 bytes in size, according to the standard
> (8 in this specific case).

Since when? In C a byte is equivalent to a char and is not necessarily
an octet. There are systems where a char occupies 32 bits and that means
that within the definitions of Standard C, a byte is 32 bits. Yes, I
know that conflicts with the wider use of the term and the common
(mis)understanding that a byte is 8 bits.

The only guarantee is that the sizeof a char* is at least 1 and that the
range of values it can hold is at least (IIRC) 32768.
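
The distinction can be checked on a given implementation with CHAR_BIT from <limits.h>. The two function names below are invented for this sketch.

```c
#include <limits.h>
#include <stddef.h>

/* Width of a char * in bits on this implementation. */
size_t char_pointer_bits(void)
{
    /* sizeof counts C bytes; CHAR_BIT is the number of bits in each one. */
    return sizeof(char *) * CHAR_BIT;
}

/* Number of octets (8-bit units) that one C byte spans, rounded up. */
size_t octets_per_byte(void)
{
    return (CHAR_BIT + 7) / 8;
}
```

On the x86_64-linux-gnu target from the original post one would expect 64 and 1 respectively, but the C standard itself only guarantees that sizeof(char) is 1 and that CHAR_BIT is at least 8.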

>
>>> while ( (lReadBytes = getline(&buffer,&t, pdfstream))> 0 )


>>
>> What is getline() ?
>
> an Open Group extension the C library. It combines the behaviour for fgets() with
> that of realloc() to read a whole line whatever the size.
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/getline.html

I cannot comment on whether that is an Open Group extension but it is
part of the C++ standard library (and yes, I know that C++ is not C)

Keith Thompson

May 18, 2011, 1:14:41 PM
Jasen Betts <ja...@xnet.co.nz> writes:
> On 2011-05-05, Fred <fred.l.kl...@boeing.com> wrote:
>
>>>         char *buffer = (char *) malloc(sizeof(char*));
>
>> buffer holds only one byte?
>
> pointers are atleast 2 bytes in size, according to the standard
> (8 in this specific case).
[...]

There is no requirement in the standard that pointers must be at least 2
bytes. The requirement for support of objects of at least 65535 bytes,
along with the fact that each byte of an object has a distinct address,
does imply that pointers must be at least 16 bits (for hosted
implementations), but a byte can be bigger than 8 bits.

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Kenneth Brody

May 18, 2011, 1:14:56 PM
On 5/9/2011 2:38 AM, Jasen Betts wrote:
> On 2011-05-05, Fred<fred.l.kl...@boeing.com> wrote:
>
>>> char *buffer = (char *) malloc(sizeof(char*));

Note that the cast is unneeded, and in C is considered "bad form", as it can
hide warnings if you forget to #include the proper header.

>> buffer holds only one byte?
>
> pointers are atleast 2 bytes in size, according to the standard

C&V, please.

> (8 in this specific case).

But, what is the purpose of allocating a buffer of sizeof(char*), especially
given that the next line is:

buffer = NULL;

>>> while ( (lReadBytes = getline(&buffer,&t, pdfstream))> 0 )


>>
>> What is getline() ?
>
> an Open Group extension the C library. It combines the behaviour for fgets() with
> that of realloc() to read a whole line whatever the size.
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/getline.html


--
Kenneth Brody

Barry Schwarz

May 18, 2011, 1:15:11 PM
On Mon, 9 May 2011 01:38:42 -0500 (CDT), Jasen Betts
<ja...@xnet.co.nz> wrote:

>On 2011-05-05, Fred <fred.l.kl...@boeing.com> wrote:
>
>>>         char *buffer = (char *) malloc(sizeof(char*));
>
>> buffer holds only one byte?
>
>pointers are atleast 2 bytes in size, according to the standard

Where do you think the standard specifies the byte size of a pointer?
And why do you think a byte is always 8 bits?

>(8 in this specific case).

And you know this because?


--
Remove del for email

Keith Thompson

May 26, 2011, 2:44:16 PM
Francis Glassborow <francis.g...@btinternet.com> writes:
> On 09/05/2011 07:38, Jasen Betts wrote:
>> On 2011-05-05, Fred<fred.l.kl...@boeing.com> wrote:
[...]

>>>> while ( (lReadBytes = getline(&buffer,&t, pdfstream))> 0 )
>>>
>>> What is getline() ?
>>
>> an Open Group extension the C library. It combines the behaviour for
>> fgets() with that of realloc() to read a whole line whatever the size.
>> http://pubs.opengroup.org/onlinepubs/9699919799/functions/getline.html
>
> I cannot comment on whether that is an Open Group extension but it is
> part of the C++ standard library (and yes, I know that C++ is not C)

The C++ standard library does have a getline() function (actually
several of them, I think), but that's not the same as the Open Group
getline() function. (The C++ version, because of namespaces and
overloading, doesn't intrude on the user namespace.)

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Jasen Betts

May 26, 2011, 3:14:41 PM
On 2011-05-18, Barry Schwarz <schw...@dqel.com> wrote:
> On Mon, 9 May 2011 01:38:42 -0500 (CDT), Jasen Betts
><ja...@xnet.co.nz> wrote:
>
>>On 2011-05-05, Fred <fred.l.kl...@boeing.com> wrote:
>>
>>>>         char *buffer = (char *) malloc(sizeof(char*));
>>
>>> buffer holds only one byte?
>>
>>pointers are atleast 2 bytes in size, according to the standard
>
> Where do you think the standard specifies the byte size of a pointer?
> And why do you think a byte is always 8 bits?

I seem to recall that somewhere a lower limit on pointer range is given
that's greater than can fit in 8 bits. I had forgotten that the C standard
allows for bytes of unusual size.

>
>>(8 in this specific case).
>
> And you know this because?

I have a passing familiarity with the compiler mentioned in the
original post.

> --
> Remove del for email


--
⚂⚃ 100% natural
