ignoring \0 within char
13 messages
sebastian  
Newsgroups: comp.lang.c.moderated
From: sebastian <cup...@gmx.de>
Date: Thu, 5 May 2011 11:35:24 -0500 (CDT)
Local: Thurs, May 5 2011 12:35 pm
Subject: ignoring \0 within char
Hello,

I am trying to read information from a file. Sometimes the data I read
contains a charendterminator. Is there a possibility to ignore it and
step to the next character?

gcc options

Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro 4.4.4-14ubuntu5'
--with-bugurl=file:///usr/share/doc/gcc-4.4/README.Bugs
--enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr
--program-suffix=-4.4 --enable-shared --enable-multiarch
--enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib
--without-included-gettext --enable-threads=posix
--with-gxx-include-dir=/usr/include/c++/4.4 --libdir=/usr/lib --enable-nls
--with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug
--enable-objc-gc --disable-werror --with-arch-32=i686 --with-tune=generic
--enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu
--target=x86_64-linux-gnu
Thread model: posix
gcc version 4.4.5 (Ubuntu/Linaro 4.4.4-14ubuntu5)

input line

T^@e^@s^@t

code example

        long lReadBytes = 0;
        size_t t = 100;
        char *buffer = (char *) malloc(sizeof(char*));
        buffer = NULL;

        while ( (lReadBytes = getline( &buffer, &t, pdfstream)) > 0 )
        {
                int pos = 0;
                for (pos = 0; pos <= lReadBytes; pos++)
                {
                        printf("%c\n", buffer[pos]);
                }
        }

Thanks.
--
comp.lang.c.moderated - moderation address: c...@plethora.net -- you must
have an appropriate newsgroups line in your header for your mail to be seen,
or the newsgroup name in square brackets in the subject line.  Sorry.

Keith Thompson  
Newsgroups: comp.lang.c.moderated
From: Keith Thompson <ks...@mib.org>
Date: Thu, 5 May 2011 16:04:29 -0500 (CDT)
Local: Thurs, May 5 2011 5:04 pm
Subject: Re: ignoring \0 within char

sebastian <cup...@gmx.de> writes:
> I am trying to read information from a file. Sometimes the data I read
> contains a charendterminator. Is there a possibility to ignore it and
> step to the next character?

> gcc options

> Target: x86_64-linux-gnu
> Configured with:
[snip]
> input line

> T^@e^@s^@t

That looks like text encoded with UTF-16.  Ignoring the null characters
might work if all the represented characters are in the range 0..255,
but you'd be better off either converting it to another form or figuring
out how to read it as UTF-16 (which includes getting the endianness
right).

> code example

>    long lReadBytes = 0;
>    size_t t = 100;
>    char *buffer = (char *) malloc(sizeof(char*));

The cast is unnecessary and potentially harmful.

Why are you allocating sizeof(char*) bytes?  If getline() works the way
I think it does, the number of bytes in the initial allocation probably
doesn't matter (for that matter, I think you can start with a null
pointer), but allocating sizeof(char*) bytes for a char array doesn't
make sense.

>    buffer = NULL;

>    while ( (lReadBytes = getline( &buffer, &t, pdfstream)) > 0 )
>    {
>            int pos = 0;
>            for (pos = 0; pos <= lReadBytes; pos++)
>            {
>                    printf("%c\n", buffer[pos]);

If you want to ignore null characters, just ignore them:

                        if (buffer[pos] != '\0')
                        {
                            /* whatever */
                        }

>            }
>    }

--
Keith Thompson (The_Other_Keith) ks...@mib.org  <http://www.ghoti.net/~kst>
Nokia
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"

Fred  
Newsgroups: comp.lang.c.moderated
From: Fred <fred.l.kleinschm...@boeing.com>
Date: Thu, 5 May 2011 16:03:44 -0500 (CDT)
Local: Thurs, May 5 2011 5:03 pm
Subject: Re: ignoring \0 within char
On May 5, 9:35 am, sebastian <cup...@gmx.de> wrote:

>         char *buffer = (char *) malloc(sizeof(char*));

buffer holds only one byte?

>         buffer = NULL;

Now buffer doesn't point to anything at all

>         while ( (lReadBytes = getline( &buffer, &t, pdfstream)) > 0 )

What is getline() ?

>         {
>                 int pos = 0;
>                 for (pos = 0; pos <= lReadBytes; pos++)
>                 {
>                         printf("%c\n", buffer[pos]);
>                 }
>         }

Show us a complete, compilable program.
--
Fred K

Barry Schwarz  
Newsgroups: comp.lang.c.moderated
From: Barry Schwarz <schwa...@dqel.com>
Date: Mon, 9 May 2011 01:38:27 -0500 (CDT)
Local: Mon, May 9 2011 2:38 am
Subject: Re: ignoring \0 within char
On Thu, 5 May 2011 11:35:24 -0500 (CDT), sebastian <cup...@gmx.de>
wrote:

>Hello,

>I am trying to read information from a file. Sometimes the data I read
>contains a charendterminator. Is there a possibility to ignore it and
>step to the next character?

The standard string functions will stop processing data at the first
'\0'.  However, in code you write, you are free to do whatever you
want when you encounter such a character.  You do need to make sure
that you have some other way of determining the end of the data since
you are not following string conventions.

--
Remove del for email

Jasen Betts  
Newsgroups: comp.lang.c.moderated
From: Jasen Betts <ja...@xnet.co.nz>
Date: Mon, 9 May 2011 01:38:42 -0500 (CDT)
Local: Mon, May 9 2011 2:38 am
Subject: Re: ignoring \0 within char
On 2011-05-05, Fred <fred.l.kleinschm...@boeing.com> wrote:

>>         char *buffer = (char *) malloc(sizeof(char*));
> buffer holds only one byte?

pointers are at least 2 bytes in size, according to the standard
(8 in this specific case).

>>         while ( (lReadBytes = getline( &buffer, &t, pdfstream)) > 0 )

> What is getline() ?

an Open Group extension to the C library. It combines the behaviour of fgets() with
that of realloc() to read a whole line whatever the size.
http://pubs.opengroup.org/onlinepubs/9699919799/functions/getline.html

--
⚂⚃ 100% natural

James Kuyper  
Newsgroups: comp.lang.c.moderated
From: James Kuyper <jameskuy...@verizon.net>
Date: Wed, 18 May 2011 12:13:26 -0500 (CDT)
Local: Wed, May 18 2011 1:13 pm
Subject: Re: ignoring \0 within char
On 05/09/2011 02:38 AM, Jasen Betts wrote:

> On 2011-05-05, Fred <fred.l.kleinschm...@boeing.com> wrote:

>>>         char *buffer = (char *) malloc(sizeof(char*));

>> buffer holds only one byte?

> pointers are at least 2 bytes in size, according to the standard
> (8 in this specific case).

In comp.lang.c.moderated, unless otherwise specified, "the standard"
normally refers to the C standard, which imposes no such requirement. I
presume you're referring to some other standard?
--
James Kuyper

Dag-Erling Smørgrav  
Newsgroups: comp.lang.c.moderated
From: Dag-Erling Smørgrav <d...@des.no>
Date: Wed, 18 May 2011 12:13:41 -0500 (CDT)
Local: Wed, May 18 2011 1:13 pm
Subject: Re: ignoring \0 within char

sebastian <cup...@gmx.de> writes:
> I am trying to read information from a file. Sometimes the data I read
> contains a charendterminator. Is there a possibility to ignore it and
> step to the next character?

What is a charendterminator?

>    size_t t = 100;
>    char *buffer = (char *) malloc(sizeof(char*));

Why malloc?  You need a char *, and buffer is already a char *.

Furthermore, a foo * is a pointer to a foo or to an array of foo, so the
size passed to malloc() should be a multiple of sizeof(foo).  It
just so happens that *everything* is a multiple of sizeof(char), but
this assignment is still a conceptual mistake.

Here is a common malloc() idiom:

    foo *p = malloc(n * sizeof(*p));

This is particularly useful if p is declared somewhere else, e.g.

    foo *p;

    /* 100 lines of code */

    p = malloc(n * sizeof(*p));

because you do not need to change the malloc() call (which is easy to
forget) if you decide to change the type of p.

>    buffer = NULL;

The memory you malloc()ed is orphaned here, so the (incorrect) malloc()
call is completely wasted.

>    while ( (lReadBytes = getline( &buffer, &t, pdfstream)) > 0 )

On the first iteration, since buffer is NULL, the value of t is not
used; a new buffer is allocated, and its size is stored in t.  Thus,
there is no need to initialize t.  It is generally not a good idea to
initialize a variable to a non-zero value unless the initial value is
actually used, as it may give the reader the impression that the value
is meaningful and cause him to waste time trying to figure out where it
is used.  In this piece of code, the only initialization needed is that
of buffer to NULL.

BTW, you should free(buffer) after the loop.

DES
--
Dag-Erling Smørgrav - d...@des.no

Francis Glassborow  
Newsgroups: comp.lang.c.moderated
From: Francis Glassborow <francis.glassbo...@btinternet.com>
Date: Wed, 18 May 2011 12:13:56 -0500 (CDT)
Local: Wed, May 18 2011 1:13 pm
Subject: Re: ignoring \0 within char
On 09/05/2011 07:38, Jasen Betts wrote:

> On 2011-05-05, Fred<fred.l.kleinschm...@boeing.com>  wrote:

>>>          char *buffer = (char *) malloc(sizeof(char*));

>> buffer holds only one byte?

> pointers are at least 2 bytes in size, according to the standard
> (8 in this specific case).

Since when?  In C a byte is equivalent to a char and is not necessarily
an octet. There are systems where a char occupies 32 bits and that means
that within the definitions of Standard C, a byte is 32 bits. Yes, I
know that conflicts with the wider use of the term and the common
(mis)understanding that a byte is 8 bits.

The only guarantee is that sizeof(char*) is at least 1 and that the
range of values it can hold is at least (IIRC) 32768.

>>>          while ( (lReadBytes = getline(&buffer,&t, pdfstream))>  0 )

>> What is getline() ?

> an Open Group extension to the C library. It combines the behaviour of fgets() with
> that of realloc() to read a whole line whatever the size.
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/getline.html

I cannot comment on whether that is an Open Group extension but it is
part of the C++ standard library (and yes, I know that C++ is not C)



Keith Thompson  
Newsgroups: comp.lang.c.moderated
From: Keith Thompson <ks...@mib.org>
Date: Wed, 18 May 2011 12:14:41 -0500 (CDT)
Local: Wed, May 18 2011 1:14 pm
Subject: Re: ignoring \0 within char
Jasen Betts <ja...@xnet.co.nz> writes:
> On 2011-05-05, Fred <fred.l.kleinschm...@boeing.com> wrote:

>>>         char *buffer = (char *) malloc(sizeof(char*));

>> buffer holds only one byte?

> pointers are at least 2 bytes in size, according to the standard
> (8 in this specific case).

[...]

There is no requirement in the standard that pointers must be at least 2
bytes.  The requirement for support of objects of at least 65535 bytes,
along with the fact that each byte of an object has a distinct address,
does imply that pointers must be at least 16 bits (for hosted
implementations), but a byte can be bigger than 8 bits.

--
Keith Thompson (The_Other_Keith) ks...@mib.org  <http://www.ghoti.net/~kst>
Nokia
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"

Kenneth Brody  
Newsgroups: comp.lang.c.moderated
From: Kenneth Brody <kenbr...@spamcop.net>
Date: Wed, 18 May 2011 12:14:56 -0500 (CDT)
Local: Wed, May 18 2011 1:14 pm
Subject: Re: ignoring \0 within char
On 5/9/2011 2:38 AM, Jasen Betts wrote:

> On 2011-05-05, Fred<fred.l.kleinschm...@boeing.com>  wrote:

>>>          char *buffer = (char *) malloc(sizeof(char*));

Note that the cast is unneeded, and in C is considered "bad form", as it can
hide warnings if you forget to #include the proper header.

>> buffer holds only one byte?

> pointers are at least 2 bytes in size, according to the standard

C&V, please.

> (8 in this specific case).

But, what is the purpose of allocating a buffer of sizeof(char*), especially
given that the next line is:

     buffer = NULL;

>>>          while ( (lReadBytes = getline(&buffer,&t, pdfstream))>  0 )

>> What is getline() ?

> an Open Group extension to the C library. It combines the behaviour of fgets() with
> that of realloc() to read a whole line whatever the size.
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/getline.html

--
Kenneth Brody

Barry Schwarz  
Newsgroups: comp.lang.c.moderated
From: Barry Schwarz <schwa...@dqel.com>
Date: Wed, 18 May 2011 12:15:11 -0500 (CDT)
Local: Wed, May 18 2011 1:15 pm
Subject: Re: ignoring \0 within char
On Mon, 9 May 2011 01:38:42 -0500 (CDT), Jasen Betts
<ja...@xnet.co.nz> wrote:
>On 2011-05-05, Fred <fred.l.kleinschm...@boeing.com> wrote:

>>>         char *buffer = (char *) malloc(sizeof(char*));

>> buffer holds only one byte?

>pointers are at least 2 bytes in size, according to the standard

Where do you think the standard specifies the byte size of a pointer?
And why do you think a byte is always 8 bits?

>(8 in this specific case).

And you know this because?

--
Remove del for email

Keith Thompson  
Newsgroups: comp.lang.c.moderated
From: Keith Thompson <ks...@mib.org>
Date: Thu, 26 May 2011 13:44:16 -0500 (CDT)
Local: Thurs, May 26 2011 2:44 pm
Subject: Re: ignoring \0 within char

Francis Glassborow <francis.glassbo...@btinternet.com> writes:
> On 09/05/2011 07:38, Jasen Betts wrote:
>> On 2011-05-05, Fred<fred.l.kleinschm...@boeing.com>  wrote:
[...]
>>>>          while ( (lReadBytes = getline(&buffer,&t, pdfstream))>  0 )

>>> What is getline() ?

>> an Open Group extension to the C library. It combines the behaviour of
>> fgets() with that of realloc() to read a whole line whatever the size.
>> http://pubs.opengroup.org/onlinepubs/9699919799/functions/getline.html

> I cannot comment on whether that is an Open Group extension but it is
> part of the C++ standard library (and yes, I know that C++ is not C)

The C++ standard library does have a getline() function (actually
several of them, I think), but that's not the same as the Open Group
getline() function.  (The C++ version, because of namespaces and
overloading, doesn't intrude on the user namespace.)

--
Keith Thompson (The_Other_Keith) ks...@mib.org  <http://www.ghoti.net/~kst>
Nokia
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"

Jasen Betts  
Newsgroups: comp.lang.c.moderated
From: Jasen Betts <ja...@xnet.co.nz>
Date: Thu, 26 May 2011 14:14:41 -0500 (CDT)
Local: Thurs, May 26 2011 3:14 pm
Subject: Re: ignoring \0 within char
On 2011-05-18, Barry Schwarz <schwa...@dqel.com> wrote:

> On Mon, 9 May 2011 01:38:42 -0500 (CDT), Jasen Betts
><ja...@xnet.co.nz> wrote:

>>On 2011-05-05, Fred <fred.l.kleinschm...@boeing.com> wrote:

>>>>         char *buffer = (char *) malloc(sizeof(char*));

>>> buffer holds only one byte?

>>pointers are at least 2 bytes in size, according to the standard

> Where do you think the standard specifies the byte size of a pointer?
> And why do you think a byte is always 8 bits?

I seem to recall that somewhere a lower limit on pointer range is given
that's greater than can fit in 8 bits; I had forgotten that the C
standard allows for bytes of unusual size.

>>(8 in this specific case).

> And you know this because?

I have a passing familiarity with the compiler mentioned in the
original post.

> --
> Remove del for email

--
⚂⚃ 100% natural