How is strlen implemented?


roy

22 Apr 2005, 23:38:06
Hi,

I was wondering how strlen is implemented.
What if the input string doesn't have a null terminator, namely the
'\0'?
Thanks a lot
Roy

Chris McDonald

22 Apr 2005, 23:44:17
"roy" <roy...@hotmail.com> writes:

>Hi,

>I was wondering how strlen is implemented.
>What if the input string doesn't have a null terminator, namely the
>'\0'?

Without the null-byte terminator, it's not a string!
strlen() can then do whatever it wants.

--
Chris.

roy

22 Apr 2005, 23:59:49
Thanks. Maybe my question should be "what if the input is a char array
without a null terminator". But from my experimental results, it seems
that strlen can still return the number of characters of a char array.
I am just not sure whether I am just lucky or sth else happened inside
strlen.

Jason

23 Apr 2005, 00:04:10

strlen will read from the char* until it finds a '\0' char. If your
string does not use the '\0' as a terminator, then you should avoid
most of the <string.h> functions.

-Jason

Chris Torek

22 Apr 2005, 23:52:25
In article <1114227486....@g14g2000cwa.googlegroups.com>

roy <roy...@hotmail.com> wrote:
>I was wondering how strlen is implemented.
>What if the input string doesn't have a null terminator, namely the
>'\0'?

Q: What if a tree growing in a forest is made of plastic?
A: Then it is not a tree, or at least, it is not growing.

If something someone else is calling a "string" does not have the
'\0' terminator, it is not a string, or at least, not a C string.
In C, the word "string" means "data structure consisting of zero
or more characters, followed by a '\0' terminator". No terminator,
no string.

Since strlen() requires a string, it may assume it gets one.

There are functions that work on "non-stringy arrays"; in particular,
the mem* functions -- memcpy(), memmove(), memcmp(), memset(),
memchr() -- but they take more than one argument. If you have an
array that always contains exactly 40 characters, and it is possible
that none of them is '\0' but you want to find out whether there
is a '\0' in those 40 characters, you can use memchr():

char *p = memchr(my_array, '\0', 40);

memchr() stops when it finds the first '\0' or has used up the
count, whichever occurs first. (It then returns a pointer to the
found character, or NULL if the count ran out.) The strlen()
function has an effect much like memchr() with an "infinite" count,
except that because the count is "infinite", it "always" finds the
'\0':

size_t much_like_strlen(const char *p) {
    const char *q = memchr(p, '\0', INFINITY);
    return q - p;
}

except of course C does not really have a way to express "infinity"
here. (You can approximate it with (size_t)-1, though.)
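
A minimal sketch of the bounded search described above, assuming a hypothetical helper named bounded_len (it is not part of <string.h>). It reports the length if a '\0' occurs within the first n bytes and n otherwise, so it never reads past the bytes the caller vouches for. POSIX systems offer strnlen() with the same contract.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a standard function): returns the number of
   characters before the first '\0' in buf[0..n-1], or n if there is none.
   It never reads beyond the n bytes the caller passes in. */
size_t bounded_len(const char *buf, size_t n)
{
    assert(buf != NULL);
    const char *end = memchr(buf, '\0', n);
    return end != NULL ? (size_t)(end - buf) : n;
}
```

A return value equal to n tells the caller that the array does not hold a string, which is exactly the case strlen() cannot detect.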
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.

Chris McDonald

23 Apr 2005, 00:10:45
"roy" <roy...@hotmail.com> writes:

You were just lucky.

--
Chris.

Martin Ambuhl

23 Apr 2005, 01:08:18
roy wrote:
> Hi,
>
> I was wondering how strlen is implemented.

It could be implemented in several ways. The obvious one is to count
characters until a '\0' is encountered.

> What if the input string doesn't have a null terminator, namely the
> '\0'?

Then it isn't a string, which has such a terminator by definition.

Keith Thompson

23 Apr 2005, 03:09:24

It's helpful to provide some context when you post a followup. I
happen to have read the previous articles just before I read this one,
but I could as easily have seen your followup first.

If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers.

As for your question, strlen()'s argument isn't a char array, it's a
pointer to a char. Normally the pointer should point to the first
element of a "string" (i.e., a sequence of characters marked by a '\0'
terminator). strlen() doesn't know how many characters are
actually in the array. By calling strlen(), you're promising that
there's a '\0' terminator somewhere within the array; if you break
that promise, there's no telling what will happen.

A typical implementation of strlen() will simply traverse the elements
of what it assumes to be your array until it finds a '\0' character.
If it doesn't find a '\0' character within the array, it has no way of
knowing it should stop searching, so it will just continue until it
finds a '\0'. As soon as it passes the end of the array, it invokes
undefined behavior. It might happen to find a '\0' character (which
is what happened in your case). Or it might run past the memory owned
by your program and trigger a segmentation fault or something similar.
Or, as far as the C standard is concerned, it might make demons fly
out your nose.

So don't do that.
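
To make the promise concrete, here is a small sketch showing one array that holds a string and one that does not. The names holds_string, no_nul, and with_nul are made up for illustration. Only with_nul may be passed to strlen(); the check itself uses memchr() with the array size, so it never runs past the end.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Three characters and nothing else: no '\0', so this is not a string. */
static const char no_nul[3] = { 'a', 'b', 'c' };

/* The string literal initializer adds the '\0': four bytes, a string. */
static const char with_nul[] = "abc";

/* Returns 1 if the n-byte array contains a '\0' (and therefore holds a
   string), 0 otherwise. Illustrative helper, not a standard function. */
int holds_string(const char *arr, size_t n)
{
    assert(arr != NULL);
    return memchr(arr, '\0', n) != NULL;
}
```

Calling strlen(no_nul) would be the broken promise described above; holds_string(no_nul, sizeof no_nul) is a defined way to find that out first.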

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Joe Wright

23 Apr 2005, 08:46:57

More precisely, if your char array does not have a 0 terminator, it is
not a string.
--
Joe Wright mailto:joeww...@comcast.net
"Everything should be made as simple as possible, but not simpler."
--- Albert Einstein ---

Richard Tobin

23 Apr 2005, 08:47:02
In article <1114228789.4...@l41g2000cwc.googlegroups.com>,
roy <roy...@hotmail.com> wrote:

>Thanks. Maybe my question should be "what if the input is a char array
>without a null terminator". But from my experimental results, it seems
>that strlen can still return the number of characters of a char array.

Bear in mind that a char array usually *does* have a null terminator.

If it doesn't, it's quite likely to be followed in memory by a zero
byte, which is the representation of nul on almost all systems, so it
will often work by luck.

Debugging systems often have an option to initialize variables to
non-zero values, precisely to stop this kind of "luck" from obscuring
real errors. Some readers will remember the many bugs that were
revealed when dynamic linking was added to SunOS, causing
uninitialized variables in main() to no longer be zero.

-- Richard

Gregory Pietsch

23 Apr 2005, 09:09:42
There has to be a null terminator somewhere.

Here's a short implementation:

#include <string.h>
size_t (strlen)(const char *s)
{
    const char *p = s;

    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}

/* Gregory Pietsch */

Joe Estock

23 Apr 2005, 09:30:59
Interesting seeing \0 so widely in use. On most systems, NULL is defined
as \0, however there are a few special cases where it is not. Shouldn't
we be using NULL instead of \0?

Joe Estock

Joe Wright

23 Apr 2005, 11:10:37

No Joe, NULL is the 'null pointer constant' while '\0' is a constant
character (with int type) and value zero. This is often called the null
character or the NUL character. Never NULL character.
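
The distinction can be sketched in a few lines. The function names below are invented for this example; the only facts relied on are that '\0' is an int constant with value zero and that NULL is a null pointer constant.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if c is the null character. '\0' is the idiomatic spelling
   when a character is meant. */
int is_nul_char(char c)
{
    return c == '\0';
}

/* Returns 1 if p is a null pointer. NULL is the spelling for pointers. */
int is_null_pointer(const void *p)
{
    return p == NULL;
}

/* In C (unlike C++), a character constant such as '\0' has type int. */
void check_char_constant_type(void)
{
    assert(sizeof '\0' == sizeof(int));
}
```

Writing `*p != NULL` to test for the end of a string mixes the two up: depending on how the implementation defines NULL, it may draw a diagnostic or compile silently, and either way it says "pointer" where "character" is meant.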

Minti

23 Apr 2005, 13:23:14

Pardon me Chris, but I really don't get the drift of what you are
trying to convey. These strings are also "stringy", I don't see how
these are "non-stringy".

IOW you are assuming that these "non-stringy" arrays are also supposed
to end with a null character. "Stringy" I say.

--
Imanpreet Singh Arora

Chris Torek

23 Apr 2005, 14:07:21
>Chris Torek wrote:
>>There are functions that work on "non-stringy arrays"; in particular,
>>the mem* functions ... If you have an array that always contains

>>exactly 40 characters, and it is possible that none of them is '\0'
>>but you want to find out whether there is a '\0' in those 40
>>characters, you can use memchr() ...

In article <1114276994.5...@o13g2000cwo.googlegroups.com>,


Minti <iman...@gmail.com> wrote:
>Pardon me Chris, but I really don't get the drift of what you are
>trying to convey. These strings are also "stringy", I don't see how
>these are "non-stringy".

If there is no '\0' byte in all 40 characters, it is not a string.
If there is a '\0' byte somewhere within those 40 characters, it
*is* a string -- and any characters after the first such '\0' are
not part of the string (but remain part of the array).

>IOW you are assuming that these "non-stringy" arrays are also supposed
>to end with a null character. "Stringy" I say.

In other words, I am saying that these arrays do not contain strings
if and only if they do not contain a '\0'. Note that strncpy()
sometimes makes such arrays (which is one reason some people invented
strlcpy()).
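
A short sketch of the strncpy() pitfall and the usual workaround. The helper name copy_terminated is an assumption made for this example; it is not strlcpy() and does not report truncation the way strlcpy() does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dst[0..dstsize-1] and always
   terminates the result, truncating if necessary. Requires dstsize > 0. */
void copy_terminated(char *dst, size_t dstsize, const char *src)
{
    assert(dst != NULL && src != NULL && dstsize > 0);
    /* strncpy() writes no '\0' when src has dstsize-1 or more characters... */
    strncpy(dst, src, dstsize - 1);
    /* ...so store one in the last slot unconditionally. */
    dst[dstsize - 1] = '\0';
}
```

With a four-byte destination and the source "abcdef", the result is the string "abc"; a bare strncpy(dst, src, 4) would instead leave four characters and no terminator, which is the "array that does not contain a string" case.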

If I may draw an analogy: in mathematics, a statement is false if
there is even a single counterexample. Hence "x * (1/x) = 1" is
a false statement mathematically, because it does not hold for x=0.
(But note that if we limit it, "x * (1/x) = 1 provided x \ne 0",
the statement becomes true for x \elem real, while it remains false
for x \elem integer, and so on.) (Note that details like "x is a
real number" also matter in computing, where float and double do
not really give us "real numbers", but rather approximations.)

Keith Thompson

23 Apr 2005, 16:13:02
"Gregory Pietsch" <GK...@flash.net> writes:
> There has to be a null terminator somewhere.

To clarify: This doesn't mean that there's a guarantee that there will
be a null terminator somewhere. It means that if there isn't a null
terminator anywhere, you must not call strlen(). The burden is on the
caller.

(I briefly read your statement the other way.)

Mark McIntyre

23 Apr 2005, 19:24:26
On 22 Apr 2005 20:59:49 -0700, in comp.lang.c , "roy"
<roy...@hotmail.com> wrote:

>Thanks. Maybe my question should be "what if the input is a char array
>without a null terminator".

Your question was already answered. However, a quote from the ISO
Standard may help:

7.21.6.3 The strlen function

3. The strlen function returns the number of characters that precede
the terminating null character.

Clearly if there's no terminating null, this function can't return
anything meaningful. It may in fact not return at all, and it's not
uncommon for it to return absurd numbers such as 5678905 or -456.


>But from my experimental results, it seems
>that strlen can still return the number of characters of a char array.

How can it do that? It's /required/ to search for the terminating null.
Your compiler is either not standard compliant, or it's exhibiting
random behaviour.

>I am just not sure whether I am just lucky or sth else happened inside
>strlen.

lucky


--
Mark McIntyre
CLC FAQ <http://www.eskimo.com/~scs/C-faq/top.html>
CLC readme: <http://www.ungerhu.com/jxh/clc.welcome.txt>


Keith Thompson

23 Apr 2005, 20:32:23
Mark McIntyre <markmc...@spamcop.net> writes:
> On 22 Apr 2005 20:59:49 -0700, in comp.lang.c , "roy"
> <roy...@hotmail.com> wrote:
[...]

>>But from my experimental results, it seems
>>that strlen can still return the number of characters of a char array.
>
> How can it do that? It's /required/ to search for the terminating null.
> Your compiler is either not standard compliant, or it's exhibiting
> random behaviour.

strlen() is almost certainly finding a zero byte immediately after his
array. I'd expect that to be a very common manifestation of the
undefined behavior in this case.

>>I am just not sure whether I am just lucky or sth else happened inside
>>strlen.
>
> lucky

No, if he'd been lucky it would have crashed the program (with a
meaningful diagnostic) rather than quietly returning a meaningless
result.

Stan Milam

24 Apr 2005, 00:01:15

I found some C functions coded in assembler for the 8086 way back when.

;
; -------------------------------------------------------
; int strlen(s)
; char *s;
; Purpose: Returns the length of the string, not
; including the NULL character
; -------------------------------------------------------
;
ifndef pca
include macro2.asm
include libdef.asm
endif
;
idt strlen
def strlen
strlen: qenter bx,di
mov di,parm1[bx]
; cmp di,zero
; jz null
mov ax,ds
mov es,ax
mov cx,-1
xor al,al
cld
repnz scasb
not cx
dec cx
mov ax,cx
exitf
;null xor ax,ax
; exitf
modend strlen

I guess its C equivalent is:

unsigned
strlen( char *string )
{
    unsigned rv = -1;

    while( *string ) rv--, string++;

    rv = (-rv) - 1;
    return rv;
}

Of course, I'd just write it like this:

#include <stddef.h>

size_t
strlen( const char *string )
{
    size_t rv = 0;
    while ( *string++ ) rv++;
    return rv;
}

Stan Milam

24 Apr 2005, 00:02:09
Keith Thompson wrote:

> Mark McIntyre <markmc...@spamcop.net> writes:
>
>>On 22 Apr 2005 20:59:49 -0700, in comp.lang.c , "roy"
>><roy...@hotmail.com> wrote:
>
> [...]
>
>>>But from my experimental results, it seems
>>>that strlen can still return the number of characters of a char array.
>>
>>How can it do that? It's /required/ to search for the terminating null.
>>Your compiler is either not standard compliant, or it's exhibiting
>>random behaviour.
>
>
> strlen() is almost certainly finding a zero byte immediately after his
> array. I'd expect that to be a very common manifestation of the
> undefined behavior in this case.
>
>
>>>I am just not sure whether I am just lucky or sth else happened inside
>>>strlen.
>>
>>lucky
>
>
> No, if he'd been lucky it would have crashed the program (with a
> meaningful diagnostic) rather than quietly returning a meaningless
> result.
>

So, you are saying this is a poorly implemented compiler?

Keith Thompson

24 Apr 2005, 01:02:10
Stan Milam <stm...@swbell.net> writes:
> Keith Thompson wrote:
>> Mark McIntyre <markmc...@spamcop.net> writes:
>>>On 22 Apr 2005 20:59:49 -0700, in comp.lang.c , "roy"
>>><roy...@hotmail.com> wrote:
>> [...]
>>
>>>>But from my experimental results, it seems
>>>>that strlen can still return the number of characters of a char array.
[...]

>>>>I am just not sure whether I am just lucky or sth else happened inside
>>>>strlen.
>>>
>>>lucky
>> No, if he'd been lucky it would have crashed the program (with a
>> meaningful diagnostic) rather than quietly returning a meaningless
>> result.
>
> So, you are saying this is a poorly implemented compiler?

Not at all.

First, strlen() is part of the runtime library, not part of the
compiler.

An implementation of strlen() that was able to detect the case where
the argument points to the first element of an array that doesn't
contain any '\0' characters would most likely add significant overhead
to *all* operations. The obvious way to implement it is to make all
pointers "fat", so each pointer includes both the base address and
bounds information; strlen() would then have to check the bounds.
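
A rough sketch of what such a checked strlen() might look like if the bounds travelled with the pointer. The struct fat_ptr type and the checked_strlen function are hypothetical, invented for this illustration; no standard C implementation is being described.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical "fat pointer": a base address plus the number of bytes
   that may legitimately be accessed through it. */
struct fat_ptr {
    const char *base;
    size_t      size;
};

/* Stores the length in *len and returns true if a '\0' lies within the
   bounds; returns false and leaves *len untouched otherwise. */
bool checked_strlen(struct fat_ptr p, size_t *len)
{
    assert(p.base != NULL && len != NULL);
    const char *end = memchr(p.base, '\0', p.size);
    if (end == NULL)
        return false;
    *len = (size_t)(end - p.base);
    return true;
}
```

The cost mentioned above is visible even in this toy: every pointer doubles in size, and every call pays for a bounds comparison that a plain strlen() skips.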

James McIninch

24 Apr 2005, 08:45:19
to roy
<posted & mailed>

By definition, a character array without a null terminator is not a string.

Calling strlen on something that isn't a string will cause undefined behavior
(an error).

roy wrote:

--
Remove '.nospam' from e-mail address to reply by e-mail

Mark McIntyre

24 Apr 2005, 09:24:18
On Sun, 24 Apr 2005 00:32:23 GMT, in comp.lang.c , Keith Thompson
<ks...@mib.org> wrote:

>Mark McIntyre <markmc...@spamcop.net> writes:
>> On 22 Apr 2005 20:59:49 -0700, in comp.lang.c , "roy"
>> <roy...@hotmail.com> wrote:
>[...]
>>>But from my experimental results, it seems
>>>that strlen can still return the number of characters of a char array.
>>
>> How can it do that? It's /required/ to search for the terminating null.
>> Your compiler is either not standard compliant, or it's exhibiting
>> random behaviour.
>
>strlen() is almost certainly finding a zero byte immediately after his
>array. I'd expect that to be a very common manifestation of the
>undefined behavior in this case.

that comes under my definition of 'random' - it's by chance finding a
null just shortly after the string, possibly due to some debugging
mode 'helpfulness'.

Of course, if the string were zero length, then....
:-)

Emmanuel Delahaye

24 Apr 2005, 14:02:25
roy wrote on 23/04/05 :

If the string is malformed (missing terminating 0), the behaviour is
undefined. Anything could happen.

--
Emmanuel
The C-FAQ: http://www.eskimo.com/~scs/C-faq/faq.html
The C-library: http://www.dinkumware.com/refxc.html

.sig under repair

Emmanuel Delahaye

24 Apr 2005, 14:05:12
Stan Milam wrote on 24/04/05 :

> So, you are saying this is a poorly implemented compiler?

What would be a better implementation ? If the limit is not here,
anything happens. Blame the coder, not the compiler.

"Clearly your code does not meet the original spec."
"You are sentenced to 30 lashes with a wet noodle."
-- Jerry Coffin in a.l.c.c++

Emmanuel Delahaye

24 Apr 2005, 14:10:18
Joe Estock wrote on 23/04/05 :

> Interesting seeing \0 so widely in use. On most systems, NULL is defined as
> \0, however there are a few special cases where it is not. Shouldn't we be
> using NULL instead of \0?

No, because here, we are talking about the null character that is 0 or
'\0' (but I'm too lazy to type the latter).

"C is a sharp tool"

Stan Milam

24 Apr 2005, 15:57:50
Stan Milam wrote:
> Keith Thompson wrote:

>>
>> No, if he'd been lucky it would have crashed the program (with a
>> meaningful diagnostic) rather than quietly returning a meaningless
>> result.
>>
>
> So, you are saying this is a poorly implemented compiler?

Okay guys, that was a joke.

Gregory Pietsch

24 Apr 2005, 19:42:03
I checked my libraries, and the following may be faster than the above:

#include <string.h>
#ifndef _OPTIMIZED_FOR_SIZE
#include <limits.h>
/* Nonzero if X is not aligned on a "long" boundary. */
#ifdef _ALIGN
#define UNALIGNED1(X) ((long)X&(sizeof(long)-1))
#else
#define UNALIGNED1(X) 0
#endif

/* Macros for detecting endchar */
#if ULONG_MAX == 0xFFFFFFFFUL
#define DETECTNULL(X) (((X) - 0x01010101) & ~(X) & 0x80808080)
#elif ULONG_MAX == 0xFFFFFFFFFFFFFFFFUL
/* Nonzero if X (a long int) contains a NULL byte. */
#define DETECTNULL(X) (((X) - 0x0101010101010101) & ~(X) & 0x8080808080808080)
#else
#define _OPTIMIZED_FOR_SIZE
#endif

#ifdef DETECTNULL
#define DETECTCHAR(X,MASK) DETECTNULL(X^MASK)
#endif

#endif
/* strlen */
size_t (strlen)(const char *s)
{
    const char *t = s;
#ifndef _OPTIMIZED_FOR_SIZE
    unsigned long *aligned_addr;

    if (!UNALIGNED1(s)) {
        aligned_addr = (unsigned long *) s;
        while (!DETECTNULL(*aligned_addr))
            aligned_addr++;
        /* The block of bytes currently pointed to by aligned_addr
           contains a null. We catch it using the bytewise search. */
        s = (const char *) aligned_addr;
    }
#endif
    while (*s)
        s++;
    return (size_t) (s - t);
}

/* Gregory Pietsch */
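
The zero-byte test in the DETECTNULL macro can be tried in isolation. The sketch below is an illustration under stated assumptions, not the library code above: it fixes the word size at 32 bits with uint32_t, uses the invented names has_zero_byte and buffer_has_zero, and scans a buffer whose length is known, loading each word with memcpy() so that neither alignment nor reading past the end is an issue.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero if any of the four bytes of x is zero, 0 otherwise.
   After the subtraction, a byte's top bit survives "& ~x" only if that
   byte was 0x00, or a lower byte was 0x00 and its borrow propagated.
   Either way the result is nonzero exactly when some byte is zero. */
uint32_t has_zero_byte(uint32_t x)
{
    return (uint32_t)((x - UINT32_C(0x01010101)) & ~x & UINT32_C(0x80808080));
}

/* Returns 1 if buf[0..n-1] contains a zero byte, testing four bytes at a
   time and finishing the tail bytewise. Never reads outside buf[0..n-1]. */
int buffer_has_zero(const unsigned char *buf, size_t n)
{
    assert(buf != NULL);
    size_t i = 0;
    for (; i + sizeof(uint32_t) <= n; i += sizeof(uint32_t)) {
        uint32_t w;
        memcpy(&w, buf + i, sizeof w);   /* alignment-safe load */
        if (has_zero_byte(w))
            return 1;
    }
    for (; i < n; i++)
        if (buf[i] == 0)
            return 1;
    return 0;
}
```

Because the length is passed in, this version sidesteps the question of how far a word-at-a-time strlen() may read beyond the terminator; a real strlen() has no such length to lean on.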

Gregory Pietsch

24 Apr 2005, 19:43:21
NULL is usually reserved for the null pointer. Here, we're checking for
the null character, '\0'.

Gregory Pietsch

Flash Gordon

25 Apr 2005, 04:26:36
Gregory Pietsch wrote:
> I checked my libraries,

Do you mean your personal libraries or your implementation's? Remember
that the implementation is allowed to do things you are not allowed to do.

> and the following may be faster than the above:

What above? Please quote enough of the message you are replying to for
us to see what you are talking about. There is an option that gets
Google to do the right thing and if you search the group I'm sure you
will find the instructions. It's in someone's sig, but I can't remember who.

> #include <string.h>
> #ifndef _OPTIMIZED_FOR_SIZE

An implementation could declare that or not for any reason it wants.

> #include <limits.h>
> /* Nonzero if either X or Y is not aligned on a "long" boundary. */
> #ifdef _ALIGN

Again, a compiler could declare that or not as it saw fit.

> #define UNALIGNED1(X) ((long)X&(sizeof(long)-1))

There is no guarantee that this will tell you if it is aligned. Some
people around here have worked on word addressed systems where the byte
within the word was flagged in the *high* bits of the address.

> #else
> #define UNALIGNED1(X) 0
> #endif
>
> /* Macros for detecting endchar */
> #if ULONG_MAX == 0xFFFFFFFFUL
> #define DETECTNULL(X) (((X) - 0x01010101) & ~(X) & 0x80808080)

Misleading name, I initially read that as a screwy attempt to detect a
NULL pointer. DETECTNULCHAR would be better.

> #elif ULONG_MAX == 0xFFFFFFFFFFFFFFFFUL
> /* Nonzero if X (a long int) contains a NULL byte. */
> #define DETECTNULL(X) (((X) - 0x0101010101010101) & ~(X) &
> 0x8080808080808080)
> #else
> #define _OPTIMIZED_FOR_SIZE

Isn't that macro you are defining in the implementation name space?
Anything could happen.

> #endif
>
> #ifdef DETECTNULL
> #define DETECTCHAR(X,MASK) DETECTNULL(X^MASK)
> #endif
>
> #endif
> /* strlen */
> size_t (strlen)(const char *s)
> {
> const char *t = s;
> #ifndef _OPTIMIZED_FOR_SIZE
> unsigned long *aligned_addr;
>
> if (!UNALIGNED1(s)) {
> aligned_addr = (unsigned long *) s;
> while (!DETECTNULL(*aligned_addr))
> aligned_addr++;

The above could read bytes off the end of a properly nul terminated
string. For example,
size_t len = strlen("a");

> /* The block of bytes currently pointed to by aligned_addr
> contains a null. We catch it using the bytewise search. */
> s = (const char *) aligned_addr;
> }
> #endif
> while (*s)
> s++;
> return (size_t) (s - t);

No need to cast the result of the subtraction. The compiler already
knows it is returning a size_t so will do the conversion anyway.

> }
>
> /* Gregory Pietsch */
--
Flash Gordon
Living in interesting times.
Although my email address says spam, it is real and I read it.

Lawrence Kirby

25 Apr 2005, 12:48:04
On Sun, 24 Apr 2005 05:02:10 +0000, Keith Thompson wrote:

> Stan Milam <stm...@swbell.net> writes:
>> Keith Thompson wrote:
>>> Mark McIntyre <markmc...@spamcop.net> writes:
>>>>On 22 Apr 2005 20:59:49 -0700, in comp.lang.c , "roy"
>>>><roy...@hotmail.com> wrote:
>>> [...]
>>>
>>>>>But from my experimental results, it seems
>>>>>that strlen can still return the number of characters of a char array.
> [...]
>>>>>I am just not sure whether I am just lucky or sth else happened inside
>>>>>strlen.
>>>>
>>>>lucky
>>> No, if he'd been lucky it would have crashed the program (with a
>>> meaningful diagnostic) rather than quietly returning a meaningless
>>> result.
>>
>> So, you are saying this is a poorly implemented compiler?
>
> Not at all.
>
> First, strlen() is part of the runtime library, not part of the
> compiler.

It is part of the implementation which covers both compiler and library.
Many compilers can generate their own inline code for strlen() in which
case the "library" as a separate concept has little to do with it.

Lawrence

Gregory Pietsch

25 Apr 2005, 16:40:55

Flash Gordon wrote:
> Gregory Pietsch wrote:
> > I checked my libraries,
>
> Do you mean your personal libraries or your implementation's? Remember
> that the implementation is allowed to do things you are not allowed to do.

It was my implementation, based on unravelling the "while(*s)s++" loop.

>
> > and the following may be faster than the above:
>
> What above? Please quote enough of the message you are replying to for
> us to see what you are talking about. There is an option that gets
> Google to do the right thing and if you search the group I'm sure you
> will find the instructions. It's in someone's sig, but I can't remember who.
>
> > #include <string.h>
> > #ifndef _OPTIMIZED_FOR_SIZE
>
> An implementation could declare that or not for any reason it wants.

If _OPTIMIZED_FOR_SIZE is not defined, the implementation tries to unravel
the "while(*s)s++" loop somewhat.

>
> > #include <limits.h>
> > /* Nonzero if either X or Y is not aligned on a "long" boundary. */
> > #ifdef _ALIGN
>
> Again, a compiler could declare that or not as it saw fit.

There's no way to portably detect whether a pointer-to-char is aligned
on a long boundary, is there?

>
> > #define UNALIGNED1(X) ((long)X&(sizeof(long)-1))
>
> There is no guarantee that this will tell you if it is aligned. Some
> people around here have worked on word addressed systems where the byte
> within the word was flagged in the *high* bits of the address.

I bet that makes for some funky internal pointer arithmetic!

>
> > #else
> > #define UNALIGNED1(X) 0
> > #endif
> >
> > /* Macros for detecting endchar */
> > #if ULONG_MAX == 0xFFFFFFFFUL
> > #define DETECTNULL(X) (((X) - 0x01010101) & ~(X) & 0x80808080)
>
> Misleading name, I initially read that as a screwy attempt to detect a
> NULL pointer. DETECTNULCHAR would be better.
>
> > #elif ULONG_MAX == 0xFFFFFFFFFFFFFFFFUL
> > /* Nonzero if X (a long int) contains a NULL byte. */
> > #define DETECTNULL(X) (((X) - 0x0101010101010101) & ~(X) &
> > 0x8080808080808080)
> > #else
> > #define _OPTIMIZED_FOR_SIZE
>
> Isn't that macro you are defining in the implementation name space?
> Anything could happen.
>

I tried two types of optimizations, one for time (try to unravel the
loop) and one for size. If I get a kind of system where casting
a pointer-to-char to a pointer-to-unsigned-long doesn't make much
sense, #defining _OPTIMIZED_FOR_SIZE allows me to leave out code that
wouldn't work in that situation.

> > #endif
> >
> > #ifdef DETECTNULL
> > #define DETECTCHAR(X,MASK) DETECTNULL(X^MASK)
> > #endif
> >
> > #endif
> > /* strlen */
> > size_t (strlen)(const char *s)
> > {
> > const char *t = s;
> > #ifndef _OPTIMIZED_FOR_SIZE
> > unsigned long *aligned_addr;
> >
> > if (!UNALIGNED1(s)) {
> > aligned_addr = (unsigned long *) s;
> > while (!DETECTNULL(*aligned_addr))
> > aligned_addr++;
>
> The above could read bytes off the end of a properly nul terminated
> string. For example,
> size_t len = strlen("a");

I'm testing for having a null character somewhere among the characters
that make up the area that aligned_addr points to. If I don't get a
sane environment (as indicated by the _OPTIMIZED_FOR_SIZE macro), this
code isn't even compiled in.

Here's the general idea: suppose, for example, sizeof(unsigned long) is
4. I can freely cast a pointer-to-char to a pointer-to-unsigned-long. I
don't care if *aligned_addr is big-end-aligned or little-end-aligned.
Oh, well, is there a better way to unravel "while(*s)s++"?

>
> > /* The block of bytes currently pointed to by aligned_addr
> > contains a null. We catch it using the bytewise search. */
> > s = (const char *) aligned_addr;
> > }
> > #endif
> > while (*s)
> > s++;
> > return (size_t) (s - t);
>
> No need to cast the result of the subtraction. The compiler already
> knows it is returning a size_t so will do the conversion anyway.

The cast is only for my eyes. ;-)

>
> > }
> >
> > /* Gregory Pietsch */
> --
> Flash Gordon
> Living in interesting times.
> Although my email address says spam, it is real and I read it.

Gregory Pietsch

Keith Thompson

25 Apr 2005, 17:21:10
Lawrence Kirby <lkn...@netactive.co.uk> writes:
> On Sun, 24 Apr 2005 05:02:10 +0000, Keith Thompson wrote:
[...]

>> First, strlen() is part of the runtime library, not part of the
>> compiler.
>
> It is part of the implementation which covers both compiler and library.
> Many compilers can generate their own inline code for strlen() in which
> case the "library" as a separate concept has little to do with it.

You're right. I should have said that strlen() is *typically
implemented as* part of the runtime library, not part of the compiler.
(I don't know how many compilers generate inline code, and therefore
how accurate "typically" is.)

Christian Bau

25 Apr 2005, 18:35:16

> Thanks. Maybe my question should be "what if the input is a char array
> without a null terminator". But from my experimental results, it seems
> that strlen can still return the number of characters of a char array.
> I am just not sure whether I am just lucky or sth else happened inside
> strlen.

You are not lucky, you are unlucky.

If you were lucky, your program would crash as soon as you try this, and
then you would know there is a bug that needs fixing. If you are
unlucky, you get a result that doesn't show the bug.