
Why is strNlen not in Std?

raxitsh...@yahoo.co.in

Oct 18, 2006, 5:32:08 AM
We have strncpy, strncmp, etc. in the C standard (with a lowercase n,
obviously), so why is there no strnlen in the standard?

I asked the same question in comp.lang.c, and someone suggested that it
is more of a comp.std.c issue.

Many C vendors provide strnlen as an extension (strnlen seems to fit
secure coding practice better).

I am curious to know whether there is any reason for not including
strnlen in the standard.

--raxit sheth

rain.man

Oct 18, 2006, 6:11:04 AM
strlen --Std

raxitsh...@yahoo.co.in wrote:

kuy...@wizard.net

Oct 18, 2006, 7:22:28 AM
rain.man wrote:
> strlen --Std

I'm curious - what was the point of that laconic reply? My best guess
is that you're suggesting that strlen() has the functionality that the
OP was asking for. However, as the name clearly implies, strnlen()
would differ from strlen() the same way that strncpy() differs from
strcpy(): it would take an extra argument of type size_t containing a
character count, and strnlen(s,n) would never process more than the
first n characters of the string pointed at by s while looking for the
terminating null character.

I'm not sure why no such function exists, but the fact that
memchr(s,'\0',n) can be used for the same purpose may be part of the
reason.
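
For illustration, the memchr() approach mentioned above could be wrapped roughly as follows. This is only a sketch: the name bounded_strlen is made up here and is not a standard function, and the caller is assumed to guarantee that s points to at least n readable characters, or to a null-terminated string shorter than that.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the C standard): returns the number
   of characters before the first '\0' in s, or n if no '\0' occurs in
   the first n characters. */
size_t bounded_strlen(const char *s, size_t n)
{
    const char *end = memchr(s, '\0', n);   /* void * converts implicitly in C */
    return end ? (size_t)(end - s) : n;
}
```

The pointer subtraction is the only extra step compared with a dedicated strnlen(); the result is a count of bytes, exactly as with strlen().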

Casper H.S. Dik

Oct 18, 2006, 7:42:48 AM
kuy...@wizard.net writes:

>I'm not sure why no such function exists, but the fact that
>memchr(s,'\0',n) can be used for the same purpose may be part of the
>reason.

Or the fact that when the standard was written, there was no or
insufficient prior art.

Casper

raxitsh...@yahoo.co.in

Oct 19, 2006, 1:52:06 AM
>
> I'm not sure why no such function exists, but the fact that
> memchr(s,'\0',n) can be used for the same purpose may be part of the
> reason.

Thanks. Yes, memchr can do the same job, but it needs pointer
arithmetic (as memchr returns void *). Given memchr's argument and
return types, I want to know whether it operates on bytes rather than
on chars (ASCII/Unicode).

void *memchr(const void *s, int c, size_t n);

Consider this (and correct me if I am making a mistake): if, on a
specific platform, char and byte are not the same size, does memchr
still give me a platform-independent result with respect to string
representation, the size of char, and the size of the string? (The same
goes for Unicode; I need to do some homework on Unicode.)

I think the same logic applies elsewhere: even though we have memcpy,
we also have strcpy.


--raxit sheth

raxitsh...@yahoo.co.in

Oct 19, 2006, 2:06:02 AM
Casper/others,

Frankly, I am new to comp.std.c (but not to the C language).

I also do not have much knowledge of the ANSI/ISO standard (I learnt
C by reading books, writing programs, and googling).

May I gently ask the group whether there is anything extra to do when
someone wants to suggest that strnlen should be in the standard? I also
do not know whether C standard committee members read this group
(apologies; as I said, I am new to comp.std.c).

Any suggestion is appreciated.

--raxit sheth

Keith Thompson

Oct 19, 2006, 2:28:48 AM
raxitsh...@yahoo.co.in writes:
>> I'm not sure why no such function exists, but the fact that
>> memchr(s,'\0',n) can be used for the same purpose may be part of the
>> reason.
>
> Thanks. Yes, memchr can do the same job, but it needs pointer
> arithmetic (as memchr returns void *). Given memchr's argument and
> return types, I want to know whether it operates on bytes rather than
> on chars (ASCII/Unicode).
>
> void *memchr(const void *s, int c, size_t n);
>
> Consider this (and correct me if I am making a mistake): if, on a
> specific platform, char and byte are not the same size, does memchr
> still give me a platform-independent result with respect to string
> representation, the size of char, and the size of the string? (The same
> goes for Unicode; I need to do some homework on Unicode.)

The size of a char is 1 byte; that's how the C standard defines the
word "byte".

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

raxitsh...@yahoo.co.in

Oct 19, 2006, 7:18:08 AM
Keith / Other,

Thanks for pointing out that "char is always a byte as per the standard".

When one uses memchr to achieve functionality similar to strnlen, do
we need to worry about any platform-specific or data-representation
issues?


--raxit

kuy...@wizard.net

Oct 19, 2006, 7:51:34 AM

Not in that case. The biggest potential platform-specific issue that
might apply to memchr() is character encoding, but you're specifically
looking for a null character, and that's required to be represented by
a value of 0. I usually write that as '\0', just to remind myself that
I'm working in a character context, but the value is the same, either
way.

By the way, char is allowed to be wider than 8 bits. What this means is
that a byte, in C, can refer to more than 8 bits. This is confusingly
inconsistent with the way that the rest of the world defines bytes.
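
A small sketch of how a program can ask the implementation about this. The function names below are invented for this example; CHAR_BIT comes from <limits.h> and is required to be at least 8.

```c
#include <limits.h>
#include <stddef.h>

/* Number of bits occupied by an object of n bytes on this implementation. */
size_t bits_in_bytes(size_t n)
{
    return n * CHAR_BIT;
}

/* Number of 8-bit octets needed to hold one C byte: 1 on most hosted
   systems, more on implementations where CHAR_BIT is larger than 8. */
size_t octets_per_byte(void)
{
    return (CHAR_BIT + 7) / 8;
}
```

sizeof(char) is 1 by definition, so bits_in_bytes(sizeof(char)) always equals CHAR_BIT, whatever that value happens to be.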

SuperKoko

Oct 19, 2006, 11:36:54 AM

kuy...@wizard.net wrote:
> By the way, char is allowed to be wider than 8 bits. What this means is
> that a byte, in C, can refer to more than 8 bits. This is confusingly
> inconsistent with the way that the rest of the world defines bytes.

I don't know what you call the "rest of the world", but, at least in
France, everybody makes a distinction between an octet (the most
commonly used word) and a byte, which is defined (not very rigorously)
as the minimum manipulable (e.g. readable or writable from memory or on
which arithmetic can be done) sequence of bits of a CPU or of a data
block sent over a network, or the granularity of addresses at the
assembly level on an architecture.

http://en.wikipedia.org/wiki/Byte

Jean-Marc Bourguet

Oct 19, 2006, 11:54:36 AM
kuy...@wizard.net writes:

> By the way, char is allowed to be wider than 8 bits. What this means is
> that a byte, in C, can refer to more than 8 bits. This is confusingly
> inconsistent with the way that the rest of the world defines bytes.

Define the "rest of the world". According to Blaauw and Brooks, byte was a
term coined in 1958 by Werner Buchholz to designate a group of bits
sufficient to represent one character. Until the mid eighties, machines
using bytes with 7 or 9 bits were not uncommon (at one time, one such
machine was the most common on Arpanet). Earlier, 6-bit bytes were
common. When the standard was being defined, these machines were not past
history; they were the present for some.

What C adds to this usage is the constraint that bytes be addressable. On
word-addressable machines this may be done in two ways: either using a very
wide byte -- I think this is done nowadays in some DSPs, for which the risk
of being inconsistent with common use of the machine is nonexistent -- or
using fat pointers containing an address and a byte designator -- that's the
main reason why C allows pointers to vary in size.

Yours

--
Jean-Marc

SuperKoko

Oct 19, 2006, 12:04:55 PM
strncpy is not similar to the "more secure" strnlen function provided
as extension by some compilers.

Read the documentation of strncpy twice, and you'll see that it doesn't
work on zero terminated strings. The destination string must be put
into a buffer whose size is known (usually fixed-size, but that's not
necessary) and it pads with zeros (filling the buffer entirely) if the
source string is shorter than the buffer, and doesn't zero terminate
the string if the source string is the same size as the buffer or is
longer.

These special semantics might be useful for storing a string in a small
fixed-size array of chars, for example, to store DOS 8.3 file names in
a structure designed to be written on the disk, without losing a
precious character.
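
The two cases described above (padding when the source is shorter, no terminator when it is not) can be observed with a small helper. This is a sketch; fill_field is a hypothetical name, and the field is assumed to be exactly width bytes long.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into a fixed-width field of exactly
   width bytes using strncpy semantics. The field is zero-padded when
   src is shorter than width and left WITHOUT a terminator when src has
   width or more characters. Returns 1 if the field ended up containing
   a '\0', 0 otherwise. */
int fill_field(char *field, size_t width, const char *src)
{
    strncpy(field, src, width);
    return memchr(field, '\0', width) != NULL;
}
```

With an 8-byte field, fill_field(f, 8, "abc") leaves three characters followed by five zero bytes and returns 1, while fill_field(f, 8, "12345678") uses all eight bytes for data and returns 0.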

strncmp was also not designed with "security" in mind, in the old days
of K&R C or standard C89.
It was not designed to avoid buffer problems (and I doubt that it helps
much for buffer problems, except if there is *already* a bug creating a
non-zero terminated string in your program, such as an abuse of strncpy
:p).
It was mainly designed (quote from Borland C++ Programmer's Guide) to:

"Compare a portion of one string to a portion of another."

Well, in fact, strncmp can also be used to compare two strings such as
the strings that strncpy produces: zero-padded strings in a buffer whose
size is known.
It can also be used as a pseudo-secure function.
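
A sketch of that use of strncmp() on zero-padded, possibly unterminated fields. NAME_WIDTH and same_name are assumptions made for this example only.

```c
#include <string.h>

#define NAME_WIDTH 8   /* assumed field width for this sketch */

/* Hypothetical helper: returns nonzero if two fixed-width fields hold
   the same name. strncmp stops at a '\0' or after NAME_WIDTH
   characters, whichever comes first, so it does not read past the end
   of either field even when a field is completely full. */
int same_name(const char a[NAME_WIDTH], const char b[NAME_WIDTH])
{
    return strncmp(a, b, NAME_WIDTH) == 0;
}
```

Because the bound is the field width, a full 8-character name and a shorter zero-padded name are both handled by the same call.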

BTW, have you ever heard of TR 24731 ?

strnlen_s is the function you want.

kuy...@wizard.net

Oct 19, 2006, 12:18:57 PM

Sorry, I overstated my case. However, in my experience, the
overwhelming majority of the people who use the word "byte" have no
idea that it could be interpreted as meaning anything other than a
group of 8 bits. They have either never heard of "octets" or have
mistaken it for a fancy synonym for "byte". This newsgroup is an
obvious exception, and if you're willing to assert that all of France
is also an exception, I can't muster any counterexamples.

SuperKoko

Oct 19, 2006, 4:25:44 PM

Jean-Marc Bourguet wrote:
> kuy...@wizard.net writes:
>
> > By the way, char is allowed to be wider than 8 bits. What this means is
> > that a byte, in C, can refer to more than 8 bits. This is confusingly
> > inconsistent with the way that the rest of the world defines bytes.
>
> Define the "rest of the world". According to Blaauw and Brooks, byte was a
> term coined in 1958 by Werner Buchholz to designate a group of bits
> sufficient to represent one character. Until the mid eighties, machines
> using bytes with 7 or 9 bits were not uncommon (at one time, one such
> machine was the most common on Arpanet). Earlier, 6-bit bytes were
> common. When the standard was being defined, these machines were not past
> history; they were the present for some.
>
But C doesn't support 6- or 7-bit bytes (CHAR_BIT must be >= 8).
Nevertheless it "supports" 6- or 7-bit machines, since they just have
to emulate 12- or 14-bit bytes.

Jun Woong

Oct 20, 2006, 2:57:18 AM

SuperKoko wrote:
[...]

> strncpy is not similar to the "more secure" strnlen function provided
> as extension by some compilers.
>
> Read the documentation of strncpy twice, and you'll see that it doesn't
> work on zero terminated strings. The destination string must be put
> into a buffer whose size is known (usually fixed-size, but that's not
> necessary) and it pads with zeros (filling entirely the buffer) if the
> source string is shorter than the buffer, and doesn't zero terminate
> the string if the source string is the same size as the buffer or is
> longer.
>
> These special semantics might be useful for storing a string in a small
> fixed-size array of chars, for example, to store DOS 8.3 file names in
> a structure designed to be written on the disk, without losing a
> precious character.
>

IIRC, strncpy() was used in the early days of UNIX to handle file names
whose sizes were fixed (to 13?). I'm not sure it was designed for
that purpose, but I think it's very likely, or at least that purpose
had a significant impact on strncpy()'s behavior.


--
Jun, Woong (woong at icu.ac.kr)
Samsung Electronics Co., Ltd.

``All opinions expressed are mine, and do not represent
the official opinions of any organization.''

raxitsh...@yahoo.co.in

Oct 20, 2006, 4:00:11 AM
SuperKoko wrote:
>
> Read the documentation of strncpy twice, and you'll see that it doesn't
> work on zero terminated strings.
I do not agree (or I do not understand the above).

>The destination string must be put into a buffer whose size is known

exactly,

...(portion removed in reply)


> strncmp was also not designed with "security" in mind, in the old days
> of K&R C or standard C89.
> It was not designed to avoid buffer problems (and I doubt that it helps
> much for buffer problems, except if there is *already* a bug creating a
> non-zero terminated string in your program, such as an abuse of strncpy
> :p).

> It was mainly designed (quote from Borland C++ Programmer's Guide) to:
>
> "Compare a portion of one string to a portion of another."

The above quote is very general, and I think it can apply in a secure
coding context (though not everywhere, e.g. not where a 'dumb' programmer
forgot to allocate the destination memory),

where "portion" implicitly defines the limit (the limit being known
prior to the call).

As you and others mentioned, all the strn* functions can be used when the
size (or maximum possible valid length) of the data is known prior to the
call

(mostly fixed length, but not strictly).

>
> Well, in fact, strncmp can also be used to compare two strings such as
> the strings that strncpy produces: zero-padded strings in a buffer whose
> size is known.
> It can also be used as a pseudo-secure function.
>
> BTW, have you ever heard of TR 24731 ?
>
> strnlen_s is the function you want.

Lastly, many thanks for pointing this out. I was unaware of it; I have not
yet read the full content, but it looks like it fits what I am searching for.

Thanks
Raxit Sheth

raxitsh...@yahoo.co.in

Oct 20, 2006, 4:04:42 AM
>
> IIRC, strncpy() was used in the early days of UNIX to handle file names
> whose sizes were fixed (to 13?). I'm not sure it was designed for
> that purpose, but I think it's very likely, or at least that purpose
> had a significant impact on strncpy()'s behavior.
>

After reading the man pages and the C standard available to me, I can GUESS
that it is general purpose, and that one specific purpose is the "secure
coding" kind of thing, when the limit (or size) is known before making a
call to any str* function. (The size may be fixed, with the limit known a
priori, or it may not be fixed, but the limit should be known prior to the
call.)

--raxit sheth

SuperKoko

Oct 20, 2006, 7:53:34 AM

raxitsh...@yahoo.co.in wrote:
> SuperKoko wrote:
> >
> > Read the documentation of strncpy twice, and you'll see that it doesn't
> > work on zero terminated strings.
> I do not agree (or I do not understand the above).
>
I meant that:

char buffer[3];
strncpy(buffer, "hello world!", 3);

Puts the three characters "hel" in the buffer, but doesn't put any zero
terminator.


> >The destination string must be put into a buffer whose size is known
> exactly,
>

> The above quote is very general, and I think it can apply in a secure
> coding context (though not everywhere, e.g. not where a 'dumb' programmer
> forgot to allocate the destination memory),
>
> where "portion" implicitly defines the limit (the limit being known
> prior to the call).
>

Yes, strncmp can be used for "secure" coding. It's alright for that.
I said that it wasn't designed with security in mind (which explains
the absence of strnlen), not that it wasn't good for that.
However, I think strncpy is far more obscure and is less usable for
secure coding.
You have to remember to terminate the string with a '\0' character,
assigning it manually.

It also performs pretty badly if the string is short and the buffer is
large, because it zero-fills the rest of the buffer, but I suppose
that's not an issue here.

So, I think that it would be better to write your own strcpy_safe
function instead of using strncpy.
Or use the TR 24731 function, or the strlcpy function of BSD, or whatever
your compiler provides, if it provides one and portability isn't an issue.
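
One possible shape for such a hand-written function is sketched below. The name strcpy_safe comes from the suggestion above, but the signature and the truncation policy are assumptions made for this example; it is neither the TR 24731 strcpy_s nor the BSD strlcpy.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies at most dstsize - 1 characters from src
   into dst and always null-terminates dst when dstsize > 0. Unlike
   strncpy, it does not zero-fill the rest of the buffer.
   Returns 0 if the whole string fit, 1 if it was truncated. */
int strcpy_safe(char *dst, size_t dstsize, const char *src)
{
    size_t len;

    if (dstsize == 0)
        return 1;                       /* no room even for the terminator */

    len = strlen(src);                  /* src must be a terminated string */
    if (len >= dstsize) {
        memcpy(dst, src, dstsize - 1);
        dst[dstsize - 1] = '\0';
        return 1;
    }
    memcpy(dst, src, len + 1);          /* includes the terminator */
    return 0;
}
```

Silent truncation is itself a design choice that can hide bugs, which is why the sketch reports it through the return value rather than ignoring it.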

Douglas A. Gwyn

Oct 20, 2006, 2:52:37 PM
raxitsh...@yahoo.co.in wrote:
> when one use memchr to achieve simillar functionality like strNlen , do
> we need to worry about any Platform Specific - Data representation
> specific issue ?

memchr operates on bytes without interpreting their contents,
so it is immune to "data representation" issues.

A general problem with use of null-terminated byte arrays
("C strings") can occur if you use the str* functions with
data that may contain embedded 0-valued bytes, such as some
multibyte character encodings. The C standard provides a
different set of facilities for operating with character
encodings, namely those associated with the wchar_t typedef.

Using type "char" for coded characters works well only with
small character sets such as ASCII or EBCDIC, although some
of the str* operations are usable with encoding schemes
that have been carefully designed to not contain embedded
0-valued bytes (e.g. UTF-8).
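
The difference can be seen by running strlen() over the same kind of text in two encodings. This sketch assumes an ASCII-based execution character set; the byte arrays are written out by hand, and the wrapper name is invented for the example.

```c
#include <stddef.h>
#include <string.h>

/* "Hi" encoded as UTF-16LE: every other byte is zero. */
static const char hi_utf16le[] = { 'H', 0, 'i', 0, 0, 0 };

/* U+00E9 (e with acute accent) followed by '!' in UTF-8: no zero byte
   appears before the terminator. */
static const char e_acute_utf8[] = { (char)0xC3, (char)0xA9, '!', 0 };

/* Hypothetical wrapper, only to make the point explicit: strlen counts
   bytes up to the first zero byte, not characters. */
size_t bytes_seen_by_strlen(const char *s)
{
    return strlen(s);
}
```

On the UTF-16LE data strlen() stops after the first byte, while on the UTF-8 data it reaches the real terminator and reports 3 bytes for 2 characters.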

Douglas A. Gwyn

Oct 20, 2006, 2:57:08 PM
kuy...@wizard.net wrote:
> By the way, char is allowed to be wider than 8 bits. What this means is
> that a byte, in C, can refer to more than 8 bits. This is confusingly
> inconsistent with the way that the rest of the world defines bytes.

It's not really the whole "rest of the world"; "byte" has
historically been used for any sized piece of a machine word.

"Byte" when applied to memory capacity in recent times has
the universally understood meaning of "octet", and the
assumption of an octet for "byte" in other programming
contexts seems to have originated with the IBM System/360,
PDP-11, and other "byte addressable" architectures.

Note that the IETF Internet RFCs generally specify "octet"
when they mean 8 bits, rather than "byte" which is not
always understood to mean 8 bits.

Douglas A. Gwyn

Oct 20, 2006, 3:06:11 PM
Jean-Marc Bourguet wrote:
> What C adds to this usage is the constraint that bytes be addressable. On
> word-addressable machines this may be done in two ways: either using a very
> wide byte -- I think this is done nowadays in some DSPs, for which the risk
> of being inconsistent with common use of the machine is nonexistent -- or
> using fat pointers containing an address and a byte designator -- that's the
> main reason why C allows pointers to vary in size.

Indeed, C could have added bitwise addressability with no change
to the rest of the language, apart from removing the requirement
that sizeof(char)==1. There was a "Catch-22" in that computer
makers didn't want to add bit addressing to the hardware if
there was no good way to exploit it from HLLs such as C, while
the C standards committee didn't want to require bit
addressability if there was no hardware to expedite it. Around
1986 I proposed a "short char" data type that could have been
any size from 1 to CHAR_BIT, with sizeof always reporting in
units of short char. That proposal was coupled to using type
char for the "universal" character set, instead of adding a
second character type wchar_t and (as I predicted) a duplicate
set of library functions to operate with that type. However,
I lost the argument, and so there is no "short char" and no
good way to tie it to a potential for bit addressability on
platforms that would benefit from it. Instead, we have to rely
on kludge functions/macros that explicitly shift and mask
larger units.
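
The kind of shift-and-mask helper referred to above might look like the following. The names and the bit-numbering convention (bit 0 is the least significant bit of byte 0) are assumptions made for this sketch.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helpers: treat an array of unsigned char as a sequence
   of bits, numbered from the least significant bit of mem[0]. */
int get_bit(const unsigned char *mem, size_t bit)
{
    return (mem[bit / CHAR_BIT] >> (bit % CHAR_BIT)) & 1u;
}

void set_bit(unsigned char *mem, size_t bit, int value)
{
    unsigned char mask = (unsigned char)(1u << (bit % CHAR_BIT));

    if (value)
        mem[bit / CHAR_BIT] |= mask;
    else
        mem[bit / CHAR_BIT] &= (unsigned char)~mask;
}
```

A "bit address" here is just an index that the helpers split into a byte offset and a shift count, which is the work that bit-addressable hardware would otherwise do.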

Douglas A. Gwyn

Oct 20, 2006, 3:09:13 PM
SuperKoko wrote:
> strncmp ... was mainly designed (quote from Borland C++ Programmer's Guide) to:

> "Compare a portion of one string to a portion of another."

The strn* functions were of use mainly for tightly packed
structures such as the PDP-11 Unix directory entry, where
a 14-character filename was stored without null terminator.

Jean-Marc Bourguet

Oct 20, 2006, 3:52:14 PM
"Douglas A. Gwyn" <DAG...@null.net> writes:

> Jean-Marc Bourguet wrote:
>> What C adds to this usage is the constraint that bytes be addressable. On
>> word-addressable machines this may be done in two ways: either using a very
>> wide byte -- I think this is done nowadays in some DSPs, for which the risk
>> of being inconsistent with common use of the machine is nonexistent -- or
>> using fat pointers containing an address and a byte designator -- that's the
>> main reason why C allows pointers to vary in size.
>
> Indeed, C could have added bitwise addressability with no change
> to the rest of the language, apart from removing the requirement
> that sizeof(char)==1. There was a "Catch-22" in that computer
> makers didn't want to add bit addressing to the hardware if
> there was no good way to exploit it from HLLs such as C, while
> the C standards committee didn't want to require bit
> addressability if there was no hardware to expedite it.

I know of three different kinds of bit addresses which have been provided.

The IBM Stretch allowed data to be of any size between 1 and 64 bits and
there was no alignment constraint.

The DEC PDP-10 was a word machine but with byte pointers, and the byte could
vary in length from 1 to 36 bits (the size of the word). Bytes could not span
words. I think there were other minis with varying-length bytes.

The bit addresses of the 8051 allow addressing individual bits of a selected
part of the (internal) memory, and the next level is the byte, with no clear
relationship between the byte address and the bit address except that
groups of 8 consecutive bits are in the same byte.

Of these, only the 8051 is still of practical use.

Ob-C: The PDP-10 is the only one of the set for which I know there is a C
compiler. In fact, there are at least two: KCC and gcc. KCC has a compilation
switch allowing the use of bytes with 7, 8, or 9 bits (I know, the first two
are non-conforming -- 7 is too short and neither allows char to cover all
memory), but I've used it only with 9-bit chars.

Yours,

--
Jean-Marc

Jun Woong

Oct 22, 2006, 4:20:18 AM

Yes, strn* functions CAN be used for general purposes, including what
you call "secure coding." My point was about their origin (as
detailed in one of the recent posts), not about their possible
usage. And if you really think strncpy is intended for general
purposes, especially for "secure coding", even after reading its
specification, your guess is wrong.

those who know me have no need of my name

Oct 22, 2006, 11:20:30 PM
in comp.std.c i read:

>May I gently ask the group whether there is anything extra to do when
>someone wants to suggest that strnlen should be in the standard? I also
>do not know whether C standard committee members read this group
>(apologies; as I said, I am new to comp.std.c).

some of them do read this group. to get something into the standard you
need a champion, perhaps you if you join the committee via whatever is
required by your country. only by a very rare chance would presentation
here be sufficient.

--
a signature

Casper H.S. Dik

Oct 23, 2006, 5:48:40 AM
"Douglas A. Gwyn" <DAG...@null.net> writes:

And it's still used for and useful for such things as utmp(x)
structures where fixed size structures are copied to/read from a file.

In this case the other feature of strncpy is also important: the fact
that it zero-fills the remainder of the string.
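
A sketch of why the zero fill matters for records that are written out or compared as raw bytes. The struct below is hypothetical and much simpler than a real utmp entry, and LOGIN_WIDTH is an assumed width.

```c
#include <string.h>

#define LOGIN_WIDTH 8   /* assumed width; real utmp layouts differ */

struct login_record {           /* hypothetical, not the real struct utmp */
    char user[LOGIN_WIDTH];     /* zero-padded, not necessarily terminated */
};

/* strncpy pads the tail of the field with zeros, so no stale bytes from
   an earlier, longer name survive in the record. */
void set_user(struct login_record *r, const char *name)
{
    strncpy(r->user, name, LOGIN_WIDTH);
}

/* With the padding guaranteed, whole fields can be compared bytewise. */
int same_record(const struct login_record *a, const struct login_record *b)
{
    return memcmp(a->user, b->user, LOGIN_WIDTH) == 0;
}
```

Without the padding, two records holding the same short name could differ in the unused tail bytes, and those leftover bytes would also end up in the file.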

Casper
--
Expressed in this posting are my opinions. They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.

Richard Bos

Oct 23, 2006, 9:18:41 AM

All of them? I thought that was only strncpy(), with strncat() and
strncmp() operating perfectly well on ordinary strings.

Richard

kuy...@wizard.net

Oct 23, 2006, 12:31:07 PM

There aren't any strn* functions listed in my copy of the C standard
other than those three. Does your copy have any additional ones?

Keith Thompson

Oct 23, 2006, 4:07:08 PM

Richard didn't imply that there are more than three. Try reading "All
of them?" as "All three of them?".

All three strn*() functions can work with, for example, an existing
PDP-11 unix directory as input data. Only strncpy() can *generate*
such data.

strncat() can work with an unterminated source array, but it always
zero-terminates the target. strncmp() can work with either terminated
or unterminated operands.
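
The strncat() case can be sketched as follows. NAME_WIDTH of 14 follows the directory-entry width mentioned earlier in the thread; append_name is a hypothetical helper, and the caller is assumed to provide enough room in path.

```c
#include <string.h>

#define NAME_WIDTH 14   /* width of the old directory-entry name field */

/* Hypothetical helper: appends a possibly unterminated fixed-width name
   to the terminated string in path. strncat reads at most NAME_WIDTH
   characters from name and then always writes a '\0'. The caller must
   ensure path has room for strlen(path) + NAME_WIDTH + 1 bytes. */
void append_name(char *path, const char name[NAME_WIDTH])
{
    strncat(path, name, NAME_WIDTH);
}
```

The source may fill all 14 bytes with no terminator, yet the result in path is an ordinary null-terminated string.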

kuy...@wizard.net

Oct 23, 2006, 4:32:24 PM
Keith Thompson wrote:
> kuy...@wizard.net writes:
> > Richard Bos wrote:
> >> "Douglas A. Gwyn" <DAG...@null.net> wrote:
> >>
> >> > SuperKoko wrote:
> >> > > strncmp ... was mainly designed (quote from Borland C++
> >> > > Programmer's Guide) to: "Compare a portion of one string to a
> >> > > portion of another."
> >> >
> >> > The strn* functions were of use mainly for tightly packed
> >> > structures such as the PDP-11 Unix directory entry, where
> >> > a 14-character filename was stored without null terminator.
> >>
> >> All of them? I thought that was only strncpy(), with strncat() and
> >> strncmp() operating perfectly well on ordinary strings.
> >
> > There aren't any strn* functions listed in my copy of the C standard
> > other than those three. Does your copy have any additional ones?
>
> Richard didn't imply that there are more than three. Try reading "All
> of them?" as "All three of them?".

OK, that was my mistake - I see how I parsed his sentence incorrectly.

> All three strn*() functions can work with, for example, an existing
> PDP-11 unix directory as input data. ...

That fact seems sufficient, to me, to justify Doug's comment about them
being useful with such strings.

> ... Only strncpy() can *generate*
> such data.
>
> strncat() can work with an unterminated source array, but it always
> zero-terminates the target. strncmp() can work with either terminated
> or unterminated operands.

That is an important distinction, but not one that I see as relevant to
the validity of Doug's comment. It just confirms that those functions
are useful in different ways, and to different degrees, when dealing
with unterminated strings.
