
How do C programmers do reliable string handling?


JiiPee

Sep 24, 2022, 1:23:10 PM
I have done C++ since 1997. I personally do not understand why one would use C if
C++ is extended C and has more in it (and that's why I immediately
switched from C to C++ when I found C++).

But language-fight aside :) .. I have a serious/honest question:
What is the best way to represent a string in the C language? How do the best C
programmers represent a string?

Because I am worried that the length of the string is not
"protected" as there is no private in the C language.

In C++ it is well protected and safe something like this:

class String {
public:
    // all access functions

private:
    int m_length;
};

So, it is basically impossible to have a wrong length value for a string.
Am I right that C programmers do it something like this:

struct String {
    char* data;
    int length;
};

but obviously somebody might write length = 99999999999, and then things
go bad...

How do C programmers make a string safe so that the above problem (the
length of the string going wrong) does not occur?

Bo Persson

Sep 24, 2022, 3:01:08 PM
By not storing the length separately? You just have to call strlen each
time you need to know the length.
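
A minimal sketch of that approach, assuming ordinary nul-terminated strings. The helper name ends_with is made up for illustration; only strlen() and memcmp() come from the standard library:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: no length is stored anywhere; it is recomputed
   with strlen() each time it is needed. */
bool ends_with(const char *s, const char *suffix)
{
    size_t slen = strlen(s);          /* scans for the terminating '\0' */
    size_t suflen = strlen(suffix);

    if (suflen > slen)
        return false;
    return memcmp(s + slen - suflen, suffix, suflen) == 0;
}
```

The trade-off is that every strlen() call walks the whole string, and the answer is only right as long as the terminator is intact.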


Gawr Gura

Sep 24, 2022, 3:29:48 PM
In C, if you really want to protect data from someone else mishandling
it then you can use an opaque type. In a header you would write:

struct string;

struct string* create_string(const char *const data);
void destroy_string(struct string *const str);
size_t get_string_length(const struct string *const str);
/* etc. */

In the corresponding source file you would define struct string.
Consumers of the header file don't know the structure of struct string
so they are required to access it through the provided procedures rather
than directly manipulating it.
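
As a rough sketch, the source file behind that header might look like the following. The field names and the choice to copy the input are assumptions made for illustration; the three function signatures are the ones declared above:

```c
#include <stdlib.h>
#include <string.h>

/* Normally these declarations would come from the header. */
struct string;
struct string *create_string(const char *const data);
void destroy_string(struct string *const str);
size_t get_string_length(const struct string *const str);

/* Only this source file knows the layout (field names are assumptions). */
struct string {
    size_t length;
    char *data;
};

struct string *create_string(const char *const data)
{
    struct string *str = malloc(sizeof *str);
    if (str == NULL)
        return NULL;

    str->length = strlen(data);
    str->data = malloc(str->length + 1);
    if (str->data == NULL) {
        free(str);
        return NULL;
    }
    memcpy(str->data, data, str->length + 1);  /* copy including '\0' */
    return str;
}

void destroy_string(struct string *const str)
{
    if (str != NULL) {
        free(str->data);
        free(str);
    }
}

size_t get_string_length(const struct string *const str)
{
    return str->length;
}
```

Because the length is set in exactly one place, client code has no field it could assign a wrong value to.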

I think that this kind of protection is often unnecessary though. Access
modifiers in C++ help keep things sane but they don't provide any real
guarantee against outside manipulation. If someone really wants to
modify your object they can always convert it to a char array and fiddle
with it.

Paavo Helde

Sep 24, 2022, 4:04:31 PM
On 24.09.2022 20:22, JiiPee wrote:
> I have done C++ since 1997. I personally do not understand why one would use C if
> C++ is extended C and has more in it (and that's why I immediately
> switched from C to C++ when I found C++).
>
> But language-fight aside :) .. I have a serious/honest question:
> What is the best way to represent a string in the C language? How do the best C
> programmers represent a string?
>
> Because I am worried that the length of the string is not
> "protected" as there is no private in the C language.
>
> In C++ it is well protected and safe something like this:
>
> class String {
> public:
>     // all access functions
>
> private:
>     int m_length;
> };
>
> So, it is basically impossible to have a wrong length value for a string.
> Am I right that C programmers do it something like this:
>
> struct String {
>     char* data;
>     int length;
> };
>
> but obviously somebody might write length = 99999999999, and then things
> go bad...

In most C code I have seen there has been no attempt to store the string
length separately. It's just a char array, when you need the string
length you call strlen().

If there is a C struct encapsulating the string and its length, in
well-written C code it should be only manipulated via dedicated access
functions, never directly. The C programmers are trusted to follow such
rules.
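
A small sketch of what such access functions could look like for a struct whose layout is visible. All the function names here are hypothetical; the point is only that data and length are always updated together, in one place:

```c
#include <stdlib.h>
#include <string.h>

/* The layout is visible; by convention callers never touch the fields. */
struct String {
    char *data;
    size_t length;
};

/* Replace the contents. Returns 0 on success, -1 if allocation fails
   (in which case the old contents are left untouched). */
int String_set(struct String *s, const char *text)
{
    size_t n = strlen(text);
    char *copy = malloc(n + 1);
    if (copy == NULL)
        return -1;
    memcpy(copy, text, n + 1);
    free(s->data);      /* free(NULL) is a no-op, so a zeroed struct works */
    s->data = copy;
    s->length = n;
    return 0;
}

size_t String_length(const struct String *s)
{
    return s->length;
}

void String_free(struct String *s)
{
    free(s->data);
    s->data = NULL;
    s->length = 0;
}
```

A caller would start from a zero-initialized struct (struct String s = {0};) and go through these functions only. Nothing but discipline enforces that.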


JiiPee

Sep 24, 2022, 5:44:53 PM
On 24/09/2022 23:04, Paavo Helde wrote:
> If there is a C struct encapsulating the string and its length, in
> well-written C code it should be only manipulated via dedicated access
> functions, never directly. The C programmers are trusted to follow such
> rules.

I would prefer this. One can just be careful not to modify it.

JiiPee

Sep 24, 2022, 5:47:30 PM
On 24/09/2022 22:29, Gawr Gura wrote:
> but they don't provide any real
> guarantee against outside manipulation. If someone really wants to
> modify your object they can always convert it to a char array and fiddle
> with it.

Sure, although in my 30 years of C++ I don't remember ever facing this
problem. It is quite difficult to do this accidentally.

Mut...@dastardlyhq.com

Sep 25, 2022, 4:50:44 AM
Many years ago I was faced with needing to access a private variable in a class
that didn't have a getter method, written by a team that refused to provide one
for reasons I never understood, so I simply took the address of the first
public variable following it, counted back and dereferenced the pointer. Very
dodgy but it worked (until they changed the class layout, obviously).

eg:

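#include <stdio.h>

class myclass
{
    int i;              // private member we want to read
public:
    int j;
    myclass(): i(123) { }
};


int main()
{
    myclass mc;
    int *p = &mc.j;     // address of the first public member
    --p;                // step back one int; assumes i sits directly before j
    printf("%d\n",*p);  // not portable: relies on the compiler's class layout
    return 0;
}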



JiiPee

Sep 25, 2022, 6:38:15 AM
On 25/09/2022 11:50, Mut...@dastardlyhq.com wrote:
> int main()
> {
> myclass mc;
> int *p = &mc.j;
> --p;
> printf("%d\n",*p);
> return 0;
> }

Right, but it would never really come to my mind doing such things ...
unless I purposely did that for reasons like yours.

JiiPee

Sep 25, 2022, 6:39:07 AM
On 25/09/2022 11:50, Mut...@dastardlyhq.com wrote:
> int main()
> {
> myclass mc;
> int *p = &mc.j;
> --p;
> printf("%d\n",*p);
> return 0;
> }
>
>

I guess that is both the power and the weakness of pointers.

Bonita Montero

Sep 25, 2022, 7:34:50 AM
Why not (&mc.j)[-1]?


Bonita Montero

Sep 25, 2022, 11:00:04 AM
I'm asking myself whether this and Mutterly's code are illegal
aliasing which the compiler mustn't notice.


Mut...@dastardlyhq.com

Sep 25, 2022, 11:15:10 AM
Never thought of that. Useful to remember if I need to do something similar
again.


Gawr Gura

Sep 25, 2022, 12:22:20 PM
> Sure, although in my 30 years C++ I don't remember ever facing this
> problem. It is quite difficult to accidentally do this.

That being the case, why worry about someone mishandling a struct in C?
It's also quite difficult to accidentally write

str->length = 65535

when you meant to write

str->length

JiiPee

Sep 25, 2022, 6:02:03 PM
But I see other ways it could fail. Imagine a function taking a
reference:

void foo(int a, int& l)

and then you call: foo(a, str->length);
If foo() changes the variable l, then length gets changed. And this
might be a mistake, because the reference is not visible at the call site.

I mean, it opens many more doors to human mistakes or risks.

Richard Damon

Sep 25, 2022, 7:02:37 PM
But C doesn't have references, so you can't do that.

In C++, it would be private, so you couldn't do it.

Gawr Gura

Sep 25, 2022, 7:35:02 PM

On 9/25/22 15:01, JiiPee wrote:
>
> I mean, it opens much more doors to human mistakes or risks.

I prefer the convenience of C++ but if you need to ensure good behavior
in C you can use the technique I outlined. I think the amount of risk
undertaken by a competent programmer in this case is very low.

JiiPee

Sep 26, 2022, 12:27:24 AM
Oh, but one could use a pointer then, i.e. if it were int* l.

JiiPee

Sep 26, 2022, 12:32:56 AM
On 26/09/2022 02:34, Gawr Gura wrote:
> I think the amount of risk
> undertaken by a competent programmer in this case is very low.

Was it the creator of Python, or of some other language, who said that a good
programmer does not accidentally change a public member variable?

Juha Nieminen

Sep 26, 2022, 4:33:20 AM
Gawr Gura <gawr...@mail.hololive.com> wrote:
> In C, if you really want to protect data from someone else mishandling
> it then you can use an opaque type. In a header you would write:
>
> struct string;
>
> struct string* create_string(const char *const data);
> void destroy_string(struct string *const str);
> size_t get_string_length(const struct string *const str);
> /* etc. */

Which is a horrendously inefficient thing to do. Basically never do that!

(I have even seen suggestions of doing the above with very simple struct
types, like ones containing a couple of ints, and which would ostensibly
be instantiated millions of times (such as structs representing a point
or a pixel). Burn that kind of suggestion with fire!)

Malcolm McLean

Sep 26, 2022, 5:03:45 AM
On Saturday, 24 September 2022 at 18:23:10 UTC+1, JiiPee wrote:
>
> But language-fight aside :) .. I have a serious/honest question:
> What is the best way to represent a string in the C language? How do the best C
> programmers represent a string?
>
> How do C programmers make a string safe so that the above problem (the
> length of the string going wrong) does not occur?
>
If the program's primary purpose is not string processing, then it's best to represent
strings as character pointers to nul-terminated sequences of bytes. This is because
the efficiency improvement you can get from storing the length separately isn't
worth the additional complexity and possibility for error.

If you are worried about reliability, it's best to make a rule that a string is either
a string literal (embedded in double quotes), or allocated with malloc. So there's
no confusion between the string and the buffer which holds the string. This also
causes a slight efficiency loss, since most strings are short and it means short
allocations.

(Note that this won't always be feasible in embedded applications, where malloc can
be problematic.)
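
A sketch of what that rule can look like in practice: any string that is not a literal comes out of a function that returns freshly malloc'd memory, which the caller later frees. The helper name concat_strings is invented for this example:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a freshly malloc'd concatenation of a and b,
   or NULL on allocation failure. The caller owns the result and frees it. */
char *concat_strings(const char *a, const char *b)
{
    size_t alen = strlen(a);
    size_t blen = strlen(b);
    char *result = malloc(alen + blen + 1);
    if (result == NULL)
        return NULL;
    memcpy(result, a, alen);
    memcpy(result + alen, b, blen + 1);  /* includes the terminating '\0' */
    return result;
}
```

Since the buffer is sized from the inputs, there is no fixed-size array to overflow and no separate capacity to keep in sync with the string.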

Paavo Helde

Sep 26, 2022, 5:18:36 AM
If by that you mean that dynamic allocations are slow, then you can
also create larger opaque types in C without any dynamic allocations:

struct item_tag {
    // use a type guaranteeing proper alignment,
    // choose big enough N to cover the real struct size.
    uint64_t opaque[N];
};
typedef struct item_tag item_t;

void InitItem(item_t* it) {
    // cast it to real struct type and do things.
    // ...
}

// Client C code example:

item_t x;
InitItem(&x);
// ...
DestroyItem(&x);


This way one could also support e.g. SSO strings, avoiding both
arbitrary string length limits and excessive dynamic allocation
overheads. Been there, done that.
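
For illustration, here is one way the idea could be fleshed out. This is a sketch under several assumptions: the real layout (struct item_impl), the extra text parameter to InitItem, and the ItemLength accessor are all invented here, and it copies the real struct in and out with memcpy() instead of casting, which sidesteps any aliasing questions at the cost of a small copy. It also simply truncates long input rather than falling back to the heap as a real SSO string would.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* --- public header part: clients see only an aligned block of storage --- */
#define ITEM_WORDS 4    /* assumed large enough for the real struct */
typedef struct item_tag {
    uint64_t opaque[ITEM_WORDS];
} item_t;

void InitItem(item_t *it, const char *text);
size_t ItemLength(const item_t *it);

/* --- private source part: the real layout (hypothetical) --- */
struct item_impl {
    size_t length;
    char text[16];      /* short strings are stored inline, no malloc */
};

_Static_assert(sizeof(struct item_impl) <= sizeof(item_t),
               "opaque storage too small for the real struct");

void InitItem(item_t *it, const char *text)
{
    struct item_impl impl = {0};
    size_t n = strlen(text);

    if (n >= sizeof impl.text)
        n = sizeof impl.text - 1;   /* this sketch simply truncates */
    memcpy(impl.text, text, n);     /* rest of impl.text is already zero */
    impl.length = n;
    memcpy(it->opaque, &impl, sizeof impl);
}

size_t ItemLength(const item_t *it)
{
    struct item_impl impl;
    memcpy(&impl, it->opaque, sizeof impl);
    return impl.length;
}
```

Client code can then declare item_t x; on the stack and call InitItem(&x, "hello") with no allocation at all, and the static assertion catches the case where the real struct outgrows the public storage.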

Juha Nieminen

Sep 26, 2022, 5:47:59 AM
Paavo Helde <ees...@osa.pri.ee> wrote:
> If by that you mean that dynamic allocations are slow, then you can
> also create larger opaque types in C without any dynamic allocations:

Of course it depends on what exactly the struct is for, and how it's used.

For example, if it's a large struct which basically contains nothing the
programmer may be directly interested in, and which is ostensibly
instantiated only relatively rarely and infrequently, then it's not
wrong to use this idiom per se. (A lot of C libraries do this, such
as libpng, libz, etc, and that's ok, because those structs are
usually not instantiated in the millions nor accessed in tight
inner loops requiring maximum speed.)

However, when it comes to small structs that are instantiated in
the millions and which should be as efficient as possible, this
idiom would completely kill the performance. Not only is instantiating
them slow, but also handling them is very slow as well (in comparison
to the structs being "public" and directly accessed.)

This is especially so in number-crunching applications (which things
like image manipulation etc. tend to be in practice). Not only would
this idiom consume significantly more RAM than necessary, and not
only would instantiating the objects be slower, but accessing them
would be a lot slower as well. (Modern compilers are relatively good
at autovectorizing linear accesses to values in an array. However,
if these accesses are done via non-inline functions, ie. resulting
in actual function calls, that pretty much kills all these
autovectorization optimizations.)
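
To make the contrast concrete, here is the kind of loop meant by "linear accesses to values in an array", written against a public struct. The pixel type and function name are made up for the example, and whether a given compiler actually vectorizes it depends on the compiler and its flags:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pixel type with a public layout. */
struct pixel {
    uint8_t r, g, b, a;
};

/* Sums the red channel with direct field access over a contiguous array.
   The compiler sees the whole loop body and is free to optimize it. */
uint32_t sum_red(const struct pixel *pixels, size_t count)
{
    uint32_t total = 0;
    for (size_t i = 0; i < count; ++i)
        total += pixels[i].r;
    return total;
}
```

With an opaque pixel type, each pixels[i].r would instead be a call into another translation unit, and the array itself would typically become an array of pointers to separately allocated objects.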

> struct item_tag {
> // use a type guaranteeing proper alignment,
> // choose big enough N to cover the real struct size.
> uint64_t opaque[N];
> };
> typedef struct item_tag item_t;
>
> void InitItem(item_t* it) {
> // cast it to real struct type and do things.
> // ...
> }
>
> // Client C code example:
>
> item_t x;
> InitItem(&x);
> // ...
> DestroyItem(&x);
>
>
> This way one could also support e.g. SSO strings, avoiding both
> arbitrary string length limits and excessive dynamic allocation
> overheads. Been there, done that.

If such a struct is intended to be as efficient as possible, then that
idiom might be acceptable, assuming that all the accessor functions are
inline.

Gawr Gura

Sep 26, 2022, 10:41:19 AM
On 9/26/22 01:33, Juha Nieminen wrote:
> Which is a horrendously inefficient thing to do. Basically never do that!
>
> (I have even seen suggestions of doing the above with very simple struct
> types, like ones containing a couple of ints, and which would ostensibly
> be instantiated millions of times (such as structs representing a point
> or a pixel). Burn that kind of suggestion with fire!)

The example I've given is deliberately simplistic for the sake of
illustrating the technique. A competent programmer has to weigh their
needs against the tools available. If the dynamic allocations implied
here are too much overhead then, as has already been explained, there
are ways to achieve the same effect without dynamic allocations. If the
function calls are too much overhead then you can do away with this
technique entirely and go the route of simply not mishandling data.

I don't think, "basically never do that," is good advice. Don't do it
when it's inappropriate. Always engage in critical thinking about your
software. Use techniques that are appropriate for your needs. Almost
every C library I've encountered uses opaque types somewhere (FILE being
one obvious example in the standard itself).

Mut...@dastardlyhq.com

Sep 26, 2022, 11:19:44 AM
It's a bit hard to accidentally write &(str->length) rather than str->length

JiiPee

Sep 26, 2022, 2:53:36 PM
On 26/09/2022 18:19, Mut...@dastardlyhq.com wrote:
> It's a bit hard to accidentally write &(str->length) rather than str->length

No, I mean if the function takes a pointer (not a variable), and we are
meant to pass a pointer to the length. But we don't know it's also
changing it.

Let's *assume* that a calculation is done like this:

float calculate(int round, int* length);

And then you call it with your length:

calculate(1110, &(str->length));

thinking that it's only reading the length. But it's changing it also.

Richard Damon

Sep 26, 2022, 8:42:37 PM
In C, a function taking a pointer is likely going to change what it points
to; why else use a pointer?

This makes the intent fairly clear.

C++, by allowing functions to have non-const reference parameters, makes it
harder to tell by looking at a call site what the outputs are.

This is handy at times, but means you need to think of things differently.

Keith Thompson

Sep 26, 2022, 8:54:41 PM
Richard Damon <Ric...@Damon-Family.org> writes:
> On 9/26/22 2:53 PM, JiiPee wrote:
>> On 26/09/2022 18:19, Mut...@dastardlyhq.com wrote:
>>> It's a bit hard to accidentally write &(str->length) rather than
>>> str->length
>> No, I mean if the function takes a pointer (not a variable), and we
>> are meant to pass a pointer to the length. But we don't know it's also
>> changing it.
>> Let's *assume* that a calculation is done like this:
>> float calculate(int round, int* length);
>> And then you call it with your length:
>> calculate(1110, &(str->length));
>> thinking that it's only reading the length. But it's changing it also.
>
> In C, a function taking a pointer is likely going to change what it points
> to; why else use a pointer?
>
> This makes the intent fairly clear.

And if the function doesn't intend to change it, it should be a
pointer-to-const (which makes good sense if copying the value would be
unreasonably expensive or would have side effects).

> C++, by allowing functions to have non-const reference parameters, makes
> it harder to tell by looking at a call site what the outputs are.

And again, you can use a const reference if you don't want to change the
value. If a parameter is a non-const pointer (in C or C++) or a
non-const reference (in C++), you should assume that the referenced
object may be modified. If it isn't, complain to the person who's
responsible for the function.
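
A short C sketch of that convention, using a struct String like the one from the start of the thread (with size_t for the length; the two function names are invented for the example). The read-only function takes a pointer-to-const, the modifying one a plain pointer:

```c
#include <stddef.h>

struct String {
    char *data;
    size_t length;
};

/* Read-only: the pointer-to-const promises the struct is not modified. */
size_t count_char(const struct String *s, char c)
{
    size_t n = 0;
    for (size_t i = 0; i < s->length; ++i)
        if (s->data[i] == c)
            ++n;
    /* s->length = 0;  would be rejected by the compiler: *s is const */
    return n;
}

/* Modifying: the non-const pointer tells the caller to expect changes. */
void truncate_to(struct String *s, size_t new_length)
{
    if (new_length < s->length) {
        s->length = new_length;
        s->data[new_length] = '\0';
    }
}
```

Note that the const here is shallow: it protects the data and length fields themselves, not the characters that data points to.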

> This is handy at times, but means you need to think of things differently.

--
Keith Thompson (The_Other_Keith) Keith.S.T...@gmail.com
Working, but not speaking, for Philips
void Void(void) { Void(); } /* The recursive call of the void */

Mut...@dastardlyhq.com

Sep 27, 2022, 5:36:34 AM
On Mon, 26 Sep 2022 17:54:20 -0700
Keith Thompson <Keith.S.T...@gmail.com> wrote:
>Richard Damon <Ric...@Damon-Family.org> writes:
>> On 9/26/22 2:53 PM, JiiPee wrote:
>>> On 26/09/2022 18:19, Mut...@dastardlyhq.com wrote:
>>>> It's a bit hard to accidentally write &(str->length) rather than
>>>> str->length
>>> No, I mean if the function takes a pointer (not a variable), and we
>>> are meant to pass a pointer to the length. But we don't know it's also
>>> changing it.
>>> Let's *assume* that a calculation is done like this:
>>> float calculate(int round, int* length);
>>> And then you call it with your length:
>>> calculate(1110, &(str->length));
>>> thinking that it's only reading the length. But it's changing it also.
>>
>> In C, a function taking a pointer is likely going to change what it points
>> to; why else use a pointer?
>>
>> This makes the intent fairly clear.
>
>And if the function doesn't intend to change it, it should be a
>pointer-to-const (which makes good sense if copying the value would be
>unreasonably expensive or would have side effects).

If the function takes a pointer to a struct then you can't really tell if it
will change the contents, since it's a lot more efficient to pass a pointer than
to copy the whole structure onto the stack, and for passing arrays you don't have
a choice except to pass by reference (unless you embed the array in a struct), in
which case a pointer to const might be useful.

HOWEVER. If it requires a pointer to a primitive type then the function will
almost certainly be changing it; otherwise, as someone else wrote, why bother
requiring a pointer, as there'll be no efficiency gains.

Ben Bacarisse

Sep 27, 2022, 10:29:07 AM
Keith Thompson <Keith.S.T...@gmail.com> writes:

> Richard Damon <Ric...@Damon-Family.org> writes:
<cut>
>> C++, by allowing functions to have non-const reference parameters, makes
>> it harder to tell by looking at a call site what the outputs are.
>
> And again, you can use a const reference if you don't want to change the
> value. If a parameter is a non-const pointer (in C or C++) or a
> non-const reference (in C++), you should assume that the referenced
> object may be modified. If it isn't, complain to the person who's
> responsible for the function.

The trouble is you can't tell that you might have to check the function
declaration by looking at the call. f(a, &b) tells me I might want to
look at f's declaration to see if the second parameter is a pointer to
const or not, but f(a, b) does not. Either a or b might be modified and
I have to check f's declaration every time.

--
Ben.