
A string pointer to a static or dynamic string : how to free the dynamic one ?


R.Wieser

Jan 21, 2023, 9:28:24 AM
Hello all,

I'm a rather newbie to C++ programming who is trying to figure out how to deal
with string pointers - or rather, with what they point at.

Case in point :

char* message = "hello world";
char* message = strdup("hello world");

I can use either of those in a function and return the pointer, and the
caller will be none-the-wiser which form (the static or dynamic string) it
gets.

The problem is that the dynamic one needs to be "free"d but the static one
does not. How do I inspect that "message" pointer to see what's "in" it, so I
can take the correct action?

By the way: the same thing goes for when I want to replace a string. I don't
think that C++ has a garbage-collector running, so I need to do it myself.
:-)

Regards,
Rudy Wieser


Paavo Helde

Jan 21, 2023, 9:40:35 AM
21.01.2023 16:28 R.Wieser wrote:
> Hello all,
>
> I'm a rather newbie to C++ programming who is trying to figure out how to deal
> with string pointers - or rather, with what they point at.
>
> Case in point :
>
> char* message = "hello world";
> char* message = strdup("hello world");
>
> I can use either of those in a function and return the pointer, and the
> caller will be none-the-wiser which form (the static or dynamic string) it
> gets.

This is C++, so just return a std::string from your function, and forget
everything about C-style strings, malloc, free and strdup. Good riddance!
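
A minimal sketch of that advice. The function name make_message() and its bool parameter are made up for illustration; the point is only that both branches hand back a std::string, so the caller never needs to know where the text came from.

```cpp
#include <cassert>
#include <string>

// Hypothetical function: both branches return a std::string, so the caller
// never has to know whether the text started as a literal or was built at
// run time.
std::string make_message(bool dynamic)
{
    if (dynamic) {
        std::string built = "hello";
        built += " world";      // assembled at run time
        return built;
    }
    return "hello world";       // literal, copied into the returned string
}

// Illustrative use: no free() or delete anywhere.
void demo_make_message()
{
    std::string a = make_message(false);
    std::string b = make_message(true);
    assert(a == b);
    a = "replaced";             // the old contents are released automatically
}
```

Replacing the string, which the original post also asks about, is then a plain assignment.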




Bonita Montero

Jan 21, 2023, 9:56:57 AM
On 21.01.2023 at 15:40, Paavo Helde wrote:

> This is C++, so just return a std::string from your function, and forget
> everything about C-style strings, malloc, free and strdup. Good riddance!

And a local variable that is returned as a whole (e.g. not as a member
of a pair) is implicitly moved, if the copy isn't elided altogether. So
you don't have to std::move the return value.
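
A short sketch of the difference, with two hypothetical functions. Returning a local by name needs no std::move; returning a member of a local object is the case where an explicit std::move still avoids a copy.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Returning a local variable by name: the compiler elides the copy or,
// failing that, moves. No std::move is needed.
std::string whole_local()
{
    std::string s(64, 'x');
    return s;
}

// Returning a member of a local object is not covered by that rule, so
// without std::move this would copy the string.
std::string part_of_pair()
{
    std::pair<std::string, int> p(std::string(64, 'y'), 1);
    return std::move(p.first);
}

void demo_return()
{
    assert(whole_local().size() == 64);
    assert(part_of_pair().front() == 'y');
}
```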

Richard Damon

Jan 21, 2023, 10:14:03 AM
There is no portable way of telling. As others have mentioned,
std::string lets you get around this problem. In a standard
implementation this will copy the static string into the heap, so
std::string knows to always delete its version.

In embedded applications where space may be tight, I use a replacement
version for std::string that looks where the pointer for the string is
coming from, and if it is in system flash, I know it can't change, so I
don't copy it, and inside the string class keep track of that fact.

Malcolm McLean

Jan 21, 2023, 11:04:41 AM
It's a mess, and a hangover from C. C treats strings as character pointers,
and doesn't give you an easy way to tell if they are allocated on the heap, on
the stack, or in read-only memory. So the C programmer has to be careful.

In C++ there is a std::string. Normally when you are using C++, you should make
sure that all strings are turned into std::strings at the earliest opportunity.
Because of the magic of destructors, you then needn't worry about the memory the
string points to.
A std::string can be assigned like a variable, which is frequently useful.
Assignment is relatively expensive because it tends to involve an allocation
and a copy, but it's rare for string assignment to be a time critical step.

Mut...@dastardlyhq.com

Jan 21, 2023, 11:21:08 AM
Unless you want to parse the string, which may involve chopping it about. Then
it's simpler to use a C string and pointers rather than mess about with
inefficient substr() etc.

R.Wieser

Jan 21, 2023, 11:56:25 AM
"Richard Damon" <Ric...@Damon-Family.org> wrote in message
news:KiTyL.760492$GNG9....@fx18.iad...
> On 1/21/23 9:28 AM, R.Wieser wrote:

>> char* message = "hello world";
>> char* message = strdup("hello world");
...
>> The problem is that the dynamic one needs to be "free"d but the static
>> one not. How do I look at that "message" pointer what's "in" it so I can
>> take the correct action.

> There is no portable way of telling.

I was already afraid of that.

Regards,
Rudy Wieser


Paavo Helde

Jan 21, 2023, 1:15:54 PM
If you want to speed up substring processing, then there is
std::string_view for you. As fast as plain pointers and much cleaner.
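
As a rough illustration (C++17), here is a hypothetical split() helper that chops a string about without copying any characters. Each piece is only a view into the caller's buffer, so that buffer has to outlive the views.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper: split text on a single separator character.
// Nothing is copied; every element refers to the caller's storage.
std::vector<std::string_view> split(std::string_view text, char sep)
{
    std::vector<std::string_view> parts;
    while (true) {
        const std::size_t pos = text.find(sep);
        if (pos == std::string_view::npos) {
            parts.push_back(text);          // last (or only) piece
            break;
        }
        parts.push_back(text.substr(0, pos));
        text.remove_prefix(pos + 1);        // skip the piece and the separator
    }
    return parts;
}

void demo_split()
{
    const auto parts = split("hello world again", ' ');
    assert(parts.size() == 3);
    assert(parts[1] == "world");
}
```

Unlike std::string::substr(), std::string_view::substr() allocates nothing; it just returns a narrower view.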



Mike Terry

Jan 21, 2023, 1:29:25 PM
As others have said - in C++ we have a string class, which would be the much preferred approach.
Also with resources generally which need to be managed (freed at some point to avoid resource leaks)
the common approach is for those resources to be represented by classes that free the resources when
they go out of scope (or earlier by calling some kind of dispose method).

But this doesn't really answer your question... If you had asked the question in comp.lang.c it
would be more relevant as there is no string class, and the question deserves a literal answer.

The answer is that the documentation for using the function in question needs to clearly specify
whether or not the caller is responsible for freeing particular resources returned by the function.

So perhaps the function documentation might say "Caller is responsible for freeing the returned
string, using free() function". Of course, if you're writing the called function, and want to
return some literal string, you would need to strdup() that literal string yourself, so that it can
be correctly freed by the caller. (The topic is more general than just freeing memory - calls to
the OS often return other resources like handles representing kernel objects, or GUI objects, and
sometimes those resources must be freed by the caller, and sometimes they represent pre-existing
objects owned elsewhere in the system that the caller mustn't free.) The whole question of resource
ownership can be troublesome for callers, so the only answer is good documentation. Maybe function
naming conventions like create_xx() vs get_xx() etc. can help, but documentation is the key.
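
A sketch of what such a documented pair might look like. The names get_greeting() and create_greeting() are invented here to follow the create_xx()/get_xx() convention; the ownership rule lives in the comments, because the types alone cannot express it.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// get_greeting: returns a pointer to storage owned by the library.
// The caller must NOT free it.
const char* get_greeting()
{
    return "hello world";
}

// create_greeting: returns a malloc()-allocated copy, or nullptr on failure.
// The caller owns the result and must release it with free().
char* create_greeting()
{
    const char* src = get_greeting();
    char* copy = static_cast<char*>(std::malloc(std::strlen(src) + 1));
    if (copy != nullptr) {
        std::strcpy(copy, src);
    }
    return copy;
}

void demo_greeting()
{
    char* mine = create_greeting();
    assert(mine != nullptr);
    assert(std::strcmp(mine, get_greeting()) == 0);
    std::free(mine);    // required by create_greeting()'s contract
}
```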

Regards,
Mike.

Bonita Montero

Jan 21, 2023, 1:30:10 PM
On 21.01.2023 at 17:20, Mut...@dastardlyhq.com wrote:

> Unless you want to parse the string, which may involve chopping it about. Then
> it's simpler to use a C string and pointers rather than mess about with
> inefficient substr() etc.

What's wrong with something like this?

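#include <string>

using std::string;

// Returns "world" by trimming a local copy in place.
string substr()
{
    string hw( "hello world!" );
    hw.erase( hw.begin(), hw.begin() + 6 );  // drop "hello "
    hw.resize( 5 );                          // drop the trailing "!"
    return hw;
}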

R.Wieser

Jan 21, 2023, 2:10:06 PM
"Mike Terry" <news.dead.p...@darjeeling.plus.com> wrote in message
news:tPqdnZTbnINss1H-...@brightview.co.uk...

> the question deserves a literal answer.

That's what I thought too.

> The answer is that the documentation for using the function in question
> needs to clearly specify whether or not the caller is responsible for
> freeing particular resources returned by the function.

My post was more in the direction that /either of/ those string types could
be returned, and I would like to create a "safe release" routine for such a
pointer.

> Of course, if you're writing the called function, and want to return some
> literal string, you would need to strdup() that literal string yourself,
> so that it can be correctly freed by the caller.

That's good for a simple situation. I was also thinking of a string pointer
which could be initialized by the user but changed by a routine.

Of course, the "you may only provide type 'X' there" documentation rule could
also be applied there. I was just hoping that I could create a bit more
flexibility.

Thanks for the well-aimed reply.

Regards,
Rudy Wieser


Richard Damon

Jan 21, 2023, 4:33:12 PM
So the "string pointer" type you are thinking of needs to be told, and
remember the answer to the question, "Does this memory need to be freed?"


Mike Terry

Jan 21, 2023, 5:10:48 PM
On 21/01/2023 19:09, R.Wieser wrote:
> "Mike Terry" <news.dead.p...@darjeeling.plus.com> wrote in message
> news:tPqdnZTbnINss1H-...@brightview.co.uk...
>
>> the question deserves a literal answer.
>
> Thats what I thought too.
>
>> The answer is that the documentation for using the function in question
>> needs to clearly specify whether or not the caller is responsible for
>> freeing particular resources returned by the function.
>
> My post was more in the direction that /either of/ those string types could
> be returned, and I would like to create a "safe release" routine for such a
> pointer.

A noble aim, but... the C++ and C language raw pointers are basically just "pointers to type xxx".
They don't encapsulate how to free referenced resource(s) - pointers to an object on the stack, or
in static program storage, or on the C/C++ heap or even in the OS process heap will all typically
look similar, i.e. just 32- or 64-bit (or whatever) address space pointer values. You could analyse
the pointer value and try to work out what region of memory it belongs to, but that will be
non-portable and too much to reasonably ask, I'd say.

C++ functions could return some kind of embellished pointer - a class object including the pointer
and also the capability within the class of freeing the referenced resources in various ways
depending on how the resource was created. The embellishment would increase the memory required for
a 'pointer', so this might not be a good idea e.g. if your app might have large arrays of such
pointers. Basically, if there were a /simple/ answer to your original question, there would be no
interesting debate, but there's not - so you need to carefully balance costs/benefits. Simple is
good, unless simple doesn't work! :)
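
One possible shape for such an embellished pointer, offered only as a sketch: a std::unique_ptr whose deleter is a function pointer chosen when the string is created. All names below (StringHandle, free_heap, keep_static and the two producer functions) are made up for this example.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <memory>

// The deleter travels with the pointer, so the handle is wider than a raw
// pointer.
using Deleter = void (*)(const char*);
using StringHandle = std::unique_ptr<const char, Deleter>;

void free_heap(const char* p)  { std::free(const_cast<char*>(p)); }
void keep_static(const char*)  { /* static storage: nothing to release */ }

// Hypothetical producers: same return type, different ownership.
StringHandle static_message()
{
    return StringHandle("hello world", keep_static);
}

StringHandle dynamic_message()
{
    const char* src = "hello world";
    char* copy = static_cast<char*>(std::malloc(std::strlen(src) + 1));
    if (copy != nullptr) {
        std::strcpy(copy, src);
    }
    return StringHandle(copy, free_heap);
}

void demo_handles()
{
    StringHandle a = static_message();
    StringHandle b = dynamic_message();
    assert(b != nullptr);
    assert(std::strcmp(a.get(), b.get()) == 0);
    static_assert(sizeof(StringHandle) > sizeof(const char*),
                  "the embellishment costs memory");
}   // b's copy is freed here; a's literal is left alone
```

The static_assert makes the memory cost mentioned above visible: every handle carries the deleter next to the pointer.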

>> Of course, if you're writing the called function, and want to return some
>> literal string, you would need to strdup() that literal string yourself,
>> so that it can be correctly freed by the caller.
>
> Thats good for a simple situation. I was also thinking of a string pointer
> which could be initialized by the user but changed by a routine.

There are two aspects here, and it's good to keep them separate.

Looking at the std::string class, that addresses the "initialized by the user but changed by a
routine" bit of the problem, which I'll suggest is 99% of the benefit you're after. [Pass strings
as std::string& ]

The remaining "I really want to return pointers to memory allocated (and so to be later freed) in
several different ways decided at run-time ALL FROM THE SAME FUNCTION" is a separate problem. It
could be addressed with the approach of embellished pointers. The standard library supports custom
allocators which can serve as template arguments for containers like std::string, so you could have
a std::string with an embellished pointer - but, so much added complexity for what??

If you're more interested in C APIs, using raw pointers etc., then the whole topic of
responsibilities for managing memory between callers and called functions IN GENERAL is quite
subtle. DCE's Remote Procedure Call (RPC) framework has to solve this issue because the calling and
called routines typically are in different address spaces with no shared memory, and it's far from
trivial! The problem is that C/C++ language in itself does not sufficiently describe pointer use to
make this possible. E.g. a pointer could be a REF pointer to a single struct, or to an array, and
structs can chain to other structs, possibly involving loops of pointers, or at least having
multiple pointers to a single struct. Also pointers need to be understood as IN/OUT/INOUT in terms
of which way data is being passed to/from a function, which affects the rules for managing the
memory. If you're interested in the GENERAL solution for this kind of problem the DCE RPC
documentation (or Microsoft's DCOM, which uses RPC) relating to memory management would at least give
you a good idea of the issues.

But if you just want to provide one C-style API for one function with a modifiable IN/OUT parameter,
just do something like:

int AmendString (/*INOUT*/ char** ppString);

and clearly document the caller's responsibility, including how the string memory must initially be
allocated by the caller, and how it must eventually be freed upon return.
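
For illustration, one way the body of such a function could look. The contract in the comment and the " (amended)" suffix are assumptions made up for this sketch; the post only gives the declaration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical implementation of the AmendString() declaration.
// Assumed contract:
//   - on entry, *ppString is nullptr or was allocated with malloc();
//   - on success, returns 0 and *ppString points to a new malloc()ed
//     string that the caller must eventually free();
//   - on failure, returns -1 and *ppString is left untouched.
int AmendString(/*INOUT*/ char** ppString)
{
    static const char suffix[] = " (amended)";
    if (ppString == nullptr) {
        return -1;
    }
    const char* old = (*ppString != nullptr) ? *ppString : "";
    const std::size_t old_len = std::strlen(old);

    // sizeof suffix already counts the terminating '\0'.
    char* grown = static_cast<char*>(std::malloc(old_len + sizeof suffix));
    if (grown == nullptr) {
        return -1;
    }
    std::memcpy(grown, old, old_len);
    std::memcpy(grown + old_len, suffix, sizeof suffix);

    std::free(*ppString);   // free(nullptr) is a no-op
    *ppString = grown;
    return 0;
}

void demo_amend()
{
    char* s = nullptr;                  // "no string yet" is allowed
    const int rc = AmendString(&s);
    assert(rc == 0);
    assert(std::strcmp(s, " (amended)") == 0);
    std::free(s);                       // the caller's duty under the contract
}
```

Under this contract, passing the address of a pointer to a string literal would be a caller error, which is exactly what the documentation has to spell out.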

Mike.

Keith Thompson

Jan 21, 2023, 7:55:02 PM
Malcolm McLean <malcolm.ar...@gmail.com> writes:
[...]
> It's a mess, and a hangover from C. C treats strings as character pointers,

No, a C string is by definition "a contiguous sequence of characters
terminated by and including the first null character". It is not a
pointer.

A *pointer to a string* is by definition "a pointer to its initial
(lowest addressed) character". Most manipulation of strings in C is
done via string pointers.

> and doesn't give you an easy way to tell if they are allocated on the heap, on
> the stack, or in read-only memory. So the C programmer has to be careful.

Right.

--
Keith Thompson (The_Other_Keith) Keith.S.T...@gmail.com
Working, but not speaking, for XCOM Labs
void Void(void) { Void(); } /* The recursive call of the void */

jak

Jan 22, 2023, 4:46:53 AM
On 22/01/2023 01:54, Keith Thompson wrote:
> Malcolm McLean <malcolm.ar...@gmail.com> writes:
> [...]
>> It's a mess, and a hangover from C. C treats strings as character pointers,
>
> No, a C string is by definition "a contiguous sequence of characters
> terminated by and including the first null character". It is not a
> pointer.
>
> A *pointer to a string* is by definition "a pointer to its initial
> (lowest addressed) character". Most manipulation of strings in C is
> done via string pointers.
>
>> and doesn't give you an easy way to tell if they are allocated on the heap, on
>> the stack, or in read-only memory. So the C programmer has to be careful.
>
> Right.
>

pointer to a string? by definition? string pointers? Maybe you are
confusing programming languages. The C language has no string pointers.

James Kuyper

Jan 22, 2023, 5:20:33 AM
On 1/22/23 04:46, jak wrote:
> On 22/01/2023 01:54, Keith Thompson wrote:
...
>> A *pointer to a string* is by definition "a pointer to its initial
>> (lowest addressed) character". Most manipulation of strings in C is
>> done via string pointers.
...
> pointer to a string? by definition? string pointers? Maybe you are
> confusing programming languages. The C language has no string pointers.

When he said "by definition", he meant it:

"A _pointer to a string_ is a pointer to its initial (lowest addressed)
character." (C standard (n2731) 7.1.1p1).

The phrase "pointer to a string" is in italics, an ISO convention
indicating that the sentence in which that phrase occurs constitutes the
official definition of the meaning of that term.

jak

Jan 22, 2023, 6:17:44 AM
On 22/01/2023 11:20, James Kuyper wrote:
> On 1/22/23 04:46, jak wrote:
>> On 22/01/2023 01:54, Keith Thompson wrote:
> ...
>>> A *pointer to a string* is by definition "a pointer to its initial
>>> (lowest addressed) character". Most manipulation of strings in C is
>>> done via string pointers.
> ...
>> pointer to a string? by definition? string pointers? Maybe you are
>> confusing programming languages. The C language has no string pointers.
>
> When he said "by definition", he meant it:
>
> "A _pointer to a string_ is a pointer to its initial (lowest addressed)
> character." (C standard (n2731) 7.1.1p1).
>

"pointer to a string" could be a pointer at the first element of an
array that contains a string, perhaps. You argue a lot about the
definitions contained in the "ISO", perhaps the writers should be
convinced to use greater precision.

R.Wieser

Jan 22, 2023, 6:28:00 AM
"Mike Terry" <news.dead.p...@darjeeling.plus.com> wrote in message
news:OtycnbSNHJxJ_1H-...@brightview.co.uk...

> A noble aim, but... the C++ and C language raw pointers are basically just
> "pointers to type xxx" They don't encapsulate how to free referenced
> resource(s).

:-) That was what I tried to get clear : whether the current iteration of
the language perhaps had such a ... something. An "I'm static" bitflag would
have been enough for my purposes.

> You could analyse the pointer value and try to work out what region of
> memory it belongs to, but that will be non-portable and too much to
> reasonably ask, I'd say.

Although I am currently way too much of a novice in the language, that might
actually be something I would try - even if just to show myself that it
isn't a good idea.

> Simple is good, unless simple doesn't work! :)

That's something I know as "KISS" ("Keep It Stupidly Simple" / "Keep It
Simple, Stupid") and I quite agree with.

> Looking at the std::string class, that addresses the "initialized by the
> user but changed by a routine" bit of the problem, which I'll suggest is
> 99% of the benefit you're after. [Pass strings as std::string& ]

Using the "document the function!" method you mentioned earlier I could also
tell the user that only dynamic strings are permitted as input.

> It could be addressed with the approach of embellished pointers.

I'm sorry, but currently I have absolutely /no idea/ what those are.

> If you're more interested in C APIs, using raw pointers etc., then the
> whole topic of responsibilities for managing memory between callers and
> called functions IN GENERAL is quite subtle. DCE's Remote Procedure Call
> (RPC) framework has to solve this issue because the calling and called
> routines typically are in different address spaces with no shared memory,
> and it's far from trivial!

You mean marshalling ? I had to do that in Windows (using my preferred
language, Assembly) when I tried to retrieve some data from a control on a
dialog that was part of another process - which included having to "remote
allocate" some memory to copy to/from.

> The problem is that C/C++ language in itself does not sufficiently
> describe pointer use to make this possible.

As above, that was what I needed to make sure of - before I started to try
to create all kinds of complex-y solutions that would not be needed.

> E.g. a pointer could be a REF pointer to a single struct, or to an array,
> and structs can chain to other structs, possibly involving loops of
> pointers, or at least having multiple pointers to a single struct.

Hey ! I was trying to keep it simple ! :-)

But yes, I was thinking in the same direction. In my case I wanted to
provide and/or return the string as part of a structure. The structure
could be dynamic too, meaning that before the structure would be destroyed
all of the strings in it need to be destroyed first.

> But if you just want to provide one C-style API for one function with a
> modifiable IN/OUT parameter, just do something like:
>
> int AmendString (/*INOUT*/ char** ppString);

Ehhrrmm... I didn't even realize that I needed that notation. I skipped
that specific notation by having the string pointer "hidden" in a structure.

> and clearly document the caller's responsibility, including how the string memory
> must initially be allocated by the caller, and how it must eventually be
> freed upon return.

Agreed.

The only problem is that my user is rather stupid. I'm not sure if I, at
some time in the future, will be able to use the function I wrote myself
without fouling up and wondering whatever I was thinking of when I wrote it.
:-)

Thanks for the explanation.

Regards,
Rudy Wieser


R.Wieser

Jan 22, 2023, 6:28:00 AM
"Richard Damon" <Ric...@Damon-Family.org> wrote in message
news:bSYyL.44734$%os8....@fx03.iad...
> On 1/21/23 2:09 PM, R.Wieser wrote:

> So the "string pointer" type you are thinking of needs to be told, and
> remember the answer to the question, "Does this memory need to be freed?"

It's unclear to me which "string pointer" you are referring to there : the
result of the strdup() function, or the "char* message" variable it is
stored in.

But yes, that is pretty much what I was asking about - with the "being told"
part done by the programming language. With me just checking what it was
told.

Regards,
Rudy Wieser


R.Wieser

Jan 22, 2023, 6:28:00 AM
"Keith Thompson" <Keith.S.T...@gmail.com> wrote in message
news:87a62bi...@nosuchdomain.example.com...

> A *pointer to a string* is by definition "a pointer to its initial
> (lowest addressed) character".

To disambiguate :

Assuming that a "pointer to a string" is something a function like strdup()
returns, what do you call the (type of) variable that such a result is
stored in ?

... it often confuses the h*ll out of me when the word "stringpointer" is
used for both. :-\


Regards,
Rudy Wieser


Bo Persson

Jan 22, 2023, 6:30:08 AM
On 2023-01-22 at 12:17, jak wrote:
> On 22/01/2023 11:20, James Kuyper wrote:
>> On 1/22/23 04:46, jak wrote:
>>> On 22/01/2023 01:54, Keith Thompson wrote:
>> ...
>>>> A *pointer to a string* is by definition "a pointer to its initial
>>>> (lowest addressed) character". Most manipulation of strings in C is
>>>> done via string pointers.
>> ...
>>> pointer to a string? by definition? string pointers? Maybe you are
>>> confusing programming languages. The C language has no string pointers.
>>
>> When he said "by definition", he meant it:
>>
>> "A _pointer to a string_ is a pointer to its initial (lowest addressed)
>> character." (C standard (n2731) 7.1.1p1).
>>
>
> "pointer to a string" could be a pointer at the first element of an
> array that contains a string, perhaps.

No, not "perhaps", but definitely. The term "pointer to a string" IS a
pointer to the first element of an array that contains a string. That's
what it says.

There are other pointers that can point to a char that is not part of a
string. Those are then not "pointer to a string".


Öö Tiib

Jan 22, 2023, 6:42:22 AM
A pointer to a string is a pointer to a character (constrained in the
way described). So the variable has to be of type pointer to character.

jak

Jan 22, 2023, 7:23:55 AM
The ISO standard can drop that definition, because C does not have
strings, only an agreement that allows functions to use arrays as
strings.

Paavo Helde

Jan 22, 2023, 7:31:22 AM
22.01.2023 12:56 R.Wieser wrote:
> "Mike Terry" <news.dead.p...@darjeeling.plus.com> wrote in message
> news:OtycnbSNHJxJ_1H-...@brightview.co.uk...

>> You could analyse the pointer value and try to work out what region of
>> memory it belongs to, but that will be non-portable and too much to
>> reasonably ask, I'd say.
>
> Although I am currently way too much of a novice in the language, that might
> actually be something I would try - even if just to show myself that it
> isn't a good idea.

This is basically impossible. Yes, maybe you can tell on some platform
if a string pointer points to some address in the heap, stack or a
global static area. Alas, even if it points to heap, you still won't
know if you may or must call free() on the pointer. The string might be
allocated globally and meant to be in shared use for longer, or the
string might be a part of another data structure which your code doesn't
even know how to deallocate.
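
A small sketch of that last case, using a made-up Record struct. The name pointer lies in the heap, yet it is not something free() may ever be called on.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical record with the string stored inside it.
struct Record {
    int  id;
    char name[32];
};

// Returns a heap-allocated record, or nullptr on failure.
// The caller releases it with free().
Record* make_record(int id, const char* name)
{
    Record* r = static_cast<Record*>(std::malloc(sizeof(Record)));
    if (r != nullptr) {
        r->id = id;
        std::strncpy(r->name, name, sizeof r->name - 1);
        r->name[sizeof r->name - 1] = '\0';   // strncpy may not terminate
    }
    return r;
}

void demo_record()
{
    Record* r = make_record(1, "hello world");
    assert(r != nullptr);
    char* p = r->name;      // a heap address, yet free(p) would be undefined
    assert(std::strcmp(p, "hello world") == 0);
    std::free(r);           // only the pointer malloc() returned may be freed
}
```

Knowing which memory region p falls in would not help here: only the code that created the record knows how to release it.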

So it's not just a bad idea, but impossible, unless you want to severely
cripple the ways in which strings may be used in your program.

>
>> Simple is good, unless simple doesn't work! :)
>
> That's something I know as "KISS" ("Keep It Stupidly Simple" / "Keep It
> Simple, Stupid") and I quite agree with.

If you want to keep it simple, then just use C++ std::string, you can't
get simpler than that, unless switching over to Python(*) or something.

(*) Some might claim Python strings are actually more complicated than
in C++ because of codepage/unicode nuances, Python 2 / Python 3
differences, and hidden performance gotchas, e.g. the computational
complexity of the string += operation is worse in Python than in C++.




Paavo Helde

Jan 22, 2023, 7:54:07 AM
The ISO standard defines what C is or is not. It's not up to a random
internet commenter.

Of course the ISO standard committee can change the wording in the next
version. If they do this and drop the definition of the term "pointer to
a string" from the next C standard, then C will not have such a thing any
more. But until then it does.


David Brown

Jan 22, 2023, 9:31:59 AM
Exactly.

A "string" is "a contiguous sequence of characters terminated by and
including the first null character", as defined by the C standards and
therefore by C (for those that don't understand how the language is
defined).

So a "pointer to a string" is also a "pointer to character", though
pointers to characters do not necessarily point to strings. And since
there is no specific type for a "string" in C, you use "pointer to
character" types to hold "pointer to string" values.


R.Wieser

Jan 22, 2023, 9:57:51 AM
"Öö Tiib" <oot...@hot.ee> wrote in message
news:604d1f9c-51b9-4e7d...@googlegroups.com...
That's not what I meant. If an address which is pointing to a sequence of
characters is called a "pointer to a string" - normally referred to as "a
string pointer" - , what should the variable which stores that address be
called ?

Regards,
Rudy Wieser


R.Wieser

Jan 22, 2023, 9:57:51 AM
"Paavo Helde" <ees...@osa.pri.ee> wrote in message
news:tqjaaa$33tcp$1...@dont-email.me...
> 22.01.2023 12:56 R.Wieser wrote:
>> "Mike Terry" <news.dead.p...@darjeeling.plus.com> wrote in
>> message
>> news:OtycnbSNHJxJ_1H-...@brightview.co.uk...

>>> Simple is good, unless simple doesn't work! :)
>>
>> Thats something I know as "KISS" ("Keep It Stupidly Simple" / "Keep It
>> Simple, Stupid") and I quite agree with.
>
> If you want to keep it simple, then just use C++ std::string, you can't
> get simpler than that, unless switching over to Python(*) or something.

I'll keep that in mind. Heck, I might even go and take a look at it some
time. :-)

Regards,
Rudy Wieser


jak

Jan 22, 2023, 1:04:46 PM
The comments are feedback. Feedback helps. Faith sometimes kills.

jak

Jan 22, 2023, 1:16:40 PM
... I thought about what happens in the Middle East. Faith is faith and
the laws are laws but what is written is not always right.

Richard Damon

Jan 22, 2023, 1:24:38 PM
On 1/22/23 9:40 AM, R.Wieser wrote:
> "Öö Tiib" <oot...@hot.ee> wrote in message
A pointer to string (with type pointer to char).

Just like an int value (like 5) is the value of a given bit combination,
a variable that holds such a value is also called an int.

If you need to distinguish them, one is a value, and the other is a
variable (or object)

Richard Damon

Jan 22, 2023, 1:25:01 PM
Except that it DOES have strings, as defined in 7.1.1p1:

A string is a contiguous sequence of characters terminated by and
including the first null character.

Keith Thompson

Jan 22, 2023, 1:57:49 PM
Note that we're talking about C; C++, the topic of this newsgroup, is a
related but distinct language.

What C calls a "string" is very close to (probably identical to) what
C++ calls a "null-terminated byte string", or NTBS. The informal term
"C-style string" is common, to distinguish it from C++'s std::string.

In C, there is no string type (unless you define one yourself). The
language-defined term "string" is used to refer to a data format, not a
data type. An object of type "array of char" may or may not contain a
string (or multiple strings).
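
A quick illustration of "data format, not data type", using two made-up arrays: one holds two strings back to back, the other holds no string at all.

```cpp
#include <cassert>
#include <cstring>

// One array of char holding two strings back to back. The literal adds a
// final '\0' of its own, so the array has 8 elements.
const char two[] = "foo\0bar";

// An array of char that holds no string: there is no null terminator,
// so it must never be passed to strlen() and friends.
const char none[3] = {'a', 'b', 'c'};

void demo_arrays()
{
    assert(sizeof two == 8);
    assert(std::strlen(two) == 3);          // stops at the first '\0'
    const char* second = two + 4;           // pointer to the second string
    assert(std::strcmp(second, "bar") == 0);
    assert(sizeof none == 3);               // no room for a terminator
}
```

All three objects involve the same element type, char; only the contents decide whether a string is present.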

The C term "pointer to a string" is not analogous to, for example,
"pointer to an int". It is a language-defined phrase because the term
"pointer to a string" either would not make sense or would have a
different meaning in the absence of that definition. A pointer to a
string points to the initial (0th) element of the string, and can be
used via pointer arithmetic, indexing, and calls to library functions to
manipulate the string.

An object containing a *pointer to a string* is normally of type char*
or const char*.

Everything that can be done in C using C-style strings and C-style
string pointers can be done the same way in C++, but it's almost
always easier and safer to use std::string.

To answer your original question, given a char* value (say, a function
parameter), neither C nor C++ gives you a way to determine how the
memory it points to was allocated, and therefore how and whether it
should be deallocated. If you want to pass pointers around and let the
called function decide whether and how to deallocate it, you'll have to
pass that information somehow, probably as another argument or by
wrapping the pointer in a structure or class. Note also that memory
allocated by malloc() should be deallocated by calling free(), and
memory allocated by new should be deallocated by delete; mixing the two,
IIRC, has undefined behavior.
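
A sketch of the "pass that information as another argument" option. The Origin enum and release() are hypothetical names; the tag has to be supplied by whoever created the pointer, since nothing in the pointer itself records it.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical tag describing where a char* came from.
enum class Origin { Static, Malloc, NewArray };

// Releases p the way its origin requires.
// Returns true if memory was actually released.
bool release(char* p, Origin origin)
{
    switch (origin) {
    case Origin::Malloc:
        std::free(p);       // malloc() pairs with free()
        return true;
    case Origin::NewArray:
        delete[] p;         // new[] pairs with delete[]
        return true;
    case Origin::Static:
        break;              // nothing to do for static storage
    }
    return false;
}

void demo_release()
{
    static char fixed[] = "hello world";
    char* a = static_cast<char*>(std::malloc(sizeof fixed));
    char* b = new char[sizeof fixed];
    assert(a != nullptr);
    std::strcpy(a, fixed);
    std::strcpy(b, fixed);

    const bool freed_a     = release(a, Origin::Malloc);
    const bool freed_b     = release(b, Origin::NewArray);
    const bool freed_fixed = release(fixed, Origin::Static);
    assert(freed_a && freed_b && !freed_fixed);
}
```

If the caller passes the wrong tag, the behavior is as undefined as before; the tag only moves the bookkeeping into one place.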

But again, std::string takes care of all of this for you.

Keith Thompson

Jan 22, 2023, 2:01:06 PM
That is both off-topic and incorrect.

C does not have a string type. C certainly does have strings. The
definition is quoted above.

The C standard makes extensive use of the term "string", particularly in
the library section. The meaning is clear and unambiguous. Removing
the definition from the language would require a great deal of work to
update the standard, and would have no benefit.

Malcolm McLean

Jan 22, 2023, 2:46:16 PM
The snag is that C has no way of resolving the problem that a string has
an unknown number of bytes.
So
char *stringpointer;
*stringpointer = "foo";

Won't work. Unlike a pointer to a numerical type.

In C++, the std::string manages memory internally, so you can have this convenience.
The cost is that you lose control of the run time operations used to manage the
memory.
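
To make the contrast concrete, a small sketch (the function name demo_assign is invented): in the C style the destination must already have room for the bytes, while std::string resizes its own storage on assignment.

```cpp
#include <cassert>
#include <cstring>
#include <string>

void demo_assign()
{
    // C style: copy the bytes into storage that is known to be big enough.
    char buffer[16];
    std::strcpy(buffer, "foo");     // 4 bytes including '\0' fit in 16

    // C++ style: the string grows or shrinks as needed.
    std::string s;
    s = "foo";
    s = "a considerably longer string than before";

    assert(std::strcmp(buffer, "foo") == 0);
    assert(s.size() > sizeof buffer);
}
```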

R.Wieser

Jan 22, 2023, 3:24:48 PM
"Richard Damon" <Ric...@Damon-Family.org> wrote in message
news:sbfzL.383809$iU59....@fx14.iad...
> On 1/22/23 9:40 AM, R.Wieser wrote:

>> If an address which is pointing to a sequence of characters is called a
>> "pointer to a string" - normally referred to as "a string pointer" - ,
>> what should the variable which stores that addres be called ?
>
> A pointer to string (with type pointer to char).

Yep. Both referred to with the exact same name. And that confuses the h*ll
out of someone who has to listen to someone talking about two different
things, but uses the same word for both. :-(

Why do you think I asked ?

> If you need to distinguish them, one is a value, and the other is a
> variable

You know that, I know that. But listening to people talking about both
using the same "string pointer" name I have to wonder if they do ...

Regards,
Rudy Wieser


R.Wieser

Jan 22, 2023, 3:24:48 PM

"Keith Thompson" <Keith.S.T...@gmail.com> wrote in message
news:875ycyi...@nosuchdomain.example.com...
> "R.Wieser" <add...@not.available> writes:

> To answer your original question, given a char* value (say, a
> function parameter), neither C nor C++ gives you a way to determine
> how the memory it points to was allocated, and therefore how and
> whether it should be deallocated.

I already assumed as much, but as I'm a novice in regard to the language I
had to make sure.

Who knows, maybe what we all have been referring to as "a pointer to a
string" is actually an object carrying a number of attributes and a
destructor too. :-)

Regards,
Rudy Wieser


James Kuyper

Jan 22, 2023, 3:28:21 PM
On 1/22/23 06:17, jak wrote:
> On 22/01/2023 11:20, James Kuyper wrote:
...
>> When he said "by definition", he meant it:
>>
>> "A _pointer to a string_ is a pointer to its initial (lowest addressed)
>> character." (C standard (n2731) 7.1.1p1).
>>
>
> "pointer to a string" could be a pointer at the first element of an
> array that contains a string, perhaps. You argue a lot about the
> definitions contained in the "ISO", perhaps the writers should be
> convinced to use greater precision.

ISO is the International Organization for Standardization. They are responsible
for defining the C language. Terms defined in that standard are
specialized jargon whose meaning, in the context of C, might differ from
what you'd expect if you made the mistake of parsing them as ordinary
English words - that is the purpose of defining such jargon, to provide
more precise and slightly different definitions than ordinary English
might provide. In the context of the C standard, those definitions are
authoritative.

In that context, how do you think that definition fails to be
sufficiently precise? Can you give an example of a case where you might
imagine that it's ambiguous whether or not the definition applies?

James Kuyper

Jan 22, 2023, 3:40:07 PM
On 1/22/23 07:23, jak wrote:
> Il 22/01/2023 12:29, Bo Persson ha scritto:
...
>> No, not "perhaps", but definitely. The term "pointer to a string" IS
>> a pointer to the first element of an array that contains a string.
>> That's what it says.
>>
>> There are other pointers that can point to a char that is not part of
>> a string. Those are then not "pointer to a string".
>>
>>
>
> ISO standard can cancel that definition because C does not have
> strings but only an agreement that allows functions to use arrays as
> strings.

C has strings, because the C standard provides a definition of what a C
string is. It's not a data type as you might expect from using other
languages. In C, it's only a data storage format recognized by many C
standard library functions, and as such defining it is the very first
sentence in the section defining the C standard library:

"A string is a contiguous sequence of characters terminated by and
including the first null character." (7.1.1p1)

The previously referenced definition of a "pointer to a string" occurs
shortly thereafter.

James Kuyper

Jan 22, 2023, 3:48:06 PM
C doesn't have a string type, just a data storage format for strings
that is recognized by many standard library functions. Since there is no
string type, there's no specific pointer type that is used for storing
such pointers. Any pointer type might point at the first character of a
string, but the most useful types for such pointers are those that can
be passed to, or used to store the value returned from, those library
functions without requiring an explicit conversion. Those functions
generally take arguments and/or return values of char* or const char*
types. I don't remember if there are any that use unsigned char rather
than char, but it would be possible. Other useful types would be [const]
void*, which in C (unlike C++) allows implicit conversions to and from
[const] char*.

Keith Thompson

Jan 22, 2023, 4:08:07 PM
If I understand you correctly, the two things you're referring to are
the pointer *value* and an *object* of pointer type that holds that
value.

There's nothing special about pointers to strings here. The same
confusion occurs with, for example, a pointer to an int (which is
convenient, because we can discuss this without dragging C into the
discussion).

int n = 42;
int* ptr = &n;

The result of evaluating the expression `&n`, or the expression
`ptr`, is a value of type `int*`. We commonly refer to this value as
"a pointer" (or "an address").

The object named `ptr` is an object (variable) of type `int*`.
We also commonly refer to this object as "a pointer" (though not as
"an address").

Similarly we can refer either to `42` or to the object `n` as
"an integer" or "an int".

Usually this isn't a problem. In most contexts, the distinction
between a value of some type and an object/variable that holds a
value of some type either isn't important or is sufficiently clear
from the context.

For cases where the ambiguity is important, I find it useful
to think of the word "pointer" (as well as "array", "integer",
etc.) as an adjective rather than a noun. Thus we can refer to a
*pointer value*, or a *pointer object*, or a *pointer type*, or a
*pointer expression*, all of which are clearly distinct concepts.
I'll note that the C and C++ standards often do not use this level
of precision (and in most cases they probably don't need to).

<OT>
For C's definition of a "pointer to a string", I'd say it refers
to a *value* of pointer type. An object of type char* might have a
current value that is a pointer to a string, but then after a value
is assigned to it it might not. The state of being a "pointer to
a string" applies to the value, not to the object that currently
happens to hold such a value.

I don't recall seeing a use of the term "pointer to a string" that
was confusing because it could refer to either a value or an object.
</OT>

Juha Nieminen

Jan 23, 2023, 1:34:20 AM
Paavo Helde <ees...@osa.pri.ee> wrote:
> This is C++, so just return a std::string from your function, and forget
> everything about C-style strings, malloc, free and strdup. Good riddance!

When it comes to "C-style strings", not really. After all, they are
essentially just syntactic sugar over arrays of char, and thus have
all the advantages of such arrays.

C++ doesn't really add a "better" equivalent to C-style strings which
would have all the efficiency benefits of them. std::string_view gets
close, and in some cases it's actually the better choice, but it's not
exactly the same thing (for starters, it's larger than the size of a
pointer which, while in many cases that's inconsequential, sometimes
it can be a deal breaker).

And rather obviously std::string cannot completely replace C-style
strings. While safetywise and featurewise it's mostly superior (save
for a few exceptional cases), its problem is that it's always
dynamically allocated and doesn't support statically allocated
strings (which is the very reason why std::string_view was created,
to support that very thing).

In most cases avoiding the dynamic allocation may be completely
unnecessary and without any benefit. However, in the situations
where avoiding it does introduce a significant benefit it's good
to have the alternative. Especially if all you need to handle the
data is a pointer (and thus std::string_view would be overkill).
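
The trade-off described above can be sketched roughly as follows, assuming C++17. The function names are hypothetical and only illustrate the two return styles: a non-owning std::string_view over a literal (no allocation, nothing to free) versus an owning std::string (which cleans up after itself).

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper: returns a view of a string literal. The literal has
// static storage duration, so nothing is allocated and nothing must be freed.
constexpr std::string_view static_greeting() noexcept {
    return "hello world";
}

// Hypothetical helper: returns an owning copy. Any memory it acquires is
// released by the std::string destructor, never by the caller.
inline std::string owned_greeting() {
    return std::string(static_greeting());
}
```

A std::string_view usually holds a pointer plus a length, so on common implementations sizeof(std::string_view) is about twice sizeof(const char*); the standard does not promise an exact figure.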

Juha Nieminen

Jan 23, 2023, 1:40:28 AM
Mike Terry <news.dead.p...@darjeeling.plus.com> wrote:
> C++ functions could return some kind of embellished pointer - a class object including the pointer
> and also the capability within the class of freeing the referenced resources in various ways
> depending on how the resource was created. The embellishment would increase the memory required for
> a 'pointer', so this might not be a good idea e.g. if your app might have large arrays of such
> pointers. Basically, if there were a /simple/ answer to your original question, there would be no
> interesting debate, but there's not - so you need to carefully balance costs/benefits. Simple is
> good, unless simple doesn't work! :)

Instead of putting the meta information about the string in the pointer type,
it could be put in the string itself.

Make the first char of the string contain any information you need about the
string (up to 8 bits of it available), and have the actual string contents
start after that, and make the "pointer" point to that second byte. When
destroying the string, make it check the actual first byte to see if it
has to free it or not.

This way the pointer itself can be the size of a pointer, and the metadata
only takes 1 byte.
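
A minimal sketch of that one-byte-header idea might look like this. All names (the tag constants, TAGGED_LITERAL, tagged_dup, tagged_is_heap, tagged_free) are made up for illustration. The scheme only works if every string handed around was created by one of these helpers; reading s[-1] on any other pointer is undefined behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical tag values stored in the byte just before the text.
constexpr char kStaticTag = 0;
constexpr char kHeapTag = 1;

// A tagged literal: byte 0 holds kStaticTag ('\0'), the text starts at byte 1.
// Adjacent string literals are concatenated, then we step past the tag byte.
#define TAGGED_LITERAL(text) (("\0" text) + 1)

// Copies `text` to the heap behind a kHeapTag byte and returns a pointer to
// the text itself (one past the tag). Returns nullptr if malloc fails.
inline const char* tagged_dup(const char* text) {
    const std::size_t len = std::strlen(text);
    char* block = static_cast<char*>(std::malloc(len + 2));  // tag + text + '\0'
    if (block == nullptr) return nullptr;
    block[0] = kHeapTag;
    std::memcpy(block + 1, text, len + 1);  // includes the terminator
    return block + 1;
}

// True if the string was produced by tagged_dup().
inline bool tagged_is_heap(const char* s) {
    return s[-1] == kHeapTag;
}

// Frees the string only if its tag says it lives on the heap.
inline void tagged_free(const char* s) {
    if (s != nullptr && tagged_is_heap(s)) {
        std::free(const_cast<char*>(s) - 1);  // step back to the malloc'ed block
    }
}
```

The caller can then pass either kind of pointer to tagged_free() without knowing where it came from, at the cost of one extra byte per string.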

R.Wieser

Jan 23, 2023, 2:34:17 AM
"Keith Thompson" <Keith.S.T...@gmail.com> wrote in message
news:87wn5eg...@nosuchdomain.example.com...

> If I understand you correctly, the two things you're referring to are
> the pointer *value* and an *object* of pointer type that holds that
> value.

I do not call it an object, to me it's just a variable. Anything beyond that
(an object which might contain other properties as well as methods) is (way)
outside of my scope.

> There's nothing special about pointers to strings here.
> The same confusion occurs with, for example, a pointer to an int

Indeed. But if there is one thing I've learned from Usenet it is that you
need to keep the context to a question simple. Referring to all the other
addresses and what they point to would just be "muddying the water".

> Usually this isn't a problem. In most contexts, the distinction
> between a value of some type and an object/variable that holds a
> value of some type either *isn't important or is sufficiently clear
> from the context*.

Agreed.

But that's the crux : sometimes I get the distinct feeling that someone is
talking about the address (pointing to a string or otherwise), but then
suddenly seems to talk about the variable holding it. :-\

> For cases where the ambiguity is important, I find it useful
> to think of the word "pointer" (as well as "array", "integer",
> etc.) as an adjective rather than a noun. Thus we can refer to a
> *pointer value*, or a *pointer object*, or a *pointer type*, or a
> *pointer expression*, all of which are clearly distinct concepts.

I think I can translate "pointer value" as to be meaning an address. For
the others ? I do not have the foggiest I'm afraid.

> <OT>

Whoooo.... Is that *on*, or *off* topic ? Not that it matters much which
one though. :-)

> For C's definition of a "pointer to a string", I'd say it refers
> to a *value* of pointer type.

AFAIK most people do not use that phrase but use the (shorthand) "string
pointer" (with or without the space) instead. But yes, I would also.

Though that's the whole problem to me : most people seem to use that "string
pointer" phrase for the address (the "value of pointer type") *as well as*
the variable (or worse : an object) it's stored in.


But I think I am going to stop asking. It looks like the distinction
between the address and what it's stored in isn't of much, if any, importance
to the people here.

Oh well, you can't get /everything/ answered. :-)

Regards,
Rudy Wieser


R.Wieser

Jan 23, 2023, 2:34:17 AM

"James Kuyper" <james...@alumni.caltech.edu> wrote in message
news:tqk7dn$386s5$1...@dont-email.me...
> On 1/22/23 06:27, R.Wieser wrote:

>> Assuming that a "pointer to a string" is something a function like
>> strdup()
>> returns, what do you call the (type of) variable that such a result is
>> stored in ?

> C doesn't have a string type, just a data storage format for strings
> that is recognized by many standard library functions.

The famous "C string", non terminated but instead with its length stored
before the first character.

> Since there is no string type, there's no specific pointer type that is
> used for storing such pointers.

That's not what I'm trying to get at.

You have an address and a variable you store that address in. Which names
do you use for both ?

I almost always see them referred to with the same phrase : "a string
pointer". Which is of course ambiguous. That doesn't matter much in most
circumstances (which one of the two it actually is can be gleaned from the
context), but it does in some others.

Regards,
Rudy Wieser


David Brown

Jan 23, 2023, 3:47:01 AM
On 22/01/2023 19:16, jak wrote:
> Il 22/01/2023 19:04, jak ha scritto:
>> Il 22/01/2023 13:53, Paavo Helde ha scritto:
>>> 22.01.2023 14:23 jak kirjutas:
>>>> Il 22/01/2023 12:29, Bo Persson ha scritto:
>>>>> No, not "perhaps", but definitely. The term "pointer to a string"
>>>>> IS a pointer to the first element of an array that contains a
>>>>> string. That's what it says.
>>>>>
>>>>> There are other pointers that can point to a char that is not part
>>>>> of a string. Those are then not "pointer to a string".
>>>>>
>>>>>
>>>>
>>>> ISO standard can cancel that definition because C does not have
>>>> strings but only an agreement that allows functions to use arrays as
>>>> strings.
>>>
>>> ISO standard defines what C is or is not. It's not up to a random
>>> internet commenter.
>>>
>>> Of course ISO standard committee can change the wording in the next
>>> version. If they do this and drop the definition of the term "pointer
>>> to a string" from the next C standard, then C will not have such
>>> thing any more. But until then it does.
>>>
>>>
>> The comments are feedback. Feedback helps. Faith sometimes kills.
>>

<snip>

(This is a technical language group - please don't bring any kind of
political or religious issues into it, even if you think they are
analogous or illustrative. People get too worked up.)


You can comment all you like on the C and C++ standards, but those are
what /define/ the languages. Implementations follow those standards -
often with extensions, variations and small non-conformities, but still
basically following the standards. Books, courses, and tutorials (at
least those of any quality) primarily follow the terminology and
definitions from the standards.

The standards are what give us our common language. They let someone
call themselves a "C programmer", and let him or her write code that
will compile on a "C compiler". They let people write a new C compiler
for a new processor, and know people can use existing C code on it.
They let people have technical language discussions in a group like this
and know what each other is talking about.

So when someone talks about "C strings" or "strings in C", we all know
what is meant - the standards tell us. When someone says "this C
function takes a string pointer", we know what it means. It doesn't
matter that a "C string" is different from a "C++ std::string" or a
"BASIC string" or a "piece of string", as long as the context is clear.


(C++ standards are a bit more complicated than C standards, since there
are bigger differences between the versions, but the same principles apply.)



James Kuyper

Jan 23, 2023, 4:01:09 AM
On 1/23/23 02:33, R.Wieser wrote:
> "Keith Thompson" <Keith.S.T...@gmail.com> wrote in message
> news:87wn5eg...@nosuchdomain.example.com...
>
>> If I understand you correctly, the two things you're referring to are
>> the pointer *value* and an *object* of pointer type that holds that
>> value.
>
> I do not call it an object, to me it's just a variable. Anything beyond
> that (an object which might contain other properties as well as
> methods) is (way) outside of my scope.

The term "object" has a well-defined meaning in C, but since it is not
an object-oriented language, that definition carries a lot less baggage
than you're thinking of:

"region of data storage in the execution environment, the contents of
which can represent values" (3.15p2).

The C standard does not define the noun "variable", a fact that is
difficult to confirm because it makes extensive use of the adjective
"variable". Also, it makes extensive use of the ISO convention of
defining some terms by italicizing those terms inside of a sentence
whose contents constitute the official definition of those terms, so in
principle you have to search every use of the word "variable" in the
entire standard. In practice, however, the index usually notes the
location where a term is defined, if it is defined.

C++ does have a definition of the noun "variable". If you strip that
definition of everything specific to C++, it boils down to "named
object". As far as I've been able to tell, that definition fits every
use of the noun "variable" in the C standard. Therefore, I assume that's
what the term means in the C standard, and I recommend that you do the same.

For example, if you declare "struct tm *start_times;" and assign it to
point at a dynamically allocated array of at least six elements, then
start_times, the array that it points at, start_times[5], and
start_times[5].tm_year are all considered to be objects, but only
start_times and tm_year are variables, because they are the only objects
in that list that have names. The array that start_times points at is an
unnamed object, and the last two are sub-objects of that object.

Therefore, every variable is also an object.

>> For cases where the ambiguity is important, I find it useful
>> to think of the word "pointer" (as well as "array", "integer",
>> etc.) as an adjective rather than a noun. Thus we can refer to a
>> *pointer value*, or a *pointer object*, or a *pointer type*, or a
>> *pointer expression*, all of which are clearly distinct concepts.
>
> I think I can translate "pointer value" as to be meaning an address.
> For the others ? I do not have the foggiest I'm afraid.

A pointer object is a region of memory used to store a pointer value.

A pointer type is the type of a pointer value, including in particular
the pointer value that might be stored in a pointer object.

"An expression is a sequence of operators and operands that specifies
computation of a value, or that designates an object or a function,
or that generates side effects, or that performs a combination thereof."
(C standard, 6.5p1).

Since operators and operands are parts of the text of a C program,
expressions are things that exist only in the source code, not in the
compiled program. Translation of source code into an executable results
in the generation of code that does what the expression specifies should
be done. A pointer expression is an expression that specifies
computation of a pointer value, or designates a pointer object.

This whole issue is especially difficult for a newbie because it's often
perfectly clear, to an experienced user of C, whether "a pointer" refers
to a pointer value, a pointer object, a pointer type, or a pointer
expression. As a result, all of those terms are frequently shortened to
"pointer", unless the context is ambiguous. This is true even within the
standard itself. I'm sorry, but that's the way it is. I can't do
anything about it even if I didn't find it convenient to do the same.
All I can do is assure you that it will become clearer as you get more
familiar with the language.

James Kuyper

Jan 23, 2023, 4:13:20 AM
On 1/23/23 02:04, R.Wieser wrote:
> "James Kuyper" <james...@alumni.caltech.edu> wrote in message
> news:tqk7dn$386s5$1...@dont-email.me...
...
>> C doesn't have a string type, just a data storage format for strings
>> that is recognized by many standard library functions.
>
> The famous "C string", non terminated but instead with its length stored
> before the first character.

That sentence no verb. I point that out because the fact that it has no
verb is part of the reason why I'm not sure what you meant by it. The
term "string", as defined by the C standard, is by definition
null-terminated, and does not have a stored length.

>> Since there is no string type, there's no specific pointer type that is
>> used for storing such pointers.
>
> That's not what I'm trying to get at.
>
> You have an address and a variable you store that address in. Which names
> do you use for both ?
>
> I almost always see them referred to with the same phrase : "a string
> pointer". Which is of course ambiguous. That doesn't matter much in most
> circumstances (which one of the two it actually is can be gleaned from the
> context), but it does in some others.

Any given use might seem ambiguous to a newbie like yourself, but the
reason why it is commonplace to use "string pointer" without following
it with "value", "object", "type", or "expression" is precisely because
it's usually clear from context (at least to those with more experience
with the language) which of those is being referred to. When it would
otherwise be ambiguous, people usually (but unfortunately, not always)
do add the additional word that's needed to make it clear.

David Brown

Jan 23, 2023, 4:24:37 AM
On 22/01/2023 20:46, Malcolm McLean wrote:
> On Sunday, 22 January 2023 at 18:24:38 UTC, Richard Damon wrote:
>> On 1/22/23 9:40 AM, R.Wieser wrote:
>>> "Öö Tiib" <oot...@hot.ee> wrote in message
>>> news:604d1f9c-51b9-4e7d...@googlegroups.com...
>>>> On Sunday, 22 January 2023 at 13:28:00 UTC+2, R.Wieser wrote:
>>>
>>>>> ... it often confuses the h*ll out of me when the word "stringpointer" is
>>>>> used for both. :-\
>>>>
>>>> A pointer to a string is (constrained in described way) pointer to a
>>>> character. So variable has to be of type pointer to a character.
>>>
>>> That's not what I meant. If an address which is pointing to a sequence of
>>> characters is called a "pointer to a string" - normally referred to as "a
>>> string pointer" - , what should the variable which stores that address be
>>> called ?
>>>
>>> Regards,
>>> Rudy Wieser
>>>
>>>
>> A pointer to string (with type pointer to char).
>>
>> Just like an int value (like 5) is the value of a given bit combination,
>> a variable that holds such a value is also called an int.
>>
>> If you need to distinguish them, one is a value, and the other is a
>> variable (or object)
>>
> The snag is that C has no way of resolving the problem that a string has
> an unknown number of bytes.

That is a very clumsy way of expressing yourself. A string in C is a
/value/ - it is formed by a particular set of characters of a particular
length. What you are trying to say, I think, is that a /pointer/ to a
string does not encode the length of the string. And that is true - but
it is not a "snag" or a "problem". It is an unavoidable artefact of the
simple way C strings are implemented. And it is easily solved by
calling "strlen".
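
To illustrate the point about strlen: the pointer itself carries no length, so the length has to be recovered by scanning for the terminator. The helper name below is hypothetical; it simply wraps std::strlen.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: a char pointer stores only an address, so the length
// is found by walking forward until the terminating null character.
inline std::size_t length_of(const char* s) {
    return std::strlen(s);
}
```

Note that a pointer into the middle of a string is itself a pointer to a (shorter) string: for "hello world", advancing the pointer by 6 gives a string of length 5.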

> So
> char *stringpointer;
> *stringpointer = "foo";
>
> Won't work. Unlike a pointer to a numerical type.

Pointers to numerical types don't let you make type mistakes either.
"foo" is not a C string - it is a string literal. Neither a string
literal nor a C string is a specific type in C.

>
> In C++, the std::string manages memory internally, so you can have this convenience.

C++ has a real standard string type, rather than just a defined concept
as in C. And it has a lot of useful features - it is a higher level
concept than C strings.

> The cost is that you lose control of the run time operations used to manage the
> memory.
>

In C++, std::string is a convenience shortcut for a specific
instantiation of the std::basic_string template. If you want to have a
string type with specific control of memory management, you can
instantiate the template with your own choice of allocator type.
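
As a rough sketch of what that can look like (assuming C++17): std::basic_string takes the allocator as its third template parameter. CountingAllocator, g_allocations and CountedString are hypothetical names; the allocator only counts calls so that the string's allocations become observable.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

// Hypothetical global counter, used only to make allocations observable.
inline std::size_t g_allocations = 0;

// Minimal allocator: forwards to operator new/delete and counts allocations.
template <class T>
struct CountingAllocator {
    using value_type = T;

    CountingAllocator() = default;
    template <class U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        ++g_allocations;
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) noexcept {
        ::operator delete(p);
    }
};

// All instances are interchangeable, so they always compare equal.
template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept {
    return true;
}
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept {
    return false;
}

// Same character type and traits as std::string, but a different allocator.
using CountedString =
    std::basic_string<char, std::char_traits<char>, CountingAllocator<char>>;
```

CountedString is a distinct type from std::string, so the two do not convert into each other implicitly; that is part of the price of this approach.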


Mut...@dastardlyhq.com

Jan 23, 2023, 4:26:03 AM
On Sat, 21 Jan 2023 20:15:39 +0200
Paavo Helde <ees...@osa.pri.ee> wrote:
>21.01.2023 18:20 Mut...@dastardlyhq.com kirjutas:
>> On Sat, 21 Jan 2023 16:40:20 +0200
>> Paavo Helde <ees...@osa.pri.ee> wrote:
>>> 21.01.2023 16:28 R.Wieser kirjutas:
>>>> Hello all,
>>>>
>>>> I'm a rather newbie to C++ programming who is trying to figure out how to
>>>> deal with string pointers - or rather, with what they point at.
>>>>
>>>> Case in point :
>>>>
>>>> char* message = "hello world";
>>>> char* message = strdup("hello world");
>>>>
>>>> I can use either of those in a function and return the pointer, and the
>>>> caller will be none-the-wiser which form (the static or dynamic string)
>>>> it gets.
>>>
>>> This is C++, so just return a std::string from your function, and forget
>>> everything about C-style strings, malloc, free and strdup. Good riddance!
>>
>> Unless you want to parse the string which may involve chopping it about.
>> Then it's simpler to use a C string and pointers rather than mess about
>> with inefficient substr() etc.
>
>If you want to speed up substring processing, then there is
>std::string_view for you. As fast as plain pointers and much cleaner.

Not really. This is fast, simple and clear:

for(p=str;*p;)
{
    switch(*p)
    {
    case '[':
        p = parseList(p+1);
        break;
    case '{':
        p = parseBlock(p+1);
        break;
    case '(':
        p = parseArray(p+1);
        break;
    default:
        /* do something else */
        ++p;
    }
}

Not sure how using indexes and substrings would improve it.

Mut...@dastardlyhq.com

Jan 23, 2023, 4:30:21 AM
On Sat, 21 Jan 2023 19:30:48 +0100
Bonita Montero <Bonita....@gmail.com> wrote:
>Am 21.01.2023 um 17:20 schrieb Mut...@dastardlyhq.com:
>
>> Unless you want to parse the string which may involve chopping it about.
>> Then it's simpler to use a C string and pointers rather than mess about
>> with inefficient substr() etc.
>
>What's wrong with something like this ?
>
>string substr()
>{
> string hw( "hello world!" );
> hw.erase( hw.begin(), hw.begin() + 6 );
> hw.resize( 5 );
> return hw;
>}

char *hw = "hello world!";
:
return hw + 5;

David Brown

Jan 23, 2023, 4:33:43 AM
On 23/01/2023 08:04, R.Wieser wrote:
> "James Kuyper" <james...@alumni.caltech.edu> wrote in message
> news:tqk7dn$386s5$1...@dont-email.me...
>> On 1/22/23 06:27, R.Wieser wrote:
>
>>> Assuming that a "pointer to a string" is something a function like
>>> strdup()
>>> returns, what do you call the (type of) variable that such a result is
>>> stored in ?
>
>> C doesn't have a string type, just a data storage format for strings
>> that is recognized by many standard library functions.
>
> The famous "C string", non terminated but instead with its length stored
> before the first character.
>

C strings are, by definition, always terminated - if they are not
terminated, they are not strings. And any length indicator is not part
of the string.

What you are describing here sounds more like what is known as a
"Pascal-style string", because it is the common format for variable
length short strings in Pascal. (Typical Pascal implementations can
support other string formats too, including fixed length formats, "long"
strings with 32-bit length indicators at the start, C-style strings, and
maybe others.)
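
The difference between the two layouts can be sketched like this. The array and function names are hypothetical, and the length-prefixed layout shown is only the classic one-byte-length "short string" form.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// C-style: the characters followed by a terminating '\0'; no stored length.
const char c_style[] = "hello";  // 6 bytes: 'h','e','l','l','o','\0'

// Pascal-style short string (sketch): one length byte, then the characters,
// and no terminator.
const unsigned char pascal_style[] = {5, 'h', 'e', 'l', 'l', 'o'};

// Length of a C string: scan until the terminator is found.
inline std::size_t c_length(const char* s) {
    return std::strlen(s);
}

// Length of a length-prefixed string: read the first byte.
inline std::size_t pascal_length(const unsigned char* s) {
    return s[0];
}
```

Both arrays occupy six bytes here; they differ in where the bookkeeping byte sits and in how the length is obtained.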



Paavo Helde

Jan 23, 2023, 5:11:57 AM
This looks like it either assumes the string is syntactically correct
(all parens are balanced correctly) or the parsed string is terminated
by a null byte. So when parsing a substring of some larger text or
binary file I would need to copy it out and append a null byte, just to
be sure my error detection works properly.

When using string_view, you always know how long your to-be-processed
piece is, so you don't need to make pointless copies just to know where
to stop processing. With raw pointers, you would need to pass a separate
end pointer, which would make the interfaces more complex.
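
A small sketch of that argument, assuming C++17: because a std::string_view knows its own length, a helper can detect unbalanced input by running out of characters, without needing a null terminator. The function name inside_parens is made up for this example and is not taken from any parser in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper: if `text` starts with '(', returns the view between
// that '(' and its matching ')'. Returns nullopt if `text` does not start
// with '(' or if the input ends before the parentheses balance.
inline std::optional<std::string_view> inside_parens(std::string_view text) {
    if (text.empty() || text.front() != '(') return std::nullopt;
    std::size_t depth = 0;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '(') {
            ++depth;
        } else if (text[i] == ')') {
            // depth is at least 1 here because text[0] is '('
            if (--depth == 0) return text.substr(1, i - 1);
        }
    }
    return std::nullopt;  // ran out of input: unbalanced
}
```

The same function works on a slice of a larger buffer (for example a piece of a binary file) with no copying and no appended null byte, since the loop stops at text.size().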


Mut...@dastardlyhq.com

Jan 23, 2023, 5:15:17 AM
On Mon, 23 Jan 2023 12:11:40 +0200
Or the given function works up to the matching close bracket and returns
the pointer to that + 1. In a real parser this would be in a recursive function
because of nested blocks/lists etc.

>binary file I would need to copy it out and append a null byte, just to
>be sure my error detection works properly.
>
>When using string_view, you always know how long your to-be-processed
>piece is, so you don't need to make pointless copies just to know where

Your to-be-processed piece will be the entire program/text to be parsed at
the start.

>to stop processing. With raw pointers, you would need to pass a separate
>end pointer, which would make the interfaces more complex.

I think we can assume you've never written a parser and leave it at that.

Paavo Helde

Jan 23, 2023, 5:49:23 AM
23.01.2023 08:34 Juha Nieminen kirjutas:
> Paavo Helde <ees...@osa.pri.ee> wrote:
>> This is C++, so just return a std::string from your function, and forget
>> everything about C-style strings, malloc, free and strdup. Good riddance!
>
> When it comes to "C-style strings", not really. After all, they are
> essentially just syntactic sugar over arrays of char, and thus have
> all the advantages of such arrays.
>
> C++ doesn't really add a "better" equivalent to C-style strings which
> would have all the efficiency benefits of them. std::string_view gets
> close, and in some cases it's actually the better choice, but it's not
> exactly the same thing (for starters, it's larger than the size of a
> pointer which, while in many cases that's inconsequential, sometimes
> it can be a deal breaker).
>
> And rather obviously std::string cannot completely replace C-style
> strings. While safetywise and featurewise it's mostly superior (save
> for a few exceptional cases), its problem is that it's always
> dynamically allocated and doesn't support statically allocated
> strings (which is the very reason why std::string_view was created,
> to support that very thing).

Maybe you meant "dynamically initialized", not "dynamically allocated"?
AFAIK there is no guarantee a std::string would always involve dynamic
allocation, and with short strings it often does not.
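
One way to observe this on a given implementation is to check whether the character data lives inside the std::string object itself. This is only a diagnostic sketch: the helper name stored_inline is hypothetical, and the result for short strings depends entirely on the standard library in use.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical diagnostic: true if s.data() points into the bytes of the
// std::string object itself, which suggests a short-string buffer is in use.
// std::less / std::less_equal give a well-defined order for unrelated pointers.
inline bool stored_inline(const std::string& s) {
    const char* begin = reinterpret_cast<const char*>(&s);
    const char* end = begin + sizeof(s);
    return std::less_equal<const char*>{}(begin, s.data()) &&
           std::less<const char*>{}(s.data(), end);
}
```

For a string far longer than sizeof(std::string) the data cannot fit inside the object, so the function returns false; for something like "hi" many implementations return true, but nothing in the standard requires it.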

Dynamic initialization might indeed be a problem because in C++ the
order of global statics initialization is not very well determined, and
there is a danger to access the object before its creation or after its
destruction. But strictly speaking, this is a problem with dynamic
initialization, not dynamic allocation (although it might be the
allocation step which is failing).

>
> In most cases avoiding the dynamic allocation may be completely
> unnecessary and without any benefit. However, in the situations
> where avoiding it does introduce a significant benefit it's good
> to have the alternative. Especially if all you need to handle the
> data is a pointer (and thus std::string_view would be overkill).

It's true that when striving for max performance, using C-style strings
might sometimes have benefits. I myself have coded large global C-style
arrays containing C-style string constants, sorted by hand for faster
lookup at run time. However, the OP claims he is a novice in C++, aiming
for a simple solution, so he should not deal with such nuances before he
has got some years of experience under his belt.





Juha Nieminen

Jan 23, 2023, 6:52:01 AM
Paavo Helde <ees...@osa.pri.ee> wrote:
>> And rather obviously std::string cannot completely replace C-style
>> strings. While safetywise and featurewise it's mostly superior (save
>> for a few exceptional cases), its problem is that it's always
>> dynamically allocated and doesn't support statically allocated
>> strings (which is the very reason why std::string_view was created,
>> to support that very thing).
>
> Maybe you meant "dynamically initialized", not "dynamically allocated"?
> AFAIK there is no guarantee a std::string would always involve dynamic
> allocation, and with short strings it often does not.

Short string optimization is not guaranteed by the standard and, thus,
you cannot rely on it. Additionally, even when it is implemented, you
have no way of controlling how large the "short" string is (which would
actually be a nice feature; I believe Boost has "short" data optimizing
containers where you can actually determine the size of that "short data".
But alas, that's not standard.)

Even more additionally, if short string optimization is implemented,
you pay the price of having an additional conditional on each single
access and operation you do to the string, whether you want it or not
(there's no way to turn it off if you wanted to). Sure, in most
situations this additional conditional is inconsequential, but in
those situations where you need to squeeze the last clock cycle out
of your code you will be paying the price.

I would even go as far as saying that it would be better if
implementations *don't* use short string optimization, because
then you get a consistent guaranteed behavior without surprising
side effects (in terms of performance). A "short (whatever)
optimizing" data container should be its own thing (like in Boost).

> Dynamic initialization might indeed be a problem because in C++ the
> order of global statics initialization is not very well determined, and
> there is a danger to access the object before its creation or after its
> destruction. But strictly speaking, this is a problem with dynamic
> initialization, not dynamic allocation (although it might be the
> allocation step which is failing).

Dynamic memory allocation is slow (relatively speaking) and causes other
optimization problems such as memory fragmentation. In most situations
this is inconsequential but, as mentioned, when you need to squeeze
the last clock cycles out of the code...
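A minimal sketch of the std::string_view alternative mentioned earlier, assuming C++17. The name kGreeting is illustrative only. Because the view is constexpr and refers to a string literal, there is no allocation and no dynamic initialization to worry about.

```cpp
#include <string_view>

// A compile-time constant view of a string literal: no constructor has to
// run at program start-up and no memory is allocated.
constexpr std::string_view kGreeting = "hello world";

static_assert(kGreeting.size() == 11, "length is known at compile time");
static_assert(kGreeting.substr(6) == "world", "substr() yields another view");
```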

> It's true that when striving for max performance, using C-style strings
> might sometimes have benefits. I myself have coded large global C-style
> arrays containing C-style string constants, sorted by hand for faster
> lookup at run time. However, the OP claims he is novice in C++, aiming
> for a simple solution, so he should not deal with such nuances before he
> has got some years of experience under his belt.

Novice or not, I think it's not good advice to just say "forget about
C style strings, they are completely obsolete and not to be used".

Perhaps something more along the lines of: "It's better to just use
std::string in this case. It's much easier and much safer, and offers
a lot more features."

Malcolm McLean

unread,
Jan 23, 2023, 7:05:51 AM1/23/23
to
On Monday, 23 January 2023 at 11:52:01 UTC, Juha Nieminen wrote:
>
> Novice or not, I think it's not good advice to just say "forget about
> C style strings, they are completely obsolete and not to be used".
>
> Perhaps something more along the lines of: "It's better to just use
> std::string in this case. It's much easier and much safer, and offers
> a lot more features."
>
It's often easier to construct a string in a C style buffer, and easier to parse.

But std::strings can be copied and assigned. So it's much easier to pass
the strings around. Most strings in most programs are quite short, and
execution time and memory use isn't very relevant.
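As a small illustration of passing strings around by value (a sketch only; make_message is a made-up name), the caller below never needs to know whether the text came from a literal or was built at run time, and never has to free anything.

```cpp
#include <string>

// Returns the same text either straight from a literal or assembled at
// run time. Either way the caller receives a std::string that cleans up
// after itself.
std::string make_message(bool from_literal) {
    if (from_literal) {
        return "hello world";
    }
    std::string built = "hello";
    built += ' ';
    built += "world";
    return built;
}
```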

Paavo Helde

unread,
Jan 23, 2023, 7:14:41 AM1/23/23
to
This is relying on the often-not-so-guaranteed assumption the string is
syntactically correct and actually contains this matching close bracket
- that's what I wrote.

[...]

> I think we can assume you've never written a parser and leave it at that.

You can assume what you like. Meanwhile, I have written my part of
pointer-based parsers and now switching over to string_view.


jak

unread,
Jan 23, 2023, 7:22:53 AM1/23/23
to
hi David,
I have read your comments on these groups for a few years and I have an
excellent opinion about you but this time I have the impression that you
have read this branch of the thread with poor attention. Please read
more carefully the statement I replied.

Always with respect, Jak.

Mut...@dastardlyhq.com

unread,
Jan 23, 2023, 7:27:53 AM1/23/23
to
On Mon, 23 Jan 2023 14:14:26 +0200
Paavo Helde <ees...@osa.pri.ee> wrote:
>23.01.2023 12:14 Mut...@dastardlyhq.com kirjutas:
>> On Mon, 23 Jan 2023 12:11:40 +0200
>> Or the given function works up to the matching close bracket and returns
>> the pointer to that + 1.
>
>This is relying on the often-not-so-guaranteed assumption the string is
>syntactically correct and actually contains this matching close bracket
>- that's what I wrote.

If it's not correct the parser throws an error. That's what parsers do.

>> I think we can assume you've never written a parser and leave it at that.
>
>You can assume what you like. Meanwhile, I have written my part of
>pointer-based parsers and now switching over to string_view.

If you say so.

Another reason to use char* is that a lot of parsers will memory map a file
R/W and MAP_PRIVATE which gives you a char* pointing to the beginning of the
file and which you can manipulate as you see fit.

wij

unread,
Jan 23, 2023, 7:45:56 AM1/23/23
to
c-string or std::string are library (implementing) stuff, not really The language.
C/C++ just says any character sequence ending with 0 is a string.

struct AA {
char aa[33];
AA() { aa[32]=0; };
} aa;

AA* ptr=&aa; // (char*)ptr points to a string?
'string' is human question, not the CPU's problem.

For commercial programs, I would select QString (and QCString).
I also have my String, no string class is generally better.
So, in the end, it is still c-string. OS demands c-string.

Bonita Montero

unread,
Jan 23, 2023, 7:53:19 AM1/23/23
to
You're returning a newly created string object, thereby inducing
the issue of a second allocation that Mut...@dastardlyhq.com mentioned.
In this case this might not make a difference due to the short
string optimization; in others this might be a performance problem.
And you're returning the exclamation mark I also stripped.


David Brown

unread,
Jan 23, 2023, 8:42:53 AM1/23/23
to
On 23/01/2023 13:22, jak wrote:
> Il 23/01/2023 09:46, David Brown ha scritto:
>> On 22/01/2023 19:16, jak wrote:
>>> Il 22/01/2023 19:04, jak ha scritto:
>>>> Il 22/01/2023 13:53, Paavo Helde ha scritto:
>>>>> 22.01.2023 14:23 jak kirjutas:

<snip>

>>>>>>
>>>>>> ISO standard can cancel that definition because the C does not have
>>>>>> strings but only an agreement that allows the functions to use
>>>>>> array as
>>>>>> strings.
>>>>>

<snip>

> hi David,
> I have read your comments on these groups for a few years and I have an
> excellent opinion about you but this time I have the impression that you
> have read this branch of the thread with poor attention. Please read
> more carefully the statement I replied.
>
> Always with respect, Jak.
>

Then let me be sure we are discussing the same thing, and there are no
misunderstandings.

I've cut out all except the most relevant quotation from your posts in
this branch. You claimed that C does not have strings, and when given a
direct reference to the definition of strings in the C standard, you
claimed that the standards do not determine what the C language is, and
the standards should be changed to remove the definition of strings,
since in your opinion they do not exist in the language.

Is that correct?


I tried to explain that the C language standards /do/ define the
language. Do you still deny that? If so, how do /you/ think the
language is defined?

Have you now looked at the standards and read for yourself how "strings"
are defined in the standards? Have you looked at the other uses of the
word "string" in the standard, including "string literal", "pointer to a
string", and the "<string.h>" header and functions?


C's concept of a string is different from (and more low-level and
primitive than) that found in many other programming languages. But
that does not mean C does not have strings, defined in the standards and
as part of the language and standard library.

Malcolm McLean

unread,
Jan 23, 2023, 9:15:58 AM1/23/23
to
On Monday, 23 January 2023 at 13:42:53 UTC, David Brown wrote:
>
> C's concept of a string is different from (and more low-level and
> primitive than) that found in many other programming languages. But
> that does not mean C does not have strings, defined in the standards and
> as part of the language and standard library.
>
The language states that a text literal in double quotes produces a nul-terminated
string. I think that's the only place the C language itself defines a string. Otherwise
it is purely a standard library concept.

Scott Lurndal

unread,
Jan 23, 2023, 9:45:05 AM1/23/23
to
David Brown <david...@hesbynett.no> writes:
>On 22/01/2023 20:46, Malcolm McLean wrote:
>> On Sunday, 22 January 2023 at 18:24:38 UTC, Richard Damon wrote:

>> The snag is that C has no way of resolving the problem that a string has
>> an unknown number of bytes.
>
>That is a very clumsy way of expressing yourself. A string in C is a
>/value/ - it is formed by a particular set of characters of a particular
>length. What you are trying to say, I think, is that a /pointer/ to a
>string does not encode the length of the string. And that is true - but
>it is not a "snag" or a "problem". It is an unavoidable artefact of the
>simple way C strings are implemented. And it is easily solved by
>calling "strlen".

Easily solved, but performance for large strings is poor.

Scott Lurndal

unread,
Jan 23, 2023, 9:47:09 AM1/23/23
to
Bonita Montero <Bonita....@gmail.com> writes:
>Am 23.01.2023 um 10:30 schrieb Mut...@dastardlyhq.com:
>> On Sat, 21 Jan 2023 19:30:48 +0100
>> Bonita Montero <Bonita....@gmail.com> wrote:
>>> Am 21.01.2023 um 17:20 schrieb Mut...@dastardlyhq.com:
>>>
>>>> Unless you want to parse the string which may involve chopping it about. Then
>>>
>>>> its simpler to use a C string and pointers rather than mess about with
>>>> inefficient substr() etc.
>>>
>>> What's wrong with sth. like this ?
>>>
>>> string substr()
>>> {
>>> string hw( "hello world!" );
>>> hw.erase( hw.begin(), hw.begin() + 6 );
>>> hw.resize( 5 );
>>> return hw;
>>> }
>>
>> char *hw = "hello world!";
>> :
>> return hw + 5;
>
>You're returning a newly created string object, thereby inducing
>the issue of a second allocation

No, he's simply returning the final characters of the C-style
sequence of 'char' entities. About as efficient as possible for
the stated example.
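For comparison, a std::string_view version of the same substring extraction (a sketch, assuming C++17; world_view is an illustrative name). Like the pointer version it copies nothing. Note that in "hello world!" the 'w' sits at index 6, and the view carries its own length, so the '!' is excluded without writing a terminator.

```cpp
#include <string_view>

// Returns "world" as a view into the literal "hello world!".
// The literal has static storage duration, so the view stays valid
// after the function returns; no characters are copied.
constexpr std::string_view world_view() {
    constexpr std::string_view hw = "hello world!";
    return hw.substr(6, 5);  // start at index 6, take 5 characters
}
```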

Malcolm McLean

unread,
Jan 23, 2023, 10:34:22 AM1/23/23
to
Yes. Normally when you assign a string, you don't need to keep the old
copy hanging about. So using std::strings and move assignment will
allow the assignment to be implemented in a few machine instructions.
(You can probably do this in C by reusing the buffer, but the code has to
be written very carefully to ensure that the pointers are pointing to the right
type of memory).

Bonita Montero

unread,
Jan 23, 2023, 10:47:36 AM1/23/23
to
Maybe, but maybe he suggested that
as the body of my function-definition.

Paavo Helde

unread,
Jan 23, 2023, 10:56:29 AM1/23/23
to
23.01.2023 14:27 Mut...@dastardlyhq.com kirjutas:

> Another reason to use char* is that a lot of parsers will memory map a file
> R/W and MAP_PRIVATE which gives you a char* pointing to the beginning of the
> file and which you can manipulate as you see fit.

Seems like a non-portable hack.

Mut...@dastardlyhq.com

unread,
Jan 23, 2023, 11:09:17 AM1/23/23
to
On Mon, 23 Jan 2023 16:48:13 +0100
I was simply pointing out that for simple operations such as that char* is
usually more efficient than std::string.

Mut...@dastardlyhq.com

unread,
Jan 23, 2023, 11:11:47 AM1/23/23
to
Portable where? It's been standard POSIX functionality for decades and is
anything but a hack. The whole point of a private map is so you can manipulate
the file contents in memory without having to laboriously read it all in
first and without changing the file itself. It's extremely useful.


Bonita Montero

unread,
Jan 23, 2023, 11:12:36 AM1/23/23
to
You put that in a context where we discussed further allocations
if you extract a substring, and I took that into that context.
With your further explanations your code fits even less.


Mut...@dastardlyhq.com

unread,
Jan 23, 2023, 11:20:25 AM1/23/23
to
On Mon, 23 Jan 2023 13:53:55 +0100
Bonita Montero <Bonita....@gmail.com> wrote:
>Am 23.01.2023 um 10:30 schrieb Mut...@dastardlyhq.com:
>> On Sat, 21 Jan 2023 19:30:48 +0100
>> Bonita Montero <Bonita....@gmail.com> wrote:
>>> Am 21.01.2023 um 17:20 schrieb Mut...@dastardlyhq.com:
>>>
>>>> Unless you want to parse the string which may involve chopping it about.
>Then
>>>
>>>> its simpler to use a C string and pointers rather than mess about with
>>>> inefficient substr() etc.
>>>
>>> What's wrong with sth. like this ?
>>>
>>> string substr()
>>> {
>>> string hw( "hello world!" );
>>> hw.erase( hw.begin(), hw.begin() + 6 );
>>> hw.resize( 5 );
>>> return hw;
>>> }
>>
>> char *hw = "hello world!";
>> :
>> return hw + 5;
>
>You're returning a newly created string object, thereby inducing

This is C, not C++. There are no objects. It's returning a memory address
which in this case will probably be pointing to part of the program text area.

>blem. And you're returning the exclamation mark I also stripped.

hw[10] = '\0';

Sorted.


james...@alumni.caltech.edu

unread,
Jan 23, 2023, 11:22:21 AM1/23/23
to
The language doesn't state any such thing. The standard does, but there is no separate standard for the C language. The C standard describes both the C language and the C standard library. The part that describes the language defines the syntax for a string literal (NOT a text literal), and defines the corresponding semantics, which often (but not always) create a null-terminated string. The part that describes the C standard library starts, as its very first sentence, with a definition of a C string: "A string is a contiguous sequence of characters terminated by and including the first null character." (7.1.1p1). In that sentence, the term "string" is italicized, an ISO convention indicating that the sentence in which that italicized term appears constitutes the official definition of that term.
This makes sense, because nothing in the language itself depends upon strings; they matter only because various functions in the C standard library take pointers to strings as arguments, or give such pointers as the return value of the function.

Bonita Montero

unread,
Jan 23, 2023, 11:25:11 AM1/23/23
to
Am 23.01.2023 um 17:20 schrieb Mut...@dastardlyhq.com:
> On Mon, 23 Jan 2023 13:53:55 +0100
> Bonita Montero <Bonita....@gmail.com> wrote:
>> Am 23.01.2023 um 10:30 schrieb Mut...@dastardlyhq.com:
>>> On Sat, 21 Jan 2023 19:30:48 +0100
>>> Bonita Montero <Bonita....@gmail.com> wrote:
>>>> Am 21.01.2023 um 17:20 schrieb Mut...@dastardlyhq.com:
>>>>
>>>>> Unless you want to parse the string which may involve chopping it about.
>> Then
>>>>
>>>>> its simpler to use a C string and pointers rather than mess about with
>>>>> inefficient substr() etc.
>>>>
>>>> What's wrong with sth. like this ?
>>>>
>>>> string substr()
>>>> {
>>>> string hw( "hello world!" );
>>>> hw.erase( hw.begin(), hw.begin() + 6 );
>>>> hw.resize( 5 );
>>>> return hw;
>>>> }
>>>
>>> char *hw = "hello world!";
>>> :
>>> return hw + 5;
>>
>> You're returning a newly created string object, thereby inducing
>
> This is C, not C++. ...

In a C++-newsgroup and we discussed the topic of copy-allocations ...


Mut...@dastardlyhq.com

unread,
Jan 23, 2023, 11:37:48 AM1/23/23
to
On Mon, 23 Jan 2023 17:25:49 +0100
When comparing C vs C++ a discussion of C is entirely appropriate.

Bonita Montero

unread,
Jan 23, 2023, 11:39:50 AM1/23/23
to
Yes, somewhere else in this thread.


james...@alumni.caltech.edu

unread,
Jan 23, 2023, 11:40:46 AM1/23/23
to
On Monday, January 23, 2023 at 11:20:25 AM UTC-5, Mut...@dastardlyhq.com wrote:
> On Mon, 23 Jan 2023 13:53:55 +0100
> Bonita Montero <Bonita....@gmail.com> wrote:
> >Am 23.01.2023 um 10:30 schrieb Mut...@dastardlyhq.com:
> >> On Sat, 21 Jan 2023 19:30:48 +0100
> >> Bonita Montero <Bonita....@gmail.com> wrote:
> >>> Am 21.01.2023 um 17:20 schrieb Mut...@dastardlyhq.com:
> >>>
> >>>> Unless you want to parse the string which may involve chopping it about.
> >Then
> >>>
> >>>> its simpler to use a C string and pointers rather than mess about with
> >>>> inefficient substr() etc.
> >>>
> >>> What's wrong with sth. like this ?
> >>>
> >>> string substr()
> >>> {
> >>> string hw( "hello world!" );
> >>> hw.erase( hw.begin(), hw.begin() + 6 );
> >>> hw.resize( 5 );
> >>> return hw;
> >>> }
> >>
> >> char *hw = "hello world!";
> >> :
> >> return hw + 5;
> >
> >You're returning a newly created string object, thereby inducing
> This is C, not C++. ...

Actually, this IS C++. While this discussion has been about C, it's actually taking place on comp.lang.c++.

> ... There are no objects. ...

As C defines the term "object", both hw and the array that "hello world!" points at are objects. However, you are correct in saying that hw+5 would not be an object in C.

> >blem. And you're returning the exclamation mark I also stripped.
> hw[10] = '\0';

hw points at the array that was created because of the existence of the string literal "hello world!".

"If the program attempts to modify such an array, the behavior is undefined." (C standard, 6.4.5p7)

Paavo Helde

unread,
Jan 23, 2023, 11:59:13 AM1/23/23
to
23.01.2023 18:11 Mut...@dastardlyhq.com kirjutas:
> On Mon, 23 Jan 2023 17:56:15 +0200
> Paavo Helde <ees...@osa.pri.ee> wrote:
>> 23.01.2023 14:27 Mut...@dastardlyhq.com kirjutas:
>>
>>> Another reason to use char* is that a lot of parsers will memory map a file
>>> R/W and MAP_PRIVATE which gives you a char* pointing to the beginning of the
>>> file and which you can manipulate as you see fit.
>>
>> Seems like a non-portable hack.
>
> Portable where?

This is a C++ group.

> Its been standard posix functionality for decades and is
> anything but a hack.


> The whole point of a private map is so you can manipulate
> the file contents in memory without having to labouriously read it all in
> first

This is achieved by a read-only memory map. On which a string_view would
work fine, coincidentally.

> and without changing the file itself. Its extremely useful.

Why should I want to manipulate the file contents when parsing it? Ah, I
know the answer, it comes from the camp who thinks copying a virtual
main memory page will be faster than passing some extra register
variables for keeping better track about the parsing process. Maybe 40
years ago on some hardware it had a point.

Mut...@dastardlyhq.com

unread,
Jan 23, 2023, 12:00:44 PM1/23/23
to
On Mon, 23 Jan 2023 17:40:30 +0100
Oh ok, now you're subdividing threads are you? You're the one who suggested
that using erase() and substr() was somehow just as efficient as returning
a pointer.


David Brown

unread,
Jan 23, 2023, 12:02:44 PM1/23/23
to
You are jumbling several things a bit. I would recommend you open a
copy of the C standards (I don't think anything here has changed since
at least C99) and have a look.

You'll find there is /one/ C standard document (in different versions) -
the standard library is considered an integral part of the language.
Very occasionally it is useful to distinguish a "core C language" (that
is not a term from the standard) from things defined in the standard
library - this is not such an occasion.

A sequence of characters inside double quotation marks is a "string
literal". The section describing these lexical elements, 6.4.5,
describes the array and character sequence generated. The /definition/
of the term "string" is found in chapter 7, describing the library, not
in the section defining the term "string literal". The terms "string"
and "pointer to string" are mentioned in a number of places throughout
the document, not just in the library or in connection with string literals.

It's fair to say that there is little that you can do with a string in C
that does not involve library calls - basically, you can take a pointer
to a string and use it as a pointer to a character, and you have
initialisation from string literals. String handling is done using
library functions. But that does not in any way mean strings are not
defined in the C language, or not part of the C language.

Mut...@dastardlyhq.com

unread,
Jan 23, 2023, 12:03:02 PM1/23/23
to
That's true, my mistake. So we change the char* to

char hw[] = "hello world!";

David Brown

unread,
Jan 23, 2023, 12:05:27 PM1/23/23
to
Sure.

There is no "perfect" way to implement a way of holding (general)
strings in a language. There are lots of ways to do it, but they all
have their disadvantages as well as advantages. If you want to work
efficiently with large strings, standard C strings is a poor choice of
format.

Mut...@dastardlyhq.com

unread,
Jan 23, 2023, 12:19:13 PM1/23/23
to
On Mon, 23 Jan 2023 18:58:58 +0200
Paavo Helde <ees...@osa.pri.ee> wrote:
>23.01.2023 18:11 Mut...@dastardlyhq.com kirjutas:
>> On Mon, 23 Jan 2023 17:56:15 +0200
>> Paavo Helde <ees...@osa.pri.ee> wrote:
>>> 23.01.2023 14:27 Mut...@dastardlyhq.com kirjutas:
>>>
>>>> Another reason to use char* is that a lot of parsers will memory map a file
>
>>>> R/W and MAP_PRIVATE which gives you a char* pointing to the beginning of
>the
>>>> file and which you can manipulate as you see fit.
>>>
>>> Seems like a non-portable hack.
>>
>> Portable where?
>
>This is a C++ group.

And? You are allowed to use standard OS C APIs in C++ in case you were
unaware.

>> The whole point of a private map is so you can manipulate
>> the file contents in memory without having to labouriously read it all in
>> first
>
>This is achieved by a read-only memory map. On which a string_view would
>work fine, coincidentally.

Maybe it would. But since you get char* out of the box why bother?

>> and without changing the file itself. Its extremely useful.
>
>Why should I want to manipulate the file contents when parsing it? Ah, I
>know the answer, it comes from the camp who thinks copying a virtual
>main memory page will be faster than passing some extra register
>variables for keeping better track about the parsing process. Maybe 40
>years ago on some hardware it had a point.

Putting a \0 at the end of some text you want to pass to a sub function is
standard practice. But the main use is having the entire file (from the program's
perspective) available as text in memory without having to read some or all of it
yourself manually or do file seeking with fstream or some kind of file pointer
which is far less efficient and still involves paging into memory anyway.

Shall we assume you've never heard of mmap() because you know nothing about
unix systems programming and got confused?

Scott Lurndal

unread,
Jan 23, 2023, 12:30:16 PM1/23/23
to
What Muttley posted is perfectly legal C++ code.

Malcolm McLean

unread,
Jan 23, 2023, 12:37:10 PM1/23/23
to
It depends how large.
If strings are very large then flat memory buffers are often the best way to go.
That prevents inadvertent copies.
If strings are short, then it usually doesn't matter from a performance perspective,
so it's whether C strings or another format is easier to integrate with the existing
code.
If strings are medium length then yes, C format is probably a poor choice,
because the strings are not so long that copying is prohibitively expensive,
but not so short that inefficient copying hardly matters. A format that allows
for efficient assignment, but is also easy to use, is likely better.

Scott Lurndal

unread,
Jan 23, 2023, 12:37:51 PM1/23/23
to
Yes. It's really application dependent.

For example, when parsing a sequence of bytes, one may not actually
care about the length and rather just start parsing at the first byte
and stop when the nul-byte (or a parse error) is encountered.

One pass through the string, instead of a pass every time strlen() is called.
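A minimal sketch of that single-pass style (count_words is an illustrative name, and "word" here simply means a run of non-space characters). The loop stops at the terminator, so the length is never computed separately.

```cpp
#include <cstddef>

// Counts space-separated words in a nul-terminated string in one pass,
// stopping at the terminator instead of calling strlen() first.
std::size_t count_words(const char* s) {
    std::size_t words = 0;
    bool in_word = false;
    for (; *s != '\0'; ++s) {
        if (*s == ' ') {
            in_word = false;
        } else if (!in_word) {
            in_word = true;  // first character of a new word
            ++words;
        }
    }
    return words;
}
```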

Bonita Montero

unread,
Jan 23, 2023, 12:38:20 PM1/23/23
to
Yes, but not in that context.

Paavo Helde

unread,
Jan 23, 2023, 12:50:12 PM1/23/23
to
And ... we are arriving back to where we were before. I clarify now for
better addressing your points:

When using string_view, you always know how long your to-be-processed
piece is, so you don't need to make pointless copies or rely on
non-portable hacks (which incidentally also make pointless copies (of VM
pages)) just to know where to stop processing.
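A sketch of what that looks like for the bracket example discussed earlier, assuming C++17. The name find_matching_close is made up. Because the view knows its own size, malformed input without a closing bracket simply yields npos instead of running off the end.

```cpp
#include <cstddef>
#include <string_view>

// `text` starts just after an opening '['. Returns the index of the
// matching ']' (nested brackets are counted), or std::string_view::npos
// if the input ends first. Nothing beyond text.size() is ever read.
std::size_t find_matching_close(std::string_view text) {
    std::size_t depth = 1;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '[') {
            ++depth;
        } else if (text[i] == ']' && --depth == 0) {
            return i;
        }
    }
    return std::string_view::npos;
}
```

The caller can then continue with text.substr(pos + 1), which again copies nothing.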


Wuns Haerst

unread,
Jan 23, 2023, 2:07:16 PM1/23/23
to
Am 23.01.2023 um 18:00 schrieb Mut...@dastardlyhq.com:

> Oh ok, now you're subdividing threads are you? You're the one who suggested
> that using erase() and substr() was somehow just as efficient as returning
> a pointer.

My suggestion was related to the point that changing the string might
result in a new string allocation and I've shown that this change might
occur in-place.

Malcolm McLean

unread,
Jan 23, 2023, 2:14:33 PM1/23/23
to
On Monday, 23 January 2023 at 17:02:44 UTC, David Brown wrote:
>
> A sequence of characters inside double quotation marks is a "string
> literal".
>
The double quotation marks plus the text inside is the "string literal".

You need a different term to describe the text itself.

james...@alumni.caltech.edu

unread,
Jan 23, 2023, 2:38:12 PM1/23/23
to
"string literal" is a named element of the C grammar. In any context in which "string literal" is the appropriate description of the whole thing, the appropriate description for what's between the quote marks is the named grammar element used in the definition of "string literal": s-char-sequence (6.4.5p1).

In translation phase 7, the s-char-sequence is used to initialize an array. The details are described in the C standard, 6.4.5p6. That array is guaranteed to be terminated by a null character. As a result, every position within that array is guaranteed to qualify as the start of a C string, which might be empty if that position contains a null character.

Cholo Lennon

unread,
Jan 23, 2023, 2:43:54 PM1/23/23
to
On 1/23/23 06:25, Mut...@dastardlyhq.com wrote:
> On Sat, 21 Jan 2023 20:15:39 +0200
> Paavo Helde <ees...@osa.pri.ee> wrote:
>> 21.01.2023 18:20 Mut...@dastardlyhq.com kirjutas:
>>> On Sat, 21 Jan 2023 16:40:20 +0200
>>> Paavo Helde <ees...@osa.pri.ee> wrote:
>>>> 21.01.2023 16:28 R.Wieser kirjutas:
>>>>> Hello all,
>>>>>
>>>>> I'm a rather newbie to C++ programming who is trying figure out how to deal
>>
>>>>> with string pointers - or rather, with what they point at.
>>>>>
>>>>> Case in point :
>>>>>
>>>>> char* message = "hello world";
>>>>> char* message = strdup("hello world");
>>>>>
>>>>> I can user either of those in a function and return the pointer, and the
>>>>> caller will be none-the-wiser which form (the static or dymnamic string) it
>>
>>>>> gets.
>>>>
>>>> This is C++, so just return a std::string from your function, and forget
>>>> everything about C-style strings, malloc, free and strdup. Good riddance!
>>>
>>> Unless you want to parse the string which may involve chopping it about. Then
>>
>>> its simpler to use a C string and pointers rather than mess about with
>>> inefficient substr() etc.
>>
>> If you want to speed up substring processing, then there is
>> std::string_view for you. As fast as plain pointers and much cleaner.
>
> Not really. This is fast, simple and clear:
>
> for(p=str;*p;)
> {
> switch(*p)
> {
> case '[':
> p = parseList(p+1);
> break;
> case '{':
> p = parseBlock(p+1);
> break;
> case '(':
> p = parseArray(p+1);
> break;
> default:
> <do something else>
> ++p;
> }
> }
>

OMG! that's why we still have plenty of CVEs in C/C++ applications and
also new programming languages like Rust... :-O

> Not sure how using indexes and substrings would improve it.
>

--
Cholo Lennon
Bs.As.
ARG

Keith Thompson

unread,
Jan 23, 2023, 3:34:58 PM1/23/23
to
"R.Wieser" <add...@not.available> writes:
> "Keith Thompson" <Keith.S.T...@gmail.com> wrote in message
> news:87wn5eg...@nosuchdomain.example.com...
>
>> If I understand you correctly, the two things you're referring to are
>> the pointer *value* and an *object* of pointer type that holds that
>> value.
>
> I do not call it an object, to me it's just a variable. Anything beyond that
> (an object which might contain other properties as well as methods) is (way)
> outside of my scope.

The word "object" is not about object-oriented programming. The C
standard defines an "object" as a "region of data storage in the
execution environment, the contents of which can represent values"; it
doesn't use the term "variable". The C++ standard uses the word "object" in the
same way, though it doesn't have quite the same formal definition.

An "object" is just a variable, except that a variable typically has a
name.

>> There's nothing special about pointers to strings here.
>> The same confusion occurs with, for example, a pointer to an int
>
> Indeed. But if there is one thing I've learned from Usenet it is that you
> need to keep the context to a question simple. Referring to all the other
> addresses and what they point to would just be "muddying the water".
>
>> Usually this isn't a problem. In most contexts, the distinction
>> between a value of some type and an object/variable that holds a
>> value of some type either *isn't important or is sufficiently clear
>> from the context*.
>
> Agreed.
>
> But that's the crux : sometimes I get the distinct feeling that someone is
> talking about the address (pointing to a string or otherwise), but then
> suddenly seems to talk about the variable holding it. :-\

Yes, people are often informal, or even sloppy, about the distinction.

>> For cases where the ambiguity is important, I find it useful
>> to think of the word "pointer" (as well as "array", "integer",
>> etc.) as an adjective rather than a noun. Thus we can refer to a
> *pointer value*, or a *pointer object*, or a *pointer type*, or a
>> *pointer expression*, all of which are clearly distinct concepts.
>
> I think I can translate "pointer value" as to be meaning an address. For
> the others ? I do not have the foggiest I'm afraid.

For example, `int*` and `char*` are pointer types.

A pointer value is the result of evaluating an expression of pointer
type, including an expression that simply yields the value stored in an
object/variable. That value can be the address of an object (or of a
function, but let's set that aside), or it can be a null pointer (there
are other possibilities).

A pointer expression is a chunk of text in a C or C++ program,
interpreted as an expression that, when evaluated, will yield a result
of pointer type.
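The four terms can be pinned to one small piece of code. This is only a sketch, and the function name pointer_terms_demo is made up for illustration:

```cpp
#include <type_traits>

int pointer_terms_demo() {
    int n = 42;
    int* p = &n;  // `int*` is a pointer type; `p` is a pointer object;
                  // `&n` is a pointer expression yielding a pointer value
    int* q = p;   // here `p` is a pointer expression; its value is copied
                  // into a second pointer object, `q`

    static_assert(std::is_pointer<decltype(p)>::value, "p has pointer type");
    static_assert(std::is_pointer<decltype(&n)>::value,
                  "&n has pointer type but names no object");

    return *q;    // both objects hold the same value: the address of n
}
```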

>> <OT>
>
> Whoooo.... Is that *on*, or *off* topic ? Not that it matters much which
> one though. :-)

"<OT>" means off-topic. I marked the last part of my article that way
because it discusses C, while the topic of this newsgroup is C++, a
distinct language.

You raised the issue of the ambiguity between a "pointer" as a value and
a "pointer" as an object in the context of a "pointer to a string",
which is a C-specific context. I'm trying to steer the discussion in a
direction that applies to both languages. (I might suggest posting to
comp.lang.c, but I'm not sure it would be useful.)

>> For C's definition of a "pointer to a string", I'd say it refers
>> to a *value* of pointer type.
>
> AFAIK most people do not use that phrase but use the (shorthand) "string
> pointer" (with or without the space) instead. But yes, I would also.

Sure, "string pointer" means the same thing as "pointer to a string".
It's slightly informal, but not likely to be ambiguous.

> Though that's the whole problem to me : most people seem to use that "string
> pointer" phrase for the address (the "value of pointer type") *as well as*
> the variable (or worse : an object) it's stored in.
>
> But I think I am going to stop asking. It looks like the distinction
> between the address and what it's stored in isn't of much, if any, importance
> to the people here.

Suggestion: Any time you read something specific that you find
confusing, ask about it. That's likely to be a lot easier than trying
to solve the general problem.

People with a lot of experience are likely to write in an informal
shorthand that assumes a similar level of experience in readers. It can
be ambiguous, but the ambiguity can be resolved *if* you have that
experience. If something is genuinely confusing, feel free to keep us
on our toes by asking about it.

--
Keith Thompson (The_Other_Keith) Keith.S.T...@gmail.com
Working, but not speaking, for XCOM Labs
void Void(void) { Void(); } /* The recursive call of the void */

Keith Thompson

unread,
Jan 23, 2023, 4:26:42 PM1/23/23
to
Malcolm McLean <malcolm.ar...@gmail.com> writes:
> On Monday, 23 January 2023 at 13:42:53 UTC, David Brown wrote:
>> C's concept of a string is different from (and more low-level and
>> primitive than) that found in many other programming languages. But
>> that does not mean C does not have strings, defined in the standards and
>> as part of the language and standard library.
>>
> The language states that a text literal in double quotes produces a nul-terminated
> string. I think that's the only place the C language itself defines a string. Otherwise
> it is purely a standard library concept.

No, it doesn't say that. The standard's description of string
literals (N1570 6.4.5) does not refer to the standard's definition of
"string" (N1570 7.1.1).

Consider, for example, "abc\0def".

Sections 6 and 7 are both equally valid parts of the C standard.
(Most of section 7 is optional for freestanding implementations,
but 7.1.1 is not, though the concept of a "string" is less useful
in an implementation that doesn't provide library functions that
manipulate strings).

(A side note: As far as I can tell, N1570 6.4.5 describes the syntax and
semantics of a string literal, but never actually says what the value of
a string literal is. It's obvious that it's the value of the described
array object, but I don't think it actually says so. This is not
relevant to the current discussion.)

Mut...@dastardlyhq.com

Jan 24, 2023, 12:06:07 PM
It was example code. It would have far more checks in a real program.

Tim Rentsch

Feb 2, 2023, 9:44:06 AM
Keith Thompson <Keith.S.T...@gmail.com> writes:

> "R.Wieser" <add...@not.available> writes:
>
>> "Keith Thompson" <Keith.S.T...@gmail.com> wrote in message
>> news:87wn5eg...@nosuchdomain.example.com...
>>
>>> If I understand you correctly, the two things you're referring to
>>> are the pointer *value* and an *object* of pointer type that holds
>>> that value.
>>
>> I do not call it an object, to me it's just a variable. Anything
>> beyond that (an object which might contain other properties as well
>> as methods) is (way) outside of my scope.
>
> The word "object" is not about object-oriented programming. The C
> standard defines an "object" as a "region of data storage in the
> execution environment, the contents of which can represent values";
> it doesn't use the term "variable". The C++ standard uses the word
> "object" in the same way, though it doesn't have quite the same
> formal definition.
>
> An "object" is just a variable, except that a variable typically has
> a name.

Just a quibble: C does have some objects that are not variables
in the usual sense of the word. Generally it is true that all
"variables" correspond to objects (sometimes instantiated more
than once), but not all objects correspond to variables (again
in the usual sense of how the term "variable" is used).

james...@alumni.caltech.edu

Feb 2, 2023, 7:36:59 PM
On Thursday, February 2, 2023 at 9:44:06 AM UTC-5, Tim Rentsch wrote:
> Keith Thompson <Keith.S.T...@gmail.com> writes:
...
> > The word "object" is not about object-oriented programming. The C
> > standard defines an "object" as a "region of data storage in the
> > execution environment, the contents of which can represent values";
> > it doesn't use the term "variable". The C++ standard uses the word
> > "object" in the same way, though it doesn't have quite the same
> > formal definition.
> >
> > An "object" is just a variable, except that a variable typically has
> > a name.
> Just a quibble: C does have some objects that are not variables
> in the usual sense of the word. ...

True, and his comment implies that fact, by using the word "except" - when that exception does not apply, you still have an object, but not a variable.

Tim Rentsch

Feb 2, 2023, 8:38:34 PM
I suppose that is one way of reading it. But certainly it
is not the only way of reading it.