
Question on a C++ string


T

Dec 29, 2019, 1:36:08 AM
Hi All,

I have four different ways of converting a Raku/Perl6 string
into a C++ string to pass to a native call. Three come out
the same and one comes out with a 0x0000 (hex) at the end.

The "say" function shows me each byte in the converted
string. Which one is correct?

The following is "abcdefg" converted to a C string,
"wstr", etc. is the method I used and is not part
of the C string:

97 98 99 100 101 102 103 wstr
97 98 99 100 101 102 103 0 to-c-str
97 98 99 100 101 102 103 CArray[uint8].new
97 98 99 100 101 102 103 CArray[uint16].new

And why do I care whether I am using "uint8" or "uint16"
to convert visible text? What about the one
with the 0 at the end?

Many thanks,
-T

Jorgen Grahn

Dec 29, 2019, 4:33:17 AM
You have "C++ string" in the subject line, but this question seems to
be as offtopic as the other recent ones from you. I don't see how to
help you with this problem, from a comp.lang.c++ point of view.

From a C perspective, a string is a pointer to the first char in a
sequence, and the string ends with a char 0 marker. That's probably
the one your Windows API wants, if it's a C API like someone wrote.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

Keith Thompson

Dec 29, 2019, 4:46:03 AM
Jorgen Grahn <grahn...@snipabacken.se> writes:
[...]
> From a C perspective, a string is a pointer to the first char in a
> sequence, and the string ends with a char 0 marker. That's probably
> the one your Windows API wants, if it's a C API like someone wrote.

No, from a C perspective "A string is a contiguous sequence of
characters terminated by and including the first null character."

What you're describing is a pointer to a string (which is what is
typically passed to functions).

--
Keith Thompson (The_Other_Keith) Keith.S.T...@gmail.com
[Note updated email address]
Working, but not speaking, for Philips Healthcare
void Void(void) { Void(); } /* The recursive call of the void */

T

Dec 29, 2019, 4:49:54 AM
On 2019-12-29 01:45, Keith Thompson wrote:
> Jorgen Grahn <grahn...@snipabacken.se> writes:
> [...]
>> From a C perspective, a string is a pointer to the first char in a
>> sequence, and the string ends with a char 0 marker. That's probably
>> the one your Windows API wants, if it's a C API like someone wrote.
>
> No, from a C perspective "A string is a contiguous sequence of
> characters terminated by and including the first null character."

Do I need the chr(0) at the end?

T

Dec 29, 2019, 4:50:28 AM
On 2019-12-29 01:33, Jorgen Grahn wrote:
> You have "C++ string" in the subject line, but this question seems to
> be as offtopic as the other recent ones from you. I don't see how to
> help you with this problem, from a comp.lang.c++ point of view.

What I need to know is the construction of a C++ string

Bo Persson

Dec 29, 2019, 5:00:24 AM
A C string is always null-terminated, that is what *makes* it a string.
Without the terminator it is an array of chars, which is something
different.

The C language string-functions use the null to find the end of the
string. For example, strlen:

"Returns the length of the given null-terminated byte string, that is,
the number of characters in a character array whose first element is
pointed to by str up to and not including the first null character.

The behavior is undefined if str is not a pointer to a null-terminated
byte string."

https://en.cppreference.com/w/c/string/byte/strlen
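
A minimal C++ sketch of that distinction, for illustration only. The names `terminated`, `raw`, `visible_length` and `is_terminated` are made up for this example. It shows that an array initialised from a literal carries the terminator, and that a bounded search can look for one without calling strlen on an unterminated array (which would be undefined behaviour).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Initialised from a string literal: the compiler appends '\0',
// so the array holds 8 chars for the 7 visible ones.
const char terminated[] = "abcdefg";

// Seven chars and no terminator: a char array, but not a C string.
const char raw[7] = {'a', 'b', 'c', 'd', 'e', 'f', 'g'};

// Length as the C library sees it: chars before the first '\0'.
std::size_t visible_length(const char* s) {
    assert(s != nullptr);
    return std::strlen(s);
}

// Bounded check for a terminator. Unlike strlen, this never reads past
// the n chars it was given, so it is safe on unterminated arrays.
bool is_terminated(const char* p, std::size_t n) {
    return std::memchr(p, '\0', n) != nullptr;
}
```

Here `sizeof terminated` is 8 while `visible_length(terminated)` is 7, and `is_terminated(raw, sizeof raw)` is false.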


Bo Persson

Paavo Helde

Dec 29, 2019, 5:17:21 AM
On 29.12.2019 8:35, T wrote:
> Hi All,
>
> I have four different ways of converting a Raku/Perl6 string
> into a C++ string to pass to a native call.

In C++ a string means an object of class std::string. Its internal
representation is not fixed and may depend on the compiler, its version
and macro definitions. I suspect you are talking about C strings instead.

In C, a string is commonly represented just as an array of characters.
The length of the array may be indicated by a terminating zero
character, or the length might be passed separately.

The type of the characters depends on the string encoding. Nowadays
there are only a handful of encodings worth considering: ASCII, UTF-8,
UTF-16 and UCS-4 (aka UTF-32). Note that ASCII is a proper subset of UTF-8.

> Three come out
> the same and one comes out with a 0x0000 (hex) at the end.
>
> The "say" function shows me each byte in the converted
> string. Which one is correct?

In principle this "say" may not show the terminating zero byte even if
it is present. I'm not familiar with Perl internals.

Anyway, this depends on the SDK function you are passing this string to.
If it does not take string length as a separate parameter, then it most
likely expects it to be zero-terminated. I guess in Perl you can always
add the zero terminator manually to the string before or after conversion.

>
> The following is "abcdefg" converted to a C string,
> "wstr", etc. is the method I used and is not part
> of the C string:
>
> 97 98 99 100 101 102 103 wstr
> 97 98 99 100 101 102 103 0 to-c-str
> 97 98 99 100 101 102 103 CArray[uint8].new
> 97 98 99 100 101 102 103 CArray[uint16].new
>
> And why do I care whether I am using "uint8" or "uint16"
> to convert visible text? What about the one
> with the 0 at the end?

It depends on the interface which string encoding it expects.

For example, in Windows, the SDK functions ending with "W" take string
arguments in UTF-16 encoding, so for them I guess you need the last
method, maybe with a manually added zero terminator.

The Windows SDK functions ending with "A" are best avoided for anything
other than fixed ASCII strings, as they interpret text in whatever ANSI
code page is currently active and call the "W" variants internally anyway.
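
As a rough illustration of what a "W" function expects, here is a C++ sketch that widens a pure-ASCII string into zero-terminated 16-bit code units. `widen_ascii` is a hypothetical helper, not a Windows API. Real code on Windows would pass `const wchar_t*` (16 bits wide there) and would use a proper converter such as MultiByteToWideChar for non-ASCII input.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Widen a pure-ASCII string to UTF-16. Each ASCII character maps to the
// 16-bit code unit with the same numeric value (97 stays 97, and so on).
std::u16string widen_ascii(std::string_view ascii) {
    std::u16string out;
    out.reserve(ascii.size());
    for (unsigned char c : ascii) {
        assert(c < 0x80 && "input must be pure ASCII");
        out.push_back(static_cast<char16_t>(c));
    }
    return out;
}
```

The result of `widen_ascii("abcdefg")` has `size()` 7, and its `c_str()` points at eight code units because std::u16string keeps a terminating zero after the last one.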

T

Dec 29, 2019, 5:33:35 AM
Thank you. Exactly the same as Modula2.

Is it the same for C++?

Öö Tiib

Dec 29, 2019, 5:56:58 AM
On Sunday, 29 December 2019 08:36:08 UTC+2, T wrote:
> Hi All,
>
> I have four different ways of converting a Raku/Perl6 string
> into a C++ string to pass to a native call. Three come out
> the same and one comes out with a 0x0000 (hex) at the end.

Why don't you ask your questions about Raku in its own community?
One seems to exist: <https://raku.org/community/>
Most of the people posting or reading this group do not really care
about it, and the few that do can find the appropriate forums just
fine.

> The "say" function shows me each byte in the converted
> string. Which one is correct?
>
> The following is "abcdefg" converted to a C string,
> "wstr", etc. is the method I used and is not part
> of the C string:
>
> 97 98 99 100 101 102 103 wstr
> 97 98 99 100 101 102 103 0 to-c-str
> 97 98 99 100 101 102 103 CArray[uint8].new
> 97 98 99 100 101 102 103 CArray[uint16].new
>
> And why do I care whether I am using "uint8" or "uint16"
> to convert visible text? What about the one
> with the 0 at the end?

Where the Windows API has LPCWSTR, there must be a pointer that points
at a buffer containing zero-terminated UTF-16LE encoded text.
Code units in UTF-16 are 16 bits wide (a code point takes one or two
of them). When someone wants to manipulate the numeric values of
those, it makes sense to use 16-bit integers for UTF-16 and 8-bit
integers for UTF-8.
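
To make the unit size concrete, here is a small C++ sketch that counts code points in a UTF-16 sequence. `count_code_points` and `demo_code_points` are hypothetical names, and the input is assumed to be well-formed UTF-16. It only illustrates that the 16-bit elements are code units and that a character outside the Basic Multilingual Plane occupies two of them.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Count Unicode code points in a UTF-16 sequence (assumed well-formed).
// A code point above U+FFFF is stored as a surrogate pair: two code units.
std::size_t count_code_points(std::u16string_view s) {
    std::size_t n = 0;
    for (char16_t u : s) {
        // Trailing surrogates 0xDC00..0xDFFF continue a pair; skip them.
        if (u < 0xDC00 || u > 0xDFFF) ++n;
    }
    return n;
}

// Usage: seven ASCII letters are seven units and seven code points;
// U+1D11E (musical G clef) is two units but a single code point.
void demo_code_points() {
    std::u16string_view letters = u"abcdefg";
    std::u16string_view clef = u"\U0001D11E";
    assert(letters.size() == 7 && count_code_points(letters) == 7);
    assert(clef.size() == 2 && count_code_points(clef) == 1);
}
```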

Jorgen Grahn

Dec 29, 2019, 6:05:07 AM
On Sun, 2019-12-29, T wrote:
> On 2019-12-29 01:45, Keith Thompson wrote:
>> Jorgen Grahn <grahn...@snipabacken.se> writes:
>> [...]
>>> From a C perspective, a string is a pointer to the first char in a
>>> sequence, and the string ends with a char 0 marker. That's probably
>>> the one your Windows API wants, if it's a C API like someone wrote.
>>
>> No, from a C perspective "A string is a contiguous sequence of
>> characters terminated by and including the first null character."
>
> Do I need the chr(0) at the end?

Didn't we both just say so?

Yes. Assuming the API function you're calling really expects a C
string.

Jorgen Grahn

Dec 29, 2019, 6:08:30 AM
On Sun, 2019-12-29, T wrote:
> What I need to know is the construction of a C++ string

Fine, here it is:

https://en.cppreference.com/w/cpp/string/basic_string/basic_string

Imagine the Allocator stuff isn't there and mentally replace CharT
with char, and you have your normal std::string.
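
A short C++17 sketch of that reading, for illustration. `round_trip_length` is a made-up helper. It checks at compile time that std::string is basic_string with CharT replaced by char, then converts a C string into a std::string and back.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>

// std::string is just basic_string with CharT = char and the defaults
// for the traits and allocator parameters.
static_assert(std::is_same_v<std::string, std::basic_string<char>>);
static_assert(std::is_same_v<std::string::value_type, char>);

// Build a std::string from a C string, then hand a C string back out.
std::size_t round_trip_length(const char* c_string) {
    assert(c_string != nullptr);
    std::string s(c_string);        // copies up to, not including, '\0'
    return std::strlen(s.c_str());  // c_str() is null-terminated again
}
```

For example, `round_trip_length("abcdefg")` gives 7: the terminator is not counted in either direction, but `c_str()` always supplies one.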

Bart

Dec 29, 2019, 7:09:11 AM
These are the msdn ('msdn messagebox') docs for MessageBox:

    int MessageBox(
        HWND    hWnd,
        LPCTSTR lpText,
        LPCTSTR lpCaption,
        UINT    uType
    );

This is in a panel labelled 'C++', but the function has been around since
the API was plain C. Looking at those string types here:

https://docs.microsoft.com/en-us/windows/win32/winprog/windows-data-types

LPCTSTR ends up (if you follow the chain of typedefs) as either a type
that points to a sequence of 8-bit bytes, terminated by a 0 byte, or a
sequence of 16-bit words, terminated by a 0 word.

Which one you get depends on whether the macro UNICODE is defined when
windows.h is included. MessageBox becomes MessageBoxW if it is defined,
and MessageBoxA if not.
This is rather interesting, since when calling from another language,
windows.h doesn't exist, so there is no such macro.

In fact, there is no actual function called "MessageBox", that's just an
artefact of windows.h.

Now, when I call MessageBox from my own non-C code, I directly call
MessageBoxA with normal 8-bit zero-terminated strings. I assume the
declaration of this function is as follows:

    int MessageBoxA(
        int          hWnd,
        char*        Text,
        char*        Caption,
        unsigned int Type
    );

(I don't even use 'const char*', as 'const' is an artefact of the C
language; it has no meaning at the binary level. Here 'int' is assumed
to be 32 bits.)
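
The macro mechanism can be imitated without windows.h. The sketch below is portable C++ with hypothetical names throughout: `ShowMessageA`, `ShowMessageW`, `ShowMessage` and `TEXT_LITERAL` are stand-ins, not the real Windows declarations. It only shows how a generic name can be an artefact of the header while the two suffixed functions are the real entry points.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-ins for the two real entry points. Each reports how
// many bytes of text (terminator excluded) it was handed.
std::size_t ShowMessageA(const char* text) {
    assert(text != nullptr);
    return std::char_traits<char>::length(text) * sizeof(char);
}

std::size_t ShowMessageW(const wchar_t* text) {
    assert(text != nullptr);
    return std::char_traits<wchar_t>::length(text) * sizeof(wchar_t);
}

// The header trick: the generic name is a macro, not a function.
#ifdef UNICODE
#  define ShowMessage ShowMessageW
#  define TEXT_LITERAL(s) L##s
#else
#  define ShowMessage ShowMessageA
#  define TEXT_LITERAL(s) s
#endif
```

A caller from another language never sees `ShowMessage`, only the two suffixed symbols, which is why it has to pick the narrow or wide variant explicitly.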

Barry Schwarz

Dec 29, 2019, 11:39:27 AM
On Sat, 28 Dec 2019 22:35:54 -0800, T <T...@invalid.invalid> wrote:

>Hi All,
>
>I have four different ways of converting a Raku/Perl6 string
>into a C++ string to pass to a native call. Three come out

The word string has two defined meanings in the C++ language. One is
the name of a type (std::string) defined in the Standard Template
Library. The other is an array of characters terminated by a 0 value
as defined in the C standard which is included in C++ by reference.
Which one do you mean?

What does "native call" mean? Is it to a C (or C++) function? What
does that function expect? Show us the prototype of the function.

>the same and one comes out with a 0x0000 (hex) at the end.

No they did not. The last three are significantly different from each
other.

>The "say" function shows me each byte in the converted
>string. Which one is correct?

No, say did not show you that. It showed you its interpretation of
the argument you gave it. At the very least, it did not show you
seven bytes of data in the uint16 array.

Furthermore, none of the results is a std::string and only one of the
results is a C-style string.

>The following is "abcdefg" converted to a C string,
>"wstr", etc. is the method I used and is not part
>of the C string:

Something appears to be missing from this sentence. What is the
method you used? And what is not part of what?

>97 98 99 100 101 102 103 wstr
>97 98 99 100 101 102 103 0 to-c-str

This is a valid C-style string consisting of eight bytes/characters.

>97 98 99 100 101 102 103 CArray[uint8].new

This is an array of seven bytes/integers that may be considered
characters but it is not a string.

>97 98 99 100 101 102 103 CArray[uint16].new

This is an array of fourteen bytes (seven integers) that may be
considered wide characters but it is not a string.

>And why do I care whether I am using "uint8" or "uint16"
>to convert visible text? What about the one

You care because the function you are calling may expect one or the
other (or possibly have code to let it determine which you provided
but that is unlikely).

>with the 0 at the end?

This is the only one that is a string. Whether that matters depends
on the function you are calling.
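
The byte counts above can be checked with a short C++ sketch. The array names, `ends_with_zero` and `check_layouts` are invented for this illustration, and the arrays only mimic what the four Raku conversions were reported to print.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// "abcdefg" as a C-style string: the literal's terminator is stored too.
constexpr char c_string[] = "abcdefg";
// Seven 8-bit integers, no terminator: characters, but not a string.
constexpr std::uint8_t bytes8[] = {97, 98, 99, 100, 101, 102, 103};
// Seven 16-bit integers, no terminator: fourteen bytes in memory.
constexpr std::uint16_t units16[] = {97, 98, 99, 100, 101, 102, 103};

static_assert(sizeof c_string == 8, "seven characters plus the terminator");
static_assert(sizeof bytes8 == 7, "no terminator stored");
static_assert(sizeof units16 == 14, "two bytes per element");

// True if the last element of the array is zero.
template <typename T, std::size_t N>
constexpr bool ends_with_zero(const T (&a)[N]) {
    return a[N - 1] == 0;
}

// Usage: only the first array qualifies as a C-style string.
void check_layouts() {
    assert(ends_with_zero(c_string));
    assert(!ends_with_zero(bytes8));
    assert(!ends_with_zero(units16));
}
```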

--
Remove del for email

Barry Schwarz

Dec 29, 2019, 11:45:35 AM
On Sun, 29 Dec 2019 12:17:08 +0200, Paavo Helde
<myfir...@osa.pri.ee> wrote:

>In C, a string is commonly represented just as an array of characters.
>The length of the array may be indicated by a terminating zero
>character, or the length might be passed separately.

In order for the array to be a string, it must have a terminating
zero. Non-string arrays can use other termination characters or have
the length specified separately but by definition they are not
strings.

T

Dec 29, 2019, 2:50:53 PM
On 2019-12-29 03:08, Jorgen Grahn wrote:
> On Sun, 2019-12-29, T wrote:
>> On 2019-12-29 01:33, Jorgen Grahn wrote:
>>> You have "C++ string" in the subject line, but this question seems to
>>> be as offtopic as the other recent ones from you. I don't see how to
>>> help you with this problem, from a comp.lang.c++ point of view.
>>
>> What I need to know is the construction of a C++ string
>
> Fine, here it is:
>
> https://en.cppreference.com/w/cpp/string/basic_string/basic_string
>
> Imagine the Allocator stuff isn't there and mentally replace CharT
> with char, and you have your normal std::string.
>
> /Jorgen
>

Hi Jorgen,

I figured out through experimentation that Native Call
automatically adds the chr(0) on to the end of a CString.

And that sending Native Call a zero gets automatically
changed into a NULL. Hmmm, I wonder what happens
if you really want a numerical zero. That will be a
test for another day.

Thank you all for all the wonderful help!

-T

T

Dec 29, 2019, 2:51:57 PM
Hi Bart,

I figured out through experimentation that Native Call
automatically adds the chr(0) on to the end of a CString.

T

Dec 29, 2019, 2:54:43 PM
Hi Barry,

I figured out through experimentation that Native Call
automatically adds the chr(0) on to the end of a CString.

And that sending Native Call a zero gets automatically
changed into a NULL. Hmmm, I wonder what happens
if you really want a numerical zero. That will be a
test for another day.

Trivia: I can stick all the chr(0) in a Raku/Perl6
string I want and in any position I want. But
a Raku string is a "data structure", not an "array of
characters". The end of the string is in the structure,
not the string itself.
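
C++'s std::string behaves the same way in this respect, which a brief sketch can show. `with_embedded_nulls` is a made-up name, and this is only an illustration of a length-carrying string type, not a statement about how Raku stores its strings.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Build a std::string whose length is stored in the object, so '\0' is
// just another character. The (pointer, count) constructor is needed:
// constructing from a plain const char* would stop at the first '\0'.
std::string with_embedded_nulls() {
    return std::string("ab\0cd\0ef", 8);
}

// Usage: the object knows all 8 characters, but a C function that looks
// for the terminator sees only the first two.
void demo_embedded_nulls() {
    const std::string s = with_embedded_nulls();
    assert(s.size() == 8);
    assert(s[2] == '\0' && s[7] == 'f');
    assert(std::strlen(s.c_str()) == 2);
}
```

This is also why such a string cannot be passed intact to an API that expects a C string: everything after the first zero is invisible to it.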

T

Dec 29, 2019, 3:02:17 PM
On 2019-12-29 02:56, Öö Tiib wrote:
> Why don't you ask your questions about Raku in its own community?
> One seems to exist: <https://raku.org/community/>
> Most of the people posting or reading this group do not really care
> about it and the few that do can find the appropriate forums just
> fine.

Hi Öö,

Well now, I am glad you asked!

I pound those guys with questions. I even get on Raku's
chat line.

The reason I post here is that those guys have very
little knowledge of C and C++ and, well, you guys do.
Plus, there are a lot of mensches on this list.

And now that I know what is going on on the C and C++
side, I understand much better how to use Raku's Native
Call. Oh ya, and with you guys' help and other help
from various places, I tracked down a bug in
RegQueryValueExW that I now know how to work around
(always send lpType a NULL despite what the documentation
says).

Thank you all for the wonderful help!

-T

By the way, I do not program in C or C++ for the simple
reason that I ADORE Raku and -- well to put it
bluntly -- I am not smart enough to program in C or C++.
My hat goes off to you guys.


T

Dec 29, 2019, 3:10:20 PM
On 2019-12-29 08:39, Barry Schwarz wrote:
> On Sat, 28 Dec 2019 22:35:54 -0800, T <T...@invalid.invalid> wrote:
>
>> Hi All,
>>
>> I have four different ways of converting a Raku/Perl6 string
>> into a C++ string to pass to a native call. Three come out
>
> The word string has two defined meanings in the C++ language. One is
> the name of a type (std::string) defined in the Standard Template
> Library. The other is an array of characters terminated by a 0 value
> as defined in the C standard which is included in C++ by reference.
> Which one do you mean?
>
> What does "native call" mean? Is it to a C (or C++) function? What
> does that function expect? Show us the prototype of the function.

Hi Barry,

Native Call is a module (documented by Satan himself)
in Raku/Perl6 that allows you to call functions in .so
and .dll libraries.

>
>> the same and one comes out with a 0x0000 (hex) at the end.
>
> No they did not. The last three are significantly different from each
> other.
>
>> The "say" function shows me each byte in the converted
>> string. Which one is correct?
>
> No, say did not show you that. It showed you its interpretation of
> the argument you gave it. At the very least, it did not show you
> seven bytes of data in the uint16 array.

Which was what I was after.

> Furthermore, none of the results is a std::string and only one of the
> results is a C-style string.

Native call converts this to a C string for you. It is
not documented very well.


>> The following is "abcdefg" converted to a C string,
>> "wstr", etc. is the method I used and is not part
>> of the C string:
>
> Something appears to be missing from this sentence. What is the
> method you used? And what is not part of what?
>
>> 97 98 99 100 101 102 103 wstr
>> 97 98 99 100 101 102 103 0 to-c-str
>
> This is a valid C-style string consisting of eight bytes/characters.
>
>> 97 98 99 100 101 102 103 CArray[uint8].new
>
> This is an array of seven bytes/integers that may be considered
> characters but it is not a string.
>
>> 97 98 99 100 101 102 103 CArray[uint16].new
>
> This is an array of fourteen bytes (seven integers) that may be
> considered wide characters but it is not a string.
>
>> And why do I care whether I am using "uint8" or "uint16"
>> to convert visible text? What about the one
>
> You care because the function you are calling may expect one or the
> other (or possibly have code to let it determine which you provided
> but that is unlikely).

Native Call adds the chr(0) to the end for you. This
I had to find out through experimentation.

>
>> with the 0 at the end?
>
> This is the only one that is a string. Whether that matters depends
> on the function you are calling.
>

Since "W" calls use uint16, I settled on

    my $M = CArray[uint16].new("abcdefg".encode.list);

for simplicity.

Thank you for the wonderful help! On my journey, I hit
a few curbs, ran over a fire hydrant, hit a squirrel
(that "might" have been on purpose, I ain't saying) but
I got there eventually!

-T

Keith Thompson

Dec 29, 2019, 4:57:29 PM
T <T...@invalid.invalid> writes:
> On 2019-12-29 01:45, Keith Thompson wrote:
>> Jorgen Grahn <grahn...@snipabacken.se> writes:
>> [...]
>>> From a C perspective, a string is a pointer to the first char in a
>>> sequence, and the string ends with a char 0 marker. That's probably
>>> the one your Windows API wants, if it's a C API like someone wrote.
>>
>> No, from a C perspective "A string is a contiguous sequence of
>> characters terminated by and including the first null character."
>
> Do I need the chr(0) at the end?

I presume that by "chr(0)" you mean the char value '\0', also known as
the null character. Your meaning is reasonably clear, but there is no
"chr" function in C or C++.

Yes, by definition a "string" includes the terminating null character.
If there's no null character, it's not a string.

Keith Thompson

Dec 29, 2019, 4:57:32 PM
T <T...@invalid.invalid> writes:
[...]
> I figured out through experimentation that Native Call
> automatically adds the chr(0) on to the end of a CString.
>
> And that sending Native Call a zero gets automatically
> changed into a NULL. Hmmm, I wonder what happens
> if you really want a numerical zero. That will be a
> test for another day.
>
> Thank you all for all the wonderful help!

You're talking about Raku/Perl6, right?

Things you find out through experimentation aren't always reliable.
It's possible that the Raku native call interface (which I haven't
looked at) doesn't distinguish between a numerical zero and a null
pointer. If that's the case, it could work with any C++ implementation
that represents a null pointer as all-bits-zero. That's likely to be
all existing C++ implementations, or at least all C++ implementations
that Raku might interface with. It's not guaranteed by the C++
standard, though. A literal 0 is a *null pointer constant*, but that
doesn't imply that a *null pointer* is represented as all-bits-zero.
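
The difference between a literal zero, a null pointer and the bit pattern of a null pointer can be probed with a small C++ sketch. `kind` and `null_is_all_bits_zero` are hypothetical helpers. The second one reports what this particular implementation does, which, as noted above, the standard does not fix.

```cpp
#include <cassert>
#include <cstring>

// Overloads that report which kind of argument they received.
const char* kind(int) { return "integer"; }
const char* kind(const void*) { return "pointer"; }

// True if a null pointer's object representation is all-bits-zero on
// this implementation. Common in practice, not required by the standard.
bool null_is_all_bits_zero() {
    const void* p = nullptr;
    unsigned char zeros[sizeof p] = {};
    return std::memcmp(&p, zeros, sizeof p) == 0;
}

// Usage: a literal 0 selects the int overload, nullptr the pointer one,
// yet a pointer initialised from the literal 0 is still a null pointer.
void demo_null() {
    assert(std::strcmp(kind(0), "integer") == 0);
    assert(std::strcmp(kind(nullptr), "pointer") == 0);
    const int* q = 0;
    assert(q == nullptr);
}
```

A foreign-function layer that maps an integer zero to a null pointer is relying on the all-bits-zero representation that `null_is_all_bits_zero` tests for.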

Keith Thompson

Dec 29, 2019, 4:57:37 PM
What exactly do you mean by a "C++ string"? Are you asking about the
std::string class defined in the <string> header?

If so, what about the construction of a std::string object are you
asking about? Most of the internal details are unspecified.

C++ also has C-style strings, referred to in the standard as
"null-terminated byte strings" (and often referred to as "C-strings"
or something similar).

T

Dec 29, 2019, 5:58:02 PM
On 2019-12-29 13:57, Keith Thompson wrote:
> I presume that by "chr(0)" you mean the char value '\0', also known as
> the null character. Your meaning is reasonably clear, but there is no
> "chr" function in C or C++.
>
> Yes, by definition a "string" includes the terminating null character.
> If there's no null character, it's not a string.

It soaked in. Thank you!

T

Dec 29, 2019, 6:00:32 PM
On 2019-12-29 13:57, Keith Thompson wrote:
> T <T...@invalid.invalid> writes:
> [...]
>> I figured out through experimentation that Native Call
>> automatically adds the chr(0) on to the end of a CString.
>>
>> And that sending Native Call a zero gets automatically
>> changed into a NULL. Hmmm, I wonder what happens
>> if you really want a numerical zero. That will be a
>> test for another day.
>>
>> Thank you all for all the wonderful help!
>
> You're talking about Raku/Perl6, right?

Yes. Once I understood what was going on in C++,
I was able to get my code working.

> Things you find out through experimentation aren't always reliable.
> It's possible that the Raku native call interface (which I haven't
> looked at) doesn't distinguish between a numerical zero and a null
> pointer. If that's the case, it could work with any C++ implementation
> that represents a null pointer as all-bits-zero. That's likely to be
> all existing C++ implementations, or at least all C++ implementations
> that Raku might interface with. It's not guaranteed by the C++
> standard, though. A literal 0 is a *null pointer constant*, but that
> doesn't imply that a *null pointer* is represented as all-bits-zero.

The Raku Native Call documentation is so, so bad
that your only choice is experimentation.

NULL was a nightmare for me to figure out. Once I did,
it was so simple it made my head spin.

Thank you again for the wonderful teaching!
-T

T

Dec 29, 2019, 6:02:37 PM
On 2019-12-29 13:57, Keith Thompson wrote:
> T <T...@invalid.invalid> writes:
>> On 2019-12-29 01:33, Jorgen Grahn wrote:
>>> You have "C++ string" in the subject line, but this question seems to
>>> be as offtopic as the other recent ones from you. I don't see how to
>>> help you with this problem, from a comp.lang.c++ point of view.
>>
>> What I need to know is the construction of a C++ string
>
> What exactly do you mean by a "C++ string"? Are you asking about the
> std::string class defined in the <string> header?
>
> If so, what about the construction of a std::string object are you
> asking about? Most of the internal details are unspecified.
>
> C++ also has C-style strings, referred to in the standard as
> "null-terminated byte strings" (and often referred to as "C-strings"
> or something similar).
>

Hi Keith,

I figured it out.

Native Call is converting to C anyway. This is from my
Native Call notes:

3) a “C” string is an array of characters terminated with chr(0)

4) Kernel32.dll calls with a “W” in them use UTF16 `uint16` values
in their strings. The rest you can use UTF8 `uint8`. Most of
the time they are interchangeable if you are using standard
characters.

Use the following to convert a Raku string, `abcdefg` below,
into a “C” string:

UTF8: my $L = CArray[uint8].new("abcdefg".encode.list);
UTF16: my $M = CArray[uint16].new("abcdefg".encode.list);

Native Call tacks the chr(0) on at the end for you.



Thank you again for all the wonderful teaching!

-T

Paavo Helde

Dec 29, 2019, 6:28:44 PM
On 30.12.2019 1:02, T wrote:

> 4) Kernel32.dll calls with a “W” in them use UTF16 `uint16` values
> in their strings. The rest you can use UTF8 `uint8`. Most of
> the time they are interchangeable if you are using standard
> characters.

UTF-8 and UTF-16 can represent the exact same set of characters - the
whole Unicode repertoire.

Alas, Windows kernel32.dll SDK functions ending with "A" do NOT use
UTF-8 (unless you have a very new Windows 10 build and have managed to
set your process code page to UTF-8; see
"https://docs.microsoft.com/en-us/windows/uwp/design/globalizing/use-utf8-code-page").

Of course, if ASCII is all that matters to you there would be no
difference. Note however this might change the moment the important
customer insists on his spelling of "naïve" or "coöperative".
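
A small C++ sketch of where the two encodings stop lining up. The names `naive_utf8`, `naive_utf16`, `is_ascii` and `demo_naive` are invented for this example, and the text is written with explicit escapes so that it does not depend on the source file's encoding.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// "naïve" spelled out as explicit code units. U+00EF (ï) takes two bytes
// in UTF-8 (0xC3 0xAF) but a single 16-bit unit in UTF-16.
const std::string naive_utf8 = "na\xC3\xAF" "ve";
const std::u16string naive_utf16 = u"na\u00EFve";

// True if every byte is below 0x80, i.e. the text is plain ASCII and its
// byte values match the UTF-16 unit values one for one.
bool is_ascii(std::string_view s) {
    for (unsigned char c : s) {
        if (c >= 0x80) return false;
    }
    return true;
}

// Usage: five characters, six UTF-8 bytes, five UTF-16 code units.
void demo_naive() {
    assert(naive_utf8.size() == 6);
    assert(naive_utf16.size() == 5);
    assert(is_ascii("abcdefg"));
    assert(!is_ascii(naive_utf8));
}
```

For "abcdefg" the byte values and the 16-bit unit values are identical, which is why the two conversions looked interchangeable. They are not for "naïve".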


Sam

Dec 29, 2019, 9:47:25 PM
Open your C++ book to the part that explains how to use the "std::string"
class, and read it.
