Dealing with strings with NULL characters in the middle

Chicken Mcnuggets

Nov 25, 2014, 2:47:52 PM
I'm writing an SCGI server and it receives strings in this form:

"70:"
"CONTENT_LENGTH" <00> "27" <00>
"SCGI" <00> "1" <00>
"REQUEST_METHOD" <00> "POST" <00>
"REQUEST_URI" <00> "/deepthought" <00>
","
"What is the answer to life?"

As you can see, it has NULL characters separating the headers and values
from each other. The std::string constructor I am using takes a
const char *, which it treats as a NULL-terminated C string, so the
string is truncated at the first NULL character.

There must be some way to get past this since SCGI is not an unpopular
protocol.

Any help is appreciated.

Christopher Pisz

Nov 25, 2014, 3:02:03 PM
For something like this, usually the first step is to see if there is an
existing library for the protocol already. I did a quick google search
and there are several. Why reinvent something someone else has already done?

Then it's a matter of narrowing it down by how good the documentation
and examples are, if there are resources for questions, etc.

If I absolutely had to implement it myself, I'd treat the data as bytes
rather than strings until the parts were separated. Perhaps a
std::vector<char> contains the received data. Then you search for
delims, like NULLs in this case, and separate out the specific part.
You'd probably want a class structure for the entire response and some
map of name and values inside. I don't know anything about the scgi
protocol though.
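
A minimal sketch of the byte-oriented approach described above, assuming the
header block has already been cut out of the netstring and consists only of
name<NUL>value<NUL> pairs. The function name parse_headers is hypothetical
and not taken from any SCGI library.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: splits "name<NUL>value<NUL>name<NUL>value<NUL>..."
// into a name -> value map. Parsing stops at the first incomplete pair.
std::map<std::string, std::string>
parse_headers(const std::vector<char>& bytes)
{
    std::map<std::string, std::string> headers;
    auto pos = bytes.begin();
    const auto end = bytes.end();

    while (pos != end) {
        // A header name runs up to the next NUL byte.
        const auto name_end = std::find(pos, end, '\0');
        if (name_end == end) break;            // name not terminated

        // name_end != end here, so stepping past it stays in range.
        const auto value_begin = name_end + 1;
        assert(value_begin <= end);

        // The value runs from there up to the following NUL byte.
        const auto value_end = std::find(value_begin, end, '\0');
        if (value_end == end) break;           // value not terminated

        headers[std::string(pos, name_end)] =
            std::string(value_begin, value_end);
        pos = value_end + 1;                   // at most equal to end
    }
    return headers;
}
```

Every iterator advance happens only after a check that a NUL was found, so
the loop cannot walk past the end of the buffer.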

Paavo Helde

Nov 25, 2014, 3:06:00 PM
Chicken Mcnuggets <chi...@mcnuggets.com> wrote in
news:w15dw.1498$fi4.67@fx20.am4:

> I'm writing an SCGI server and it receives strings in this form:
>
> "70:"
> "CONTENT_LENGTH" <00> "27" <00>
> "SCGI" <00> "1" <00>
> "REQUEST_METHOD" <00> "POST" <00>
> "REQUEST_URI" <00> "/deepthought" <00>
> ","
> "What is the answer to life?"
>
> as you can see it has NULL characters separating the headers and values
> from each other. Using the standard constructor for std::string assumes
> a const char * which is a C string and therefore NULL terminated.

This is just a convenience constructor. There are other constructors for
std::string which also take the string length. There is no problem with
embedded zero bytes in std::string, it's a character like any other as far
as std::string is concerned.
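
To illustrate the point, here is a small sketch contrasting the two
constructors. The helper names are made up for this example; the
(pointer, count) constructor itself is standard.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: copies exactly `length` bytes, NULs included,
// using the std::string(const char*, size_t) constructor.
std::string bytes_to_string(const char* data, std::size_t length)
{
    assert(data != nullptr);
    return std::string(data, length);
}

// For comparison: the single-argument constructor treats its input as a
// C string and stops at the first NUL byte.
std::string c_string_to_string(const char* data)
{
    assert(data != nullptr);
    return std::string(data);
}
```

With a buffer holding "SCGI", NUL, "1", NUL, the first helper yields a
7-byte string and the second a 4-byte one. Member functions such as
find('\0') work on the 7-byte string as on any other character.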

Cheers
Paavo

Chicken Mcnuggets

Nov 26, 2014, 6:16:39 AM
There are four primary reasons I am doing this rather than using an
existing library:

1) I'm learning C++ and I need the practice.
2) I want my implementation to play nicely with my async networking
implementation (using epoll etc).
3) I need BSD / MIT / Boost / public domain licensed code.
4) I don't like the implementation of most SCGI libraries I have looked at.

Now the libraries out there may very well play nicely in an async
environment but I have strong requirements for the public interface
exposed by the SCGI library so that it will integrate with the rest of
the system and because the rest of the library is BSD licensed I can't
use GPL or LGPL code in the code base.

I'll give the std::vector<char> a go and see how I get on.

Thanks.

Martijn Lievaart

Nov 26, 2014, 7:55:20 AM
On Wed, 26 Nov 2014 11:16:28 +0000, Chicken Mcnuggets wrote:

> I'll give the std::vector<char> a go and see how I get on.

As others already pointed out, you may want to have a look at the
std::string(const char*, size_t) constructor. You also may want to look
at std::array<char, N> and std::deque<char>.

HTH,
M4

Chicken Mcnuggets

Nov 26, 2014, 9:09:01 AM
Cool. Thanks for the tips.

I'll dig into my books and see what I can come up with.

Chicken Mcnuggets

Nov 27, 2014, 5:36:46 PM
On 25/11/14 20:01, Christopher Pisz wrote:
I've been playing around with this a little bit this evening and can't
get it to work correctly. This is my code so far (I've put it up on
ideone.com because it wraps in a nasty way when posted here directly):

http://ideone.com/TgynzG

Unfortunately it throws an out of range exception. This is the actual error:

unknown location(0): fatal error in "SCGITest": std::out_of_range:
vector::_M_range_check: __n (which is 49) >= this->size() (which is 15)

and I'm stumped. I'm also pretty sure my code is incorrect but haven't
found the right method to parse a std::vector<char> for the correct
values yet.

Anyone got any suggestions? Using iterators and containers seems to be a
bit of a pain at the moment. As far as I can tell this should be fine
since I'm using an iterator which should stop at the end of the vector
but it seems that is not the case.

Chicken Mcnuggets

Nov 27, 2014, 5:40:02 PM
Doh, forgot to add: this is my code for initialising the vector:

const char *c_netstring = "13:SCGI\01\0,Test";
const int string_length = 15;
std::vector<char> netstring {};

for(int i = 0; i < string_length; ++i)
{
    netstring.push_back(c_netstring[i]);
}

Ben Bacarisse

Nov 27, 2014, 6:10:26 PM
Chicken Mcnuggets <chi...@mcnuggets.com> writes:
<snip>
> I've been playing around with this a little bit this evening and can't
> get it to work correctly. This is my code so far (I've put it up on
> ideone.com because it wraps in a nasty way when posted here directly):
>
> http://ideone.com/TgynzG
>
> Unfortunately it throws an out of range exception. This is the actual error:
>
> unknown location(0): fatal error in "SCGITest": std::out_of_range:
> vector::_M_range_check: __n (which is 49) >= this->size() (which is
> 15)

The code increments an iterator in some specific cases. The for loop
then *always* increments it. Are you certain that the condition that
ends the loop (it != raw_scgi_netstring.end()) is going to fire and not
be "skipped"?
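
The hazard can be sketched as follows. The code behind the ideone link is
not reproduced in this thread, so this is an illustrative assumption and not
a reconstruction of it: a scan that sometimes needs to step two positions
forward, written so that no step can jump over end(). The names
advance_bounded and count_visited are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Advances `it` by up to `n` steps, but never past `end`.
template <typename It>
It advance_bounded(It it, It end, std::ptrdiff_t n)
{
    assert(n >= 0);
    const std::ptrdiff_t remaining = std::distance(it, end);
    return std::next(it, std::min(n, remaining));
}

// Hypothetical scan: visits bytes one at a time and, after each NUL,
// skips the byte that follows it. Returns how many bytes were visited.
std::size_t count_visited(const std::vector<char>& buf)
{
    std::size_t visited = 0;
    auto it = buf.begin();
    while (it != buf.end()) {
        ++visited;
        const std::ptrdiff_t step = (*it == '\0') ? 2 : 1;
        it = advance_bounded(it, buf.end(), step);   // the only advance
    }
    return visited;
}
```

If the same logic were written as a for loop with ++it in the header plus an
extra ++it in the body, a NUL in the last position would move the iterator
from end() - 1 to end() + 1, and the test it != buf.end() would never become
false.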

BTW, the code looks to be making heavy work of parsing out the parts of
the string. The std::string class has lots of member functions that can
help with this sort of task.
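
As a rough illustration of that suggestion, the sketch below uses find,
find_first_not_of and substr to strip the netstring framing
("<length>:<payload>,") that wraps the SCGI headers. The function name
netstring_payload is an assumption made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: given "<length>:<payload>,<rest>", returns the
// payload. Throws an exception derived from std::logic_error
// (std::invalid_argument or std::out_of_range) on malformed input.
std::string netstring_payload(const std::string& input)
{
    const std::size_t colon = input.find(':');
    if (colon == std::string::npos) {
        throw std::invalid_argument("missing ':' after length");
    }

    // The prefix must be one or more decimal digits and nothing else.
    const std::string prefix = input.substr(0, colon);
    if (prefix.empty() ||
        prefix.find_first_not_of("0123456789") != std::string::npos) {
        throw std::invalid_argument("length prefix is not a number");
    }
    const unsigned long length = std::stoul(prefix);

    const std::size_t start = colon + 1;
    assert(start <= input.size());

    // The payload must fit and be followed by the closing comma.
    if (length >= input.size() - start || input[start + length] != ',') {
        throw std::invalid_argument("payload truncated or missing ','");
    }
    return input.substr(start, length);   // may contain NUL bytes
}
```

The returned payload can then be split on NUL bytes; whatever follows the
comma is the request body.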

<snip>
--
Ben.

Paavo Helde

Nov 27, 2014, 6:29:21 PM
Chicken Mcnuggets <chi...@mcnuggets.com> wrote in
news:XKNdw.22215$066....@fx16.am4:
> Doh forgot to add this is my code for initialising the vector:
>
> const char *c_netstring = "13:SCGI\01\0,Test";
> const int string_length = 15;
> std::vector<char> netstring {};
>
> for(int i = 0; i < string_length; ++i)
> {
> netstring.push_back(c_netstring[i]);
> }

I understand this is just helper code, but you can achieve the same in a
bit shorter and more reliable way:

const char c_netstring[] = "13:SCGI\01\0,Test";
std::vector<char> netstring
{ std::begin(c_netstring), std::end(c_netstring) };
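
One detail to keep in mind with this form: std::end on a char array points
past the terminating NUL that the compiler appends, so the vector ends up
one element longer than the visible characters. A small sketch of a variant
that drops that terminator, with the hypothetical name literal_bytes:

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Builds a vector from a string literal's array, leaving out the
// terminating NUL that the compiler appends. N counts that terminator.
template <std::size_t N>
std::vector<char> literal_bytes(const char (&literal)[N])
{
    static_assert(N > 0, "a string literal holds at least its terminator");
    return std::vector<char>(std::begin(literal), std::end(literal) - 1);
}
```

Embedded NUL bytes inside the literal are kept; only the final, implicit
one is removed.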

Chicken Mcnuggets

Nov 28, 2014, 8:05:34 AM
On 27/11/14 23:10, Ben Bacarisse wrote:
> Chicken Mcnuggets <chi...@mcnuggets.com> writes:
> <snip>
>> I've been playing around with this a little bit this evening and can't
>> get it to work correctly. This is my code so far (I've put it up on
>> ideone.com because it wraps in a nasty way when posted here directly):
>>
>> http://ideone.com/TgynzG
>>
>> Unfortunately it throws an out of range exception. This is the actual error:
>>
>> unknown location(0): fatal error in "SCGITest": std::out_of_range:
>> vector::_M_range_check: __n (which is 49) >= this->size() (which is
>> 15)
>
> The code increments an iterator in some specific cases. The for loop
> then *always* increments it. Are you certain that the condition that
> ends the loop (it != raw_scgi_netstring.end()) is going to fire and not
> be "skipped"?
>

I've removed all the manual iterator increments in the loop and
replaced them with continue (which in hindsight is what I meant to do
originally anyway).

I still get the same error which is unfortunate. I'm a bit stuck here.
I've changed the iterator to a const iterator just to make sure I'm not
doing anything stupid with it by accident and I still get the out of
range exception. I could loop through it manually but that kinda defeats
the purpose of using C++ in the first place. I might as well just write
C style code for this if I do it that way, which I'm attempting to avoid.

> BTW, the code looks to be making heavy work of parsing out the parts of
> string. The std::string class has lots of member function that can help
> with this sort of task.
>
> <snip>
>

Yeah I started off with std::string and using the const char and string
length constructor but hit the same issue there as I did here with out
of range exceptions. I assumed it was because it was getting confused
with all the NULLs but from the looks of it it was something more than
that since the vector is exhibiting something similar.

I can't keep swapping backwards and forwards. I'll stick with vector
until I can get it working and then I'll think about a string
implementation. I guess I could have a constructor for each so the user
can choose how they want it stored internally if they like. Either way
it can't hurt.

Chicken Mcnuggets

Nov 28, 2014, 11:19:07 AM
On 28/11/14 13:29, Stefan Ram wrote:
> Someone writes:
>> Subject: Dealing with strings with NULL characters in the middle
>
> The ASCII character is spelled »NUL«.
>

Ah. Yes. Thank you for the correction.

Ben Bacarisse

Nov 28, 2014, 11:31:09 AM
Chicken Mcnuggets <chi...@mcnuggets.com> writes:
<snip>
> I still get the same error which is unfortunate. I'm a bit stuck
> here.

It's unlikely anyone can help without seeing the code. Ideally
executable code.

<snip>
--
Ben.

Chicken Mcnuggets

Nov 28, 2014, 11:52:21 AM
Updated source code:

http://ideone.com/3g6fW5

Produces the following output:

Header Name:
Header Value:
Header Name:
Header Value:
Header Name:
Header Value:
Header Name:
Header Value:
Header Name: S
Header Value:
Header Name: SC
Header Value:
Header Name: SCG
Header Value:
Header Name: SCGI
Header Value:
Header Name: SCGI
Header Value:
Header Name:
Header Value:
Header Name:
Header Value:
Header Name:
Header Value:
Header Name:
Header Value:
Header Name:
Header Value:
Header Name:
Header Value:

The code works fine up until the NUL character and then fails for some
reason. It appears that it doesn't see the NUL character as an
individual character and then mucks up when you try and compare to it.

I got rid of the iterators since that wasn't working and the new loop at
least seems to produce some output to help with debugging.

I've just run it through GDB and I think I've found the problem. My
string that I am using is as follows:

const char *c_netstring = "13:SCGI\01\0,Test";

Notice the first NUL character followed by a 1. In GDB it appears that
the std::vector<char> interprets that as a single character (an ASCII 1
character aka SOH or Start of Heading or \001) rather than correctly
interpreting it as a NUL (ASCII 0 or \000) followed by a 1 (ASCII 49 or
\061).

Now I know what the problem is. How do I fix this? I can't put a
separator between the two characters because that would break the
protocol so I'm kinda stuck here.

Chicken Mcnuggets

Nov 28, 2014, 11:53:20 AM
See my other post I just made with results from GDB and an explanation
of what the problem is.

Louis Krupp

Nov 28, 2014, 12:34:14 PM
On Fri, 28 Nov 2014 16:52:09 +0000, Chicken Mcnuggets
<chi...@mcnuggets.com> wrote:

<snip>
>I've just run it through GDB and I think I've found the problem. My
>string that I am using is as follows:
>
>const char *c_netstring = "13:SCGI\01\0,Test";
>
>Notice the first NUL character followed by a 1. In GDB it appears that
>the std::vector<char> interprets that as a single character (an ASCII 1
>character aka SOH or Start of Heading or \001) rather than correctly
>interpreting it as an NUL (ASCII 0 or \000) followed by a 1 (ASCII 49 or
>\061).

My reaction is probably the same as a lot of other folks: I should
have seen that. This isn't a problem with std::vector<char>;
the compiler is handling the octal escape \01 properly, and
std::vector<char> is seeing the resulting byte.

If I'm not mistaken, \1, \01 and \001 should all give the same result,
a byte with a value of 1.

>
>Now I know what the problem is. How do I fix this? I can't put a
>separator between the two characters because that would break the
>protocol so I'm kinda stuck here.

\0001 should give you a nul byte followed by an ASCII '1'.
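
The reason is that an octal escape consumes at most three octal digits, so
in \0001 the escape is \000 and the final 1 is an ordinary character. A
compile-time sketch of that rule follows; the helper name literal_length is
made up for this example.

```cpp
#include <cassert>
#include <cstddef>

// Number of characters in a string literal, not counting the terminating
// NUL that the compiler appends.
template <std::size_t N>
constexpr std::size_t literal_length(const char (&)[N])
{
    return N - 1;
}

// One, two or three octal digits all form a single byte with value 1.
static_assert(literal_length("\1") == 1, "one-digit octal escape");
static_assert(literal_length("\01") == 1, "two-digit octal escape");
static_assert(literal_length("\001") == 1, "three-digit octal escape");

// A fourth digit is no longer part of the escape: NUL, then '1'.
static_assert(literal_length("\0001") == 2, "NUL followed by '1'");

// Adjacent literals are joined after escapes are processed, so this is
// also NUL followed by '1'.
static_assert(literal_length("\0" "1") == 2, "split literal");
```

Applied to the test string from this thread, "13:SCGI\01\0,Test" has 14
characters while "13:SCGI\0001\0,Test" has the intended 15.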

Louis

Chicken Mcnuggets

Nov 28, 2014, 1:20:19 PM
Awesome!

That fixed that issue. Now I just need to fix my parser :P.

Thank you everyone for your help, it is much appreciated.

Paavo Helde

Nov 28, 2014, 5:12:35 PM
Louis Krupp <lkr...@nospam.pssw.com.invalid> wrote in
news:cqbh7at5h614pvecj...@4ax.com:

> \0001 should give you a nul byte followed by an ASCII '1'.

Another, slightly more readable, option would be

const char *c_netstring = "13:SCGI\0" "1\0,Test";

or maybe something like

#define NUL "\0"
const char *c_netstring = "13:SCGI" NUL "1" NUL ",Test";

This defines a C array of 16 characters (the compiler adds one terminating
zero byte automatically).

hth
Paavo

red floyd

Nov 29, 2014, 8:10:02 PM
I like the #define NUL solution. It just comes out as more "readable".
