String::WriteAscii/WriteUtf8 counting the \0

Chris Angelico

Jul 4, 2011, 12:41:40 AM
to v8-u...@googlegroups.com
String::Write and String::WriteAscii count characters (in the latter
case, that's the same as bytes) written, not counting the null
terminator. But String::WriteUtf8 counts bytes, and includes the null
terminator. I assume there is a reason for this anomaly, since it's
explicit in the documentation as well as being visible in the source.
Can someone fill me in? I've been puzzling over this for a while,
which is largely my own fault for not reading the docs!

Chris Angelico

Charles A. Lopez

Jul 4, 2011, 10:56:59 AM
to v8-u...@googlegroups.com
Can you provide the prototypes for the two methods you've listed?

Thank you,
Charles


Chris Angelico

Jul 4, 2011, 11:35:11 AM
to v8-u...@googlegroups.com
On Tue, Jul 5, 2011 at 12:56 AM, Charles A. Lopez
<charle...@gmail.com> wrote:
> Can you provide the prototypes for the two methods you've listed.

I'm not at the computer where I do that coding right now, but Google
pointed me to this page which has the prototypes:

http://bespin.cz/~ondras/html/classv8_1_1String.html

V8EXPORT int WriteAscii (char *buffer, int start=0, int length=-1, WriteHints hints=NO_HINTS) const
V8EXPORT int WriteUtf8 (char *buffer, int length=-1, int *nchars_ref=NULL, WriteHints hints=NO_HINTS) const

Returns:
The number of characters copied to the buffer excluding the null
terminator. For WriteUtf8: The number of bytes copied to the buffer
including the null terminator.

I don't really care about the number of characters, only the number of
bytes, which I then further process.
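
To make the difference concrete, here is a minimal sketch in plain C++ that mimics the two documented return conventions. The names write_excluding_nul and write_including_nul are hypothetical stand-ins, not V8 functions; they only model what the quoted documentation says about the return values.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical stand-in modelling the WriteAscii convention quoted above:
// copies the text plus a '\0' and returns the count WITHOUT the terminator.
int write_excluding_nul(const std::string& src, char* buffer) {
    assert(buffer != nullptr);
    std::memcpy(buffer, src.c_str(), src.size() + 1);  // payload + '\0'
    return static_cast<int>(src.size());
}

// Hypothetical stand-in modelling the WriteUtf8 convention quoted above:
// the same bytes are written, but the returned count INCLUDES the terminator.
int write_including_nul(const std::string& src, char* buffer) {
    assert(buffer != nullptr);
    std::memcpy(buffer, src.c_str(), src.size() + 1);  // payload + '\0'
    return static_cast<int>(src.size()) + 1;
}
```

For the same three-character input the first model reports 3 and the second reports 4, although both leave identical bytes in the buffer.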

Chris Angelico

Henrik Lindqvist

Jul 4, 2011, 1:19:20 PM
to v8-users
I've noticed this too. It always writes and counts the null terminator,
even when HINT_MANY_WRITES_EXPECTED is specified. This makes WriteUtf8
somewhat useless for encoding a string into a binary buffer where you
don't want null termination, e.g. for socket transfers.

Chris Angelico

Jul 4, 2011, 1:25:33 PM
to v8-u...@googlegroups.com
On Tue, Jul 5, 2011 at 3:19 AM, Henrik Lindqvist
<henrik.l...@gmail.com> wrote:
> I've noticed this too. It always write and counts the null-terminator,
> even when specifying HINT_MANY_WRITES_EXPECTED. This makes WriteUtf8
> somewhat useless when used to encode a string into a binary buffer
> where you don't want null-termination, i.e. sockets transfer.

Well, it's wasting some effort. You just have to decrement the count
that you get back. It's not impossible to deal with, but it just has
that anomalous feeling of 'struct tm' from the C standard library -
day of month is 1-31, but month of year is 0-11. The true anomaly is
the day of month, which ought to be 0-30 to parallel the
hour/minute/second, but civil time is usually displayed starting from
1 in the day and month, and it feels weird to have it come out
differently. WriteUtf8 can logically be explained as returning the
count of bytes written, but 'most every other function that does this
sort of job won't count the null.

Chris Angelico

Charles A. Lopez

Jul 4, 2011, 1:32:54 PM
to v8-u...@googlegroups.com
The question then becomes...

How many bytes per Ascii character? 




Chris Angelico

Jul 4, 2011, 1:42:48 PM
to v8-u...@googlegroups.com
On Tue, Jul 5, 2011 at 3:32 AM, Charles A. Lopez
<charle...@gmail.com> wrote:
> The question then becomes...
> How many bytes per Ascii character?

One ASCII character fits in one byte. One Unicode character doesn't,
and encoded as UTF-8, might take between one and three bytes. The null
terminator takes one byte, of course.

ChrisA

Stephan Beal

Jul 4, 2011, 1:56:09 PM
to v8-u...@googlegroups.com
On Mon, Jul 4, 2011 at 7:42 PM, Chris Angelico <ros...@gmail.com> wrote:
> One ASCII character fits in one byte. One Unicode character doesn't,
> and encoded as UTF-8, might take between one and three bytes. The null
> terminator takes one byte, of course.

Slight correction: one to four bytes.

http://en.wikipedia.org/wiki/UTF-8

ASCII text is, by definition, also UTF-8, so to say that a Unicode character doesn't use 1 byte isn't strictly correct.

--
----- stephan beal
http://wanderinghorse.net/home/stephan/

Henrik Lindqvist

Jul 4, 2011, 4:40:52 PM
to v8-users
It's more serious than just a little "quirk". Many binary protocols use
Pascal-type strings, where the length is stored explicitly, and there
String::WriteUtf8 can't be used. V8 should at least skip writing \0
when HINT_MANY_WRITES_EXPECTED is specified; that would be logical.
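
For illustration, a length-prefixed field can be assembled from a buffer whose reported count includes the terminator, at the cost of one adjustment and one spare byte in the source buffer. This is only a sketch: frame_length_prefixed, the one-byte length prefix and the 255-byte limit are assumptions made for the example, not part of V8 or of any particular protocol.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds [length byte][payload] from a buffer filled by a writer that
// reports its byte count INCLUDING the trailing '\0' (the WriteUtf8
// convention discussed in this thread). Hypothetical helper.
std::vector<std::uint8_t> frame_length_prefixed(const char* buffer,
                                                int written_with_nul) {
    assert(buffer != nullptr);
    assert(written_with_nul >= 1);               // at least the terminator
    const int payload = written_with_nul - 1;    // drop the counted '\0'
    assert(payload <= 255);                      // must fit the 1-byte prefix

    std::vector<std::uint8_t> frame;
    frame.reserve(static_cast<std::size_t>(payload) + 1);
    frame.push_back(static_cast<std::uint8_t>(payload));
    for (int i = 0; i < payload; ++i) {
        frame.push_back(static_cast<std::uint8_t>(buffer[i]));
    }
    return frame;
}
```

The terminator is still written into the source buffer here, so the extra byte and the extra copy remain; that overhead is exactly what the complaint above is about.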

On Jul 4, 7:25 pm, Chris Angelico <ros...@gmail.com> wrote:
> On Tue, Jul 5, 2011 at 3:19 AM, Henrik Lindqvist
>

Chris Angelico

Jul 4, 2011, 5:53:22 PM
to v8-u...@googlegroups.com
On Tue, Jul 5, 2011 at 3:56 AM, Stephan Beal <sgb...@googlemail.com> wrote:
> Slight correction: one to four bytes.
> http://en.wikipedia.org/wiki/UTF-8
> ASCII text is, by definition, also UTF-8, so to say that a Unicode character
> is doesn't use 1 byte isn't strictly correct.

Sorry, my bad. I don't know why I said three; probably a consequence
of posting at 3AM. Three bytes covers the BMP, four bytes will cover
all currently-defined Unicode codepoints. Not significant at the
moment, though.
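
The byte counts follow directly from the code point ranges defined by UTF-8. A small sketch in plain C++ (utf8_length is a name chosen for this example, not a V8 function):

```cpp
#include <cassert>
#include <cstdint>

// Number of bytes UTF-8 needs to encode one Unicode code point.
// Illustrative helper; surrogate values (U+D800..U+DFFF) are not rejected.
int utf8_length(std::uint32_t code_point) {
    assert(code_point <= 0x10FFFF);          // upper limit of Unicode
    if (code_point <= 0x7F)   return 1;      // ASCII
    if (code_point <= 0x7FF)  return 2;
    if (code_point <= 0xFFFF) return 3;      // rest of the BMP
    return 4;                                // supplementary planes
}
```

So a buffer sized at three bytes per character is only safe if the text stays within the BMP.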


On Tue, Jul 5, 2011 at 6:40 AM, Henrik Lindqvist
<henrik.l...@gmail.com> wrote:
> Its more serious than a just little "quirk". Many binary protocols use
> Pascal type strings where the length is stored explicitly, then
> String::WriteUtf8 can't be used. V8 should atleast skip writing \0
> when HINT_MANY_WRITES_EXPECTED is specified, that would be logical.

The trouble is, any code written now will expect it to include the \0
in the count. Would it suit to add an additional hint, e.g.
HINT_NO_NULL_TERMINATOR, which will then (a) not write the null, and
(b) not include it in the count?
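
The proposed behaviour can be sketched roughly as follows. This is a model in plain C++, not the V8 source: write_bytes_model, kNoOptions and kNoNullTerminator are placeholder names invented for the illustration.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Placeholder option bits for this sketch (not V8's real enum).
enum WriteOptionsModel {
    kNoOptions = 0,
    kNoNullTerminator = 1
};

// Copies src into buffer. By default it appends '\0' and counts it, which
// is the existing WriteUtf8 behaviour described in this thread. With
// kNoNullTerminator it (a) writes no '\0' and (b) leaves it out of the count.
int write_bytes_model(const std::string& src, char* buffer, int options) {
    assert(buffer != nullptr);
    std::memcpy(buffer, src.data(), src.size());
    int written = static_cast<int>(src.size());
    if ((options & kNoNullTerminator) == 0) {
        buffer[written] = '\0';
        ++written;
    }
    return written;
}
```

Because the default path is unchanged, callers that already subtract one from the count would keep working; only callers that opt in see the new count.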

It'll be a fairly simple change. I could make it when I get to work in
an hour or so, and submit a patch. Where are such things handled?

Chris Angelico

Chris Angelico

Jul 4, 2011, 7:41:49 PM
to v8-u...@googlegroups.com
On Tue, Jul 5, 2011 at 7:53 AM, Chris Angelico <ros...@gmail.com> wrote:
> It'll be a fairly simple change. I could make it when I get to work in
> an hour or so, and submit a patch. Where are such things handled?

Changes made and being tested. Moving this thread to the v8-dev list
which is more appropriate.

Thanks for the advice!

Chris Angelico

Stephan Beal

Jul 5, 2011, 1:37:16 AM
to v8-u...@googlegroups.com
On Tue, Jul 5, 2011 at 1:41 AM, Chris Angelico <ros...@gmail.com> wrote:
> On Tue, Jul 5, 2011 at 7:53 AM, Chris Angelico <ros...@gmail.com> wrote:
> > It'll be a fairly simple change. I could make it when I get to work in
> > an hour or so, and submit a patch. Where are such things handled?
>
> Changes made and being tested. Moving this thread to the v8-dev list
> which is more appropriate.

And please don't forget to update the docs to reflect the new behaviour! ;)
 

Chris Angelico

Jul 5, 2011, 1:47:12 AM
to v8-u...@googlegroups.com
On Tue, Jul 5, 2011 at 3:37 PM, Stephan Beal <sgb...@googlemail.com> wrote:
> And please don't forget to update the docs to reflect the new behaviour! ;)

If by "docs" you mean the Doxygen comments in v8.h, the patch I posted
on the bugtracker does update that. Are there other docs?

ChrisA

Stephan Beal

Jul 5, 2011, 1:50:05 AM
to v8-u...@googlegroups.com
LOL! i was just wishfully thinking out loud. i assume there's a line or two of API docs, which is how (i assume) you came across the discrepancy. 

Happy Hacking!

Chris Angelico

Jul 5, 2011, 1:51:39 AM
to v8-u...@googlegroups.com
On Tue, Jul 5, 2011 at 3:50 PM, Stephan Beal <sgb...@googlemail.com> wrote:
>
> LOL! i was just wishfully thinking out loud. i assume there's a line or two
> of API docs, which is how (i assume) you came across the discrepancy.
> Happy Hacking!

The API docs that you can find splattered across the internet are all
generated from the source, and differ only in how up-to-date they are.

Open source software: Where you have the power to read the source, but
also the need to.

ChrisA

Henrik Lindqvist

Jul 5, 2011, 12:53:27 PM
to v8-users
HINT_NO_NULL_TERMINATOR as you describe would be the perfect solution.
Of course, it should apply to all the String::Write* methods. The
source is located in src/api.cc.

Chris Angelico

Jul 5, 2011, 6:32:52 PM
to v8-u...@googlegroups.com
On Wed, Jul 6, 2011 at 2:53 AM, Henrik Lindqvist
<henrik.l...@gmail.com> wrote:
> HINT_NO_NULL_TERMINATOR as you describe would be the perfect solution,
> of course it should apply to all the String::Write* methods. The
> source is located in src/api.cc.

I ended up calling it WRITE_NO_NULL_TERMINATOR and renaming 'hints' to
'options'; patch is here:
http://code.google.com/p/v8/issues/detail?id=1537

ChrisA

koichik

Jul 11, 2011, 5:30:37 AM
to v8-users
Hi,

Could you add one more hint?
v8::String::WriteAscii converts '\0' to space (0x20).

http://groups.google.com/group/v8-users/browse_thread/thread/2b3c9fedbe35fd3f

I want a hint to keep `\0`. How about it?
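
As a rough model of the request, the sketch below reproduces the reported conversion and adds a switch to turn it off. It is plain C++, not V8 code: copy_ascii_model and its keep_nul parameter are hypothetical names used only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Models the behaviour reported above: embedded '\0' characters become
// spaces (0x20) unless keep_nul is set. keep_nul stands for the requested,
// hypothetical switch.
std::size_t copy_ascii_model(const std::string& src, char* buffer,
                             bool keep_nul) {
    assert(buffer != nullptr);
    for (std::size_t i = 0; i < src.size(); ++i) {
        const char c = src[i];
        buffer[i] = (c == '\0' && !keep_nul) ? ' ' : c;
    }
    return src.size();  // characters copied; no terminator is added here
}
```

With keep_nul set, the caller has to rely on the returned length instead of scanning for a terminator, since the data itself may contain '\0'.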

On Jul 6, 7:32 am, Chris Angelico <ros...@gmail.com> wrote:
> On Wed, Jul 6, 2011 at 2:53 AM, Henrik Lindqvist
>

Chris Angelico

Jul 11, 2011, 5:51:03 AM
to v8-u...@googlegroups.com
On Mon, Jul 11, 2011 at 7:30 PM, koichik <koi...@improvement.jp> wrote:
> Hi,
>
> Could you add more hint?
> v8::String::WriteAscii converts '\0' to space (0x20).

I'm not at the computer where I do V8 work, but you could fairly
easily implement that yourself. You may wish to apply my patch first,
and call yours an option rather than a hint.

ChrisA
