Why must it be 8-bit clean? Is that a limitation/feature of the
String() class? Will the String treat it as UTF-8 automatically if it
sees a high bit? (Some API docs would be nice! hint, hint!)
In my case i'm working on an i/o library which of course treats the
data as opaque (void*). If i understand you correctly, if it happens
to read something with a high bit set then the data it passes back to
the caller (via a String instance) is effectively undefined (or, at
least not guaranteed to be the same bits that the input device read)?
Do i need to document that handling data with non-ASCII chars
essentially leads to undefined results? (Not a huge deal, IMO, for JS
code, as i can't imagine people doing much binary i/o with it, but i'd
like to document it if it's not going to work as expected.)
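For concreteness, here's roughly the round trip i'm worried about (an
untested sketch against the v8 API under discussion; RoundTrip and the
buffer names are mine):

#include <v8.h>
using namespace v8;

// Sketch: pass raw device bytes through a JS String and back.
// read_buf/read_len stand in for whatever the input device returned.
Handle<String> RoundTrip(const char* read_buf, int read_len) {
  HandleScope scope;
  // If read_buf contains bytes with the high bit set, is `s`
  // guaranteed to hold the same bits? (That's my question.)
  Local<String> s = String::New(read_buf, read_len);
  // Converting back to bytes on the way out:
  String::Utf8Value out(s);
  // (*out, out.length()) vs. (read_buf, read_len) -- same bits?
  return scope.Close(s);
}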
> If you make it a UC16 string then it has to have an even byte length.
Are there any docs on handling 2-byte strings in v8, or is this a
"must be done by implementations using ExternalStringRepresentation"
feature? Could/Should i potentially use ExternalStringRepresentation
as an internal buffer for the data, rather than an External-to-void*
(which can't be dereferenced by the caller)?
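(Assuming ExternalStringRepresentation means the
String::ExternalStringResource class in v8.h, i'm picturing something
like this untested sketch, with MyBuffer being my own name:)

#include <v8.h>
#include <string.h>
using namespace v8;

// Sketch: keep the UC16 data in a buffer we own and hand v8 an
// external view of it. The data must stay alive and immutable for
// the lifetime of the string; v8 deletes the resource when the
// string is eventually collected (whenever that may be).
class MyBuffer : public String::ExternalStringResource {
 public:
  MyBuffer(const uint16_t* src, size_t len)
      : data_(new uint16_t[len]), length_(len) {
    memcpy(data_, src, len * sizeof(uint16_t));
  }
  virtual ~MyBuffer() { delete[] data_; }
  virtual const uint16_t* data() const { return data_; }
  virtual size_t length() const { return length_; }
 private:
  uint16_t* data_;
  size_t length_;
};

Local<String> WrapBuffer(const uint16_t* src, size_t len) {
  return String::NewExternal(new MyBuffer(src, len));
}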
> So the status is that there isn't any good way to store binary data in
> JS at the moment. Of course it is possible to put the data in an
> external object instead.
That's an idea. Didn't think of that. It'd mean (in my case) holding
onto arbitrarily large read buffers, and since v8 doesn't guarantee GC
will ever be called, i don't want to risk it causing an
arbitrarily-sized leak.
:)
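(Sketching the external-object route for my own notes; untested, and
WrapPointer/UnwrapPointer are my names:)

#include <v8.h>
using namespace v8;

// Sketch: stash the native buffer pointer in an object's internal
// field instead of in a string. `buf` stands in for the i/o
// library's own buffer.
Local<Object> WrapPointer(void* buf) {
  HandleScope scope;
  Local<ObjectTemplate> tmpl = ObjectTemplate::New();
  tmpl->SetInternalFieldCount(1);
  Local<Object> obj = tmpl->NewInstance();
  obj->SetInternalField(0, External::New(buf));
  return scope.Close(obj);
}

void* UnwrapPointer(Handle<Object> obj) {
  Local<External> ext = Local<External>::Cast(obj->GetInternalField(0));
  return ext->Value();
}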
--
----- stephan beal
http://wanderinghorse.net/home/stephan/
Aha, okay, i wasn't clear on the automatic assumption of utf8. Fair enough.
> So it will assume that it is UTF-8 if it is not ASCII. Not all binary
> sequences are valid UTF-8 so you can't use this for binary data.
> Internally, V8 does not use UTF-8 so this data will be converted to
> UC16.
Doh, and here all along i assumed utf8 was what WAS used, as the API
has Utf8Value but no Utf16Value.
> /** Allocates a new string from utf16 data.*/
> static Local<String> New(const uint16_t* data, int length = -1);
>
> This one takes 16 bit characters and can represent binary data with no
> corruption, but the length is in characters, so you can't use it for
> an odd number of bytes.
What's the byte order?
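e.g. if i were to pack two raw bytes per code unit like this (untested
sketch; PackBytes is my own helper), i'd guess the explicit shifts keep
the packing itself endian-neutral, but i'm unsure about v8's internal
storage:

#include <v8.h>
#include <stdint.h>
#include <vector>
using namespace v8;

// Sketch: pack two raw bytes into each UC16 code unit. An odd
// trailing byte would need padding plus out-of-band length info.
Local<String> PackBytes(const unsigned char* bytes, size_t len) {
  if (len == 0 || (len % 2) != 0) return String::Empty();
  std::vector<uint16_t> units(len / 2);
  for (size_t i = 0; i < units.size(); ++i) {
    // Shifts (rather than memcpy) make the packing independent of
    // host byte order:
    units[i] = (uint16_t(bytes[2 * i]) << 8) | bytes[2 * i + 1];
  }
  return String::New(&units[0], int(units.size()));
}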
>> In my case i'm working on an i/o library which of course treats the
>> data as opaque (void*). If i understand you correctly, if it happens
...
> Giving binary data to the above New method will result in undefined behaviour.
Fair enough.
> The external strings must have their data either in ASCII or in UC16.
> There's no Latin1 and undefined stuff will result if you try. In the
> case of an external string the actual string data is not on the V8
> heap. It is assumed to be immutable too of course since all JS
> strings are immutable.
That wouldn't solve my case, which is effectively latin1. i'll need
to think about that (but i don't mind living with the limitation of
ascii-only read/write).
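(Thinking out loud: since latin1 code points 0..255 are exactly the
first 256 unicode code points, i could presumably widen each byte to a
UC16 unit losslessly; untested sketch, FromLatin1 is my own name:)

#include <v8.h>
#include <stdint.h>
#include <vector>
using namespace v8;

// Sketch: zero-extend each latin1 byte to a 16-bit code unit.
Local<String> FromLatin1(const unsigned char* bytes, size_t len) {
  if (len == 0) return String::Empty();
  std::vector<uint16_t> units(len);
  for (size_t i = 0; i < len; ++i)
    units[i] = bytes[i];  // latin1 == unicode for 0..255
  return String::New(&units[0], int(len));
}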
>> That's an idea. Didn't think of that. It'd mean (in my case) holding
>> onto arbitrarily large read buffers, and since v8 doesn't guarantee GC
>> will ever be called, i don't want to risk it causing an
>> arbitrarily-sized leak.
>
> If the data is on the V8 heap then it won't be collected without a GC either. :)
But even if i registered it for gc via a weak pointer callback, it's
not guaranteed to be freed, so i'm forced to add external gc to it in
*any* case and have the client call the cleanup routine when their
context dies (this is currently handled via a sentry object in the
client app which cleans up when it goes out of scope).
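For reference, the weak-callback setup i mean looks something like this
(untested sketch against the Persistent API of this v8 vintage;
FreeNativeBuffer is hypothetical):

#include <v8.h>
using namespace v8;

// Hypothetical cleanup for the native side of a wrapped buffer.
static void FreeNativeBuffer(void* buf);

// v8 *may* invoke this when the object becomes unreachable, but
// nothing guarantees it ever runs.
static void OnWeak(Persistent<Value> object, void* parameter) {
  FreeNativeBuffer(parameter);
  object.Dispose();
}

Persistent<Object> MakeWeakWrapper(Handle<Object> obj, void* buf) {
  Persistent<Object> handle = Persistent<Object>::New(obj);
  handle.MakeWeak(buf, OnWeak);
  return handle;
}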
Big correction: i've GOT guaranteed dtors...
Well, when a context closes, "the context is done with it," so it should tell us.
i don't want to fudge the memory size just to force gc. That's way too
vague and kludgy, and there's no guarantee the user hasn't set some
arbitrarily small limit which i might, through my abuse of the gc
engine, violate.
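(The fudging i mean is the external-memory hint; sketch:)

#include <v8.h>

// Tell v8 that external memory hangs off our objects so it feels
// more GC pressure. It's only a hint: it still doesn't guarantee
// collection, and overstating the size could trip whatever heap
// limits the embedding app has configured.
void NoteBufferAllocated(int nbytes) {
  v8::V8::AdjustAmountOfExternalAllocatedMemory(nbytes);
}

void NoteBufferFreed(int nbytes) {
  v8::V8::AdjustAmountOfExternalAllocatedMemory(-nbytes);
}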
A v8 byte array type would be nice (one which had, say, some toString()
methods for various encodings). I know Google Gears has binary Blobs,
XMLHttpRequest Level 2 has a ByteArray (but doesn't specify what it
would do), and I think AIR and various other systems (?) also have
one. It seems like a reasonable thing to have in v8.
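In the meantime one could approximate a byte array with indexed
property interceptors (untested sketch; ByteArrayData and the helper
names are my own inventions):

#include <v8.h>
using namespace v8;

// Hypothetical backing store for a faked byte-array object.
struct ByteArrayData { unsigned char* bytes; uint32_t length; };

static ByteArrayData* GetData(const AccessorInfo& info) {
  Local<External> ext =
      Local<External>::Cast(info.Holder()->GetInternalField(0));
  return static_cast<ByteArrayData*>(ext->Value());
}

static Handle<Value> ByteGetter(uint32_t index, const AccessorInfo& info) {
  ByteArrayData* d = GetData(info);
  if (index >= d->length) return Handle<Value>();  // not intercepted
  return Integer::New(d->bytes[index]);
}

static Handle<Value> ByteSetter(uint32_t index, Local<Value> value,
                                const AccessorInfo& info) {
  ByteArrayData* d = GetData(info);
  if (index < d->length)
    d->bytes[index] = static_cast<unsigned char>(value->Int32Value());
  return value;
}

Local<Object> NewByteArray(ByteArrayData* d) {
  HandleScope scope;
  Local<ObjectTemplate> tmpl = ObjectTemplate::New();
  tmpl->SetInternalFieldCount(1);
  tmpl->SetIndexedPropertyHandler(ByteGetter, ByteSetter);
  Local<Object> obj = tmpl->NewInstance();
  obj->SetInternalField(0, External::New(d));
  return scope.Close(obj);
}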