
The encoding API


Dan Sugalski

Aug 9, 2004, 6:51:44 PM
to perl6-i...@perl.org
These are the details for the encoding API. This is the layer that
mediates between Parrot, which sees strings as a sequence of
codepoints, and the low-level buffer, which is filled with bytes.

Note that the charset layer lives above this one. I haven't finished
that part yet, but I figure it's better to post the finished piece
than to wait even longer. Comments are *very* welcome -- I want
to get this right the first time so we can get it in and stop
worrying about it.

Also note that while all these are presented as functions, they're
really entries in a function table, so translate in your heads
accordingly.
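To make the "entries in a function table" point concrete, here is a minimal sketch of what such a table could look like. Everything named DEMO_* or demo_* is hypothetical: the struct layout, the `bufused` field and the `fixed8` encoding are assumptions for illustration, not Parrot's actual STRING or encoding structures, and only a few of the entries are shown.

```c
#include <assert.h>

typedef unsigned long UINTVAL;  /* stand-in for Parrot's UINTVAL */

struct demo_encoding;

/* Hypothetical minimal string header, not Parrot's real STRING layout. */
typedef struct demo_string {
    unsigned char              *buf;       /* raw bytes */
    UINTVAL                     bufused;   /* number of bytes in use */
    const struct demo_encoding *encoding;  /* function table for this string */
} DEMO_STRING;

/* One table per encoding; each API "function" is a slot in it. */
typedef struct demo_encoding {
    const char *name;
    UINTVAL (*get_codepoint)(const DEMO_STRING *, UINTVAL offset);
    UINTVAL (*get_byte)(const DEMO_STRING *, UINTVAL offset);
    UINTVAL (*codepoints)(const DEMO_STRING *);
    UINTVAL (*bytes)(const DEMO_STRING *);
} DEMO_ENCODING;

/* A fixed-width, one-byte-per-codepoint encoding: bytes and codepoints coincide. */
static UINTVAL fixed8_get_byte(const DEMO_STRING *s, UINTVAL offset) {
    assert(offset < s->bufused);
    return s->buf[offset];
}

static UINTVAL fixed8_get_codepoint(const DEMO_STRING *s, UINTVAL offset) {
    return fixed8_get_byte(s, offset);
}

static UINTVAL fixed8_bytes(const DEMO_STRING *s) {
    return s->bufused;
}

static UINTVAL fixed8_codepoints(const DEMO_STRING *s) {
    return s->bufused;
}

static const DEMO_ENCODING fixed8_encoding = {
    "fixed_8",
    fixed8_get_codepoint,
    fixed8_get_byte,
    fixed8_codepoints,
    fixed8_bytes
};

/* Callers never name an encoding; they dispatch through the string's table. */
UINTVAL demo_get_codepoint(const DEMO_STRING *s, UINTVAL offset) {
    return s->encoding->get_codepoint(s, offset);
}
```

With this shape, a prototype such as `get_codepoint(STRING *, offset)` needs no encoding argument, because the encoding is implied by whichever table the string points at.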

And note again that the functions are all shadowed by charset
functions. We really call the charset versions of these, which may
then call through to the encoding versions, so the charsets can pitch
a fit if you do something they don't like. (For example, turning a
Shift-JIS string into UTF-8, if the charset even cares. It probably
won't, but you never know. The charset code will probably want to get
in the way of byte setting if it's a multibyte charset, or of
codepoint setting if it's a charset with combining characters.)

Generally only the charset code will call these anyway.
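As a rough illustration of the shadowing described above, the sketch below has a charset-level `set_byte` that can refuse the operation before calling through to the encoding-level one. The DEMO_* types, the `is_multibyte` flag and the use of a return code in place of an exception are all assumptions made for the example; the post does not specify how a charset signals its objection.

```c
#include <assert.h>

typedef unsigned long UINTVAL;  /* stand-in for Parrot's UINTVAL */

struct demo_string;

/* Encoding-level table: raw buffer access only (hypothetical). */
typedef struct demo_encoding {
    void (*set_byte)(struct demo_string *, UINTVAL offset, UINTVAL byte);
} DEMO_ENCODING;

/* Charset-level table: same entry name, but allowed to veto (hypothetical). */
typedef struct demo_charset {
    int is_multibyte;
    int (*set_byte)(struct demo_string *, UINTVAL offset, UINTVAL byte);
} DEMO_CHARSET;

typedef struct demo_string {
    unsigned char       *buf;
    UINTVAL              bufused;
    const DEMO_ENCODING *encoding;
    const DEMO_CHARSET  *charset;
} DEMO_STRING;

/* Encoding version: just writes the byte. */
static void raw_set_byte(DEMO_STRING *s, UINTVAL offset, UINTVAL byte) {
    assert(offset < s->bufused && byte <= 0xFF);
    s->buf[offset] = (unsigned char)byte;
}

/* Charset version: returns -1 if the charset objects, otherwise calls
 * through to the encoding and returns 0. */
static int charset_set_byte(DEMO_STRING *s, UINTVAL offset, UINTVAL byte) {
    if (s->charset->is_multibyte)
        return -1;  /* poking single bytes could corrupt a multibyte character */
    s->encoding->set_byte(s, offset, byte);
    return 0;
}

static const DEMO_ENCODING raw_encoding        = { raw_set_byte };
static const DEMO_CHARSET  single_byte_charset = { 0, charset_set_byte };
static const DEMO_CHARSET  multibyte_charset   = { 1, charset_set_byte };
```

Callers would only ever use `s->charset->set_byte(...)`; the encoding entry stays an implementation detail behind it.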

void to_encoding(STRING *);

Convert the string to the new encoding, in place.

STRING *copy_to_encoding(STRING *);

Make a copy of the string, in the new encoding.
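A sketch of what a copying conversion might do, assuming the source is a one-byte-per-codepoint Latin-1 buffer and the target is UTF-8. The function name and the raw buffer-plus-length interface are hypothetical; the real entry takes a STRING and the target encoding is implied by the function table it lives in.

```c
#include <assert.h>
#include <stdlib.h>

/* Returns a newly malloc'd UTF-8 copy of a Latin-1 buffer and stores its
 * byte length in *out_len. Returns NULL if allocation fails. The source
 * buffer is left untouched; the caller frees the result. */
unsigned char *demo_copy_latin1_to_utf8(const unsigned char *src, size_t len,
                                        size_t *out_len) {
    /* Worst case is two bytes per codepoint; +1 avoids a zero-size malloc. */
    unsigned char *dst = malloc(len * 2 + 1);
    size_t j = 0;

    assert(out_len != NULL);
    if (dst == NULL)
        return NULL;

    for (size_t i = 0; i < len; i++) {
        if (src[i] < 0x80) {
            dst[j++] = src[i];                                   /* ASCII: one byte */
        } else {
            dst[j++] = (unsigned char)(0xC0 | (src[i] >> 6));    /* lead byte */
            dst[j++] = (unsigned char)(0x80 | (src[i] & 0x3F));  /* continuation */
        }
    }
    *out_len = j;
    return dst;
}
```

The in-place variant would do the same transformation but swap the new buffer into the existing string header instead of returning a fresh one.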

UINTVAL get_codepoint(STRING *, offset);

Return the codepoint at offset.

void set_codepoint(STRING *, offset, UINTVAL codepoint);

Set the codepoint at offset to codepoint.

UINTVAL get_byte(STRING *, offset);

Get the byte at offset.

void set_byte(STRING *, offset, UINTVAL byte);

Set the byte at offset to byte.
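The difference between the byte and codepoint accessors only shows up in a variable-width encoding. The sketch below uses UTF-8 as the example and assumes the buffer is already valid UTF-8; the demo_* names and the raw buffer-plus-length interface are hypothetical stand-ins for the STRING-based entries above.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long UINTVAL;  /* stand-in for Parrot's UINTVAL */

/* Length in bytes of the UTF-8 sequence that starts with lead byte b.
 * Assumes valid UTF-8, so b is never a continuation byte. */
static size_t utf8_seq_len(unsigned char b) {
    if (b < 0x80)           return 1;
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    return 4;
}

/* Byte access: offset counts bytes, so this is a plain array index. */
UINTVAL demo_get_byte(const unsigned char *buf, size_t len, size_t offset) {
    assert(offset < len);
    return buf[offset];
}

/* Codepoint access: offset counts codepoints, so we have to walk the
 * buffer from the start, skipping whole sequences, then decode one. */
UINTVAL demo_get_codepoint(const unsigned char *buf, size_t len, size_t offset) {
    size_t pos = 0;
    size_t n;
    UINTVAL cp;

    while (offset > 0) {
        assert(pos < len);
        pos += utf8_seq_len(buf[pos]);
        offset--;
    }
    assert(pos < len);

    n = utf8_seq_len(buf[pos]);
    assert(pos + n <= len);
    if (n == 1)
        return buf[pos];

    cp = buf[pos] & (0xFFu >> (n + 1));  /* payload bits of the lead byte */
    for (size_t i = 1; i < n; i++)
        cp = (cp << 6) | (buf[pos + i] & 0x3Fu);
    return cp;
}
```

For the six-byte buffer `61 C3 A9 E2 82 AC`, byte offset 1 is `0xC3`, while codepoint offset 1 is U+00E9.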

STRING *get_codepoints(STRING *, offset, count);

Get count codepoints starting at offset, returned as a STRING of no
charset. (If called through the charset code, the returned string may be
put into a charset if that's valid.)

STRING *get_bytes(STRING *, offset, count);

Get count bytes starting at offset, returned as a binary STRING.

void set_codepoints(STRING *, offset, count, STRING *codepointstring);

Set count codepoints, starting at offset, to the contents of the
codepoint string.

void set_bytes(STRING *, offset, count, STRING *binarystring);

Set count bytes, starting at offset, to the contents of the binary string.

void become_encoding(STRING *);

Assume the string is the new encoding and make it so. Validate first
and throw an exception if this assumption is incorrect.
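A sketch of the validate-then-relabel behaviour, using UTF-8 as the target encoding. The DEMO_STRING type, the encoding enum and the -1 return code (standing in for the exception) are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long UINTVAL;  /* stand-in for Parrot's UINTVAL */

typedef enum { DEMO_ENC_BINARY, DEMO_ENC_UTF8 } demo_encoding_id;

/* Hypothetical minimal string header. */
typedef struct demo_string {
    const unsigned char *buf;
    size_t               len;
    demo_encoding_id     enc;
} DEMO_STRING;

/* Returns 1 if buf holds well-formed UTF-8, else 0. Rejects truncated
 * sequences, stray continuation bytes, overlong forms, surrogates and
 * values above U+10FFFF. */
int demo_utf8_is_valid(const unsigned char *buf, size_t len) {
    size_t i = 0;
    while (i < len) {
        unsigned char b = buf[i];
        size_t n;
        UINTVAL cp, min;

        if (b < 0x80) { i++; continue; }
        else if ((b & 0xE0) == 0xC0) { n = 2; cp = b & 0x1Fu; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { n = 3; cp = b & 0x0Fu; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { n = 4; cp = b & 0x07u; min = 0x10000; }
        else return 0;                      /* continuation or invalid lead byte */

        if (n > len - i) return 0;          /* sequence runs off the end */
        for (size_t k = 1; k < n; k++) {
            if ((buf[i + k] & 0xC0) != 0x80) return 0;
            cp = (cp << 6) | (buf[i + k] & 0x3Fu);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;
        i += n;
    }
    return 1;
}

/* Relabel the string as UTF-8 without touching its bytes. Returns 0 on
 * success, or -1 (and leaves the string unchanged) if validation fails. */
int demo_become_utf8(DEMO_STRING *s) {
    assert(s != NULL);
    if (!demo_utf8_is_valid(s->buf, s->len))
        return -1;
    s->enc = DEMO_ENC_UTF8;
    return 0;
}
```

Unlike a converting call, nothing in the buffer is rewritten here; only the label on the string changes, and only after the bytes have been checked.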

UINTVAL codepoints(STRING *);

Return the size in codepoints.

UINTVAL bytes(STRING *);

Return the size in bytes.
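The two sizes diverge for any variable-width encoding. As a small sketch for UTF-8 (assuming the buffer is already valid, and with a hypothetical function name), the codepoint count is the number of bytes that are not continuation bytes, while the byte count is simply the buffer length.

```c
#include <assert.h>
#include <stddef.h>

/* Count codepoints in a valid UTF-8 buffer: every codepoint contributes
 * exactly one byte that is not of the form 10xxxxxx. */
size_t demo_utf8_codepoints(const unsigned char *buf, size_t len) {
    size_t count = 0;
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if ((buf[i] & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

This is a linear scan, so an implementation that needs the codepoint count often may prefer to cache it in the string header; the post does not say which approach is intended.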


I have, I'm sure, forgotten something, but let's start with this and
fill in the blanks.
--
Dan

--------------------------------------it's like this-------------------
Dan Sugalski even samurai
d...@sidhe.org have teddy bears and even
teddy bears get drunk
