
[DRAFT PDD] External Data Interfaces


Brent Dax

Aug 17, 2002, 6:58:32 PM
to perl6-i...@perl.org
The POD below my sig is a proposed PDD on external data interfaces, that
is, the way embedders and extenders will access Parrot's data types. It
covers Strings, Buffers, and PMCs, as well as a few related functions.

Let me know what you think.

--Brent Dax <bren...@cpan.org>
@roles=map {"Parrot $_"} qw(embedding regexen Configure)

He who fights and runs away wasted valuable running time with the
fighting.


=head1 TITLE

External Data Interfaces

=head1 VERSION

1.0

=head2 CURRENT

    Maintainer: Brent Dax <bren...@cpan.org>
    Class: Internals
    PDD Number: TBD
    Version: 1.0
    Status: Proposed
    Last Modified: 13 August 2002
    PDD Format: 1
    Language: English

=head2 HISTORY

=over 4

=item Version 1.0

None. First version

=back

=head1 CHANGES

=over 4

=item Version 1.0

None. First version

=back

=head1 ABSTRACT

This PDD describes the external interfaces to Parrot data structures,
such as PMCs and Strings. These interfaces are shared by the embedding
and extending systems.

=head1 DESCRIPTION

One of the major flaws of Perl 5 was that the extension interfaces were,
for lack of a better term, "raw". The same interfaces were used by
extenders and core developers; this necessitated much gnashing of teeth
when a function used by extenders was no longer needed or proved
insufficient for a task--and sweeping changes were next to impossible.

One of the intents of Parrot is to provide much cleaner extension
interfaces. Most other languages in Perl's class have clean extension
interfaces, where the internal functions aren't used by extenders and
the external functions aren't used by internals developers. This PDD
describes the parts of the overall embedding/extending interface related
to user-level data; these are defined separately from embedding and
extending interfaces because they are shared by both.

"User-level data" is defined to include PMCs, Strings, and Buffers.

The design of the external data interfaces has two major objectives:

=over 4

=item 1.
To be small and simple.

=item 2.
To be complete.

=back

These two goals pull against each other. To keep the interface small,
there is little redundancy in it. For example, the keyed PMC functions
take and return only PMCs as values, and accept either a PMC or a
C<Parrot_Int> as the index.

=head1 IMPLEMENTATION

=head2 Strings

Parrot-level C<String>s are to be represented by the type
C<Parrot_String>. This type is defined to be a pointer to a C<struct
parrot_string_t>.
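Here is a minimal sketch of why the handle is a pointer to a struct. An embedder's header would expose only the C<typedef> of an incomplete C<struct parrot_string_t>, and the layout would stay private to Parrot, so the layout could change without breaking extensions. The struct members and the C<toy_string_*> functions below are hypothetical. Both views are collapsed into one file here only so that it compiles on its own.

```c
#include <stdlib.h>
#include <string.h>

/* Public view: embedders see only an incomplete struct and a handle to it. */
typedef struct parrot_string_t *Parrot_String;

/* Private view (hypothetical layout): only Parrot's own sources would see this. */
struct parrot_string_t {
    char   *bytes;     /* raw storage */
    size_t  bytelen;   /* number of bytes in use */
};

/* Hypothetical constructor: copies len bytes; NULL and 0 give an empty string. */
Parrot_String toy_string_new(const char *bytes, size_t len)
{
    Parrot_String s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;
    s->bytes = malloc(len ? len : 1);   /* avoid malloc(0) */
    if (s->bytes == NULL) {
        free(s);
        return NULL;
    }
    if (len > 0)
        memcpy(s->bytes, bytes, len);
    s->bytelen = len;
    return s;
}

/* Hypothetical accessor: callers learn about the string only through functions. */
size_t toy_string_bytelen(Parrot_String s)
{
    return s->bytelen;
}
```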

The functions for creating and manipulating C<Parrot_String>s are listed
below.

=over 4

=item C<Parrot_String Parrot_string_new(Parrot_Interp, char* bytes,
Parrot_Int len, Parrot_String enc)>

Allocates a Parrot_String and sets it to the first C<len> bytes of
C<bytes>. C<enc> is the name of the encoding to use (e.g. "ASCII",
"UTF-8", "Shift-JIS"); if a case-insensitive match of this name doesn't
result in an encoding name that Parrot knows about, or if NULL is passed
as the encoding, the platform's default encoding is assumed.[1] Values
of NULL and 0 can be passed in for C<bytes> and C<len> if the user
desires an empty string.

Note that it is rarely a good idea to not specify the encoding if you're
using C<bytes> and C<len>.

=item C<Parrot_String Parrot_string_copy(Parrot_Interp, Parrot_String
dest, Parrot_String src)>

Sets C<dest> to C<src> and returns C<dest>. If C<dest> is NULL, a new
Parrot_String is allocated, operated on and returned. If C<dest> and
C<src> are the same, this is a no-op. This may or may not be a
copy-on-write set; the embedder should not care.

B<XXX> Is this a good policy?

=item C<Parrot_String Parrot_string_copy_bytes(Parrot_Interp,
Parrot_String dest, char* bytes, Parrot_Int len, char* enc)>

Sets C<dest> to the first C<len> bytes of C<bytes> and returns C<dest>.
C<enc> is taken to be the encoding of C<bytes>; the Parrot_String will
retain its original encoding. (Call C<Parrot_string_transcode> on the
Parrot_String first if you want it to be stored in C<enc> instead.)

=item C<Parrot_String Parrot_string_encoding(Parrot_Interp,
Parrot_String str)>

Returns the encoding of C<str> as a Parrot_String.

=item C<void Parrot_string_transcode(Parrot_Interp, Parrot_String str,
Parrot_String enc)>

Transcode C<str> to C<enc>. If C<enc> isn't recognized as a valid
encoding name by a case-insensitive match, or if it is NULL, the default
encoding is used.

=item C<Parrot_String Parrot_string_concat(Parrot_Interp, Parrot_String
dest, Parrot_String lhs, Parrot_String rhs)>

Set C<dest> to the concatenation of C<lhs> and C<rhs> and return the
value of C<dest>. If C<dest> is NULL, a new Parrot_String is allocated,
operated on and returned. C<dest>'s value may be the same as either or
both of C<lhs> and C<rhs>.

=item C<Parrot_String Parrot_string_chop(Parrot_Interp, Parrot_String
dest, Parrot_String lhs, Parrot_Int len)>

Copy C<lhs> to C<dest> and remove the last C<len> characters from it,
returning C<dest>. If C<dest> is NULL, a new Parrot_String is
allocated, operated on and returned.

=item C<Parrot_UInt Parrot_string_length(Parrot_Interp, Parrot_String
str)>

Returns the length of C<str> in characters. Note that this is
"characters", not "bytes"; the string's encoding defines what
"character" means.

=item C<Parrot_UInt Parrot_string_ord(Parrot_Interp, Parrot_String str,
Parrot_UInt index)>

Returns the value of the character at C<index> in C<str>. Note that
this is "character", not "byte"; the string's encoding defines what
"character" means.

=item C<Parrot_String Parrot_string_substr(Parrot_Interp, Parrot_String
dest, Parrot_String str, Parrot_UInt index, Parrot_UInt len)>

Sets C<dest> to the substring of C<str> starting at character C<index>
and continuing for C<len> characters and returns C<dest>. Note that
this is "characters", not "bytes"; the string's encoding defines what
"character" means. If C<dest> is NULL, a new Parrot_String is
allocated, operated on and returned.

=item C<void Parrot_string_replace(Parrot_Interp, Parrot_String str,
Parrot_UInt index, Parrot_UInt len, Parrot_String rep)>

Replaces the substring of C<str> starting at character C<index> and
continuing for C<len> characters with the value of C<rep>. Note that
this is "characters", not "bytes"; the string's encoding defines what
"character" means. C<rep> need not be the same length as the substring
being replaced.

=item C<Parrot_String Parrot_string_from_cstr(Parrot_Interp, char*
cstr)>

Creates a Parrot_String from the given C string. Assumes the native
encoding.

=item C<char* Parrot_string_to_cstr(Parrot_Interp, Parrot_String str)>

Creates a null-terminated C string from the given Parrot_String. If
necessary, transcodes to the native encoding.

Use of this function is discouraged for two reasons. Information can be
lost in the transcoding, and any null characters embedded in the string
will cut the resulting C string short. However, this function is
sometimes necessary, so it is included.

The storage for the C string is created with C<Parrot_alloc()> and must
be freed with C<Parrot_free()>.

=back
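Several of the functions above share a convention: C<dest> may be NULL, in which case a fresh string is allocated. For C<Parrot_string_concat>, C<dest> may also be the same string as either input. The sketch below shows one way to handle that aliasing safely. It uses a hypothetical C<ToyString> that assumes one byte per character and uses plain C<malloc()>, in place of Parrot's encodings and garbage collector.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy string: one byte per character, no encodings, no GC. */
typedef struct toy_string {
    char   *bytes;   /* NUL-terminated for convenience */
    size_t  len;
} *ToyString;

/* Error handling for malloc()/calloc() is omitted to keep the sketch short. */
ToyString toy_from_cstr(const char *cstr)
{
    ToyString s = calloc(1, sizeof *s);
    s->len = strlen(cstr);
    s->bytes = malloc(s->len + 1);
    memcpy(s->bytes, cstr, s->len + 1);
    return s;
}

/* Mirrors the contract of Parrot_string_concat: dest may be NULL (a new
   string is allocated) and may be the same as lhs, rhs, or both. */
ToyString toy_concat(ToyString dest, ToyString lhs, ToyString rhs)
{
    size_t n;
    char *buf;

    assert(lhs != NULL && rhs != NULL);
    n = lhs->len + rhs->len;

    /* Build the result in fresh storage first, so that aliasing dest with
       lhs or rhs cannot overwrite an input before it has been read. */
    buf = malloc(n + 1);
    memcpy(buf, lhs->bytes, lhs->len);
    memcpy(buf + lhs->len, rhs->bytes, rhs->len);
    buf[n] = '\0';

    if (dest == NULL)
        dest = calloc(1, sizeof *dest);   /* bytes starts out NULL */
    free(dest->bytes);   /* safe even if dest aliases an input: it was copied above */
    dest->bytes = buf;
    dest->len = n;
    return dest;
}
```

The key design choice is to build the result in new storage before touching C<dest>. Writing into C<dest> in place could clobber an input that has not been read yet, for example when C<dest> is C<rhs>.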

=head2 Buffers

Parrot-level C<Buffer>s are to be represented by the type
C<Parrot_Buffer>. This is defined to be a pointer to a C<struct
parrot_buffer_t>.

The functions for creating and manipulating C<Parrot_Buffer>s are listed
below.

=over 4

=item C<Parrot_Buffer Parrot_buffer_new(Parrot_Interp, Parrot_UInt
size)>

Allocates a new C<Parrot_Buffer> with C<size> bytes of memory in it.

=item C<void Parrot_buffer_resize(Parrot_Interp, Parrot_Buffer buf,
Parrot_UInt newsize)>

Allocates C<newsize> bytes of memory, copies the contents of C<buf> to
it, and places the new memory into C<buf>.

=item C<Parrot_Buffer Parrot_buffer_copy(Parrot_Interp, Parrot_Buffer
dest, Parrot_Buffer src)>

Copies the contents of C<src> into C<dest>, resizing C<dest> if
necessary, and returns C<dest>. If C<dest> is NULL, a new Parrot_Buffer
is allocated, operated on and returned.

=item C<Parrot_UInt Parrot_buffer_size(Parrot_Interp, Parrot_Buffer
buf)>

Returns the size of C<buf>'s contents, in bytes.

=item C<void* Parrot_buffer_contents(Parrot_Interp, Parrot_Buffer buf)>

Returns a pointer to the contents of C<buf>. This pointer can be used
to directly manipulate C<buf>'s contents.

B<Warning>: Make sure to block the garbage collector before calling this
function! Otherwise, the pointer may become invalid, resulting in
badness ranging from losing data to core dumps.

B<Warning>: Make sure that this pointer doesn't last beyond when garbage
collection is unblocked!

=back
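The sketch below shows the allocate-copy-swap that C<Parrot_buffer_resize> describes, using a hypothetical C<ToyBuffer> and plain C<malloc()>. It keeps only as many bytes as fit when shrinking; that is an assumption, since the description above does not say. The sketch also hints at why the pointer returned by C<Parrot_buffer_contents> must not be held across anything that can move the memory: after a resize, the old pointer refers to freed storage.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy buffer: raw memory plus its size in bytes. */
typedef struct toy_buffer {
    void   *mem;
    size_t  size;
} ToyBuffer;

/* Allocate-copy-swap. Returns 0 on success, or -1 (leaving buf untouched)
   on failure. Keeping only what fits when shrinking is an assumption. */
int toy_buffer_resize(ToyBuffer *buf, size_t newsize)
{
    void *fresh = malloc(newsize ? newsize : 1);   /* avoid malloc(0) */
    size_t keep = buf->size < newsize ? buf->size : newsize;

    if (fresh == NULL)
        return -1;
    if (keep > 0)
        memcpy(fresh, buf->mem, keep);
    free(buf->mem);        /* any pointer into the old memory is now dangling */
    buf->mem = fresh;
    buf->size = newsize;
    return 0;
}
```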

=head2 PMCs

Parrot-level C<PMC>s are to be represented by the type C<Parrot_PMC>.
This is defined to be a pointer to a C<struct parrot_pmc_t>.

The functions for creating and manipulating C<Parrot_PMC>s are listed
below.

=over 4

=item C<Parrot_PMC Parrot_pmc_new(Parrot_Interp, Parrot_String type)>

Creates a new Parrot_PMC of the type C<type>. If C<type> is not a
case-insensitive match of any type already registered with Parrot, this
function will throw an exception.

=item C<Parrot_PMC Parrot_pmc_new_vtable(Parrot_Interp, Parrot_VTable
vtable)>

Creates a new Parrot_PMC using C<vtable>. This can be used for
"private" PMC types.

B<XXX> Is this a good idea or not?

=item C<Parrot_Int Parrot_pmc_get_integer(Parrot_Interp, Parrot_PMC
src)>

Returns the result of C<< src->vtable->get_integer() >>.

=item C<Parrot_Float Parrot_pmc_get_number(Parrot_Interp, Parrot_PMC
src)>

Returns the result of C<< src->vtable->get_number() >>.

=item C<Parrot_String Parrot_pmc_get_string(Parrot_Interp, Parrot_PMC
src)>

Returns the result of C<< src->vtable->get_string() >>.

=item C<Parrot_PMC Parrot_pmc_get_pmc(Parrot_Interp, Parrot_PMC src)>

Returns the result of C<< src->vtable->get_pmc() >>.

=item C<Parrot_PMC Parrot_pmc_set_integer(Parrot_Interp, Parrot_PMC
dest, Parrot_Int src)>

Calls C<< dest->vtable->set_integer(src) >> and returns C<dest>.[2]

=item C<Parrot_PMC Parrot_pmc_set_number(Parrot_Interp, Parrot_PMC dest,
Parrot_Float src)>

Calls C<< dest->vtable->set_number(src) >> and returns C<dest>.

=item C<Parrot_PMC Parrot_pmc_set_string(Parrot_Interp, Parrot_PMC dest,
Parrot_String src)>

Calls C<< dest->vtable->set_string(src) >> and returns C<dest>.

=item C<Parrot_PMC Parrot_pmc_set_pmc(Parrot_Interp, Parrot_PMC dest,
Parrot_PMC src)>

Calls C<< dest->vtable->set_pmc(src) >> and returns C<dest>.

=item C<Parrot_PMC Parrot_pmc_get_indexed(Parrot_Interp, Parrot_PMC src,
Parrot_PMC index)>

Constructs a key from C<index> and calls C<<
src->vtable->get_pmc_keyed(key) >>.[3]

=item C<Parrot_PMC Parrot_pmc_get_indexed_i(Parrot_Interp, Parrot_PMC
src, Parrot_Int index)>

Calls C<< src->vtable->get_pmc_keyed_integer(index) >>.

=item C<Parrot_PMC Parrot_pmc_set_indexed(Parrot_Interp, Parrot_PMC
dest, Parrot_PMC index, Parrot_PMC src)>

Constructs a key from C<index> and calls C<<
dest->vtable->set_pmc_keyed(key, src, NULL) >>.

=item C<Parrot_PMC Parrot_pmc_set_indexed_i(Parrot_Interp, Parrot_PMC
dest, Parrot_Int index, Parrot_PMC src)>

Calls C<< dest->vtable->set_pmc_keyed_integer(index, src, 0) >>.

=item C<Parrot_PMC Parrot_pmc_call(Parrot_Interp, Parrot_PMC sub,
Parrot_PMC args)>

Pushes C<args> onto the stack, calls C<sub>, pops the return value(s)
off the stack, and returns them.

=item C<Parrot_PMC Parrot_pmc_methcall(Parrot_Interp, Parrot_PMC object,
Parrot_String method, Parrot_PMC args)>

Finds C<method> in C<object>, pushes C<object> and C<args> onto the
stack, calls the method, pops the return value(s) off the stack, and
returns them.

=back
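The accessor functions above are thin wrappers that dispatch through the PMC's vtable. The sketch below shows that pattern with a hypothetical two-slot C<ToyVTable> standing in for Parrot's much larger one. Passing the PMC itself to each slot is an assumption, since the descriptions above elide the arguments. C<toy_pmc_set_integer> returns C<dest> so that calls can be chained, as footnote [2] describes.

```c
#include <stdlib.h>

typedef long Toy_Int;            /* stand-in for Parrot_Int (assumption) */
typedef struct toy_pmc *ToyPMC;  /* handle type, like Parrot_PMC */

/* Hypothetical, heavily trimmed vtable: two slots instead of Parrot's many. */
typedef struct toy_vtable {
    Toy_Int (*get_integer)(ToyPMC self);
    void    (*set_integer)(ToyPMC self, Toy_Int value);
} ToyVTable;

struct toy_pmc {
    const ToyVTable *vtable;
    Toy_Int          value;      /* payload for an integer-like type */
};

/* An integer PMC type, implemented entirely by its vtable functions. */
static Toy_Int int_get_integer(ToyPMC self)            { return self->value; }
static void    int_set_integer(ToyPMC self, Toy_Int v) { self->value = v; }
static const ToyVTable int_vtable = { int_get_integer, int_set_integer };

/* Counterpart of Parrot_pmc_new_vtable: build a PMC around a given vtable. */
ToyPMC toy_pmc_new_vtable(const ToyVTable *vtable)
{
    ToyPMC p = calloc(1, sizeof *p);
    if (p != NULL)
        p->vtable = vtable;
    return p;
}

ToyPMC toy_pmc_new_int(void)
{
    return toy_pmc_new_vtable(&int_vtable);
}

/* The external wrappers only dispatch; the type's behaviour lives in the vtable. */
Toy_Int toy_pmc_get_integer(ToyPMC src)
{
    return src->vtable->get_integer(src);
}

ToyPMC toy_pmc_set_integer(ToyPMC dest, Toy_Int src)
{
    dest->vtable->set_integer(dest, src);
    return dest;                 /* returned so calls can be nested */
}
```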

=head2 Miscellanea

=over 4

=item C<void *Parrot_alloc(Parrot_UInt size)>

Calls the system C<malloc()> with C<size>.

=item C<void Parrot_free(void * ptr)>

Calls the system C<free()> with C<ptr>.

=item C<void Parrot_block_gc(Parrot_Interp)>

Blocks the garbage collector on the selected interpreter. Note that
this is done by incrementing a counter, so three calls to
C<Parrot_block_gc()> require three calls to C<Parrot_unblock_gc()>
before GC is reactivated.

=item C<void Parrot_unblock_gc(Parrot_Interp)>

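Unblocks the garbage collector on the selected interpreter by
decrementing the counter described above; GC is reactivated only when
the counter returns to zero.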

=back
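The sketch below shows the counting behaviour of C<Parrot_block_gc()> and C<Parrot_unblock_gc()>. It uses a hypothetical C<ToyInterp> and a stand-in C<toy_maybe_collect()> that records a run instead of collecting anything. Treating an unbalanced unblock as a caller bug is an assumption.

```c
#include <assert.h>

/* Hypothetical toy interpreter: only the GC-related state matters here. */
typedef struct toy_interp {
    unsigned gc_block_level;   /* > 0 means collection is blocked */
    unsigned gc_runs;          /* how many collections actually ran */
} ToyInterp;

void toy_block_gc(ToyInterp *interp)
{
    interp->gc_block_level++;
}

void toy_unblock_gc(ToyInterp *interp)
{
    assert(interp->gc_block_level > 0);   /* unbalanced unblock: caller bug */
    interp->gc_block_level--;
}

/* Called when memory is wanted back; does nothing while GC is blocked. */
void toy_maybe_collect(ToyInterp *interp)
{
    if (interp->gc_block_level == 0)
        interp->gc_runs++;                /* stand-in for a real collection */
}
```

For example, three blocks followed by two unblocks still leave collection suppressed. Only the third unblock lets C<toy_maybe_collect()> run.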

=head1 ATTACHMENTS

None.

=head1 FOOTNOTES

[1] A string is used so that Parrot can support pluggable string
encodings but still degrade gracefully if the given encoding hasn't been
plugged in.

[2] This allows for code like C<Parrot_PMC
mypmc = Parrot_pmc_set_integer(interp, Parrot_pmc_new(interp,
Parrot_string_from_cstr(interp, "PerlInt")), 1)>.

[3] Note how limited keyed support is. This is to keep things simple.
I thought about doing combinations of return types and key types, but
that caused a combinatorial explosion, and I didn't think it wise to
expose keys to the outside.
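The sketch below shows the encoding-name lookup described for C<Parrot_string_new> and C<Parrot_string_transcode>, including the graceful fallback motivated in footnote [1]. The registry contents, the choice of UTF-8 as the default, and the use of plain C strings rather than a C<Parrot_String> for the name are all assumptions. Matching is restricted to ASCII case folding so that the result does not depend on the locale.

```c
#include <stddef.h>

/* Hypothetical registry of encodings that have been plugged in. */
static const char *const known_encodings[] = { "ASCII", "UTF-8", "Shift-JIS" };

/* Hypothetical platform default. */
static const char *const default_encoding = "UTF-8";

/* Fold A-Z to a-z only, so the comparison does not depend on the C locale. */
static int ascii_lower(int c)
{
    return (c >= 'A' && c <= 'Z') ? c - 'A' + 'a' : c;
}

/* ASCII case-insensitive equality; returns 1 on a match, 0 otherwise. */
static int ascii_ieq(const char *a, const char *b)
{
    for (; *a != '\0' && *b != '\0'; a++, b++)
        if (ascii_lower((unsigned char)*a) != ascii_lower((unsigned char)*b))
            return 0;
    return *a == *b;   /* both names must end at the same point */
}

/* Return the registered spelling of `requested`, or the default encoding
   when `requested` is NULL or not registered. */
const char *toy_resolve_encoding(const char *requested)
{
    size_t i;

    if (requested == NULL)
        return default_encoding;
    for (i = 0; i < sizeof known_encodings / sizeof known_encodings[0]; i++)
        if (ascii_ieq(requested, known_encodings[i]))
            return known_encodings[i];
    return default_encoding;
}
```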

=head1 REFERENCES

PDD 10 (Embedding)

PDD 11 (Extending)

L<perlembed>, L<perlxs>

Nicholas Clark

Aug 18, 2002, 6:07:57 PM
to Brent Dax, perl6-i...@perl.org
On Sat, Aug 17, 2002 at 03:58:32PM -0700, Brent Dax wrote:

> =head2 Strings
>
> Parrot-level C<String>s are to be represented by the type
> C<Parrot_String>. This type is defined to be a pointer to a C<struct
> parrot_string_t>.
>
> The functions for creating and manipulating C<Parrot_String>s are listed
> below.

Is it worth arranging a reminder in here that as parrot is garbage collected
there is no confusion about who owns pointers to blah?

> =item C<Parrot_String Parrot_string_new(Parrot_Interp, char* bytes,
> Parrot_Int len, Parrot_String enc)>
>
> Allocates a Parrot_String and sets it to the first C<len> bytes of
> C<bytes>. C<enc> is the name of the encoding to use (e.g. "ASCII",
> "UTF-8", "Shift-JIS"); if a case-insensitive match of this name doesn't
> result in an encoding name that Parrot knows about, or if NULL is passed
> as the encoding, the platform's default encoding is assumed.[1] Values
> of NULL and 0 can be passed in for C<bytes> and C<len> if the user
> desires an empty string.

Should that char * be const char *?

> Note that it is rarely a good idea to not specify the encoding if you're
> using C<bytes> and C<len>.

I'm a native English speaker and I'm finding that double negative hard to
work out. Is there a clearer way to phrase it?

> =item C<Parrot_String Parrot_string_copy(Parrot_Interp, Parrot_String
> dest, Parrot_String src)>
>
> Sets C<lhs> to C<rhs> and returns C<dest>. If C<dest> is NULL, a new
> Parrot_String is allocated, operated on and returned. If C<dest> and
> C<src> are the same, this is a noop. This may or may not be a
> copy-on-write set; the embedder should not care.

"This might be a copy-on-write set" ...

And do we need a RFC like definition of should/may/must/mustn't?

In which case, surely that should read "the embedder must not care"?

> B<XXX> Is this a good policy?
>
> =item C<Parrot_String Parrot_string_copy_bytes(Parrot_Interp,
> Parrot_String dest, char* bytes, Parrot_Int len, char* enc)>
>
> Sets C<dest> to the first C<len> bytes of C<bytes> and returns C<dest>.
> C<enc> is taken to be the encoding of C<bytes>; the Parrot_String will
> retain its original encoding. (Call C<Parrot_string_transcode> on the
> Parrot_String first if you want to retain C<enc>.)

Again, should that be const char *bytes?

> =item C<void Parrot_string_transcode(Parrot_Interp, Parrot_String str,
> Parrot_String enc)>
>
> Transcode C<str> to C<enc>. If C<enc> isn't recognized as a valid
> encoding name by a case-insensitive match, or if it is NULL, the default
> encoding is used.

Encodings are specified in parrot strings (not char *) yet you state
that it's case insensitive. Is case insensitivity well defined on an encoding
basis, or is it actually dependent on the language level?
[eg one might argue that in English þ and Þ aren't the same, but if the
string is in ISO-8859-1 then Parrot isn't going to know whether the name
was specified in English, German or Icelandic. I chose þ because I don't
think there are any foreign words adopted into English spelled with thorn.
Whereas I'd not be surprised if most other accented letters are used in
some or other word]

Independent of that, aren't we opening ourselves up to a big performance
hit by doing case insensitive matching on arbitrary encodings (such as
Unicode)? Which normal form were we going to do it in?
And if the canonical name is defined in (say) ISO 8859-1 but their string is
in Unicode, are we going to convert before deciding whether it is the same?
And if they're in Shift-JIS but we're supplying it in ISO-8859-2 - that's
2 conversions?

It seems faster having names as US-ASCII and being case insensitive, or having
names case sensitive.

> =item C<Parrot_UInt Parrot_string_length(Parrot_Interp, Parrot_String
> str)>
>
> Returns the length of C<str> in characters. Note that this is
> "characters", not "bytes"; the string's encoding defines what
> "character" means.

Should you be clear what happens with combining characters?
If so, that's "characters", not "bytes" or "glyphs", isn't it?

Is there a cross reference to what a Parrot_UInt is?

> =item C<Parrot_String Parrot_string_from_cstr(Parrot_Interp, char*
> cstr)>
>
> Creates a Parrot_String from the given C string. Assumes the native
> encoding.

const char* ?

> =item C<Parrot_PMC Parrot_pmc_new_vtable(Parrot_Interp, Parrot_VTable
> vtable)>
>
> Creates a new Parrot_PMC using C<vtable>. This can be used for
> "private" PMC types.
>
> B<XXX> Is this a good idea or not?

Singletons are considered useful in some language, aren't they?
Without this, would it be hard to efficiently create singletons?

> =item C<void *Parrot_alloc(Parrot_UInt size)>
>
> Calls the system C<malloc()> with C<size>.

Are you sure you want to set that in stone? "Calls the system malloc or
equivalent"
IIRC on Win32 perl5 supplies a malloc that tracks which (i)thread allocates
memory, and frees all memory on ithread exit. And perl5 comes with its own
malloc, which it often likes to use on *nix.

Nicholas Clark
--
Even better than the real thing: http://nms-cgi.sourceforge.net/

Brent Dax

Aug 18, 2002, 6:53:47 PM
to Nicholas Clark, perl6-i...@perl.org
Nicholas Clark:
# > The functions for creating and manipulating C<Parrot_String>s are
# > listed below.
#
# Is it worth arranging a reminder in here that as parrot is
# garbage collected there is no confusion about who owns
# pointers to blah?

Probably. (Actually, I'd probably put it in a section above this, as it
doesn't necessarily go with strings in particular.)

# > =item C<Parrot_String Parrot_string_new(Parrot_Interp, char* bytes,
# > Parrot_Int len, Parrot_String enc)>
# >
# Should that char * be const char *?

*shrugs* I never really understood the distinction. :^)

# > Note that it is rarely a good idea to not specify the encoding if
# > you're using C<bytes> and C<len>.
#
# I'm a native English speaker and I'm finding that double
# negative hard to work out. Is there a clearer way to phrase it?

"If C<bytes> and C<len> are used, specifying C<enc> is usually a good
idea."

# > C<src> are the same, this is a noop. This may or may not be a
# > copy-on-write set; the embedder should not care.
#
# "This might be a copy-on-write set" ...
#
# And do we need a RFC like definition of should/may/must/mustn't?

If so, I'd suggest the definition be patched into PDD0, so it's shared
by all PDDs instead of repeating the definitions everywhere.

# In which case, surely that should read "the embedder must not care"?

I don't want to say "must". If they do care, they're free to include
internals headers as they see fit--and deal with all the maintenance
hassles this causes.

# > B<XXX> Is this a good policy?
# >
# > =item C<Parrot_String Parrot_string_copy_bytes(Parrot_Interp,
# > Parrot_String dest, char* bytes, Parrot_Int len, char* enc)>
#
# Again, should that be const char *bytes?

Again, I dunno. :^)

# > =item C<void Parrot_string_transcode(Parrot_Interp, Parrot_String str,
# > Parrot_String enc)>
# >
# > Transcode C<str> to C<enc>. If C<enc> isn't recognized as a valid
# > encoding name by a case-insensitive match, or if it is NULL, the
# > default encoding is used.
#
# Encodings are specified in parrot strings (not char *) yet
# you state that it's case insensitive. Is case insensitivity
# well defined on an encoding basis, or is it actually
# dependent on the language level? [eg one might argue that in
# English þ and Þ aren't the same, but if the string is in
# ISO-8859-1 then Parrot isn't going to know whether the name
# was specified in English, German or Icelandic. I chose þ
# because I don't think there are any foreign words adopted
# into English spelled with thorn. Whereas I'd not be surprised
# if most other accented letters are used in some or other word]
#
# Independent of that, aren't we opening ourselves up to a big
# performance hit by doing case insensitive matching on
# arbitrary encodings (such as Unicode)? Which normal form were
# we going to do it in? And if the canonical name is defined in
# (say) ISO 8859-1 but their string is in Unicode, are we going
# to convert before deciding whether it is the same? And if
# they're in Shift-JIS but we're supplying it in ISO-8859-2 -
# that's 2 conversions?
#
# It seems faster having names as US-ASCII and being case
# insensitive, or having names case sensitive.

We can be case-sensitive. I'd rather not be encoding-sensitive, but
that's okay too if need be.

# > =item C<Parrot_UInt Parrot_string_length(Parrot_Interp, Parrot_String
# > str)>
# >
# > Returns the length of C<str> in characters. Note that this is
# > "characters", not "bytes"; the string's encoding defines what
# > "character" means.
#
# Should you be clear what happens with combining characters?
# If so, that's "characters", not "bytes" or "glyphs", isn't it?

That's what I mean by "the encoding decides". I would imagine that
Unicode encodings wouldn't count combining characters, but I don't know
enough to make an informed decision about that.

# Is there a cross reference to what a Parrot_UInt is?

I should include a section defining Parrot_Int, Parrot_UInt, and
Parrot_Float.

# > =item C<Parrot_String Parrot_string_from_cstr(Parrot_Interp, char*
# > cstr)>
# >
# > Creates a Parrot_String from the given C string. Assumes the native
# > encoding.
#
# const char* ?

*shrugs*

# > =item C<Parrot_PMC Parrot_pmc_new_vtable(Parrot_Interp, Parrot_VTable
# > vtable)>
# >
# > Creates a new Parrot_PMC using C<vtable>. This can be used for
# > "private" PMC types.
# >
# > B<XXX> Is this a good idea or not?
#
# Singletons are considered useful in some language, aren't
# they? Without this, would it be hard to efficiently create singletons?

What this really deals with is if I want a custom PMC type without
registering it.

# > =item C<void *Parrot_alloc(Parrot_UInt size)>
# >
# > Calls the system C<malloc()> with C<size>.
#
# Are you sure you want to set that in stone? "Calls the system
# malloc or equivalent" IIRC on Win32 perl5 supplies a malloc
# that tracks which (i)thread allocates memory, and frees all
# memory on ithread exit. And perl5 comes with its own malloc,
# which it often likes to use on *nix.

It should probably say "or equivalent".

--Brent Dax <bren...@cpan.org>
@roles=map {"Parrot $_"} qw(embedding regexen Configure)

"Java golf. That'd be a laugh. 'Look, I done it in 15!' 'Characters?'
'No, classes!'"
--Ferret, in the Monastery

Juergen Boemmels

Aug 19, 2002, 10:43:41 AM
to Brent Dax, perl6-i...@perl.org
"Brent Dax" <bren...@cpan.org> writes:

Some comments on this

> =item C<Parrot_String Parrot_string_new(Parrot_Interp, char* bytes,
> Parrot_Int len, Parrot_String enc)>
>
> Allocates a Parrot_String and sets it to the first C<len> bytes of
> C<bytes>. C<enc> is the name of the encoding to use (e.g. "ASCII",
> "UTF-8", "Shift-JIS"); if a case-insensitive match of this name doesn't
> result in an encoding name that Parrot knows about, or if NULL is passed
> as the encoding, the platform's default encoding is assumed.[1] Values
> of NULL and 0 can be passed in for C<bytes> and C<len> if the user
> desires an empty string.
>
> Note that it is rarely a good idea to not specify the encoding if you're
> using C<bytes> and C<len>.

Are you sure you want the encoding to be a Parrot_String, which is itself
encoded? I think it would be better if it were a NUL-terminated
C string in the native encoding.

Parrot_string_new (interp, "foobar", 6, "UTF-8") vs.
Parrot_string_new (interp, "foobar", 6, Parrot_string_new (interp, "UTF-8", 5, NULL)) or
Parrot_string_new (interp, "foobar", 6, Parrot_string_from_cstr (interp, "UTF-8"))

Or there would have to be predefined strings like
Parrot_encoding_ASCII.

The C-level API should not be unnecessarily hard.

[...]

> =item C<Parrot_String Parrot_string_copy_bytes(Parrot_Interp,
> Parrot_String dest, char* bytes, Parrot_Int len, char* enc)>
>
> Sets C<dest> to the first C<len> bytes of C<bytes> and returns C<dest>.
> C<enc> is taken to be the encoding of C<bytes>; the Parrot_String will
> retain its original encoding. (Call C<Parrot_string_transcode> on the
> Parrot_String first if you want to retain C<enc>.)

Here enc is a native char *.
Whichever way is chosen, specifying the encoding should be done in one and only one way.

> =item C<Parrot_String Parrot_string_encoding(Parrot_Interp,
> Parrot_String str)>
>
> Returns the encoding of C<str> as a Parrot_String.

If we go the way of making the encoding a C string, this should also be
const char *Parrot_string_encoding (Parrot_Interp, Parrot_String).
The lifetime of this pointer can be specified as long as the
interpreter lives (the same as the Parrot_String).

[...]

> =item C<Parrot_PMC Parrot_pmc_new(Parrot_Interp, Parrot_String type)>
>
> Creates a new Parrot_PMC of the type C<type>. If C<type> is not a
> case-insensitive match of any type already registered with Parrot, this
> function will throw an exception.

OK, I think it's a good thing to allow PMC type names in arbitrary
encodings. But it would be nice to have a convenience function

Parrot_PMC Parrot_pmc_new_from_cstr (Parrot_Interp, const char *)

[...]

Just my EUR0.02
juergen
--
Juergen Boemmels boem...@physik.uni-kl.de
Fachbereich Physik Tel: ++49-(0)631-205-2817
Universitaet Kaiserslautern Fax: ++49-(0)631-205-3906
PGP Key fingerprint = 9F 56 54 3D 45 C1 32 6F 23 F6 C7 2F 85 93 DD 47

Bryan C. Warnock

Aug 20, 2002, 10:46:04 PM
to perl6-i...@perl.org
On Sun, 2002-08-18 at 18:53, Brent Dax wrote:
> # And do we need a RFC like definition of should/may/must/mustn't?
>
> If so, I'd suggest the definition be patched into PDD0, so it's shared
> by all PDDs instead of repeating the definitions everywhere.

Noted.

--
Bryan C. Warnock
bwarnock@(gtemail.net|raba.com)
