Typemap AV reference to char **

Bill Moseley

Jul 17, 2012, 10:05:21 AM

to per...@perl.org

It's been many years since I last worked with XS, so I apologize for my ignorance. :)

I have a C++ constructor with a signature like this:

family( int count, char* names[], int ages[], char* family_name );

My question is: what is the easiest way to interface with methods that have this kind of signature?

I'd be very happy to have typemaps handle those (and thus make my XS simply declare the constructor), but it seems like I need to malloc the new arrays. That would mean I'd need to free them, which I assume I could do right after the constructor above returns.[1] Or, if I have some reasonable bound on the total number of elements, can I just use a fixed-size array on the stack?
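If the element count has a known upper bound, fixed-size stack arrays avoid malloc/free entirely. A minimal C++ sketch, assuming a hypothetical stub `family` class that copies whatever it is given (if the real library keeps the pointers instead, as footnote [1] worries, stack arrays would dangle as soon as the wrapper returns). `kMaxMembers` and `build_family_on_stack` are made-up names, and a `std::vector<std::string>` stands in for the Perl-side list:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the real C++ class: it copies its inputs,
// so the caller's arrays may be discarded once the constructor returns.
class family {
public:
    family(int count, char* names[], int ages[], char* family_name)
        : surname_(family_name) {
        for (int i = 0; i < count; ++i) {
            names_.emplace_back(names[i]);
            ages_.push_back(ages[i]);
        }
    }
    std::size_t size() const { return names_.size(); }
    const std::string& name(std::size_t i) const { return names_[i]; }
    int age(std::size_t i) const { return ages_[i]; }
    const std::string& surname() const { return surname_; }

private:
    std::vector<std::string> names_;
    std::vector<int> ages_;
    std::string surname_;
};

// Assumed upper bound on the number of family members.
constexpr std::size_t kMaxMembers = 16;

// Builds the C-style arguments in fixed-size stack arrays and constructs
// the object. Returns nullptr rather than overflowing the arrays.
family* build_family_on_stack(std::vector<std::string>& names,
                              const std::vector<int>& ages,
                              std::string& surname) {
    assert(names.size() == ages.size());
    if (names.size() > kMaxMembers) {
        return nullptr;  // too many members for the stack buffers
    }
    char* name_ptrs[kMaxMembers];
    int age_vals[kMaxMembers];
    for (std::size_t i = 0; i < names.size(); ++i) {
        name_ptrs[i] = &names[i][0];  // points into the caller's string
        age_vals[i] = ages[i];
    }
    // Both arrays vanish when this function returns; that is only safe
    // because the stub constructor copies everything it needs.
    return new family(static_cast<int>(names.size()), name_ptrs, age_vals,
                      &surname[0]);
}
```

The bound check is the important part of this approach: writing past the end of a stack array corrupts memory silently instead of failing loudly.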

Could someone please provide an example -- or maybe point me to existing code on CPAN?

Thanks,

[1] I guess I'd better see if family() holds on to the addresses of names[] and ages[] after returning. Hum, if I don't know that maybe I need to malloc my own "object" so I can free those on DESTROY. Is that right?

--
Bill Moseley
mos...@hank.org

Bill Moseley

Jul 17, 2012, 10:36:15 AM

to per...@perl.org

Quick follow up:

On Tue, Jul 17, 2012 at 7:05 AM, Bill Moseley <mos...@hank.org> wrote:

> family( int count, char* names[], int ages[], char* family_name );
>
> [1] I guess I'd better see if family() holds on to the addresses of names[] and ages[] after returning. Hum, if I don't know that maybe I need to malloc my own "object" so I can free those on DESTROY. Is that right?

Thinking about this a bit more, I realized that this specific constructor is only called ONCE per process. So, if I malloc names[] and ages[], I'm not sure it's a problem if they don't get freed. Does that simplify things? i.e., I wouldn't need to malloc a struct to hold these until DESTROY.

Second, this is my first use of XS and C++, and in the simple tests I've been doing, XS is setting up a "THIS" for me. If I need to malloc a struct that holds names[], ages[], and the address returned by the C++ constructor ("my_cpp_object"), and my new method returns this struct, is that what THIS ends up as? That is, instead of THIS->method() I would then do THIS->my_cpp_object->method().
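One way to picture the struct idea, as a sketch only: a hypothetical `family_handle` owns the name storage, the pointer array, the ages, and the C++ object. If the XS `new` returned this handle (with a typemap that stores the pointer in the Perl object), then `THIS` in later methods would be the handle, and calls would go through `THIS->obj->method()`. The stub `family` here deliberately keeps the caller's pointers, which is the case where the buffers must live until DESTROY; `handle_new` and `handle_destroy` are made-up names standing in for the XS `new` and `DESTROY`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a C++ class that keeps the caller's pointers
// instead of copying them.
class family {
public:
    family(int count, char* names[], int ages[], char* family_name)
        : count_(count), names_(names), ages_(ages), surname_(family_name) {}
    int count() const { return count_; }
    const char* name(int i) const { return names_[i]; }
    int age(int i) const { return ages_[i]; }
    const char* surname() const { return surname_; }

private:
    int count_;
    char** names_;   // borrowed: must outlive this object
    int* ages_;      // borrowed
    char* surname_;  // borrowed
};

// Hypothetical wrapper that owns every buffer the C++ object points into.
struct family_handle {
    std::vector<std::string> name_storage;
    std::vector<char*> name_ptrs;
    std::vector<int> ages;
    std::string surname;
    family* obj = nullptr;
};

// Roughly what an XS `new` could return as the Perl object.
family_handle* handle_new(const std::vector<std::string>& names,
                          const std::vector<int>& ages,
                          const std::string& surname) {
    assert(names.size() == ages.size());
    family_handle* h = new family_handle;
    h->name_storage = names;  // private copies, similar in spirit to savepv()
    h->ages = ages;
    h->surname = surname;
    for (std::string& s : h->name_storage) {
        h->name_ptrs.push_back(&s[0]);
    }
    // name_storage is not resized after this point, so the pointers stay valid.
    h->obj = new family(static_cast<int>(h->ages.size()),
                        h->name_ptrs.data(), h->ages.data(), &h->surname[0]);
    return h;
}

// Roughly what DESTROY would do: delete the object before its buffers.
void handle_destroy(family_handle* h) {
    delete h->obj;
    delete h;  // member destructors release the buffers afterwards
}
```

Deleting `obj` first matters if the real destructor still reads the borrowed arrays.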

Am I making this harder than it is?

--
Bill Moseley
mos...@hank.org

David Mertens

Jul 27, 2012, 11:23:59 AM

to Bill Moseley, per...@perl.org

Bill -

Sorry I haven't replied to this earlier. I mentally book-marked it,
but never returned. I see two potential paths for discussion.

First, suppose that you want to convert the array-of-strings each time
this function gets called because char * names[] can change with each
function call. In that case the nicest handling of this would be
typemaps. I presume that you expect your caller to supply an anonymous
list of scalars, in which case you should be able to do the following:

1) Get the length of the AV.
2) Use Newx to have Perl allocate a char ** for you with the length of
the AV [http://search.cpan.org/~rjbs/perl-5.16.0/pod/perlguts.pod#Memory_Allocation]
3) Mark that just-allocated char ** to be freed at the end of the
current scope with SAVEFREEPV, so that your own XS code doesn't have to
call Safefree on it [http://search.cpan.org/~rjbs/perl-5.16.0/pod/perlguts.pod#Localizing_changes]
and you avoid memory leaks
4) Iterate through the SVs in the AV; assign each char * in your array
to the return value of SvPVbyte_nolen on the given SV
[http://perldoc.perl.org/perlapi.html#SV-Manipulation-Functions]
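The four steps above can be mirrored in plain C++ to check the logic outside Perl. In this sketch (all names hypothetical), a `std::vector<std::string>` stands in for the AV, `&s[0]` for `SvPVbyte_nolen`, and a `std::unique_ptr<char*[]>` for `Newx` plus `SAVEFREEPV`: the array is released automatically when the enclosing scope ends. The stub `family` copies its inputs, so the borrowed pointers are only needed for the duration of the call.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the real class; it copies what it is given.
class family {
public:
    family(int count, char* names[], int ages[], char* family_name)
        : surname_(family_name),
          names_(names, names + count),
          ages_(ages, ages + count) {}
    const std::vector<std::string>& names() const { return names_; }
    const std::vector<int>& ages() const { return ages_; }
    const std::string& surname() const { return surname_; }

private:
    std::string surname_;
    std::vector<std::string> names_;
    std::vector<int> ages_;
};

// `av` plays the role of the array reference passed in from Perl.
family* family_from_list(std::vector<std::string>& av,
                         const std::vector<int>& ages,
                         std::string& surname) {
    assert(av.size() == ages.size());
    // 1) Get the length of the "AV".
    const std::size_t len = av.size();
    // 2) + 3) Allocate the char* array; unique_ptr frees it when this scope
    //         ends, much as SAVEFREEPV would at the end of the Perl scope.
    std::unique_ptr<char*[]> names(new char*[len]);
    // 4) Point each slot at the existing string buffer (no copy), the way
    //    SvPVbyte_nolen returns a pointer into the SV.
    for (std::size_t i = 0; i < len; ++i) {
        names[i] = &av[i][0];
    }
    std::vector<int> age_copy(ages);  // the constructor wants int*, not const int*
    return new family(static_cast<int>(len), names.get(), age_copy.data(),
                      &surname[0]);
}
```

Because the pointers point into the caller's strings, this is only safe while the library does not keep them after the constructor returns.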

On the other hand, if you know that char * names[] does not change
because it's an immutable part of a Singleton or it's a Perl-side
"constant", you could cache the array-of-strings in a C-level global.
If you do, you can populate that cache
1) with C code in your BOOT section
2) with a special internal C function that is called once from Perl
(tracking the state of the cache with Perl code)
3) with your function's C code that creates the cache when the
function is invoked if it doesn't already exist

You can free the cache
1) at object destruction, if your object is a true Singleton
2) with a special internal C function that is called from an END block
in your module
3) never, as it's not a huge memory leak

If you use a C-level global, you can use savepv to copy each string so
that you're not pointing at the PV buffers of mutable SVs, which may
change later.
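A sketch of the cache approach in plain C++, assuming hypothetical globals `g_names`/`g_count` and made-up functions `cached_names` and `free_name_cache`. It populates the cache lazily on first use (the third population option), copies each string the way savepv would, and offers a free function that an END-block XSUB could call (the second freeing option); never calling it is the third.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical C-level global cache for the names array.
static char** g_names = nullptr;
static std::size_t g_count = 0;

// Builds the cache the first time it is needed. Each string is copied,
// as savepv() would, so later changes to the source cannot alter it.
// Later calls ignore `source`, which is only correct if the names
// really never change.
char** cached_names(const std::vector<std::string>& source, std::size_t* count) {
    assert(count != nullptr);
    if (g_names == nullptr) {
        g_count = source.size();
        g_names = new char*[g_count + 1];
        for (std::size_t i = 0; i < g_count; ++i) {
            const std::string& s = source[i];
            g_names[i] = new char[s.size() + 1];
            std::memcpy(g_names[i], s.c_str(), s.size() + 1);  // includes NUL
        }
        g_names[g_count] = nullptr;  // NULL-terminated as a convenience
    }
    *count = g_count;
    return g_names;
}

// Frees the cache; an XSUB wrapping this could be called from the
// module's END block.
void free_name_cache() {
    if (g_names == nullptr) {
        return;
    }
    for (std::size_t i = 0; i < g_count; ++i) {
        delete[] g_names[i];
    }
    delete[] g_names;
    g_names = nullptr;
    g_count = 0;
}
```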

Just my 2 cents for now.
David

--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." -- Brian Kernighan

David Oswald

Jul 27, 2012, 12:32:08 PM

to per...@perl.org

I've sort of moved away from recommending the automatic string
conversion typemap that is available to both Inline::C and
Inline::CPP, because it turns 100% of the responsibility for
understanding everything there is to know about Unicode over to the
C/CPP/XS programmer. I could be wrong about this, but to me the
lure of simplicity in passing a Perl string as a C string is deceptive:
while it saves a little code up front, it creates a need for a lot
more hand-rolled code to support Unicode.
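A small illustration of that responsibility: once a Perl string becomes a plain `char*`, C and C++ code sees bytes, not characters. This sketch (the function name is hypothetical) counts UTF-8 code points by skipping continuation bytes, assuming the input is well-formed UTF-8, while `strlen` on the same buffer reports bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Counts code points in a NUL-terminated UTF-8 string, assuming it is
// well-formed: every byte except a continuation byte (10xxxxxx) starts
// a new character.
std::size_t utf8_char_count(const char* s) {
    assert(s != nullptr);
    std::size_t n = 0;
    for (; *s != '\0'; ++s) {
        if ((static_cast<unsigned char>(*s) & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}
```

For "café" encoded as UTF-8, `strlen` gives 5 while this returns 4, so any C code that indexes or truncates by length has to decide which of the two it means.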

In my opinion (and I'd be happy to discover I'm wrong about this),
it's easier to pass the string-containing SV, *as an SV*, and then use
Perl's own string-handling "guts and API calls" to manipulate it. I
realize that this somewhat defeats the "performance improvement"
purpose of dropping into XS, but strings are truly one of those areas
where Perl's native performance is as it is because it's doing so much
(most of which has become necessary in a Unicode world).

--

David Oswald
daos...@gmail.com

David Mertens

Jul 27, 2012, 12:53:41 PM

to David Oswald, per...@perl.org

Yeah. What one should do here depends on how much control you have
over the code. You can have Perl downgrade the string to a true byte
string if you use SvPVbyte_nolen, as I suggest, but then you lose any
Unicode characters that your user may have passed in. If you don't
control your C++ library and it expects true C-style strings, then you
haven't much of an option.
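To make "downgrade to a true byte string" concrete: a byte string can hold only characters up to U+00FF, so anything above that cannot survive the downgrade (for such input SvPVbyte croaks with a "Wide character" error rather than passing it through). This plain C++ sketch, using a hypothetical `utf8_to_latin1` helper, performs the same downgrade on a UTF-8 buffer and reports failure instead of croaking.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Converts UTF-8 to Latin-1 bytes. Returns false (leaving `out` partly
// filled) if a character is above U+00FF, which has no single-byte form,
// or if the input is not well-formed for this simple decoder.
bool utf8_to_latin1(const std::string& in, std::string& out) {
    out.clear();
    for (std::size_t i = 0; i < in.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(in[i]);
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));  // ASCII: unchanged
        } else if ((c == 0xC2 || c == 0xC3) && i + 1 < in.size() &&
                   (static_cast<unsigned char>(in[i + 1]) & 0xC0) == 0x80) {
            // Two-byte sequence for U+0080..U+00FF.
            unsigned char c2 = static_cast<unsigned char>(in[i + 1]);
            out.push_back(static_cast<char>(((c & 0x03) << 6) | (c2 & 0x3F)));
            ++i;
        } else {
            return false;  // U+0100 or above, or malformed input
        }
    }
    return true;
}
```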

David

Bill Moseley

Jul 27, 2012, 2:39:40 PM

to per...@perl.org

On Fri, Jul 27, 2012 at 9:53 AM, David Mertens <dcmerte...@gmail.com> wrote:

> Yeah. What one should do here depends on how much control you have
> over the code. You can have Perl downgrade the string to a true byte
> string if you use SvPVbyte_nolen, as I suggest, but then you lose any
> Unicode characters that your user may have passed in. If you don't
> control your C++ library and it expects true C-style strings, then you
> haven't much of an option.

I kind of got lost on the last part of this discussion. But I just realized I'm doing something wrong related to encoding.

But as for the memory management, I'm using Newx() to create the array, then populating it directly with SvPV, and after calling the C++ constructor I Safefree() the array.

But what I'm doing wrong is not handling encoding. If the C++ code is expecting utf8-encoded strings then I need to check if the SV has the utf8 flag on and then call bytes_to_utf8() if the flag is NOT set. If it's already set I should be able to copy directly.

Is that correct?

perlguts doesn't provide an example of how to use bytes_to_utf8(). Plus perlapi says it's experimental.

Or the other option is to upgrade the SV in place, right? But then I'm upgrading the passed-in string, which the caller may not expect.

Hum, but then perlguts says don't do this:

    if (!SvUTF8(left))
        sv_utf8_upgrade(left);

So, I guess I have to ask for help with this. If I need to pass utf8-encoded strings to the C++ methods, what's the correct way to upgrade/convert to utf8 without side effects on the calling code?
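perlguts points to bytes_to_utf8 for this case: it returns a newly allocated UTF-8 copy of a Latin-1 buffer (which the caller later frees with Safefree) and leaves the original SV untouched. The conversion itself is simple. This plain C++ sketch uses a hypothetical `utf8_copy_for_cpp` helper, with a `bool is_utf8` parameter standing in for `SvUTF8(sv)`, to show the logic described above: copy as-is when the flag is on, otherwise widen each byte at or above 0x80 into two bytes.

```cpp
#include <cassert>
#include <string>

// Returns a UTF-8 copy of `bytes` without modifying it.
// `is_utf8` stands in for SvUTF8(sv): if set, the buffer is already UTF-8.
// Otherwise each byte is treated as Latin-1, as bytes_to_utf8 assumes:
// bytes below 0x80 are copied, the rest become two-byte sequences.
std::string utf8_copy_for_cpp(const std::string& bytes, bool is_utf8) {
    if (is_utf8) {
        return bytes;
    }
    std::string out;
    out.reserve(bytes.size() * 2);
    for (char ch : bytes) {
        unsigned char c = static_cast<unsigned char>(ch);
        if (c < 0x80) {
            out.push_back(ch);
        } else {
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}
```

In the XS itself, the corresponding step would be calling bytes_to_utf8 on the SvPV buffer when SvUTF8 is off, passing the result to the constructor, and freeing it afterwards, so the caller's SV is never upgraded in place.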

--
Bill Moseley
mos...@hank.org