sv_utf8_upgrade vs. bytes_to_utf8

Bill Moseley

Aug 1, 2012, 10:05:06 AM
to per...@perl.org

It's not clear to me from perlguts and perlapi how to make sure all my strings are UTF-8 encoded.

And I'm more confused about sv_utf8_upgrade vs. bytes_to_utf8.

Say I want to pass a list of strings into my xsub as an arrayref.

Is there a problem with blindly calling sv_utf8_upgrade on every element of my AV?


For example, this code does not handle utf8:

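int
foo( names )
    AV * names

    PREINIT:
        char ** name_list;

    CODE:
        /* av_len() returns the highest index, so allocate one more slot. */
        Newx( name_list, av_len( names ) + 1, char * );

        /* Copy the raw string pointers; no encoding handling here. */
        for( int i=0; i <= av_len( names ); i++ )
            name_list[i] = SvPV_nolen( *av_fetch( names, i, 0) );

        /* RETVAL needs a non-void return type; int is assumed here for the C foo(). */
        RETVAL = foo( name_list, av_len( names) );
        Safefree( name_list );

    OUTPUT:
        RETVAL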



If my C function foo() expects all character data to be UTF-8 encoded, is this the correct approach?

        for( int i=0; i <= av_len( names ); i++ ) {
            SV * name_sv = *av_fetch( names, i, 0);
            sv_utf8_upgrade( name_sv );
            name_list[i] = SvPV_nolen( name_sv );
        }

And is sv_utf8_upgrade a no-op if the UTF8 flag is already set?

Or is bytes_to_utf8() a better approach? If I did that, I would need to Safefree() each string in my name_list[] array after calling my C function, correct?
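
To make the ownership question concrete, here is a standalone sketch of what I understand bytes_to_utf8() to do on an ASCII platform: treat the input as Latin-1, return a newly allocated UTF-8 copy, and update the length. The helper name latin1_to_utf8() is hypothetical and this is not Perl's implementation. It uses malloc()/free() where the real function would pair with Safefree(), but it shows why each converted string would have to be released separately.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of the Perl API): convert a Latin-1 byte
 * string of *len bytes into a newly allocated, NUL-terminated UTF-8 string.
 * On return, *len holds the UTF-8 length in bytes.
 * The caller owns the result and must free() it. */
static unsigned char *
latin1_to_utf8(const unsigned char *src, size_t *len)
{
    assert(src != NULL && len != NULL);

    /* Worst case: every byte is >= 0x80 and expands to two bytes. */
    unsigned char *dst = malloc(*len * 2 + 1);
    if (dst == NULL)
        return NULL;

    unsigned char *d = dst;
    for (size_t i = 0; i < *len; i++) {
        unsigned char c = src[i];
        if (c < 0x80) {
            /* ASCII bytes are unchanged in UTF-8. */
            *d++ = c;
        } else {
            /* Code points 0x80..0xFF become a two-byte sequence. */
            *d++ = (unsigned char)(0xC0 | (c >> 6));
            *d++ = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    *d = '\0';
    *len = (size_t)(d - dst);
    return dst;
}
```

With this shape, every entry placed in name_list[] would be a separate allocation, so the cleanup after calling foo() would be a loop freeing each string before freeing the array itself.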


--
Bill Moseley
mos...@hank.org