I'm not clear from perlguts and perlapi how to make sure all my strings are utf8.
And I'm more confused about sv_utf8_upgrade vs bytes_to_utf8.
Say I want to pass a list of strings into my xsub as an arrayref.
Is there a problem with blindly calling sv_utf8_upgrade on every element of my AV?
For example, this code does not handle utf8:
int
foo( names )
    AV * names
  INIT:
    char ** name_list;
  CODE:
    /* return type assumed to be int here; it must match what the C foo() returns */
    Newx( name_list, av_len( names ) + 1, char * );
    for( int i=0; i <= av_len( names ); i++ )
        name_list[i] = SvPV_nolen( *av_fetch( names, i, 0) );
    RETVAL = foo( name_list, av_len( names) );
    Safefree( name_list );
  OUTPUT:
    RETVAL
If my C function foo() expects all character data to be utf8-encoded, is this the correct approach?
    for( int i=0; i <= av_len( names ); i++ ) {
        SV * name_sv = *av_fetch( names, i, 0);
        sv_utf8_upgrade( name_sv );
        name_list[i] = SvPV_nolen( name_sv );
    }
And is sv_utf8_upgrade a no-op if the utf8 flag is already set?
Or is a better approach to use bytes_to_utf8()? But if I did that, I would need to Safefree() each string in my name_list[] array after calling my C function, correct?
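
To make the second option concrete, here is a minimal plain-C sketch of the ownership pattern I have in mind. It does not use the Perl API at all: latin1_to_utf8(), upgrade_names() and free_name_list() are hypothetical names I made up for illustration, with latin1_to_utf8() standing in for bytes_to_utf8() and malloc()/free() standing in for Newx()/Safefree(). It also assumes the input bytes are Latin-1, which may not hold for every string Perl hands me. The point is only that every converted string is a fresh allocation that has to be released individually, in addition to the array itself.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for bytes_to_utf8(): converts a NUL-terminated
 * Latin-1 string into a newly allocated UTF-8 string.
 * The caller owns the result and must free() it. */
char *latin1_to_utf8(const char *src)
{
    assert(src != NULL);
    size_t len = strlen(src);
    /* Worst case: every byte is >= 0x80 and expands to two bytes. */
    unsigned char *out = malloc(2 * len + 1);
    if (out == NULL)
        return NULL;

    unsigned char *p = out;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)src[i];
        if (c < 0x80) {
            *p++ = c;                                  /* ASCII is unchanged */
        } else {
            *p++ = (unsigned char)(0xC0 | (c >> 6));   /* lead byte */
            *p++ = (unsigned char)(0x80 | (c & 0x3F)); /* continuation byte */
        }
    }
    *p = '\0';
    return (char *)out;
}

/* Release the array and every string it owns. Safe on NULL entries. */
void free_name_list(char **list, size_t count)
{
    if (list == NULL)
        return;
    for (size_t i = 0; i < count; i++)
        free(list[i]);
    free(list);
}

/* Build a new array of converted copies; the inputs are left untouched.
 * Returns NULL on allocation failure, after freeing any partial result. */
char **upgrade_names(const char **names, size_t count)
{
    /* calloc zero-fills, so a partially built list can be freed safely. */
    char **list = calloc(count ? count : 1, sizeof *list);
    if (list == NULL)
        return NULL;
    for (size_t i = 0; i < count; i++) {
        list[i] = latin1_to_utf8(names[i]);
        if (list[i] == NULL) {
            free_name_list(list, count);
            return NULL;
        }
    }
    return list;
}
```

In the xsub I would build the list this way, call my C function, and then free each element as well as the array. Compared with calling sv_utf8_upgrade() on each element, this would leave the caller's SVs unmodified, at the cost of the extra per-string bookkeeping.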