> And now, some rough spots that need attention:
>
> - I mentioned that I added new versions of functions with _(pvn|pv|sv)
> versions; But what should I do about the original versions, which are now
> little more than wrappers around the new functions? Leave them next to the
> new ones? Macros somewhere? rafl suggested the macro way, which I prefer the
> most, but with that I'm not sure where to sic sv_derived_from and sv_does.
Would inline functions work?
There is this "new fangled" PERL_STATIC_INLINE as of 5.14 which gives
C<static inline> where possible, and C<static> on legacy compilers.
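To make the inline-function option concrete, here is a minimal standalone
sketch, not core code. MY_STATIC_INLINE is a hypothetical stand-in that mimics
what PERL_STATIC_INLINE is described as doing above, and name_matches_pvn()
and name_matches_pv() are invented names showing an original entry point
reduced to a thin wrapper around a new length-taking _pvn function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for PERL_STATIC_INLINE: "static inline" where the
   compiler supports it, plain "static" on legacy compilers. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#  define MY_STATIC_INLINE static inline
#elif defined(__GNUC__)
#  define MY_STATIC_INLINE static __inline__
#else
#  define MY_STATIC_INLINE static
#endif

/* Invented example of a "new" function taking pointer, length and flags. */
MY_STATIC_INLINE int
name_matches_pvn(const char *have, const char *name, size_t len,
                 unsigned flags)
{
    assert(have != NULL && name != NULL);
    (void)flags;  /* a real _pvn function might carry a UTF-8 flag here */
    return strlen(have) == len && memcmp(have, name, len) == 0;
}

/* The "original" NUL-terminated entry point, now only a wrapper. */
MY_STATIC_INLINE int
name_matches_pv(const char *have, const char *name)
{
    return name_matches_pvn(have, name, strlen(name), 0);
}
```

Unlike a macro, the wrapper evaluates each argument exactly once and has
checked parameter types, which is the usual argument for preferring it.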
> - Until yesterday, GvNAMEUTF8() was doing something like (HEK_UTF8() ||
> HEK_WASUTF8()), as the double check was needed by most places that called
> SvUTF8() on GVs (pp_concat, stringify, substr, SvPV, etc). However, that
> double-check was occasionally breaking hv_(fetch|store|common|etc) calls for
> globs with latin-1 in them, so I removed the HEK_WASUTF8 out of GvNAMEUTF8
> and added it explicitly in SvUTF8(). Question is, is there any value to
> adding a GvNAMEWASUTF8() macro, seeing how there's not much of anything that
> would end up using it, rather than continue using HEK_WASUTF8(GvNAME_HEK())?
I don't think that any code doing comparisons needs to worry about *WASUTF8.
It doesn't affect the (actual) encoding of the sequence of octets in the HEK.

HEKs containing only characters in the range 0-255 are always stored as bytes,
and HVhek_UTF8 is false.
HEKs containing any characters >255 are stored as UTF-8, and HVhek_UTF8 is
true.

That's enough for comparisons.
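
As a rough illustration of why the extra flag is irrelevant to comparisons,
here is a standalone sketch using a toy key structure. toy_key, TOY_UTF8,
TOY_WASUTF8 and toy_key_eq() are all invented for this example; they are not
the real HEK layout or API. Equality looks only at the octets and the UTF-8
bit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_UTF8    0x01u  /* octets are UTF-8 (some character > 255) */
#define TOY_WASUTF8 0x02u  /* key was created from an upgraded scalar */

/* Toy stand-in for a hash key: octets, length and flags. */
struct toy_key {
    const char *octets;
    size_t      len;
    unsigned    flags;
};

/* Two keys are equal when the UTF-8 bit and the octets agree.
   TOY_WASUTF8 is deliberately ignored. */
static int
toy_key_eq(const struct toy_key *a, const struct toy_key *b)
{
    assert(a != NULL && b != NULL);
    if ((a->flags & TOY_UTF8) != (b->flags & TOY_UTF8))
        return 0;
    return a->len == b->len && memcmp(a->octets, b->octets, a->len) == 0;
}
```

With this, a Latin-1 key stored as the bytes "caf\xE9" compares equal whether
or not TOY_WASUTF8 is set on either side.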
HVhek_WASUTF8 was added just before 5.8.0 was released to permit C<keys> to
return scalars identically encoded to the scalar used to create the hash key,
thanks to the horrible way that SvUTF8() is used by the core both to signal
encoding and matching semantics. People didn't like it in late 5.7.x when
C<keys> always returned "upgraded" scalars, and other people didn't like it
when C<keys> returned "downgraded" scalars. The only way to please everyone
was to have C<keys> faithfully return whatever people had used to create the
hash. It's only "used" here [complete with typos in the comments], to trigger
a call to bytes_to_utf8() on the *bytes* stored in (HEK_KEY(),HEK_LEN()):

SV *
Perl_newSVhek(pTHX_ const HEK *const hek)
{
    dVAR;
    if (!hek) {
        SV *sv;

        new_SV(sv);
        return sv;
    }

    if (HEK_LEN(hek) == HEf_SVKEY) {
        return newSVsv(*(SV**)HEK_KEY(hek));
    } else {
        const int flags = HEK_FLAGS(hek);
        if (flags & HVhek_WASUTF8) {
            /* Trouble :-)
               Andreas would like keys he put in as utf8 to come back as utf8
            */
            STRLEN utf8_len = HEK_LEN(hek);
            SV * const sv = newSV_type(SVt_PV);
            char *as_utf8 = (char *)bytes_to_utf8 ((U8*)HEK_KEY(hek), &utf8_len);
            /* bytes_to_utf8() allocates a new string, which we can repurpose: */
            sv_usepvn_flags(sv, as_utf8, utf8_len, SV_HAS_TRAILING_NUL);
            SvUTF8_on (sv);
            return sv;
        } else if (flags & (HVhek_REHASH|HVhek_UNSHARED)) {
            /* We don't have a pointer to the hv, so we have to replicate the
               flag into every HEK. This hv is using custom a hasing
               algorithm. Hence we can't return a shared string scalar, as
               that would contain the (wrong) hash value, and might get passed
               into an hv routine with a regular hash.
               Similarly, a hash that isn't using shared hash keys has to have
               the flag in every key so that we know not to try to call
               share_hek_kek on it.  */
            SV * const sv = newSVpvn (HEK_KEY(hek), HEK_LEN(hek));
            if (HEK_UTF8(hek))
                SvUTF8_on (sv);
            return sv;
        }
        /* This will be overwhelminly the most common case.  */
        {
            /* Inline most of newSVpvn_share(), because share_hek_hek() is far
               more efficient than sharepvn().  */
            SV *sv;

            new_SV(sv);
            sv_upgrade(sv, SVt_PV);
            SvPV_set(sv, (char *)HEK_KEY(share_hek_hek(hek)));
            SvCUR_set(sv, HEK_LEN(hek));
            SvLEN_set(sv, 0);
            SvREADONLY_on(sv);
            SvFAKE_on(sv);
            SvPOK_on(sv);
            if (HEK_UTF8(hek))
                SvUTF8_on(sv);
            return sv;
        }
    }
}
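
The WASUTF8 branch relies on widening the stored bytes to UTF-8. A minimal
standalone sketch of that widening, treating each input byte as a code point
in the range 0-255, might look like the following. latin1_to_utf8() is a
hypothetical helper written for this example; it is not the core's
bytes_to_utf8(), only the same idea, and its calling convention is an
assumption modelled on the call above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Returns a newly malloc()ed, NUL-terminated UTF-8 copy of the bytes in
   in[0 .. *len-1], and updates *len to the UTF-8 length.  Returns NULL if
   allocation fails.  The caller frees the result. */
static unsigned char *
latin1_to_utf8(const unsigned char *in, size_t *len)
{
    unsigned char *out, *d;
    size_t i;

    assert(in != NULL && len != NULL);
    out = malloc(*len * 2 + 1);  /* worst case: every byte doubles */
    if (out == NULL)
        return NULL;

    d = out;
    for (i = 0; i < *len; i++) {
        const unsigned char c = in[i];
        if (c < 0x80) {
            *d++ = c;                                   /* ASCII: as-is */
        } else {
            *d++ = (unsigned char)(0xC0 | (c >> 6));    /* lead byte */
            *d++ = (unsigned char)(0x80 | (c & 0x3F));  /* continuation */
        }
    }
    *d = '\0';
    *len = (size_t)(d - out);
    return out;
}
```

The stored octets themselves never change; only the scalar handed back to the
caller is widened, which is why the flag matters for C<keys> and not for
lookups.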
Nicholas Clark
On Mon, Jul 11, 2011 at 04:43:03AM -0300, Brian Fraser wrote:
Agree, they can't be static in a core *.c file.
The intent of "static inline" is that they go into one of the header files,
which means that they are visible and available to everyone. The plan is to
use them in place of the core's current macro addiction.
It happens that *nearly* every example use so far isn't in a header file. :-)
Nicholas Clark