I see the attributes work on github, but not CopLABEL's yet.
I'm also concerned about the public HEK API.
I see only functions accepting HEK's, but no functions getting those.
How should we create HEK's which need to be passed to the setters?
#if defined(PERL_IN_HV_C)
...
sa |HE* |new_he
sanR |HEK* |save_hek_flags |NN const char *str|I32 len|U32 hash|int flags
Shouldn't those be public now?
--
Reini
> I'm also concerned about the public HEK API.
> I see only functions accepting HEK's, but no functions getting those.
> How should we create HEK's which need to be passed to the setters?
>
> #if defined(PERL_IN_HV_C)
> ...
> sa |HE* |new_he
> sanR |HEK* |save_hek_flags |NN const char *str|I32 len|U32 hash|int flags
>
> Shouldn't those be public now?
There is this already:
Ap |HEK* |share_hek |NN const char* str|I32 len|U32 hash
which returns a HEK.
Nicholas Clark
> Moving on, PL_multi_(close|open) are chars, not char*, so any attempt to
> implement RT#89032 would have to turn those to something more sensible (an
> SV, perhaps? though a simple three-element struct would make do just fine).
> Should I change it?
I don't see any particular problem with changing them to something with a new
name. Google codesearch suggests that nothing is using them. (Sadly the most
excellent grep.cpan.me isn't making us remember how much we value it by not
being there).
What would be in the proposed three-element struct? It's not obvious to me.
I'd be tempted to avoid SVs where one doesn't need the full "power" of SVs.
The core has a long history of (re)using the core's data types
(such as SVs for temporary buffers, AVs for pad structures and aspects of
PerlIO, HVs for the regex Unicode features) which I'm not sure has always
been the best long-term trade off, given
a: The added flexibility of these containers means that they are actually
larger than that which would be minimally necessary
(eg 3 pointers in every AV that will never be needed for a PAD)
b: They end up trying to be one-size-fits all, which may mean they aren't
always the fastest way to solve a problem (HVs versus inversion lists)
c: global destruction and interpreter creation memory management mean that the
things you rather hoped would be there can disappear from under you, or not
yet be ready to use
Nicholas Clark
I've often thought we need a lightweight array type that is just a
struct { int size; void *array[1]; /* chumminess */ }
for all those situations where there aren't multiple references, and we
don't mind the struct being realloced: such as the backref array.
The API could take a pointer to a pointer to the struct, so it can update
the original pointer if it needs to realloc.
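
A minimal sketch of that idea, under stated assumptions: the names lite_array and lite_array_push are hypothetical and not existing core API, size_t replaces the int in the struct above, an "allocated" field is added so growth can be amortised, and a C99 flexible array member stands in for the array[1] form (C89 code would keep array[1]). The push function takes a pointer to the caller's pointer so that it can update it when realloc() moves the block.

```c
#include <stdlib.h>

/* Hypothetical type: a header followed directly by the slots. */
typedef struct lite_array {
    size_t size;       /* slots in use */
    size_t allocated;  /* slots available */
    void  *array[];    /* C99 spelling of the array[1] "chumminess" */
} lite_array;

/* Append item, growing the block when it is full.
 * *ap may be NULL for an empty array; it is updated if the block moves.
 * Returns 0 on success, -1 if realloc() fails (the old block is kept). */
static int lite_array_push(lite_array **ap, void *item)
{
    lite_array *a = *ap;

    if (a == NULL || a->size == a->allocated) {
        size_t want = a ? a->allocated * 2 : 4;
        lite_array *grown =
            realloc(a, sizeof(lite_array) + want * sizeof(void *));
        if (grown == NULL)
            return -1;
        if (a == NULL)
            grown->size = 0;   /* fresh block: nothing stored yet */
        grown->allocated = want;
        *ap = a = grown;
    }
    a->array[a->size++] = item;
    return 0;
}
```

This only suits owners that hold the single reference to the array, since any other copy of the pointer goes stale after a realloc.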
And we could probably make more use of the ptr_table API instead of using
a full hash (although I don't know whether the API would need updating to
make it more widely usable).
--
Diplomacy is telling someone to go to hell in such a way that they'll
look forward to the trip
Note also that PL_multi_(close|open) are just back-compat aliases to
fields within the new(ish) PL_parser struct:
#define PL_multi_open (PL_parser->multi_open)
#define PL_multi_close (PL_parser->multi_close)
this may affect how you want to declare the three fields: i.e. there's no
particular reason why you need to preserve the PL_multi_open/close names
if directly referring to structure or substructure elements within
PL_parser makes more sense.
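
To illustrate how such aliases behave, here is a self-contained toy, not the real parser struct: toy_parser and set_delims are hypothetical names, and only the two #define lines are taken from above. Because each macro expands to a struct member, it works as an lvalue, so old code assigning to PL_multi_open keeps compiling.

```c
/* Hypothetical stand-in for the real parser struct; only the two
 * fields discussed here are modelled. */
typedef struct toy_parser {
    char multi_open;
    char multi_close;
} toy_parser;

static toy_parser toy_parser_state;
static toy_parser *PL_parser = &toy_parser_state;

/* Back-compat aliases, as quoted above. */
#define PL_multi_open  (PL_parser->multi_open)
#define PL_multi_close (PL_parser->multi_close)

/* Old-style code assigns through the aliases; the writes land in
 * the struct that PL_parser points at. */
static void set_delims(char open, char close)
{
    PL_multi_open  = open;
    PL_multi_close = close;
}
```

Renaming or retyping the struct fields and dropping the two macros would turn any remaining use of the old names into a compile-time error.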
--
Dave's first rule of Opera:
If something needs saying, say it: don't warble it.
> I've often thought we need a lightweight array type that is just a
>
> struct { int size; void *array[1]; /* chumminess */ }
>
> for all those situations where there aren't multiple references, and we
> don't mind the struct being realloced: such as the backref array.
> The API could take a pointer to a pointer to the struct, so it can update
> the original pointer if it needs to realloc.
I'd been wondering about
struct { STRLEN size; STRLEN allocated; } ... /* raw memory follows */
but I was thinking about various times SVs are used as byte buffers.
int troubles me because it's signed, and because it likely is only 32 bits
on a 64 bit system.
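
A rough sketch of that byte-buffer layout, with assumptions flagged: byte_buf, byte_buf_data and byte_buf_append are hypothetical names, and size_t stands in for STRLEN so the example compiles without the Perl headers. The raw memory lives immediately after the two-field header, and the append function takes a pointer to the caller's pointer so it can follow a realloc().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical header; the payload bytes follow it in the same block. */
typedef struct byte_buf {
    size_t size;       /* bytes in use */
    size_t allocated;  /* bytes available after the header */
} byte_buf;

/* The raw memory starts right after the header. */
static char *byte_buf_data(byte_buf *b)
{
    return (char *)(b + 1);
}

/* Append len bytes from src, doubling the capacity as needed.
 * *bp may be NULL for an empty buffer; it is updated if the block moves.
 * Returns 0 on success, -1 if realloc() fails (the old block is kept).
 * Overflow of the doubling for absurdly large len is not handled. */
static int byte_buf_append(byte_buf **bp, const void *src, size_t len)
{
    byte_buf *b = *bp;
    size_t used = b ? b->size : 0;
    size_t cap  = b ? b->allocated : 0;

    if (len > cap - used) {
        size_t want = cap ? cap : 16;
        byte_buf *grown;
        while (want - used < len)
            want *= 2;
        grown = realloc(b, sizeof(byte_buf) + want);
        if (grown == NULL)
            return -1;
        grown->size = used;
        grown->allocated = want;
        *bp = b = grown;
    }
    memcpy(byte_buf_data(b) + used, src, len);
    b->size = used + len;
    return 0;
}
```

The lengths are explicit throughout, so embedded NUL bytes are stored like any other byte.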
> And we could probably make more use of the ptr_table API instead of using
> a full hash (although I don't know whether the API would need updating to
> make it more widely usable).
From memory, it doesn't have a delete. It assumes that it grows, and is then
discarded.
Nicholas Clark
> I don't see any particular problem with changing them to something with a new
> name. Google codesearch suggests that nothing is using them. (Sadly the most
> excellent grep.cpan.me isn't making us remember how much we value it by not
> being there).
> What would be in the proposed three-element struct? It's not obvious to me.
> I'd be tempted to avoid SVs where one doesn't need the full "power" of SVs.
I'd feel more comfortable removing the names, and ensuring that compile time
breakage happened for anything hypothetical, to highlight that it needs
fixing, rather than trying to bodge something that increasingly doesn't work
as well.
> The flags are superfluous, really. I think I was worried about \xAB and \xBB
> being (up|down)gradeable last night, but I can't come up with an example
> that isn't incredibly silly - or possible, for that matter. So make that a
> two-element struct.
> The length member is nice to have, because it allows us to optimize for the
> most common case (when len == 1, we can keep the current char-oriented
> behavior) and stop strlens popping out here and there for the
> not-so-common-or-yet-possible case.
I don't think strlen() wants to appear anywhere near this code, as
a: the NUL byte is a valid delimiter.
b: given that we're talking about the (well formed) UTF-8 representation of
one Unicode code point, the length is implicit in the first octet.
Which means that the "fast path" is when IS_UTF8_CHAR_1(delim) is true.
#define IS_UTF8_CHAR_1(p) \
((p)[0] <= 0x7F)
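
To make point b concrete, here is a small sketch of reading the length off the first octet. utf8_len_from_lead is a hypothetical helper, not a core function, and it covers only standard UTF-8 up to U+10FFFF (at most 4 bytes), not the longer sequences Perl's extended encoding allows.

```c
#include <stddef.h>

/* Hypothetical helper: byte length of a well-formed standard UTF-8
 * character, judged from its first octet alone.
 * Returns 0 if the octet cannot start a character (a continuation
 * byte, an overlong lead 0xC0/0xC1, or a lead above 0xF4). */
static size_t utf8_len_from_lead(unsigned char lead)
{
    if (lead <= 0x7F)                 /* the one-byte fast path */
        return 1;
    if (lead >= 0xC2 && lead <= 0xDF)
        return 2;
    if (lead >= 0xE0 && lead <= 0xEF)
        return 3;
    if (lead >= 0xF0 && lead <= 0xF4)
        return 4;
    return 0;
}
```

Note that a NUL delimiter takes the one-byte path like any other ASCII character, which is exactly the case strlen() would get wrong.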
> For the char[ UTF8_MAXBYTES ], I _think_ that char[3] might actually make do
> just fine there [*], assuming that the len member stays and all the possible
> paired delimiters are all on plane 0 (The latter does hold true for anything
> that matches
I'm not an expert on this stuff, but I think that open and close are used at
all times, not just for paired delimiters:
/* if open delimiter is the close delimiter read unbridle */
if (PL_multi_open == PL_multi_close) {
So I think it needs to stay as your original suggestion of UTF8_MAXBYTES.
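
For illustration only, a sketch of what a two-element delimiter struct and the open == close test could look like once the delimiters are no longer single chars. Everything here is hypothetical: the names delim, delim_set and delim_eq are invented, and DELIM_MAXBYTES is a stand-in for UTF8_MAXBYTES whose value of 13 is an assumption made so the example compiles without the Perl headers.

```c
#include <stddef.h>
#include <string.h>

#define DELIM_MAXBYTES 13   /* assumption: stands in for UTF8_MAXBYTES */

/* Hypothetical two-element struct: explicit length plus the bytes. */
typedef struct delim {
    unsigned char len;                    /* bytes used, 1..DELIM_MAXBYTES */
    unsigned char bytes[DELIM_MAXBYTES];  /* encoded delimiter, no NUL terminator */
} delim;

/* Store a delimiter of len bytes. Returns 0 on success, -1 if len is
 * zero or does not fit. The length is explicit, so a NUL byte is fine. */
static int delim_set(delim *d, const void *src, size_t len)
{
    if (len == 0 || len > DELIM_MAXBYTES)
        return -1;
    memcpy(d->bytes, src, len);
    d->len = (unsigned char)len;
    return 0;
}

/* Replacement for the char comparison: same length and same bytes. */
static int delim_eq(const delim *a, const delim *b)
{
    return a->len == b->len && memcmp(a->bytes, b->bytes, a->len) == 0;
}
```

With len == 1 the comparison degenerates to a single byte compare, so the common ASCII case stays cheap.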
Nicholas Clark
> I see the attributes work on github, but not CopLABEL's yet.
I think it fell through the cracks.