[PSR-16] key specification issues

Rasmus Schultz

May 20, 2019, 9:05:16 AM
to PHP Framework Interoperability Group
I found some issues with the description of keys in section 1.2 of the PSR-16 spec.

First off:

"Implementing libraries MUST support keys consisting of the characters A-Z, a-z, 0-9, _, and . in any order in UTF-8 encoding and a length of up to 64 characters."

So the only characters that MUST be supported are ASCII characters - but then, in the same sentence, UTF-8 encoding is stipulated?

If the only characters supported are ASCII characters, stipulating UTF-8 encoding doesn't seem to make any sense.

Am I to understand that *if* the implementation supports more than the required ASCII characters, it must use UTF-8 encoding?

And if the implementation *does* support UTF-8, then presumably the stipulated minimum length is 64 Unicode code points, i.e. potentially more than the 64 bytes required to support 64 ASCII characters?

Secondly:

"Libraries are responsible for their own escaping of key strings as appropriate, but MUST be able to return the original unmodified key string"

How?

To my understanding, there's no API in the specification that returns keys.

So this clause seems unnecessary? How the implementation stores keys internally, or whether it is able to recover them, doesn't seem like it should be a concern as such?

(Possibly, this clause was relevant to PSR-6 and may have carried over unintentionally?)

Thanks,
  Rasmus

Larry Garfield

May 20, 2019, 9:44:31 AM
to PHP-FIG
All of this language was carried over from PSR-6, yes.

For the first part, there are encodings beyond ASCII and UTF-8, even though they are not often seen in the western world, and some of them are incompatible with UTF-8/ASCII, even on lower glyphs. For instance, UTF-16 and UTF-32 are incompatible with UTF-8 at the byte level, because they use 16-bit and 32-bit code units rather than UTF-8's 8-bit ones, so even plain ASCII characters come out as different bytes.
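
To make the incompatibility concrete, here is a minimal sketch in Python (chosen only as a neutral illustration; the key `user.42` is a made-up example, not taken from the spec). It encodes the same spec-legal key in three encodings and compares the resulting bytes:

```python
# A key that uses only characters PSR-16 requires support for (hypothetical example).
key = "user.42"

utf8_bytes = key.encode("utf-8")
ascii_bytes = key.encode("ascii")
utf16_bytes = key.encode("utf-16-le")  # little-endian, no byte-order mark

# For pure ASCII input, UTF-8 and ASCII produce identical bytes.
same_as_ascii = utf8_bytes == ascii_bytes

# UTF-16 uses two bytes per character here, so the byte strings differ.
same_as_utf16 = utf8_bytes == utf16_bytes

print(same_as_ascii, same_as_utf16, len(utf8_bytes), len(utf16_bytes))
```

A backend that compared or stored the UTF-16 bytes would therefore not match the same key supplied as UTF-8, even though every character in it is ASCII.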

So an implementation that uses UTF-16 natively to store/interpret/return the key string is Doing It Wrong(tm), per PSR-6/16. And yes, that means it may need more than 64 bytes if someone stores a UTF-8 Japanese glyph or poop emoji as their cache key. (I would reject a PR that does the latter, but it would technically be spec-compliant.)
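
As a rough illustration of the storage implication, the following Python sketch measures the UTF-8 byte size of three 64-character keys. The helper name `utf8_size` and the sample keys are assumptions made for this example only:

```python
def utf8_size(key: str) -> int:
    """Return the number of bytes the key occupies when encoded as UTF-8."""
    return len(key.encode("utf-8"))

# Three keys of exactly 64 characters (code points) each.
ascii_key = "a" * 64
japanese_key = "\u9375" * 64      # U+9375, a CJK character: 3 bytes in UTF-8
emoji_key = "\U0001F4A9" * 64     # U+1F4A9, pile of poo: 4 bytes in UTF-8

sizes = {
    "ascii": utf8_size(ascii_key),
    "japanese": utf8_size(japanese_key),
    "emoji": utf8_size(emoji_key),
}
print(sizes)
```

Under these assumptions, a backend that sizes its key column or buffer at 64 bytes would handle the ASCII key but truncate or reject the other two, so an implementation that accepts such keys needs to budget up to 4 bytes per character.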

For the second point, "return" is a bit misleading here, and is probably just a carry-over from PSR-6. In practice it means that if I store a cache key with a poop emoji, then I should be able to reliably look it up with a poop-emoji key. If the key is mangled in storage such that I cannot look it up with the same key as it was stored with, then the implementation is Doing It Wrong(tm).
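
One way to picture that requirement is the sketch below, again in Python purely for illustration. `SketchCache` and `lossy_escape` are hypothetical names, not part of PSR-16 or any real library, and percent-encoding is just one possible escaping scheme. The point is that the escaping must map distinct keys to distinct stored names, so that looking up with the original key always finds the entry:

```python
from urllib.parse import quote


class SketchCache:
    """Toy in-memory cache that escapes keys before using them as storage names."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def _escape(key: str) -> str:
        # Percent-encoding is reversible, so two different keys never collide.
        return quote(key, safe="")

    def set(self, key: str, value) -> None:
        self._store[self._escape(key)] = value

    def get(self, key: str, default=None):
        return self._store.get(self._escape(key), default)


def lossy_escape(key: str) -> str:
    """Counter-example: silently dropping non-ASCII characters loses information."""
    return key.encode("ascii", errors="ignore").decode("ascii")


cache = SketchCache()
cache.set("report.\U0001F4A9", "cached value")

found = cache.get("report.\U0001F4A9")    # same key in, same entry out
missing = cache.get("report.")            # a different key must not match
collides = lossy_escape("report.\U0001F4A9") == lossy_escape("report.")
```

With the reversible scheme, `found` holds the stored value and `missing` is `None`. With the lossy scheme the two keys collapse to the same stored name, which is the kind of mangling described above.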

--Larry Garfield