Re: quoting with bidi bracketing characters

G.W. Haywood via perl5-porters

Feb 3, 2022, 3:45:04 AM
to Ricardo Signes, Perl 5 Porters
Hi there,

On Wed, 2 Feb 2022, Ricardo Signes wrote:

> A mere seven and a half years ago, I said it would be nice to get
> more bracketing pairs available in non-ASCII source. Here's the
> message: http://markmail.org/message/sfq7ahm3axbivgze
>
> I still think we should do this.

Just because we *can* do something, it doesn't necessarily mean we *should*.

There are plenty of things to fix in Perl without gratuitously inventing more.

I think there's enough trouble with UTF-8 already without adding to our woes
with a whole new class of them. I see absolutely no need for YASOPQC (Yet
Another Set Of Paired Quote Characters). There are undoubtedly displays
which will render them in strange ways, making reading the code fraught,
and there are undoubtedly editors, pagers, and the like which will handle
them unhelpfully.

-1 to this proposal.

--

73,
Ged.

Darren Duncan

Feb 3, 2022, 4:45:03 AM
to Perl 5 Porters
I personally fail to see what the practical benefit is of natively supporting
more string delimiters. It seems more cute than practical. I feel that adding
support for this would seem to make things worse rather than better. We already
have enough string delimiters.

This also feels similar to the idea of natively supporting numeric literals in
arbitrary bases and not just the common 4 of {2,8,10,16}. Being able to write
literals in base-3 or base-12 or whatever is more cute than something to
complicate the language with.

Look at Raku, as far as I know they have the range of Unicode string delimiters,
and the range of numeric bases, but who actually uses them outside academic
publications of "look what we can do"?

-- Darren Duncan

Karl Williamson

Feb 3, 2022, 4:45:03 PM
to Darren Duncan, Perl 5 Porters
I don't buy these arguments. A reason to add some delimiters is to
find ones that don't occur inside the pattern, which have to be escaped,
which leads to a bit of extra cognitive load each time the pattern is
looked at by a human. Many more complex patterns will use all the
existing paired delimiters.
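
To illustrate the escaping cost described above, here is a minimal sketch using only the delimiters Perl already has. The variable names are illustrative and not taken from the thread.

```perl
use strict;
use warnings;

# Balanced nested brackets need no escaping inside a bracketed quote.
my $nested = q{outer {inner} text};

# An unbalanced closing bracket must be backslash-escaped.
my $escaped = q{a lone \} brace};

# Choosing a delimiter that does not occur in the content avoids the escape.
my $clean = q[a lone } brace];

print "$nested\n$escaped\n$clean\n";
```

The second and third strings are identical; only the source text differs, which is the readability point being made.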

Another reason is that some just look better than any existing. I think
the « » pair is better than any we have. If I had much text that was
written in a language that used these as quotation marks, I might have a
different opinion.

And that's part of the point. Different strokes for different folks,
literally. We should be making it easier for people to code in a
non-English language.

I am unaware of much issue that 'use utf8' causes. Again, if your
preferred language isn't English, you may very well find it much more
preferable to program using native language identifiers and strings,
using English only for the reserved words. AFAIK, that is not currently
a problem for Perl.

The amount of work to incorporate this is not very much, so using that
cost against the feature is not very valid.

Darren Duncan

Feb 3, 2022, 9:30:04 PM
to Perl 5 Porters
On 2022-02-03 4:53 p.m., Ricardo Signes wrote:
> On Thu, Feb 3, 2022, at 4:33 PM, Karl Williamson wrote:
>> I don't buy these arguments.  A reason to add some delimiters is to
>> find ones that don't occur inside the pattern, which have to be escaped,
>> which leads to a bit of extra cognitive load each time the pattern is
>> looked at by a human.  Many more complex patterns will use all the
>> existing paired delimiters.
>
> Karl has exactly nailed my reasons, which I admittedly failed to provide.
>
> "I wouldn't use it" isn't particularly interesting.  Neither is "it might not
> look right on some terminals".
>
> One question was "would this cover curly quotes", which I take to mean “…”.  The
> answer to that is no, as the proposal is to punt the selection of bracketing
> pairs to Unicode, who do not include those characters in the bidi bracket
> listings.  Perhaps not ideal, but it eliminates our curation of Unicode
> character selection, which is just the kind of pride that often cometh before a
> fall.

There is already curation in Perl however, in that some existing quoting
characters are interpolating and others are not. So how is it decided when
taking all the Unicode quoting characters which ones interpolate and which don't?

Or perhaps the plan, which I would argue for, is all of them are treated
consistently as being NOT interpolating, so they're straight literal with no
surprises.

Only a small curated list, or plain ASCII "", whatever interpolates now, is
interpolating, and all additions do not.
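
For reference, in current Perl it is the quote-like operator, not the bracket chosen, that decides whether a string interpolates. A minimal sketch with illustrative variable names:

```perl
use strict;
use warnings;

my $name = 'world';

# q never interpolates, whichever delimiter is used.
my $literal_braces = q{hello $name};
my $literal_angles = q<hello $name>;

# qq always interpolates, whichever delimiter is used.
my $interp_braces = qq{hello $name};
my $interp_angles = qq<hello $name>;

print "$literal_braces | $interp_braces\n";
```

Under that rule, a new bracket pair would presumably inherit the behaviour of the operator it is used with rather than needing its own interpolation rule.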

-- Darren Duncan

Ovid via perl5-porters

Feb 4, 2022, 11:30:09 AM
to Perl 5 Porters, Ricardo Signes
On Friday, 4 February 2022, 03:44:17 CET, Ricardo Signes <perl...@rjbs.manxome.org> wrote:


> We should stick to that rule.  qq«...» interpolates, qx«...» interpolates, q«...» does not.

I also love the idea of being able to use «» as delimiters for the reasons mentioned. 

How much of a burden will this place on people who can't type them? I can type «», but this:

    my $string = q⌜This string contains ⌜ and ⌟ which is fine.⌟;

I would have to cut-n-paste.

I don't think it will be too much of a burden, but I can't say that I appreciate all of the implications.

Best,
Ovid
-- 
IT consulting, training, specializing in Perl, databases, and agile development
http://www.allaroundtheworld.fr/

Buy my book! - http://bit.ly/beginning_perl

G.W. Haywood via perl5-porters

Feb 4, 2022, 1:15:03 PM
to perl5-porters
Hi there,

On Fri, 4 Feb 2022, Ovid via perl5-porters wrote:

> ... I can't say that I appreciate all of the implications.

Here's the sort of thing that worries me:

https://capec.mitre.org/data/definitions/80.html

--

73,
Ged.

Andrew Hewus Fresh

Feb 4, 2022, 2:00:04 PM
to Ricardo Signes, G.W. Haywood, Curtis Poe, Perl 5 Porters
On Fri, Feb 04, 2022 at 01:13:22PM -0500, Ricardo Signes wrote:
> On Fri, Feb 4, 2022, at 1:05 PM, G.W. Haywood wrote:
> > On Fri, 4 Feb 2022, Ovid via perl5-porters wrote:
> >
> > > ... I can't say that I appreciate all of the implications.
> >
> > Here's the sort of thing that worries me:
> >
> > https://capec.mitre.org/data/definitions/80.html
>
> I don't know what you're getting at, here.
>
> This proposal is not "Perl should allow programmers to write their source document in UTF-8 which is decoded by perl." We have had that for twenty years.
>
> This proposal is extremely tightly scoped, altering which pairs of characters can be used as balanced pairs in quote-like operators.
>
> Can you be more specific?


Perhaps it was unclear that this would be going from:

perl -Mutf8 -E 'say q«hai!«' # <- works now
perl -Mutf8 -E 'say q«hai!»' # <- works after

(plus nesting)

Karl Williamson

Feb 4, 2022, 2:45:04 PM
to Andrew Hewus Fresh, Ricardo Signes, G.W. Haywood, Curtis Poe, Perl 5 Porters
This is why it isn't planned to be on by default. You have to request it.

Oodler 577 via perl5-porters

Feb 4, 2022, 3:30:04 PM
to Ricardo Signes, Perl 5 Porters
* Ricardo Signes <perl...@rjbs.manxome.org> [2022-02-03 21:43:35 -0500]:

> On Thu, Feb 3, 2022, at 9:26 PM, Darren Duncan wrote:
> > Or perhaps the plan, which I would argue for, is all of them are treated
> > consistently as being NOT interpolating, so they're straight literal with no
> > surprises.
> >
> > Only a small curated list, or plain ASCII "", whatever interpolates now, is
> > interpolating, and all additions do not.
>
> To be clear, this proposal is that the brackets can be used in quote-like operators such as qq and q. Those operators have clearly defined meaning, although in a handful of cases it is "interpolative unless the delimiter is apostrophe".

This is the only reason I can think of that makes sense, but the argument here seems
to be one of parity with ASCII counterparts. That said, if it makes it easier to construct
a multi-line infix operator that looks like Homer Simpson, I can definitely get behind it.

Cheers,
Brett

>
> We should stick to that rule. qq«...» interpolates, qx«...» interpolates, q«...» does not.

Which reminds me of qw and how sometimes I want it to interpolate. Not to split the thread...

Cheers,
Brett

>
> --
> rjbs

--
--
ood...@cpan.org
oodl...@sdf-eu.org
SDF-EU Public Access UNIX System - http://sdfeu.org
irc.perl.org #openmp #pdl #native

shmem

Feb 4, 2022, 4:15:03 PM
to Tomasz Konojacki, Ovid via perl5-porters
From the keyboard of Tomasz Konojacki [04.02.22,19:35]:

> On Fri, 4 Feb 2022 16:17:07 +0000 (UTC)
> Ovid via perl5-porters <perl5-...@perl.org> wrote:
>> I also love the idea of being able to use «» as delimiters for the reasons mentioned.
>>
>> How much of a burden will this place on people who can't type them? I can type «», but this:
>>
>>     my $string = q⌜This string contains ⌜ and ⌟ which is fine.⌟;
>>
>> I would have to cut-n-paste.
>>
>> I don't think it will be too much of a burden, but I can't say that I appreciate all of the implications.
>
> Not many people know that, but \x00 (literal zero byte) is already a
> valid delimiter for q/qq/qw/qx. That's even harder to type :P

That one is easy - in vim at least: ^v^@ ->

0--gg-

--
_($_=" "x(1<<5)."?\n".q·/)Oo. G°\ /
/\_¯/(q /
---------------------------- \__(m.====·.(_("always off the crowd"))."·
");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}

G.W. Haywood via perl5-porters

Feb 4, 2022, 7:00:03 PM
to Andrew Hewus Fresh, Perl 5 Porters
Hi there,

On Fri, 4 Feb 2022, Andrew Hewus Fresh wrote:
>
> Perhaps it was unclear that this would be going from:
>
> perl -Mutf8 -E 'say q«hai!«' # <- works now
> perl -Mutf8 -E 'say q«hai!»' # <- works after

Did you forget

perl -Mutf8 -E 'say q«hai!«' # <- does *not* work after

and did I get my point across yet?

--

73,
Ged.

G.W. Haywood via perl5-porters

Feb 4, 2022, 11:30:05 PM
to Ovid, Perl 5 Porters, Ricardo Signes
Hi there,

On Fri, 4 Feb 2022, Ovid via perl5-porters wrote:

> ... I can't say that I appreciate all of the implications.

Here's the sort of thing that worries me:

https://capec.mitre.org/data/definitions/80.html

--

73,
Ged.

Karl Williamson

Feb 4, 2022, 11:30:06 PM
to Ricardo Signes, Perl 5 Porters
On 2/4/22 18:04, Ricardo Signes wrote:
> On Fri, Feb 4, 2022, at 6:54 PM, G.W. Haywood via perl5-porters wrote:
>> Did you forget
>>
>> perl -Mutf8 -E 'say q«hai!«' # <- does *not* work after
>>
>> and did I get my point across yet?
>
> No.
>
> Provide an explanation with a stated concern and reasoning, not a
> oneliner you don't like or a link to an article about UTF-8 decoding.
> "Can't you see the obvious thing that I hate?" is not helping.
>
> --
> rjbs


I believe I know the point he is trying to make, and if so, he did not
understand my reply addressing it.

Currently the characters we are proposing to make paired can be used as
unpaired delimiters. His example shows that use.

With the new scheme in effect, such uses would no longer parse the same.

But my point was that the new scheme has to be opt-in, for precisely
this reason. You would have to say something like

use feature 'extended-brackets'

whose effect is lexically scoped. Thus the proposal is not a problem.
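
As a rough illustration of what "lexically scoped" means for a feature, the sketch below uses the existing `fc` feature (Perl 5.16 or later) as a stand-in, since the feature name above is only a suggestion. The variable names are illustrative.

```perl
use strict;
use warnings;

# Outside any "use feature 'fc'" scope, fc(...) is just a call to an
# undefined subroutine, so the eval fails and we record 0.
my $outside = eval q{ fc("A") eq "a" } ? 1 : 0;

my $inside;
{
    # The feature is enabled only until the end of this block.
    use feature 'fc';
    $inside = eval q{ fc("A") eq "a" } ? 1 : 0;
}

# Back outside the block, the feature is off again.
my $after = eval q{ fc("A") eq "a" } ? 1 : 0;

print "outside=$outside inside=$inside after=$after\n";
```

Code that never asks for the feature is compiled as before, which is why an opt-in pragma would leave existing uses of these characters untouched.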

I myself am in favor of eventually getting rid of having this be opt-in.
That would require a deprecation cycle, in which usage of anything we
think might want eventually to be paired, based on probably the Unicode
Bidi-Mirroring Glyph property, would raise a warning, while still
otherwise functioning as today. My guess is that there are extremely
few such uses in the wild. Making a warning in 5.36 would flush them
out, even if we didn't get around to adding the new pragma.

Karl Williamson

Feb 4, 2022, 11:45:05 PM
to G.W. Haywood, Ovid, Perl 5 Porters, Ricardo Signes
I don't understand your point here. Yes overlongs are a problem. Perl
does not accept overlongs; it used to, as did Unicode, before the
problems of doing so were apparent.

We use Markus Kuhn's stress tests (fully credited to him) as part of our
test suite to verify that we don't now accept illegal input, and to
ensure that any future change that could accept such input will instead
be caught in our testing. I have also added extensive other tests to
verify that we handle various sequences properly.

As far as parsing, all input under 'use utf8' is checked for Perl
extended-UTF-8 validity, using the same functions as are used for
checking stream input.
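
For readers unfamiliar with overlongs, here is a small sketch of strict rejection using the core Encode module. This shows Encode's strict "UTF-8" decoder, not the internal functions the tokenizer uses, and `is_strict_utf8` is a hypothetical helper name.

```perl
use strict;
use warnings;
use Encode ();

# Returns 1 if the byte string decodes as strict UTF-8, else 0.
sub is_strict_utf8 {
    my ($bytes) = @_;    # a copy, because decode with a CHECK value may modify its input
    return eval { Encode::decode('UTF-8', $bytes, Encode::FB_CROAK); 1 } ? 1 : 0;
}

my $plain_slash    = "/";         # U+002F encoded correctly in one byte
my $overlong_slash = "\xC0\xAF";  # the same code point encoded illegally in two bytes
my $guillemet      = "\xC2\xAB";  # U+00AB encoded correctly in two bytes

printf "plain=%d overlong=%d guillemet=%d\n",
    is_strict_utf8($plain_slash),
    is_strict_utf8($overlong_slash),
    is_strict_utf8($guillemet);
```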

Martijn Lievaart

Feb 5, 2022, 4:00:04 AM
to perl5-...@perl.org
Or to put it another way, if this problem is real (which I am not saying
it is or it isn't, although Karl makes a good point), if this problem is
real, it is already there and this proposal does not change anything
regarding that. Or am I missing something?


M4


G.W. Haywood via perl5-porters

Feb 5, 2022, 6:45:04 AM
to Perl 5 Porters
Hi there,

On Fri, 4 Feb 2022, Karl Williamson wrote:
> On 2/4/22 18:04, Ricardo Signes wrote:
>>
>> Provide an explanation with a stated concern and reasoning...

See below.

> I believe I know the point he is trying to make, and if so, he did not
> understand my reply addressing it.
>
> Currently the characters we are proposing to make paired can be used as
> unpaired delimiters. His example shows that use.
>
> With the new scheme in effect, such uses would no longer parse the same.

> ... You would have to say something like
>
> use feature 'extended-brackets'
>
> whose effect is lexically scoped. Thus the proposal is not a problem.

Well we don't *think* it's a problem.

In all human endeavour which pushes some envelope or other, there will
inevitably be occasions when someone says "I never thought of that!".
Overlong characters was an example; this thread even gives us another
which is quite interesting. I've said that we have enough characters
available to us for quoting in Perl and my until now private suspicion
was that we probably have too many. It hadn't occurred to me to use a
null character as a delimiter. Many thanks to Mr. Konojacki for his
most appropriate illustration, and for confirming my suspicion.

Because by profession I'm an engineer, probably my favourite example
of the results of mankind's singular ability to construct a functional
retroscope is here:

https://en.wikipedia.org/wiki/Tacoma_Narrows_Bridge_(1940)#Film_of_collapse

As you see that was in the 1940s but now, AFAICT, nowhere is this sort
of thing exhibited better than in the field of software development.

Perl is widely used in systems which can both directly and indirectly
affect the public. The end effects can be for better or worse. Most
of the time they're for the better, and nobody gets to hear about it.
But if they're for the worst, they'll hit the headlines and everybody
will know what a bunch of jerks we all are.

Wantonly changing code because it will look prettier or even gives you
that warm fuzzy feeling you get when you see your name published on an
RFC is out of order for a tool which is used by millions and, if there
is an accident can potentially have direct effects on the privacy, the
finances, quite plausibly the health and safety of billions of people.
In my view there isn't enough effort available to expend it on putting
frills like this into the language plus making sure they didn't break
something, and so long as we're collectively incapable of proving that
anything which contains software is secure there likely never will be.

Of course I'm not saying that simply out of fear of getting it wrong
there should be no new development, of course I'm not. But I do not
want Perl to make the news in the way that a Java library just did.

Perl has been derided for its use of "funny characters" for decades;
if we make this "worse" - and on top of that get it wrong - we will
have given ammunition to Perl's detractors and to Perl's users YARTJS
(Yet Another Reason To Jump Ship, which I very nearly did around the
turn of the century when 5.6 hit the fan while I was working on Perl
contracts in London and Los Angeles). We do not *need* more quotes.
There are probably too many already, and, thanks to Mr. Konojacki, one
might well now be asking if we should be doing something about that.

There's no hate here and I'm sure Mr. Signes didn't mean it literally,
but this has probably gone far enough so I guess this will be my last
word on the subject. It probably won't stop me making the same sort
of comments on other suggestions which I see as frivolous for a tool
like Perl which is, even if nobody noticed it, so thoroughly embedded
into the infrastructure of our society.

--

73,
Ged.

Karl Williamson

Feb 9, 2022, 2:15:04 PM
to Ricardo Signes, Perl 5 Porters
On 2/2/22 20:42, Ricardo Signes wrote:
> Porters,
>
> A mere seven and a half years ago, I said it would be nice to get more
> bracketing pairs available in non-ASCII source.  Here's the message:
> http://markmail.org/message/sfq7ahm3axbivgze
>
> I still think we should do this.
>
> Unicode provides a thing called bracket pairs
> <https://www.unicode.org/notes/tn39/>.  These are pairs of characters
> that have an opening and a closing character and can enclose a run of
> other characters.  Each pair has an opening and closing character.  I
> think that when processing utf-8 source (under "use utf8") we could
> treat these pairs as paired delimiters for quote-like operators.  For
> example
>
> my @words = qw〔 bingo bango bongo 〕;
>
>
> To match the behavior of ASCII brackets, these would nest.
>
> my $string = q⌜This string contains ⌜ and ⌟ which is fine.⌟;
>

I have done some research and have a revised proposal.

Unicode has a property Bidi_Mirrored that matches characters of some
importance to the Bidirectional algorithm that need to be represented
differently in a Right-to-Left rendering. There is another property,
Bidi_Mirroring_Glyph that given an input character, returns its mirrored
mate. Not all characters which are to be mirrored have such a mate in
Unicode; if not, an application needs to go to whatever trouble it is
willing to take to represent the mirror. This property is used by such
an application to avoid that work when a mate does exist.

Brackets are often important to the algorithm, so the
Bidi_Paired_Bracket and its type properties have been created so that an
application conveniently knows which brackets are considered opening vs
closing, and what their mate is.

Punctuation is problematic for the Bidi algorithm, as it turns out
things vary from language to language. There is the Open_Punctuation
property for punctuation that doesn't vary, and its mate, the
Close_Punctuation property. Most punctuation that has a directional
component is in those two. But there are also Initial_Punctuation and
Final_Punctuation for those (fewer) characters whose handling depends
on knowing which language they appear in.

Symbols don't tend to be important to the algorithm, so Unicode hasn't
bothered to mark them as opening/closing, as it just doesn't matter to
them. The symbols with a directional component tend to have that
direction specified in their names, like RIGHTWARDS ARROW.

So what has all this to do with using characters as quote-like
delimiters? Not much at all. We are concerned with the shape, not the
meaning. What shape looks like it should be the opening quote, versus
the closing one? The properties we have proposed using are not designed
for this, but for handling the Bidi algorithm. They are convenient for
finding candidates, but shouldn't be the final drivers.
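
To see how the properties mentioned above classify a few of the candidate characters, the following sketch queries them through Perl's `\p{...}` regex properties. The `classify` helper and the sample list are illustrative only. This is not the code used to generate the list further down.

```perl
use strict;
use warnings;

# Property name => compiled test for that property.
my @checks = (
    [ 'Open_Punctuation'              => qr/\p{Open_Punctuation}/ ],
    [ 'Initial_Punctuation'           => qr/\p{Initial_Punctuation}/ ],
    [ 'Math_Symbol'                   => qr/\p{Math_Symbol}/ ],
    [ 'Bidi_Mirrored'                 => qr/\p{Bidi_Mirrored}/ ],
    [ 'Bidi_Paired_Bracket_Type=Open' => qr/\p{Bidi_Paired_Bracket_Type=Open}/ ],
);

# Returns the names of the properties above that match the given character.
sub classify {
    my ($char) = @_;
    utf8::upgrade($char);    # make sure code points below 256 get character semantics
    return map { $_->[0] } grep { $char =~ $_->[1] } @checks;
}

my @samples = (
    [ 'LEFT PARENTHESIS'                          => "\x{28}" ],
    [ 'LEFT-POINTING DOUBLE ANGLE QUOTATION MARK' => "\x{AB}" ],
    [ 'LEFT DOUBLE QUOTATION MARK'                => "\x{201C}" ],
    [ 'RIGHTWARDS ARROW'                          => "\x{2192}" ],
);

for my $sample (@samples) {
    my ($name, $char) = @$sample;
    printf "%-42s %s\n", $name, join(', ', classify($char));
}
```

The parenthesis is an opening bidi bracket, the guillemet and the curly quote are Initial_Punctuation without a bracket type, and the arrow is only a math symbol, which matches the observation that no single property identifies every plausible opening delimiter.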

The problem I see is that using some of the punctuation characters as
delimiters is culturally sensitive. A French speaker will naturally
think qr« » is appropriate, but Russian uses those differently. It
might be they would prefer qr» ». Any Russian speakers, please chime
in. These punctuation characters match the Initial/Final Punctuation
properties. There aren't many such, but some are ones that Western
European speakers consider important
'«' LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
'‘' LEFT SINGLE QUOTATION MARK
'“' LEFT DOUBLE QUOTATION MARK
and their mates.

I'm proposing we use those as Western Europeans would expect.

I wrote a bit of code to take into account all these findings, and came
up with this list.

LEFT PARENTHESIS => RIGHT PARENTHESIS ( )
LESS-THAN SIGN => GREATER-THAN SIGN < >
LEFT SQUARE BRACKET => RIGHT SQUARE BRACKET [ ]
LEFT CURLY BRACKET => RIGHT CURLY BRACKET { }
LEFT-POINTING DOUBLE ANGLE QUOTATION MARK => RIGHT-POINTING DOUBLE ANGLE
QUOTATION MARK « »
TIBETAN MARK GUG RTAGS GYON => TIBETAN MARK GUG RTAGS GYAS ༺ ༻
TIBETAN MARK ANG KHANG GYON => TIBETAN MARK ANG KHANG GYAS ༼ ༽
SINGLE LEFT-POINTING ANGLE QUOTATION MARK => SINGLE RIGHT-POINTING ANGLE
QUOTATION MARK ‹ ›
LEFT SQUARE BRACKET WITH QUILL => RIGHT SQUARE BRACKET WITH QUILL ⁅ ⁆
SUPERSCRIPT LEFT PARENTHESIS => SUPERSCRIPT RIGHT PARENTHESIS ⁽ ⁾
SUBSCRIPT LEFT PARENTHESIS => SUBSCRIPT RIGHT PARENTHESIS ₍ ₎
RIGHTWARDS ARROW => LEFTWARDS ARROW → ←
RIGHTWARDS ARROW WITH STROKE => LEFTWARDS ARROW WITH STROKE ↛ ↚
RIGHTWARDS WAVE ARROW => LEFTWARDS WAVE ARROW ↝ ↜
RIGHTWARDS TWO HEADED ARROW => LEFTWARDS TWO HEADED ARROW ↠ ↞
RIGHTWARDS ARROW WITH TAIL => LEFTWARDS ARROW WITH TAIL ↣ ↢
RIGHTWARDS ARROW FROM BAR => LEFTWARDS ARROW FROM BAR ↦ ↤
RIGHTWARDS ARROW WITH HOOK => LEFTWARDS ARROW WITH HOOK ↪ ↩
RIGHTWARDS ARROW WITH LOOP => LEFTWARDS ARROW WITH LOOP ↬ ↫
UPWARDS ARROW WITH TIP RIGHTWARDS => UPWARDS ARROW WITH TIP LEFTWARDS
↱ ↰
DOWNWARDS ARROW WITH TIP RIGHTWARDS => DOWNWARDS ARROW WITH TIP
LEFTWARDS ↳ ↲
UPWARDS HARPOON WITH BARB RIGHTWARDS => UPWARDS HARPOON WITH BARB
LEFTWARDS ↾ ↿
DOWNWARDS HARPOON WITH BARB RIGHTWARDS => DOWNWARDS HARPOON WITH BARB
LEFTWARDS ⇂ ⇃
RIGHTWARDS PAIRED ARROWS => LEFTWARDS PAIRED ARROWS ⇉ ⇇
RIGHTWARDS DOUBLE ARROW WITH STROKE => LEFTWARDS DOUBLE ARROW WITH
STROKE ⇏ ⇍
RIGHTWARDS DOUBLE ARROW => LEFTWARDS DOUBLE ARROW ⇒ ⇐
RIGHTWARDS TRIPLE ARROW => LEFTWARDS TRIPLE ARROW ⇛ ⇚
RIGHTWARDS SQUIGGLE ARROW => LEFTWARDS SQUIGGLE ARROW ⇝ ⇜
RIGHTWARDS DASHED ARROW => LEFTWARDS DASHED ARROW ⇢ ⇠
RIGHTWARDS ARROW TO BAR => LEFTWARDS ARROW TO BAR ⇥ ⇤
RIGHTWARDS WHITE ARROW => LEFTWARDS WHITE ARROW ⇨ ⇦
THREE RIGHTWARDS ARROWS => THREE LEFTWARDS ARROWS ⇶ ⬱
RIGHTWARDS OPEN-HEADED ARROW => LEFTWARDS OPEN-HEADED ARROW ⇾ ⇽
LEFT CEILING => RIGHT CEILING ⌈ ⌉
LEFT FLOOR => RIGHT FLOOR ⌊ ⌋
LEFT-POINTING ANGLE BRACKET => RIGHT-POINTING ANGLE BRACKET 〈 〉
APL FUNCTIONAL SYMBOL RIGHTWARDS VANE => APL FUNCTIONAL SYMBOL LEFTWARDS
VANE ⍆ ⍅
APL FUNCTIONAL SYMBOL QUAD RIGHTWARDS ARROW => APL FUNCTIONAL SYMBOL
QUAD LEFTWARDS ARROW ⍈ ⍇
MEDIUM LEFT PARENTHESIS ORNAMENT => MEDIUM RIGHT PARENTHESIS ORNAMENT
❨ ❩
MEDIUM FLATTENED LEFT PARENTHESIS ORNAMENT => MEDIUM FLATTENED RIGHT
PARENTHESIS ORNAMENT ❪ ❫
MEDIUM LEFT-POINTING ANGLE BRACKET ORNAMENT => MEDIUM RIGHT-POINTING
ANGLE BRACKET ORNAMENT ❬ ❭
HEAVY LEFT-POINTING ANGLE QUOTATION MARK ORNAMENT => HEAVY
RIGHT-POINTING ANGLE QUOTATION MARK ORNAMENT ❮ ❯
HEAVY LEFT-POINTING ANGLE BRACKET ORNAMENT => HEAVY RIGHT-POINTING ANGLE
BRACKET ORNAMENT ❰ ❱
LIGHT LEFT TORTOISE SHELL BRACKET ORNAMENT => LIGHT RIGHT TORTOISE SHELL
BRACKET ORNAMENT ❲ ❳
MEDIUM LEFT CURLY BRACKET ORNAMENT => MEDIUM RIGHT CURLY BRACKET
ORNAMENT ❴ ❵
LEFT S-SHAPED BAG DELIMITER => RIGHT S-SHAPED BAG DELIMITER ⟅ ⟆
MATHEMATICAL LEFT WHITE SQUARE BRACKET => MATHEMATICAL RIGHT WHITE
SQUARE BRACKET ⟦ ⟧
MATHEMATICAL LEFT ANGLE BRACKET => MATHEMATICAL RIGHT ANGLE BRACKET ⟨ ⟩
MATHEMATICAL LEFT DOUBLE ANGLE BRACKET => MATHEMATICAL RIGHT DOUBLE
ANGLE BRACKET ⟪ ⟫
MATHEMATICAL LEFT WHITE TORTOISE SHELL BRACKET => MATHEMATICAL RIGHT
WHITE TORTOISE SHELL BRACKET ⟬ ⟭
MATHEMATICAL LEFT FLATTENED PARENTHESIS => MATHEMATICAL RIGHT FLATTENED
PARENTHESIS ⟮ ⟯
LONG RIGHTWARDS ARROW => LONG LEFTWARDS ARROW ⟶ ⟵
LONG RIGHTWARDS DOUBLE ARROW => LONG LEFTWARDS DOUBLE ARROW ⟹ ⟸
LONG RIGHTWARDS ARROW FROM BAR => LONG LEFTWARDS ARROW FROM BAR ⟼ ⟻
LONG RIGHTWARDS DOUBLE ARROW FROM BAR => LONG LEFTWARDS DOUBLE ARROW
FROM BAR ⟾ ⟽
LONG RIGHTWARDS SQUIGGLE ARROW => LONG LEFTWARDS SQUIGGLE ARROW ⟿ ⬳
RIGHTWARDS TWO-HEADED ARROW FROM BAR => LEFTWARDS TWO-HEADED ARROW FROM
BAR ⤅ ⬶
RIGHTWARDS DOUBLE ARROW FROM BAR => LEFTWARDS DOUBLE ARROW FROM BAR ⤇ ⤆
RIGHTWARDS DOUBLE DASH ARROW => LEFTWARDS DOUBLE DASH ARROW ⤍ ⤌
RIGHTWARDS TRIPLE DASH ARROW => LEFTWARDS TRIPLE DASH ARROW ⤏ ⤎
RIGHTWARDS TWO-HEADED TRIPLE DASH ARROW => LEFTWARDS TWO-HEADED TRIPLE
DASH ARROW ⤐ ⬷
RIGHTWARDS ARROW WITH DOTTED STEM => LEFTWARDS ARROW WITH DOTTED STEM
⤑ ⬸
RIGHTWARDS TWO-HEADED ARROW WITH TAIL => LEFTWARDS TWO-HEADED ARROW WITH
TAIL ⤖ ⬻
RIGHTWARDS ARROW-TAIL => LEFTWARDS ARROW-TAIL ⤚ ⤙
RIGHTWARDS DOUBLE ARROW-TAIL => LEFTWARDS DOUBLE ARROW-TAIL ⤜ ⤛
RIGHTWARDS ARROW TO BLACK DIAMOND => LEFTWARDS ARROW TO BLACK DIAMOND
⤞ ⤝
RIGHTWARDS ARROW FROM BAR TO BLACK DIAMOND => LEFTWARDS ARROW FROM BAR
TO BLACK DIAMOND ⤠ ⤟
ARROW POINTING DOWNWARDS THEN CURVING RIGHTWARDS => ARROW POINTING
DOWNWARDS THEN CURVING LEFTWARDS ⤷ ⤶
RIGHTWARDS ARROW WITH PLUS BELOW => LEFTWARDS ARROW WITH PLUS BELOW ⥅ ⥆
RIGHTWARDS ARROW THROUGH X => LEFTWARDS ARROW THROUGH X ⥇ ⬾
RIGHTWARDS HARPOON WITH BARB UP TO BAR => LEFTWARDS HARPOON WITH BARB UP
TO BAR ⥓ ⥒
RIGHTWARDS HARPOON WITH BARB DOWN TO BAR => LEFTWARDS HARPOON WITH BARB
DOWN TO BAR ⥗ ⥖
RIGHTWARDS HARPOON WITH BARB UP FROM BAR => LEFTWARDS HARPOON WITH BARB
UP FROM BAR ⥛ ⥚
RIGHTWARDS HARPOON WITH BARB DOWN FROM BAR => LEFTWARDS HARPOON WITH
BARB DOWN FROM BAR ⥟ ⥞
RIGHTWARDS HARPOON WITH BARB UP ABOVE LONG DASH => LEFTWARDS HARPOON
WITH BARB UP ABOVE LONG DASH ⥬ ⥪
RIGHTWARDS HARPOON WITH BARB DOWN BELOW LONG DASH => LEFTWARDS HARPOON
WITH BARB DOWN BELOW LONG DASH ⥭ ⥫
EQUALS SIGN ABOVE RIGHTWARDS ARROW => EQUALS SIGN ABOVE LEFTWARDS ARROW
⥱ ⭀
TILDE OPERATOR ABOVE RIGHTWARDS ARROW => TILDE OPERATOR ABOVE LEFTWARDS
ARROW ⥲ ⭉
RIGHTWARDS ARROW ABOVE TILDE OPERATOR => LEFTWARDS ARROW ABOVE TILDE
OPERATOR ⥴ ⥳
RIGHTWARDS ARROW ABOVE ALMOST EQUAL TO => LEFTWARDS ARROW ABOVE ALMOST
EQUAL TO ⥵ ⭊
LEFT WHITE CURLY BRACKET => RIGHT WHITE CURLY BRACKET ⦃ ⦄
LEFT WHITE PARENTHESIS => RIGHT WHITE PARENTHESIS ⦅ ⦆
Z NOTATION LEFT IMAGE BRACKET => Z NOTATION RIGHT IMAGE BRACKET ⦇ ⦈
Z NOTATION LEFT BINDING BRACKET => Z NOTATION RIGHT BINDING BRACKET ⦉ ⦊
LEFT SQUARE BRACKET WITH UNDERBAR => RIGHT SQUARE BRACKET WITH UNDERBAR
⦋ ⦌
LEFT SQUARE BRACKET WITH TICK IN TOP CORNER => RIGHT SQUARE BRACKET WITH
TICK IN TOP CORNER ⦍ ⦐
LEFT SQUARE BRACKET WITH TICK IN BOTTOM CORNER => RIGHT SQUARE BRACKET
WITH TICK IN BOTTOM CORNER ⦏ ⦎
LEFT ANGLE BRACKET WITH DOT => RIGHT ANGLE BRACKET WITH DOT ⦑ ⦒
LEFT ARC LESS-THAN BRACKET => RIGHT ARC GREATER-THAN BRACKET ⦓ ⦔
DOUBLE LEFT ARC GREATER-THAN BRACKET => DOUBLE RIGHT ARC LESS-THAN
BRACKET ⦕ ⦖
LEFT BLACK TORTOISE SHELL BRACKET => RIGHT BLACK TORTOISE SHELL BRACKET
⦗ ⦘
LEFT WIGGLY FENCE => RIGHT WIGGLY FENCE ⧘ ⧙
LEFT DOUBLE WIGGLY FENCE => RIGHT DOUBLE WIGGLY FENCE ⧚ ⧛
LEFT-POINTING CURVED ANGLE BRACKET => RIGHT-POINTING CURVED ANGLE
BRACKET ⧼ ⧽
RIGHTWARDS QUADRUPLE ARROW => LEFTWARDS QUADRUPLE ARROW ⭆ ⭅
REVERSE TILDE OPERATOR ABOVE RIGHTWARDS ARROW => REVERSE TILDE OPERATOR
ABOVE LEFTWARDS ARROW ⭇ ⭁
RIGHTWARDS ARROW ABOVE REVERSE ALMOST EQUAL TO => LEFTWARDS ARROW ABOVE
REVERSE ALMOST EQUAL TO ⭈ ⭂
RIGHTWARDS ARROW ABOVE REVERSE TILDE OPERATOR => LEFTWARDS ARROW ABOVE
REVERSE TILDE OPERATOR ⭌ ⭋
RIGHTWARDS TRIANGLE-HEADED ARROW => LEFTWARDS TRIANGLE-HEADED ARROW ⭢ ⭠
RIGHTWARDS TRIANGLE-HEADED DASHED ARROW => LEFTWARDS TRIANGLE-HEADED
DASHED ARROW ⭬ ⭪
RIGHTWARDS TRIANGLE-HEADED ARROW TO BAR => LEFTWARDS TRIANGLE-HEADED
ARROW TO BAR ⭲ ⭰
RIGHTWARDS TRIANGLE-HEADED PAIRED ARROWS => LEFTWARDS TRIANGLE-HEADED
PAIRED ARROWS ⮆ ⮄
RIGHTWARDS BLACK CIRCLED WHITE ARROW => LEFTWARDS BLACK CIRCLED WHITE
ARROW ⮊ ⮈
RIGHTWARDS BLACK ARROW => LEFTWARDS BLACK ARROW ⮕ ⬅
THREE-D TOP-LIGHTED RIGHTWARDS EQUILATERAL ARROWHEAD => THREE-D
TOP-LIGHTED LEFTWARDS EQUILATERAL ARROWHEAD ⮚ ⮘
BLACK RIGHTWARDS EQUILATERAL ARROWHEAD => BLACK LEFTWARDS EQUILATERAL
ARROWHEAD ⮞ ⮜
DOWNWARDS TRIANGLE-HEADED ARROW WITH LONG TIP RIGHTWARDS => DOWNWARDS
TRIANGLE-HEADED ARROW WITH LONG TIP LEFTWARDS ⮡ ⮠
UPWARDS TRIANGLE-HEADED ARROW WITH LONG TIP RIGHTWARDS => UPWARDS
TRIANGLE-HEADED ARROW WITH LONG TIP LEFTWARDS ⮣ ⮢
BLACK CURVED DOWNWARDS AND RIGHTWARDS ARROW => BLACK CURVED DOWNWARDS
AND LEFTWARDS ARROW ⮩ ⮨
BLACK CURVED UPWARDS AND RIGHTWARDS ARROW => BLACK CURVED UPWARDS AND
LEFTWARDS ARROW ⮫ ⮪
RIGHTWARDS TWO-HEADED ARROW WITH TRIANGLE ARROWHEADS => LEFTWARDS
TWO-HEADED ARROW WITH TRIANGLE ARROWHEADS ⯮ ⯬
LEFT SUBSTITUTION BRACKET => RIGHT SUBSTITUTION BRACKET ⸂ ⸃
LEFT DOTTED SUBSTITUTION BRACKET => RIGHT DOTTED SUBSTITUTION BRACKET
⸄ ⸅
LEFT TRANSPOSITION BRACKET => RIGHT TRANSPOSITION BRACKET ⸉ ⸊
LEFT RAISED OMISSION BRACKET => RIGHT RAISED OMISSION BRACKET ⸌ ⸍
LEFT LOW PARAPHRASE BRACKET => RIGHT LOW PARAPHRASE BRACKET ⸜ ⸝
TOP LEFT HALF BRACKET => TOP RIGHT HALF BRACKET ⸢ ⸣
BOTTOM LEFT HALF BRACKET => BOTTOM RIGHT HALF BRACKET ⸤ ⸥
LEFT SIDEWAYS U BRACKET => RIGHT SIDEWAYS U BRACKET ⸦ ⸧
LEFT DOUBLE PARENTHESIS => RIGHT DOUBLE PARENTHESIS ⸨ ⸩
LEFT SQUARE BRACKET WITH STROKE => RIGHT SQUARE BRACKET WITH STROKE ⹕ ⹖
LEFT SQUARE BRACKET WITH DOUBLE STROKE => RIGHT SQUARE BRACKET WITH
DOUBLE STROKE ⹗ ⹘
TOP HALF LEFT PARENTHESIS => TOP HALF RIGHT PARENTHESIS ⹙ ⹚
BOTTOM HALF LEFT PARENTHESIS => BOTTOM HALF RIGHT PARENTHESIS ⹛ ⹜
LEFT ANGLE BRACKET => RIGHT ANGLE BRACKET 〈 〉
LEFT DOUBLE ANGLE BRACKET => RIGHT DOUBLE ANGLE BRACKET 《 》
LEFT CORNER BRACKET => RIGHT CORNER BRACKET 「 」
LEFT WHITE CORNER BRACKET => RIGHT WHITE CORNER BRACKET 『 』
LEFT BLACK LENTICULAR BRACKET => RIGHT BLACK LENTICULAR BRACKET 【 】
LEFT TORTOISE SHELL BRACKET => RIGHT TORTOISE SHELL BRACKET 〔 〕
LEFT WHITE LENTICULAR BRACKET => RIGHT WHITE LENTICULAR BRACKET 〖 〗
LEFT WHITE TORTOISE SHELL BRACKET => RIGHT WHITE TORTOISE SHELL BRACKET
〘 〙
LEFT WHITE SQUARE BRACKET => RIGHT WHITE SQUARE BRACKET 〚 〛
SMALL LEFT PARENTHESIS => SMALL RIGHT PARENTHESIS ﹙ ﹚
SMALL LEFT CURLY BRACKET => SMALL RIGHT CURLY BRACKET ﹛ ﹜
SMALL LEFT TORTOISE SHELL BRACKET => SMALL RIGHT TORTOISE SHELL BRACKET
﹝ ﹞
FULLWIDTH LEFT PARENTHESIS => FULLWIDTH RIGHT PARENTHESIS （ ）
FULLWIDTH LEFT SQUARE BRACKET => FULLWIDTH RIGHT SQUARE BRACKET ［ ］
FULLWIDTH LEFT CURLY BRACKET => FULLWIDTH RIGHT CURLY BRACKET ｛ ｝
FULLWIDTH LEFT WHITE PARENTHESIS => FULLWIDTH RIGHT WHITE PARENTHESIS
｟ ｠
HALFWIDTH LEFT CORNER BRACKET => HALFWIDTH RIGHT CORNER BRACKET ｢ ｣
HALFWIDTH RIGHTWARDS ARROW => HALFWIDTH LEFTWARDS ARROW ￫ ￩
RIGHTWARDS ROCKET => LEFTWARDS ROCKET 🙮 🙬
RIGHTWARDS ARROW WITH SMALL TRIANGLE ARROWHEAD => LEFTWARDS ARROW WITH
SMALL TRIANGLE ARROWHEAD 🠂 🠀
RIGHTWARDS ARROW WITH MEDIUM TRIANGLE ARROWHEAD => LEFTWARDS ARROW WITH
MEDIUM TRIANGLE ARROWHEAD 🠆 🠄
RIGHTWARDS ARROW WITH LARGE TRIANGLE ARROWHEAD => LEFTWARDS ARROW WITH
LARGE TRIANGLE ARROWHEAD 🠊 🠈
RIGHTWARDS ARROW WITH SMALL EQUILATERAL ARROWHEAD => LEFTWARDS ARROW
WITH SMALL EQUILATERAL ARROWHEAD 🠒 🠐
RIGHTWARDS ARROW WITH EQUILATERAL ARROWHEAD => LEFTWARDS ARROW WITH
EQUILATERAL ARROWHEAD 🠖 🠔
HEAVY RIGHTWARDS ARROW WITH EQUILATERAL ARROWHEAD => HEAVY LEFTWARDS
ARROW WITH EQUILATERAL ARROWHEAD 🠚 🠘
HEAVY RIGHTWARDS ARROW WITH LARGE EQUILATERAL ARROWHEAD => HEAVY
LEFTWARDS ARROW WITH LARGE EQUILATERAL ARROWHEAD 🠞 🠜
RIGHTWARDS TRIANGLE-HEADED ARROW WITH NARROW SHAFT => LEFTWARDS
TRIANGLE-HEADED ARROW WITH NARROW SHAFT 🠢 🠠
RIGHTWARDS TRIANGLE-HEADED ARROW WITH MEDIUM SHAFT => LEFTWARDS
TRIANGLE-HEADED ARROW WITH MEDIUM SHAFT 🠦 🠤
RIGHTWARDS TRIANGLE-HEADED ARROW WITH BOLD SHAFT => LEFTWARDS
TRIANGLE-HEADED ARROW WITH BOLD SHAFT 🠪 🠨
RIGHTWARDS TRIANGLE-HEADED ARROW WITH HEAVY SHAFT => LEFTWARDS
TRIANGLE-HEADED ARROW WITH HEAVY SHAFT 🠮 🠬
RIGHTWARDS TRIANGLE-HEADED ARROW WITH VERY HEAVY SHAFT => LEFTWARDS
TRIANGLE-HEADED ARROW WITH VERY HEAVY SHAFT 🠲 🠰
RIGHTWARDS FINGER-POST ARROW => LEFTWARDS FINGER-POST ARROW 🠶 🠴
RIGHTWARDS SQUARED ARROW => LEFTWARDS SQUARED ARROW 🠺 🠸
RIGHTWARDS COMPRESSED ARROW => LEFTWARDS COMPRESSED ARROW 🠾 🠼
RIGHTWARDS HEAVY COMPRESSED ARROW => LEFTWARDS HEAVY COMPRESSED ARROW
🡂 🡀
RIGHTWARDS HEAVY ARROW => LEFTWARDS HEAVY ARROW 🡆 🡄
RIGHTWARDS SANS-SERIF ARROW => LEFTWARDS SANS-SERIF ARROW 🡒 🡐
WIDE-HEADED RIGHTWARDS LIGHT BARB ARROW => WIDE-HEADED LEFTWARDS LIGHT
BARB ARROW 🡢 🡠
WIDE-HEADED RIGHTWARDS BARB ARROW => WIDE-HEADED LEFTWARDS BARB ARROW
🡪 🡨
WIDE-HEADED RIGHTWARDS MEDIUM BARB ARROW => WIDE-HEADED LEFTWARDS MEDIUM
BARB ARROW 🡲 🡰
WIDE-HEADED RIGHTWARDS HEAVY BARB ARROW => WIDE-HEADED LEFTWARDS HEAVY
BARB ARROW 🡺 🡸
WIDE-HEADED RIGHTWARDS VERY HEAVY BARB ARROW => WIDE-HEADED LEFTWARDS
VERY HEAVY BARB ARROW 🢂 🢀
RIGHTWARDS TRIANGLE ARROWHEAD => LEFTWARDS TRIANGLE ARROWHEAD 🢒 🢐
RIGHTWARDS WHITE ARROW WITHIN TRIANGLE ARROWHEAD => LEFTWARDS WHITE
ARROW WITHIN TRIANGLE ARROWHEAD 🢖 🢔
RIGHTWARDS ARROW WITH NOTCHED TAIL => LEFTWARDS ARROW WITH NOTCHED TAIL
🢚 🢘
RIGHTWARDS BOTTOM SHADED WHITE ARROW => LEFTWARDS BOTTOM-SHADED WHITE
ARROW 🢡 🢠
RIGHTWARDS TOP SHADED WHITE ARROW => LEFTWARDS TOP SHADED WHITE ARROW
🢣 🢢
RIGHTWARDS RIGHT-SHADED WHITE ARROW => LEFTWARDS RIGHT-SHADED WHITE
ARROW 🢥 🢦
RIGHTWARDS LEFT-SHADED WHITE ARROW => LEFTWARDS LEFT-SHADED WHITE ARROW
🢧 🢤
RIGHTWARDS BACK-TILTED SHADOWED WHITE ARROW => LEFTWARDS BACK-TILTED
SHADOWED WHITE ARROW 🢩 🢨
RIGHTWARDS FRONT-TILTED SHADOWED WHITE ARROW => LEFTWARDS FRONT-TILTED
SHADOWED WHITE ARROW 🢫 🢪
RIGHTWARDS HAND => LEFTWARDS HAND 🫱 🫲
RIGHTWARDS ARROW AND UPPER AND LOWER ONE EIGHTH BLOCK => LEFTWARDS ARROW
AND UPPER AND LOWER ONE EIGHTH BLOCK 🮶 🮵


I propose that using any of the lhs characters as an unpaired delimiter
will result in a deprecation message. Otherwise there is no change to
existing code.

And there would be a new pragma, 'use feature qw(expanded_brackets)'
within whose scope the use of any of the lhs characters would require a
closing delimiter as listed above.
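
To make the proposed semantics concrete, here is a minimal Perl sketch.
It is not how perl's tokenizer works; the table is a three-entry excerpt
of the list above, and the names %closing_for and closing_delimiter are
hypothetical, made up purely for illustration.

```perl
use strict;
use warnings;

# Hypothetical excerpt of the pair table: opening => closing.
my %closing_for = (
    "\N{U+3014}" => "\N{U+3015}",   # LEFT/RIGHT TORTOISE SHELL BRACKET
    "\N{U+301A}" => "\N{U+301B}",   # LEFT/RIGHT WHITE SQUARE BRACKET
    "\N{U+FF08}" => "\N{U+FF09}",   # FULLWIDTH LEFT/RIGHT PARENTHESIS
);

# Returns the closing delimiter expected for $open, plus a flag saying
# whether the proposal would issue a deprecation message.
sub closing_delimiter {
    my ($open, $feature_enabled) = @_;
    my $mate = $closing_for{$open};
    return ($open, 0) unless defined $mate;   # not in the list: no change
    return ($mate, 0) if $feature_enabled;    # paired inside feature scope
    return ($open, 1);                        # legacy unpaired use: deprecated
}

for my $case (["\N{U+3014}", 1], ["\N{U+3014}", 0], ["|", 0]) {
    my ($close, $deprecated) = closing_delimiter(@$case);
    printf "U+%04X deprecated=%d\n", ord($close), $deprecated;
}
# prints:
# U+3015 deprecated=0
# U+3014 deprecated=1
# U+007C deprecated=0
```

The point of the second case is that code outside the feature's scope
keeps working as before and only gains the deprecation message.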

The list has been curated by the removal of two that don't match Western
European expectations:
LEFT VERTICAL BAR WITH QUILL => RIGHT VERTICAL BAR WITH QUILL ⸠ ⸡
OGHAM FEATHER MARK => OGHAM REVERSED FEATHER MARK ᚛ ᚜

and doesn't include one which might match those expectations, but
Unicode disagrees:
'﴾' ORNATE LEFT PARENTHESIS
and its mate

h...@crypt.org

Feb 9, 2022, 2:45:03 PM
to Karl Williamson, Ricardo Signes, Perl 5 Porters, h...@crypt.org
Karl Williamson <pub...@khwilliamson.com> wrote:
:The problem I see is that using some of the punctuation characters as
:delimiters is culturally sensitive.

I'm mildly surprised not to see mention of the inverted question mark
and exclamation mark, which are used as pairs (I believe) in Spanish.

Hugo

Darren Duncan

Feb 14, 2022, 10:45:02 PM
to Perl 5 Porters
On 2022-02-14 7:45 a.m., Paul "LeoNerd" Evans wrote:
> On Mon, 14 Feb 2022 17:14:15 +0200 Veesh Goldman wrote:
>
>> American English speaker here, so not sure if that puts me in a
>> different camp than western Europeans.
>> I would have expected right pointing arrows to be the closing
>> bracket, not the opening (most brackets and such "point" outwards).
>
> I could equally imagine them pointing inwards. It's like a sign saying
> "HERE BE MY STRING":
>
> my $str = qq==> The contents inside here <==;

I second Paul. With arrow delimiters I would expect them to point towards what
they are delimiting. I would not consider parentheses/brackets/braces to be a
precedent for arrows, which are a completely different thing. But I also
wouldn't use arrows as delimiters; rather, I see them as infix operators or
something that goes BETWEEN a pair. -- Darren Duncan

Martijn Lievaart

Feb 15, 2022, 2:45:03 AM
to perl5-...@perl.org

On 2/14/22 16:45, Paul "LeoNerd" Evans wrote:
> On Mon, 14 Feb 2022 17:14:15 +0200
> Veesh Goldman <rabbi...@gmail.com> wrote:
>
>> American English speaker here, so not sure if that puts me in a
>> different camp than western Europeans.
>> I would have expected right pointing arrows to be the closing
>> bracket, not the opening (most brackets and such "point" outwards).
> I could equally imagine them pointing inwards. It's like a sign saying
> "HERE BE MY STRING":
>
> my $str = qq==> The contents inside here <==;
>

Well, maybe we should allow both? But maybe someone did $a=qw)a b);


HTH,

M4


Karl Williamson

Feb 16, 2022, 12:00:03 AM
to Martijn Lievaart, perl5-...@perl.org
That has been illegal forever

Karl Williamson

Feb 16, 2022, 12:00:04 AM
to Darren Duncan, Perl 5 Porters
Maybe we should deprecate using delimiters that have a left/right
mirror. It would seem to me to be pretty much always confusing.
Attached is a quite complete list of candidates for that deprecation.
unmirrored.log

Darren Duncan

Feb 16, 2022, 1:15:03 AM
to Perl 5 Porters
Personally I like that idea, even if it does then mean one can't use some things
one might expect to work, such as mirroring curly-quotes. On the other hand,
the most important use of those things, in the literal text itself, is
preserved. -- Darren Duncan

G.W. Haywood via perl5-porters

Feb 16, 2022, 4:00:04 AM
to Perl 5 Porters
Hi there,

On Tue, 15 Feb 2022, Karl Williamson wrote:

> Maybe we should deprecate using delimiters that have a left/right mirror.

+1

--

73,
Ged.

Karl Williamson

Feb 16, 2022, 9:45:04 AM
to Darren Duncan, Perl 5 Porters
Curly quotes would be accepted as paired delimiters when the feature is
in scope.

Martijn Lievaart

Feb 16, 2022, 1:30:04 PM
to perl5-...@perl.org

Op 16-02-2022 om 05:47 schreef Karl Williamson:
> On 2/15/22 00:27, Martijn Lievaart wrote:
>>
>> Well, maybe we should allow both? But maybe someone did $a=qw)a b);
>>
> That has been illegal forever
>

I meant @a=qw)a b), and that does work.

 perl -MData::Dump -E '@a=qw)a b);dd\@a'

["a", "b"]
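
The same behaviour can be checked with core Perl only (Data::Dump is not
a core module). This is a small sketch with made-up variable names: a
closing bracket used as the opening delimiter is treated as an ordinary
unpaired delimiter and ends at its next occurrence, while an opening
bracket is paired and tracks nesting.

```perl
use strict;
use warnings;

# ")" has no mate when used as the opener, so the next ")" ends the literal.
my @unpaired = qw)a b);
my $plain    = q)hello);

# "(" is a paired opener: it ends at the matching ")", and nesting is tracked.
my $nested = q(outer (inner) text);

print join(",", @unpaired), "\n";   # prints "a,b"
print "$plain\n";                   # prints "hello"
print "$nested\n";                  # prints "outer (inner) text"
```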


HTH,

M4

Karl Williamson

Feb 16, 2022, 11:00:03 PM
to Martijn Lievaart, perl5-...@perl.org
Sorry, your first example also worked; I just misread it.

Karl Williamson

Feb 19, 2022, 8:15:03 PM
to Ricardo Signes, Perl 5 Porters
On 2/18/22 05:57, Ricardo Signes wrote:
> On Wed, Feb 9, 2022, at 2:09 PM, Karl Williamson wrote:
>> I wrote a bit of code to take into account all these findings, and came
>> up with this list.
>
> [snip]

> All this looks quite plausible to me, but whenever we programmers get
> into "and then we curated a large-ish set of codepoints", I get a bit
> nervous.  I also wonder what you can tell us about the likely change in
> this behavior over time.
>
> Unicode may add new codepoints that would be selected by your algorithm,
> right?  But they are (I assume) extremely unlikely to reclassify
> existing codepoints to enter the set in a future version?

The only danger here would be if a new character completes a pair that
is now a singleton.

I think it unlikely that Unicode would add something to a singleton
bracket or punctuation. Those are long existing and stable. If they
added a whole new pair, there would be no problem as those code points
are currently illegal to use as delimiters.

The main concern is that there are currently about 100 characters that
imply directionality, that don't have a mirror, and that there isn't
another reason to exclude from being paired; about half of them are
arrows. Unicode could add a mirror to any of them in a future version.
I see we have several choices:

1) Leave things as they are, and accept any direction-implying unpaired
character as a single fore/aft string delimiter.

2) Deprecate the use as string delimiters of all the ones that have
fairly strong directionality implied, so that we would be ready for any
of them to become paired. It isn't like there is a shortage of possible
delimiters. It's probably not good practice to have a directional
delimiter fore and aft of a string.

3) Act on a case-by-case basis, since few would be added in any given
yearly cycle, choosing legacy behavior or starting a deprecation cycle
depending on the outcome of the analysis.

I redid the list of characters to include everything that mentions Left
vs Right, and everything that Unicode considers a mirror. Again, things
are sorted by category. A lot of them are listed under "excluded" for
the various reasons given, meaning I don't think we should treat these
as paired. I didn't exclude any that Unicode lists as part of a pair,
though I think we should exclude any alphabetics and symbol modifiers.
There are few of these.

The section "excluded: Direction but no mate" is where the mischief
could come.
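
For readers who want to experiment, here is a rough Perl sketch of the
name-based part of such a scan. It is not the code used to produce the
attached list, and it ignores Unicode's mirroring data entirely; the
function name left_right_pairs and the choice of range are assumptions
made for illustration. It pairs each character whose name contains LEFT
with the character whose name is identical except for RIGHT, and
reports the ones with no such mate.

```perl
use strict;
use warnings;
use charnames ();

# Scan an inclusive code point range and pair each name containing the
# word LEFT with the same name containing RIGHT, when both are in range.
sub left_right_pairs {
    my ($first, $last) = @_;

    # Map every character name in the range to its code point.
    my %cp_for;
    for my $cp ($first .. $last) {
        my $name = charnames::viacode($cp);
        $cp_for{$name} = $cp if defined $name;
    }

    my (%pairs, @singletons);
    for my $name (sort keys %cp_for) {
        next unless $name =~ /\bLEFT\b/;
        (my $mate = $name) =~ s/\bLEFT\b/RIGHT/;
        if (exists $cp_for{$mate}) {
            $pairs{ $cp_for{$name} } = $cp_for{$mate};
        }
        else {
            push @singletons, $cp_for{$name};   # direction but no mate
        }
    }
    return (\%pairs, \@singletons);
}

# The CJK bracket run that contains the tortoise shell brackets.
my ($pairs, $singletons) = left_right_pairs(0x3008, 0x301B);
printf "U+%04X => U+%04X\n", $_, $pairs->{$_}
    for sort { $a <=> $b } keys %$pairs;
```

A fuller scan would also need to handle RIGHTWARDS/LEFTWARDS names and
the characters Unicode itself lists as mirrors, which this sketch leaves
out.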

>
> As rendered on my screen, there were a few emoji.  I assume this is
> something deciding that I'd rather see the emoji variant, and not your
> doing.  These are all unambiguously single codepoint characters?

Yes, single characters; it's your screen's doing.

>
> --
> rjbs

Karl Williamson

Feb 19, 2022, 8:45:13 PM
to Ricardo Signes, Perl 5 Porters
On 2/19/22 18:07, Karl Williamson wrote:
>
Forgot to attach the file.
unmirrored.log