UTF-8 and Unicode FAQ, demos

Michael Lazzaro

Oct 31, 2002, 12:34:26 PM
to perl6-l...@perl.org

Here is an extensive FAQ for Unicode and UTF-8:

http://www.cl.cam.ac.uk/~mgk25/unicode.html

and here is a test file that will show you how many of the "most common
glyphs" (WGL4, via Microsoft) you are capable of displaying in your
current setup:

http://www.cl.cam.ac.uk/~mgk25/ucs/wgl4.txt

A reduced list of interesting characters follows. Note that I may not
be sending them all correctly, as not all of them are available on
OS X, and that not all interesting characters are part of the WGL4
set.

00AB # « LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
00BB # » RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK

00AC # ¬ NOT SIGN
2202 # ∂ PARTIAL DIFFERENTIAL
2206 # ∆ INCREMENT
220F # ∏ N-ARY PRODUCT
2211 # ∑ N-ARY SUMMATION
2219 # ∙ BULLET OPERATOR
221A # √ SQUARE ROOT
221E # ∞ INFINITY
221F # ∟ RIGHT ANGLE
2229 # ∩ INTERSECTION
222B # ∫ INTEGRAL
2248 # ≈ ALMOST EQUAL TO
2260 # ≠ NOT EQUAL TO
2261 # ≡ IDENTICAL TO
2264 # ≤ LESS-THAN OR EQUAL TO
2265 # ≥ GREATER-THAN OR EQUAL TO
00D7 # × MULTIPLICATION SIGN
00F7 # ÷ DIVISION SIGN
00B0 # ° DEGREE SIGN
00B1 # ± PLUS-MINUS SIGN
00B5 # µ MICRO SIGN
00B6 # ¶ PILCROW SIGN
2020 # † DAGGER
2021 # ‡ DOUBLE DAGGER
2022 # • BULLET
2026 # … HORIZONTAL ELLIPSIS
2030 # ‰ PER MILLE SIGN

00A1 # ¡ INVERTED EXCLAMATION MARK
00A2 # ¢ CENT SIGN
00A3 # £ POUND SIGN
00A4 # ¤ CURRENCY SIGN
00A5 # ¥ YEN SIGN
00A6 # ¦ BROKEN BAR
00A7 # § SECTION SIGN
00A8 # ¨ DIAERESIS
00A9 # © COPYRIGHT SIGN
00AA # ª FEMININE ORDINAL INDICATOR
00AD # ­ SOFT HYPHEN
00AE # ® REGISTERED SIGN
00AF # ¯ MACRON
00B2 # ² SUPERSCRIPT TWO
00B3 # ³ SUPERSCRIPT THREE
00B7 # · MIDDLE DOT
00B8 # ¸ CEDILLA
00BA # º MASCULINE ORDINAL INDICATOR
00BF # ¿ INVERTED QUESTION MARK
203C # ‼ DOUBLE EXCLAMATION MARK
02C7 # ˇ CARON
02D8 # ˘ BREVE
02D9 # ˙ DOT ABOVE
02DA # ˚ RING ABOVE
2122 # ™ TRADE MARK SIGN
2126 # Ω OHM SIGN

2190 # ← LEFTWARDS ARROW
2191 # ↑ UPWARDS ARROW
2192 # → RIGHTWARDS ARROW
2193 # ↓ DOWNWARDS ARROW
2194 # ↔ LEFT RIGHT ARROW
2195 # ↕ UP DOWN ARROW

0391 # Α GREEK CAPITAL LETTER ALPHA
0392 # Β GREEK CAPITAL LETTER BETA
0393 # Γ GREEK CAPITAL LETTER GAMMA
0394 # Δ GREEK CAPITAL LETTER DELTA
0395 # Ε GREEK CAPITAL LETTER EPSILON
0396 # Ζ GREEK CAPITAL LETTER ZETA
0397 # Η GREEK CAPITAL LETTER ETA
0398 # Θ GREEK CAPITAL LETTER THETA
0399 # Ι GREEK CAPITAL LETTER IOTA
039A # Κ GREEK CAPITAL LETTER KAPPA
039B # Λ GREEK CAPITAL LETTER LAMDA
039C # Μ GREEK CAPITAL LETTER MU
039D # Ν GREEK CAPITAL LETTER NU
039E # Ξ GREEK CAPITAL LETTER XI
039F # Ο GREEK CAPITAL LETTER OMICRON
03A0 # Π GREEK CAPITAL LETTER PI
03A1 # Ρ GREEK CAPITAL LETTER RHO
03A3 # Σ GREEK CAPITAL LETTER SIGMA
03A4 # Τ GREEK CAPITAL LETTER TAU
03A5 # Υ GREEK CAPITAL LETTER UPSILON
03A6 # Φ GREEK CAPITAL LETTER PHI
03A7 # Χ GREEK CAPITAL LETTER CHI
03A8 # Ψ GREEK CAPITAL LETTER PSI
03A9 # Ω GREEK CAPITAL LETTER OMEGA

03B1 # α GREEK SMALL LETTER ALPHA
03B2 # β GREEK SMALL LETTER BETA
03B3 # γ GREEK SMALL LETTER GAMMA
03B4 # δ GREEK SMALL LETTER DELTA
03B5 # ε GREEK SMALL LETTER EPSILON
03B6 # ζ GREEK SMALL LETTER ZETA
03B7 # η GREEK SMALL LETTER ETA
03B8 # θ GREEK SMALL LETTER THETA
03B9 # ι GREEK SMALL LETTER IOTA
03BA # κ GREEK SMALL LETTER KAPPA
03BB # λ GREEK SMALL LETTER LAMDA
03BC # μ GREEK SMALL LETTER MU
03BD # ν GREEK SMALL LETTER NU
03BE # ξ GREEK SMALL LETTER XI
03BF # ο GREEK SMALL LETTER OMICRON
03C0 # π GREEK SMALL LETTER PI
03C1 # ρ GREEK SMALL LETTER RHO
03C2 # ς GREEK SMALL LETTER FINAL SIGMA
03C3 # σ GREEK SMALL LETTER SIGMA
03C4 # τ GREEK SMALL LETTER TAU
03C5 # υ GREEK SMALL LETTER UPSILON
03C6 # φ GREEK SMALL LETTER PHI
03C7 # χ GREEK SMALL LETTER CHI
03C8 # ψ GREEK SMALL LETTER PSI
03C9 # ω GREEK SMALL LETTER OMEGA


MikeL
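
For anyone whose mailer mangled the glyphs in the list above, the rows can be rebuilt locally from the code points alone. What follows is only a minimal Python sketch, assuming the standard `unicodedata` module; the helper name `glyph_line` is made up for illustration and is not from the FAQ or the WGL4 test file.

```python
import unicodedata


def glyph_line(codepoint: int) -> str:
    """Format one row: hex code point, the character itself, its Unicode name."""
    ch = chr(codepoint)
    return f"{codepoint:04X} # {ch} {unicodedata.name(ch)}"


# A few rows from the list; whether the glyphs display depends on your font.
for cp in (0x00AB, 0x00BB, 0x2211, 0x03C0):
    print(glyph_line(cp))
```

If a row prints as a box or a question mark, the terminal or font is the limiting factor, not the data.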

Michael Lazzaro

Oct 31, 2002, 1:11:00 PM
to perl6-l...@perl.org

And if you really want to drool at all the neat glyphs that the
wonderful, magical world of math has given us, check out:

http://www.unicode.org/charts/PDF/U2A00.pdf

.... now *there's* some brackets!

MikeL

Luke Palmer

Oct 31, 2002, 2:12:44 PM
to mlaz...@cognitivity.com, perl6-l...@perl.org
> Date: Thu, 31 Oct 2002 10:11:00 -0800
> From: Michael Lazzaro <mlaz...@cognitivity.com>

Ooh! Let's use 2AF7 and 2AF8 for qw!

> MikeL

Austin Hastings

Oct 31, 2002, 4:31:08 PM
to Luke Palmer, mlaz...@cognitivity.com, perl6-l...@perl.org

--- Luke Palmer <fibo...@babylonia.flatirons.org> wrote:
> > And if you really want to drool at all the neat glyphs that the
> > wonderful, magical world of math has given us, check out:
> >
> > http://www.unicode.org/charts/PDF/U2A00.pdf
> >
> > .... now *there's* some brackets!
>
> Ooh! Let's use 2AF7 and 2AF8 for qw!

Frankly, I don't know HOW we've lived for so long without "larger than"
and "smaller than" operators.

=Austin



John Williams

Nov 1, 2002, 12:05:27 PM
to Luke Palmer, perl6-l...@perl.org
On Thu, 31 Oct 2002, Luke Palmer wrote:

> > .... now *there's* some brackets!
>
> Ooh! Let's use 2AF7 and 2AF8 for qw!

Actually, I wanted to suggest »German quotes« instead of French for qw.

:)

~ John Williams


Larry Wall

Nov 1, 2002, 2:00:39 PM
to John Williams, Luke Palmer, perl6-l...@perl.org

Well, the other guys are suggesting bow tie operators, so maybe we should keep
«foo bar baz» with French quotes, and go with @a »*« @b for vector multiply.

That suggests to me that the circumlocution could be >>*<<. That's
presuming we also force the bitshifts to be qualified as +>> ~>>.

Actually, we could use <<foo bar baz>> for qw too if here-docs always
have to be <<' or <<". Or we could change here docs to vv"TAG"
or some such that points downwards to where the text actually is.

Larry

Matthew Zimmerman

Nov 1, 2002, 2:24:36 PM
to perl6-l...@perl.org
Larry has been consistently using

0xAB op 0xBB

in his messages to represent a (French quote) hyperop,
(corresponding to the Unicode characters 0x00AB and 0x00BB)
which is consistent with the iso-8859-1 encoding (despite
the fact that my mailserver or his mailer insists on
labelling those messages as UTF-8).

However, the UTF-8 encoding of those Unicode characters
actually is:

0xC2AB op 0xC2BB

As far as I understand it, the UTF-8 encoding only allows
single byte representations of characters if they fall in the
0x00 to 0x7F range.

So the question is, if I'm writing a program and I actually
want to use one of these ops, do I put

0xAB op 0xBB

or

0xC2AB op 0xC2BB

?

-- Matt,
who'd never thought he'd have to do hex dumps to debug
his Perl programs ;)

--
Matthew Zimmerman
Interdisciplinary Biophysics, University of Virginia
http://www.people.virginia.edu/~mdz4c/
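
The byte values in the message above can be checked directly. This is a minimal Python sketch using only the built-in `str.encode`; the variable names are illustrative. It shows the two byte sequences and says nothing about which one a Perl 6 source file should contain.

```python
left, right = "\u00ab", "\u00bb"  # the characters « and »

# ISO-8859-1 (Latin-1) stores each character as a single byte equal to its code point.
latin1 = (left + right).encode("latin-1")

# UTF-8 uses a single byte only for U+0000..U+007F, so each of these takes two bytes.
utf8 = (left + right).encode("utf-8")

print(latin1.hex())  # abbb
print(utf8.hex())    # c2abc2bb
```

So "0xC2AB" in the message is best read as the two-byte sequence C2 AB, not as a single 16-bit value.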

Simon Cozens

Nov 1, 2002, 7:06:07 PM
to perl6-l...@perl.org
ma...@macko.med.virginia.edu (Matthew Zimmerman) writes:
> Larry has been consistently using
>
> 0xAB op 0xBB
>
> in his messages to represent a (French quote) hyperop,
> (corresponding to the Unicode characters 0x00AB and 0x00BB)

More and more conversations like this, (and how many have we seen here
already?) about character sets, encodings, mail quoting issues, in
fact, anything other than Perl, will be rife on every Perl-related
mailing list if we persist with this idiotic idea of having Unicode
operators.

--
"Irrigation of the land with seawater desalinated by fusion power is ancient.
It's called 'rain'."
-- Michael McClary, in alt.fusion

Markus Laire

Nov 2, 2002, 7:44:39 AM
to Simon Cozens, perl6-l...@perl.org
On 2 Nov 2002 at 0:06, Simon Cozens wrote:

> ma...@macko.med.virginia.edu (Matthew Zimmerman) writes:
> > Larry has been consistently using
> >
> > 0xAB op 0xBB
> >
> > in his messages to represent a (French quote) hyperop,
> > (corresponding to the Unicode characters 0x00AB and 0x00BB)
>
> More and more conversations like this, (and how many have we seen here
> already?) about character sets, encodings, mail quoting issues, in
> fact, anything other than Perl, will be rife on every Perl-related
> mailing list if we persist with this idiotic idea of having Unicode
> operators.

It may seem idiotic to egocentric people who only need chars a-z
in their language. But for all others (think about Chinese), Unicode is
a real asset.

--
Markus Laire 'malaire' <markus...@nic.fi>


Luke Palmer

Nov 2, 2002, 8:07:34 AM
to markus...@nic.fi, si...@simon-cozens.org, perl6-l...@perl.org
> From: "Markus Laire" <markus...@nic.fi>
> Date: Sat, 02 Nov 2002 14:44:39 +0200

I don't think anyone's arguing that unicode shouldn't be in the
language. I am all for allowing people to define their own unicode
operators and such. I just don't think it should be in the core.

I do most of my work over an ssh connection to my favorite server,
through gnome-terminal. gnome-terminal does not support unicode, so
this whole thread has been filled with ?'s and \251's. I can't see a
thing...

And I'm a mostly typical geek. I _finally_ got unicode working in
Emacs, though it was not easy. I still haven't any idea how to type
these things, just look at them. Think about how much trouble a
less-geeky-than-I person would have.

We _want_ the world to be unicode compatible, for sure. But having a
useful operator in unicode isn't quite the answer. Rather than fixing
their boxes to work with unicode, like we on this list would, they
simply wouldn't use the operator. I don't quite think this is the
desired effect.

I'm fine with having tolerable synonyms. Vector plus shouldn't be
"`<<[+]>>" but I'm okay with it being "^[+]" or some such. The only
thing to think about there is what will happen when someone writes in
unicode, then someone comes along in maintenance without a
unicode-compatible editor. It will surely be perplexing to see vector
plus written ?+?. Of course, this is equivalent to the problem of
unicode variable names, so the point is moot.

Luke

Paul Johnson

Nov 2, 2002, 8:23:16 AM
to Markus Laire, Simon Cozens, perl6-l...@perl.org

I don't think Simon is disputing the value of Unicode. I suspect that
he probably has more cause to use and definitely understands it better
than the vast majority of Perl programmers, for whom seven bits is
ample.

I live in Switzerland and regularly deal with three languages which have
various diacritics and special characters. Personally, I would be very
happy with Unicode operators, but I fear that Simon's prediction would
be accurate and I would much rather spend my time evangelising the
virtues of Perl 6 as a language than trying to fathom or explain the
incantations required to program on various platforms with a backdrop of
unfamiliar, buggy or non-existent Unicode support.

--
Paul Johnson - pa...@pjcj.net
http://www.pjcj.net

Bart Schuller

Nov 2, 2002, 9:00:20 AM
to perl6-l...@perl.org
On Sat, Nov 02, 2002 at 06:07:34AM -0700, Luke Palmer wrote:
> I do most of my work over an ssh connection to my favorite server,
> through gnome-terminal. gnome-terminal does not support unicode, so
> this whole thread has been filled with ?'s and \251's. I can't see a
> thing...

gnome-terminal does support unicode.

For the gnome1 version:
- select a font in iso10646-1 encoding
- set at least LC_CTYPE to something like en_US.UTF-8. At least in
Debian GNU/Linux you might also have to "dpkg-reconfigure locales" to
actually enable that locale to be generated
- "echo -n ^[%G" inside the terminal, where ^[ is a literal escape
character (type it as Control-V Control-[)

For the gnome2 version:
- set at least LC_CTYPE to something like en_US.UTF-8.
- start a new gnome terminal. If you already have one running with a
different locale setting, you might have to run it as
"gnome-terminal --disable-factory"

This is enough to run mutt and (with the right font, like misc-fixed)
read almost any correctly tagged Asian spam!

--
Bart.

David Wheeler

Nov 2, 2002, 11:22:57 AM
to Simon Cozens, perl6-l...@perl.org
On Friday, November 1, 2002, at 04:06 PM, Simon Cozens wrote:

> More and more conversations like this, (and how many have we seen here
> already?) about character sets, encodings, mail quoting issues, in
> fact, anything other than Perl, will be rife on every Perl-related
> mailing list if we persist with this idiotic idea of having Unicode
> operators.

You keep saying or suggesting that the idea of using Unicode operators
is "idiotic." Perhaps you could make an argument in support of that
assertion (as Luke and Paul have done). I for one would be interested
to hear your reasoning.

Regards,

David

--
David Wheeler AIM: dwTheory
da...@wheeler.net ICQ: 15726394
http://david.wheeler.net/ Yahoo!: dew7e
Jabber: The...@jabber.org

Larry Wall

Nov 2, 2002, 2:11:10 PM
to Simon Cozens, perl6-l...@perl.org
On Sat, Nov 02, 2002 at 12:06:07AM +0000, Simon Cozens wrote:
> More and more conversations like this, (and how many have we seen here
> already?) about character sets, encodings, mail quoting issues, in
> fact, anything other than Perl, will be rife on every Perl-related
> mailing list if we persist with this idiotic idea of having Unicode
> operators.

There will certainly be some pain in breaking out of ASCII. It might
well be idiotic now, but I don't think it will be idiotic in ten years.
And I am quite willing to deal with a certain amount of short-term crap
on behalf of the future.

Larry

Simon Cozens

Nov 2, 2002, 10:58:15 AM
to perl6-l...@perl.org
pa...@pjcj.net (Paul Johnson) writes:
> > > More and more conversations like this, (and how many have we seen here
> > > already?) about character sets, encodings, mail quoting issues, in
> > > fact, anything other than Perl, will be rife on every Perl-related
> > > mailing list if we persist with this idiotic idea of having Unicode
> > > operators.
>
> I live in Switzerland and regularly deal with three languages which have
> various diacritics and special characters. Personally, I would be very
> happy with Unicode operators, but I fear that Simon's prediction would
> be accurate and I would much rather spend my time evangelising the
> virtues of Perl 6 as a language than trying to fathom or explain the
> incantations required to program on various platforms with a backdrop of
> unfamiliar, buggy or non-existent Unicode support.

On the other hand, maybe I'm being as shortsighted as Thomas J Watson
[1] and that once the various operating systems do get their Unicode
support together and we see the introduction of the 50,000 key keyboard,
then Perl 6's Unicode operators will be a real boon. After all, it worked
for APL and for the MIT space-cadet keyboards, so...

I dunno. I just think that right now, it's a crazy idea. And if we have
user-definable operators *anyway*, it's a doubly crazy idea.

Just make everything be user-definable multimethods. In fact, that's
another reason for not using . for the bit ops: I'd like to be able to
see
$a .+ $b

as being equivalent to
$a.+($b)

That is, calling the + method on $a. This way you also get to choose which
of the multimethods gets applied for free...

[1] "I think there's a world market for about five computers."
--
I hooked up my accelerator pedal in my car to my brake lights. I hit the gas,
people behind me stop, and I'm gone. -- Steven Wright

Matthew Zimmerman

Nov 2, 2002, 10:27:22 AM
to perl6-l...@perl.org
On 2002.11.01 19:06 Simon Cozens wrote:
> More and more conversations like this, (and how many have we seen here
> already?) about character sets, encodings, mail quoting issues, in
> fact, anything other than Perl, will be rife on every Perl-related
> mailing list if we persist with this idiotic idea of having Unicode
> operators.

I don't really want Unicode operators either, but if it is decided that
there will be such operators, I would still _want_to_know_how_to_use_them_.

So let me make my original question a little more general: are Perl 6 source
files encoded in Latin-1, UTF-8, or will Perl 6 provide some sort of
translation mechanism, like specifying the charset on the command line?

--
Matt

Simon Cozens

Nov 2, 2002, 11:33:34 AM
to perl6-l...@perl.org
da...@wheeler.net (David Wheeler) writes:
> You keep saying

I didn't think I was doing it habitually.

> or suggesting that the idea of using Unicode operators
> is "idiotic." Perhaps you could make an argument in support of that
> assertion (as Luke and Paul have done).

Sure:

> > More and more conversations like this, (and how many have we seen here
> > already?) about character sets, encodings, mail quoting issues, in
> > fact, anything other than Perl, will be rife on every Perl-related
> > mailing list

--
Triage your efforts, y'know?
- Thorfinn

Simon Cozens

Nov 2, 2002, 10:59:35 AM
to perl6-l...@perl.org
markus...@nic.fi (Markus Laire) writes:
> It may seem idiotic to egocentric people who only need chars a-z
> in their language. But for all others (think about Chinese), Unicode is
> a real asset.

I don't often think about Chinese. Chinese is hard. But I think about
Japanese a lot of the time, and without Unicode data processing in Japanese
would be (was, in fact) a complete nightmare.

But I was talking about the specific case of Perl operators, not
Unicode in general.

--
I'm surrounded by electromagnetic radiation all the time. There are radio
stations broadcasting at lots of kW, other people using phones, the police,
[...] the X-rays coming from my monitor, and God help us, the sun. I figure
I have better things to worry about than getting cancer from the three or
four minutes a day I spend on my cell phone. - Dave Brown.

Damian Conway

Nov 2, 2002, 7:59:38 PM
to perl6-l...@perl.org
Simon Cozens wrote:

> On the other hand, maybe I'm being as shortsighted as Thomas J Watson
> [1] and that once the various operating systems do get their Unicode
> support together and we see the introduction of the 50,000 key keyboard,

Of course, scary 50K keyboards aren't really necessary. All we really need is
a keyboard with configurable keys. That is, each key has an LED, or OLED,
or digital plastic surface, and an index key that allows you to select the
Unicode block to be currently mapped onto the keyboard. I imagine there would
also be some user-programmable hot-keys to shortcut to the blocks one
uses regularly, and almost certainly the ability to create "virtual blocks",
so one could easily set up an "ASCII plus angle quotes" block specifically
for Perl 6 programming.

Or, for those on MacOS or X11, you could create your own keyboard mapping
(e.g. http://wordherd.com/keyboards/ for MacOS), with angle quotes mapped
onto, say, option-< and option->.

Or just use the existing keyboard mappings for those two operators
(e.g. option-\ and option-| respectively on MacOS).

Or set your editor config file to insert the characters for you.
For example, in .exrc:

map <<<< «
map >>>> »

Remember, we're only talking about C<«> and C<»> as standard Perl 6 operators.
Or, at worst, a few other Latin-1 codepoints as well. It's not as though we're
defining dozens of Cyrillic, Bengali, or Hiragana symbols (though I must confess
that 309D -- HIRAGANA ITERATION MARK -- sounds like a great candidate for an
operator version of C<loop> ;-).


> I dunno. I just think that right now, it's a crazy idea. And if we have
> user-definable operators *anyway*, it's a doubly crazy idea.
>
> Just make everything be user-definable multimethods.

My problem with that is that it's effectively the same solution Perl 5
provided for switch statements. That is: let them build their own. The
reason Perl 6 has a single built-in switch mechanism is that in Perl 5
everybody *did* build their own switch statement. Differently. :-(


> In fact, that's another reason for not using . for the bit ops: I'd like
> to be able to see
> $a .+ $b
>
> as being equivalent to
> $a.+($b)
>
> That is, calling the + method on $a.

Great minds obviously think alike. That's what it *will* do.
Except it's:

$a + $b

that's equivalent to the multimethod call:

operator:+($a,$b)


> This way you also get to choose which of the multimethods
> gets applied for free...

Err, no you don't. The whole point of multimethods is that *they* choose,
based on the dynamic types of *all* their arguments.

Damian
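
The point about multimethods choosing "based on the dynamic types of *all* their arguments" can be illustrated outside Perl. The following is only a rough Python sketch of that idea. The registry, the `multi` decorator and the `call` function are all hypothetical names invented here, and the nested search over both arguments' class hierarchies is a simplification. It is not a description of how Perl 6 resolves `operator:+`.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical registry: (operator name, left type, right type) -> implementation.
_registry: Dict[Tuple[str, type, type], Callable[[Any, Any], Any]] = {}


def multi(name: str, left: type, right: type):
    """Register an implementation of `name` for one pair of argument types."""
    def register(fn):
        _registry[(name, left, right)] = fn
        return fn
    return register


def call(name: str, a: Any, b: Any) -> Any:
    """Select an implementation from the runtime types of BOTH arguments."""
    for lt in type(a).__mro__:          # most specific left type first
        for rt in type(b).__mro__:      # then most specific right type
            fn = _registry.get((name, lt, rt))
            if fn is not None:
                return fn(a, b)
    raise TypeError(f"no {name!r} for ({type(a).__name__}, {type(b).__name__})")


@multi("+", int, int)
def _add_ints(a, b):
    return a + b


@multi("+", str, object)
def _concat(a, b):
    return a + str(b)


print(call("+", 2, 3))
print(call("+", "n=", 3))
```

The caller never names a variant: `call` picks one, which is the sense in which the multimethod, not the caller, does the choosing. A method call on `$a` alone would consult only the left operand's type.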

Simon Cozens

Nov 2, 2002, 10:36:18 PM
to perl6-l...@perl.org
dam...@conway.org (Damian Conway) writes:
> Of course, scary 50K keyboards aren't really necessary. All we really need is
> a keyboard with configurable keys. That is, each key has an LED, or OLED,
> or digital plastic surface, and an index key that allows you to select the
> Unicode block to be currently mapped onto the keyboard.

Original Chinese typewriters used to work like this, I believe.

> My problem with that is that it's effectively the same solution Perl 5
> provided for switch statements. That is: let them build their own. The
> reason Perl 6 has a single built-in switch mechanism is that in Perl 5
> everybody *did* build their own switch statement. Differently. :-(

It isn't a "let them build their own". It's a "supply a default but take
it out of the language core for simplicity and regularity". I suspect
that what I'm suggesting just comes down to an implementation detail,
but I'm asking for a "+" method on an object of class integer to be given
precisely the same status as a frob method on an object of class Foobar.

> Great minds obviously think alike. That's what it *will* do.
> Except it's:
>
> $a + $b
>
> that's equivalent to the multimethod call:
>
> operator:+($a,$b)

So, in fact, totally different.

--
You want to read that stuff, fine. You want to create a network for such
things, fine. You want to explore the theoretical boundaries of free speech,
fine. But when it starts impacting *people* trying to *communicate*, then
that is where I draw the line.
- Russ Allbery, http://www.eyrie.org/~eagle/writing/rant.html

Damian Conway

Nov 2, 2002, 10:48:03 PM
to perl6-l...@perl.org
Larry Wall wrote:

> Well, the other guys are suggesting bow tie operators, so maybe we should keep
> «foo bar baz» with French quotes, and go with @a »*« @b for vector multiply.

I wouldn't have a problem with that.


> That suggests to me that the circumlocution could be >>*<<.

A five character multiply symbol??? I guess that's the penalty for not
upgrading to something that can handle unicode.


> Actually, we could use <<foo bar baz>> for qw too if here-docs always
> have to be <<' or <<".

Yes. I thought we'd pretty much decided that anyway, hadn't we?

Hmmm...I wonder if one could then write:

$str = <<<<EOT>>;

to make the heredoc terminator *really* stand out ;-)


> Or we could change here docs to vv"TAG"
> or some such that points downwards to where the text actually is.

Urk.


Damian

Damian Conway

Nov 2, 2002, 10:52:23 PM
to perl6-l...@perl.org
Simon Cozens wrote:

>>Of course, scary 50K keyboards aren't really necessary. All we really need is
>>a keyboard with configurable keys. That is, each key has an LED, or OLED,
>>or digital plastic surface, and an index key that allows you to select the
>>Unicode block to be currently mapped onto the keyboard.
>
> Original Chinese typewriters used to work like this, I believe.

Interesting. There's obviously nothing new under the sun. :-)


> It isn't a "let them build their own". It's a "supply a default but take
> it out of the language core for simplicity and regularity". I suspect
> that what I'm suggesting just comes down to an implementation detail,
> but I'm asking for a "+" method on an object of class integer to be given
> precisely the same status as a frob method on an object of class Foobar.

Then I suspect we're in violent agreement, because that's the plan, as I
understand it.


>>Great minds obviously think alike. That's what it *will* do.
>>Except it's:
>>
>> $a + $b
>>
>>that's equivalent to the multimethod call:
>>
>> operator:+($a,$b)
>
>
> So, in fact, totally different.

Well, yeah. But only at the syntactic level. Which, as Dan will tell you,
is completely irrelevant. ;-)

Damian

David Wheeler

Nov 3, 2002, 12:45:07 AM
to Simon Cozens, perl6-l...@perl.org
On Saturday, November 2, 2002, at 08:33 AM, Simon Cozens wrote:

>>> More and more conversations like this, (and how many have we seen
>>> here
>>> already?) about character sets, encodings, mail quoting issues, in
>>> fact, anything other than Perl, will be rife on every Perl-related
>>> mailing list

I guess I don't see much of an argument there. That a discussion leads
to discussions on other mail lists is not a reason not to use Unicode
operators. Or so it seems to me.

Rafael Garcia-Suarez

Nov 3, 2002, 4:41:44 AM
to perl6-l...@perl.org
Matthew Zimmerman wrote in perl.perl6.language :

>
> So let me make my original question a little more general: are Perl 6 source
> files encoded in Latin-1, UTF-8, or will Perl 6 provide some sort of
> translation mechanism, like specifying the charset on the command line?

I expect probably something similar to Perl 5's encoding pragma.
(But hopefully lexically scoped.)
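
One way to picture the "translation mechanism" being asked about is a loader that honors a declared charset and otherwise guesses. The Python sketch below is purely illustrative. The function `decode_source` and its fallback policy are assumptions made for this example, not anything Perl 5's encoding pragma or Perl 6 is known to do.

```python
from typing import Optional


def decode_source(raw: bytes, declared: Optional[str] = None) -> str:
    """Decode source bytes, honoring an explicitly declared charset if given."""
    if declared is not None:
        return raw.decode(declared)
    try:
        # Latin-1 text containing « or » is not valid UTF-8, so this raises for it.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Every byte sequence is valid Latin-1, so this fallback cannot fail.
        return raw.decode("latin-1")


# The same hyperoperator expression, stored as UTF-8 and as Latin-1.
print(decode_source(b"@a \xc2\xbb*\xc2\xab @b"))
print(decode_source(b"@a \xbb*\xab @b"))
```

Both calls yield the same text. The guess can still go wrong when UTF-8 bytes are deliberately declared as Latin-1, which is one argument for an explicit, scoped declaration.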

Ken Fox

Nov 4, 2002, 9:43:32 AM
to Damian Conway, perl6-l...@perl.org
Damian Conway wrote:

> Larry Wall wrote:
>> That suggests to me that the circumlocution could be >>*<<.
>
> A five character multiply symbol??? I guess that's the penalty for not
> upgrading to something that can handle unicode.

Unless this is subtle humor, the Huffman encoding idea is getting
seriously out of hand. That 5 char ASCII sequence is *identically*
encoded when read by the human eye. Humans can probably type the 5
char sequence faster too. How does Unicode win here?

I know I'm just another sample point in a sea of samples, but
my embedded symbol parser seems optimized for alphabetic symbols.
The cool non-alphabetic Unicode symbols are beautiful to look at,
but they don't help me read or write faster. There are rare
exceptions (like grouping) where I strongly prefer non-alphabetics,
but otherwise alphabetics help me get past the "what is this code?"
phase and into the "what does this code do?" phase as quickly as
possible.

(I just noticed that all the non-alphabetic symbols (except '?')
in the previous paragraph are used for grouping. Weird.)

- Ken

Garrett Goebel

Nov 4, 2002, 11:29:50 AM
to Ken Fox, Damian Conway, perl6-l...@perl.org
Ken Fox wrote:
> Damian Conway wrote:
> > Larry Wall wrote:
> >> That suggests to me that the circumlocution could be >>*<<.
> >
> > A five character multiply symbol??? I guess that's the
> > penalty for not upgrading to something that can handle
> > unicode.
>
> Unless this is subtle humor, the Huffman encoding idea is
> getting seriously out of hand. That 5 char ASCII sequence
> is *identically* encoded when read by the human eye. Humans
> can probably type the 5 char sequence faster too. How does
> Unicode win here?
>
> I know I'm just another sample point in a sea of samples

Can't we have our cake and eat it too? Give ASCII digraph or trigraph
alternatives for the incoming tide of Perl6 Unicode?

Allow both >>*<< and »*«?

Or something similar '>>*'<<, [>*<], etc...

--
Garrett Goebel
IS Development Specialist

ScriptPro Direct: 913.403.5261
5828 Reeds Road Main: 913.384.1008
Mission, KS 66202 Fax: 913.384.2180
www.scriptpro.com gar...@scriptpro.com
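
Allowing both spellings amounts to treating the ASCII digraphs as an alias for the Latin-1 characters. Here is a naive Python sketch of that aliasing, assuming the `>>op<<` spelling floated earlier in the thread. The function name `to_unicode_hyper` is hypothetical, and a real lexer would have to respect string literals, here-docs and bit shifts, which this regular expression does not.

```python
import re

# Assumed ASCII spelling: ">>" + operator + "<<" stands for "»" + operator + "«".
_HYPER = re.compile(r">>(\S+?)<<")


def to_unicode_hyper(source: str) -> str:
    """Rewrite ASCII-spelled hyperoperators into their « » form."""
    return _HYPER.sub(lambda m: "\u00bb" + m.group(1) + "\u00ab", source)


print(to_unicode_hyper("@a >>*<< @b"))  # @a »*« @b
```

A lone `>>`, as in a shift, is left untouched because the pattern requires a closing `<<` with no intervening whitespace.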

Brent Dax

Nov 4, 2002, 11:55:16 AM
to Garrett Goebel, Ken Fox, Damian Conway, perl6-l...@perl.org
Garrett Goebel:
# Ken Fox wrote:
# > Unless this is subtle humor, the Huffman encoding idea is getting
# > seriously out of hand. That 5 char ASCII sequence is *identically*
# > encoded when read by the human eye. Humans can probably type the 5
# > char sequence faster too. How does Unicode win here?
#
# Can't we have our cake and eat it too? Give ASCII digraph or
# trigraph alternatives for the incoming tide of Perl6 Unicode?

The Unicode version is more typing than the non-Unicode version, so
what's the advantage? It's prettier?

--Brent Dax <bren...@cpan.org>
@roles=map {"Parrot $_"} qw(embedding regexen Configure)

Wire telegraph is a kind of a very, very long cat. You pull his tail in
New York and his head is meowing in Los Angeles. And radio operates
exactly the same way. The only difference is that there is no cat.
--Albert Einstein (explaining radio)

Austin Hastings

Nov 4, 2002, 12:34:26 PM
to Brent Dax, Garrett Goebel, Ken Fox, Damian Conway, perl6-l...@perl.org
This > ¶ < is a pilcrow, which shows up for me as one of those
paragraph-sign looking backwards P's with two vertical bars. Sorry if
it doesn't come out for you.

--- Brent Dax <bren...@cpan.org> wrote:

> The Unicode version is more typing than the non-Unicode version, so
> what's the advantage? It's prettier?

If you're a Mac user, you already know from the myriad responses
pointing at option-whatever how to generate the sequences.

If you're a PC/DOS user, you deserve whatever you get. But your
text-editor may support macros. I used Multi-Edit when I was DOSsing,
and it did. Ask your manufacturer.

If you're using OS/2, I'm sorry. The 850 codepage supports the
characters « and », but I don't know how to help you generate them
other than alt-123.

If you're using X, and you don't already know how to generate these
characters, RTFM: xmodmap. If you can't make it work, your license to
use Linux will be revoked. Nobody who can't use xmodmap should be
allowed to own a keyboard.

If you're using Windows, specific reference is made to the following
URL:

http://support.microsoft.com/default.aspx?scid=kb;en-us;Q306560

I quote (using alt-[ and alt-], which now do « and » for me,
respectively):

««««««
Adding the United States-International Keyboard Layout

To add the United States-International keyboard layout, follow these
steps:

Click Start, and then click Control Panel.

Under Pick a category, click Date, Time, Language, and Regional
Options.

The Regional and Language Options dialog box appears.

On the Languages tab, click Details.

The Text Services and Input Languages dialog box appears.

Under Installed services, click Add.

The Add Input language dialog box appears.

In the Input language list, click the language that you want. For
example, English (United States).

NOTE: When you use the United States-International keyboard layout, you
should also use an English language setting.

In the Keyboard layout/IME list, click United States-International, and
then click OK.

In the Select one of the installed input languages to use when you
start your computer list, click Language name - United
States-International (where Language name is the language that you
selected in step 6), and then click OK.

In the Regional and Language Options dialog box, click OK.

Notice that the Language bar appears on the taskbar. When you position
the mouse pointer over it, a ToolTip appears that describes the active
keyboard layout. For example, United States-International.

Click the Language bar, and then click United States-International on
the shortcut menu that appears.

The United States-International keyboard layout is selected.
»»»»»»

NOW:

At this point, Meestaire ISO-phobic Amairecain Programmaire, you have
achieved keyboard parity with the average Swiss six-year-old child.
Don't let this happen again. We can't afford a keyboard-gap!

Yes, living below the earth in salt mines, with a properly chosen ratio
of nubile females to fertile males, say ten for every one, it will be
possible to ...

erk. sorry. (*)

The URL goes on to list all the cool keys you can generate, but two
things:

1- You've probably got a little [EN] icon in your taskbar. Make sure to
switch to the international flavor (either using leftALT-Shift in a
window, or by mousing the [EN] taskbar icon) before pounding away at
your new ¶erlified keyboard.

2- The IME only uses rightALT for key composition. Those of you whose
right thumbs have been amputated in freak mine accidents will have to
pursue more accessibility features.

=áµßþíñ (International Man of Mystery)

* -- Speaking of Dr. Strangelove, happy 50th birthday to "Mike."

November 1, 1952, saw the detonation of "Mike", the world's first
hydrogen bomb. "Elugelab, the Pacific island on which Mike exploded,
was erased by the blast. When told that Elugelab was 'missing',
America's president-elect, Dwight Eisenhower, visibly paled."
(Economist)

Here's to 50 years of unemployment. So far.

Michael Lazzaro

Nov 4, 2002, 1:19:55 PM
to Brent Dax, Garrett Goebel, Ken Fox, Damian Conway, perl6-l...@perl.org

On Monday, November 4, 2002, at 08:55 AM, Brent Dax wrote:
> # Can't we have our cake and eat it too? Give ASCII digraph or
> # trigraph alternatives for the incoming tide of Perl6 Unicode?
>
> The Unicode version is more typing than the non-Unicode version, so
> what's the advantage? It's prettier?

Well, yes! :-)... but also because they are unique characters compared
to all the other existing prefix/postfix/binary/quotelike operators, so
there's pretty much zero chance of ambiguity. Using just a few Unicode
symbols would seriously open up the range of possible "sensible"
operators, without causing the kind of mind-numbing ambiguities and
subtle no-not-this-I-mean-that we've seen in the whole xor/hyper
discussions.

UTF-8 «op» representations have the advantage of trivially not
conflicting with _any_ existing operators, and being visually distinct
from all of them. There may be a few other things in
easy-to-find-and-type Latin1, like one or two of these:

• ≈ ∫ ∆ ® © § ∑ Ω ∆ ¶ ‡ ± ˇ ¿

That could maybe fill in for ';' in the cases where ';' has been given
a sneaky meaning, or represent some infrequent but terrifically useful
unary or binary op, etc.
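
Whether a given candidate really lives in Latin-1 can be checked mechanically. The following is a minimal sketch in Python (chosen only because the Perl 6 syntax is still unsettled); the names `candidates`, `in_latin1`, `latin1_ok` and `needs_unicode` are illustrative, and Ω is assumed to be U+03A9 rather than the OHM SIGN.

```python
# Candidate operator characters from the list above, written as escapes
# so the sketch does not depend on how this message itself was encoded.
candidates = (
    "\u2022\u2248\u222b\u2206\u00ae\u00a9\u00a7"
    "\u2211\u03a9\u00b6\u2021\u00b1\u02c7\u00bf"
)


def in_latin1(ch):
    """Return True if the character can be encoded as ISO-8859-1."""
    try:
        ch.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


latin1_ok = [c for c in candidates if in_latin1(c)]
needs_unicode = [c for c in candidates if not in_latin1(c)]

print("Latin-1:", " ".join(latin1_ok))
print("Unicode only:", " ".join(needs_unicode))
```

Under these assumptions only ® © § ¶ ± ¿ fit in Latin-1; the rest need a Unicode encoding such as UTF-8.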

C'mon, everybody's doing it! First one's free, kid... ;-)

MikeL

Matthew Zimmerman

Nov 4, 2002, 2:09:57 PM
to Rafael Garcia-Suarez, perl6-l...@perl.org

Okay, but what will the default be? UTF-8? iso-8859-1? My
current locale? Am I going to have to put

use encoding 'utf8'; # or whatever the P6 syntax will be

at the beginning of every program that might get distributed
outside of my home country to make sure it'll run? Are we
going to tell newbies to make sure they have '-w' and 'use
strict' *and* 'use encoding' at the beginning of their programs?

I'm just worried about the possibility of writing Perl 6
programs and then sending them to friends in other parts
of the world and having them fail in subtle ways because
my Perl 6 expects 0xAB and theirs expects 0xC2AB (or vice
versa). Or if I post a code sample to CLPM that runs on
my machine that doesn't compile from the posting because
my news client automatically converts charsets.
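
The 0xAB versus 0xC2AB mismatch can be made concrete with a minimal Python sketch (Python only because there is no Perl 6 to run yet; all variable names are illustrative). It encodes « both ways and then shows what each kind of reader makes of the other's bytes.

```python
LAQUO = "\u00ab"  # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK

latin1_bytes = LAQUO.encode("latin-1")  # one byte
utf8_bytes = LAQUO.encode("utf-8")      # two bytes

# A Latin-1 reader handed UTF-8 bytes decodes without error, but wrongly:
# it sees two characters where the author typed one.
misread_as_latin1 = utf8_bytes.decode("latin-1")

# A strict UTF-8 reader handed Latin-1 bytes rejects them outright,
# because 0xAB on its own is not a valid UTF-8 sequence.
try:
    latin1_bytes.decode("utf-8")
    utf8_reader_failed = False
except UnicodeDecodeError:
    utf8_reader_failed = True

print(latin1_bytes.hex(), utf8_bytes.hex())
print(repr(misread_as_latin1), utf8_reader_failed)
```

So one direction fails loudly and the other fails quietly, which is the "subtle ways" case.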

Undoubtedly the Perl 6 parser will be smart enough to
figure out all of this, and I'm making a mountain out of a
molehill. But I just want to make sure that one of the
people in authority here either is or will be thinking
about this.

Austin Hastings

Nov 4, 2002, 2:27:16 PM
to mzimm...@virginia.edu, Rafael Garcia-Suarez, perl6-l...@perl.org

--- Matthew Zimmerman <ma...@macko.med.virginia.edu> wrote:
> On Sun, Nov 03, 2002 at 09:41:44AM -0000, Rafael Garcia-Suarez wrote:
> > Matthew Zimmerman wrote in perl.perl6.language :
> > >
> > > So let me make my original question a little more
> > > general: are Perl 6 source files encoded in Latin-1,
> > > UTF-8, or will Perl 6 provide some sort of translation
> > > mechanism, like specifying the charset on the command
> > > line?
> >
> > I expect probably something similar to Perl 5's encoding
> > pragma. (But hopefully lexically scoped.)
>
> Okay, but what will the default be? UTF-8? iso-8859-1? My
> current locale? Am I going to have put
>
> use encoding 'utf8'; # or whatever the P6 syntax will be
>
> at the beginning of every program that might get distributed
> outside of my home country to make sure it'll run?

8859-1 will be the default. If you want "trigraph" support, you'll have
to put

use encoding 'ugly-american';

at the top of your files. ;-) ;-) ;-)

Otherwise, it'll be one-character «fancyops» all the way.

=Austin

Unknown

Nov 4, 2002, 2:58:38 PM
to Michael Lazzaro, Brent Dax, Garrett Goebel, Ken Fox, Damian Conway, perl6-l...@perl.org
On Mon, Nov 04, 2002 at 10:19:55AM -0800, Michael Lazzaro wrote:
> UTF-8 «op» representations have the advantage of trivially not
> conflicting with _any_ existing operators, and being visually distinct
> from all of them. There may be a few other things in
> easy-to-find-and-type Latin1, like one or two of these:
>
> • ≈ ∫ ∆ ® © § ∑ Ω ∆ ¶ ‡ ± ˇ ¿

I've actually got my eye on ≈ (U+2248 ALMOST EQUAL TO) as a
replacement for ~~ someday in the distant future.

I suppose it could be argued that we should use ≅ (U+2245
APPROXIMATELY EQUAL TO) instead. That's what =~ was supposed to
represent, after all...
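
For anyone who wants to double-check which code point is which, the standard character names can be looked up programmatically. A minimal Python sketch, assuming nothing beyond the standard `unicodedata` module; `describe` is an illustrative helper name.

```python
import unicodedata


def describe(ch):
    """Format a character as 'U+XXXX NAME' using the Unicode database."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for ch in ("\u2248", "\u2245"):
    print(ch, describe(ch))
```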

> That could maybe fill in for ';' in the cases where ';' has been given
> a sneaky meaning, or represent some infrequent but terrifically useful
> unary or binary op, etc.

You know, separate streams in a for loop are not going to be that
common in practice, so maybe we should look around a little harder for
a supercomma that isn't a semicolon. Now *that* would be a big step
in reducing ambiguity...

Even if we limit ourselves to Latin1 for now, there's things like
the broken pipe ¦ and logical not ¬ and such that look useful.
I'd avoid using standard signs like multiply × and divide ÷ for
non-standard purposes though. (Not that we can exactly use multiply
even for its standard purpose--there's an awfully heavy resemblance
between × and x, at least in the typical sans serif font.)

It would be really funny to use cent ¢, pound £, or yen ¥ as a sigil, though...

> C'mon, everybody's doing it! First one's free, kid... ;-)

People who believe slippery slope arguments should never go skiing.

On the other hand, even the useful slippery slopes have "beginner"
slopes. I think one advantage of using Unicode for advanced features
is that it *looks* scary. So in general we should try to keep the
basic features in ASCII, and only use Unicode where there be dragons.

It will certainly be possible to write APL in Perl, but if you do,
you'll get what you deserve.

In fact, the problem with APL is not that it's possible to write APL
in it, but that it is impossible not to... :-)

Larry

Unknown

Nov 4, 2002, 3:12:51 PM
to Austin Hastings, mzimm...@virginia.edu, Rafael Garcia-Suarez, perl6-l...@perl.org
On Mon, Nov 04, 2002 at 11:27:16AM -0800, Austin Hastings wrote:
> --- Matthew Zimmerman <ma...@macko.med.virginia.edu> wrote:
> > On Sun, Nov 03, 2002 at 09:41:44AM -0000, Rafael Garcia-Suarez wrote:
> > > Matthew Zimmerman wrote in perl.perl6.language :
> > > >
> > > > So let me make my original question a little more
> > > > general: are Perl 6 source files encoded in Latin-1,
> > > > UTF-8, or will Perl 6 provide some sort of translation
> > > > mechanism, like specifying the charset on the command
> > > > line?
> > >
> > > I expect probably something similar to Perl 5's encoding
> > > pragma. (But hopefully lexically scoped.)
> >
> > Okay, but what will the default be? UTF-8? iso-8859-1? My
> > current locale? Am I going to have put
> >
> > use encoding 'utf8'; # or whatever the P6 syntax will be
> >
> > at the beginning of every program that might get distributed
> > outside of my home country to make sure it'll run?
>
> 8859-1 will be the default.

Actually, Unicode will be the default. 8859-1 can probably also be
handled without declaration.
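
One way a reader could accept both encodings without a declaration is to try strict UTF-8 first and fall back to Latin-1. The Python sketch below is only an assumption about how that might work, not a description of what Perl 6 will do; `decode_source` is a hypothetical name.

```python
def decode_source(raw):
    """Decode source bytes, preferring UTF-8 and falling back to Latin-1.

    Returns a (text, encoding_used) pair. The fallback never fails,
    since every byte value is a valid Latin-1 character.
    """
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return raw.decode("latin-1"), "latin-1"


print(decode_source(b"@a \xc2\xab+\xc2\xbb @b"))  # UTF-8 input
print(decode_source(b"@a \xab+\xbb @b"))          # Latin-1 input
```

The heuristic works because a lone 0xAB or 0xBB byte is never valid UTF-8. It can still guess wrong on Latin-1 text that happens to form valid UTF-8 sequences, so it is a convenience rather than a guarantee.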

> If you want "trigraph" support, you'll have to put
>
> use encoding 'ugly-american';
>
> at the top of your files. ;-) ;-) ;-)
>

> Otherwise, it'll be one-character ?fancyops? all the way.

Mmm, I view one-character Unicode operators as more of an escape hatch
for the future, not as something to be made mandatory. But then,
I'm one of those ugly Americans.

Of course, I also think I'm allowed to be a little inconsistent in
forcing things like »op« on people. After all, there's gotta be
some advantage to being the Fearless Leader...

Larry

Austin Hastings

Nov 4, 2002, 3:26:56 PM
to perl6-l...@perl.org
--- La...@yahoo.com, UNEXPECTED_DATA_AFTER_ADDRESS@.SYNTAX-ERROR.
wrote:

> Mmm, I view one-character Unicode operators as more of an escape
> hatch
> for the future, not as something to be made mandatory. But then,
> I'm one of those ugly Americans.

EBCDIC didn't support brackets, originally, so ANSI included trigraphs
called ??( and ??) for [ and ], respectively.

But the fact of the matter is that about epsilon (which is to say,
really close to zero) people wrote trigraphs.

So, yeah, include trigraph sequences if it will make happy the people
on the list who can't be bothered to read the documentation for their
own keyboard IO system.

But don't expect the rest of us to use them.

In short:

1- « and » are really useful in my context.
2- I can make my work environment generate them in one (modified)
keystroke.
3- I can make my home environment do likewise.
4- The "ascii-only" version isn't faster and easier, nor more morally
pure.
5- There is no "differently keyboard abled" market out there which has
engaged my sympathy, ascii-operator wise.

Ergo,

6- my @a = @b «+» @c;

> Of course, I also think I'm allowed to be a little inconsistent in
> forcing things like »op« on people. After all, there's gotta be
> some advantage to being the Fearless Leader...

Which kind of begs the question: Who are you? And can you authenticate
that which you just implicitly claimed? (See quote header, above, if
you don't understand my question)

>
> Larry

Mark J. Reed

Nov 4, 2002, 3:32:48 PM
to Austin Hastings, perl6-l...@perl.org
On 2002-11-04 at 12:26:56, Austin Hastings wrote:
> 1- ? and ? are really useful in my context.
Okay. Now can you get your mailer to send them properly? :)
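
The question marks in the quoted line are what typically results when text is forced into US-ASCII with a "replace" policy somewhere along the way. A minimal Python sketch of that effect follows; where exactly the substitution happened in this case is not known, and `downgrade_to_ascii` is an illustrative name.

```python
def downgrade_to_ascii(text):
    """Force text into US-ASCII, substituting '?' for anything else."""
    return text.encode("ascii", errors="replace").decode("ascii")


original = "1- \u00ab and \u00bb are really useful in my context."
print(downgrade_to_ascii(original))
```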

Ken Fox

Nov 4, 2002, 3:37:59 PM
to Austin_...@yahoo.com, Brent Dax, Garrett Goebel, Damian Conway, perl6-l...@perl.org
Austin Hastings wrote:

> At this point, Meestaire ISO-phobic Amairecain Programmaire, you have
> achieved keyboard parity with the average Swiss six-year-old child.

The question is not about being ISO-phobic or pro-English. **

The question is whether we want a pictographic language. I like
the size of the English alphabet. It produces fairly short words,
but the words are very robust (people can read words in all
orientations, backwards, upside down, in crazy fonts, hand-written,
etc.) This is the opposite of Huffman encoding, but just
as useful IMHO.

I've had the unpleasant job of turning math into software. Hand
written formulae can be very difficult to read because mathematics
worships Huffman encoding. Multiplication is specified by *nothing*.
Exponents are just written a bit smaller and a bit raised. Is this
what we want in the core?

Does anyone have any references for reading and comprehension
rates for different types of languages? I'm ignorant on the subject
and this seems like something a Perl programmer should know.

- Ken

** I'm probably both. ISO-phobic because I actually represented my
company on an ISO standard committee. Pro-English because it's what
I use -- being pro-English doesn't make me against everything else.
A language would have to be pretty bad to have its native speakers
advocate something else!

Me

Nov 4, 2002, 3:37:03 PM
to Austin Hastings, mzimm...@virginia.edu, Rafael Garcia-Suarez, perl6-l...@perl.org
> After all, there's gotta be some advantage to
> being the Fearless Leader...
>
> Larry

Thousands will cry for the blood of the Perl 6
design team. As Leader, you can draw their ire.
Because you are Fearless, you won't mind...

--
ralph

Damian Conway

Nov 4, 2002, 4:06:44 PM
to perl6-l...@perl.org
Ken Fox wrote:

> I know I'm just another sample point in a sea of samples, but
> my embedded symbol parser seems optimized for alphabetic symbols.
> The cool non-alphabetic Unicode symbols are beautiful to look at,
> but they don't help me read or write faster.

Once again: we're only talking about « and ».


> There are rare exceptions (like grouping)

E.g. « and »

;-)


> where I strongly prefer non-alphabetics,
> but otherwise alphabetics help me get past the "what is this code?"
> phase and into the "what does this code do?" phase as quickly as
> possible.

Interestingly, I find it just the opposite. The use of symbolic
operators makes it easier for me to differentiate the "nouns",
"verbs", and "punctuation" of a piece of code.

Damian

Austin Hastings

Nov 4, 2002, 4:07:35 PM
to Ken Fox, Austin_...@yahoo.com, Brent Dax, Garrett Goebel, Damian Conway, perl6-l...@perl.org

--- Ken Fox <kf...@vulpes.com> wrote:

> Austin Hastings wrote:
>
> The question is not about being ISO-phobic or pro-English. **

The two gripes I've heard have been:

1- It's hard to type.
2- I don't know how to type it on platform X.

With combo gripe "It'll be hard to remember how to type it across
multiple platforms X, Y, Z, etc." coming in third.

So I solved that problem. I know it's easy to type on Mac, I know how
to MAKE it easy to type on WinPC, and I know how to MAKE it easy to
type on an X terminal. In all cases, [OPTION] or [ALT] plus some
matching set of punctuation [(slashes) or (brackets)].

Now it's easy to type (easier, for me at least, than typing two
backticks, since the modifier level is the same and the hand-contortion
on a PC type keyboard [with ` and ~ in the top left corner] is much
lower), and not too difficult to remember, even across N platforms.

So I'll treat your objection, below, as a new one.



> The question is whether we want a pictographic language. I like
> the size of the English alphabet. It produces fairly short words,
> but the words are very robust (people can read words in all
> orientations, backwards, upside down, in crazy fonts, hand-written,
> etc.) This is the opposite of Huffman encoding, but just
> as useful IMHO.

The << and >> (rendered thus for Mr. Reed) are just as pictographic (or
not) as [ and ]. They look the same from top or bottom, and are
unmistakable in direction when looked at from either side. Likewise,
they are probably MORE clear, as has been mentioned, than the
difference between ' (apostrophe) and ` (tick) in many standard fonts,
especially the variable-width variety sometimes invoked for 8-bit
messages.

But in this context, we've got a pair of balanced, unmistakable
characters which have no other uses (compare, say, %hash and $a %= $b;
same character '%', different usages) being proposed to serve as the
marker for a new class of operation.

> ...


> Exponents are just written a bit smaller and a bit raised. Is this
> what we want in the core?

If every keyboard and operating system had the ability to simply
generate arbitrary expressions of the form (expr-a) ** (expr-b), ad
infinitum (a ** b ** c ** d ** e) then we'd be remiss not to use it.
But they can't, so we don't.

> ...


> ** I'm probably both. ISO-phobic because I actually represented my
> company on an ISO standard committee.

You have my sympathy.

Damian Conway

Nov 4, 2002, 4:11:40 PM
to perl6-l...@perl.org
Garrett Goebel wrote:

> Can't we have our cake and eat it too? Give ASCII digraph or trigraph
> alternatives for the incoming tide of Perl6 Unicode?
>
> Allow both >>*<< and »*«?

I'd really prefer we didn't. I'd much rather keep << and >> for other
things.


> Or something similar '>>*'<<, [>*<], etc...

Much as I hate the notion of di- and trigraphs, this is a possibility.

Though I'd much rather we just allowed POD escapes (e.g. E<laquo> and E<raquo>)
in code.

And, yes, I'm aware that makes E<laquo>*E<raquo> incredibly ugly.
I'm rather *counting* on it, in fact ;-)
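
As a rough illustration of what allowing such escapes would involve, here is a Python sketch of a source filter that expands E<name> sequences before parsing. It is an assumption about one possible approach, not a proposal for the actual parser; `expand_pod_escapes` is a hypothetical name, and the HTML entity table from the standard library stands in for POD's own escape names.

```python
import re
from html.entities import name2codepoint


def expand_pod_escapes(src):
    """Replace E<name> with the named character; leave unknown names alone."""

    def replace(match):
        codepoint = name2codepoint.get(match.group(1))
        if codepoint is None:
            return match.group(0)  # not a known entity name: keep verbatim
        return chr(codepoint)

    return re.sub(r"E<(\w+)>", replace, src)


print(expand_pod_escapes("my @a = @b E<laquo>+E<raquo> @c;"))
```

A real implementation would also have to avoid expanding escapes inside string literals and comments, which this sketch does not attempt.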

Damian

Me

Nov 4, 2002, 4:12:53 PM
to Austin_...@yahoo.com, perl6-l...@perl.org
> people on the list who can't be bothered to read
> the documentation for their own keyboard IO system.

Most of this discussion seems to focus on keyboarding.
But that's of little consequence. This will always be
spotted before it does much harm and will affect just
one person and their software at a time.

Errors in encoding during transmission are a whole lot
more problematic. These will almost always be spotted
after the fact, and may affect many people at a time
and require fixes to multiple systems not controlled
by the sender or receiver.

--
ralph

Brian Ingerson

Nov 4, 2002, 4:22:25 PM
to perl6-l...@perl.org, Austin Hastings, mzimm...@virginia.edu, Rafael Garcia-Suarez, perl6-l...@perl.org
On 04/11/02 12:12 -0800, perl6-language-return-12208-ingy=ttul...@perl.org
<aka Larry Wall> wrote:
> > If you want "trigraph" support, you'll have to put
> >
> > use encoding 'ugly-american';
> >
> > at the top of your files. ;-) ;-) ;-)
> >
> > Otherwise, it'll be one-character ?fancyops? all the way.
>
> Mmm, I view one-character Unicode operators as more of an escape hatch
> for the future, not as something to be made mandatory. But then,
> I'm one of those ugly Americans.
>
> Of course, I also think I'm allowed to be a little inconsistent in
> forcing things like »op« on people. After all, there's gotta be
> some advantage to being the Fearless Leader...

On one hand I really respect your fearlessness to go where no language
author has gone before. No matter what happens, I'm pretty sure you'll be
"remembered" for it. ;)

On the other hand I'm wondering what happens to the ebcdic platforms and
the like. Will it even work to have core modules written in non ascii
and expect them to translate to ebcdic? I suppose you'll have to convert
them to trigraphs as part of the installation. Just wondering if you've
thought through the support issues for platforms that by their
definition won't be using utf ever.

FWIW, ebcdic *does* have the cent sign!

Cheers, Brian

Austin Hastings

Nov 4, 2002, 4:23:15 PM
to Me, Austin_...@yahoo.com, perl6-l...@perl.org

--- Me <m...@self-reference.com> wrote:
> > people on the list who can't be bothered to read
> > the documentation for their own keyboard IO system.
>
> Most of this discussion seems to focus on keyboarding.
> But that's of little consequence. This will always be
> spotted before it does much harm and will affect just
> one person and their software at a time.

Good. Counting Damian, that makes three of us. Welcome aboard, ralph.
:-)

> Errors in encoding during transmission is a whole lot
> more problematic. This will almost always be spotted
> after the fact, and may affect many people at a time
> and require fixes to multiple systems not controlled
> by the sender or receiver.

I disagree (slightly). I get emailed powerpoint files, jpeg images, and
tens of other binary formats every day, and they consistently come
through correctly.

The transmission network is working fine.

What we've got is an encoding problem at the MUA level. Mark Reed says
my mailer (Yahoo!) tagged a message containing high-bit characters as
US-ASCII. Several people the other day reported on the differences in
UTF8 vs. Latin-1 handling among pine, elm, and other mailers.
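
The labelling half of the problem is simple to state: a sender should declare the narrowest charset that can actually represent the body, and never US-ASCII when high-bit characters are present. A minimal Python sketch of that rule, with `pick_charset` as an illustrative name and the three-step preference order as an assumption:

```python
def pick_charset(text):
    """Return the first charset label that can represent the whole text."""
    for charset in ("us-ascii", "iso-8859-1", "utf-8"):
        try:
            text.encode(charset)
            return charset
        except UnicodeEncodeError:
            continue
    raise ValueError("text cannot be encoded in any candidate charset")


print(pick_charset("my @a = @b + @c;"))
print(pick_charset("my @a = @b \u00ab+\u00bb @c;"))
print(pick_charset("$a \u2248 $b"))
```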

There are problems, and this kind of change will create a demand to get
them fixed. Those products that satisfy the demand will survive. The
others won't. Up until now, though, everyone's been lax about making
the encoding stuff strack. But this is a language widely regarded as a
huge player, and when a huge player says "You need to take care of
(something)", then it gets done.

Perl6 will do more to address the real technical issues of electronic
communication between Americans and French-speakers than anything else.
(Primarily because Perl hackers want to talk to each other, but no
French-speaker wants to talk to an American ;-)

Austin Hastings

Nov 4, 2002, 4:24:46 PM
to Brian Ingerson, perl6-l...@perl.org, Austin Hastings, mzimm...@virginia.edu, Rafael Garcia-Suarez, perl6-l...@perl.org

--- Brian Ingerson <in...@ttul.org> wrote:
> FWIW, ebcdic *does* have the cent sign!

And the "not" sign. Damian may force us to abandon ASCII entirely...

=Austin



Adam D. Lopresto

Nov 4, 2002, 4:03:56 PM
to Austin_...@yahoo.com, perl6-l...@perl.org, ad...@express.cec.wustl.edu
I'm having trouble believing this is even being considered. At all. And especially for
these operators...

> So, yeah, include trigraph sequences if it will make happy the people
> on the list who can't be bothered to read the documentation for their
> own keyboard IO system.
>
> But don't expect the rest of us to use them.

So you're one of the very few people who bothered to set up unicode, and now
you want to force the rest of us into your own little "leet" group. Given the
choice between learning how to reconfigure their keyboard, editor, terminal,
fonts, and everything else, or just not learning perl6, I bet you'd have a LOT
of people who get scared away. Face it, too many people think perl is
linenoise heavy and random already.

Which brings me to my real question: why these operators? It's not as if
they're even particularly intuitive for this context. They're quotes. They
don't mean "vector" anything, and never have. I could almost see if the
characters in question just screamed the function in question (sqrt, not
equals, not, sum, almost anything like that), but these are just sort of
random.

Given how crazy this is all getting, is it absolutely certain that we're better
off not just making vector operations work without modifiers? I reread the
apocalypse just now, and I don't really see the problem. The main argument
against seems to be "perl5 people expect it to be scalar", but perl5 people
will have to get used to a lot. I think the operators should just be list
based, and if you want otherwise you can specify "scalar:op" or convert both
sides to scalars manually (preferably with .length, so it's absolutely clear
what's meant).
--
Adam Lopresto (ad...@cec.wustl.edu)
http://cec.wustl.edu/~adam/

Who are you and what have you done with reality?
--Jamin Gray

Paul Johnson

Nov 4, 2002, 4:36:23 PM
to Austin Hastings, perl6-l...@perl.org
On Mon, Nov 04, 2002 at 12:26:56PM -0800, Austin Hastings wrote:

> In short:
>
> 1- ? and ? are really useful in my context.


> 2- I can make my work environment generate them in one (modified)
> keystroke.
> 3- I can make my home environment do likewise.
> 4- The "ascii-only" version isn't faster and easier, nor more morally
> pure.
> 5- There is no "differently keyboard abled" market out there which has
> engaged my sympathy, ascii-operator wise.
>
> Ergo,
>

> 6- my @a = @b ?+? @c;

It's a great argument. I know how to type "funny" characters too. I
can even read some of the ones some people send. Just don't expect me
to be able to understand any Perl 6 you mail me. Whether the problem is
at your end, my end or somewhere in the middle is moot.

On the other hand, maybe all these issues will be sorted out before we
can start writing Perl 6 in earnest. In one way I hope that is true.
In another I hope it isn't ;-)

--
Paul Johnson - pa...@pjcj.net
http://www.pjcj.net

Simon Cozens

Nov 4, 2002, 5:01:59 PM
to perl6-l...@perl.org
kf...@vulpes.com (Ken Fox) writes:
> The question is whether we want a pictographic language.

So far we've managed to avoid turning Perl into APL. :-)
-- Larry Wall in <1997022519...@wall.org>

Although that was some time ago... :)

--
The FSF is not overly concerned about security. - FSF

Rafael Garcia-Suarez

Nov 4, 2002, 5:02:28 PM
to perl6-l...@perl.org
Austin Hastings wrote in perl.perl6.language :

>
> What we've got is an encoding problem at the MUA level. Mark Reed says
> my mailer (Yahoo!) tagged a message containing high-bit characters as
> US-ASCII. Several people the other day reported on the differences in
> UTF8 vs. Latin-1 handling among pine, elm, and other mailers.

Not only the MUA level. Usually source code is written in a lowest
common denominator of ascii, even for languages that allow unicode
identifiers (Java) or markup. That's because source code is handled by
parsers, documentation extractors, pretty printers, diff(1), patch(1),
version control software, and (you said it) various internet clients.

That's why some people may still prefer to continue using pure ascii
even though they think that unicode operators are cool. (Esp. if they
are under the influence of FUD : "use PHP ! it's ascii compliant !")

> Perl6 will do more to address the real technical issues of electronic
> communication between Americans and French-speakers than anything else.
> (Primarily because Perl hackers want to talk to each other, but no
> French-speaker wants to talk to an American ;-)

You're Italian, aren't you ?

Austin Hastings

Nov 4, 2002, 5:09:20 PM
to Rafael Garcia-Suarez, perl6-l...@perl.org

--- Rafael Garcia-Suarez <rgarci...@free.fr> wrote:
> Austin Hastings wrote in perl.perl6.language :
> >
> > What we've got is an encoding problem at the MUA level. Mark Reed
> says
> > my mailer (Yahoo!) tagged a message containing high-bit characters
> as
> > US-ASCII. Several people the other day reported on the differences
> in
> > UTF8 vs. Latin-1 handling among pine, elm, and other mailers.
>
> Not only the MUA level. Usually source code is written in a lowest
> common denominator of ascii, even for languages that allow unicode
> identifiers (Java) or markup. That's because source code is handled
> by
> parsers, documentation extractors, pretty printers, diff(1),
> patch(1),
> version control software, and (you said it) various internet clients.

> That's why some people may still prefer to continue using pure ascii
> even though then think that unicode operators are cool. (Esp. if they
> are under the influence of FUD : "use PHP ! it's ascii compliant !")

Yeah, but ActiveState does Perl, and Microsoft owns ActiveState, so
we've got the kings of FUD on our side for a change. Joy.

> > Perl6 will do more to address the real technical issues of
> electronic
> > communication between Americans and French-speakers than anything
> else.
> > (Primarily because Perl hackers want to talk to each other, but no
> > French-speaker wants to talk to an American ;-)
>
> You're Italian, aren't you ?

Actually, an American who's been ignored in many places. :-)

Simon Cozens

Nov 4, 2002, 5:07:05 PM
to perl6-l...@perl.org
dam...@conway.org (Damian Conway) writes:
> > Or something similar '>>*'<<, [>*<], etc...
>
> Much as I hate the notion of di- and trigraphs, this is a possibility.

I do like this too, because it reminds me of C trigraphs, which had precisely
the same purpose - allow people with old-fashioned sub-standard character
sets to come and play with the big boys. And eventually, the old trigraphs
died out because everyone caught up with the decent (for the era) character
sets.

That's assuming we have to have Unicode operators.

I would, however, like to hear a passionate argument in favour of
this, because we've seen plenty of arguments against (encoding, transmission,
keyboarding, etc.) but not all that many in favour, so a nice definitive one
would be helpful.

--
<evilPetey> I often think I'd get better throughput yelling at the modem.

Ken Fox

Nov 4, 2002, 5:23:08 PM
to Austin_...@yahoo.com, Brent Dax, Garrett Goebel, Damian Conway, perl6-l...@perl.org
Austin Hastings wrote:

> The << and >> ... are just as pictographic (or
> not) as [ and ].

I'm not particularly fond of << or >> either. ;) Damian just
wrote that he prefers non-alphabetic operators to help
differentiate nouns and verbs. I find it helpful when people
explain their biases like that. What's yours?

> They look the same from top or bottom, and are
> unmistakable in direction when looked at from either side.

Well, anything can look like itself, that wasn't the point. The
goal is to not look like anything else in any orientation. The
chars O and 0 fail badly, but A and T are excellent. I'm not
sure where << and >> fall because I don't have any experience
with them.

Programming languages probably get away with more because
most programmers don't spray paint algorithms on the side of
a bridge. (Well, Lisp programmers maybe. ;) My three points
against arbitrary punctuation as symbols are
(1) it's impossible to identify symbol boundaries when
reading punctuation -- you just have to guess,
(2) it's harder to work with punctuation in non-digital
communication, and
(3) my memory doesn't work well on punctuation symbols!

Perl has some nice features like sigils that clue people in on
how to read a sentence. But...

> difference between ' (apostrophe) and ` (tick)

is a horrible abomination. ;)

> If every keyboard and operating system had the ability to simply
> generate arbitrary expressions of the form (expr-a) ** (expr-b), ad
> infinitum (a ** b ** c ** d ** e) then we'd be remiss not to use it.
> But they can't, so we don't.

Non sequitur. Written language prior to the printing press had
no technological reason to limit alphabet size. Some languages
developed very large pictographic representations, others
developed small alphabets with word formation rules. I have no
idea what the design pressures were that caused these different
solutions. Do you? What are the strengths and weaknesses of the
approaches? Why should we select one over the other?

- Ken

Simon Cozens

Nov 4, 2002, 5:12:16 PM
to perl6-l...@perl.org
austin_...@yahoo.com (Austin Hastings) writes:
> Yeah, but ActiveState does Perl, and Microsoft owns ActiveState

To what extent are *either* of those statements true? :)

--
All the good ones are taken.

Michael Lazzaro

Nov 4, 2002, 5:25:08 PM
to Larry Wall, Brent Dax, Garrett Goebel, Ken Fox, Damian Conway, perl6-l...@perl.org
On Monday, November 4, 2002, at 11:58 AM, Larry Wall wrote:
> You know, separate streams in a for loop are not going to be that
> common in practic, so maybe we should look around a little harder for
> a supercomma that isn't a semicolon. Now *that* would be a big step
> in reducing ambiguity...

Or more than one type of supercomma, e.g:

for @x ∫ @y ∫ @z -> $x ∫ $y ∫ $z { ... }

to mean:
for @x ; @y ; @z -> $x ; $y ; $z { ... }

- vs -

for @x § @y § @z -> $x § $y § $z { ... }

to mean:

for @x -> $x {
    for @y -> $y {
        for @z -> $z {
            ...
        }
    }
}

;-)
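
The difference between the two readings is the difference between stepping through the lists in lockstep and visiting every combination. A minimal Python sketch of both (the sample lists and names are illustrative, and how the lockstep form should treat lists of unequal length is left open here; `zip` simply stops at the shortest):

```python
from itertools import product

xs, ys, zs = [1, 2], [10, 20], [100, 200]

# Lockstep ("parallel streams"): one tuple per position.
parallel = [(x, y, z) for x, y, z in zip(xs, ys, zs)]

# Nested loops: every combination, with the last list varying fastest.
nested = [(x, y, z) for x, y, z in product(xs, ys, zs)]

print(parallel)
print(len(nested), nested[:3])
```

With two elements per list the lockstep form runs the body twice and the nested form eight times, so the two supercommas would not be interchangeable.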

MikeL

Austin Hastings

Nov 4, 2002, 5:27:33 PM
to Simon Cozens, perl6-l...@perl.org

--- Simon Cozens <si...@simon-cozens.org> wrote:
> austin_...@yahoo.com (Austin Hastings) writes:
> > Yeah, but ActiveState does Perl, and Microsoft owns ActiveState
>
> To what extent are *either* of those statements true? :)

Hmm. Well, last time I checked you could still download a perl binary
from ActiveState.com.

And, in fact, check out the motivation behind the "agreement":

http://www.dnjonline.com/newsreel/index.html

Microsoft buys into Perl

Aug 6 - Microsoft has hired ActiveState Tool Corporation to improve the
Windows functionality of the Open Source scripting language Perl. This
agreement reinforces a long-term relationship between Microsoft and
Perl, stemming from 1993 when Microsoft funded the first port of Perl 5
to the Windows platform.

ActiveState develops and distributes the popular Windows version,
called ActivePerl. "Our mission is to make Perl as popular as
possible," said Dick Hardt, chief executive of ActiveState.

The monetary details of the deal weren't revealed - in fact there
was no mention of it anywhere on the Microsoft web site. Instead the
impetus seems to have come from Microsoft India where the main aim is
to improve Perl's support for non-Roman character-sets through Unicode.

As part of the agreement, ActiveState will add features
previously missing from Windows ports of Perl, as well as full support
for Unicode - a key feature to users dealing with Asian character sets.

.... blah blah blah ...

Austin Hastings

Nov 4, 2002, 5:45:06 PM
to Adam D. Lopresto, Austin_...@yahoo.com, perl6-l...@perl.org, ad...@express.cec.wustl.edu

--- "Adam D. Lopresto" <ad...@cec.wustl.edu> wrote:
> I'm having trouble believing this is even being considered. At all. And
> especially for these operators.

Heute vektoren, morgen das welt!

Uniperl, Uniperl uber alles,
Uber alles in der welt!
With hyper-states through choose and true();
Masterfully golf scorin' script,
Von der << bis an die all(),
Von der any() bis an den >> -
Uniperl, Uniperl uber alles,
Uber alles in der welt!

> So you're one of the very few people who bothered to set up unicode,
> and now you want to force the rest of us into your own little
> "leet" group.

Nerp. Hadn't given it a second thought until the whining started about
"It's so hard..." I had actually figured that I'd be able to set a
keystroke in my editor and that would be the end of it. But then, for
no good reason that I can think of, I tried microsoft's help site and
found it in about thirty seconds. No need to set up a keyboard macro --
it's part of the OS.

I did BBS, though not as a "warez d00d". It's "L33t".

> Given the choice between learning how to reconfigure their keyboard,
> editor, terminal, fonts, and everything else, or just not learning
> perl6, I bet you'd have a LOT of people who get scared away.

That sounds a lot like what I said (and to a certain extent still fear)
back when -> was first going away.

It didn't work then, either.

> Face it, too many people think perl is linenoise heavy and random
> already.

Which is why adding a single character with a single meaning that can
be covered in chapter 14 instead of chapter 3 is a workable idea, and
why creating an operator called "Jesus, it looks like an ASCII-art
version of a dancing penguin in high heels" isn't. "Bow-tie operator",
indeed!

If @a [>*=<] @b; doesn't scan like rats chewing their way into your
cable, what does?

> Which brings me to my real question: why these operators? It's not
> as if they're even particularly intuitive for this context. They're
> quotes. They don't mean "vector" anything, and never have. I could
> almost see if the characters in question just screamed the function
> in question (sqrt, not equals, not, sum, almost anything like that),
> but these are just sort of random.

Simple answer: Larry suggested them. And was willing to sacrifice qw
functionality to this.

Also, I suppose, because of the map() suggestion a while back -- this
"operation" is going to wind up taking a huge range of parameters in
some not-too-distant future.

And @a = @b <<sub>> @c; will read a lot better, when <<sub>> is 8 lines
long.

> Given how crazy this is all getting, is it absolutely certain that
> we're better
> off not just making vector operations work without modifiers? I
> reread the
> apocalypse just now, and I don't really see the problem. The main
> argument
> against seems to be "perl5 people expect it to be scalar", but perl5
> people
> will have to get used to a lot. I think the operators should just be
> list
> based, and if you want otherwise you can specify "scalar:op" or
> convert both
> sides to scalars manually (preferably with .length, so it's
> absolutely clear
> what's meant).

It's not absolutely certain. But this discussion was destined to
happen, since we're just about out of line noise, but we're nowhere
close to being out of clever ideas.

Brian Ingerson

Nov 4, 2002, 5:48:20 PM
to Austin Hastings, Rafael Garcia-Suarez, perl6-l...@perl.org
On 04/11/02 14:09 -0800, Austin Hastings wrote:
>
> --- Rafael Garcia-Suarez <rgarci...@free.fr> wrote:
> > Austin Hastings wrote in perl.perl6.language :
> > >
> > > What we've got is an encoding problem at the MUA level. Mark Reed
> > says
> > > my mailer (Yahoo!) tagged a message containing high-bit characters
> > as
> > > US-ASCII. Several people the other day reported on the differences
> > in
> > > UTF8 vs. Latin-1 handling among pine, elm, and other mailers.
> >
> > Not only the MUA level. Usually source code is written in a lowest
> > common denominator of ascii, even for languages that allow unicode
> > identifiers (Java) or markup. That's because source code is handled
> > by
> > parsers, documentation extractors, pretty printers, diff(1),
> > patch(1),
> > version control software, and (you said it) various internet clients.
>
> > That's why some people may still prefer to continue using pure ascii
> > even though they think that unicode operators are cool. (Esp. if they
> > are under the influence of FUD : "use PHP ! it's ascii compliant !")
>
> Yeah, but ActiveState does Perl, and Microsoft owns ActiveState, so
> we've got the kings of FUD on our side for a change. Joy.

Speaking of FUD, that's simply not true, nor tasteful IMO.

AS has done a handful of short-term contracts for MS, and that's the
extent of their relationship.

FWIW, AS also does as much or more Unix development as Windows
development. They also employ some people who have individually
advanced Perl more than you'll ever know.

Simon Cozens

Nov 4, 2002, 5:51:35 PM
to perl6-l...@perl.org
austin_...@yahoo.com (Austin Hastings) writes:
> If @a [>*=<] @b; doesn't scan like rats chewing their way into your
> cable, what does?

This is why God gave us functions as well as operators.

--
I _am_ pragmatic. That which works, works, and theory can go screw
itself.
- Linus Torvalds

Unknown

Nov 4, 2002, 8:52:29 PM
to Michael Lazzaro, Larry Wall, Brent Dax, Garrett Goebel, Ken Fox, Damian Conway, perl6-l...@perl.org
[Note to all: yes, this is me, despite the weirdities of the quoting
and headers. This is how it looks when I'm using mutt out of the box,
because I haven't yet customized it like I have pine. But I do like
being able to see my own Unicode characters, not to mention everyone
else's. If you don't believe this is me, well, I'll just tell you that
I live on a tropical island near Antarctica, my social security number
is 987-65-4321, and my mother's maiden name was the same as my maternal
grandfather's maiden name. Or something like that... --Ed]

On Mon, Nov 04, 2002 at 02:25:08PM -0800, Michael Lazzaro wrote:
> On Monday, November 4, 2002, at 11:58 AM, Larry Wall wrote:
> >You know, separate streams in a for loop are not going to be that
> >common in practice, so maybe we should look around a little harder for
> >a supercomma that isn't a semicolon. Now *that* would be a big step
> >in reducing ambiguity...
>
> Or more than one type of supercomma, e.g:
>
> for @x ∫ @y ∫ @z -> $x ∫ $y ∫ $z { ... }
>
> to mean:
> for @x ; @y ; @z -> $x ; $y ; $z { ... }

That almost works visually.

> - vs -
>
> for @x § @y § @z -> $x § $y § $z { ... }
>
> to mean:
>
> for @x -> $x {
> for @y -> $y {
> for @z -> $z {
> ...
> }
> }
> }
>
> ;-)

Glad you put the smiley. I think the latter is much clearer.

But at the moment I'm thinking there's something wrong about any
approach that requires a special character on the signature side.
I'm starting to think that all the convolving should be specified
on the left. So in this:

for parallel(@x, @y, @z) -> $x, $y, $z { ... }

the signature specifies that we are expecting 3 scalars to the sub,
and conveys no information as to whether they are generated in parallel
or serially. That's entirely specified on the left. The natural
processing of lists says that serial is specified like this:

for @a, @b, @c -> $x, $y, $z { ... }

Of course, parallel() is a rotten thing to have to say unless you're
into readability. So we could still have some kind of parallelizing
supercomma, maybe even ∥ (U+2225 PARALLEL TO). But let's keep it
out of the signature, I think. In other words, if something like

for @x ∥ @y ∥ @z -> $x, $y, $z { ... }

is to work, then

@result = @x ∥ @y ∥ @z;

has to interleave @x, @y, and @z. It's not special to the C<for>.
In the case of C<for>, of course, the compiler should feel free to
optimize out the actual construction of an interleaved array.
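
As a rough illustration of that interleaving, here is a small Python sketch. The function name `interleave` and the choice to pad shorter lists with `None` are assumptions made for the example, not anything settled in this thread:

```python
from itertools import zip_longest

def interleave(*lists, fill=None):
    """Flatten the lists one element at a time: x0, y0, z0, x1, y1, z1, ...

    Shorter lists are padded with `fill` (an assumption for this sketch).
    """
    return [item for group in zip_longest(*lists, fillvalue=fill) for item in group]

result = interleave([1, 2], ["a", "b"], ["X", "Y"])

# A loop expecting three scalars then just pulls three items at a time
# from the flat list, knowing nothing about how it was built.
triples = [tuple(result[i:i + 3]) for i in range(0, len(result), 3)]
```

The point of the design is visible here: the consumer only sees a flat list, so all of the "parallel" logic lives on the producing side.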

I suppose it could be argued that ∥ is really spelled »,« or some such.
However,

@result = @x »,« @y »,« @z;

just doesn't read quite as well for some reason. A slightly better
case could be made for

@result = @x `|| @y `|| @z;

The reason we originally munged with the signature was so that we
could do weird things with differing numbers of streams on the left
and the right. But if you really want a way to take 3 from @x, then
3 from @y, then 3 from @z, there should be something equivalent to:

for round_robin_by_3s(@x, @y, @z) -> $x, $y, $z { ... }

Fooling around with signature syntax for that rare case is not worth it.
This way, the C<for> won't have to know anything about the signature other
than that it expects 3 scalar arguments. And Simon will be happ(y|ier)
that we've removed an exception.
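
One possible reading of round_robin_by_3s, sketched in Python for illustration. The generalised name `round_robin_by` and the behaviour when lists run short (they simply contribute fewer items) are assumptions of this sketch:

```python
def round_robin_by(n, *lists):
    """Yield n items from the first list, then n from the next, and so on,
    repeating until every list is exhausted."""
    position = 0
    longest = max((len(lst) for lst in lists), default=0)
    while position < longest:
        for lst in lists:
            # Slicing past the end of a short list yields nothing.
            yield from lst[position:position + n]
        position += n

flat = list(round_robin_by(3, [1, 2, 3, 4, 5, 6], list("abcdef")))
```

Here `flat` starts with three items from the first list, then three from the second, before returning to the first list.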

Ed, er, Larry

Brian Ingerson

Nov 4, 2002, 10:27:56 PM
to perl6-l...@perl.org, Michael Lazzaro, Larry Wall, Brent Dax, Garrett Goebel, Ken Fox, Damian Conway, perl6-l...@perl.org
On 04/11/02 17:52 -0800, perl6-language-return-12241-ingy=ttul...@perl.org wrote:
> [Note to all: yes, this is me, despite the weirdities of the quoting
> and headers. This is how it looks when I'm using mutt out of the box,
> because I haven't yet customized it like I have pine. But I do like
> being able to see my own Unicode characters, not to mention everyone
> else's. If you don't believe this is me, well, I'll just tell you that
> I live on a tropical island near Antarctica, my social security number
> is 987-65-4321, and my mother's maiden name was the same as my maternal
> grandfather's maiden name. Or something like that... --Ed]

Mutt?

I'm using mutt and I still haven't had the privilege of correctly viewing one
of these unicode characters yet. I'm gonna be really mad if you say you're
also using an OS X terminal. I suspect that it's my horrific OS X termcap
that's misbehaving here.

Aargh!

Brian

>
> On Mon, Nov 04, 2002 at 02:25:08PM -0800, Michael Lazzaro wrote:
> > On Monday, November 4, 2002, at 11:58 AM, Larry Wall wrote:
> > >You know, separate streams in a for loop are not going to be that
> > >common in practice, so maybe we should look around a little harder for
> > >a supercomma that isn't a semicolon. Now *that* would be a big step
> > >in reducing ambiguity...
> >
> > Or more than one type of supercomma, e.g:
> >

> > for @x I @y I @z -> $x I $y I $z { ... }


> >
> > to mean:
> > for @x ; @y ; @z -> $x ; $y ; $z { ... }
>
> That almost works visually.
>
> > - vs -
> >
> > for @x § @y § @z -> $x § $y § $z { ... }
> >
> > to mean:
> >
> > for @x -> $x {
> > for @y -> $y {
> > for @z -> $z {
> > ...
> > }
> > }
> > }
> >
> > ;-)
>
> Glad you put the smiley. I think the latter is much clearer.
>
> But at the moment I'm thinking there's something wrong about any
> approach that requires a special character on the signature side.
> I'm starting to think that all the convolving should be specified
> on the left. So in this:
>
> for parallel(@x, @y, @z) -> $x, $y, $z { ... }
>
> the signature specifies that we are expecting 3 scalars to the sub,
> and conveys no information as to whether they are generated in parallel
> or serially. That's entirely specified on the left. The natural
> processing of lists says that serial is specified like this:
>
> for @a, @b, @c -> $x, $y, $z { ... }
>
> Of course, parallel() is a rotten thing to have to say unless you're
> into readability. So we could still have some kind of parallelizing

> supercomma, maybe even P (U+2225 PARALLEL TO). But let's keep it


> out of the signature, I think. In other words, if something like
>

> for @x P @y P @z -> $x, $y, $z { ... }
>
> is to work, then
>
> @result = @x P @y P @z;


>
> has to interleave @x, @y, and @z. It's not special to the C<for>.
> In the case of C<for>, of course, the compiler should feel free to
> optimize out the actual construction of an interleaved array.
>

> I suppose it could be argued that P is really spelled »,« or some such.

Brent Dax

Nov 4, 2002, 10:10:05 PM
to Larry Wall, perl6-l...@perl.org
Larry Wall:
# for @x ∥ @y ∥ @z -> $x, $y, $z { ... }

Even if you decide to use UTF-8 operators (which I am Officially
Recommending Against), *please* don't use this one. This shows up as a
box in the Outlook UTF-8 font.

--Brent Dax <bren...@cpan.org>
@roles=map {"Parrot $_"} qw(embedding regexen Configure)

Wire telegraph is a kind of a very, very long cat. You pull his tail in
New York and his head is meowing in Los Angeles. And radio operates
exactly the same way. The only difference is that there is no cat.
--Albert Einstein (explaining radio)

Damian Conway

Nov 4, 2002, 10:44:48 PM
to perl6-l...@perl.org
Larry wrote:

> I've actually got my eye on ≈ (U+2248 ALMOST EQUAL TO) as a
> replacement for ~~ someday in the distant future.
>
> I suppose it could be argued that we should use ≅ (U+2245
> APPROXIMATELY EQUAL TO) instead. That's what =~ was supposed to
> represent, after all...

Yeah, either of those work. But neither is entirely satisfactory, since
there's nothing "almost" or "approximate" about the matching the operator
does. We obviously need a unicode "IS LIKE UNTO" codepoint. ;-)


> You know, separate streams in a for loop are not going to be that
> common in practice, so maybe we should look around a little harder for
> a supercomma that isn't a semicolon. Now *that* would be a big step
> in reducing ambiguity...

Amen.


> Even if we limit ourselves to Latin1 for now,

Which I suspect we should seriously consider. Maybe leave 9+ bit
operators to Perl 7. ;-)


> I'd avoid using standard signs like multiply × and divide ÷ for
> non-standard purposes though. (Not that we can exactly use multiply
> even for its standard purpose--there's an awfully heavy resemblance
> between × and x, at least in the typical sans serif font.)

That's why I semi-seriously suggested replacing C<x> by C<×>.
For some reason alphabetic operators (at least, those that are
pretending to be symbols) really bug me.


> It would be really funny to use cent ¢, pound £, or yen ¥ as a sigil, though...

Hmmmm. Given that a pound is worth more than a dollar, maybe £ is the sigil
for pairs.

;-)


Damian

Damian Conway

Nov 4, 2002, 11:21:54 PM
to perl6-l...@perl.org
Larry wrote:

> But at the moment I'm thinking there's something wrong about any
> approach that requires a special character on the signature side.
> I'm starting to think that all the convolving should be specified
> on the left. So in this:
>
> for parallel(@x, @y, @z) -> $x, $y, $z { ... }
>
> the signature specifies that we are expecting 3 scalars to the sub,
> and conveys no information as to whether they are generated in parallel
> or serially. That's entirely specified on the left. The natural
> processing of lists says that serial is specified like this:
>
> for @a, @b, @c -> $x, $y, $z { ... }
>
> Of course, parallel() is a rotten thing to have to say unless you're
> into readability. So we could still have some kind of parallelizing
> supercomma, maybe even ∥ (U+2225 PARALLEL TO).

I'd rather we not use that. I found it surprisingly hard to
distinguish ∥ from ||. May I suggest that this might be the opportunity
to deploy ¦ (i.e. E<brvbar>)?


> But let's keep it
> out of the signature, I think. In other words, if something like
>
> for @x ∥ @y ∥ @z -> $x, $y, $z { ... }
>
> is to work, then
>
> @result = @x ∥ @y ∥ @z;
>
> has to interleave @x, @y, and @z. It's not special to the C<for>.

Very nice. The n-ary "zip" operator.

> I suppose it could be argued that ∥ is really spelled »,« or some such.
> However,
>
> @result = @x »,« @y »,« @z;
>
> just doesn't read quite as well for some reason.

Agreed.


> A slightly better case could be made for
>
> @result = @x `|| @y `|| @z;

Except by those who suffer FIABCB (font-induced apostrophe/backtick
character blindness).


> The reason we originally munged with the signature was so that we
> could do weird things with differing numbers of streams on the left
> and the right. But if you really want a way to take 3 from @x, then
> 3 from @y, then 3 from @z, there should be something equivalent to:
>
> for round_robin_by_3s(@x, @y, @z) -> $x, $y, $z { ... }

Or perhaps just:

sub take(int $n, *@from) {
yield splice @from, 0, $n while @from > $n;
return ( @from, undef xx ($n-@from) )
}

&three = &take.assuming(n=>3);

for three(@x), three(@y), three(@z) -> $x, $y, $z { ... }

???


> Fooling around with signature syntax for that rare case is not worth it.
> This way, the C<for> won't have to know anything about the signature other
> than that it expects 3 scalar arguments. And Simon will be happ(y|ier)
> that we've removed an exception.

....and reinstituted the previous exception that a semicolon in a parameter
list marks the start of optional parameters! :-)


Damian

Dan Kogai

Nov 5, 2002, 3:31:29 AM
to perl6-l...@perl.org
On Tuesday, Nov 5, 2002, at 04:58 Asia/Tokyo, Larry Wall wrote:
> It would be really funny to use cent ¢, pound £, or yen ¥ as a sigil,
> though...

Which 'yen'? I believe you already know \ (U+005C REVERSE SOLIDUS)
is printed as a yen sign on most Japanese platforms, so yen is
already everywhere :)

One big problem with introducing Unicode operators is that there are too
many symbols that look the same but have different code points (the Unicode
Consortium did this to make its capitalist members happy, so that the
proprietary symbols in their legacy codes are preserved). Therefore I
object to the idea of making any Unicode operator "standard", however
advanced that particular operator would be. At the same time, things
like "use (more) operators => taste;" are very welcome, i.e.

use operators => "smooth";
$hashref = ♀%hash # U+2640 FEMALE SIGN
$value = $hashref♂{key}; # U+2642 MALE SIGN

> People who believe slippery slope arguments should never go skiing.

I don't want perl6 to be as "tough" as skiing, though.

> On the other hand, even the useful slippery slopes have "beginner"
> slopes. I think one advantage of using Unicode for advanced features
> is that it *looks* scary. So in general we should try to keep the
> basic features in ASCII, and only use Unicode where there be dragons.

Heck. We already have source filters in perl5 and I'm pretty much sure
someone will just invent yet another 'use operators => "ascii";' kind
of stuff in perl6. I thought "use English" was already enough.

> It will certainly be possible to write APL in Perl, but if you do,
> you'll get what you deserve.

And even APL has j. Methinks the question is now whether you make APL
out of j or j out of APL.

弾 the ♂ with Too Many Symbols to Deal With

P.S. Here is even wilder idea than Unicode operators. Why don't we
just make perl6 XML-based and allow inline objects to be operators?

<perl>
$two = $one <operator src="plus.png"> $one;
</perl>

..... Yuck!

Richard Proctor

Nov 5, 2002, 4:48:13 AM
to perl6-l...@perl.org
This UTF discussion has got silly.

I am sitting at a computer that is operating in native Latin-1 and is
quite happy - there is no likelihood that UTF* will ever reach it.

The guillemets are coming through fine, but most of the other hieroglyphs
leave a lot to be desired.

Let's consider the coding comparisons.

Chars in the range 128-159 are not defined in Latin-1 (issue 1) and are
used differently by Windows (later issues), so they should be avoided.

Chars in the range 160-191 (which include the guillemets) are coming through
fine if encoded by the sender as UTF8.

Anything in the range 192-255 is encoded differently and thus should be
avoided.

Therefore the only additional characters that could be used, that will work
under UTF8 and Latin-1 and Windows are:

Code  Symbol  Comment
160           Non-breaking space (map to normal whitespace)
161   ¡       Could be used
162   ¢       Could be used
163   £       Could be used
164   ¤       Could be used
165   ¥       Could be used
166   ¦       Could be used
167   §       Could be used
168   ¨       Could be used though risks confusion with "
169   ©       Could be used
170   ª       Could be used (but I dislike it as it is alphabetic)
171   «       May well be used
172   ¬       "Not"?
173   ­       Soft hyphen; treat the same as "-"
174   ®       Could be used
175   ¯       May cause confusion with _ and -
176   °       Could be used
177   ±       Introduces an interesting level of uncertainty? Usable
178   ²       To the power of 2 (squaring?) Otherwise best avoided
179   ³       Cubing? Otherwise best avoided
180   ´       Too confusing with ' and `
181   µ       Could be used
182   ¶       Could be used
183   ·       Dot product? Though likely to be confused with .
184   ¸       Treat as ,
185   ¹       To the power 1? Probably best avoided
186   º       Could be used (but I dislike it as it is alphabetic)
187   »       May well be used
188   ¼       Could be used
189   ½       Could be used
190   ¾       Could be used
191   ¿       Could be used
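
The split between the 160-191 and 192-255 ranges can be checked with a short Python sketch. The helper name `seen_as_latin1` is made up for the example. It shows what a Latin-1 reader sees when the sender encodes as UTF-8; whether the extra lead byte is displayed or dropped depends on the receiving software:

```python
def seen_as_latin1(char):
    """Encode char as UTF-8, then decode those bytes as if they were Latin-1."""
    return char.encode("utf-8").decode("latin-1")

# Code 171 (left guillemet), inside the 160-191 range: UTF-8 is C2 AB,
# so the second byte is the same as the Latin-1 code and the symbol survives.
guillemet = seen_as_latin1("\u00ab")

# Code 233 (e acute), inside the 192-255 range: UTF-8 is C3 A9,
# so neither byte matches the Latin-1 code and the symbol is lost.
e_acute = seen_as_latin1("\u00e9")
```

For codes 160-191 the UTF-8 form is the byte C2 followed by the Latin-1 byte itself, which is why those symbols remain recognisable, while codes 192-255 use C3 followed by a byte 64 lower than the Latin-1 code.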

Richard

--
Personal Ric...@waveney.org http://www.waveney.org
Telecoms Ric...@WaveneyConsulting.com http://www.WaveneyConsulting.com
Web services Ric...@wavwebs.com http://www.wavwebs.com
Independent Telecomms Specialist, ATM expert, Web Analyst & Services

Jonathan Scott Duff

Nov 5, 2002, 10:15:28 AM
to Damian Conway, perl6-l...@perl.org
On Tue, Nov 05, 2002 at 03:21:54PM +1100, Damian Conway wrote:
> Larry wrote:
> > But let's keep it
> > out of the signature, I think. In other words, if something like
> >
> > for @x ∥ @y ∥ @z -> $x, $y, $z { ... }
> >
> > is to work, then
> >
> > @result = @x ∥ @y ∥ @z;
> >
> > has to interleave @x, @y, and @z. It's not special to the C<for>.
>
> Very nice. The n-ary "zip" operator.

Um ... could we have a zip functor as well? I think the common case
will be to pull N elements from each list rather than N from one, M
from another, etc. So, in the spirit of timtowtdi:

for zip(@a,@b,@c) -> $x,$y,$z { ... } # one at a time
for zip(@a,@b,@c,3) -> $x,$y,$z { ... } # three at a time

zip() would interleave its array arguments one at a time by
default and N at a time if the last argument is a number. Then the
RHS of the arrow just tells perl (and us) how many things to pull from
the resultant list. This would, of course, lead to strange things
like this though:

for zip(@a,@b,2) -> $x,$y,$z { ... }

but perl is always giving us enough rope. Besides ... someone may
want/need those semantics.

> Or perhaps just:
>
> sub take(int $n, *@from) {
> yield splice @from, 0, $n while @from > $n;
> return ( @from, undef xx ($n-@from) )
> }
>
> &three = &take.assuming(n=>3);
>
> for three(@x), three(@y), three(@z) -> $x, $y, $z { ... }

Or if we generalized zip() a little:

for weave(@a,2,@b,1) -> $x,$y,$z { ... }

Which would take 2 elements from @a, and one from @b, until both
arrays were exhausted.

I'm just casting for alternatives to the punctuative versions in case
I hit something that's really good :-)

-Scott
--
Jonathan Scott Duff
du...@cbi.tamucc.edu

Matthew Zimmerman

Nov 5, 2002, 11:16:29 AM
to Austin Hastings, perl6-l...@perl.org
On Mon, Nov 04, 2002 at 12:26:56PM -0800, Austin Hastings wrote:
>
> > Of course, I also think I'm allowed to be a little inconsistent in
> > forcing things like ?op? on people. After all, there's gotta be

> > some advantage to being the Fearless Leader...
>
> Which kind of begs the question: Who are you? And can you authenticate
> that which you just implicitly claimed? (See quote header, above, if
> you don't understand my question)

That message got cc:'ed to me, and according to the headers I
got, somebody either cracked 'wall.org' or that's the real Larry.
Looks like he just switched to mutt and has a little bit of
config tweaking yet to do. ;)

--
Matt

Matthew Zimmerman
Interdisciplinary Biophysics, University of Virginia
http://www.people.virginia.edu/~mdz4c/

Ken Fox

Nov 5, 2002, 11:36:45 AM
to du...@pobox.com, Damian Conway, perl6-l...@perl.org
Jonathan Scott Duff wrote:

> Um ... could we have a zip functor as well? I think the common case
> will be to pull N elements from each list rather than N from one, M
> from another, etc. So, in the spirit of timtowtdi:
>
> for zip(@a,@b,@c) -> $x,$y,$z { ... }

sub zip (\@:ref repeat{1,}) {
my $max = max(map { $_.length } @_);
my $i = 0;
while ($i < $max) {
for (@_) {
yield $_[$i]
}
++$i
}
return ( )
}

That prototype syntax is probably obsolete, but I'm not sure
what the current proposal is. It might be better to force scalar
context on the args so that both arrays and array refs can be
zipped.

I really like the idea of using generic iterators instead of
special syntax. Sometimes it seems like we're discussing 6.x
instead of just 6.0.

This iterator is nice too:

sub pairs (\@a, \@b) {
my $max = max(@a.length, @b.length);
my $i = 0;
while ($i < $max) {
yield @a[$i] => @b[$i];
++$i
}
return ( )
}

for pairs (@a, @b) {
print .x, .y
}

- Ken

Michael Lazzaro

Nov 5, 2002, 12:56:27 PM
to Richard Proctor, perl6-l...@perl.org
Thanks, I've been hoping for someone to post that list. Taking it one
step further, we can assume that the only chars that can be used are
those which:

-- don't have an obvious meaning that needs to be reserved
-- appear decently on all platforms
-- are distinct and recognizable in the tiny font sizes
used when programming

Comparing your list with mine, with some subjective editing based on my
small courier font, that chops the list of usable operators down to
only a handful:

> Code Symbol Comment
> 167 § Could be used
> 169 © Could be used


> 171 « May well be used
> 172 ¬ "Not"?

> 174 ® Could be used


> 176 ° Could be used
> 177 ± Introduces an interesting level of uncertainty? Useable

> 181 µ Could be used
> 182 ¶ Could be used

> 186 º Could be used (but I dislike it as it is alphabetic)
> 187 » May well be used

> 191 ¿ Could be used

That's all. A shame, because some of the others have very interesting
possibilities:

• ≠ ø † ∑ ∂ ƒ ∆ ≤ ≥ ∫ ≈ Ω ‡ ± ˇ ∏ Æ

But if Windows can't easily do them, that's a pretty big problem.
Thanks for the list.

MikeL

Jonathan Scott Duff

Nov 5, 2002, 2:29:45 PM
to Michael Lazzaro, Richard Proctor, perl6-l...@perl.org

I'm all for one or two unicode operators if they're chosen properly
(and I trust Larry to do that since he's done a stellar job so far),
but what's the mechanism to generate unicode operators if you don't
have access to a unicode-aware editor/terminal/font/etc.? Is the only
recourse to use the "named" versions? Or will there be some sort of
digraph/trigraph/whatever sequence that always gives us the operator
we need? Something like \x[263a] but in regular code and not just
quote-ish contexts:

$campers = $a \x[263a] $b # make $a and $b happy
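
To make the idea concrete, here is a hypothetical Python sketch of a pre-pass that expands such escapes before the code is parsed. The `\x[...]` form is taken from the example above; the function name, the regular expression, and the decision to leave out-of-range values untouched are assumptions for illustration only:

```python
import re

# Matches \x[HEX] with one to six hexadecimal digits.
ESCAPE = re.compile(r"\\x\[([0-9A-Fa-f]{1,6})\]")

def _replace(match):
    code_point = int(match.group(1), 16)
    # Leave anything beyond the Unicode range untouched rather than fail.
    return chr(code_point) if code_point <= 0x10FFFF else match.group(0)

def expand_escapes(source):
    """Replace every \\x[HEX] escape in the source text with its character."""
    return ESCAPE.sub(_replace, source)

expanded = expand_escapes(r"$campers = $a \x[263a] $b")
```

A real implementation would have to decide whether escapes inside string literals are expanded too; this sketch does not distinguish code from quoted text.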

Smylers

Nov 5, 2002, 4:13:33 PM
to perl6-l...@perl.org
Richard Proctor wrote:

> I am sitting at a computer that is operating in native Latin-1 and is
> quite happy - there is no likelihood that UTF* will ever reach it.
>

> ... Therefore the only additional characters that could be used, that
> will work under UTF8 and Latin-1 and Windows ...

What about people who don't use Latin-1, perhaps because their native
language uses Latin-2 or some other character set mutually exclusive
with Latin-1?

I don't have a Latin-2 ('Central and East European languages') typeface
handy, but its manpage includes:

253 171 AB LATIN CAPITAL LETTER T WITH CARON
273 187 BB LATIN SMALL LETTER T WITH CARON

"Caron" is sadly missing from my dictionary so I'm not sure what those
would look like, but I suspect they wouldn't be great symbols for vector
operators.
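
The clash is easy to reproduce. A minimal Python sketch, using the standard `latin-1` and `iso8859_2` codecs, decodes the same two bytes under each character set:

```python
# The two byte values that Latin-1 uses for the guillemets.
raw = bytes([0xAB, 0xBB])

as_latin1 = raw.decode("latin-1")    # left and right guillemets
as_latin2 = raw.decode("iso8859_2")  # capital and small T with caron
```

The bytes on the wire are identical in both cases; only the reader's assumed character set changes what is displayed.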

> 171 « May well be used

Also I wonder how similar to doubled less-than or greater-than signs
guillemets would look. In this font they're fine, but I'm concerned at
my abilities to make them sufficiently distinguishable on a whiteboard,
and whether publishers will cope with them (compare a recent discussion
on 'use Perl' regarding curly quotes and "fi" ligatures appearing in
code samples).

Smylers

Damian Conway

Nov 5, 2002, 4:40:26 PM
to perl6-l...@perl.org
Scott Duff wrote:

>>Very nice. The n-ary "zip" operator.
>
> Um ... could we have a zip functor as well?

Yes, I expect so. Much as C<|>, C<&>, and C<^> will be operator versions
of C<any>, C<all>, and C<one>.

And I'd suggest that it be implemented something like:

sub zip(ARRAY *@sources; $by = 1) {
if exists $by && all(@sources).isa(PAIR) {
warn "Useless 'by' argument (every array already has a count)";
}
else {
for @sources { $_ = $_=>$by unless .isa(PAIR) }
}
my @zipped;
while any(@sources).key {
push @zipped, splice(.key, 0, .value) for @sources;
}
return @zipped;
}


> So, in the spirit of timtowtdi:
>
> for zip(@a,@b,@c) -> $x,$y,$z { ... } # one at a time
> for zip(@a,@b,@c,3) -> $x,$y,$z { ... } # three at a time

As implied above, I think the N-at-a-time behaviour would be better
mediated by an optional named parameter. So that second one should be:

for zip(@a,@b,@c,by=>3) -> $x,$y,$z { ... } # three at a time

> Or if we generalized zip() a little:
>
> for weave(@a,2,@b,1) -> $x,$y,$z { ... }
>
> Which would take 2 elements from @a, and one from @b, until both
> arrays were exhausted.

As Buddha Buck suggested elsewhere, and as I have coded above,
I would imagine that this functionality would be mediated by pairs
and merged into a single C<zip> function. So that last example is just:

for zip(@a=>2,@b=>1) -> $x,$y,$z { ... }
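
For readers who want to experiment with these semantics, here is a rough Python analogue. The name `zip_by`, the use of `(list, count)` tuples in place of Perl pairs, and the rejection of counts below 1 are all assumptions of this sketch rather than part of the proposal:

```python
def zip_by(*sources, by=1):
    """Interleave lists, taking a fixed number of items from each per round.

    Each source is either a list (take `by` items per round) or a
    (list, count) tuple, standing in for the @a=>2 pair notation.
    """
    plan = []
    for source in sources:
        items, count = source if isinstance(source, tuple) else (source, by)
        if count < 1:
            raise ValueError("each count must be at least 1")
        plan.append((list(items), count))  # copy so callers' lists are untouched

    zipped = []
    while any(items for items, _ in plan):
        for items, count in plan:
            zipped.extend(items[:count])
            del items[:count]
    return zipped

one_at_a_time = zip_by([1, 2], [3, 4], [5, 6])
two_and_one = zip_by(([1, 2, 3, 4], 2), (["a", "b"], 1))
```

Unlike the Perl version above, this sketch copies its inputs instead of splicing them destructively, and it does not warn when `by` is given alongside per-list counts.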

Damian

Smylers

Nov 5, 2002, 3:57:45 PM
to perl6-l...@perl.org
Dan Kogai wrote:

> We already have source filters in perl5 and I'm pretty much sure
> someone will just invent yet another 'use operators => "ascii";' kind
> of stuff in perl6.

I think that's backwards to have operators being funny characters by
default but requiring explicit declaration to use well-known Ascii
characters.

Doing it t'other way round would mean that you can always write fully
portable code fragments in pure Ascii, something that'd be helpful on
mailing lists and the like.

There could be an alias syntax for people in an environment where they'd
prefer to have a non-Ascii character in place of a conglomerate of Ascii
symbols, maybe:

treat '»...«' as '[>...<]';

That has the documentational advantage that any non-Ascii character used
in code must be declared earlier in that file. And even if the
non-Ascii character gets warped in the post and displays oddly for you,
you can still see what the author intended it to do.
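
A toy Python sketch of how such an alias table might be applied, mapping the declared non-Ascii characters back to their Ascii spellings. The table contents follow the '»...«' example above; the function name and the plain text-substitution approach are assumptions for illustration:

```python
# Declared aliases: non-Ascii character -> Ascii spelling.
ALIASES = {"\u00bb": "[>", "\u00ab": "<]"}

def to_ascii(source, aliases=ALIASES):
    """Rewrite every aliased character in the source as its Ascii form."""
    for fancy, plain in aliases.items():
        source = source.replace(fancy, plain)
    return source

portable = to_ascii("@a \u00bb+\u00ab @b")
```

Plain substitution like this would also rewrite guillemets inside string literals, so a real implementation would need to work on the token stream rather than on raw text.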

This has the risk that Damian described of everybody defining their own
operators, but I think that's unlikely. There's likely to be a
convention used by many people, at least those who operate in a given
character set. This way also permits those who live in a Latin 2 (or
whatever) world to have their own convention using characters that make
sense to them.

Smylers

Richard Proctor

Nov 5, 2002, 5:19:07 PM
to Smylers, perl6-l...@perl.org
On Tue 05 Nov, Smylers wrote:
> Richard Proctor wrote:
>
> > I am sitting at a computer that is operating in native Latin-1 and is
> > quite happy - there is no likelihood that UTF* will ever reach it.
> >
> > ... Therefore the only additional characters that could be used, that
> > will work under UTF8 and Latin-1 and Windows ...
>
> What about people who don't use Latin-1, perhaps because their native
> language uses Latin-2 or some other character set mutually exclusive
> with Latin-1?


Once you go beyond Latin-1 there is nothing in common anyway. The guillemets
become T and t with inverted hats under Latin-2, oe and G with an inverted
hat under Latin-3, oe and G with a squiggle under it under Latin-4, no
meaning and a stylised K for Latin-5, (can't find Latin-6), guillemets under
Latin-7, nothing under Latin-8.
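
The effect described above can be checked with a small Python sketch that decodes the same two bytes under several single-byte character sets. The loop uses Python's codec names for the ISO 8859 parts; reading "Latin-N" as ISO 8859-N is an assumption here, and the two numbering schemes do not line up for every part.

```python
# The two guillemet bytes as a Latin-1 author would send them.
raw = bytes([0xAB, 0xBB])

# Python's codec names for several ISO 8859 parts.
for codec in ("iso8859_1", "iso8859_2", "iso8859_3", "iso8859_4",
              "iso8859_5", "iso8859_7"):
    # errors="replace" substitutes U+FFFD where a part leaves a byte unassigned.
    print(codec, raw.decode(codec, errors="replace"))

# The same two characters take four bytes in UTF-8.
print("«»".encode("utf-8"))
```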

Michael Lazzaro

Nov 5, 2002, 5:25:39 PM
to Damian Conway, perl6-l...@perl.org

On Tuesday, November 5, 2002, at 01:40 PM, Damian Conway wrote:
> As Buddha Buck suggested elsewhere, and as I have coded above,
> I would imagine that this functionality would be mediated by pairs
> and merged into a single C<zip> function. So that last example is just:
>
> for zip(@a=>2,@b=>1) -> $x,$y,$z { ... }

Interesting... We still have "step" to deal with, as well, so I'd like
to figure out how to throw all this together into one basket, if
possible. (I know it's reaching, I'm just trying to extend this as far
as it'll go.)

So how could we say "take 2 elements from @a, stepping 10 indices at a
time, plus one from @b"? One obvious way is adverbially, but we have
two possible adverbs:

for @a:{ by => 2, step => 10 } ; @b -> $x, $y, $z { ... }

... which is perhaps ugly, but unambiguous.

The point being that I don't know how we associate more than one adverb
with a construct, or if we can. But it would seem profoundly useful in
situations like this one.
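
One possible reading of "by" and "step" can be sketched as a Python generator. This is only an illustration under stated assumptions: the weave name, the (sequence, by, step) triple, and the rule of stopping as soon as any source cannot supply a full chunk are all hypothetical choices, not anything proposed in the thread.

```python
def weave(*sources):
    """Each source is a (sequence, by, step) triple.

    Every round takes `by` consecutive items from each sequence, starting at
    that sequence's cursor, then moves the cursor forward by `step`.
    Iteration stops as soon as any sequence cannot supply a full chunk.
    """
    cursors = [0] * len(sources)
    while True:
        row = []
        for i, (seq, by, step) in enumerate(sources):
            chunk = seq[cursors[i]:cursors[i] + by]
            if len(chunk) < by:
                return
            row.extend(chunk)
            cursors[i] += step
        yield tuple(row)

a = list(range(30))
b = ["x", "y", "z"]
# Two elements from a, stepping 10 indices at a time, plus one from b.
for x, y, z in weave((a, 2, 10), (b, 1, 1)):
    print(x, y, z)
```

Carrying both settings in one triple sidesteps the question of how to attach two adverbs to a single array, at the cost of a less declarative call. Each step is assumed to be positive.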

MikeL

Flaviu Turean

Nov 5, 2002, 5:24:19 PM
to perl6-l...@perl.org
one more data point from a person who lived, travelled and used computers
in a few countries (Romania, France, Germany, Belgium, UK, Canada, US,
Holland, Italy). paraphrasing:

rule 1: if it's not on my keyboard, it doesn't exist;
rule 2: if it's not on everybody's keyboard, it doesn't exist.


long, windy argument:

1. enter an internet cafe in Amsterdam, read your account in the web
browser. you get a window, it's hard to guess which OS is underneath. all
you get is a browser window, full screen. you are on the perl6-language
mailing list. before even contributing to the list you need to configure
your keyboard, and you have to figure out how. and you have to trust the
OS and browser installation to correctly transfer the funnies;

2. different keyboards have different symbols on them. did you know that
the UK keyboard is different from the US one? Belgium has two national
keyboards (Walloon and Flemish), the Walloon one is different from the one
used in France (and from the one used in Quebec), the Flemish one
different from the one used in Holland, and so on;

3. backquote is not on all keyboards, similarly the curlies. some have a
funny quote (oblique), which doesn't transfer/translate well, and which,
visually, seems fine until you run it through the interpreter;

4. "> everybody is doing it! first one is free!"
actually, it is like the other favourite pastime: "everybody is doing it,
but the first time hurts the most" (of the people ;-)

setting it up is difficult, afterwards yes, it may come up fine for more
symbols;

5. if you want to wait for the computing platforms before programming in
p6, then there is quite a wait ahead. how about platforms which will never
catch up? VMS, anyone?

6. "> they'll catch up with p6 and employ Unicode, or they'll die"
or the other way 'round;

7. I type this on a Solaris box, telnet'd into a Linux box, I run pine
(please _do_not_ ask people to change application so that they become
worthy of reading your messages!). accented letters don't go through;

8. << and >> are not exactly common in non-Latin scripts. one more alien
symbol to learn for those who started their lives in scripts like Chinese,
Japanese, Hindi, Arabic, etc.;

9. "> now you have the set-up of a six-year old Swiss"
can the six-year old explain how he did it?

10. fearless leaders listen to their constituency and act accordingly,
this is the only way they can remain fearless.

still reading?
flaviu


Damian Conway

Nov 5, 2002, 5:46:42 PM
to perl6-l...@perl.org
Michael Lazzaro asked:

> So how could we say "take 2 elements from @a, stepping 10 indices at a
> time, plus one from @b"?

I think it's overreaching to try and fold this into C<zip>.
I'd suggest that hyperslicing @a within a C<zip> will probably
take care of that (presumably uncommon) case.

Damian

Michael Lazzaro

Nov 5, 2002, 5:44:32 PM
to perl6-l...@perl.org

As one of the instigators of this thread, I submit that we've probably
argued about the Unicode stuff enough. The basic issues are now known,
and it's known that there's no general agreement on any of this stuff,
nor will there ever be. To wit:

-- Extended glyphs might be extremely useful in extending the operator
table in non-ambiguous ways, especially for "advanced" things like «op».

-- Many people loathe the idea, and predict newcomers will too.

-- Many mailers & older platforms tend to react badly for both viewing
and inputting.

-- If extended characters are used at all, the decision needs to be
made whether they shall be least-common-denominator Latin1, UTF-8, or
full Unicode, and if there are backup spellings so that everyone can
play.

It's up to Larry, and he knows where we're all coming from. Unless
anyone has any _new_ observations, I propose we pause the debate until
a decision is reached?

MikeL

Damian Conway

Nov 5, 2002, 6:19:17 PM
to perl6-l...@perl.org
Scott Duff wrote:

That would probably be:

$campers = $a \c[263a] $b # make $a and $b happy

if it were allowed (which I suspect it mightn't be, since it looks
rather like an index on a reference to the value returned by a call
to the subroutine C<c>).

Incidentally, this is why I previously suggested that we might
allow POD escapes in code as well. Thus:

$campers = $a E<263a> $b # make $a and $b happy
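
As a rough illustration of how such escapes could be expanded before parsing, here is a Python sketch. The expand_pod_escapes helper is a hypothetical name, and the sketch reads the digits as hexadecimal because that is what the example above implies; it is not a description of how Perl or POD actually processes E<...>.

```python
import re

def expand_pod_escapes(source):
    """Replace each E<hex> escape with the character at that code point."""
    return re.sub(r"E<([0-9A-Fa-f]+)>",
                  lambda m: chr(int(m.group(1), 16)),
                  source)

print(expand_pod_escapes("$campers = $a E<263a> $b"))
```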

Damian

Damian Conway

Nov 5, 2002, 6:21:19 PM
to perl6-l...@perl.org
Michael Lazzaro proposed:

> It's up to Larry, and he knows where we're all coming from. Unless
> anyone has any _new_ observations, I propose we pause the debate until a
> decision is reached?

I second the motion!

Damian

David Dyck

Nov 6, 2002, 1:54:01 AM
to perl6-l...@perl.org

The first message had many of the following characters viewable in my
telnet window, but the repost introduced a 0xC2 prefix to the 0xA7 character.
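
The extra 0xC2 byte is what a re-encoding of the section sign from Latin-1 to UTF-8 would produce. A short Python check, offered only as an illustration of the byte values and not as a diagnosis of what the mail software did:

```python
section = "§"                       # U+00A7 SECTION SIGN

latin1 = section.encode("latin-1")  # one byte: 0xA7
utf8 = section.encode("utf-8")      # two bytes: 0xC2 0xA7

print(latin1.hex(), utf8.hex())

# Reading the UTF-8 bytes as Latin-1 shows the stray extra character.
print(utf8.decode("latin-1"))
```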

I have this feeling that many people would vote against posting all these
funny characters, as it does make reading the perl6 mailing lists difficult
in some contexts. Ever since these UTF-8 > 127 characters were introduced
into this mailing list, I can never be sure of what the posting author
intended to send. I'm all for supporting UTF-8 characters in strings,
and perhaps even in variable names, but do we really have to have
perl6 programs with core operators in UTF-8? I'd like to see all
the perl6 code that has UTF-8 operators start with use non_portable_utf8_operators.

As it stands now, I'm going to have to find new tools for my linux platform
that has been performing fine since 1995 (perl5.9 still supports libc5!),
and I don't yet know how I am
going to be able to telnet in from win98, and I'll bet that the dos kermit that I
use when I dial up won't support UTF-8 characters either.

David

ps.

I just read how many people will need to upgrade their operating systems
if they want to upgrade to MS Word11.

Do we want to require operating system and/or many support tools to
be upgraded before we can share perl6 scripts via email?


On Tue, 5 Nov 2002 at 09:56 -0800, Michael Lazzaro <mlaz...@cognitivity.co...:

> > Code  Symbol  Comment
> > 167   §       Could be used
> > 169   ©       Could be used
> > 171   «       May well be used
> > 172   ¬       "Not"?
> > 174   ®       Could be used
> > 176   °       Could be used
> > 177   ±       Introduces an interesting level of uncertainty? Useable
> > 181   µ       Could be used
> > 182   ¶       Could be used
> > 186   º       Could be used (but I dislike it as it is alphabetic)
> > 187   »       May well be used
> > 191   ¿       Could be used

Larry Wall

Nov 6, 2002, 1:43:11 PM
to Brian Ingerson, perl6-l...@perl.org, Michael Lazzaro, Brent Dax, Garrett Goebel, Ken Fox, Damian Conway
On Mon, Nov 04, 2002 at 07:27:56PM -0800, Brian Ingerson wrote:
: Mutt?

:
: I'm using mutt and I still haven't had the privilege of correctly viewing one
: of these unicode characters yet. I'm gonna be really mad if you say you're
: also using an OS X terminal. I suspect that it's my horrific OS X termcap
: that's misbehaving here.
:
: Aargh!

I'm using mutt version 1.4i. The stock mutt on my RedHat wasn't new enough.

Larry

Larry Wall

Nov 6, 2002, 1:56:53 PM
to Ken Fox, du...@pobox.com, Damian Conway, perl6-l...@perl.org
On Tue, Nov 05, 2002 at 11:36:45AM -0500, Ken Fox wrote:
: Jonathan Scott Duff wrote:
:
: >Um ... could we have a zip functor as well? I think the common case
: >will be to pull N elements from each list rather than N from one, M
: >from another, etc. So, in the spirit of timtowtdi:
: >
: > for zip(@a,@b,@c) -> $x,$y,$z { ... }
:
: sub zip (\@:ref repeat{1,}) {
:     my $max = max(map { $_.length } @_);
:     my $i = 0;
:     while ($i < $max) {
:         for (@_) {
:             yield $_[$i]
:         }
:         ++$i
:     }
:     return ( )
: }
:
: That prototype syntax is probably obsolete, but I'm not sure
: what the current proposal is. It might be better to force scalar
: context on the args so that both arrays and array refs can be
: zipped.

You never have to put \ into a signature anymore--that's the default.
You only get list context (and flattening) when you use the splat.
For a recurring scalar context, you want something like:

sub zip (@refs is repeatedly (Array)) {

The exact syntax is subject to change, of course.

: I really like the idea of using generic iterators instead of
: special syntax. Sometimes it seems like we're discussing 6.x
: instead of just 6.0.
:
: This iterator is nice too:
:
: sub pairs (\@a, \@b) {
:     my $max = max(@a.length, @b.length);
:     my $i = 0;
:     while ($i < $max) {
:         yield @a[$i] => @b[$i];
:         ++$i
:     }
:     return ( )
: }
:
: for pairs (@a, @b) {
:     print .x, .y
: }

Neither of these work on arrays which have a finite but unknown length.
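
The objection is that both versions ask for the maximum length up front. A sketch of a zip that never asks for a length, written in Python with the standard itertools.zip_longest, might look like the following. The lazy_zip name and the choice of None as padding are assumptions for illustration.

```python
from itertools import zip_longest

def lazy_zip(*iterables, fill=None):
    """Interleave items from any iterables without asking for their lengths.

    Shorter inputs are padded with `fill` until the longest one is exhausted.
    """
    for row in zip_longest(*iterables, fillvalue=fill):
        yield from row

def countdown(n):
    # A source whose length is only known once it has been consumed.
    while n > 0:
        yield n
        n -= 1

print(list(lazy_zip(countdown(3), iter("ab"))))
```

Each input is consumed one item at a time, so a generator or a stream works as well as an array.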

Larry

Brad Hughes

Nov 6, 2002, 4:27:02 PM
to perl6-l...@perl.org
Flaviu Turean wrote:
[...]

> 5. if you want to wait for the computing platforms before programming in
> p6, then there is quite a wait ahead. how about platforms which will never
> catch up? VMS, anyone?

Not to start an OS war thread or anything, but why do people still have
this mistaken impression of VMS? We have compilers and hard drives and
networking and everything. We even have color monitors. Sure, we lack
a decent c++ compiler, but we consider that a feature. :-)

brad

Dan Sugalski

Nov 7, 2002, 4:00:28 PM
to perl6-l...@perl.org

Lacking a decent C++ compiler isn't necessarily a strike against
VMS--to be a strike against, there'd actually have to *be* a decent
C++ compiler...
--
Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski even samurai
d...@sidhe.org have teddy bears and even
teddy bears get drunk

Kurt D. Starsinic

Nov 7, 2002, 4:30:14 PM
to Dan Sugalski, perl6-l...@perl.org
On Nov 07, Dan Sugalski wrote:
> Lacking a decent C++ compiler isn't necessarily a strike against
> VMS--to be a strike against, there'd actually have to *be* a decent
> C++ compiler...

Doesn't VMS have a /bin/false?

- Kurt

fear...@figaro.weizmann.ac.il

Nov 10, 2002, 4:00:03 PM
to Michael Lazzaro, Larry Wall, Brent Dax, Garrett Goebel, Ken Fox, Damian Conway, perl6-l...@perl.org
Larry Wall writes:
> But at the moment I'm thinking there's something wrong about any
> approach that requires a special character on the signature side.
> I'm starting to think that all the convolving should be specified
> on the left. So in this:
>
> for parallel(@x, @y, @z) -> $x, $y, $z { ... }
>
> the signature specifies that we are expecting 3 scalars to the sub,
> and conveys no information as to whether they are generated in parallel
> or serially. That's entirely specified on the left.


if I understand correctly, the main problems with the Apocalypse version
of "for" are:

* the need for a special meaning of ";" in the block signature
* the need to specify unifying/intersection/other behaviour
* not everybody is happy with the stream vs block argument alignment
possibilities

one solution, to which the thread converged (??), is essentially to give
simple ways to "weave" many streams into a single one, so that "for"
always becomes "single-stream". this is essentially the old @a ^| @b proposal
written in English. it does not solve the "alignment" problem. also,
it seems (but maybe I am wrong) that there is run-time overhead,
since "weaving", if done explicitly, takes additional time ~ length of
the arrays. this will not happen if "for" notices one of the "weaving"
functions and optimizes it away. So that means that we will have to
have a "standard" set of "weaving" functions recognizable by "for".

so possibly we can revive the multistream "for" if we wrap this
behaviour around "loop" and "given", something like this:

loop {
    given each @a -> $x {
        given each @b -> $y {
            given each @c -> $z {
                last loop if undef $x|$y|$z
                ....

}}}}

this is already valid perl6 syntax if the array @a has an iterator method
similar to a hash's (maybe it is called @a.next or @a.iter),
and if "each" notices how many arguments the closure expects.

as it is, it looks weird, and we lose the fact that the "for" loop is a
*single* topicalizer scope (here we have to break out of 3 of them to get
out). also, the topic inside the ... is the *last* argument $z and
not the first, as it would be for a "usual" for.

so strictly speaking, this is not "wrapping around" -- this is just
valid (??) syntax. but maybe it *is* possible to somehow wrap the
multistream behaviour around the loop - given pair.

I don't know. maybe a new keyword "stream":

loop {
    stream @a -> $x,$y {
        stream @b -> $z {
            stream @c -> $alpha,$beta {
                last loop if undef $x|$y|$z
                ....

}}}}

and "stream" does not set the topicalizer scope. it seems that
"stream" is just a function, and so it does not automatically
create a topicalizer scope.

or maybe "each" is sort of redundant inside given and we have

loop {
    given @a -> [$x,$y ] {
        given @b -> [$z ] {
            given @c -> [$alpha,$beta] {
                last loop if undef $x|$y|$z
                ....

}}}}

but then @a will have to remember its current index,
and given will have to be aware of it. maybe it's too much for given.
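
The bookkeeping worry can be illustrated in Python, where the position lives in a separate iterator object rather than in the array itself. This is only a sketch under assumptions: take and lockstep are hypothetical helpers, None stands in for undef, and the chunk sizes (2, 1, 2) copy the "stream" example above.

```python
def take(iterator, n):
    """Pull n items from iterator, padding with None once it runs dry."""
    return [next(iterator, None) for _ in range(n)]

def lockstep(a, b, c):
    ia, ib, ic = iter(a), iter(b), iter(c)   # each iterator keeps its own position
    while True:
        x, y = take(ia, 2)
        (z,) = take(ib, 1)
        alpha, beta = take(ic, 2)
        if x is None or y is None or z is None:   # "last loop if undef $x|$y|$z"
            return
        yield x, y, z, alpha, beta

for row in lockstep([1, 2, 3, 4], "pq", [10, 20, 30, 40]):
    print(row)
```

Since None doubles as the end marker, a genuine None in the data would end the loop early; a real design would need a distinct sentinel.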

>
> for round_robin_by_3s(@x, @y, @z) -> $x, $y, $z { ... }


>
> Fooling around with signature syntax for that rare case is not worth it.
> This way, the C<for> won't have to know anything about the signature other
> than that it expects 3 scalar arguments. And Simon will be happ(y|ier)
> that we've removed an exception.
>

and for this type of thing there is always the "weaving" possibility.

arcadi .

fear...@figaro.weizmann.ac.il

Nov 15, 2002, 8:55:57 AM
to perl6-l...@perl.org
Larry Wall writes:
>
> It would be really funny to use cent ¢, pound £, or yen ¥ as a
> sigil, though...
>
> > C'mon, everybody's doing it! First one's free, kid... ;-)

>
> People who believe slippery slope arguments should never go skiing.
>

just (re)reading *old* threads:

is it possible to extend the perl sigil behaviour?

that is, one day somebody decides they need ¢ as a sigil for a certain
class of variables. will it be possible to do (without rewriting
the whole of perl)?

e.g.

my ¢a = ... ;

and this being the same as

my ??a is Cent_sigil_type ;

like

my $a ;

is same as

my $a is Scalar ;

(as I understand, perl knows what to do with $a not because it
notices the '$' at the beginning every time, but because it notices the
compile-time property of that variable, "is Scalar")

I am not sure if that is *all* a sigil is about in perl, but if yes, then
adding a new sigil will be doable: just add one more property to all
variables starting with ¢, e.g. (and provide the corresponding
functionality -- that is a black hole!).


so it seems that the sigil *is* extensible (at least through some sort
of filtering).

e.g.

I can force all variables starting with 'A' to be constant.
now 'A' is a special sigil.

(can I ???)
(probably this is something perl should avoid somehow)

arcadi .

Austin Hastings

Nov 15, 2002, 9:27:19 AM
to fear...@figaro.weizmann.ac.il, perl6-l...@perl.org

--- fear...@figaro.weizmann.ac.il wrote:
> e.g.
>
> I can force all variables starting with 'A' to be constant.
> now 'A' is a special sigil.
>
> (can I ???)
> (probably this is something perl should avoid somehow)

And by extension, you can force all variables starting with 'hwpstr' to
be a certain class.

Congratulations! You've breathed new life into Hungarian notation
(compiler enforced, no less!) and assured yourself of future employment
at Microsoft.

:-) :-) :-)

=Austin



Damian Conway

Nov 16, 2002, 9:07:08 PM
to perl6-l...@perl.org
Arcadi asked:

> is it possible to extend the perl sigil behaviour?

Yes.


> that is, one day somebody decides they need ¢ as a sigil for a certain
> class of variables. will it be possible to do (without rewriting
> the whole of perl)?

Yes. Just inherit the standard Perl grammar, extend the C<var> rule and
install the derived grammar as the caller's parser.
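
The shape of that recipe (inherit, override one rule, use the derived parser) can be mimicked in a toy Python sketch. Everything here is a stand-in: StandardGrammar, CentGrammar and the regex-based var rule are hypothetical, and none of it reflects the real Perl 6 grammar machinery.

```python
import re

class StandardGrammar:
    """Toy stand-in for a grammar with a single overridable `var` rule."""
    sigils = "$@%"

    def var(self):
        # A variable is one sigil followed by an identifier.
        return re.compile("[" + re.escape(self.sigils) + r"][A-Za-z_]\w*")

class CentGrammar(StandardGrammar):
    """Derived grammar: same rule, one extra sigil."""
    sigils = StandardGrammar.sigils + "¢"

code = "my ¢a = $b + @c"
print(StandardGrammar().var().findall(code))
print(CentGrammar().var().findall(code))
```

Only the derived grammar recognises ¢a as a variable; the standard one is left untouched for code that does not opt in.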

Damian
