In addition, it doesn't automatically bind to T like we were making ::T
do, so you have to use it consistently:
sub sametype (¢T $x, ¢T $y) {...}
Within a larger scope, you can always alias, though:
::T := ¢T;
Larry
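For readers who don't know Perl 6, here is a loose Python analogy for what the proposed signature does. It is only a sketch: Python has no type-capturing sigil, and the function body and error message below are illustrative assumptions, not part of the proposal. The type of the first argument is captured, and the second argument must have exactly that type.

```python
def sametype(x, y):
    """Capture the type of x and require y to have exactly the same type.

    Rough analogy for `sub sametype (¢T $x, ¢T $y)`: the local name
    `captured` plays the role of the type variable T.
    """
    captured = type(x)  # bind the type of the first argument
    if type(y) is not captured:
        raise TypeError(
            f"expected {captured.__name__}, got {type(y).__name__}"
        )
    return captured


if __name__ == "__main__":
    print(sametype(1, 2).__name__)      # both arguments are int
    try:
        sametype(1, "one")              # int versus str
    except TypeError as err:
        print("rejected:", err)
```

The check uses `is` on the exact type rather than `isinstance`, so a subclass instance would be rejected; whether ¢T should behave that way is not settled in this thread.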
How about an ASCII version and/or a class() built-in that means the same
thing?
¢T == class(T) == ?T
                  ^
                  |
                  Dunno what to put there....
-John
1. What does it look like? I've never used a cent sign, and have seen
several.
2. How can it be typed with X character composition, vim's digraphs and
major international keyboards?
3. What is the ASCII equivalent?
4. Why not ^, which is available?
5. Why is the sigil needed? Pairs do well without, too.
Juerd
Suggestion: 1c
'c' is an invalid character in numbers, and currently only numbers can
begin with a digit.
1cFoo
The 1 provides an extra visual hint of the cheapness of the class.
Or the euro symbol, which also has a C in it. It doesn't always have to
be American ;) It's in iso-8859-15, which is compatible enough with
iso-8859-1 to support ¥ and both « and ». (I hope those turn out as Y,
<< and >>'s pretty equivalents.)
Steve Peters
st...@fisharerojo.org
All non-ASCII operators have ASCII equivalents:
¥ Y
« <<
» >>
I'm sure ¢ will have its equivalent too.
(It's ^KCt in vim, btw)
Part of the reason for picking it is that we want to discourage people
from using it unless they're experts. But it's in Latin-1, so it's not
going to be any harder than the other Latin-1 characters we've used
to type.
Larry
The idea of punishing programmers who choose to use certain operating systems
or locales just doesn't seem right to me.
Steve Peters
st...@fisharerojo.org
That's why we provide ugly ASCII workarounds for all of them. We just
haven't decided what the appropriate ugly ASCII workaround for ¢ should be.
Larry
So...no joy on the class(T) builtin/macro/whatever? Does it look too much
like a cast?
-John
c| or C| maybe.
Larry
And a nice side effect of that is that declaring the invocant ¢T
doesn't commit to whether you are thinking in a class-based or
prototype-based model. And you wouldn't care until you got down
to a .clone or a .bless.
Larry
It looks too much like a class declaration, and we're not declaring
a class. We're just declaring a variable that holds something that
"does class".
Larry
But
sub c { ... }
sub d { ... }
if $foo eq c|d { ... }
Another thing I didn't mention is that that binds both the variable
and its class. But the $ variable is of course optional after the
type, so you could just write that
sub sametype (¢T, ¢T) {...}
if you don't actually care about $x and $y. Basically, ¢T captures
the type of the associated scalar in any lvalue or declarative context,
whether or not the scalar itself is captured.
Sorry for all the short notes--we still don't know how long this OSCON
net will be up before they take it down.
Larry
Other suggestions welcome.
Larry
So it's a type position thing if it can be. Good. (I wonder if,
since it's allowed in term position, we will come up with ambiguities)
How about this:
sub foo(c|T $x) {
    my sub util (c|T $in) {...}
    util($x)
}
Is that c|T in util() a new, free type variable, or am I asserting
that the type of util()'s argument must be the same type as $x?
Luke
Would c! be an option?
--
schnee
In current Perl 6: Yes, because infix ! does not exist.
But several people want ! to be a chainy none() constructor, and this
would destroy a dream.
You seem to be forgetting that we do have the longest token rule. So,
the only way this destroys a dream (and likewise, the only way c|
doesn't work), is if you have the poor package or class name c and you
insist on writing c|d or c!d without spaces.
Still, if you'd like to make a suggestion instead of just telling us
why our ideas don't work in very specific circumstances, feel free.
Luke
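The longest token rule can be illustrated with a toy tokenizer. This is a minimal sketch in Python, not how any Perl 6 parser is implemented; the token set and the function name `tokenize` are assumptions made up for the example. At each position the longest matching token wins, so `c|d` is read as the sigil `c|` followed by `d`, while `c | d` with spaces is read as three tokens.

```python
def tokenize(source, tokens):
    """Greedy longest-match tokenizer; whitespace only separates tokens."""
    # Try longer tokens first so that "c|" beats "c" at the same position.
    ordered = sorted(tokens, key=len, reverse=True)
    result = []
    i = 0
    while i < len(source):
        if source[i].isspace():
            i += 1
            continue
        for tok in ordered:
            if source.startswith(tok, i):
                result.append(tok)
                i += len(tok)
                break
        else:
            raise ValueError(f"no token matches at position {i}")
    return result


# Hypothetical token set: a sub named c, a sub named d, infix |, and the c| sigil.
TOKENS = ["c|", "c", "d", "|"]

if __name__ == "__main__":
    print(tokenize("c|d", TOKENS))    # ['c|', 'd']
    print(tokenize("c | d", TOKENS))  # ['c', '|', 'd']
```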
I don't know ... since we're still using ::T for classy things, I'd
kind of like to see something with a : in it. I also get the feeling
that these are type/class placeholders, so I wouldn't mind a ^ either.
Here are some suggestions:
:$T
:^T
^^T
:&T
$::T
$:T
[T] # these next 3 don't evoke "variable" as much as
<T> # parametric type (ala C++)
(T)
And yes, I know several of those are already "taken". I'm suggesting
that we at least think about reassigning them.
-Scott
--
Jonathan Scott Duff
du...@pobox.com
> All non-ASCII operators have ASCII equivalents:
>
> ¥ Y
> « <<
> » >>
Speaking of which, the advantage of, say, « over << is that the former is
_one_ character. But Y, compared to ¥, is one character only as well,
and is even more visually distinctive with most fonts I know of, afaict,
so is there any good reason to keep the latter as the "official" one?!?
Michele
> : > c| or C| maybe.
[snip]
> : if $foo eq c|d { ... }
>
> Other suggestions welcome.
<| maybe? And what will we make |> do?
Michele
c\
Then you get
sub sametype (c\T $x, c\T $y) {...}
Not exactly pretty though. c\T
Actually, I think ^ might be my favorite so far.
sub sametype (^T $x, ^T $y) {...}
--
Eric
> The idea of punishing programmers who choose to use certain operating system
> or locales just doesn't seem right to me.
Haven't they already acclimated to the punishment of those operating
systems?
-- c
I'd like to be able to use these without whitespace, and I expect it to
be commonly written without whitespace for simple cases, because 1|2|3
isn't any less clear than 1 | 2 | 3, while it's a lot easier to type.
> Still, if you'd like to make a suggestion instead of just telling us
> why our ideas don't work in very specific circumstances, feel free.
I've already suggested two. Is that not enough?
(a) ^
(b) 1c
I have rarely considered working in Eclipse or WSAD to be a punishment, but
I still can't type a Latin-1 sigil on its editor. Here on my OpenBSD box,
I can't even cut and paste a Latin-1 sigil here in mutt. There are many
things that I get punished for by being required to use Windows at work,
but I've not programmed in a language that punishes me for the characters
available on my system.
Since there have been some concerns regarding the lack of suggestions in this
thread, my suggestion is to avoid non-ASCII sigils completely. There are
a couple of reasons I see for this.
The first reason is efficiency. I started programming with Perl 5 because of
its efficiency. The lack of the code-compile-run loop helped to shorten development
and feedback times. The fact that I had to go to Google to figure out how to
type a cent character doesn't bode well for my efficiency in Perl 6. The best
way I can see currently on my current desktop setup is:
* Start up Microsoft Word
* Type the character as ALT-155 (the 155 must be typed on the numeric
keypad)
* Copy and paste the character into my editor
Like the old joke goes "Doctor, Doctor, it hurts when I try to type a Latin-1
character." "So don't try to type Latin-1 characters!" Instead, many
programmers will use the ASCII equivalents, which require additional
keystrokes. Ideally, a lazy programmer will develop shortcuts to make this
easier, but this, of course, takes time and the right editor.
The second reason is in educating the average programmer. There may be books
written on Perl 6 that don't explain the ASCII equivalents for the Latin-1
sigils and vice-versa. If you don't think that will be the case, let's take
Perl 5 as the example. There are many beginning Perl 5 books, even those
written by reputable authors, that treat "for" and "foreach" very differently,
when they are identical in every way. I would hope the book editors will be
good enough to catch the sigil differences in Perl 6, but this seems rather
naive on my part.
These both cause problems with advocacy. The high-end Perl programmers who
these sigils are supposed to be for are also, typically, the best advocates and
the ones trusted in a typical programming shop. If this programmer has to
advocate changes in the entire development environment to get the most efficient
environment they can get along with migrating to Perl 6, this programmer is
going to have a tough fight, especially when competing against the likes of
Java with Eclipse/WSAD or Ruby on Rails.
I have some serious concerns about using Latin-1 sigils within Perl 6 and
the ASCII multi-character aliases. Am I missing something that would
let me see this as an advantage?
Steve Peters
st...@fisharerojo.org
I had the same concern a few months back. I've come to see the light
in this fashion:
1) more and more Perl programmers come from non-English countries.
Heck, the Pugs effort is at least 50% non-US, if not more. None of the
are on US soil and very few of the leaders are US citizens.
2) More and more of us are programming with internationalization
(i18n) in mind. Just recently, I had to edit French text within the
templates of an app I work on. If you haven't already, you will be
doing so in the near future, within the next 3 years.
3) Every editor (with very few exceptions) can display Latin-1
and, with a few more exceptions, can input Latin-1. If your favorite
editor cannot, then that's something to bring up with the authors.
Windows ... yeah. As you pointed out, the old joke goes "Doctor,
it hurts when I use Windows . . . then, don't use Windows!" With the
availability of dual-booting into FreeBSD/Linux (given the
near-complete migration of all the necessary Office products) and both
gvim and emacs having been successfully ported to Win32, there is a
way to do it. gvim on WinXP will do all Latin-1 charset with the vim
keys. (I don't know about emacs, but I'd be shocked if it didn't.) If
your IT department's policy is rigid, a quick discussion with your
manager's manager will solve that problem immediately. Or, the cost of
a few lunches with your favorite IT person will exempt your computer
from the nightly audit. ($50 goes a long way ...)
Personally, I plan on using every single Latin-1 operator I am
given access to. All the cool kids will ...
Rob
You mean additional keystroke. We haven't yet developed any ASCII
equivalent that takes more than two characters. For most cases, the
ASCII equivalents are easier to type than the Latin-1 versions.
However, being a Perl 6 programmer myself, I still use the Latin-1
versions because I like how they look and feel better. But nobody is
forcing you to do the same.
The one thing you have to worry about is if you use an editor that
doesn't support Latin-1 to read somebody else's code. However, many
many popular editors are capable of doing this, and any editor that
doesn't probably will soon. We've been over this and over this.
Also, don't think of the class sigil as a sigil. You won't be writing
it very often. Just think of it as an operator.
My final point: we don't introduce unicode characters lightly. We do
so when we think it is the best symbol for the job, optimizing, for
once, for readability rather than writability. If you don't think the
class sigil should be a unicode character, come up with a better one.
We're not going to say "You're right, Steve. No more unicode sigils!"
until we see a good alternative to the unicode sigil that we have.
Luke
Is this necessary? Isn't putting a variable before another variable
like that in the correct context (subroutine declaration, in this case),
enough to imply that the variable "does Class" ?
While I'm not arguing against another sigil type, I think this would
distinguish it from the other sigils % and @, which are just an implicit
(does Hash) / (does Array), as well as being a part of the unique name,
as I understand it so far.
This makes me wonder which language feature is used to describe sigils
themselves. Can I define my own sigils with their own type
implications?
Sam.
ps, X11 users, if you have any key bound to "AltGr", then "AltGr" + C
might well give you a ¢ sign without any extra reconfiguration.
I say that we should exploit all the Unicode
characters reasonably possible to make for a more
elegant language, and any tools currently behind
will catch up before long.
In this case, I support the use of any
international currency symbol for use as Perl
sigils and/or operators as appropriate. Eg, we
already use $ (dollar; unicode=0024; utf8=24) and
¥ (yen; unicode=00A5; utf8=C2A5), and I suggest
that the next best one to exploit is ¤ (euro;
unicode=20AC; utf8=E282AC), and the next best is
£ (pound; unicode=00A3; utf8=C2A3). In my
experience, the ¢ (cent; unicode=00A3; utf8=C2A3)
is no harder to type than either of those.
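The code points and UTF-8 byte sequences of these currency symbols can be checked mechanically. This is a small Python sketch; the helper name `describe` is made up for illustration.

```python
def describe(ch):
    """Return (code point as 4-digit hex, UTF-8 bytes as hex) for one character."""
    return f"{ord(ch):04X}", ch.encode("utf-8").hex().upper()


SYMBOLS = [
    ("dollar", "\u0024"),
    ("cent", "\u00A2"),
    ("pound", "\u00A3"),
    ("yen", "\u00A5"),
    ("euro", "\u20AC"),
]

if __name__ == "__main__":
    for name, ch in SYMBOLS:
        code_point, utf8 = describe(ch)
        print(f"{name:6} {ch}  unicode={code_point}  utf8={utf8}")
```

Everything here except the dollar sign falls outside ASCII, and the euro sign is also outside Latin-1, which is why it takes three bytes in UTF-8 instead of two.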
In some cases, typing a ¢ is easier than most of
those characters. On a Macintosh keyboard,
typing opt-4 will get a ¢ as shift-4 gets a $.
For that matter, Macintosh keyboards and their
'option' key allows one to type twice as many
characters without entering special codes or
using an input palette as other keyboards having
only a 'shift' key do. So in that respect, if
you want a sigil that is meant to be discouraged
due to being harder to type, then ¢ may be a
worse choice than some other options.
On the other hand, if you want to use the ¢ due
to its being conceptually tied to $, that they
are different units of currency meant to be used
together, then the ¢ is fine.
All this being said, if you explicitly want to
have ASCII alternatives for all Unicode
characters being used, then I suggest it is best
to keep the use of Unicode characters mainly in
operators, because those are always surrounded by
whitespace and can easily be substituted for
latin words.
Whereas, because sigils are always right next to
ordinary word characters, I suggest that they
should always be ASCII characters, or that the
ASCII equivalent should not contain any word
characters. My impression is that sigils
containing alphanumerics just look wrong.
Perhaps a solution here for an ASCII equivalent
is something combining the $ and something else.
How about this twigil, which combines '::' and
'$':
:$:
Does that conflict with anything?
-- Darren Duncan
I think the reason why Larry proposed the "¢" is much simpler - it
looks a bit like a c, which one could associate with "class", similar
to how $ looks like S (scalar) and @ looks like a (array). :)
--
schnee
And in those kinds of corporate environments, you're not going to be
working with any code but code written in-house. Which means that
nobody is going to be using Latin-1, and everyone will be using the
ASCII synonyms. What's the problem?
Please, just use the ASCII synonyms. We've argued over the unicodity
of Perl 6 many times; you're certainly not the first with this
concern. The result every time is: Unicode will be in Perl 6, nobody
is forcing you to use it.
Luke
Surely you aren't suggesting that these non-English speakers do not have
access to the ASCII (or EBCDIC) character sets for their editors, are you?
> 2) More and more of us are programming with internationalization
> (i18n) in mind. Just recently, I had to edit french text within the
> templates of an app I work on. If you haven't already, you will be
> doing so in the near future, within the next 3 years.
I have worked on an app that needed to work with English (US and GB),
German, and Japanese. I do not, however, remember having to write my
code in anything but ASCII.
> 3) Every editor (with very few exceptions) can display Latin-1
> and, with a few more exceptions, can input Latin-1. If your favorite
> editor cannot, then that's something to bring up with the authors.
As I mentioned earlier, most programmers in a corporate environment have
limited access to system settings. Changing them in some cases can cause
reprimands or dismissal. Systems are often set up with the bare minimum
of locales and character sets necessary to do the job. Also, you have to
deal with the situations where programmers are connecting to *nix servers
through a variety of Windows-based XWindows servers (Exceed, Cygwin, etc.),
which complicates what character sets are available immensely.
Also, what settings changes do I need to make to get Latin-1 on
<enter any operating system or editor here>? Welcome to your documentation
nightmare! In Perl 5, we have a nearly impossible time keeping track of where
Microsoft has put their free compiler tools. Now multiply that by the
number of Linux distributions, BSD distributions, and various other operating
systems. Don't forget different versions will do it differently, and have
documentation in different places. Some of the documentation won't even be
available on the Internet, so Perl 6 would need to reference it in some way.
Are you beginning to get the magnitude of the documentation problem?
>
> Windows ... yeah. As you pointed out, the old joke goes "Doctor,
> it hurts when I use Windows . . . then, don't use Windows!"
Well over 95% of the desktop computers in a corporate environment are using
Windows. If you are suggesting Perl 6 ignores Windows, then we should all
start writing Perl 6's obituary. This sort of attitude does nothing to
advance Perl 6.
> With the availability of dual-booting into FreeBSD/Linux (given the
> near-complete migration of all the necessary Office products) and both
> gvim and emacs having been successfully ported to WIn32, there is a
> way to do it. gvim on WinXP will do all Latin-1 charset with the vim
> keys. (I don't know about emacs, but I'd be shocked if it didn't.) If
> your IT department's policy is rigid, a quick discussion with your
> manager's manager will solve that problem immediately. Or, the cost of
> a few lunches with your favorite IT person will exempt your computer
> from the nightly audit. ($50 goes a long way ...)
>
Again, I'd prefer not to be fired. Everything you have written above is
not an option for the majority of the programmers out there. Also, not
too helpful if you write your programs in TSO on an IBM mainframe.
> Personally, I plan on using every single Latin-1 operator I am
> given access to. All the cool kids will ...
Famous last words have never been more finely spoken. Ignoring Windows and
other environments without ready access to Latin-1 seems like a horrible
mistake to me. While the cool kids are playing with their Latin-1 sigils,
programmers in corporate environments where Latin-1 isn't available will
start writing their new systems in Java, Ruby, or .NET.
Steve Peters
st...@fisharerojo.org
And how % looks like h (hash).
I dislike things like "$calar" and "@rray", and now some people will use
"¢lass" in examples.
Please don't let the fact that the sigil looks like a certain letter be a reason.
But I may have to support your code. That's the issue.
>
> The one thing you have to worry about is if you use an editor that
> doesn't support Latin-1 to read somebody else's code. However, many
> many popular editors are capable of doing this, and any editor that
> doesn't probably will soon. We've been over this and over this.
I'd say a lot more editors support ASCII than Latin-1. You are also
assuming that programmers have control over what tools they have available,
and have the ability to upgrade whenever they wish. I've found this to be
very far from reality. I understand that the ability to process the code
as Unicode is an important feature of Perl 6. There is a big difference
between allowing it and requiring it. Writing off a large number of
editors, and even operating systems, seems like a big shot in the foot.
My biggest concern, however, relates to advocacy. There will need to be
books, magazine articles, tutorials, etc. written to announce the arrival
of Perl 6. If the code uses Latin-1 characters, and people are unable to
look at the example code in their favorite editor or type in some of the
example code, we'll lose that person to Perl 6. The other alternative is
to preface every article with the explanation of the separate ASCII/Latin-1
sigils. That doesn't sound practical, and cannot be policed or enforced.
>
> Also, don't think of the class sigil as a sigil. You won't be writing
> it very often. Just think of it as an operator.
>
> My final point: we don't introduce unicode characters lightly. We do
> so when we think it is the best symbol for the job, optimizing, for
> once, for readability rather than writability.
As you mentioned above, readability is a big issue. If I can't tell one sigil
from another, or cannot even see it, how can I support the code?
> If you don't think the
> class sigil should be a unicode character, come up with a better one.
> We're not going to say "You're right, Steve. No more unicode sigils!"
> until wee see a good alternative to the unicode sigil that we have.
~ seems to be available for a sigil, if my reading of S02 is correct, and
the cent sign is replacing :: in all cases. If not (that is $::foo is
still the global variable named foo) then * may also be available.
Steve Peters
st...@fisharerojo.org
Surely you aren't suggesting that your editor doesn't have access to
the Latin-1 charset, are you? Let's take a look at popular editors:
vi - check
emacs - check
eclipse - check
mutt - check (http://www.rano.org/mutt.html)
Notepad - check
A bazillion other editors - check
(http://www.alanwood.net/unicode/utilities_editors.html)
> I have worked on an app that needed to work with English (US and GB),
> German, and Japanese. I do not, however, remember having to write my
> code in anything but ASCII.
No, I had to edit my -templates- and -data files- that included French
text. Some of them could use HTML entities, but the datafiles intended
for the DB couldn't.
> As I mentioned earlier, most programmers in a corporate environment have
> limited access to system settings. Changing them in some cases can cause
> reprimands or dismissal. Systems are often set up with the bare minimum
> of locales and character sets necessary to do the job. Also, you have to
> deal with the situations where programmers are connecting to *nix servers
> through a variety of Windows-based XWindows servers (Exceed, Cygwin, etc.)
> complicates what character sets are available immensely.
I have worked as a contractor in almost a dozen settings, most of them
corporate lockdowns, and I've always been able to go to my manager and
say "To be more productive, I need this tool" and it would be loaded
the next day. The few times I've had to talk to an IT person to
explain the tool, I'd do it over lunch (my treat) and it would be on
my desktop the next morning. Saying you cannot get a tool you need
loaded on your machine is, essentially, saying that you cannot play
corporate politics. I'm assuming you can, which means this is a straw
man.
Rob
They make for good mnemonics, which isn't necessarily a bad thing for
people coming from languages without them or with fewer
- sebastian
Not every installed version of the above can handle Latin-1 by default. Since
many programmers have little control over their installed software, this
remains an issue. Also, the ability to do this within the application is
not well documented within many editors. Finally, most of the above will allow
you to paste in Latin-1 or even UTF-8 data, but the ability to actually
enter it from a keyboard using the editor is a completely different issue.
> > As I mentioned earlier, most programmers in a corporate environment have
> > limited access to system settings. Changing them in some cases can cause
> > reprimands or dismissal. Systems are often set up with the bare minimum
> > of locales and character sets necessary to do the job. Also, you have to
> > deal with the situations where programmers are connecting to *nix servers
> > through a variety of Windows-based XWindows servers (Exceed, Cygwin, etc.)
> > complicates what character sets are available immensely.
>
> I have worked as a contractor in almost a dozen settings, most of them
> corporate lockdowns, and I've always been able to go to my manager and
> say "To be more productive, I need this tool" and it would be loaded
> the next day. The few times I've had to talk to an IT person to
> explain the tool, I'd do it over lunch (my treat) and it would be on
> my desktop the next morning. Saying you cannot get a tool you need
> loaded on your machine is, essentially, saying that you cannot play
> corporate politics. I'm assuming you can, which means this is a straw
> man.
I don't think a programmer's skill (or lack thereof) in corporate politics
should be a prerequisite to experimenting in Perl 6. My bigger point is
about system settings which are typically locked down and not usually
sweet-talkable. Also, getting new software purchased can be painfully
slow depending on the bureaucracy involved, and generally requires lots
of beers and lunches, or the right catastrophe, which could have been
prevented and/or repaired with the tool you want, to speed up the process.
Steve Peters
st...@fisharerojo.org
Sigils can't conflict with unary operators (like, say, the
stringification and flattening operators, ~ and *) and ideally
shouldn't conflict with binary ops either (although % breaks this
rule).
This has been done before several times on p6l, but I'll do it again:
Chr  Term                    Operator
===  ====                    ========
~    Stringify               Concatenate
`    Reserved for user       Reserved for user
!    Not
@    Array sigil             Array sigil
#    Comment                 Comment
$    Scalar sigil            Scalar sigil
%    Hash sigil              Hash sigil, modulo
^    (Not sure)              one() junction
&    Subroutine sigil        all() junction
*    Unary splat             Multiplication
(    Open paren              Subroutine call
)    (technically unused)    Close paren
-    Negate                  Subtract
_    Identifier              (technically unused)
=    Iteration               Assign
+    Numify                  Add
\    Take reference
|                            any() junction
[    Anonymous array         Array index
{    Block                   Hash index
]    (technically unused)    Close square bracket
}    (technically unused)    Close curly bracket
;    (technically unused)    Statement delimiter, anonymous array
:    Pair                    "super comma"
'    Single quotes           (technically unused)
"    Double quotes           (technically unused)
,    (technically unused)    List items
<    quote words             Less than
.    Method call on topic    Method call
>    (technically unused)    Greater than
/    Anonymous rule          Divide
?    Boolify
There are very few unary operators available, and none (besides the
user-defined backticks operator) unused in both term and operator
context.
--
Brent 'Dax' Royal-Gordon <br...@brentdax.com>
Perl and Parrot hacker
For me AltGr + C gives Copyright-symbol "©".
(SuSe 9.1, tested in konsole, kwrite and thunderbird)
--
Markus Laire
> In this case, I support the use of any international currency symbol
> for use as Perl sigils and/or operators as appropriate. Eg, we
> already use $ (dollar; unicode=0024; utf8=24) and ¥ (yen;
> unicode=00A5; utf8=C2A5), and I suggest that the next best one to
> exploit is € (euro; unicode=20AC; utf8=E282AC), and the next best is £
> (pound; unicode=00A3; utf8=C2A3). In my experience, the ¢ (cent;
> unicode=00A2; utf8=C2A2) is no harder to type than either of those.
I haven't read this list for quite a long time, but do we already have
the yen sign as a sigil?
In Japan, there has been a big confusion between backslashes and yen
signs over two decades.
The code point 0x5c is a backslash in ASCII but it is the yen sign in
JISX0201.
When I display ASCII Perl program with my Japanese Windows' notepad, it
shows all the backslashes as yen signs.
Japanese Perl books sometimes tell:
"If you cannot find a backslash on your keyboard, use the yen sign".
Thus we usually think yen = ASCII 0x5c:
my eyes are trained to treat a backslash and a yen sign as the same thing in program code,
and my fingers are trained to hit the yen key when my brain thinks of a
backslash.
It has already become a reflex :P
Yes, I know. Careful configuration of your editor should allow you to
distinguish ASCII 0x5c from JISX0201 0x5c.
But in Japan, only a very keen coding-system/character-set wizard can do
that.
Don't you have similar confusions with the pound sign in ISO-646 British
version?
> the next best is £ (pound; unicode=00A3; utf8=C2A3)
Isn't that 0x23 in UK? I imagine that someday all the comment lines
cause syntax errors in UK...
Sorry if this is an already discussed and solved issue.
--
Kaoru Maeda
ma...@tokyo.pm.org
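The 0x5c and 0x23 confusion described above comes from the same 7-bit value being assigned a different glyph in different national character sets. The following is a rough Python sketch: the two tables are deliberately partial and list only the positions mentioned in this message, and the names `JIS_X0201_ROMAN`, `ISO646_GB` and `render` are made up for illustration.

```python
# Positions where a national 7-bit set shows a different character than US-ASCII.
# Partial tables: only the code points discussed in the thread are listed.
JIS_X0201_ROMAN = {0x5C: "\u00A5"}  # backslash position displays as a yen sign
ISO646_GB = {0x23: "\u00A3"}        # number-sign position displays as a pound sign


def render(data, variant):
    """Show how 7-bit bytes would be displayed under a given national variant."""
    if any(b > 0x7F for b in data):
        raise ValueError("input must be 7-bit")
    return "".join(variant.get(b, chr(b)) for b in data)


if __name__ == "__main__":
    line = b'print "hi\\n"; # comment'
    print(render(line, {}))               # plain ASCII view
    print(render(line, JIS_X0201_ROMAN))  # the escape now shows a yen sign
    print(render(line, ISO646_GB))        # the comment marker shows a pound sign
```

The bytes on disk are identical in all three cases; only the displayed glyph changes, which is why the two characters are so easily treated as one.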
I've just checked the windows Character Map, and ¢ (cent) is ALT-0162
( If it's not in your startmenu, do start -> run -> charmap )
It displays in Eclipse (3.1.1) whether the Text File Encoding is set to
Cp1252 (default) or UTF-8 or ISO-8859-1
Cheers,
Carl
My point is that there is a difference between the source file being in
Unicode and depending on characters outside of ASCII. If someone wants
to code using whatever Unicode characters they want, that's fine. Not
every computer or editor can do Unicode out of the box. The issue
starts when people are required to write code outside of ASCII and that
is not available.
>
> Also it's quite interesting how often was Latin-1 and UTF-8 used in the
> discussion interchangeably;
> "every source is Latin-1" is marginally better than "every source is
> ASCII", but we can do better.
>
> As for keyboard layouts: I don't think there is Yen sign on US keyboard
> either.
And that is as much of an issue.
> bra??o
>
> P.S. this e-mail should be sent in UTF-8.
And I see your name as "bra??o" :)
Actually, both work. That's where the issue with the documentation starts.
>
> It displays in Eclipse (3.1.1) whether the Text File Encoding is set to
> Cp1252 (default) or UTF-8 or ISO-8859-1
Older versions of Eclipse are not able to enter these characters. That's
where the copy and paste comes in.
Steve Peters
st...@fisharerojo.org
That's where upgrades come in.
In non-term, it's not a sigil. There cannot be two subsequent terms.
This is why it makes no sense to want sigils to be free in infix/op
position, and why % and ^ would work well (without ambiguity) as sigils.
> ^ (Not sure) one() junction
^ is available in prefix/term
> ( Open paren Subroutine call
open paren: grouping. The paren is the glyph, not its function.
Also, for the subcall to work, it's not all possible infix/op, but only
postfix with no whitespace in between. Same for other .[] where [] is any
set of brackets, and the dot is implied.
> { Block Hash index
Block/hash
> < quote words Less than
Also, hash subscript.
> There are very few unary operators available, and none (besides the
> user-defined backticks operator) unused in both term and operator
> context.
But that isn't necessary. It's not as if % used in two ways is new, and
was already overstepping a boundary. It's perfectly normal to have one
glyph do very different things according to how/where it's used.
Strange, in any windows app on my machine, ALT-155 prints "o" with a
diagonal line through it (bottom left to upper right).
cent: ¢
not: ø
I wonder if it's a font issue?
Carl
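A likely explanation that does not involve fonts is that the same byte value maps to different characters in different code pages. As far as I know, Alt codes typed without a leading zero go through the system's OEM code page, while codes with a leading zero go through the Windows ("ANSI") code page; treat that as an assumption. The Python sketch below only demonstrates the code page tables themselves, and the helper name `alt_code` is made up for illustration.

```python
def alt_code(value, code_page):
    """Decode a single byte value using the named code page."""
    return bytes([value]).decode(code_page)


if __name__ == "__main__":
    print(alt_code(155, "cp437"))   # US OEM code page: cent sign
    print(alt_code(155, "cp850"))   # Western European OEM code page: o with stroke
    print(alt_code(162, "cp1252"))  # Windows code page, as typed with ALT-0162: cent sign
```

So ALT-155 giving a cent sign on one machine and an o with a stroke on another is consistent with the two machines using code pages 437 and 850 respectively.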
Steve Peters
st...@fisharerojo.org
So, you are proposing that the Perl of the Unicode era be limited to
ASCII because a 15 year old editor cannot handle the charset? That's
like suggesting that operating systems should all be bootable from a
single floppy because not everyone has access to a CD drive.
Rob
Do you even need to ask? It's because it *looks cool* :)
We need *more* of these. I can't wait until the day when I can finally
code in overloaded Tagalog or Gujarati:
>
> But I may have to support your code. That's the issue.
>
Isn't perl6 assuming the source file is in UTF-8 unless explicitly specified
differently?
Also it's quite interesting how often Latin-1 and UTF-8 were used
interchangeably in the discussion;
"every source is Latin-1" is marginally better than "every source is ASCII",
but we can do better.
As for keyboard layouts: I don't think there is Yen sign on US keyboard
either.
I also use Slovak layout, which does not have backtick (only grave accent)
and all sigils but % are written with AltGr. So what. I got used to it.
On the other hand, there is ¤ sign. (That's U+00A4 Currency Sign -- hey, it
looks like little o. If ¢ is maimed c for class, then ¤ may be o for object.
Or universal-unspecified-i-dont-care-sigil.)
braňo
P.S. this e-mail should be sent in UTF-8.
<lurk>
For me too, but AltGr + shift + E gives ¢.
/Stefan Lidman
I'm saying that, since my up-to-date version of vi on my up-to-date OpenBSD
can't type, much less even allow me to paste in, a Latin-1 character, this
is an issue.
>> _one_ character. But Y, compared to ¥, is one character only as well,
>> and is even more visually distinctive with most fonts I know of, afaict,
>> so is there any good reason to keep the latter as the "official" one?!?
>
> Do you even need to ask? It's because it *looks cool* :)
Does it? Guillemets _do_ look kool, but I don't buy the argument for the
Yen symbol...
Michele
--
>Is e+pi a rational or irrational number?
Yes, it is.
- Robert Israel in sci.math, "Re: A Number Problem"
You should report this bug. Hopefully, it will then be fixed before Perl
6 is released.
It looks like a lowercase c with a vertical line through it -- though the
vertical line is often slanted forward, so it looks like a c overtyped with
a slash ("/").
> 2. How can it be typed with X character composition, vim's digraphs and
> major international keyboards?
For vim, use CTRL-K C t
I can't address the other contexts.
=thom
"A painting in a museum hears more ridiculous opinions than anything else in
the world."
Edmond de Goncourt
> Again, I'd prefer not to be fired. Everything you have written above is
> not an option for the majority of the programmers out there. Also, not
> too helpful if you write your programs in TSO on an IBM mainframe.
In general true, but the cent sign was always part of EBCDIC and even
existed on the old card punch machines. It is these newfangled braces
and brackets that are not available on the 3270 terminal. Of course
you don't need them for PL/I. And BCPL uses $( and $) instead of {
and }, which makes it so much easier to type than these new Pascal and
C languages. Well, Pascal also allowed (* and *) for braces; can't
remember what it used for brackets.
Anyways, just pointing out that this is not a new discussion. :)
Cheers,
-Jan
I can't speak for anyone else, but personally I prefer ¥ because I don't
like infix operators that look like identifiers. It's idiosyncratic,
admittedly, but I dislike Pascal's "mod" and Perl5's "x" for the same
reason. Even with the ability to use Unicode names, ¥ can't be an
identifier, because it's not a letter, it's a currency symbol. Now that
we've opened up the Pandora's box of Unicode, we have lots more letters, but
also lots more non-letters, and I'd rather see the latter used for
operators.
Just my 2¢. :)
You're still using the base vi vs. vim?!? I didn't know people did
that when it wasn't 3am on Sunday when trying to fix a borked /etc ...
Huh!
Rob
If you're using stock vi rather than vim or elvis or at least nvi,
"up-to-date" doesn't apply. :) But the pasting problem has more to do with
your windowing and terminal environment, and I'd be surprised if there
weren't a simple tweak that would make it work for you.
I honestly don't know or care what flavor of vi I'm using, since it usually
changes depending on what *nix flavor I'm working on. I also don't think that
it should make a difference what editor I'm using with a programming language.
Others seem to think differently. C'est la vie.
Steve Peters
st...@fisharerojo.org
It won't make a difference. Even if you're in an environment where you
can neither type nor copy'n'paste the cent sign, you can still use the
ASCII version of the sigil. Sure, it's going to be one extra
keystroke, but that's not really a big issue - and even less so when
you consider that you probably won't be using the class sigil as often
as the others, anyway.
The amount of typing that was required for your emails in this thread
so far probably exceeds the amount of extra typing you'll have to do
to use the ASCII version of the sigil for your entire life already. :)
> Steve Peters
> st...@fisharerojo.org
--
schnee
For me, all that you have written above is correct. That still does not
fix the potential advocacy and documentation issues that are created by
this. Someone who is new to Perl 6 after it's released may not know the
difference. That's the problem.
Steve Peters
st...@fisharerojo.org
Thom> On 10/20/05, Juerd <ju...@convolution.nl> wrote:
>> 2. How can it be typed with X character composition, vim's digraphs
>> and major international keyboards?
For X11 composition, where getting into compose state is up to your X
environment:
<compose>/c
In my case (for a more concrete example), that's "<ctrl-alt-space>/c".
--s.
My experience is that this isn't true: we use lots of external code, but
I still need to file requests with IT to get system-settings changed.
That said, I have no objection to Latin-1 sigils. So it's only your
argument that's bogus, not the conclusion ;-).
Brent 'Dax' Royal-Gordon wrote:
> Steve Peters <st...@fisharerojo.org> wrote:
>
>>~ seems to be available for a sigil, if my reading of S02 is correct, and
>>the cent sign is replacing :: in all cases. If not (that is $::foo is
>>still the global variable named foo) then * may also be available.
>
>
> Sigils can't conflict with unary operators (like, say, the
> stringification and flattening operators, ~ and *) and ideally
> shouldn't conflict with binary ops either (although % breaks this
> rule).
My 2¢ is that we should reap ^ from the one junction and promote it to
become the 'runtime type information carrier' sigil---like the wings
on the feet of Hermes/Mercury :)
And we should find an alternative to binary % which isn't very well
defined in its abstract meaning---but I find that the 0/0 connotation
that it spawns in my infinitely twisted brain matches nicely with infinite
precision nums and I get the identities:
Undef ::= 0/0;
One ::= Any/Any # actually $x = any(1..Inf) && 1 == $x/$x
Inf ::= Inf/Inf # the other Undef :)
Type ::= All # the concept that is shared by all instances
# and represented by the one meta representative
and of course some mixed cases like
0 ::= 0/Any
Inf ::= Any/0
The none junction hasn't one single char infix creator either. Also the
all junction is in partial conflict with the & sigil. OTOH, many fear
that junctive auto-threading enters their functions. And the junctions
have got very well picked short names.
In other words a comparison like
if $x != $x { ... }
should *never* hit the nada operator. While
if &x != &x { ... }
could depending on the evaluation of the code &x refers to.
--
$TSa.greeting := "HaloO"; # mind the echo!
It is not necessary (or sane, but that's an opinion) to reap it from the
junction, because that's in infix/op position, while sigils are in
prefix/term position.
In Perl 5:
- % is a sigil and an infix operator
- * is a sigil and an infix operator
- & is a sigil and an infix operator
I do not see why $ and @ couldn't be both a sigil and an infix
operator, and the same goes for whatever ASCII equivalent ¢ gets.
^ and | are available for sigil use. (All the closing brackets are too,
but that would be very confusing because we tend to visually parse those
in pairs.)
Using an infix operator's symbol as a sigil is not weird, not wrong,
not confusing and mostly: not a new idea.
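The term-versus-operator argument can be sketched in a few lines. What follows is a toy classifier in Python, not Perl 6 and not how any real Perl parser is written; the token set, the function name classify and the labels are assumptions made up for illustration. It decides whether a symbol is a sigil or an infix operator purely from whether a term or an operator is expected at that point:

```python
import re

# Toy token grammar (an assumption for this sketch): identifiers, integers,
# and a handful of symbols that could serve as sigil or infix operator.
TOKEN = re.compile(r"\s*(?:(?P<ident>[A-Za-z_]\w*)|(?P<num>\d+)|(?P<sym>[%^|&*]))")

def classify(src):
    """Label every token as 'sigil', 'infix' or 'term' from position alone."""
    tokens = []
    expect_term = True          # at the start of an expression a term is expected
    src = src.strip()
    pos = 0
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at offset {pos}")
        pos = m.end()
        if m.lastgroup == "sym":
            if expect_term:
                # Symbol where a term is expected: it is a sigil, and the
                # name that follows it is still outstanding.
                tokens.append(("sigil", m.group("sym")))
            else:
                # Symbol where an operator is expected: it is infix, and a
                # new term must follow.
                tokens.append(("infix", m.group("sym")))
                expect_term = True
        else:
            if not expect_term:
                raise SyntaxError("two terms in a row")
            tokens.append(("term", m.group(m.lastgroup)))
            expect_term = False
    return tokens

print(classify("%a % %b"))
```

In this sketch the same glyph is labelled differently in "%a % %b" without any lookahead, which is the whole point: the position disambiguates, so a sigil does not have to be reaped from an infix operator.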
Oh good, reduce the number of fears I have of working in a tightly
controlled corporate environment by one... bringing it to 499.
Luke
Rob
Indeed. Somehow I think this makes some sense:
sub Bool eqv (|T $x, |T $y) { ... }
Thanks,
/Autrijus/
Code page 437:
http://www.kostis.net/charsets/cp437.htm
On Fri, Oct 21, 2005 at 06:07:47AM -0500, Steve Peters wrote:
> On Fri, Oct 21, 2005 at 09:42:00AM +0100, Carl Franks wrote:
> > Where did you get ALT-155 from?
> >
> > I've just checked the windows Character Map, and ¢ (cent) is ALT-0162
> > ( If it's not in your startmenu, do start -> run -> charmap )
>
> Actually, both work. That's where the issues with the documentation start.
"what he says"
This is going to be hard to document well.
For example, *I* know why the leading zero is significant on ALT-0162, but
how many people are going to assume that it's not?
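For the record, the leading-zero difference can be reproduced with plain byte decoding. The sketch below assumes the usual Windows convention (Alt codes without a leading zero are looked up in the OEM code page, those with a leading zero in the ANSI code page); the helper name alt_code and its defaults are made up for illustration, and which OEM code page is active depends on the machine's locale:

```python
def alt_code(keys: str, oem: str = "cp437", ansi: str = "cp1252") -> str:
    """Return the character an Alt+<keys> sequence would yield under the
    assumed convention: leading zero -> ANSI code page, otherwise OEM."""
    codepage = ansi if keys.startswith("0") else oem
    return bytes([int(keys)]).decode(codepage)

print(alt_code("155"))               # OEM code page 437
print(alt_code("155", oem="cp850"))  # OEM code page 850
print(alt_code("0162"))              # ANSI code page 1252
```

Under code page 437 byte 155 is the cent sign, under code page 850 the same byte is an o with a stroke, and 0162 is the cent sign in the ANSI page. That would account for both reports in this thread, if the machines involved really do use those OEM pages.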
Anyone care to save to a file called AUX.TXT on Windows?
And for anyone who says "upgrade", please note that many firms in the real
world are still forcing a base perl version of 5.005_03 or 5.6.1 for
development. Still.
The active perl "community" is not wholly representative of the global usage
of perl, and would do well to remember this. For example, see
http://use.perl.org/~barbie/journal/27098
Nicholas Clark
> And for anyone who says "upgrade", please note that many firms in the real
> world are still forcing a base perl version of 5.005_03 or 5.6.1 for
> development. Still.
My weekend project is to demonstrate that you are an optimist. Really.
On Thu, Oct 20, 2005 at 04:02:10PM -0700, Darren Duncan wrote:
> that the next best one to exploit is ¤ (euro;
> unicode=20AC; utf8=E282AC), and the next best is
Woah. You've just demonstrated why Euro is far worse than any of the other
"Unicode" characters so far suggested. Your mail headers say:
Content-Type: text/plain; charset="iso-8859-1" ; format="flowed"
The symbol in your message *as sent* is the international currency symbol,
U+00A4. The Euro symbol is not part of ISO-8859-1.
(ISO-8859-15 yes, but that's about 10 years more recent)
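The mangling is easy to reproduce. This is a minimal sketch using Python's standard codecs; the helper names can_encode and reinterpret are made up for illustration:

```python
EURO = "\u20ac"  # EURO SIGN

def can_encode(text: str, charset: str) -> bool:
    """True if every character of text exists in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False

def reinterpret(text: str, written_as: str, read_as: str) -> str:
    """Encode with one charset and decode the same bytes with another."""
    return text.encode(written_as).decode(read_as)

print(EURO.encode("utf-8").hex())                            # e282ac
print(can_encode(EURO, "iso-8859-1"))                        # False
print(repr(reinterpret(EURO, "iso-8859-15", "iso-8859-1")))  # '¤'
```

The byte 0xA4 that means Euro in ISO-8859-15 is the international currency sign U+00A4 in ISO-8859-1, so anything along the way that assumes ISO-8859-1 silently turns one into the other.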
ISO-8859-1 has been the default standard for the character set on most
Internet protocols for a long time, and many systems for the past 10+ years
have supported it by default (most Unix variants, Windows 3.1, I think.
DOS boxes were CP437, but native Windows was (extended) ISO-8859-1)
This cannot be said for ISO-8859-15. So I can see little reason why any
currently operational system will be incapable of displaying the ISO-8859-1
operators in scripts or CPAN modules correctly, even if the editor constrains
the maintenance programmer (or sysadmin) to entering the ASCII
digraphs.
But there will be a lot of systems out there where this is not true for the
Euro symbol, and the assumption of ISO-8859-1 defaults will mean that this
won't be the last time that Euro symbols are going to get mangled during
transit, with all the ensuing pain, frustration, losses and defections to
other languages that this will cause.
Perl 5 runs everywhere: http://www.cpan.org/ports/index.html
Perl 6 is intended to be an improvement on Perl 5. It would be a shame to
design in restrictions on portability.
Nicholas Clark
U+00A3 "POUND SIGN" is at 0x23 in ISO 646-GB (aka BS 4730), true.
Fortunately, that character set is almost never used. I think the last
time I encountered it was on a dot-matrix printer manufactured in the
1980s.
Hmmm. Encode.pm doesn't seem to have support available for any of the
ISO 646 character sets. I feel a patch coming on.
--
Aaron Crane
Um, that's not what I'm hearing.
To type in a Unicode character requires machinations beyond just
hitting a labelled key on the keyboard. There are no standards
for these machinations - what must be done is different for
Windows vs. Linux, and different for specific applications
(text-mode mutt vs. xvi vs. Eclipse vs. ...).
So, a book can't just show code and expect the reader to be
able to use it, and no book is going to be able to tell all
of its users how to type the characters because there are so
many different ways.
Any serious programmer will be able to sort out how to do
things but casual programmers won't be typing the extended
characters enough to learn how to do it without looking it
up each time. Programmers that use many different computers
and applications will have difficulty remembering which of
the various incantations happen to work on the system they're
currently using. People who do sort out a good working
environment will be at a loss when they occasionally have to do
something on a different system and no longer know how to type
"basic" characters. (But since in their normal environment they
do know how, they may never have known the ASCII workarounds,
so they'll have to look them up.) I've gotten away from
programming enough that I often have to look up a function
or operator definition to check on details; but that is much
less disruptive to the thought process than having to look up
how to type a character.
I think that the reasons for using Unicode characters are good
ones and that there is no good alternative. However, doing
so does make Perl less accessible for casual programmers.
(While we may deride the Learn to Web Program in 5 Minutes
crowd, that did get many people involved with Perl, and I'm
sure some of them evolved beyond those limited roots, just
as an earlier generation of programmers had some who evolved
beyond their having started with Basic into nonetheless becoming
competent and knowledgeable craftsmen.)
We need to have a "Why Unicode is the lesser of evils" document
to refer to whenever this issue rises up again. The genuine
problems involved ensure that the issue will continue to arise,
so we can't just get mad at the people who raise it.
--
Actually, what you point out in my message is a
limitation of my email client, which I didn't
realize existed until now.
I then did a bit of research, and apparently the
newest Eudora doesn't support customization of
what character set messages are composed with,
always sending them using ISO-8859-1. This is
apparently an issue that many Eudora users
have asked to be fixed, but it hasn't been addressed.
This said, sending UTF8 files as email
attachments, rather than UTF8 in the message
body, still works fine, AFAIK, as does
transmitting them by other ways such as http or
ftp etc.
And my normal text editor handles UTF8 correctly.
Also, apparently some other email clients handle UTF8 properly.
So my email client failed me, but my point still
stands that Unicode characters should still be
embraced in Perl 6. I just need to replace my
email client if I want to type them into the
message body.
-- Darren Duncan
The List::MoreUtils CPAN module does provide
this, and is cited in Perl Best Practices as
doing so also. -- Darren Duncan
> Indeed. Somehow I think this makes some sense:
>
> sub Bool eqv (|T $x, |T $y) { ... }
Except that it prevents anyone from ever writing:
multi sub circumfix:<| |> (Num $x) { return abs $x }
multi sub circumfix:<| |> (Vec $x) { return $x.mag }
which many mathematically inclined folks might find annoying.
(It also precludes intriguing possibilities like:
multi sub circumfix:«| >» ($q) { return Quantum::State.new(val => $q) }
which I personally would find irritating. ;-)
Damian
Dave Whipp wrote:
> My experience is that this isn't true: we use lots of external code,
> but I still need to file requests with IT to get system-settings changed.
Right. We rely on Perl libraries from CPAN, and elsewhere. You
have to make sure that the code you are looking at is transferred
via utf-8 aware systems only. It is not safe that we decide to
use ASCII synonyms ourselves. We have to be sure that all the
modules, which happen to have Unicode sigils/ops, should be
installed without intervening legacy systems.
Explanation of the situation in Japan follows. Those who are not
interested in Japan can skip. Seemingly this problem is very unique
to Japan.
(It's already one year since yen sign became zip-operator.
This is not to kick an argument, just a whining of mine. :P)
The problem doesn't reside in writing code but in carrying files.
- You cannot tell whether a text file is in US-ASCII, utf8,
or ShiftJIS, when all the code points are below 0x7f. It
is too late when you receive a code snippet from your
colleague by mail.
- If we convert yen from Latin-1 (0xa5) to Unicode
(utf8=c2a5), then to "the default coding system, which is
believed to be ASCII but actually ShiftJIS", it becomes
0x5c. There's no way to tell whether the byte was a
backslash or a yen at the beginning.
Grepping for yen signs doesn't help because at the time you run
grep, they are already backslashes.
If we find a lot of yen signs as zip-operators in the standard
library, Japanese would have a big question: "Give up either
Perl6 or Windows. Which do we need?" And I suppose the answer
would be "We have a lot of substitutes to Perl6: Ruby, Perl5,
etc."
In <20040321000...@wall.org> Larry wrote:
> (Of course, we'll leave out the little problem that half the people
> in Japan would read it as a backslash wannabe...that's not really
> a problem since a zipper would only be used where an operator is
> expected, and backslash is illegal there (so far).)
It is not the people who read a yen as a backslash, but the
legacy systems. We might define backslash as a synonym for the
zip op, but it's too risky. "Yen as zip" has the same magnitude
of risk in Japan.
--
Kaoru Maeda
ma...@tokyo.pm.org
Dave Whipp wrote:
> My experience is that this isn't true: we use lots of external code,
> but I still need to file requests with IT to get system-settings changed.
Right. We rely on Perl libraries from CPAN, and elsewhere.
You have to make sure that the code you are looking at is
transferred via utf-8 aware systems only.
It is not safe that we decide to use ASCII synonyms ourselves.
We have to be sure that all the modules, which happen to
have Unicode sigils/ops, should be installed without intervening
legacy systems.
Explanation of the situation in Japan follows. Those who are not
interested in Japan can skip. Seemingly this problem is very unique
to Japan. It's already one year since yen sign became zip-operator.
This is not to kick a discussion, just a whining of mine. :P
Ancient ISO-646 allowed variants, which substitute certain part of ASCII characters
with local symbols. Currency signs were the first candidates of this.
http://en.wikipedia.org/wiki/ISO_646
This legacy convention is still alive in Japan as JIS/ShiftJIS encodings.
I hope Unicode supersedes them and the "backslash-yen" confusion would disappear,
but the movement is not quick enough.
The problem doesn't reside in writing code but in carrying files.
- You cannot tell whether a text file is in US-ASCII, utf8,
or ShiftJIS, when all the code points are below 0x7f. It is too
late when you receive a code snippet from your colleague by mail.
- If we convert yen from Latin-1 (0xa5) to Unicode
(utf8=c2a5), then to "the default coding system,
which is believed to be ASCII but actually
ShiftJIS", it becomes 0x5c. There's no way to tell
whether the byte was a backslash or a yen at the beginning.
Grepping for yen signs doesn't help because at the time you
run grep, they are already backslashes.
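The information loss described above can be modelled in a few lines. This is a hand-written sketch, not a real codec: the function to_legacy_jis_roman only imitates the JIS X 0201 (ISO 646-JP) convention of putting the yen sign at 0x5C, and all names in it are made up for illustration.

```python
YEN = "\u00a5"     # YEN SIGN
BACKSLASH = "\\"   # REVERSE SOLIDUS, 0x5C in ASCII

def to_legacy_jis_roman(text: str) -> bytes:
    """Model of saving text in a 7-bit JIS-Roman style slot: the yen sign
    lands on 0x5C, every other ASCII character keeps its code."""
    out = bytearray()
    for ch in text:
        if ch == YEN:
            out.append(0x5C)
        elif ord(ch) < 0x80:
            out.append(ord(ch))
        else:
            raise ValueError(f"no 7-bit slot for U+{ord(ch):04X}")
    return bytes(out)

def read_as_ascii(data: bytes) -> str:
    """What a tool that believes the file is US-ASCII will see."""
    return data.decode("ascii")

zip_expr = f"@a {YEN} @b"
print(YEN.encode("latin-1").hex(), YEN.encode("utf-8").hex())  # a5 c2a5
stored = to_legacy_jis_roman(zip_expr)
print(stored)                                                  # b'@a \\ @b'
print(YEN in read_as_ascii(stored))                            # False
```

Once the bytes are stored this way, a yen that was meant as a zip and a backslash that was meant as an escape are the same byte, which is why grepping afterwards cannot recover the difference.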
If we find a lot of yen sign as zip-operator in the standard library,
we have a big question: "Give up either Perl6 or Windows. Which do we abandon?"
And I suppose the answer would be "We have a lot of substitutes to Perl6:
Ruby, Perl5, etc."
In Japan, yen is a synonym for backslash. We wish to retain this legacy.
Zip-operator is far less important than regex-escape, string-escape, and
take-reference operator.
--
Kaoru Maeda
ma...@tokyo.pm.org
Thank you for raising this issue and sorry for not raising this myself.
On Oct 22, 2005, at 19:42 , Kaoru Maeda wrote:
> If we find a lot of yen sign as zip-operator in the standard library,
> we have a big question: "Give up either Perl6 or Windows. Which do
> we abandon?"
> And I suppose the answer would be "We have a lot of substitutes to
> Perl6:
> Ruby, Perl5, etc."
>
> In Japan, yen is a synonym for backslash. We wish to retain this legacy.
> Zip-operator is far less important than regex-escape, string-
> escape, and
> take-reference operator.
To make the matter worse, there is not just one "yen sign" in
Unicode. Take a look at this.
¥ U+00A5 YEN SIGN
￥ U+FFE5 FULLWIDTH YEN SIGN
Though they look and grok the same to humans, computers handle them
differently. This happened when the Unicode Consortium decided to make
the BMP round-trippable against legacy encodings. They were distinct in
the JIS standards, so they are distinct in Unicode too.
Maybe we should avoid other symbols like this for sigils -- those not
in ASCII that have 'fullwidth' variations. q($) and q(\) are okay
(or too late) because they are already in ASCII. q(¥) should be
avoided because you can hardly tell the difference from q(￥) in the
display.
But this will also outlaw the cent sign. I have attached a list of
those affected. As you see, most are with ASCII equivalents but some
are not.
Dan the Man with Too Many Signs to Deal With
% grep FULLWIDTH /usr/local/lib/perl5/5.8.7/unicore/Name.pl | perl -Mencoding=utf8 -aple '$_=chr(hex($F[0]))."\t".$_'
！ FF01 FULLWIDTH EXCLAMATION MARK
＂ FF02 FULLWIDTH QUOTATION MARK
＃ FF03 FULLWIDTH NUMBER SIGN
＄ FF04 FULLWIDTH DOLLAR SIGN
％ FF05 FULLWIDTH PERCENT SIGN
＆ FF06 FULLWIDTH AMPERSAND
＇ FF07 FULLWIDTH APOSTROPHE
（ FF08 FULLWIDTH LEFT PARENTHESIS
） FF09 FULLWIDTH RIGHT PARENTHESIS
＊ FF0A FULLWIDTH ASTERISK
＋ FF0B FULLWIDTH PLUS SIGN
， FF0C FULLWIDTH COMMA
－ FF0D FULLWIDTH HYPHEN-MINUS
． FF0E FULLWIDTH FULL STOP
／ FF0F FULLWIDTH SOLIDUS
０ FF10 FULLWIDTH DIGIT ZERO
１ FF11 FULLWIDTH DIGIT ONE
２ FF12 FULLWIDTH DIGIT TWO
３ FF13 FULLWIDTH DIGIT THREE
４ FF14 FULLWIDTH DIGIT FOUR
５ FF15 FULLWIDTH DIGIT FIVE
６ FF16 FULLWIDTH DIGIT SIX
７ FF17 FULLWIDTH DIGIT SEVEN
８ FF18 FULLWIDTH DIGIT EIGHT
９ FF19 FULLWIDTH DIGIT NINE
： FF1A FULLWIDTH COLON
； FF1B FULLWIDTH SEMICOLON
＜ FF1C FULLWIDTH LESS-THAN SIGN
＝ FF1D FULLWIDTH EQUALS SIGN
＞ FF1E FULLWIDTH GREATER-THAN SIGN
？ FF1F FULLWIDTH QUESTION MARK
＠ FF20 FULLWIDTH COMMERCIAL AT
Ａ FF21 FULLWIDTH LATIN CAPITAL LETTER A
Ｂ FF22 FULLWIDTH LATIN CAPITAL LETTER B
Ｃ FF23 FULLWIDTH LATIN CAPITAL LETTER C
Ｄ FF24 FULLWIDTH LATIN CAPITAL LETTER D
Ｅ FF25 FULLWIDTH LATIN CAPITAL LETTER E
Ｆ FF26 FULLWIDTH LATIN CAPITAL LETTER F
Ｇ FF27 FULLWIDTH LATIN CAPITAL LETTER G
Ｈ FF28 FULLWIDTH LATIN CAPITAL LETTER H
Ｉ FF29 FULLWIDTH LATIN CAPITAL LETTER I
Ｊ FF2A FULLWIDTH LATIN CAPITAL LETTER J
Ｋ FF2B FULLWIDTH LATIN CAPITAL LETTER K
Ｌ FF2C FULLWIDTH LATIN CAPITAL LETTER L
Ｍ FF2D FULLWIDTH LATIN CAPITAL LETTER M
Ｎ FF2E FULLWIDTH LATIN CAPITAL LETTER N
Ｏ FF2F FULLWIDTH LATIN CAPITAL LETTER O
Ｐ FF30 FULLWIDTH LATIN CAPITAL LETTER P
Ｑ FF31 FULLWIDTH LATIN CAPITAL LETTER Q
Ｒ FF32 FULLWIDTH LATIN CAPITAL LETTER R
Ｓ FF33 FULLWIDTH LATIN CAPITAL LETTER S
Ｔ FF34 FULLWIDTH LATIN CAPITAL LETTER T
Ｕ FF35 FULLWIDTH LATIN CAPITAL LETTER U
Ｖ FF36 FULLWIDTH LATIN CAPITAL LETTER V
Ｗ FF37 FULLWIDTH LATIN CAPITAL LETTER W
Ｘ FF38 FULLWIDTH LATIN CAPITAL LETTER X
Ｙ FF39 FULLWIDTH LATIN CAPITAL LETTER Y
Ｚ FF3A FULLWIDTH LATIN CAPITAL LETTER Z
［ FF3B FULLWIDTH LEFT SQUARE BRACKET
＼ FF3C FULLWIDTH REVERSE SOLIDUS
］ FF3D FULLWIDTH RIGHT SQUARE BRACKET
＾ FF3E FULLWIDTH CIRCUMFLEX ACCENT
＿ FF3F FULLWIDTH LOW LINE
｀ FF40 FULLWIDTH GRAVE ACCENT
ａ FF41 FULLWIDTH LATIN SMALL LETTER A
ｂ FF42 FULLWIDTH LATIN SMALL LETTER B
ｃ FF43 FULLWIDTH LATIN SMALL LETTER C
ｄ FF44 FULLWIDTH LATIN SMALL LETTER D
ｅ FF45 FULLWIDTH LATIN SMALL LETTER E
ｆ FF46 FULLWIDTH LATIN SMALL LETTER F
ｇ FF47 FULLWIDTH LATIN SMALL LETTER G
ｈ FF48 FULLWIDTH LATIN SMALL LETTER H
ｉ FF49 FULLWIDTH LATIN SMALL LETTER I
ｊ FF4A FULLWIDTH LATIN SMALL LETTER J
ｋ FF4B FULLWIDTH LATIN SMALL LETTER K
ｌ FF4C FULLWIDTH LATIN SMALL LETTER L
ｍ FF4D FULLWIDTH LATIN SMALL LETTER M
ｎ FF4E FULLWIDTH LATIN SMALL LETTER N
ｏ FF4F FULLWIDTH LATIN SMALL LETTER O
ｐ FF50 FULLWIDTH LATIN SMALL LETTER P
ｑ FF51 FULLWIDTH LATIN SMALL LETTER Q
ｒ FF52 FULLWIDTH LATIN SMALL LETTER R
ｓ FF53 FULLWIDTH LATIN SMALL LETTER S
ｔ FF54 FULLWIDTH LATIN SMALL LETTER T
ｕ FF55 FULLWIDTH LATIN SMALL LETTER U
ｖ FF56 FULLWIDTH LATIN SMALL LETTER V
ｗ FF57 FULLWIDTH LATIN SMALL LETTER W
ｘ FF58 FULLWIDTH LATIN SMALL LETTER X
ｙ FF59 FULLWIDTH LATIN SMALL LETTER Y
ｚ FF5A FULLWIDTH LATIN SMALL LETTER Z
｛ FF5B FULLWIDTH LEFT CURLY BRACKET
｜ FF5C FULLWIDTH VERTICAL LINE
｝ FF5D FULLWIDTH RIGHT CURLY BRACKET
～ FF5E FULLWIDTH TILDE
｟ FF5F FULLWIDTH LEFT WHITE PARENTHESIS
｠ FF60 FULLWIDTH RIGHT WHITE PARENTHESIS
￠ FFE0 FULLWIDTH CENT SIGN
￡ FFE1 FULLWIDTH POUND SIGN
￢ FFE2 FULLWIDTH NOT SIGN
￣ FFE3 FULLWIDTH MACRON
￤ FFE4 FULLWIDTH BROKEN BAR
￥ FFE5 FULLWIDTH YEN SIGN
￦ FFE6 FULLWIDTH WON SIGN
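For anyone without that Name.pl at hand, roughly the same list can be produced from Python's bundled Unicode database. This is a sketch under two assumptions: the scan is restricted to the Halfwidth and Fullwidth Forms block (U+FF00..U+FFEF), and the function name fullwidth_chars is made up for illustration. The exact rows depend on the Unicode version your Python ships with.

```python
import unicodedata

def fullwidth_chars():
    """Return (character, hex code point, name) for every character in
    U+FF00..U+FFEF whose Unicode name starts with FULLWIDTH."""
    rows = []
    for cp in range(0xFF00, 0xFFF0):
        name = unicodedata.name(chr(cp), "")  # "" for unassigned code points
        if name.startswith("FULLWIDTH"):
            rows.append((chr(cp), f"{cp:04X}", name))
    return rows

for char, code, name in fullwidth_chars():
    print(f"{char}\t{code} {name}")
```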
In addition to your handy table, the >> and << french quotes, which are used
quite heavily in Perl 6 for both bracketing and hyper operators, also have
full width equivalents:
300A;LEFT DOUBLE ANGLE BRACKET;Ps;0;ON;;;;;Y;OPENING DOUBLE ANGLE BRACKET;;;;
300B;RIGHT DOUBLE ANGLE BRACKET;Pe;0;ON;;;;;Y;CLOSING DOUBLE ANGLE BRACKET;;;;
Half width: «»
Full width: 《》
There is no way to type out the half-width yen and double angle brackets under
MSWin32, under either the traditional or simplified code pages; only full width
variants are available.
One way to approach it is to make Perl 6 accept both full- and
half-width variants.
Another way would be to use ASCII fallbacks exclusively in real programs, and
reserve unicode variants for pretty-printing, the same way that PLT Scheme and
Haskell recognize λ in the literature, but actually write "lambda" and "\"
respectively in everyday coding.
TIMTOWTDI. :)
Thanks,
/Autrijus/
Isn't this starting to be the question of why we have the Unicode
operators instead of just functions? Would it be possible to have a
function be infix?
Rob
> If we find a lot of yen signs as zip-operators in the standard
> library, Japanese would have a big question: "Give up either
> Perl6 or Windows. Which do we need?" And I suppose the answer
Hmmm, begins to sound interesting... ;-P
Michele
--
voices
you're letting voices tell you what to do
when you yourself don't know
- Pennywise, "Come Out Fighting".
Luke Palmer wrote:
> On 10/20/05, Larry Wall <la...@wall.org> wrote:
>
>>Another thing I didn't mention is that that binds both the variable
>>and its class. But the $ variable is of course optional after the
>>type, so you could just write that
>>
>> sub sametype (¢T, ¢T) {...}
>>
>>if you don't actually care about $x and $y. Basically, ¢T captures
>>the type of the associated scalar in any lvalue or declarative context,
>>whether or not the scalar itself is captured.
Does this capturing of the type into ¢T also involve runtime
code template expansion? That is, if sametype(Int,Int) didn't
exist it would be compiled on the fly for a call sametype(3,2)?
Which brings up the question if ¢T will be allowed in multi defs?
And how does it influence dispatch then? Can type variables be
constrained with where clauses?
> So it's a type position thing if it can be. Good. (I wonder if,
> since it's allowed in term position, we will come up with ambiguities)
>
> How about this:
>
> sub foo(c|T $x) {
> my sub util (c|T $in) {...}
> util($x)
> }
>
> Is that c|T in util() a new, free type variable, or am I asserting
> that the type of util()'s argument must be the same type as $x?
I would guess there are two distinct ¢foo::T and ¢foo::util::T free
type variables. In the call of util($x) the type reference is handed
or rebound down the call chain just like value refs. BTW, will there
be a topic type ¢_, grammar type ¢/ and the exception type ¢! as well?
What operations are available for type variables? E.g. ¢foo <= ¢bar could
be the subtype relation. But what would ¢foo + ¢bar mean? Is ¢foo - ¢bar
the dispatch distance? Is the compiler obliged to separate type variables
from value variables? Or does
$foo = \¢bar;
produce a type reference? How would that be dereferenced then? Is the type
inferencer in the compiler automatically calculating a supertype bound
for every expression? If yes, how is that accessible?
I think that's up to the implementation. From the language
perspective, no, it behaves as though it was compiled once. But an
implementation is free to instantiate the routine for various types
for optimization.
> Which brings up the question if ¢T will be allowed in multi defs?
Good question. I believe the ordering multi algorithm can be extended
to handle it, but I'll have to think about what it means.
> > So it's a type position thing if it can be. Good. (I wonder if,
> > since it's allowed in term position, we will come up with ambiguities)
> >
> > How about this:
> >
> > sub foo(c|T $x) {
> > my sub util (c|T $in) {...}
> > util($x)
> > }
> >
> > Is that c|T in util() a new, free type variable, or am I asserting
> > that the type of util()'s argument must be the same type as $x?
>
> I would guess there are two distinct ¢foo::T and ¢foo::util::T free
> type variables.
Hmm, yeah, that makes sense, but it can also be annoying. For
instance, in Haskell, I wrote this:
closure :: (Ord a) => (a -> [a]) -> [a] -> [a]
closure f init = closure' Set.empty init
    where
    closure' :: (Ord a) => Set a -> [a] -> [a]
    closure' set [] = []
    closure' set (x:xs) = ...
This gives me a type error on closure', because the inner "a" is
different from the outer "a". Incidentally, there is no signature
that closure' can possibly have. So I was forced to leave off the
signature and let the type inferencer do the work. In this case it
would have been nice to have the variable carry over to inner clauses.
But letting that happen also has problems. You can't freely move code
around, because you depend on the type variables that were bound in
outer scopes. However, if the number of "type topicalizers" (as it
were) is small, then maybe that's okay.
> In the call of util($x) the type reference is handed
> or rebound down the call chain just like value refs. BTW, will there
> be a topic type ¢_, grammar type ¢/ and the exception type ¢! as well?
The topic type ¢_ is discussed in theory.pod. I don't see much use
for the others (there is no @/ or @!, for instance).
> What operations are available for type variables? E.g. ¢foo <= ¢bar could
> be the subtype relation. But what would ¢foo + ¢bar mean?
Nothing.
Perhaps ¢foo (+) ¢bar is a union type, but I don't think it should be.
Again, see theory.pod for formalisms of the difference between things
that are in type variables and the types you declare in the program.
Essentially the things that are in type variables are only
instantiable, concrete types, whereas the types you declare in the
program are more like interfaces. There is no concept of a subtype
in the concrete world; only in the interface world. But theory.pod
isn't gospel (yet ;-).
> Is ¢foo - ¢bar the dispatch distance?
Especially not since that concept doesn't exist anymore.
> Is the compiler obliged to separate type variables from value variables? Or does
>
> $foo = \¢bar;
>
> produce a type reference? How would that be dereferenced then? Is the type
> inferencer in the compiler automatically calculating a supertype bound
> for every expression? If yes, how is that accessible?
Hmm, don't know about that. Exactly how "first-class" are type variables?
Luke
I considered | last week, but decided it was better to hold unary | in
reserve, especially since it's likely to be confusing with junctions.
And if we use | for type set notation, then unary | would preclude
the ability to stack types, and I've been treating an utterance like
my Mammal ¢T $fido where Bark :($a,$b,$c --> Wag)
as having at least five implicitly ANDed type specifications:
must do Mammal
must do Class
must do Scalar
must do Bark
must do Wag
plus there must be three components that are Scalar, plus whatever
extra type constraints Wag puts onto those three components. Having
Mammal |T be ambiguous with Mammal|T would be bad, at least visually.
Anyway, having mulled over all this while off in Amsterdam and
Budapest, my current thinking is that the ascii shortcut for ¢T is
simply "class T", so you could write any of:
sub Bool eqv (¢T $x, T $y)
sub Bool eqv (class ¢T $x, T $y)
sub Bool eqv (Any ¢T $x, T $y)
sub Bool eqv (Any class T $x, T $y)
and mean the same thing.
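As a very loose analogy only (Python, not Perl 6, and with none of the declarative binding described here), the "capture the type from the first parameter, then require it of the second" reading of these signatures could be sketched as follows. The function body and the use of isinstance for "does T" are assumptions for illustration, not the semantics of ¢T:

```python
def eqv(x, y) -> bool:
    T = type(x)               # stands in for the capturing ¢T on $x
    if not isinstance(y, T):  # stands in for the plain T constraint on $y
        raise TypeError(f"expected {T.__name__}, got {type(y).__name__}")
    return x == y

print(eqv(3, 3))
print(eqv("a", "b"))
```

The difference that matters is that here T is an ordinary local computed at call time, whereas in the proposal above the first declarative ¢T installs the symbol T into the scope of the signature and block.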
Basically, ¢T is a close analog of &t, which is the variableish form
for "sub t". When used in a declaration, both of them introduce a
bare name as an alias into whatever scope the declaration is inserting
symbols, albeit with different syntactic slots. So just as
my &t := { ... }
introduces the possibility of
t 1,2,3
so also a
my ¢T := sometype();
introduces the possibility of
my T $x;
Use as an rvalue can be either T or ¢T without declaring a new type.
We're probably converging on a general rule that two or more
declarations of the same variable in the same scope refer to the
same entity:
my $x = 1; # outer $x;
{
$x = 2; # bound to OUTER::<$x>
if my $x = foo() {...} # new $x declared here
if my $x = bar() {...} # same $x, "my" is optional
baz(); # baz sees single inner CALLER::<$x>.
}
So too these would mean the same thing:
sub Bool eqv (¢T $x, T $y) { my T $z; }
sub Bool eqv (¢T $x, ¢T $y) { my ¢T $z; }
Only the first declarative ¢ actually installs a new symbol T.
An inner scope would of course establish its own type space, but
the formal parameters to a block count as part of the block, which
is why the second form above applies the existing T to $z rather
than capturing the type of $z. But it's a bit like writing &foo()
when you could just say foo() instead.
Larry
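The rule in the message above (the first declarative ¢T installs T, and later mentions in the same scope refer to that same T) can be sketched as a small Python model. This is only an illustration under stated assumptions: TypeScope and eqv_signature_accepts are hypothetical names, and the model does not distinguish ¢T from bare T after the first binding.

```python
class TypeScope:
    """Hypothetical model of the type-variable bindings of one scope."""

    def __init__(self):
        self.bound = {}  # type-variable name -> bound Python type

    def declare(self, name, value):
        """Model a parameter written '¢T $x' or 'T $x'.

        The first declaration binds the name to the value's type. Later
        declarations reuse that binding and act as a type check.
        """
        if name not in self.bound:
            self.bound[name] = type(value)
            return True
        return isinstance(value, self.bound[name])


def eqv_signature_accepts(x, y):
    """Model binding the signature (¢T $x, T $y) to two arguments."""
    scope = TypeScope()
    return scope.declare("T", x) and scope.declare("T", y)
```

In this model eqv_signature_accepts(1, 2) succeeds because both arguments are integers, while eqv_signature_accepts(1, "a") fails because T was already bound by the first parameter.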
We'd have to outlaw A..Z as well. :-)
I think a better plan might just be to say that we'll treat any fullwidth
character as equivalent to its narrow companion, at least when used as
an operator. Canonicalizing identifiers may be another matter though.
On the other hand, certain of the double-width characters are likely to
be confused with two singles, such as
＝ FF1D FULLWIDTH EQUALS SIGN
＿ FF3F FULLWIDTH LOW LINE
so maybe they should be equivalent to == and __, or outlawed.
And one could (un)reasonably argue that
～ FF5E FULLWIDTH TILDE
ought to mean ~~ rather than ~. But in general we need to go slow
on such decisions. For now just sticking our toe into Latin-1
is enough, as long as we're looking ahead for visual pitfalls.
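For a feel of what "equivalent to its narrow companion" would mean in practice, Unicode's NFKC normalization already folds the fullwidth forms to their ASCII counterparts. A minimal Python sketch, assuming NFKC is an acceptable stand-in for whatever rule Perl would actually adopt (the helper name narrow is hypothetical):

```python
import unicodedata

def narrow(text):
    """Fold fullwidth forms to their narrow companions via NFKC."""
    return unicodedata.normalize("NFKC", text)

# The three characters discussed above, by codepoint.
for ch in ("\uFF1D", "\uFF3F", "\uFF5E"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {narrow(ch)!r}")
```

Note that NFKC maps each fullwidth character to a single narrow one, so treating FF1D as == would need a separate rule. NFKC also folds other compatibility characters (ligatures, superscripts and so on), so a real implementation might restrict itself to the fullwidth block.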
As for the ¥ pitfall, so far we've intentionally been careful to use
it only where an operator is expected, whereas \ is legal only where a
term is expected. So at least for Perl code, we can translate legacy
¥ to different codepoints. (Whether the Japanese font distinguishes
them is another issue, of course. I have a "Unicode" font on my
machine that prints backslash as ¥, which I find slightly irritating,
but doubtless will be par for the course in Japan for the foreseeable
future. Maybe that's a good reason to allow the doublewidth backslash
as an alias for normal backslash. Maybe not.)
Anyway, I think people will be able to distinguish visually between
"A ¥ B" and "¥X" as long as we keep the operator/term distinction.
Larry
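The operator/term distinction in the message above is enough to translate a legacy ¥ mechanically. A rough Python sketch, assuming the input is already split into tokens and assuming a small hypothetical set of infix operators (translate_legacy_yen and INFIX are invented names, not part of any Perl tool):

```python
# Hypothetical set of infix operators; anything else counts as a term.
INFIX = {"¥", "Y", "+", "-", "*", "/", ","}

def translate_legacy_yen(tokens):
    """Rewrite each '¥' token according to its grammatical position."""
    out = []
    expect_term = True  # an expression starts where a term is expected
    for tok in tokens:
        if tok == "¥" and expect_term:
            out.append("\\")    # term position: it was really a backslash
            # a prefix operator is still followed by a term
        elif tok in INFIX:
            out.append(tok)     # operator position: keep it (zip for ¥)
            expect_term = True
        else:
            out.append(tok)     # an ordinary term
            expect_term = False
    return out
```

With this model the tokens of "@a ¥ @b" pass through unchanged, while a leading ¥ in "¥ @x" becomes a backslash.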
I think we actually speculated about that identity in the Apocalypse.
: > One way to approach it is to make Perl 6 accept both full- and
: > half-width variants.
: >
: > Another way would be to use ASCII fallbacks exclusively in real programs, and
: > reserve unicode variants for pretty-printing, the same way that PLT Scheme and
: > Haskell recognize λ in literature, but actually write "lambda" and "\"
: > respectively in everyday coding.
I think we should enable both approaches. Restricting Unicode characters
to literature is wrong, but so is forcing Unicode on someone prematurely.
On Sun, Oct 23, 2005 at 07:07:33PM -0400, Rob Kinyon wrote:
: Isn't this starting to be the question of why we have the Unicode
: operators instead of just functions? Would it be possible to have a
: function be infix?
At which precedence level?
Larry
I'm assuming that when you allow
my ¢T := sometype();
you're also allowing
my class T := sometype();
So, what happens when stupid me names a class "class" through
symbol-table craziness?
Rob
Cool!
> So too these would mean the same thing:
>
> sub Bool eqv (¢T $x, T $y) { my T $z; }
> sub Bool eqv (¢T $x, ¢T $y) { my ¢T $z; }
I like that symmetry between &foo and ¢foo. So to get the behavior
that an outer type variable applies to an inner sub, could I do this:
# a complicated identity function :-)
sub foo (¢T $x --> ¢T) {
my sub bar (T $z --> T) {
$z;
}
bar $x;
}
Because omitting the ¢ would not bind T. Whereas if I wrote:
sub foo (¢T $x --> ¢T) {
my sub bar (¢T $z --> T) {
$z;
}
bar $x;
}
It would be a totally new variable in both spots in the inner sub, and
if I wrote:
sub foo (¢T $x --> ¢T) {
my sub bar (T $z --> ¢T) {
$z;
}
bar $x;
}
It would be equivalent to:
sub foo (¢T $x --> ¢T) {
my sub bar (T $z --> ¢U) {
$z;
}
bar $x;
}
(Thus causing a "signature too general" type error)
Right? Totally off?
Luke
Yes, that's the idea.
: So, what happens when stupid me names a class "class" through
: symbol-table craziness?
How much class could a class class class if a class class could class class?
What happens is either that you have to say "class class" or "¢class"
or you redefine the "class" keyword to something else like "frobnitz".
I think "class" and "sub" are keywords in the, er, class of things
that trump mere symbol table entries. Either that, or "class" is merely
the name of the metaclass, and you'll get a class collision when
you try to redefine it. But I expect "class" is really a declarator
of the same status as "sub", at least syntactically.
Larry
You can even do
sub foo (¢T $x --> T) {
my sub bar (T $z --> T) {
$z;
}
bar $x;
}
I do believe.
> Whereas if I wrote:
>
> sub foo (¢T $x --> ¢T) {
> my sub bar (¢T $z --> T) {
> $z;
> }
> bar $x;
> }
It would be semantically the same as above. Just like C<my $x; my $x>
would only declare one C<$x>, so too C<¢T $x ... ¢T $y> should only
bind one type to T (or ¢T) for the duration of the scope.
> It would be a totally new variable in both spots in the inner sub, and
> if I wrote:
>
> sub foo (¢T $x --> ¢T) {
> my sub bar (T $z --> ¢T) {
> $z;
> }
> bar $x;
> }
>
> It would be equivalent to:
>
> sub foo (¢T $x --> ¢T) {
> my sub bar (T $z --> ¢U) {
> $z;
> }
> bar $x;
> }
I don't think so. In the first example all the T (or ¢T) are the same
type after the first ¢T (where the type is bound). In the second one
you'd get two separate types ¢T and ¢U. But ¢U would probably get bound
to the same type as ¢T as that's the type of thing that it returns
(assuming perl can figure that out).
That's if I understand Larry correctly.
-Scott
--
Jonathan Scott Duff
du...@pobox.com
<snip examples from luqui of type variables being used multiple times
with and without sigils>
> I don't think so. In the first example all the T (or ㎡) are the same
> type after the first ㎡ (where the type is bound). In the second one
> you'd get two separate types ㎡ and ㎎. But ㎎ would probably get bound
> to the same type as ㎡ as that's the type of thing that it returns
> (assuming perl can figure that out).
We have (or have had?) parameterised classes where you can specify
parameters to the class enclosed in [].
eg. class Foo[...] { ... }
So couldn't the same be used for functions? This way you wouldn't need
a special sigil for classes declared in such a way.
sub foo[Bar] (Bar $tab) { ... }
Since perl6 isn't really a static language, I don't think you need to be
allowed to have non-type variables in the [] (dependent-typing, or where
you can use primitive types like int in template parameters in C++),
since being parameters in [] means only that they're types, and not that
they are always bound at compile time.
(apologies for breaking the unicode)
--
Benjamin Smith <bsm...@vtrl.co.uk, bsm...@cpan.org, bs...@srcf.ucam.org>
Christ's College - Mathematics Part 1B
IRC: integral on irc.perl.org, and irc.freenode.net (channel: #perl)
I thought that, too, until I realized it wouldn't work as an rvalue:
^T.count # 1's complement of number of T instances
On top of which, if it did work, it should be a placeholder variable,
not something you see in a signature.
Larry
It's a new T according to the current thinking. Just use T if you
want the same one. (But that does force util to be recloned on every
entry to foo, I expect.)
Larry