
charnames extension proposal


H.Merijn Brand

Sep 26, 2002, 5:40:09 AM
to Jarkko Hietaniemi, Perl 5 Porters
I've got some projects now where I need a lot of Unicode, and typing the full
names is a tedious and unneeded way to proceed.

We are dealing with a limited subset of the Unicode set and have used our own
list of unique names for these characters for over 12 years now, so I wanted
to make those names aliases for the official Unicode names.

The proposed change adds several new features to charnames:

Addition 1:

--8<---

use charnames ":full", {
my_name => "FULL UNICODE OFFICIAL NAME",
e_ACUTE => "LATIN SMALL LETTER E WITH ACUTE",
};

-->8---
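For comparison only: the charnames module shipped with later Perl releases accepts a hash of custom aliases much like the one in Addition 1, but introduced by an explicit ":alias" key instead of being passed as a bare hash. A minimal sketch, assuming a perl recent enough to support that key:

```perl
use strict;
use warnings;

# Assumes a modern perl whose charnames module accepts the ":alias" key.
use charnames ":full", ":alias" => {
    e_ACUTE => "LATIN SMALL LETTER E WITH ACUTE",
};

my $str = "\N{e_ACUTE}";       # resolved at compile time through the alias
printf "U+%04X\n", ord $str;   # prints U+00E9
```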

Addition 2:

Given a file named "unicore/pro_alias.pl", findable via @INC and filled like this:

--8<--- /pro/lib/perl5/site_perl/5.8.0/unicore/pro_alias.pl
#!/usr/bin/perl

(
A_GRAVE => "LATIN CAPITAL LETTER A WITH GRAVE",
A_CIRCUM => "LATIN CAPITAL LETTER A WITH CIRCUMFLEX",
A_DIAERES => "LATIN CAPITAL LETTER A WITH DIAERESIS",
A_TILDE => "LATIN CAPITAL LETTER A WITH TILDE",
A_BREVE => "LATIN CAPITAL LETTER A WITH BREVE",
A_RING => "LATIN CAPITAL LETTER A WITH RING ABOVE",
A_MACRON => "LATIN CAPITAL LETTER A WITH MACRON",
:
:
lMDOT_IDX => "LATIN SMALL LETTER L WITH MIDDLE DOT",
lSTROKE_IDX => "LATIN SMALL LETTER L WITH STROKE",
oSLASH_IDX => "LATIN SMALL LETTER O WITH STROKE",
SMALL_OE_IDX => "LATIN SMALL LIGATURE OE",
RINGELS_IDX => "LATIN SMALL LETTER SHARP S",
SMALL_THORN_IDX => "LATIN SMALL LETTER THORN",
tSTROKE_IDX => "LATIN SMALL LETTER T WITH STROKE",
SMALL_ENG_IDX => "LATIN SMALL LETTER ENG",
IEM_IDX => "INVERTED EXCLAMATION MARK",
)
-->8---

I can now do

use charnames ":pro";

If a ":name" tag is the *only* argument, ":full" is automatically enabled once
the aliases have been read. Otherwise, you have to pass ":full" or ":short" yourself:

use charnames ":short", ":pro";

The anonymous HASH is only supported as the last argument.
After it has been removed, the :name tag is only supported as the last remaining argument:

use charnames ":short", ":pro",
{ A_TILDE => "LATIN CAPITAL LETTER A WITH TILDE" };
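The argument rules above can be sketched as a standalone function that only classifies what `use charnames` receives. The name `split_charnames_args` is hypothetical, it loads no file, and its tag regex is an anchored variant of the patch's `m{:(?!full|short)\w+$}`:

```perl
use strict;
use warnings;

# Hypothetical helper that classifies arguments the way the proposal
# describes; it does not load any alias file.
#  1. A trailing hash ref holds inline aliases.
#  2. After that, a trailing ":name" tag (anything but :full or :short)
#     names the alias file unicore/name_alias.pl.
#  3. If that tag was the only remaining argument, ":full" is used.
sub split_charnames_args {
    my @args = @_;
    my ($inline, $file);
    $inline = pop @args if @args && ref $args[-1] eq 'HASH';
    if (@args && $args[-1] =~ m{^:(?!(?:full|short)$)(\w+)$}) {
        my $tag = $1;
        $file = "unicore/${tag}_alias.pl";
        pop @args;
        @args = (':full') unless @args;
    }
    return { inline => $inline, file => $file, tags => \@args };
}
```

For example, `split_charnames_args(':short', ':pro', { A_TILDE => ... })` reports the inline hash, the file unicore/pro_alias.pl, and the remaining tag :short.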

IMHO *very* useful. Opinions?

--- lib/charnames.pm 2002-05-31 13:07:59.000000000 +0200
+++ lib/charnames.pm 2002-09-26 11:20:46.000000000 +0200
@@ -38,8 +38,18 @@ my %alias2 = (
'PARTIAL LINE UP' => 'PARTIAL LINE BACKWARD',
);

+my %alias3 = (
+ # User defined aliasses. Even more convenient :)
+ );
my $txt;

+sub alias (@)
+{
+ @_ or return %alias3;
+ my %alias = ref $_[0] ? %{$_[0]} : @_;
+ @alias3{keys %alias} = values %alias;
+ } # alias
+
# This is not optimized in any way yet
sub charnames
{
@@ -48,11 +58,14 @@ sub charnames
if (exists $alias1{$name}) {
$name = $alias1{$name};
}
- if (exists $alias2{$name}) {
+ elsif (exists $alias2{$name}) {
require warnings;
warnings::warnif('deprecated', qq{Unicode character name "$name" is deprecated, use "$alias2{$name}" instead});
$name = $alias2{$name};
}
+ elsif (exists $alias3{$name}) {
+ $name = $alias3{$name};
+ }

my $ord;
my @off;
@@ -156,6 +169,14 @@ sub import
## fill %h keys with our @_ args.
##
my %h;
+ if (@_ and ref $_[-1] eq "HASH") {
+ alias (pop);
+ }
+ if (@_ and $_[-1] =~ m{:(?!full|short)\w+$}) {
+ (my $file = pop) =~ s{:(.*)}{unicore/$1_alias.pl};
+ alias (do $file);
+ @_ == 0 and @_ = (":full");
+ }
@h{@_} = (1) x @_;

$^H{charnames_full} = delete $h{':full'};
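The patched `charnames()` tries the built-in alias table, then the deprecated-name table, and only then the user aliases in `%alias3`; the switch from `if` to `elsif` means at most one mapping is applied. A self-contained sketch of that order, where `add_aliases` and `resolve_name` are hypothetical names and the table entries are illustrative stand-ins, not the real charnames tables:

```perl
use strict;
use warnings;

# Illustrative stand-ins for the tables in charnames.pm (example entries only).
my %alias1 = ('LINE FEED'       => 'LINE FEED (LF)');
my %alias2 = ('PARTIAL LINE UP' => 'PARTIAL LINE BACKWARD');   # deprecated
my %alias3;                                                    # user-defined

# Merge user aliases from a hash ref or a flat list, like the patch's alias().
sub add_aliases {
    my %new = ref $_[0] ? %{ $_[0] } : @_;
    @alias3{ keys %new } = values %new;
    return;
}

# Apply at most one mapping, in the order used by the patched charnames().
sub resolve_name {
    my ($name) = @_;
    if (exists $alias1{$name}) {
        $name = $alias1{$name};
    }
    elsif (exists $alias2{$name}) {
        warn qq{Unicode character name "$name" is deprecated, use "$alias2{$name}" instead\n};
        $name = $alias2{$name};
    }
    elsif (exists $alias3{$name}) {
        $name = $alias3{$name};
    }
    return $name;
}

add_aliases({ e_ACUTE => 'LATIN SMALL LETTER E WITH ACUTE' });
print resolve_name('e_ACUTE'), "\n";   # prints LATIN SMALL LETTER E WITH ACUTE
```

Because the user table is consulted last, a user alias can add new names but cannot override a built-in alias or a deprecated name.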

--
H.Merijn Brand Amsterdam Perl Mongers (http://amsterdam.pm.org/)
using perl-5.6.1, 5.8.0 & 633 on HP-UX 10.20 & 11.00, AIX 4.2, AIX 4.3,
WinNT 4, Win2K pro & WinCE 2.11. Smoking perl CORE: smo...@perl.org
http://archives.develooper.com/daily...@perl.org/ per...@perl.org
send smoke reports to: smokers...@perl.org, QA: http://qa.perl.org


Jarkko Hietaniemi

Sep 26, 2002, 8:59:33 AM
to H.Merijn Brand, Perl 5 Porters
(charnames.pm is Ilya's brainchild, really...)

> Given a file named "unicore/pro_alias.pl", findable in @INC filled like:

> ...


> use charnames ":pro";
>
> if a ":name" is the *only* argument, it is automatically promoted to ":full"
> after the aliasses have been read. Otherwise, you have to support it yourself:
> use charnames ":short", ":pro";

Good idea but I have small problems with the implementation... going
just by the name for the @INCable file feels ... unsafe. How about a
keyword-value pair:

use charnames load => "pro";

And then you could have just "pro.pl" in your @INC.

> The anonymous HASH is only supported as last argument.
> Once that is done, the :name is only supported as last argument.

--
Jarkko Hietaniemi <j...@iki.fi> http://www.iki.fi/jhi/ "There is this special
biologist word we use for 'stable'. It is 'dead'." -- Jack Cohen

H.Merijn Brand

Sep 26, 2002, 9:08:32 AM
to j...@iki.fi, Perl 5 Porters
On Thu 26 Sep 2002 14:59, Jarkko Hietaniemi <j...@iki.fi> wrote:
> (charnames.pm is Ilya's brainchild, really...)
>
> > Given a file named "unicore/pro_alias.pl", findable in @INC filled like:
> > ...
> > use charnames ":pro";
> >
> > if a ":name" is the *only* argument, it is automatically promoted to ":full"
> > after the aliasses have been read. Otherwise, you have to support it yourself:
> > use charnames ":short", ":pro";
>
> Good idea but I have small problems with the implementation... going
> just by the name for the @INCable file feels ... unsafe. How about a
> keyword-value pair:
>
> use charnames load => "pro";
>
> And then you could have just "pro.pl" in your @INC.

And what would make that any safer than having pro_alias.pl in your @INC path,
while keeping an interface that feels, errr, familiar?

I agree that the unicore/ part could be dropped, but OTOH, keeping it makes it
even clearer what we are doing.

> > The anonymous HASH is only supported as last argument.
> > Once that is done, the :name is only supported as last argument.

--

Jarkko Hietaniemi

Sep 26, 2002, 9:12:19 AM
to H.Merijn Brand, Perl 5 Porters
> And what would make that safer than having pro_alias.pl in your @INC path,
> still having an interface that feels - errr - familiar.

Okay, forget the safeness argument ... but having a tag named by the
file being loaded just feels ... wrong. It feels like open(FH, ":pro")
to me.... I would leave the tags for the charnames internal definitions,
and have a keyword-value pair for the externals.

> I agree that the unicore/ part could be dropped, but OTOH, it is just even
> more clear what we are doing.

--

Rafael Garcia-Suarez

Sep 26, 2002, 9:11:02 AM
to H.Merijn Brand, j...@iki.fi, perl5-...@perl.org
"H.Merijn Brand" <h.m....@hccnet.nl> wrote:
> I can now do
>
> use charnames ":pro";
>
> if a ":name" is the *only* argument, it is automatically promoted to ":full"
> after the aliasses have been read. Otherwise, you have to support it yourself:
>
> use charnames ":short", ":pro";

I don't understand your last sentence, but I think I'll wait for the doc patch ;-)

> IMHO *very* useful. Opinions?

I like it.

> + if (@_ and $_[-1] =~ m{:(?!full|short)\w+$}) {
> + (my $file = pop) =~ s{:(.*)}{unicore/$1_alias.pl};
> + alias (do $file);

You should check the return value of "do" here. Or use "require".
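Checking what `do` returns, as suggested here, could look like the sketch below. It is a minimal sketch under stated assumptions: `find_alias_file` and `load_alias_file` are hypothetical helpers, the file is located in @INC first and passed to `do` as an absolute path, and a compile error (reported in `$@`) is kept distinct from a file that returns something other than NAME => VALUE pairs:

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: locate unicore/NAME_alias.pl in @INC and return an
# absolute path, so that do() below does not search @INC again.
sub find_alias_file {
    my ($name) = @_;
    for my $dir (grep { !ref } @INC) {
        my $path = File::Spec->rel2abs(
            File::Spec->catfile($dir, 'unicore', "${name}_alias.pl"));
        return $path if -f $path && -r _;
    }
    return;
}

# Hypothetical loader: run the file with do() and check what came back.
sub load_alias_file {
    my ($name) = @_;
    my $path = find_alias_file($name)
        or die "No unicore/${name}_alias.pl found in \@INC\n";
    my @pairs = do $path;
    die "Error loading $path: $@" if $@;    # compile or run-time error
    die "$path did not return a list of NAME => VALUE pairs\n"
        if !@pairs || @pairs % 2;
    my %aliases = @pairs;
    return \%aliases;
}
```

A test for compile errors in the alias file can then write a deliberately broken file into a temporary @INC directory and expect `load_alias_file` to die.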

Jarkko Hietaniemi

Sep 26, 2002, 9:16:44 AM
to H.Merijn Brand, Perl 5 Porters
Make it

use charnames alias => "pro";

and the pro_alias.pl naming and I'm happy.

H.Merijn Brand

Sep 26, 2002, 9:18:14 AM
to Rafael Garcia-Suarez, Perl 5 Porters
On Thu 26 Sep 2002 15:11, Rafael Garcia-Suarez <raphel.gar...@hexaflux.com> wrote:
> "H.Merijn Brand" <h.m....@hccnet.nl> wrote:
> > I can now do
> >
> > use charnames ":pro";
> >
> > if a ":name" is the *only* argument, it is automatically promoted to ":full"
> > after the aliasses have been read. Otherwise, you have to support it yourself:
> >
> > use charnames ":short", ":pro";
>
> I don't understand your last sentence, but I think I'll wait for the doc patch ;-)

It was just a proof of concept that *works* here. If we agree to take it
in, doc patches *will* follow (I guess).

> > IMHO *very* useful. Opinions?
>
> I like it.

Good!

> > + if (@_ and $_[-1] =~ m{:(?!full|short)\w+$}) {
> > + (my $file = pop) =~ s{:(.*)}{unicore/$1_alias.pl};
> > + alias (do $file);
>
> You should check the return value of "do" here. Or use "require".

Of course. Again, it's just a proof of concept.

H.Merijn Brand

Sep 26, 2002, 9:20:08 AM
to j...@iki.fi, Perl 5 Porters
On Thu 26 Sep 2002 15:12, Jarkko Hietaniemi <j...@iki.fi> wrote:
> > And what would make that safer than having pro_alias.pl in your @INC path,
> > still having an interface that feels - errr - familiar.
>
> Okay, forget the safeness argument ... but having a tag named by the
> file being loaded just feels ... wrong. It feels like open(FH, ":pro")
> to me....

Not at all. Just like :full and :short, it defines a way to name your
Unicode characters.

> I would leave the tags for the charnames internal definitions,
> and have a keyword-value pair for the externals.

Why does that not sound convincing?

> > I agree that the unicore/ part could be dropped, but OTOH, it is just even
> > more clear what we are doing.

--

H.Merijn Brand

Sep 26, 2002, 9:30:55 AM
to j...@iki.fi, Perl 5 Porters
On Thu 26 Sep 2002 15:16, Jarkko Hietaniemi <j...@iki.fi> wrote:
> Make it
>
> use charnames alias => "pro";
>
> and the pro_alias.pl naming and I'm happy.

Scan from the front, and keep the rest of the proposal the same?

I.e., if it is the only argument, promote to the default :full?

I can live with that. I guess.
Ahh, one more argument *in favour* of :pro is that

# perl -Mcharnames\ q#:pro# -le'print "\N{e_ACUTE}"'

is much easier to type than the alias convention.

Jarkko Hietaniemi

Sep 26, 2002, 9:25:51 AM
to H.Merijn Brand, Perl 5 Porters
> > > And what would make that safer than having pro_alias.pl in your @INC path,
> > > still having an interface that feels - errr - familiar.
> >
> > Okay, forget the safeness argument ... but having a tag named by the
> > file being loaded just feels ... wrong. It feels like open(FH, ":pro")
> > to me....
>
> Not at all, just as :full and :short, it defines a way to be able to name your
> Unicode characters.

I'm sorry but they are not "your Unicode characters", they are your
aliases for Unicode characters. (I'm very picky before my second cup
of coffee...) The :full and :short are not making up any aliases,
they are using the existing official names. But, I'll defer the
decision on this to Hugo.

H.Merijn Brand

Sep 26, 2002, 11:36:26 AM
to Jarkko Hietaniemi, Perl 5 Porters
On Thu 26 Sep 2002 11:40, "H.Merijn Brand" <h.m....@hccnet.nl> wrote:
> The proposed change allows several new features to charnames:

Take 2. All of the above and docs. No tests (yet).

--- /pro/lib/perl5/5.8.0/charnames.pm 2002-05-31 13:07:59.000000000 +0200
+++ charnames.pm 2002-09-26 17:32:32.000000000 +0200
@@ -38,8 +38,18 @@ my %alias2 = (
'PARTIAL LINE UP' => 'PARTIAL LINE BACKWARD',
);

+my %alias3 = (
+ # User defined aliasses. Even more convenient :)
+ );
my $txt;

+sub alias (@)
+{
+ @_ or return %alias3;
+ my %alias = ref $_[0] ? %{$_[0]} : @_;
+ @alias3{keys %alias} = values %alias;
+ } # alias
+
# This is not optimized in any way yet
sub charnames
{
@@ -48,11 +58,14 @@ sub charnames
if (exists $alias1{$name}) {
$name = $alias1{$name};
}
- if (exists $alias2{$name}) {
+ elsif (exists $alias2{$name}) {
require warnings;
warnings::warnif('deprecated', qq{Unicode character name "$name" is deprecated, use "$alias2{$name}" instead});
$name = $alias2{$name};
}
+ elsif (exists $alias3{$name}) {
+ $name = $alias3{$name};
+ }

my $ord;
my @off;
@@ -155,8 +168,32 @@ sub import
##
## fill %h keys with our @_ args.
##
- my %h;
- @h{@_} = (1) x @_;
+ my ($promote, %h, @args) = (0);
+ my @args;
+ while (@_ and $_ = shift) {
+ if (ref $_ eq "HASH") {
+ alias ($_);
+ next;
+ }
+ if ($_ =~ m{:(?!full|short)\w+$}) {
+ (my $file = $_) =~ s{:(.*)}{unicore/$1_alias.pl};
+ if (my @alias = do $file) {
+ alias (@alias);
+ $promote++;
+ next;
+ }
+ }
+ if ($_ eq "alias" && @_) {
+ (my $file = shift) =~ s{:(.*)}{unicore/$1_alias.pl};
+ if (my @alias = do $file) {
+ alias (@alias);
+ next;
+ }
+ }
+ push @args, $_;
+ }
+ @args == 0 && $promote and @args = (":full");
+ @h{@args} = (1) x @args;

$^H{charnames_full} = delete $h{':full'};
$^H{charnames_short} = delete $h{':short'};
@@ -343,6 +380,44 @@ state of C<bytes>-flag as in:
}
}

+=head1 Custom Aliases
+
+This version of charnames supports three mechanisms for adding local
+or customized aliases to the standard Unicode naming conventions (:full).
+
+=head2 Anonymous hashes
+
+ use charnames ":full", {
+ e_ACUTE => "LATIN SMALL LETTER E WITH ACUTE",
+ };
+ my $str = "\N{e_ACUTE}";
+
+=head2 Alias pairs
+
+ use charnames ":full", alias => "pro";
+
+ will try to read "unicore/pro_alias.pl" from the @INC path. This
+ file should return a list:
+
+ #!/usr/bin/perl
+ (
+ A_GRAVE => "LATIN CAPITAL LETTER A WITH GRAVE",
+ A_CIRCUM => "LATIN CAPITAL LETTER A WITH CIRCUMFLEX",
+ A_DIAERES => "LATIN CAPITAL LETTER A WITH DIAERESIS",
+ A_TILDE => "LATIN CAPITAL LETTER A WITH TILDE",
+ A_BREVE => "LATIN CAPITAL LETTER A WITH BREVE",
+ A_RING => "LATIN CAPITAL LETTER A WITH RING ABOVE",
+ A_MACRON => "LATIN CAPITAL LETTER A WITH MACRON",
+ );
+
+=head2 Alias shortcut
+
+ use charnames ":pro";
+
+ works exactly like the alias pairs, except that ":full" is
+ inserted automatically as the first argument (if no other
+ argument is given).
+
=head1 charnames::viacode(code)

Returns the full name of the character indicated by the numeric code.
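Take 2 scans the arguments from the front instead of only looking at the end. A standalone sketch of that scan follows; `classify_args` and its `$load` callback (standing in for `do $file`) are hypothetical. Unlike the patch, it loops over a lexical instead of `$_` (so a false argument such as "0" does not end the loop early), it does not handle a failed load, and it expects `alias => NAME` without a colon, as Take 3's corrected substitution does:

```perl
use strict;
use warnings;

# Hypothetical stand-alone version of the Take 2 argument scan.
# $load stands in for "do $file" and must return alias pairs.
sub classify_args {
    my ($load, @in) = @_;
    my (%aliases, @tags);
    my $promote = 0;
    while (@in) {
        my $arg = shift @in;
        if (ref $arg eq 'HASH') {                          # inline aliases
            @aliases{ keys %$arg } = values %$arg;
        }
        elsif ($arg =~ m{^:(?!(?:full|short)$)(\w+)$}) {   # ":pro" shortcut
            my $tag = $1;
            %aliases = (%aliases, $load->("unicore/${tag}_alias.pl"));
            $promote++;
        }
        elsif ($arg eq 'alias' && @in) {                   # alias => "pro"
            my $name = shift @in;
            %aliases = (%aliases, $load->("unicore/${name}_alias.pl"));
        }
        else {                                             # :full, :short
            push @tags, $arg;
        }
    }
    @tags = (':full') if !@tags && $promote;
    return (\%aliases, \@tags);
}
```

Only the ":pro" shortcut triggers promotion, so `alias => "pro"` on its own selects no naming scheme and is meant to be combined with ":full" or ":short".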

H.Merijn Brand

Sep 27, 2002, 5:04:46 AM
to Hugo, Perl 5 Porters
On Thu 26 Sep 2002 17:36, "H.Merijn Brand" <h.m....@hccnet.nl> wrote:
> On Thu 26 Sep 2002 11:40, "H.Merijn Brand" <h.m....@hccnet.nl> wrote:
> > The proposed change allows several new features to charnames:
>
> Take 2. All of the above and docs. No tests (yet).

I'd appreciate a voice from Hugo in the matter, since I have to *use* it
shortly, and knowing that things won't break in the near future would help :)

Take 3. Now really tested :)

bev a5:/pro/tu/bev/3gl/ars 135 > head -9 ZPS.pl
#!/pro/bin/perl

use strict;
use warnings;

use charnames ":full", { u_TILDE => "LATIN SMALL LETTER U WITH TILDE" };
print "\N{u_TILDE}\n";
__END__

bev a5:/pro/tu/bev/3gl/ars 136 > ZPS.pl
Wide character in print at ZPS.pl line 7.
?
bev a5:/pro/tu/bev/3gl/ars 137 > perl -Mcharnames=:pro -le'print"\N{u_TILDE}"'
Wide character in print at -e line 1.
?
bev a5:/pro/tu/bev/3gl/ars 138 > perl -Mcharnames=:full,alias,pro -le'print"\N{u_TILDE}"'
Wide character in print at -e line 1.
?
bev a5:/pro/tu/bev/3gl/ars 139 >

The question marks actually show a u-TILDE in Unicode :)

--- /pro/lib/perl5/5.8.0/charnames.pm 2002-05-31 13:07:59.000000000 +0200
+++ charnames.pm 2002-09-27 10:59:47.000000000 +0200
@@ -38,8 +38,18 @@

@@ -155,8 +168,31 @@ sub import
##
## fill %h keys with our @_ args.
##
- my %h;
- @h{@_} = (1) x @_;
+ my ($promote, %h, @args) = (0);

@@ -343,6 +379,44 @@ sub vianame

Rafael Garcia-Suarez

Sep 27, 2002, 7:31:07 AM
to H.Merijn Brand, perl5-...@perl.org
"H.Merijn Brand" <h.m....@hccnet.nl> wrote:
> This as addition to prevent unneeded do's

do() updates %INC. You may want to take advantage of this.
Or you can use require(): when it actually loads the file, it
returns the file's return value; when the file has already been
loaded, it returns 1.

$ cat foo.pl
42;
$ perl -le 'print do "foo.pl";print do "foo.pl"'
42
42
$ perl -le 'print require "foo.pl";print require "foo.pl"'
42
1
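Because `require` returns 1 rather than the list on a second load, the alias pairs themselves need caching if several `use charnames` statements ask for the same file. A plain "seen" counter, such as the `%aliased` hash in the follow-up patch, only skips the reload; as far as that hunk shows, `$aliased{$file}++ and next;` also skips `$promote++`, so a second `use charnames ":pro"` would not get ":full". A minimal sketch of the alternative, caching the list itself (`%alias_cache`, `%load_count` and `cached_aliases` are hypothetical names):

```perl
use strict;
use warnings;

my %alias_cache;   # hypothetical: file name => array ref of alias pairs
my %load_count;    # how often each file was really loaded (illustration)

# Return the alias pairs for $file, calling $load (for example a wrapper
# around "do $file") only the first time that file is requested.
sub cached_aliases {
    my ($file, $load) = @_;
    $alias_cache{$file} ||= do {
        $load_count{$file}++;
        [ $load->($file) ];
    };
    return @{ $alias_cache{$file} };
}

my $loader = sub { (u_TILDE => 'LATIN SMALL LETTER U WITH TILDE') };
my @first  = cached_aliases('unicore/pro_alias.pl', $loader);
my @second = cached_aliases('unicore/pro_alias.pl', $loader);
print "loaded $load_count{'unicore/pro_alias.pl'} time(s)\n";   # loaded 1 time(s)
```

Since every call returns the full list, the caller can still count the ":pro" tag as given and promote to ":full" each time.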

H.Merijn Brand

Sep 27, 2002, 7:20:41 AM
to H.Merijn Brand, Perl 5 Porters
On Fri 27 Sep 2002 11:04, "H.Merijn Brand" <h.m....@hccnet.nl> wrote:
> On Thu 26 Sep 2002 17:36, "H.Merijn Brand" <h.m....@hccnet.nl> wrote:
> > On Thu 26 Sep 2002 11:40, "H.Merijn Brand" <h.m....@hccnet.nl> wrote:
> > > The proposed change allows several new features to charnames:
> >
> > Take 2. All of the above and docs. No tests (yet).
>
> I'd appreciate a voice from Hugo in the matter, since I have to *use* it
> shortly, and knowing that things won't break in the near future would help :)
>
> Take 3. Now realy tested :)

This is an addition to prevent unneeded do's:

--- /pro/lib/perl5/5.8.0/charnames.pm 2002-09-27 11:04:32.000000000 +0200
+++ charnames.pm 2002-09-27 13:17:23.000000000 +0200
@@ -41,7 +41,7 @@
my %alias3 = (
# User defined aliasses. Even more convenient :)
);
-my $txt;
+my ($txt, %aliased);

sub alias (@)
{
@@ -176,6 +176,7 @@ sub import
}
if ($_ =~ m{:(?!full|short)\w+$}) {
(my $file = $_) =~ s{:(.*)}{unicore/$1_alias.pl};
+ $aliased{$file}++ and next;
if (my @alias = do $file) {
alias (@alias);
$promote++;
@@ -184,6 +185,7 @@ sub import
}
if ($_ eq "alias" && @_) {
(my $file = shift) =~ s{(.*)}{unicore/$1_alias.pl};
+ $aliased{$file}++ and next;
if (my @alias = do $file) {
alias (@alias);
next;

Slaven Rezic

Sep 27, 2002, 5:52:42 PM
to H.Merijn Brand, j...@iki.fi, Perl 5 Porters
"H.Merijn Brand" <h.m....@hccnet.nl> writes:

> On Thu 26 Sep 2002 15:16, Jarkko Hietaniemi <j...@iki.fi> wrote:
> > Make it
> >
> > use charnames alias => "pro";
> >
> > and the pro_alias.pl naming and I'm happy.
>
> Scan from the front, and keep the rest of the proposal the same?
>
> t.i. if it is the only argument, promote to default :full?
>
> I can live with that. I guess.
> Ahh, one more argument *in favour* of :pro will be
>
> # perl -Mcharnames\ q#:pro# -le'print "\N{e_ACUTE}"'
>
> to be much easier to type than the alias convention.

Which would be:

perl -Mcharnames=alias,pro -le'print "\N{e_ACUTE}"'

Not hard at all.

Regards,
Slaven

--
Slaven Rezic - slaven...@berlin.de
babybike - routeplanner for cyclists in Berlin
handheld (e.g. Compaq iPAQ with Linux) version of bbbike
http://bbbike.sourceforge.net

h...@crypt.org

Oct 2, 2002, 12:29:03 PM
to H.Merijn Brand, Perl 5 Porters
"H.Merijn Brand" <h.m....@hccnet.nl> wrote:
:I'd appreciate a voice from Hugo in the matter, since I have to *use* it
:shortly, and knowing that things won't break in the near future would help :)

Well, I'm happy to adopt it in principle, but as always in the development
track there are no guarantees that we won't have thrown it out again by
the time we get to an actual release.

When you've got that far, please submit a patch with the new code as
well as tests and docs. Please ensure that the tests cater for compile
errors in the alias file, among other things.

One additional point: the "alias => pro" format is definitely the right
direction to go, else we'll be introducing nasty subtle cross-version
breakage any time we introduce a new export tag in the future.
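Hugo's point can be made concrete with a small sketch (the `interpret` function is hypothetical). If any unknown ":name" tag is read as an alias file, the meaning of a given `use charnames` line depends on which tags the installed charnames version knows. ":loose" is used as the example because later charnames releases did add a tag of that name:

```perl
use strict;
use warnings;

# Hypothetical: how a ":name" argument would be read under the
# tag-equals-file scheme, given the tags a charnames version knows.
sub interpret {
    my ($tag, @known) = @_;
    return 'built-in tag' if grep { $_ eq $tag } @known;
    return 'alias file unicore/' . substr($tag, 1) . '_alias.pl'
        if $tag =~ /^:\w+$/;
    return 'not a tag';
}

# The same "use charnames ':loose'" line, seen by two charnames versions:
print interpret(':loose', ':full', ':short'), "\n";            # alias file unicore/loose_alias.pl
print interpret(':loose', ':full', ':short', ':loose'), "\n";  # built-in tag
```

With an explicit `alias => NAME` pair, the module's own tags and the user's file names live in separate namespaces, so adding a tag cannot change how an existing alias file is found.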

Hugo

H.Merijn Brand

Oct 2, 2002, 2:10:39 PM
to h...@crypt.org, Perl 5 Porters
On Wed 02 Oct 2002 18:29, <h...@crypt.org> wrote:
> "H.Merijn Brand" <h.m....@hccnet.nl> wrote:
> :I'd appreciate a voice from Hugo in the matter, since I have to *use* it
> :shortly, and knowing that things won't break in the near future would help :)
>
> Well, I'm happy to adopt it in principle, but as always in the development
> track there are no guarantees that we won't have thrown it out again by
> the time we get to an actual release.

Nah :) We don't throw out things that are *that* useful.

> When you've got that far, please submit a patch with the new code as
> well as tests and docs. Please ensure that the tests cater for compile
> errors in the alias file, among other things.

Will put it together. Promise. (I will even try to use the native layout,
promise.)

> One additional point: the "alias => pro" format is definitely the right
> direction to go, else we'll be introducing nasty subtle cross-version
> breakage any time we introduce a new export tag in the future.

As long as we document the taken ones, well ...

OK, I admit. Alias it'll be :/

Anonymous hashes can stay?

> Hugo
