Idea for how fields.pm might be made to work in 5.10

Benjamin Goldberg

Sep 7, 2002, 12:06:07 AM

to p5p

Background:

[most everyone here on p5p knows this, but I'll restate it for
anyone who doesn't, and to make sure I've got it clear in my own head]:

In the old fields.pm implementation, the kind of object returned by
'fields::new' was a pseudohash, an arrayref which could be accessed as
if it were a hashref. It was efficient when accessed via a typed
lexical, as in "my Class $foo", in which case the hash accesses were
translated (at compile time) into array accesses; everywhere else,
it was slower than a normal hash, and in addition, it slowed down *all*
normal hash accesses, everywhere in the program.

The Idea:

Let fields.pm return a normal (but restricted) hash, with a special
element under the key of "" (the empty string). In the value for that
key, have an arrayref. The elements of that arrayref are then *aliases*
to the values of the hash.

For example, if you do:
package Foo;
use fields qw(a b c);
my $x = Foo->fields::new;
Then that means:
\ $x->{""}[1] == \ $x->{"a"}
\ $x->{""}[2] == \ $x->{"b"}
\ $x->{""}[3] == \ $x->{"c"}

Note that $x is a normal (but restricted) hash.
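One way to picture this layout in pure Perl, as a minimal sketch rather than the XS constructor the proposal calls for. The helper name `make_fields_object` is hypothetical, not part of fields.pm. The aliasing relies on the `sub { \@_ }` idiom (a reference to `@_` keeps that argument array alive, and its elements are the very scalars that were passed in), and it assumes `Hash::Util` and `Scalar::Util`, both core since 5.8. Slot 0 already holds the weak back-reference described further down.

```perl
use strict;
use warnings;
use Hash::Util qw(lock_keys);
use Scalar::Util qw(weaken);

# Hypothetical helper (not part of fields.pm): a restricted hash whose ""
# key holds an arrayref; slot 0 is a weak back-reference to the hash and
# slots 1..N alias the field values in declaration order.
sub make_fields_object {
    my ($class, @fields) = @_;
    my %self;
    @self{@fields} = ();                 # create each field, value undef

    # sub { \@_ } hands back the callee's @_, whose elements are the very
    # scalars passed in, so each array slot aliases one hash value.
    my $slots = sub { \@_ }->(@self{@fields});
    unshift @$slots, undef;              # reserve slot 0

    $self{""} = $slots;
    my $obj = bless \%self, $class;
    $slots->[0] = $obj;
    weaken($slots->[0]);                 # weak, so the cycle does not leak
    lock_keys(%self);                    # no keys may be added from now on
    return $obj;
}

my $x = make_fields_object('Foo', qw(a b c));
$x->{b} = 'via hash';
print $x->{""}[2], "\n";                # same scalar as $x->{b}
```

Writing through either side is visible through the other, and an attempt to store under an undeclared key such as `$x->{d}` dies because the hash is locked.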

An assignment from an untyped variable to a typed lexical gets an extra
hash element fetch of the "" key; that is:
my Class $foo = $x; # $x is a hashref with special "" key.
gets translated to:
my $foo = $x->{""}; # $foo is now an arrayref

All hash accesses that use constant names (literal strings) are
translated to array accesses at compile time, just as the old
pseudohash/fields.pm implementation did. Compile-time checking that the
names are valid fields would also be done.

So:
my Class $foo = $x; # $x is a hashref with special "" key.
my $bar = $foo->{a};
my ($baz, $quux) = @{$foo}{"b","c"};
my $oops = $foo->{"d"}; # compile time error.
gets translated to:
my $foo = $x->{""}; # $foo is now an arrayref
my $bar = $foo->[1];
my ($baz, $quux) = @{$foo}[2,3];
# compile time error translating $foo->{"d"}.
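A rough sketch of what the translated code does at run time. `%FIELDS` and `field_index` are hypothetical stand-ins for the compiler's name-to-index table and its validity check, in the spirit of old fields.pm's `%Class::FIELDS`; in the proposal the lookup and the error would happen at compile time, not when the line runs.

```perl
use strict;
use warnings;

# Hypothetical name-to-index table; index 0 is reserved for the
# back-reference.
my %FIELDS = (a => 1, b => 2, c => 3);

# Stand-in for the compile-time step: map a constant key used on a typed
# lexical to an array index, or refuse it outright.
sub field_index {
    my ($name) = @_;
    exists $FIELDS{$name} or die qq{No such field "$name"\n};
    return $FIELDS{$name};
}

# Object built by hand: slots 1..3 alias the hash values, slot 0 left empty.
my %h = (a => 'A', b => 'B', c => 'C');
my $slots = sub { \@_ }->(@h{qw(a b c)});
unshift @$slots, undef;
$h{""} = $slots;
my $x = \%h;

my $foo = $x->{""};                                      # my Class $foo = $x;
my $bar = $foo->[ field_index('a') ];                    # $foo->{a}
my ($baz, $quux) = @{$foo}[ field_index('b'), field_index('c') ];
my $rejected = !eval { field_index('d'); 1 };            # $foo->{"d"}
print "$bar $baz $quux; d rejected: ", ($rejected ? 'yes' : 'no'), "\n";
```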

To make assignment *from* a typed lexical work right, the zero index of
the arrayref would point back to the hashref (it would need to be a weak
reference, because otherwise the circularity would prevent GC). At
compile time, translations such as the following would be made...

So code like this:
my Class $foo = $x; # $x is a fields.pm/Class object
my $bar = $foo; # $bar should now be the same as $x.
quux($foo);
becomes code like this:
my $foo = $x->{""}; # $foo is now an arrayref.
my $bar = $foo->[0]; # but its first element is the hashref.
quux($foo->[0]);
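A small sketch of why slot 0 has to be a weak reference. The `build` helper and the throwaway `Foo` class with a counting `DESTROY` are illustrative assumptions, not part of the proposal. With a strong back-reference the hash and its slot array keep each other alive; with a weak one the object is freed when it goes out of scope, and `$foo->[0]` still recovers the hashref while the object is alive.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

my $freed = 0;
{ package Foo; sub DESTROY { $freed++ } }   # count objects actually freed

# Hypothetical builder: the hash holds its slot array under "", and slot 0
# of that array points back at the hash, forming a cycle.
sub build {
    my ($weak) = @_;
    my $self  = bless {}, 'Foo';
    my $slots = [$self];
    weaken($slots->[0]) if $weak;
    $self->{""} = $slots;
    return $self;
}

{ my $obj = build(0); }      # strong back-reference: the cycle is never freed
my $after_strong = $freed;
{ my $obj = build(1); }      # weak back-reference: freed at end of scope
my $after_weak = $freed;
print "freed: $after_strong after strong, $after_weak after weak\n";

# The translated code from above: $foo->[0] recovers the object itself.
my $x   = build(1);
my $foo = $x->{""};
my $bar = $foo->[0];
print(($bar == $x) ? "same object\n" : "different object\n");
```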

Notes:

Note that the *only* magic that is occurring is the aliasing between
array elements and hash elements, and the compile-time translation.

Any time that a fields.pm object is used by a class that doesn't know
its nature, it is no slower than an ordinary (restricted) hashref,
because it *is* an ordinary (restricted) hashref.

As a result, we can get all the speed-up of the old psuedohashes, plus
the advantage of compile-time checking of keys, plus the advantage of it
being a restricted hash, and thus having run-time checking of keys in
places where compile-time checking isn't available, *without* the kinds
of penalties that the old psuedohashes had.

Caveats:

1/ fields::new will *need* to be done in C/XS, since there's no way
in pure perl to do the aliasing needed.
2/ the speed penalty of code translation at compile time.
3/ the cost of performing ->{""} and ->[0] in some places.
4/ If you do
my Class $foo = ...;
quux($foo);
sub quux { $_[0] = ... }
then it won't work as you expect, since it's translated to:
my $foo = (...)->{""};
quux($foo->[0]);
sub quux { $_[0] = ... }
which breaks the aliasing that used to exist between
$foo->[$Class::FIELDS{$k}] and $foo->[0]{$k},
since $foo->[0] is no longer what it once was.
5/ keys(), values(), and each() on the hashref will include ""
and its value.
6/ Serialization via Data::Dumper won't work; or rather, it might,
but deserialization surely won't.
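To make caveat 5 concrete, a sketch of what code iterating over such an object would see. The empty `[]` under "" is only a stand-in for the aliased slot array. Filtering keys is easy once the special key is known, but a filter on values() has no reliable way to tell the slot array from a field that happens to hold an arrayref.

```perl
use strict;
use warnings;

my %obj;
@obj{qw(a b c)} = (1, [2], 3);      # field b happens to hold an arrayref
$obj{""} = [];                       # stand-in for the aliased slot array

my @all_keys   = sort keys %obj;               # "", "a", "b", "c"
my @field_keys = grep { $_ ne "" } @all_keys;  # easy once "" is known

# No generic filter exists for values(): dropping arrayrefs also drops b.
my @naive = grep { ref($_) ne 'ARRAY' } values %obj;
print scalar(@all_keys), " keys, ", scalar(@field_keys), " fields, ",
      scalar(@naive), " values kept by the naive filter\n";
```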

I don't think 1 is a problem.

2 should be minor (since it's little more than was done with the old
fields/pseudohash implementation).

I *hope* that 3 is minor (only benchmarks will tell).

4 can be "fixed" by documenting it as not working -- since assignment
into @_ is so rare, most users won't know or care that it wouldn't work.

I'm not sure whether or not 5 is really a problem -- looking through all
of an object's data is a violation of encapsulation.

6 probably can't be fixed. Well, maybe it can, if we create a way for
pure-Perl code to alias array elements and hash elements to the same
SVs, and make Data::Dumper output some code which uses that. Otherwise,
we can tell users that they ought to use Storable.pm instead (which I
*think* should work.)

Is there anything that I'm overlooking?

--
tr/`4/ /d, print "@{[map --$| ? ucfirst lc : lc, split]},\n" for
pack 'u', pack 'H*', 'ab5cf4021bafd28972030972b00a218eb9720000';

Brian Ingerson

Sep 7, 2002, 1:58:39 AM

to Benjamin Goldberg, p5p

On 07/09/02 00:06 -0400, Benjamin Goldberg wrote:
> For example, if you do:
> package Foo;
> use fields qw(a b c);
> my $x = Foo->fields::new;
> Then that means:
> \ $x->{""}[1] == \ $x->{"a"}
> \ $x->{""}[2] == \ $x->{"b"}
> \ $x->{""}[3] == \ $x->{"c"}
>

> Caveats:

>
> 6/ Serialization via Data::Dumper won't work; or rather, it might,
> but deserialization surely won't.
>

> 6 probably can't be fixed. Well, maybe if we create a way for pure-perl
> code to create an alias of array elements and hash elements to the same
> SVs, and making Data::Dumper output some code which uses that. We can
> tell users that they ought to use Storable.pm instead (which I *think*
> should work.)

YAML.pm allows a class to define how it (de)serializes itself. The class
defines a yaml_load() and yaml_dump() method that YAML will invoke
accordingly. Typically these representations consist of a hash mapping
augmented by a type URI. So, for the example above,

print Dump($x);

would produce:

--- !perl/Foo
a: foo
b: bar
c: baz

Upon deserialization, YAML would know to pass this hash to Foo::yaml_load
which would be imported from fields.pm. This transform could then faithfully
reproduce the desired object.
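A sketch of the kind of dump/load pair a class could supply, independent of YAML.pm itself. `to_plain`, `from_plain` and `@FOO_FIELDS` are hypothetical names; the hook names YAML.pm actually invokes are the ones described above and are not used here. The dump side exposes only the real fields, and the load side rebuilds the aliased "" slot, here via the `sub { \@_ }` aliasing idiom.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

# Hypothetical field order for class Foo (fields.pm would know this).
my @FOO_FIELDS = qw(a b c);

# Dump side: expose only the real fields, never the "" slot.
sub to_plain {
    my ($obj) = @_;
    return +{ map { $_ => $obj->{$_} } @FOO_FIELDS };
}

# Load side: rebuild the aliased "" slot from a plain mapping.
sub from_plain {
    my ($plain) = @_;
    my %self  = %$plain;
    my $slots = sub { \@_ }->(@self{@FOO_FIELDS});   # slots alias values
    unshift @$slots, undef;                          # slot 0: back-reference
    $self{""} = $slots;
    my $obj = bless \%self, 'Foo';
    $slots->[0] = $obj;
    weaken($slots->[0]);
    return $obj;
}

my $copy = from_plain({ a => 'foo', b => 'bar', c => 'baz' });
print join(',', map { "$_=$copy->{$_}" } sort keys %{ to_plain($copy) }), "\n";
```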

Some benefits are:
- human readable serialization
- fields dumped in same order as defined
- the messy implementation details of the $x->{""} field are hidden but
implied by the serialization. clean!

Cheers, Brian

PS:

> perl -MYAML -e 'print Dump *STDOUT'
--- #YAML:1.0 !perl/glob:
PACKAGE: main
NAME: STDOUT
IO:
  fileno: 1
  stat:
    device: 32554660
    inode: 32718212
    mode: 8630
    links: 1
    uid: 0
    gid: 0
    rdev: 67108867
    size: 0
    atime: 1031378092
    mtime: 1031378092
    ctime: 1031378092
    blksize: 131072
    blocks: 0
  tell: 14869

H.Merijn Brand

Sep 7, 2002, 2:35:02 AM

to Benjamin Goldberg, Perl 5 Porters

On Sat 07 Sep 2002 06:06, Benjamin Goldberg <gol...@earthlink.net> wrote:
> Background:
>
> [most everyone here on p5p knows this, but I'll restate it for
> anyone who doesn't, and to make sure I've got it clear in my own head]:
>
> In the old fields.pm implementation, the kind of object returned by
> 'fields::new' was a pseudohash, an arrayref which could be accessed as
> if it were a hashref; it was efficient when accessed via a typed
> lexical, as in "my Class $foo", in which case the hash accesses were
> translated (at compile time) into array accesses, but everywhere else,
> it was slower than a normal hash, and in addition, it slowed down *all*
> normal hash accesses, everywhere in the program.
>
> The Idea:
>
> Let fields.pm return a normal (but restricted) hash, with a special
> element under the key of "" (the empty string). In the value for that
> key, have an arrayref. The elements of that arrayref are then *aliases*
> to the values of the hash.

I don't like the idea of "" as a SuperKey.
I think that most of my hashes use it as a valid key; using it for special
purposes is not good. How about "_fields"?

The idea is fine.

--
H.Merijn Brand Amsterdam Perl Mongers (http://amsterdam.pm.org/)
using perl-5.6.1, 5.8.0 & 633 on HP-UX 10.20 & 11.00, AIX 4.2, AIX 4.3,
WinNT 4, Win2K pro & WinCE 2.11. Smoking perl CORE: smo...@perl.org
http://archives.develooper.com/daily...@perl.org/ per...@perl.org
send smoke reports to: smokers...@perl.org, QA: http://qa.perl.org

Michael G Schwern

Sep 7, 2002, 4:31:12 AM

to Benjamin Goldberg, p5p

Instead of adding hackery to replace the hackery we just removed, instead
put effort into making hashes faster in the general case. Do some
profiling, find some hotspots, speed them up. The basics.

More specific objections, many are the same as with a pseudo-hash.

- Adds a lot of code complexity for very little gain. (It makes typed,
lexical, constant key hashes a smidge faster).

- What happens when I want to assign a value with a key of an empty string?

- What about multiple inheritance?

- Throwing away the meaning of a lexically typed hash ref for a dubious
performance hack. Part of the reason for removing pseudo-hashes was
to free up "my Dog $spot" for more interesting things.

On Sat, Sep 07, 2002 at 12:06:07AM -0400, Benjamin Goldberg wrote:
> 5/ keys(), values(), and each() on the hashref will include ""
> and its value.

This one can be hacked around internally.

> 4 can be "fixed" by documenting it as not working -- since assignment
> into @_ is so rare, most users won't know or care that it wouldn't work.

# common idiom for a simple accessor.
sub little_accessor { return $_[0]->{key} = $_[1] }

Will this cause a problem?

> I'm not sure whether or not 5 is really a problem -- looking through all
> of an object's data is a violation of encapsulation.

Pseudohashes, restricted hashes and fields are *not just for objects*. They
also serve the purpose of a C-style struct.

--

Michael G. Schwern <sch...@pobox.com> http://www.pobox.com/~schwern/
Perl Quality Assurance <per...@perl.org> Kwalitee Is Job One
"I didn't stop to ask, Pooh. Even at the very bottom of the river I
didn't stop to say to myself, 'Is this a Hearty Joke, or is it the
Merest Accident?' I just floated to the surface, and said to myself,
'It's wet.'"

Benjamin Goldberg

Sep 7, 2002, 10:58:09 PM

to Michael G Schwern, p5p

Michael G Schwern wrote:
>
> Instead of adding hackery to replace the hackery we just removed,
> instead put effort into making hashes faster in the general case. Do
> some profiling, find some hotspots, speed them up. The basics.
>
> More specific objections, many are the same as with a pseudo-hash.
>
> - Adds a lot of code complexity for very little gain. (It makes
> typed, lexical, constant key hashes a smidge faster).

True enough, but pseudohashes did the same, and at the expense of
*all* hash access. This scheme couldn't possibly slow down other hash access.

> - What happens when I want to assign a value with a key of an empty
> string?

Umm, err... The key for the arrayref doesn't need to be "", that was
just an idea. How about "_fields" or something? In fact, it could even
be determined by $SomeClass::FIELDS_KEY.

> - What about multiple inheritance?

Same problems here as the old pseudohash implementation of fields.pm
had.

There is a rather obvious but (at first glance) costly (in size) way of
working around this.

If the arrayref data were stored in "_fields::$SomeClassHere", for all
classes, then each fields.pm object would contain one arrayref for each
class it inherits from.

package A; use fields "x";
package B; use base "A"; use fields "y";
package C; use base "A"; use fields "z";
package D; use base "B", "C";

The object returned by D->fields::new would have:
$self = { x => undef, y=> undef, z => undef,
_fields::A => [ $self, alias_to($self->{x}), ],
_fields::B => [ $self,
alias_to($self->{x}), alias_to($self->{y}), ],
_fields::C => [ $self,
alias_to($self->{x}), alias_to($self->{z}), ],
_fields::D => [ $self, alias_to($self->{x}),
alias_to($self->{y}), alias_to($self->{z}), ],
}

package main;
my D $D = D->fields::new;
@{$D}{qw(x y z)} = (4, 5, 6);

This part of the code would translate to:
my $D = D->fields::new->{_fields::D};
@{$D}[1, 2, 3] = (4, 5, 6);

This produces an absurd overhead in terms of size, unless the
number of field keys in the classes is quite large compared to the
number of classes involved in the inheritance.

We can reduce this size cost (at the cost of compile-time complexity)
through the observation that we could use the same arrayref for
_fields::A, _fields::B, and _fields::D ... this way, we would only need
two arrayrefs, one for (A,B,D), and one for C.

And of course, since the inheritance structure is known at compile time,
we only need two extra keys:
$self = { x => undef, y=> undef, z => undef,
_fields::A => [ $self, alias_to($self->{x}),
alias_to($self->{y}), alias_to($self->{z}), ],
_fields::C => [ $self,
alias_to($self->{x}), alias_to($self->{z}), ],
}

So code like:
package main;
my A $a = D->fields::new;
my B $b = D->fields::new;
my C $c = D->fields::new;
my D $d = D->fields::new;
$a->{x} = $b->{y} = $c->{z} = $d->{z};
Becomes:
my $a = D->fields::new->{_fields::A};
my $b = D->fields::new->{_fields::A};
my $c = D->fields::new->{_fields::C};
my $d = D->fields::new->{_fields::A};
$a->[1] = $b->[2] = $c->[2] = $d->[3];

If single-inheritance is used, we still only have one special hashkey.
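The sharing rule can be sketched as a small layout computation. Everything here is an assumption for illustration: the parent and field tables are hard-coded, inherited fields are ordered depth-first and left to right, and a class reuses another class's array when its own field order is a prefix of that array's order, so every field it knows sits at the same index.

```perl
use strict;
use warnings;

# Hypothetical tables for the example: A; B and C inherit A; D inherits
# B then C. %OWN lists the fields each class declares itself.
my %PARENTS = (A => [], B => ['A'], C => ['A'], D => ['B', 'C']);
my %OWN     = (A => ['x'], B => ['y'], C => ['z'], D => []);

# Field order for a class: inherited fields first (depth-first, left to
# right, each name once), then the class's own fields.
sub layout {
    my ($class) = @_;
    my (@order, %seen);
    for my $name ((map { @{ layout($_) } } @{ $PARENTS{$class} }),
                  @{ $OWN{$class} }) {
        push @order, $name unless $seen{$name}++;
    }
    return \@order;
}

# True if every field of $short sits at the same index in $long.
sub is_prefix {
    my ($short, $long) = @_;
    return 0 if @$short > @$long;
    for my $i (0 .. $#$short) {
        return 0 unless $short->[$i] eq $long->[$i];
    }
    return 1;
}

# Greedy grouping, longest layouts first; each group would need one
# special key in the object.
my @groups;
for my $class (sort { @{ layout($b) } <=> @{ layout($a) } || $a cmp $b }
               keys %PARENTS) {
    my $l   = layout($class);
    my ($g) = grep { is_prefix($l, $_->{layout}) } @groups;
    if ($g) { push @{ $g->{classes} }, $class }
    else    { push @groups, { layout => $l, classes => [$class] } }
}

printf "%s share [%s]\n", join(',', sort @{ $_->{classes} }),
    join(',', @{ $_->{layout} }) for @groups;
```

For this hierarchy the grouping comes out as (A,B,D) sharing one array and C on its own, matching the two-key layout above.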

> - Throwing away the meaning of a lexically typed hash ref for a
> dubious performance hack. Part of the reason for removing
> pseudo-hashes was to free up "my Dog $spot" for more interesting
> things.

I can't really answer this. I thought that pseudohashes were being
thrown out to get rid of PVHV stuff -- run-time checks for whether an
arrayref was pretending to be a hashref.

> On Sat, Sep 07, 2002 at 12:06:07AM -0400, Benjamin Goldberg wrote:
> > 5/ keys(), values(), and each() on the hashref will include ""
> > and its value.
>
> This one can be hacked around internally.
>
> > 4 can be "fixed" by documenting it as not working -- since
> > assignment into @_ is so rare, most users won't know or care that it
> > wouldn't work.
>
> # common idiom for a simple accessor.
> sub little_accessor { return $_[0]->{key} = $_[1] }
>
> Will this cause a problem?

No, this would cause no problem. You aren't actually *replacing* $_[0]
with something else. What *would* cause a problem would be something
like:
sub reset { $_[0] = __PACKAGE__->new(); }
or:
sub mutate { $_[0] = "Whee!"; }

In other words, any time that you use call-by-reference semantics of @_
instead of the usual call-by-value semantics, it can cause problems.

Since assigning into the elements of @_ is *already* discouraged (it's a
kind of action-at-a-distance, which is generally discouraged), breaking
it for typed lexicals shouldn't be a problem if it's documented.
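A sketch contrasting the two cases, with the typed-lexical translation done by hand (`$foo` holds the slot array and the call passes `$foo->[0]`); the setup is an illustrative assumption. `little_accessor` only writes through the reference, so it behaves the same either way; `mutate` assigns to `$_[0]`, which now aliases slot 0 of the array rather than the caller's variable.

```perl
use strict;
use warnings;

# The two subs from the discussion.
sub little_accessor { return $_[0]->{key} = $_[1] }
sub mutate          { $_[0] = "Whee!" }

# Untyped code: $_[0] aliases the caller's own variable.
my $plain = { key => 0 };
mutate($plain);                  # $plain itself becomes "Whee!"

# Hand-translated typed-lexical code: the caller holds the slot array
# and passes $foo->[0], so $_[0] aliases slot 0 instead.
my $obj = { key => 0 };
my $foo = [$obj];                # slot 0: back-reference to the hash
little_accessor($foo->[0], 42);  # writes through the ref: works as before
mutate($foo->[0]);               # clobbers slot 0, not the caller's variable

print "untyped: $plain; key: $obj->{key}; slot 0: $foo->[0]\n";
```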

> > I'm not sure whether or not 5 is really a problem -- looking through
> > all of an object's data is a violation of encapsulation.
>
> Pseudohashes, restricted hashes and fields are *not just for objects*.
> They also serve the purpose of a C-style struct.

True enough -- but my idea is aimed at *typed lexicals*. Can you have a
typed lexical without a class name? And if you have a class name,
doesn't that essentially make it an object? Ok, maybe it doesn't make
it an object, but will an extra key cause all that much of a problem?

And since when can you perform keys() on a C struct? It's not as if you
can do that kind of introspection in C, eh?

And C structs don't allow multiple inheritance; fields.pm implemented as
I've suggested (with the modification made in this message) can.

Ronald J Kimball

Sep 8, 2002, 11:23:50 PM

to Benjamin Goldberg, Michael G Schwern, p5p

On Sat, Sep 07, 2002 at 10:58:09PM -0400, Benjamin Goldberg wrote:
> Michael G Schwern wrote:
> >
> > Instead of adding hackery to replace the hackery we just removed,
> > instead put effort into making hashes faster in the general case. Do
> > some profiling, find some hotspots, speed them up. The basics.
>

> > - What happens when I want to assign a value with a key of an empty
> > string?
>
> Umm, err... The key for the arrayref doesn't need to be "", that was
> just an idea. How about "_fields" or something? In fact, it could even
> be determined by $SomeClass::FIELDS_KEY.

I object to any approach which uses a special hash key to hold the field
information. As Schwern said, this is just replacing old hackery with new.
If I have to know the special key so that I can handle it properly while
using each() or keys(), then restricted hashes are not useful. Never mind
that values() becomes totally unusable, as there is no generic way to know
which value is the special one.

The one nice thing about the old pseudo-hashes was that hash access behaved
like a normal hash reference, aside from the restriction on keys. I think
this transparency should be preserved in the new implementation of
restricted hashes, whatever it may be.

Ronald

Michael G Schwern

Sep 13, 2002, 11:34:39 PM

to Benjamin Goldberg, p5p

I think before you continue any further on this you might want to look back
through the p5p archives a few years ago on previous attempts to fix
or replace pseudo-hashes to avoid making the same mistakes.

On Sat, Sep 07, 2002 at 10:58:09PM -0400, Benjamin Goldberg wrote:

> > More specific objections, many are the same as with a pseudo-hash.
> >
> > - Adds a lot of code complexity for very little gain. (It makes
> > typed, lexical, constant key hashes a smidge faster).
>
> True enough, but -- pseudohashes did the same, but at the expense of
> *all* hash access. This couldn't possibly slow down other hash access.

We're talking about internal code complexity.

> > - What happens when I want to assign a value with a key of an empty
> > string?
>
> Umm, err... The key for the arrayref doesn't need to be "", that was
> just an idea. How about "_fields" or something? In fact, it could even
> be determined by $SomeClass::FIELDS_KEY.

Ok, what happens when I want to assign a value with a key of X where X is
being used for the magical key which contains the array?

> > - What about multiple inheritance?
>
> Same problems here as the old pseudohash implementation of fields.pm
> had.

One of the reasons pseudo-hashes were thrown out was because they didn't
play with multiple inheritance. If you put something in to replace them,
it'll have to work with MI.

> There is a rather obvious but (at first glance) costly (in size) way of
> working around this.

I think with all this extra baggage below you've already wildly blown past
the benefits of using an array over a hash. The margin is already very
slim.

Worse, this doesn't solve the problem. Observe...

> If the arrayref data were stored in "_fields::$SomeClassHere", for all
> classes, then each fields.pm object would contain one arrayref for each
> class it inherits from.
>
> package A; use fields "x";
> package B; use base "A"; use fields "y";
> package C; use base "A"; use fields "z";
> package D; use base "B", "C";
>
> The object returned by D->fields::new would have:
> $self = { x => undef, y=> undef, z => undef,
> _fields::A => [ $self, alias_to($self->{x}), ],
> _fields::B => [ $self,
> alias_to($self->{x}), alias_to($self->{y}), ],
> _fields::C => [ $self,
> alias_to($self->{x}), alias_to($self->{z}), ],
> _fields::D => [ $self, alias_to($self->{x}),
> alias_to($self->{y}), alias_to($self->{z}), ],
> }
>
> package main;
> my D $D = D->fields::new;
> @{$D}{qw(x y z)} = (4, 5, 6);
>
> This part of the code would translate to:
> my $D = D->fields::new->{_fields::D};
> @{$D}[1, 2, 3] = (4, 5, 6);

<snip>

> So code like:
> package main;
> my A $a = D->fields::new;
> my B $b = D->fields::new;
> my C $c = D->fields::new;
> my D $d = D->fields::new;
> $a->{x} = $b->{y} = $c->{z} = $d->{z};
> Becomes:
> my $a = D->fields::new->{_fields::A};
> my $b = D->fields::new->{_fields::A};
> my $c = D->fields::new->{_fields::C};
> my $d = D->fields::new->{_fields::A};
> $a->[1] = $b->[2] = $c->[2] = $d->[3];

Let me get this straight:

Fields of A: x => 1
Fields of B: x => 1, y => 2
Fields of C: x => 1, z => 2
Fields of D: x => 1, y => 2, z => 3

(I've forgotten what's supposed to be in 0).

package C;
sub foo {
my C $self = shift;
return $self->{z}; # return $self->[2];
}

package main;
my D $D = D->new;
@{$D}{qw(x y z)} = (4, 5, 6); # @{$D}[1,2,3] = (4,5,6);

print $D->foo;

z is in position 3 for D but it's in position 2 for C. So you get the wrong
value when $D is used as a C object. This is the same problem as with
pseudo-hashes.
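The mismatch can be reproduced with plain arrays. The `%INDEX` table is a hypothetical record of the indices each class's code would be compiled against, assuming a single array laid out for D, as in the example above.

```perl
use strict;
use warnings;

# Hypothetical index tables: the slots each class's code is compiled
# against (slot 0 reserved, as in the proposal).
my %INDEX = (
    C => { x => 1, z => 2 },
    D => { x => 1, y => 2, z => 3 },
);

# A single array laid out for D, filled by @{$D}{qw(x y z)} = (4, 5, 6).
my $D = [undef];
@{$D}[ @{ $INDEX{D} }{qw(x y z)} ] = (4, 5, 6);

# C::foo was compiled to read slot 2 for $self->{z}; in D's layout that
# slot holds y.
my $seen_by_C = $D->[ $INDEX{C}{z} ];
my $actual_z  = $D->[ $INDEX{D}{z} ];
print "C::foo sees $seen_by_C, but z is $actual_z\n";
```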

> > - Throwing away the meaning of a lexically typed hash ref for a
> > dubious performance hack. Part of the reason for removing
> > pseudo-hashes was to free up "my Dog $spot" for more interesting
> > things.
>
> I can't really answer this. I thought that pseudohashes were being
> thrown out to get rid of PVHV stuff -- run-time checks for whether an
> arrayref was pretending to be a hashref.

Pseudo-hashes are being thrown out for a great number of reasons, death of a
thousand cuts and all that. Please read "Pseudo-hashes must die" for a
summary. http://dev.perl.org/rfc/241.pod

> > > I'm not sure whether or not 5 is really a problem -- looking through
> > > all of an object's data is a violation of encapsulation.
> >
> > Pseudohashes, restricted hashes and fields are *not just for objects*.
> > They also serve the purpose of a C-style struct.
>
> True enough -- but my idea is aimed at *typed lexicals*. Can you have a
> typed lexical without a class name? And if you have a class name,
> doesn't that essentially make it an object? Ok, maybe it doesn't make
> it an object, but will an extra key cause all that much of a problem?

Yes, it causes a problem. Hashes are general use data structures. We've
just removed a data structure with a pile of caveats, don't replace it with
another pile of caveats.

Besides, we're already up to N+2 special hash keys where N is the number of
superclasses.

> And since when can you perform keys() on a C struct? It's not as if you
> can do that kind of introspection in C, eh?

This is not C, it's Perl. We can do whatever we like. Please don't read
too much into "C-style structs".

--

Michael G. Schwern <sch...@pobox.com> http://www.pobox.com/~schwern/
Perl Quality Assurance <per...@perl.org> Kwalitee Is Job One

There is nothing wrong. We have taken control of this sig file. We will
return it to you as soon as you are groovy.