Thinking Subtypes

4 views
Skip to first unread message

Henning Diedrich

unread,
Jan 18, 2011, 1:54:08 AM1/18/11
to lua table semantics
Hi folks,

some pre-dawn reflections about tables.

(I) The Key-Shift

It would appear to me that the main culprit for inconsistencies is the
key-shifting. That has to go.

I was looking at ways to speed tables up, because obviously I am
currently adding to the table core source and make it slower. Only to
a very small degree but it still we be bemoaned.

So what could be left out in turn, by virtue of those added internal
counters, and a clearer sub-typing of tables (into sets, arrays and
lists possibly): obviously that's the key-shifting. Apart from the
fact that to my mind it might be of use only in few situations, it's
the source of the major inconsistency of the table spec, as it is now.

And it's so glaring obviously a performance killer you almost overlook
it. But for all the table tests to test patches, that is where most of
the time goes into I think, the key-shifts. It might be the re-hashing
instead, I need to find that out. But I thought it's the keys which
are making my tests stall for seconds, pretty often. The potential to
save performance there could be enormous (again, depending on how much
of it is really rehashing, not key-shifting.) And that could be a
really nice thing if this turns out to be correct.

(II) Automatic Subtypes

In general, the key-shift appears to make not much sense, except if
the decision of what a table is used for is avoided entirely.

But in reality, it is never avoided. That decision is something quite
implicit, expressed by the way a table is used. It cannot actually be
avoided because by the way you use the table, you have already stated
your decision. It's just that Lua can't/doesn't know about it. Or does
not care to 'take note and remember'. That Python offers indices into
lists does not mean that's really used very much or is very necessary.
If you have a list, you will want to have insert, append, remove,
next. But put naively: do you actually really ever /need/ indices? If
you have indices, making use of insert() makes almost no sense in the
wider perspective of things. Any index you have saved anywhere
becomes /somewhat/ invalid, or at least got its meaning changed in
passing. This kind of implicit state changes is duly to be avoided
anyway, isn't it? Sure, you want to have an insert, for an array,
every now and then. But I am not sure if that isn't almost always a
sign for a bad implementation approach, whatever it is you are just
implementing?

For sets, shifting is meaningless and it's where the current Lua
implementation fails worst, with hard to predict overwrites now and
then.

So as transpires, I would propose thinking of
- tables as being either arrays, lists, or sets;
- allow insert() only for lists;
- allow no indices for lists; (!)
- allow only indices element N in arrays;
- allow for a predetermined size for arrays;
- which implicitly (!) brings 'holes' with it;

Also, I'd opt for
- throwing errors for stuff that is ambiguous, i.e. disallowed
according to the above list;
- keeping nils as they are, as non-first citizens but representation
for 'not there/not used/deleted/free'.

Is that consistent? It should be intuitive and in good accord to
common definitions of the terms. It should also allow for speed ups!

Can the subtypes (list, array, set) become effective automatically?
Can they be gleaned from how a table is used? And then be used to
detect 'illegal' use, with sufficiently clear newbie addressing
warnings/errors?

E.g.
- Setting the size, should make a table an array. Using non-N keys
after that should throw an error.
- Using non-N keys should make that table a set. Using insert after
that, should throw an error.
- Using insert might suffice to make a table a list. Using an index
(key) on it after that should throw an error.

That list could be thought through I'd propose and maybe we could even
agree on a result. It's trivial in nature, not even symmetric I'd
expect, but sufficiently intuitive, is it?

(III) Implementation

I usually seem to think closer to the core, it feels safer to me than
adding stuff. My aim is to make certain errors impossible. But that
may not be what is most useful. The above may work as a separate
library, or extension. I am not sure if it even needs any live
counters implemented to work. You'd simply have to adhere to a set of
functions once you decide for one. But for the automatism of subtypes,
that could only be done in the core.

If implemented as a core patch, it could be using a flag for a subtype
that rules the error conditions.

But it would appear more true to Lua to me, to assign metatables
instead, to make the tables duck-type object specializations of type
'Set', 'Array' or 'List' to base type 'Table'. That need never be made
explicit or stressing the notion of static types. The functions
implemented could be the same for every subtype, only some throw
errors instead of doing anything. And the act of tying the metatable
in would be the specialization, that would happen on the fly, as soon
as one decisive function is used on the table, as listed above.

This would leave the problem of direct setting t[k]. That could also
work via metatables, maybe something must be added to complement
__newindex?

I am not sure how the default metatables needed to give the behavior
of array, list and set respectively, fit into the picture.

Not so sure ... in all, patching without metatables may be easier,
cleaner, faster, more stable and less breakable ... but that
impression, I think, might turn out to be wrong.

Best,
Henning

Axel Kittenberger

unread,
Jan 18, 2011, 5:19:55 AM1/18/11
to lua-table...@googlegroups.com
> (I) The Key-Shift

inserting and removing is an O(1) for chained lists. Thats what a
sensible coder picks if he has a datastructure in which insert and
removes happen a lot. Or if the sorting is on the fly s/he picks a
tree so it becomes O(log(n)). Index operations on lists are expensive,
but they are still valid, so I would not forbit them. As I posted
before, if you look to some other high level languages, the beauty is
that you simply replace the basic data structure, which has exactly
the same interface, only the costs of operations differ.

> (II) Automatic Subtypes

What I've been pondering about in the lua hash+array way "the core
picks a data structure for you", is to enhance on this concept and
switch datastrucures on the fly. Like counting the index,
insert/remove operations and if it exceeds some level on the next
"rehash", dont rehash into a table but say into a chain. Additionally
some interface to give "hints" for the advanced coder could be done.
But maybe that idea is just too crazy.

> But I am not sure if that isn't almost always a
> sign for a bad implementation approach, whatever it is you are just
> implementing?

Indeed sometimes you have to do an insert once and index then a
thousend tims, so the array with insert/remove is perfect.

> Also, I'd opt for
> - throwing errors for stuff that is ambiguous, i.e. disallowed
> according to the above list;
> - keeping nils as they are, as non-first citizens but representation
> for 'not there/not used/deleted/free'.

All for it. I dislike the become a subtype by first use tough, since
that can as well be very confusing, especially if a vanilla table
might be "stamped" into this or that differently depending on which
subfunction it was first used. The precious newbie is easy going to
pull hair over that.

> (III) Implementation


>
> But it would appear more true to Lua to me, to assign metatables
> instead, to make the tables duck-type object specializations of type
> 'Set', 'Array' or 'List' to base type 'Table'. That need never be made
> explicit or stressing the notion of static types. The functions
> implemented could be the same for every subtype, only some throw
> errors instead of doing anything. And the act of tying the metatable
> in would be the specialization, that would happen on the fly, as soon
> as one decisive function is used on the table, as listed above.
>
> This would leave the problem of direct setting t[k]. That could also
> work via metatables, maybe something must be added to complement
> __newindex?

Likly, if used in large scale adding __read and __write is at more
efficient than proxy tables.
I thought about some intermediate hijacking, using the metatable
interface of the core, but
provide the user yet another metatable so s/he can make additional
logic to liking.

A idea I'm playing with is to add "__metatable" to metatable which is
returned when getmetatable(t) is called instead of the "true"
metatable. So in lua this could look like this:

t = {}
do
local proxymt = {}
local mt = { }
setmetatable(t, mt);
mt.__index = function(k)
if k < 0 then error("I dont want negative indici in there"); end
if proxymt.__index then proxymt.__index(k) end
end
mt.__setmetatable(p) = function(p) proxymt = p end
mt.__getmetatable(p) = function(p) return proxymt end
end

So the user could add transperently his logic for __index to it,
without affecting the demo logic here to not allow negative values.

Henning Diedrich

unread,
Jan 18, 2011, 6:37:55 AM1/18/11
to lua table semantics
On Jan 18, 11:19 am, Axel Kittenberger <axk...@gmail.com> wrote:
> Index operations on lists are expensive,
> but they are still valid, so I would not forbit them.

Forbidding is hardly a good strategy, granted. But then again, FP
'forbids' re-assigning variables and if you can gain a lot from 'not
allowing', because it causes huge penalties, then the question
remains, if it was really needed in the first place.

> you simply replace the basic data structure, which has exactly
> the same interface, only the costs of operations differ.

I agree that only this would be truly polymorphic. But it is exactly
what invites the "undefined" behaviors, isn't it.

> Like counting the index, insert/remove operations and if it exceeds
> some level on the next
> "rehash", dont rehash into a table but say into a chain.

Yes! I would venture to predict that this is what we will see in Lua 6
one day. The array/hash underneath is half way down that road.

So - shall we try that? It's not like we'd have to reinvent that
wheel.
http://freebsd.active-venture.com/FreeBSD-srctree/newsrc/sys/queue.h.html
We can use that and strip it down I guess.

>
> All for it. I dislike the become a subtype by first use tough, since
> that can as well be very confusing, especially if a vanilla table
> might be "stamped" into this or that differently depending on which
> subfunction it was first used. The precious newbie is easy going to
> pull hair over that.

How else? Explicit declaration?

> A idea I'm playing with is to add  "__metatable" to metatable which is
> returned when getmetatable(t) is called instead of the "true"
> metatable. So in lua this could look like this:

To add a 'type' table behind the actual metatable?

Best,
Henning
Reply all
Reply to author
Forward
0 new messages