typedef struct Table {
CommonHeader;
lu_byte flags; /* 1<<p means tagmethod(p) is not present */
lu_byte lsizenode; /* log2 of size of `node' array */
struct Table *metatable;
TValue *array; /* array part */
Node *node;
Node *lastfree; /* any free position is before this position */
GCObject *gclist;
int sizearray; /* size of `array' array */
} Table;
To get the table to behave properly as an array of Lua values is trivial
from C using a standard API function:
--
void lua_createtable (lua_State *L, int narr, int nrec);
Creates a new empty table and pushes it onto the stack. The new table
has space pre-allocated for narr array elements and nrec non-array
elements. This pre-allocation is useful when you know exactly how many
elements the table will have.
--
All one needs is a pair of functions in C but callable from Lua that
will construct an empty table this way, and return sizearray for #n.
In my opinion the functions table.insert and table.remove do not
apply to arrays.
For lists, the present Lua implementation is in fact perfect as long as
we behave ourselves. In particular, we get the right #. The moment that
we require an error message in the case of abuse, the difficulty starts.
The structure above will need an extra integer field n for the actual
list length. n=-1 can be used to indicate that the table is not a list.
A lot of overheads is incurred: every table access must be monitored to
check that the table is still a list and whether n needs to be modified.
I can't see Roberto agreeing to this.
We have a better chance of getting some error messages for the table
library itself, since (a) the operation is not in the Lua core (b) the
routines are O(n) so a test does not hurt performance. E.g.
1. insert or remove at a non-positive index
2. insert nil
On the other hand, both these operations could be legitimate in the
sense that the the programmer restores the list property a little later.
Dirk
I've been working through the code to see if sizearray (storing the
amount allocated) is always a power of 2.
Or if it would be easily possibele to require that, in that case one
could store the size of the list in sizearray knowing
for realloc()s that it is 2^(trunc(log(n)+1). But I see that the
checking is CPU eating and contrary to the Lua percs of being the
fastest interpreted language around.
The behaving ourselves is absolutely true and there is no problem for
a well behaved coder. Its merly against the newbie friendlyness of
Lua, a language which does accept some (albeit little) performance
downs by indexing with 1. And a nasty source for bugs in case one
stores a nil into a list by accident.
> In my opinion the functions table.insert and table.remove do not
> apply to arrays.
There we are at the basic semantics. What are arrays and what are
lists? For me in Lua its the same. One can argue what would be the
right thing for table.insert/remove. Maybe I should analyse the old
Lua traffic to realise what was wrong with setn and getn.
Since there was some argument on the main list accusing me wanting to
insert a new basic type. For me a list is a table fullfilling some
requirements, be that whatever. Maybe I'm naive about it, but for me
most problems would go away if the list would store its length as a
seperate variable - in Lua. insert() and remove() would change the
length. Setting a non nill value at # would increaes size, but and
maybe this is the biggest caveat, setting nil at # would keep the
current size with trailing nils.
That would be a 'pure' function. much beloved of functional programmers.
To my mind, this is a great thing to have in a _library_.
The functional description of Lua tables is pretty short, although it
has a few 'buts' in it that make it less than elegant. It should not
get much longer.
Adding stuff to the core is incredibly hard, because there are both
elegance and performance constraints.
I must say I would like table.insert() to be stricter. But then (as
Mark Hamburg says) it should be a separate function called
array.insert, etc.
There is an inbetween state between 'prototype in Lua' and 'extend the
core' which is 'write a C extension'. This can often be just as
efficient and much easier to sell than a patch to the core.
steve d.
list.insert, please.
Lists and arrays both contain elements that can be numbered from
1 to n, and Python does a repectable but not great job of ignoring
the distinction. But distinction there is.
list: no holes; when an item is removed, the gap is closed; when
an item is inserted, some other elements move up; the number
of an item depends on what has happened to other elements;
# means the actual number of elements present and is variable.
Already possible with the present table type, except that
checking list integrity can't be done efficiently.
The Lua table library should be called "list" and at the very
least have different names for the append and insert functions.
array: holes allowed; when an element is removed, there is a hole;
elements can be replaced but not inserted; the number of an item
does not depend on what has happened to the other elements;
# means the maximum possible number of elements and is constant.
Possible with a C extension.
tuple: Pythonese for an immutable array. Do we really need it?
set: only the keys matter, values anything except false; # means the
number of keys. I haven't thought hard about the implementation
but suspect it is harder than it looks (see paragraph on core
patching below).
queue, stack, deque: Essentially the same as list, but not allowing
inserts and removes only at the ends. I suspect these could all
be implemented efficiently with little trouble.
> There is an inbetween state between 'prototype in Lua' and 'extend the
> core' which is 'write a C extension'. This can often be just as
> efficient and much easier to sell than a patch to the core.
>
I spent a whole day trying to see how much the core would needed to
be patched to track the "pristine" property, and was appalled to find
that the lua interpreter itself does not make any use whatsoever of
lua_setfield, which is where I hoped a small change would suffice.
One would need patches to many routines, including (ouch) the virtual
machine.
Yes, if it could be as simple as writing C code to generate xtable.so
so that require"xtable" adds the various table specializations, that
would be grand. We'll have to devote some thought to a uniform API
for them.
Dirk
list{"a","b","c"}
array{"a", "b", "c"}
or
set{"a", "b", "c"}
So instead of patching the core, why not use Luas metatable facility
for this customs and to track the "prestine" flag? Or would this be
too slow?
BTW: Should we write code at some future moment, Henning suggested a
great project name "Project Vanity" :-)
You can view Hennings patch for the % count (on the list a little
ago). He inserted a little count up/down code. Its actually one Macro
that does the job.
Kind regards, Axel
This should be a first step. If we feel it is too slow, then go to C.
It will be easier with a working Lua prototype than from scratch.
> BTW: Should we write code at some future moment, Henning suggested a
> great project name "Project Vanity" :-)
If we read lua-l only twice daily, and ration our use of the Reply
button, we may find that a surprising amount of programming time has
miraculously become available ;-)
Dirk
Absolutely, sir! We can gather here to discuss our mailing-list
addiction, without fear of upsetting the bears.
Correct functionality is the important step and pure Lua can do the
job. If it's too slow, then sure we optimize, but not before.
Personally, I tend to express my thoughts in code, and then have to
struggle to find the human words to describe what I have just done ;)
steve d.
PS Proxies are pretty subtle and it took me a long time before I said 'aha!'.
Why not 42?
Dirk
Is it Lua 5.1 ? __len only works in 5.2 for tables.
In 5.1, you have to do the newproxy() trick to get __len to work.
steve d.
I think the authors were a bit embarrassed about it; it appears they
will be taking it out for final Lua 5.2. But other hidden features
like 'frontier patterns' got promoted to full existence...
steve d.
I'm thinking of a border that we balance at. On the one side, its so
coding classics that the coder as pick the right datastructure for the
right job. Array, (Double) Linked Lists, Hashtable, Tree. The beauty
of modern encapsulated design (objectoriented or not) is that you can
easily exchange the basic structure at creation and the other
interface is untouchted. Only different operation will to get
different costs.
Lua on the other hand has two arguments it wants to remove this
decission for simplicty and to some degree smallness, You have the
basic core that makes the decission "Array" or "Hashtable" for you, on
the fly. And this brings the problems. Other argument I not completly
buy in is the "table is the basic building block", implement the other
structures with that.Given the language as given, this is valid, but
if not one can fairly ask "why this native and build other with this".
One could just as well only provide Arrays and tell the user to
implement Hashtables with them. Just an example, not that I would
propose that or consider it a good idea.
With Lua 5.2 having at the first time a full metatable interface we
could go along with the border. A table is still a table, but its
metatable defines basic functions. Then we have the argument for
stabilty, clearness and defensive programming. It has its advantages
to declare at creation time "This is going to be a list with no nil
values in it", and have the code fail if it happens never the less. If
you have the "all in one" thing, you never know what the coder wanted,
and this guesswork is what brings the flaws.
> BUT at the same time you are entirely correct in my eyes and
> if looked at it like this, the question might become, if the wrong,
> too convenient allowance of insert and remove on arrays is the
> problem!
>
> Because those actually belong to lists, not indexed arrays.
They might apply to both, but behave differently if you try to insert
a nil. A list should error() then.
> Do I remember correctly that the current implementation of
> tables, when-used-for-strict-arrays is O(1) for inserts though?
Do you mean in the sense of append? If not its always at least O(n-p),
if all the value are in the array part. If they are in the hashpart
calculating # is already O(log(n))
> Like, n? I think the approach was on the one hand very pragmatic but
> not
> o.k. for core level. Maybe Lua evolved like this all the time. But
> most of
> all, if you have this sort of native n-variable that is thought of
> not-there in a way, for strict arrays, then that's of course a strong
> argument against implementing a 'true counter' for table size.
> But it's so wrong and down the wrong road, it must be protested.
Why was it not ok for core level? In C you always have to store the
size of an array as well in a seperate variable.
> The proposed 'tru count' $ operator is exactly that variable, as $t,
> if you want. (And I agree any day that it should probably better be
> len(t) or something other than '$').
I like the idea of tru count, as in my project (Lsyncd) I have some
places I have to keep explicit counters for numbers of items in the
table. But IMHO it is not a 100% statisfactory replacement for # since
storing for example strings and a list/array in the same table is
pretty valid and tru count would not return what you expected.
Kind regards, Axel
> Bears the question of 'admitting' to 'subtypes' of tables?
>
> An array, as we are talking about it, is a subtype, to which certain
> functions should and others should not apply?
>
> I am still not for it.
>
The only way to do object-oriented programming in Lua by
specializing tables.
> BUT at the same time you are entirely correct in my eyes and
> if looked at it like this, the question might become, if the wrong,
> too convenient allowance of insert and remove on arrays is the
> problem!
>
> Because those actually belong to lists, not indexed arrays.
>
Indeed.
> Do I remember correctly that the current implementation of
> tables, when-used-for-strict-arrays is O(1) for inserts though?
>
No, it is not. No matter how strict the array, if it has 1000
elements and I insert something at position 2, 999 elements are
going to be moved. The move may well be faster if everything is
in the "array part", but it is still O(n).
> > There we are at the basic semantics. What are arrays and what are
> > lists? For me in Lua its the same.
>
You're in good company. The reference manual says: "Most functions
in the table library assume that the table represents an array or a list."
What the table library actually does is to implement lists as arrays
without holes. So insert/remove is O(n), whereas in a true list
implementation, they would be O(1).
Dirk
There are two ways we can go, one is a .so library you can
require/load against an existing core for added functionality. The
other might indeed be a "lua mostly bullet proof" dialect. Even
without a large userbase it could be nice for some proof-of-concept
experiments. As far I see it, the issue why we are discussing this is
that. Lua has 3 design goals it trie|s/d to achieve:
* simplicity for the non-pro coder (arrays with 1, no += operator, no
= chaining, etc). It tries to look non-frightening
* speed
* smallness
I got the feeling as Lua evolved it more and more dropped on the
simplicty goal. Exactly # not being newbie-friendly at all.
> Yes, this I can absolutely subscribe to. But that count operator
> patch could not be done as C extension really, without complicating
> things further at the user level. Which, to me, has priority to be
> as simple as possible.
There are two ways, instead of changing the grammer and introducing a
count variable to table, you could add a "table.count" function that
returns the sum of the array and the hash part (calculation viewable
in rehash()), this one can be linked against an existing core without
changing it.
I envision some mechanics where we could hijack the metatable
functionality for tables to add additional functionality but yet let
the user provide yet other metatables to his liking.
From this a larger goal might be to get the core to open the table
implementation to thirdparty libraries, that is a table prototype
having function pointers for setkey/getkey/insert/remove etc and the
current hash+arrary table implementation being one of the possibilies.
The speed impact should be minimal and the core might still be slim in
provideing the current hash+array way only.
Kind regards,
Axel
> Can the pristine flag be healed, as #, when the hole is re-filled?
That would require diagnosing the table (O(n) operations).
Dirk
The prestine flag would indeed be another way to walk on the border of
the on data structure for all way of Lua.
Having O(n) when calling a # on what is not supposed to be "prestine"
is I think OK, since it should normally not happen, except the coder
shot a nil hole into an array/list and then filled it again, and would
only happen before it would error() if its not a strict list/array.
Kind regards,
Axel
I like the idea of tru count, as in my project (Lsyncd) I have some places I have to keep explicit counters for numbers of items in the table. But IMHO it is not a 100% statisfactory replacement for # since storing for example strings and a list/array in the same table is pretty valid and tru count would not return what you expected.
Should in spite of any other more widespreading solution at least result into:
stdin:2: use of # on an ill-defined array or list.
stack traceback:
...
The "prestine" flag could be a cost-effective solution for the core to
track if # is well or ill-defined. "Well" means bijective.
For a table, the user can see:
len(t) Smallest positive n so that t[n] is not nil but t[n+1] is nil,
or zero if no such n.
count(t) Total number of user-defined keys in use.
maxn(t) Largest positive n so that t[n] is not nil, or zero if no such n.
hole(t) Same functionality as present # but not guaranteed to equal
it --- because # is not guaranteed to give always the same
result on the same table.
Invariant: len(t) <= hole(t) <= maxn(t).
For compatibility, __len = hole to start with. The user may assign
one of the others to __len at his own risk. All applications
of # must go via __len.
The implementation requires three hidden fields per table.
A field is 'valid' if it contains a nonnegative number.
lastcount Actual number of keys in use, always valid.
lastlen If positive or zero, lastlen is the correct value of len(t).
If negative, -lastlen is a lower bound for len(t).
Assigning a nil in the range 1..lastlen-1: lastlen=nil
Assigning a nil at lastlen: lastlen=lastlen-1
Assigning a non-nil at lastlen+1:
If lastlen==lastmaxn, lastlen=lastlen+1.
Else lastlen==-(lastlen+1)
lastmaxn If positive or zero, lastmaxn is the correct value of maxn(t).
If negative, -lastmaxn is an upper bound for maxn(t).
Assigning a nil at lastmaxn: lastmaxn=-(lastmaxn-1)
Assigning a non-nil at k>lastmaxn: lastmaxn=k
Calling len when lastlen is invalid updates lastlen. Time O(len(t)).
Calling maxn when maxn is invalid updates lastmaxn. Time O(maxn(t)).
Calling hole returns lastlen if it is valid, else lastmaxn if it
is valid, else do binary search (time O(log(hole(t)))) as at present.
All other operations are O(1).
pristine(t) is not needed (it is equivalent to "lastlen is valid", but
the user does not need to know that).
strict(t) is not needed (it is equivalent to len(t)==maxn(t)).
Dirk
>
> > hole(t) � � Same functionality as present # but not guaranteed to equal
> > � � � � � � it --- because # is not guaranteed to give always the same
> > � � � � � � result on the same table. �
>
> Same here. I am asking because they are more 'convenience' than
> necessity maybe. As opposed to your count() and maxn().
1. Compatibility with the current specs.
2. It's free of charge if you have len() and maxn().
Maybe edge() is a better name: hole() makes one expect that there is
a nil at that key.
>
> > For compatibility, __len = hole to start with. �The user may assign
>
> Want to introduce __len into 5.1.4?
No. Lua 5.1 is in the process of being superseded.
"Don't put new wine into old skins."
>
> > Calling len when lastlen is invalid updates lastlen. Time O(len(t)).
> > Calling maxn when maxn is invalid updates lastmaxn. �Time O(maxn(t)).
> > Calling hole returns lastlen if it is valid, else lastmaxn if it
> > � � is valid, else do binary search (time O(log(hole(t)))) as at present.
>
> I think all operations should be implemented as O(1) at the cost of
> live counter tracking at the time of insertion. Leo had some
> encouraging result on bench marking for the cost of it. If we can do
> away with other overhead for it, it may even be accepted by a wider
> community.
The idea is that updating the counters is done just-in-time. E.g. if
in the application maxn(t) just keeps growing and one is not using len(t)
at all, it is never necessary to make an expensive update.
Dirk
Dirk
I'm thinking of a border that we balance at. On the one side, its so coding classics that the coder as pick the right datastructure for the right job. Array, (Double) Linked Lists, Hashtable, Tree.
The beauty of modern encapsulated design (objectoriented or not) is that you can easily exchange the basic structure at creation and the other interface is untouchted. Only different operation will to get different costs.
Lua on the other hand has two arguments it wants to remove this decission for simplicty and to some degree smallness, You have the basic core that makes the decission "Array" or "Hashtable" for you, on the fly. And this brings the problems.
Other argument I not completly buy in is the "table is the basic building block", implement the other structures with that.Given the language as given, this is valid, but if not one
With Lua 5.2 having at the first time a full metatable interface we could go along with the border.
A table is still a table, but its metatable defines basic functions. Then we have the argument for stabilty, clearness and defensive programming. It has its advantages to declare at creation time "This is going to be a list with no nil values in it", and have the code fail if it happens never the less.
If you have the "all in one" thing, you never know what the coder wanted, and this guesswork is what brings the flaws.
Because those actually belong to lists, not indexed arrays.They might apply to both, but behave differently if you try to insert a nil. A list should error() then.
Like, n? I think the approach was on the one hand very pragmatic but not o.k. for core level.
Why was it not ok for core level? In C you always have to store the size of an array as well in a seperate variable.
createtable(narr,nrec)
that simply calls lua_createtable and returns the stack top,
address some of those problems?
Dirk
It's certainly one of the simplest useful C modules possible. And it
can make filling a big array more efficient by pre-allocation.
As for making the core decide, well I don't think it's the core's job.
Imagine trying to explain this feature to someone new!
OK, yes, proxy tables are going to be slower and have to be userdata
proxies in Lua 5.1.
Remember that a lot of serious users of Lua will not move to 5.2 until
Mike Pall gets around to implementing it in LuaJIT.
LuaJIT also raises another issue of premature optimization. He can
optimize plain straightforward Lua to the point where C can see it in
its rear-view mirror. Paradoxically, depending too much on C modules
can slow it down significantly - this is why he's been working on his
ffi (foreign-function interface) which BTW will allow fast indexing of
native arrays.
steve d.
I suppose this means pure LuaJIT is faster than the current Lua-C
interface, right? I hardly see a point in a pure Lua implementation
being faster than a pure C - both using appropriate datastructures.
There has been a lot of fuss about Java being faster in some very
specific cases, where the JIT changes some coding instructions based
on profiles on the fly. Like
if (condition) true_part; else false_part;
into
if (!condition) false_part; else true_part;
Since falling into is a tad more cheaper than elseing. But in my
opinion this are very artificially constructed cases. And anytime, you
could reproduce a comparable logic in C.
Kind regards, Axel
I think it's very cool. In fact the ffi interface is going to make
most bindings to C code possible without a compiler.
Of course, this ffi is luajit-specific, which raises other tricky questions.
The _appropriate datastructure_ is of course a factor. Although the
runtime speed is becoming comparable, there are Lua overheads with
using tables for very big datasets.
C always wins on the lean & mean (both speed and memory) but C is
slower to write and freely allows you to shoot yourself in your foot.
steve d.
I don't think he's written them yet ;) Still very new.
This has the potential to split the Lua universe, which isn't such a
cool thing. Unless some clever person could take something like Alien
(which uses libffi) and makes something which is call-compatible with
built-in luajit ffi.
steve d.
Ah! This exactly what we are also pondered about. At least I believe
this boils down to the same issue I was pondering with the more basic
"projection strutucture" beneth table and userdata.
We might be closing in on something :-)
So are compilers. The only thing JITs have what compilers dont have is
runtime profiling, on which they can recompile the code while it is
running acording to the profile.
Of course its much nicer to code Lua than to code in C, no second
doubt about that!
Hm, Mike is not exactly the most approachable guy. We'll really have
to do our homework!
steve d.
What I and I suppose Dirk and Mike are thinking is:
a) We want to be able to use other binary datastructures in some cases
other than table being hash+array. Be it lists, trees, or a plain
array only.
b) We want the have an interface for those to be just as efficient as table.
c) Why is table something special and cannot use that interface as well?
The world is divided between those that talk and those that just do.
People in the latter class often come across as unapproachable. But
where would we be without them?
Dirk
Is this really giving up making newbie's lives easier? In my opinion
any declaration like "this table is going to be this or that" will
eventually make you live a lot easier to spot bugs.
The core problem with '#' or count or whatever is to find one meaning
for it that fits all, and this is futile. Thats why I suggested to
remove any default meaning for # on tables but to call __len, which
then can make meaning out if it, and if it calls "table.border(t)",
the current n+1 == nil meaning.
Where the golden path lies in the middle. You don't have to be
unsocialable to be skilled :-)
> It's because anything can land in the hash part underneath?
>
No, the array part can be as bad and even worse.
Calculating maxn is equivalent to
maxn=0
for k in pairs(t) do if isposint(k) and k>maxn then maxn=k end end
Time for the 'pairs' function is O(N)+O(m), where N is the allocated
size of the table and m is the number of items in the hash part (some
of which may have positive integer keys).
Suppose somehow a table sprang into existence with an array part
containing 1000000 items, of which 100000 are not nil. Then it
would have been better to have everything in the hash part.
That's why Lua rehashes the array part to be not larger than twice the number
of filled positions, moving the items with larger keys into the hash part.
Dirk
Well, to be honest, the learning curve for Lua is so gentle that
at least one person (not one of us, I hasten to add) was posting
prolifically on the Lua list, actually offering answers to questions,
while freely admitting to have been exposed to Lua for only a week.
So the life of a newbie is easy enough already, I guess.
Dirk