I'll put a current version up at http://eonblast.com/trucount .
Thanks,
Henning
Thanks a lot for looking into it. Maybe you'll find time to go further,
and please keep telling me your impressions!
May I point out my motivation:
[At this point, '$' is the tru count operator in the patch]
> "$ should be used instead of '#' on tables in almost all cases",
because arrays containing non-numeric keys aren't rear.
The reason is that for strict arrays $ == #. And $ is faster (must be
faster) than #. If you apply the patch, it incurs a minuscule penalty
during any array set, insert or remove. You pay that with the patch
whether or not you ever use $. That is the weak point about it. But
if you have it patched, using $ instead of # for strict arrays is going
to return faster, even though '#' on strict arrays is already fast.
Caveat: no benchmarks yet, I am writing based on code inspection.
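To make the intended semantics concrete, here is a pure-Lua stand-in for what '$' would report (a model for illustration only, not how the patch does it; the patch maintains the count incrementally, while this loop recomputes it in O(n)):

  -- illustration only: '$t' would be the number of non-nil entries in t,
  -- regardless of key type, while '#t' is just a border of the array part
  local function truecount(t)
    local n = 0
    for _ in pairs(t) do n = n + 1 end
    return n
  end

  local strict = { "a", "b", "c" }           -- strict array: no holes, no extra keys
  print(#strict, truecount(strict))          --> 3   3      ($ == # here)

  local messy = { "a", nil, "c", x = true }  -- hole at [2], plus a string key
  print(#messy, truecount(messy))            --> 1 or 3 (some border), and 3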
> I also don't think that $ result is of any use for an array with
> holes. In that case the only way to have length is to take care of it
> manually, keeping it in 't.n' or having a vector library for that.
Then I did not get across what '$' is all about: exactly this case. It is
equivalent to taking care of it manually, or to traversing manually when
needed.
What made you think otherwise, or where would you have needed a clear
statement telling you so?
Or are you talking about the '#+1' for a safe insertion place?
Thanks,
Henning
That would be my concern - that tracking the true count of a table is
usually not a priority.
It is also a flawed proxy measure for 'holiness' because the true
count also counts string keys. If people are careful not to tag their
arrays (a common Lua strategy), then it is a fast way to tell, but
there is an 'if'.
If we count all sorts of things (number of integer keys, etc., etc.), then
each count adds another 4 bytes to the table structure. That can
affect the scalability of programs that need to create a _lot_ of
tables.
Appropriate data types still feel like the way to go, if one wants
particular semantics.
steve d.
Moreover, some of the pointer fields imply the existence of objects
that are unique to each table. The total overhead is more like 64 bytes
per table. Increasing that by four size_t's (16 bytes, with 4-byte size_t)
is 25% for an empty table, 20% for a table with two Numbers (e.g. 2D points).
These percentages do not frighten me too much.
> Appropriate data types still feel like the way to go, if one wants
> particular semantics.
>
If you can suggest a way to avoid proxy tables, I'll agree unreservedly.
Although — the overrides of __newindex and __len are much easier if a
core patch is in effect that makes len(), maxn() and count() available.
Dirk
The "vanity" module could still contain all those functions, though,
even if they don't exploit core patches, so the user can simply put
vanity.truecount(t)
which would set getmetatable(t).__len to vanity.count, creating
a metatable for t if none is available.
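Roughly like this, I imagine (just a sketch; note that __len on plain tables is only honoured by Lua 5.2+ or by a patched core, which is the point of the remark above):

  local vanity = {}

  -- O(n) fallback count: number of non-nil entries, whatever the keys
  function vanity.count(t)
    local n = 0
    for _ in pairs(t) do n = n + 1 end
    return n
  end

  -- make #t (or a patched $t) report the true count via __len
  function vanity.truecount(t)
    local mt = getmetatable(t)
    if not mt then
      mt = {}
      setmetatable(t, mt)      -- create a metatable for t if none is available
    end
    mt.__len = vanity.count
    return t
  end

  return vanity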
> In my opinion we're back to datatypes/metatables to track them only for
> tables where needed.
If we are going to code the module in C, lists, stacks, queues, deques,
heaps and arrays are better implemented as userdata than as
specializations of tables. There is all of the C++ template library
to borrow the algorithms from.
The moment you take away the feature that any value can be a key, the
need to use a table vanishes too. Sets are debatable; I think the
table wins, but only just. Multisets give a clear case in favour of
a table.
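For concreteness, the table-based versions are about this much code (a sketch of what I mean, nothing more):

  -- a set: any value can be a key, membership is one lookup
  local set = {}
  set[print] = true
  set["apple"] = true
  print(set[print], set["banana"])        --> true   nil

  -- a multiset: the value carries the multiplicity
  local bag = {}
  local function add(v)    bag[v] = (bag[v] or 0) + 1 end
  local function remove(v) local n = bag[v] or 0
                           bag[v] = n > 1 and n - 1 or nil end
  add("x"); add("x"); add("y")
  print(bag["x"], bag["y"], bag["z"])     --> 2   1   nil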
Dirk
I would not do that: first, there is the license issue; second, the C++
template library is a pain to read! If you ever open it you will
wonder. Third, it is all so standard today that you can look it up in any
textbook. Okay, real implementations are always a tad different
from the textbook versions. But IMHO it is better to specialize from a
standard implementation than to borrow from another language.
> The moment you take away the feature that any value can be a key, the
> need to use a table vanishes too. Sets are debatable; I think the
> table wins, but only just. Multisets give a clear case in favour of
> a table.
Uhh, userdata. This might be a large patch to the core, but how about
changing userdata and table into one object, "tableish"? I'm thinking
of something like the inodes in the Linux kernel. Every node has a set
of function pointers that define its behaviour. So yes, calling a
function pointer is a tad slower than the current preprocessor macros,
but that might be arguable given the benefit. A set of function pointers
per "node" is too expensive, but we could have groups/prototypes. The
current hash+array implementation is then one, the default "tableish".
Userdata, too, is then a "tableish". A hash+chained list might be another
one, etc. This would actually reduce the number of basic types in Lua,
and that would be very Lua-ish minimalist, wouldn't it? Also this would
allow C extensions to add their own basic types. But well, it might be
done with the current userdata though...
> Every node has a set of function pointers that define its behaviour.
A metatable, in other words?
Dirk
A metatable is exactly that: a table. If we suppose a mechanism more
basic than that, of which the ltable.c implementation and userdata are
merely instances, it cannot be metatables.
It's all still a vague idea, and new thoughts break easily at the
beginning. Why are tables not the same as userdata?
They really are very different animals - the inner structure of
userdata is arbitrary and managed by C/C++. They can be given
equivalent _interfaces_ using the metatable mechanism but they are
otherwise opaque.
(OK, one can do evil things. Like adding new methods to Lua file
objects by poking them into the (shared) file object metatable. I have
made a New Year's resolution not to do such things.)
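For instance, this is the kind of thing meant ('writeln' is a made-up method name; it relies on Lua 5.1's file handles sharing one metatable whose __index points back at the method table):

  -- resolution-breaking example: give every file object a new method
  -- by poking it into the shared file-handle metatable
  local file_mt = getmetatable(io.stdout)      -- same table for all files
  file_mt.__index.writeln = function (f, s)    -- 'writeln' is invented here
    f:write(s, "\n")
    return f
  end

  io.stdout:writeln("every open file now has :writeln()")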
steve d.
The question is: why are tables not "user"data?
Am I overlooking something, or is it pure pragmatism, so that some
operations on tables can be optimized by inlining the setter/getter
routines?
They are projections:
k --> v, k ∈ K, K being a finite set, and for tables v is not nil for k ∈ K.
For the rest of the Lua core the current table data structure is
arbitrary and managed by C code :-)
Actually, I do not see how one gets 25% overhead.
On a 32-bit system size_t is 4 bytes; thus the overhead of the counter (4
bytes) compared to sizeof(struct Table) = 32 is 12.5%.
On a 64-bit system size_t is 8 bytes, but sizeof(struct Table) is
probably 64 (with proper alignment and holes due to standalone bytes), so
the overhead is again around 12.5%.
Also, 64-bit systems typically have much more memory to begin with, so
this tiny overhead thing is moot.
--Leo--
Yes, it really is that simple - piece of cake!
I have an idea for a project[1] which automates this process; you can
build Lua and specify what extra modules you want to include,
particularly standard useful stuff like LuaFileSystem and LuaSocket
[2]. (Or exclude, like if you wanted an embedded Lua that did not have
some standard libraries.)
Would probably be an excuse to exercise Lake, which is yet another
build engine [3]
steve d.
[1] how many nightmares have begun this way?
[2] not a piece of cake - there's Lua code involved which has to be
wrapped in C.
[3] https://github.com/stevedonovan/Lake
LuaRocks is good for delivering dynamically loaded modules, but not (I
think) for custom Lua builds.
> Which would require you to have Lua first to then build a better Lua?
Well, I assume that people already have Lua - easy enough, just
apt-get or Lua for Windows. But people sometimes need to build a
custom Lua, e.g. cross-compiling for ARM, etc.
> And is it even somewhat circular? Lake could already make good use of
> a packaged filesystem as I see it there on the github pages ... :-o
I'd like to see what you see ;)
> What would you deliver as initial package? Or am I not seeing the very
> point you are making?
The way I see it, you start off with a binary Lua, plus lfs and lake.
LuaBuild (to give it a name) would then grab the Lua source (and other
packages, if requested) and build a custom Lua that has (say)
LuaFileSystem statically included. It would patch linit.c and adjust
the build accordingly. It would know how to apply common patches like
ltokens or LNUM.
Why Lake as a starting point - well you know how awkward Make is for
tricky jobs. A lakefile can build a program using either GCC or MSVC
on Windows, and is basically a Lua script, so it can do clever stuff
without depending on the shell environment like Make does. Not a new
idea (there's CMake) but useful for this current task.
However, we all know the temptations of _too many projects_. I can't
afford to neglect things like Penlight which people actually use. And
my particular itch at the moment is Orbiter, hoping to get that out
this weekend.
steve d.
That's exactly right. Not for newbies.
> Because you made it? Which is fair. But I also think it's appropriate
> as a native make system --- you could even hack it if you needed
> something very special.
This is true - could be a Lua script which patched things like
luaconf.h, linit.c and the current makefile.
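Something along those lines, say (purely a sketch, untested; it assumes a stock Lua 5.1 linit.c and uses LuaFileSystem's luaopen_lfs as the example):

  -- toy patcher: add LuaFileSystem to the libraries opened at startup;
  -- the patterns assume the usual layout of linit.c
  local f = assert(io.open("linit.c", "r"))
  local src = f:read("*a")
  f:close()

  -- declare the extra luaopen_ function and register it in lualibs[]
  src = src:gsub('#include "lualib.h"',
                 '#include "lualib.h"\nint luaopen_lfs (lua_State *L);', 1)
  src = src:gsub('(%{LUA_LOADLIBNAME, luaopen_package%},)',
                 '%1\n  {"lfs", luaopen_lfs},', 1)

  local out = assert(io.open("linit.c", "w"))
  out:write(src)
  out:close()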
_That_ would be an entertaining rainy Sunday's work, no?[1]
steve d.
[1] yes, I know. I should get out more if this is my idea of fun.
Henning,
it is 32 bytes only on a 32-bit architecture. On an x86_64 machine it is
64 bytes. But it is still a power of two, which possibly helps the
memory allocator.
I attached below a little module (for Lua-5.1.4) that you can use to
see exactly what sizeof(Table) is in your specific case.
Adapt the Makefile accordingly.
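Usage would presumably be as simple as this (the require name is my assumption; the table() call is the one quoted below):

  -- print the size of the core Table struct on this particular build
  local Lsizeof = require "Lsizeof"
  print(Lsizeof.table())   -- e.g. 32 on 32-bit Linux, 64 on x86_64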
--Leo--
On Fri, Jan 21, 2011 at 01:45, Henning Diedrich <hd2...@eonblast.com> wrote:
> Thanks! So --- the end of guessing, currently that is 36 bytes instead of 32.
Interesting! On Linux x86 32 bits with Lua-5.1.4, Lsizeof.table()
returns 32. I guess the 36 bytes you quote are on Mac OS X.
> I must admit that I still can't see a use case where that matters. You
> would usually not have empty tables but tables with something in, so
> the relative increase is way smaller, maybe 1%.
IMHO, the overhead of this extra counter does not matter, although I
would like to see a realistic example to the contrary.
> Anyway, thanks for insisting there, Leo,
> Henning
Insisting is my forte :-)
--Leo--