
Arrays and databases


Richard.Suchenwirth

May 14, 2001, 3:01:31 AM
Yet another weekend fun project, this time Tcl only:

"A simple database", http://mini.net/cgi-bin/wikit/1598.html

I have known Tcl for some years now, but I'm still amazed at all we can do
with arrays, so that I'm almost tempted to say: "Tcl is a database with
a powerful scripting language" ;-)

By the way: does anyone of you use the
array startsearch/anymore/nextelement/donesearch
subcommands? They look so _clumsy_, compared to the crystal-clear
foreach i [array names a] {...}
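
For readers who have not seen the two styles side by side, here is a minimal sketch. The array name `a` follows the post; the sample data and the result variables `viaNames` and `viaSearch` are assumptions made up for illustration.

```tcl
# Hypothetical sample data; the array name "a" follows the post.
array set a {apple 1 banana 2 cherry 3}

# Style 1: build the full list of keys, then loop over it.
set viaNames {}
foreach i [array names a] {
    lappend viaNames $i
}

# Style 2: step through the array with an explicit search token.
set viaSearch {}
set token [array startsearch a]
while {[array anymore a $token]} {
    lappend viaSearch [array nextelement a $token]
}
array donesearch a $token
```

Both loops visit the same keys, in no guaranteed order. The second never materializes the key list, at the price of four subcommands and a token to manage.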
--
Schoene Gruesse/best regards, Richard Suchenwirth - +49-7531-86 2703
Siemens Dematic AG, PA RC D2, Buecklestr.1-5, 78467 Konstanz,Germany
Personal opinions expressed only unless explicitly stated otherwise.

Zoran Vasiljevic

May 14, 2001, 3:39:15 AM
Richard.Suchenwirth wrote:

> Yet another weekend fun project, this time Tcl only:
>
> "A simple database", http://mini.net/cgi-bin/wikit/1598.html
>
> I have known Tcl for some years now, but I'm still amazed at all we can do
> with arrays, so that I'm almost tempted to say: "Tcl is a database with
> a powerful scripting language" ;-)
>

... and using a fine extension from Frederic Bonnet, the "dictionary",
you can even pass databases per-value over function calls...

> By the way: does anyone of you use the
> array startsearch/anymore/nextelement/donesearch
> subcommands? They look so _clumsy_, compared to the crystal-clear
> foreach i [array names a] {...}

We do. For very large arrays in excess of 10k elements.

Cheers
Zoran

Richard.Suchenwirth

May 14, 2001, 4:17:26 AM
Zoran Vasiljevic wrote:

>
> Richard.Suchenwirth wrote:
> > By the way: does anyone of you use the
> > array startsearch/anymore/nextelement/donesearch
> > subcommands? They look so _clumsy_, compared to the crystal-clear
> > foreach i [array names a] {...}
>
> We do. For very large arrays in excess of 10k elements.
>
But does that bring advantages? I understand that in the startsearch..
pattern, the iterator is kept on the C side. But even a large array is
held in memory, so the resulting list from [array names] should normally
also fit there..

Chang LI

May 14, 2001, 8:43:22 AM

Richard.Suchenwirth wrote in message <3AFF82CB...@kst.siemens.de>...

>Yet another weekend fun project, this time Tcl only:
>
>"A simple database", http://mini.net/cgi-bin/wikit/1598.html
>

>By the way: does anyone of you use the


> array startsearch/anymore/nextelement/donesearch
>subcommands? They look so _clumsy_, compared to the crystal-clear
> foreach i [array names a] {...}

I have never used it. It is not necessary and should be deleted from Tcl.

Chang

Donal K. Fellows

May 14, 2001, 11:39:03 AM
"Richard.Suchenwirth" wrote:
> But does that bring advantages? I understand that in the startsearch..
> pattern, the iterator is kept on the C side. But even a large array is
> held in memory, so the resulting list from [array names] should normally
> also fit there..

It makes a difference if keys+values plus another copy of the keys is
larger than physical memory (after allowing overhead for OS, other
programs, the script itself, etc.) Sometimes, you really do not want to
take another copy of the keys...

Donal.
--
"Understanding leads to tolerance, which in turn leads to acceptance. And from
there, it's just a quick hop to speeding in Ohio, chewing peyote, and
frottage in the woods with a family of moose. And I just want to claim my
part of the credit." -- bunnythor <bunn...@uswest.net>

Donal K. Fellows

May 14, 2001, 11:50:09 AM
Chang LI wrote:
> Richard.Suchenwirth wrote in message <3AFF82CB...@kst.siemens.de>...
> >By the way: does anyone of you use the
> > array startsearch/anymore/nextelement/donesearch
> >subcommands? They look so _clumsy_, compared to the crystal-clear
> > foreach i [array names a] {...}
>
> I have never used it. It is not necessary and should be deleted from Tcl.

Whoa there! Just because you never use a feature doesn't mean that it
should be removed. For people handling *BIG* arrays, that interface is
useful as it provides a more direct interface to the Tcl_HashSearch
stuff and it keeps the number of copies of the data down. People with
physical_memory >> memory_needed_for_array have no feeling for this
issue... :^)

Kevin Kenny

May 14, 2001, 12:28:41 PM
Richard.Suchenwirth:

>By the way: does anyone of you use the
> array startsearch/anymore/nextelement/donesearch
>subcommands? They look so _clumsy_, compared to the crystal-clear
> foreach i [array names a] {...}

Chang LI wrote:
> I have never used it. It is not necessary and should be deleted from Tcl.

Donal K. Fellows:


> Whoa there! Just because you never use a feature doesn't mean that it
> should be removed. For people handling *BIG* arrays, that interface is
> useful as it provides a more direct interface to the Tcl_HashSearch
> stuff and it keeps the number of copies of the data down. People with
> physical_memory >> memory_needed_for_array have no feeling for this
> issue... :^)

Of course, the real solution to this sort of problem would be to implement
lazy evaluation of [foreach x [array names y] {...}]. Right now we have
an awkward situation, because the [array startsearch/anymore/nextelement/
donesearch] commands are not only awkwardnessful, but also SLOW -- until
[foreach x [array names y] {...}] starts thrashing.
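
A rough way to compare the two approaches on your own machine is sketched below. The array name `big`, the element count, and the helper procs `countViaNames` and `countViaSearch` are assumptions for illustration only; actual timings depend on the Tcl version and on how much memory is free.

```tcl
# Count the elements by first building the complete key list.
proc countViaNames {arrName} {
    upvar 1 $arrName arr
    set n 0
    foreach key [array names arr] { incr n }
    return $n
}

# Count the elements by stepping through an array search.
proc countViaSearch {arrName} {
    upvar 1 $arrName arr
    set n 0
    set token [array startsearch arr]
    while {[array anymore arr $token]} {
        array nextelement arr $token
        incr n
    }
    array donesearch arr $token
    return $n
}

# Hypothetical test array with 10000 elements.
for {set i 0} {$i < 10000} {incr i} { set big($i) $i }

puts "array names: [time {countViaNames big} 10]"
puts "startsearch: [time {countViaSearch big} 10]"
```

With an array this small both variants fit comfortably in memory, so the numbers only show the per-element overhead, not the thrashing case discussed in this thread.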

--
73 de ke9tv/2, Kevin KENNY GE Corporate R&D, Niskayuna, New York, USA

Cameron Laird

May 14, 2001, 1:00:36 PM
In article <3B0007B9...@crd.ge.com>,
Kevin Kenny <ken...@crd.ge.com> wrote:
.
.
.

>lazy evaluation of [foreach x [array names y] {...}]. Right now we have
>an awkward situation, because the [array startsearch/anymore/nextelement/
>donesearch] commands are not only awkwardnessful, but also SLOW -- until
>[foreach x [array names y] {...}] starts thrashing.
.
.
.
... at which point [array startsearch ...] is
slow--but [foreach ... [array names ...] ... ]
is *painfully* slow.
--

Cameron Laird <cla...@NeoSoft.com>
Business: http://www.Phaseit.net
Personal: http://starbase.neosoft.com/~claird/home.html

Andreas Kupries

May 14, 2001, 11:54:17 AM

"Richard.Suchenwirth" <Richard.S...@kst.siemens.de> writes:

> Zoran Vasiljevic wrote:

>> Richard.Suchenwirth wrote:

>>> By the way: does anyone of you use the
>>> array startsearch/anymore/nextelement/donesearch
>>> subcommands? They look so _clumsy_, compared to the crystal-clear
>>> foreach i [array names a] {...}

>> We do. For very large arrays in excess of 10k elements.

> But does that bring advantages? I understand that in the
> startsearch.. pattern, the iterator is kept on the C side. But even
> a large array is held in memory, so the resulting list from [array
> names] should normally also fit there..

Really ? The keys of the Tcl hashtables used to implement arrays are
currently 'char*' and not 'Tcl_Obj*'. The list created by [array
names] therefore contains duplicates of all keys, and not references
to the existing objects. Required memory doubles.

--
Sincerely,
Andreas Kupries <a.ku...@westend.com>
Developer @ <http://www.activestate.com/>
Private <http://www.purl.org/NET/akupries/>
-------------------------------------------------------------------------------

Petasis George

May 15, 2001, 2:05:42 AM
"Donal K. Fellows" wrote:
>
> Chang LI wrote:
> > Richard.Suchenwirth wrote in message <3AFF82CB...@kst.siemens.de>...
> > >By the way: does anyone of you use the
> > > array startsearch/anymore/nextelement/donesearch
> > >subcommands? They look so _clumsy_, compared to the crystal-clear
> > > foreach i [array names a] {...}
> >
> > I have never used it. It is not necessary and should be deleted from Tcl.
>
> Whoa there! Just because you never use a feature doesn't mean that it
> should be removed. For people handling *BIG* arrays, that interface is
> useful as it provides a more direct interface to the Tcl_HashSearch
> stuff and it keeps the number of copies of the data down. People with
> physical_memory >> memory_needed_for_array have no feeling for this
> issue... :^)
>

Well, I have used it :-) Using array names on a hash table that contains
about 800,000 keys in Greek characters requires about 300 MB, and as I only
have 256 MB, I don't want to wait for the swapping to finish if I can do the
same job in nearly half the memory. Of course, if I want to iterate over the
keys sorted (which I often need) I have to do [lsort [array names ...]]
and wait :-)
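
The sorted case can be sketched as follows. The array `freq` and its contents are made up for illustration. The point is that sorting needs the whole key list in hand, so the search subcommands cannot save memory here.

```tcl
# Hypothetical word-frequency table.
array set freq {gamma 3 alpha 1 beta 2}

# Sorting requires the complete key list, so [array names] is unavoidable.
set ordered {}
foreach key [lsort [array names freq]] {
    lappend ordered $key $freq($key)
}
```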

Also, the introduction of Tcl objects as hash keys was a major improvement
for me, as it allowed me to use techniques for hashing various Tcl objects
in my application (an NLP platform), reducing memory requirements by about
15%! Exactly the same techniques without object keys result in using
more memory :-) Many thanks to those who contributed to this :-)

And finally a question. Is there a simple way to expose a C defined hash
table as a Tcl hash table? I.e., for debugging purposes to examine
what is inside a Tcl hash table manipulated by C code? I know I can
write a simple extension to expose it, but is there an easier way?

George

Uwe Gerken

May 15, 2001, 3:02:44 AM
Kevin Kenny wrote:
>
<<SNIP>>

>
> Donal K. Fellows:
> > Whoa there! Just because you never use a feature doesn't mean that it
> > should be removed. For people handling *BIG* arrays, that interface is
> > useful as it provides a more direct interface to the Tcl_HashSearch
> > stuff and it keeps the number of copies of the data down. People with
> > physical_memory >> memory_needed_for_array have no feeling for this
> > issue... :^)
>
> Of course, the real solution to this sort of problem would be to implement
> lazy evaluation of [foreach x [array names y] {...}]. Right now we have
> an awkward situation, because the [array startsearch/anymore/nextelement/
> donesearch] commands are not only awkwardnessful, but also SLOW -- until
> [foreach x [array names y] {...}] starts thrashing.

I would call it piping. Instead of building up a huge list which is
consumed by the foreach command, I'd rather see an implementation which
can turn a list into a FIFO.
This would be handy for other cases too; for example

    foreach f [glob *] {...}

would become much faster (and less memory-consuming) for big
directories.

Ciao!

Richard.Suchenwirth

May 15, 2001, 3:52:46 AM

Another application: with a "piped"/"lazy" iterator, one could reduce
many needs for 'for' commands, if one could instead write
    foreach i [range 1 10] {...}

Of course this can be done in pure Tcl:

    # Return the list of integers from $from to $to in steps of $step.
    proc range {from to {step 1}} {
        set res {}
        for {set i $from} {$i<=$to} {incr i $step} {lappend res $i}
        set res
    }

but on really huge ranges a "lazy ranger" could be helpful. (I remember
the Pythonites are proud of theirs ;-)
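
One way a "lazy ranger" might look is sketched below. It assumes Tcl 8.6 or later, where `coroutine` and `yield` are built in (they did not exist when this thread was written). The names `lazyRange`, `nextInt`, and `seen` are hypothetical.

```tcl
# Assumes Tcl 8.6 or later, where [coroutine] and [yield] are built in.
proc lazyRange {from to {step 1}} {
    yield                              ;# pause until the first value is requested
    for {set i $from} {$i <= $to} {incr i $step} {
        yield $i                       ;# hand out one value per call
    }
    return -code break                 ;# exhausted: break out of the caller's loop
}

# Create the generator; each call to nextInt produces the next integer.
coroutine nextInt lazyRange 1 5

set seen {}
while 1 {
    set value [nextInt]
    lappend seen $value
}
```

Only one integer exists at a time, so the range can be arbitrarily large. The cost is a command call per element, which is slower than a plain `for` loop.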

Donal K. Fellows

May 15, 2001, 4:39:49 AM
Petasis George wrote:
> And finally a question. Is there a simple way to expose a C defined hash
> table as a Tcl hash table? I.e., for debugging purposes to examine
> what is inside a Tcl hash table manipulated by C code? I know I can
> write a simple extension to expose it, but is there an easier way?

Does your code use something like Tcl_TraceVar()? That seems to me to be
the easy route, though I'd hesitate to use any whole-array operations on
it (you can do it, but...)

Donal.
--
Donal K. Fellows http://www.cs.man.ac.uk/~fellowsd/ fell...@cs.man.ac.uk
-- US citizens? Remember, I rule the world in this scenario. They aren't
citizens of the US, unless that stands for United Stevenland.
-- Steven Odhner <ta...@primenet.com>

Donal K. Fellows

May 15, 2001, 4:44:25 AM
Kevin Kenny wrote:
> Of course, the real solution to this sort of problem would be to implement
> lazy evaluation of [foreach x [array names y] {...}]. Right now we have
> an awkward situation, because the [array startsearch/anymore/nextelement/
> donesearch] commands are not only awkwardnessful, but also SLOW -- until
> [foreach x [array names y] {...}] starts thrashing.

[Memo to self] Tcl_Obj-enable the array search mechanism...

Petasis George

May 15, 2001, 7:07:40 AM
"Donal K. Fellows" wrote:
>
> Petasis George wrote:
> > And finally a question. Is there a simple way to expose a C defined hash
> > table as a Tcl hash table? I.e., for debugging purposes to examine
> > what is inside a Tcl hash table manipulated by C code? I know I can
> > write a simple extension to expose it, but is there an easier way?
>
> Does your code use something like Tcl_TraceVar()? That seems to me to be
> the easy route, though I'd hesitate to use any whole-array operations on
> it (you can do it, but...)
>

No, I have never used it. Define an array var, place a trace on it for read
and do the appropriate actions? (I want only read access anyway...)
But probably parray won't work on it...
Perhaps the easiest thing would be to mirror the array in another "real"
tcl array. It shouldn't use too much memory since I use tcl objects as
both keys and values in the array...
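
The read-trace idea can be sketched at script level as follows. This assumes Tcl 8.5 or later (for `dict` and `trace add variable`). The dict `backing` is only a stand-in for the C-side hash table, and `mirrorRead` and `view` are hypothetical names. A real extension would set up the trace from C with Tcl_TraceVar() and look up its own table.

```tcl
# Stand-in for data owned by C code (hypothetical; a real extension would
# query its Tcl_HashTable here instead of a dict).
set backing [dict create alpha 1 beta 2]

# Read trace: copy the requested entry into the array just before the read.
proc mirrorRead {name1 name2 op} {
    global backing
    upvar 1 $name1 arr
    if {$name2 ne "" && [dict exists $backing $name2]} {
        set arr($name2) [dict get $backing $name2]
    }
}

array set view {}
trace add variable view read mirrorRead

set got $view(alpha)          ;# fetched on demand through the trace
set known [array names view]  ;# lists only the entries read so far
```

Whole-array operations such as `array names` or `parray` see only the elements that have already been read, which is the limitation suspected above.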

Thanks,

George

Andreas Leitgeb

May 15, 2001, 11:29:08 AM
Cameron Laird <cla...@starbase.neosoft.com> wrote:
>... at which point [array startsearch ...] is
>slow--but [foreach ... [array names ...] ... ]
>is *painfully* slow.

it may be *painfully* slow for large arrays, but (afaik) as opposed to the
startsearch-version, you can delete and create array-elements inside
the loop.

and speed-degradation doesn't hurt so much to justify removing
the possibility of "editing the array" during iteration.

otoh, if startsearch were changed to correctly handle new array-elements, ...
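
A small sketch of the difference, with a made-up array `stock`. The second half relies on the behaviour described in the `array` manual page, namely that adding or deleting elements terminates all active searches.

```tcl
array set stock {apple 0 banana 4 cherry 0 date 7}

# [array names] returns a snapshot list, so the array itself may be
# edited freely while the loop walks that list.
foreach key [array names stock] {
    if {$stock($key) == 0} {
        unset stock($key)              ;# delete an element mid-loop
    }
}
set stock(elderberry) 2                ;# adding works just as well

# With a search, adding an element ends the search, and the
# next call on the old token raises an error.
set token [array startsearch stock]
set stock(fig) 1
set failed [catch {array nextelement stock $token}]
```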

--
Newsflash: Sproingy made it to the ground !
read more ... <http://avl.enemy.org/sproingy>

Andreas Kupries

May 15, 2001, 5:30:11 PM

"Donal K. Fellows" <fell...@cs.man.ac.uk> writes:

> Kevin Kenny wrote:

>> Of course, the real solution to this sort of problem would be to
>> implement lazy evaluation of [foreach x [array names y] {...}].
>> Right now we have an awkward situation, because the [array
>> startsearch/anymore/nextelement/ donesearch] commands are not only
>> awkwardnessful, but also SLOW -- until [foreach x [array names y]
>> {...}] starts thrashing.

> [Memo to self] Tcl_Obj-enable the array search mechanism...

You will have to Tcl_Obj-enable the implementation of array variables
first. I mean, Tcl_HashTable now has the ability to use Tcl_Obj's as
keys, yes, but this ability is not used by array variables yet IIRC.

Richard.Suchenwirth

May 16, 2001, 2:57:41 AM
Andreas Leitgeb wrote:
>
> Cameron Laird <cla...@starbase.neosoft.com> wrote:
> >... at which point [array startsearch ...] is
> >slow--but [foreach ... [array names ...] ... ]
> >is *painfully* slow.
>
> it may be *painfully* slow for large arrays, but (afaik) as opposed to the
> startsearch-version, you can delete and create array-elements inside
> the loop.
>
> and speed-degradation doesn't hurt so much to justify removing
> the possibility of "editing the array" during iteration.
>
Another advantage of [array names] is that you can filter the keys
you'll receive with a glob pattern:
array names db *foo*
gives you only those keys that contain the string "foo", while the
[array startsearch..] approach iterates over all keys (if you don't
terminate it early).
So I support Kevin Kenny's proposal: lazy evaluation of
[foreach x [array names y] {...}]
which would imply changes in both the foreach and array commands..
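
To make the filtering point concrete, here is a sketch with made-up keys. The array name `db` and the pattern `*foo*` follow the post; the data and the variables `matched` and `filtered` are assumptions.

```tcl
# Hypothetical data.
array set db {foo,1 a barfoo b baz c food d}

# [array names] filters inside the core and returns only matching keys.
set matched [lsort [array names db *foo*]]

# With a search, every key is visited and the filter is applied by hand.
set filtered {}
set token [array startsearch db]
while {[array anymore db $token]} {
    set key [array nextelement db $token]
    if {[string match *foo* $key]} {
        lappend filtered $key
    }
}
array donesearch db $token
set filtered [lsort $filtered]
```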

Donal K. Fellows

May 16, 2001, 8:47:10 AM
Andreas Kupries wrote:
> "Donal K. Fellows" <fell...@cs.man.ac.uk> writes:
>> [Memo to self] Tcl_Obj-enable the array search mechanism...
>
> You will have to Tcl_Obj-enable the implementation of array variables
> first. I mean, Tcl_HashTable now has the ability to use Tcl_Obj's as
> keys, yes, but this ability is not used by array variables yet IIRC.

[Waves hand slowly in front of Andreas's face]
What are you talking about? I was referring to the fact that the code that
handles array searches parses the string coming in from the Tcl world each
time; calling strtol() several times for each iteration of the loop is not
efficient! Tcl_Obj-ifying the keys themselves is a separate issue...

-- Anyone using MFC desperatly needs a nasal enigma.
-- David Steuber <tras...@david-steuber.com>

Neil Madden

May 16, 2001, 10:42:10 AM
One neat thing I have seen while using Haskell (http://www.haskell.org)
is the use of lazy evaluation to allow functions such as 'repeat' which
creates an infinitely long list (but because of lazy evaluation, it is
only as long as it needs to be). I also really like Haskell's
optimisation towards recursive functions, which allows some very elegant
definition of functions, which seems like something Tcl could benefit
from. If only I understood how it works...

--
--------------------------------------------------------------
Neil Madden. |
Personal: | Computer Science:
ne...@tallniel.co.uk | nem...@cs.nott.ac.uk
http://www.tallniel.co.uk | http://www.cs.nott.ac.uk/~nem00u
--------------------------------------------------------------

Andreas Kupries

May 16, 2001, 11:29:24 AM

"Donal K. Fellows" <fell...@cs.man.ac.uk> writes:

> Andreas Kupries wrote:

>> "Donal K. Fellows" <fell...@cs.man.ac.uk> writes:
>>> [Memo to self] Tcl_Obj-enable the array search mechanism...

>> You will have to Tcl_Obj-enable the implementation of array
>> variables first. I mean, Tcl_HashTable now has the ability to use
>> Tcl_Obj's as keys, yes, but this ability is not used by array
>> variables yet IIRC.

> [Waves hand slowly in front of Andreas's face]

> What are you talking about? I was referring to the fact that the
> code that handles array searches parses the string coming in from
> the Tcl world each time; calling strtol() several times for each
> iteration of the loop is not efficient!

Ah. I misunderstood you completely, sorry.

And I've now decided to go to sleep today at midnight and not a minute
after. :)

> Tcl_Obj-ifying the keys themselves is a separate issue...

--
