search table

5 views
Skip to first unread message

Serge D. Mechveliani

unread,
Mar 17, 2012, 9:43:52 AM3/17/12
to fricas...@googlegroups.com
People,
I need an advice:
how to organize a fast key search table in a Spad program?
where in the FriCAS materials it is written about this?

For example, a finite table i : Integer --> descr : CertainDescription.
Binary search/store methods of O( log(card(table)) ) are sufficient.
May be, I could use an array. But
a) I do not like them,
b) it is not nice to reserve any fixed memory space for this array for
this table
(for example, we do not reserve space for a list).

Thanks,

------
Sergei
mec...@botik.ru

Ralf Hemmecke

unread,
Mar 17, 2012, 2:11:30 PM3/17/12
to fricas...@googlegroups.com

Serge D. Mechveliani

unread,
Mar 17, 2012, 2:40:24 PM3/17/12
to fricas...@googlegroups.com
On Sat, Mar 17, 2012 at 07:11:30PM +0100, Ralf Hemmecke wrote:
> https://github.com/hemmecke/fricas-svn/blob/master/src/algebra/table.spad.pamphlet

Yes, I am looking into HasTable(Key, Entry, hashfn),
and even managed to use an example with HasTable(INT, String, "ID").

-------------------------------------------------------
INT ==> Integer
Item ==> Record(key : INT, entry : String)
Tab ==> HashTable(INT, String, "ID")

)abbrev package FOO Foo
Foo() : with
tab : () -> Tab
==
add
tab() ==
item1 := [1, "1"] :: Item
item2 := [2, "2"] :: Item
itemList : List Item := [item1, item2]
construct(itemList) $Tab
-------------------------------------------------------

(11) -> tab1:= tab()
(11) table(2= "2",1= "1") Type: HashTable(Integer,String,ID)

(12) -> search(2, tab1)
(12) "2" Type: Union(String,...)


Does "ID" mean to compare the keys as they are?
Will this be a binary search?
Key is declared of SetCategory.
And the binary search requires a total ordering on Key, which, I think,
is beyond SetCategory (?)
May be, this is some internally used ordering, I do not know ...
How to set hashfn is a bit unclear: "ID", "EQ" ...
So far, I am trying "ID" and hope for the binary search.

In the GHC library for Haskell, we have Ord key => Map key elt,
where Ord enables to compare the keys by `compare' and ">",
and this `compare' can be programmed by the user.

Thanks,

------
Sergei
mec...@botik.ru


Waldek Hebisch

unread,
Mar 17, 2012, 3:03:30 PM3/17/12
to fricas...@googlegroups.com

This is _hash_ table, no order is required. Currently valid
hashfn are EQ, ID, CVEC, EQL, UEQUAL, and EQUAL. EQ and ID
are the same and they check if arguments are the same thing
(that is two different strings consisting if the same
characters compare as non equal). EQL is good for numbers.
CVEC, UEQUAL and EQUAL means the same and compare objects
look the same. Normally you want EQUAL.

Actually, FriCAS HashTable domain is a tiny wrapper over
Lisp hash tables and semantics is directly taken from
Lisp. In practice this means that unless you are using one
of small set of domains the results will be incorrect,
in the sense that hash table uses its own notion of
equality.

Note that FriCAS has a Table domain, which automatically will
use HashTable if that works and otherwise use linear search
on lists (IIUC you want to avoid this).

--
Waldek Hebisch
heb...@math.uni.wroc.pl

Serge D. Mechveliani

unread,
Mar 17, 2012, 3:10:37 PM3/17/12
to fricas...@googlegroups.com
On Sat, Mar 17, 2012 at 10:40:24PM +0400, Serge D. Mechveliani wrote:
> On Sat, Mar 17, 2012 at 07:11:30PM +0100, Ralf Hemmecke wrote:
> > https://github.com/hemmecke/fricas-svn/blob/master/src/algebra/table.spad.pamphlet
>
> Yes, I am looking into HasTable(Key, Entry, hashfn),
> and even managed to use an example with HasTable(INT, String, "ID").
> [..]

However, the real example needs to be
HasTable(INT, ConstructedDomain, ...),

and docs requires Key, Entry : SetCategory,
and ConstructedDomain has Any inside it, and thus has not SetCategory.

Key: SetCategory looks necessary (one would even expect OrderedSet).
But how this SetCategory is used for Entry ?
If it is, for example, only for coercion to OutputForm, for printing,
then I coud provide ConstructedDomain with some reasonable
SetCategory instance.

Thanks,

------
Sergei
mec...@botik.ru

Serge D. Mechveliani

unread,
Mar 17, 2012, 3:31:30 PM3/17/12
to fricas...@googlegroups.com
On Sat, Mar 17, 2012 at 08:03:30PM +0100, Waldek Hebisch wrote:
> Serge D. Mechveliani wrote:
> >
> > On Sat, Mar 17, 2012 at 07:11:30PM +0100, Ralf Hemmecke wrote:
> > > https://github.com/hemmecke/fricas-svn/blob/master/src/algebra/table.spad.pamphlet
> >
> > -------------------------------------------------------
> > INT ==> Integer
> > Item ==> Record(key : INT, entry : String)
> > Tab ==> HashTable(INT, String, "ID")
> >
> > )abbrev package FOO Foo
> > Foo() : with
> > tab : () -> Tab
> > ==
> > add
> > tab() ==
> > item1 := [1, "1"] :: Item
> > item2 := [2, "2"] :: Item
> > itemList : List Item := [item1, item2]
> > construct(itemList) $Tab
> > -------------------------------------------------------
> > [..]

>
> This is _hash_ table, no order is required. Currently valid
> hashfn are EQ, ID, CVEC, EQL, UEQUAL, and EQUAL. EQ and ID
> are the same and they check if arguments are the same thing
> (that is two different strings consisting if the same
> characters compare as non equal). EQL is good for numbers.
> CVEC, UEQUAL and EQUAL means the same and compare objects
> look the same. Normally you want EQUAL.
>
> Actually, FriCAS HashTable domain is a tiny wrapper over
> Lisp hash tables and semantics is directly taken from
> Lisp. In practice this means that unless you are using one
> of small set of domains the results will be incorrect,
> in the sense that hash table uses its own notion of
> equality.
>
> Note that FriCAS has a Table domain, which automatically will
> use HashTable if that works and otherwise use linear search
> on lists (IIUC you want to avoid this).
>

I need Appropriate_Table(key : Integer, entry : ConstructedDomain)

for a fast search/insert of O(log(card table)),

Also ConstructedDomain has `Any' inside it, and it would be better
to avoid SetCategory for ConstructedDomain.

If this is difficult to organize, it would remain to try Array.

Regards,

------
Sergei
mec...@botik.ru

Ralf Hemmecke

unread,
Mar 17, 2012, 5:00:16 PM3/17/12
to fricas...@googlegroups.com
> I need Appropriate_Table(key : Integer, entry : ConstructedDomain)
>
> for a fast search/insert of O(log(card table)),
>
> Also ConstructedDomain has `Any' inside it, and it would be better
> to avoid SetCategory for ConstructedDomain.
>
> If this is difficult to organize, it would remain to try Array.

Workaround:

Turn ConstructedDomain into something of type SetCategory. You basically
only have to provide

((x: %) = (y: %)): Boolean == false
coerce(x: %): OutputForm == "<unprintable>" :: OutputForm

Since you are probably never going to use those faked implementations,
it would be the easiest workaround. But look also into the respective
implementation of HashTable. Of course, you shouldn't call function from
HashTable that rely on "Entry" being of type "SetCategory". But these
are only a few. (I haven't actually checked. Coercion to OutputForm
should be OK. It's just that the output is pretty useless.)

Personally, I tend to agree that there should be a hashtable that does
not impose SetCategory on its second argument. Of course, one cannot
output such a table, but maybe the user doesn't want output anyway.

Ralf

Serge D. Mechveliani

unread,
Mar 18, 2012, 11:57:24 AM3/18/12
to fricas...@googlegroups.com

I see. Thank you. The workaround looks all right.
But what will be cost of search/insert for this
HashTable(Integer, ConstructedDomain, ?) ?
Regards,

------
Sergei
mec...@botik.ru

Ralf Hemmecke

unread,
Mar 18, 2012, 1:33:54 PM3/18/12
to fricas...@googlegroups.com
> But what will be cost of search/insert for this
> HashTable(Integer, ConstructedDomain, ?) ?

Waldek wrote
http://groups.google.com/group/fricas-devel/browse_thread/thread/e629652b3291e021#b7b44869dd0ab376
so I don't understand your question.
You find complexity values for Hashtables here
http://en.wikipedia.org/wiki/Hash_table

But maybe you want a binary search tree. There is one in FriCAS, but I
don't see whether it is matching your purpose.

Ralf

Serge D. Mechveliani

unread,
Mar 18, 2012, 2:56:35 PM3/18/12
to fricas...@googlegroups.com
On Sun, Mar 18, 2012 at 06:33:54PM +0100, Ralf Hemmecke wrote:
>> But what will be cost of search/insert for this
>> HashTable(Integer, ConstructedDomain, ?) ?
>
> Waldek wrote
> http://groups.google.com/group/fricas-devel/browse_thread/thread/e629652b3291e021#b7b44869dd0ab376
> so I don't understand your question.
> You find complexity values for Hashtables here
> http://en.wikipedia.org/wiki/Hash_table

I see. This is because I never used a hash table.


> But maybe you want a binary search tree. There is one in FriCAS, but I
> don't see whether it is matching your purpose.

I am looking into "Browse -- BinarySearchTree -- domains".
It shows BinarySearchTree(S) -- it has only S.
And it should be BinarySearchTree(Key, Entry), and each node should
contain a key value and an entry value.
May be, BinarySearchTree(S) fits defining a _finite set_,
but I need to define a _finite map_ Key -> Entry.
I do not understand Axiom at this point, may be, I am missing something.
I do not know, so far, I try HashTable.

Thanks,

------
Sergei
mec...@botik.ru

Waldek Hebisch

unread,
Mar 19, 2012, 1:24:07 PM3/19/12
to fricas...@googlegroups.com
Ralf Hemmecke wrote:
>
> Personally, I tend to agree that there should be a hashtable that does
> not impose SetCategory on its second argument. Of course, one cannot
> output such a table, but maybe the user doesn't want output anyway.
>

SetCategory is not very important, but BasicType (that is equality)
is crucial.

--
Waldek Hebisch
heb...@math.uni.wroc.pl

Ralf Hemmecke

unread,
Mar 19, 2012, 2:15:18 PM3/19/12
to fricas...@googlegroups.com
On 03/19/2012 06:24 PM, Waldek Hebisch wrote:
> Ralf Hemmecke wrote:
>>
>> Personally, I tend to agree that there should be a hashtable that does
>> not impose SetCategory on its second argument. Of course, one cannot
>> output such a table, but maybe the user doesn't want output anyway.

> SetCategory is not very important, but BasicType (that is equality)
> is crucial.

I just browsed a bit through the exports of HashTable. I cannot find a
real need for = for the "Entry" parameter. Maybe, I've overlooked something.

You might be right for the Axiom library, but there is conceptually no
need to impose any restrictions on the Entry type as long as one doesn't
expect too much in return.

https://svn.origo.ethz.ch/algebraist/trunk/aldor/lib/aldor/src/datastruc/sal_hash.as

Ralf

Waldek Hebisch

unread,
Mar 19, 2012, 2:29:38 PM3/19/12
to fricas...@googlegroups.com
Ralf Hemmecke wrote:
>
> On 03/19/2012 06:24 PM, Waldek Hebisch wrote:
> > Ralf Hemmecke wrote:
> >>
> >> Personally, I tend to agree that there should be a hashtable that does
> >> not impose SetCategory on its second argument. Of course, one cannot
> >> output such a table, but maybe the user doesn't want output anyway.
>
> > SetCategory is not very important, but BasicType (that is equality)
> > is crucial.
>
> I just browsed a bit through the exports of HashTable. I cannot find a
> real need for = for the "Entry" parameter. Maybe, I've overlooked something.
>

Well, HashTable uses Lisp "equality" (I use quotes because Lisp
gives you a few choices which not necessarly agree with your
notion of equality).

My point was about hashtable in general: by definition hashtable
is supposed to find entry with equal key, so you need equality
of keys to _specify_ what hashtable is doing. Of course,
you could specify different name for equality, but I am not
sure how much gain it is.

--
Waldek Hebisch
heb...@math.uni.wroc.pl

Ralf Hemmecke

unread,
Mar 19, 2012, 2:45:16 PM3/19/12
to fricas...@googlegroups.com
> My point was about hashtable in general: by definition hashtable
> is supposed to find entry with equal key, so you need equality
> of keys to _specify_ what hashtable is doing.

Right. So the Key parameter must have some concept of equality. I was
never saying anything else. I was solely talking about the "Entry"
parameter (i.e. the V parameter in the Aldor example).

Ralf

HashTable(K:PrimitiveType, V:Type, hash: K -> Z): TableType(K, V) with {
...
}

define PrimitiveType: Category == with {
=: (%, %) -> Boolean;
~=: (%, %) -> Boolean;
}

Serge D. Mechveliani

unread,
Mar 20, 2012, 9:11:02 AM3/20/12
to fricas...@googlegroups.com
On Mon, Mar 19, 2012 at 06:24:07PM +0100, Waldek Hebisch wrote:
> Ralf Hemmecke wrote:
> >
> > Personally, I tend to agree that there should be a hashtable that does
> > not impose SetCategory on its second argument. Of course, one cannot
> > output such a table, but maybe the user doesn't want output anyway.
> >
>
> SetCategory is not very important, but BasicType (that is equality)
> is crucial.


It is desirable to understand how this `=' is used for the second
(Entry) argument in the operations `search' and `insert!'.
I suspect that when insert!([key, entry], tab)
finds some key' = key in tab, then is compares entry = entry',
and acts futher depending on the result of this comparison.
Is this so?
If yes, then my application, probably, needs to pre-analyse
member?(key, tab) in order to avoid an expensive and unneeded operation
of entry = entry'.

And the main thing which I cannot understand about Tables in Axiom is:
whether it has a regular
balanced binary search tree

for defining a _finite map_ Key -> Entry.
For example, in the Glasgow Haskell library, data Map ...
has the internal representation like

data Tree kKey eEntry =
Leaf kKey eEntry | Node (Tree kKey eEntry) (Tree kKey eEntry)
-- left branch right branch
(may be, not precisely this).
If any total ordering `compare' (of Ord) is declared for kKey, then
* the operations of (lookup key tab) and (insert key entry tab)
are possible,
* they cost O( log(card(table)) ),
* lookup is by applying compare key key' (or key > key') several
times in a binary search,
* insert also implies re-balancing of a tree, but it is also of
O( log(card(table)) ).
The thing is explained clearly in docs and also works reliably
(it is a pity that it is not in the Standard).

And in Axiom, HashTable and BinarySeachTree look like something
different and difficult to understand.
For BinarySeachTree, I do not see how to search a value for a key
(because it has only one type parameter S).
For HashTable, I cannot tell what may be (in my example), for example,
the cost of `insert!'.
Meanwhile, I use HashTable. But in future I shall need to rewise its
usage, for the thing may ruin the performance.

Regards,

------
Sergei
mec...@botik.ru


Waldek Hebisch

unread,
Mar 20, 2012, 10:15:41 AM3/20/12
to fricas...@googlegroups.com
Serge D. Mechveliani wrote:
>
> On Mon, Mar 19, 2012 at 06:24:07PM +0100, Waldek Hebisch wrote:
> > Ralf Hemmecke wrote:
> > >
> > > Personally, I tend to agree that there should be a hashtable that does
> > > not impose SetCategory on its second argument. Of course, one cannot
> > > output such a table, but maybe the user doesn't want output anyway.
> > >
> >
> > SetCategory is not very important, but BasicType (that is equality)
> > is crucial.
>
>
> It is desirable to understand how this `=' is used for the second
> (Entry) argument in the operations `search' and `insert!'.

Not used at all. I meant equality of keys. AFAICS SetCategory
for Entry is used to define bunch of extra functions which
are uninteresting to you. In fact, we probably should not
require SetCategory for Entry and define those functions
only conditionally.

>
> And the main thing which I cannot understand about Tables in Axiom is:
> whether it has a regular
> balanced binary search tree
>
> for defining a _finite map_ Key -> Entry.

No.

> And in Axiom, HashTable and BinarySeachTree look like something
> different and difficult to understand.
> For BinarySeachTree, I do not see how to search a value for a key
> (because it has only one type parameter S).

This is not the main problem (you could use a record with key and
entry components as S) -- the real problem is that search function
is unimplemented. The second problem is that it performs no
balancing (so the worst case is linear).

> For HashTable, I cannot tell what may be (in my example), for example,
> the cost of `insert!'.

Hash tables give very good average access time assuming that
hash function produces values which are pseudorandom enough.
Lisp hash tables may have bad performance for some types of
keys (basically arrays), because some implementations use
the same hash function for all types and language definition
essentially forbids defining a single hash function which
works well for all types.

> Meanwhile, I use HashTable. But in future I shall need to rewise its
> usage, for the thing may ruin the performance.

As I wrote, there are limitations on types of keys. Basically,
if you have record with more than two components or an
array inside the keys, then it will not work. In particular,
you can not use Type as component of key. Any in itself is
OK because it uses two component record (which is OK) and
hidden compoment is an SExpression which is OK too. But
of course the actual value stored in Any must be of appropriate
type...

AFAICS your ConstructedDomain contains Type inside. This
is tricky case, it will work most of the time but is not
guaranteed to always work. Basically, as optimization FriCAS
runtime tries to construct given type only once. This
is done by keeping a cache of already constructed types.
But size of cache is limited and if you construct so many types
that you overflow the cache then FriCAS may construct the
same type second time. Technically it means that most of
the time Lisp EQ function will correctly compute equality
of types, but it type is constructed second time EQ will
return false and one needs to use more complicated function
to establish equality. For types HashTable will use Lisp
EQ, so it may think that two types which are in equal are
not and give wrong results.

Also, your ConstructedDomain seem to store almost arbitrary
FriCAS values inside. Of popular domains Expression and
DistributedMultivariatePolynomial are not good as keys.

So it seems that you can not use your ConstructedDomain as
HashTable keys.

--
Waldek Hebisch
heb...@math.uni.wroc.pl

Serge D. Mechveliani

unread,
Mar 20, 2012, 10:51:08 AM3/20/12
to fricas...@googlegroups.com
On Tue, Mar 20, 2012 at 03:15:41PM +0100, Waldek Hebisch wrote:
> Serge D. Mechveliani wrote:
> >
> > On Mon, Mar 19, 2012 at 06:24:07PM +0100, Waldek Hebisch wrote:
> > > Ralf Hemmecke wrote:
> > > >
> > > > Personally, I tend to agree that there should be a hashtable that does
> > > > not impose SetCategory on its second argument. Of course, one cannot
> > > > output such a table, but maybe the user doesn't want output anyway.
> > > >
> > >
> > > SetCategory is not very important, but BasicType (that is equality)
> > > is crucial.
> >
> >
> > It is desirable to understand how this `=' is used for the second
> > (Entry) argument in the operations `search' and `insert!'.
>
> Not used at all. I meant equality of keys. AFAICS SetCategory
> for Entry is used to define bunch of extra functions which
> are uninteresting to you. In fact, we probably should not
> require SetCategory for Entry and define those functions
> only conditionally.

Yes.


> > For HashTable, I cannot tell what may be (in my example), for example,
> > the cost of `insert!'.

> [..]


> > Meanwhile, I use HashTable. But in future I shall need to rewise its
> > usage, for the thing may ruin the performance.
>
> As I wrote, there are limitations on types of keys. Basically,
> if you have record with more than two components or an
> array inside the keys, then it will not work.

I hope that Key := Integer
is enough for my purpose.

> AFAICS your ConstructedDomain contains Type inside. This
> is tricky case, it will work most of the time but is not
> guaranteed to always work. Basically, as optimization FriCAS
> runtime tries to construct given type only once. This
> is done by keeping a cache of already constructed types.
> But size of cache is limited and if you construct so many types
> that you overflow the cache then FriCAS may construct the
> same type second time. Technically it means that most of
> the time Lisp EQ function will correctly compute equality
> of types, but it type is constructed second time EQ will
> return false and one needs to use more complicated function
> to establish equality.

> [..]

I have HashTable(Key, Entry),
where Key := Integer; Entry := ConstructedDomain.

And from your above notes I conclude that `search' and `insert!'
certainly _will not_ apply equality to any part of ConstructedDomain.
On the other hand, Axiom insists on declaring SetCategory for
ConstructedDomain.
Also I need "=" for ConstructedDomain for the
_needs other than HasTable_.
Its implementation needs to be like this:
build from cd : ConstructedDomain a certain symbolic representation
descr : InputForm, and put cd = cd' == descr(cd) = descr(cd').
This descr is something instead of SExpression.
It may have a considerable cost,
and I hope that `search' and `insert!' for
HasTable(Integer, ConstructedDomain)
_will not_ involve applyig this "=" for ConstructedDomain.

> So it seems that you can not use your ConstructedDomain as
> HashTable keys.

ConstructedDomain is used for Entry, not for Key.

Thanks,

------
Sergei
mec...@botik.ru

Waldek Hebisch

unread,
Mar 20, 2012, 11:02:43 AM3/20/12
to fricas...@googlegroups.com
Serge D. Mechveliani wrote:
>
> > > For HashTable, I cannot tell what may be (in my example), for example,
> > > the cost of `insert!'.
> > [..]
> > > Meanwhile, I use HashTable. But in future I shall need to rewise its
> > > usage, for the thing may ruin the performance.
> >
> > As I wrote, there are limitations on types of keys. Basically,
> > if you have record with more than two components or an
> > array inside the keys, then it will not work.
>
> I hope that Key := Integer
> is enough for my purpose.

I wonder why are you using search for Integer keys. If
your keys form a consecutive range then array (FlexibleArray
if size is variable) is more appropriate.

--
Waldek Hebisch
heb...@math.uni.wroc.pl

Serge D. Mechveliani

unread,
Mar 20, 2012, 2:19:30 PM3/20/12
to fricas...@googlegroups.com
On Tue, Mar 20, 2012 at 04:02:43PM +0100, Waldek Hebisch wrote:
> Serge D. Mechveliani wrote:
> >
[..]
> > > > Meanwhile, I use HashTable. But in future I shall need to rewise its
> > > > usage, for the thing may ruin the performance.
> > >
[..]

> >
> > I hope that Key := Integer
> > is enough for my purpose.
>
> I wonder why are you using search for Integer keys. If
> your keys form a consecutive range then array (FlexibleArray
> if size is variable) is more appropriate.


* I was not aware of FlexibleArray.
* In principle, when there are n items, does this FlexibleArray need
a _continuous_ space in the memory of size O(n) ?

Thanks,

------
Sergei
mec...@botik.ru

Waldek Hebisch

unread,
Mar 20, 2012, 2:43:29 PM3/20/12
to fricas...@googlegroups.com
Serge D. Mechveliani wrote:
>
> * I was not aware of FlexibleArray.
> * In principle, when there are n items, does this FlexibleArray need
> a _continuous_ space in the memory of size O(n) ?

Yes. Are you worried by this continuity requirement? Note that
even when using nonrelocating garbage collector in worst
you need 2Nlog_2(N) memory to satisfy arbitrary sequence of
allocations and deallocation which at any given time sum
of allocated memory does not exceed N. For realistic sizes
log(N) is of order 30, which frequently is less than constants
hidden in big O notation. In practice on 64-bit machines
paging will reduce this worst case factor from 30 to something
like 8, and probablistically anything sigificantly bigger than 1
is unlikely.

And of Lisp we use only ecl uses non-relocating garbage collector.

BTW. If something like list (or a tree) os discontinuous, then
accessing new node is likely to cause cache misses. Excessive
cache misses may slow down program by factors of order 50-200.
So, you _really_ want to keep data in continuous space, because
otherwise you risk significat slowdown.

--
Waldek Hebisch
heb...@math.uni.wroc.pl

Reply all
Reply to author
Forward
0 new messages