Easydict (dictionaries wrapper)

28 views
Skip to first unread message

Jan-Pieter Jacobs

unread,
Feb 25, 2026, 6:55:39 PM (8 days ago) Feb 25
to fo...@jsoftware.com
Hi,

Given recent discussions about dictionaries and symbols, I had made an addon "EasyDict" that extends the data/dictionaries with a lazy creation mechanism, and string-based options (for now, only options that take strings or simple numbers are supported); both with the aim or reducing ceremony and typing required to play around with dictionaries. The underlying dictionary is created upon the first put, with items in both keys and values being records in the dictionary. Types are set based on what is provided. I also extended "put" to work as ''&put when called monadically.

Usage:

install'github:jpjacobs/data_easydict' 
require'data/easydict'
   foo=: easydict 'hash concurrent' NB. creation, no backing dict yet.
   has__foo 1 2 3 NB. Some methods have a (according to me) reasonable placeholders before lazy initialization.
0 0 0
   count__foo ''
0
   min__foo ''    NB. defined as for empty dictionary.
|index error: min__foo
|       min__foo''
   'abc' put__foo  i. 3 4 NB. initial put sets value/key types to literal and integer and shapes to i.0 and ,4 .

After the recent discussion on symbols, I tried to extend and render thread-safe the proposal of Henry, and added it to the easydict addon as well (might split them in the future).

I made the symbol table a class, so multiple symbol tables can be created if needed. Compared to s:, the data stored is extended to any boxed data (not only literals).
"sym__st" corresponds to s: and takes boxed data arrays, and stores each box in the dictionary in the symbol table st and its data array, and returns symbol indices into the data array. The shape of the array is maintained.
"dat__st" corresponds to s:^:_1 does the opposite, looking up data given symbol indices.
The type of the dictionary is a tree, as I planned to allow comparisons between symbols as is currently possible.

   st =: symbols '' NB. create symbol table, ignores its argument.
   sym__st <"+ i._3 _3 NB. symbols
0 1 2
3 4 5
6 7 8
   dat__st 6 4 2 NB. Lookup data for symbols.
┌─┬─┬─┐
│2│4│6│
└─┴─┴─┘

It should be thread-safe (though untested; let me know if you do;) ) and tries to do insertions in bulk.

Things I wondered about when implementing this:
- trees would probably allow grading/sorting of keys (given that there's order, min, max, ranges etc). Are such methods planned to be implemented? I know I could probably use keyhash directly between pairs of key data and implement a quicksort or the like, but this seems wasteful: data has to be copied, and hashing performed for each comparison, rather than just looking it up in the tree.
- As Pascal, I would be a big fan of reverse lookup: It would avoid the need for separately keeping the data array, which seems to duplicate data needlessly, since the dictionary is already storing the keys (i.e. data the symbols represent) anyhow (and not only the hashes, as the key data itself can be retrieved). It would seem even more wasteful to keep 2 dictionaries, both storing both keys and values...

To my feeling, both of the above are very much related aspects.

Hope this is useful to someone.

Best regards,
Jan-Pieter

Henry Rich

unread,
Feb 25, 2026, 7:22:11 PM (8 days ago) Feb 25
to forum
Thanks for this, J-P. It will make it easier to wean users off symbols. 

The tree is a balanced binary search tree, so the keys are always stored in sorted order. Returns from mget are in ascending key order. 

Agreed that storing keys & values twice is wasteful. Maybe in the future we will allow keeping both maps automatically. That wouldn't provide inverted maps for columns of the value, though. 

Henry Rich

To unsubscribe from this group and stop receiving emails from it, send an email to forum+un...@jsoftware.com.

Raul Miller

unread,
Feb 26, 2026, 10:21:40 AM (8 days ago) Feb 26
to fo...@jsoftware.com
Wait, if you're using the keys and values twice, wouldn't you be
incrementing the ref counts on the underlying nouns?

If dictionaries are making needless copies of distinct nouns, that
sounds like a bigger problem.

--
Raul

Henry Rich

unread,
Feb 26, 2026, 2:38:39 PM (8 days ago) Feb 26
to forum
The dictionary copies the data into its array. That's better than pointing to the separate kvs. 

Henry Rich

Jan-Pieter Jacobs

unread,
Feb 26, 2026, 5:29:58 PM (7 days ago) Feb 26
to fo...@jsoftware.com
Hi Henry, 

While mget allows some cases of retrieving multiple contiguous keys/values, it does not allow a list of arbitrarily non-contiguous keys to be sorted. This would be needed to implement a sorting behaviour as s: now provides.

If the get,min,max etc verbs could have an option or alternative that returns references (perhaps the hashes?) to the kv data, I think most voiced concerns could be addressed:
These could serve instead of increasing integer symbols. Maybe this is wishful thinking, as I don't know whether unique hashes are readily available in the dict structure...

A data lookup should be present as well then (preferably both ref-> data and data-> ref for both key and value). Querying a non-existing reference or data would be an index error.

This would simplify the symbol table example a lot, no longer requiring external data, nor values in the dictionary to be carefully picked based on whether the values are already present or not. As such the symbol table would simply be a dictionary.

Only thing required for sorting would a grade_keys and grade_hashes methods; consistent with /: (i.e. dyadic does sort).

Thanks,

Jan-Pieter

Henry Rich

unread,
Feb 26, 2026, 5:51:49 PM (7 days ago) Feb 26
to forum
Hashing is of no avail since different strings may have the same hash. 

If you use my simple symbols model, which keeps a table of keys corresponding to each symbol number, you would sort the symbols by

symnums ([ /: {) keylist

This is exactly what happens when you short symbols now. 

Henry Rich

Pascal Jasmin

unread,
Feb 26, 2026, 6:02:47 PM (7 days ago) Feb 26
to fo...@jsoftware.com
Hi Henry, sorry for so many requests but,

any thought on 'aa' , 'aa  ', and 'aa                   '  all hashing to same value, and referencing the same key, so that string keys of "natural language" could be used instead of boxed strings?  There's no documentation on "custom hashing functions" even if an alternative to 16!:0 seems possible.  Would 16!:0@dltb work for this request?

Henry Rich

unread,
Feb 26, 2026, 9:14:24 PM (7 days ago) Feb 26
to forum
Making the variants hash to the same value wouldn't help unless they also compared equal. You could do that with a custom comparison function, but it would be more efficient to convert strings to a canonical form before sending them to the dictionary. 

Henry Rich

Pascal Jasmin

unread,
Feb 26, 2026, 9:53:21 PM (7 days ago) Feb 26
to fo...@jsoftware.com
can a dictionary hold string keys 'abc' and 'aa' without expanding 'aa' to 'aa ' ?

bill lam

unread,
Feb 26, 2026, 9:59:15 PM (7 days ago) Feb 26
to fo...@jsoftware.com
IIRC symbol consider ascii strings (codepoint below 128) of different types LIT and unicode equal. 

Henry Rich

unread,
Feb 26, 2026, 10:29:05 PM (7 days ago) Feb 26
to forum
Make the strings boxed.

Henry Rich

Jan-Pieter Jacobs

unread,
Feb 27, 2026, 12:58:52 AM (7 days ago) Feb 27
to fo...@jsoftware.com
Yes, sure it can be done outside of the dictionary.

However it would require:
- another copy of the key data
- sorting arbitrary (boxed data)

Especially when data is large, this is bound to be slower than:
- using a mapping/references interal to the dictionary which would not require a copy of the data
- using the order the tree dictionary has stored anyhow, instead of sorting the full data.

Hence my suggestion.

Jan-Pieter

Henry Rich

unread,
Feb 27, 2026, 1:58:52 AM (7 days ago) Feb 27
to forum
Concerning the storage, yes, for the special case of symbols you could have the value-to-key mapping associate a key index with each value. This might cause problems if symbols could be deleted. I didn't propose this because it would require additions to the dictionary that I'm not convinced are justified. 

Concerning the ordering, I think you have a wrong idea. The keys are kept in sorted order, but that doesn't help in ordering a subset of the keys - you still have to sort the subset. 

Henry Rich
Reply all
Reply to author
Forward
0 new messages