Hi,
Given recent discussions about dictionaries and symbols, I had made an addon "EasyDict" that extends the data/dictionaries with a lazy creation mechanism, and string-based options (for now, only options that take strings or simple numbers are supported); both with the aim or reducing ceremony and typing required to play around with dictionaries. The underlying dictionary is created upon the first put, with items in both keys and values being records in the dictionary. Types are set based on what is provided. I also extended "put" to work as ''&put when called monadically.
Usage:
install'github:jpjacobs/data_easydict'
require'data/easydict'
foo=: easydict 'hash concurrent' NB. creation, no backing dict yet.
has__foo 1 2 3 NB. Some methods have a (according to me) reasonable placeholders before lazy initialization.
0 0 0
count__foo ''
0
min__foo '' NB. defined as for empty dictionary.
|index error: min__foo
| min__foo''
'abc' put__foo i. 3 4 NB. initial put sets value/key types to literal and integer and shapes to i.0 and ,4 .
After the recent discussion on symbols, I tried to extend and render thread-safe the proposal of Henry, and added it to the easydict addon as well (might split them in the future).
I made the symbol table a class, so multiple symbol tables can be created if needed. Compared to s:, the data stored is extended to any boxed data (not only literals).
"sym__st" corresponds to s: and takes boxed data arrays, and stores each box in the dictionary in the symbol table st and its data array, and returns symbol indices into the data array. The shape of the array is maintained.
"dat__st" corresponds to s:^:_1 does the opposite, looking up data given symbol indices.
The type of the dictionary is a tree, as I planned to allow comparisons between symbols as is currently possible.
st =: symbols '' NB. create symbol table, ignores its argument.
sym__st <"+ i._3 _3 NB. symbols
0 1 2
3 4 5
6 7 8
dat__st 6 4 2 NB. Lookup data for symbols.
┌─┬─┬─┐
│2│4│6│
└─┴─┴─┘
It should be thread-safe (though untested; let me know if you do;) ) and tries to do insertions in bulk.
Things I wondered about when implementing this:
- trees would probably allow grading/sorting of keys (given that there's order, min, max, ranges etc). Are such methods planned to be implemented? I know I could probably use keyhash directly between pairs of key data and implement a quicksort or the like, but this seems wasteful: data has to be copied, and hashing performed for each comparison, rather than just looking it up in the tree.
- As Pascal, I would be a big fan of reverse lookup: It would avoid the need for separately keeping the data array, which seems to duplicate data needlessly, since the dictionary is already storing the keys (i.e. data the symbols represent) anyhow (and not only the hashes, as the key data itself can be retrieved). It would seem even more wasteful to keep 2 dictionaries, both storing both keys and values...
To my feeling, both of the above are very much related aspects.
Hope this is useful to someone.
Best regards,
Jan-Pieter