Explanation of the design of dictionaries

56 views
Skip to first unread message

Marcin Żołek

unread,
Feb 28, 2026, 7:36:54 AM (6 days ago) Feb 28
to Forum

There have been many suggestions and ideas on the forum regarding the design of dictionaries. However, most of these discussions have remained purely theoretical and were not supported by concrete examples showing practical benefits.

I would like to explain the current design of dictionaries using 3 specific problems from 3 different areas of computer science in which the use of current dictionaries addon leads to improvements in time or memory complexity. These examples also illustrate that J was not incomplete without dictionaries, but dictionaries serve as a powerful addon that is particularly useful for a certain class of problems.

Informally speaking, dictionaries are especially beneficial in algorithms that maintain and continuously update a state, where each update affects only a small part of that state. At the same time, the algorithm must always have access to the current state. In such cases, dictionaries can significantly simplify the implementation and improve efficiency.

1. Dijkstra's algorithm for shortest paths in weighted graphs (https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm)

Dijkstra’s algorithm has many variants, but they can essentially be divided into two main categories:
a. Dense-graph variant, implemented using arrays.
b. Sparse-graph variant, implemented using a priority queue.
For dense graphs, where the number of edges E is close to V^2, where V is number of vertices, the dense variant can be implemented in J without dictionaries and is often faster in practice when the graph is nearly complete. For sparse graphs, however, using a priority queue significantly improves performance. In this variant, a tree dictionary can be used as priority queue. This leads to a time complexity of O(E log E). When E << V^2, this approach is asymptotically and practically more efficient.

Implementation:
https://code.jsoftware.com/wiki/Vocabulary/Dictionaries_(addon)#Applications 
https://github.com/jsoftware/jsource/blob/master/test/gdic.ijs#L614 

2. Context Tree Weighting (CTW) method for lossless compression (https://ieeexplore.ieee.org/document/382012)

A dictionary is used to represent a context tree, which is a complete binary tree whose depth equals the context length (e.g. D=100). Each node of the tree stores estimated probabilities that are updated during sequence processing (both compression and decompression). Binary tree can be represented as dictionary with (extended) integer keys. The root has key 1, and for every node with key i, its left and right children are assigned keys 2i and 2i + 1. This indexing corresponds to the standard array representation of a binary tree. However, a complete binary tree of depth D would require O(2^D) memory. For D = 100, such a representation is clearly infeasible. The CTW algorithm guarantees that only O(n*D) nodes are ever used, where n is the length of the processed sequence. Using a dictionary instead of an array allows us to store only the nodes that are actually visited or updated, so we can handle large contexts without using exponential memory.

Implementation: https://github.com/jsoftware/jsource/blob/master/test/gdic.ijs#L762 

3. Sweep line technique in computational geometry (https://en.wikipedia.org/wiki/Sweep_line_algorithm)

This technique is widely used in computational geometry algorithms, where geometric objects (typically points and line segments) are processed in a specific order. The following implementation solves an entertaining geometric problem: partitioning a set of horizontal line segments into disjoint components such that, within each component, every pair of segments is mutually reachable. Two segments are considered reachable if there exists a path that moves along segments, allowing jumps of at most H and falls of at most H. In other words, we construct connected components under a equivalence reachability relation.

Implementation: https://github.com/jsoftware/jsource/blob/master/test/gdic.ijs#L691 


Regarding the suggestions on the forum:

Since the dictionary is implemented as an addon, it does not overload the J primitives in NuVoc. This avoids introducing unnecessary complexity into the NuVoc definitions and preserves their clarity. At the same time, it provides greater flexibility when extending the dictionary with advanced features that are not naturally expressible as single J primitives, such as range queries in a tree dictionary.

The dictionaries addon is NOT an implementation of purely functional data structures (also known as persistent data structures https://en.wikipedia.org/wiki/Persistent_data_structure). In the problems described above, using a purely functional dictionary would provide no algorithmic benefit and would result in significantly worse performance due to the overhead of maintaining persistence. In a purely functional approach, the dictionary would be usually accumulated using a fold: the verb inside the fold would return a new dictionary at each step, constructed from the previous dictionary and the current element of the sequence. Persistent data structures ensure that old and new dictionaries share most of memory. In contrast, the current approach relies on a single mutable dictionary instance. The dictionary is updated in place, so there is no need to accumulate it explicitly through the fold. If no additional state is being accumulated, using a fold in this setting becomes equivalent to a for loop, with the updates applied directly to the same dictionary object.

Also, the current Dictionaries addon is designed for efficient in-place updates to modify a small part of a large state while requiring immediate access to the current state. Such a design is compatible with multithreaded execution, so multithreading support has been implemented for more complicated parallel algorithms.

Moreover, the addon is fully compatible with J’s rank concept. It follows the same principles as the dyadic i. verb, ensuring consistency with core language and preserving the array-oriented design of J.

Are there any examples of use cases for purely functional dictionaries in J that cannot be easily implemented using exiting simple static dictionary (m&i. special combination) or the current dictionary addon? Concrete examples of problems/use cases would be especially welcome and very helpful in further development.


Marcin

Henry Rich

unread,
Feb 28, 2026, 12:26:44 PM (6 days ago) Feb 28
to forum
Marcin explains perfectly what we were trying to achieve. 

[Marcin and I were yokefellows in the design and implementation. He created the examples on his own.]

Henry Rich

To unsubscribe from this group and stop receiving emails from it, send an email to forum+un...@jsoftware.com.

Pascal Jasmin

unread,
Mar 2, 2026, 12:33:25 PM (4 days ago) Mar 2
to fo...@jsoftware.com


I fully understtand the need to focus on in place optimization for near term releases.  Easily 99% of dictionary update cases are for a permanent dictionary, and temporary ones/clones can easily be J kv columns without any optimizations, and likely used to merge into some other dictionary/structure.

Focusing on the public dict variable inside dictionary class, it is a collection of 7 boxes holding pointers (though first box is very strangely the text of the J type function) of mostly undisplayable data.  This is fine, as is the imperative only focus for initial implementation.

Because I hate OOP, I will develop a functional interface where even if "pointer structure"  held in a J variable is directly modified, that the put/del functions return that "pointer structure variable" to facilitate tacit code chains and adverbial programming. 

OOP approaches are not only terrible for clean (according to me, with syntax sugar bias) reusable code patterns, dictionaries are the perfect alternative to OOP "organization of properties into a single variable"

Another application of functional first implementations is that you can first see if your modification code produces a correct result, and after confirmation, make the change permanent.

my KV implementation does these fine, and with parallel functions can serve as staging prior to modifying a dictionary.  It is fine for small class OOP replacement, because as Henry mentioned, index hashing can cost more than searching through a small set.  I'm satisfied with symbol replacement approach, which may save significant memory storing a byte for "key universes under 256 in size".

I can abandon personal request for clone functionality, no matter how easy it is to implement.

the functions that are enabled for 'tree' structure allow key/value extraction.  It they could also apply to 'hash' structure in insertion rather than sorted order, that would be great.  But key/value extraction even if it is not data accurate to the millisecond under concurrent access, or slow.  It is just table stakes for a dictionary, where workflows need to verify that operations are working as intendend. 

Henry Rich

unread,
Mar 2, 2026, 12:39:34 PM (4 days ago) Mar 2
to fo...@jsoftware.com
unload from a hash dictionary will return the kvs in accidental order. 
It starts out as the order of insertion but if there are deletions the
holes are filled with later kvs.

I reiterate, /DO NOT/ look at the dict variable.  It is subject to
change.  Use load/unload.


Henry Rich

Pascal Jasmin

unread,
Mar 2, 2026, 1:28:50 PM (4 days ago) Mar 2
to fo...@jsoftware.com
>  /DO NOT/ look at the dict variable.

I understand your warning in the sense that I will be forced to update my adaptation of jdictionary class in the future, rather than a dict or equivalent variable will cease to exist, and 16!: type functions will be used to manipulate it.  I accept that any near future implementations are not a locked interface.

Pawel Jakubas

unread,
Mar 4, 2026, 6:36:32 AM (2 days ago) Mar 4
to forum, Pascal Jasmin
I am following the discussion with a great interest.
I am very fond of the explanation Marcin presented and would advocate for adding it to dictionary documentation.
Cheers,
Pawel

Reply all
Reply to author
Forward
0 new messages