There have been many suggestions and ideas on the forum regarding the design of dictionaries. However, most of these discussions have remained purely theoretical and were not supported by concrete examples showing practical benefits.
I would like to explain the current design of dictionaries using 3 specific problems from 3 different areas of computer science in which the use of current dictionaries addon leads to improvements in time or memory complexity. These examples also illustrate that J was not incomplete without dictionaries, but dictionaries serve as a powerful addon that is particularly useful for a certain class of problems.
Informally speaking, dictionaries are especially beneficial in algorithms that maintain and continuously update a state, where each update affects only a small part of that state. At the same time, the algorithm must always have access to the current state. In such cases, dictionaries can significantly simplify the implementation and improve efficiency.
1. Dijkstra's algorithm for shortest paths in weighted graphs (https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm)
Dijkstra’s algorithm has many variants, but they can essentially be divided into two main categories:
a. Dense-graph variant, implemented using arrays.
b. Sparse-graph variant, implemented using a priority queue.
For dense graphs, where the number of edges E is close to V^2, where V is number of vertices, the dense variant can be implemented in J without dictionaries and is often faster in practice when the graph is nearly complete. For sparse graphs, however, using a priority queue significantly improves performance. In this variant, a tree dictionary can be used as priority queue. This leads to a time complexity of O(E log E). When E << V^2, this approach is asymptotically and practically more efficient.
Implementation:
https://code.jsoftware.com/wiki/Vocabulary/Dictionaries_(addon)#Applications
https://github.com/jsoftware/jsource/blob/master/test/gdic.ijs#L614
2. Context Tree Weighting (CTW) method for lossless compression (https://ieeexplore.ieee.org/document/382012)
A dictionary is used to represent a context tree, which is a complete binary tree whose depth equals the context length (e.g. D=100). Each node of the tree stores estimated probabilities that are updated during sequence processing (both compression and decompression). Binary tree can be represented as dictionary with (extended) integer keys. The root has key 1, and for every node with key i, its left and right children are assigned keys 2i and 2i + 1. This indexing corresponds to the standard array representation of a binary tree. However, a complete binary tree of depth D would require O(2^D) memory. For D = 100, such a representation is clearly infeasible. The CTW algorithm guarantees that only O(n*D) nodes are ever used, where n is the length of the processed sequence. Using a dictionary instead of an array allows us to store only the nodes that are actually visited or updated, so we can handle large contexts without using exponential memory.
Implementation: https://github.com/jsoftware/jsource/blob/master/test/gdic.ijs#L762
3. Sweep line technique in computational geometry (https://en.wikipedia.org/wiki/Sweep_line_algorithm)
This technique is widely used in computational geometry algorithms, where geometric objects (typically points and line segments) are processed in a specific order. The following implementation solves an entertaining geometric problem: partitioning a set of horizontal line segments into disjoint components such that, within each component, every pair of segments is mutually reachable. Two segments are considered reachable if there exists a path that moves along segments, allowing jumps of at most H and falls of at most H. In other words, we construct connected components under a equivalence reachability relation.
Implementation: https://github.com/jsoftware/jsource/blob/master/test/gdic.ijs#L691
Regarding the suggestions on the forum:
Since the dictionary is implemented as an addon, it does not overload the J primitives in NuVoc. This avoids introducing unnecessary complexity into the NuVoc definitions and preserves their clarity. At the same time, it provides greater flexibility when extending the dictionary with advanced features that are not naturally expressible as single J primitives, such as range queries in a tree dictionary.
The dictionaries addon is NOT an implementation of purely functional data structures (also known as persistent data structures https://en.wikipedia.org/wiki/Persistent_data_structure). In the problems described above, using a purely functional dictionary would provide no algorithmic benefit and would result in significantly worse performance due to the overhead of maintaining persistence. In a purely functional approach, the dictionary would be usually accumulated using a fold: the verb inside the fold would return a new dictionary at each step, constructed from the previous dictionary and the current element of the sequence. Persistent data structures ensure that old and new dictionaries share most of memory. In contrast, the current approach relies on a single mutable dictionary instance. The dictionary is updated in place, so there is no need to accumulate it explicitly through the fold. If no additional state is being accumulated, using a fold in this setting becomes equivalent to a for loop, with the updates applied directly to the same dictionary object.
Also, the current Dictionaries addon is designed for efficient in-place updates to modify a small part of a large state while requiring immediate access to the current state. Such a design is compatible with multithreaded execution, so multithreading support has been implemented for more complicated parallel algorithms.
Moreover, the addon is fully compatible with J’s rank concept. It follows the same principles as the dyadic i. verb, ensuring consistency with core language and preserving the array-oriented design of J.
Are there any examples of use cases for purely functional dictionaries in J that cannot be easily implemented using exiting simple static dictionary (m&i. special combination) or the current dictionary addon? Concrete examples of problems/use cases would be especially welcome and very helpful in further development.
Marcin
To unsubscribe from this group and stop receiving emails from it, send an email to forum+un...@jsoftware.com.