Dictionary lookup using only the hash value

364 views
Skip to first unread message

Ritchie Lee

unread,
Jan 13, 2016, 3:59:57 PM1/13/16
to julia-users
As I understand it, Julia dicts use Base.hash when storing custom types as keys.  Is there a way to look up based on the hash alone?

e.g.,

type MyType
a::Int
end

Base.hash(x::MyType) = hash(x.a, hash(MyType))

x = MyType(1)
D = Dict{MyType, Bool}()
D[x] = true

Now, is there a way to lookup using h only? 

h = hash(x)
D[h]

Is this be safe to do?

Thanks!

Mauro

unread,
Jan 13, 2016, 4:57:27 PM1/13/16
to julia...@googlegroups.com
No, this is not possible. Dicts use the function Base.hashindex to
convert a hash into an index into the internal storage arrays (one for
keys, one for values, one a bool stating whether used or not, see [1]). However,
collisions are likely. For instance, 1 and 18 map to the same index for
a length 16 internal storage array:

julia> Base.hashindex(1,16)
16

julia> Base.hashindex(18,16)
16

Thus you need to check for equality too: if the content of the key-array
at that location is not equal to the key, the next location is checked
until the key is found (or an empty slot is encountered and an error is
thrown).

[1] https://github.com/JuliaLang/julia/blob/5cbb72ca46791c6491a098399642c00daf97cec0/base/dict.jl#L390

vav...@uwaterloo.ca

unread,
Jan 14, 2016, 10:44:01 PM1/14/16
to julia-users
Could I ask what would be an application of this capability (if it were possible)?  -- Steve Vavasis

Jeffrey Sarnoff

unread,
Jan 18, 2016, 1:22:09 PM1/18/16
to julia-users
Steve, 
afaik, depending upon the hash implementation, the only advantage might be moderately faster lookup but the cost in generality often would outweigh that.

vav...@uwaterloo.ca

unread,
Jan 18, 2016, 11:23:37 PM1/18/16
to julia-users
Jeffrey,

Your interpretation is that the original poster wanted to read or write dictionary entries that had already been found via previous hashing and searching without again hashing the key (presumably in order to attain higher performance).  That was also my guess.  I am interested in this issue because Julia currently has a mechanism to refer to a specific object inside a container: the Ref{T} type.  My understanding is that this type was mainly created in order to build safer Julia/C interfaces, but one could imagine using it more generally.  Unfortunately (last time I checked) there is a performance problem with Ref{T}: these objects are heap-allocated and so shouldn't be created in inner loops.  In addition, as far as I know, the implementation has not yet been extended to elements of Dict{K,V}.  The sorted containers in DataStructures.jl have a token/semitoken type associated with them for the purpose of high-performance access and iteration, but the approach taken in DataStructures.jl does not generalize to other Julia containers such as the standard Dict{K,V}.

The C++ standard containers all have 'iterators' for this purpose.  Maybe some thought could be given to extending the Ref{T} framework to across-the-board high-performance references to objects inside containers in some future version of Julia.

-- Steve
Reply all
Reply to author
Forward
0 new messages