Though AFAIK "hashable", "unordered-containers", ... are not currently
"core libraries", I hope this may be a proper forum to reach out to for
thoughts and advice.
The hashtable implementations available in the Haskell ecosystem err on
the side of "purity", and (via the "hashable" package), employ an
essentially predictable mapping from keys to hashes, which means that
the resulting hashtables are not a good fit for handling bulk data from
untrusted sources.
While I have a tentative work-around[1], I wonder whether it might not
be more productive to bite the bullet and recognise that hashtables not
only don't have a well-defined specified element order,
https://hackage-content.haskell.org/package/unordered-containers-0.2.21/docs/Data-HashMap-Strict.html#v:toList
toList :: HashMap k v -> [(k, v)]
... The order of its elements is unspecified, and it may change from
version to version of either this package or of hashable.
but, also perhaps SHOULD NOT have a stable order across process
lifetimes, with built-in random salting being the default. (Per-table
rather than process-lifetime salt makes some functions (intersection,
union, difference, ...) that operate on pairs of tables less efficient,
so per-process seems a more pragmatic choice).
It would, e.g., be enough for there to be a new function in "Hashable"
package:
hashWithSalt' :: Hashable a => Int -> a -> Int
which internally adds a process-lifetime random value to the
caller-provided salt argument, before tail-calling `hashWithSalt`. Hash
table implementations would then use `hashWithSalt'` with whatever
table-specific constant (if not just 0) makes sense.
The resulting hash maps (or hash sets) would then be suitable for
untrusted content, and users would not have to jump through hoops
to handle untrusted data efficiently (without being forced to use
more expensive ordered containers).
--
Viktor. 🇺🇦 Слава Україні!
[1] Newtype-wrappers for keys with phantom reified salt making it
possible to define suitable "Hashable" instances that use a
table-specifc salt without changing the runtime representation of
the keys, so coercible, ... with corresponding API adapters.
$ cabal repl -v0 dnsbase
λ> import Data.Salted.Map as SM
λ> import qualified Data.HashMap.Strict as HM
λ> import Data.Coerce (coerce)
λ>
λ> -- explicit salt at table creation time
λ>
λ> m = fromList $ SKVL (SaltValue 0) [("abc", "def"),("foo","bar")]
λ> n = fromList $ SKVL (toEnum 1) [("abc", "def"),("foo","bar")]
λ>
λ> -- extensional equality
λ>
λ> m == n
True
λ>
λ> -- natural keys
λ>
λ> SM.lookup "abc" m
Just "def"
λ> n' = insert "xyzzy" "elbereth" n
λ> SM.lookup "xyzzy" n'
Just "elbereth"
λ> SM.lookup "foo" n
λ>
λ> -- salt-dependent ordering
λ>
λ> m
fromList (SKVL (SaltValue 0) [("abc","def"),("foo","bar")])
λ> n
fromList (SKVL (SaltValue 1) [("foo","bar"),("abc","def")])
λ>
λ> -- coercion (salt-dependent) to/from unsalted keys
λ>
λ> case toList n' of { SKVL _ kvl -> kvl }
[("foo","bar"),("abc","def"),("xyzzy","elbereth")]
λ>
λ> kvl = [("foo", "bar"), ("abc", "def")]
λ> m' = withSalt (toEnum 0) \ p -> SaltedMap p $ HM.fromList $ coerce kvl
λ> m == m'
True
I do realise that a user can of course explicitly replace every `k` with
`(s, k)` for some process-lifetime or even table-lifetime constant `s`,
but this pervades table storage, lookup, updates, while it should be
possible to use the natural keys with not mondifications.