Hey everyone! :)
I was recently looking at the source for
get-in to check if it would exit early as soon as a key was not present in the nested maps.
The source for quick reference:
(defn get-in
"Returns the value in a nested associative structure,
where ks is a sequence of keys. Returns nil if the key
is not present, or the not-found value if supplied."
{:added "1.2"
:static true}
([m ks]
(reduce1 get m ks))
([m ks not-found]
(loop [sentinel (Object.)
m m
ks (seq ks)]
(if ks
(let [m (get m (first ks) sentinel)]
(if (identical? sentinel m)
not-found
(recur sentinel m (next ks))))
m))))
I noticed that the 2-arity variant does not exit early, but the 3-arity variant does. Also looking at the implementation of
reduce1 vs the implementation of the 3-arity variant of
get-in got me thinking if it wouldn't be slightly more efficient to just implement the 2-arity variant by calling the 3-arity variant, eg:
([m ks]
(get-in m ks nil))
I've run a few micro benchmarks comparing speeds of the 2-arity and 3-arity variant using
criterium and the 3-arity variant seems to be more efficient even if it can't exit early.
Below are a few of the criterium benchmarks I did on my laptop:
(crit/bench (get-in {} [:a :b]))
Evaluation count : 1874799960 in 60 samples of 31246666 calls.
Execution time mean : 32.013948 ns
Execution time std-deviation : 1.172814 ns
Execution time lower quantile : 29.839525 ns ( 2.5%)
Execution time upper quantile : 34.283877 ns (97.5%)
Overhead used : 2.189987 ns
Found 17 outliers in 60 samples (28.3333 %)
low-severe 13 (21.6667 %)
low-mild 4 (6.6667 %)
Variance from outliers : 23.7873 % Variance is moderately inflated by outliers
(crit/bench (get-in {} [:a :b] nil))
Evaluation count : 4071018060 in 60 samples of 67850301 calls.
Execution time mean : 15.124086 ns
Execution time std-deviation : 1.492542 ns
Execution time lower quantile : 12.488672 ns ( 2.5%)
Execution time upper quantile : 17.908223 ns (97.5%)
Overhead used : 2.189987 ns
Found 1 outliers in 60 samples (1.6667 %)
low-severe 1 (1.6667 %)
Variance from outliers : 68.6898 % Variance is severely inflated by outliers
(crit/bench (get-in {:a {:b {:c 1}}} [:a :b :c]))
Evaluation count : 1034704440 in 60 samples of 17245074 calls.
Execution time mean : 58.600293 ns
Execution time std-deviation : 2.939752 ns
Execution time lower quantile : 55.782160 ns ( 2.5%)
Execution time upper quantile : 64.772887 ns (97.5%)
Overhead used : 2.189987 ns
Found 2 outliers in 60 samples (3.3333 %)
low-severe 2 (3.3333 %)
Variance from outliers : 36.8128 % Variance is moderately inflated by outliers
(crit/bench (get-in {:a {:b {:c 1}}} [:a :b :c] nil))
Evaluation count : 1359247260 in 60 samples of 22654121 calls.
Execution time mean : 42.744780 ns
Execution time std-deviation : 3.356838 ns
Execution time lower quantile : 39.571655 ns ( 2.5%)
Execution time upper quantile : 48.841751 ns (97.5%)
Overhead used : 2.189987 ns
Found 1 outliers in 60 samples (1.6667 %)
low-severe 1 (1.6667 %)
Variance from outliers : 58.4799 % Variance is severely inflated by outliers
While I can imagine that changing the implementation of the 2-arity variant would barely bring noticeable improvements to projects, I also can't see much of a reason not to change it. It doesn't make the code less readable or maintainable. It just makes performance for both arities consistently good.
What would be pros and cons of such a change that I don't see at the moment?