There is an efficient internal implementation to compare two terms in
map-key order (undocumented erts_internal:cmp_term/2).
But there is no efficient implementation to produce a map-key-value
ordered list. That would require N*log(N) sorting.
'==' for hashmaps (> 32 keys) does a one pass iteration in key hash
order over both trees (HAMTs) and keeps track of the minimal key or
value seen so far.
/Sverker
Our "sister language" Go *randomizes* iteration order on its maps. This is to force programmers into not relying on the map order at all, even if it happens to be ordered right now. This opens up implementations in the future.
But they are IN NO WAY a replacement for records.
-Craig
...and this means there is a bug in the calling code, NOT in maps (which I'm pretty sure is the point Attila is pointing out).
I can't believe we are having this discussion. Again.
Having a discussion about internal ordering in the context of efficient matches and comparisons *in implementation*: totally logical
Relying on that implementation detail to leak out: ridiculous
If we really want ordered maps we should make ordered maps. And highlight in big bold letters all the baggage that brings with it. Alternately, we already have other data structures that work this way, perhaps they could be utilized. I think people just really really wish they could add wazoo syntax to various tree structures... ugh.
This discussion comes up a little more frequently than once a year and every time it reminds me of watching distributed systems engineers (try desperately to) explain CAP theorem tradeoffs to a VP of marketing.
-Craig
Yes. All Erlang terms, including maps, have a globally defined,
implementation independent, total order.
I can't believe we are having this discussion. Again.
Having a discussion about internal ordering in the context of efficient matches and comparisons *in implementation*: totally logical
Relying on that implementation detail to leak out: ridiculous
This discussion comes up a little more frequently than once a year and every time it reminds me of watching distributed systems engineers (try desperately to) explain CAP theorem tradeoffs to a VP of marketing.
I am curious, though, why a compound data type was added to the language as a primitive data type. This was the only thing that really bothered me about maps.
Second-class data type, sure. Ordering unknown. Whatever. Same with all the other compound data types that we make up for ourselves.
Having it as a data primitive introduces inconsistency in the language itself, as we either have to have a tradeoff to amortize enforcement of an arbitrary ordering to make comparisons faster, or force an ordering at the time of comparison (and/or serialization, maybe) at the cost of some computation time in order for things to work. That adds one more odd point of weirdness to the language we *never* once worried about before but now need to remember in edge cases where performance matters. In other words, this creates a new edge case for performance, if I understand thing correctly.
Who before was ever seriously considering using dicts as keys to dicts?
"But lists are compound data types!" Sort of. The ordering of a list very often IS its meaning -- thus strings. Not so for maps by their very nature.
Interesting. I have been dealing with this exact case myself for quite
some time (before the advent of maps, actually). The tradeoff we had to
make was exactly as you describe: Universal ordered representation of
pretty much any K/V sort of data type (especially across languages) only
works as an ordered list, so the external, canonical representation of
the data must be defined as an ordered list (sorted some specific way).
Occasionally this also means internal lists must be themselves sorted.
That is one critical part of the definition of any serialization
procedure for any data that needs to be verifiable via signature.
Either that or build an ASN.1 DER representation for everything; which
isn't actually so bad, but would be way better if there were tutorials
on just the DDL part of ASN.1 for the cool kids to brush up on...
Having maps inside an Erlang program (and dicts in Python and blats in
Frozz and so on) is quite nice, but it can never be relied on for things
like canonical serialization. Imagine if every SQL query return had to
suddenly be ordered!
Those internal representations can only ever be immediate conveniences,
never canonical data representations. I think this is easy to forget
because most of the time we never even have to worry about what a
"canonical representation" even might be for 99% of the data most of us
ever deal with.
Yes. All Erlang terms, including maps, have a globally defined, implementation independent, total order.
The reason, for this "surprising" order of maps, is the alternative of using the normal arithmetic order for keys is much worse.
If we want 1 and 1.0 to be different keys (which we do as we use matching (=:=) to lookup keys)
then we need to order 1 and 1.0, and normal arithmetic term ordering does not (1 == 1.0).
It has been discussed (more and less serious) to expose this "map-key-order" with operators like :<, :>, =:<, >:=.
2017-10-27 15:26 GMT+02:00 Sverker Eriksson <sverker....@ericsson.com>:
Yes. All Erlang terms, including maps, have a globally defined, implementation independent, total order.
The reason, for this "surprising" order of maps, is the alternative of using the normal arithmetic order for keys is much worse.
If we want 1 and 1.0 to be different keys (which we do as we use matching (=:=) to lookup keys)
then we need to order 1 and 1.0, and normal arithmetic term ordering does not (1 == 1.0).
Yes... but I'm not sure that the term order when comparing one map to another needs to have anything to do with how lookup works within maps (much like it doesn't in an orddict, gb_trees, or similar). The global term order only needs to be fixed and preferably straightforward. I could, for example, implement maps using the sort of tuple I showed: {K1, ..., Kn, V1, ..., Vn}, and they would have one order given by '>', but using #{K1=>V1, ... Kn=>Vn}, they would be ordered differently if keys were floats. I think that key order as used internally by the maps could and should be kept separate from the global term order.
How would you order #{1 => x, 1.0 => y} and #{1 => y, 1.0 => x}
if you can't order 1 and 1.0?
On Fri, Oct 27, 2017 at 09:36:58PM +0200, Richard Carlsson wrote:
> 2017-10-27 17:45 GMT+02:00 Sverker Eriksson <sverker....@ericsson.com>
> >
> > How would you order #{1 => x, 1.0 => y} and #{1 => y, 1.0 => x}
> > if you can't order 1 and 1.0?
>
> As long as we're talking about the arithmetic term order (<, >, ==), I
> don't see why they would need to be. Look at tuples:
>
> {1, 1.0} < {1.0, 1}.
> false
> {1, 1.0} > {1.0, 1}.
> false
> {1, 1.0} == {1.0, 1}.
> true
> {1, 1.0} =:= {1.0, 1}.
> false
>
> Maps ought to behave analogously, in the arithmetic ordering.
The problem is, the tuples you provide and comparison operators (<, >,
=<, >=, and == (not =:= one)) form a well-defined partial order (total
order, actually); mainly, if A =< B and B =< A, then A == B. Being
a partial order is a very important property of Erlang's type system,
one that was quite explicitly baked in the VM and is used in many
different places.
But again, Jesper, just about everyone relies on the fact that maps follow the general principle that there is a well-defined term comparison order. Otherwise, maps would be highly unsuitable to use in keys, and generally treacherous to use as a replacement for records. Following the Principle of Least Surprise, it's a darned good thing that Erlang doesn't randomize the key order in its maps.
On 10/27, Richard Carlsson wrote:
One way to
avoid this situation would be to say that you can't have two keys that
compare equal with == in the same map, just as for an orddict.
This would, unfortunately, break pattern matching:
M = #{1 => 1},
case M of
#{1.0 := _} -> % it is safe to add 1.0 if we match
M#{1.0 := 2}; % does this crush a value and change the key?
_ ->
other
end.
The fourth line here is problematic. Either (1) maps have a special magical pattern matching case where 1.0 and 1 compare equal (which happens nowhere else) and decide whether to replace the key or keep it the same, or (2) you keep current pattern matching semantics and you can't use existing pattern matching to enforce the constraints above.
On Thu, Oct 26, 2017 at 9:13 PM Ulf Wiger <u...@wiger.net> wrote:But again, Jesper, just about everyone relies on the fact that maps follow the general principle that there is a well-defined term comparison order. Otherwise, maps would be highly unsuitable to use in keys, and generally treacherous to use as a replacement for records. Following the Principle of Least Surprise, it's a darned good thing that Erlang doesn't randomize the key order in its maps.I largely regard a total term order as a language mistake. The right solution, obviously, is to have several equalities, where the programmer can define what they mean by equality in a certain part of the program. Equality is way too important in programming for you to be left with a single one!
That said, there is a well-defined order of maps currently, but it is not the logical one you might expect (which is by ordering the keys of the map). Rather the order is defined on* What the key hashes to* What the internal HAMT structure looks like right now
* Your "sext" library exists to plug yet another hole in the language, namely that binary_to_term have certain freedoms with certain data structures and this leads to situations where you cannot rely on the binary output for, e.g., cryptographic applications.
About VM instance and version; I don't see the big divide there.
Would it not be treacherous to base an application design
on term_to_binary being a pure function among VM instances
as long as you don't upgrade them.
/Sverker
As we quoted from the reference manual, the < ordering on maps is actually implementation independent and future proof (while still being a total order). The sacrifice is that to compare two maps with <, their keys must be mapped into canonical order (with integers before floats as discussed). This is clearly more costly than just taking whatever order the current underlying hash+HAMT produces, but worth it since it preserves those nice properties.