[erlang-questions] map elements in defined order

947 views
Skip to first unread message

Ulf Wiger

unread,
Oct 26, 2017, 11:30:35 AM10/26/17
to erlang-questions
Looking at the maps API, I see no function that can present the contents of a map in the defined order.

Because there IS a defined order, described in the Erlang Reference Manual:

«Maps are ordered by size, two maps with the same size are compared by keys in ascending term order and then by values in key order. In maps key order integers types are considered less than floats types.»

Note that the key ordering described above doesn't follow normal Erlang term ordering, so lists:sort(maps:to_list(Map)) will produce something will a well-defined order, but comparing the sorted lists of two maps is not guaranteed to have the same outcome as comparing the maps themselves.

I can solve this by using lists:sort/2, but this seems contrived.

To clarify, e.g. msgpack:pack(Map) will perform a maps:to_list(Map). This is not guaranteed to produce the same result across Erlang versions, since maps:to_list/1 says pairs "are returned in arbitrary order".

Yet there is no alternative API function that will return pairs in the defined sort order, although presumably there is an efficient internal implementation for producing it.

Wouldn't it be reasonable to have such a function?

BR,
Ulf W

Roger Lipscombe

unread,
Oct 26, 2017, 12:20:30 PM10/26/17
to Ulf Wiger, erlang-questions
On 26 October 2017 at 16:30, Ulf Wiger <u...@wiger.net> wrote:
> Wouldn't it be reasonable to have such a function?

First thought: No, because people would start abusing it. Maps *aren't* ordered.
_______________________________________________
erlang-questions mailing list
erlang-q...@erlang.org
http://erlang.org/mailman/listinfo/erlang-questions

Sverker Eriksson

unread,
Oct 26, 2017, 1:10:57 PM10/26/17
to Ulf Wiger, erlang-questions

There is an efficient internal implementation to compare two terms in
map-key order (undocumented erts_internal:cmp_term/2).

But there is no efficient implementation to produce a map-key-value
ordered list. That would require N*log(N) sorting.

'==' for hashmaps (> 32 keys) does a one pass iteration in key hash
order over both trees (HAMTs) and keeps track of the minimal key or
value seen so far.

/Sverker

Ulf Wiger

unread,
Oct 26, 2017, 2:07:19 PM10/26/17
to Roger Lipscombe, erlang-questions
But they *are* ordered. Otherwise, comparison of two maps would be undefined. 

BR, 
Ulf W

Karl Nilsson

unread,
Oct 26, 2017, 2:29:02 PM10/26/17
to Roger Lipscombe, Ulf Wiger, erlang-questions
Doesn’t it just mean that for any two maps with the same keyvalues the internal order is the same for a given erlang version? That should be sufficient for comparison but unlikely be useful for anything else.

Jesper Louis Andersen

unread,
Oct 26, 2017, 2:32:03 PM10/26/17
to Ulf Wiger, erlang-questions
It is generally a bad idea to rely on order in anything which uses a hash function. The hash function is often subject to change---rather quickly I might add if it proves to be a security bug. Picking a family of hashes and seeding it randomly is usually a good trick.

Our "sister language" Go *randomizes* iteration order on its maps. This is to force programmers into not relying on the map order at all, even if it happens to be ordered right now. This opens up implementations in the future.

If you wanted order in a map, it would be *far* better if you could create a map based on RB-trees or the like. Those are naturally ordered by structure. OCaml, for instance, defines Hash Tables as well as Maps. The latter is the ordered variant.


Ulf Wiger

unread,
Oct 26, 2017, 3:13:20 PM10/26/17
to Jesper Louis Andersen, erlang-questions
But again, Jesper, just about everyone relies on the fact that maps follow the general principle that there is a well-defined term comparison order. Otherwise, maps would be highly unsuitable to use in keys, and generally treacherous to use as a replacement for records. Following the Principle of Least Surprise, it's a darned good thing that Erlang doesn't randomize the key order in its maps.

I doubt that anyone would abuse an extra function that produces the map element pairs in the internally defined sort order, given that the documentation would clearly state that it's more expensive than maps:to_list/1 (though likely faster than lists:sort(maps:to_list(M), fun custom_sort_fun/2) - not to mention less error-prone.)

But it's not a feature I'm willing to go to war over. If no one else sees a use for it, I'm willing to concede that it has low priority. :)

BR,
Ulf W

José Valim

unread,
Oct 26, 2017, 3:43:14 PM10/26/17
to Jesper Louis Andersen, erlang-questions
 
Our "sister language" Go *randomizes* iteration order on its maps. This is to force programmers into not relying on the map order at all, even if it happens to be ordered right now. This opens up implementations in the future.

Randomizing how elements are stored/hashed is also useful to avoid hash collision attacks.

Michael Truog

unread,
Oct 26, 2017, 3:43:52 PM10/26/17
to Ulf Wiger, erlang-questions
On 10/26/2017 12:13 PM, Ulf Wiger wrote:
> But again, Jesper, just about everyone relies on the fact that maps follow the general principle that there is a well-defined term comparison order. Otherwise, maps would be highly unsuitable to use in keys, and generally treacherous to use as a replacement for records. Following the Principle of Least Surprise, it's a darned good thing that Erlang doesn't randomize the key order in its maps.
>
> I doubt that anyone would abuse an extra function that produces the map element pairs in the internally defined sort order, given that the documentation would clearly state that it's more expensive than maps:to_list/1 (though likely faster than lists:sort(maps:to_list(M), fun custom_sort_fun/2) - not to mention less error-prone.)
>
> But it's not a feature I'm willing to go to war over. If no one else sees a use for it, I'm willing to concede that it has low priority. :)
>

At a low-level, a hash array mapped trie is unordered because it relies on a hash function, so using the Erlang map in an ordered way might be possible but would be inefficient. If you need an ordered mapping, I think my Erlang trie is a better option at https://github.com/okeuday/trie , though it requires Erlang string keys (binary keys are supported by the btrie module, but that is slow). That doesn't necessarily help msgpack source code though, unless it would be a separate output option that was added.

Best Regards,
Michael

Ulf Wiger

unread,
Oct 26, 2017, 4:33:36 PM10/26/17
to Michael Truog, erlang-questions
There are some things in Erlang that people have come to rely on, but for which there's a price. Having a globally defined sort order is such a thing. And maps must be implemented in such a way that the following is not only supported, but efficient:

comp(#{} = A, #{} = B) when A > B -> greater;
...

That is, maps must behave, efficiently, as an ordered mapping on request.

I've had reason to ponder the Erlang term comparison semantics while maintaining the sext library. The ordering semantics of maps threw a wrench into the works, with its non-standard ordering of keys (I should have left some room between the type tags for such unexpected developments.) The jury (i.e. QuickCheck) is still out on whether I'll be able to support it with the existing tag scheme.

Now, for sext, maps are at least better than refs, pids and ports, since there is no documentation on how they are sorted internally. As it turns out (for now), using term_to_binary() and some bit syntax seems to work in practice for those.

For the external term format for maps, actually, the documentation says:

«Key and value pairs (Ki => Vi) are encoded in section Pairs in the following order: K1, V1, K2, V2,..., Kn, Vn. »

... which does seem to imply that the ordering in the external term format is defined. I assume this is not the intended meaning, but rather that whichever key K1 is, the corresponding value V1 is the item that follows it, and so on.

Anyway, I've come across code that uses maps as a serialization format for a cryptographic hash. Based on what is known about maps, this would seem like a Bad Idea, since in that case, presumably, the result of the hash operation might change between implementations.

BR,
Ulf W

Benoit Chesneau

unread,
Oct 26, 2017, 4:49:37 PM10/26/17
to Jesper Louis Andersen, erlang-questions
I guess if at least `maps:to_list/1` would return the keys in order it would  be already a benefit. 

Another usage of such order is when you want to sign the object to compare  with others across the network. I have such usage when I’m using maps as a representation for JSON.

- benoît

Richard A. O'Keefe

unread,
Oct 26, 2017, 10:05:08 PM10/26/17
to erlang-q...@erlang.org
The current thread has left me somewhat confused.

Suppose E1 and E2 are different computations,
delivering maps M1 and M2 respectively, such
that M1 =:= M2 is true.

Are we guaranteed that term_to_binary(M1) and
term_to_binary(M2) are equal binaries?

zxq9

unread,
Oct 27, 2017, 12:02:09 AM10/27/17
to erlang-q...@erlang.org
On 2017年10月26日 木曜日 21:13:07 Ulf Wiger wrote:
> ... to use as a replacement for records.

But they are IN NO WAY a replacement for records.

-Craig

Ulf Wiger

unread,
Oct 27, 2017, 3:25:42 AM10/27/17
to zxq9, erlang-questions
Actually, they ARE, in some ways.

I should have been clearer. I was, of course, referring to the uses of records where they were never optimal in the first place, but were tolerated e.g. because of the great convenience of referencing elements by name in pattern-matching.

Quoted from EEP-0043:

«The idea was a data-type, a syntax-aware mapping of key-value associations with pattern matching. A syntax similar to records but without the hassle of compile-time dependency and with arbitrary terms as keys. Order was not important and it could be implemented with a Hash-Array-Mapped-Trie with good performance and memory trade-offs. This was a different approach than to replace records. It was meant to replace records where suitable and in other regards not be a replacement but its own _thing_.»

Further, relevant to this discussion:

«A restriction set on the implementation by the Erlang specification is that order is total, i.e. satisfies _antisymmetry, transitivity and totality_.
...
- Ordered maps impose restrictions on the underlying implementation and a hashing approach will be nearly impossible.
- The underlying structure does not need to be sorted, an order could be produced when needed,»

The thing I tried to highlight was that "an order could be produced when needed" is actually a bit more involved than you'd think, given that no API function does it for you. At least if you want to reflect the internal sort order - given that the actual implementation is NOT as described in the EEP: "If a key or value differs, the order of the respective terms, in Erlang term order".

You'd have to write your own sort function, in which you visit all elements of complex structures, finding all instances of floats and ensuring that they get the right priority.

Or... you simply do this:

lists:sort(fun(A, B) -> #{A => 0} =< #{B => 0} end, maps:to_list(Map))


As evidenced e.g. by Benoit's comment above, I believe lots of people already rely on the order of map elements being IN SOME WAY stable. This is likely a brittle assumption, but one that may - in time - trap the OTP team into having to preserve the current APPARENT order.

And the apparent order does appear to be sorted, as long as only maps of size =< 32 ("flatmaps") are inspected. For larger maps ("hashmaps"), the order of elements returned by maps:to_list/1 indeed appears arbitrary.

BR,
Ulf W

Attila Rajmund Nohl

unread,
Oct 27, 2017, 4:38:59 AM10/27/17
to erlang-q...@erlang.org
2017-10-27 9:25 GMT+02:00 Ulf Wiger <u...@wiger.net>:
[...]
> As evidenced e.g. by Benoit's comment above, I believe lots of people
> already rely on the order of map elements being IN SOME WAY stable. This is
> likely a brittle assumption, but one that may - in time - trap the OTP team
> into having to preserve the current APPARENT order.
>
> And the apparent order does appear to be sorted, as long as only maps of
> size =< 32 ("flatmaps") are inspected. For larger maps ("hashmaps"), the
> order of elements returned by maps:to_list/1 indeed appears arbitrary.

I already run into a bug where the code expected the map to be sorted.
When the 33rd element was added, it started to show pathological
behavior.

Richard Carlsson

unread,
Oct 27, 2017, 4:49:53 AM10/27/17
to Ulf Wiger, erlang-questions
My interpretation of the reference manual is that maps compare to each other like the tuples created by this transformation:

t(Map) ->
    {Ks,Vs} = lists:unzip(lists:keysort(1, [{{if is_integer(K) -> 0; is_float(K) -> 1; true -> 2 end, K}, V} || {K,V} <- maps:to_list(M)])),
    list_to_tuple(Ks ++ Vs).

For example:

    t(#{65 => "A", 2.71 => e, 1.0e308 => alot, pi => 3.14, [] => 0})

yields

    {{0,65}, {1,2.71}, {1,1.0e308}, {2,pi}, {2,[]},
        "A",    e,        alot,        3.14,   0}

Which:
1) should be independent of the underlying implementation and choice of hash function, and thus stable across future versions of Erlang;
2) is cumbersome to express using the normal term order over the basic data types, since float keys always sort higher than integers (not really following the rule of least surprise).





        /Richard

2017-10-27 9:25 GMT+02:00 Ulf Wiger <u...@wiger.net>:

zxq9

unread,
Oct 27, 2017, 6:57:09 AM10/27/17
to erlang-q...@erlang.org
On 2017年10月27日 金曜日 10:38:46 Attila Rajmund Nohl wrote:
> 2017-10-27 9:25 GMT+02:00 Ulf Wiger <u...@wiger.net>:
> [...]
> > As evidenced e.g. by Benoit's comment above, I believe lots of people
> > already rely on the order of map elements being IN SOME WAY stable. This is
> > likely a brittle assumption, but one that may - in time - trap the OTP team
> > into having to preserve the current APPARENT order.
> >
> > And the apparent order does appear to be sorted, as long as only maps of
> > size =< 32 ("flatmaps") are inspected. For larger maps ("hashmaps"), the
> > order of elements returned by maps:to_list/1 indeed appears arbitrary.
>
> I already run into a bug where the code expected the map to be sorted.
> When the 33rd element was added, it started to show pathological
> behavior.

...and this means there is a bug in the calling code, NOT in maps (which I'm pretty sure is the point Attila is pointing out).

I can't believe we are having this discussion. Again.

Having a discussion about internal ordering in the context of efficient matches and comparisons *in implementation*: totally logical

Relying on that implementation detail to leak out: ridiculous


If we really want ordered maps we should make ordered maps. And highlight in big bold letters all the baggage that brings with it. Alternately, we already have other data structures that work this way, perhaps they could be utilized. I think people just really really wish they could add wazoo syntax to various tree structures... ugh.

This discussion comes up a little more frequently than once a year and every time it reminds me of watching distributed systems engineers (try desperately to) explain CAP theorem tradeoffs to a VP of marketing.

-Craig

Ulf Wiger

unread,
Oct 27, 2017, 9:13:46 AM10/27/17
to Richard Carlsson, erlang-questions
Actually, that's roughly what I started out assuming too, but it's not sufficient. Consider:

1> lists:sort([#{[-1.0] => 0}, #{[1] => 0}]).
[#{[1] => 0},#{[-1.0] => 0}]

You must dig into each structure and find occurrences of floats to determine whether they affect the ordering.

But yes, the documented sort order should be stable across versions. This means, in practice, that it's safe to rely on the sorting properties of maps.

BR,
Ulf

Sverker Eriksson

unread,
Oct 27, 2017, 9:26:45 AM10/27/17
to erlang-questions

Yes. All Erlang terms, including maps, have a globally defined, implementation independent, total order.

The reason, for this "surprising" order of maps, is the alternative of using the normal arithmetic order for keys is much worse.

If we want 1 and 1.0 to be different keys (which we do as we use matching (=:=) to lookup keys)
then we need to order 1 and 1.0, and normal arithmetic term ordering does not (1 == 1.0).


It has been discussed (more and less serious) to expose this "map-key-order" with operators like :<, :>, =:<, >:=.


/Sverker

Ulf Wiger

unread,
Oct 27, 2017, 9:28:24 AM10/27/17
to zxq9, erlang-questions
2017-10-27 12:56 GMT+02:00 zxq9 <zx...@zxq9.com>:

I can't believe we are having this discussion. Again.

I will admit that I haven't been following the list that closely recently, but I wasn't aware that this discussion has been had before.

 
Having a discussion about internal ordering in the context of efficient matches and comparisons *in implementation*: totally logical

Relying on that implementation detail to leak out: ridiculous

Again, quoting from the EEP: 

«The underlying structure does not need to be sorted, an order could be produced when needed»

I think that among those who fully accepted the maps API returning elements in arbitrary order, most would have assumed that lists:sort(maps:to_list(M)) would do the trick (and, according to the EEP, it would), and were perfectly content with that.

What I brought up was the (admittedly subtle) point that even if you WANTED to settle for that, it actually doesn't produce the sort order that applies to maps internally, and there is no function, anywhere, that will give you that result - even one where the docs are riddled with warnings about its inefficiency.
 
This discussion comes up a little more frequently than once a year and every time it reminds me of watching distributed systems engineers (try desperately to) explain CAP theorem tradeoffs to a VP of marketing.

Am I the VP of marketing in this story?

BR,
Ulf 

Sverker Eriksson

unread,
Oct 27, 2017, 9:36:13 AM10/27/17
to Richard A. O'Keefe, erlang-q...@erlang.org


On 10/27/2017 04:04 AM, Richard A. O'Keefe wrote:
> The current thread has left me somewhat confused.
>
> Suppose E1 and E2 are different computations,
> delivering maps M1 and M2 respectively, such
> that M1 =:= M2 is true.
>
> Are we guaranteed that term_to_binary(M1) and
> term_to_binary(M2) are equal binaries?
>
>

No. term_to_binary does not give any such guarantees.

And in the case where M1 and M2 were created by different VM instances,
you can with current implementation get different binaries.


/Sverker

Ulf Wiger

unread,
Oct 27, 2017, 9:42:39 AM10/27/17
to Sverker Eriksson, erlang-questions
Sverker, I understand and agree with the reasons behind the implementation decision.

I can expand a little on discussions I've been involved in:

We have a scenario where we need to serialize objects partly for network comms, partly for storage and partly for signing and cryptographic hasing. We also prefer the serialization to work easily across programming languages.

The trouble is of course the signing/crypto-hashing. For this to be stable, the order must be fixed. Obviously, adding a function that makes Erlang maps appear ordered will not suffice, since other language environments would either not expect maps to be ordered at all, or are likely to have a different take on how keys are ordered*.

Better then to stick with the assumption that map key ordering is undefined.

Picking e.g. msgpack encoding as an example, a possible alternative would be to use arrays of 1-element maps:

[#{K1 => V1, #{K2 => V2}, ...]

This is easy enough to decode and handle in e.g. JavaScript, and should be stable enough to sign (using the msgpack-encoded representation as input to the signing/hash function). The encoding overhead compared to using a single map seems to be about one byte per key.

* For example, the Go implementation of msgpack has a function to make maps appear ordered, but with its own idea of what key types can be sorted.

BR,
Ulf W

zxq9

unread,
Oct 27, 2017, 9:48:04 AM10/27/17
to erlang-q...@erlang.org
On 2017年10月27日 金曜日 15:26:22 Sverker Eriksson wrote:
> Yes. All Erlang terms, including maps, have a globally defined,
> implementation independent, total order.
>
> The reason, for this "surprising" order of maps, is the alternative of
> using the normal arithmetic order for keys is much worse.

I am curious, though, why a compound data type was added to the language as a primitive data type. This was the only thing that really bothered me about maps.

Second-class data type, sure. Ordering unknown. Whatever. Same with all the other compound data types that we make up for ourselves.

Having it as a data primitive introduces inconsistency in the language itself, as we either have to have a tradeoff to amortize enforcement of an arbitrary ordering to make comparisons faster, or force an ordering at the time of comparison (and/or serialization, maybe) at the cost of some computation time in order for things to work. That adds one more odd point of weirdness to the language we *never* once worried about before but now need to remember in edge cases where performance matters. In other words, this creates a new edge case for performance, if I understand thing correctly.

Who before was ever seriously considering using dicts as keys to dicts?

"But lists are compound data types!" Sort of. The ordering of a list very often IS its meaning -- thus strings. Not so for maps by their very nature.

zxq9

unread,
Oct 27, 2017, 10:05:12 AM10/27/17
to erlang-q...@erlang.org
On 2017年10月27日 金曜日 15:42:30 Ulf Wiger wrote:
> We have a scenario where we need to serialize objects partly for network
> comms, partly for storage and partly for signing and cryptographic hasing.
> We also prefer the serialization to work easily across programming
> languages.
>
> The trouble is of course the signing/crypto-hashing. For this to be stable,
> the order must be fixed. Obviously, adding a function that makes Erlang
> maps appear ordered will not suffice, since other language environments
> would either not expect maps to be ordered at all, or are likely to have a
> different take on how keys are ordered*.

Interesting. I have been dealing with this exact case myself for quite
some time (before the advent of maps, actually). The tradeoff we had to
make was exactly as you describe: Universal ordered representation of
pretty much any K/V sort of data type (especially across languages) only
works as an ordered list, so the external, canonical representation of
the data must be defined as an ordered list (sorted some specific way).
Occasionally this also means internal lists must be themselves sorted.

That is one critical part of the definition of any serialization
procedure for any data that needs to be verifiable via signature.

Either that or build an ASN.1 DER representation for everything; which
isn't actually so bad, but would be way better if there were tutorials
on just the DDL part of ASN.1 for the cool kids to brush up on...

Having maps inside an Erlang program (and dicts in Python and blats in
Frozz and so on) is quite nice, but it can never be relied on for things
like canonical serialization. Imagine if every SQL query return had to
suddenly be ordered!

Those internal representations can only ever be immediate conveniences,
never canonical data representations. I think this is easy to forget
because most of the time we never even have to worry about what a
"canonical representation" even might be for 99% of the data most of us
ever deal with.

Richard Carlsson

unread,
Oct 27, 2017, 10:39:20 AM10/27/17
to Sverker Eriksson, erlang-questions
2017-10-27 15:26 GMT+02:00 Sverker Eriksson <sverker....@ericsson.com>:

Yes. All Erlang terms, including maps, have a globally defined, implementation independent, total order.

The reason, for this "surprising" order of maps, is the alternative of using the normal arithmetic order for keys is much worse.

If we want 1 and 1.0 to be different keys (which we do as we use matching (=:=) to lookup keys)
then we need to order 1 and 1.0, and normal arithmetic term ordering does not (1 == 1.0).

Yes... but I'm not sure that the term order when comparing one map to another needs to have anything to do with how lookup works within maps (much like it doesn't in an orddict, gb_trees, or similar). The global term order only needs to be fixed and preferably straightforward. I could, for example, implement maps using the sort of tuple I showed: {K1, ..., Kn, V1, ..., Vn}, and they would have one order given by '>', but using #{K1=>V1, ... Kn=>Vn}, they would be ordered differently if keys were floats. I think that key order as used internally by the maps could and should be kept separate from the global term order.
 
It has been discussed (more and less serious) to expose this "map-key-order" with operators like :<, :>, =:<, >:=.

And as I recall, that was suggested even before maps were a thing, since it would occasionally be useful also for things like lists of key/value tuples where the keys may be floats.

    /Richard

Sverker Eriksson

unread,
Oct 27, 2017, 11:45:27 AM10/27/17
to Richard Carlsson, erlang-questions



On 10/27/2017 04:39 PM, Richard Carlsson wrote:
2017-10-27 15:26 GMT+02:00 Sverker Eriksson <sverker....@ericsson.com>:

Yes. All Erlang terms, including maps, have a globally defined, implementation independent, total order.

The reason, for this "surprising" order of maps, is the alternative of using the normal arithmetic order for keys is much worse.

If we want 1 and 1.0 to be different keys (which we do as we use matching (=:=) to lookup keys)
then we need to order 1 and 1.0, and normal arithmetic term ordering does not (1 == 1.0).

Yes... but I'm not sure that the term order when comparing one map to another needs to have anything to do with how lookup works within maps (much like it doesn't in an orddict, gb_trees, or similar). The global term order only needs to be fixed and preferably straightforward. I could, for example, implement maps using the sort of tuple I showed: {K1, ..., Kn, V1, ..., Vn}, and they would have one order given by '>', but using #{K1=>V1, ... Kn=>Vn}, they would be ordered differently if keys were floats. I think that key order as used internally by the maps could and should be kept separate from the global term order.


orddict and gb_trees both use '==' to distinguish keys, which makes it possible to order them with '>'.

How would you order  #{1 => x, 1.0 => y} and #{1 => y, 1.0 => x}
if you can't order 1 and 1.0?



/Sverker

Richard Carlsson

unread,
Oct 27, 2017, 3:37:10 PM10/27/17
to Sverker Eriksson, erlang-questions
2017-10-27 17:45 GMT+02:00 Sverker Eriksson <sverker....@ericsson.com>

How would you order  #{1 => x, 1.0 => y} and #{1 => y, 1.0 => x}
if you can't order 1 and 1.0?

As long as we're talking about the arithmetic term order (<, >, ==), I don't see why they would need to be. Look at tuples:

  {1, 1.0} < {1.0, 1}.
  false
  {1, 1.0} > {1.0, 1}.
  false
  {1, 1.0} == {1.0, 1}. 
  true
  {1, 1.0} =:= {1.0, 1}.
  false

Maps ought to behave analogously, in the arithmetic ordering. The weirdness comes from enforcing strict ordering in the middle of the arithmetic one. The current ordering rule for maps should only be used in the strict ordering (the suggested :<), where it would also apply to tuples: {1, 1.0} :< {1.0, 1}.

 

Stanislaw Klekot

unread,
Oct 27, 2017, 4:26:06 PM10/27/17
to Richard Carlsson, erlang-questions
On Fri, Oct 27, 2017 at 09:36:58PM +0200, Richard Carlsson wrote:
> 2017-10-27 17:45 GMT+02:00 Sverker Eriksson <sverker....@ericsson.com>
> >
> >
> > How would you order #{1 => x, 1.0 => y} and #{1 => y, 1.0 => x}
> > if you can't order 1 and 1.0?
> >
>
> As long as we're talking about the arithmetic term order (<, >, ==), I
> don't see why they would need to be. Look at tuples:
>
> {1, 1.0} < {1.0, 1}.
> false
> {1, 1.0} > {1.0, 1}.
> false
> {1, 1.0} == {1.0, 1}.
> true
> {1, 1.0} =:= {1.0, 1}.
> false
>
> Maps ought to behave analogously, in the arithmetic ordering.

The problem is, the tuples you provide and comparison operators (<, >,
=<, >=, and == (not =:= one)) form a well-defined partial order (total
order, actually); mainly, if A =< B and B =< A, then A == B. Being
a partial order is a very important property of Erlang's type system,
one that was quite explicitly baked in the VM and is used in many
different places.

--
Stanislaw Klekot

Richard Carlsson

unread,
Oct 27, 2017, 5:10:34 PM10/27/17
to Stanislaw Klekot, erlang-questions
2017-10-27 22:25 GMT+02:00 Stanislaw Klekot <erlan...@jarowit.net>:
On Fri, Oct 27, 2017 at 09:36:58PM +0200, Richard Carlsson wrote:
> 2017-10-27 17:45 GMT+02:00 Sverker Eriksson <sverker....@ericsson.com>
> >
> > How would you order  #{1 => x, 1.0 => y} and #{1 => y, 1.0 => x}
> > if you can't order 1 and 1.0?
>
> As long as we're talking about the arithmetic term order (<, >, ==), I
> don't see why they would need to be. Look at tuples:
>
>   {1, 1.0} < {1.0, 1}.
>   false
>   {1, 1.0} > {1.0, 1}.
>   false
>   {1, 1.0} == {1.0, 1}.
>   true
>   {1, 1.0} =:= {1.0, 1}.
>   false
>
> Maps ought to behave analogously, in the arithmetic ordering.

The problem is, the tuples you provide and comparison operators (<, >,
=<, >=, and == (not =:= one)) form a well-defined partial order (total
order, actually); mainly, if A =< B and B =< A, then A == B. Being
a partial order is a very important property of Erlang's type system,
one that was quite explicitly baked in the VM and is used in many
different places.

No, the real problem, that I begin to see now, is that if arithmetic ordering is used, two keys could appear to be the same, and if their corresponding values differ, it is not clear in which order to compare them. I guess that's what Sverker was trying to convey, but I missed it at first.

So, using tuples {K1, ... Kn, V1, ... Vn} again for comparison, Sverker's example could become either:

  {1, 1.0, x, y} < {1, 1.0, y, x}
or
  {1.0, 1, y, x} == {1, 1.0, y, x}

(switching the order of the keys in the left tuple), both of which would be equally "legal" since 1 == 1.0. But this switches the order of x and y, so that the comparisons of the maps would give different results. One way to avoid this situation would be to say that you can't have two keys that compare equal with == in the same map, just as for an orddict.

Fred Hebert

unread,
Oct 27, 2017, 5:59:31 PM10/27/17
to Richard Carlsson, erlang-questions
On 10/27, Richard Carlsson wrote:
>(switching the order of the keys in the left tuple), both of which
>would be
>equally "legal" since 1 == 1.0. But this switches the order of x and y, so
>that the comparisons of the maps would give different results. One way to
>avoid this situation would be to say that you can't have two keys that
>compare equal with == in the same map, just as for an orddict.

This would, unfortunately, break pattern matching:

M = #{1 => 1},
case M of
#{1.0 := _} -> % it is safe to add 1.0 if we match
M#{1.0 := 2}; % does this crush a value and change the key?
_ ->
other
end.

The fourth line here is problematic. Either (1) maps have a special
magical pattern matching case where 1.0 and 1 compare equal (which
happens nowhere else) and decide whether to replace the key or keep it
the same, or (2) you keep current pattern matching semantics and you
can't use existing pattern matching to enforce the constraints above.

(1) is particularly nasty for cases such as:

X = 1.0,
case {#{1 => 1}, 1} of
{#{X := _}, X} -> true = X =:= X; % works
{#{X := _}, Y} -> false = X =/= Y % crashes
end

Which one should match? In the first clause, the map would match fine,
even though X =/= X! We just broke a lot of language here. To preserve
safe pattern matching, you should probably not be able to make 1 be
equal to 1.0 in a map through pattern insertions (M#{K := NewVal}).

Jesper Louis Andersen

unread,
Oct 28, 2017, 11:42:22 AM10/28/17
to Ulf Wiger, erlang-questions
On Thu, Oct 26, 2017 at 9:13 PM Ulf Wiger <u...@wiger.net> wrote:
But again, Jesper, just about everyone relies on the fact that maps follow the general principle that there is a well-defined term comparison order. Otherwise, maps would be highly unsuitable to use in keys, and generally treacherous to use as a replacement for records. Following the Principle of Least Surprise, it's a darned good thing that Erlang doesn't randomize the key order in its maps.


I largely regard a total term order as a language mistake. The right solution, obviously, is to have several equalities, where the programmer can define what they mean by equality in a certain part of the program. Equality is way too important in programming for you to be left with a single one!

That said, there is a well-defined order of maps currently, but it is not the logical one you might expect (which is by ordering the keys of the map). Rather the order is defined on

* What the key hashes to
* What the internal HAMT structure looks like right now

It is a total order even! So you can use maps as keys in a balanced search tree for instance.

However, your point does seem to touch on a couple of important things that should be considered for future inclusion:

* We may want to have an "ordered map" in the language. These are self-balancing binary trees. They are more costly in lookup time and they take up more memory space, but they have the "natural ordering" of keys which means they are well-defined in their traversal.

* Your "sext" library exists to plug yet another hole in the language, namely that binary_to_term have certain freedoms with certain data structures and this leads to situations where you cannot rely on the binary output for, e.g., cryptographic applications.

In short, one has to weigh different implementation details when building data structures. If you want to have it all, your efficiency eventually has to give.

Richard Carlsson

unread,
Oct 28, 2017, 4:46:22 PM10/28/17
to Fred Hebert, erlang-questions
2017-10-27 23:58 GMT+02:00 Fred Hebert <mono...@ferd.ca>:
On 10/27, Richard Carlsson wrote:
One way to
avoid this situation would be to say that you can't have two keys that
compare equal with == in the same map, just as for an orddict.

This would, unfortunately, break pattern matching:

M = #{1 => 1},
case M of
   #{1.0 := _} -> % it is safe to add 1.0 if we match
       M#{1.0 := 2}; % does this crush a value and change the key?
   _ ->
       other
end.

The fourth line here is problematic. Either (1) maps have a special magical pattern matching case where 1.0 and 1 compare equal (which happens nowhere else) and decide whether to replace the key or keep it the same, or (2) you keep current pattern matching semantics and you can't use existing pattern matching to enforce the constraints above.

Yes, you can't change it now that it's in existing code; it would have had to be done when maps were new. But, having slept on the issue, I now think that the current state of things is fine. The keys must be seen as existing on a different level than the values they map to, more part of the structure of the thing, much like the arity of a tuple. As in your example, they are used in matching, where exact equality is the expected behaviour. It is still bothersome for people like Ulf who want to easily generate a list of the key-value pairs in the order that '<' would consider them - but that is a fairly unusual use case and can be fixed by making separate ordering functions accessible (either as a new :< operator or as a regular BIF).

One thing can be noted: there is no strict need as far as I can see for the key order to be related to any other particular order like < or :< (apart from convenience), as long as there is always a canonical key order independent of underlying implementation. One could hypothetically use lexicographical order on the stringified keys, and it would work just as well, since the ordering is really only used for deciding which maps are at all comparable. It does decide the ordering of maps of the same size and with different keys (in which case the values are never examined), but if that by necessity must be different from the arithmetic order anyway, it doesn't matter much which order it is. The main thing is that it's cheap to compute, and the :< order is as cheap as any.

But until :< is actually available as an operator or function, poor Ulf will need to emulate it using 0/1/2-tagging like I showed (recursively, as he pointed out). Or write a NIF for performance.

Richard Carlsson

unread,
Oct 28, 2017, 5:09:24 PM10/28/17
to Jesper Louis Andersen, erlang-questions
2017-10-28 17:41 GMT+02:00 Jesper Louis Andersen <jesper.lou...@gmail.com>:
On Thu, Oct 26, 2017 at 9:13 PM Ulf Wiger <u...@wiger.net> wrote:
But again, Jesper, just about everyone relies on the fact that maps follow the general principle that there is a well-defined term comparison order. Otherwise, maps would be highly unsuitable to use in keys, and generally treacherous to use as a replacement for records. Following the Principle of Least Surprise, it's a darned good thing that Erlang doesn't randomize the key order in its maps.

I largely regard a total term order as a language mistake. The right solution, obviously, is to have several equalities, where the programmer can define what they mean by equality in a certain part of the program. Equality is way too important in programming for you to be left with a single one!

That may well be right, but doesn't change the fact that Erlang is pretty much permeated with the idea of the total term order, and in many cases it does make things very straightforward. So it would be a very bad thing if some language change would break this property.
 
That said, there is a well-defined order of maps currently, but it is not the logical one you might expect (which is by ordering the keys of the map). Rather the order is defined on

* What the key hashes to
* What the internal HAMT structure looks like right now

As we quoted from the reference manual, the < ordering on maps is actually implementation independent and future proof (while still being a total order). The sacrifice is that to compare two maps with <, their keys must be mapped into canonical order (with integers before floats as discussed). This is clearly more costly than just taking whatever order the current underlying hash+HAMT produces, but worth it since it preserves those nice properties.

On the other hand, the result from maps:to_list(M) is returned in the current internal order, which is efficient but subject to change between versions (albeit also well defined and total as you said), and there is also no current comparison function available that you could give to lists:sort(CompFunc, List) which would produce the key order used by < for comparing maps - which is what Ulf needs for sext.

* Your "sext" library exists to plug yet another hole in the language, namely that binary_to_term have certain freedoms with certain data structures and this leads to situations where you cannot rely on the binary output for, e.g., cryptographic applications.

Not only that it wants to provide a stable mapping from terms to their serialized forms, but also ensure that these binary representations sort in the same order as the corresponding terms - a trickier problem.


Richard A. O'Keefe

unread,
Oct 29, 2017, 8:09:37 PM10/29/17
to erlang-q...@erlang.org


On 28/10/17 2:26 AM, Sverker Eriksson wrote:
> Yes. All Erlang terms, including maps, have a globally defined,
> implementation independent, total order.
>
> The reason, for this "surprising" order of maps, is the alternative of
> using the normal arithmetic order for keys is much worse.
>
> If we want 1 and 1.0 to be different keys (which we do as we use
> matching (=:=) to lookup keys)
> then we need to order 1 and 1.0, and normal arithmetic term ordering
> does not (1 == 1.0).

The fundamental mistake here is confusing *arithmetic* ordering
with *term* ordering. In *term* ordering, if X and Y are
behaviourally distinguishable then either X < Y or X > Y should
be true. Since integers and floats are behaviourally
distinguishable,
1> X = 1 bsl 53.
9007199254740992
2> Y = float(X).
9007199254740992.0
3> X == Y.
true
4> X+1 == Y+1.
false
it follows that either X < Y or Y > X should be true in
*term* (but not *arithmetic*) order.

Arithmetic order in Erlang has some seriously weird issues, like
you can find numbers X Y such that X - Y is 0.0 but X == Y is false.

>
>
> It has been discussed (more and less serious) to expose this
> "map-key-order" with operators like :<, :>, =:<, >:=.

Surely we deserve *some* sane ordering in Erlang?

Prolog had two sets of comparison operators: term ordering and
arithmetic ordering. There were *reasons* for that.

Richard A. O'Keefe

unread,
Oct 29, 2017, 8:16:57 PM10/29/17
to Sverker Eriksson, erlang-q...@erlang.org


On 28/10/17 2:34 AM, Sverker Eriksson wrote:
[term_to_binary/1 is not a pure function; it
depends on the representation of its argument,
not just its value].

OUCH.

For term_to_binary/2, of course the result
depends on the Options argument, but I take
it now that even being explicit about the
Options is not enough. Can we have a
'canonical' option?

>
> And in the case where M1 and M2 were created by different VM instances,
> you can with current implementation get different binaries.

I can live with different *versions* of the VM using different
versions of the binary term format, but two instances of the *same*
VM turning mathematically identical terms into different binaries
is, well, did Nyarlathotep, the Crawling Chaos, have a hand in the
design? In all seriousness, *OUCH*. Please mention this in LARGE
red letters in the documentation for term_to_binary; I don't see it
at the moment, but it puts limits on what you can reasonably do
with terms-as-binaries.

Sverker Eriksson

unread,
Oct 30, 2017, 5:07:18 PM10/30/17
to Richard A. O'Keefe, erlang-q...@erlang.org
A clear statement of lack of guarantee makes sense.
A 'canonical' option could make sense.
That would require N*log(N) sorting to get map keys in canonical order
(deja vu).

About VM instance and version; I don't see the big divide there.
Would it not be treacherous to base an application design
on term_to_binary being a pure function among VM instances
as long as you don't upgrade them.


/Sverker

Jesper Louis Andersen

unread,
Oct 31, 2017, 6:52:08 AM10/31/17
to Richard Carlsson, erlang-questions
On Sat, Oct 28, 2017 at 11:09 PM Richard Carlsson <carlsson...@gmail.com> wrote:
As we quoted from the reference manual, the < ordering on maps is actually implementation independent and future proof (while still being a total order). The sacrifice is that to compare two maps with <, their keys must be mapped into canonical order (with integers before floats as discussed). This is clearly more costly than just taking whatever order the current underlying hash+HAMT produces, but worth it since it preserves those nice properties.


Oh, then I was wrong!

Thanks for the correction.

Benoit Chesneau

unread,
Nov 5, 2017, 11:44:50 AM11/5/17
to Richard A. O'Keefe, erlang-q...@erlang.org


> On 30 Oct 2017, at 01:16, Richard A. O'Keefe <o...@cs.otago.ac.nz> wrote:
>
> For term_to_binary/2, of course the result
> depends on the Options argument, but I take
> it now that even being explicit about the
> Options is not enough. Can we have a
> 'canonical' option?
>

+1 . That would be an awesome option :)

- benoit

Benoit Chesneau

unread,
Nov 23, 2017, 3:17:46 AM11/23/17
to Erlang Questions
to continue the discussion i just commited a simple library that does this:


Hopefully such thing will be optimised soon :)

- benoit
Reply all
Reply to author
Forward
0 new messages