[Proposal] Strict form of `Map.new/2`

59 views
Skip to first unread message

Myron Marston

unread,
Oct 3, 2017, 11:23:07 PM10/3/17
to elixir-lang-core

Map.new/2 is quite useful and I use it frequently. Whenever I use it, I expect that my transform function returns unique keys, so that Enum.count(input) == map_size(output) holds true. Unfortunately, this hasn’t always held true for the data I’m working with, and it results in an output map that’s essentially “lost” some of the input data. Here’s an example:

iex(2)> input = [%{name: "Bob", age: 37}, %{name: "Jill", age: 24}, %{name: "Bob", age: 0}]
[%{age: 37, name: "Bob"}, %{age: 24, name: "Jill"}, %{age: 0, name: "Bob"}]
iex(3)> Map.new(input, &{&1.name, &1.age})
%{"Bob" => 0, "Jill" => 24}

Here the %{name: "Bob", age: 37} data has been lost because it was overwritten by %{name: "Bob", age: 0} since maps obviously can’t have duplicates. In this situation, it would be really nice if Map.new/2 did not silently ignore the fact that I gave it duplicate keys. I’d like a strict from of Map.new/2 that raises an error when your transform function returns duplicate keys, to notify you that you aren’t using it properly.

If we add this, I’m not sure if it can just be a change to Map.new/2 (would that be a breaking change for some users?) or if we’d need to add an alternate function like Map.new!/2 or Map.strict_new/2 or something.

Thoughts?

Myron


José Valim

unread,
Oct 4, 2017, 4:57:34 AM10/4/17
to elixir-l...@googlegroups.com
We definitely shouldn't change Map.new as that would be backwards incompatible. I am quite happy with the Map API as is, it covers the majority of cases, so that should probably be written on your own application, especially because I am not aware of any optimization we could do that would be specific for Elixir.
--


José Valim
Founder and 
Director of R&D

Pedro Medeiros

unread,
Oct 4, 2017, 4:58:17 AM10/4/17
to elixir-l...@googlegroups.com
Overall I like the idea of having a separate function that throws an error in case there are duplicated keys. Just don't think it's the place to be done on Map.new/2 It works that way on Erlang maps:from_list and also with for

for n <- input, into: %{}, do: {n.name, n.age}


--
You received this message because you are subscribed to the Google Groups "elixir-lang-core" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elixir-lang-core+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elixir-lang-core/5e15315d-b6c3-4fa7-b31f-a2d7c5576941%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Pedro Henrique de Souza Medeiros
----------------------------------
Cel: +55 (21) 9914-86898
Email: pedr...@gmail.com

Beautiful is better than ugly,
Explicit is better than implicit,
Simple is better than complex,
Complex is better than complicated.

The Zen of Python, by Tim Peters

OvermindDL1

unread,
Oct 4, 2017, 10:13:44 AM10/4/17
to elixir-lang-core
Why not just fold over your list and `Map.merge/3` over them with a function that handles the duplicates (and throws or whatever you wish)?


On Wednesday, October 4, 2017 at 2:58:17 AM UTC-6, Pedro Medeiros wrote:
Overall I like the idea of having a separate function that throws an error in case there are duplicated keys. Just don't think it's the place to be done on Map.new/2 It works that way on Erlang maps:from_list and also with for

for n <- input, into: %{}, do: {n.name, n.age}

2017-10-03 23:23 GMT-04:00 Myron Marston <myron....@gmail.com>:

Map.new/2 is quite useful and I use it frequently. Whenever I use it, I expect that my transform function returns unique keys, so that Enum.count(input) == map_size(output) holds true. Unfortunately, this hasn’t always held true for the data I’m working with, and it results in an output map that’s essentially “lost” some of the input data. Here’s an example:

iex(2)> input = [%{name: "Bob", age: 37}, %{name: "Jill", age: 24}, %{name: "Bob", age: 0}]
[%{age: 37, name: "Bob"}, %{age: 24, name: "Jill"}, %{age: 0, name: "Bob"}]
iex(3)> Map.new(input, &{&1.name, &1.age})
%{"Bob" => 0, "Jill" => 24}

Here the %{name: "Bob", age: 37} data has been lost because it was overwritten by %{name: "Bob", age: 0} since maps obviously can’t have duplicates. In this situation, it would be really nice if Map.new/2 did not silently ignore the fact that I gave it duplicate keys. I’d like a strict from of Map.new/2 that raises an error when your transform function returns duplicate keys, to notify you that you aren’t using it properly.

If we add this, I’m not sure if it can just be a change to Map.new/2 (would that be a breaking change for some users?) or if we’d need to add an alternate function like Map.new!/2 or Map.strict_new/2 or something.

Thoughts?

Myron


--
You received this message because you are subscribed to the Google Groups "elixir-lang-core" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elixir-lang-co...@googlegroups.com.

Myron Marston

unread,
Oct 4, 2017, 11:25:24 AM10/4/17
to elixir-lang-core

We definitely shouldn’t change Map.new as that would be backwards incompatible.

I had this concern as well, although, I’m curious: can anyone come up with a valid use case for using Map.new/2 with duplicate keys and wanting it to discard the earlier duplicate? I can’t think of a use case for that, but it’s probably just lack of imagination on my part.

Why not just fold over your list and Map.merge/3 over them with a function that handles the duplicates (and throws or whatever you wish)?

If I realized up front that my input data violated my expectations and would result in duplicate keys, then sure, I’d do something like this (or I’d pre-filter out the excess entires). The problem that’s bitten me on multiple occasions is that according to my understanding of my input data, I expect Map.new/2 to work just fine, and for there not to be any duplicate keys. As an example, consider the common use case of working with legacy SQL DBs, where you’ve got a table with a column that is conceptually unique, but that lacks a unique index on to enforce it. I’ve wound up with bugs in my application as a result. Obviously, this isn’t Elixir’s fault, but I do think a warning or an error in this situation would help surface the problem much sooner.

José, would you be open to the addition of a Map.new!/2 function that raises an error when there are duplicate keys? Although, maybe that implies that Map.new/2 returns {:ok, map} (which is obviously doesn’t). Maybe there’s a better name we can come up with for this function if there’s agreement to add it.

I am quite happy with the Map API as is, it covers the majority of cases, so that should probably be written on your own application

This issue has bit me in multiple code bases, and it’s really not ideal to have to re-implement it each time. Given that under normal use of Map.new, you’d expect your input to have unique keys, doesn’t it seem worthwhile to have a version of Map.new/2 when this is violated, rather than silently discarding data?

I am not aware of any optimization we could do that would be specific for Elixir.

What kind of optimization are you referring to? I’d assume that adding logic for this would make the function a bit slower than the existing Map.new/2 but I was hoping we could at least detect this situation (but not necessarily detect what the duplicate keys are) in constant time by using map_size/1 on the resulting map, and comparing it to the input size.

Thanks,
Myron




--
You received this message because you are subscribed to a topic in the Google Groups "elixir-lang-core" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/elixir-lang-core/EessBQaaBsE/unsubscribe.
To unsubscribe from this group and all its topics, send an email to elixir-lang-core+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elixir-lang-core/dd49f8a6-43f3-4ce9-bc75-e89dba65b887%40googlegroups.com.

José Valim

unread,
Oct 4, 2017, 11:45:41 AM10/4/17
to elixir-l...@googlegroups.com

José, would you be open to the addition of a Map.new!/2 function that raises an error when there are duplicate keys? Although, maybe that implies that Map.new/2 returns {:ok, map} (which is obviously doesn’t).

I am not convinced this is an extremely common use case to justify its addition to the standard library. Especially one that can be implemented in few lines of code.

 I was hoping we could at least detect this situation (but not necessarily detect what the duplicate keys are) in constant time by using map_size/1 on the resulting map, and comparing it to the input size.

The input size is an enumerable and it is not guaranteed we can retrieve its size in constant time. The most common input is a list and it would require at least traversing the list one more time.


Myron Marston

unread,
Oct 4, 2017, 12:48:11 PM10/4/17
to elixir-lang-core

The input size is an enumerable and it is not guaranteed we can retrieve its size in constant time. The most common input is a list and it would require at least traversing the list one more time.

Right, for Map.new/1, you’d have to traverse it one more time. But for Map.new/2, isn’t the enumerable reduced into a list before generating the map? I was thinking that the reduction could count the list at the same time, requiring no extra traversals.

Regardless, thanks for explaining your reasoning!

Myron


--
You received this message because you are subscribed to a topic in the Google Groups "elixir-lang-core" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/elixir-lang-core/EessBQaaBsE/unsubscribe.
To unsubscribe from this group and all its topics, send an email to elixir-lang-core+unsubscribe@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages