Proposal: Enum.merge to extend uniq-like functionality


Bartosz Kalinowski

unread,
Jun 22, 2016, 3:15:27 PM6/22/16
to elixir-lang-core

Hey guys, this is my first proposal so please be kind :) I don't know if _we_ need this functionality, but I certainly needed it, so I already developed it. Obviously I'm not that familiar with Elixir yet, so it might happen that there is something very similar somewhere in the code, but I looked and couldn't find it.


My idea is to "clean up" lists which have duplicate values, but in a way that merges the duplicates (not just discards them like uniq/1 does) by applying a given function (one of the params). Please let me know what you think.


Examples:


      iex> Enum.merge([1, 2, 3, 3, 2, 1], fn (x, y) -> x + y end)
      [2, 4, 6]

      # This one is similar to `Enum.uniq/1`
      iex> Enum.merge([1, 2, 3, 3, 2, 1], fn (x, _y) -> x end)
      [1, 2, 3]

      iex> Enum.merge_by([{1, :x}, {2, :y}, {1, :z}], fn (x, _) -> x end, fn {x, _} -> x end)
      [{1, :x}, {2, :y}]

      iex> Enum.merge_by([a: {:tea, 2}, b: {:tea, 2}, c: {:coffee, 1}], fn (x, _) -> x end, fn {_, y} -> y end)
      [a: {:tea, 2}, c: {:coffee, 1}]

      iex> Enum.merge_by([%{k: "a", v: 1}, %{k: "b", v: 2}, %{k: "a", v: 3}, %{k: "b", v: 4}], fn (t1, t2) -> %{t1 | v: t1.v + t2.v} end, fn s -> s.k end)
      [%{k: "a", v: 4}, %{k: "b", v: 6}]


Because I'm new here, I already created a PR on GitHub, which obviously got closed immediately... :) But there is an upside to this: since I already implemented the functionality, you can see the examples in a more "human-readable" form on GitHub (https://github.com/elixir-lang/elixir/pull/4854/files) to better understand what I wanted to achieve :)


Please let me know if you think this would be useful, and whether similar functionality already exists!

Onorio Catenacci

unread,
Jun 22, 2016, 4:04:58 PM6/22/16
to elixir-lang-core
I'm not sure what use case you were trying to address, but this seems very similar to a run-length encoding use case. If so, it isn't that hard to roll with existing code. Here:
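A rough, untested sketch of what I mean, using only what's already in Enum (`Enum.chunk_by/2` plus a map):

      defmodule RunLength do
        # Chunk consecutive duplicates, then turn each chunk into {value, count}.
        def encode(list) do
          list
          |> Enum.chunk_by(& &1)
          |> Enum.map(fn chunk -> {hd(chunk), length(chunk)} end)
        end
      end

      iex> RunLength.encode([1, 1, 2, 3, 3, 3])
      [{1, 2}, {2, 1}, {3, 3}]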


So, yeah, we don't _need_ this functionality.  That said, bear in mind that whenever new things are added to the core libraries, they have to be supported from that point forward.  So we tend to be conservative about what gets added to the core libraries (and justifiably so).

--
Onorio

Bartosz Kalinowski

unread,
Jun 22, 2016, 4:37:01 PM6/22/16
to elixir-lang-core
I can perfectly understand being conservative about new features, and I don't mind if this doesn't get accepted because of that. Your example is a bit different, though. To be honest, what I tried to achieve felt like a natural extension of Enum.uniq, and when I discovered it's not possible to do what I needed I was a bit disappointed. Your RunLengthEncoder example is not really the same case: the solution to that quiz depends on the fact that the letters come one after the other, while the Enum.merge I suggest takes every element of the list into account.

Bartosz Kalinowski

unread,
Jun 22, 2016, 4:46:19 PM6/22/16
to elixir-lang-core
Ah, and in case you're curious about the exact use case I needed this for, it was exactly the kind of thing this function is meant for.
To be exact: I have a list of bitcoin transactions. Sometimes someone sends a transaction with multiple outputs to the same address. At first I assumed I could split bitcoin transactions into parts keyed by the unique (address, TX) pair, but when I learned that this scenario is possible, I figured the easiest thing would be to merge those outputs (i.e. add up the values of every output to the same address and treat them as one transaction in my system). That's why I needed to merge all the elements of a list that match according to some function, combining them in a certain way.
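For illustration, with the proposed merge_by and a deliberately simplified output shape (just an address and a value; the real records carry more fields), it would look something like:

      iex> outputs = [%{address: "addr_a", value: 10}, %{address: "addr_b", value: 5}, %{address: "addr_a", value: 3}]
      iex> Enum.merge_by(outputs, fn (t1, t2) -> %{t1 | value: t1.value + t2.value} end, fn t -> t.address end)
      [%{address: "addr_a", value: 13}, %{address: "addr_b", value: 5}]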

Peter Hamilton

unread,
Jun 22, 2016, 4:50:45 PM6/22/16
to elixir-lang-core
From a functionality standpoint, group_by will produce a map from each key to its list of duplicates, and then you can map over it and process each list.
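For instance, something along these lines (untested, and since it goes through a map the ordering of the result isn't guaranteed) covers the first example from the proposal:

      [1, 2, 3, 3, 2, 1]
      |> Enum.group_by(& &1)
      |> Enum.map(fn {_key, dups} -> Enum.reduce(dups, fn x, y -> x + y end) end)
      #=> [2, 4, 6]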

The question then is whether there is a performance optimization available that cannot be accomplished via composition.


Bartosz Kalinowski

unread,
Jun 22, 2016, 4:56:27 PM6/22/16
to elixir-lang-core
Oh thank you, I felt like this should be possible somehow using just the core. As you mentioned yourself, there is still the performance question, though.

Peter Hamilton

unread,
Jun 22, 2016, 6:01:08 PM6/22/16
to elixir-lang-core
I feel like the generic form here is Grouped Map Reduce. Something like:

Enum.group_map_reduce(list, group_key_fun, map_fun, initial_acc, reducer)

It could be used to do a single-pass group_by/count, for example:

Enum.group_map_reduce([1,1,2,3,5,5,8], &(&1), &(&1), 0, fn _, acc -> acc + 1 end)

That would produce:

%{1 => 2, 2 => 1, 3 => 1, 5 => 2, 8 => 1}
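A naive single-pass sketch of the semantics I have in mind (the module and names are just placeholders, and this isn't optimized):

      defmodule MyEnum do
        def group_map_reduce(enumerable, key_fun, map_fun, initial_acc, reducer) do
          Enum.reduce(enumerable, %{}, fn element, groups ->
            key = key_fun.(element)
            mapped = map_fun.(element)
            # Unseen keys start from initial_acc; existing keys fold the mapped value in.
            Map.update(groups, key, reducer.(mapped, initial_acc), &reducer.(mapped, &1))
          end)
        end
      end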





Onorio Catenacci

unread,
Jun 23, 2016, 7:57:59 AM6/23/16
to elixir-lang-core
Well, I was sort of guessing about your use case, @Bartosz. But I can see that Peter has given you some code to work with anyway.

Just to be clear about what I'm saying: people can propose anything they want to propose--good ideas come from lots of ideas. But I think a lot of us feel that any changes made to the core libraries need to be of a fundamental and pretty universally applicable nature. The fact that we can write code to answer some need doesn't mean it shouldn't be in the core libraries--but if we add something to the core libs, it should be something with broad application and something of a pretty foundational quality. Of course, my opinion is only important to me anyway, but I don't want anyone to misunderstand my thinking.

--
Onorio



Martin Svalin

unread,
Jun 23, 2016, 8:26:13 AM6/23/16
to elixir-l...@googlegroups.com
On Thu, 23 Jun 2016 at 00:01, Peter Hamilton <petergh...@gmail.com> wrote:
Enum.group_map_reduce(list, group_key_fun, map_fun, initial_acc, reducer)

Enum.group_map_reduce([1,1,2,3,5,5,8], &(&1), &(&1), 0, fn _, acc -> acc + 1 end)

That would produce:

%{1 => 2, 2 => 1, 3 => 1, 5 => 2, 8 => 1}

The whole point here is to do some transformation on all unique values, right? So we can group unique values and transform them?

I feel like this is a straightforward `list |> Enum.group_by(& &1) |> Enum.map(&transformation/1)`. Possibly piped to Map.new.
For the case quoted above: `[1, 1, 2, 3, 5, 5, 8] |> Enum.group_by(& &1) |> Enum.map(fn {k, v} -> {k, Enum.count(v)} end) |> Map.new`.

I feel like this is well composed out of what we already have in Enum. There are a bunch of these "pre-composed" functions in Enum, like `reverse_slice`, `map_join`, `filter_map`, etc. It's unclear to me when and how the need for these pre-composed functions arises. Are they all performance optimisations, to have common compositions done in one pass? Is this particular case common enough to be included as `group_map_reduce`?

Anyway, I would advise against the names `merge` or `merge_by`, as they carry the connotation of combining two enumerables, the way `Map.merge` does.

- Martin

Peter Hamilton

unread,
Jun 23, 2016, 11:05:01 AM6/23/16
to elixir-l...@googlegroups.com

The problem with what we currently have is that it requires two passes. There's no one-pass solution available via composition.

I spent quite a bit of time trying to get Stream.group_by working (back in my streamz days) so we could compose things like this in a single pass. It was rough, because it results in a stream of streams, which makes it very complicated. It requires one process per key, which leads to a lot of complexity around supervision/linking/monitoring and is sort of in its own league compared to the rest of Stream.

Working with RethinkDB, I've seen an alternative approach. Rather than having group_by produce a map of keys to lists, it produces a somewhat monadic result. Applying map to a grouped stream doesn't iterate over keys; instead it applies the function to each group of values. It continues to do so until you call ungroup, at which point the result is an ordinary map of keys to lists.
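Roughly the shape I have in mind, with an entirely hypothetical `GroupedStream` module just to show the idea (nothing like this exists in Stream today):

      [1, 1, 2, 3, 5, 5, 8]
      |> GroupedStream.group_by(& &1)      # yields a "grouped" stream
      |> GroupedStream.map(&Enum.count/1)  # applies to each group's values, not each element
      |> GroupedStream.ungroup()           # collapses back to an ordinary map of keys to results
      #=> %{1 => 2, 2 => 1, 3 => 1, 5 => 2, 8 => 1}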

I'll try to find some time to take a stab at that approach, just to see whether it is at all possible.


--