[Proposal] Enum.preserve_reduce/3

58 views
Skip to first unread message

Douglas Vought

unread,
Aug 6, 2024, 4:28:03 AM8/6/24
to elixir-lang-core
An accompanying livebook is attached if you'd prefer to read this proposal interactively.
Introduction

Enum.split_while/2 (“split_while/2“) is one of the few (only?) functions in the Enum suite that does not throw away the rest of the Enumerable.t() when accumulation halts.

split_while/2 is useful for processing part of a list, but passing the rest somewhere else. For example, if I have a list of integers and I only want to collect integers within a certain range from the beginning, I can use split_while/2 like this:

ints = [0, 4, 3, 18, 20, -7, -4, 6]

{first, rest} = Enum.split_while(ints, &(&1 >= 0 && &1 <= 10))

=> {[0, 4, 3], [18, 20, -7, -4, 6]}

And process them individually:

Enum.sum(first)

=> 7

Enum.product(rest)

=> 60480


Problem Description

However, there is no way to use split_while/2 with state. It will only run the predicate function it receives on the current element. Take the following pseudocode example:

split_while(ints, 0, fn
  _int, acc when acc > 10 -> {:halt, acc}
  int, acc -> {:cont, acc + int}
end)

Here, it is advantageous to use an accumulation of previous results to know if we should halt. The remaining list should be returned for further processing.

Without state in split_while/2, we have to do something rather inelegant:

result =
  Enum.reduce_while(ints, %{ints: [], sum: 0, encountered: 0}, fn
    _int, %{sum: sum} = acc when sum > 10 ->
      {:halt, %{acc | ints: Enum.reverse(acc.ints)}}

    int, acc ->
      {:cont,
       %{
         acc
         | ints: [int | acc.ints],
           sum: acc.sum + int,
           encountered: acc.encountered + 1
       }}
  end)

=> %{encountered: 4, ints: [0, 4, 3, 18], sum: 25}

{ints, rest} = Enum.split(ints, result.encountered)

=> {[0, 4, 3, 18], [20, -7, -4, 6]}

{:ok, %{ints: ints, sum: result.sum}, rest}

=> {:ok, %{ints: [0, 4, 3, 18], sum: 25}, [20, -7, -4, 6]}


Or write boilerplate code every time this sort of operation is needed. A “stateful split while” should be a part of the standard library.

I propose a version of split_while/2 that takes the Enumerable.t(), collecting an accumulator until the discriminator function returns {:halt, any}. (Or something like it, see below.) Here’s the first draft:

EnumExt

Here is an example of how that might be achieved by modifying Enum.take_while/2:

defmodule EnumExt do
  @spec split_while_reduce(
          Enumerable.t(),
          any,
          (Enum.element(), any ->
             {:cont, any} | {:halt, any})
        ) ::
          {{[Enum.element()], any}, [Enum.element()]}
  def split_while_reduce(enumerable, initial_state, fun) when is_list(enumerable) do
    split_while_reduce_list(enumerable, fun, {[], initial_state})
  end

  def split_while_reduce(enumerable, initial_state, fun) do
    {{list1, state}, list2} =
      Enum.reduce(enumerable, {{[], initial_state}, []}, fn
        entry, {{acc1, state}, []} ->
          case fun.(entry, state) do
            {:cont, new_state} -> {{[entry | acc1], new_state}, []}
            {:halt, new_state} -> {{acc1, new_state}, [entry]}
          end

        entry, {{acc1, state}, acc2} ->
          {{acc1, state}, [entry | acc2]}
      end)

    {{:lists.reverse(list1), state}, :lists.reverse(list2)}
  end

  defp split_while_reduce_list([head | tail], fun, {acc, state}) do
    case fun.(head, state) do
      {:cont, new_state} -> split_while_reduce_list(tail, fun, {[head | acc], new_state})
      {:halt, new_state} -> {{:lists.reverse(acc), new_state}, [head | tail]}
    end
  end

  defp split_while_reduce_list([], _fun, {acc, state}) do
    {{:lists.reverse(acc), state}, []}
  end
end


=> {:module, EnumExt, <<70, 79, 82, 49, 0, 0, 12, ...>>, {:split_while_reduce_list, 3}}

Let’s try it out. We have a list of binaries. We want to halt accumulation as soon as our data reaches a size or length of 100:

items =
  [
    <<226, 188, 146, 226, 164, 184, 226, 135, 138, 226, 135, 162, 226, 183, 169, 226, 168, 166,
      226, 178, 165>>,
    <<226, 161, 181, 226, 134, 162, 226, 168, 189, 226, 133, 128, 226, 149, 157>>,
    <<226, 140, 176, 226, 156, 180, 226, 151, 182, 226, 190, 150, 226, 138, 156, 226, 155, 186>>,
    <<226, 173, 134, 226, 185, 134, 226, 179, 176, 226, 156, 163, 226, 139, 128, 226, 189, 185>>,
    <<226, 156, 188, 226, 171, 187, 226, 163, 182, 226, 133, 145>>,
    <<226, 174, 185, 226, 186, 148, 226, 160, 163, 226, 133, 143, 226, 161, 147>>,
    <<226, 137, 182, 226, 152, 133, 226, 147, 167, 226, 142, 171, 226, 173, 153>>,
    <<226, 159, 143, 226, 173, 172, 226, 153, 183, 226, 145, 172>>,
    <<226, 176, 157, 226, 157, 183, 226, 136, 160>>,
    <<226, 181, 140, 226, 152, 172, 226, 162, 154, 226, 153, 172, 226, 137, 176, 226, 169, 176>>
  ]


=> ["⼒⤸⇊⇢ⷩ⨦ⲥ", "⡵↢⨽⅀╝", "⌰✴◶⾖⊜⛺", "⭆⹆⳰✣⋀⽹",
 "✼⫻⣶⅑", "⮹⺔⠣⅏⡓", "≶★ⓧ⎫⭙", "⟏⭬♷⑬", "Ⱍ❷∠",
 "ⵌ☬⢚♬≰⩰"]


import EnumExt, only: [split_while_reduce: 3]

split_while_reduce(items, %{size: 0, length: 0, string: ""}, fn item, state ->
  size = byte_size(item)
  length = String.length(item)

  new_size = state.size + size
  new_length = state.length + length

  if new_size < 100 and new_length < 100 do
    {:cont, %{state | size: new_size, length: new_length, string: state.string <> item}}
  else
    {:halt, state}
  end
end)

=> {{["⼒⤸⇊⇢ⷩ⨦ⲥ", "⡵↢⨽⅀╝", "⌰✴◶⾖⊜⛺", "⭆⹆⳰✣⋀⽹",
   "✼⫻⣶⅑", "⮹⺔⠣⅏⡓"],
  %{
    length: 31,
    size: 99,
    string: "⼒⤸⇊⇢ⷩ⨦ⲥ⡵↢⨽⅀╝⌰✴◶⾖⊜⛺⭆⹆⳰✣⋀⽹✼⫻⣶⅑⮹⺔⠣⅏⡓"
  }}, ["≶★ⓧ⎫⭙", "⟏⭬♷⑬", "Ⱍ❷∠", "ⵌ☬⢚♬≰⩰"]}


It’s still a little clunky, no? I mean, the part of the list that is being accumulated could be collected into a list in the state as well. I'm thinking of something similar to how Enum.map_reduce/3 returns a list and an accumulator.

In this case, however, the list returned would be the untouched values. The simplified version:

EnumExt2

defmodule EnumExt2 do
  @spec preserve_reduce(
          Enumerable.t(),
          any,
          (Enum.element(), any ->
             {:cont, any} | {:halt, any})
        ) ::
          {any, [Enum.element()]}
  def preserve_reduce(enumerable, initial_state, fun) when is_list(enumerable) do
    preserve_reduce_list(enumerable, fun, initial_state)
  end

  def preserve_reduce(enumerable, initial_state, fun) do
    {state, list} =
      Enum.reduce(enumerable, {initial_state, []}, fn
        entry, {state, []} ->
          case fun.(entry, state) do
            {:cont, new_state} -> {new_state, []}
            {:halt, new_state} -> {new_state, [entry]}
          end

        entry, {state, acc} ->
          {state, [entry | acc]}
      end)

    {state, :lists.reverse(list)}
  end

  defp preserve_reduce_list([head | tail], fun, state) do
    case fun.(head, state) do
      {:cont, new_state} -> preserve_reduce_list(tail, fun, new_state)
      {:halt, new_state} -> {new_state, [head | tail]}
    end
  end

  defp preserve_reduce_list([], _fun, state) do
    {state, []}
  end
end

=> {:module, EnumExt2, <<70, 79, 82, 49, 0, 0, 11, ...>>, {:preserve_reduce_list, 3}}

import EnumExt2, only: [preserve_reduce: 3]

initial_state = %{size: 0, length: 0, string: "", collected: []}

preserve_reduce(items, initial_state, fn item, state ->
  size = byte_size(item)
  length = String.length(item)

  new_size = state.size + size
  new_length = state.length + length

  if new_size < 100 and new_length < 100 do
    {:cont,
     %{
       state
       | size: new_size,
         length: new_length,
         string: state.string <> item,
         collected: [item | state.collected]
     }}
  else
    {:halt, %{state | collected: Enum.reverse(state.collected)}}
  end
end)

=> {%{
   collected: ["⼒⤸⇊⇢ⷩ⨦ⲥ", "⡵↢⨽⅀╝", "⌰✴◶⾖⊜⛺",
    "⭆⹆⳰✣⋀⽹", "✼⫻⣶⅑", "⮹⺔⠣⅏⡓"],
   length: 31,
   size: 99,
   string: "⼒⤸⇊⇢ⷩ⨦ⲥ⡵↢⨽⅀╝⌰✴◶⾖⊜⛺⭆⹆⳰✣⋀⽹✼⫻⣶⅑⮹⺔⠣⅏⡓"
 }, ["≶★ⓧ⎫⭙", "⟏⭬♷⑬", "Ⱍ❷∠", "ⵌ☬⢚♬≰⩰"]}


I have no idea why the fourth element in the collected list is displaying like that. The source HTML has the correct characters.

While the developer now has to reverse any list elements they want to keep, it gives us a cleaner return. In the case that the developer does not want to accumulate the processed list, they just don’t accumulate it.

Conclusion

The standard library would benefit from such a pattern. I’m unsure of a good name (split_while_reduce and preserve_reduce kind of seem out of place compared to the other names in the library) for such a function. I was considering partial_reduce, but that collides with the concept of partial functions.

I’d love to hear your feedback or your ideas about how this can already be done in the standard library!

Best regards,

Douglas
2024-08-06_proposal-enum-preserve-reduce-3.output.livemd
2024-08-06_proposal-enum-preserve-reduce-3.nooutput.livemd
Reply all
Reply to author
Forward
0 new messages