[Proposal] Why I believe we need a Stream.reduce/3 function

93 views
Skip to first unread message

Isaac Whitfield

unread,
Oct 25, 2016, 11:33:47 PM10/25/16
to elixir-lang-core
First off, I'm sorry if this has been discussed before - I've searched several places and haven't come across anything so that's why I'm here :) I'm sure there's a reason why this does not exist, but on the off chance it was simply never considered here's the formal request!

I believe there is currently a gap inside the Stream module with regards to reductions. As it stands it appears the only option available to someone who does not wish to buffer an entire File (for example) whilst running a Stream is Stream.run/1. As the documentation explains, this lets you run a Stream which may have side effects and ignores the value at the end. I believe there should be something similar which returns state at the end, namely Stream.reduce/3.

In the past, I have argued against this myself and said that there's no "good" use case but I've hit one I feel totally justifies the addition (assuming there's nothing I'm missing). Below is an example use case - it's a dumb example, but the concept is sound.

Consider you're streaming a file containing line separated numbers. Your task is to count the occurrences of even numbers, but the file is 50GB in size and you only have 4GB of RAM on your machine (you get the idea). So you can't just buffer the whole thing, because your app would crash. Instead you have to resort to a Stream, but you only ever need to keep track of a single variable; the number of even numbers.

If we look at the Stream library, it might appear that Stream.scan/3 is perfect for this as it passes the result of the last function through to the next execution - fantastic.

So we devise our call:

require Integer

"my_file.txt"
|> File.stream!
|> Stream.scan(0, fn(line, count) ->
     num
= convert_to_number(line)

     
if Integer.is_even(num) do
       count + 1
     else
       count
     
end
   
end)

This looks great because it means the last return value of our Stream is going to be the number of even numbers. But we then hit a wall, because Stream.run/1 doesn't return anything except :ok - and the only other way to get the last return value of the Stream is to do Enum.to_list/1 and walk the entire list until the end and then pull it out, which defeats the point of what we're trying to do.

It's because of this that I feel we need a Stream.reduce/3 which has the only intent of being able to run a Stream whilst returning a value - meaning that the last value returned by the callback would be the return value of the Stream.reduce/3 call.

Then you could do:

require Integer

"my_file.txt"
|> File.stream!
|> Stream.reduce(0, fn(line, count) ->
     num
= convert_to_number(line)

     
if Integer.is_even(num) do
       {:cont, count + 1}
     else
       
{:cont, count}
     
end
   
end)
|> IO.inspect # prints a number of even numbers, without ever buffering the list

This has the powerful ability to return a single value after processing a Stream without ever holding the entire thing in memory, which is awesome (imagine all of the neat use cases for analytics apps).

I don't know much about how this would be implemented, but it appears to be a small modification of scan to simply keep the last returned value rather than add it to a list. That's partly why I feel it should be added - it's likely that Stream.scan/3 could re-use Stream.reduce/3, meaning there's hopefully only a small amount of code bloat.

Is there a reason this does not/could not exist? The closest thing I've found to implementing this appears to be with Stream.transform/4 but I have been unsuccessful and even if I got it to work it feels hacky.

Thank you in advance for your replies!

Peter Hamilton

unread,
Oct 26, 2016, 1:06:48 AM10/26/16
to elixir-lang-core
Doesn't Enum.reduce handle this perfectly fine?

The Enum module doesn't require the entire stream to be evaluated before it starts performing the operation. It requires the entire stream before it emits any values. Since you only have one value you want to emit, and you don't want to emit it until you are done, Enum.reduce does exactly what you want.

--
You received this message because you are subscribed to the Google Groups "elixir-lang-core" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elixir-lang-co...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elixir-lang-core/4b6a48d2-4855-4877-b00c-d72a26748aac%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Isaac Whitfield

unread,
Oct 26, 2016, 1:44:25 AM10/26/16
to elixir-lang-core
Well after testing and letting my mind recover from being blown so hard, it appears you're correct. I never realised that your first call to Enum actually operates like that.

Thanks for the advice! I tested this with a 4GB file and it appears to be okay. I think I fell into the trap of Enum == List and got caught on thinking that my input had to be evaluated before the reduce call would happen. Did some src digging and I'm kinda with the flow of things now - but I do think that this might be a little under-documented and perhaps need some specific examples.

Thanks again :)
Reply all
Reply to author
Forward
0 new messages