Plug as a stream of connections could get very interesting if it were combined with group_by. Suddenly you've got a streaming router. Also, middleware becomes a Stream.map.
There are definitely a lot of cool things that could be done there.
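Purely as an illustration of the streaming-router idea above, a hypothetical sketch; accept_connections/0 and the individual plug functions are assumed, not an existing API:

accept_connections()                 # assumed source of connections
|> Stream.map(&authenticate/1)       # each middleware becomes a Stream.map
|> Stream.map(&log_request/1)
|> Enum.each(&dispatch/1)            # the terminal Enum call drives the stream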
It is better to model each HTTP request as a process; that is how we exploit the concurrency exposed by the VM. It could be streams at the acceptor level, but you would want to spawn a new process as soon as possible, so it wouldn't really matter.
I've been thinking about how you'd model that, and got to wondering: given that you have no control over what the request process does in terms of spawning other processes, would you be safer creating a supervisor and a process per request, limiting the damage if something does go wrong?
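A hedged sketch of that supervisor-per-request idea, using the standard Task.Supervisor; handle_request/1 and request are assumed:

# The supervisor owns every request process, so a crash in one request
# doesn't take down the acceptor or the other requests.
{:ok, sup} = Task.Supervisor.start_link()

Task.Supervisor.start_child(sup, fn ->
  handle_request(request)  # assumed request handler
end)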
Open an issue on GitHub for this. Right now they are all linked, so if one dies the whole stream dies.
Also take a look at the Architecture section of the ranch guide (cowboy builds on ranch, and Plug builds on cowboy): http://ninenines.eu/docs/en/ranch/HEAD/guide/internals/
--
Kind regards,
Martin Schurrer
users
|> fetch_profiles
|> fetch_investments

def fetch_profiles(users) do
  for user <- users do
    fetch_profile(user) # external request
  end
end

def fetch_investments(profiles) do
  for profile <- profiles do
    fetch_investment(profile) # external request
  end
end
users
|> fetch_profiles
|> fetch_investments

def fetch_profiles(users) do
  parallel for user <- users do
    fetch_profile(user) # external request
  end
end

def fetch_investments(profiles) do
  parallel for profile <- profiles do
    fetch_investment(profile) # external request
  end
end
Parallel.flow users,
  [&fetch_profiles/2, &fetch_investments/2]

def fetch_profiles(input, output) do
  parallel for user <- input, into: output do
    fetch_profile(user) # external request
  end
end

def fetch_investments(input, output) do
  parallel for profile <- input, into: output do
    fetch_investment(profile) # external request
  end
end
Stream.scan(input, fn ... end) |> Enum.into(output)
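For a concrete, toy version of that line: Stream.scan emits every intermediate accumulator, and Enum.into collects them into the given collectable.

1..5
|> Stream.scan(&(&1 + &2))  # running sum: emits each intermediate total
|> Enum.into([])
# => [1, 3, 6, 10, 15]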
investments =
  users
  |> Stream.parallel_map(&fetch_profile/1)
  |> Enum.parallel_map(&fetch_investments/1)
investments =
  users
  |> Stream.map(fn user -> fn -> fetch_profile(user) end end)
  |> Streamz.Task.stream
  |> Stream.map(fn profile -> fn -> fetch_investments(profile) end end)
  |> Streamz.Task.stream
  |> Enum.to_list
old_profiles = get_profiles_from_cache

new_profiles =
  new_users
  |> Stream.map(fn user -> fn -> fetch_profile(user) end end)
  |> Streamz.Task.stream

investments =
  Stream.merge(old_profiles, new_profiles)
  |> Stream.map(fn profile -> fn -> fetch_investments(profile) end end)
  |> Streamz.Task.stream
  |> Enum.to_list
One of the greatest strengths of Elixir is that we have a powerful scheduler in the BEAM and can therefore avoid callback-based async code. It makes a huge difference in the user experience. I would much rather simply do:
investments =
  users
  |> Stream.parallel_map(&fetch_profile/1)
  |> Enum.parallel_map(&fetch_investments/1)
investments =
  users
  |> fetch_profiles()
  |> fetch_investments()
  |> Enum.into([])

def fetch_profiles(users) do
  parallel for ...
end

def fetch_investments(profiles) do
  parallel for ...
end
I guess I'm saying 2 things here:
- Favor Enumerable over Collectable when possible. This is essentially the same argument as pull vs push, or lazy vs eager. My preference is Enumerable/pull/lazy; it's what has made Elixir so much more enjoyable than JavaScript or Scala, because I don't need to jump through any weird hoops (see the sketch after these points).
- Let's avoid callback/pipeline-style programming. We have the pipe operator, which is fantastic and doesn't hide execution details. Parallel.flow doesn't seem to offer much over the pipe operator other than supporting output parameters. This point depends on favoring Enumerable.
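A minimal sketch of the lazy/pull style the first point argues for; nothing runs until the final Enum call pulls elements through:

1..1_000_000
|> Stream.map(&(&1 * 2))  # lazy: builds a recipe, does no work yet
|> Stream.take(3)         # still lazy
|> Enum.to_list()         # eager: pulls only the elements actually needed
# => [2, 4, 6]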
investments =
  users
  |> Stream.parallel_map(&fetch_profile/1)
  |> Enum.parallel_map(&fetch_investments/1)
users
|> Stream.Parallel.map(...)
|> Stream.Parallel.map(...)
|> Enum.into([])

users
|> Stream.Parallel.map(...)
|> Stream.take(10)
|> Stream.Parallel.map(...)
|> Enum.into([])
parallel for x <- users, do: do_x(x)
parallel for x <- users, do: do_x(x) |> Enum.into([])
for x <- users, stream: true, do: do_x(x)
parallel for x <- users, stream: true, do: do_x(x)
parallel(for x <- users, do: do_x(x))
Good summary, José. I'd say one other piece that was left out is ordering of parallel operations.
If we want to preserve order while maintaining independent function composition (i.e. an element can move to the next stage even if other elements have not completed the current stage), the normal approach is to tag elements with an index and then sort at the end of the pipeline, as sketched below. This introduces the idea of "around" functions on streams, which opens up all sorts of interesting possibilities as well as complications.
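A hedged sketch of that tag-and-sort technique, written with Task.async_stream from later Elixir versions; fetch_profile/1 is assumed, and ordered: false forces the out-of-order completion that the index tags then compensate for:

users
|> Enum.with_index()                           # tag each element with its position
|> Task.async_stream(
     fn {user, i} -> {fetch_profile(user), i} end,
     ordered: false)                           # results may arrive out of order
|> Enum.map(fn {:ok, tagged} -> tagged end)
|> Enum.sort_by(fn {_profile, i} -> i end)     # sort by the tag at the end
|> Enum.map(fn {profile, _i} -> profile end)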
As said, there is still a lot of speculation here, so the goal is to focus on parallel map/for, because that is where users will immediately see a benefit.
Fun stuff!
- Peter
I based everything on Clojure's and Scala's parallel collections. I took a reduce-combine approach, which is a variant of map-reduce.
I'm not sure what the best approach is for concurrency. Currently my parallel list spawns a process for each scheduler. It's not great, but it shows that it works.
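A rough sketch of that process-per-scheduler approach, using today's standard library (Enum.chunk_every and Task postdate parts of this thread): chunk the list into one slice per scheduler and map each slice in its own process.

defmodule SchedulerParallel do
  # Spawn one worker per scheduler, each mapping over its own chunk.
  def parallel_map(list, fun) do
    workers = System.schedulers_online()
    chunk_size = max(div(length(list), workers), 1)

    list
    |> Enum.chunk_every(chunk_size)
    |> Enum.map(fn chunk -> Task.async(fn -> Enum.map(chunk, fun) end) end)
    |> Enum.flat_map(&Task.await/1)
  end
end

# SchedulerParallel.parallel_map([1, 2, 3, 4], &(&1 * 2))
# => [2, 4, 6, 8]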
Yes, I like the idea of keeping parallel/sequential separate from the data structure. Scala’s philosophy is to use the same data structures, then use splitters to partition a data structure for various tasks to process. They even wrote a paper about it: http://infoscience.epfl.ch/record/150220/files/pc.pdf.
Experimenting with performance and Streamz.merge:
After a discussion with José on Saturday, I tried a few new approaches. The first was to avoid routing data through the collector and have the collector merely provide synchronization; this led to a decent speedup when just a few streams were merged, but slowed down when merging a lot of streams. The second approach was to eliminate the collector entirely, and the speedup there was huge: there was little difference in execution time between 2 streams and 10k streams, and it is much more efficient. My machine started to bottleneck at around 10k streams, but it scaled very well.
I don't think we can do much better than that.
Here's more information, with pretty graphs. https://github.com/hamiltop/streamz/issues/10