A microservices architecture using Elixir

mouad benchchaoui

Sep 12, 2015, 12:31:26 PM
to elixir-lang-talk
Hello,

A friend of mine and I have been using Elixir to explore what an SOA setup would look like, but so far without much luck. Making it work is not the big issue; the issue is how to do it right with performance in mind. Here is how our setup works.

First Try
======

We started with an HTTP API that uses Ecto to talk to a PostgreSQL DB. Our benchmark (using https://github.com/wg/wrk) does a simple GET /user/, and the results were around 5000 req/s [1], which is OK for a start.

Second Try
=========

Then we wanted to split it into 2 parts: an API and a "User service".
The setup at first looked like this: wrk -> API (Plug) -> Customer.Handler (GenServer).
This was very slow, and for a good reason: our single Customer.Handler GenServer became the bottleneck, and the QPS didn't exceed 100 req/s.
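
A minimal sketch of why one GenServer caps throughput, assuming every request blocks inside `handle_call`. `SlowHandler`, `SerialDemo` and the 20 ms sleep are hypothetical stand-ins for Customer.Handler and its database query; this is not the code from the repository.

```elixir
defmodule SlowHandler do
  use GenServer

  def start_link, do: GenServer.start_link(__MODULE__, :ok)

  # Public API: a synchronous call, like the API layer calling the service.
  def get_user(pid), do: GenServer.call(pid, :get_user)

  @impl true
  def init(:ok), do: {:ok, nil}

  @impl true
  def handle_call(:get_user, _from, state) do
    # Stand-in for a blocking database query (assumed to take 20 ms).
    Process.sleep(20)
    {:reply, :user, state}
  end
end

defmodule SerialDemo do
  # Fires `n` concurrent callers at one SlowHandler and returns elapsed ms.
  def run(n) do
    {:ok, pid} = SlowHandler.start_link()

    {micros, _results} =
      :timer.tc(fn ->
        1..n
        |> Enum.map(fn _ -> Task.async(fn -> SlowHandler.get_user(pid) end) end)
        |> Enum.map(&Task.await/1)
      end)

    div(micros, 1000)
  end
end

IO.puts("10 concurrent calls took #{SerialDemo.run(10)} ms")
```

The callers run concurrently, but the single process handles one message at a time, so the total time grows with the number of calls. At 20 ms per call, one such process cannot serve more than about 50 calls per second no matter how many clients there are.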

Third Try
=======

To improve the numbers, we tried a different setup:

wrk -> API (Plug) -> Customer.Server (GenServer) -> Customer.Handler (50x GenServer)


With this setup we were able to reach 300 req/s, which is still not good :(

Fourth Try
========

To try to fix this we were thinking about using gproc (https://github.com/uwiger/gproc), especially gproc_pool, but at this point we got stuck because we couldn't understand how to use it. We found neither good documentation nor code examples, which would have been very helpful. The plan would be to have a gproc_pool managing multiple Customer.Handler processes, and then from the API use gproc_pool.pick_worker to get one of the workers to call. At least this is the theory.

Now our questions are:

- Is this the right way of architecting multiple services? If not, what would you suggest?
- How can we use gproc_pool? Does anyone have a good example in Elixir?
- Any feedback about the code at https://github.com/mouadino/sale is very welcome.

Thank you :)

footnotes:

[1]: Previously we were using Ecto 0.13 and QPS was around 50; the move to Ecto 1.0 improved performance by 100x.

José Valim

Sep 12, 2015, 1:07:34 PM
to elixir-l...@googlegroups.com
Given Ecto is Poolboy with a GenServer that talks to the database, I would expect the second try to be reasonably faster than Ecto, unless the action your GenServer is performing is horribly slow. Can you link to the GenServer code?

If you are using one HTTP library, make sure it doesn't have its own pool as well and, if it does, make sure it has a reasonable size.

Also look at the Poolboy configuration, like pool size and overflow and play with different configurations.

Finally, when stress testing, run :observer.start and see if you can identify in the process tab which process is getting behind, that is likely going to be your bottleneck.
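
A process that is falling behind usually shows up as a growing message queue. As a rough sketch of the same check done from IEx instead of the Observer window, the hypothetical helper `QueueCheck.top/1` below lists the processes with the longest queues; it only uses `Process.list/0` and `Process.info/2`.

```elixir
defmodule QueueCheck do
  # Returns up to `limit` {pid, registered_name, queue_length} tuples,
  # longest message queue first.
  def top(limit \\ 5) do
    Process.list()
    |> Enum.map(fn pid ->
      # Process.info/2 returns nil if the process exited in the meantime.
      case Process.info(pid, [:message_queue_len, :registered_name]) do
        nil -> nil
        info -> {pid, info[:registered_name], info[:message_queue_len]}
      end
    end)
    |> Enum.reject(&is_nil/1)
    |> Enum.sort_by(fn {_pid, _name, len} -> -len end)
    |> Enum.take(limit)
  end
end

IO.inspect(QueueCheck.top(), label: "longest message queues")
```

Running it a few times during the stress test and watching which pid keeps a long queue gives a first hint about where the bottleneck is.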


--


José Valim
Skype: jv.ptec
Founder and Director of R&D

mouad benchchaoui

Sep 12, 2015, 2:59:50 PM
to elixir-lang-talk, jose....@plataformatec.com.br
Hi Jose, thank you for the quick answer :)


On Saturday, September 12, 2015 at 7:07:34 PM UTC+2, José Valim wrote:
Given Ecto is Poolboy with a GenServer that talks to the database, I would expect the second try to be reasonably faster than Ecto, unless the action your GenServer is performing is horribly slow. Can you link to the GenServer code?

The GenServer code doing DB access is this one. Why do you think it should be reasonably fast? Wouldn't it make all requests sequential, because there is only one GenServer, i.e. one process? That would mean we end up with this:


    [http request] ->                                   [ecto]
    [http request] ->  [GenServer Customer.Handler] ->  [ecto]
    [http request] ->                                   [ecto]


This is why we went with the third try and the fourth :)
 

If you are using one HTTP library, make sure it doesn't have its own pool as well and, if it does, make sure it has a reasonable size.

We are using Plug with the Cowboy adapter and its default values, as in here. The API code that talks to the GenServer can be found here.


Also look at the Poolboy configuration, like pool size and overflow and play with different configurations.

I made it configurable for this purpose. I also used the same pool size for Ecto and the Customer.Handler GenServer for sanity. I have already played around with the numbers with no luck; a big number just raises a lot of timeouts, which is understandable.


Finally, when stress testing, run :observer.start and see if you can identify in the process tab which process is getting behind, that is likely going to be your bottleneck.

I am running it, but I'm not sure how to spot the process that is falling behind. There are usually a lot of processes and I'm not sure where to look. Any hints?

Cheers,

José Valim

Sep 12, 2015, 3:29:22 PM
to mouad benchchaoui, elixir-lang-talk
Sorry, I meant the third case should be ok. I will take a deeper look soon but why have you moved the database access inside the GenServer then?

Peter Hamilton

Sep 12, 2015, 5:17:21 PM
to elixir-l...@googlegroups.com, mouad benchchaoui
You should try to avoid making blocking calls inside a GenServer because as you stated it serializes requests.

One useful technique that I'm using in rethinkdb-elixir is returning :noreply from handle_call and then replying later.


You could, for example, launch a Task in handle_call, return :noreply, and then in handle_info you can wait for the Task to send its response and then use GenServer.reply to send the result. Something like:

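  # Start the lookup in a Task and return immediately so the GenServer
  # can keep accepting calls; the caller is answered later.
  def handle_call({:get_by_token, token}, from, tasks) do
    t = Task.async(fn -> get_by_token(token) end)
    {:noreply, Map.put_new(tasks, t.ref, from)}
  end

  # Task.async sends {ref, result} back to the process that started it.
  def handle_info({ref, result}, tasks) when is_reference(ref) do
    Process.demonitor(ref, [:flush])  # drop the task's :DOWN message
    GenServer.reply(tasks[ref], result)
    {:noreply, Map.delete(tasks, ref)}
  end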


That's fairly simplistic, but I'd be curious to see your benchmark with such an implementation.
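
For anyone who wants to try the idea in isolation, here is a self-contained sketch of the deferred-reply pattern. `AsyncHandler` and its `lookup/1` (a 50 ms sleep standing in for the real query) are hypothetical names made up for illustration; they are not from the sale repository or from rethinkdb-elixir.

```elixir
defmodule AsyncHandler do
  use GenServer

  def start_link, do: GenServer.start_link(__MODULE__, :ok)

  def get_by_token(pid, token), do: GenServer.call(pid, {:get_by_token, token})

  @impl true
  def init(:ok), do: {:ok, %{}}

  @impl true
  def handle_call({:get_by_token, token}, from, tasks) do
    # Run the slow lookup in a Task so this process stays free for new calls.
    task = Task.async(fn -> lookup(token) end)
    {:noreply, Map.put(tasks, task.ref, from)}
  end

  @impl true
  def handle_info({ref, result}, tasks) when is_reference(ref) do
    # The Task finished: stop monitoring it and answer the original caller.
    Process.demonitor(ref, [:flush])
    {from, tasks} = Map.pop(tasks, ref)
    GenServer.reply(from, result)
    {:noreply, tasks}
  end

  # Hypothetical stand-in for the real database query (assumed 50 ms).
  defp lookup(token) do
    Process.sleep(50)
    {:ok, "user-for-" <> token}
  end
end

{:ok, pid} = AsyncHandler.start_link()

{micros, results} =
  :timer.tc(fn ->
    1..10
    |> Enum.map(fn i -> Task.async(fn -> AsyncHandler.get_by_token(pid, "t#{i}") end) end)
    |> Enum.map(&Task.await/1)
  end)

IO.puts("10 calls, #{length(results)} replies in #{div(micros, 1000)} ms")
```

Because the lookups overlap, the ten calls should finish in roughly the time of one lookup rather than ten. Note that this sketch starts one Task per call with no upper bound, and a crashing Task would take the linked GenServer down with it.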


mouad benchchaoui

Sep 12, 2015, 5:35:19 PM
to elixir-lang-talk, moua...@gmail.com, jose....@plataformatec.com.br


On Saturday, September 12, 2015 at 9:29:22 PM UTC+2, José Valim wrote:
Sorry, I meant the third case should be ok. I will take a deeper look soon but why have you moved the database access inside the GenServer then?

We thought of using a GenServer as the basic building block for a service (not sure if it's a good idea or not).

So basically the architecture contains a gateway HTTP API that talks to different services. One of them is the customer service, which **encapsulates** the customer domain. We don't want to make DB calls from the API directly because that would defeat the purpose of the exercise, which is building a microservice architecture.

The other choice we made is to use Erlang message passing as the transport between the services, instead of exposing an HTTP API for each service. The reason is that we wanted to prove that Elixir/Erlang is great for this kind of architecture without having to reinvent anything (batteries included).

One last note that may be helpful: since we are using the normal process registry, we can't have the same service deployed on different "nodes" under the same name AFAIK. That's why we ended up investigating gproc, but with no luck so far :(

mouad benchchaoui

Sep 13, 2015, 6:59:51 AM
to elixir-lang-talk, moua...@gmail.com


On Saturday, September 12, 2015 at 11:17:21 PM UTC+2, Peter Hamilton wrote:
You should try to avoid making blocking calls inside a GenServer because as you stated it serializes requests.

One useful technique that I'm using in rethinkdb-elixir is returning :noreply from handle_call and then replying later.


You could, for example, launch a Task in handle_call, return :noreply, and then in handle_info you can wait for the Task to send its response and then use GenServer.reply to send the result. Something like:

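  # Start the lookup in a Task and return immediately so the GenServer
  # can keep accepting calls; the caller is answered later.
  def handle_call({:get_by_token, token}, from, tasks) do
    t = Task.async(fn -> get_by_token(token) end)
    {:noreply, Map.put_new(tasks, t.ref, from)}
  end

  # Task.async sends {ref, result} back to the process that started it.
  def handle_info({ref, result}, tasks) when is_reference(ref) do
    Process.demonitor(ref, [:flush])  # drop the task's :DOWN message
    GenServer.reply(tasks[ref], result)
    {:noreply, Map.delete(tasks, ref)}
  end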


We were thinking of using a bounded pool of processes (code here) because AFAIK an unbounded one is a recipe for disaster :)
 

That's fairly simplistic, but I'd be curious to see your benchmark with such an implementation.

But for the sake of experiment I tried it out, and sadly the result is the same. The code change can be found here (https://github.com/mouadino/sale/compare/using_handle_info?expand=1).

Benchmark results:

Running 30s test @ http://127.0.0.1:8880/user/
  10 threads and 400 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.15s   143.86ms   1.32s    91.60%
    Req/Sec    35.93     23.85   141.00     66.98%
  10105 requests in 30.08s, 3.18MB read
  Socket errors: connect 0, read 247, write 0, timeout 0
Requests/sec:    335.98
Transfer/sec:    108.27KB

José Valim

Sep 13, 2015, 7:07:01 AM
to elixir-l...@googlegroups.com
After taking a look at your code, I don't understand why you have moved the database access to a GenServer. Ecto already takes care of putting the connections behind a pool and managing resources. It feels like you have just added a lot of contention with no potential benefit. Unless I am missing something, here is the workflow:

1. request comes in
2. you ask the pool for the first gen server
3. the pool hands you a gen server when available
4. the gen server asks the repo pool for a connection
5. the pool hands you a connection
6. you do the query

You are not even using the GenServer state. It is really unclear to me what you are trying to achieve. I would understand your changes if you were directly managing the database connections inside your own pool (even though I wouldn't recommend that), but putting one pool behind another pool is asking for a lot of contention.


José Valim

Sep 13, 2015, 7:13:26 AM
to elixir-l...@googlegroups.com
Reading your original e-mail, I found this:

===
Then we wanted to split it into 2 parts: an API and a "User service".
The setup at first looked like this: wrk -> API (Plug) -> Customer.Handler (GenServer).
This was very slow, and for a good reason: our single Customer.Handler GenServer became the bottleneck, and the QPS didn't exceed 100 req/s.
===

You don't need a GenServer. The "User service" you mentioned is simply a *module*.

defmodule Customer.Handler do
  def get_by_token(...)
end

If, for some reason, you need a GenServer to keep state then please go ahead but it should be an implementation detail and not a pre-made decision. If you don't need one, then you don't need one.

The best part of making the "User service" a module and putting the logic behind a function is that you can later change it to use a GenServer if you need one and the caller doesn't need to know about it.

The following example is an extrapolation of the current design but I hope it makes it clear. What you are currently doing is as if I implemented a calculator like this:

defmodule Calculator do
  def add(left, right) do
    CalculatorServer.add(left, right)
  end
end

Which is completely unnecessary, just add the two numbers together:

defmodule Calculator do
  def add(left, right) do
    left + right
  end
end
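
The earlier point, that a function on a module can later be backed by a GenServer without the caller noticing, can be sketched with the same calculator. In the sketch below `Calculator.add/2` keeps its signature while a hypothetical `Calculator.Server`, which counts how many additions were made, is introduced behind it. The counter is an invented requirement, there only to give the process some state worth keeping.

```elixir
defmodule Calculator.Server do
  use GenServer

  def start_link, do: GenServer.start_link(__MODULE__, 0, name: __MODULE__)

  @impl true
  def init(count), do: {:ok, count}

  @impl true
  def handle_call({:add, left, right}, _from, count) do
    # The state (a call counter) is the only reason a process is used here.
    {:reply, left + right, count + 1}
  end

  def handle_call(:count, _from, count), do: {:reply, count, count}
end

defmodule Calculator do
  # Callers only ever see add/2; whether a process is involved is an
  # implementation detail that can change without touching them.
  def add(left, right), do: GenServer.call(Calculator.Server, {:add, left, right})

  def calls, do: GenServer.call(Calculator.Server, :count)
end

{:ok, _pid} = Calculator.Server.start_link()
IO.inspect(Calculator.add(1, 2))
```

Code that called `Calculator.add(1, 2)` before the change keeps working unchanged afterwards, which is what makes the process an implementation detail rather than part of the service's interface.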


mouad benchchaoui

Sep 13, 2015, 7:25:49 AM
to elixir-lang-talk, jose....@plataformatec.com.br
Yes, the exercise is about creating a micro-service architecture, not a monolith (the code of the monolith that we are using as a baseline for benchmarks is here).

As I said before, the idea is to see what a micro-service architecture would look like in Elixir. That's why we are splitting the code into different services (each in its own app under one umbrella), and by service I mean a single code base that is deployed and managed separately from the others.

To illustrate the setup, let's imagine for instance that we have 2 nodes (or machines). A deployment of the sale repository may look like this:


   Node 1     |     Node 2
              |
  [  API  ]   |  [ Customer ]


The more I think about this, the more I feel that a GenServer is really not what we want. Maybe I should look at Erlang RPC again.

José Valim

Sep 13, 2015, 7:39:59 AM
to mouad benchchaoui, elixir-lang-talk
The more I think about this, the more I feel that a GenServer is really not what we want. Maybe I should look at Erlang RPC again.

Precisely this. You should do an RPC call, and the code on the other node will then decide whether it should call a GenServer locally, call PG2, just add two numbers, etc.

It is the same as if you had a "RPC.Calculator". When you call the other node, it is up to the other node to choose what to do, you don't need to necessarily impose a GenServer. For example:

:rpc.call(:some_node@foo, Calculator, :add, [1, 2])
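
As a sketch of how the API side might wrap that call, the hypothetical `RemoteCalculator` module below hides `:rpc.call/4` behind a plain function and turns the `{:badrpc, reason}` tuple into an error. The target defaults to the local node only so the example runs without a cluster; in a real deployment it would be the name of the node hosting the service.

```elixir
defmodule Calculator do
  def add(left, right), do: left + right
end

defmodule RemoteCalculator do
  # `target` is whichever node hosts the service; it defaults to the local
  # node so the sketch runs without a cluster.
  def add(left, right, target \\ node()) do
    case :rpc.call(target, Calculator, :add, [left, right]) do
      {:badrpc, reason} -> {:error, reason}
      result -> {:ok, result}
    end
  end
end

IO.inspect(RemoteCalculator.add(1, 2))
```

The caller never learns whether `Calculator.add/2` on the other node is a plain function, a GenServer call or something else, so the service can change its internals freely.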