[erlang-questions] count events within last XX minutes

Max Bourinov

Aug 24, 2012, 10:16:51 AM
to erlang-questions
Dear Erlangers,

Does anybody know the most memory-efficient way to count events within the last XX minutes?

So far I have the following idea: each counter is a process. The counter process keeps a list of XX items. Each item represents a certain minute, so I always know where my current counter is. There must also be a mechanism to remove the oldest item from the list and add a new one. The sum of the values of all items is the number of events within the last XX minutes. That is it.

Maybe there is a ready-made library that does the same in a better way?

Any suggestions are welcome!

Best regards,
Max
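
A minimal sketch of the per-minute bucket idea described above, kept as plain data that a counter process could hold in its state. The module name `minute_window`, the function names, and the use of an integer minute index (for example, seconds since the epoch divided by 60) are assumptions made for illustration; the sketch also assumes minutes never go backwards.

```erlang
-module(minute_window).
-export([new/1, hit/2, total/2]).

%% State: {WindowMinutes, [{Minute, Count}]}, newest bucket first.
new(WindowMinutes) when is_integer(WindowMinutes), WindowMinutes > 0 ->
    {WindowMinutes, []}.

%% Record one event in minute Now (an integer minute index, assumed
%% to be non-decreasing between calls).
hit(Now, {W, [{Now, C} | Rest]}) ->
    %% Still in the current minute: bump the head bucket.
    {W, [{Now, C + 1} | Rest]};
hit(Now, {W, Buckets}) ->
    %% A new minute started: drop expired buckets, open a new one.
    {W, [{Now, 1} | prune(Now, W, Buckets)]}.

%% Events seen in the last W minutes, including the current minute.
total(Now, {W, Buckets}) ->
    lists:sum([C || {M, C} <- Buckets, M > Now - W]).

prune(Now, W, Buckets) ->
    [B || {M, _} = B <- Buckets, M > Now - W].
```

Because empty minutes produce no bucket, the list never holds more than one entry per minute of the window.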


Anders Nygren

Aug 24, 2012, 11:14:51 AM
to Max Bourinov, erlang-questions
Use an ETS table with {Counter,{YYYY,MM,DD,HH,MM}} as key, and
ets:update_counter/3.
update_counter returns the new counter value, so if it is 1 (or the
increment used), you know that a new minute has been entered, so you can
delete the oldest.

/Anders
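
A rough sketch of this suggestion, under several assumptions: the module and function names are hypothetical, the key uses an integer minute index instead of the {YYYY,MM,DD,HH,MM} tuple so that the window arithmetic stays simple, and it relies on `ets:update_counter/4`, which inserts a default object when the key is missing and is only available from OTP 18 onwards (later than this thread).

```erlang
-module(ets_minute_counter).
-export([new/0, hit/4, total/4]).

new() ->
    ets:new(minute_counters, [set, public, {write_concurrency, true}]).

%% Count one event for Counter in minute Minute; Window is in minutes.
hit(Tab, Counter, Minute, Window) ->
    Key = {Counter, Minute},
    %% The default {Key, 0} is inserted first if the key does not exist.
    case ets:update_counter(Tab, Key, 1, {Key, 0}) of
        1 ->
            %% First event of a new minute: delete every bucket of this
            %% counter that has fallen out of the window.
            Expired = [{{{Counter, '$1'}, '_'},
                        [{'=<', '$1', Minute - Window}],
                        [true]}],
            ets:select_delete(Tab, Expired),
            1;
        N ->
            N
    end.

%% Sum the buckets of the last Window minutes, including Minute.
total(Tab, Counter, Minute, Window) ->
    lists:sum([C || M <- lists:seq(Minute - Window + 1, Minute),
                    {_, C} <- ets:lookup(Tab, {Counter, M})]).
```

Deleting with a match specification instead of a single `ets:delete/2` also cleans up after minutes in which no event arrived, at the cost of a table scan once per new minute.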
_______________________________________________
erlang-questions mailing list
erlang-q...@erlang.org
http://erlang.org/mailman/listinfo/erlang-questions

Mike Oxford

Aug 24, 2012, 4:05:03 PM
to Anders Nygren, erlang-questions
If you care about, as you say, "certain minute" you can use Ulf's gproc.  If you want "last minute" then the ETS solution below will work well, with the modification on the Counter to use a ms/us timer like erlang:now().

-mox

Max Bourinov

Aug 26, 2012, 2:33:35 AM
to Mike Oxford, erlang-questions
Hi guys,

Thank you all for your replies.

I think I will go with my own implementation. The only thing I cannot understand is why you suggest using ETS for this. Why is keeping the data in the process state not OK? I think under heavy load the state approach will perform better than ETS. Moreover, I think ETS is overkill for this task.

P.S. In my case XX minutes won't exceed 120.

Best regards,
Max

Max Lapshin

Aug 26, 2012, 3:22:19 AM
to Max Bourinov, erlang-questions
On Sun, Aug 26, 2012 at 10:33 AM, Max Bourinov <bour...@gmail.com> wrote:
> Hi guys,
>
> Thank you guys for all your replies.
>
> I think I will go with my own implementation. The only thing I cannot
> understand, why you suggest using ETS for this?

Because ETS has the update_counter API.


> Why keeping data in the state is not ok?

Because you will have a giant state, with all the problems that brings.

> I think on heavy load the state approach will perform
> better than ETS. Moreover, I think ETS is overkill for this task.

"Overkill" here is an emotion without any exact results.
Keeping this info in the state will lead to copying a growing amount of memory.

Max Bourinov

Aug 26, 2012, 3:54:48 AM
to Max Lapshin, erlang-questions
Hi Max,

Thank you for your comments and explanation.

Did you see that I need only 120 minutes? So it will be a list with 121 elements at most. In this case memory won't grow at all, and the size of the state will stay the same.

I agree about the copying, but I don't understand why the state will grow. Could you please explain it?

Best regards,
Max

Max Lapshin

Aug 26, 2012, 3:56:45 AM
to Max Bourinov, erlang-questions
With a 120-minute limit it will not grow, and you can use in-state memory storage.
But ETS is not overkill. It is just another solution.

Michael Truog

Aug 26, 2012, 4:02:09 AM
to Max Bourinov, erlang-questions
There often is a habit with quick projects to throw data in ets, since it is easy to access the data as global data.  This helps people coming from an imperative programming background.  I don't see a good reason in the email thread that shows that ets is the best solution, simply because the amount of data is not clear.  ets can be used to limit the memory consumption, but so far, memory consumption was not mentioned as a concern.  Having an update_counter function doesn't sound too convincing because programmers are generally capable of a fetch-increment-store routine.

So, the moral of the story is to test and experiment.  You could try internal state with a dict as compared to ets, and see which makes sense based on your requirements.
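
One way to start the experiment suggested above is a small single-process micro-benchmark. The sketch below is only a starting point: the module name `counter_bench` and the choice of 120 keys are assumptions, it uses `ets:update_counter/4` (OTP 18+), and it says nothing about behaviour under concurrent access, which would need a separate test.

```erlang
-module(counter_bench).
-export([run/1]).

%% Time N counter increments spread over 120 keys, once with a dict
%% threaded through a loop and once with a private ETS table.
%% Returns the elapsed times in microseconds.
run(N) ->
    {DictUs, _} = timer:tc(fun() -> dict_loop(N, dict:new()) end),
    Tab = ets:new(bench, [set, private]),
    {EtsUs, _} = timer:tc(fun() -> ets_loop(N, Tab) end),
    ets:delete(Tab),
    [{dict_us, DictUs}, {ets_us, EtsUs}].

dict_loop(0, D) -> D;
dict_loop(N, D) ->
    dict_loop(N - 1, dict:update_counter(N rem 120, 1, D)).

ets_loop(0, _Tab) -> ok;
ets_loop(N, Tab) ->
    Key = N rem 120,
    ets:update_counter(Tab, Key, 1, {Key, 0}),
    ets_loop(N - 1, Tab).
```

Calling `counter_bench:run(1000000)` several times and comparing the two figures gives a first impression; the numbers depend heavily on the machine and the OTP release.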

Max Lapshin

Aug 26, 2012, 4:04:17 AM
to Michael Truog, erlang-questions
On Sun, Aug 26, 2012 at 12:02 PM, Michael Truog <mjt...@gmail.com> wrote:
> There often is a habit with quick projects to throw data in ets, since it is
> easy to access the data as global data. This helps people coming from an
> imperative programming background. I don't see a good reason in the email
> thread that shows that ets is the best solution,


But there is a good reason: performance. For example, in
erlyvideo all major statistics data are
collected without gen_server:call: you cannot ask a process to tell its
statistics, because
it is very easy to DoS your server with such requests.

If you put stats into a public ETS table, then the collector will not be
overloaded with requests.

Max Bourinov

Aug 26, 2012, 4:21:40 AM
to Max Lapshin, erlang-questions
> If you put stats into a public ETS table, then the collector will not be
> overloaded with requests.

This is interesting. I have also been thinking about how to avoid DoS, but I don't understand how ETS will help here.

Here is what I have: only one process that handles client requests. It really doesn't make sense to have more processes, because the process does nothing but return a pre-calculated value. If I store the data in ETS, my process will have to do additional work: extracting the data from ETS.

Or maybe I am missing something? Please point me in the right direction. Or maybe in my case it doesn't matter because my case is trivial?

Best regards,
Max

Gleb Peregud

Aug 26, 2012, 4:30:51 AM
to Erlang

Lately, when working on a project with up to a million concurrent clients connected, I had to replace a few instances of central servers with public ETS tables to avoid bottlenecks under heavy concurrent access. So the choice of data storage may depend on the access patterns of your data, i.e. how concurrent the reads and writes are.

Gleb Peregud

Aug 26, 2012, 4:41:43 AM
to Erlang
On Sunday, August 26, 2012, Max Bourinov wrote:
> > Lately, when working on a project with up to a million concurrent clients connected, I had to replace a few instances of central servers with public ETS tables to avoid bottlenecks under heavy concurrent access. So the choice of data storage may depend on the access patterns of your data, i.e. how concurrent the reads and writes are.
>
> That is a good point.
>
> I have an idea how to use ETS in my case too: one process will write to ETS, and all the others will read from it.
>
> The question now: will it benefit me or not?

Probably yes, if read_concurrency is enabled and you have enough concurrent processes reading the data. The best way to answer for sure is to implement both and measure under a high-load situation.
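
The one-writer, many-readers layout discussed here could look roughly like the sketch below. The module name `shared_count`, the table name, and the key `events_last_window` are made up for illustration. A `protected` table lets only the owning process write while any process can read, and `read_concurrency` is the option mentioned above.

```erlang
-module(shared_count).
-export([start/0, publish/1, read/0]).

%% Called by the single writer process, which becomes the table owner.
start() ->
    ets:new(shared_count,
            [named_table, protected, set, {read_concurrency, true}]),
    ets:insert(shared_count, {events_last_window, 0}),
    ok.

%% Writer only: store the freshly computed total.
publish(Total) when is_integer(Total) ->
    ets:insert(shared_count, {events_last_window, Total}).

%% Any process: read the value without sending a message to the writer.
read() ->
    ets:lookup_element(shared_count, events_last_window, 2).
```

Readers never enter the writer's mailbox, so a burst of read requests cannot delay the counting itself; whether that matters in practice is exactly what the measurement should show.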

Max Bourinov

Aug 26, 2012, 5:02:05 AM
to Gleb Peregud, Erlang
Thank you!

+1 for testing :-)

Best regards,
Max




Loïc Hoguin

Aug 26, 2012, 5:45:21 AM
to Gleb Peregud, Erlang
If he's using update_counter then he only needs write_concurrency.
update_counter is a write operation that also returns the new value; if
you update_counter with an increment of 0, then you just happened to
read the value with a write context.
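
A small illustration of the zero-increment read described above; the module name `counter_peek` and the key `hits` are hypothetical.

```erlang
-module(counter_peek).
-export([demo/0]).

demo() ->
    Tab = ets:new(peek, [set, public, {write_concurrency, true}]),
    true = ets:insert(Tab, {hits, 0}),
    1 = ets:update_counter(Tab, hits, 1),
    2 = ets:update_counter(Tab, hits, 1),
    %% An increment of 0 leaves the counter unchanged and returns its
    %% current value, so this "read" goes through the write path.
    Current = ets:update_counter(Tab, hits, 0),
    ets:delete(Tab),
    Current.
```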

--
Loïc Hoguin
Erlang Cowboy
Nine Nines
http://ninenines.eu