Does anybody know the most memory-efficient way to count events within the
last XX minutes?
So far I have the following idea: for each counter I have a process. The
counter process has a list of XX items. Each item represents a certain
minute, so I always know where my current counter is. There must also be
a mechanism to remove the last item from the list and add a new one. Summing
the values of all items gives the number of events within the last XX minutes.
That is it.
Maybe there is a ready-made library that does the same in a better way?
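
The list-of-minutes idea described above could be sketched roughly as follows. This is only an illustration under assumptions that are not from the post: the module name minute_window and its functions are hypothetical, a minute is an integer minute number (for example erlang:system_time(second) div 60), and minutes are assumed never to go backwards.

```erlang
-module(minute_window).
-export([new/1, add/3, total/2]).

%% State: {Size, Buckets}, where Buckets is a list of {Minute, Count}
%% with the newest minute first. The list never holds more than Size items.

new(Size) when is_integer(Size), Size > 0 ->
    {Size, []}.

%% Record N events in minute Now, then drop buckets that left the window.
add({Size, [{Now, C} | Rest]}, Now, N) ->
    %% Same minute as the newest bucket: just bump its count.
    {Size, prune([{Now, C + N} | Rest], Now, Size)};
add({Size, Buckets}, Now, N) ->
    %% A new minute has started: open a new bucket at the head.
    {Size, prune([{Now, N} | Buckets], Now, Size)}.

%% Number of events in the last Size minutes, counting minute Now itself.
total({Size, Buckets}, Now) ->
    lists:sum([C || {M, C} <- Buckets, M > Now - Size, M =< Now]).

%% Keep only buckets that are still inside the window ending at Now.
prune(Buckets, Now, Size) ->
    [B || {M, _} = B <- Buckets, M > Now - Size].
```

The state can live in a gen_server; since the window is bounded, the list stays bounded too.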
Use an ETS table with {Counter,{YYYY,MM,DD,HH,MM}} as key, and
ets:update_counter/3,4.
update_counter returns the new counter value, so if it is 1 (or the
increment used) you know that a new minute has been entered, so you can
delete the oldest.
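
A minimal sketch of this suggestion, under several assumptions: the module name ets_minute_counter is hypothetical, the minute is an integer rather than a {YYYY,MM,DD,HH,MM} tuple to keep the arithmetic short, and ets:update_counter/4 (available in OTP 18 and later) is used so that a missing key is created on the first event.

```erlang
-module(ets_minute_counter).
-export([new/0, hit/4, total/4]).

new() ->
    ets:new(minute_counters, [set, public]).

%% Count one event for Counter in minute Now, keeping a window of Size minutes.
hit(Tab, Counter, Now, Size) ->
    Key = {Counter, Now},
    %% update_counter/4 inserts the default {Key, 0} if the key is missing,
    %% then adds 1 to element 2 and returns the new value.
    case ets:update_counter(Tab, Key, {2, 1}, {Key, 0}) of
        1 ->
            %% First event of a new minute: delete the bucket that just
            %% fell out of the window.
            ets:delete(Tab, {Counter, Now - Size}),
            1;
        N ->
            N
    end.

%% Sum the buckets of the last Size minutes, counting minute Now itself.
total(Tab, Counter, Now, Size) ->
    lists:sum([count(Tab, {Counter, M}) || M <- lists:seq(Now - Size + 1, Now)]).

count(Tab, Key) ->
    case ets:lookup(Tab, Key) of
        [{_, C}] -> C;
        [] -> 0
    end.
```

Note that this only deletes one bucket per new minute, so if some minutes pass with no events at all, older buckets can be left behind; total/4 ignores them, but a periodic sweep would be needed to reclaim the memory.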
On Fri, Aug 24, 2012 at 9:16 AM, Max Bourinov <bouri...@gmail.com> wrote:
> Dear Erlangers,
> Does anybody know the most memory-efficient way to count events within the
> last XX minutes?
> So far I have the following idea: for each counter I have a process. The
> counter process has a list of XX items. Each item represents a certain
> minute, so I always know where my current counter is. There must also be
> a mechanism to remove the last item from the list and add a new one. Summing
> the values of all items gives the number of events within the last XX minutes.
> That is it.
> Maybe there is a ready-made library that does the same in a better way?
If you care about, as you say, "certain minute" you can use Ulf's gproc.
If you want "last minute" then the ETS solution below will work well, with
the modification on the Counter to use a ms/us timer like erlang:now().
-mox
On Fri, Aug 24, 2012 at 8:14 AM, Anders Nygren <anders.nyg...@gmail.com>wrote:
> Use an ETS table with {Counter,{YYYY,MM,DD,HH,MM}} as key, and
> ets:update_counter/3,4.
> update_counter returns the new counter value, so if it is 1 (or the
> increment used) you know that a new minute has been entered, so you can
> delete the oldest.
> /Anders
I think I will go with my own implementation. The only thing I cannot
understand is why you suggest using ETS for this. Why is keeping data in the
state not OK? I think under heavy load the state approach will perform
better than ETS. Moreover, I think ETS is overkill for this task.
On Sat, Aug 25, 2012 at 12:05 AM, Mike Oxford <moxf...@gmail.com> wrote:
> If you care about, as you say, "certain minute" you can use Ulf's gproc.
> If you want "last minute" then the ETS solution below will work well, with
> the modification on the Counter to use a ms/us timer like erlang:now().
> -mox
Did you see that I need only 120 minutes? So it will be a list with 121
elements at most. In this case memory won't grow at all and the size of the
state will stay the same.
I agree about copying, but I don't understand why the state will grow. Could
you please explain it?
On Sun, Aug 26, 2012 at 11:22 AM, Max Lapshin <max.laps...@gmail.com> wrote:
> On Sun, Aug 26, 2012 at 10:33 AM, Max Bourinov <bouri...@gmail.com> wrote:
> > Hi guys,
> > Thank you guys for all your replies.
> > I think I will go with my own implementation. The only thing I cannot
> > understand, why you suggest using ETS for this?
> because ETS has the update_counter API
> > Why keeping data in the state is not ok?
> because you will have a giant state, with all the problems that brings
> > I think on heavy load the state approach will perform
> > better than ETS. Moreover, I think ETS is overkill for this task.
> "overkill" here is an emotion, not a measured result.
> Keeping this info in the state will lead to copying a growing amount of memory.
> > p.s. In my case XX minutes won't exceed 120.
> > Best regards,
> > Max
On Sun, Aug 26, 2012 at 11:54 AM, Max Bourinov <bouri...@gmail.com> wrote:
> Hi Max,
> Thank you for your comments and explanation.
> Did you see that I need only 120 minutes? So it will be a list with 121
> elements at most. In this case memory won't grow at all and the size of the
> state will stay the same.
> I agree about copying, but I don't understand why the state will grow. Could
> you please explain it?
> Best regards,
> Max
There often is a habit with quick projects to throw data in ets, since it is easy to access the data as global data. This helps people coming from an imperative programming background. I don't see a good reason in the email thread that shows that ets is the best solution, simply because the amount of data is not clear. ets can be used to limit the memory consumption, but so far, memory consumption was not mentioned as a concern. Having an update_counter function doesn't sound too convincing because programmers are generally capable of a fetch-increment-store routine.
So, the moral of the story is to test and experiment. You could try internal state with a dict as compared to ets, and see which makes sense based on your requirements.
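
As a starting point for such an experiment, a single-process timing sketch might look like the following. The module name counter_bench is hypothetical, and the measurement is deliberately naive: it runs in one process, so it says nothing about behaviour under concurrent access, which is where the two approaches are most likely to differ.

```erlang
-module(counter_bench).
-export([run/1]).

%% Time N increments of one counter kept in a dict versus in an ETS table.
%% Returns {{dict, Microseconds, Count}, {ets, Microseconds, Count}}.
run(N) when is_integer(N), N > 0 ->
    {DictUs, Dict} = timer:tc(fun() -> dict_loop(N, dict:new()) end),
    Tab = ets:new(bench, [set, private]),
    true = ets:insert(Tab, {hits, 0}),
    {EtsUs, ok} = timer:tc(fun() -> ets_loop(N, Tab) end),
    [{hits, EtsCount}] = ets:lookup(Tab, hits),
    true = ets:delete(Tab),
    {{dict, DictUs, dict:fetch(hits, Dict)}, {ets, EtsUs, EtsCount}}.

%% Counter held in a functional dict that is threaded through the loop.
dict_loop(0, D) -> D;
dict_loop(N, D) -> dict_loop(N - 1, dict:update_counter(hits, 1, D)).

%% Counter held in ETS and updated in place.
ets_loop(0, _Tab) -> ok;
ets_loop(N, Tab) ->
    ets:update_counter(Tab, hits, 1),
    ets_loop(N - 1, Tab).
```

Calling counter_bench:run(1000000) a few times gives a rough feel for the numbers; a realistic test would drive the counter from many processes at once.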
On Sun, Aug 26, 2012 at 12:02 PM, Michael Truog <mjtr...@gmail.com> wrote:
> There often is a habit with quick projects to throw data in ets, since it is
> easy to access the data as global data. This helps people coming from an
> imperative programming background. I don't see a good reason in the email
> thread that shows that ets is the best solution,
But there is a good reason. It is performance. For example, in
erlyvideo all major statistics data are collected without going through
gen_server:call: you cannot ask a process to tell you its statistics,
because it is very easy to DoS your server with such requests.
If you put stats into a public ETS table, then the collector will not be
overloaded with requests.
_______________________________________________
erlang-questions mailing list
erlang-questi...@erlang.org
http://erlang.org/mailman/listinfo/erlang-questions
> If you put stats into a public ETS table, then the collector will not be
> overloaded with requests.
This is interesting. I have also been thinking about how to avoid DoS. But I
don't understand how ETS will help here.
Here is what I have. I have only one process that handles client requests. It
really doesn't make sense to have more processes, because the process does
nothing but return a pre-calculated value. If I store the data in ETS, my
process will have to do additional work: extracting the data from ETS.
Or maybe I am missing something? Please point me in the right direction here.
Or maybe in my case it doesn't matter because my case is trivial?
On Sun, Aug 26, 2012 at 12:04 PM, Max Lapshin <max.laps...@gmail.com> wrote:
> But there is a good reason. It is performance. For example, in
> erlyvideo all major statistics data are collected without going through
> gen_server:call: you cannot ask a process to tell you its statistics,
> because it is very easy to DoS your server with such requests.
> If you put stats into a public ETS table, then the collector will not be
> overloaded with requests.
Lately, when working on a project with up to a million concurrent clients
connected, I had to replace a few instances of central servers with public
ETS tables to avoid bottlenecks under heavy concurrent access. So the choice
of data storage may depend on the access patterns of your data, i.e. how
concurrent the reads and writes are.
On Aug 26, 2012 10:22 AM, "Max Bourinov" <bouri...@gmail.com> wrote:
> > If you put stats into a public ETS table, then the collector will not be
> > overloaded with requests.
> This is interesting. I have also been thinking about how to avoid DoS. But I
> don't understand how ETS will help here.
> Here is what I have. I have only one process that handles client requests.
> It really doesn't make sense to have more processes, because the process
> does nothing but return a pre-calculated value. If I store the data in ETS,
> my process will have to do additional work: extracting the data from ETS.
> Or maybe I am missing something? Please point me in the right direction
> here. Or maybe in my case it doesn't matter because my case is trivial?
>> Lately, when working on a project with up to a million concurrent clients
>> connected, I had to replace a few instances of central servers with public
>> ETS tables to avoid bottlenecks under heavy concurrent access. So the choice
>> of data storage may depend on the access patterns of your data, i.e. how
>> concurrent the reads and writes are.
> That is a good point.
> I have an idea how to use ETS in my case too: one process will write to
> ETS, and all the others will read from ETS.
> The question now: will it benefit me or not?
Probably yes, if read_concurrency is enabled and you have enough concurrent
processes reading data. The best way to answer for sure is to implement
both and measure under some high load situation.
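
The one-writer, many-readers arrangement discussed here might be sketched as below. The module name stats_table, the table name stats and the key used in the usage note are assumptions made for illustration. The table is protected, so only the owning process can write, while any process can read it directly without sending a message to the owner.

```erlang
-module(stats_table).
-export([start/0, publish/2, read/1]).

%% Create a named table owned by the calling process. read_concurrency
%% tunes the table for many simultaneous readers.
start() ->
    ets:new(stats, [named_table, set, protected, {read_concurrency, true}]).

%% Called by the owner (the single writer) to publish a fresh value.
publish(Key, Value) ->
    ets:insert(stats, {Key, Value}).

%% Called by any process; reads the table directly, so the writer's
%% mailbox is never involved.
read(Key) ->
    case ets:lookup(stats, Key) of
        [{_, Value}] -> {ok, Value};
        [] -> undefined
    end.
```

For example, the counter process could call stats_table:publish(events_last_120_min, Total) once a minute, and request handlers could call stats_table:read(events_last_120_min).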
On Sun, Aug 26, 2012 at 12:41 PM, Gleb Peregud <glebe...@gmail.com> wrote:
> On Sunday, August 26, 2012, Max Bourinov wrote:
>> That is a good point.
>> I have an idea how to use ETS in my case too: one process will write to
>> ETS, and all the others will read from ETS.
>> The question now: will it benefit me or not?
> Probably yes, if read_concurrency is enabled and you have enough
> concurrent processes reading data. The best way to answer for sure is to
> implement both and measure under some high load situation.
If he's using update_counter then he only needs write_concurrency. update_counter is a write operation that also returns the new value; if you call update_counter with an increment of 0, you are simply reading the value in a write context.
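
A small sketch of that point, with a hypothetical module name counter_read: the table is created with write_concurrency, and an increment of 0 returns the current value without changing it.

```erlang
-module(counter_read).
-export([demo/0]).

demo() ->
    Tab = ets:new(counters, [set, public, {write_concurrency, true}]),
    true = ets:insert(Tab, {events, 0}),
    %% Each call adds the increment and returns the new value.
    1 = ets:update_counter(Tab, events, 1),
    2 = ets:update_counter(Tab, events, 1),
    %% An increment of 0 changes nothing but still returns the value.
    Current = ets:update_counter(Tab, events, 0),
    %% An ordinary lookup sees the same value.
    [{events, Current}] = ets:lookup(Tab, events),
    Current.
```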