[erlang-questions] Guidance to a cache mechanism

60 views
Skip to first unread message

Maruthavanan Subbarayan

unread,
Oct 30, 2012, 7:08:51 AM10/30/12
to erlang-q...@erlang.org
Hi,

I have to develop a caching queue, which also should should after a expiry time.

I may have around 500+ messages which may come in a second. Each message might contain 100-200 bytes.

So I have designed the system like below.

1. Queue would be a gen server.
2. All messages which would come on same second would be stored in state of the gen_server like a record {datetime(),[<list of messages>]}
3. When the time differs, I would insert the above record to ets table and update the state of gen_server.
4. There would be a timer running timer:send_interval which would message timeout the gen_server for every second, when this message is received, then gen_server would delete the ets table according to expiry configured.

I was looking on some guidance to check if the above is fine and with with stand the performance. I am foreseeing maximum expiry would be around 60 minutes.

Thanks,
Marutha

Rapsey

unread,
Oct 30, 2012, 8:01:55 AM10/30/12
to Maruthavanan Subbarayan, erlang-q...@erlang.org
Do not use timer module. Use erlang:send_after/3
In my experience the timer module is unreliable and send_after will always work.


Sergej

_______________________________________________
erlang-questions mailing list
erlang-q...@erlang.org
http://erlang.org/mailman/listinfo/erlang-questions


Paul Peregud

unread,
Oct 30, 2012, 11:14:23 AM10/30/12
to Rapsey, erlang-q...@erlang.org
Guarding writes to cache using gen_server is ok. You may leave want to
make ets table public and use {read_concurrency, true}.

Aside from bad performance of timer module, timer:send_interval has
other bad property. If your cache gen_server's message queue is
overloaded, timer:send_interval will add additional messages to it
thus making the problem worse.

You may also want to store datetime() of oldest entry of your ets
table in gen_server's State and check how what seconds' entries should
be trimmed while processing tick message by comparing oldest
datetime() value with current time.
--
Best regards,
Paul Peregud
+48602112091

Tim Watson

unread,
Oct 30, 2012, 11:29:29 AM10/30/12
to Paul Peregud, erlang-q...@erlang.org
Hi all

On 30 Oct 2012, at 15:14, Paul Peregud wrote:

> Guarding writes to cache using gen_server is ok. You may leave want to
> make ets table public and use {read_concurrency, true}.
>

This *can* be helpful, but individual ets tables used from lots of processes can also become a bottleneck and start throttling the schedulers. Manually breaking up your ets tables into smaller chunks sometimes helps, but can add a lot of complexity to the design too.

> [snip]
>
> On Tue, Oct 30, 2012 at 1:01 PM, Rapsey <rap...@gmail.com> wrote:
>> Do not use timer module. Use erlang:send_after/3
>> In my experience the timer module is unreliable and send_after will always
>> work.
>>

Whilst this is true, you should be aware that for very long timeouts, erlang:send_after/3 explodes if millis is > 2^32-1. If your timeouts are never that long (which they probably aren't) then it's not an issue, but worth baring in mind if you're planning on caching things for days at a time.

signature.asc

Dmitry Kolesnikov

unread,
Oct 30, 2012, 3:21:19 PM10/30/12
to Maruthavanan Subbarayan, erlang-q...@erlang.org
Hi,

You can check my implementation of same issue...

- Dmitry

Richard O'Keefe

unread,
Oct 30, 2012, 6:18:11 PM10/30/12
to Maruthavanan Subbarayan, erlang-q...@erlang.org

On 31/10/2012, at 12:08 AM, Maruthavanan Subbarayan wrote:
- messages contain up to 200 bytes
- "maybe 500+" messages per second
- maximum expiry 60 minutes
for a total of 360,000,000 bytes.

I'm wondering whether you actually need ETS at all.

+ Something gives your program a message.
+ Time tells your program when to delete a message.
? How is your program asked to give a message back?
A message is presumably going to be requested via
some sort of key, which is presumably _not_ going
to be the timestamp when it arrived, so you will
want some kind of index to find the right message, no?

Maruthavanan Subbarayan

unread,
Oct 31, 2012, 1:55:08 AM10/31/12
to Richard O'Keefe, erlang-q...@erlang.org
Hi Richard,

I thought of keeping the time stamp as the key in the ets table and maintain that in state when a timeout event occurs. I hope this would help me to delete the entries in ets table with less burden.

Thanks,
Marutha

> Subject: Re: [erlang-questions] Guidance to a cache mechanism
> From: o...@cs.otago.ac.nz
> Date: Wed, 31 Oct 2012 11:18:11 +1300
> CC: erlang-q...@erlang.org
> To: marutha...@hotmail.com

Jesper Louis Andersen

unread,
Oct 31, 2012, 8:03:23 AM10/31/12
to Maruthavanan Subbarayan, erlang-q...@erlang.org
On Oct 31, 2012, at 6:55 AM, Maruthavanan Subbarayan <marutha...@hotmail.com> wrote:

> Hi Richard,
>
> I thought of keeping the time stamp as the key in the ets table and maintain that in state when a timeout event occurs. I hope this would help me to delete the entries in ets table with less burden.

Well, ETS tables have two possible configurations that may be of interest to you: by default they are hash-tables, but you can map them as ordered sets. So if you store them as {Key, Payload} you are able to use ets:first/1 to grab the next guy in the queue. And you can quickly walk the table and time out entries. See for instance ets:select_delete/2. You may want, however, to discuss if you want a FIFO or a LIFO order here. FIFO often have the problem that if the queue size goes up, then everybody waits. LIFO stacks have the advantage that long waiters will wait longer.

Jesper Louis Andersen
Erlang Solutions Ltd., Copenhagen

Richard O'Keefe

unread,
Oct 31, 2012, 8:27:53 PM10/31/12
to Maruthavanan Subbarayan, erlang-q...@erlang.org

On 31/10/2012, at 6:55 PM, Maruthavanan Subbarayan wrote:
> I thought of keeping the time stamp as the key in the ets table and maintain that in state when a timeout event occurs. I hope this would help me to delete the entries in ets table with less burden.

Like I said, "Time tells your program when to delete a message."
But the question was "How is your program asked to give a message back?"
Because if your program is never asked to give a message back, why keep
it at all?

Suppose you are running Brekekek.example.com, a site where users
exchange 'croaks' (which must be at most 140 characters of Attic
Greek). You receive

{add,User,Croak}

Here is User's latest Croak, which you add to
the cache with erlang:now() -- ignoring the
microsecond part -- as its timestamp.

{ask,User,Sender,Time}

The Sender process wants to be told all of
User's Croaks since Time.

messages, amongst others. In order to delete Croaks easily,
you want to store them keyed by time. But in order to answer
'ask' requests, you want to store them keyed by User. And
ETS allows a table only one key. Oh dear...

Think of it in relational data base terms.

Cache(User, Time, Croak)
fd User x Time -> Croak
hash bag index on User
b-tree index on Time

{add,User,Croak} => into Cache insert (User, now(), Croak)
{ask,User,Sender,Time} =>
Sender ! select Croak from Cache
where Cache.User = User and Cache.Time >= Time
tick => delete Cache where Cache.Time < now() - Expiry_Time

Basically what I am saying is that if your cache is to be any use,
there _have_ to be operations other than adding and deleting, and
you need to consider _all_ the operations.

dmitry kolesnikov

unread,
Nov 1, 2012, 2:47:03 AM11/1/12
to Richard O'Keefe, erlang-q...@erlang.org
Hello Richard,

Somehow, I have missed your point here. It is obvious that two indexes
are required: hash and b-tree. Unless we have a double linked list but
we do not :-(

we can implement those indexes as ETS tables set and ordered_set, isn't IT?

Best Regards,
Dmitry >-|-|-*>

Ian

unread,
Nov 1, 2012, 12:16:40 PM11/1/12
to erlang-q...@erlang.org
Hi,

I'm not quite clear on what you want to do, but it appears that you want to discard messages if they are "too old", for some definition of too old, and you are proposing a combination of ETS and gen-server to do this.

Why not simply discard messages that are too old when you read them, and read the next? That, coupled with having no synchronous send/reads to stall your process, and you are golden.

Or have I missed something?

Ian

Richard O'Keefe

unread,
Nov 1, 2012, 4:50:30 PM11/1/12
to dmitry kolesnikov, erlang-q...@erlang.org

On 1/11/2012, at 7:47 PM, dmitry kolesnikov wrote:

> Hello Richard,
>
> Somehow, I have missed your point here. It is obvious that two indexes
> are required: hash and b-tree. Unless we have a double linked list but
> we do not :-(

Let me quote the text I was replying to:

>>
>> On 31/10/2012, at 6:55 PM, Maruthavanan Subbarayan wrote:
>>> I thought of keeping the time stamp as the key in the ets table and maintain that in state when a timeout event occurs. I hope this would help me to delete the entries in ets table with less burden.

>
*THE* key. That means one. *THE* table. That means one.

An ETS table can have only one key, only one index.

Now, let's think about this.
Sticking with Brekekek.example.com, suppose Xanthias sent a Croak
that is due to expire. And suppose the structure is
T1: {**Msg_Id, Time, User, Croak}
T2: {**Time, [Msg_Id]}
T3: {**User, [Msg_Id]}.
If Dionysus wants to see Croaks from Xanthias, he looks in T3,
and finds a list of Msg_Ids, which he then looks up in T1.

If Father Time wants to remove some messages, he looks in T2,
and finds a list of Msg_Ids, which he then removes from T1.
But what happens to T3? T3 is not indexed by Time or Msg_Id.

One approach, which has been mentioned already, and I'm sorry for
forgetting by whom, is to make the lookup process a bit more
complicated.

When Dionysus wants to see Croaks from Xanthias, he looks in T3,
and finds a list of Msg_Ids. But now he cannot be sure that those
Msg_Ids still exist. So *while holding a lock on T3['Xanthias']*,
he picks up the messages from T1, and makes a new list of Msg_Ids,
the ones that still exist and have not expired. Dionysus then
stores {'Xanthias', Filtered_Msg_Ids} back in T3.

Dealing with all the details gets a bit painful. And when you've
done it, you have the problem that T3 can grow without limit: an
entry in T3 shrinks only when someone looks at it. Suppose, for
example, that Pluto Croaks frequently, but has no followers. His
messages are regularly purged from T1, but his entry in T3 keeps
on growing.

ETS certainly doesn't make any of this easy.

Mnesia is another matter.
Using (truncated) Time as primary key of a bag-structured table,
mnesia:add_table_index/2 to add an index on User,
mnesia:index_read/3 to find the Croaks of a User, and
mnesia:delete/3 to delete expired Croaks,
it all fits beautifully.

Dmitry Belyaev

unread,
Nov 1, 2012, 4:59:38 PM11/1/12
to hobs...@gmail.com, erlang-q...@erlang.org
Garbage may fill the memory if consumer checks cache too seldom.

--
Dmitry Belyaev
Reply all
Reply to author
Forward
0 new messages