Paging through Comments

coder

Mar 31, 2009, 5:58:49 AM
to project-voldemort
Hi everyone,

I need to implement commenting functionality, where every user can
view all comments starting from the latest one. I guess that, without
any shared state, one simple way to implement this in a key-value
store would be to form a key consisting of timestamp + userid +
commentid.

Now I need users to be able to start browsing from "the latest" by
forming a key - however, a user can only form the timestamp part of
the key, not the "userid + commentid" part. I would think VM stores
ids alphabetically; my question is, is there a way I can get at keys
using a pattern, say '20090331*', grab the first item in that list,
and start scrolling?
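
To make it concrete, here is the kind of key I have in mind (just a
sketch in Java; all the values are made up):

// composing a sortable comment key; values are hypothetical
String timestamp = "20090331-1200"; // time, down to the minute
String userId = "user1234";
int seq = 1;                        // per-user sequence number

// compared as strings, keys of this form sort chronologically:
// "20090331-1200-user1234-seq1"
String key = timestamp + "-" + userId + "-seq" + seq;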

If there are alternative ways to implement this sort of functionality,
I'd be happy to hear them.

Also, I found this on highscalability.com:

http://tinyurl.com/dm246t

coder

Mar 31, 2009, 6:15:18 AM
to project-voldemort
Addition:

Forming an id with timestamp plus other vars could of course be
implemented in client code; my worry was that, in order to scroll, I
would have to maintain another list keyed by just a timestamp. But
then this list would be a point of contention during writes: many
clients would form the same timestamp-based ID using their local time,
and they would all try to insert at the same time, each updating that
timestamp's list and causing waits. Commenting is usually a feature
that demands massive inserts; that's why I shied away from any
client-maintained list. If VM naturally orders by key alphabetically,
and alphabetically adjacent keys end up on the same shard (most
important), then it would be great to get at that data somehow.

I can contribute code here if required.

Dain Sundstrom

Mar 31, 2009, 12:35:49 PM
to project-...@googlegroups.com
Voldemort is a key-value store and does not have operations for
listing keys. This is a common problem in key-value stores, so here is
how the memcached FAQ explains the solution:

Storing lists of data
Storing lists of data into memcached can mean either storing a single
item with a serialized array, or trying to manipulate a huge
"collection" of data by adding, removing items without operating on
the whole set. Both should be possible.

One thing to keep in mind is memcached's 1 megabyte limit on item
size, so storing the whole collection (ids, data) into memcached might
not be the best idea.

Steven Grimm explains a better approach on the mailing list: http://lists.danga.com/pipermail/memcached/2007-July/004578.html

Chris Hondl and Paul Stacey detail alternative approaches to the same
idea: http://lists.danga.com/pipermail/memcached/2007-July/004581.html

A combination of both would make for very scalable lists. IDs within
a range are stored in separate keys, and the data is strewn about
using individual keys.
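
As a rough sketch of that combination (hypothetical names throughout;
"store" stands in for any simple get/put client, not the actual
Voldemort API):

import java.util.ArrayList;
import java.util.List;

// hypothetical minimal get/put client
interface KeyValueStore {
    String get(String key);
    void put(String key, String value);
}

class CommentBuckets {
    private final KeyValueStore store;

    CommentBuckets(KeyValueStore store) { this.store = store; }

    void addComment(String bucket, String commentId, String body) {
        // data is strewn about using individual keys
        store.put("comment:" + commentId, body);
        // ids for a time range live under a single index key
        // (note: this read-modify-write is the contention point)
        String ids = store.get("ids:" + bucket);
        store.put("ids:" + bucket,
                  ids == null ? commentId : ids + "," + commentId);
    }

    List<String> readBucket(String bucket) {
        List<String> comments = new ArrayList<String>();
        String ids = store.get("ids:" + bucket);
        if (ids == null) return comments;
        for (String id : ids.split(",")) {
            comments.add(store.get("comment:" + id)); // one lookup per id
        }
        return comments;
    }
}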


-dain

coder

Mar 31, 2009, 1:27:41 PM
to project-voldemort
Thanks for the pointers - what they describe is the same method I
considered before. Let's say app servers app1, app2, app3 all get
timestamp 20090331-1200, and this is the new java.util.List id for
adding comments into. And we are storing comment IDs, not the comments
themselves, for performance, as the post you shared suggests. Now
wouldn't all app servers be rushing to add comments to this single
list? I think the benefit of what is described in
http://tinyurl.com/dm246t was that clients do not add anything to a
list; we simply create a key and put(). For example,
20090331-1200-user1234-seq1 is one comment, 20090331-1200-user1234-seq2
is the next one (by the same user), and so on. If the datastore keeps
these keys sorted on disk, I imagined it would be possible to say
get("20090331-1200*"), get an Iterator perhaps, and start iterating.
This way we do not have to maintain a list during inserts.

This is my first experience with key-value stores, btw; I apologize
if these are very basic issues. I imagined an insert load for
comments of 1 mil/day, divided it by 24*60*60, and came up with ~11
inserts/sec, and thought that would be a chokepoint for my commenting
feature with a shared list, even if that list is batched up in 500s,
or timestamped on its key down to the minute or something.

Dain Sundstrom

Mar 31, 2009, 5:13:34 PM
to project-...@googlegroups.com
On Mar 31, 2009, at 10:27 AM, coder wrote:

> Thanks for the pointers - what they describe is the same method I
> considered before. Let's say app servers app1, app2, app3 all get
> timestamp 20090331-1200, and this is the new java.util.List id for
> adding comments into. And we are storing comment IDs, not the comments
> themselves, for performance, as the post you shared suggests. Now
> wouldn't all app servers be rushing to add comments to this single
> list?

Yes, everyone adding to the comment stream would be appending to the
same list, which can result in a lot of write/write conflicts, which
may mean having to resubmit the write. FWIU, you can reduce the
number of nodes written to increase concurrency at the cost of more
read repairs.

I suggest you write the easiest solution and measure the
performance. If it doesn't meet your performance goals, tweak the
settings. The Amazon Dynamo paper
(http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html) has
a lot of information on tuning a store like Voldemort.

> I think the benefit of what is described in
> http://tinyurl.com/dm246t was that clients do not add anything to a
> list; we simply create a key and put(). For example,
> 20090331-1200-user1234-seq1 is one comment, 20090331-1200-user1234-seq2
> is the next one (by the same user), and so on. If the datastore keeps
> these keys sorted on disk, I imagined it would be possible to say
> get("20090331-1200*"), get an Iterator perhaps, and start iterating.
> This way we do not have to maintain a list during inserts.

FWIU, the keys are hashed (randomized) to spread them across the
cluster, so load is evenly distributed.
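
A toy illustration of why a pattern like "20090331-1200*" doesn't map
onto that (made-up node count; the real partitioning uses consistent
hashing, but the effect on key locality is the same):

// two adjacent-looking keys usually land on unrelated nodes
int nodes = 4; // hypothetical cluster size
String k1 = "20090331-1200-user1234-seq1";
String k2 = "20090331-1200-user1234-seq2";
System.out.println(Math.abs(k1.hashCode() % nodes)); // some node
System.out.println(Math.abs(k2.hashCode() % nodes)); // likely a different one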

> This is my first experience with key-value stores, btw; I apologize
> if these are very basic issues. I imagined an insert load for
> comments of 1 mil/day, divided it by 24*60*60, and came up with ~11
> inserts/sec, and thought that would be a chokepoint for my commenting
> feature with a shared list, even if that list is batched up in 500s,
> or timestamped on its key down to the minute or something.

Is that 1 mil/day per comment stream? My guess is that comments clump
into hot topics that get nailed for a period of time and then the
users move on to something else. If so, I'd try to measure the
peak (99.9th percentile) comment traffic for a single stream, and make
sure my key-value store can handle that load. The rest of the load
(non-hot topics) should be easy to handle because you'll have few
write/write conflicts.

-dain

coder

Mar 31, 2009, 5:36:43 PM
to project-voldemort
>
> I suggest you write the easiest solution and measure the
> performance.

The easiest solution actually is dumping the keys into a local MySQL
(which I will use for fulltext search anyway) and doing select ...
limit. :) I am trying to stay away from that as much as possible.

> FWIU, the keys are hashed (randomized) to spread them across the  
> cluster, so load is evenly distributed.

I see; if even 20-30 alphabetically adjacent keys do not happen to
fall on the same shard, what I am talking about would not be an
improvement. If every batch of 20 (potentially) means 20 hits on 20
different databases, then we are no better off than with a single
database.

>
> Is that 1 mil/day/comment stream?  My guess is that comments clump  
> into hot topics that get nailed for a period of time and then the  
> users move on to something else.

Good point; not all comments are new topics - most will be replies to
existing ones. In that case, each topic has its own list, which will
see fewer conflicting puts - not everyone is responding to the same
thing.

Adam Rosien

Mar 31, 2009, 7:30:08 PM
to project-...@googlegroups.com
Don't forget you can insert the incoming comments into a queue and
control the throughput of readers of the queue who write to the store.
That way you can smooth out your bursts to help avoid write conflicts.
You can even partition your queue readers by comment thread, or
whatever, to serialize writes to a given key.
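
A minimal sketch of that idea (hypothetical names; the point is that
each partition has exactly one writer thread, so writes to any given
comment thread's key are serialized):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class Comment { String threadId; String body; }

// hypothetical store-facing writer
interface CommentStore { void append(Comment c); }

class PartitionedCommentWriter {
    private final BlockingQueue<Comment>[] queues;

    @SuppressWarnings("unchecked")
    PartitionedCommentWriter(int partitions, final CommentStore store) {
        queues = new BlockingQueue[partitions];
        for (int i = 0; i < partitions; i++) {
            queues[i] = new ArrayBlockingQueue<Comment>(10000);
            final BlockingQueue<Comment> q = queues[i];
            new Thread(new Runnable() {
                public void run() {
                    try {
                        while (true) {
                            store.append(q.take()); // sole writer for this partition
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            }).start();
        }
    }

    void submit(Comment c) throws InterruptedException {
        // the same comment thread always maps to the same queue/writer
        int p = Math.abs(c.threadId.hashCode() % queues.length);
        queues[p].put(c);
    }
}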

Just 'cause you got a super DHT don't mean you have to abuse it :)

.. Adam

coder

Mar 31, 2009, 8:26:38 PM
to project-voldemort

On Apr 1, 2:30 am, Adam Rosien <a...@rosien.net> wrote:
> Don't forget you can insert the incoming comments into a queue and
> control the throughput of readers of the queue who write to the store.

Introducing queues would solve the problem for the one-app-server
case, definitely. With more than one app server, the queues from
multiple app servers would have to be drained in a way that they do
not conflict with each other, maybe scheduled to wake up so that
overlaps are less likely.

> Just 'cause you got a super DHT don't mean you have to abuse it :)

:) I do admit I was thinking of an extreme case: massive inserts into
the same list, massive reads from the same list... I like going over
such cases, though, so I know the strengths and limitations before I
start major coding.

Geir Magnusson

Mar 31, 2009, 8:30:41 PM
to project-...@googlegroups.com

On Mar 31, 2009, at 8:26 PM, coder wrote:

>
>
> On Apr 1, 2:30 am, Adam Rosien <a...@rosien.net> wrote:
>> Don't forget you can insert the incoming comments into a queue and
>> control the throughput of readers of the queue who write to the
>> store.
>
> Introducing queues would solve the problem for the one-app-server
> case, definitely. With more than one app server, the queues from
> multiple app servers would have to be drained in a way that they do
> not conflict with each other, maybe scheduled to wake up so that
> overlaps are less likely.

Why? Aren't you going to be combining the comments? Having a queue
collecting all the comments means that you can have one process
updating the keys, eliminating the write contention.

geir

Jay Kreps

Mar 31, 2009, 8:45:37 PM
to project-...@googlegroups.com
Makes sense. We are doing high-contention inserts into large lists at
LinkedIn, but it is non-primary data. The optimistic locking support
has worked for us, but obviously that will only work up to a certain
level of contention. I believe that the logic in the optimistic
locking could be improved, though, to include some kind of backoff
that would help avoid repeated conflicts in this kind of situation.

Currently it is something like

int count = 0;
do {
    try {
        tryUpdate(); // attempt the read-modify-write
        break;       // success, stop retrying
    } catch (OptimisticLockingFailure e) {
        count++;     // conflict, try again
    }
} while (count < MAX_TRIES);

But perhaps it should include a random backoff to avoid repeated collisions.
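
For example, something like this (a sketch, with the same hypothetical
names as above):

int count = 0;
java.util.Random random = new java.util.Random();
do {
    try {
        tryUpdate();
        break;
    } catch (OptimisticLockingFailure e) {
        count++;
        try {
            // sleep a random 0..(2^count * 10) ms so colliding
            // writers are unlikely to retry in lockstep
            Thread.sleep(random.nextInt((1 << count) * 10));
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            break;
        }
    }
} while (count < MAX_TRIES);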

-Jay

coder

Apr 1, 2009, 5:56:54 AM
to project-voldemort
I just started thinking about a solution using a variation of the
"sharded counter" idea presented in http://tinyurl.com/dm246t. I could
have comments1, comments2, .., commentsN - all lists with names well
known to the app, and different enough that Voldemort can put these
list objects on different shards. A commenter (app server) chooses a
list name at random, say comments4, and writes into that list. This
cuts the possible contention by a factor of N. In order to paginate
through the list, app servers read comments1 through commentsN, merge
the comments together by sorting on timestamp, and present that to the
user. Older comments in each list can be gradually moved to archive
lists whose keys are simple timestamps down to the month. On these
archive lists there will be no contention because no one is writing
into them. I think this could work.
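
Roughly, what I have in mind (a sketch; ListStore stands in for
whatever client API this ends up using):

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

class Comment { long timestamp; String body; }

// hypothetical list-valued get/put client
interface ListStore {
    List<Comment> get(String key);
    void put(String key, List<Comment> value);
}

class ShardedComments {
    private static final int N = 8; // number of list shards
    private final ListStore store;
    private final Random random = new Random();

    ShardedComments(ListStore store) { this.store = store; }

    void add(Comment c) {
        // contention is spread over N keys instead of one
        String key = "comments" + (random.nextInt(N) + 1);
        List<Comment> list = store.get(key);
        if (list == null) list = new ArrayList<Comment>();
        list.add(c);
        store.put(key, list);
    }

    List<Comment> readAll() {
        List<Comment> merged = new ArrayList<Comment>();
        for (int i = 1; i <= N; i++) {
            List<Comment> shard = store.get("comments" + i);
            if (shard != null) merged.addAll(shard);
        }
        // merge the shards together by timestamp for presentation
        Collections.sort(merged, new Comparator<Comment>() {
            public int compare(Comment a, Comment b) {
                return a.timestamp < b.timestamp ? -1
                     : (a.timestamp == b.timestamp ? 0 : 1);
            }
        });
        return merged;
    }
}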

What I got from that highscalability article was that for DHTs,
optimize for low write contention, optimize wide, and know reads are
cheap.

Jay Kreps

Apr 1, 2009, 12:12:53 PM
to project-...@googlegroups.com
Yeah, I think it would be really awesome to have a library that
implements some of these ideas along with a merge strategy to support
conflict resolution for them.
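
For instance, the merge strategy for two conflicting versions of a
comment list could be as simple as a duplicate-free union (a sketch;
wiring it into the store's conflict-resolution hook is left out):

import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class CommentListMerger {
    // union of both branches, preserving order and dropping
    // duplicates, so no comment is lost on a write/write conflict
    static List<String> merge(List<String> a, List<String> b) {
        Set<String> merged = new LinkedHashSet<String>(a);
        merged.addAll(b);
        return new ArrayList<String>(merged);
    }
}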

-Jay

coder

Apr 1, 2009, 12:32:20 PM
to project-voldemort
I agree. So let me try to write this comment maintenance code with
"librarization" in mind (hey, people talk about Mavenification, so why
not :)), and share it here once it's finished. If I am not mistaken
this requires no VM-internal code change, so it can develop on its
own.

With that, I believe all my major architectural issues are resolved,
and I can move on to coding. Thanks to all who answered my questions.
I am waiting for the getSince() addition; if any help is needed there
please let me know.

Regards,
