> Thanks for the pointers - what they are describing is the same method
> I considered before. Let's say app1, app2, app3 all get
> timestamp 20090331-1200 and this is the new java.util.List ID for
> adding comments into. And also we are storing comment IDs, not
> comments themselves for performance, as it is said in the post you
> shared. Now wouldn't all app servers be rushing to add comments
> to this single list?
Yes, everyone adding to the comment stream would be appending to the
same list, which can cause a lot of write/write conflicts, and each
conflict may force the client to resubmit its write. FWIU, you can
reduce the number of nodes a write must reach to increase concurrency,
at the cost of more read repairs.
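To make the "resubmit the write" part concrete, here is a minimal sketch of a read-modify-write loop that retries on a version conflict. It is written against a made-up in-memory store, not the Voldemort client API: VersionedStore, VersionConflict and append_comment_id are all hypothetical names, and the integer version counter is an assumed simplification of whatever versioning the real store uses.

```python
import threading


class VersionConflict(Exception):
    """Raised when a put is based on a stale version (hypothetical)."""


class VersionedStore:
    """Hypothetical in-memory stand-in for a versioned key-value store."""

    def __init__(self):
        self._data = {}  # key -> (version, value)
        self._lock = threading.Lock()

    def get(self, key):
        # Missing keys read as version 0 with an empty list.
        with self._lock:
            return self._data.get(key, (0, []))

    def put(self, key, expected_version, value):
        # Reject the write if someone else updated the key since our read.
        with self._lock:
            current_version, _ = self._data.get(key, (0, []))
            if current_version != expected_version:
                raise VersionConflict(key)
            self._data[key] = (current_version + 1, value)


def append_comment_id(store, list_key, comment_id, max_retries=10):
    """Append comment_id to the list at list_key; return attempts used."""
    for attempt in range(1, max_retries + 1):
        version, ids = store.get(list_key)
        try:
            store.put(list_key, version, ids + [comment_id])
            return attempt
        except VersionConflict:
            continue  # another writer got in first: re-read and resubmit
    raise RuntimeError("gave up after %d attempts" % max_retries)
```

The more writers share one list key, the more often the loop goes around again, which is why a single hot list hurts.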
I suggest you write the easiest solution, and measure the
performance. If it doesn't meet your performance goals, tweak the
settings. The Amazon Dynamo paper
(http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html)
has a lot of information on tuning a store like Voldemort.
> I think the benefit of what is described in
> http://tinyurl.com/dm246t was that clients do not add anything to a
> list, we simply create a key and put(). For example 20090331-1200-
> user1234-seq1 is one comment, 20090331-1200-user1234-seq2 is the next
> one (by the same user) and so on. If the datastore has these keys on
> disk as sorted, I imagined it would be possible to say get
> ("20090331-1200*") and get an Iterator perhaps, and start iterating..
> This way during inserts we do not have to maintain a list.
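FWIU, the keys are hashed (randomized) to spread them across the
cluster, so load is evenly distributed. The flip side is that keys
sharing a prefix are not stored next to each other, so I don't think
a prefix query like get("20090331-1200*") is available.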
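A rough illustration of the hashing point, assuming an MD5-based hash and 8 partitions. Both are arbitrary choices for the sketch and are not taken from Voldemort, and partition_for is a hypothetical helper:

```python
import hashlib


def partition_for(key, num_partitions=8):
    """Map a key to a partition by hashing it (illustrative scheme only)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Use the first 4 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:4], "big") % num_partitions


# Keys that sort next to each other need not land on the same partition.
for seq in range(1, 6):
    key = "20090331-1200-user1234-seq%d" % seq
    print(key, "->", partition_for(key))
```

With a placement like this, finding every key that starts with "20090331-1200" would mean asking every partition, which is why sorted iteration over a prefix doesn't come for free.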
> This is my first experience with key-value stores btw, I apologize if
> these are very basic issues and problems. I imagined an insert load
> for comments at 1 mil/day, divided it by 24*60*60 and came up with ~11
> inserts/sec, and thought it would be a chokepoint for my commenting
> feature with a shared list, even if that list is batched up in 500s,
> or timestamped on its key down to the minute or something.
Is that 1 mil/day/comment stream? My guess is that comments clump
into hot topics that get nailed for a period of time and then the
users move on to something else. If so, I'd try to measure the
peakish (99.9%) comment traffic for a single stream, and make sure my
key-value store can handle that load. The rest of the load (non-hot
topics) should be easy to handle because you'll have few write/write
conflicts.
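For the numbers, a small sketch of the difference between the daily average and a per-stream peak. I'm reading "99.9%" as the 99.9th percentile of per-second insert counts, computed with the nearest-rank method; that reading, and the helper names average_rate and percentile_rate, are my assumptions.

```python
import math
from collections import Counter

SECONDS_PER_DAY = 24 * 60 * 60


def average_rate(events_per_day):
    """Average events per second if load were perfectly even."""
    return events_per_day / SECONDS_PER_DAY


def percentile_rate(timestamps, start, end, pct=99.9):
    """Per-second insert count at the given percentile over [start, end).

    timestamps are insert times in seconds for ONE comment stream;
    seconds with no inserts count as zero. Nearest-rank percentile.
    """
    counts = Counter(int(t) for t in timestamps)
    per_second = sorted(counts.get(s, 0) for s in range(start, end))
    rank = max(1, math.ceil(pct / 100.0 * len(per_second)))
    return per_second[rank - 1]


print(round(average_rate(1000000), 2))  # about 11.57 inserts/sec overall
```

The average says little about a hot stream: feed percentile_rate the insert timestamps for one stream over a busy window, and compare that figure with what the store can sustain on a single key.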
-dain