Stats like count, group by, datewise, etc

Mani

Nov 24, 2010, 4:48:21 AM
to project-voldemort
Hello Everyone,

I am using Voldemort to store the status of the messages. It can be
success, failure, etc.

Here is the store structure:

1 => {"message" => "hello", "status" => "success"}
2 => {"message" => "test", "status" => "pending"}
3 => {"message" => "great", "status" => "failure"}

Many tasks/actions, such as retries, are triggered based on the status.
In addition to these tasks/actions, I would like to show a summary in
the user interface/dashboard displaying the success count, failure
count, pending count, messages by date, etc.

Basically, the requirement is to show stats like those described above.
Though Voldemort is not designed to provide these kinds of stats, I just
want to know how Voldemort users are doing this kind of group by and
count operation on Voldemort stores.

Thanks,
Mani

Mani

Nov 29, 2010, 5:50:22 AM
to project-voldemort
Hi,

From LinkedIn's presentations, I came to know that features like People
You May Know, search, etc. are powered by Voldemort.

In the case of an RDBMS, to provide paginated results, we use the LIMIT
and OFFSET clauses to retrieve only the required results.

@LinkedIn Engineers,

Did you use Voldemort stores directly to display the data (for example,
the total count of people you may know, the total count of search
results, etc.) on those web pages? Or was some other intermediate
storage used to build those pages?


Thanks,
Mani

Mani

Dec 3, 2010, 1:57:22 AM
to project-voldemort
Can anyone answer this?

Suggestions, advice, and answers from community experts really help and
increase the confidence of new users like me, especially when trying out
this project carries some risk (with a strong belief that the community
will help when problems come up).

Thanks,
Mani

Alex Feinberg

Dec 3, 2010, 11:56:15 PM
to project-...@googlegroups.com, cricc...@gmail.com
Search isn't powered by Voldemort (although Voldemort is used in parts
of the search systems e.g., to track appearances in people search): a
search system like Lucene is more appropriate for the full text search
case. You can find more information about LinkedIn search here:

http://thenoisychannel.com/2010/01/31/linkedin-search-a-look-beneath-the-hood/

We do use Voldemort directly (i.e., the data isn't extracted and
processed separately) for People You May Know, Who Viewed My Profile,
and other features. People You May Know is pre-computed in Hadoop and
then pushed out to a read-only Voldemort cluster (using the read-only
storage engine).

"Who Viewed My Profile" (another site feature that uses aggregation
and counting) is powered by read-write Voldemort. Unfortunately, I am
not sure exactly how much information I am allowed to provide about the
workings of this feature (how much is public and how much isn't), but
I've CC'd one of the engineers familiar with this feature to see if he
can point you to some public information on this.

- Alex

Manikandan R

Dec 4, 2010, 7:21:02 AM
to project-...@googlegroups.com
Alex,

Thanks a lot for your quick reply.

In my case, I need to provide the delivery status of messages in real time, as and when messages go out from our platform. Let me try using Hadoop and a read-only store and see...

Thanks,
Mani

Alex Feinberg

Dec 4, 2010, 7:25:48 PM
to project-...@googlegroups.com
I think if you're looking for real-time updates, then Hadoop/read-only
stores might not be the best bet; read-write stores may be a better
fit. That said, a combination of read-write stores for some data and
read-only stores for other data may work.

A simple suggestion may be to have stores mapping:

* Time sent -> message ids (e.g., a list of message ids sent between
0830 and 0900 on a specific day)
* Time received -> message ids
* Message id -> status object (when was it sent, when was it received)
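
The three mappings above could be sketched roughly as follows. This is only an illustration: plain Python dicts stand in for the Voldemort stores, the 30-minute bucket width is taken from the 0830-0900 example, and every function and field name here is hypothetical rather than part of any Voldemort API.

```python
from collections import defaultdict
from datetime import datetime, timedelta

BUCKET_MINUTES = 30  # assumed bucket width, matching the 0830-0900 example
BUCKET = timedelta(minutes=BUCKET_MINUTES)

# Plain dicts stand in for three key-value stores; a real client would
# replace these with get/put calls against the actual stores.
sent_index = defaultdict(list)      # bucket start -> [message ids]
received_index = defaultdict(list)  # bucket start -> [message ids]
status_store = {}                   # message id -> status object


def bucket_start(ts: datetime) -> datetime:
    """Round a timestamp down to the start of its bucket."""
    minute = (ts.minute // BUCKET_MINUTES) * BUCKET_MINUTES
    return ts.replace(minute=minute, second=0, microsecond=0)


def record_sent(msg_id, sent_at):
    """Index the message by send time and create its status object."""
    sent_index[bucket_start(sent_at)].append(msg_id)
    status_store[msg_id] = {"status": "pending", "sent": sent_at, "received": None}


def record_received(msg_id, received_at, status="success"):
    """Index the message by receive time and update its status object."""
    received_index[bucket_start(received_at)].append(msg_id)
    entry = status_store[msg_id]
    entry["status"] = status
    entry["received"] = received_at


def status_counts(start, end):
    """Count statuses of messages sent in the buckets from start up to end."""
    counts = defaultdict(int)
    bucket = bucket_start(start)
    while bucket < end:
        for msg_id in sent_index.get(bucket, []):
            counts[status_store[msg_id]["status"]] += 1
        bucket += BUCKET
    return dict(counts)
```

A dashboard query for one morning would then be something like `status_counts(datetime(2010, 12, 4, 8, 30), datetime(2010, 12, 4, 9, 0))`. Note that in a real deployment each list append is a read-modify-write against the store, so concurrent writers would need some form of conflict handling.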

You may want to look at papers related to OLAP cubing:
http://en.wikipedia.org/wiki/OLAP_cube
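
As a very rough illustration of the cubing idea (not a description of how Avatara or any LinkedIn system works), counts can be pre-aggregated per (day, status) cell and adjusted on every status change, so the dashboard never has to scan all messages. The sketch below uses an in-memory `Counter` as a stand-in for a read-write store; `set_status` and `count` are hypothetical names.

```python
from collections import Counter
from datetime import date

# Hypothetical counter store: (day, status) -> count. A Counter stands in
# for a read-write key-value store.
cube = Counter()
current = {}  # message id -> (day, status), needed to undo the old cell


def set_status(msg_id, day, status):
    """Move a message into a new (day, status) cell, keeping totals consistent."""
    old = current.get(msg_id)
    if old is not None:
        cube[old] -= 1  # leave the previous cell
    current[msg_id] = (day, status)
    cube[(day, status)] += 1


def count(day=None, status=None):
    """Sum the cells matching the given dimensions; None means 'all'."""
    return sum(
        n for (d, s), n in cube.items()
        if (day is None or d == day) and (status is None or s == status)
    )
```

With this layout, `count(status="pending")` gives the pending total across all days, and `count(day=date(2010, 12, 4))` gives all messages for one day regardless of status.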

Here's a public presentation on Avatara, our scalable OLAP layer built
on top of Hadoop and Voldemort (both read-write and read-only):

http://sna-projects.com/sna/images/avatara-sam-sig.ppt

- Alex

Manikandan R

Dec 6, 2010, 4:26:48 AM
to project-...@googlegroups.com
Alex,

Avatara seems to be an exact fit! Thanks for sharing.

I don't see much information about Avatara on the internet. Is it open sourced? Where can I see the API doc?

Thanks again for your help.

Thanks,
Mani