How to run voldemort client on Spark worker side?


xi...@tune.com

Feb 12, 2015, 5:17:39 PM
to project-...@googlegroups.com
Hi,

We are exploring a solution that uses a Spark job to process streaming data and query a Voldemort DB in the flow. This requires running a Voldemort client on each Spark worker. We may read from or write to Voldemort for each log record, and it is obviously too heavy to launch a Voldemort client just to read/write one <key,value> pair for each row of data.

What is the correct way to do this? Is it possible to keep a persistent client on the worker side?

Thanks,

Xinyu

Felix GV

Feb 12, 2015, 5:26:47 PM
to project-...@googlegroups.com
You should aim to have one StoreClientFactory per process.

The factory provides client instances which can speak to any number of stores. The thread pool, connection pool, and other shared resources live at the factory level, not the client level.

All configuration and tuning parameters are also at the factory level. If you need to support several use cases with different configuration needs within the same process, then it makes sense to have one factory per use case, each configured differently.

You should not be too concerned with client re-use, as long as you re-use the same factory.
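
For illustration, a minimal sketch of that pattern with the socket-based factory (the bootstrap URL and store name below are placeholders, not values from this thread):

import voldemort.client.ClientConfig;
import voldemort.client.SocketStoreClientFactory;
import voldemort.client.StoreClient;
import voldemort.client.StoreClientFactory;

// One factory per process: it owns the thread pool, connection pool, and
// other shared resources.
StoreClientFactory factory = new SocketStoreClientFactory(
        new ClientConfig().setBootstrapUrls("tcp://voldemort-host:6666"));

// Clients are lightweight; ask the factory for one per store as needed.
StoreClient<String, String> store = factory.getStoreClient("my-store");
store.put("some-key", "some-value");
String value = store.getValue("some-key");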

BTW, if you have a stream processing use case that needs to interact with a non-trivial amount of state, I encourage you to consider using stateful Samza processors, which have been engineered for exactly that purpose, with very desirable performance, fault-tolerance and operability characteristics. If you already have a large investment in Spark Streaming that may not be an option, but I thought it was worth mentioning (:

Let us know if you have other questions (:

--
 
Felix GV
Data Infrastructure Engineer
Distributed Data Systems
LinkedIn
 
f...@linkedin.com
linkedin.com/in/felixgv



xi...@tune.com

Feb 12, 2015, 6:11:11 PM
to project-...@googlegroups.com
Hi Felix,

Thanks for the reply. We had actually tried creating a factory on the driver side and passing it to the workers, and also generating a client and passing that to the workers. Both approaches failed because neither the client factory nor the client is serializable.

Any insights on this?

Thanks,

Xinyu

Felix GV

Feb 12, 2015, 11:13:46 PM
to project-...@googlegroups.com
I don't know much about Spark Streaming, but if your Worker is long-lived, it should be fine to instantiate one factory per Worker and use it to get as many client instances as that Worker needs... Your Driver could just pass along the factory configs you wish your Worker to use when instantiating its factory. Does that seem reasonable?
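
As a rough sketch of that idea (the class name, store name, and URL below are made up, and this assumes a long-lived executor JVM): the Driver ships only the factory config as plain strings, and each Worker lazily builds its own factory the first time a task needs it.

import voldemort.client.ClientConfig;
import voldemort.client.SocketStoreClientFactory;
import voldemort.client.StoreClient;
import voldemort.client.StoreClientFactory;

// Hypothetical holder: one factory per executor JVM, created lazily on the
// worker side so nothing non-serializable is shipped from the driver.
public final class VoldemortClients {
    private static volatile StoreClientFactory factory;

    private VoldemortClients() {}

    public static StoreClient<String, String> get(String bootstrapUrl, String storeName) {
        if (factory == null) {
            synchronized (VoldemortClients.class) {
                if (factory == null) {
                    factory = new SocketStoreClientFactory(
                            new ClientConfig().setBootstrapUrls(bootstrapUrl));
                }
            }
        }
        return factory.getStoreClient(storeName);
    }
}

Inside a mapPartitions or foreachPartition block, each task would call VoldemortClients.get(bootstrapUrl, storeName). Since the factory is created on the worker and held in a static field, nothing non-serializable crosses the driver/worker boundary, and the factory is re-used for as long as the executor lives.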



--
 
Felix GV
Data Infrastructure Engineer
Distributed Data Systems
LinkedIn
 
f...@linkedin.com
linkedin.com/in/felixgv
