Example for building an RO store with Avro format


Xiao Zhou

May 28, 2015, 7:17:16 PM
to project-...@googlegroups.com
Is there any example of how to build a store with Avro format?
What needs to be done in the makekey() and makevalue() overrides?
Thanks,

Jeremiah Edwards

Jun 22, 2015, 5:34:14 PM
to project-...@googlegroups.com
@Xiao, any luck getting this working?  I've been attempting the same thing myself (build and push with avro), but have yet to get a working combination of makekey() and makevalue()...

Félix GV

Jun 22, 2015, 5:50:06 PM
to project-...@googlegroups.com
Hi,

You should not need to make any code changes to use Avro. Just set the following configs:

build.input.path=/some/hdfs/path/to/an/avro/dataset
build.type.avro=true
avro.serializer.versioned=true
avro.key.field=field_name_of_your_key_in_the_avro_records
avro.value.field=field_name_of_your_value_in_the_avro_records

Let me know if that doesn't work.

-F 
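
For anyone assembling a .job file from the settings above, here is a minimal sketch (plain Python, not part of Voldemort or Azkaban) that reads the file as simple key=value lines and reports which of the five keys quoted in this message are missing. The function names and the sample fragment are hypothetical, the parser ignores Java properties features such as line continuations, and the list is only the keys from this message, not the full set of required properties in VoldemortBuildAndPushJob.java.

```python
# Keys quoted in the message above. This is NOT the complete list of
# properties that VoldemortBuildAndPushJob requires.
AVRO_KEYS = [
    "build.input.path",
    "build.type.avro",
    "avro.serializer.versioned",
    "avro.key.field",
    "avro.value.field",
]


def parse_job_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def missing_avro_keys(props):
    """Return the keys from AVRO_KEYS that are absent or empty."""
    return [k for k in AVRO_KEYS if not props.get(k)]


# Hypothetical, deliberately incomplete job file fragment.
sample_job = """
# example fragment
build.input.path=/some/hdfs/path/to/an/avro/dataset
build.type.avro=true
avro.key.field=id
"""

print(missing_avro_keys(parse_job_properties(sample_job)))
```

Running it against a real .job file only needs the file contents passed to parse_job_properties().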

--
You received this message because you are subscribed to the Google Groups "project-voldemort" group.
To unsubscribe from this group and stop receiving emails from it, send an email to project-voldem...@googlegroups.com.
Visit this group at http://groups.google.com/group/project-voldemort.
For more options, visit https://groups.google.com/d/optout.





--
--
Félix

Jeremiah Edwards

Jun 22, 2015, 7:00:29 PM
to project-...@googlegroups.com
Hey Félix! 


Build and push requires that a custom class be passed in, one that extends AbstractHadoopStoreBuilderMapper and implements the two functions which @Xiao mentioned. 

If I have data in a .avro container, of the form: 
record MyRecord {
  string id;
  union {null, string} received_at;
  string occurred_at;
  int granularity;
  string version;
}

and I'd like to generate key-value pairs, where the key is (id, version) and the value is the rest of the record, and load into Voldemort, what is the best way to accomplish this? It looks like you're just specifying the config in an Azkaban .job file. If it's possible to do it all via Azkaban, could you share a full example .job config file? 

Relatedly, *does the version of Avro matter?* I've seen numerous errors when attempting to read data created with Avro 1.7.7, and it doesn't appear to be compatible with the current release (Voldemort 1.9.17 is on Avro 1.4.0). Is data created with Avro 1.7.7 currently supported? 


Thanks
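
On the composite-key question above: if avro.key.field and avro.value.field each name a single top-level field (an assumption, not something confirmed in this thread), one option is to reshape the dataset upstream so that every record carries one key field and one value field. A minimal Python sketch of that reshaping on plain dicts follows. The "key" and "value" field names and the split_record helper are hypothetical, the sample values are made up, and writing the result back out as Avro is not shown.

```python
# Fields that make up the composite key, taken from the question above.
KEY_FIELDS = ("id", "version")


def split_record(record, key_fields=KEY_FIELDS):
    """Return a new record with a nested 'key' and 'value' built from the input."""
    key = {name: record[name] for name in key_fields}
    value = {name: v for name, v in record.items() if name not in key_fields}
    return {"key": key, "value": value}


# Made-up instance of MyRecord, for illustration only.
my_record = {
    "id": "abc",
    "received_at": None,
    "occurred_at": "2015-06-22T00:00:00Z",
    "granularity": 60,
    "version": "1",
}

reshaped = split_record(my_record)
print(reshaped["key"])
print(reshaped["value"])
```

With records shaped like this, the two config entries would point at the hypothetical "key" and "value" fields.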



Félix GV

Jun 22, 2015, 11:18:44 PM
to project-...@googlegroups.com
Data created with Avro 1.7 is compatible. We have some internal users that do that.

If you look in VoldemortBuildAndPushJob.java, you'll see a list of config constants which are marked as // required properties, that should be it...

Then, if you want to use Avro (which is optional), you just need to add the configs I mentioned earlier as well.

-F
Jeremiah Edwards

Jun 23, 2015, 8:20:47 PM
to project-...@googlegroups.com
Thanks Félix! 

Great news about Avro 1.7.

There doesn't seem to be a VoldemortBuildAndPush plugin available: Although one is listed here https://azkaban.github.io/azkaban/docs/2.5/#job-types, nothing is available in the jobtypes plugin ( here, https://azkaban.github.io/downloads.html and here: https://github.com/azkaban/azkaban-plugins/tree/master/plugins/jobtype/jobtypes ). 

Going back to the instructions here, http://www.project-voldemort.com/voldemort/build-and-push.html, I tried configuring Build and Push as a java job type with this configuration: 

type=java
job.class=voldemort.store.readonly.mr.azkaban.VoldemortBuildAndPushJob
...

With this, though, Azkaban is not able to locate the required class: 
Exception in thread "main" java.lang.ClassNotFoundException: voldemort.store.readonly.mr.azkaban.VoldemortBuildAndPushJob

I have added voldemort-1.9.17.jar and voldemort-contrib-1.9.17.jar (which contains this class) to Azkaban's extlib directory, which supposedly suffices to add custom jars to Azkaban's classpath, but this doesn't resolve the error. 

How have you configured Azkaban to run the build and push job (including the required voldemort classes)? 
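
When chasing a ClassNotFoundException like the one above, it can help to first confirm which jar (if any) in a given directory actually contains the class. A minimal Python sketch, assuming only that jars are ordinary zip archives; the jars_containing helper and the default "extlib" path are illustrative. It only inspects jar contents and says nothing about whether Azkaban puts that directory on the job's classpath.

```python
import glob
import os
import sys
import zipfile


def jars_containing(class_name, lib_dir):
    """Return the *.jar files directly under lib_dir that contain class_name."""
    # A class a.b.C is stored in the jar as the entry a/b/C.class.
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar in sorted(glob.glob(os.path.join(lib_dir, "*.jar"))):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    hits.append(jar)
        except zipfile.BadZipFile:
            # Skip files that are not valid zip archives.
            continue
    return hits


if __name__ == "__main__":
    # Directory to scan; "extlib" is only a placeholder default.
    lib_dir = sys.argv[1] if len(sys.argv) > 1 else "extlib"
    target = "voldemort.store.readonly.mr.azkaban.VoldemortBuildAndPushJob"
    print(jars_containing(target, lib_dir))
```

An empty list means the class is not in any jar directly under that directory.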


Félix GV

Jun 24, 2015, 9:01:59 PM
to project-...@googlegroups.com
Here is how our Azkaban is set up:

In the jobtypes/VoldemortBuildandPush directory, we have:

lib/voldemort-fat.jar
plugin.properties
private.properties

In the voldemort-fat.jar, we put together voldemort, voldemort-contrib, their transitive dependencies, as well as some proprietary code which we use for BuildAndPushHook implementations (this is just monitoring and alerting stuff). Unfortunately, this fat jar is not currently built by the open-source gradle build, but we may be able to port it in there if need be... Alternatively, in the short term, one can also just dump the transitive dependencies in many separate jars inside that nested lib/ directory. For us, the fat jar methodology was just an easier way to manage deployment since Azkaban is owned by a different team than Voldemort, but it's not at all necessary to do it like that.

More importantly, in plugin.properties, we have the following:

job.class=voldemort.store.readonly.mr.azkaban.VoldemortBuildAndPushJob

# WebHDFS is what we've been using for a while since it seems more resilient to Hadoop version changes, but this is not essential
voldemort.fetcher.protocol=webhdfs

# to deal with an Avro incompatibility issue
hadoop-conf.mapreduce.job.classloader=true

# and some other non-relevant configs for our custom hooks...

Then, in private.properties, we have:

jobtype.classpath=lib/*
jobtype.class=azkaban.jobtype.HadoopJavaJob
azkaban.no.user.classpath=true

I hope that helps. Please keep us posted (:

-F
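
The layout described above can be sanity-checked before restarting Azkaban. Below is a minimal Python sketch that looks for the three pieces named in this message (plugin.properties, private.properties, and at least one jar under lib/) and for the job.class and jobtype.class entries. The helper names and the example path are hypothetical, and the checks cover only what this message lists, not everything Azkaban may need.

```python
import os


def read_properties(path):
    """Read simple key=value lines, skipping blank lines and # comments."""
    props = {}
    with open(path) as fh:
        for raw in fh:
            line = raw.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                props[key.strip()] = value.strip()
    return props


def check_jobtype_dir(plugin_dir):
    """Return a list of problems found in a jobtype plugin directory."""
    problems = []
    plugin = os.path.join(plugin_dir, "plugin.properties")
    private = os.path.join(plugin_dir, "private.properties")
    lib_dir = os.path.join(plugin_dir, "lib")

    for path in (plugin, private):
        if not os.path.isfile(path):
            problems.append("missing " + os.path.basename(path))

    jars = []
    if os.path.isdir(lib_dir):
        jars = [f for f in os.listdir(lib_dir) if f.endswith(".jar")]
    if not jars:
        problems.append("no jars in lib/")

    # Only inspect the property files that exist.
    if os.path.isfile(plugin) and "job.class" not in read_properties(plugin):
        problems.append("plugin.properties has no job.class")
    if os.path.isfile(private) and "jobtype.class" not in read_properties(private):
        problems.append("private.properties has no jobtype.class")
    return problems


# Example path only; adjust to the local Azkaban install.
print(check_jobtype_dir("plugins/jobtypes/VoldemortBuildandPush"))
```

An empty list means the directory has the files and entries mentioned above; it does not prove the classes load.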




--
--
 
Felix GV
Senior Software Engineer
Data Infrastructure
LinkedIn
 
f...@linkedin.com
linkedin.com/in/felixgv




Jeremiah Edwards

Jun 25, 2015, 5:12:07 PM
to project-...@googlegroups.com
This is really helpful! 

Following your configuration, I believe we now have all the required jars in 
plugins/jobtypes/voldemortBuildAndPush/lib/

Unfortunately, there are still some classpath issues with the voldemortBuildAndPush plugin. I now see this ClassNotFoundException (CNF) on Azkaban startup: 
Caused by: java.lang.ClassNotFoundException: azkaban.jobtype.HadoopJavaJob

This is confusing, since the HadoopJavaJob plugin is also installed, and it is declaring the correct jobtype.class: 
$ cat plugins/jobtypes/hadoopJava/private.properties
jobtype.class=azkaban.jobtype.HadoopJavaJob

I'm wondering if this could be coming from the order in which individual plugins are loaded (e.g. the server is attempting to load VoldemortBuildandPush before HadoopJavaJob?). Based on the Azkaban docs, though, this should "just work."

Is there anything needed for configuring the correct plugin classpath? (I've already included 
azkaban.jobtype.plugin.dir=plugins/jobtypes
in conf/azkaban.properties). 


For reference, here's the exact VoldemortBuildandPush plugin configuration I added: 

$ cat plugins/jobtypes/voldemortBuildAndPush/plugin.properties
job.class=voldemort.store.readonly.mr.azkaban.VoldemortBuildAndPushJob
voldemort.fetcher.protocol=webhdfs
hadoop-conf.mapreduce.job.classloader=true

$ cat plugins/jobtypes/voldemortBuildAndPush/private.properties
jobtype.classpath=lib/*
jobtype.class=azkaban.jobtype.HadoopJavaJob
azkaban.no.user.classpath=true

Félix GV

Jun 25, 2015, 6:08:14 PM
to project-...@googlegroups.com, azkab...@googlegroups.com
+azkaban-dev

I looked at our configs and I can't think of anything else you may need... So I'm adding the Azkaban mailing list, to see if anyone there has a clue?

I'd be curious to see the full CNF stacktrace. Also, what version of Azkaban and Voldemort are you running? Master or a specific tag?

-F

Jeremiah Edwards

Jun 25, 2015, 6:10:26 PM
to project-...@googlegroups.com
Ah, we were able to solve the problem from the previous post: we had to modify the Azkaban startup script to include more directories in the Azkaban classpath construction. 

We're now able to start Azkaban (with the build and push plugin). Unfortunately, we're still seeing the CNF when trying to run a test build and push job: 
INFO Class name voldemort.store.readonly.mr.azkaban.VoldemortBuildAndPushJob
25-06-2015 14:58:42 PDT build_and_push_avro ERROR - Exception in thread "main" java.lang.ClassNotFoundException: voldemort.store.readonly.mr.azkaban.VoldemortBuildAndPushJob

This is even stranger, since I can see the voldemort* jars (and other dependencies) in the Azkaban classpath (which it echoes on startup). 



Jeremiah Edwards

Jun 25, 2015, 6:31:01 PM
to project-...@googlegroups.com, azkab...@googlegroups.com
Thanks! 

We were able to circumvent the CNF (full stack trace below), by modifying the Azkaban startup script to include the plugins/jobtypes/voldemortBuildAndPush/lib/ directory. 

>> Also, what version of Azkaban and Voldemort are you running? 
We're running azkaban-solo-2.5.0 together with voldemort-1.9.17. 

With the change in the startup script, Azkaban does start. 

Unfortunately, we're still seeing the build and push job itself fail with the same CNF: 


Stacktrace from BuildAndPush job execution: 

25-06-2015 15:28:38 PDT build_and_push_avro INFO - INFO Running job build_and_push_avro
25-06-2015 15:28:38 PDT build_and_push_avro INFO - INFO Class name voldemort.store.readonly.mr.azkaban.VoldemortBuildAndPushJob
25-06-2015 15:28:38 PDT build_and_push_avro ERROR - Exception in thread "main" java.lang.ClassNotFoundException: voldemort.store.readonly.mr.azkaban.VoldemortBuildAndPushJob
25-06-2015 15:28:38 PDT build_and_push_avro ERROR - 	at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
25-06-2015 15:28:38 PDT build_and_push_avro ERROR - 	at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
25-06-2015 15:28:38 PDT build_and_push_avro ERROR - 	at java.security.AccessController.doPrivileged(Native Method)
25-06-2015 15:28:38 PDT build_and_push_avro ERROR - 	at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
25-06-2015 15:28:38 PDT build_and_push_avro ERROR - 	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
25-06-2015 15:28:38 PDT build_and_push_avro ERROR - 	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
25-06-2015 15:28:38 PDT build_and_push_avro ERROR - 	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
25-06-2015 15:28:38 PDT build_and_push_avro ERROR - 	at azkaban.jobtype.JavaJobRunnerMain.getObject(JavaJobRunnerMain.java:269)
25-06-2015 15:28:38 PDT build_and_push_avro ERROR - 	at azkaban.jobtype.JavaJobRunnerMain.<init>(JavaJobRunnerMain.java:107)
25-06-2015 15:28:38 PDT build_and_push_avro ERROR - 	at azkaban.jobtype.JavaJobRunnerMain.main(JavaJobRunnerMain.java:72)




Full  stacktrace from Azkaban Startup: 

2015/06/25 15:18:45.152 -0700 INFO [JobTypeManager] [Azkaban] Adding type override resources.
2015/06/25 15:18:45.153 -0700 ERROR [JobTypeManager] [Azkaban] Failed to load jobtype voldemortBuildAndPushjava.lang.ClassNotFoundException: azkaban.jobtype.HadoopJavaJob
azkaban.jobtype.JobTypeManagerException: azkaban.jobtype.JobTypeManagerException: java.lang.ClassNotFoundException: azkaban.jobtype.HadoopJavaJob
at azkaban.jobtype.JobTypeManager.loadPluginJobTypes(JobTypeManager.java:147)
at azkaban.jobtype.JobTypeManager.<init>(JobTypeManager.java:74)
at azkaban.execapp.FlowRunnerManager.<init>(FlowRunnerManager.java:141)
at azkaban.execapp.AzkabanExecutorServer.<init>(AzkabanExecutorServer.java:100)
at azkaban.execapp.AzkabanExecutorServer.main(AzkabanExecutorServer.java:184)
at azkaban.webapp.AzkabanSingleServer.main(AzkabanSingleServer.java:48)
Caused by: azkaban.jobtype.JobTypeManagerException: java.lang.ClassNotFoundException: azkaban.jobtype.HadoopJavaJob
at azkaban.jobtype.JobTypeManager.loadJob(JobTypeManager.java:314)
at azkaban.jobtype.JobTypeManager.loadPluginJobTypes(JobTypeManager.java:142)
... 5 more
Caused by: java.lang.ClassNotFoundException: azkaban.jobtype.HadoopJavaJob
at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at azkaban.jobtype.JobTypeManager.loadJob(JobTypeManager.java:310)
... 6 more
2015/06/25 15:18:45.153 -0700 INFO [JobTypeManager] [Azkaban] Plugin jobtypes failed to load. azkaban.jobtype.JobTypeManagerException: azkaban.jobtype.JobTypeManagerException: java.lang.ClassNotFoundException: azkaban.jobtype.HadoopJavaJob
Exception in thread "main" azkaban.jobtype.JobTypeManagerException: azkaban.jobtype.JobTypeManagerException: azkaban.jobtype.JobTypeManagerException: azkaban.jobtype.JobTypeManagerException: java.lang.ClassNotFoundException: azkaban.jobtype.HadoopJavaJob
at azkaban.jobtype.JobTypeManager.<init>(JobTypeManager.java:78)
at azkaban.execapp.FlowRunnerManager.<init>(FlowRunnerManager.java:141)
at azkaban.execapp.AzkabanExecutorServer.<init>(AzkabanExecutorServer.java:100)
at azkaban.execapp.AzkabanExecutorServer.main(AzkabanExecutorServer.java:184)
at azkaban.webapp.AzkabanSingleServer.main(AzkabanSingleServer.java:48)
Caused by: azkaban.jobtype.JobTypeManagerException: azkaban.jobtype.JobTypeManagerException: azkaban.jobtype.JobTypeManagerException: java.lang.ClassNotFoundException: azkaban.jobtype.HadoopJavaJob
at azkaban.jobtype.JobTypeManager.loadPluginJobTypes(JobTypeManager.java:153)
at azkaban.jobtype.JobTypeManager.<init>(JobTypeManager.java:74)
... 4 more
Caused by: azkaban.jobtype.JobTypeManagerException: azkaban.jobtype.JobTypeManagerException: java.lang.ClassNotFoundException: azkaban.jobtype.HadoopJavaJob
at azkaban.jobtype.JobTypeManager.loadPluginJobTypes(JobTypeManager.java:147)
... 5 more
Caused by: azkaban.jobtype.JobTypeManagerException: java.lang.ClassNotFoundException: azkaban.jobtype.HadoopJavaJob
at azkaban.jobtype.JobTypeManager.loadJob(JobTypeManager.java:314)
at azkaban.jobtype.JobTypeManager.loadPluginJobTypes(JobTypeManager.java:142)
... 5 more
Caused by: java.lang.ClassNotFoundException: azkaban.jobtype.HadoopJavaJob
at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at azkaban.jobtype.JobTypeManager.loadJob(JobTypeManager.java:310)
... 6 more
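
The startup trace above nests the same exception several levels deep. A small Python sketch for pulling out the innermost cause from a pasted trace, assuming the usual Java convention that the last "Caused by:" line names the root cause. The root_cause helper is hypothetical and the sample is abbreviated from the trace above.

```python
def root_cause(stacktrace):
    """Return the text after the last 'Caused by:' line, or None if absent."""
    prefix = "Caused by:"
    cause = None
    for line in stacktrace.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            # Later matches overwrite earlier ones, so the last one wins.
            cause = line[len(prefix):].strip()
    return cause


# Abbreviated from the Azkaban startup trace in this message.
sample = """\
azkaban.jobtype.JobTypeManagerException: azkaban.jobtype.JobTypeManagerException: java.lang.ClassNotFoundException: azkaban.jobtype.HadoopJavaJob
at azkaban.jobtype.JobTypeManager.loadPluginJobTypes(JobTypeManager.java:147)
Caused by: azkaban.jobtype.JobTypeManagerException: java.lang.ClassNotFoundException: azkaban.jobtype.HadoopJavaJob
at azkaban.jobtype.JobTypeManager.loadJob(JobTypeManager.java:314)
Caused by: java.lang.ClassNotFoundException: azkaban.jobtype.HadoopJavaJob
at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
"""

print(root_cause(sample))
```

For the trace above this points at the missing azkaban.jobtype.HadoopJavaJob class rather than at any Voldemort class.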





Félix GV

Jun 25, 2015, 6:53:17 PM
to azkab...@googlegroups.com, project-...@googlegroups.com
Hi Jeremiah,

I just want to make sure I understand correctly. Are you saying you do NOT see the stacktrace during Azkaban start up anymore? And so you only see the one during BnP job execution? Or do you now see both?

-F
You received this message because you are subscribed to the Google Groups "azkaban" group.
To unsubscribe from this group and stop receiving emails from it, send an email to azkaban-dev...@googlegroups.com.

Jeremiah Edwards

Jun 26, 2015, 2:32:23 PM
to project-...@googlegroups.com, azkab...@googlegroups.com
Hey Felix! 

>>  Are you saying you do NOT see the stacktrace during Azkaban start up anymore?

That's right. The problem was that the Azkaban solo server startup script wasn't finding all of the plugin jars. A small tweak fixed this. 

I'm also happy to report that by altering our specific build and push .job file we were able to get it to run. I'm still working to resolve one error regarding the Store schema, 

 java.lang.RuntimeException: Your store definition does not match the store definition that is already in the cluster. Tried to resolve identical schemas between local and remote, but failed.


But this appears to be a configuration problem with my specific job/Store setup. 

For the record, here is the full plugin setup which got the job running: 

$ ls plugins/jobtypes/
VoldemortBuildAndPushJob hadoopJava
common.properties java
commonprivate.properties package.version
 
And everything in the VoldemortBuildAndPushJob folder was as you described above: We bundled the voldemort and voldemort-contrib jars together with all jars from the project-voldemort lib directory. 

I'll start a different thread if we can't resolve the problem with our specific job, but I think we've resolved the problems with the initial Azkaban setup. I'll open a PR on that project with the change we had to make to the solo server startup script, and pursue any other changes with that team. 

Thanks for your help!!
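
On the store-definition error mentioned above: one quick thing to rule out is whether the local and remote Avro schemas differ only in formatting. The Python sketch below compares two schema strings after JSON parsing, so whitespace and object key order are ignored while field order and types still count. This is only an illustration and not how Voldemort itself compares store definitions; the same_schema helper and both sample schemas are made up.

```python
import json


def same_schema(local_schema_json, remote_schema_json):
    """True if the two JSON documents are equal once parsed."""
    return json.loads(local_schema_json) == json.loads(remote_schema_json)


# Made-up schemas for illustration; they differ only in layout and key order.
local_schema = '{"type":"record","name":"MyRecord","fields":[{"name":"id","type":"string"}]}'
remote_schema = """
{
  "name": "MyRecord",
  "type": "record",
  "fields": [ {"type": "string", "name": "id"} ]
}
"""
# Same layout as local_schema but with a different field type.
changed_schema = '{"type":"record","name":"MyRecord","fields":[{"name":"id","type":"int"}]}'

print(same_schema(local_schema, remote_schema))
print(same_schema(local_schema, changed_schema))
```

If the parsed schemas match, the mismatch is more likely elsewhere in the store definition than in the schema text itself.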



Félix GV

Jun 26, 2015, 2:49:44 PM
to project-...@googlegroups.com, azkab...@googlegroups.com
Sounds great, congrats (: !