Re: [project-voldemort] bnp job fails


Arunachalam

Jun 9, 2016, 10:46:40 AM
to project-voldemort

Can you try the latest version of BnP?


On Thu, Jun 9, 2016, 4:00 AM Janardhan BG <janard...@gmail.com> wrote:
Hi,

When I run a BnP job with Voldemort version 1.10.3, it fails with the following error.

16/06/09 03:54:19 INFO azkaban.VoldemortBuildAndPushJobRunner: Extracting config properties out of: /Users/janardhang/config
16/06/09 03:54:19 INFO shell-job: Job props.toString(): {avro.value.field: subject_id, push.store.owners: mye...@myworkplace.com, push.store.name: wordcount-json, build.input.path: /user/hadoop/dir9, push.store.description: "Testing avro build and push", job.class: voldemort.store.readonly.mr.azkaban.VoldemortBuildAndPushJob, avro.key.field: student_id, build.output.dir: /tmp, build.type.avro: true, build.replication.factor: 1, push.cluster: tcp://localhost:6099, type: java, }
16/06/09 03:54:20 INFO client.AbstractStoreClientFactory: Client zone-id [-1] Attempting to get raw store [voldsys$_metadata_version_persistence] 
16/06/09 03:54:20 WARN routed.PipelineRoutedStore: Client Zone is not specified. Default to Zone 0. The servers could be in a remote zone
16/06/09 03:54:20 INFO client.AbstractStoreClientFactory: Client zone-id [-1] Attempting to get raw store [voldsys$_store_quotas] 
16/06/09 03:54:20 WARN routed.PipelineRoutedStore: Client Zone is not specified. Default to Zone 0. The servers could be in a remote zone
16/06/09 03:54:20 INFO shell-job: voldemort.fetcher.protocol is set to : webhdfs
16/06/09 03:54:20 INFO shell-job: voldemort.fetcher.port is set to : 50070
16/06/09 03:54:20 INFO shell-job: Build and Push Job constructed for 1 cluster(s).
16/06/09 03:54:20 INFO shell-job: Requesting block-level compression codec expected by Server
16/06/09 03:54:20 INFO shell-job: Server responded with block-level compression codecs: [ NO_CODEC ]
16/06/09 03:54:20 INFO shell-job: Using no block-level compression
16/06/09 03:54:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/06/09 03:54:22 INFO shell-job: Verifying store against cluster URL: tcp://localhost:6099
<store>
<name>wordcount-json</name>
<persistence>read-only</persistence>
<description>"Testing avro build and push"</description>
<owners>mye...@myworkplace.com</owners>
<routing>client</routing>
<replication-factor>1</replication-factor>
<required-reads>1</required-reads>
<required-writes>1</required-writes>
<key-serializer>
<type>avro-generic</type>
<schema-info version="0">"int"</schema-info>
</key-serializer>
<value-serializer>
<type>avro-generic</type>
<schema-info version="0">"int"</schema-info>
</value-serializer>
</store>
16/06/09 03:54:22 INFO shell-job: Closing AdminClient with mainBootstrapUrl: tcp://localhost:6099
16/06/09 03:54:22 ERROR azkaban.VoldemortBuildAndPushJobRunner: Exception while running BnP job!
voldemort.VoldemortException: An exception occurred during Build and Push !!
at voldemort.store.readonly.mr.azkaban.VoldemortBuildAndPushJob.run(VoldemortBuildAndPushJob.java:553)
at voldemort.store.readonly.mr.azkaban.VoldemortBuildAndPushJobRunner.main(VoldemortBuildAndPushJobRunner.java:34)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.NullPointerException
at voldemort.store.StoreDefinition.diff(StoreDefinition.java:537)
at voldemort.client.protocol.admin.AdminClient$StoreManagementOperations.diffMessage(AdminClient.java:1834)
at voldemort.client.protocol.admin.AdminClient$StoreManagementOperations.verifyOrAddStore(AdminClient.java:1789)
at voldemort.store.readonly.mr.azkaban.VoldemortBuildAndPushJob.verifyOrAddStore(VoldemortBuildAndPushJob.java:873)
at voldemort.store.readonly.mr.azkaban.VoldemortBuildAndPushJob.verifyOrAddAvroStore(VoldemortBuildAndPushJob.java:845)
at voldemort.store.readonly.mr.azkaban.VoldemortBuildAndPushJob.run(VoldemortBuildAndPushJob.java:482)
... 7 more
BnP run script finished!

Can someone help with this error?

Regards,
Janardhan B G


Arunachalam

Jun 9, 2016, 1:07:49 PM
to project-...@googlegroups.com
What is the definition of the store on the server side? From the log messages, it looks like you are pushing a Read Only store to a zoned cluster. Voldemort Read Only stores do not work with zoned clusters.

Thanks,
Arun.

On Thu, Jun 9, 2016 at 9:54 AM, Janardhan BG <janard...@gmail.com> wrote:
Hi Arun,

I get the same error with the latest version as well.

Voldemort version detected: 1.10.17
Executing BnP with:
config_file : /Users/janardhang/config
hadoop_config_path : /usr/local/hadoop/etc/hadoop/
16/06/09 09:51:36 INFO azkaban.VoldemortBuildAndPushJobRunner: Extracting config properties out of: /Users/janardhang/config
16/06/09 09:51:36 INFO shell-job: Job props:
{
avro.key.field: student_id
avro.value.field: subject_id
build.input.path: /user/hadoop/dir9
build.output.dir: /tmp
build.replication.factor: 1
build.type.avro: true
job.class: voldemort.store.readonly.mr.azkaban.VoldemortBuildAndPushJob
push.cluster: tcp://localhost:6099
push.store.description: "Testing avro build and push"
push.store.name: wordcount-json
push.store.owners: mye...@myworkplace.com
type: java
}
16/06/09 09:51:37 INFO client.AbstractStoreClientFactory: Client zone-id [-1] Attempting to get raw store [voldsys$_metadata_version_persistence]
16/06/09 09:51:37 WARN routed.PipelineRoutedStore: Client Zone is not specified. Default to Zone 0. The servers could be in a remote zone
16/06/09 09:51:37 INFO client.AbstractStoreClientFactory: Client zone-id [-1] Attempting to get raw store [voldsys$_store_quotas]
16/06/09 09:51:37 WARN routed.PipelineRoutedStore: Client Zone is not specified. Default to Zone 0. The servers could be in a remote zone
16/06/09 09:51:37 INFO shell-job: voldemort.fetcher.protocol is set to : webhdfs
16/06/09 09:51:37 INFO shell-job: voldemort.fetcher.port is set to : 50070
16/06/09 09:51:37 INFO shell-job: Build and Push Job constructed for 1 cluster(s).
16/06/09 09:51:37 INFO shell-job: Requesting block-level compression codec expected by Server
16/06/09 09:51:37 INFO shell-job: Server responded with block-level compression codecs: [ NO_CODEC ]
16/06/09 09:51:37 INFO shell-job: Using no block-level compression
16/06/09 09:51:37 ERROR admin.AdminClient: Node vp21a00it-hpcx04094101:6099 [id 1] responded with an error to our GetConfigRequest for key '': Metadata Key passed '' is not handled yet
16/06/09 09:51:37 ERROR admin.AdminClient: Node vp21a00it-hpcx04094101:6099 [id 1] does not contain config key 'readonly.build.primary.replicas.only'.
16/06/09 09:51:38 ERROR admin.AdminClient: Node vp21a00it-hpcx04094201:6099 [id 2] responded with an error to our GetConfigRequest for key '': Metadata Key passed '' is not handled yet
16/06/09 09:51:38 ERROR admin.AdminClient: Node vp21a00it-hpcx04094201:6099 [id 2] does not contain config key 'readonly.build.primary.replicas.only'.
16/06/09 09:51:38 INFO shell-job: 'build.primary.replicas.only' is not supported on this destination cluster: tcp://localhost:6099
16/06/09 09:51:38 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/06/09 09:51:39 INFO shell-job: Verifying store against cluster URL: tcp://localhost:6099
<store>
<name>wordcount-json</name>
<persistence>read-only</persistence>
<description>"Testing avro build and push"</description>
<owners>mye...@myworkplace.com</owners>
<routing>client</routing>
<replication-factor>1</replication-factor>
<required-reads>1</required-reads>
<required-writes>1</required-writes>
<key-serializer>
<type>avro-generic</type>
<schema-info version="0">"int"</schema-info>
</key-serializer>
<value-serializer>
<type>avro-generic</type>
<schema-info version="0">"int"</schema-info>
</value-serializer>
</store>
16/06/09 09:51:39 INFO shell-job: Closing AdminClient with BootStrapUrls: [tcp://localhost:6099]
16/06/09 09:51:39 ERROR azkaban.VoldemortBuildAndPushJobRunner: Exception while running BnP job!
voldemort.VoldemortException: An exception occurred during Build and Push !!
at voldemort.store.readonly.mr.azkaban.VoldemortBuildAndPushJob.run(VoldemortBuildAndPushJob.java:634)
at voldemort.store.readonly.mr.azkaban.VoldemortBuildAndPushJobRunner.main(VoldemortBuildAndPushJobRunner.java:34)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.NullPointerException
at voldemort.store.StoreDefinition.diff(StoreDefinition.java:537)
at voldemort.client.protocol.admin.AdminClient$StoreManagementOperations.diffMessage(AdminClient.java:2083)
at voldemort.client.protocol.admin.AdminClient$StoreManagementOperations.verifyOrAddStore(AdminClient.java:2031)
at voldemort.store.readonly.mr.azkaban.VoldemortBuildAndPushJob.verifyOrAddStore(VoldemortBuildAndPushJob.java:944)
at voldemort.store.readonly.mr.azkaban.VoldemortBuildAndPushJob.verifyOrAddAvroStore(VoldemortBuildAndPushJob.java:916)
at voldemort.store.readonly.mr.azkaban.VoldemortBuildAndPushJob.run(VoldemortBuildAndPushJob.java:563)
... 7 more
BnP run script finished!


Regards,
Janardhan B G

Félix GV

Jun 9, 2016, 1:55:18 PM
to project-voldemort
We should maybe change this code in StoreDefinition.diff()


It is an irrelevant comparison, and it is vulnerable to NPEs.

That being said, Arun's point still stands that Voldemort RO is not intended to be used with zoned clusters...
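For readers following along, here is a minimal sketch of what a null-safe field comparison in a diff method could look like. It is not the actual `StoreDefinition.diff()` code: `StoreDefSketch` and its three fields are hypothetical, and the thread does not say which field was null in the real failure, so `description` is used purely for illustration. The only library call relied on is `java.util.Objects.equals`, which tolerates null on either side.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical, simplified stand-in for a store definition; not Voldemort's real class.
final class StoreDefSketch {
    final String name;
    final Integer replicationFactor;
    final String description; // optional field, may be null

    StoreDefSketch(String name, Integer replicationFactor, String description) {
        this.name = name;
        this.replicationFactor = replicationFactor;
        this.description = description;
    }

    // Collects one human-readable line per differing field.
    // Objects.equals handles null on either side, so a missing optional
    // field yields a diff line instead of a NullPointerException.
    static List<String> diff(StoreDefSketch a, StoreDefSketch b) {
        List<String> diffs = new ArrayList<>();
        addIfDifferent(diffs, "name", a.name, b.name);
        addIfDifferent(diffs, "replication-factor", a.replicationFactor, b.replicationFactor);
        addIfDifferent(diffs, "description", a.description, b.description);
        return diffs;
    }

    private static void addIfDifferent(List<String> diffs, String field, Object left, Object right) {
        if (!Objects.equals(left, right)) {
            diffs.add(field + ": " + left + " != " + right);
        }
    }
}
```

With a comparison of this shape, a mismatch surfaces as a readable diff message rather than being hidden behind an NPE.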

--
Felix GV
Senior Software Engineer
Data Infrastructure
LinkedIn
 
f...@linkedin.com
linkedin.com/in/felixgv





Arunachalam

Jun 9, 2016, 2:03:33 PM
to project-...@googlegroups.com
I have a pull request out to address that NPE.


That said, the NPE masked the real diff exception.

Thanks,
Arun.

vaibhav srivastava

Jun 15, 2016, 7:56:51 PM
to project-voldemort
Hi Arun,

I have a zoned cluster with Read Only stores, and this worked for us in 1.6.9. Is this change related to 1.10?

Arunachalam

Jun 16, 2016, 1:44:21 AM
to project-...@googlegroups.com
That was probably unintentional. The main change we made from 1.6.9 to 1.10.* was adding store validations; other than that, nothing changed if you use the node-by-node build.

So you can relax that check and see if it still works on the latest version.

Thanks,
Arun.

vaibhav srivastava

Jun 16, 2016, 5:51:14 PM
to project-voldemort
Hi Arun,

I was referring to your comment that Voldemort Read Only stores do not work with zoned clusters. I have a live cluster that is working fine with zones and Read Only stores, so I am confused about how it is working for us. Can you please clarify this?

Arunachalam

Jun 16, 2016, 6:16:51 PM
to project-...@googlegroups.com
Vaibhav,
As far as I can tell, Voldemort RO was designed for non-zoned clusters from the start. It may have worked because there was no explicit condition to prevent it. But one of the things we changed was adding a strict comparison (in 1.6.9 you could change the schema, replication factor, or compression in the Hadoop build and push and corrupt the store). To prevent this, we added checks to make sure that the store definition matches. We know for sure that this check fails the zoned store push.

So I would recommend trying the following: relax the store equality check.


Instead of throwing the exception, break out of the loop and let it continue. Also use the node-by-node build (this is the default, unless you enable it in the config).

See if this change works for you. 
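A rough sketch of the kind of change described above, for anyone attempting it. This is not the actual Voldemort verification code: `StoreCheckSketch`, `Def`, `verifyStore`, and the `strict` flag are all hypothetical names chosen for illustration, and the store definition is reduced to a name plus its serialized XML.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical sketch of a store verification loop; names are illustrative only.
final class StoreCheckSketch {
    // Minimal stand-in for a store definition: a name plus its serialized definition.
    static final class Def {
        final String name;
        final String definitionXml;

        Def(String name, String definitionXml) {
            this.name = name;
            this.definitionXml = definitionXml;
        }
    }

    // Returns true if a store with the same name exists remotely.
    // In strict mode a definition mismatch throws; in relaxed mode the
    // mismatch is tolerated and the caller is allowed to continue.
    static boolean verifyStore(Def local, List<Def> remoteDefs, boolean strict) {
        boolean found = false;
        for (Def remote : remoteDefs) {
            if (!remote.name.equals(local.name)) {
                continue; // different store, keep scanning
            }
            found = true;
            boolean same = Objects.equals(remote.definitionXml, local.definitionXml);
            if (!same && strict) {
                throw new IllegalStateException("Store definition mismatch for " + local.name);
            }
            break; // store with this name located; stop scanning
        }
        return found;
    }
}
```

Note that relaxing the check removes the protection against pushing with a changed schema, replication factor, or compression, so it is best treated as an experiment rather than a permanent setting.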

Thanks,
Arun.

Félix GV

Jun 16, 2016, 6:22:16 PM
to project-voldemort
On Thu, Jun 16, 2016 at 3:16 PM, Arunachalam <arunac...@gmail.com> wrote:
Also use the node-by-node build (this is the default, unless you enable it in the config).

Node-by-node build is what we fall back to automatically if the server does not support "build.primary.replicas.only"; if the server does support it, the default is to enable it. So no config change should be needed in that regard.
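The fallback described here can be summarized in a small sketch. This is not the actual BnP code: `BuildModeSketch`, `BuildMode`, and `choose` are hypothetical names, and the idea that an explicit job setting can turn the feature off on a supporting server is an assumption, not something stated in this thread.

```java
// Hypothetical sketch of the build-mode fallback; not the actual BnP implementation.
final class BuildModeSketch {
    enum BuildMode { PRIMARY_REPLICAS_ONLY, NODE_BY_NODE }

    // serverSupportsFeature: whether the destination cluster reports the
    //   'readonly.build.primary.replicas.only' config key.
    // userSetting: the job's 'build.primary.replicas.only' value, or null if unset
    //   (honoring an explicit false here is an assumption).
    static BuildMode choose(boolean serverSupportsFeature, Boolean userSetting) {
        if (!serverSupportsFeature) {
            // Automatic fallback when the server lacks the feature.
            return BuildMode.NODE_BY_NODE;
        }
        // Supported: enabled by default when the job does not say otherwise.
        boolean enabled = (userSetting == null) || userSetting;
        return enabled ? BuildMode.PRIMARY_REPLICAS_ONLY : BuildMode.NODE_BY_NODE;
    }
}
```

In the log earlier in the thread, both nodes report that they do not contain the config key, which corresponds to the first branch.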

-F


