BnP job with latest 1.10.15 failing


Apurva

May 24, 2016, 6:50:36 PM
to project-voldemort
Hi all,

I am trying to run run-bnp.sh with the following config:

config.txt:

type=java
job.class=voldemort.store.readonly.mr.azkaban.VoldemortBuildAndPushJob
hadoop.job.ugi=hadoop
build.input.path=/user/hadoop/dir2/
build.output.dir=/user/hadoop/dir2-1.10/
push.store.name=wordcount
push.cluster=tcp://host:6666
push.store.description="test store"
push.store.owners=mye...@myworkplace.com
build.replication.factor=1



Input file content:

0 d001  Marketing
16 d002  Finance
30 d003 Human Resources
51 d004 Production
67 d005 Development
84 d006 Quality Management
108 d007 Sales
119 d008 Research
133 d009 Customer Service



Exceptions encountered:

~/voldemort-1.10.15 $ ./bin/run-bnp.sh config.txt 
Voldemort version detected: 1.10.15
Executing BnP with:
config_file : config.txt
hadoop_config_path : /etc/hadoop/conf/
16/05/24 15:29:12 INFO azkaban.VoldemortBuildAndPushJobRunner: Extracting config properties out of: config.txt
16/05/24 15:29:12 INFO shell-job: Job props:
{
build.input.path: /user/hadoop/dir2/
build.output.dir: /user/hadoop/dir2-1.10/
build.replication.factor: 1
hadoop.job.ugi: hadoop
job.class: voldemort.store.readonly.mr.azkaban.VoldemortBuildAndPushJob
push.cluster: tcp://host:6666
push.store.description: "test store"
push.store.name: wordcount
push.store.owners: mye...@myworkplace.com
type: java
}
16/05/24 15:29:13 INFO client.AbstractStoreClientFactory: Client zone-id [-1] Attempting to get raw store [voldsys$_metadata_version_persistence] 
16/05/24 15:29:13 WARN routed.PipelineRoutedStore: Client Zone is not specified. Default to Zone 0. The servers could be in a remote zone
16/05/24 15:29:13 INFO client.AbstractStoreClientFactory: Client zone-id [-1] Attempting to get raw store [voldsys$_store_quotas] 
16/05/24 15:29:13 WARN routed.PipelineRoutedStore: Client Zone is not specified. Default to Zone 0. The servers could be in a remote zone
16/05/24 15:29:13 INFO shell-job: voldemort.fetcher.protocol is set to : webhdfs
16/05/24 15:29:13 INFO shell-job: voldemort.fetcher.port is set to : 50070
16/05/24 15:29:13 INFO shell-job: Build and Push Job constructed for 1 cluster(s).
16/05/24 15:29:13 INFO shell-job: Requesting block-level compression codec expected by Server
16/05/24 15:29:13 INFO shell-job: Server responded with block-level compression codecs: [ NO_CODEC ]
16/05/24 15:29:13 INFO shell-job: Using no block-level compression
16/05/24 15:29:13 ERROR admin.AdminClient: Node host:6666 [id 1] responded with an error to our GetConfigRequest for key '': Metadata Key passed '' is not handled yet
16/05/24 15:29:13 ERROR admin.AdminClient: Node host:6666 [id 1] does not contain config key 'readonly.build.primary.replicas.only'.
16/05/24 15:29:14 ERROR admin.AdminClient: Node host2:6666 [id 2] responded with an error to our GetConfigRequest for key '': Metadata Key passed '' is not handled yet
16/05/24 15:29:14 ERROR admin.AdminClient: Node host2:6666 [id 2] does not contain config key 'readonly.build.primary.replicas.only'.
16/05/24 15:29:14 INFO shell-job: 'build.primary.replicas.only' is not supported on this destination cluster: tcp://host:6666
16/05/24 15:29:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/05/24 15:29:15 ERROR utils.HadoopUtils: failed to get JSON metadata from path:hdfs://host:9000/user/hadoop/dir2/part-m-00000
16/05/24 15:29:15 ERROR utils.HadoopUtils: failed to get JSON metadata from path:/user/hadoop/dir2
16/05/24 15:29:15 INFO shell-job: Closing AdminClient with BootStrapUrls: [tcp://host:6666]
16/05/24 15:29:15 ERROR azkaban.VoldemortBuildAndPushJobRunner: Exception while running BnP job!
voldemort.VoldemortException: An exception occurred during Build and Push !!
at voldemort.store.readonly.mr.azkaban.VoldemortBuildAndPushJob.run(VoldemortBuildAndPushJob.java:631)
at voldemort.store.readonly.mr.azkaban.VoldemortBuildAndPushJobRunner.main(VoldemortBuildAndPushJobRunner.java:34)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.IllegalArgumentException: No JSON schema found on file hdfs://host:9000/user/hadoop/dir2/part-m-00000
at voldemort.store.readonly.mr.utils.HadoopUtils.getSchemaFromPath(HadoopUtils.java:166)
at voldemort.store.readonly.mr.utils.HadoopUtils.getSchemaFromPath(HadoopUtils.java:98)
at voldemort.store.readonly.mr.azkaban.VoldemortBuildAndPushJob.getInputPathJsonSchema(VoldemortBuildAndPushJob.java:800)
at voldemort.store.readonly.mr.azkaban.VoldemortBuildAndPushJob.verifyOrAddJsonStore(VoldemortBuildAndPushJob.java:841)
at voldemort.store.readonly.mr.azkaban.VoldemortBuildAndPushJob.run(VoldemortBuildAndPushJob.java:563)
... 7 more
Caused by: java.lang.RuntimeException: java.lang.IllegalArgumentException: No JSON schema found on file hdfs://host:9000/user/hadoop/dir2/part-m-00000
at voldemort.store.readonly.mr.utils.HadoopUtils.getSchemaFromPath(HadoopUtils.java:166)
at voldemort.store.readonly.mr.utils.HadoopUtils.getSchemaFromPath(HadoopUtils.java:143)
... 11 more
Caused by: java.lang.IllegalArgumentException: No JSON schema found on file hdfs://host:9000/user/hadoop/dir2/part-m-00000
at voldemort.store.readonly.mr.utils.HadoopUtils.getSchemaFromPath(HadoopUtils.java:121)
... 12 more
BnP run script finished!

Could you please help me out with this?

Thanks,
Apurva

Arunachalam

May 24, 2016, 9:01:56 PM
to project-...@googlegroups.com
What is the store type? Is it Avro or JSON data?

Thanks,
Arun.


Apurva

May 25, 2016, 2:11:11 PM
to project-voldemort
Hi Arun,

We used string as both the key and value serializer, but I tried the same config with 1.6.9 as the client and it worked.

<stores>
  <store>
    <name>wordcount</name>
    <persistence>read-only</persistence>

    <routing>client</routing>
    <routing-strategy>zone-routing</routing-strategy>

    <replication-factor>2</replication-factor>
    <required-reads>1</required-reads>
    <required-writes>1</required-writes>

    <zone-replication-factor>
      <replication-factor zone-id="0">1</replication-factor>
      <replication-factor zone-id="1">1</replication-factor>
    </zone-replication-factor>
    <zone-count-reads>0</zone-count-reads>
    <zone-count-writes>0</zone-count-writes>

    <enable-hinted-handoff>true</enable-hinted-handoff>
    <hinted-handoff-strategy>proximity-handoff</hinted-handoff-strategy>
    <hint-preflist-size>1</hint-preflist-size>

    <key-serializer>
      <type>string</type>
    </key-serializer>

    <value-serializer>
      <type>string</type>
    </value-serializer>
  </store>
</stores>

Thanks,
Apurva

Arunachalam

May 26, 2016, 1:44:21 AM
to project-...@googlegroups.com
Interesting; I am not aware of any non-backward-compatible changes.

One thing I can see is that the new BuildAndPush expects the input directory to contain only the files you want to push.

Also, what type of file do you have in Hadoop? Avro or JSON? And with the 1.6.9 push, were you able to query the data you pushed?

Thanks,
Arun.

Apurva

May 26, 2016, 5:58:24 PM
to project-voldemort
Hi Arun,

I have tried using both of the following as input:

1) A plain text file with the content:

d001  Marketing
d002  Finance
d003 Human Resources
d004 Production
d005 Development
d006 Quality Management
d007 Sales
d008 Research
d009 Customer Service



2) Its reduced output file in Hadoop, with the content:


0 d001  Marketing
16 d002  Finance
30 d003 Human Resources
51 d004 Production
67 d005 Development
84 d006 Quality Management
108 d007 Sales
119 d008 Research
133 d009 Customer Service


Thanks,
Apurva

Arunachalam

May 26, 2016, 5:59:46 PM
to project-...@googlegroups.com
Shouldn't the input be JSON-formatted data?
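
The "No JSON schema found on file" error in your stack trace is the giveaway: the JSON code path tries to read the schema out of the input file itself, so a bare text file will not satisfy it. As a rough, hypothetical sketch of what schema-carrying input could look like (the metadata key names below are illustrative only; check HadoopUtils.getSchemaFromPath in your Voldemort version for the actual contract):

import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SchemaInputSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Illustrative metadata entries carrying the key/value schemas; the exact
        // metadata names Voldemort expects must be confirmed in HadoopUtils.
        SequenceFile.Metadata meta = new SequenceFile.Metadata();
        meta.set(new Text("key.schema"), new Text("'string'"));
        meta.set(new Text("value.schema"), new Text("'string'"));

        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(new Path("/tmp/bnp-input/part-00000")),
                SequenceFile.Writer.keyClass(BytesWritable.class),
                SequenceFile.Writer.valueClass(BytesWritable.class),
                SequenceFile.Writer.metadata(meta))) {
            // One sample record: key "d001", value "Marketing", as raw bytes.
            writer.append(new BytesWritable("d001".getBytes(StandardCharsets.UTF_8)),
                          new BytesWritable("Marketing".getBytes(StandardCharsets.UTF_8)));
        }
    }
}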


Apurva Thomas

May 31, 2016, 6:49:04 PM
to project-...@googlegroups.com
Hi Arun,

I have tried JSON input as well and face the same issue. Could you provide a sample input? Having one on the project-voldemort website would be useful.

Thanks,
Apurva


Arunachalam

Jun 6, 2016, 1:59:54 AM
to project-...@googlegroups.com
Apurva,
I believe the problem is that you are using string serialization for the store. As of the latest release, Build and Push only supports Avro or JSON.

I recommend you set it to avro-generic-versioned with a string schema and see if the Build and Push job works.
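
For reference, a sketch of what the serializer sections might then look like; I am assuming the same schema-info layout that avro-generic stores use, so treat this as an illustration rather than a verified definition:

<key-serializer>
  <type>avro-generic-versioned</type>
  <schema-info version="0">"string"</schema-info>
</key-serializer>
<value-serializer>
  <type>avro-generic-versioned</type>
  <schema-info version="0">"string"</schema-info>
</value-serializer>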

Also, I am not very familiar with the Hadoop setup; I am used to running the jobs from Azkaban. So you will need to walk me through exactly where you are running into the problem, with detailed steps, for me to help you.

Thanks,
Arun.

vaibhav srivastava

Jun 15, 2016, 7:52:53 PM
to project-voldemort
Hi Arun,

Can you give an example of a JSON store XML and how to define it? We are not using Avro, since the existing cluster (1.6.9) has string or protobuf as the key and value serializers, and we now want to use the 1.10 client with BnP without changing the current store definition.

Arunachalam

Jun 16, 2016, 1:46:24 AM
to project-...@googlegroups.com
Voldemort client 1.10 is backward compatible with Voldemort Server 1.6.9.

Voldemort BnP may or may not be compatible with Voldemort Server 1.6.9.

Can you post the store definition you have and where it fails?

Thanks,
Arun.

Félix GV

Jun 16, 2016, 4:00:30 PM
to project-voldemort
We try our best to maintain compatibility between servers and BnP jobs, but not all code paths are exercised at LinkedIn, so we may have missed something. Furthermore, we are usually only a handful of patch versions behind between BnP and servers, so we do not actively test compatibility across several minor versions (like between 1.6.x and 1.10.x).

Hopefully we can narrow down the issue and the fix or workaround isn't too complicated.

-F




Apurva

Jun 16, 2016, 6:46:33 PM
to project-voldemort
Hi Arun / Félix,

I am running the following config with client (1.10.17) and server (1.10.3), both being 1.10+, and I am still encountering the following exceptions:

apurvathomas@host1:~/voldemort $ ./bin/run-bnp.sh config.txt
Voldemort version detected: 1.10.17
Executing BnP with:
config_file : config.txt
hadoop_config_path : /etc/hadoop/conf/
16/06/16 15:31:32 INFO azkaban.VoldemortBuildAndPushJobRunner: Extracting config properties out of: config.txt
16/06/16 15:31:32 INFO shell-job: Job props:
{
avro.key.field: student_id
avro.value.field: subject_id
azkaban.should.proxy: true
build.input.path: /user/hadoop/dir9
build.output.dir: /tmp
build.replication.factor: 1
build.type.avro: true
job.class: voldemort.store.readonly.mr.azkaban.VoldemortBuildAndPushJob
push.cluster: tcp://localhost:6666
push.store.description: "Testing avro build and push"
push.store.name: wordcount-avro
push.store.owners: mye...@myworkplace.com
type: java
user.to.proxy: hadoop
}
16/06/16 15:31:33 INFO client.AbstractStoreClientFactory: Client zone-id [-1] Attempting to get raw store [voldsys$_metadata_version_persistence]
16/06/16 15:31:33 WARN routed.PipelineRoutedStore: Client Zone is not specified. Default to Zone 0. The servers could be in a remote zone
16/06/16 15:31:33 INFO client.AbstractStoreClientFactory: Client zone-id [-1] Attempting to get raw store [voldsys$_store_quotas]
16/06/16 15:31:33 WARN routed.PipelineRoutedStore: Client Zone is not specified. Default to Zone 0. The servers could be in a remote zone
16/06/16 15:31:33 INFO shell-job: voldemort.fetcher.protocol is set to : webhdfs
16/06/16 15:31:33 INFO shell-job: voldemort.fetcher.port is set to : 50070
16/06/16 15:31:33 INFO shell-job: Build and Push Job constructed for 1 cluster(s).
16/06/16 15:31:33 INFO shell-job: Requesting block-level compression codec expected by Server
16/06/16 15:31:34 INFO shell-job: Server responded with block-level compression codecs: [ NO_CODEC ]
16/06/16 15:31:34 INFO shell-job: Using no block-level compression
16/06/16 15:31:34 ERROR admin.AdminClient: Node host1:6666 [id 1] responded with an error to our GetConfigRequest for key '': Metadata Key passed '' is not handled yet
16/06/16 15:31:34 ERROR admin.AdminClient: Node host1:6666 [id 1] does not contain config key 'readonly.build.primary.replicas.only'.
16/06/16 15:31:35 ERROR admin.AdminClient: Node host2:6666 [id 2] responded with an error to our GetConfigRequest for key '': Metadata Key passed '' is not handled yet
16/06/16 15:31:35 ERROR admin.AdminClient: Node host2:6666 [id 2] does not contain config key 'readonly.build.primary.replicas.only'.
16/06/16 15:31:35 INFO shell-job: 'build.primary.replicas.only' is not supported on this destination cluster: tcp://localhost:6666
16/06/16 15:31:35 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/06/16 15:31:36 INFO shell-job: Verifying store against cluster URL: tcp://localhost:6666
<store>
<name>wordcount-avro</name>
<persistence>read-only</persistence>
<description>"Testing avro build and push"</description>
<owners>mye...@myworkplace.com</owners>
<routing>client</routing>
<replication-factor>1</replication-factor>
<required-reads>1</required-reads>
<required-writes>1</required-writes>
<key-serializer>
<type>avro-generic</type>
<schema-info version="0">"int"</schema-info>
</key-serializer>
<value-serializer>
<type>avro-generic</type>
<schema-info version="0">"int"</schema-info>
</value-serializer>
</store>
16/06/16 15:31:36 INFO shell-job: Closing AdminClient with BootStrapUrls: [tcp://localhost:6666]
16/06/16 15:31:36 ERROR azkaban.VoldemortBuildAndPushJobRunner: Exception while running BnP job!
voldemort.VoldemortException: An exception occurred during Build and Push !!
at voldemort.store.readonly.mr.azkaban.VoldemortBuildAndPushJob.run(VoldemortBuildAndPushJob.java:634)
at voldemort.store.readonly.mr.azkaban.VoldemortBuildAndPushJobRunner.main(VoldemortBuildAndPushJobRunner.java:34)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.NullPointerException
at voldemort.store.StoreDefinition.diff(StoreDefinition.java:556)
at voldemort.client.protocol.admin.AdminClient$StoreManagementOperations.diffMessage(AdminClient.java:2083)
at voldemort.client.protocol.admin.AdminClient$StoreManagementOperations.verifyOrAddStore(AdminClient.java:2031)
at voldemort.store.readonly.mr.azkaban.VoldemortBuildAndPushJob.verifyOrAddStore(VoldemortBuildAndPushJob.java:944)
at voldemort.store.readonly.mr.azkaban.VoldemortBuildAndPushJob.verifyOrAddAvroStore(VoldemortBuildAndPushJob.java:916)
at voldemort.store.readonly.mr.azkaban.VoldemortBuildAndPushJob.run(VoldemortBuildAndPushJob.java:563)
... 7 more
BnP run script finished!

Thanks,
Apurva

Félix GV

Jun 16, 2016, 7:28:27 PM
to project-voldemort
On Thu, Jun 16, 2016 at 3:46 PM, Apurva <apurva...@gmail.com> wrote:
StoreDefinition.diff

This function is vulnerable to NPEs. It has never been tested with some of the rarer combinations of store definition fields: for example, getZoneReplicationFactor() and getHintedHandoffStrategyType() within the diff() function each chain another function call afterwards, which can result in an NPE when the getter returns null.
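
The shape of the guard is simple; here is a minimal, hypothetical sketch (not the actual Voldemort code) using java.util.Objects.equals, which treats two nulls as equal instead of throwing:

import java.util.Objects;

class DiffSketch {
    // Null-safe field comparison: avoids the NPE that chaining .equals()
    // on a possibly-null getter result would throw.
    static void diffField(String name, Object first, Object second, StringBuilder out) {
        if (!Objects.equals(first, second)) {
            out.append(name).append(": ").append(first).append(" != ").append(second).append('\n');
        }
    }
}

// usage (hypothetical): diffField("zone-replication-factor",
//     first.getZoneReplicationFactor(), second.getZoneReplicationFactor(), out);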

Do you want to try tweaking this code to guard against nulls and send us a PR?

-F




Arunachalam

Jun 16, 2016, 7:29:21 PM
to project-...@googlegroups.com
Apurva,
As we are discussing in this thread, there are two problems here.

Voldemort RO never supported zoned clusters, but it did not explicitly fail in the older builds either; from what I can understand, it used to work in 1.6.9.

1) When we added strict condition checking, it broke, and the checking itself introduced a subtle NullPointerException. This is fixed here.

2) After you apply the previous commit, you will still run into the error because of this issue. Instead of throwing the exception, break out of the loop and let it continue. Also, use the node-by-node build (this is the default, unless you enable it in the config).

Thanks,
Arun.


Apurva

Jun 17, 2016, 6:47:46 PM
to project-voldemort
Hi Arun,

Thanks for looking into it; by removing the zoned config, it worked. However, in previous versions we used a zoned cluster and it worked fine.

Also, when you say "use the node by node build (this is the default, unless you enable it in the config)", what do you mean by "it"? Is it a property you are referring to? If so, which property is it?

Thanks,
Apurva

Arunachalam

Jun 19, 2016, 2:25:43 AM
to project-...@googlegroups.com
Apurva,
1) About the zoned read-only stores: at LinkedIn we did not intend the read-only stores to be used with the zoned config, so if it used to work, it must have been a mere coincidence. When we worked on the new improvements, we did not consider this case.

2) About build primary replicas only, you can read this to understand more: https://www.linkedin.com/pulse/slashing-voldemorts-hadoop-resource-usage-felix-gv

I thought the feature was disabled by default, but I am wrong; it is enabled by default: https://github.com/voldemort/voldemort/blob/master/src/java/voldemort/server/VoldemortConfig.java#L352

I don't know whether the feature works with zoned stores. It could work by coincidence, or maybe not.

In either case, if the community wants to make this work, I recommend someone validate the scenario, add unit/integration tests, and send us a pull request. That way it can be taken care of in future changes as well.
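
If you want to experiment with turning the feature off on the server side, here is a minimal sketch. I am assuming the server.properties key matches the config key the AdminClient log in this thread was probing for; please verify it against VoldemortConfig before relying on it.

# server.properties (sketch; key name assumed from the AdminClient probe above)
readonly.build.primary.replicas.only=false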

Thanks,
Arun.

