The above file was first copied into HDFS and was picked up by the push script from there. Based on the error, it seems that the input cannot be a plain JSON file and must instead be something called a JSON sequence file.
Would it be possible to point me in the right direction on how to convert a JSON array of KV pairs into the required JSON sequence file input format?
Here is my config file:
type=java
job.class=voldemort.store.readonly.mr.azkaban.VoldemortBuildAndPushJob
hadoop.job.ugi=anagpal,hadoop
build.input.path=/sid.json
build.output.dir=/tmp/build_output/
push.store.name=anagpal-test-old
push.cluster=tcp://localhost:6666
push.store.description="test store"
push.store.owners=mye...@myworkplace.com
build.replication.factor=1
Here is the command output:
./bin/run-bnp.sh sid.config /home/hdoop/hadoop-3.2.2/etc/
2021-01-24 08:59:23,305 INFO azkaban.VoldemortBuildAndPushJobRunner: Extracting config properties out of: sid.config
2021-01-24 08:59:23,341 INFO shell-job: Job props:
{
build.input.path: /sid.json
build.output.dir: /tmp/build_output/
build.replication.factor: 1
hadoop.job.ugi: anagpal,hadoop
job.class: voldemort.store.readonly.mr.azkaban.VoldemortBuildAndPushJob
push.cluster: tcp://localhost:6666
push.store.description: "test store"
push.store.name: anagpal-test-old
push.store.owners: mye...@myworkplace.com
type: java
}
2021-01-24 08:59:23,694 INFO client.AbstractStoreClientFactory: Client zone-id [-1] Attempting to get raw store [voldsys$_metadata_version_persistence]
2021-01-24 08:59:23,723 INFO shell-job: voldemort.fetcher.protocol is set to : webhdfs
2021-01-24 08:59:23,723 INFO shell-job: voldemort.fetcher.port is set to : 50070
2021-01-24 08:59:23,724 INFO shell-job: Build and Push Job will run store verification in parallel, thread num: 20
2021-01-24 08:59:23,724 INFO shell-job: Build and Push Job constructed for 1 cluster(s).
2021-01-24 08:59:23,725 INFO shell-job: Requesting block-level compression codec expected by Server
2021-01-24 08:59:23,959 INFO shell-job: Server responded with block-level compression codecs: [ NO_CODEC ]
2021-01-24 08:59:23,959 INFO shell-job: Using no block-level compression
2021-01-24 08:59:24,714 ERROR utils.HadoopUtils: failed to get JSON metadata from path:/sid.json
2021-01-24 08:59:24,714 INFO shell-job: Closing AdminClient with BootStrapUrls: [tcp://localhost:6666]
2021-01-24 08:59:24,715 ERROR azkaban.VoldemortBuildAndPushJobRunner: Exception while running BnP job!
voldemort.VoldemortException: An exception occurred during Build and Push !!
at voldemort.store.readonly.mr.azkaban.VoldemortBuildAndPushJob.run(VoldemortBuildAndPushJob.java:690)
at voldemort.store.readonly.mr.azkaban.VoldemortBuildAndPushJobRunner.main(VoldemortBuildAndPushJobRunner.java:34)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: hdfs://127.0.0.1:9000/sid.json not a SequenceFile
at voldemort.store.readonly.mr.utils.HadoopUtils.getSchemaFromPath(HadoopUtils.java:166)
at voldemort.store.readonly.mr.utils.HadoopUtils.getSchemaFromPath(HadoopUtils.java:98)
at voldemort.store.readonly.mr.azkaban.VoldemortBuildAndPushJob.getInputPathJsonSchema(VoldemortBuildAndPushJob.java:896)
at voldemort.store.readonly.mr.azkaban.VoldemortBuildAndPushJob.verifyOrAddJsonStore(VoldemortBuildAndPushJob.java:937)
at voldemort.store.readonly.mr.azkaban.VoldemortBuildAndPushJob.run(VoldemortBuildAndPushJob.java:617)
... 7 more
Caused by: java.lang.RuntimeException: java.io.IOException: hdfs://127.0.0.1:9000/sid.json not a SequenceFile
at voldemort.store.readonly.mr.utils.HadoopUtils.getMetadataFromSequenceFile(HadoopUtils.java:93)
at voldemort.store.readonly.mr.utils.HadoopUtils.getSchemaFromPath(HadoopUtils.java:119)
... 11 more
Caused by: java.io.IOException: hdfs://127.0.0.1:9000/sid.json not a SequenceFile
at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1970)
at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1923)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1872)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1886)
at voldemort.store.readonly.mr.utils.HadoopUtils.getMetadataFromSequenceFile(HadoopUtils.java:83)
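As far as I understand the "not a SequenceFile" error, a Hadoop SequenceFile begins with the three magic bytes SEQ followed by a version byte, which a plain JSON text file will not have. Below is a minimal sketch of that check against a local copy of a file. The function name looks_like_sequence_file is my own and is not part of Voldemort or Hadoop, and passing this check would not by itself mean the file is valid input for the Build and Push job.

```python
import os
import tempfile

# First three bytes of a Hadoop SequenceFile header; a version byte follows.
SEQ_MAGIC = b"SEQ"


def looks_like_sequence_file(path):
    """Return True if the local file at `path` starts with the SEQ magic bytes.

    Hypothetical helper for illustration only: it inspects the header and
    nothing else, so it cannot tell whether the rest of the file is well formed.
    """
    with open(path, "rb") as f:
        return f.read(len(SEQ_MAGIC)) == SEQ_MAGIC


if __name__ == "__main__":
    # Stand-in for a plain JSON file like sid.json (contents are made up).
    tmp_dir = tempfile.mkdtemp()
    json_path = os.path.join(tmp_dir, "example.json")
    with open(json_path, "wb") as f:
        f.write(b'[{"key": "a", "value": 1}]')

    # A plain JSON file starts with '[' or '{', so the header check fails.
    print(looks_like_sequence_file(json_path))
```

To try this on the real input, the file would first have to be copied out of HDFS to the local disk, for example with hdfs dfs -get /sid.json .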
-Sid