Input file format Error for Build and Push


Sidhartha Agrawal

Jan 24, 2021, 12:03:19 PM
to project-voldemort
Hi,
  • I am trying out the steps here[https://www.project-voldemort.com/voldemort/build-and-push.html] to see if I can build and push, say, 2 K/V pairs to a single-node Voldemort cluster. From what I understand, my KV pairs can be in JSON or Avro format, and I have decided to go with JSON. I have tried various KV combinations, but I keep getting an error along the lines of "the provided input file is not a valid SequenceFile".
  • I am trying to insert 2 KV pairs as below. My input file looks like this:
[ 
   { "sid" : "agrawal" }, 
   { "foo": "bar" } 
]
The above file was first copied into HDFS and was picked up by the push script from there. Based on the error, it seems that the input cannot be a plain JSON file but has to be something called a JSON sequence file.
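
As a quick sanity check of that reading of the error: a Hadoop SequenceFile begins with the three magic bytes SEQ followed by a one-byte version, whereas the JSON file above begins with "[". The sketch below is only an illustration, assuming the first few bytes of the file are available locally; the helper name looks_like_sequence_file is hypothetical and is not part of Voldemort or Hadoop.

```python
# Minimal sketch: test whether a file header carries the SequenceFile magic.
# Hadoop SequenceFiles start with b"SEQ" followed by one version byte.
SEQUENCE_FILE_MAGIC = b"SEQ"


def looks_like_sequence_file(header: bytes) -> bool:
    """Return True if the header has the 3 magic bytes plus a version byte."""
    return len(header) >= 4 and header[:3] == SEQUENCE_FILE_MAGIC


# The plain JSON input from this post starts with "[", so the check fails.
json_header = b'[ \n   { "sid" : "agrawal" }, '
print(looks_like_sequence_file(json_header))

# A header that starts with the magic bytes and a version byte passes.
print(looks_like_sequence_file(b"SEQ\x06"))
```

This only inspects the header. It says nothing about whether the keys, values, and metadata inside a real SequenceFile are in the form Build and Push expects.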

Would it be possible to point me in the right direction on how to convert a JSON array of KV pairs into the required JSON sequence file input format?
 

Here is my config file:
type=java
job.class=voldemort.store.readonly.mr.azkaban.VoldemortBuildAndPushJob
hadoop.job.ugi=anagpal,hadoop
build.input.path=/sid.json
build.output.dir=/tmp/build_output/
push.store.name=anagpal-test-old
push.cluster=tcp://localhost:6666
push.store.description="test store"
push.store.owners=mye...@myworkplace.com
build.replication.factor=1

Here is the command output:
./bin/run-bnp.sh sid.config /home/hdoop/hadoop-3.2.2/etc/

2021-01-24 08:59:23,305 INFO azkaban.VoldemortBuildAndPushJobRunner: Extracting config properties out of: sid.config
2021-01-24 08:59:23,341 INFO shell-job: Job props: { build.input.path: /sid.json build.output.dir: /tmp/build_output/ build.replication.factor: 1 hadoop.job.ugi: anagpal,hadoop job.class: voldemort.store.readonly.mr.azkaban.VoldemortBuildAndPushJob push.cluster: tcp://localhost:6666 push.store.description: "test store" push.store.name: anagpal-test-old push.store.owners: mye...@myworkplace.com type: java }
2021-01-24 08:59:23,694 INFO client.AbstractStoreClientFactory: Client zone-id [-1] Attempting to get raw store [voldsys$_metadata_version_persistence]
2021-01-24 08:59:23,723 INFO shell-job: voldemort.fetcher.protocol is set to : webhdfs
2021-01-24 08:59:23,723 INFO shell-job: voldemort.fetcher.port is set to : 50070
2021-01-24 08:59:23,724 INFO shell-job: Build and Push Job will run store verification in parallel, thread num: 20
2021-01-24 08:59:23,724 INFO shell-job: Build and Push Job constructed for 1 cluster(s).
2021-01-24 08:59:23,725 INFO shell-job: Requesting block-level compression codec expected by Server
2021-01-24 08:59:23,959 INFO shell-job: Server responded with block-level compression codecs: [ NO_CODEC ]
2021-01-24 08:59:23,959 INFO shell-job: Using no block-level compression
2021-01-24 08:59:24,714 ERROR utils.HadoopUtils: failed to get JSON metadata from path:/sid.json
2021-01-24 08:59:24,714 INFO shell-job: Closing AdminClient with BootStrapUrls: [tcp://localhost:6666]
2021-01-24 08:59:24,715 ERROR azkaban.VoldemortBuildAndPushJobRunner: Exception while running BnP job!
voldemort.VoldemortException: An exception occurred during Build and Push !!
    at voldemort.store.readonly.mr.azkaban.VoldemortBuildAndPushJob.run(VoldemortBuildAndPushJob.java:690)
    at voldemort.store.readonly.mr.azkaban.VoldemortBuildAndPushJobRunner.main(VoldemortBuildAndPushJobRunner.java:34)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: hdfs://127.0.0.1:9000/sid.json not a SequenceFile
    at voldemort.store.readonly.mr.utils.HadoopUtils.getSchemaFromPath(HadoopUtils.java:166)
    at voldemort.store.readonly.mr.utils.HadoopUtils.getSchemaFromPath(HadoopUtils.java:98)
    at voldemort.store.readonly.mr.azkaban.VoldemortBuildAndPushJob.getInputPathJsonSchema(VoldemortBuildAndPushJob.java:896)
    at voldemort.store.readonly.mr.azkaban.VoldemortBuildAndPushJob.verifyOrAddJsonStore(VoldemortBuildAndPushJob.java:937)
    at voldemort.store.readonly.mr.azkaban.VoldemortBuildAndPushJob.run(VoldemortBuildAndPushJob.java:617)
    ... 7 more
Caused by: java.lang.RuntimeException: java.io.IOException: hdfs://127.0.0.1:9000/sid.json not a SequenceFile
    at voldemort.store.readonly.mr.utils.HadoopUtils.getMetadataFromSequenceFile(HadoopUtils.java:93)
    at voldemort.store.readonly.mr.utils.HadoopUtils.getSchemaFromPath(HadoopUtils.java:119)
    ... 11 more
Caused by: java.io.IOException: hdfs://127.0.0.1:9000/sid.json not a SequenceFile
    at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1970)
    at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1923)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1872)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1886)
    at voldemort.store.readonly.mr.utils.HadoopUtils.getMetadataFromSequenceFile(HadoopUtils.java:83)

-Sid