Hydrator: JSON Parser Issue!


Girish Kumar

Feb 17, 2017, 1:41:28 AM
to CDAP User
Hi,

The JSON Parser supports parsing nested JSON objects, and the mappings of nested fields can be specified in one of the formats described in the docs, quoted below:

Expression

The "root member object" for parsing any JSON is referred to as $, regardless of whether it's an array or an object. It also uses either dot notation or bracket notation for defining the levels of parsing. For example: $.employee.name or $[employee][name].



Question: What if a member name itself contains a '.' (dot) or a special character such as @, #, or $?


Enclosing a member name that contains a special character in square brackets doesn't help either. In fact, the parser doesn't recognize the square brackets at all.


The JSON Parser doesn't handle these cases and fails with the error below:


java.lang.Exception: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.RuntimeException: com.jayway.jsonpath.InvalidPathException: Could not parse token starting at position 1. Expected ?, ', 0-9, * 
	at org.apache.hadoop.mapred.LocalJobRunnerWithFix$Job.runTasks(LocalJobRunnerWithFix.java:465) ~[co.cask.cdap.cdap-app-fabric-4.0.0.jar:na]
	at org.apache.hadoop.mapred.LocalJobRunnerWithFix$Job.run(LocalJobRunnerWithFix.java:524) ~[co.cask.cdap.cdap-app-fabric-4.0.0.jar:na]
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.RuntimeException: com.jayway.jsonpath.InvalidPathException: Could not parse token starting at position 1. Expected ?, ', 0-9, * 
	at co.cask.cdap.etl.batch.PipeTransformDetail.process(PipeTransformDetail.java:51) ~[cdap-etl-batch-4.0.0.jar:na]
	at co.cask.cdap.etl.batch.mapreduce.PipeTransformExecutor.runOneIteration(PipeTransformExecutor.java:42) ~[cdap-etl-batch-4.0.0.jar:na]
	at co.cask.cdap.etl.batch.mapreduce.TransformRunner.transform(TransformRunner.java:158) ~[cdap-etl-batch-4.0.0.jar:na]
	at co.cask.cdap.etl.batch.mapreduce.ETLMapReduce$ETLMapper.map(ETLMapReduce.java:358) ~[cdap-etl-batch-4.0.0.jar:na]
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) ~[org.apache.hadoop.hadoop-mapreduce-client-core-2.3.0.jar:na]
	at co.cask.cdap.internal.app.runtime.batch.MapperWrapper.run(MapperWrapper.java:119) ~[na:na]
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) ~[org.apache.hadoop.hadoop-mapreduce-client-core-2.3.0.jar:na]
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340) ~[org.apache.hadoop.hadoop-mapreduce-client-core-2.3.0.jar:na]
	at org.apache.hadoop.mapred.LocalJobRunnerWithFix$Job$MapTaskRunnable.run(LocalJobRunnerWithFix.java:243) ~[co.cask.cdap.cdap-app-fabric-4.0.0.jar:na]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_51]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_51]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_51]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_51]
	at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_51]
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: com.jayway.jsonpath.InvalidPathException: Could not parse token starting at position 1. Expected ?, ', 0-9, * 
	at co.cask.cdap.etl.batch.PipeTransformDetail.process(PipeTransformDetail.java:51) ~[cdap-etl-batch-4.0.0.jar:na]
	at co.cask.cdap.etl.batch.mapreduce.TransformEmitter.emit(TransformEmitter.java:49) ~[cdap-etl-batch-4.0.0.jar:na]
	at co.cask.cdap.etl.common.TrackedEmitter.emit(TrackedEmitter.java:45) ~[cdap-etl-core-4.0.0.jar:na]
	at co.cask.hydrator.plugin.batch.source.StreamBatchSource.transform(StreamBatchSource.java:175) ~[1487313144261-0/:na]
	at co.cask.hydrator.plugin.batch.source.StreamBatchSource.transform(StreamBatchSource.java:56) ~[1487313144261-0/:na]
	at co.cask.cdap.etl.common.TrackedTransform.transform(TrackedTransform.java:61) ~[cdap-etl-core-4.0.0.jar:na]
	at co.cask.cdap.etl.batch.PipeTransformDetail.process(PipeTransformDetail.java:46) ~[cdap-etl-batch-4.0.0.jar:na]
	... 13 common frames omitted
Caused by: java.lang.RuntimeException: com.jayway.jsonpath.InvalidPathException: Could not parse token starting at position 1. Expected ?, ', 0-9, * 
	at co.cask.cdap.etl.batch.PipeTransformDetail.process(PipeTransformDetail.java:51) ~[cdap-etl-batch-4.0.0.jar:na]
	at co.cask.cdap.etl.batch.mapreduce.TransformEmitter.emit(TransformEmitter.java:49) ~[cdap-etl-batch-4.0.0.jar:na]
	at co.cask.cdap.etl.common.TrackedEmitter.emit(TrackedEmitter.java:45) ~[cdap-etl-core-4.0.0.jar:na]
	at co.cask.hydrator.plugin.transform.ProjectionTransform.transform(ProjectionTransform.java:159) ~[1487313144261-0/:na]
	at co.cask.hydrator.plugin.transform.ProjectionTransform.transform(ProjectionTransform.java:51) ~[1487313144261-0/:na]
	at co.cask.cdap.etl.common.TrackedTransform.transform(TrackedTransform.java:61) ~[cdap-etl-core-4.0.0.jar:na]
	at co.cask.cdap.etl.batch.PipeTransformDetail.process(PipeTransformDetail.java:46) ~[cdap-etl-batch-4.0.0.jar:na]
	... 19 common frames omitted
Caused by: com.jayway.jsonpath.InvalidPathException: Could not parse token starting at position 1. Expected ?, ', 0-9, * 
	at com.jayway.jsonpath.internal.path.PathCompiler.fail(PathCompiler.java:607) ~[na:na]
	at com.jayway.jsonpath.internal.path.PathCompiler.readNextToken(PathCompiler.java:137) ~[na:na]
	at com.jayway.jsonpath.internal.path.PathCompiler.readContextToken(PathCompiler.java:118) ~[na:na]
	at com.jayway.jsonpath.internal.path.PathCompiler.compile(PathCompiler.java:54) ~[na:na]
	at com.jayway.jsonpath.internal.path.PathCompiler.compile(PathCompiler.java:69) ~[na:na]
	at com.jayway.jsonpath.JsonPath.<init>(JsonPath.java:101) ~[na:na]
	at com.jayway.jsonpath.JsonPath.compile(JsonPath.java:467) ~[na:na]
	at com.jayway.jsonpath.internal.JsonContext.read(JsonContext.java:149) ~[na:na]
	at com.jayway.jsonpath.JsonPath.read(JsonPath.java:488) ~[na:na]
	at co.cask.hydrator.plugin.JSONParser.transform(JSONParser.java:146) ~[1487313148284-0/:na]
	at co.cask.hydrator.plugin.JSONParser.transform(JSONParser.java:45) ~[1487313148284-0/:na]
	at co.cask.cdap.etl.common.TrackedTransform.transform(TrackedTransform.java:61) ~[cdap-etl-core-4.0.0.jar:na]
	at co.cask.cdap.etl.batch.PipeTransformDetail.process(PipeTransformDetail.java:46) ~[cdap-etl-batch-4.0.0.jar:na]
	... 25 common frames omitted



Could the CDAP team confirm whether square-bracket notation is really supported for mapping nested JSON?


Regards,

/Girish BK


Albert Shau

Feb 21, 2017, 5:18:52 PM
to cdap...@googlegroups.com
Hi Girish,

The doc for the JSON Parser has an error. Brackets are supported, but the element inside each bracket needs to be enclosed in single quotes. I've opened an issue, https://issues.cask.co/browse/HYDRATOR-1384, to fix the doc.

This means you can do something like $['employee']['first.name']
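
If you have many such fields, a small helper can build the quoted bracket form for you. The following is only a sketch: the class name BracketPath and the method of(...) are made up for illustration and are not part of the JSON Parser plugin or of the JsonPath library. It rejects names that contain a single quote or a backslash rather than guessing at how those should be escaped.

```java
// Hypothetical helper (not part of CDAP or JsonPath): builds a
// bracket-notation path with each member name wrapped in single quotes.
final class BracketPath {
    private BracketPath() {}

    /** Returns a path such as $['employee']['first.name'] for the given member names. */
    static String of(String... keys) {
        StringBuilder path = new StringBuilder("$");
        for (String key : keys) {
            // Assumption: names containing a quote or backslash would need
            // escaping, which this sketch deliberately does not attempt.
            if (key.indexOf('\'') >= 0 || key.indexOf('\\') >= 0) {
                throw new IllegalArgumentException(
                    "Member name needs escaping, not handled here: " + key);
            }
            path.append("['").append(key).append("']");
        }
        return path.toString();
    }

    public static void main(String[] args) {
        // prints $['employee']['first.name']
        System.out.println(of("employee", "first.name"));
    }
}
```

The resulting string is what you would paste into the field mapping in place of the dot-notation expression.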

Regards,
Albert
