[JIRA] (CDAP-18692) NullPointerException with Array of Records in BigQuery


Dani SA (Jira)

Dec 16, 2021, 9:17:52 AM12/16/21
to cdap-...@googlegroups.com
Dani SA created an issue
 
CDAP / Bug CDAP-18692
NullPointerException with Array of Records in BigQuery
Issue Type: Bug
Affects Versions: 6.5.1
Assignee: Prerna Bellara
Components: CDAP
Created: 16/Dec/21 6:17 AM
Labels: bug
Priority: Critical
Reporter: Dani SA

Uploading the JSON file directly into BigQuery through the Google Cloud Console works as intended. But loading the same file from a GCS bucket into a BigQuery table through CDAP throws a NullPointerException.

In the documentation, the table "Data Type Mappings from CDAP to BigQuery" maps array to REPEATED and record to STRUCT between CDAP and BigQuery respectively. My understanding is that it should therefore be possible to handle arrays of records in CDAP, i.e. repeated structs in BigQuery.

Sample data and schemas follow. I'm using the 6.5.1 sandbox with a two-node pipeline: a GCS source and a BigQuery sink.

Sample input file:
{"users": [
  {"name": "Aaaaaaa", "surname": "Bbbbbbb"},
  {"name": "Aaaaaaa", "surname": "Bbbbbbb"},
  {"name": "Aaaaaaa", "surname": "Bbbbbbb"}
], "id": 1639643540394}
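Independent of CDAP, the sample record parses as plain JSON and matches the intended shape, which rules out malformed input as the cause. A minimal stdlib check (Python, used here only for illustration):

```python
import json

# The sample record from this report: an array of records ("users")
# plus a top-level integer ("id").
sample = '''{"users": [
  {"name": "Aaaaaaa", "surname": "Bbbbbbb"},
  {"name": "Aaaaaaa", "surname": "Bbbbbbb"},
  {"name": "Aaaaaaa", "surname": "Bbbbbbb"}
], "id": 1639643540394}'''

record = json.loads(sample)

# The file is well-formed, so the NPE is not caused by broken input.
assert isinstance(record["users"], list)
assert all(set(u) == {"name", "surname"} for u in record["users"])
assert isinstance(record["id"], int)
print("sample input is well-formed:", len(record["users"]), "users")
```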

BQ schema
[
  { "description": "bq-datetime", "mode": "NULLABLE", "name": "id", "type": "INTEGER" },
  {
    "fields": [
      { "mode": "NULLABLE", "name": "surname", "type": "STRING" },
      { "mode": "NULLABLE", "name": "name", "type": "STRING" }
    ],
    "mode": "REPEATED",
    "name": "users",
    "type": "RECORD"
  }
]

CDAP schema
[
  {
    "name": "etlSchemaBody",
    "schema": {
      "type": "record",
      "name": "record",
      "fields": [
        {
          "name": "users",
          "type": [
            {
              "type": "array",
              "items": {
                "type": "record",
                "name": "users",
                "fields": [
                  { "name": "name", "type": [ "string", "null" ] },
                  { "name": "surname", "type": [ "string", "null" ] }
                ]
              }
            },
            "null"
          ]
        },
        { "name": "id", "type": [ "long", "null" ] }
      ]
    }
  }
]
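The trace below points at Objects.requireNonNull inside BigQuerySinkUtils.generateTableFieldSchema, reached while converting the users field. As a hypothetical sketch (not the actual plugin code), a converter that only knows simple type names comes back empty-handed for the nullable array-of-records branch, which is the Python analogue of that requireNonNull failing:

```python
# Hypothetical reconstruction of the failure mode, NOT the plugin source.
# A lookup table keyed on simple CDAP type names cannot resolve a union
# whose non-null branch is a complex (array/record) schema.
SIMPLE_TYPE_MAP = {"string": "STRING", "long": "INTEGER", "int": "INTEGER"}

def to_bq_type(cdap_type):
    """Resolve a CDAP field type to a BigQuery type name, or None."""
    if isinstance(cdap_type, list):  # union, e.g. ["string", "null"]
        cdap_type = next(t for t in cdap_type if t != "null")
    if isinstance(cdap_type, str):
        return SIMPLE_TYPE_MAP.get(cdap_type)
    # Complex schema (array of records): no simple name to look up.
    return SIMPLE_TYPE_MAP.get(cdap_type.get("type"))

# Leaf fields resolve fine; the nullable array of records resolves to None.
users_type = [{"type": "array", "items": {"type": "record"}}, "null"]
resolved = to_bq_type(users_type)
assert resolved is None  # a caller doing requireNonNull(resolved) would throw
```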

Error
2021-12-16 15:12:56,251 - INFO [WorkflowDriver:i.c.c.i.a.r.w.WorkflowDriver@623] - Starting workflow execution for 'DataPipelineWorkflow' with Run id '41b2f138-5e7a-11ec-96b9-0242ac120002'
2021-12-16 15:12:56,307 - INFO [action-phase-1-0:i.c.c.i.a.r.w.WorkflowDriver@337] - Starting Spark Program 'phase-1' in workflow
2021-12-16 15:12:56,390 - DEBUG [action-phase-1-0:i.c.c.a.r.s.SparkProgramRunner@219] - Starting Spark Job. Context: SparkRuntimeContext {id=program:default.Array_of_records_v2.-SNAPSHOT.spark.phase-1, runId=4417bc39-5e7a-11ec-90ab-0242ac120002}
2021-12-16 15:13:04,255 - DEBUG [SparkRunnerphase-1:i.c.p.g.b.s.AbstractBigQuerySink@137] - Init output for table 'array_of_records' with schema: {"type":"record","name":"record","fields":[{"name":"users","type":[{"type":"array","items":{"type":"record","name":"users","fields":[{"name":"name","type":["string","null"]},{"name":"surname","type":["string","null"]}]}},"null"]},{"name":"id","type":["long","null"]}]}
2021-12-16 15:13:04,581 - ERROR [SparkRunnerphase-1:i.c.c.i.a.r.ProgramControllerServiceAdapter@92] - Spark Program 'phase-1' failed.
org.apache.tephra.TransactionFailureException: Exception raised from TxRunnable.run() io.cdap.cdap.internal.app.runtime.AbstractContext$$Lambda$430/682280453@216cfcd3
at io.cdap.cdap.data2.transaction.Transactions$CacheBasedTransactional.finishExecute(Transactions.java:226) ~[na:na]
at io.cdap.cdap.data2.transaction.Transactions$CacheBasedTransactional.execute(Transactions.java:211) ~[na:na]
at io.cdap.cdap.internal.app.runtime.AbstractContext.execute(AbstractContext.java:514) ~[na:na]
at io.cdap.cdap.internal.app.runtime.AbstractContext.execute(AbstractContext.java:502) ~[na:na]
at io.cdap.cdap.app.runtime.spark.BasicSparkClientContext.execute(BasicSparkClientContext.java:342) ~[io.cdap.cdap.cdap-spark-core2_2.11-6.5.1.jar:na]
at io.cdap.cdap.etl.common.submit.SubmitterPlugin.prepareRun(SubmitterPlugin.java:69) ~[na:na]
at io.cdap.cdap.etl.common.submit.PipelinePhasePreparer.prepare(PipelinePhasePreparer.java:148) ~[na:na]
at io.cdap.cdap.etl.spark.AbstractSparkPreparer.prepare(AbstractSparkPreparer.java:87) ~[na:na]
at io.cdap.cdap.etl.spark.batch.SparkPreparer.prepare(SparkPreparer.java:87) ~[na:na]
at io.cdap.cdap.etl.spark.batch.ETLSpark.initialize(ETLSpark.java:120) ~[na:na]
at io.cdap.cdap.api.spark.AbstractSpark.initialize(AbstractSpark.java:131) ~[na:na]
at io.cdap.cdap.api.spark.AbstractSpark.initialize(AbstractSpark.java:33) ~[na:na]
at io.cdap.cdap.app.runtime.spark.SparkRuntimeService$2.initialize(SparkRuntimeService.java:167) ~[io.cdap.cdap.cdap-spark-core2_2.11-6.5.1.jar:na]
at io.cdap.cdap.app.runtime.spark.SparkRuntimeService$2.initialize(SparkRuntimeService.java:162) ~[io.cdap.cdap.cdap-spark-core2_2.11-6.5.1.jar:na]
at io.cdap.cdap.internal.app.runtime.AbstractContext.lambda$initializeProgram$6(AbstractContext.java:602) ~[na:na]
at io.cdap.cdap.internal.app.runtime.AbstractContext.execute(AbstractContext.java:562) ~[na:na]
at io.cdap.cdap.internal.app.runtime.AbstractContext.initializeProgram(AbstractContext.java:599) ~[na:na]
at io.cdap.cdap.app.runtime.spark.SparkRuntimeService.initialize(SparkRuntimeService.java:433) ~[io.cdap.cdap.cdap-spark-core2_2.11-6.5.1.jar:na]
at io.cdap.cdap.app.runtime.spark.SparkRuntimeService.startUp(SparkRuntimeService.java:208) ~[io.cdap.cdap.cdap-spark-core2_2.11-6.5.1.jar:na]
at com.google.common.util.concurrent.AbstractExecutionThreadService$1$1.run(AbstractExecutionThreadService.java:47) ~[com.google.guava.guava-13.0.1.jar:na]
at io.cdap.cdap.app.runtime.spark.SparkRuntimeService$5$1.run(SparkRuntimeService.java:404) [io.cdap.cdap.cdap-spark-core2_2.11-6.5.1.jar:na]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_292]
Caused by: java.lang.NullPointerException: null
at java.util.Objects.requireNonNull(Objects.java:203) ~[na:1.8.0_292]
at io.cdap.plugin.gcp.bigquery.sink.BigQuerySinkUtils.generateTableFieldSchema(BigQuerySinkUtils.java:282) ~[na:na]
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) ~[na:1.8.0_292]
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384) ~[na:1.8.0_292]
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) ~[na:1.8.0_292]
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) ~[na:1.8.0_292]
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) ~[na:1.8.0_292]
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[na:1.8.0_292]
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566) ~[na:1.8.0_292]
at io.cdap.plugin.gcp.bigquery.sink.BigQuerySinkUtils.getBigQueryTableFieldsFromSchema(BigQuerySinkUtils.java:264) ~[na:na]
at io.cdap.plugin.gcp.bigquery.sink.AbstractBigQuerySink.getBigQueryTableFields(AbstractBigQuerySink.java:389) ~[na:na]
at io.cdap.plugin.gcp.bigquery.sink.AbstractBigQuerySink.initOutput(AbstractBigQuerySink.java:139) ~[na:na]
at io.cdap.plugin.gcp.bigquery.sink.BigQuerySink.prepareRunInternal(BigQuerySink.java:114) ~[na:na]
at io.cdap.plugin.gcp.bigquery.sink.AbstractBigQuerySink.prepareRun(AbstractBigQuerySink.java:100) ~[na:na]
at io.cdap.plugin.gcp.bigquery.sink.AbstractBigQuerySink.prepareRun(AbstractBigQuerySink.java:60) ~[na:na]
at io.cdap.cdap.etl.common.plugin.WrappedBatchSink.lambda$prepareRun$0(WrappedBatchSink.java:52) ~[na:na]
at io.cdap.cdap.etl.common.plugin.Caller$1.call(Caller.java:30) ~[na:na]
at io.cdap.cdap.etl.common.plugin.WrappedBatchSink.prepareRun(WrappedBatchSink.java:51) ~[na:na]
at io.cdap.cdap.etl.common.plugin.WrappedBatchSink.prepareRun(WrappedBatchSink.java:37) ~[na:na]
at io.cdap.cdap.etl.common.submit.SubmitterPlugin.lambda$prepareRun$2(SubmitterPlugin.java:71) ~[na:na]
at io.cdap.cdap.internal.app.runtime.AbstractContext.lambda$execute$3(AbstractContext.java:517) ~[na:na]
at io.cdap.cdap.data2.transaction.Transactions$CacheBasedTransactional.finishExecute(Transactions.java:224) ~[na:na]
... 21 common frames omitted
2021-12-16 15:13:04,599 - ERROR [SparkRunnerphase-1:i.c.c.i.a.r.ProgramControllerServiceAdapter@93] - Spark program 'phase-1' failed with error: null. Please check the system logs for more details.
java.lang.NullPointerException: null
at java.util.Objects.requireNonNull(Objects.java:203) ~[na:1.8.0_292]
at io.cdap.plugin.gcp.bigquery.sink.BigQuerySinkUtils.generateTableFieldSchema(BigQuerySinkUtils.java:282) ~[na:na]
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) ~[na:1.8.0_292]
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384) ~[na:1.8.0_292]
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) ~[na:1.8.0_292]
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) ~[na:1.8.0_292]
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) ~[na:1.8.0_292]
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[na:1.8.0_292]
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566) ~[na:1.8.0_292]
at io.cdap.plugin.gcp.bigquery.sink.BigQuerySinkUtils.getBigQueryTableFieldsFromSchema(BigQuerySinkUtils.java:264) ~[na:na]
at io.cdap.plugin.gcp.bigquery.sink.AbstractBigQuerySink.getBigQueryTableFields(AbstractBigQuerySink.java:389) ~[na:na]
at io.cdap.plugin.gcp.bigquery.sink.AbstractBigQuerySink.initOutput(AbstractBigQuerySink.java:139) ~[na:na]
at io.cdap.plugin.gcp.bigquery.sink.BigQuerySink.prepareRunInternal(BigQuerySink.java:114) ~[na:na]
at io.cdap.plugin.gcp.bigquery.sink.AbstractBigQuerySink.prepareRun(AbstractBigQuerySink.java:100) ~[na:na]
at io.cdap.plugin.gcp.bigquery.sink.AbstractBigQuerySink.prepareRun(AbstractBigQuerySink.java:60) ~[na:na]
at io.cdap.cdap.etl.common.plugin.WrappedBatchSink.lambda$prepareRun$0(WrappedBatchSink.java:52) ~[na:na]
at io.cdap.cdap.etl.common.plugin.Caller$1.call(Caller.java:30) ~[na:na]
at io.cdap.cdap.etl.common.plugin.WrappedBatchSink.prepareRun(WrappedBatchSink.java:51) ~[na:na]
at io.cdap.cdap.etl.common.plugin.WrappedBatchSink.prepareRun(WrappedBatchSink.java:37) ~[na:na]
at io.cdap.cdap.etl.common.submit.SubmitterPlugin.lambda$prepareRun$2(SubmitterPlugin.java:71) ~[na:na]
at io.cdap.cdap.internal.app.runtime.AbstractContext.lambda$execute$3(AbstractContext.java:517) ~[na:na]
at io.cdap.cdap.data2.transaction.Transactions$CacheBasedTransactional.finishExecute(Transactions.java:224) ~[na:na]
at io.cdap.cdap.data2.transaction.Transactions$CacheBasedTransactional.execute(Transactions.java:211) ~[na:na]
at io.cdap.cdap.internal.app.runtime.AbstractContext.execute(AbstractContext.java:514) ~[na:na]
at io.cdap.cdap.internal.app.runtime.AbstractContext.execute(AbstractContext.java:502) ~[na:na]
at io.cdap.cdap.app.runtime.spark.BasicSparkClientContext.execute(BasicSparkClientContext.java:342) ~[io.cdap.cdap.cdap-spark-core2_2.11-6.5.1.jar:na]
at io.cdap.cdap.etl.common.submit.SubmitterPlugin.prepareRun(SubmitterPlugin.java:69) ~[na:na]
at io.cdap.cdap.etl.common.submit.PipelinePhasePreparer.prepare(PipelinePhasePreparer.java:148) ~[na:na]
at io.cdap.cdap.etl.spark.AbstractSparkPreparer.prepare(AbstractSparkPreparer.java:87) ~[na:na]
at io.cdap.cdap.etl.spark.batch.SparkPreparer.prepare(SparkPreparer.java:87) ~[na:na]
at io.cdap.cdap.etl.spark.batch.ETLSpark.initialize(ETLSpark.java:120) ~[na:na]
at io.cdap.cdap.api.spark.AbstractSpark.initialize(AbstractSpark.java:131) ~[na:na]
at io.cdap.cdap.api.spark.AbstractSpark.initialize(AbstractSpark.java:33) ~[na:na]
at io.cdap.cdap.app.runtime.spark.SparkRuntimeService$2.initialize(SparkRuntimeService.java:167) ~[io.cdap.cdap.cdap-spark-core2_2.11-6.5.1.jar:na]
at io.cdap.cdap.app.runtime.spark.SparkRuntimeService$2.initialize(SparkRuntimeService.java:162) ~[io.cdap.cdap.cdap-spark-core2_2.11-6.5.1.jar:na]
at io.cdap.cdap.internal.app.runtime.AbstractContext.lambda$initializeProgram$6(AbstractContext.java:602) ~[na:na]
at io.cdap.cdap.internal.app.runtime.AbstractContext.execute(AbstractContext.java:562) ~[na:na]
at io.cdap.cdap.internal.app.runtime.AbstractContext.initializeProgram(AbstractContext.java:599) ~[na:na]
at io.cdap.cdap.app.runtime.spark.SparkRuntimeService.initialize(SparkRuntimeService.java:433) ~[io.cdap.cdap.cdap-spark-core2_2.11-6.5.1.jar:na]
at io.cdap.cdap.app.runtime.spark.SparkRuntimeService.startUp(SparkRuntimeService.java:208) ~[io.cdap.cdap.cdap-spark-core2_2.11-6.5.1.jar:na]
at com.google.common.util.concurrent.AbstractExecutionThreadService$1$1.run(AbstractExecutionThreadService.java:47) ~[com.google.guava.guava-13.0.1.jar:na]
at io.cdap.cdap.app.runtime.spark.SparkRuntimeService$5$1.run(SparkRuntimeService.java:404) [io.cdap.cdap.cdap-spark-core2_2.11-6.5.1.jar:na]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_292]
2021-12-16 15:13:04,610 - ERROR [WorkflowDriver:i.c.c.d.SmartWorkflow@561] - Pipeline 'Array_of_records_v2' failed.
2021-12-16 15:13:04,686 - ERROR [WorkflowDriver:i.c.c.i.a.r.w.WorkflowProgramController@89] - Workflow service 'workflow.default.Array_of_records_v2.DataPipelineWorkflow.41b2f138-5e7a-11ec-96b9-0242ac120002' failed.
java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.apache.tephra.TransactionFailureException: Exception raised from TxRunnable.run() io.cdap.cdap.internal.app.runtime.AbstractContext$$Lambda$430/682280453@216cfcd3
at java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[na:1.8.0_292]
at java.util.concurrent.FutureTask.get(FutureTask.java:192) ~[na:1.8.0_292]
at io.cdap.cdap.internal.app.runtime.workflow.WorkflowDriver.executeAction(WorkflowDriver.java:344) ~[na:na]
at io.cdap.cdap.internal.app.runtime.workflow.WorkflowDriver.executeNode(WorkflowDriver.java:475) ~[na:na]
at io.cdap.cdap.internal.app.runtime.workflow.WorkflowDriver.executeAll(WorkflowDriver.java:641) ~[na:na]
at io.cdap.cdap.internal.app.runtime.workflow.WorkflowDriver.run(WorkflowDriver.java:626) ~[na:na]
at com.google.common.util.concurrent.AbstractExecutionThreadService$1$1.run(AbstractExecutionThreadService.java:52) ~[com.google.guava.guava-13.0.1.jar:na]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_292]
Caused by: java.lang.RuntimeException: org.apache.tephra.TransactionFailureException: Exception raised from TxRunnable.run() io.cdap.cdap.internal.app.runtime.AbstractContext$$Lambda$430/682280453@216cfcd3
at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[com.google.guava.guava-13.0.1.jar:na]
at io.cdap.cdap.internal.app.runtime.workflow.DefaultProgramWorkflowRunner$1.run(DefaultProgramWorkflowRunner.java:143) ~[na:na]
at io.cdap.cdap.internal.app.runtime.workflow.WorkflowDriver$1.call(WorkflowDriver.java:338) ~[na:na]
at io.cdap.cdap.internal.app.runtime.workflow.WorkflowDriver$1.call(WorkflowDriver.java:322) ~[na:na]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_292]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_292]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_292]
... 1 common frames omitted
Caused by: org.apache.tephra.TransactionFailureException: Exception raised from TxRunnable.run() io.cdap.cdap.internal.app.runtime.AbstractContext$$Lambda$430/682280453@216cfcd3
at io.cdap.cdap.data2.transaction.Transactions$CacheBasedTransactional.finishExecute(Transactions.java:226) ~[na:na]
at io.cdap.cdap.data2.transaction.Transactions$CacheBasedTransactional.execute(Transactions.java:211) ~[na:na]
at io.cdap.cdap.internal.app.runtime.AbstractContext.execute(AbstractContext.java:514) ~[na:na]
at io.cdap.cdap.internal.app.runtime.AbstractContext.execute(AbstractContext.java:502) ~[na:na]
at io.cdap.cdap.app.runtime.spark.BasicSparkClientContext.execute(BasicSparkClientContext.java:342) ~[na:na]
at io.cdap.cdap.etl.common.submit.SubmitterPlugin.prepareRun(SubmitterPlugin.java:69) ~[na:na]
at io.cdap.cdap.etl.common.submit.PipelinePhasePreparer.prepare(PipelinePhasePreparer.java:148) ~[na:na]
at io.cdap.cdap.etl.spark.AbstractSparkPreparer.prepare(AbstractSparkPreparer.java:87) ~[na:na]
at io.cdap.cdap.etl.spark.batch.SparkPreparer.prepare(SparkPreparer.java:87) ~[na:na]
at io.cdap.cdap.etl.spark.batch.ETLSpark.initialize(ETLSpark.java:120) ~[na:na]
at io.cdap.cdap.api.spark.AbstractSpark.initialize(AbstractSpark.java:131) ~[na:na]
at io.cdap.cdap.api.spark.AbstractSpark.initialize(AbstractSpark.java:33) ~[na:na]
at io.cdap.cdap.app.runtime.spark.SparkRuntimeService$2.initialize(SparkRuntimeService.java:167) ~[na:na]
at io.cdap.cdap.app.runtime.spark.SparkRuntimeService$2.initialize(SparkRuntimeService.java:162) ~[na:na]
at io.cdap.cdap.internal.app.runtime.AbstractContext.lambda$initializeProgram$6(AbstractContext.java:602) ~[na:na]
at io.cdap.cdap.internal.app.runtime.AbstractContext.execute(AbstractContext.java:562) ~[na:na]
at io.cdap.cdap.internal.app.runtime.AbstractContext.initializeProgram(AbstractContext.java:599) ~[na:na]
at io.cdap.cdap.app.runtime.spark.SparkRuntimeService.initialize(SparkRuntimeService.java:433) ~[na:na]
at io.cdap.cdap.app.runtime.spark.SparkRuntimeService.startUp(SparkRuntimeService.java:208) ~[na:na]
at com.google.common.util.concurrent.AbstractExecutionThreadService$1$1.run(AbstractExecutionThreadService.java:47) ~[com.google.guava.guava-13.0.1.jar:na]
at io.cdap.cdap.app.runtime.spark.SparkRuntimeService$5$1.run(SparkRuntimeService.java:404) ~[na:na]
... 1 common frames omitted
Caused by: java.lang.NullPointerException: null
at java.util.Objects.requireNonNull(Objects.java:203) ~[na:1.8.0_292]
at io.cdap.plugin.gcp.bigquery.sink.BigQuerySinkUtils.generateTableFieldSchema(BigQuerySinkUtils.java:282) ~[na:na]
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) ~[na:1.8.0_292]
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384) ~[na:1.8.0_292]
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) ~[na:1.8.0_292]
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) ~[na:1.8.0_292]
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) ~[na:1.8.0_292]
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[na:1.8.0_292]
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566) ~[na:1.8.0_292]
at io.cdap.plugin.gcp.bigquery.sink.BigQuerySinkUtils.getBigQueryTableFieldsFromSchema(BigQuerySinkUtils.java:264) ~[na:na]
at io.cdap.plugin.gcp.bigquery.sink.AbstractBigQuerySink.getBigQueryTableFields(AbstractBigQuerySink.java:389) ~[na:na]
at io.cdap.plugin.gcp.bigquery.sink.AbstractBigQuerySink.initOutput(AbstractBigQuerySink.java:139) ~[na:na]
at io.cdap.plugin.gcp.bigquery.sink.BigQuerySink.prepareRunInternal(BigQuerySink.java:114) ~[na:na]
at io.cdap.plugin.gcp.bigquery.sink.AbstractBigQuerySink.prepareRun(AbstractBigQuerySink.java:100) ~[na:na]
at io.cdap.plugin.gcp.bigquery.sink.AbstractBigQuerySink.prepareRun(AbstractBigQuerySink.java:60) ~[na:na]
at io.cdap.cdap.etl.common.plugin.WrappedBatchSink.lambda$prepareRun$0(WrappedBatchSink.java:52) ~[na:na]
at io.cdap.cdap.etl.common.plugin.Caller$1.call(Caller.java:30) ~[na:na]
at io.cdap.cdap.etl.common.plugin.WrappedBatchSink.prepareRun(WrappedBatchSink.java:51) ~[na:na]
at io.cdap.cdap.etl.common.plugin.WrappedBatchSink.prepareRun(WrappedBatchSink.java:37) ~[na:na]
at io.cdap.cdap.etl.common.submit.SubmitterPlugin.lambda$prepareRun$2(SubmitterPlugin.java:71) ~[na:na]
at io.cdap.cdap.internal.app.runtime.AbstractContext.lambda$execute$3(AbstractContext.java:517) ~[na:na]
at io.cdap.cdap.data2.transaction.Transactions$CacheBasedTransactional.finishExecute(Transactions.java:224) ~[na:na]
... 21 common frames omitted
2021-12-16 15:13:04,713 - DEBUG [pcontroller-program:default.Array_of_records_v2.-SNAPSHOT.workflow.DataPipelineWorkflow-41b2f138-5e7a-11ec-96b9-0242ac120002-3:i.c.c.a.r.AbstractProgramRuntimeService@564] - RuntimeInfo removed: program_run:default.Array_of_records_v2.-SNAPSHOT.workflow.DataPipelineWorkflow.41b2f138-5e7a-11ec-96b9-0242ac120002
2021-12-16 15:13:05,895 - DEBUG [provisioning-task-7:i.c.c.i.p.t.ProvisioningTask@116] - Completed DEPROVISION task for program run program_run:default.Array_of_records_v2.-SNAPSHOT.workflow.DataPipelineWorkflow.41b2f138-5e7a-11ec-96b9-0242ac120002.
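One workaround worth testing (an untested assumption, based only on the trace above, where the failure occurs while resolving the users union type): declare the users array itself non-nullable in the CDAP schema, keeping only the leaf fields nullable. A sketch of that field definition:

```json
{
  "name": "users",
  "type": {
    "type": "array",
    "items": {
      "type": "record",
      "name": "users",
      "fields": [
        { "name": "name", "type": [ "string", "null" ] },
        { "name": "surname", "type": [ "string", "null" ] }
      ]
    }
  }
}
```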


Dani SA (Jira)

Dec 16, 2021, 11:43:47 AM12/16/21
to cdap-...@googlegroups.com
Dani SA updated an issue
Change By: Dani SA
\ [na:1.8.0_292]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~
\ [na:1.8.0_292]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~
\ [na:1.8.0_292]
\ [na:na]
at io.cdap.cdap.data2.transaction.Transactions$CacheBasedTransactional.execute(Transactions.java:211) ~
\ [na:na]
at io.cdap.cdap.internal.app.runtime.AbstractContext.execute(AbstractContext.java:514) ~
\ [na:na]
at io.cdap.cdap.internal.app.runtime.AbstractContext.execute(AbstractContext.java:502) ~
\ [na:na]
at io.cdap.cdap.app.runtime.spark.BasicSparkClientContext.execute(BasicSparkClientContext.java:342) ~
\ [na:na]
at io.cdap.cdap.etl.common.submit.SubmitterPlugin.prepareRun(SubmitterPlugin.java:69) ~
\ [na:na]
at io.cdap.cdap.etl.common.submit.PipelinePhasePreparer.prepare(PipelinePhasePreparer.java:148) ~
\ [na:na]
at io.cdap.cdap.etl.spark.AbstractSparkPreparer.prepare(AbstractSparkPreparer.java:87) ~
\ [na:na]
at io.cdap.cdap.etl.spark.batch.SparkPreparer.prepare(SparkPreparer.java:87) ~
\ [na:na]
at io.cdap.cdap.etl.spark.batch.ETLSpark.initialize(ETLSpark.java:120) ~
\ [na:na]
at io.cdap.cdap.api.spark.AbstractSpark.initialize(AbstractSpark.java:131) ~
\ [na:na]
at io.cdap.cdap.api.spark.AbstractSpark.initialize(AbstractSpark.java:33) ~
\ [na:na]
at io.cdap.cdap.app.runtime.spark.SparkRuntimeService$2.initialize(SparkRuntimeService.java:167) ~
\ [na:na]
at io.cdap.cdap.app.runtime.spark.SparkRuntimeService$2.initialize(SparkRuntimeService.java:162) ~
\ [na:na]
at io.cdap.cdap.internal.app.runtime.AbstractContext.lambda$initializeProgram$6(AbstractContext.java:602) ~
\ [na:na]
at io.cdap.cdap.internal.app.runtime.AbstractContext.execute(AbstractContext.java:562) ~
\ [na:na]
at io.cdap.cdap.internal.app.runtime.AbstractContext.initializeProgram(AbstractContext.java:599) ~
\ [na:na]
at io.cdap.cdap.app.runtime.spark.SparkRuntimeService.initialize(SparkRuntimeService.java:433) ~
\ [na:na]
at io.cdap.cdap.app.runtime.spark.SparkRuntimeService.startUp(SparkRuntimeService.java:208) ~
\ [na:na]
at com.google.common.util.concurrent.AbstractExecutionThreadService$1$1.run(AbstractExecutionThreadService.java:47) ~
\ [com.google.guava.guava-13.0.1.jar:na]
at io.cdap.cdap.app.runtime.spark.SparkRuntimeService$5$1.run(SparkRuntimeService.java:404) ~
\ [na:na]
\ [na:1.8.0_292]
at io.cdap.plugin.gcp.bigquery.sink.BigQuerySinkUtils.generateTableFieldSchema(BigQuerySinkUtils.java:282) ~
\ [na:na]
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) ~
\ [na:1.8.0_292]
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384) ~
\ [na:1.8.0_292]
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) ~
\ [na:1.8.0_292]
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) ~
\ [na:1.8.0_292]
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) ~
\ [na:1.8.0_292]
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~
\ [na:1.8.0_292]
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566) ~
\ [na:1.8.0_292]
at io.cdap.plugin.gcp.bigquery.sink.BigQuerySinkUtils.getBigQueryTableFieldsFromSchema(BigQuerySinkUtils.java:264) ~
\ [na:na]
at io.cdap.plugin.gcp.bigquery.sink.AbstractBigQuerySink.getBigQueryTableFields(AbstractBigQuerySink.java:389) ~
\ [na:na]
at io.cdap.plugin.gcp.bigquery.sink.AbstractBigQuerySink.initOutput(AbstractBigQuerySink.java:139) ~
\ [na:na]
at io.cdap.plugin.gcp.bigquery.sink.BigQuerySink.prepareRunInternal(BigQuerySink.java:114) ~
\ [na:na]
at io.cdap.plugin.gcp.bigquery.sink.AbstractBigQuerySink.prepareRun(AbstractBigQuerySink.java:100) ~
\ [na:na]
at io.cdap.plugin.gcp.bigquery.sink.AbstractBigQuerySink.prepareRun(AbstractBigQuerySink.java:60) ~
\ [na:na]
at io.cdap.cdap.etl.common.plugin.WrappedBatchSink.lambda$prepareRun$0(WrappedBatchSink.java:52) ~
\ [na:na]
at io.cdap.cdap.etl.common.plugin.Caller$1.call(Caller.java:30) ~
\ [na:na]
at io.cdap.cdap.etl.common.plugin.WrappedBatchSink.prepareRun(WrappedBatchSink.java:51) ~
\ [na:na]
at io.cdap.cdap.etl.common.plugin.WrappedBatchSink.prepareRun(WrappedBatchSink.java:37) ~
\ [na:na]
at io.cdap.cdap.etl.common.submit.SubmitterPlugin.lambda$prepareRun$2(SubmitterPlugin.java:71) ~
\ [na:na]
at io.cdap.cdap.internal.app.runtime.AbstractContext.lambda$execute$3(AbstractContext.java:517) ~
\ [na:na]
at io.cdap.cdap.data2.transaction.Transactions$CacheBasedTransactional.finishExecute(Transactions.java:224) ~
\ [na:na]
\ [pcontroller-program:default.Array_of_records_v2.-SNAPSHOT.workflow.DataPipelineWorkflow-41b2f138-5e7a-11ec-96b9-0242ac120002-3:i.c.c.a.r.AbstractProgramRuntimeService@564] - RuntimeInfo removed: program_run:default.Array_of_records_v2.-SNAPSHOT.workflow.DataPipelineWorkflow.41b2f138-5e7a-11ec-96b9-0242ac120002
\ [provisioning-task-7:i.c.c.i.p.t.ProvisioningTask@116] - Completed DEPROVISION task for program run program_run:default.Array_of_records_v2.-SNAPSHOT.workflow.DataPipelineWorkflow.41b2f138-5e7a-11ec-96b9-0242ac120002.
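The trace bottoms out in the sink's schema translation, io.cdap.plugin.gcp.bigquery.sink.BigQuerySinkUtils.generateTableFieldSchema. For context, here is a minimal sketch of what that translation has to do for this pipeline: unwrap the CDAP nullable union ["array", "null"], then map array<record> to a REPEATED RECORD field. This is plain Python with invented helper names, not the plugin's actual (Java) code:

```python
# Sketch of CDAP -> BigQuery schema translation for an array-of-records
# field. Helper names are hypothetical; the real logic lives in Java in
# BigQuerySinkUtils.

CDAP_TO_BQ = {"string": "STRING", "long": "INTEGER", "int": "INTEGER",
              "double": "FLOAT", "boolean": "BOOLEAN"}

def unwrap_nullable(t):
    """CDAP nullable types are Avro-style unions like ["string", "null"].
    Returns (non-null type, is_nullable)."""
    if isinstance(t, list):
        non_null = [x for x in t if x != "null"]
        return non_null[0], True
    return t, False

def to_bq_field(name, cdap_type):
    t, nullable = unwrap_nullable(cdap_type)
    mode = "NULLABLE" if nullable else "REQUIRED"
    if isinstance(t, dict) and t.get("type") == "array":
        item, _ = unwrap_nullable(t["items"])
        inner = to_bq_field(name, item)
        inner["mode"] = "REPEATED"  # array<record> -> REPEATED RECORD
        return inner
    if isinstance(t, dict) and t.get("type") == "record":
        return {"name": name, "type": "RECORD", "mode": mode,
                "fields": [to_bq_field(f["name"], f["type"])
                           for f in t["fields"]]}
    return {"name": name, "type": CDAP_TO_BQ[t], "mode": mode}

# The "users" field exactly as declared in the CDAP schema above.
users_type = {"type": "array",
              "items": {"type": "record", "name": "users",
                        "fields": [{"name": "name", "type": ["string", "null"]},
                                   {"name": "surname", "type": ["string", "null"]}]}}
bq = to_bq_field("users", [users_type, "null"])
print(bq["mode"], bq["type"])  # prints: REPEATED RECORD
```

On this input the sketch yields a REPEATED RECORD with two NULLABLE STRING subfields, matching the BQ schema in the report. A translation that skips the union unwrapping for complex types would instead operate on the raw union and dereference a null, which is one plausible way an NPE like this one arises; the actual root cause is whatever is at BigQuerySinkUtils.java:282 in the 6.5.1 plugin.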

Sebastian Echegaray (Jira)
Dec 22, 2021, 5:03:43 PM
to cdap-...@googlegroups.com
Change By: Sebastian Echegaray
Assignee: Prerna Bellara → Sebastian Echegaray
Get Jira notifications on your phone! Download the Jira Cloud app for Android or iOS
This message was sent by Atlassian Jira (v1001.0.0-SNAPSHOT#100187-sha1:76abc0a)

Sebastian Echegaray (Jira)
Dec 22, 2021, 5:53:36 PM
to cdap-...@googlegroups.com

Sebastian Echegaray (Jira)
Dec 22, 2021, 5:54:30 PM
to cdap-...@googlegroups.com
Sebastian Echegaray resolved as Fixed
 
Change By: Sebastian Echegaray
Fix versions: 6.7.0
Resolution: Fixed
Status: Open → Resolved

Robin Rielley (Jira)
Dec 23, 2021, 1:42:14 PM
to cdap-...@googlegroups.com
Robin Rielley commented on Bug CDAP-18692
 
Re: NullPointerException with Array of Records in BigQuery

Sebastian Echegaray: This looks like we need a release note. It looks like a problem with BQ sinks, but your PR says “Allow the BigQuery Plugin to load Array and Structs from BigQuery and convert them to StructureRecord to pass them downstream.”

Can you please provide a little more info for the release notes?
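The quoted PR description concerns the opposite direction: reading REPEATED RECORD columns out of BigQuery and passing them downstream as StructuredRecords. A rough, hypothetical sketch of that conversion in plain Python, using the schemas and sample data from this issue (function and variable names are invented, not the plugin's API):

```python
# Hypothetical sketch: turning a BigQuery row containing a REPEATED
# RECORD column into a nested, schema-driven record (roughly what
# "convert them to StructureRecord" means in the quoted PR).

def build_record(schema_fields, row):
    out = {}
    for f in schema_fields:
        value = row.get(f["name"])
        if f.get("mode") == "REPEATED":
            # A REPEATED RECORD arrives as a list of dicts; recurse per element.
            out[f["name"]] = [build_record(f["fields"], v) for v in (value or [])]
        elif f["type"] == "RECORD":
            out[f["name"]] = build_record(f["fields"], value) if value else None
        else:
            out[f["name"]] = value
    return out

# BQ schema and sample row from the issue description.
bq_schema = [
    {"name": "id", "type": "INTEGER", "mode": "NULLABLE"},
    {"name": "users", "type": "RECORD", "mode": "REPEATED",
     "fields": [{"name": "name", "type": "STRING", "mode": "NULLABLE"},
                {"name": "surname", "type": "STRING", "mode": "NULLABLE"}]},
]
row = {"id": 1639643540394,
       "users": [{"name": "Aaaaaaa", "surname": "Bbbbbbb"}] * 3}
rec = build_record(bq_schema, row)
print(len(rec["users"]), rec["users"][0]["name"])  # prints: 3 Aaaaaaa
```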


Dani SA (Jira)
Jan 11, 2022, 8:06:34 AM
to cdap-...@googlegroups.com

Thanks for your feedback. Would it be possible to get a sandbox for version 6.7.0?


Robin Rielley (Jira)
Jan 11, 2022, 10:27:56 AM
to cdap-...@googlegroups.com

Dani SA 6.7 builds are in the dev branch: https://builds.cask.co/browse/CDAP-BUT


Dani SA (Jira)
Jan 12, 2022, 9:51:48 AM
to cdap-...@googlegroups.com

Robin Rielley, would it be possible to get build 6.7.0 as a zip file or a Docker image? The build process ends with this error regardless of the OS used (Windows/Ubuntu):

[INFO] CDAP UI ............................................ FAILURE [05:22 min]
[INFO] CDAP Operational Stats Core ........................ SKIPPED
[INFO] CDAP Google Cloud Dataproc Runtime Extension ....... SKIPPED
[INFO] CDAP Amazon EMR Runtime Extension .................. SKIPPED
[INFO] CDAP Remote Hadoop Runtime Extension ............... SKIPPED
[INFO] CDAP Google Cloud KMS Secure Store Extension ....... SKIPPED
[INFO] CDAP Standalone .................................... SKIPPED
[INFO] CDAP Java Client Tests ............................. SKIPPED
[INFO] CDAP CLI Tests ..................................... SKIPPED
[INFO] CDAP Master ........................................ SKIPPED
[INFO] CDAP Integration Test Framework .................... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  01:57 h
[INFO] Finished at: 2022-01-12T14:27:08+01:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal com.github.eirslett:frontend-maven-plugin:1.6:yarn (install-node-gyp) on project cdap-ui: Failed to run task: 'yarn global add node...@8.3.0' failed. org.apache.commons.exec.ExecuteException: Process exited with an error: 1 (Exit value: 1) -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <args> -rf :cdap-ui

Robin Rielley (Jira)
Jan 12, 2022, 12:30:27 PM
to cdap-...@googlegroups.com

Prerna Bellara (Jira)
Apr 25, 2022, 4:34:54 PM
to cdap-...@googlegroups.com
Prerna Bellara updated an issue
 
Change By: Prerna Bellara
Priority: Critical → Blocker

Sebastian Echegaray (Jira)
May 3, 2022, 11:42:13 AM
to cdap-...@googlegroups.com
Sebastian Echegaray updated an issue
Change By: Sebastian Echegaray
Release Notes: Fixed a bug that caused NullPointerExceptions when handling arrays of records in BigQuery.

Sebastian Echegaray (Jira)
May 23, 2022, 12:40:37 PM
to cdap-...@googlegroups.com
Sebastian Echegaray closed an issue as Fixed
Change By: Sebastian Echegaray
Triaged: Yes
Status: Resolved → Closed