Error in Transforming your Data section of Introduction to CDAP

Peter Quirk

Sep 29, 2016, 9:44:14 AM
to CDAP User
I have been working my way through the Introduction to CDAP, initially with the 3.5.1 standalone (JVM) and now with the 4.0.0-SNAPSHOT standalone (JVM), both on a Windows PC. In both cases I ran into an error at the following step.


cdap (http://USORH-PQUIRK1:11015/namespace:default)> load stream logEventStream examples/resources/accesslog.txt
Successfully loaded file to stream 'logEventStream'

cdap (http://USORH-PQUIRK1:11015/namespace:default)> list dataset instances
+==============================================================================================+
| name | type | description |
+==============================================================================================+
| _auditLog | co.cask.tracker.entity.AuditLogTable | |
| _kafkaOffset | co.cask.cdap.api.dataset.lib.KeyValueTable | |
| _auditMetrics | co.cask.tracker.entity.AuditMetricsCube | |
| _auditTagsTable | co.cask.tracker.entity.AuditTagsTable | |
| _timeSinceTable | co.cask.tracker.entity.LatestEntityTable | |
| logEventStream_converted | co.cask.cdap.api.dataset.lib.TimePartitionedFileSet | |
+==============================================================================================+

cdap (http://USORH-PQUIRK1:11015/namespace:default)> execute 'describe dataset_logEventStream_converted'
Error: co.cask.cdap.explore.service.ExploreException: Cannot execute query. Reason: Response code: 400, message: 'Bad Request', body: '[SQLState 42S02] Error while compiling statement: FAILED: SemanticException [Error 10001]: Table not found dataset_logEventStream_converted'

cdap (http://USORH-PQUIRK1:11015/namespace:default)>

If I inspect the pipeline logs in Hydrator, I see several audit errors related to the absence of a Kafka producer, and the error "Input record does not contain the date field".

2016-09-29 09:25:01,615 - INFO  [NettyHttpService STARTING:c.c.h.NettyHttpService@257] - Starting WorkflowDriver http service on address /127.0.0.1:0...
2016-09-29 09:25:01,618 - INFO  [NettyHttpService STARTING:c.c.h.NettyHttpService@262] - Started WorkflowDriver http service on address /127.0.0.1:60823
2016-09-29 09:25:01,663 - INFO  [WorkflowDriver:c.c.c.i.a.r.w.WorkflowProgramController@78] - Workflow service workflow.default.logEventStreamConverter.DataPipelineWorkflow.1f3039fa-8648-11e6-9fd2-000000341358 announced at /127.0.0.1:60823
2016-09-29 09:25:01,663 - INFO  [WorkflowDriver:c.c.c.i.a.r.w.WorkflowDriver@554] - Start workflow execution for DataPipelineWorkflow
2016-09-29 09:25:01,664 - DEBUG [WorkflowDriver:c.c.c.i.a.r.w.WorkflowDriver@555] - Workflow specification is WorkflowSpecification{className='co.cask.cdap.datapipeline.SmartWorkflow', name='DataPipelineWorkflow', description='Data Pipeline Workflow', properties={pipeline.spec={"endingActions":[],"driverResources":{"virtualCores":1,"memoryMB":512},"stages":[{"name":"Projection","plugin":{"type":"transform","name":"Projection","properties":{"drop":"headers"},"artifact":{"name":"core-plugins","version":{"version":"1.5.0-SNAPSHOT","major":1,"minor":5,"fix":0,"suffix":"SNAPSHOT"},"scope":"SYSTEM"}},"inputSchemas":{},"inputs":["Stream"],"outputs":["TPFSAvro"]},{"name":"TPFSAvro","plugin":{"type":"batchsink","name":"TPFSAvro","properties":{"schema":"{\r\n              \"type\":\"record\",\r\n              \"name\":\"etlSchemaBody\",\r\n              \"fields\":[\r\n                {\"name\":\"ts\",\"type\":\"long\"},\r\n                {\"name\":\"remote_host\",\"type\":[\"string\",\"null\"]},\r\n                {\"name\":\"remote_login\",\"type\":[\"string\",\"null\"]},\r\n                {\"name\":\"auth_user\",\"type\":[\"string\",\"null\"]},\r\n                {\"name\":\"date\",\"type\":[\"string\",\"null\"]},\r\n                {\"name\":\"request\",\"type\":[\"string\",\"null\"]},\r\n                {\"name\":\"status\",\"type\":[\"int\",\"null\"]},\r\n                {\"name\":\"content_length\",\"type\":[\"int\",\"null\"]},\r\n                {\"name\":\"referrer\",\"type\":[\"string\",\"null\"]},\r\n                {\"name\":\"user_agent\",\"type\":[\"string\",\"null\"]}]}","name":"logEventStream_converted","basePath":"logEventStream_converted"},"artifact":{"name":"core-plugins","version":{"version":"1.5.0-SNAPSHOT","major":1,"minor":5,"fix":0,"suffix":"SNAPSHOT"},"scope":"SYSTEM"}},"inputSchemas":{},"inputs":["Projection"],"outputs":[]},{"name":"Stream","plugin":{"type":"batchsource","name":"Stream","properties":{"format":"clf","name":"logEventStream","duration":"5m"},"artifact":{"name":"core-plugins","version":{"version":"1.5.0-SNAPSHOT","major":1,"minor":5,"fix":0,"suffix":"SNAPSHOT"},"scope":"SYSTEM"}},"inputSchemas":{},"inputs":[],"outputs":["Projection"]}],"connections":[{"from":"Stream","to":"Projection"},{"from":"Projection","to":"TPFSAvro"}],"resources":{"virtualCores":1,"memoryMB":512},"stageLoggingEnabled":true}}, nodes=[WorkflowActionNode{nodeId=phase-1program=ScheduleProgramInfo{programName='phase-1', programType=MAPREDUCE}, actionSpecification=null, customActionSpecification=null}], localDatasetSpecs={}}
2016-09-29 09:25:01,824 - INFO  [action-phase-1-0:c.c.c.i.w.ProgramWorkflowAction@62] - Starting Program for workflow action: phase-1
2016-09-29 09:25:01,852 - INFO  [action-phase-1-0:c.c.c.i.a.r.b.MapReduceProgramRunner@198] - Starting MapReduce Job: name=phase-1, jobId=null, namespaceId=default, applicationId=logEventStreamConverter, program=phase-1, runid=202918eb-8648-11e6-975a-0000001e3ac4
2016-09-29 09:25:07,150 - INFO  [MapReduceRunner-phase-1:c.c.h.p.b.s.StreamBatchSource@124] - Setting input to Stream : logEventStream
2016-09-29 09:25:07,201 - INFO  [MapReduceRunner-phase-1:c.c.h.p.b.s.TimePartitionedFileSetSink@87] - Writing to output partition of time 1475155500000.
2016-09-29 09:25:07,207 - ERROR [MapReduceRunner-phase-1:c.c.c.d.a.KafkaAuditPublisher@74] - Got exception publishing audit message AuditMessage{version=1, time=1475155507206, entityId=dataset:default.logEventStream_converted, user='', type=ACCESS, payload=AccessPayload{accessType=WRITE, accessor=program_run:default.logEventStreamConverter.-SNAPSHOT.mapreduce.phase-1.202918eb-8648-11e6-975a-0000001e3ac4} AuditPayload{}}. Exception:
java.util.concurrent.ExecutionException: java.lang.IllegalStateException: No kafka producer available.
	at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:294) ~[com.google.guava.guava-13.0.1.jar:na]
	at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:281) ~[com.google.guava.guava-13.0.1.jar:na]
	at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) ~[com.google.guava.guava-13.0.1.jar:na]
	at co.cask.cdap.data2.audit.KafkaAuditPublisher.publish(KafkaAuditPublisher.java:72) ~[co.cask.cdap.cdap-data-fabric-4.0.0-SNAPSHOT.jar:na]
	at co.cask.cdap.data2.audit.AuditPublishers.publishAccess(AuditPublishers.java:77) [co.cask.cdap.cdap-data-fabric-4.0.0-SNAPSHOT.jar:na]
	at co.cask.cdap.data2.metadata.writer.LineageWriterDatasetFramework.publishAudit(LineageWriterDatasetFramework.java:215) [co.cask.cdap.cdap-data-fabric-4.0.0-SNAPSHOT.jar:na]
	at co.cask.cdap.data2.metadata.writer.LineageWriterDatasetFramework.writeLineage(LineageWriterDatasetFramework.java:190) [co.cask.cdap.cdap-data-fabric-4.0.0-SNAPSHOT.jar:na]
	at co.cask.cdap.data2.dataset2.ForwardingDatasetFramework.writeLineage(ForwardingDatasetFramework.java:169) [co.cask.cdap.cdap-data-fabric-4.0.0-SNAPSHOT.jar:na]
	at co.cask.cdap.internal.app.runtime.workflow.NameMappedDatasetFramework.writeLineage(NameMappedDatasetFramework.java:154) [co.cask.cdap.cdap-app-fabric-4.0.0-SNAPSHOT.jar:na]
	at co.cask.cdap.data.dataset.SystemDatasetInstantiator.writeLineage(SystemDatasetInstantiator.java:111) [co.cask.cdap.cdap-data-fabric-4.0.0-SNAPSHOT.jar:na]
	at co.cask.cdap.data2.dataset2.SingleThreadDatasetCache$LineageRecordingDatasetCache.get(SingleThreadDatasetCache.java:144) [co.cask.cdap.cdap-data-fabric-4.0.0-SNAPSHOT.jar:na]
	at co.cask.cdap.data2.dataset2.SingleThreadDatasetCache$LineageRecordingDatasetCache.get(SingleThreadDatasetCache.java:128) [co.cask.cdap.cdap-data-fabric-4.0.0-SNAPSHOT.jar:na]
	at co.cask.cdap.data2.dataset2.SingleThreadDatasetCache.getDataset(SingleThreadDatasetCache.java:171) [co.cask.cdap.cdap-data-fabric-4.0.0-SNAPSHOT.jar:na]
	at co.cask.cdap.data2.dataset2.DynamicDatasetCache.getDataset(DynamicDatasetCache.java:202) [co.cask.cdap.cdap-data-fabric-4.0.0-SNAPSHOT.jar:na]
	at co.cask.cdap.data2.dataset2.DynamicDatasetCache.getDataset(DynamicDatasetCache.java:161) [co.cask.cdap.cdap-data-fabric-4.0.0-SNAPSHOT.jar:na]
	at co.cask.cdap.internal.app.runtime.AbstractContext.getDataset(AbstractContext.java:238) [co.cask.cdap.cdap-app-fabric-4.0.0-SNAPSHOT.jar:na]
	at co.cask.cdap.internal.app.runtime.batch.BasicMapReduceContext.addOutput(BasicMapReduceContext.java:296) [co.cask.cdap.cdap-app-fabric-4.0.0-SNAPSHOT.jar:na]
	at co.cask.cdap.etl.batch.mapreduce.MapReduceSinkContext$3.call(MapReduceSinkContext.java:85) [cdap-etl-batch-4.0.0-SNAPSHOT.jar:na]
	at co.cask.cdap.etl.batch.mapreduce.MapReduceSinkContext$3.call(MapReduceSinkContext.java:81) [cdap-etl-batch-4.0.0-SNAPSHOT.jar:na]
	at co.cask.cdap.etl.log.LogContext.runUnchecked(LogContext.java:145) [cdap-etl-core-4.0.0-SNAPSHOT.jar:na]
	at co.cask.cdap.etl.log.LogContext.runWithoutLoggingUnchecked(LogContext.java:139) [cdap-etl-core-4.0.0-SNAPSHOT.jar:na]
	at co.cask.cdap.etl.batch.mapreduce.MapReduceSinkContext.addOutput(MapReduceSinkContext.java:81) [cdap-etl-batch-4.0.0-SNAPSHOT.jar:na]
	at co.cask.hydrator.plugin.batch.sink.TimePartitionedFileSetSink.prepareRun(TimePartitionedFileSetSink.java:93) [1475155501857-0/:na]
	at co.cask.hydrator.plugin.batch.sink.TimePartitionedFileSetSink.prepareRun(TimePartitionedFileSetSink.java:42) [1475155501857-0/:na]
	at co.cask.cdap.etl.batch.LoggedBatchConfigurable$1.call(LoggedBatchConfigurable.java:44) [cdap-etl-batch-4.0.0-SNAPSHOT.jar:na]
	at co.cask.cdap.etl.batch.LoggedBatchConfigurable$1.call(LoggedBatchConfigurable.java:41) [cdap-etl-batch-4.0.0-SNAPSHOT.jar:na]
	at co.cask.cdap.etl.log.LogContext.run(LogContext.java:59) [cdap-etl-core-4.0.0-SNAPSHOT.jar:na]
	at co.cask.cdap.etl.batch.LoggedBatchConfigurable.prepareRun(LoggedBatchConfigurable.java:41) [cdap-etl-batch-4.0.0-SNAPSHOT.jar:na]
	at co.cask.cdap.etl.batch.mapreduce.ETLMapReduce.initialize(ETLMapReduce.java:206) [cdap-etl-batch-4.0.0-SNAPSHOT.jar:na]
	at co.cask.cdap.api.mapreduce.AbstractMapReduce.initialize(AbstractMapReduce.java:171) [co.cask.cdap.cdap-api-4.0.0-SNAPSHOT.jar:na]
	at co.cask.cdap.api.mapreduce.AbstractMapReduce.initialize(AbstractMapReduce.java:33) [co.cask.cdap.cdap-api-4.0.0-SNAPSHOT.jar:na]
	at co.cask.cdap.internal.app.runtime.batch.MapReduceRuntimeService$2.call(MapReduceRuntimeService.java:506) [co.cask.cdap.cdap-app-fabric-4.0.0-SNAPSHOT.jar:na]
	at co.cask.cdap.internal.app.runtime.batch.MapReduceRuntimeService$2.call(MapReduceRuntimeService.java:499) [co.cask.cdap.cdap-app-fabric-4.0.0-SNAPSHOT.jar:na]
	at co.cask.cdap.data2.transaction.Transactions.execute(Transactions.java:165) [co.cask.cdap.cdap-data-fabric-4.0.0-SNAPSHOT.jar:na]
	at co.cask.cdap.internal.app.runtime.batch.MapReduceRuntimeService.beforeSubmit(MapReduceRuntimeService.java:499) [co.cask.cdap.cdap-app-fabric-4.0.0-SNAPSHOT.jar:na]
	at co.cask.cdap.internal.app.runtime.batch.MapReduceRuntimeService.startUp(MapReduceRuntimeService.java:217) [co.cask.cdap.cdap-app-fabric-4.0.0-SNAPSHOT.jar:na]
	at com.google.common.util.concurrent.AbstractExecutionThreadService$1$1.run(AbstractExecutionThreadService.java:47) [com.google.guava.guava-13.0.1.jar:na]
	at co.cask.cdap.internal.app.runtime.batch.MapReduceRuntimeService$1$1.run(MapReduceRuntimeService.java:417) [co.cask.cdap.cdap-app-fabric-4.0.0-SNAPSHOT.jar:na]
	at java.lang.Thread.run(Unknown Source) [na:1.8.0_101]
Caused by: java.lang.IllegalStateException: No kafka producer available.
	at org.apache.twill.internal.kafka.client.SimpleKafkaPublisher$SimplePreparer.send(SimpleKafkaPublisher.java:120) ~[org.apache.twill.twill-core-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT]
	... 36 common frames omitted
2016-09-29 09:25:07,212 - DEBUG [MapReduceRunner-phase-1:c.c.c.d.m.w.BasicLineageWriter@63] - Writing access for run program_run:default.logEventStreamConverter.-SNAPSHOT.mapreduce.phase-1.202918eb-8648-11e6-975a-0000001e3ac4, dataset dataset:default.logEventStream_converted, accessType WRITE, component null, accessTime = 1475155507212
2016-09-29 09:25:07,284 - DEBUG [MapReduceRunner-phase-1:c.c.c.d.m.w.BasicLineageWriter@81] - Writing access for run program_run:default.logEventStreamConverter.-SNAPSHOT.mapreduce.phase-1.202918eb-8648-11e6-975a-0000001e3ac4, stream stream:default.logEventStream, accessType READ, component null, accessTime = 1475155507284
2016-09-29 09:25:07,299 - ERROR [MapReduceRunner-phase-1:c.c.c.d.a.KafkaAuditPublisher@74] - Got exception publishing audit message AuditMessage{version=1, time=1475155507299, entityId=stream:default.logEventStream, user='', type=ACCESS, payload=AccessPayload{accessType=READ, accessor=program_run:default.logEventStreamConverter.-SNAPSHOT.mapreduce.phase-1.202918eb-8648-11e6-975a-0000001e3ac4} AuditPayload{}}. Exception:
java.util.concurrent.ExecutionException: java.lang.IllegalStateException: No kafka producer available.
	at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:294) ~[com.google.guava.guava-13.0.1.jar:na]
	at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:281) ~[com.google.guava.guava-13.0.1.jar:na]
	at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) ~[com.google.guava.guava-13.0.1.jar:na]
	at co.cask.cdap.data2.audit.KafkaAuditPublisher.publish(KafkaAuditPublisher.java:72) ~[co.cask.cdap.cdap-data-fabric-4.0.0-SNAPSHOT.jar:na]
	at co.cask.cdap.data2.audit.AuditPublishers.publishAccess(AuditPublishers.java:72) [co.cask.cdap.cdap-data-fabric-4.0.0-SNAPSHOT.jar:na]
	at co.cask.cdap.data2.transaction.stream.FileStreamAdmin.addAccess(FileStreamAdmin.java:593) [co.cask.cdap.cdap-data-fabric-4.0.0-SNAPSHOT.jar:na]
	at co.cask.cdap.internal.app.runtime.batch.MapReduceRuntimeService.setDecoderForStream(MapReduceRuntimeService.java:701) [co.cask.cdap.cdap-app-fabric-4.0.0-SNAPSHOT.jar:na]
	at co.cask.cdap.internal.app.runtime.batch.MapReduceRuntimeService.setInputsIfNeeded(MapReduceRuntimeService.java:664) [co.cask.cdap.cdap-app-fabric-4.0.0-SNAPSHOT.jar:na]
	at co.cask.cdap.internal.app.runtime.batch.MapReduceRuntimeService.access$200(MapReduceRuntimeService.java:135) [co.cask.cdap.cdap-app-fabric-4.0.0-SNAPSHOT.jar:na]
	at co.cask.cdap.internal.app.runtime.batch.MapReduceRuntimeService$2.call(MapReduceRuntimeService.java:514) [co.cask.cdap.cdap-app-fabric-4.0.0-SNAPSHOT.jar:na]
	at co.cask.cdap.internal.app.runtime.batch.MapReduceRuntimeService$2.call(MapReduceRuntimeService.java:499) [co.cask.cdap.cdap-app-fabric-4.0.0-SNAPSHOT.jar:na]
	at co.cask.cdap.data2.transaction.Transactions.execute(Transactions.java:165) [co.cask.cdap.cdap-data-fabric-4.0.0-SNAPSHOT.jar:na]
	at co.cask.cdap.internal.app.runtime.batch.MapReduceRuntimeService.beforeSubmit(MapReduceRuntimeService.java:499) [co.cask.cdap.cdap-app-fabric-4.0.0-SNAPSHOT.jar:na]
	at co.cask.cdap.internal.app.runtime.batch.MapReduceRuntimeService.startUp(MapReduceRuntimeService.java:217) [co.cask.cdap.cdap-app-fabric-4.0.0-SNAPSHOT.jar:na]
	at com.google.common.util.concurrent.AbstractExecutionThreadService$1$1.run(AbstractExecutionThreadService.java:47) [com.google.guava.guava-13.0.1.jar:na]
	at co.cask.cdap.internal.app.runtime.batch.MapReduceRuntimeService$1$1.run(MapReduceRuntimeService.java:417) [co.cask.cdap.cdap-app-fabric-4.0.0-SNAPSHOT.jar:na]
	at java.lang.Thread.run(Unknown Source) [na:1.8.0_101]
Caused by: java.lang.IllegalStateException: No kafka producer available.
	at org.apache.twill.internal.kafka.client.SimpleKafkaPublisher$SimplePreparer.send(SimpleKafkaPublisher.java:120) ~[org.apache.twill.twill-core-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT]
	... 14 common frames omitted
2016-09-29 09:25:07,306 - DEBUG [MapReduceRunner-phase-1:c.c.c.i.a.r.b.MapReduceRuntimeService@713] - Using as output for MapReduce Job: [logEventStream_converted-5c456bf5-ff8d-4ab2-bf8b-b2ff1126ad73]
2016-09-29 09:25:07,306 - DEBUG [MapReduceRunner-phase-1:c.c.c.i.a.r.b.MapReduceRuntimeService@949] - Set output value class to class java.lang.Object
2016-09-29 09:25:07,310 - DEBUG [MapReduceRunner-phase-1:c.c.c.i.a.r.b.MapReduceRuntimeService@987] - Set map output key class to class java.lang.Object
2016-09-29 09:25:07,310 - DEBUG [MapReduceRunner-phase-1:c.c.c.i.a.r.b.MapReduceRuntimeService@992] - Set map output value class to class java.lang.Object
2016-09-29 09:25:07,311 - ERROR [MapReduceRunner-phase-1:c.c.c.d.a.KafkaAuditPublisher@74] - Got exception publishing audit message AuditMessage{version=1, time=1475155507311, entityId=dataset:default.logEventStream_converted, user='', type=ACCESS, payload=AccessPayload{accessType=READ, accessor=program_run:default.logEventStreamConverter.-SNAPSHOT.mapreduce.phase-1.202918eb-8648-11e6-975a-0000001e3ac4} AuditPayload{}}. Exception:
java.util.concurrent.ExecutionException: java.lang.IllegalStateException: No kafka producer available.
	at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:294) ~[com.google.guava.guava-13.0.1.jar:na]
	at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:281) ~[com.google.guava.guava-13.0.1.jar:na]
	at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) ~[com.google.guava.guava-13.0.1.jar:na]
	at co.cask.cdap.data2.audit.KafkaAuditPublisher.publish(KafkaAuditPublisher.java:72) ~[co.cask.cdap.cdap-data-fabric-4.0.0-SNAPSHOT.jar:na]
	at co.cask.cdap.data2.audit.AuditPublishers.publishAccess(AuditPublishers.java:72) [co.cask.cdap.cdap-data-fabric-4.0.0-SNAPSHOT.jar:na]
	at co.cask.cdap.data2.metadata.writer.LineageWriterDatasetFramework.publishAudit(LineageWriterDatasetFramework.java:215) [co.cask.cdap.cdap-data-fabric-4.0.0-SNAPSHOT.jar:na]
	at co.cask.cdap.data2.metadata.writer.LineageWriterDatasetFramework.writeLineage(LineageWriterDatasetFramework.java:190) [co.cask.cdap.cdap-data-fabric-4.0.0-SNAPSHOT.jar:na]
	at co.cask.cdap.data2.metadata.writer.LineageWriterDatasetFramework$BasicDatasetAccessRecorder.recordLineage(LineageWriterDatasetFramework.java:267) [co.cask.cdap.cdap-data-fabric-4.0.0-SNAPSHOT.jar:na]
	at co.cask.cdap.data2.dataset2.DefaultDatasetRuntimeContext.recordAccess(DefaultDatasetRuntimeContext.java:178) [co.cask.cdap.cdap-data-fabric-4.0.0-SNAPSHOT.jar:na]
	at co.cask.cdap.data2.dataset2.DefaultDatasetRuntimeContext.onMethodEntry(DefaultDatasetRuntimeContext.java:166) [co.cask.cdap.cdap-data-fabric-4.0.0-SNAPSHOT.jar:na]
	at co.cask.cdap.data2.dataset2.lib.file.FileSetDataset.exists(FileSetDataset.java) [na:na]
	at co.cask.cdap.data2.dataset2.lib.file.FileSetDataset.access$300(FileSetDataset.java:61) [na:na]
	at co.cask.cdap.data2.dataset2.lib.file.FileSetDataset$FileSetLocation.exists(FileSetDataset.java:405) [co.cask.cdap.cdap-data-fabric-4.0.0-SNAPSHOT.jar:na]
	at co.cask.cdap.data2.dataset2.lib.partitioned.PartitionedFileSetDataset.postTxCommit(PartitionedFileSetDataset.java:155) [na:na]
	at org.apache.tephra.TransactionContext.postCommit(TransactionContext.java:311) [org.apache.tephra.tephra-core-0.8.0-incubating.jar:0.8.0-incubating]
	at org.apache.tephra.TransactionContext.finish(TransactionContext.java:119) [org.apache.tephra.tephra-core-0.8.0-incubating.jar:0.8.0-incubating]
	at co.cask.cdap.data2.dataset2.SingleThreadDatasetCache$DelayedDiscardingTransactionContext.finish(SingleThreadDatasetCache.java:410) [co.cask.cdap.cdap-data-fabric-4.0.0-SNAPSHOT.jar:na]
	at co.cask.cdap.data2.transaction.Transactions.execute(Transactions.java:172) [co.cask.cdap.cdap-data-fabric-4.0.0-SNAPSHOT.jar:na]
	at co.cask.cdap.internal.app.runtime.batch.MapReduceRuntimeService.beforeSubmit(MapReduceRuntimeService.java:499) [co.cask.cdap.cdap-app-fabric-4.0.0-SNAPSHOT.jar:na]
	at co.cask.cdap.internal.app.runtime.batch.MapReduceRuntimeService.startUp(MapReduceRuntimeService.java:217) [co.cask.cdap.cdap-app-fabric-4.0.0-SNAPSHOT.jar:na]
	at com.google.common.util.concurrent.AbstractExecutionThreadService$1$1.run(AbstractExecutionThreadService.java:47) [com.google.guava.guava-13.0.1.jar:na]
	at co.cask.cdap.internal.app.runtime.batch.MapReduceRuntimeService$1$1.run(MapReduceRuntimeService.java:417) [co.cask.cdap.cdap-app-fabric-4.0.0-SNAPSHOT.jar:na]
	at java.lang.Thread.run(Unknown Source) [na:1.8.0_101]
Caused by: java.lang.IllegalStateException: No kafka producer available.
	at org.apache.twill.internal.kafka.client.SimpleKafkaPublisher$SimplePreparer.send(SimpleKafkaPublisher.java:120) ~[org.apache.twill.twill-core-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT]
	... 20 common frames omitted
2016-09-29 09:25:07,316 - DEBUG [MapReduceRunner-phase-1:c.c.c.d.m.w.BasicLineageWriter@63] - Writing access for run program_run:default.logEventStreamConverter.-SNAPSHOT.mapreduce.phase-1.202918eb-8648-11e6-975a-0000001e3ac4, dataset dataset:default.logEventStream_converted, accessType READ, component null, accessTime = 1475155507316
2016-09-29 09:25:07,333 - DEBUG [MapReduceRunner-phase-1:c.c.c.i.a.r.b.MapReduceRuntimeService@806] - Creating Job jar: C:\Users\pquirk\Documents\cdap-sdk-4.0-preview\data\tmp\runner\mapreduce.default.logEventStreamConverter.phase-1.202918eb-8648-11e6-975a-0000001e3ac4\job.jar
2016-09-29 09:25:07,339 - INFO  [MapReduceRunner-phase-1:c.c.c.i.a.r.b.i.LocalClientProtocolProvider@42] - Using framework: local
2016-09-29 09:25:07,339 - INFO  [MapReduceRunner-phase-1:c.c.c.i.a.r.b.i.LocalClientProtocolProvider@50] - Using tracker: clocal
2016-09-29 09:25:07,346 - WARN  [MapReduceRunner-phase-1:o.a.h.m.JobSubmitter@150] - Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2016-09-29 09:25:07,435 - WARN  [MapReduceRunner-phase-1:o.a.h.c.Configuration@2345] - file:/C:/Users/pquirk/Documents/cdap-sdk-4.0-preview/data/tmp/runner/mapreduce.default.logEventStreamConverter.phase-1.202918eb-8648-11e6-975a-0000001e3ac4/mapreduce/staging/pquirk1814824884/.staging/job_local1814824884_0188/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
2016-09-29 09:25:07,436 - WARN  [MapReduceRunner-phase-1:o.a.h.c.Configuration@2345] - file:/C:/Users/pquirk/Documents/cdap-sdk-4.0-preview/data/tmp/runner/mapreduce.default.logEventStreamConverter.phase-1.202918eb-8648-11e6-975a-0000001e3ac4/mapreduce/staging/pquirk1814824884/.staging/job_local1814824884_0188/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
2016-09-29 09:25:07,508 - WARN  [MapReduceRunner-phase-1:o.a.h.c.Configuration@2345] - file:/C:/Users/pquirk/Documents/cdap-sdk-4.0-preview/data/tmp/runner/mapreduce.default.logEventStreamConverter.phase-1.202918eb-8648-11e6-975a-0000001e3ac4/mapreduce/local/localRunner/pquirk/job_local1814824884_0188/job_local1814824884_0188.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
2016-09-29 09:25:07,510 - WARN  [MapReduceRunner-phase-1:o.a.h.c.Configuration@2345] - file:/C:/Users/pquirk/Documents/cdap-sdk-4.0-preview/data/tmp/runner/mapreduce.default.logEventStreamConverter.phase-1.202918eb-8648-11e6-975a-0000001e3ac4/mapreduce/local/localRunner/pquirk/job_local1814824884_0188/job_local1814824884_0188.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
2016-09-29 09:25:07,510 - INFO  [MapReduceRunner-phase-1:c.c.c.i.a.r.b.MapReduceRuntimeService@316] - Submitted MapReduce Job: name=phase-1, jobId=job_local1814824884_0188, namespaceId=default, applicationId=logEventStreamConverter, program=phase-1, runid=202918eb-8648-11e6-975a-0000001e3ac4.
2016-09-29 09:25:08,337 - WARN  [Thread-877:o.a.h.m.LocalJobRunnerWithFix@562] - Error cleaning up job: job_local1814824884_0188
java.lang.Exception: java.lang.IllegalArgumentException: Input record does not contain the date field.
	at org.apache.hadoop.mapred.LocalJobRunnerWithFix$Job.runTasks(LocalJobRunnerWithFix.java:465) ~[co.cask.cdap.cdap-app-fabric-4.0.0-SNAPSHOT.jar:na]
	at org.apache.hadoop.mapred.LocalJobRunnerWithFix$Job.run(LocalJobRunnerWithFix.java:524) ~[co.cask.cdap.cdap-app-fabric-4.0.0-SNAPSHOT.jar:na]
java.lang.IllegalArgumentException: Input record does not contain the date field.
	at co.cask.hydrator.plugin.common.StructuredToAvroTransformer.transform(StructuredToAvroTransformer.java:67) ~[1475155501857-0/:na]
	at co.cask.hydrator.plugin.common.StructuredToAvroTransformer.transform(StructuredToAvroTransformer.java:51) ~[1475155501857-0/:na]
	at co.cask.hydrator.plugin.batch.sink.TimePartitionedFileSetDatasetAvroSink.transform(TimePartitionedFileSetDatasetAvroSink.java:71) ~[1475155501857-0/:na]
	at co.cask.hydrator.plugin.batch.sink.TimePartitionedFileSetDatasetAvroSink.transform(TimePartitionedFileSetDatasetAvroSink.java:42) ~[1475155501857-0/:na]
	at co.cask.cdap.etl.batch.KVTransformations$KVSinkTransformation.transform(KVTransformations.java:123) ~[cdap-etl-batch-4.0.0-SNAPSHOT.jar:na]
	at co.cask.cdap.etl.batch.KVTransformations$KVSinkTransformation.transform(KVTransformations.java:111) ~[cdap-etl-batch-4.0.0-SNAPSHOT.jar:na]
	at co.cask.cdap.etl.common.TrackedTransform.transform(TrackedTransform.java:58) ~[cdap-etl-core-4.0.0-SNAPSHOT.jar:na]
	at co.cask.cdap.etl.common.TransformExecutor.executeTransformation(TransformExecutor.java:88) ~[cdap-etl-core-4.0.0-SNAPSHOT.jar:na]
	at co.cask.cdap.etl.common.TransformExecutor.executeTransformation(TransformExecutor.java:93) ~[cdap-etl-core-4.0.0-SNAPSHOT.jar:na]
	at co.cask.cdap.etl.common.TransformExecutor.executeTransformation(TransformExecutor.java:93) ~[cdap-etl-core-4.0.0-SNAPSHOT.jar:na]
	at co.cask.cdap.etl.common.TransformExecutor.runOneIteration(TransformExecutor.java:50) ~[cdap-etl-core-4.0.0-SNAPSHOT.jar:na]
	at co.cask.cdap.etl.batch.mapreduce.TransformRunner.transform(TransformRunner.java:165) ~[cdap-etl-batch-4.0.0-SNAPSHOT.jar:na]
	at co.cask.cdap.etl.batch.mapreduce.ETLMapReduce$ETLMapper.map(ETLMapReduce.java:356) ~[cdap-etl-batch-4.0.0-SNAPSHOT.jar:na]
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) ~[org.apache.hadoop.hadoop-mapreduce-client-core-2.3.0.jar:na]
	at co.cask.cdap.internal.app.runtime.batch.MapperWrapper.run(MapperWrapper.java:117) ~[co.cask.cdap.cdap-app-fabric-4.0.0-SNAPSHOT.jar:na]
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) ~[org.apache.hadoop.hadoop-mapreduce-client-core-2.3.0.jar:na]
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340) ~[org.apache.hadoop.hadoop-mapreduce-client-core-2.3.0.jar:na]
	at org.apache.hadoop.mapred.LocalJobRunnerWithFix$Job$MapTaskRunnable.run(LocalJobRunnerWithFix.java:243) ~[co.cask.cdap.cdap-app-fabric-4.0.0-SNAPSHOT.jar:na]
	at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[na:1.8.0_101]
	at java.util.concurrent.FutureTask.run(Unknown Source) ~[na:1.8.0_101]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[na:1.8.0_101]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[na:1.8.0_101]
	at java.lang.Thread.run(Unknown Source) ~[na:1.8.0_101]
2016-09-29 09:25:08,519 - INFO  [MapReduceRunner-phase-1:c.c.c.i.a.r.b.MapReduceRuntimeService@348] - MapReduce Job is complete, status: false, job: name=phase-1, jobId=job_local1814824884_0188, namespaceId=default, applicationId=logEventStreamConverter, program=phase-1, runid=202918eb-8648-11e6-975a-0000001e3ac4
2016-09-29 09:25:10,523 - INFO  [MapReduceRunner-phase-1:o.a.t.TransactionManager@1013] - Tx invalid list: added tx 1475155507335000000 because of invalidate
2016-09-29 09:25:10,527 - INFO  [MapReduceRunner-phase-1:c.c.c.e.b.m.ETLMapReduce@330] - Batch Run finished : status = ProgramState{status=FAILED, failureInfo='NA'}
2016-09-29 09:25:11,103 - ERROR [MapReduceRunner-phase-1:c.c.c.i.a.r.ProgramControllerServiceAdapter@81] - Program terminated with exception
java.lang.IllegalStateException: MapReduce execution failure: job-id : job_local1814824884_0188uber-mode : falsemap-progress : 1.0reduce-progress : 0.0cleanup-progress : 1.0setup-progress : 1.0runstate : FAILEDstart-time : 0user-name : pquirkpriority : NORMALscheduling-info : NAnum-used-slots0num-reserved-slots0used-mem0reserved-mem0needed-mem0
	at com.google.common.base.Preconditions.checkState(Preconditions.java:176) ~[com.google.guava.guava-13.0.1.jar:na]
	at co.cask.cdap.internal.app.runtime.batch.MapReduceRuntimeService.run(MapReduceRuntimeService.java:358) ~[co.cask.cdap.cdap-app-fabric-4.0.0-SNAPSHOT.jar:na]
	at com.google.common.util.concurrent.AbstractExecutionThreadService$1$1.run(AbstractExecutionThreadService.java:52) ~[com.google.guava.guava-13.0.1.jar:na]
	at co.cask.cdap.internal.app.runtime.batch.MapReduceRuntimeService$1$1.run(MapReduceRuntimeService.java:417) [co.cask.cdap.cdap-app-fabric-4.0.0-SNAPSHOT.jar:na]
	at java.lang.Thread.run(Unknown Source) [na:1.8.0_101]
2016-09-29 09:25:11,107 - INFO  [action-phase-1-0:c.c.c.i.w.ProgramWorkflowAction@70] - Failed to execute MAPREDUCE Program phase-1 in workflow
java.lang.IllegalStateException: MapReduce execution failure: job-id : job_local1814824884_0188uber-mode : falsemap-progress : 1.0reduce-progress : 0.0cleanup-progress : 1.0setup-progress : 1.0runstate : FAILEDstart-time : 0user-name : pquirkpriority : NORMALscheduling-info : NAnum-used-slots0num-reserved-slots0used-mem0reserved-mem0needed-mem0
	at com.google.common.base.Preconditions.checkState(Preconditions.java:176) ~[com.google.guava.guava-13.0.1.jar:na]
	at co.cask.cdap.internal.app.runtime.batch.MapReduceRuntimeService.run(MapReduceRuntimeService.java:358) ~[co.cask.cdap.cdap-app-fabric-4.0.0-SNAPSHOT.jar:na]
	at com.google.common.util.concurrent.AbstractExecutionThreadService$1$1.run(AbstractExecutionThreadService.java:52) ~[com.google.guava.guava-13.0.1.jar:na]
	at co.cask.cdap.internal.app.runtime.batch.MapReduceRuntimeService$1$1.run(MapReduceRuntimeService.java:417) ~[co.cask.cdap.cdap-app-fabric-4.0.0-SNAPSHOT.jar:na]
	at java.lang.Thread.run(Unknown Source) [na:1.8.0_101]
2016-09-29 09:25:11,108 - INFO  [NettyHttpService STOPPING:c.c.h.NettyHttpService@274] - Stopping WorkflowDriver http service on address /127.0.0.1:60823...
2016-09-29 09:25:11,109 - INFO  [NettyHttpService STOPPING:c.c.h.NettyHttpService@288] - Stopped WorkflowDriver http service on address /127.0.0.1:60823
2016-09-29 09:25:11,123 - INFO  [WorkflowDriver:c.c.c.i.a.r.w.WorkflowProgramController@98] - Workflow service failed from RUNNING. Un-registering service workflow.default.logEventStreamConverter.DataPipelineWorkflow.1f3039fa-8648-11e6-9fd2-000000341358.
java.lang.IllegalStateException: MapReduce execution failure: job-id : job_local1814824884_0188uber-mode : falsemap-progress : 1.0reduce-progress : 0.0cleanup-progress : 1.0setup-progress : 1.0runstate : FAILEDstart-time : 0user-name : pquirkpriority : NORMALscheduling-info : NAnum-used-slots0num-reserved-slots0used-mem0reserved-mem0needed-mem0
	at com.google.common.base.Preconditions.checkState(Preconditions.java:176) ~[com.google.guava.guava-13.0.1.jar:na]
	at co.cask.cdap.internal.app.runtime.batch.MapReduceRuntimeService.run(MapReduceRuntimeService.java:358) ~[co.cask.cdap.cdap-app-fabric-4.0.0-SNAPSHOT.jar:na]
	at com.google.common.util.concurrent.AbstractExecutionThreadService$1$1.run(AbstractExecutionThreadService.java:52) ~[com.google.guava.guava-13.0.1.jar:na]
	at co.cask.cdap.internal.app.runtime.batch.MapReduceRuntimeService$1$1.run(MapReduceRuntimeService.java:417) ~[co.cask.cdap.cdap-app-fabric-4.0.0-SNAPSHOT.jar:na]
	at java.lang.Thread.run(Unknown Source) [na:1.8.0_101]
2016-09-29 09:25:11,123 - INFO  [WorkflowDriver:c.c.c.i.a.r.w.WorkflowProgramController@104] - Service workflow.default.logEventStreamConverter.DataPipelineWorkflow.1f3039fa-8648-11e6-9fd2-000000341358 unregistered.
2016-09-29 09:25:11,123 - INFO  [pcontroller-program:default.logEventStreamConverter.-SNAPSHOT.workflow.DataPipelineWorkflow-1f3039fa-8648-11e6-9fd2-000000341358:c.c.c.i.a.s.ProgramLifecycleService@338] - Program stopped with error program:default.logEventStreamConverter.-SNAPSHOT.workflow.DataPipelineWorkflow, 1f3039fa-8648-11e6-9fd2-000000341358
java.lang.IllegalStateException: MapReduce execution failure: job-id : job_local1814824884_0188uber-mode : falsemap-progress : 1.0reduce-progress : 0.0cleanup-progress : 1.0setup-progress : 1.0runstate : FAILEDstart-time : 0user-name : pquirkpriority : NORMALscheduling-info : NAnum-used-slots0num-reserved-slots0used-mem0reserved-mem0needed-mem0
	at com.google.common.base.Preconditions.checkState(Preconditions.java:176) ~[com.google.guava.guava-13.0.1.jar:na]
	at co.cask.cdap.internal.app.runtime.batch.MapReduceRuntimeService.run(MapReduceRuntimeService.java:358) ~[co.cask.cdap.cdap-app-fabric-4.0.0-SNAPSHOT.jar:na]
	at com.google.common.util.concurrent.AbstractExecutionThreadService$1$1.run(AbstractExecutionThreadService.java:52) ~[com.google.guava.guava-13.0.1.jar:na]
	at co.cask.cdap.internal.app.runtime.batch.MapReduceRuntimeService$1$1.run(MapReduceRuntimeService.java:417) ~[co.cask.cdap.cdap-app-fabric-4.0.0-SNAPSHOT.jar:na]
	at java.lang.Thread.run(Unknown Source) [na:1.8.0_101]
2016-09-29 09:25:11,149 - DEBUG [pcontroller-program:default.logEventStreamConverter.-SNAPSHOT.workflow.DataPipelineWorkflow-1f3039fa-8648-11e6-9fd2-000000341358:c.c.c.a.r.AbstractProgramRuntimeService@436] - Removing RuntimeInfo: Workflow DataPipelineWorkflow 1f3039fa-8648-11e6-9fd2-000000341358
2016-09-29 09:25:11,150 - DEBUG [pcontroller-program:default.logEventStreamConverter.-SNAPSHOT.workflow.DataPipelineWorkflow-1f3039fa-8648-11e6-9fd2-000000341358:c.c.c.a.r.AbstractProgramRuntimeService@439] - RuntimeInfo removed: RuntimeInfo{programId=program:default.logEventStreamConverter.-SNAPSHOT.workflow.DataPipelineWorkflow, twillRunId=null}
 
Sure enough, if you inspect the pipeline in Hydrator, you see that the input stream, which is in clf format, contains a 'request_time' field, while the output sink's schema uses the field 'date'.
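
To make the mismatch concrete, here is how the two sides line up as I read them. The sink schema is taken from the pipeline spec in the log above; the input layout is my reconstruction of what Hydrator shows for the clf stream source, so treat it as a sketch:

input (clf, after Projection drops 'headers'): ts, remote_host, remote_login, auth_user, request_time, request, status, content_length, referrer, user_agent
sink schema (TPFSAvro):                        ts, remote_host, remote_login, auth_user, date, request, status, content_length, referrer, user_agent

The Projection transform only drops 'headers', so the record reaching the sink still carries 'request_time', and the StructuredToAvroTransformer in the stack trace above fails when it looks up 'date'.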


vin...@cask.co

Sep 29, 2016, 9:54:48 PM
to CDAP User
Hey Peter,

To debug this issue better, could you please provide the pipeline config you are trying to deploy? 

Thanks,
Vinisha

Peter Quirk

Sep 29, 2016, 10:01:48 PM
to CDAP User
It's strictly per the documentation at http://docs.cask.co/cdap/4.0.0-SNAPSHOT/en/introduction/index.html. I believe the tutorial has an error in it.

Peter

vin...@cask.co

Sep 30, 2016, 3:55:59 PM
to CDAP User
Hi Peter,

You are right. I was able to reproduce the error "Input record does not contain the date field" in 3.5.1 using this app-config.json. As you investigated, it is caused by the wrong field name. I have filed a JIRA for it: https://issues.cask.co/browse/CDAP-7386. That inconsistency causes the Hive query to fail, which consequently throws:

cdap (http://USORH-PQUIRK1:11015/namespace:default)> execute 'describe dataset_logEventStream_converted'
Error: co.cask.cdap.explore.service.ExploreException: Cannot execute query. Reason: Response code: 400, message: 'Bad Request', body: '[SQLState 42S02] Error while compiling statement: FAILED: SemanticException [Error 10001]: Table not found dataset_logEventStream_converted'

Could you please clean the namespace, then deploy and run the pipeline again? I was able to run the pipeline and explore the dataset after cleaning up the namespace on 3.5.1.
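
For reference, cleaning up from the CLI should look roughly like this (written from memory, so the exact command names may differ slightly between CDAP versions):

delete app logEventStreamConverter
delete dataset instance logEventStream_converted
truncate stream logEventStream

After that, redeploy the pipeline with the corrected field name and run it again.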

Just a note: the version of CDAP you are using is 3.5.1, while the pipeline you have deployed uses 4.0.0-SNAPSHOT. However, I was able to reproduce this problem with the 3.5.1 version as well.

Please let us know if we answered your questions.

Thanks,
Vinisha

Peter Quirk

Sep 30, 2016, 5:14:50 PM
to CDAP User
Thanks Vinisha.

I edited app-config.json and created the app again. I was able to complete that step and the following ones successfully. I only tested with CDAP 4.0.0-SNAPSHOT, but I'm sure that 3.5.1 will behave identically.
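
For anyone else who hits this, the edit was just renaming one field in the TPFSAvro sink schema inside app-config.json (fragment abridged to the relevant lines; everything else stays exactly as the tutorial has it):

{"name":"auth_user","type":["string","null"]},
{"name":"request_time","type":["string","null"]},   <-- was "date"
{"name":"request","type":["string","null"]},

Nothing else in the config needed to change, as far as I can tell.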

Peter