Batch Ingestion fails due to unknown error

Yong Cheng Toh

Apr 17, 2017, 11:40:11 PM4/17/17
to Druid User
Hi,

I am using Druid to ingest about 1 TB of files from S3, running a local (single-machine) cluster on a c4.4xlarge instance. The failure happens after roughly 1400 files have been ingested; the task then stops ingesting any further files. The error doesn't seem to indicate what is actually wrong:

2017-04-16T20:09:35,104 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[HadoopIndexTask{id=index_hadoop_searches_2017-04-14T02:26:05.089Z, type=index_hadoop, dataSource=searches}]
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:?]
at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:204) ~[druid-indexing-service-0.9.2.jar:0.9.2]
at io.druid.indexing.common.task.HadoopIndexTask.run(HadoopIndexTask.java:208) ~[druid-indexing-service-0.9.2.jar:0.9.2]
at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:436) [druid-indexing-service-0.9.2.jar:0.9.2]
at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:408) [druid-indexing-service-0.9.2.jar:0.9.2]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_05]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_05]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_05]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_05]
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_05]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_05]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_05]
at java.lang.reflect.Method.invoke(Method.java:483) ~[?:1.8.0_05]
at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:201) ~[druid-indexing-service-0.9.2.jar:0.9.2]
... 7 more
Caused by: com.metamx.common.ISE: Job[class io.druid.indexer.IndexGeneratorJob] failed!
at io.druid.indexer.JobHelper.runJobs(JobHelper.java:369) ~[druid-indexing-hadoop-0.9.2.jar:0.9.2]
at io.druid.indexer.HadoopDruidIndexerJob.run(HadoopDruidIndexerJob.java:94) ~[druid-indexing-hadoop-0.9.2.jar:0.9.2]
at io.druid.indexing.common.task.HadoopIndexTask$HadoopIndexGeneratorInnerProcessing.runTask(HadoopIndexTask.java:261) ~[druid-indexing-service-0.9.2.jar:0.9.2]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_05]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_05]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_05]
at java.lang.reflect.Method.invoke(Method.java:483) ~[?:1.8.0_05]
at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:201) ~[druid-indexing-service-0.9.2.jar:0.9.2]
... 7 more
2017-04-16T20:10:02,611 INFO [task-runner-0-priority-0] io.druid.indexing.overlord.TaskRunnerUtils - Task [index_hadoop_searches_2017-04-14T02:26:05.089Z] status changed to [FAILED].
2017-04-16T20:10:39,674 WARN [Curator-Framework-0] org.apache.curator.ConnectionState - Connection attempt unsuccessful after 232110 (greater than max timeout of 30000). Resetting connection and trying again with a new connection.
2017-04-16T20:10:40,232 INFO [task-runner-0-priority-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "index_hadoop_searches_2017-04-14T02:26:05.089Z",
  "status" : "FAILED",
  "duration" : 236634704
}


Settings-wise, I am using the recommended conf settings, with the heap sizes reduced slightly so that total memory consumption stays within the capacity of the machine.
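
For reference, this is roughly what I trimmed; the exact numbers below are from memory, so treat them as approximate rather than my literal config:

conf/druid/middleManager/runtime.properties (peon heap, reduced from the recommended value):
druid.indexer.runner.javaOpts=-server -Xmx2g -Duser.timezone=UTC -Dfile.encoding=UTF-8

conf/druid/historical/jvm.config (heap and direct memory reduced to fit within the c4.4xlarge's 30 GB):
-Xms6g
-Xmx6g
-XX:MaxDirectMemorySize=4g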

I am currently using the following spec to ingest the files from S3, and have verified that it works when the path is changed to point at a single file (a single-file example is shown after the spec). However, when trying to ingest the full 1 TB (10,000+ files) from S3, I get the error above.


{
  "type": "index_hadoop",
  "spec": {
    "ioConfig": {
      "type": "hadoop",
      "inputSpec": {
        "type": "static",
        "paths": "s3n://<ACCESS_KEY>:<SECRET_ACCESS_KEY>@<BUCKET>/<PREFIX>/dt=2017-04-04/*"
      }
    },
    "dataSchema": {
      "dataSource": "searches",
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "DAY",
        "queryGranularity": "DAY",
        "intervals": ["2017-04-04/2017-04-05"]
      },
      "parser": {
        "type": "string",
        "parseSpec": {
          "format": "json",
          "flattenSpec": {
            "useFieldDiscovery": true,
            "fields": [
              {
                "type": "path",
                "name": "timestamp",
                "expr": "$.eventHeader.createdOn.unixTimeMillis"
              },
              {
                "type": "path",
                "name": "id",
                "expr": "$.downstreamId.identifier"
              },
              {
                "type": "path",
                "name": "type",
                "expr": "$.type"
              },
              {
                "type": "path",
                "name": "exceptionType",
                "expr": "$.exception.exceptionType"
              },
              {
                "type": "path",
                "name": "qCount",
                "expr": "$.qCount"
              },
              {
                "type": "path",
                "name": "region",
                "expr": "$.eventHeader.serviceInstance.region"
              },
              {
                "type": "path",
                "name": "searchKind",
                "expr": "$.search.kind"
              },
              {
                "type": "path",
                "name": "engine",
                "expr": "$.engineName"
              },
              {
                "type": "path",
                "name": "requestClient",
                "expr": "$.requestClientKind"
              }
            ]
          },
          "dimensionsSpec": {
            "dimensions": [
              "id",
              "type",
              "exceptionType",
              "region",
              "searchKind",
              "engine",
              "requestClient"
            ],
            "dimensionExclusions" : [],
            "spatialDimensions" : []
          },
          "timestampSpec": {
            "format": "millis",
            "column": "timestamp"
          }
        }
      },
      "metricsSpec": [
        {
          "name": "count",
          "type": "count"
        },
        {
          "type" : "longSum",
          "name" : "qCountSum",
          "fieldName" : "qCount"
        }
      ]
    },
    "tuningConfig": {
      "type": "hadoop",
      "jobProperties": {
        "fs.s3n.awsAccessKeyId": "<ACCESS_KEY>",
        "fs.s3n.awsSecretAccessKey": "<SECRET_KEY>",
        "fs.s3n.impl": "org.apache.hadoop.fs.s3native.NativeS3FileSystem"
      }
    }
  }
}
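
For reference, the single-file test that works is the same spec with only the inputSpec paths narrowed down to one object; the file name below is just an example, not our actual key:

  "inputSpec": {
    "type": "static",
    "paths": "s3n://<ACCESS_KEY>:<SECRET_ACCESS_KEY>@<BUCKET>/<PREFIX>/dt=2017-04-04/part-00000.gz"
  }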


Hopefully someone out there has run into this before and knows what the possible issue might be.

Thanks!

Yong Cheng

