Re: [druid-user] Re: No Segments created - Kafka Indexing Service


Chakravarthy varaga

Jul 20, 2018, 8:33:29 AM
to druid...@googlegroups.com
Hi Jonathan

     First of all, thanks for your response. I have applied some of the changes you suggested: I cleaned out the future-timestamped data and removed the DEBUG logging.
    
     I have now started a new real-time Kafka indexing task. It was running fine for a few days until I changed the cluster/resources yesterday.

    


     The problem is similar to what has been described earlier in this thread. The MiddleManager was stopped for a few hours; once it reconnected and tried to recover from the previous run, it failed to create the segments. There are no exceptions or anything unusual in the overlord or the coordinator logs.

    It is worth noting that these tasks use AWS EFS as the temporary segment directory, where all the indexing task logs and intermediate persists are stored.

     I have shared a Google Drive link with your email address.

     I appreciate any help in this regard.


Best Regards
Varaga


On Mon, Jul 9, 2018 at 11:08 PM, Jonathan Wei <jon...@imply.io> wrote:
Varaga: In the log you sent me via direct mail, it looks like the task didn't make any attempts to publish/hand off segments. What was the `taskDuration` in the attached task's supervisor spec?

I would recommend checking your coordinator logs and overlord logs to see if there's anything interesting relating to the task in question and its segments.

The persists that were occurring seem to be extremely slow, e.g.:

```
2018-07-06T12:15:41,672 INFO [caliper-sample-atomic-avro-kafka-int-w3-incremental-persist] io.druid.segment.StringDimensionMergerV9 - Completed dim[id] inverted with cardinality[5,048] in 86,457 millis.
```

I would suggest setting your log level to INFO; the DEBUG-level logs don't really add much useful information for ingestion, and they could be hurting your performance there.

You may want to allocate fewer worker tasks per MiddleManager, allocate fewer resources per worker, or use bigger MiddleManager machines; it looks like the machine might be too small for your current worker settings.

I also noticed there's data for the year 2051; this will likely cause issues with segment handoff for Kafka indexing (https://github.com/apache/incubator-druid/issues/4137), so I would make sure your input doesn't contain future data like that.
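If you can't easily filter that out upstream, I believe the Kafka supervisor's ioConfig also has message-rejection settings that drop rows whose timestamps fall too far outside the task's window at ingestion time; a rough sketch with illustrative period values (tune them for your data, and verify the option names against the docs for your Druid version):

```
"ioConfig" : {
  "topic" : "<Masked>",
  "lateMessageRejectionPeriod" : "P30D",
  "earlyMessageRejectionPeriod" : "P1D"
}
```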

> Finally, tried with 1 task with no replicas and the task having a 10 GB heap and 3 GB off-heap. When this was run, I had set only the 2 dimensions with the lowest cardinalities. Every one of them failed.

If you run a task like that with a short taskDuration like 10M, what is the specific failure that you see?

Thanks,
Jon



On Wed, Jul 4, 2018 at 8:19 PM, Jonathan Wei <jon...@imply.io> wrote:
> There are no segments created in deep storage for the intervals.
> Every one of them failed.

Do you have some sample complete task logs for these two cases?

- Jon

On Tue, Jul 3, 2018 at 9:47 AM, Chakravarthy varaga <chakrav...@gmail.com> wrote:
Hi Druid Experts,

       Any ideas/help here?

Best Regards
Varaga

On Fri, Jun 22, 2018 at 9:18 PM, Chakravarthy varaga <chakrav...@gmail.com> wrote:
Hi guys,

  It'd be great if the experts could point me in the right direction.

  There is 1 topic with 2 partitions and 13 million events (dated Jan 2018) on it. Each event has 35 fields, some with high cardinalities.

I tried altering the segment granularity: 5 minutes, 2 minutes, an hour, a day, etc.

I also tried altering taskDuration: 5 minutes, 20 minutes, 30 minutes, 1 hour, 4 hours.
I have also tried setting completionTimeout to values larger than the default (although, based on the index task description in the logs, it does not look like it is honored); see the ioConfig sketch below for where these settings sit.

I also tried setting taskCount to 2, with each task having a 2 GB heap and 1 GB off-heap, and with 2 replicas.

Finally, I tried 1 task with no replicas, with a 10 GB heap and 3 GB off-heap. For this run I had set only the 2 dimensions with the lowest cardinalities.

Every one of them failed. 
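For reference, all of these knobs sit in the supervisor's ioConfig; a minimal sketch with illustrative values (not the exact values from each run):

```
"ioConfig" : {
  "topic" : "<Masked>",
  "replicas" : "2",
  "taskCount" : "2",
  "taskDuration" : "PT30M",
  "completionTimeout" : "PT60M",
  "useEarliestOffset" : true
}
```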

Any help here is appreciated, and the sooner the better.

Thanks a million
Varaga

On Thu, Jun 21, 2018, 17:24 Varaga <chakrav...@gmail.com> wrote:

Some snippets from the task logs are below. The incremental-persist thread seems to create the smoosh files and all the version/metadata files in the tmp storage; however, the segments are not pushed. I don't see the S3 segment pusher invoked at all.

It is worth reiterating that I'm trying to ingest several million events (dated Jan 2018) through the Kafka indexing task!


2018-06-20T13:49:31,742 INFO [caliper-sample-sessions-avro-kafka-prod-timestamp-test1-incremental-persist] io.druid.segment.StringDimensionMergerV9 - Completed dim[timestamp] inverted with cardinality[7,461] in 140,921 millis.
2018-06-20T13:49:31,789 INFO [caliper-sample-sessions-avro-kafka-prod-timestamp-test1-incremental-persist] io.druid.segment.IndexMergerV9 - Completed index.drd in 10 millis.
2018-06-20T13:49:31,820 INFO [caliper-sample-sessions-avro-kafka-prod-timestamp-test1-incremental-persist] io.druid.java.util.common.io.smoosh.FileSmoosher - Created smoosh file [/mnt/<Masked>/task/index_kafka_caliper-sample-sessions-avro-kafka-prod-timestamp-test1_eacb41e33fd1473_flpgjfbf/work/persist/caliper-sample-sessions-avro-kafka-prod-timestamp-test1_2018-01-22T19:00:00.000Z_2018-01-22T20:00:00.000Z_2018-06-20T13:44:32.472Z/0/00000.smoosh] of size [410760] bytes.
2018-06-20T13:49:32,164 DEBUG [caliper-sample-sessions-avro-kafka-prod-timestamp-test1-incremental-persist] io.druid.segment.IndexIO - Mapping v9 index[/mnt/<Masked>/index_kafka_caliper-sample-sessions-avro-kafka-prod-timestamp-test1_eacb41e33fd1473_flpgjfbf/work/persist/caliper-sample-sessions-avro-kafka-prod-timestamp-test1_2018-01-22T19:00:00.000Z_2018-01-22T20:00:00.000Z_2018-06-20T13:44:32.472Z/0]
2018-06-20T13:49:32,198 DEBUG [caliper-sample-sessions-avro-kafka-prod-timestamp-test1-incremental-persist] io.druid.segment.IndexIO - Mapped v9 index[/mnt/<Masked>/index_kafka_caliper-sample-sessions-avro-kafka-prod-timestamp-test1_eacb41e33fd1473_flpgjfbf/work/persist/caliper-sample-sessions-avro-kafka-prod-timestamp-test1_2018-01-22T19:00:00.000Z_2018-01-22T20:00:00.000Z_2018-06-20T13:44:32.472Z/0] in 34 millis
2018-06-20T13:49:32,203 INFO [caliper-sample-sessions-avro-kafka-prod-timestamp-test1-incremental-persist] io.druid.segment.realtime.appenderator.AppenderatorImpl - Segment[caliper-sample-sessions-avro-kafka-prod-timestamp-test1_2018-01-22T12:00:00.000Z_2018-01-22T13:00:00.000Z_2018-06-20T13:44:26.802Z], persisting Hydrant[FireHydrant{, queryable=caliper-sample-sessions-avro-kafka-prod-timestamp-test1_2018-01-22T12:00:00.000Z_2018-01-22T13:00:00.000Z_2018-06-20T13:44:26.802Z, count=0}]
2018-06-20T13:49:32,241 INFO [caliper-sample-sessions-avro-kafka-prod-timestamp-test1-incremental-persist] io.druid.segment.IndexMergerV9 - Starting persist for interval[2018-01-22T12:00:00.000Z/2018-01-22T13:00:00.000Z], rows[1,106]
2018-06-20T13:49:32,263 INFO [caliper-sample-sessions-avro-kafka-prod-timestamp-test1-incremental-persist] io.druid.segment.IndexMergerV9 - Using SegmentWriteOutMediumFactory[TmpFileSegmentWriteOutMediumFactory]
2018-06-20T13:49:32,290 INFO [caliper-sample-sessions-avro-kafka-prod-timestamp-test1-incremental-persist] io.druid.segment.IndexMergerV9 - Completed version.bin in 18 millis.
2018-06-20T13:49:32,307 INFO [caliper-sample-sessions-avro-kafka-prod-timestamp-test1-incremental-persist] io.druid.segment.IndexMergerV9 - Completed factory.json in 16 millis
2018-06-20T13:49:32,387 INFO [caliper-sample-sessions-avro-kafka-prod-timestamp-test1-incremental-persist] io.druid.segment.StringDimensionMergerV9 - Completed dim[eventType] conversions with cardinality[2] in 80 millis.
2018-06-20T13:49:32,455 INFO [caliper-sample-sessions-avro-kafka-prod-timestamp-test1-incremental-persist] io.druid.segment.StringDimensionMergerV9 - Completed dim[userbrowser.name] conversions with cardinality[0] in 34 millis.
2018-06-20T13:49:32,682 DEBUG [main-SendThread(ip-10-35-35-239.ec2.internal:2181)] org.apache.zookeeper.ClientCnxn - Got ping response for sessionid: 0xf64134df7f2d07f after 1ms
2018-06-20T13:49:34,562 INFO [caliper-sample-sessions-avro-kafka-prod-timestamp-test1-incremental-persist] io.druid.segment.StringDimensionMergerV9 - Completed dim[actor.districtPid] conversions with cardinality[111] in 2,071 millis.




On Wednesday, June 20, 2018 at 4:38:05 PM UTC+1, Varaga wrote:

Hey Guys,

  I'm stuck here without a clue about what's happening.
  
  Please find the images of the coordinator console, the cluster view, the overlord console, the MiddleManager runtime properties, and the resources allocated to it.

  The curl command has been showing this same count constantly for hours now:

    curl -sX POST "http://<Masked>/druid/v2/?pretty=true" -H 'content-type: application/json' -d @/tmp/count-searchkeywords.json
[ {
  "version" : "v1",
  "timestamp" : "2018-01-22T00:00:00.000Z",
  "event" : {
    "actor.junk-field" : null,
    "total-count" : 150000
  }
} ]
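(For context, /tmp/count-searchkeywords.json is just a simple row count; a hypothetical sketch of roughly what that groupBy query looks like is below, with the datasource masked and the interval illustrative rather than copied from the actual file:)

```
{
  "queryType" : "groupBy",
  "dataSource" : "<Masked>",
  "granularity" : "all",
  "intervals" : [ "2018-01-22/2018-01-23" ],
  "dimensions" : [ "actor.junk-field" ],
  "aggregations" : [ { "type" : "count", "name" : "total-count" } ]
}
```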


[Image: Coordinator console - no datasources visible]

[Image: Coordinator old console - cluster view]

[Image: Overlord console]

[Image: MiddleManager configuration]

[Image: MiddleManager resources per instance - there are 2 instances running]

On Tue, Jun 19, 2018 at 5:35 PM, Varaga <chakrav...@gmail.com> wrote:

Hi Guys,

     The out-of-order events that fall outside the task duration window are very sparse. The overlord logs seem to suggest that there are still segments pending creation for those sparse events...

     Any suggestions guys?

    
Overlord Logs:
==============

2018-06-19T11:50:32,603 INFO [qtp465869765-108] io.druid.indexing.common.actions.LocalTaskActionClient - Performing action for task[index_kafka_caliper-sample-atomic-avro-kafka-prod-eventTime-test6_31f44481cd36d94_nndbkfpn]: SegmentAllocateAction{dataSource='caliper-sample-atomic-avro-kafka-prod-eventTime-test6', timestamp=2018-01-31T03:27:49.104Z, queryGranularity=NoneGranularity, preferredSegmentGranularity={type=period, period=PT1H, timeZone=UTC, origin=null}, sequenceName='index_kafka_caliper-sample-atomic-avro-kafka-prod-eventTime-test6_31f44481cd36d94_0', previousSegmentId='caliper-sample-atomic-avro-kafka-prod-eventTime-test6_2018-01-29T22:00:00.000Z_2018-01-29T23:00:00.000Z_2018-06-18T15:28:29.288Z', skipSegmentLineageCheck='true'}
2018-06-19T11:50:32,605 DEBUG [qtp465869765-108] io.druid.indexing.common.actions.SegmentAllocateAction - Trying to allocate pending segment for rowInterval[2018-01-31T03:27:49.104Z/2018-01-31T03:27:49.105Z], segmentInterval[2018-01-31T03:00:00.000Z/2018-01-31T04:00:00.000Z].
2018-06-19T11:50:32,605 INFO [qtp465869765-108] io.druid.indexing.overlord.TaskLockbox - Added task[index_kafka_caliper-sample-atomic-avro-kafka-prod-eventTime-test6_31f44481cd36d94_nndbkfpn] to TaskLock[index_kafka_caliper-sample-atomic-avro-kafka-prod-eventTime-test6]
2018-06-19T11:50:32,605 INFO [qtp465869765-108] io.druid.indexing.overlord.MetadataTaskStorage - Adding lock on interval[2018-01-31T03:00:00.000Z/2018-01-31T04:00:00.000Z] version[2018-06-19T11:50:20.668Z] for task: index_kafka_caliper-sample-atomic-avro-kafka-prod-eventTime-test6_31f44481cd36d94_nndbkfpn
2018-06-19T11:50:32,610 INFO [qtp465869765-108] io.druid.metadata.IndexerSQLMetadataStorageCoordinator - Found existing pending segment [caliper-sample-atomic-avro-kafka-prod-eventTime-test6_2018-01-31T03:00:00.000Z_2018-01-31T04:00:00.000Z_2018-06-18T15:28:25.364Z] for sequence[index_kafka_caliper-sample-atomic-avro-kafka-prod-eventTime-test6_31f44481cd36d94_0] (previous = [caliper-sample-atomic-avro-kafka-prod-eventTime-test6_2018-01-29T22:00:00.000Z_2018-01-29T23:00:00.000Z_2018-06-18T15:28:29.288Z]) in DB




On Monday, June 18, 2018 at 5:09:43 PM UTC+1, Varaga wrote:
Hi All,

    I'm using the Kafka indexing service with segmentGranularity: HOUR, queryGranularity: NONE, and taskDuration: PT10M.
   
The data being ingested from the Kafka topics is 6 months old; we are using ETL to historically backfill some of our data from legacy systems.
    Essentially, the field that is indexed as the timestamp (via the timestampSpec) is 6 months in the past.
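    (For reference, the timestampSpec for that field is shaped roughly like the sketch below; the column name is masked and the format value is illustrative, not copied from my spec:)

```
"timestampSpec" : {
  "column" : "<Masked>",
  "format" : "millis"
}
```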

    What I notice is that:
    -> There are no segments created in deep storage for the intervals.
    -> There are rows in the druid_pendingsegments table in the metadata storage.
    -> The task log shows a SegmentAllocateAction request being submitted to the overlord:
  
2018-06-18T15:28:31,806 INFO [task-runner-0-priority-0] io.druid.indexing.common.actions.RemoteTaskActionClient - Submitting action for task[<MASKED>] to overlord: [SegmentAllocateAction{dataSource='<MASKED>', timestamp=2018-01-31T03:27:49.104Z, queryGranularity=NoneGranularity, preferredSegmentGranularity={type=period, period=PT1H, timeZone=UTC, origin=null}, sequenceName='index_kafka_caliper-sample-atomic-avro-kafka-prod-eventTime-test6_31f44481cd36d94_0', previousSegmentId='caliper-sample-atomic-avro-kafka-prod-eventTime-test6_2018-01-29T22:00:00.000Z_2018-01-29T23:00:00.000Z_2018-06-18T15:28:29.288Z', skipSegmentLineageCheck='true'}].


    The taskCount is set to 1, with replicas: 2.

Can someone suggest what the problem could be here?   

...

     
"granularitySpec" : {
       
"type" : "uniform",
       
"segmentGranularity" : "HOUR",
       
"queryGranularity" : "NONE",
       
"rollup" : false
     
}
   
},
   
"ioConfig" : {
     
"topic" : "<Masked>",
     
"consumerProperties": {
       
"bootstrap.servers": "<Masked>",
       
"group.id": "druid-indexer-consumer-avro-prod-atomic-6"
     
},
     
"replicas":"2",
     
"taskCount":"1",
     
"taskDuration": "PT10M",
     
"useEarliestOffset":true
   
},
   
"tuningConfig" : {
     
"type" : "kafka",
     
"resetOffsetAutomatically":true
   
}
...
}


