Local firehose Indexing Issue on druid-services-0.7.0-SNAPSHOT


Sudhir Rama Rao

Dec 20, 2014, 12:11:50 AM
to druid-de...@googlegroups.com
Hi folks,

I built the latest version of Druid from master (druid-services-0.7.0-SNAPSHOT) and have the following config for indexing a local JSON file:

{
    "spec": {
        "dataSchema": {
            "dataSource": "pci",
            "granularitySpec": {
                "intervals": [
                    "2012-01-01T00:00:00.000Z/2020-01-01T00:00:00.000Z"
                ],
                "queryGranularity": "day",
                "type": "uniform"
            },
            "metricsSpec": [
                {
                    "name": "count",
                    "type": "count"
                },
                {
                    "fieldName": "amount",
                    "name": "amount",
                    "type": "doubleSum"
                },
                {
                    "fieldName": "count",
                    "name": "count",
                    "type": "longSum"
                }
            ],
            "parser": {
                "parseSpec": {
                    "columns": ["timestamp","account_number","tcode","amount","currency","country","timezone","count"],
                    "dimensionsSpec": {
                        "dimensions": [
                            "country",
                            "currency",
                            "timezone",
                            "tcode",
                            "account_number"
                        ]
                    },
                    "format": "json",
                    "timestampSpec": {
                        "column": "timestamp"
                    }
                },
                "type": "string"
            }
        },
        "ioConfig": {
            "type": "index",
            "firehose": {"baseDir": "/home/sudhir", "filter": "data.json", "type": "local"}
        }
    },
    "type": "index"
}


I get the index task ID back without any issues, but when I look at the overlord logs this is what I see:

2014-12-20 05:00:02,664 INFO [task-runner-0] io.druid.indexing.common.task.IndexTask - Will require [1] shard(s).
2014-12-20 05:00:02,666 INFO [task-runner-0] io.druid.segment.realtime.firehose.LocalFirehoseFactory - Searching for all [data.json] in [/home/sudhir]
2014-12-20 05:00:02,764 ERROR [task-runner-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[IndexTask{id=index_pci_2014-12-20T04:59:53.961Z, type=index, dataSource=pci}]
java.lang.IllegalArgumentException: Multiple entries with same key: count=2 and count=0
	at com.google.common.collect.ImmutableMap.checkNoConflict(ImmutableMap.java:150)
	at com.google.common.collect.RegularImmutableMap.checkNoConflictInBucket(RegularImmutableMap.java:104)
	at com.google.common.collect.RegularImmutableMap.<init>(RegularImmutableMap.java:70)
	at com.google.common.collect.ImmutableMap$Builder.build(ImmutableMap.java:254)
	at io.druid.segment.incremental.IncrementalIndex.<init>(IncrementalIndex.java:276)
	at io.druid.segment.incremental.OnheapIncrementalIndex.<init>(OnheapIncrementalIndex.java:52)
	at io.druid.segment.incremental.OnheapIncrementalIndex.<init>(OnheapIncrementalIndex.java:97)
	at io.druid.segment.realtime.plumber.Sink.makeNewCurrIndex(Sink.java:204)
	at io.druid.segment.realtime.plumber.Sink.<init>(Sink.java:74)
	at io.druid.indexing.common.index.YeOldePlumberSchool.findPlumber(YeOldePlumberSchool.java:94)
	at io.druid.indexing.common.task.IndexTask.generateSegment(IndexTask.java:325)
	at io.druid.indexing.common.task.IndexTask.run(IndexTask.java:187)
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:239)
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:218)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
2014-12-20 05:00:02,772 INFO [task-runner-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "index_pci_2014-12-20T04:59:53.961Z",
  "status" : "FAILED",
  "duration" : 2891
}

My data looks like this:

{"country":"US","count":1,"timezone":"America/Los_Angeles","code":"32","account_number":"123","currency":"USD","amount":3159,"timestamp":"2012-03-22T08:57:53"}

{"country":"US","count":1,"timezone":"America/Los_Angeles","code":"59","account_number":"234","currency":"USD","amount":-3010,"timestamp":"2013-07-19T21:27:50"}

{"country":"GB","count":1,"timezone":"Europe/London","code":"31","account_number":"345","currency":"GBP","amount":-400,"timestamp":"2012-11-19T11:56:43"}


Please let me know what I am missing here.

Xavier Léauté

Dec 20, 2014, 12:32:25 AM
to druid-de...@googlegroups.com
Hi Sudhir, you are specifying the count metric twice: once as type longSum and once as type count. You will need to give them distinct names if you need both.
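For example, renaming the longSum aggregator to something distinct (the name "count_sum" below is just an illustration, not a required name) avoids the key collision in the metricsSpec:

```json
{
    "metricsSpec": [
        { "name": "count", "type": "count" },
        { "fieldName": "amount", "name": "amount", "type": "doubleSum" },
        { "fieldName": "count", "name": "count_sum", "type": "longSum" }
    ]
}
```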

--
You received this message because you are subscribed to the Google Groups "Druid Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to druid-developm...@googlegroups.com.
To post to this group, send email to druid-de...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/druid-development/08e224a2-e77f-41f9-b5ac-e98bb690b3a4%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Sudhir Rama Rao

Dec 20, 2014, 12:44:28 AM
to druid-de...@googlegroups.com
Thanks Xavier, it looks like that was it. I removed that part from the aggregators and the indexing is happening. The local firehose seems really slow, though; it's been 5 minutes and 100K records are still loading. Is this expected?

Fangjin Yang

Dec 20, 2014, 1:35:53 AM
to druid-de...@googlegroups.com
The local firehose was created for quick-and-dirty POCs. It should not be used in production, and I regret creating it.


Sudhir Rama Rao

Dec 22, 2014, 12:56:09 PM
to druid-de...@googlegroups.com
Fangjin, we are using it for a POC, not in production.