Re: Upgrade to Druid 0.8.1 - Unable to make the Hadoop indexer work


Fangjin Yang
Nov 4, 2015, 7:35:40 PM
to Druid User
Hi Torche, if you take a look at the 0.6.x branch, there should be a tool you can run to convert the old index task spec to the new one.

On Tuesday, November 3, 2015 at 10:25:48 AM UTC-8, Torche Guillaume wrote:
Hi all,

We are trying to upgrade our cluster from 0.7.x to 0.8.1. We are currently using the Hadoop indexer for our batch pipelines (we are using version 0.6.171). 
It looks like the indexing task spec has changed, and I cannot use the specification I was using with the 0.6.171 Druid jar.

I have looked at the documentation and updated my indexing task to follow the new specs. However, I get this error for the third job of the batch indexing process:

2015-11-02 19:11:55,911 ERROR [main] cli.CliHadoopIndexer (Logger.java:error(98)) - failure!!!!
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at io.druid.cli.CliHadoopIndexer.run(CliHadoopIndexer.java:120)
at io.druid.cli.Main.main(Main.java:91)
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: No buckets?? seems there is no data to index.
at io.druid.indexer.IndexGeneratorJob.run(IndexGeneratorJob.java:207)
at io.druid.indexer.JobHelper.runJobs(JobHelper.java:182)
at io.druid.indexer.HadoopDruidIndexerJob.run(HadoopDruidIndexerJob.java:96)
at io.druid.indexer.JobHelper.runJobs(JobHelper.java:182)
at io.druid.cli.CliInternalHadoopIndexer.run(CliInternalHadoopIndexer.java:132)
at io.druid.cli.Main.main(Main.java:91)
... 6 more
Caused by: java.lang.RuntimeException: No buckets?? seems there is no data to index.
at io.druid.indexer.IndexGeneratorJob.run(IndexGeneratorJob.java:159)



1) First of all, can someone explain to me the purpose of each job in the batch ingestion process? (determine_partitions_groupby, determine_partitions_dimselection ...). 

There are three stages: find the interval of data to index (optional), determine how partitions should be created, and create the segments.

2) What does this error mean? I am sure the interval in granularitySpec is correct, because I use the same interval and the same input path when I ingest this data with version 0.6.171.

The error is most likely related to an outdated Druid spec. 

3) Is it possible to continue ingesting my data with Hadoop indexer version 0.6.171 while my cluster is on 0.8.1?

No, the entire spec changed. You have to use the new ingestion spec.
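
At a high level, the new spec nests everything under a dataSchema/ioConfig/tuningConfig layout, roughly like this (a minimal sketch with placeholder values, not a complete spec):

{
  "type" : "index_hadoop",
  "spec" : {
    "dataSchema" : {
      "dataSource" : "your_datasource",
      "parser" : { },
      "metricsSpec" : [ ],
      "granularitySpec" : { }
    },
    "ioConfig" : {
      "type" : "hadoop",
      "inputSpec" : { }
    },
    "tuningConfig" : {
      "type" : "hadoop"
    }
  }
}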

4) In the metadataUpdateSpec I tried to use "mysql" or "db" for the type but it did not work, so I am now using derby. What does that type refer to, and is it the right way to do it if my metadata storage is MySQL?

These release notes might help with moving ingestion from 0.6.x to 0.7.x: https://github.com/druid-io/druid/releases/tag/druid-0.7.0. Derby should never be used in production. 
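
For the metadataUpdateSpec, a MySQL setup would look roughly like this (a sketch with placeholder values; it assumes the MySQL metadata storage extension is available to the indexer):

"metadataUpdateSpec" : {
  "type" : "mysql",
  "connectURI" : "jdbc:mysql://your-host:3306/druid",
  "user" : "your_user",
  "password" : "your_password",
  "segmentTable" : "prod_segments"
}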


I have attached the complete MR logs. Thanks for your help! 

Torche Guillaume
Nov 5, 2015, 5:03:47 PM
to Druid User
Hi Fangjin,

Thanks for your answer.

I have converted my old Hadoop ingestion spec to the new format. However, I still get an exception when I specify the metadata storage type.

Let me summarize what I am trying to achieve so you have a better understanding of my problem.

My batch pipeline doesn't rely on my indexing service (Hadoop is not configured there). What I do is spin up an EMR cluster whenever I want to run a batch pipeline; a bootstrap action then installs Druid on the master node, and the following command is run on that same node to submit the Hadoop ingestion task using the Druid CLI Hadoop indexer:

java -Xmx256m -Duser.timezone=PST -Dfile.encoding=UTF-8 -classpath /home/hadoop/.versions/2.4.0-amzn-7/share/hadoop/yarn/*:/home/hadoop/.versions/2.4.0-amzn-7/share/hadoop/yarn/lib/*:/home/hadoop/.versions/2.4.0-amzn-7/share/hadoop/tools/*:/home/hadoop/.versions/2.4.0-amzn-7/share/hadoop/tools/lib/*:/home/hadoop/.versions/2.4.0-amzn-7/share/hadoop/hdfs/lib/*:/home/hadoop/.versions/2.4.0-amzn-7/share/hadoop/hdfs/*:/home/hadoop/.versions/2.4.0-amzn-7/share/hadoop/mapreduce/*:/home/hadoop/.versions/2.4.0-amzn-7/share/hadoop/common/*:/home/hadoop/.versions/2.4.0-amzn-7/share/hadoop/common/lib/*:/home/hadoop/conf/:/home/hadoop/druid-services/lib/* -Dhadoop.fs.s3n.impl=org.apache.hadoop.fs.s3native.NativeS3FileSystem -Dhadoop.fs.s3.impl=org.apache.hadoop.fs.s3native.NativeS3FileSystem -Dfs.s3n.awsAccessKeyId=***** -Dfs.s3n.awsSecretAccessKey=***** io.druid.cli.Main index hadoop --no-default-hadoop  specFile


where specFile is:


{
  "type" : "index_hadoop",
  "spec": {
      "dataSchema" : {
        "dataSource" : "rtb_bids",
        "parser" : {
          "type" : "hadoopyString",
          "parseSpec" : {
            "format" : "tsv",
            "timestampSpec" : {
              "column" : "timestamp",
              "format" : "auto"
            },
            "columns":[
                        "timestamp",
                        "aws_region",
                        "tracking_id",
                        "zone_type_id",
                        "domain",
                        "publisher_id",
                        "country_code",
                        "bidder_id",
                        "advertiser_id",
                        "unit_type_id",
                        "product_id",
                        "deal_id",
                        "browser_type_id",
                        "vertical_id",
                        "width",
                        "height",
                        "bids",
                        "wins",
                        "total_bid_price",
                        "total_win_price",
                        "total_settlement_price"
            ],
            "dimensionsSpec" : {
              "dimensions": [  
                 "aws_region",
                 "tracking_id",
                 "zone_type_id",
                 "domain",
                 "publisher_id",
                 "country_code",
                 "bidder_id",
                 "advertiser_id",
                 "unit_type_id",
                 "product_id",
                 "deal_id",
                 "browser_type_id",
                 "vertical_id",
                 "width",
                 "height"
              ],
              "dimensionExclusions" : [],
              "spatialDimensions" : []
            }
          }
        },
        "metricsSpec" : [  
             {  
                "type":"longSum",
                "name":"bids",
                "fieldName":"bids"
             },
             {  
                "type":"longSum",
                "name":"wins",
                "fieldName":"wins"
             },
             {  
                "type":"doubleSum",
                "name":"total_bid_price",
                "fieldName":"total_bid_price"
             },
             {  
                "type":"doubleSum",
                "name":"total_win_price",
                "fieldName":"total_win_price"
             },
             {  
                "type":"doubleSum",
                "name":"total_settlement_price",
                "fieldName":"total_settlement_price"
             }
         ],
        "granularitySpec" : {
          "type" : "uniform",
          "segmentGranularity" : "HOUR",
          "queryGranularity" : "NONE",
          "intervals" : [ "2015-11-02T00:00:00.000-08:00/2015-11-02T04:00:00.000-08:00" ]
        }
      },
      "ioConfig" : {
        "type" : "hadoop",
        "inputSpec" : {
          "type" : "static",
          "paths" : "s3n://gumgum-elastic-mapreduce/druid/rtbevents/output/2015-11-02-00/bids/*"
        },
        "metadataUpdateSpec" : {
          "type":"mysql",
          "connectURI" : "**********",
          "password" : "********",
          "segmentTable" : "prod_segments",
          "user" : "****"
        },
        "segmentOutputPath" : "s3n://gumgum-druid/prod-segments-druid-08"
      },
      "tuningConfig" : {
        "type" : "hadoop",
        "workingPath": "/tmp/gumgum-druid/",
        "partitionsSpec" : {
          "type" : "dimension",
          "partitionDimension" : null,
          "targetPartitionSize" : 5000000,
          "maxPartitionSize" : 7500000,
          "assumeGrouped" : false,
          "numShards" : -1
        },
        "shardSpecs" : { },
        "leaveIntermediate" : false,
        "cleanupOnFailure" : true,
        "overwriteFiles" : false,
        "ignoreInvalidRows" : false,
        "jobProperties" : { },
        "combineText" : false,
        "persistInHeap" : false,
        "ingestOffheap" : false,
        "bufferSize" : 134217728,
        "aggregationBufferRatio" : 0.5,
        "rowFlushBoundary" : 300000
      }
   }
}


It looks to me like this spec file is correct and follows the new specs. However, I get an exception saying that the provider for the SQL metadata connector is not recognized and that the only known option is derby. Here is the full trace:

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/.versions/2.4.0-amzn-7/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/druid-services/lib/log4j-slf4j-impl-2.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2015-11-05 13:57:26,321 INFO  [main] util.Version (Version.java:<clinit>(27)) - HV000001: Hibernate Validator 5.1.3.Final
2015-11-05 13:57:27,307 INFO  [main] guice.JsonConfigurator (Logger.java:info(70)) - Loaded class[class io.druid.guice.ExtensionsConfig] from props[druid.extensions.] as [ExtensionsConfig{searchCurrentClassloader=true, coordinates=[], defaultVersion='0.8.1', localRepository='/home/hadoop/.m2/repository', remoteRepositories=[https://repo1.maven.org/maven2/, https://metamx.artifactoryonline.com/metamx/pub-libs-releases-local]}]
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/.versions/2.4.0-amzn-7/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/druid-services/lib/log4j-slf4j-impl-2.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2015-11-05 13:57:28,117 INFO  [main] util.Version (Version.java:<clinit>(27)) - HV000001: Hibernate Validator 5.1.3.Final
2015-11-05 13:57:28,923 INFO  [main] guice.JsonConfigurator (Logger.java:info(70)) - Loaded class[class io.druid.guice.ExtensionsConfig] from props[druid.extensions.] as [ExtensionsConfig{searchCurrentClassloader=true, coordinates=[], defaultVersion='0.8.1', localRepository='/home/hadoop/.m2/repository', remoteRepositories=[https://repo1.maven.org/maven2/, https://metamx.artifactoryonline.com/metamx/pub-libs-releases-local]}]
2015-11-05 13:57:29,882 INFO  [main] guice.JsonConfigurator (Logger.java:info(70)) - Loaded class[class io.druid.guice.ExtensionsConfig] from props[druid.extensions.] as [ExtensionsConfig{searchCurrentClassloader=true, coordinates=[], defaultVersion='0.8.1', localRepository='/home/hadoop/.m2/repository', remoteRepositories=[https://repo1.maven.org/maven2/, https://metamx.artifactoryonline.com/metamx/pub-libs-releases-local]}]
2015-11-05 13:57:30,694 INFO  [main] guice.JsonConfigurator (Logger.java:info(70)) - Loaded class[class io.druid.server.metrics.DruidMonitorSchedulerConfig] from props[druid.monitoring.] as [io.druid.server.metrics.DruidMonitorSchedulerConfig@737361b8]
2015-11-05 13:57:30,708 INFO  [main] guice.JsonConfigurator (Logger.java:info(70)) - Loaded class[class io.druid.server.metrics.MonitorsConfig] from props[druid.monitoring.] as [MonitorsConfig{monitors=[]}]
2015-11-05 13:57:31,044 INFO  [main] config.ConfigurationObjectFactory (ConfigurationObjectFactory.java:buildSimple(162)) - Using method itself for [druid.computation.buffer.size, ${base_path}.buffer.sizeBytes] on [io.druid.query.DruidProcessingConfig#intermediateComputeSizeBytes()]
2015-11-05 13:57:31,049 INFO  [main] config.ConfigurationObjectFactory (ConfigurationObjectFactory.java:buildSimple(162)) - Using method itself for [${base_path}.numThreads] on [io.druid.query.DruidProcessingConfig#getNumThreads()]
2015-11-05 13:57:31,049 INFO  [main] config.ConfigurationObjectFactory (ConfigurationObjectFactory.java:buildSimple(162)) - Using method itself for [${base_path}.columnCache.sizeBytes] on [io.druid.query.DruidProcessingConfig#columnCacheSizeBytes()]
2015-11-05 13:57:31,050 INFO  [main] config.ConfigurationObjectFactory (ConfigurationObjectFactory.java:buildSimple(151)) - Assigning default value [processing-%s] for [${base_path}.formatString] on [com.metamx.common.concurrent.ExecutorServiceConfig#getFormatString()]
2015-11-05 13:57:31,234 INFO  [main] guice.JsonConfigurator (Logger.java:info(70)) - Loaded class[interface io.druid.segment.data.BitmapSerdeFactory] from props[druid.processing.bitmap.] as [ConciseBitmapSerdeFactory{}]
Nov 05, 2015 1:57:31 PM com.google.inject.servlet.GuiceFilter setPipeline
WARNING: Multiple Servlet injectors detected. This is a warning indicating that you have more than one GuiceFilter running in your web application. If this is deliberate, you may safely ignore this message. If this is NOT deliberate however, your application may not work as expected.
2015-11-05 13:57:31,391 INFO  [main] guice.JsonConfigurator (Logger.java:info(70)) - Loaded class[class io.druid.server.metrics.DruidMonitorSchedulerConfig] from props[druid.monitoring.] as [io.druid.server.metrics.DruidMonitorSchedulerConfig@77c143b0]
2015-11-05 13:57:31,402 INFO  [main] guice.JsonConfigurator (Logger.java:info(70)) - Loaded class[class io.druid.server.metrics.MonitorsConfig] from props[druid.monitoring.] as [MonitorsConfig{monitors=[]}]
2015-11-05 13:57:31,422 INFO  [main] guice.JsonConfigurator (Logger.java:info(70)) - Loaded class[class io.druid.server.DruidNode] from props[druid.] as [DruidNode{serviceName='druid/internal-hadoop-indexer', host='ip-10-81-200-79.ec2.internal', port=0}]
2015-11-05 13:57:31,428 ERROR [main] cli.CliHadoopIndexer (Logger.java:error(98)) - failure!!!!
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at io.druid.cli.CliHadoopIndexer.run(CliHadoopIndexer.java:120)
at io.druid.cli.Main.main(Main.java:91)
Caused by: com.google.inject.ProvisionException: Guice provision errors:

1) Unknown provider[mysql] of Key[type=io.druid.metadata.SQLMetadataConnector, annotation=[none]], known options[[derby]]
  at io.druid.guice.PolyBind.createChoiceWithDefault(PolyBind.java:67)
  while locating io.druid.metadata.SQLMetadataConnector
    for parameter 2 at io.druid.metadata.IndexerSQLMetadataStorageCoordinator.<init>(IndexerSQLMetadataStorageCoordinator.java:69)
  while locating io.druid.metadata.IndexerSQLMetadataStorageCoordinator
  at io.druid.cli.CliInternalHadoopIndexer$1.configure(CliInternalHadoopIndexer.java:98)
  while locating io.druid.indexing.overlord.IndexerMetadataStorageCoordinator

1 error
at com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:987)
at com.google.inject.internal.InjectorImpl.getInstance(InjectorImpl.java:1013)
at io.druid.cli.CliInternalHadoopIndexer.run(CliInternalHadoopIndexer.java:119)
at io.druid.cli.Main.main(Main.java:91)
... 6 more


Do you have an idea why I get this error? It seems like only the derby provider is supported. I know the 0.7.x version of Druid introduced derby as the default metadata storage, but it should be possible to use another metadata storage with the CLI Hadoop indexer, right?

Thanks for your help!

Jonathan Wei
Nov 5, 2015, 5:31:22 PM
to druid...@googlegroups.com
Hi Torche,

I think you need to specify the mysql extension when starting the indexer. Can you try adding:

-Ddruid.extensions.coordinates=[\"io.druid.extensions:mysql-metadata-storage\"]

to the launching command?
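
For example, applied to the launch command from your earlier message, it would look something like this (classpath and credentials elided as before):

java -Xmx256m -Duser.timezone=PST -Dfile.encoding=UTF-8 -classpath <same classpath as before> -Dhadoop.fs.s3n.impl=org.apache.hadoop.fs.s3native.NativeS3FileSystem -Dhadoop.fs.s3.impl=org.apache.hadoop.fs.s3native.NativeS3FileSystem -Dfs.s3n.awsAccessKeyId=***** -Dfs.s3n.awsSecretAccessKey=***** -Ddruid.extensions.coordinates=[\"io.druid.extensions:mysql-metadata-storage\"] io.druid.cli.Main index hadoop --no-default-hadoop specFile

(Depending on your shell, you may need to quote the extensions coordinate so the brackets and escaped quotes survive.)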

The following links might be helpful for that error:

- Jon



Torche Guillaume
Nov 10, 2015, 12:35:51 PM
to Druid User
Thanks, it worked!