setting hadoopDependencies in CliHadoopIndexer

Himanshu Gupta

Aug 26, 2014, 10:36:26 AM
to druid-de...@googlegroups.com
Druid Version Used: 0.6.146 stable

I want to supply my own Hadoop jars and don't want them downloaded from Maven, so I am trying to set CliHadoopIndexer.hadoopDependencyCoordinates to an empty list with the following command:

$java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8  -cp  $HADOOP_CLASSPATH:./druid-services-0.6.146-selfcontained.jar  io.druid.cli.Main index hadoop hadoopDependencies '[]' wikipedia_hadoop_config.json

It fails with the following exception:
2014-08-26 14:31:28,304 INFO [main] org.hibernate.validator.internal.util.Version - HV000001: Hibernate Validator 5.0.1.Final
2014-08-26 14:31:29,000 INFO [main] io.druid.guice.JsonConfigurator - Loaded class[class io.druid.guice.ExtensionsConfig] from props[druid.extensions.] as [ExtensionsConfig{searchCurrentClassloader=true, coordinates=[], localRepository='/home/himanshu/.m2/repository', remoteRepositories=[http://repo1.maven.org/maven2/, https://metamx.artifactoryonline.com/metamx/pub-libs-releases-local]}]
Exception in thread "main" java.lang.UnsupportedOperationException
    at java.util.AbstractList.add(AbstractList.java:148)
    at java.util.AbstractList.add(AbstractList.java:108)
    at java.util.AbstractCollection.addAll(AbstractCollection.java:334)
    at com.google.common.collect.Iterables.addAll(Iterables.java:348)
    at io.airlift.command.Accessor.addValues(Accessor.java:105)
    at io.airlift.command.ParserUtil.createInstance(ParserUtil.java:48)
    at io.airlift.command.Cli.parse(Cli.java:120)
    at io.airlift.command.Cli.parse(Cli.java:97)
    at io.druid.cli.Main.main(Main.java:86)


Is this a bug, or am I doing it wrong?

Thanks,
Himanshu

PS: The same can be achieved for HadoopIndexTask by specifying "druid.indexer.task.defaultHadoopCoordinates=[]" in the overlord's runtime.properties, though. The relevant line in that file is just this single property (the empty JSON array disables the default Hadoop coordinates):
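
# disable the default Hadoop coordinates for HadoopIndexTask
druid.indexer.task.defaultHadoopCoordinates=[]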

Gian Merlino

Aug 26, 2014, 3:11:56 PM
to druid-de...@googlegroups.com
This looks like a bug. Thanks for the report. I think this patch will fix it: https://github.com/metamx/druid/pull/704. That will let you run "io.druid.cli.Main index hadoop --no-default-hadoop wikipedia_hadoop_config.json" to prevent Druid from pulling down the default Hadoop version from Maven. In that case it *should* be free to use one from the classpath (although I haven't tested this). If you have a chance to try building Druid from source with that patch, please let me know if it does what you want.
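
With that flag, your original invocation would look roughly like the following (an untested sketch; the heap size, classpath, and spec file are just carried over from your command above):

java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -cp $HADOOP_CLASSPATH:./druid-services-0.6.146-selfcontained.jar io.druid.cli.Main index hadoop --no-default-hadoop wikipedia_hadoop_config.json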

If you don't want to build from source or wait for a new release, you can also try using the pull-deps command to pull things down locally, in which case Druid will use a local Maven repository instead of a remote one. See the "I want classloader isolation, but I don't want my production machines downloading their own dependencies. What should I do?" section here: http://druid.io/docs/latest/Modules.html
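
From memory, the pull-deps invocation is along these lines; treat the exact flags as an assumption and go by the doc above. The druid.extensions.* property names match the ExtensionsConfig fields shown in your log, and the hadoop-client coordinate and repository path here are just placeholders:

java -Ddruid.extensions.coordinates='["org.apache.hadoop:hadoop-client:2.3.0"]' -Ddruid.extensions.localRepository=/opt/druid/localRepo -cp ./druid-services-0.6.146-selfcontained.jar io.druid.cli.Main tools pull-deps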

Himanshu Gupta

Aug 26, 2014, 11:45:03 PM
to druid-de...@googlegroups.com
Thanks Gian for the quick fix.

I pulled the code and built the packages for testing. It works as expected with the "--no-default-hadoop" flag. However, "-c" seems to be broken: airlift/airline does not parse a comma-separated list correctly, so if you specify more than one Maven coordinate with the -c flag (e.g. -c "a.b.c:1.2,d.e.f:2.3"), the list is not split on ',' as expected.
I've sent a simple pull request that fixes that: https://github.com/metamx/druid/pull/707

Thanks,
Himanshu

Himanshu Gupta

Aug 27, 2014, 12:07:30 AM
to druid-de...@googlegroups.com
OK, to conclude this thread: Gian mentioned in a comment on the pull request that you can specify the "-c" flag multiple times to supply multiple Maven coordinates, and that works as expected.
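
For anyone reading later, that usage looks like the following (the coordinates are just the placeholders from my earlier message; the rest of the command is unchanged from my first post):

java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -cp $HADOOP_CLASSPATH:./druid-services-0.6.146-selfcontained.jar io.druid.cli.Main index hadoop -c a.b.c:1.2 -c d.e.f:2.3 wikipedia_hadoop_config.json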

Thanks,
Himanshu