Connecting to Druid from Spark via JDBC


Ben Vogan

May 23, 2017, 5:06:23 PM
to druid...@googlegroups.com
Hi all,

I am interested in querying Druid via Spark.  I know there is a separate project for doing so (https://github.com/SparklineData/spark-druid-olap), but I was curious whether the new JDBC support might be a better-supported option.

I am wholly unfamiliar with the Avatica driver and I am unclear as to what class is the proper entry point.
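For reference, here is a minimal sketch of the connection parameters involved. The driver class and URL scheme below follow the Avatica remote driver convention used against a Druid broker; the host name is a placeholder:

```scala
// Builds the Avatica remote JDBC URL for a Druid broker.
// Host and port are placeholders for your own broker.
def avaticaUrl(host: String, port: Int): String =
  s"jdbc:avatica:remote:url=http://$host:$port/druid/v2/sql/avatica/"

// The remote driver class is the usual entry point for Avatica JDBC clients.
val driverClass = "org.apache.calcite.avatica.remote.Driver"

println(avaticaUrl("mydruidbroker", 8082))
```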

I have tried:

val druidDf = sqlContext.read.format("jdbc").options(Map("url" -> "jdbc:avatica:remote:url=http://mydruidbroker:8082/druid/v2/sql/avatica/", "dbtable" -> "mydruidtable", "driver" -> "org.apache.calcite.avatica.remote.Driver", "fetchSize"->"10000")).load()

But this gives me an UnsupportedOperationException.

I tried changing the driver to org.apache.calcite.avatica.UnregisteredDriver but this gives me:

java.lang.IllegalAccessException: Class org.apache.spark.sql.execution.datasources.jdbc.DriverRegistry$ can not access a member of class org.apache.calcite.avatica.UnregisteredDriver with modifiers "protected"

I presume this is because the constructor is protected.

If someone can point me in the correct direction I would greatly appreciate it.

Thanks,
--
BENJAMIN VOGAN | Data Platform Team Lead

Gian Merlino

May 23, 2017, 5:22:23 PM
to druid...@googlegroups.com
Do you have a message or stack trace for the UnsupportedOperationException? That'd help.

The Spark docs have a troubleshooting step that talks about getting a classloader set up, which may or may not be related (https://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases):

> The JDBC driver class must be visible to the primordial class loader on the client session and on all executors. This is because Java’s DriverManager class does a security check that results in it ignoring all drivers not visible to the primordial class loader when one goes to open a connection. One convenient way to do this is to modify compute_classpath.sh on all worker nodes to include your driver JARs.

Gian

--
You received this message because you are subscribed to the Google Groups "Druid User" group.
To unsubscribe from this group and stop receiving emails from it, send an email to druid-user+unsubscribe@googlegroups.com.
To post to this group, send email to druid...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/druid-user/CAAoNsd%3Dgtb%2BFVzXXD2Z4eL304dusuGDvwKUDwSf1wW0ueV7-kA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Ben Vogan

May 24, 2017, 1:33:54 PM
to druid...@googlegroups.com
My apologies for the delay.  It appears to be a problem using prepared statements:

scala> val dw2 = sqlContext.read.format("jdbc").options(Map("url" -> "jdbc:avatica:remote:url=http://jarvis-druid-query002:8082/druid/v2/sql/avatica/", "dbtable" -> "sor_business_events_all", "driver" -> "org.apache.calcite.avatica.remote.Driver", "fetchSize"->"10000")).load()

java.lang.UnsupportedOperationException
at org.apache.calcite.avatica.AvaticaConnection.prepareStatement(AvaticaConnection.java:275)
at org.apache.calcite.avatica.AvaticaConnection.prepareStatement(AvaticaConnection.java:121)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:122)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:91)
at org.apache.spark.sql.execution.datasources.jdbc.DefaultSource.createRelation(DefaultSource.scala:57)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:158)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:25)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:30)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:32)
at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:34)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:36)
at $iwC$$iwC$$iwC.<init>(<console>:38)
at $iwC$$iwC.<init>(<console>:40)
at $iwC.<init>(<console>:42)
at <init>(<console>:44)
at .<init>(<console>:48)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1045)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1326)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:821)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:852)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:800)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1064)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)




Gian Merlino

May 25, 2017, 5:10:13 AM
to druid...@googlegroups.com
Hmm, looks like something missing in Avatica. Is it possible to get Spark to avoid using prepared statements?
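If Spark cannot be steered away from prepareStatement, one possible workaround (a sketch only, assuming the Avatica jar is on the classpath and a broker is reachable; the URL and SQL are placeholders) is to bypass Spark's JDBC reader and go through plain java.sql with an unprepared Statement:

```scala
import java.sql.DriverManager

// Hypothetical workaround sketch: query Druid through plain JDBC using a
// Statement (createStatement/executeQuery), so prepareStatement is never
// called. The broker URL and SQL passed in are placeholders.
def queryDruid(url: String, sql: String): Seq[String] = {
  val conn = DriverManager.getConnection(url)
  try {
    val stmt = conn.createStatement()   // no prepared statement involved
    val rs   = stmt.executeQuery(sql)
    val out  = scala.collection.mutable.Buffer[String]()
    while (rs.next()) out += rs.getString(1)
    out.toSeq
  } finally conn.close()
}
```

The resulting rows could then be turned into a DataFrame by hand (e.g. via sqlContext.createDataFrame), at the cost of losing Spark's JDBC partitioning and pushdown.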

The Calcite folks may also be able to help with this, maybe they can shed some light on why the method isn't implemented.

Gian

Ben Vogan

May 25, 2017, 12:17:20 PM
to druid...@googlegroups.com
Thanks Gian.  I have opened a ticket with the Avatica/Calcite folks.

--Ben



lethuy...@gmail.com

Jun 11, 2018, 5:32:42 AM
to Druid User
Hi guys,

Does anybody know where we should copy avatica.jar in Druid?
How can we connect Druid with Tableau or Grafana (as a SQL datasource)?

Thank you so much,

zhangxin...@gmail.com

Jul 14, 2018, 11:38:47 AM
to Druid User
There is a Druid plugin for Grafana; you can use it.

On Monday, June 11, 2018 at 5:32:42 PM UTC+8, lethuy...@gmail.com wrote:

seyyed Safavie

Aug 10, 2020, 2:27:53 PM
to Druid User
This is a problem for me too.
Did you solve this problem?


vijay narayanan

Aug 10, 2020, 11:14:45 PM
to druid...@googlegroups.com
This works for me:

val dw2 = sqlContext.read.format("jdbc").options(Map("url" -> "jdbc:avatica:remote:url=http://localhost:8082/druid/v2/sql/avatica/", "dbtable" -> "Fire_Department_Calls_for_service", "driver" -> "org.apache.calcite.avatica.remote.Driver", "fetchSize"->"10000")).load()

dw2: org.apache.spark.sql.DataFrame = [ALS Unit: string, Address: string ... 43 more fields]



I ran spark-shell like: ./spark-shell --driver-class-path ../../avatica-1.12.0.jar --jars ../../avatica-1.12.0.jar



vijay

