Spark Integration


daniel....@origamilogic.com

Oct 26, 2016, 9:35:10 AM
to OpenTSDB
Hi guys.

I wanted to ask whether there are any plans for Spark integration. I'm currently working on a Spark-based time-series project and am trying to figure out whether OpenTSDB would be a good fit. I've seen a few GitHub projects for creating RDDs out of OpenTSDB time series, but I'd be very concerned about using anything like that in a production implementation. Are there any releases from reputable sources that integrate OpenTSDB with Spark?

Thank you

Daniel

daniel....@origamilogic.com

Oct 26, 2016, 9:37:30 AM
to OpenTSDB
Also, a small clarification: ideally I'd like to use the daemons for continuous ingest into HBase, and then be able to bulk-read from HBase into a Spark RDD. Perhaps there are tools in the OpenTSDB package that would let me form the correct HBase queries?

Jonathan Creasy

Oct 26, 2016, 12:26:53 PM
to daniel....@origamilogic.com, OpenTSDB
Yes, you could use OpenTSDB to write the time series data to HBase and then use the OpenTSDB packages to read it back again.
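
For the write side, here's a minimal sketch of what that could look like from Spark (assuming the 2.x net.opentsdb.core.TSDB client and an opentsdb.conf reachable on each executor; the RDD shape and the ingest helper are just for illustration):

    import net.opentsdb.core.TSDB
    import net.opentsdb.utils.Config
    import org.apache.spark.rdd.RDD
    import scala.collection.JavaConverters._

    // points: (metric, epoch seconds, value, tags) -- a hypothetical RDD of samples
    def ingest(points: RDD[(String, Long, Double, Map[String, String])]): Unit =
      points.foreachPartition { rows =>
        val tsdb = new TSDB(new Config(true)) // auto-loads opentsdb.conf on the executor
        try {
          rows.foreach { case (metric, ts, value, tags) =>
            // addPoint is asynchronous; join() keeps the sketch simple
            tsdb.addPoint(metric, ts, value, tags.asJava).join()
          }
        } finally {
          tsdb.shutdown().join() // flushes any buffered RPCs to HBase
        }
      }

One client per partition (rather than per record) matters here, since each TSDB instance holds its own HBase connection and UID caches.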

Daniel Imberman

Oct 26, 2016, 12:33:32 PM
to Jonathan Creasy, OpenTSDB
Thank you, Jonathan. The one thing I've been having trouble with is that I can't find any documentation on actually using the classes in OpenTSDB. I pulled in the dependency "net.opentsdb" % "opentsdb" % "2.2.1", but I can't access any of the classes I see in the Git repo here: https://github.com/OpenTSDB/opentsdb. Do you know of any documentation on using the OpenTSDB package programmatically, or is there another package I should be using?

thank you,

Daniel

ManOLamancha

Dec 20, 2016, 4:13:25 PM
to OpenTSDB, jona...@ghostlab.net

You do want to use the TSDB package, and specifically to instantiate the TSDB class. Then you can fill out a TSQuery object and use that to generate queries. I recently wrote some query code, so I'll merge that with the examples posted in a PR and get them into the TSDB package for folks to work with.
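
Roughly, a minimal sketch against the 2.2.x API (the metric name and time range below are placeholders, and error handling is omitted):

    import net.opentsdb.core.{TSDB, TSQuery, TSSubQuery}
    import net.opentsdb.utils.Config
    import scala.collection.JavaConverters._

    val tsdb = new TSDB(new Config(true)) // auto-loads opentsdb.conf

    val sub = new TSSubQuery()
    sub.setMetric("sys.cpu.user") // placeholder metric
    sub.setAggregator("sum")
    sub.setTags(new java.util.HashMap[String, String]())

    val subs = new java.util.ArrayList[TSSubQuery]()
    subs.add(sub)

    val query = new TSQuery()
    query.setStart("1h-ago") // placeholder time range
    query.setQueries(subs)
    query.validateAndSetQuery() // throws if the query is malformed

    // buildQueries() compiles the TSQuery into one low-level Query per sub-query
    for {
      q   <- query.buildQueries(tsdb)
      dps <- q.runAsync().join() // blocks on the Deferred for simplicity
      dp  <- dps.iterator().asScala
    } println(s"${dps.metricName()} ${dp.timestamp()} ${dp.toDouble}")

    tsdb.shutdown().join()

From there, wiring it into Spark is mostly a matter of running one query per partition and mapping the returned DataPoints into your RDD's row type.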

bar...@bossanova.com

Mar 27, 2017, 9:20:19 PM
to OpenTSDB, jona...@ghostlab.net
Apologies for reviving an older thread. I have a question about this approach.

We are in the planning stages of using OpenTSDB, so this question might be naive.

Using TSDB.java directly appears to eliminate the need to run the daemons if I'm solely interested in writes. Is that correct?

We do have consumers that would be interested in using the daemons for reads, but I don't see that presenting a problem.

Basically, we want what the OP wanted: writes from Spark Streaming and reads from a variety of clients using the HTTP API via the daemon.

Is my thinking about this broken?

Very cool project by the way!

-b

ManOLamancha

Apr 25, 2017, 4:41:53 PM
to OpenTSDB, jona...@ghostlab.net

Take a look at the 3.0 branch. I'm rewriting the query engine to be pluggable and flexible so that it can be dropped into Spark for execution (we need help on the interfaces for that). I'm also splitting the code up into libraries that can be used to implement different components.