How to export OpenTSDB data to Parquet files?


Erwan Rouzel

Jul 20, 2016, 5:53:54 AM
to OpenTSDB, SERGENT David
Hi,

First of all, let me explain what I am trying to achieve: I would like to run Spark queries over OpenTSDB data.
As things stand, this is not possible, because the data OpenTSDB stores in HBase is raw binary and cannot easily be queried from Spark.

The solution I am trying to set up uses Apache Crunch to perform ETL: it reads the raw data from HBase, decodes both the keys and the values from bytes into human-friendly representations, and then writes the results to Parquet files.

The final Parquet file follows a schema like this:
- metric (string)
- timestamp (integer)
- value (integer)
- tags (map of key/values with tagk/tagv)
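The metric, timestamp and tags columns all come from the HBase row key. For readers following along, here is a minimal sketch of that layout, assuming OpenTSDB 2.x defaults: a 3-byte metric UID, a 4-byte base timestamp in Unix seconds, then 3-byte tagk/tagv UID pairs, with no salt prefix. The class and method names are hypothetical, and turning the UIDs into the metric and tag strings of the Parquet schema still requires a lookup in the `tsdb-uid` table, which is not shown.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: splits an unsalted OpenTSDB 2.x row key into UIDs and base time. */
final class RowKeyDecoder {
    // Assumed default widths; adjust if the UID widths or salting were changed.
    static final int METRIC_WIDTH = 3;
    static final int TIMESTAMP_WIDTH = 4;
    static final int TAGK_WIDTH = 3;
    static final int TAGV_WIDTH = 3;

    static final class RowKey {
        final long metricUid;
        final long baseTimeSeconds;     // Unix seconds, aligned on the hour
        final Map<Long, Long> tagUids;  // tagk UID -> tagv UID, in key order

        RowKey(long metricUid, long baseTimeSeconds, Map<Long, Long> tagUids) {
            this.metricUid = metricUid;
            this.baseTimeSeconds = baseTimeSeconds;
            this.tagUids = tagUids;
        }
    }

    /** Reads width bytes starting at offset as an unsigned big-endian number. */
    static long readUnsigned(byte[] bytes, int offset, int width) {
        long v = 0;
        for (int i = 0; i < width; i++) {
            v = (v << 8) | (bytes[offset + i] & 0xFF);
        }
        return v;
    }

    static RowKey decode(byte[] key) {
        int pairWidth = TAGK_WIDTH + TAGV_WIDTH;
        int tagBytes = key.length - METRIC_WIDTH - TIMESTAMP_WIDTH;
        if (tagBytes < 0 || tagBytes % pairWidth != 0) {
            throw new IllegalArgumentException("Unexpected row key length: " + key.length);
        }
        long metric = readUnsigned(key, 0, METRIC_WIDTH);
        long baseTime = readUnsigned(key, METRIC_WIDTH, TIMESTAMP_WIDTH);
        Map<Long, Long> tags = new LinkedHashMap<>();
        for (int pos = METRIC_WIDTH + TIMESTAMP_WIDTH; pos < key.length; pos += pairWidth) {
            tags.put(readUnsigned(key, pos, TAGK_WIDTH),
                     readUnsigned(key, pos + TAGK_WIDTH, TAGV_WIDTH));
        }
        return new RowKey(metric, baseTime, tags);
    }
}
```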

The tricky part is the conversion from the byte representation to a human-friendly one. I can convert the raw keys without much trouble, but the values are harder: even after following the official documentation, I cannot figure out exactly how they are encoded.
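As a reference point for the values, here is a sketch of how a single, non-compacted data point cell is laid out in OpenTSDB 2.x, as far as the storage documentation describes it. The last 4 bits of the qualifier are flags: 0x8 marks a floating-point value and the low 3 bits hold the value length minus one. The remaining bits are the offset from the row's base time: 12 bits of seconds in a 2-byte qualifier, or, in a 4-byte qualifier whose first nibble is 0xF, 22 bits of milliseconds followed by 2 unused bits. Integers are big-endian signed values of 1, 2, 4 or 8 bytes; floating-point values are 4-byte floats or 8-byte doubles. `CellDecoder`, `Point` and the method names are hypothetical, and odd-length qualifiers (which are not regular data points) are rejected rather than decoded.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper: decodes one non-compacted OpenTSDB 2.x data point cell. */
final class CellDecoder {
    static final int FLAG_BITS = 4;      // low 4 bits of a qualifier are flags
    static final int MS_FLAG_BITS = 6;   // ms qualifiers: 2 unused bits + 4 flag bits
    static final int FLAG_FLOAT = 0x8;   // set when the value is a float or double
    static final int LENGTH_MASK = 0x7;  // (flags & LENGTH_MASK) + 1 = value length

    static final class Point {
        final long timestampMs;
        final Number value;  // Long for integers, Double for floating point

        Point(long timestampMs, Number value) {
            this.timestampMs = timestampMs;
            this.value = value;
        }
    }

    static Point decode(long baseTimeSeconds, byte[] qualifier, byte[] value) {
        final long timestampMs;
        final int flags;
        if (qualifier.length == 4 && (qualifier[0] & 0xF0) == 0xF0) {
            // Millisecond precision: 0xF nibble, 22-bit ms offset, 2 unused bits, flags.
            long q = ByteBuffer.wrap(qualifier).getInt() & 0xFFFFFFFFL;
            timestampMs = baseTimeSeconds * 1000 + ((q & 0x0FFFFFC0L) >>> MS_FLAG_BITS);
            flags = (int) (q & 0x0F);
        } else if (qualifier.length == 2) {
            // Second precision: 12-bit offset in seconds, then 4 flag bits.
            int q = ByteBuffer.wrap(qualifier).getShort() & 0xFFFF;
            timestampMs = (baseTimeSeconds + (q >>> FLAG_BITS)) * 1000;
            flags = q & 0x0F;
        } else {
            throw new IllegalArgumentException("Not a single data point qualifier: " + qualifier.length + " bytes");
        }
        return new Point(timestampMs, decodeValue(flags, value, 0));
    }

    /** Decodes the value that starts at offset, using the length and type from the flags. */
    static Number decodeValue(int flags, byte[] buf, int offset) {
        int length = (flags & LENGTH_MASK) + 1;
        ByteBuffer bb = ByteBuffer.wrap(buf, offset, length);  // big-endian by default
        if ((flags & FLAG_FLOAT) != 0) {
            if (length == 4) return (double) bb.getFloat();
            if (length == 8) return bb.getDouble();
        } else {
            switch (length) {
                case 1: return (long) bb.get();
                case 2: return (long) bb.getShort();
                case 4: return (long) bb.getInt();
                case 8: return bb.getLong();
                default: break;
            }
        }
        throw new IllegalArgumentException("Unsupported value length " + length);
    }
}
```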

I understand there are two modes: in the non-compacted mode, the values are spread over several columns, while in the compacted mode all the values are concatenated into a single column and the qualifiers are likewise concatenated into a single qualifier. Still, I cannot find this representation in my data, which is compacted: I only have a single 3-byte qualifier, which does not look at all like a concatenated qualifier.
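A compacted cell can be unpacked by walking both byte arrays in step: each qualifier is 2 bytes, or 4 bytes when its first nibble is 0xF, and its flags give the length of the matching value, so the value array is consumed point by point; the compacted value normally ends with one extra meta byte, which the loop leaves unread. About the 3-byte qualifier: as far as I can tell from the OpenTSDB 2.x sources, odd-length qualifiers never hold a regular data point. Annotations use a qualifier with a 0x01 prefix, and when `tsd.storage.enable_appends` is turned on, every point of the row is appended to a single column with the qualifier 0x05 0x00 0x00, whose value is a run of [qualifier][value] pairs. That could match what you describe, but it is worth checking against your own table. The sketch covers both layouts under those assumptions; all names are hypothetical, and appended points are returned as found, without sorting or deduplication.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits compacted (and, by assumption, appended) columns into points. */
final class CompactedDecoder {
    static final class Point {
        final long timestampMs;
        final Number value;  // Long for integers, Double for floating point

        Point(long timestampMs, Number value) {
            this.timestampMs = timestampMs;
            this.value = value;
        }
    }

    /** Width of the qualifier starting at q[i]: 4 bytes for ms precision, else 2. */
    static int qualifierWidth(byte[] q, int i) {
        return (q[i] & 0xF0) == 0xF0 ? 4 : 2;
    }

    /** Returns {timestampMs, flags} for the qualifier starting at q[i]. */
    static long[] parseQualifier(long baseTimeSeconds, byte[] q, int i) {
        if (qualifierWidth(q, i) == 4) {
            long v = ByteBuffer.wrap(q, i, 4).getInt() & 0xFFFFFFFFL;
            return new long[] {baseTimeSeconds * 1000 + ((v & 0x0FFFFFC0L) >>> 6), v & 0x0F};
        }
        int v = ByteBuffer.wrap(q, i, 2).getShort() & 0xFFFF;
        return new long[] {(baseTimeSeconds + (v >>> 4)) * 1000, v & 0x0F};
    }

    /** Same value rules as a single cell: flags give the type and (length - 1). */
    static Number decodeValue(int flags, byte[] buf, int offset) {
        int length = (flags & 0x7) + 1;
        ByteBuffer bb = ByteBuffer.wrap(buf, offset, length);
        if ((flags & 0x8) != 0) {
            if (length == 4) return (double) bb.getFloat();
            if (length == 8) return bb.getDouble();
        } else {
            switch (length) {
                case 1: return (long) bb.get();
                case 2: return (long) bb.getShort();
                case 4: return (long) bb.getInt();
                case 8: return bb.getLong();
                default: break;
            }
        }
        throw new IllegalArgumentException("Unsupported value length " + length);
    }

    /** Compacted column: concatenated qualifiers, concatenated values plus a trailing meta byte. */
    static List<Point> decodeCompacted(long base, byte[] qualifier, byte[] value) {
        List<Point> points = new ArrayList<>();
        int valuePos = 0;
        for (int qPos = 0; qPos < qualifier.length; qPos += qualifierWidth(qualifier, qPos)) {
            long[] tsAndFlags = parseQualifier(base, qualifier, qPos);
            int flags = (int) tsAndFlags[1];
            points.add(new Point(tsAndFlags[0], decodeValue(flags, value, valuePos)));
            valuePos += (flags & 0x7) + 1;
        }
        return points;  // bytes left in value (normally one meta byte) are ignored
    }

    /** Appended column (assumed layout): [qualifier][value] pairs back to back in the value. */
    static List<Point> decodeAppended(long base, byte[] value) {
        List<Point> points = new ArrayList<>();
        int pos = 0;
        while (pos < value.length) {
            int width = qualifierWidth(value, pos);
            long[] tsAndFlags = parseQualifier(base, value, pos);
            int flags = (int) tsAndFlags[1];
            pos += width;
            points.add(new Point(tsAndFlags[0], decodeValue(flags, value, pos)));
            pos += (flags & 0x7) + 1;
        }
        return points;
    }
}
```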

I could not find many resources on this issue, apart from this one similar post:

However, I cannot even run the code snippets given in that post, because they rely on dependencies I cannot identify.

It seems there is currently no built-in OpenTSDB feature for decoding the raw data: the post above eventually turned into a feature request for OpenTSDB, which has not been implemented so far.

Has anyone faced a similar issue and found a solution?
Or does anyone know exactly how the data is encoded inside OpenTSDB? From the documentation alone, I cannot match it to what is actually stored in my OpenTSDB table.

Thank you so much for your help.

Best Regards,

Erwan


Christophe S

Jul 27, 2016, 12:04:03 PM
to OpenTSDB, David....@3ds.com
Hi David,

You can query OpenTSDB from your own code by adding the OpenTSDB library as a Maven dependency:
<dependency>
  <groupId>net.opentsdb</groupId>
  <artifactId>opentsdb</artifactId>
  <version>2.2.0RC1</version>
</dependency>

Then you can use the TSDB class, which gives you a TSDB client to access the data, and run a TsdbQuery against it.

This should work.

Regards,
Christophe

Erwan Rouzel

Aug 1, 2016, 6:20:11 AM
to OpenTSDB, David....@3ds.com
Hi Christophe,

Thank you so much for your reply.

In fact, I have already tried this approach of using OpenTSDB's internal API, but unfortunately I run into this error message:

java.lang.IllegalAccessError: class com.google.protobuf.ZeroCopyLiteralByteString cannot access its superclass com.google.protobuf.LiteralByteString

This error seems to be a common one, caused by the internal API's use of asynchbase, as reported here:

I have tried the various workarounds listed in that post, but none of them worked in my case.
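On the IllegalAccessError itself: ZeroCopyLiteralByteString subclasses LiteralByteString, which is package-private in protobuf 2.5, and the JVM only allows that when both classes belong to the same runtime package, meaning the same package name and the same class loader. The error therefore usually indicates that the two classes come from different jars or class loaders. A small diagnostic like the one below, a hypothetical helper rather than anything shipped with OpenTSDB, prints where each class was actually loaded from, which can show which jar wins on the classpath at runtime.

```java
import java.net.URL;
import java.security.CodeSource;

/** Hypothetical diagnostic: reports which jar (and class loader) a class was loaded from. */
final class WhereIsClass {
    static String locate(String className) {
        try {
            // Load without initializing, through this class's own loader.
            Class<?> cls = Class.forName(className, false, WhereIsClass.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            URL location = source == null ? null : source.getLocation();
            return className + " -> " + (location == null ? "bootstrap/JDK" : location.toString())
                + " (loader: " + cls.getClassLoader() + ")";
        } catch (ClassNotFoundException e) {
            return className + " -> not on classpath";
        }
    }

    public static void main(String[] args) {
        // Class names taken from the error message above; add any other suspects.
        String[] suspects = {
            "com.google.protobuf.ZeroCopyLiteralByteString",
            "com.google.protobuf.LiteralByteString",
        };
        for (String name : suspects) {
            System.out.println(locate(name));
        }
    }
}
```

Running it inside the same job or container that throws the error matters, since the class loader setup there is what decides the outcome.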

Also, the source code of OpenTSDB's internal API contains a disclaimer that strongly discourages using it:

* | This class is reserved for |
* | OpenTSDB's internal usage! |
* `----------------------------'
* \ / \ //\
* \ |\___/| / \// \\
* /0 0 \__ / // | \ \
* / / \/_/ // | \ \
* @_^_@'/ \/_ // | \ \
* //_^_/ \/_ // | \ \
* ( //) | \/// | \ \
* ( / /) _|_ / ) // | \ _\
* ( // /) '/,_ _ _/ ( ; -. | _ _\.-~ .-~~~^-.
* (( / / )) ,-{ _ `-.|.-~-. .~ `.
* (( // / )) '/\ / ~-. _ .-~ .-~^-. \
* (( /// )) `. { } / \ \
* (( / )) .----~-.\ \-' .~ \ `. \^-.
* ///.----../ \ _ -~ `. ^-` ^-_
* ///-._ _ _ _ _ _ _}^ - - - - ~ ~-- ,.-~
* /.-~
* You've been warned by the dragon!
* </pre><p>
* This class is reserved for OpenTSDB's own internal usage only. If you use
* anything from this package outside of OpenTSDB, a dragon will spontaneously
* appear and eat you. You've been warned.


So this does not seem like the best way, but it also seems to be the only way.

Any idea ?

Best Regards,

Erwan

ManOLamancha

Dec 19, 2016, 8:14:33 PM
to OpenTSDB, David....@3ds.com, erwan....@gmail.com

Are you pulling in an older Guava release with your code? That's likely the issue you're facing here. Guava can be a pain with multiple versions.
I like Benoit's dragon ;) But it is there specifically because the methods in Internal.java may move around or change a little. You can still use them, though, as we've been careful to avoid breaking API changes throughout the 2.x release train. The recommended way, however, is to use the TSDB class and fetch data via a TSQuery object. Someone submitted some example code that I need to clean up and merge.