Hi,
First of all, let me explain what I am trying to achieve: I would like to run queries with Spark over OpenTSDB data.
As it stands this is not possible, because the data that OpenTSDB stores in HBase is raw bytes and cannot easily be queried from Spark.
So the solution I am trying to set up uses Apache Crunch to perform ETL: read the raw data from HBase, decode both the keys and the values from bytes to a human-friendly representation, then write the results to Parquet files.
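As a rough illustration of the key-decoding step (a sketch under assumptions, not the actual Crunch job): the Python below splits a row key into its metric UID, base timestamp and tag UID pairs. It assumes the default 3-byte UID width and no salting unless `salt_width` is given. The function name `split_row_key` is made up for this example, and resolving the UIDs to names through the UID table is left out.

```python
import struct
from typing import List, Tuple

UID_WIDTH = 3  # assumption: default OpenTSDB UID width for metrics, tagk and tagv
TS_WIDTH = 4   # base timestamp: seconds, big-endian, aligned to the hour


def split_row_key(key: bytes, salt_width: int = 0) -> Tuple[bytes, int, List[Tuple[bytes, bytes]]]:
    """Split a raw row key into (metric_uid, base_timestamp, [(tagk_uid, tagv_uid), ...])."""
    body = key[salt_width:]  # drop the optional salt prefix
    pair_width = 2 * UID_WIDTH
    tags_len = len(body) - UID_WIDTH - TS_WIDTH
    if tags_len < 0 or tags_len % pair_width != 0:
        raise ValueError("row key length does not match the assumed layout")

    metric_uid = body[:UID_WIDTH]
    (base_ts,) = struct.unpack(">I", body[UID_WIDTH:UID_WIDTH + TS_WIDTH])

    tags = []
    offset = UID_WIDTH + TS_WIDTH
    while offset < len(body):
        tagk = body[offset:offset + UID_WIDTH]
        tagv = body[offset + UID_WIDTH:offset + pair_width]
        tags.append((tagk, tagv))
        offset += pair_width
    return metric_uid, base_ts, tags
```

For example, `split_row_key(bytes.fromhex("00000150e22700000001000002"))` yields the metric UID `000001`, the base timestamp `0x50E22700` and one tag pair (`000001`, `000002`), all still as UIDs.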
The final Parquet file follows a schema like this:
- metric (string)
- timestamp (integer)
- value (integer)
- tags (map of key/values with tagk/tagv)
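To make the target record concrete, here is a minimal Python sketch of one decoded row with the four fields listed above. The class name `DataPoint` and the sample values are made up for illustration; the real Parquet schema would be defined in the Crunch job.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DataPoint:
    """One decoded data point, mirroring the Parquet schema described above."""
    metric: str                                          # metric name
    timestamp: int                                       # timestamp of the point
    value: int                                           # decoded value
    tags: Dict[str, str] = field(default_factory=dict)   # tagk -> tagv


# Hypothetical sample record, for illustration only.
point = DataPoint("sys.cpu.user", 1356998400, 42, {"host": "web01"})
```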
The tricky part is the conversion from the byte representation to a human-friendly one. I can convert the raw keys without much trouble, but the values are harder: even after following the official documentation, I cannot figure out exactly how these values are encoded.
I understand there are two modes: non-compacted, where the values are spread over several columns, and compacted, where all the values are concatenated into a single column and the qualifiers are likewise concatenated into a single qualifier. However, in my data, which is compacted, I cannot find this representation: I have only one 3-byte qualifier, which does not look at all like a concatenated qualifier.
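To make the mismatch concrete, here is a minimal Python sketch of the data point layout as I understand it from the documentation. It assumes a second-precision qualifier is 2 bytes (12 bits of offset in seconds, then 4 flag bits), a millisecond-precision qualifier is 4 bytes starting with the nibble 0xF (22 bits of offset), and that in the flags the 0x8 bit marks a float while the low 3 bits hold the value length minus one. The function names are made up for this example, and other column types are not handled.

```python
import struct
from typing import List, Tuple, Union


def split_qualifier(qual: bytes) -> List[Tuple[int, bool, int]]:
    """Return (offset_ms, is_float, value_length) for each point in a qualifier."""
    points = []
    i = 0
    while i < len(qual):
        if (qual[i] & 0xF0) == 0xF0:  # millisecond precision: 4 bytes
            if i + 4 > len(qual):
                raise ValueError("truncated millisecond qualifier")
            (q,) = struct.unpack(">I", qual[i:i + 4])
            offset_ms = (q & 0x0FFFFFC0) >> 6
            i += 4
        else:  # second precision: 2 bytes
            if i + 2 > len(qual):
                raise ValueError("qualifier length does not fit the 2/4-byte layout")
            (q,) = struct.unpack(">H", qual[i:i + 2])
            offset_ms = (q >> 4) * 1000
            i += 2
        flags = q & 0x0F
        points.append((offset_ms, bool(flags & 0x8), (flags & 0x7) + 1))
    return points


def decode_cell(base_ts: int, qual: bytes, value: bytes) -> List[Tuple[int, Union[int, float]]]:
    """Decode one cell into (timestamp_ms, value) pairs; works for single or concatenated points."""
    out = []
    pos = 0
    for offset_ms, is_float, length in split_qualifier(qual):
        raw = value[pos:pos + length]
        pos += length
        if is_float:
            if length not in (4, 8):
                raise ValueError("unexpected float length")
            v = struct.unpack(">f" if length == 4 else ">d", raw)[0]
        else:
            v = int.from_bytes(raw, "big", signed=True)
        out.append((base_ts * 1000 + offset_ms, v))
    # Any trailing byte left in a compacted value (the meta byte) is ignored here.
    return out
```

With this layout, a compacted qualifier always splits into 2-byte or 4-byte pieces, so `split_qualifier` raises a `ValueError` on a 3-byte qualifier like the one in my table.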
I could not find many resources on this issue, only this one similar post:
However, I cannot even run the code snippets given in that post, as they rely on dependencies I cannot identify.
It seems there is currently no built-in OpenTSDB feature for decoding the raw data: the post above was eventually turned into a feature request for OpenTSDB, which has not been implemented so far.
Has anyone faced a similar issue and found a solution to it?
Or does anyone have a good knowledge of exactly how the data is encoded inside OpenTSDB? Just from reading the documentation, I cannot match it to what is actually stored in my OpenTSDB table.
Thank you so much for your help.
Best Regards,
Erwan