Hi,
I have been looking at Drill and Parquet. Although I found Drill very interesting, since it solves many use cases Druid is not meant to solve (variable datasources/datatypes etc.), I found it lacking in many areas (Histograms/HyperLogLog etc.) that are important for us when running our analytic queries.
The only question that remains is whether you have considered supporting Apache Parquet as a segment file format.
There are several reasons I ask:
- It seems to offer superb encoding/compression/efficiency
- It supports nested structures (complex JSON structures)
I know it's missing HyperLogLog/Histograms now, but they seem to be on the roadmap (in their Jira issues), and if Druid were to support them, that would open up a lot of interoperability (I think).
I'm certainly not qualified for any in-depth discussion about Druid architecture, but I wanted to ask whether you have considered it as a possibility going forward.
Best regards,
-Stefan