Druid vs. InfluxDB

Austin Kyker

Jul 27, 2015, 1:50:13 PM

to Druid User

So I have been doing a bit of research into time-series databases, and Druid and InfluxDB have caught my attention.

I wanted to post some of my thoughts and was wondering whether others think this is an accurate analysis of the two technologies.

For my use cases, I am currently leaning towards Druid, which seems to emphasize speed even more than ease of use.

What do you all think?

Druid vs. InfluxDB

· Why Druid is better

o Major Emphasis – Speed and Scalability

o InfluxDB does not scale well at this point – the maximum and required cluster size is 3 nodes, whereas Druid is horizontally scalable without bound.

o InfluxDB’s performance degrades significantly when grouping by tags (dimensions in Druid) with cardinality > 100,000

o InfluxDB uses BoltDB as its internal storage engine and therefore does not provide the flexibility that Druid does in selecting a backend (S3, HDFS, or local storage).

o Supports custom aggregation functions written in JavaScript.

o Offers reliable real-time ingestion through public APIs and Kafka. InfluxDB has no built-in Kafka integration (there is a project on GitHub for this, but it has only 2 contributors).

· Why InfluxDB is better

o Major Emphasis – Usability and Simplicity

o Offers a SQL-like query language that is very intuitive - much easier than having to frame queries in JSON.

o Several user interfaces already exist for data exploration and visualization (e.g. Grafana).

o Comes with more built-in aggregation functions – PERCENTILE, STDDEV, etc.

o Is completely schema-less in that you can add columns on the fly, although column data types must stay consistent in order to get the expected query results.

o Has no external dependencies (Druid relies on MySQL for metadata storage and Apache ZooKeeper for coordination).

o Much simpler to expire data – adding a retention policy (1 day, 1 month, etc.) can be done in one line, whereas in Druid you must write a rule to the Coordinator config, which is more difficult.

o Allows joins across series (the equivalent of SQL tables).
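
To make the query-language point in the list above concrete, here is a rough side-by-side sketch of the same hourly sum written as an InfluxQL string and as a Druid native timeseries query. The "requests" measurement/datasource, the "value" field, and the "value_sum" output name are made up for illustration, and the exact syntax may differ between versions of either system.

```python
import json

# InfluxQL (0.9-style): hourly sum over one day for a hypothetical
# "requests" measurement with a numeric "value" field.
influxql = (
    "SELECT SUM(value) FROM requests "
    "WHERE time >= '2015-07-01' AND time < '2015-07-02' "
    "GROUP BY time(1h)"
)

# Roughly equivalent Druid native query, which would be sent as a JSON
# body to a broker. Datasource and metric names are hypothetical.
druid_query = {
    "queryType": "timeseries",
    "dataSource": "requests",
    "granularity": "hour",
    "intervals": ["2015-07-01/2015-07-02"],
    "aggregations": [
        {"type": "doubleSum", "name": "value_sum", "fieldName": "value"}
    ],
}


def render(query):
    """Serialize a Druid query dict to the JSON text that would be posted."""
    return json.dumps(query, indent=2)


print(influxql)
print(render(druid_query))
```

The JSON form is more verbose, but it is also plain data, so it is easy to build and modify from code.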

Gian Merlino

Jul 27, 2015, 3:07:41 PM

to druid...@googlegroups.com

Hey Austin, thanks for the writeup! You're right that Druid focuses on speed and scalability. I'd just add a couple of things:

- Druid does support adding columns on the fly. You can change the ingestion spec (dimensions & metrics) any time you want, and the new spec will take effect for newly ingested data. Queries will work against both older schemas and newer schemas simultaneously, as long as the column types don't conflict. You can also run Druid ingestion in a "schemaless dimensions" mode, where any field that you don't explicitly list as a metric automatically becomes a dimension.

- There are a couple of community-contributed GUIs: a Grafana plugin (https://github.com/Quantiply/grafana-plugins) and a GUI built from the ground up for Druid (https://github.com/mistercrunch/panoramix)

- There are also community-contributed SQL-like query languages for Druid, one for Java and one for JavaScript (the latter can also be used from the command line): http://druid.io/docs/latest/development/libraries.html

- Druid actually can compute percentiles, using its approximate histogram & quantile aggregator. It can require a bit of tuning to get the right accuracy vs. storage tradeoff for you, which is why it's labeled experimental.

- The coordinator config rules *can* be added programmatically in a single call (there's an HTTP API). I'm not sure if that makes life easier or harder for you, though…
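
As a rough illustration of that last point, the sketch below builds (but does not send) a single HTTP request that sets retention rules for one datasource: keep the most recent month, drop everything older. The endpoint path and rule types are as I recall them from the Druid coordinator docs, so please check them against your version. The host, port, datasource name, and replicant count are assumptions for the example.

```python
import json
import urllib.request


def build_rules_request(coordinator_url, datasource, keep_period="P1M", replicants=2):
    """Build a POST request that replaces the retention rules for a datasource.

    Rules are evaluated in order: load segments from the last `keep_period`,
    then drop anything that did not match the first rule.
    """
    rules = [
        {
            "type": "loadByPeriod",
            "period": keep_period,  # ISO-8601 period, e.g. P1D or P1M
            "tieredReplicants": {"_default_tier": replicants},
        },
        {"type": "dropForever"},
    ]
    url = f"{coordinator_url}/druid/coordinator/v1/rules/{datasource}"
    return urllib.request.Request(
        url,
        data=json.dumps(rules).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical coordinator address and datasource name.
req = build_rules_request("http://localhost:8081", "requests")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To actually apply the rules: urllib.request.urlopen(req)
```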

charles.allen

Jul 27, 2015, 4:06:13 PM

to Druid User, gianm...@gmail.com

I would also note that Druid does not require MySQL specifically; it works with PostgreSQL as well. But it is true that it requires an external metadata store.
