Druid Historical Nodes and Deep Storage


Austin Kyker

Jul 30, 2015, 3:37:49 PM
to Druid User
So, I presented Druid to my team and they have a lot of questions, some of which I couldn't find answers to.

Let's imagine that we have a production cluster: Several historical nodes, a coordinator node, a real-time node, and a broker node.

I understand the basic flow: the real-time node ingests data, creates segments, pushes them to deep storage (HDFS), and announces to the other nodes that the segments are available.

My first question is: does the coordinator ensure that all segments in HDFS are loaded onto the historical nodes?

If so, "What happens when you have more data in deep storage than you have space on Historical Nodes"? For instance,
let's say we have 1 TB of data in HDFS and we have 5 historical nodes (each can hold 100 GB)? How does druid handle this:
 Does it load as much as it can into Historical nodes?

My other question is about querying. I know that the broker receives the query, but how does it know which historical nodes to send the query to? Does it get this information from the coordinator, or does it simply send the query blindly to all historical nodes?

Does a historical node ever load data from HDFS to service a query, or does it only load data when the coordinator tells it to (i.e. when a new segment is published to HDFS)?


Lastly, looking at Druid from a high level, I still don't understand exactly why deep storage is needed. If the historical nodes can be configured to replicate data (a capability that already exists), why can't they just be used to hold all of the data?




Gian Merlino

Jul 30, 2015, 5:06:18 PM
to druid...@googlegroups.com
Hey Austin, hope this helps:

1) Yes, the coordinator does ensure that segments from HDFS are loaded on historicals. It decides which historical should load which segment and sends them commands to make that happen.
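
For illustration, here is a rough sketch of the kind of load rule the coordinator evaluates when deciding this (the tier name and replica count below are just example values; rules are configured per datasource through the coordinator):

[
  {
    "type": "loadForever",
    "tieredReplicants": {"_default_tier": 2}
  }
]

A rule like this tells the coordinator to keep two replicas of every segment loaded across the _default_tier historicals.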

2) If you have more data in deep storage than you can fit on your historical disks, then Druid will load as much as it can and then start logging warnings on the coordinator that there is not enough capacity to load everything.
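
If keeping everything loaded isn't feasible, one common approach is a period-based retention rule, so only recent data stays on the historicals while everything remains safe in deep storage. A hedged sketch (the three-month period and replica count are made-up example values):

[
  {"type": "loadByPeriod", "period": "P3M", "tieredReplicants": {"_default_tier": 2}},
  {"type": "dropForever"}
]

Rules are evaluated top to bottom, so segments covering the last three months get loaded and anything older is dropped from the historicals (but not from deep storage).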

3) Historical nodes advertise which segments they're serving in ZooKeeper, and the broker maintains a local cache of the segment -> historical mapping.
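
In practice that means clients only ever talk to the broker. For example, a minimal native query like the sketch below (the datasource, interval, and metric names are placeholders) gets POSTed to the broker's /druid/v2 endpoint, and the broker fans it out to whichever historicals its cached segment map says are serving the matching segments:

{
  "queryType": "timeseries",
  "dataSource": "my_datasource",
  "granularity": "day",
  "intervals": ["2015-07-01/2015-07-30"],
  "aggregations": [{"type": "longSum", "name": "edits", "fieldName": "edits"}]
}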

4) Historicals only pull data from deep storage when instructed to do so by the coordinator. They won't pull it dynamically in response to a query.

5) Good question :). The simple answer is that Druid was designed to run in an environment where a persistent blob storage service exists, like HDFS, S3, or the analogous Google and Azure offerings. In that world it's natural to leverage the blob storage service rather than trying to invent one's own distributed file system. Using an external blob store also makes Druid much easier to run elastically: you can safely spin up or terminate historical nodes at any time without worrying about permanent data loss.


Fangjin Yang

Jul 30, 2015, 9:20:49 PM
to Druid User, gianm...@gmail.com
To add some history on why deep storage was first created: the first Druid deployments ran in AWS, where failures could occur that wiped out half your cluster. Without data backed up elsewhere, you would lose data any time there was an AWS outage.

Amey Jain

Feb 22, 2016, 4:56:33 PM
to Druid User

One follow-up question on this: when we say it can scale to petabytes of data, do we need that much space on the historical nodes? Does this mean that if I have 1 petabyte of data to analyze, I will need at least 1 petabyte (without replication) on HDFS (deep storage) and 1 petabyte of total disk space on the historical nodes?




Fangjin Yang

Feb 29, 2016, 12:17:44 PM
to Druid User
Amey, reading http://static.druid.io/docs/druid.pdf might help

Segments are loaded on the historicals, and raw data/segments are backed up to HDFS.

sascha...@smaato.com

Mar 4, 2016, 2:05:28 AM
to Druid User
Interesting thread. We've been having exactly the same questions.

Even with what has been explained here, there are two things I still don't get:

1.) When I was trying to find out whether segments can be loaded from deep storage at query time (which, according to the information in this thread, is not possible), I found one statement in the Druid documentation that I interpreted as proof that such on-demand loading of segments is possible. The documentation describes the property druid.segmentCache.locations as follows:
"Segments assigned to a Historical node are first stored on the local file system (in a disk cache) and then served by the Historical node. These locations define where that local cache resides."
Default value: none (no caching)

So, if it is possible (and even the default) to disable caching of segments on local disk, AND segments cannot be loaded from deep storage on demand at query time, then how would a historical be able to serve queries when local disk caching is disabled?
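
For reference, this is the kind of configuration I mean (the path and size below are just placeholder values):

druid.segmentCache.locations=[{"path": "/mnt/druid/segment-cache", "maxSize": 100000000000}]
druid.server.maxSize=100000000000

If no such location is configured at all, where would the historical read segment data from at query time?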

---

The second thing I don't get:
With what has been explained here, if there is 1 petabyte of Druid segments in deep storage (S3 or HDFS) that should be queryable, then one also has to have 1 petabyte of local disk storage on the historicals, right? That is what I would conclude from the statement that segments cannot be loaded from deep storage on demand at query time. The argument for deep storage with other products we operate is usually the lower price point: it's cheaper to buy 1 PB of S3 storage than 1 PB of local disk. The price of 1 PB of local disk is probably too high to make petabyte-scale querying feasible. It might be technically possible, but price points always matter, and who could afford 1 PB of local storage for a single-tenant solution, given that expected cluster utilization based on the usual demand for analytical querying is rather low? So my original understanding was that the local disks are just a first-level cache and that the full data resides in S3 for cost reasons, in order to make Druid feasible for a broader range of business use cases.

When using the Metamarkets dashboard, there is increased latency (and even a visual indication of it) when selecting time ranges that go farther back than the last 6 weeks or so. I always assumed that was due to segments being loaded from deep storage to local disk. If all queryable data needs to be on local disk anyway, I don't understand why there are differences in query latency depending on which time range gets queried. I understand that older data can be rolled up to days, then weeks, but that would only make querying this data faster, not slower, right?

thanks

Federico Nieves

Sep 6, 2016, 5:41:23 PM
to Druid User
Hi,

I have the same question, more specifically:

One follow-up question on this: when we say it can scale to petabytes of data, do we need that much space on the historical nodes? Does this mean that if I have 1 petabyte of data to analyze, I will need at least 1 petabyte (without replication) on HDFS (deep storage) and 1 petabyte of total disk space on the historical nodes?

When using Tranquility for real-time indexing we ran into an issue: index tasks were stuck in the "Running" state forever. After some debugging, it turned out that the coordinator logs were showing: Not enough [_default_tier] servers or node capacity to assign segment.

On the historicals we only allotted 30 GB for the segment cache and maxSize:

druid.server.maxSize=30000000000
druid.segmentCache.locations=[{"path": "/var/druid/cache/historical", "maxSize": 30000000000}]

But HDFS is much larger than this (around 1 TB), and it seems the index tasks are not completing handoff because the historicals don't have capacity. So all data in HDFS should also fit on the historicals, right?

The equation could be something like:

HistoricalNodes * HistoricalCapacity * DruidReplication = HDFSCapacity * HDFSReplication
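
For example, with our numbers (and assuming a Druid replication factor of 2, which I believe is the default rule for _default_tier):

  required historical capacity ≈ 1 TB of segments × 2 replicas ≈ 2 TB across the tier
  historicals needed at 30 GB each ≈ 2 TB / 30 GB ≈ 67 nodes

which would explain the "not enough capacity" message we see.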

Thanks!

Manish Malhotra

Sep 14, 2018, 7:11:17 PM
to Druid User

We also had a similar question when thinking about capacity planning :)
Druid is a fantastic product, so nothing against it, but this could be an inherent design decision: the local disk across the historical nodes has to equal the HDFS/S3 space (including replication).

But I wanted to understand the implications of:
(historical nodes' disk space < deep storage space requirement)

-Manish

sabyasachi pain

Sep 14, 2018, 10:06:26 PM
to druid...@googlegroups.com
Hi,

My 2 cents here.

Most of these analytical tools (say, ELK) are not recommended for use as a single source of truth; i.e., you are free to use them for your analytics/OLAP needs, but not to trust them entirely for the safe long-term preservation of data the way you would trust a full-fledged database.
Hence the philosophy of a dedicated, fully redundant backup.

I guess Druid follows the same philosophy.

Thanks 


no jihun

Nov 18, 2018, 10:49:21 AM
to Druid User
I had the same question, so I read all the comments and links.

Finally, my understanding is:
Deep storage is not for data scalability. It is just for data backup and recovery.

All segments need to be loaded on the historicals before they can be queried.

So, if you have 1 TB of segments to be queried, your historicals need more than 1 TB of disk in total.
