A little confusion about "druid.segmentCache.locations" property


Zha Rui

Aug 29, 2016, 11:39:32 PM
to Druid User
Hi experts:

  I have a little confusion about the "druid.segmentCache.locations" property. I have 3 historical nodes and the property is set as below:

  druid.segmentCache.locations=[{"path":"var/druid/segment-cache","maxSize"\:50000000000}]

  That means a 50GB local disk segment cache on each historical node, 150GB of cache in total.

  After I ran several batch ingestions, the 150GB of local disk cache ran out, and the coordinator UI shows "100% to load until available" for the new dataSource I just ingested.

  In the Historical Node Configuration documentation, I found the description of "druid.segmentCache.locations"; its default value is listed as NONE, which I took to mean no caching. So I unset the property, but the historical nodes failed to start and I got the error below:

   2016-08-30T03:35:12,099 ERROR [main] io.druid.cli.CliHistorical - Error when starting up.  Failing.

com.google.inject.ProvisionException: Unable to provision, see the following errors:


1) druid.segmentCache.locations - may not be empty

  at io.druid.guice.JsonConfigProvider.bind(JsonConfigProvider.java:131) (via modules: com.google.inject.util.Modules$OverrideModule -> com.google.inject.util.Modules$OverrideModule -> io.druid.guice.StorageNodeModule)

  at io.druid.guice.JsonConfigProvider.bind(JsonConfigProvider.java:131) (via modules: com.google.inject.util.Modules$OverrideModule -> com.google.inject.util.Modules$OverrideModule -> io.druid.guice.StorageNodeModule)

  while locating com.google.common.base.Supplier<io.druid.segment.loading.SegmentLoaderConfig>

  at io.druid.guice.JsonConfigProvider.bind(JsonConfigProvider.java:132) (via modules: com.google.inject.util.Modules$OverrideModule -> com.google.inject.util.Modules$OverrideModule -> io.druid.guice.StorageNodeModule)

  while locating io.druid.segment.loading.SegmentLoaderConfig

    for the 2nd parameter of io.druid.segment.loading.SegmentLoaderLocalCacheManager.<init>(SegmentLoaderLocalCacheManager.java:59)

  while locating io.druid.segment.loading.SegmentLoaderLocalCacheManager

  at io.druid.guice.LocalDataStorageDruidModule.configure(LocalDataStorageDruidModule.java:53) (via modules: com.google.inject.util.Modules$OverrideModule -> com.google.inject.util.Modules$OverrideModule -> io.druid.guice.LocalDataStorageDruidModule)

  while locating io.druid.segment.loading.SegmentLoader

    for the 1st parameter of io.druid.server.coordination.ServerManager.<init>(ServerManager.java:106)

  at io.druid.cli.CliHistorical$1.configure(CliHistorical.java:78) (via modules: com.google.inject.util.Modules$OverrideModule -> com.google.inject.util.Modules$OverrideModule -> io.druid.cli.CliHistorical$1)

  while locating io.druid.server.coordination.ServerManager

  at io.druid.cli.CliHistorical$1.configure(CliHistorical.java:80) (via modules: com.google.inject.util.Modules$OverrideModule -> com.google.inject.util.Modules$OverrideModule -> io.druid.cli.CliHistorical$1)

  while locating io.druid.query.QuerySegmentWalker

    for the 5th parameter of io.druid.server.QueryResource.<init>(QueryResource.java:110)

  while locating io.druid.server.QueryResource


 So I would like to know two things about this property:


 (1) I use HDFS as the deep storage, so why do historicals still need a local disk cache?

 (2) I want to disable caching. What does the error mean, and how do I fix it?

 

Gian Merlino

Aug 29, 2016, 11:45:37 PM
to druid...@googlegroups.com
Druid needs to download and cache data locally before it can be announced for possible serving. It's not done on demand. So, you need enough segment cache allocated to store 100% of your data.
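
To make the sizing point concrete, here is a minimal sketch of the arithmetic, assuming three historical nodes that each use the `druid.segmentCache.locations` value from the original post. The `segment_bytes` figure and the helper name `total_cache_bytes` are hypothetical, and the sketch ignores any replication of segments across nodes.

```python
import json

def total_cache_bytes(locations_per_node):
    """Sum maxSize over every cache location on every historical node."""
    return sum(
        loc["maxSize"]
        for locations in locations_per_node
        for loc in locations
    )

# Property value from the original post (already unescaped), one copy per node.
node_locations = json.loads(
    '[{"path":"var/druid/segment-cache","maxSize":50000000000}]'
)
cluster = [node_locations] * 3

capacity = total_cache_bytes(cluster)

# Hypothetical figure: total bytes of segments the cluster is asked to load.
segment_bytes = 180_000_000_000

# Anything above capacity cannot be downloaded, so it is never announced.
shortfall = max(0, segment_bytes - capacity)
print(capacity)   # 150000000000
print(shortfall)  # 30000000000
```

A non-zero shortfall corresponds to the situation in the original post, where a newly ingested dataSource stays at "100% to load until available".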

Gian

--
You received this message because you are subscribed to the Google Groups "Druid User" group.
To unsubscribe from this group and stop receiving emails from it, send an email to druid-user+unsubscribe@googlegroups.com.
To post to this group, send email to druid...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/druid-user/1ac6985d-1e73-4b4e-8fc4-0b6026db7032%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Zha Rui

Aug 29, 2016, 11:53:03 PM
to Druid User
Gian:

 Thank you for the quick reply. So even though I have persisted the segments on HDFS, Druid still needs another copy of them in the local disk cache? And what does "no caching" mean as the default value of "druid.segmentCache.locations"?

Zha Rui 


Gian Merlino

Aug 30, 2016, 1:11:53 AM
to druid...@googlegroups.com
Yeah, you still need another copy on the local disks. Druid always serves queries off its local memory or disk, and views deep storage as more of a "backup". druid.segmentCache.locations is mandatory since the default value of "none" is not a workable config.
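
Since the property is mandatory, a pre-flight check on the config value can catch the problem before a historical is started. The following is only a sketch of such a check, not Druid's own validation: the function name is hypothetical, the error text is copied from the stack trace earlier in this thread, and the rule that `maxSize` must be positive is an assumption.

```python
import json

def parse_segment_cache_locations(raw):
    """Parse the property value and reject the unset or empty cases."""
    if raw is None or not raw.strip():
        raise ValueError("druid.segmentCache.locations - may not be empty")
    locations = json.loads(raw)
    if not isinstance(locations, list) or not locations:
        raise ValueError("druid.segmentCache.locations - may not be empty")
    for loc in locations:
        # Assumption: every entry needs a path and a positive maxSize.
        if "path" not in loc or int(loc.get("maxSize", 0)) <= 0:
            raise ValueError(f"invalid location entry: {loc!r}")
    return locations

locations = parse_segment_cache_locations(
    '[{"path":"var/druid/segment-cache","maxSize":50000000000}]'
)
print(locations[0]["path"])  # var/druid/segment-cache
```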

Gian


Zha Rui

Aug 30, 2016, 1:26:22 AM
to Druid User
I've got it. Thank you so much, Gian!


Mo

Oct 25, 2016, 9:58:22 AM
to Druid User
@Gian
If I don't use HDFS but the LOCAL config (single historical node), where all segments are on local disk, can I disable the segment cache?

Thanks,
Mo

Slim Bouguerra

Oct 25, 2016, 1:54:30 PM
to druid...@googlegroups.com
You cannot run Druid without deep storage unless all the Druid nodes are on the same machine.
So you need deep storage (e.g. S3 or HDFS) and enough disk to cache the segments locally.

-- 

B-Slim
_______/\/\/\_______/\/\/\_______/\/\/\_______/\/\/\_______/\/\/\_______

Varsha Neelesh

Dec 27, 2016, 10:12:56 PM
to Druid User
Hi Gian, can the segment cache location be a mounted NFS share? Or does it have to be a local disk directory or local memory only?

Regards,
Varsha 

Nishant Bangarwa

Dec 28, 2016, 12:41:01 PM
to Druid User
Hi Varsha,

While it is theoretically possible to point the segment cache location at a mounted NFS share, it is not advisable: Druid memory-maps the files in the segment cache in order to serve queries, so using NFS there can adversely affect query performance.
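
As a rough illustration of why the backing filesystem matters, here is a small sketch of memory-mapped reads using Python's standard `mmap` module. It is not Druid code: the temporary file is only a stand-in for a segment file, and real Druid segments have their own format.

```python
import mmap
import os
import tempfile

# Stand-in for a segment file in the segment cache directory.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"0123456789" * 1000)
    path = f.name

with open(path, "rb") as f:
    # Map the whole file read-only; no bytes are copied up front.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
        # Slicing touches pages on demand; each page fault is served by
        # whatever filesystem backs `path`.
        chunk = mapped[5000:5010]

os.remove(path)
print(chunk)  # b'0123456789'
```

On a local disk those page faults are served from the OS page cache or the disk itself; if `path` were on NFS, each miss would mean a network round trip, which is the performance risk described above.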



Varsha Neelesh

Dec 29, 2016, 5:30:37 PM
to Druid User
Thanks Nishant.