2014-10-17 16:46:43,191 ERROR [ZkCoordinator-0] io.druid.server.coordination.ZkCoordinator - Failed to load segment for dataSource: {class=io.druid.server.coordination.ZkCoordinator, exceptionType=class io.druid.segment.loading.SegmentLoadingException, exceptionMessage=Exception loading segment[click_conversion_2014-08-31T00:00:00.000Z_2014-09-01T00:00:00.000Z_2014-10-16T16:59:39.454Z], segment=DataSegment{size=68530163, shardSpec=NoneShardSpec, metrics=[count, commissions, sales, orders], dimensions=[browser, channel, city, country, coupon, device, outclick_type, property, region, site], version='2014-10-16T16:59:39.454Z', loadSpec={type=s3_zip, bucket=s3-int-std-agg-deep-storage, key=click_conversion/2014-08-31T00:00:00.000Z_2014-09-01T00:00:00.000Z/2014-10-16T16:59:39.454Z/0/index.zip}, interval=2014-08-31T00:00:00.000Z/2014-09-01T00:00:00.000Z, dataSource='click_conversion', binaryVersion='9'}}
io.druid.segment.loading.SegmentLoadingException: Exception loading segment[click_conversion_2014-08-31T00:00:00.000Z_2014-09-01T00:00:00.000Z_2014-10-16T16:59:39.454Z]
    at io.druid.server.coordination.ZkCoordinator.addSegment(ZkCoordinator.java:129)
    at io.druid.server.coordination.SegmentChangeRequestLoad.go(SegmentChangeRequestLoad.java:44)
    at io.druid.server.coordination.BaseZkCoordinator$1.childEvent(BaseZkCoordinator.java:113)
    at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:494)
    at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:488)
    at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:92)
    at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293)
    at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:83)
    at org.apache.curator.framework.recipes.cache.PathChildrenCache.callListeners(PathChildrenCache.java:485)
    at org.apache.curator.framework.recipes.cache.EventOperation.invoke(EventOperation.java:35)
    at org.apache.curator.framework.recipes.cache.PathChildrenCache$11.run(PathChildrenCache.java:755)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: io.druid.segment.loading.SegmentLoadingException: Problem decompressing object[S3Object [key=click_conversion/2014-08-31T00:00:00.000Z_2014-09-01T00:00:00.000Z/2014-10-16T16:59:39.454Z/0/index.zip, bucket=s3-int-std-agg-deep-storage, lastModified=Thu Oct 16 19:07:53 UTC 2014, dataInputStream=org.jets3t.service.impl.rest.httpclient.HttpMethodReleaseInputStream@42ff28c0, Metadata={ETag="369cff46932e170a9cce1aa6e8df3019", Date=Fri Oct 17 16:46:38 UTC 2014, Content-Length=34937426, id-2=K53EBUsDk+HZxt+c8OazPSM43ZJ9i/1elkoty7V+m+Doc3kOQYXfC757/Gg1C2HKIItcYfnUM+Y=, request-id=1FDA28445F60E5F9, Last-Modified=Thu Oct 16 19:07:53 UTC 2014, md5-hash=369cff46932e170a9cce1aa6e8df3019, Content-Type=application/zip}]]
    at io.druid.storage.s3.S3DataSegmentPuller.getSegmentFiles(S3DataSegmentPuller.java:138)
    at io.druid.segment.loading.OmniSegmentLoader.getSegmentFiles(OmniSegmentLoader.java:125)
    at io.druid.segment.loading.OmniSegmentLoader.getSegment(OmniSegmentLoader.java:93)
    at io.druid.server.coordination.ServerManager.loadSegment(ServerManager.java:146)
    at io.druid.server.coordination.ZkCoordinator.addSegment(ZkCoordinator.java:125)
    ... 17 more
Caused by: java.io.IOException: Problem decompressing object[S3Object [key=click_conversion/2014-08-31T00:00:00.000Z_2014-09-01T00:00:00.000Z/2014-10-16T16:59:39.454Z/0/index.zip, bucket=s3-int-std-agg-deep-storage, lastModified=Thu Oct 16 19:07:53 UTC 2014, dataInputStream=org.jets3t.service.impl.rest.httpclient.HttpMethodReleaseInputStream@42ff28c0, Metadata={ETag="369cff46932e170a9cce1aa6e8df3019", Date=Fri Oct 17 16:46:38 UTC 2014, Content-Length=34937426, id-2=K53EBUsDk+HZxt+c8OazPSM43ZJ9i/1elkoty7V+m+Doc3kOQYXfC757/Gg1C2HKIItcYfnUM+Y=, request-id=1FDA28445F60E5F9, Last-Modified=Thu Oct 16 19:07:53 UTC 2014, md5-hash=369cff46932e170a9cce1aa6e8df3019, Content-Type=application/zip}]]
    at io.druid.storage.s3.S3DataSegmentPuller$1.call(S3DataSegmentPuller.java:114)
    at io.druid.storage.s3.S3DataSegmentPuller$1.call(S3DataSegmentPuller.java:88)
    at com.metamx.common.RetryUtils.retry(RetryUtils.java:22)
    at io.druid.storage.s3.S3Utils.retryS3Operation(S3Utils.java:79)
    at io.druid.storage.s3.S3DataSegmentPuller.getSegmentFiles(S3DataSegmentPuller.java:86)
    ... 21 more
Caused by: java.io.IOException: No space left on device
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:345)
    at com.google.common.io.ByteStreams.copy(ByteStreams.java:211)
    at io.druid.utils.CompressionUtils.unzip(CompressionUtils.java:104)
    at io.druid.storage.s3.S3DataSegmentPuller$1.call(S3DataSegmentPuller.java:103)
    ... 25 more
2014-10-17 16:46:43,192 INFO [ZkCoordinator-0] io.druid.server.coordination.ZkCoordinator - /druid/loadQueue/10.101.187.175:8080/click_conversion_2014-08-31T00:00:00.000Z_2014-09-01T00:00:00.000Z_2014-10-16T16:59:39.454Z was removed
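The last "Caused by" above is the real problem: the historical node ran out of local disk while unzipping the segment into its cache. A quick way to confirm, assuming the segment cache lives at /tmp/druid/indexCache as described later in this thread:

# Free space on the volume backing the segment cache
df -h /tmp/druid/indexCache
# Current on-disk footprint of the cache itself
du -sh /tmp/druid/indexCache

Per the DataSegment in the log, this one segment is 68530163 bytes (~65 MB) uncompressed, so at least that much free space is needed for the unzip to succeed.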
It does look like the indexing went through properly, so possibly something is wrong with the coordinator/historical data loading dance. Are there any logs on the coordinator that look like it's trying to assign these segments to historical nodes (and succeeding)? Are there any logs on your historical nodes indicating they're trying to download these segments (and succeeding)?
I thought Druid stored data in memory, not on the filesystem. I'm using machines with 30 GB of main memory but only 6 GB of disk space on /. Since we're talking about the segment cache here (keyword being "cache"), I figured old data would be flushed from disk as new data flowed in, and that the segment cache didn't need to be that big.
Are you saying Druid stores the entire dataset both in memory and on disk at the same time, and that my segment cache needs to be at least as big as (or bigger than?) the machine's main memory? Confused :(
Every historical node must first download a segment locally before it can run queries over it. Druid memory-maps segments by default. This error says you've told Druid to download segments to /tmp/indexCache, but the maxSize is too small to accommodate the segment.
To set the maximum size your node can serve, please set these two configs (example sizes provided):

druid.server.maxSize=1550000000000
druid.segmentCache.locations=[{"path": "/mnt/persistent/zk_druid", "maxSize": 1550000000000}]
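Before picking a maxSize, it's worth checking how much space the proposed location actually has; a minimal sketch, assuming a GNU/Linux host and the example path above:

# Bytes currently available on the volume backing the proposed cache path;
# the location's maxSize should stay comfortably below this number
CACHE_PATH="/mnt/persistent/zk_druid"
df --output=avail -B1 "$CACHE_PATH" | tail -n 1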
To understand how to configure historical nodes for your hardware, I'd recommend reading the historical node configuration documentation.
We rely on the OS to page columns and segments in and out of memory. Data that is queried often is held in memory and data that is not queried is paged out as more queries come in.
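This paging is visible at the OS level; for instance, on a Linux historical host (vmtouch is a third-party utility and may not be installed):

# "cached" in this output is the OS page cache holding recently queried segment data
free -m
# If vmtouch is available, it reports how much of each segment file under the
# cache path is currently resident in memory
vmtouch /mnt/persistent/zk_druid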
If you share your hardware specs, we can make suggestions about how to configure things for your particular setup. We're also happy to do a follow-up call to discuss Druid configuration in more detail.
On Oct 19, 2014, at 3:55 PM, Fangjin Yang <fangj...@gmail.com> wrote:

> Every historical node must first download a segment locally before it can run queries over it. Druid memory-maps segments by default. This error says you've told Druid to download segments to /tmp/indexCache, but the maxSize is too small to accommodate the segment.
>
> To set the maximum size your node can serve, please set these two configs (example sizes provided):
>
> druid.server.maxSize=1550000000000
> druid.segmentCache.locations=[{"path": "/mnt/persistent/zk_druid", "maxSize": 1550000000000}]

I generate my Druid node configuration in a bash shell script that calculates maxSize (among other things) based on inspecting the machine's hardware. For the druid.segmentCache.locations size I use the following:

# Index cache is 50% of available space at the index cache path
INDEX_CACHE_PATH="/tmp/druid/indexCache"
mkdir -p "$INDEX_CACHE_PATH"
INDEX_CACHE_SIZE="$(echo "(($(df $INDEX_CACHE_PATH -k --output=size | tail -n 1 | awk '{print $1}') * 1024 * .50) + 0.5) / 1" | bc)"

For my machine, that comes out to 4160450560 bytes, or approximately 4.2 GB. Here's the output from df for reference:

$ df -h /tmp/druid/indexCache
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda1      7.8G  6.3G  1.2G  85% /

As you can see, I've configured Druid to use an appropriate amount of space, avoiding filling the drive all the way and hitting "no space left on device" errors.

As for druid.server.maxSize, I use the following:

# Max size is 90% of memory
SERVER_MAX_SIZE="$(echo "(($(grep MemTotal /proc/meminfo | awk '{print $2}') * 1024 * .90) + 0.5) / 1" | bc)"

That comes out to 28407667507 bytes, or approximately 28 GB. Inspecting the /proc filesystem for reference:

$ grep MemTotal /proc/meminfo
MemTotal:       30824292 kB

This shows my machine has approximately 30 GB of main memory.

So, to summarize, this is my current maxSize configuration for my historical nodes:

# 90% of main memory, 28 GB (out of 30 GB total)
druid.server.maxSize=28407667507
# 50% of disk space at the given path, 4.2 GB (out of 7.8 GB total)
druid.segmentCache.locations=[{"path": "/tmp/druid/indexCache", "maxSize": 4160450560}]

If I set my segment cache's maxSize any higher, I get "no space left on device" errors when the /tmp filesystem fills up completely. I'm nowhere near running out of main memory, based on the output of top:

top - 14:19:52 up 2 days, 20:39,  1 user,  load average: 0.00, 0.01, 0.05
Tasks: 123 total,   1 running, 122 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:  30824292 total,  7657208 used, 23167084 free,   155812 buffers
KiB Swap:        0 total,        0 used,        0 free.  5988252 cached Mem

So I should be able to accommodate another 23 GB of data in main memory on my historical node.
> To understand how to configure historical nodes for your hardware, I'd recommend reading the historical node configuration documentation.

Quoting from that documentation, I'm trying to understand exactly what this means: "Historical nodes use off-heap memory to store intermediate results, and by default, all segments are memory mapped before they can be queried."

Does this mean I need as much disk space for my segment cache as I have for druid.server.maxSize? That seems to be what you're implying here, unless I'm mistaken.
> We rely on the OS to page columns and segments in and out of memory. Data that is queried often is held in memory and data that is not queried is paged out as more queries come in.

That doesn't seem to be happening in this case. I have plenty of main memory available that's going unused, because disk space is the real limiting factor. If a node always had at least as much disk space as main memory, the OS would never need to page data out of memory, so I can't figure out how the scenario you describe would ever actually happen.

> If you share your hardware specs, we can make suggestions about how to configure things for your particular setup. We're also happy to do a follow-up call to discuss Druid configuration in more detail.

Hopefully the details above are enough to resolve this issue. In the meantime I'll talk to my boss about setting up a call; we might have a few more people who'd want to sit in :)
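One way to reconcile the two settings, reusing the sizing script above: every segment a historical node serves is memory-mapped from its on-disk segment cache, so the cache (not RAM) bounds how much the node can hold, and druid.server.maxSize can be capped at the cache size instead of being derived from memory. A minimal sketch along those lines (same path and computation as the script above; the cap is a suggestion, not official Druid guidance):

# Cache budget: 50% of the filesystem size at the cache path, as computed above
INDEX_CACHE_PATH="/tmp/druid/indexCache"
mkdir -p "$INDEX_CACHE_PATH"
INDEX_CACHE_SIZE="$(echo "(($(df $INDEX_CACHE_PATH -k --output=size | tail -n 1 | awk '{print $1}') * 1024 * .50) + 0.5) / 1" | bc)"

# Segments are served from disk via mmap, so never advertise more serving
# capacity than the cache can actually store
SERVER_MAX_SIZE="$INDEX_CACHE_SIZE"

echo "druid.server.maxSize=$SERVER_MAX_SIZE"
echo "druid.segmentCache.locations=[{\"path\": \"$INDEX_CACHE_PATH\", \"maxSize\": $INDEX_CACHE_SIZE}]"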