Historical nodes failed to download segments from S3


jaehc

Aug 19, 2016, 9:29:48 AM
to Druid User
hello,

Generating segments with EMR worked fine, and I can see the newly created segments in the S3 bucket.
However, when a historical node tries to download a segment from S3, it fails.

I'm not very experienced with AWS, so I'm having a hard time finding the cause.

druid : 0.9.1.1
os : Amazon Linux AMI release 2016.03

_common/common.runtime.properties
druid.extensions.loadList=["druid-s3-extensions"]
....
druid.storage.type=s3
druid.storage.bucket=druid-bench
druid.storage.baseKey=druid/segments
druid.s3.accessKey=...
druid.s3.secretKey=...


Historical node log:
io.druid.segment.loading.SegmentLoadingException: Exception loading segment[wikiticker_2015-09-12T00:00:00.000Z_2015-09-13T00:00:00.000Z_2016-08-19T09:49:18.904Z]
at io.druid.server.coordination.ZkCoordinator.loadSegment(ZkCoordinator.java:309) ~[druid-server-0.9.1.1.jar:0.9.1.1]
at io.druid.server.coordination.ZkCoordinator.addSegment(ZkCoordinator.java:350) [druid-server-0.9.1.1.jar:0.9.1.1]
at io.druid.server.coordination.SegmentChangeRequestLoad.go(SegmentChangeRequestLoad.java:44) [druid-server-0.9.1.1.jar:0.9.1.1]
at io.druid.server.coordination.ZkCoordinator$1.childEvent(ZkCoordinator.java:152) [druid-server-0.9.1.1.jar:0.9.1.1]
at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:522) [curator-recipes-2.10.0.jar:?]
at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:516) [curator-recipes-2.10.0.jar:?]
at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:93) [curator-framework-2.10.0.jar:?]
at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) [guava-16.0.1.jar:?]
at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:85) [curator-framework-2.10.0.jar:?]
at org.apache.curator.framework.recipes.cache.PathChildrenCache.callListeners(PathChildrenCache.java:514) [curator-recipes-2.10.0.jar:?]
at org.apache.curator.framework.recipes.cache.EventOperation.invoke(EventOperation.java:35) [curator-recipes-2.10.0.jar:?]
at org.apache.curator.framework.recipes.cache.PathChildrenCache$9.run(PathChildrenCache.java:772) [curator-recipes-2.10.0.jar:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [?:1.7.0_111]
at java.util.concurrent.FutureTask.run(FutureTask.java:262) [?:1.7.0_111]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [?:1.7.0_111]
at java.util.concurrent.FutureTask.run(FutureTask.java:262) [?:1.7.0_111]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [?:1.7.0_111]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.7.0_111]
at java.lang.Thread.run(Thread.java:745) [?:1.7.0_111]
Caused by: java.lang.RuntimeException: org.jets3t.service.ServiceException: Request Error. -- ResponseCode: 400, ResponseStatus: Bad Request, RequestId: FD20B18427AE04C8, HostId: yE5BEAYtDmc8Vr2Sg0QN6PHDGImbxZYmAnbKppdBvYJXELBy5xBXtUssUuFo0fs+pBD/WYnK9vU=
at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:?]
at io.druid.storage.s3.S3DataSegmentPuller.isObjectInBucket(S3DataSegmentPuller.java:341) ~[?:?]
at io.druid.storage.s3.S3DataSegmentPuller.getSegmentFiles(S3DataSegmentPuller.java:174) ~[?:?]
at io.druid.storage.s3.S3LoadSpec.loadSegment(S3LoadSpec.java:62) ~[?:?]
at io.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegmentFiles(SegmentLoaderLocalCacheManager.java:143) ~[druid-server-0.9.1.1.jar:0.9.1.1]
at io.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegment(SegmentLoaderLocalCacheManager.java:95) ~[druid-server-0.9.1.1.jar:0.9.1.1]
at io.druid.server.coordination.ServerManager.loadSegment(ServerManager.java:152) ~[druid-server-0.9.1.1.jar:0.9.1.1]
at io.druid.server.coordination.ZkCoordinator.loadSegment(ZkCoordinator.java:305) ~[druid-server-0.9.1.1.jar:0.9.1.1]
... 18 more
Caused by: org.jets3t.service.ServiceException: Request Error.
at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRequest(RestStorageService.java:426) ~[jets3t-0.9.4.jar:0.9.4]
at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRequest(RestStorageService.java:279) ~[jets3t-0.9.4.jar:0.9.4]
at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRestHead(RestStorageService.java:1052) ~[jets3t-0.9.4.jar:0.9.4]
at org.jets3t.service.impl.rest.httpclient.RestStorageService.getObjectImpl(RestStorageService.java:2264) ~[jets3t-0.9.4.jar:0.9.4]
at org.jets3t.service.impl.rest.httpclient.RestStorageService.getObjectDetailsImpl(RestStorageService.java:2193) ~[jets3t-0.9.4.jar:0.9.4]
at org.jets3t.service.StorageService.getObjectDetails(StorageService.java:1120) ~[jets3t-0.9.4.jar:0.9.4]
at org.jets3t.service.StorageService.getObjectDetails(StorageService.java:575) ~[jets3t-0.9.4.jar:0.9.4]
at io.druid.storage.s3.S3Utils.isObjectInBucket(S3Utils.java:92) ~[?:?]
at io.druid.storage.s3.S3DataSegmentPuller$4.call(S3DataSegmentPuller.java:332) ~[?:?]
at io.druid.storage.s3.S3DataSegmentPuller$4.call(S3DataSegmentPuller.java:328) ~[?:?]
at com.metamx.common.RetryUtils.retry(RetryUtils.java:60) ~[java-util-0.27.9.jar:?]
at com.metamx.common.RetryUtils.retry(RetryUtils.java:78) ~[java-util-0.27.9.jar:?]
at io.druid.storage.s3.S3Utils.retryS3Operation(S3Utils.java:85) ~[?:?]
at io.druid.storage.s3.S3DataSegmentPuller.isObjectInBucket(S3DataSegmentPuller.java:326) ~[?:?]
at io.druid.storage.s3.S3DataSegmentPuller.getSegmentFiles(S3DataSegmentPuller.java:174) ~[?:?]
at io.druid.storage.s3.S3LoadSpec.loadSegment(S3LoadSpec.java:62) ~[?:?]
at io.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegmentFiles(SegmentLoaderLocalCacheManager.java:143) ~[druid-server-0.9.1.1.jar:0.9.1.1]
at io.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegment(SegmentLoaderLocalCacheManager.java:95) ~[druid-server-0.9.1.1.jar:0.9.1.1]
at io.druid.server.coordination.ServerManager.loadSegment(ServerManager.java:152) ~[druid-server-0.9.1.1.jar:0.9.1.1]
at io.druid.server.coordination.ZkCoordinator.loadSegment(ZkCoordinator.java:305) ~[druid-server-0.9.1.1.jar:0.9.1.1]
... 18 more
Caused by: org.jets3t.service.impl.rest.HttpException: 400 Bad Request
at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRequest(RestStorageService.java:425) ~[jets3t-0.9.4.jar:0.9.4]
at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRequest(RestStorageService.java:279) ~[jets3t-0.9.4.jar:0.9.4]
at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRestHead(RestStorageService.java:1052) ~[jets3t-0.9.4.jar:0.9.4]
at org.jets3t.service.impl.rest.httpclient.RestStorageService.getObjectImpl(RestStorageService.java:2264) ~[jets3t-0.9.4.jar:0.9.4]
at org.jets3t.service.impl.rest.httpclient.RestStorageService.getObjectDetailsImpl(RestStorageService.java:2193) ~[jets3t-0.9.4.jar:0.9.4]
at org.jets3t.service.StorageService.getObjectDetails(StorageService.java:1120) ~[jets3t-0.9.4.jar:0.9.4]
at org.jets3t.service.StorageService.getObjectDetails(StorageService.java:575) ~[jets3t-0.9.4.jar:0.9.4]
at io.druid.storage.s3.S3Utils.isObjectInBucket(S3Utils.java:92) ~[?:?]
at io.druid.storage.s3.S3DataSegmentPuller$4.call(S3DataSegmentPuller.java:332) ~[?:?]
at io.druid.storage.s3.S3DataSegmentPuller$4.call(S3DataSegmentPuller.java:328) ~[?:?]
at com.metamx.common.RetryUtils.retry(RetryUtils.java:60) ~[java-util-0.27.9.jar:?]
at com.metamx.common.RetryUtils.retry(RetryUtils.java:78) ~[java-util-0.27.9.jar:?]
at io.druid.storage.s3.S3Utils.retryS3Operation(S3Utils.java:85) ~[?:?]
at io.druid.storage.s3.S3DataSegmentPuller.isObjectInBucket(S3DataSegmentPuller.java:326) ~[?:?]
at io.druid.storage.s3.S3DataSegmentPuller.getSegmentFiles(S3DataSegmentPuller.java:174) ~[?:?]
at io.druid.storage.s3.S3LoadSpec.loadSegment(S3LoadSpec.java:62) ~[?:?]
at io.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegmentFiles(SegmentLoaderLocalCacheManager.java:143) ~[druid-server-0.9.1.1.jar:0.9.1.1]
at io.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegment(SegmentLoaderLocalCacheManager.java:95) ~[druid-server-0.9.1.1.jar:0.9.1.1]
at io.druid.server.coordination.ServerManager.loadSegment(ServerManager.java:152) ~[druid-server-0.9.1.1.jar:0.9.1.1]
at io.druid.server.coordination.ZkCoordinator.loadSegment(ZkCoordinator.java:305) ~[druid-server-0.9.1.1.jar:0.9.1.1]
... 18 more


I guess the problem is the default endpoint, s3.amazonaws.com, not matching the bucket's region (ap-northeast-2 requires Signature Version 4).
Or is there some other problem?

]$ aws s3 ls s3://druid-bench --region ap-northeast-2
                           PRE druid/
                           PRE logs/
]$ aws s3 ls s3://druid-bench --region ap-northeast-1
An error occurred (InvalidRequest) when calling the ListObjects operation: You are attempting to operate on a bucket in a region that requires Signature Version 4.  You can fix this issue by explicitly providing the correct region location using the --region argument, the AWS_DEFAULT_REGION environment variable, or the region variable in the AWS CLI configuration file.  You can get the bucket's location by running "aws s3api get-bucket-location --bucket BUCKET".

Shuai Chang

Aug 21, 2016, 9:36:01 PM
to Druid User
+1, we've seen the same issue in eu-central-1 and ap-northeast-1

Fangjin Yang

Aug 25, 2016, 5:28:48 PM
to Druid User
I believe these problems are actually just communication problems with S3 that occur from time to time.

The V4 signature problem is real though.

The underlying problems are https://issues.apache.org/jira/browse/HADOOP-9248 and https://issues.apache.org/jira/browse/HADOOP-13325

The confirmed workaround is:

1) Clone Druid master, add `case "s3a":` at line 404 of JobHelper.java, change aws-java-sdk to version 1.7.4 in pom.xml, rebuild

2) In common.runtime.properties, configure S3 deep storage as normal.

3) Save a file in conf/druid/_common/jets3t.properties with the contents:

s3service.s3-endpoint = s3.ap-northeast-2.amazonaws.com
storage-service.request-signature-version=AWS4-HMAC-SHA256

4) Run:

java \
  -cp "dist/druid/lib/*" \
  -Ddruid.extensions.directory="dist/druid/extensions" \
  -Ddruid.extensions.hadoopDependenciesDir="dist/druid/hadoop-dependencies" \
  io.druid.cli.Main tools pull-deps \
  --no-default-hadoop \
  -h "org.apache.hadoop:hadoop-client:2.7.2" \
  -h "org.apache.hadoop:hadoop-aws:2.7.2"

5) In druid.indexer.runner.javaOpts on middleManager, add -Dcom.amazonaws.services.s3.enableV4
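As a sketch, the middleManager's runtime.properties might then look like this (the memory setting is a placeholder example; the enableV4 flag is the only addition this step requires):

```properties
# middleManager runtime.properties (sketch; -Xmx value is a placeholder)
druid.indexer.runner.javaOpts=-server -Xmx2g -Dcom.amazonaws.services.s3.enableV4
```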

6) In job json, "hadoopDependencyCoordinates" : ["org.apache.hadoop:hadoop-client:2.7.2", "org.apache.hadoop:hadoop-aws:2.7.2"]

7) In job json, "jobProperties" : {
     "fs.s3.impl" : "org.apache.hadoop.fs.s3a.S3AFileSystem",
     "fs.s3n.impl" : "org.apache.hadoop.fs.s3a.S3AFileSystem",
     "fs.s3a.endpoint" : "s3.ap-northeast-2.amazonaws.com",
     "fs.s3a.access.key" : "XXX",
     "fs.s3a.secret.key" : "YYY"
   }
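Putting steps 6 and 7 together, the relevant parts of the indexing task spec would look roughly like this (a sketch, assuming a Hadoop batch ingestion spec where jobProperties sits under spec.tuningConfig; XXX/YYY are credential placeholders, and the endpoint should match your bucket's region):

```json
{
  "hadoopDependencyCoordinates" : ["org.apache.hadoop:hadoop-client:2.7.2", "org.apache.hadoop:hadoop-aws:2.7.2"],
  "spec" : {
    "tuningConfig" : {
      "jobProperties" : {
        "fs.s3.impl" : "org.apache.hadoop.fs.s3a.S3AFileSystem",
        "fs.s3n.impl" : "org.apache.hadoop.fs.s3a.S3AFileSystem",
        "fs.s3a.endpoint" : "s3.ap-northeast-2.amazonaws.com",
        "fs.s3a.access.key" : "XXX",
        "fs.s3a.secret.key" : "YYY"
      }
    }
  }
}
```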



Gowtham Sai

Jan 10, 2017, 12:54:46 PM
to Druid User
I'm facing the same issue with druid-0.9.2. Has this been fixed, or is it still the same?

j...@jaduda.com

Oct 2, 2017, 11:08:26 AM
to Druid User
We had the same issue. Our workaround was to create a new bucket in eu-west-1, which still supports Signature Version 2.

Lawrence Huang

May 24, 2018, 7:56:11 PM
to Druid User
See my comment here for using s3a deep storage: