Presto support for s3, s3a schemes for Hive S3

axba...@gmail.com

Jun 27, 2017, 4:14:43 PM
to Presto
I've been trying to get Presto to work with Hive + S3 for querying .csv files.

I am able to set up Hive with S3 easily using the "s3://" scheme. However, with Presto 0.178, queries fail with org.apache.http.HttpException: "s3 protocol is not supported".

So, question #1: Does Presto 0.178+ officially support the "s3://" scheme, or not?

I've seen a number of conflicting reports on this and would like to know what's really supported and what's not.

I've attempted the same with the "s3a://" scheme in Hive. However, getting this set up in Hive has proven elusive, as I keep hitting "Scheme 's3a' not registered". I have not found anything that explains this error.

So, question #2: Has anyone successfully set up Presto with Hive and S3 using the "s3a://" scheme?

Any help or insight you can provide is greatly appreciated.

axba...@gmail.com

Jun 27, 2017, 4:16:58 PM
to Presto, axba...@gmail.com
If the "s3://" scheme is actually supported in Presto 0.178, can anyone shed any light on why I may be getting "s3 protocol is not supported"?


axba...@gmail.com

Jun 27, 2017, 5:00:25 PM
to Presto, axba...@gmail.com

I also tested with "s3n://" and get the same org.apache.http.HttpException: "s3 protocol is not supported".

Has anyone seen this before?

Kurt Larson

Jun 27, 2017, 5:28:22 PM
to Presto, axba...@gmail.com
Do you have the flexibility to run Presto on EMR? If so, EMR Presto with an EMR Hive metastore service on the same EMR cluster works fine as deployed by AWS.

Alex Baretto

Jun 27, 2017, 5:29:40 PM
to Kurt Larson, Presto
Thanks. Unfortunately I am not at liberty to use EMR for this task. 

Nezih Yigitbasi

Jun 27, 2017, 5:29:49 PM
to Presto, axba...@gmail.com
Can you share the complete stack trace? (If you are using the Presto CLI, start it with the --debug argument.) Presto uses PrestoS3FileSystem for s3://, s3a://, and s3n:// URIs, so s3:// URIs should be supported unless something else is going on.

Nezih
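To illustrate the point that s3://, s3a://, and s3n:// locations can all be handled the same way, here is a minimal Java sketch that splits any of the three into a bucket and an object key. The class S3UriParser, its methods, and the bucket name my-bucket are hypothetical, written only for this example; it is not PrestoS3FileSystem's actual code, and it assumes the bucket is the URI authority and the key is the path without its leading slash.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not Presto's source: treats s3, s3a, and s3n
// locations identically and splits them into bucket + key.
public class S3UriParser {
    private static final Set<String> S3_SCHEMES =
            new HashSet<>(Arrays.asList("s3", "s3a", "s3n"));

    // Bucket comes from the URI authority, key from the path.
    static final class S3Location {
        final String bucket;
        final String key;

        S3Location(String bucket, String key) {
            this.bucket = bucket;
            this.key = key;
        }
    }

    static S3Location parse(String location) {
        URI uri = URI.create(location);
        String scheme = uri.getScheme();
        if (scheme == null || !S3_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT))) {
            throw new IllegalArgumentException("not an S3 URI: " + location);
        }
        String bucket = uri.getAuthority();
        if (bucket == null || bucket.isEmpty()) {
            throw new IllegalArgumentException("missing bucket in: " + location);
        }
        // Drop the leading '/' so the key is relative to the bucket root.
        String path = uri.getPath();
        String key = (path == null || path.isEmpty()) ? "" : path.substring(1);
        return new S3Location(bucket, key);
    }

    public static void main(String[] args) {
        String[] locations = {
            "s3://my-bucket/data/a.csv",
            "s3a://my-bucket/data/a.csv",
            "s3n://my-bucket/data/a.csv"
        };
        for (String s : locations) {
            S3Location loc = parse(s);
            System.out.println(s + " -> bucket=" + loc.bucket + ", key=" + loc.key);
        }
    }
}
```

All three URIs resolve to the same bucket and key, which is the behavior one would expect if a single filesystem implementation serves every scheme.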

axba...@gmail.com

Jun 27, 2017, 6:16:30 PM
to Presto, axba...@gmail.com
Sure, here it is:

com.facebook.presto.spi.PrestoException: Unable to execute HTTP request: null
at com.facebook.presto.hive.HiveSplitSource.propagatePrestoException(HiveSplitSource.java:139)
at com.facebook.presto.hive.HiveSplitSource.isFinished(HiveSplitSource.java:117)
at com.facebook.presto.split.ConnectorAwareSplitSource.isFinished(ConnectorAwareSplitSource.java:72)
at com.facebook.presto.split.BufferingSplitSource.fetchSplits(BufferingSplitSource.java:67)
at com.facebook.presto.split.BufferingSplitSource.lambda$fetchSplits$1(BufferingSplitSource.java:73)
at com.google.common.util.concurrent.AbstractTransformFuture$AsyncTransformFuture.doTransform(AbstractTransformFuture.java:211)
at com.google.common.util.concurrent.AbstractTransformFuture$AsyncTransformFuture.doTransform(AbstractTransformFuture.java:200)
at com.google.common.util.concurrent.AbstractTransformFuture.run(AbstractTransformFuture.java:130)
at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:399)
at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:902)
at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:813)
at com.google.common.util.concurrent.AbstractFuture.set(AbstractFuture.java:655)
at com.google.common.util.concurrent.AbstractTransformFuture$TransformFuture.setResult(AbstractTransformFuture.java:245)
at com.google.common.util.concurrent.AbstractTransformFuture.run(AbstractTransformFuture.java:177)
at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:399)
at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:902)
at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:813)
at com.google.common.util.concurrent.AbstractFuture.set(AbstractFuture.java:655)
at com.google.common.util.concurrent.SettableFuture.set(SettableFuture.java:48)
at io.airlift.concurrent.MoreFutures.lambda$toListenableFuture$10(MoreFutures.java:445)
at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:561)
at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:580)
at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
at io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:78)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Caused by: com.amazonaws.AmazonClientException: Unable to execute HTTP request: null
at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:724)
at com.amazonaws.http.AmazonHttpClient.doExecute(AmazonHttpClient.java:466)
at com.amazonaws.http.AmazonHttpClient.executeWithTimer(AmazonHttpClient.java:427)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:376)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4039)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3976)
at com.amazonaws.services.s3.AmazonS3Client.listObjects(AmazonS3Client.java:776)
at com.facebook.presto.hive.PrestoS3FileSystem.listPrefix(PrestoS3FileSystem.java:474)
at com.facebook.presto.hive.PrestoS3FileSystem.access$000(PrestoS3FileSystem.java:108)
at com.facebook.presto.hive.PrestoS3FileSystem$1.<init>(PrestoS3FileSystem.java:266)
at com.facebook.presto.hive.PrestoS3FileSystem.listLocatedStatus(PrestoS3FileSystem.java:264)
at com.facebook.presto.hadoop.HadoopFileSystem.listLocatedStatus(HadoopFileSystem.java:30)
at com.facebook.presto.hive.HadoopDirectoryLister.list(HadoopDirectoryLister.java:32)
at com.facebook.presto.hive.util.HiveFileIterator.getLocatedFileStatusRemoteIterator(HiveFileIterator.java:113)
at com.facebook.presto.hive.util.HiveFileIterator.computeNext(HiveFileIterator.java:86)
at com.facebook.presto.hive.util.HiveFileIterator.computeNext(HiveFileIterator.java:41)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:145)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:140)
at com.facebook.presto.hive.BackgroundHiveSplitLoader.loadSplits(BackgroundHiveSplitLoader.java:235)
at com.facebook.presto.hive.BackgroundHiveSplitLoader.access$300(BackgroundHiveSplitLoader.java:86)
at com.facebook.presto.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:187)
at com.facebook.presto.hive.util.ResumableTasks.safeProcessTask(ResumableTasks.java:45)
at com.facebook.presto.hive.util.ResumableTasks.lambda$submit$1(ResumableTasks.java:33)
... 4 more
Caused by: org.apache.http.client.ClientProtocolException
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:186)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
at com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72)
at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:875)
at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:715)
... 26 more
Caused by: org.apache.http.HttpException: s3 protocol is not supported
at org.apache.http.impl.conn.DefaultRoutePlanner.determineRoute(DefaultRoutePlanner.java:88)
at org.apache.http.impl.client.InternalHttpClient.determineRoute(InternalHttpClient.java:124)
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:183)
... 31 more

Nezih Yigitbasi

Jun 27, 2017, 8:24:42 PM
to Presto, axba...@gmail.com
Can you share your Hive catalog configuration? Are you setting hive.s3.endpoint? From the trace, PrestoS3FileSystem is sending the request to an s3:// URI, but it should send requests to an http:// or https:// URI (the AWS S3 service endpoint). I suspect this is a configuration issue.
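The innermost cause in the trace, "org.apache.http.HttpException: s3 protocol is not supported", is thrown from Apache HttpClient's DefaultRoutePlanner, which (as far as I know, with its default scheme-port resolver) only recognizes the http and https schemes. A minimal Java sketch of a pre-flight check for a catalog file follows; the class EndpointCheck, its methods, and the bucket name my-bucket are hypothetical, it is not part of Presto, and it only mimics the failure mode seen in the trace.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.Locale;
import java.util.Properties;

// Hypothetical pre-flight check (not part of Presto) for a Hive catalog file.
public class EndpointCheck {

    // Returns the hive.s3.endpoint value, or null if the property is absent.
    static String endpointFrom(String catalogText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(catalogText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("hive.s3.endpoint");
    }

    // Rejects endpoints an HTTP client cannot route, such as s3://my-bucket.
    static void requireHttpEndpoint(String endpoint) {
        String scheme = URI.create(endpoint).getScheme();
        if (scheme == null) {
            throw new IllegalArgumentException("endpoint has no scheme: " + endpoint);
        }
        String s = scheme.toLowerCase(Locale.ROOT);
        if (!s.equals("http") && !s.equals("https")) {
            throw new IllegalArgumentException(
                    s + " protocol is not supported; use http:// or https://");
        }
    }

    public static void main(String[] args) {
        String catalog = "connector.name=hive-hadoop2\n"
                + "hive.s3.endpoint=s3://my-bucket\n";
        String endpoint = endpointFrom(catalog);
        if (endpoint == null) {
            System.out.println("hive.s3.endpoint not set; nothing to check");
            return;
        }
        try {
            requireHttpEndpoint(endpoint);
            System.out.println("endpoint looks routable: " + endpoint);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected " + endpoint + ": " + e.getMessage());
        }
    }
}
```

Running a check like this before starting the server surfaces the bad scheme directly, instead of as a wrapped ClientProtocolException several causes deep.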

Alex Baretto

Jun 27, 2017, 9:27:52 PM
to Nezih Yigitbasi, Presto
I had:

connector.name=hive-hadoop2
hive.metastore.uri=thrift://[metastore-ip-addr]:9083
hive.s3.endpoint=s3://[bucket-name]
hive.s3.aws-access-key=[aws-bucket-user-access-key]
hive.s3.aws-secret-key=[aws-bucket-user-secret-key]

instead of 

connector.name=hive-hadoop2
hive.metastore.uri=thrift://[metastore-ip-addr]:9083
hive.s3.endpoint=http://[bucket-name].s3-us-west-2.amazonaws.com
hive.s3.aws-access-key=[aws-bucket-user-access-key]
hive.s3.aws-secret-key=[aws-bucket-user-secret-key]

After making that correction, I get the following error:

Query 20170628_012604_00003_qcg43 failed: The specified bucket does not exist (Service: Amazon S3; Status Code: 404; Error Code: NoSuchBucket; Request ID: 29EAB48EA367F1BE)
com.facebook.presto.spi.PrestoException: The specified bucket does not exist (Service: Amazon S3; Status Code: 404; Error Code: NoSuchBucket; Request ID: 29EAB48EA367F1BE)
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: The specified bucket does not exist (Service: Amazon S3; Status Code: 404; Error Code: NoSuchBucket; Request ID: 29EAB48EA367F1BE)
at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:1387)
at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:940)
at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:715)
at com.amazonaws.http.AmazonHttpClient.doExecute(AmazonHttpClient.java:466)
at com.amazonaws.http.AmazonHttpClient.executeWithTimer(AmazonHttpClient.java:427)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:376)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4039)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3976)
at com.amazonaws.services.s3.AmazonS3Client.listObjects(AmazonS3Client.java:776)
at com.facebook.presto.hive.PrestoS3FileSystem.listPrefix(PrestoS3FileSystem.java:474)
at com.facebook.presto.hive.PrestoS3FileSystem.access$000(PrestoS3FileSystem.java:108)
at com.facebook.presto.hive.PrestoS3FileSystem$1.<init>(PrestoS3FileSystem.java:266)
at com.facebook.presto.hive.PrestoS3FileSystem.listLocatedStatus(PrestoS3FileSystem.java:264)
at com.facebook.presto.hadoop.HadoopFileSystem.listLocatedStatus(HadoopFileSystem.java:30)
at com.facebook.presto.hive.HadoopDirectoryLister.list(HadoopDirectoryLister.java:32)
at com.facebook.presto.hive.util.HiveFileIterator.getLocatedFileStatusRemoteIterator(HiveFileIterator.java:113)
at com.facebook.presto.hive.util.HiveFileIterator.computeNext(HiveFileIterator.java:86)
at com.facebook.presto.hive.util.HiveFileIterator.computeNext(HiveFileIterator.java:41)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:145)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:140)
at com.facebook.presto.hive.BackgroundHiveSplitLoader.loadSplits(BackgroundHiveSplitLoader.java:235)
at com.facebook.presto.hive.BackgroundHiveSplitLoader.access$300(BackgroundHiveSplitLoader.java:86)
at com.facebook.presto.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:187)
at com.facebook.presto.hive.util.ResumableTasks.safeProcessTask(ResumableTasks.java:45)
at com.facebook.presto.hive.util.ResumableTasks.lambda$submit$1(ResumableTasks.java:33)
... 4 more

I can access that bucket from both Hive and HDFS.

nezih yigitbasi

Jun 27, 2017, 9:29:39 PM
to Alex Baretto, Presto
The problem is: hive.s3.endpoint=s3://[bucket-name]

The protocol here should be http instead of s3.

Alex Baretto

Jun 27, 2017, 9:30:42 PM
to nezih yigitbasi, Presto
Thanks, I changed it. Now I get a new error:

Query 20170628_012604_00003_qcg43 failed: The specified bucket does not exist (Service: Amazon S3; Status Code: 404; Error Code: NoSuchBucket; Request ID: 29EAB48EA367F1BE)
com.facebook.presto.spi.PrestoException: The specified bucket does not exist (Service: Amazon S3; Status Code: 404; Error Code: NoSuchBucket; Request ID: 29EAB48EA367F1BE)

Alex Baretto

Jun 27, 2017, 9:31:36 PM
to nezih yigitbasi, Presto
I am also using the same credentials (access key and secret key) in my Presto catalog as I use with Hive and HDFS.

nezih yigitbasi

Jun 27, 2017, 9:34:06 PM
to Alex Baretto, Presto
Can you just remove that line from your config and try again?

Alex Baretto

Jun 27, 2017, 9:39:12 PM
to nezih yigitbasi, Presto
Removing the 'hive.s3.endpoint=https://[bucket-name].s3-us-west-2.amazonaws.com' line fixed things. My queries now work with both s3n:// and s3:// URIs. :D

Do you think this is related to some of the stripping issues that getBucket() has had?

Thanks! -A. 


Nezih Yigitbasi

Jun 27, 2017, 9:44:19 PM
to Presto, nezihyi...@gmail.com, axba...@gmail.com
I'm not exactly sure, but the endpoint is probably not supposed to include the bucket name; see http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region.
You can confirm this by setting the endpoint to "http://s3-us-west-2.amazonaws.com" and seeing whether it works.
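One plausible explanation for the NoSuchBucket error, not confirmed in this thread: assuming the S3 client uses virtual-hosted-style addressing, it builds the request host by prefixing the bucket name to the endpoint host. If the endpoint already contains the bucket, the bucket name appears twice and S3 looks up a bucket that does not exist. The Java sketch below only demonstrates that string composition under this assumption; VirtualHostDemo, requestHost, bucketSeenByS3, and my-bucket are hypothetical names, not AWS SDK APIs.

```java
import java.net.URI;

// Hypothetical illustration of virtual-hosted-style addressing; this is not
// AWS SDK code, only the host-name arithmetic it is assumed to perform.
public class VirtualHostDemo {

    // Assumed rule: request host = "<bucket>." + endpoint host.
    static String requestHost(String endpoint, String bucket) {
        return bucket + "." + URI.create(endpoint).getHost();
    }

    // S3 treats everything before the regional suffix as the bucket name.
    static String bucketSeenByS3(String host, String regionalSuffix) {
        String tail = "." + regionalSuffix;
        if (!host.endsWith(tail)) {
            throw new IllegalArgumentException("unexpected host: " + host);
        }
        return host.substring(0, host.length() - tail.length());
    }

    public static void main(String[] args) {
        String suffix = "s3-us-west-2.amazonaws.com";
        String[] endpoints = {
            "http://s3-us-west-2.amazonaws.com",           // region only
            "http://my-bucket.s3-us-west-2.amazonaws.com"  // bucket baked in
        };
        for (String endpoint : endpoints) {
            String host = requestHost(endpoint, "my-bucket");
            System.out.println(endpoint + " -> " + host
                    + " -> bucket \"" + bucketSeenByS3(host, suffix) + "\"");
        }
    }
}
```

With the region-only endpoint the derived bucket is my-bucket; with the bucket-qualified endpoint it becomes my-bucket.my-bucket, which would explain a 404 NoSuchBucket response.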

Sky Yin

Jun 27, 2017, 10:21:17 PM
to presto...@googlegroups.com, axba...@gmail.com
Do you mean the Hive URI configuration? It should look like "hive.metastore.uri=thrift://...". You don't need to point to an S3 path anywhere in the configuration.

I'm running Presto without EMR and querying files in S3 through the Hive metastore.


--
You received this message because you are subscribed to the Google Groups "Presto" group.
To unsubscribe from this group and stop receiving emails from it, send an email to presto-users+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

David Phillips

Jun 27, 2017, 10:34:05 PM
to presto...@googlegroups.com
Try removing the endpoint property entirely. I think you only need it in special circumstances, such as talking to non-US buckets (possibly only those in China, since that region is separate).

Alex Baretto

Jun 28, 2017, 11:11:17 AM
to Presto
Thanks. I have my queries working with 's3://' and 's3n://' since removing that property. 

I am still seeing problems with 's3a://' in Hive. I will post that on a separate thread. 

Dan Young

Jun 28, 2017, 12:25:57 PM
to presto...@googlegroups.com
Hello Alex,

We're using s3a without problems with external tables. What sort of issues are you seeing, and what version of Hive are you running?

You can ping me directly at d...@looker.com or dano...@gmail.com.

Regards,

Dano

