Calling a REST Endpoint as datasource

Dheeraj Bansal

Jun 18, 2017, 1:26:26 PM
to Presto
Hi,

I have a requirement to call multiple REST endpoints as data sources, each returning JSON data. I need to join across these endpoints, since they are represented as data sources. I have gone through the source code of example-http-connector, which serves a similar purpose, but the flow of control is still not very clear to me.

Can someone paint a picture of how the different classes depend on and interact with each other? Also, is there a way to test a connector locally in STS or another IDE without deploying it to the Presto plugins folder?

Any pointers would help.

Thanks,
Dheeraj

kokosing

Jun 19, 2017, 2:04:22 AM
to Presto
Hi,

I have written a simple REST connector framework, on which I built simple Twitter, Slack and GitHub connectors. I used it in a presentation about Presto as an example showing that Presto can query any data source.

You can find all the code here: https://github.com/prestodb-rocks/presto-rest
presto-rest-base takes care of all the Presto connector related things and exposes only one simple interface (https://github.com/prestodb-rocks/presto-rest/blob/master/presto-rest-base/src/main/java/rocks/prestodb/rest/Rest.java), which has to be implemented in order to use a custom REST service as a data source in Presto. Here is an example implementation: https://github.com/prestodb-rocks/presto-rest/blob/master/presto-rest-slack/src/main/java/rocks/prestodb/rest/slack/SlackRest.java

Hope it will be helpful. Please note that this is POC-quality code; it needs more work to be robust enough for production. Anyway, I am here to help you! :)

Cheers,
Grzesiek

Dheeraj Bansal

Jun 19, 2017, 6:40:15 AM
to Presto
Hi Grzesiek,

One more question: I assume the tokens in all these examples are used to authenticate against these services. How are these tokens generated, and what is their flow?

Thanks,
Dheeraj

On Monday, 19 June 2017 15:03:46 UTC+5:30, Dheeraj Bansal wrote:
Hi Grzesiek,

Thanks for helping. Code looks neatly organized. :) 

One question though: is there an easy way to create model classes for the JSON documents below? As you can see, they are fairly complex and differ from each other, so would two separate models have to be created in the same Eclipse project?




Thanks,
Dheeraj

Dheeraj Bansal

Jun 19, 2017, 11:49:39 AM
to Presto
Hi Grzesiek,

A few updates:

1. I have created model classes from JSON to Java using http://www.jsonschema2pojo.org/
2. I was able to run the test cases through TestNG by passing the security token for my GitHub account.

I am still unclear on how to build this project, how to deploy it in Presto, and where the token needs to be specified as an environment variable. I would be very thankful if you could help me understand these issues.

Thanks,
Dheeraj

kokosing

Jun 20, 2017, 2:13:15 AM
to Presto
Hi,

I created the model for the JSON manually; that way I have only the fields I am interested in. Regarding the token: how you authenticate depends on your REST service. In the implemented examples (Slack, GitHub), security tokens are sent as parameters. Twitter is much more complicated, and authentication there is handled by a dedicated library.

This project lacks documentation, as I never have time to update it. Each connector module can build a fat jar, that is, a jar which contains all the dependencies. To create them, run `mvn package`. These jars have to be added to the Presto connector jars, and for each connector you need to create a properties file with the authorization configuration.

Regards,
Grzesiek

Dheeraj Bansal

Jun 20, 2017, 2:15:17 PM
to Presto
Hi Grzesiek,

Kindly share sample .properties files for the GitHub, Twitter and Slack connectors, so that they can be queried with ANSI SQL.

Thanks,
Dheeraj

On Tuesday, 20 June 2017 18:11:08 UTC+5:30, Dheeraj Bansal wrote:
Hi Grzesiek,

Thanks for helping.

I modified the Twitter connector code as per our requirements, commenting out the security part and passing a query parameter. The code compiled properly in STS, but I am getting the following error while building the jar with "mvn package". I have attached both the Java project and the Maven error output.

Please remove the .remove extension and import it into STS as an existing Java project.

Kindly help. I will be grateful.

Thanks,
Dheeraj

--
You received this message because you are subscribed to a topic in the Google Groups "Presto" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/presto-users/t31yux2fs7E/unsubscribe.
To unsubscribe from this group and all its topics, send an email to presto-users+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Dheeraj Bansal

Jun 21, 2017, 8:35:25 AM
to Presto
Hi Grzesiek,

We were able to create a connector the same way the GitHub code works, and the test cases run fine. We created a fat jar, placed it inside the plugins folder, and added a .properties file in the /etc/catalog folder, but we get the following error while bringing the server up. Any clues on what it means and how to resolve it?

Is there a guide to building and deploying these custom connectors?

Thanks,
Dheeraj

kokosing

Jun 22, 2017, 1:31:13 AM
to Presto
Hi,

I am sorry, but I do not see the mentioned error. Did you forget to attach a file?

I am not aware of any such guide. Here is some useful information: https://prestodb.io/docs/current/installation/deployment.html (see the catalog properties section).

Regards,
Grzesiek

Dheeraj Bansal

Jun 22, 2017, 2:50:18 AM
to presto...@googlegroups.com
Hello,

We figured out the problem. We were building one big fat jar with all the dependencies bundled inside, whereas Presto expects the connector jar and its dependency jars as separate files in the plugins folder.

Thanks a ton for all the help you provided all this while.

Thanks,
Dheeraj 


Dheeraj Bansal

Jun 23, 2017, 1:17:42 AM
to Presto
Hi,

How can we read the WHERE clause of the query in a custom connector, so that we can pass it as query parameters to the data source and get a filtered result?

Thanks,
Dheeraj

Grzegorz Kokosiński

Jun 23, 2017, 1:32:23 AM
to presto...@googlegroups.com
When ConnectorMetadata returns table layout information, it also needs to return a TupleDomain, which describes the domain of data values stored in the given table. The execution engine then skips reading data that is known not to be needed. presto-rest does not support that.

Dheeraj Bansal

Jun 23, 2017, 7:49:45 AM
to presto...@googlegroups.com
Hi Grzegorz,

Thanks for the response, but can you give me some hints on how I can modify your code base so that the WHERE clause is supported?
Without a WHERE clause we might not be able to use it, because our REST endpoints would return the same data every time, which makes it of no use as a query engine.

Thanks,
Dheeraj

kokosing

Jun 25, 2017, 4:39:52 PM
to Presto
Hi, 

WHERE conditions are passed to the connector via the com.facebook.presto.spi.connector.ConnectorMetadata#getTableLayouts method (see the constraint and desired columns (projection) parameters).
Note that a WHERE predicate like a > 3 AND a < 10 is translated into a domain (TupleDomain) which, for column `a`, says that the allowed values are in the range from 3 to 10.

presto-rest does not support this. To do it, you might need to modify rocks.prestodb.rest.RestConnectorTableLayoutHandle to store these parameters (the domain and the desired columns), then pass them on in rocks.prestodb.rest.RestSplitManager#getSplits to the rocks.prestodb.rest.RestConnectorSplit#RestConnectorSplit constructor, so that you can use them in rocks.prestodb.rest.RestRecordSetProvider#getRecordSet to retrieve the records you need.
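
To make the last step of that flow concrete, here is a minimal, self-contained Java sketch of turning a per-column range into REST query parameters. It deliberately does not use the Presto SPI: `ColumnRange`, `RestFilterSketch` and the `_gt` / `_lt` parameter names are hypothetical stand-ins. A real connector would read the ranges from the TupleDomain it carried through the layout handle and the split, and would use whatever parameters its REST service actually accepts.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.OptionalLong;

// Simplified stand-in for the range of one column.
// This is NOT the Presto SPI TupleDomain/Domain API.
final class ColumnRange {
    final OptionalLong low;   // exclusive lower bound, if any
    final OptionalLong high;  // exclusive upper bound, if any

    ColumnRange(OptionalLong low, OptionalLong high) {
        this.low = low;
        this.high = high;
    }
}

final class RestFilterSketch {
    // Builds a query string from per-column ranges, using the hypothetical
    // "<column>_gt" and "<column>_lt" parameter names.
    static String toQueryString(Map<String, ColumnRange> domain) {
        List<String> params = new ArrayList<>();
        for (Map.Entry<String, ColumnRange> entry : domain.entrySet()) {
            ColumnRange range = entry.getValue();
            if (range.low.isPresent()) {
                params.add(entry.getKey() + "_gt=" + range.low.getAsLong());
            }
            if (range.high.isPresent()) {
                params.add(entry.getKey() + "_lt=" + range.high.getAsLong());
            }
        }
        return String.join("&", params);
    }

    public static void main(String[] args) {
        // WHERE a > 3 AND a < 10
        Map<String, ColumnRange> domain = new LinkedHashMap<>();
        domain.put("a", new ColumnRange(OptionalLong.of(3), OptionalLong.of(10)));
        System.out.println(toQueryString(domain)); // prints a_gt=3&a_lt=10
    }
}
```

Whatever the endpoint does with such parameters has to match what Presto itself would compute for the same filter; otherwise results will differ depending on whether the filter was pushed down.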

However, I do not recommend using presto-rest for that. It was just a POC kind of thing. Have you considered writing a UDF?

Cheers,
Grzesiek

Dheeraj Bansal

Jul 6, 2017, 2:39:46 AM
to Presto
Hi Grzegorz,

We were able to implement the WHERE clause. Thanks for all the guidance and help.

Regards,
Dheeraj 

Grzegorz Kokosiński

Jul 6, 2017, 3:45:11 AM
to presto...@googlegroups.com
Hi,

I am glad you have made it!

Cheers,
Grzegorz


Dheeraj Bansal

Jul 14, 2017, 1:33:47 AM
to Presto
Hi Grzegorz,

Does presto-rocks distribute the query to multiple worker nodes by default, or do we have to write some code so that a custom connector for an API also distributes the load across multiple nodes?

If the entire query is executed on only one worker node, then it may be a performance hit.

Thanks,
Dheeraj

Grzegorz Kokosiński

Jul 14, 2017, 3:37:39 AM
to presto...@googlegroups.com
Hi,

The REST service is called from a single worker, but it is up to the Presto scheduler which worker is used for a given query.

I would not expect the query to be executed on a single worker node. The Presto query is distributed across the cluster in the regular way; only the REST call is performed on a single node.

Reference classes:
https://github.com/prestodb-rocks/presto-rest/blob/master/presto-rest-base/src/main/java/rocks/prestodb/rest/RestConnectorSplit.java


Dheeraj Bansal

Jul 14, 2017, 5:07:48 AM
to presto...@googlegroups.com
Hi Grzegorz,

But MongoDB or MySQL queries are distributed across the worker nodes, right?

So if we do a SELECT on MongoDB over millions of records, will Presto distribute it across multiple workers?

Thanks,
Dheeraj

Grzegorz Kokosiński

Jul 17, 2017, 1:47:14 AM
to presto...@googlegroups.com
Queries to MySQL are typically run from a single node, and then the data may be distributed over the cluster by the Presto execution engine. The same holds for this simple REST connector.

I do not know how the MongoDB connector is implemented.

Dheeraj Bansal

Jul 17, 2017, 2:23:50 AM
to presto...@googlegroups.com
Thanks for the reply Grzegorz. :)

Dheeraj Bansal

Jul 31, 2017, 3:03:06 AM
to presto...@googlegroups.com
Hi Grzegorz,

Is there a way to access the query_id in a custom connector?

Thanks,
Dheeraj

Grzegorz Kokosiński

Jul 31, 2017, 5:02:11 AM
to presto...@googlegroups.com
com.facebook.presto.spi.ConnectorSession#getQueryId

Dheeraj Bansal

Jul 31, 2017, 5:08:52 AM
to presto...@googlegroups.com
Thanks sir.

One more quick question: is the query id generated before the query is passed on to the underlying connector?

Regards,
Dheeraj

Grzegorz Kokosiński

Jul 31, 2017, 5:28:38 AM
to presto...@googlegroups.com
Yes ;)

Dheeraj Bansal

Jul 31, 2017, 5:30:11 AM
to presto...@googlegroups.com
Awesome :). 

Thanks a ton :)

PRAVEEN BABU

Sep 8, 2017, 2:46:43 AM
to Presto
Hi kokosing

We have implemented a REST connector with WHERE clause support, in which we can pass query params and request body params. In ConnectorMetadata#getTableLayouts we get a constraint parameter which holds the TupleDomain, but we noticed that the moment you pass anything other than a single condition, or conditions combined only with the AND operator, the TupleDomain becomes empty. Is there any specific reason why Presto does not send other WHERE clause conditions to the underlying connector? If you have come across this problem, please let me know the reason if you don't mind. One more thing: the entire WHERE clause is there in the predicate field of the constraint object, but I am not able to extract it properly; I am still working on it. If you have found any other way to extract all the WHERE clause conditions in the connector, please help me; it would be a great help.

Dain Sundstrom

Sep 8, 2017, 1:29:33 PM
to presto...@googlegroups.com
A TupleDomain is a summary of the *ranges* over which a single column can have data, and is specifically not a full predicate. There are plans to extend this, but I caution you that when we do implement this, many (if not most) connectors will not be able to support anything but trivial predicates. This is because the connector must have the same semantics that Presto has for evaluating queries. Specifically, a query plan run with a pushed-down predicate must return the same results as a query plan without predicate pushdown. For example, in ANSI SQL, which Presto implements, `a + b` must fail in the case of numeric overflow, so you could not push this down to a system like Hive, where you simply get an overflowed result.
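
The overflow case can be illustrated in plain Java. This is an analogy only, not Presto or Hive code, and the class and method names are made up for the sketch. `Math.addExact` fails on overflow, which is the behaviour described above for ANSI SQL, while the `+` operator silently wraps around, like the overflowed result described for Hive.

```java
final class OverflowSemantics {
    // Checked addition: throws ArithmeticException when the result
    // does not fit in a long.
    static long checkedAdd(long a, long b) {
        return Math.addExact(a, b);
    }

    // Wrapping addition: overflow silently wraps around.
    static long wrappingAdd(long a, long b) {
        return a + b;
    }

    public static void main(String[] args) {
        // The wrapped sum lands at the other end of the long range.
        System.out.println(wrappingAdd(Long.MAX_VALUE, 1) == Long.MIN_VALUE); // prints true

        try {
            checkedAdd(Long.MAX_VALUE, 1);
            System.out.println("no failure");
        } catch (ArithmeticException e) {
            System.out.println("checked addition failed on overflow");
        }
    }
}
```

The same inputs give an error under one set of semantics and a value under the other, so pushing such an expression down to a system with the second behaviour would change the query result.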

-dain
