Support SQL like filters for REST API collections


Sergey Khaladzinski

Nov 17, 2013, 12:42:06 PM
to api-...@googlegroups.com
REST API design seems to be simple when it comes to basic resources like `/product/{id}` or `/products` (collection).

In the real world though, you need to be able to filter your collections. For example display products where `price > 500` and `price < 1000`.

I guess my question is: how do you model SQL-like syntax in your REST API, so you can do complex collection filtering using `MAX`, `MIN`, `AND`, `OR`, `>`, `<`, `<=`, `>=`, and `sort by` with `asc` and `desc`?

How can you represent that logic cleanly with a query string, so your API is still readable and doesn't look like this:

params[sorts][0][0]=usertotal&params[sorts][0][1]=ASC&params[fields][0][0]=id&params[fields][0][1]=uid&params[fields][0][2]=status&params[fields][0][3][t]=sum&params[fields][0][3][0]=total&params[fields][0][3][1]=usertotal&params[wheres][0][0][t]=and&params[wheres][0][0][0][t]=and&params[wheres][0][0][0][0][t]=cmp&params[wheres][0][0][0][0][0]=created&params[wheres][0][0][0][0][1]=>&params[wheres][0][0][0][0][2]=1.01.2013&params[wheres][0][0][0][1][t]=or&params[wheres][0][0][0][1][0][t]=cmp&params[wheres][0][0][0][1][0][0]=status&params[wheres][0][0][0][1][0][1]==&params[wheres][0][0][0][1][0][2]=1&params[wheres][0][0][0][1][1][t]=cmp&params[wheres][0][0][0][1][1][0]=status&params[wheres][0][0][0][1][1][1]==&params[wheres][0][0][0][1][1][2]=2&params[wheres][0][0][1][t]=cmp&params[wheres][0][0][1][0]=uid&params[wheres][0][0][1][1]==&params[wheres][0][0][1][2]=1.01.2013&params[limit]=10&params[offset]=0


If you look closely at the query sample above, it uses arrays to transfer/encode filters from client to API so that the data can be easily extracted from the query string and processed on the API side. At the same time it completely ruins readability and makes the API useless without an SDK.

I'm struggling to find the right balance between structuring complex collection filters in the query string on the client and keeping the API clean.
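One common middle ground that keeps the query string readable is operator-suffixed parameter names, e.g. `price_gt=500&price_lt=1000`. A minimal sketch in Python (the suffix set and the bare-name-means-equality rule are assumptions for illustration, not a standard):

```python
from urllib.parse import parse_qsl

# Map operator suffixes in parameter names to comparison operators, e.g.
# "price_gt=500&price_lt=1000" -> [("price", ">", 500.0), ("price", "<", 1000.0)]
OPS = {"_gt": ">", "_lt": "<", "_gte": ">=", "_lte": "<=", "_eq": "="}

def parse_filters(query_string):
    filters = []
    for name, value in parse_qsl(query_string):
        for suffix, op in OPS.items():
            if name.endswith(suffix):
                filters.append((name[: -len(suffix)], op, float(value)))
                break
        else:
            # a bare parameter name means an equality match on that field
            filters.append((name, "=", value))
    return filters
```

The resulting tuples can then be mapped to whatever the data store needs, without the client ever sending raw predicates.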

Would really appreciate any feedback on this.

Kijana Woodard

Nov 17, 2013, 12:55:38 PM
to api-...@googlegroups.com

You might want something like odata.

For me, I don't like giving api users sql-like abilities. It feels like a cop out. I'd rather have resources like
/bargain-bin
/featured
/premium-offers
/solid-values

And so forth. I don't like the idea of essentially saying, "I have no idea what your use cases are, so here's a connection to my db so you can get whatever you want."

I also feel it makes it much more difficult to change the implementation over time.

This pertains to public APIs more than internal ones.
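The named-collection idea above can be sketched as a fixed, server-controlled mapping from public resource names to internal queries; the route names come from this post, but the query specs behind each name are invented purely for illustration:

```python
# Each public collection name maps to a fixed, server-controlled query spec;
# clients pick a resource name, never a predicate. The spec contents here
# are hypothetical.
NAMED_COLLECTIONS = {
    "bargain-bin":    {"price_max": 20},
    "featured":       {"flag": "featured"},
    "premium-offers": {"tier": "premium"},
    "solid-values":   {"rating_min": 4},
}

def resolve_collection(name):
    """Translate a public collection name into its internal query spec."""
    if name not in NAMED_COLLECTIONS:
        raise LookupError(f"404: unknown collection {name!r}")
    return NAMED_COLLECTIONS[name]
```

Because the queries live server-side, the implementation behind each name can change without breaking any client.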

--
You received this message because you are subscribed to the Google Groups "API Craft" group.
To unsubscribe from this group and stop receiving emails from it, send an email to api-craft+...@googlegroups.com.
Visit this group at http://groups.google.com/group/api-craft.
For more options, visit https://groups.google.com/groups/opt_out.

mca

Nov 17, 2013, 1:43:11 PM
to api-...@googlegroups.com
I'll rant first, then pass along some advice ;)

RANT
You don't NEED this kind of interaction, but you (or some of your users) likely WANT it. This level of request customization is very costly to implement and scale. Be prepared for lots of trouble. You're now up for much more complex caching, much more involved UI work, and a much more dangerous payload heading into your backend (think "SELECT * from Users where 1=1;DROP TABLE"). The surface area has now increased to the point where some users are likely to create invalid queries (or queries that don't actually return what they expect) and your support costs will go up as well.

It also represents an exposure of an internal data model to the outside world, which is almost never a good idea. First, the tight coupling to the internal model represents a risk that you break every client when you change the data model. Second, the exposure of this internal model is a boon to competitors who want to know just how your business works. Finally, exposing the model gives fuel to malicious client apps to troll your system for valuable info and the chance to do damage.

When presented w/ a request to build this kind of system, I resist for as long as possible. IME, this kind of effort can end in tears; and it often does.

ADVICE
If you (or your team/customer/patron) decide to go ahead with this level of custom request handling, then here's the way that I've found limits (but does not eliminate) the potential harm.

1) adopt a standardized query language; one that you can validate internally against a model. T-SQL, SPARQL, even YQL would work, possibly even OData's query model. You will likely want to allow only a subset of features in order to limit potential damage to your data model (e.g. no DDL features, no JOINs, etc.)

2) ONLY pass these queries in a request BODY, not in a query string. You avoid all sorts of messy encoding issues, and can get past unpredictable line length limits for any machines along the way. This also allows you to validate the incoming message body against a parser, reject dangerous executions, and help users fix poorly formed requests. 

3) optimize the request pattern to a) accept a BODY (via POST), b) create a server-side query record that can be replayed via a shortcut URL, and c) build clients that keep track of their own shortcut URLs and execute queries using GET /queries/{shortcut-id}. This allows you to re-use queries, cache the results (as you're now using GETs), gives you the chance to monitor usage of various queries, etc.
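The POST-then-GET pattern in point 3 might look roughly like this (the in-memory store, route shapes, and validation rule are assumptions for the sketch, not a prescribed design):

```python
import uuid

# In-memory dict standing in for a persistent server-side query table.
_queries = {}

def post_query(body):
    """POST /queries: validate and store the query, return a shortcut URL."""
    if not isinstance(body, dict) or "filter" not in body:
        raise ValueError("400: query body must contain a 'filter'")
    qid = uuid.uuid4().hex[:8]
    _queries[qid] = body
    return {"location": f"/queries/{qid}"}

def get_query(qid):
    """GET /queries/{id}: replay the stored query; GET responses are cacheable."""
    if qid not in _queries:
        raise LookupError("404: unknown query id")
    return _queries[qid]  # in a real API: execute the query and return results
```

Clients keep the shortcut URL and replay it with plain GETs, which gives you reuse, caching, and per-query usage monitoring.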

FINAL COMMENT
My final advice: don't do it. Save yourself. Run away.

Cheers.

Serge Khaladzinski

Nov 17, 2013, 2:32:47 PM
to api-...@googlegroups.com
Thanks for the info.

Do you suggest any alternatives to collection filters? One approach I can see is to create verbs for each allowed action, but that will become a mess pretty fast.

It also feels wrong to always fetch all items in the collection and filter on the client: slow, and a lot of load for nothing.

I think every API should have filter support to some extent.



--
You received this message because you are subscribed to a topic in the Google Groups "API Craft" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/api-craft/yA0pboBEfPY/unsubscribe.
To unsubscribe from this group and all its topics, send an email to api-craft+...@googlegroups.com.



--
Cheers,

Sergey Khaladzinski

mca

Nov 17, 2013, 2:41:06 PM
to api-...@googlegroups.com
simple server-side filters can be supported via the {name}={value} (Name/Value Pairs) pattern, either in the URI or the BODY. You can also look into using Lucene query filters (not as simple/URI-friendly as the NVP pattern, but well-supported).

the key is to make sure all the server-side queries are cached as unique responses so that intermediaries don't get confused and you can still scale your back-end.

filtering (SQL WHERE) is a challenge, but much easier than field-projection (SELECT {fieldname1}, {fieldname2}, etc.).
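A sketch of the NVP pattern described above, with a field whitelist so unknown parameters fail loudly rather than being silently ignored (the field names are hypothetical):

```python
# Name/value-pair filtering: every recognized parameter is an equality match.
ALLOWED_FIELDS = {"status", "uid", "category"}  # assumed field names

def apply_nvp_filters(items, params):
    """Keep only items matching every {name}={value} pair in params."""
    unknown = set(params) - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"400: unsupported filter(s): {sorted(unknown)}")
    return [
        item for item in items
        if all(str(item.get(name)) == value for name, value in params.items())
    ]
```

Rejecting unknown names keeps each distinct query cacheable as a unique, well-defined response.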


Cheers.

Serge Khaladzinski

Nov 17, 2013, 7:31:49 PM
to api-...@googlegroups.com
I think I'm curious where that "simple filter" border is: what's simple and what's not. I was thinking of implementing OData $filter (logical operators only), probably $expand (to query linked resources), and $select to get just individual fields.

The only thing I don't really like is their space-separated syntax, like "/Products?$filter=Price gt 20" or "/Products?$filter=Price le 200 and Price gt 3.5".

What other options exist for encoding logical operators in the URL?

The other question I have is about $expand and getting linked resources within a single request. Do I need to support every possible linked-resource combination in my backend (API) controller? It feels like there will be a lot of duplication, since you can traverse collections from any direction.

Expand example: Customers?$expand=Orders (get customers + their orders)

The request will hit the customers controller on the API side, and I'll need logic like "if $expand contains Orders, then construct a JOIN on that table". Lots of if statements like this for every possible relation (plus if we add permissions to the mix..).
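One way to avoid that pile of if statements is a per-relation registry of expansion handlers: each relation registers how to fetch its related rows, and the controller just looks them up. A sketch only; the `Orders` loader here is a stand-in for a real JOIN or second query:

```python
# Registry of expansion handlers keyed by relation name.
EXPANDERS = {}

def expander(rel):
    """Decorator registering a loader for one $expand relation."""
    def register(fn):
        EXPANDERS[rel] = fn
        return fn
    return register

@expander("Orders")
def fetch_orders(customer):
    # stand-in for a JOIN or a second query against the orders table
    return customer.get("_orders", [])

def expand(resource, rels):
    """Attach each requested relation, rejecting ones nobody registered."""
    out = dict(resource)
    for rel in rels:
        if rel not in EXPANDERS:
            raise ValueError(f"400: cannot expand {rel!r}")
        out[rel] = EXPANDERS[rel](resource)
    return out
```

Permission checks can then live inside each handler instead of in every controller.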

mca

Nov 17, 2013, 9:19:50 PM
to api-...@googlegroups.com
yeah simple is a vague term. you'll need to decide how much risk you want to assume here.

most of my work involves exposing domain-agnostic filtering such that {field}={value}&{field}={value}&... is all that is needed. no gt, lt, eq, no joins, field selection, etc. that's my definition of "simple" - YMMV.

these are all matters of design and degree, not right/wrong. my advice to you, if we were working on the same team, would be to just start doing the work and see how much work is involved and how much risk you accrue in the process.

Cheers.

Pedro Santos

Nov 18, 2013, 1:39:59 PM
to api-...@googlegroups.com
Regarding the "expand" issue, there are several implementation possibilities (depending on your requirements and internal architecture)... for example:

1) You could, as you suggested, do the expansion right at the persistence layer (using joins & whatnot)
2) You could do the expansion at the business/application level (by executing the logic associated with the expansion and just "injecting" the result into the response for the original request)
3) You could set up a "proxy" in front of your API that would process the "expansions", effectively making N+1 requests to your API (for N expansions in a request)

All of these have pros & cons; that is to say, they need to be evaluated within the context of your usage scenarios.

Kevin Swiber

Nov 18, 2013, 11:46:02 PM
to api-...@googlegroups.com
Hello.

I've actually spent some time implementing a SQL subset over an API.  Here are some pointers based on my experience.

  1. Make no mistake.  You are building a full query engine on top of an API.  This means parsing, SQL injection protection, field mapping, unbounded result sets, and indexing are yours to solve.
  2. Opt for single collection querying at first.  Handling JOIN logic gets messy.  If you can get by without it, do so.
  3. Offering rich projection/filtering/sorting capabilities means your API is likely designed for broadly defined use cases.  This isn't bad, necessarily.  Just know you're building an API that resembles a data model, which leads to the next point...
  4. Remember: This is still an API model, not a data model.  Most of the time, you don't want to spew the raw data model in responses.  Design the API from a business domain point of view.  Use field mapping as necessary, and don't include fields that don't make sense for the API.
  5. Constrain your SQL support.  I support only reads, not writes, using SQL.  And the available operations are much more constrained than any of the SQL specs.
  6. Paging is a mandatory feature.
  7. Abstract all the hard technical bits from the rest of your API.  Not only will it become reusable, but your actual API code will become more readable.
  8. If you support ad-hoc queries, not just a stored procedure model, _do_ send SQL as a query parameter.  Support a Unicode notation if necessary (e.g. \u025B).  If using a form-like search control, there are tricks for submitting character encoding support.
  9. Know that this impacts developers using curl.  You may want to offer a Web-based explorer for your API.
  10. Keep version iteration in consideration.  Field mapping is your friend.
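Point 5 above (constraining SQL to reads) can start with a coarse gate before anything reaches a real parser. The rules below are illustrative only; they are not what Calypso does, and a production API would back them with a proper grammar:

```python
import re

# Reject anything that is not a single read-only statement. This is a
# first-pass gate, not a substitute for real parsing and parameterization.
FORBIDDEN = re.compile(r"\b(insert|update|delete|drop|alter|create|join)\b|;", re.I)

def check_query(sql):
    """Allow only a constrained SELECT; raise on anything else."""
    if not re.match(r"(?is)^\s*select\b", sql):
        raise ValueError("400: only SELECT is supported")
    if FORBIDDEN.search(sql):
        raise ValueError("400: statement uses an unsupported feature")
    return sql
```

Queries that pass the gate would still go through the real parser, field mapping, and mandatory paging.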

I've done some work related to this, though it is ongoing.  The SQL subset is currently defined in a project named Calypso, and the documentation can be found here: https://github.com/kevinswiber/calypso#calypso-query-language

Presto, a new project by Facebook, also has a Java-based SQL parser: https://github.com/facebook/presto/tree/master/presto-parser/src/main/java/com/facebook/presto/sql

Best of luck,

Kevin




--
Kevin Swiber
Projects: https://github.com/kevinswiber
Twitter: @kevinswiber

Serge Khaladzinski

Nov 22, 2013, 10:49:41 AM
to api-...@googlegroups.com

Very useful $expand info! While doing the expand implementation, it seems like it makes sense only for one-to-one mappings, i.e. User to UserProfile, where you can get both in the same request.

If you want to expand a collection (one-to-many), for example /User/1?$expand=Orders, then how many items do you return for the expanded collection? What happens if you need to sort Users and Orders separately, or return a different number of items for the nested expanded collections?

It really feels like expand is a great choice for one-to-one, so we decided to support only that.

Any thoughts?

Pedro Santos

Nov 22, 2013, 11:21:11 AM
to api-...@googlegroups.com
In our case (an API of a product at the company I work for), we use the format ?expand={link-rel}|{limit} when expanding collections, up to a maximum limit. However, this only gives control over how many initial entries are expanded; it doesn't allow the client to specify a sort parameter or an offset. The way I see it, expanding a collection link rel is like seeing a "preview" of the collection. To properly navigate the collection we already have the resource that represents the collection itself. However, I guess you could define a format that would also let you specify an offset and/or sort order in the same manner.
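That `?expand={link-rel}|{limit}` format is easy to parse; a sketch, with an assumed default preview size when the limit part is omitted:

```python
# Split "orders|3" into the link rel and a preview-size limit.
DEFAULT_PREVIEW = 5  # assumed default when no "|{limit}" is given

def parse_expand(value):
    rel, _, limit = value.partition("|")
    return rel, int(limit) if limit else DEFAULT_PREVIEW
```

Adding offset or sort order would just mean more `|`-separated segments in the same scheme.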

Peter Monks

Nov 22, 2013, 4:00:47 PM
to api-...@googlegroups.com
G'day,

There was an earlier discussion [1] that touched on a similar topic - it might be worth perusing.

Ralf Handl

Nov 25, 2013, 4:13:11 AM
to api-...@googlegroups.com
OData 4.0 allows nesting of system query options within the $expand option, so you can

GET /User/1?$orderby=Name&$expand=Orders($orderby=TotalAmount desc)

This also allows applying different $filters to the base collection and the expanded collection(s), or different $top values, see http://docs.oasis-open.org/odata/odata/v4.0/csprd03/part2-url-conventions/odata-v4.0-csprd03-part2-url-conventions.html#_Toc369183247.

Serge Khaladzinski

Nov 27, 2013, 11:12:57 AM
to api-...@googlegroups.com
I was wondering if there is a good OData syntax parser for PHP. Just the parser. I saw their repository with tons of classes and not much documentation; it's really hard to figure out how the whole thing works.

I imagine it working this way: pass the URL that arrived at the API endpoint (as a param), and the parser returns a structure that can later be used to get the actual results (from a relational DB, NoSQL, basically any data store).
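For flavor, here is the "URL in, structure out" idea for a tiny slice of the $filter grammar (flat comparisons joined by `and` only; a real OData parser handles grouping, `or`, functions, and much more):

```python
# Toy $filter translator: only "field op value [and field op value ...]".
OPS = {"eq": "=", "gt": ">", "ge": ">=", "lt": "<", "le": "<="}

def parse_filter(expr):
    """Turn e.g. 'Price le 200 and Price gt 3.5' into a store-agnostic AST."""
    clauses = []
    for part in expr.split(" and "):
        field, op, value = part.split()
        clauses.append((field, OPS[op], float(value)))
    return {"and": clauses}
```

The returned structure is the hand-off point: a SQL backend turns it into a WHERE clause, a NoSQL backend into its own query object.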





--
Cheers,

Sergey Khaladzinski