filtering a collection


nickdu

Feb 26, 2015, 11:20:30 PM
to api-...@googlegroups.com
I see a recent question regarding filtering, but I think my question might be a bit different.

If I have a REST collection which I want to filter, I assume that after applying the filter I get back a REST collection.  Is that true?  And to that collection I may want to apply another filter, again getting back another collection.  As you can see, I'm talking about successive filtering to narrow down a search.  Is this something that has been done and, if so, would anyone care to share their solutions/ideas?  At the moment I can't seem to get my head around it.  I guess each filtered collection is the same collection resource as the original one, except for the one or more filters applied.  If I wanted to go down this route, it would seem I might have to make use of a request payload to store the aggregated filter.

Thanks,
Nick

Adrian Lynch

Feb 27, 2015, 4:00:46 AM
to api-...@googlegroups.com
Here are some thoughts which I'm hoping people will critique.

Create a filter resource under the collection you're attempting to filter (users in this case):

POST /users/filters + {status: "active"}
>> {id: 1, status: "active"}

Get the results of that filter:

GET /users/filters/1
>> [{an active user}, {an active user}, {an active user}]

My issue with the above GET is that it's the product of your filter and not the filter itself that's returned. Maybe the resource needs to be sub divided? {filter: {}, results: [{an active user}, ...]}

Now you have options.

Make it a client problem to manage filters and create new ones on top of new ones on top of new ones:

POST /users/filters + {status: "active", age: 50}
>> {id: 2, status: "active", age: 50}

GET /users/filters/2
>> [{an active user aged 50}, {an active user aged 50}]

Or if you're feeling brave, create sub filters:

POST /users/filters/1/filters + {age: 50}
>> {age: 50}

The above looks odd to me but it could work.

Another option, have filters as a regular resource:

POST /users/filters + {status: "active"}
>> {id: 1, status: "active"}

GET /users/filters/1
>> {id: 1, status: "active"}

PUT, DELETE, PATCH etc.

Then supply that as a field when getting users:

GET /users?filter=1

Or:

GET /users?filters=1,2,3

Which meets your requirement to layer filters on top of each other.

I think I like that last scenario the best.
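A minimal sketch of that last scenario, in Python. The in-memory store, the function names, the sample users and the rule that layered filters combine with AND are all assumptions made for illustration; nothing here is a real API:

```python
from itertools import count

# Hypothetical in-memory stores; a real service would persist these.
_filters = {}
_next_id = count(1)

USERS = [
    {"name": "ann", "status": "active", "age": 50},
    {"name": "bob", "status": "active", "age": 31},
    {"name": "cat", "status": "inactive", "age": 50},
]


def create_filter(criteria):
    """Models POST /users/filters: store the criteria, return the new resource."""
    filter_id = next(_next_id)
    _filters[filter_id] = dict(criteria)
    return {"id": filter_id, **criteria}


def get_users(filters_param=""):
    """Models GET /users?filters=1,2: a user must satisfy every listed filter."""
    selected = [_filters[int(part)] for part in filters_param.split(",") if part]
    return [
        user
        for user in USERS
        if all(user.get(k) == v for f in selected for k, v in f.items())
    ]


active = create_filter({"status": "active"})
aged_50 = create_filter({"age": 50})
layered = get_users(f"{active['id']},{aged_50['id']}")
```

Here `layered` holds only the users that pass both stored filters, which is the "filter on top of filter" behaviour asked for, without the client resending the criteria.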

Adrian


Paul Cohen

Feb 27, 2015, 5:38:01 AM
to api-...@googlegroups.com
Hi,

It's been a little time since I did any real API hacking but I'll give
some feedback. :-)

On Fri, Feb 27, 2015 at 10:00 AM, Adrian Lynch
<adrian...@concreteplatform.com> wrote:
> Create a filter resource under the collection you're attempting to filter
> (users in this case):

An interesting approach to filters. I'll comment on it further down.

I have usually used query parameters for filtering and using a
specifically named query parameter, e.g. "filter-by", and a few
syntactical conventions for specifying filtering criteria:

http://api.example.org/acollectionofstuff?filter-by=size=12,color=blue

This is to be understood as: select all items with size=12 and color=blue.

Refining the filter is simply achieved by adding more filter query
parameters, and using some simple syntax for supporting boolean
conditions:

http://api.example.org/acollectionofstuff?filter-by=size=~12,color=|blue;green

This is to be understood as: select all items with size != 12 and
(color=blue or color=green).

The filter parameter syntax could also include support for numeric
intervals and wild cards or regexps in string values.
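A rough sketch of how a server might parse that "filter-by" value, in Python. The function names and the decision to compare everything as strings are assumptions for illustration; only the "~" (not) and "|" with ";" (any of) conventions come from the examples above:

```python
def parse_filter_by(expr):
    """Parse 'size=~12,color=|blue;green' into (key, op, values) tuples."""
    conditions = []
    for part in expr.split(","):
        key, _, raw = part.partition("=")
        if raw.startswith("~"):
            conditions.append((key, "not", [raw[1:]]))
        elif raw.startswith("|"):
            conditions.append((key, "any", raw[1:].split(";")))
        else:
            conditions.append((key, "eq", [raw]))
    return conditions


def matches(item, conditions):
    """True when the item satisfies every condition (values compared as strings)."""
    for key, op, values in conditions:
        actual = str(item.get(key))
        if op == "not" and actual in values:
            return False
        if op in ("eq", "any") and actual not in values:
            return False
    return True


ITEMS = [
    {"size": 12, "color": "blue"},
    {"size": 10, "color": "green"},
    {"size": 10, "color": "red"},
]
conditions = parse_filter_by("size=~12,color=|blue;green")
selected = [item for item in ITEMS if matches(item, conditions)]
```

With the sample items, only the size-10 green item is selected: the first item fails the "not 12" condition and the last one fails the colour condition.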

Pros:
* Easy to understand - I think!
* Easy to test and play with in web browser.
* Caching (of filter and the resulting collection) on server side is
feasible for reasonably sized filtered collections (presumably
implemented with some time-to-live policy).
* Can easily be combined with support for sorting the filtered
collection, e.g. with a query parameter like "sort-by".

Cons:
* The query parameter "filter-by" and its filter-syntax is
"out-of-band" information. And so is "sort-by" as well.
* HTTP cache does not work in proxies.

As for Nick's question on where to do the filtering, once the
(representation of the) filtered collection is with the client, it
seems natural to me that the client does refined filtering on that
set, without involving the server.

Adrian's proposal of elevating filters to resources is interesting,
especially for cases with more complex filters.

Pros:
* No "out-of-band" information with specially named query parameters
or syntax.
* Can easily handle complex filters.
* Resulting filtered collections can also be provided as separate
resources (presumably implemented with some time-to-live policy).
* A more elaborate filter syntax can be developed than is possible
with query parameters.
* HTTP cache works in proxies.

Cons:
* Can't test and play with in web browser.
* Need for specifically implementing support for filter resources on
all collection resources for which you want to have filters. In
Adrian's proposal he has dedicated filters for "users".
* No support for sorting the resulting collection. This is okay if
we don't want the server to do the sorting.

Another approach with filters as resources would be to:

a) have them independent from specific collection resources and use a
generic filter syntax for the collection media types you support.

b) allow filters to be applied to any collection resource by a HTTP
header, e.g. filtering the collection resource "/users" with filter
"/filters/567":

GET /users
Headers:
X-filter-by: /filters/567

Filters referring to keys/attributes etc that don't exist in a
specific collection resource should simply be ignored when applied to
that collection resource (Postel's law).
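The "ignore what doesn't apply" rule could look roughly like this in Python. This is only a sketch: the function name is made up, and treating "keys that appear in any item" as the keys the collection knows about is an assumption (a real server would more likely consult a schema or media type definition):

```python
def apply_generic_filter(items, criteria):
    """Keep items matching every criterion whose key exists in this collection."""
    # Assumption: the collection "knows" a key if any of its items carries it.
    known_keys = set().union(*(item.keys() for item in items))
    relevant = {k: v for k, v in criteria.items() if k in known_keys}
    return [i for i in items if all(i.get(k) == v for k, v in relevant.items())]


# Hypothetical stored filter, e.g. what /filters/567 might hold.
FILTER_567 = {"status": "active", "colour": "blue"}

users = [
    {"name": "ann", "status": "active"},
    {"name": "bob", "status": "inactive"},
]
things = [
    {"name": "ball", "colour": "blue"},
    {"name": "cup", "colour": "red"},
]

filtered_users = apply_generic_filter(users, FILTER_567)    # "colour" is ignored
filtered_things = apply_generic_filter(things, FILTER_567)  # "status" is ignored
```

The same stored filter narrows both collections, each by the one criterion it understands.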

/Paul

--
Paul Cohen
www.seibostudios.se
mobile: +46 730 787 035
e-mail: paul....@seibostudios.se

Adrian Lynch

Feb 27, 2015, 6:10:26 AM
to api-...@googlegroups.com
I like the generic filter resource option!

POST /filters + {status: "active", colour: "blue"}
>> {id: 1, status: "active", colour: "blue"}

GET /users?filters=1
>> [{an active user}]

GET /things?filters=1
>> [{a blue thing}]

It's hard not to think about the code behind the API when designing, but the generic option would be super simple to do!

With the current API we're working on, we've shied away from headers but passing a URI for filters to be applied is interesting too.

Adrian

Paul Cohen

Feb 27, 2015, 7:00:17 AM
to api-...@googlegroups.com
Hi,

On Fri, Feb 27, 2015 at 12:10 PM, Adrian Lynch
<adrian...@concreteplatform.com> wrote:
> It's hard not to think about the code behind the API when designing, but the
> generic option would be super simple to do!

Cool.

> With the current API we're working on, we've shied away from headers

Ok. Why? Just curious.

> but passing a URI for filters to be applied is interesting too.

Yes, that's an option of course. But also requires "out-of-band" info
to clients and documentation for developers - just like my proposed
"filter-by" and "sort-by" convention. I don't see that as a major
drawback however. It's just good to be aware of trade-offs.

darrel...@gmail.com

Feb 27, 2015, 10:32:01 AM
to api-...@googlegroups.com
Before going ahead with designing a generic filter option, you might want to read this http://bizcoder.com/don-t-design-a-query-string-you-will-one-day-regret  The issues that I draw attention to in my post may not be applicable to your scenario, but I think it is at least worth being aware of the downsides of generic filter query strings before going ahead with it.

Darrel

Kijana Woodard

Feb 27, 2015, 1:12:49 PM
to api-...@googlegroups.com
Great post!

Paul Cohen

Feb 27, 2015, 1:31:03 PM
to api-...@googlegroups.com
Hi,

On Fri, Feb 27, 2015 at 4:28 PM, <darrel...@gmail.com> wrote:
> Before going ahead with designing a generic filter option, you might want to
> read this
> http://bizcoder.com/don-t-design-a-query-string-you-will-one-day-regret The
> issues that I draw attention to in my post may not be applicable to your
> scenario, but I think it is at least worth being aware of the downsides of
> generic filter query strings before going ahead with it.

Caveats are always good to point out. However in my use of query
parameters, e.g.

http://api.example.org/acollectionofstuff?filter-by=size=12,color=blue

the keys/attributes "size" and "color" are properties of the resource
(representation) in question and have nothing to do with the
persistence solution behind the API or how those resource
representations are generated. The keys/attributes are not generic -
but specific to the resource. I realize I may have given another
impression. A client should make no assumption of what persistence
solution is used by the implementation backend. It may be a relational
database system or a key/value database or a document-oriented
database or a completely new übercool hyperindexed petabyte-scale
persistence solution.

I feel the article a) assumes the use of a relational database system
(and its constraints) and b) basically says don't provide resources
(that can be dereferenced with query parameters) for which you can't
guarantee reasonable performance.

Chris Mullins

Feb 27, 2015, 1:56:16 PM
to api-...@googlegroups.com
Filtering a collection? Crazy talk. Who would ever have wanted to do that before? 

Seriously, though, the things to keep front in your mind:
  • Don't invent something new, unless you have genuinely new problems and/or requirements. 
A very thorough, possibly too thorough, set of collection filtering syntax and recommendations is found in OData V4. You can take a look through that, and at least start to get a gauge of the problem space. The relevant section is here. Choosing a relevant subset, yet sticking with the syntax, may be a valid option. For example, you may not need (or want) Custom Functions. 

If you go the OData style, there are numerous libraries that can help you implement this both client and server side. 

Cheers,
Chris

wiki1000

Feb 27, 2015, 4:02:40 PM
to api-...@googlegroups.com
I understand that you want to generate intermediate results in the form of lists of URIs.
This can be done with a special kind of filtering, which returns not only a list of URIs that is a subset of the list returned from a plain GET, but also a pointer to the dynamically generated set of URIs.

The response would be decorated with a Link header:
Link: <>; rel="xxx-query"; type="text/uri-list"; accept-post="application/xhtml+xml;profile=filter"

Dereferencing this link (dynamically generated) will give again the same  intermediate result.
But posting a filter to it will give another one.
In this way you can apply several filters successively.
Of course such links must be garbage collected, but a true web server shall be able to do that...

Another point is that the "filter" profile provided by a special POST here can be as simple as a URL-encoding of a few variables, or a full-fledged query language. The point is certainly that it has to be carefully tailored to each case.

Cheers.

darrel...@gmail.com

Feb 27, 2015, 4:15:57 PM
to api-...@googlegroups.com
Paul,

The post does make the assumption that you are using some kind of database to store your data.  I also make the presumption that the index problems of RDBMS that I am familiar with are not completely removed with key-value store databases.  Although, I would be happy to be proved wrong 😊.

If you are explicit in choosing what parts of your domain can be projected/sorted and filtered by your resources then you have a much better chance of managing the performance of your API.  My caveat was for those choosing the “easy” option of allowing anything and then living to regret that decision when their API is successful and they can no longer support the flexibility initially provided.

It is absolutely true that a client should not make any assumption about the persistence mechanism used behind an API.  However, this article was aimed primarily at API developers and the unfortunate reality is that we must consider performance characteristics when designing an API.  Persistence mechanisms have a significant impact on the performance of APIs no matter how much we wish they didn’t.

There are scenarios where it is not necessary to be able to provide performance guarantees on APIs.  Scenarios where the flexibility is far more important than the performance.  For those cases, my advice is not relevant.

Regards,

Darrel


Paul Cohen

Feb 27, 2015, 6:01:16 PM
to api-...@googlegroups.com
Hi,

On Fri, Feb 27, 2015 at 9:59 PM, <darrel...@gmail.com> wrote:
> It is absolutely true that a client should not make any assumption about the persistence mechanism used behind an API. However, this article was aimed primarily at API developers and the unfortunate reality is that we must consider performance characteristics when designing an API. Persistence mechanisms have a significant impact on the performance of APIs no matter how much we wish they didn’t.

I do agree. And your caveat is valid, as my proposal easily could be
interpreted as a general query mechanism and could lead not only to
poor performance, but in particular to a waste of time and resources
on features that no-one really wants.

One of the big challenges with building distributed web APIs is
figuring out how to best serve clients who have product features and
business processes that you don't know much about in advance.

Kijana Woodard

Feb 27, 2015, 6:10:52 PM
to api-...@googlegroups.com
"...is figuring out how to best serve clients who have product features and business processes that you don't know much about in advance."

Having them complain that "features are missing" is actually a good sign. They *want* to use your service.
Silence is death. :-]

Find some subset of clients and build something that really solves a problem for them.

darrel...@gmail.com

Feb 27, 2015, 8:04:32 PM
to api-...@googlegroups.com
Paul,

Your comment hits the nail on the head,

"...is figuring out how to best serve clients who have product features and business processes that you don't know much about in advance."
To continue on from what Kijana said: having customers ask for missing features is one great way of gathering requirements.  However, for this approach to be successful you need to be able to evolve your API rapidly without introducing breaking changes.  This needs to be planned for. 

Web APIs should start small and grow quickly in response to customer demand.

Darrel

nickdu

Feb 27, 2015, 8:24:11 PM
to api-...@googlegroups.com
Thanks for all the replies.  Let me add some more to this.  When I apply a filter it's likely that the results that match will still require pagination and thus I will still need the same behaviors of a collection, e.g. next/prev.  I'm also likely to want to filter again as the result set might be too large still.  Ideally I don't want to have to remember my current filter but instead just apply a new filter which will only apply to the already filtered list.  And so on and so on.  I don't want it to be like that game, I think it might have been called Simon, where I have to remember all the previous stuff I did in the past and add a new step.  I guess it's more like the "breadcrumbs", similar to how many sites do filtering.

While the approaches suggested thus far seem reasonable enough, I don't want the server to be holding state and potentially have to clean up or garbage collect filters.  Can't the filter come back as state in the representation and I require the client to post that on each next/prev request?  Or the server could copy the supplied filter into "current" query parameters and each time I applied a new filter it would add to those query parameters.
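To make the second idea concrete, here is a minimal Python sketch in which the URL itself carries the accumulated filter, so the server keeps no state. The function names and the "page" parameter are assumptions for illustration:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def refine(url, **new_filters):
    """Return a URL whose query holds the current filters plus the new ones."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    query.update({k: str(v) for k, v in new_filters.items()})
    query.pop("page", None)  # a narrower result set restarts at the first page
    return urlunsplit(parts._replace(query=urlencode(query)))


def page_link(url, page):
    """Return the same filtered collection at another page (for next/prev)."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    query["page"] = str(page)
    return urlunsplit(parts._replace(query=urlencode(query)))


current = "/users?status=active&page=3"
narrower = refine(current, age=50)
next_page = page_link(narrower, 2)
```

The client never has to remember earlier filters: it only follows or refines the link it was last given, and the next/prev links keep the same filter because they are built from the same URL.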

I don't think this assumes any DBMS on the backend, though there very well may be one.  Most likely if your lists are significant enough the data is stored in a db.  And if the data is not that significant it wouldn't be hard for the server to provide the filtering on the in-memory lists.  I haven't played with this yet, but .NET has LINQ, which I think provides SQL-like where, and I think orderby, clauses over an enumeration.

My initial thinking was that I could support three query variables: select, where, and orderby.  Each would act like their SQL counterparts.
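Over an in-memory list, that could be sketched in Python as below. This is only an illustration under simplifying assumptions: "where" is limited to equality tests passed as a dict, "orderby" takes a single field name, and the function name is made up:

```python
def query(items, select=None, where=None, orderby=None):
    """Apply where (equality only), then orderby, then select, like a tiny SQL."""
    rows = [i for i in items if all(i.get(k) == v for k, v in (where or {}).items())]
    if orderby:
        rows = sorted(rows, key=lambda row: row[orderby])
    if select:
        rows = [{field: row[field] for field in select} for row in rows]
    return rows


PEOPLE = [
    {"name": "bob", "age": 31, "status": "active"},
    {"name": "ann", "age": 50, "status": "active"},
    {"name": "cat", "age": 50, "status": "inactive"},
]

names = query(PEOPLE, select=["name"], where={"status": "active"}, orderby="name")
```

Ordering happens before the projection so that it is still possible to sort by a field that "select" then drops.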

While I don't want to overcomplicate things, I would also like them to behave accurately.  Do most pagination implementations work by simply asking for the next 'n' items?  If so, the problem with that is that if a new item was inserted in front of the items I'm already viewing, then I could see duplicates on the next page.  Worse, if items that I've already viewed have been deleted then I might skip over items that would have shown up on the next page.  Filtering and sorting just add a bit more complexity to this problem.
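The duplication problem can be reproduced in a few lines of Python. The second function shows keyset (cursor) paging, which nobody in this thread has proposed; it is included only as one commonly used way around the problem, and it assumes every item has a sortable, unique "id":

```python
def offset_page(items, offset, limit):
    """Offset paging: skip 'offset' rows, then return the next 'limit' rows."""
    return items[offset:offset + limit]


def keyset_page(items, after_id, limit):
    """Keyset paging: return the next rows whose id is greater than the last one seen."""
    ordered = sorted(items, key=lambda item: item["id"])
    return [item for item in ordered if item["id"] > after_id][:limit]


items = [{"id": n} for n in (10, 20, 30, 40)]
first = offset_page(items, 0, 2)              # ids 10 and 20
items.insert(0, {"id": 5})                    # a new item lands in front
second_by_offset = offset_page(items, 2, 2)   # ids 20 and 30: 20 shows up again
second_by_keyset = keyset_page(items, first[-1]["id"], 2)  # ids 30 and 40
```

With keyset paging the "position" is a value taken from the last item seen rather than a row count, so inserts or deletes earlier in the list do not shift the next page.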

Thanks,
Nick

nickdu

Feb 27, 2015, 8:37:23 PM
to api-...@googlegroups.com
Darrel,

I just read your post and I guess part of what I'm suggesting is what you're suggesting not to do.  I still have to think about that, and I understand the issue with allowing "ad-hoc" query capabilities.  However, there are still the questions of "successive" filtering, ideally without having the server hold the state, and of pagination.


Thanks,
Nick


wiki1000

Feb 28, 2015, 9:59:59 AM
to api-...@googlegroups.com
Hello,
I would not be as negative as most of the responses here.
In the context of API usage by an identified user, it is definitely a feature to be able to store
somewhere, provided it can be objectified behind a resource, something which may look like state.

I maintain that, perhaps protected behind an option expressed with a query parameter, it is a feature
to keep a query response behind a uri, including pagination if it is necessary.
This may be very useful in some cases to be determined.
