Issue1: search API


Piotr Husiatyński

Jan 2, 2010, 5:03:19 AM
to aur2-dev
I'm trying to solve issue 1 from GitHub
(http://github.com/sebnow/aur2/issues#issue/1). Here's some prototype
code using the Django Piston application:
http://github.com/husio/aur2/commits/why_not_piston

And here are some thoughts of mine:
* we need a better specification for this API
* supporting more than one method to do the same thing is bad
* piston is not good enough for what we want to write
* an indexer would be nice. Maybe Xapian?

Is the GitHub issue tracker the official project tracker?

Sebastian Nowicki

Jan 2, 2010, 6:01:55 AM
to aur2...@googlegroups.com
Hi,

On 02/01/2010, at 6:03 PM, Piotr Husiatyński wrote:

> I'm trying to solve issue 1 from GitHub
> (http://github.com/sebnow/aur2/issues#issue/1). Here's some prototype
> code using the Django Piston application:
> http://github.com/husio/aur2/commits/why_not_piston

Chris expressed interest in it earlier, perhaps you should speak with
him to see if he made any progress.

> And here are some thoughts of mine:
> * we need a better specification for this API

The specification is still in the draft stage and can be altered by
anyone on the ArchWiki[1]. If you have any ideas feel free to discuss
or edit the wiki. I think it's quite comprehensive, but it was just a
brainstorm, and I haven't really tried to implement it yet. Flaws are
very likely.

> * supporting more than one method to do the same thing is bad

Where are multiple methods used to do the same thing? Do you mean the
API vs HTML? In that case I agree, and it would be interesting to
somehow get the API interface to also output HTML when used in an
appropriate context. Otherwise, could you please clarify?

> * piston is not good enough for what we want to write

It might not be. It does seem to be the most popular RESTful API
framework for Django though, so it was the first choice. Skimming the
documentation it seemed like it was flexible enough to basically do
anything. What problems are there with it exactly?

> * an indexer would be nice. Maybe Xapian?

I'm not familiar with Xapian, but I think it might be overkill. We
don't store a lot of text, so processing the little text (descriptions
are the longest, and they are <80 characters by convention) we have is
not a problem. It might be useful for convenience. If it's a
performance thing we would need to fill the database with lots of data
first, and then profile it. I might be missing the point of Xapian
completely though.

> Is the GitHub issue tracker the official project tracker?

No, but it can be used temporarily until AUR2 is used officially
(might take a while). There is currently no better place to hold
issues, and we need to track them. I think the GitHub API might allow
us to port these issues elsewhere in the future.

Nice to see more contributions!

[1]: http://wiki.archlinux.org/index.php/AUR_2#Draft

Piotr Husiatyński

Jan 2, 2010, 7:04:15 AM
to aur2-dev
On Jan 2, 12:01 pm, Sebastian Nowicki <seb...@gmail.com> wrote:
> Hi,
>
> On 02/01/2010, at 6:03 PM, Piotr Husiatyński wrote:
> > * supporting more than one method to do the same thing is bad
>
> Where are multiple methods used to do the same thing? Do you mean the  
> API vs HTML? In that case I agree, and it would be interesting to  
> somehow get the API interface to also output HTML when used in an  
> appropriate context. Otherwise, could you please clarify?

This one is related to the API draft: there's a version that uses GET
parameters and one that uses URL parsing plus GET parameters. I'll
write my own version and post it on the wiki :)

> > * piston is not good enough for what we want to write
>
> It might not be. It does seem to be the most popular RESTful API  
> framework for Django though, so it was the first choice. Skimming the  
> documentation it seemed like it was flexible enough to basically do  
> anything. What problems are there with it exactly?

I'm using it for the first time. I've read the docs and even tried to
make some changes to the code. But if we want to send values with
names other than the default model instance attributes, we need to
write a special emitter class or create a property alias for every
single attribute. Will the `api` support more functionality than
simple 'filter packages', 'get package info' and 'upload package'? If
not, it's very easy to write this without using piston.
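To illustrate the renaming problem without Piston, here is a minimal sketch, assuming a hypothetical `Package` record whose attribute names (`name`, `version`, `desc`, `maintainer`) differ from the keys the API should expose (`Name`, `Version`, `Description`, `Maintainer`). A single mapping table does the renaming in one place instead of one property alias per attribute; all names here are made up for illustration and are not taken from the aur2 code.

```python
import json
from dataclasses import dataclass


@dataclass
class Package:
    # Hypothetical stand-in for a Django model instance.
    name: str
    version: str
    desc: str
    maintainer: str


# Hypothetical mapping: model attribute name -> key exposed by the API.
FIELD_NAMES = {
    "name": "Name",
    "version": "Version",
    "desc": "Description",
    "maintainer": "Maintainer",
}


def serialize(obj, field_names=FIELD_NAMES):
    """Build a dict that uses the API's key names instead of attribute names."""
    return {api_key: getattr(obj, attr) for attr, api_key in field_names.items()}


def package_info_json(pkg):
    # In a Django view this string would become the body of an HttpResponse
    # with content_type="application/json".
    return json.dumps(serialize(pkg), sort_keys=True)


pkg = Package(name="foo", version="1.0-1", desc="An example package", maintainer="monty")
print(package_info_json(pkg))
```

Adding or renaming an exposed field then means editing one dictionary entry rather than writing an emitter class.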

> > * an indexer would be nice. Maybe Xapian?
>
> I'm not familiar with Xapian, but I think it might be overkill. We
> don't store a lot of text, so processing the little text (descriptions  
> are the longest, and they are <80 characters by convention) we have is  
> not a problem. It might be useful for convenience. If it's a  
> performance thing we would need to fill the database with lots of data  
> first, and then profile it. I might be missing the point of Xapian  
> completely though.

Indexing is not only about full-text search. Look at the aur
application's models. If we want to find a package by license name,
author name, or any other attribute of a related model, the SQL query
might be slow (and it would be nice to search all available fields by
default). And after fetching the package, we will once again need to
fetch all of its related objects. Using an indexer and memcached would
speed up both the search and the generation of results.

At work we use Solr for indexing, and except in a few places, we don't
use the database for searching, because it's too slow for complicated
queries.
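The idea of indexing related fields can be sketched in plain Python. In this toy version each package is stored once as a denormalized document (its own fields plus values copied from related objects such as licenses and maintainer), and an inverted index maps every `field:value` token to package IDs. A lookup is then a set intersection with no SQL join, and the stored document already carries the related data. This is an in-memory illustration with made-up field names, not Xapian's or Solr's API; real engines add tokenization, ranking and persistence on top of the same basic structure.

```python
from collections import defaultdict


class ToyIndex:
    """In-memory inverted index over denormalized package documents."""

    def __init__(self):
        self.docs = {}                     # package id -> stored document
        self.postings = defaultdict(set)   # "field:value" token -> package ids

    def add(self, pkg_id, doc):
        # Store the whole document so results need no follow-up queries.
        self.docs[pkg_id] = doc
        for field, value in doc.items():
            values = value if isinstance(value, (list, tuple)) else [value]
            for v in values:
                self.postings[f"{field}:{str(v).lower()}"].add(pkg_id)

    def search(self, **criteria):
        """Return stored documents matching all field=value criteria (AND)."""
        ids = None
        for field, value in criteria.items():
            hits = self.postings.get(f"{field}:{str(value).lower()}", set())
            ids = hits if ids is None else ids & hits
        return [self.docs[i] for i in sorted(ids or ())]


index = ToyIndex()
# Hypothetical documents: related values (licenses, maintainer) are copied in.
index.add(1, {"name": "foo", "licenses": ["GPL", "MIT"], "maintainer": "monty"})
index.add(2, {"name": "bar", "licenses": ["BSD"], "maintainer": "monty"})
index.add(3, {"name": "baz", "licenses": ["GPL"], "maintainer": "alice"})

for doc in index.search(maintainer="monty", licenses="GPL"):
    print(doc["name"])
```

The trade-off is that every change to a related object (say, a maintainer rename) must be re-indexed into each affected document.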

Sebastian Nowicki

Jan 2, 2010, 9:52:47 AM
to aur2...@googlegroups.com

On 02/01/2010, at 8:04 PM, Piotr Husiatyński wrote:

> This one is related to the API draft: there's a version that uses GET
> parameters and one that uses URL parsing plus GET parameters. I'll
> write my own version and post it on the wiki :)

You mean `GET /api/package/<pkgname>` and `GET /api/packages?query=pkgname`?
They serve completely different purposes. The former is to retrieve a
single package (no searching). The latter is a more complex query that
returns a list of matching packages. There is a little bit of overlap,
but not much. I'm interested to hear your ideas on making it more
consistent, though. It would be easy to make the first one just
`GET /api/packages?query=pkgname`, but I'd prefer to stay clear of
query strings. I'm not sure it would be possible to give the same
amount of flexibility by using "url parsing" for searching.
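The distinction between the two endpoints can be sketched as a tiny dispatcher: a path of the form `/api/package/<pkgname>` retrieves exactly one package by name, while `/api/packages` reads its criteria from the query string and returns a list. The in-memory `PACKAGES` data, the `(status, body)` return shape and the substring-matching rule are assumptions for illustration only; in Django the same split would normally be two URL patterns mapped to two views.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Hypothetical package data standing in for the database.
PACKAGES = {
    "yaourt": {"name": "yaourt", "description": "A pacman wrapper"},
    "yaourt-git": {"name": "yaourt-git", "description": "A pacman wrapper (git)"},
    "foobar": {"name": "foobar", "description": "Example package"},
}

SINGLE = re.compile(r"^/api/package/(?P<pkgname>[^/]+)/?$")
SEARCH = re.compile(r"^/api/packages/?$")


def dispatch(url):
    """Return (status, body) for a GET request to the given URL."""
    parts = urlsplit(url)
    match = SINGLE.match(parts.path)
    if match:
        # Exact retrieval by name: no searching involved.
        pkg = PACKAGES.get(match.group("pkgname"))
        return (200, pkg) if pkg else (404, None)
    if SEARCH.match(parts.path):
        # Search: criteria come from the query string and may match many packages.
        query = parse_qs(parts.query).get("query", [""])[0]
        hits = [p for name, p in sorted(PACKAGES.items()) if query in name]
        return (200, hits)
    return (404, None)


print(dispatch("/api/package/yaourt"))
print(dispatch("/api/packages?query=yaourt"))
```

Note that the single-package route never falls back to searching: an unknown name is a 404, not an empty list.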


>
>>> * piston is not good enough for what we want to write
>>
>> It might not be. It does seem to be the most popular RESTful API
>> framework for Django though, so it was the first choice. Skimming the
>> documentation it seemed like it was flexible enough to basically do
>> anything. What problems are there with it exactly?
>
> I'm using it for the first time. I've read the docs and even tried to
> make some changes to the code. But if we want to send values with
> names other than the default model instance attributes, we need to
> write a special emitter class or create a property alias for every
> single attribute. Will the `api` support more functionality than
> simple 'filter packages', 'get package info' and 'upload package'? If
> not, it's very easy to write this without using piston.

I doubt it. Most of it will be basic CRUD, which Django Piston was
supposed to make trivial. The only feature that would deviate from it
would be the package search, as more complex queries need to be
executed.

Don't stick too close to the draft. It's still a draft and, as I
already stated, it's likely to be flawed. If you can think of a better
way to do something (even better key names), suggest it.

>
>>> * an indexer would be nice. Maybe Xapian?
>>
>> I'm not familiar with Xapian, but I think it might be overkill. We
>> don't store a lot of text, so processing the little text (descriptions
>> are the longest, and they are <80 characters by convention) we have is
>> not a problem. It might be useful for convenience. If it's a
>> performance thing we would need to fill the database with lots of data
>> first, and then profile it. I might be missing the point of Xapian
>> completely though.
>
> Indexing is not only about full-text search. Look at the aur
> application's models. If we want to find a package by license name,
> author name, or any other attribute of a related model, the SQL query
> might be slow (and it would be nice to search all available fields by
> default). And after fetching the package, we will once again need to
> fetch all of its related objects. Using an indexer and memcached would
> speed up both the search and the generation of results.
>
> At work we use Solr for indexing, and except in a few places, we don't
> use the database for searching, because it's too slow for complicated
> queries.

Perhaps it would be useful. I'll look into it further, but for now
using Django queries should be good enough. I'd prefer to get a nice
base completed and released and then add fancy features, otherwise
we'll never get a release out. This is why I said the API should at
least rival the current AUR API. Implementing the ideal API can come
later, especially the package searching stuff, as that's the most
complex.

On a side note, maybe

Piotr Husiatyński

Jan 2, 2010, 11:51:05 AM
to aur2-dev
On Jan 2, 3:52 pm, Sebastian Nowicki <seb...@gmail.com> wrote:
> You mean `GET /api/package/<pkgname>` and `GET /api/packages?query=pkgname`?
> They serve completely different purposes. The former is to retrieve a
> single package (no searching). The latter is a more complex query that
> returns a list of matching packages. There is a little bit of overlap,
> but not much. I'm interested to hear your ideas on making it more
> consistent, though. It would be easy to make the first one just
> `GET /api/packages?query=pkgname`, but I'd prefer to stay clear of
> query strings. I'm not sure it would be possible to give the same
> amount of flexibility by using "url parsing" for searching.

Now it makes sense ;)

It all depends on how complicated the queries can be. Should users be
able to choose whether query parameters are joined with AND or OR?
What about time ranges or basic regular expression support? None of
these features are required for the site to work, but it would be nice
to be able to add them in the near future without rewriting the API.
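One way to keep such features addable without an API rewrite is to treat every criterion as a predicate and let the caller choose how they are combined. A minimal sketch, assuming packages are plain dicts with hypothetical `name`, `maintainer` and `last_updated` keys: AND maps to `all`, OR maps to `any`, and a time range or a regular-expression match is just another predicate.

```python
import re
from datetime import date

# Hypothetical package records.
PACKAGES = [
    {"name": "foo-git", "maintainer": "monty", "last_updated": date(2009, 12, 30)},
    {"name": "bar", "maintainer": "alice", "last_updated": date(2009, 6, 1)},
    {"name": "baz-svn", "maintainer": "monty", "last_updated": date(2008, 1, 15)},
]


def field_equals(field, value):
    return lambda pkg: pkg[field] == value


def name_matches(pattern):
    regex = re.compile(pattern)
    return lambda pkg: regex.search(pkg["name"]) is not None


def updated_between(start, end):
    return lambda pkg: start <= pkg["last_updated"] <= end


def search(packages, predicates, mode="AND"):
    """Keep packages satisfying all (AND) or any (OR) of the predicates."""
    if mode not in ("AND", "OR"):
        raise ValueError(f"unknown mode: {mode}")
    combine = all if mode == "AND" else any
    return [pkg for pkg in packages if combine(p(pkg) for p in predicates)]


# VCS packages (name ends in -git or -svn) updated during 2009.
recent_vcs = search(PACKAGES, [name_matches(r"-(git|svn)$"),
                               updated_between(date(2009, 1, 1), date(2009, 12, 31))])
# Packages maintained by alice OR whose name starts with "baz".
either = search(PACKAGES, [field_equals("maintainer", "alice"),
                           name_matches(r"^baz")], mode="OR")
print([p["name"] for p in recent_vcs])
print([p["name"] for p in either])
```

With Django the same shape could be expressed by combining `Q` objects with `&` or `|`, so the database does the filtering instead of Python.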

Maybe the developers of the AUR helpers could write about what they
are missing in the current API, and if the requests aren't too
complicated to implement, we could add them to AUR2? I don't know if
such a topic already exists.

Of course, it all depends on how high the traffic will be. Maybe an
indexer won't be needed. But I understand that you want to focus on
pushing the code live.

Sebastian Nowicki

Jan 2, 2010, 1:07:50 PM
to aur2...@googlegroups.com

At the moment I just want to support basic text matching for the
package name/description and the various options supported by the web
frontend (repository, last updated, sort by, sort order, search by).
Later I would like to support Google-style queries such as "foobar
maintainer:monty sort-by:votes". These would be supported in the web
frontend search as well, getting rid of that huge search form. As
with Google, I think it would be nice to be able to use AND/OR in the
query string: "maintainer:foo OR maintainer:bar". I'm not entirely
sure how useful this flexibility would be, since searches have been
quite simple in my experience.
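A query like "foobar maintainer:monty sort-by:votes" could be split into free-text terms, `key:value` filters and options. The following is a minimal sketch under assumed rules that do not come from the AUR2 draft: tokens are whitespace-separated, `sort-by` is the only option key, a bare uppercase `OR` between two filters makes them alternatives, and everything else is ANDed. Sorting itself is left out.

```python
# Hypothetical: keys that set options rather than filter results.
OPTION_KEYS = {"sort-by"}


def parse_query(text):
    """Split a Google-style query into terms, filter groups and options.

    Filter groups are ANDed together; the (key, value) pairs inside one
    group, joined by a bare "OR" token, are alternatives.
    """
    terms, groups, options = [], [], {}
    join_next = False        # set by "OR" directly after a filter
    last_was_filter = False
    for token in text.split():
        if token == "OR":
            join_next = last_was_filter
            continue
        key, sep, value = token.partition(":")
        if not sep or not key:
            terms.append(token)
            last_was_filter = False
        elif key in OPTION_KEYS:
            options[key] = value
            last_was_filter = False
        elif join_next:
            groups[-1].append((key, value))
            last_was_filter = True
        else:
            groups.append([(key, value)])
            last_was_filter = True
        join_next = False
    return {"terms": terms, "filters": groups, "options": options}


def matches(pkg, parsed):
    """All terms must occur in name/description; each filter group needs one hit."""
    text = f"{pkg['name']} {pkg['description']}".lower()
    if not all(term.lower() in text for term in parsed["terms"]):
        return False
    return all(
        any(str(pkg.get(key, "")).lower() == value.lower() for key, value in group)
        for group in parsed["filters"]
    )


print(parse_query("foobar maintainer:monty sort-by:votes"))
print(parse_query("maintainer:foo OR maintainer:bar"))
```

Because the parsed form is plain data, the same parser could back both the web search box and the API.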

>
> Maybe the developers of the AUR helpers could write about what they
> are missing in the current API, and if the requests aren't too
> complicated to implement, we could add them to AUR2? I don't know if
> such a topic already exists.


I contacted the developer of yaourt previously, and amended the draft
to include his suggestions (I think all of them), but it would be nice
to contact others as well, especially since a bunch of them have
sprung up lately, and they use the RPC.
