In one way or another, this does seem to be becoming an FAQ. It's a
perfectly reasonable question, so let me try and frame an answer to
it.
Sunburnt was written primarily to support programmatic generation of
Solr queries. My queries are typically a few hundred characters long
and involve making selections on 4 or 5 different fields
simultaneously with various sorts of boolean logic.
For example, my data has tags, which are also represented in Solr;
users are able to search using both tags and free text queries. My
data also has date attributes, and users can search for data newer
than or older than a certain date (or range of dates). Some of my data
is private, and only certain users should be able to see search
results containing that data.
It's really quite hard to construct an appropriate query string which
ANDs or ORs all of the above parameters, all within the confines of
the Lucene query language, including proper escaping for any special
characters.
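To give a feel for the escaping part of that problem, here is a minimal sketch in plain Python. It is not sunburnt's own code; the helper names `escape_lucene` and `term_clause` are made up for illustration, and the character list is the usual set of Lucene query-syntax specials.

```python
import re

# Characters with special meaning in the Lucene query syntax.
# "&&" and "||" are covered because each "&" and "|" is escaped on its own.
_LUCENE_SPECIAL = re.compile(r'([+\-&|!(){}\[\]^"~*?:\\/])')


def escape_lucene(text):
    """Backslash-escape Lucene special characters in user-supplied text."""
    return _LUCENE_SPECIAL.sub(r"\\\1", text)


def term_clause(field, value):
    """Build a field:value clause, quoting the value if it contains whitespace."""
    escaped = escape_lucene(value)
    if re.search(r"\s", value):
        escaped = '"%s"' % escaped
    return "%s:%s" % (field, escaped)


# Hypothetical field names, ANDed together by hand.
clauses = [term_clause("tag", "world bank"), term_clause("private", "false")]
query_string = " AND ".join(clauses)
```

Even this small version has to care about quoting and operator placement, which is the bookkeeping a query-building API is meant to take off your hands.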
I certainly don't want to make my users have to understand all the odd
corners of that query syntax. Furthermore, I don't want my users to
have to understand the way I've constructed my data model in order to
construct the query they need.
It seems much nicer to expose various controls on the search
interface, so that the only text input the user has is a free text
string. Then they can click on tags, make appropriate choices of date
ranges etc. The construction of the query string can then be done
server side, by pulling together all the options the user has chosen,
and using the sunburnt API to generate the final query string. For
example:
SI.query(user_input_string).query(tag=tag1).query(tag=tag2).query(private=False).query(date__gt="2011-01-01")
and because each portion of the query parameter is chainable, I can
modularize the code for dealing with multiple options. For an example
of this in action, see:
http://timetric.com/tags/timetric%253Adataset%253Dworldbank/?q=wood
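The modularization that chaining allows can be sketched without a Solr server. `ToyQuery` below is a hypothetical stand-in written for this illustration, not sunburnt's query class, and it does no escaping; the helper functions and field names are likewise assumptions.

```python
class ToyQuery(object):
    """Hypothetical stand-in for a chainable query object (not sunburnt's class)."""

    def __init__(self, clauses=()):
        self.clauses = tuple(clauses)

    def query(self, *terms, **fields):
        # Return a new object each time so partial queries can be reused safely.
        new = list(self.clauses)
        new.extend(terms)
        new.extend("%s:%s" % (k, v) for k, v in sorted(fields.items()))
        return ToyQuery(new)

    def __str__(self):
        return " AND ".join(self.clauses)


def with_tags(q, tags):
    """Add one clause per tag the user clicked on."""
    for tag in tags:
        q = q.query(tag=tag)
    return q


def visible_to(q, user_is_staff):
    """Non-staff users only ever see public data."""
    return q if user_is_staff else q.query(private="false")


def build_query(text, tags, user_is_staff):
    q = ToyQuery().query(text)
    q = with_tags(q, tags)
    return visible_to(q, user_is_staff)
```

Each search option gets its own small function that takes a query and returns a query, so options can be added or removed without touching the others.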
Now, as you (and others) have pointed out, this means that if you want
to expose *limited* amounts of further control over the query string
direct to the user, then it puts an extra load on the developer. You -
as developer - have to parse out any AND, OR, and NOTs that you want
to expose to the user and then put them back in again.
(I'm assuming - correct me if I'm wrong - that what you're really
after is basically free text search on the default field, but with
AND, OR, and NOT included. You're not interested in having the user
control any non-default query attributes.)
From my perspective, and for my use case, what I'm doing is not
unreasonable. I need to add extra ANDs and ORs to the query string
anyway, for all the extra attributes. And in order for sunburnt to be
able to do that, I need to pick apart the query string, otherwise I
can't (re-)construct the combined query correctly. So Timetric has
code which does that.
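As a rough idea of what "picking apart" a user's query string involves, here is a minimal tokenizer sketch. It is not the Timetric code; the name `tokenize` and the token kinds are assumptions, and it produces a flat list with no handling of parentheses or operator precedence.

```python
import re

# A token is either a double-quoted phrase or a run of non-whitespace.
_TOKEN = re.compile(r'"[^"]*"|\S+')
_OPERATORS = {"AND", "OR", "NOT"}


def tokenize(user_input):
    """Split user input into ("op" | "phrase" | "term", text) pairs."""
    tokens = []
    for raw in _TOKEN.findall(user_input):
        if raw in _OPERATORS:
            tokens.append(("op", raw))
        elif len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
            tokens.append(("phrase", raw[1:-1]))
        else:
            tokens.append(("term", raw))
    return tokens
```

Once the input is in this form, the terms and phrases can be fed to the query API one by one, and the operators decide how the resulting pieces are combined.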
I do see that if you're working on a simpler project, where all you're
after is a straightforward free text search, no extra attributes, then
it looks like overkill having to parse the query yourself, only for
sunburnt to put it back together again.
I'd argue that it gives you more freedom going forward, since you're
now in a position to start exposing queries along other dimensions by
controlling the query parameter through sunburnt. Of course, if you're
never going to do that, that's irrelevant!
I'd also argue that you may not want to give that full power to
the user in any case - I don't necessarily want to let the user make
all of the queries that I (or my backend) can make. But that may not
apply to everyone.
Something I've been meaning to do for a while is add my query-parsing
code into sunburnt, because it might make people's lives easier in
overcoming this problem. It still wouldn't pass the query string
straight through though.
What you're asking for - passing the query string straight through -
isn't possible currently, though it could be made possible. If the
query string were taken straight from the user, you'd have to stop
people from chaining extra queries on, because that wouldn't make
sense any more. You might also want to do some sanity checking of the
user-supplied query, in case you want to prevent any queries on
anything other than the default text field, or present likely
corrections to the user (e.g. misspellings of "AND").
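A sketch of the kind of sanity checking meant here, using only the standard library. The function name `check_tokens`, the rule that any `word:` prefix counts as a field query, and the 0.6 similarity cutoff are all assumptions made for illustration.

```python
import difflib
import re

OPERATORS = ["AND", "OR", "NOT"]
_FIELD_PREFIX = re.compile(r"^\w+:")


def check_tokens(words):
    """Return (problems, suggestions) for whitespace-split user input."""
    problems, suggestions = [], []
    for word in words:
        if _FIELD_PREFIX.match(word):
            # The user is trying to query a specific field directly.
            problems.append("field query not allowed: %s" % word)
        elif word.isupper() and word not in OPERATORS:
            # An all-caps word that is nearly an operator is probably a typo.
            close = difflib.get_close_matches(word, OPERATORS, n=1, cutoff=0.6)
            if close:
                suggestions.append((word, close[0]))
    return problems, suggestions


problems, suggestions = check_tokens("wood ADN tag:private".split())
```

Here `problems` flags the `tag:private` field query and `suggestions` pairs "ADN" with "AND", which could be shown to the user as a "did you mean" prompt.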
I won't have time for doing it, but I'd be happy to help anyone who
wanted to add a feature like that.
Anyway - I hope that helps answer your question. Sorry for the length!
Toby
Mike,
Firstly, you can replicate a .complete_query() call very simply, just by doing:
results = si.search(q="my dismax search terms")
(assuming that the /select/ endpoint is configured with a dismax parser)
However, rather than formalizing that pattern in the API straightaway,
can I ask what is probably a very naive question - why do you want to
do this in sunburnt?
What I mean is, the biggest reason I wrote sunburnt (rather than reuse
an existing solr/python library) was precisely in order not to have to
create the query string (or other parameters) by hand; much of the
rest of the code is just scaffolding around that functionality. If
all you need to do is pass a search string straight through to Solr, I
could argue that you don't really need a library to do it; Solr is
such a well-written service you can just call the HTTP endpoint
directly, maybe using Solr's own Python output format
(http://wiki.apache.org/solr/SolPython).
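For comparison, calling the endpoint directly without any library might look roughly like this. The base URL is an assumption about a local install, the function names are made up, and the sketch asks for JSON (`wt=json`) rather than the Python output format (`wt=python`) so the response can be parsed with the standard `json` module.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_select_url(base_url, user_query, rows=10, wt="json"):
    """Build a /select URL that passes the user's query string through as-is."""
    params = [("q", user_query), ("rows", rows), ("wt", wt)]
    return "%s/select?%s" % (base_url.rstrip("/"), urlencode(params))


def search(base_url, user_query, **kwargs):
    """Fetch and decode the response (needs a running Solr instance)."""
    with urlopen(build_select_url(base_url, user_query, **kwargs)) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Assumed local Solr location; adjust to your own setup.
url = build_select_url("http://localhost:8983/solr/", "foo -baz")
```

This covers the pass-through case, but leaves result parsing, faceting parameters and connection handling to you.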
Now I'm guessing that sunburnt is actually offering you other things
that you want, but it would be really helpful to know what they are.
The better we understand your use case, the better we can ensure that
any API for direct querying interacts well with other parts of
sunburnt that interest you. I can imagine it might be that you
particularly like sunburnt's parsed results object, or you like the
way that other (non-query) parameters are specified; or maybe it's
something else.
Either way - it would be very helpful to understand why you want to do
this *in sunburnt*.
Toby
On 27 November 2011 18:24, Michael Lissner
> firstly, you can replicate a .complete_query() call very simply, just by doing:
> results = si.search(q="my dismax search terms")
Oh! That's so obvious and great. So I can pass things like the
Solr proximity queries ("cat dog"~3), nested queries ("((foo not bar)
or baz) and bongo") and negation ("foo -baz") and all the other stuff
through this syntax? If so, that's really great.
> (assuming that the /select/ endpoint is configured with a dismax parser)
Well, the edismax, but yeah.
> can I ask what is probably a very naive question - why do you want to
> do this in sunburnt?
The site I'm building is used primarily by lawyers, who are
ridiculously well-trained searchers - you should see my query logs.
They literally take courses and trainings on how to search, so I
expect them to do some strange queries.
As for why I'm using Sunburnt, essentially it rounds out a lot of the
complexity that I'd have to figure out myself otherwise. Things like
the ones you mentioned: chaining filters, building Python objects, getting
facet counts in handy dictionaries, handling connections and common
cases. There's also the benefit of the community that's here. I
should mention also that I came here after giving up on Haystack. It
got my attention at first, but it drove me nuts with things it was
doing magically (like imports), so I wanted something lighter-weight.
Maybe I need something even lighter-weight, but Sunburnt seems
like a good compromise between being able to access Solr directly if I
need to, and having convenient APIs most of the time.
Thanks for asking about the use case. It's good to know you're
thinking about how people are using Sunburnt.
Mike
It looks like the search function returns a SolrResponse object, which
means that it can't be chained to other functions such as the facet_by
function.
Unless there's a way to chain the facet_by and search functions
together, I don't think I can use the search function, because
otherwise I might be passing a self-parsed query to facet_by and an
unparsed one to search, potentially creating different results.
Or am I missing something? Lots of questions tonight, sorry about
that.
Mike
On Nov 27, 12:34 pm, Michael Lissner <mliss...@michaeljaylissner.com>
wrote:
> ...