query about query strings

Aaron Morton

Oct 9, 2011, 10:37:16 PM
to Python Sunburnt
Sorry if this has been asked before, but I could not see it.

I'm intrigued by the choice of not passing the query string through to
solr as a query string http://opensource.timetric.com/sunburnt/queryingsolr.html#searching-your-solr-instance

I may be missing something, but it appears to make it harder to do a
simple search. Was the choice made to support a certain use case?

Is there a way to have the query string passed through to solr as the
q param? I'd like to limit the amount of parsing I need to do on the
user query.

Toby White

Oct 11, 2011, 9:10:22 AM
to python-...@googlegroups.com
Hi Aaron,

in one way or another, this does seem to be becoming an FAQ. It's a
perfectly reasonable question, so let me try and frame an answer to
it.

Sunburnt was written primarily to support programmatic generation of
solr queries. My queries typically are a few hundred characters long
and involve making selections on 4 or 5 different fields
simultaneously with various sorts of boolean logic.

For example, my data has tags, which are also represented in solr;
users are able to search using both tags and free text queries. My
data also has date attributes, and users can search for data newer
than or older than a certain date (or range of dates). Some of my data
is private, and only certain users should be able to see search
results containing that data.

It's really quite hard to construct an appropriate query string which
ANDs or ORs all of the above parameters, all within the confines of
the Lucene query language, including proper escaping for any special
characters.
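As a rough illustration of the escaping problem described above, here is a minimal sketch in plain Python. It does not use sunburnt; the helper names escape_term and and_clauses are hypothetical, and the list of special characters is an assumption based on the Lucene query syntax documentation that should be checked against your Solr version.

```python
import re

# Characters and operators with special meaning in the Lucene query syntax.
# Assumption: this set follows the Lucene docs; verify it for your version.
_LUCENE_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/]|&&|\|\|)')


def escape_term(text):
    """Backslash-escape Lucene special characters in user-supplied text."""
    return _LUCENE_SPECIAL.sub(r"\\\1", text)


def and_clauses(*clauses):
    """AND together already-escaped clauses, parenthesizing each one."""
    return " AND ".join("(%s)" % clause for clause in clauses)


# Hypothetical field names (tag, private) used only for illustration.
user_text = escape_term("wood (pulp): 50%")
query = and_clauses(user_text, "tag:%s" % escape_term("world-bank"), "private:false")
print(query)
```

Whitespace is left alone here, so multi-word input is still treated as several terms on the default field. Doing this by hand for every field and every combination is exactly the chore a query-building API is meant to remove.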

I certainly don't want to make my users have to understand all the odd
corners of that query syntax. Furthermore, I don't want my users to
have to understand the way I've constructed my data model in order to
construct the query they need.

It seems much nicer to expose various controls on the search
interface, so that the only text input the user has is a free text
string. Then they can click on tags, make appropriate choices of date
ranges etc. The construction of the query string can then be done
server side, by pulling together all the options the user has chosen,
and using the sunburnt API to generate the final query string. For
example:

SI.query(user_input_string).query(tag=tag1).query(tag=tag2).query(private=False).query(date__gt="2011-01-01")

and because each portion of the query parameter is chainable, I can
modularize the code for dealing with multiple options. For an example
of this in action, see:

http://timetric.com/tags/timetric%253Adataset%253Dworldbank/?q=wood
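To show why chainability helps with modular code, here is a small stand-in written in plain Python. The Query class below is hypothetical and is not sunburnt's implementation; it only mimics the chaining style so that each search option can live in its own function. It does no escaping, and the field names are assumptions.

```python
class Query(object):
    """Hypothetical immutable query object; NOT sunburnt's implementation."""

    def __init__(self, clauses=()):
        self.clauses = tuple(clauses)

    def query(self, text=None, **fields):
        """Return a new Query with the extra clauses ANDed on."""
        new = list(self.clauses)
        if text is not None:
            new.append(text)
        for name in sorted(fields):
            new.append("%s:%s" % (name, fields[name]))
        return Query(new)

    def __str__(self):
        return " AND ".join(self.clauses)


def with_tags(q, tags):
    """Restrict q to documents carrying every tag the user clicked on."""
    for tag in tags:
        q = q.query(tag=tag)
    return q


def public_only(q):
    """Hide private documents from users who may not see them."""
    return q.query(private="false")


q = public_only(with_tags(Query().query("wood"), ["worldbank", "forestry"]))
print(q)
```

Because every call returns a new query object, each option handler can be written and tested separately and then applied in any order.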

Now, as you (and others) have pointed out, this means that if you want
to expose *limited* amounts of further control over the query string
direct to the user, then it puts an extra load on the developer. You -
as developer - have to parse out any AND, OR, and NOTs that you want
to expose to the user and then put them back in again.

(I'm assuming - correct me if I'm wrong - that what you're really
after is basically free text search on the default field, but with
AND, OR, and NOT included. You're not interested in having the user
control any non-default query attributes)

From my perspective, and for my use case, what I'm doing is not
unreasonable. I need to add extra ANDs and ORs to the query string
anyway, for all the extra attributes. And in order for sunburnt to be
able to do that, I need to pick apart the query string, otherwise I
can't (re-)construct the combined query correctly. So Timetric has
code which does that.
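Timetric's parsing code is not shown in this thread. As a hypothetical minimal sketch of what "picking apart" a user query could look like, the function below splits input into terms, quoted phrases, and the upper-case operators AND, OR and NOT. The name tokenize and the token format are assumptions made for illustration.

```python
import re

# A quoted phrase, or otherwise a run of non-whitespace characters.
_TOKEN = re.compile(r'"[^"]*"|\S+')
_OPERATORS = {"AND", "OR", "NOT"}


def tokenize(user_query):
    """Split a user query into (kind, value) pairs.

    kind is "op" for AND/OR/NOT, "phrase" for quoted text, else "term".
    """
    tokens = []
    for raw in _TOKEN.findall(user_query):
        if raw in _OPERATORS:
            tokens.append(("op", raw))
        elif len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
            tokens.append(("phrase", raw[1:-1]))
        else:
            tokens.append(("term", raw))
    return tokens


print(tokenize('wood AND "pulp mill" NOT paper'))
```

The resulting tokens could then be fed back into a query builder one clause at a time. Grouping with parentheses and field prefixes are deliberately left out of this sketch.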

I do see that if you're working on a simpler project, where all you're
after is a straightforward free text search, no extra attributes, then
it looks like overkill having to parse the query yourself, only for
sunburnt to put it back together again.

I'd argue that it gives you more freedom going forward, since you're
now in a position to start exposing queries along other dimensions by
controlling the query parameter through sunburnt. Of course, if you're
never going to do that, that's irrelevant!

I'd also argue that I'm not sure you want to give that full power to
the user in any case - I don't necessarily want to let the user make
all of the queries that I (or my backend) can make. But that may not
apply to everyone.

Something I've been meaning to do for a while is add my query-parsing
code into sunburnt, because it might make people's lives easier in
overcoming this problem. It still wouldn't pass the query string
straight through though.

What you're asking for - passing the query string straight through -
isn't possible currently, though it could be made possible. If the
query string were taken straight from the user, you'd have to stop
people from chaining extra queries on, because that wouldn't make
sense any more. You might also want to do some sanity checking of the
user-supplied query in case you want to prevent any queries on
non-default fields, or present likely corrections to the user (e.g.
misspellings of "AND").
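The kind of sanity checking mentioned above might look something like the following sketch. It is plain Python with hypothetical names (check_user_query), and the rules are deliberately crude: anything of the form word: is treated as a field query, and a lower-case or mixed-case and/or/not is flagged as a possible mistyped operator.

```python
import re

_OPERATORS = ("AND", "OR", "NOT")


def check_user_query(text):
    """Return a list of human-readable problems found in a raw user query."""
    problems = []
    # Flag explicit field queries such as title:wood.
    for field in re.findall(r"(\w+):", text):
        problems.append("field query not allowed: %s" % field)
    # Flag operators that were probably meant to be upper case.
    for word in text.split():
        if word.upper() in _OPERATORS and word != word.upper():
            problems.append('did you mean "%s"?' % word.upper())
    return problems


print(check_user_query("title:wood and paper"))
```

This will produce false positives (a time such as 10:30, or a genuine lower-case "or" inside a sentence), so it is better suited to showing suggestions than to rejecting queries outright.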

I won't have time for doing it, but I'd be happy to help anyone who
wanted to add a feature like that.

Anyway - I hope that helps answer your question. Sorry for the length!

Toby

Aaron Morton

Oct 11, 2011, 7:29:40 PM
to Python Sunburnt
Thanks for the great explanation, you should put this on the wiki.

My use case is an in-house media management app, so we have a limited
set of users and we can train them. And they tend to be power users.

The first thing that stopped me was, like you say, the user entering
"foo bar baz": I wanted to pass this through to search on the default
field and see it search using OR. It would be nice if this use case
were simpler to implement.

I can see where you are coming from with query construction; it makes
me think of SQLAlchemy. And I could imagine it being handy for my case
if I presented the user with a "build a query" interface. Or (more
likely) if I wanted to parse the search text box and construct the
query.

IMHO the flexibility to pass through a query and use your query
construction API would be great. Let's say you could do these things:

* pass through a raw solr query string (e.g. q=foo;fq=bar)
* use the query builder to build and execute a query
* save the query created by the query builder for later use, either
as a solr query string and/or a structure that can be used to
rebuild the sunburnt query object model for editing.

Or maybe it's just the way the query("foo bar baz") works.

Thanks for the library, this time round I grabbed solrpy because
I needed to keep moving forward. But I'll try to check back later to
see how you are going.

Cheers

Michael Lissner

Nov 27, 2011, 1:24:32 PM
to Python Sunburnt
A couple thoughts:
1. This should be posted prominently in the documentation, just as a
warning. People that are new to Solr might not realize that a large
complicated piece of code is needed from them to be able to pass
queries through.
2. Has anybody built such a parser for Sunburnt yet? If so, I'd love
to see it. I'm surprised we don't have a sample of this yet.
3. Toby, if we passed the user's query through to the backend with
something like a .complete_query() function, would that create any
problems? Seems like an easy addition to allow Solr to do the heavy
lifting here.
4. I'm struggling to comprehend what's needed to do this myself. I
think I'd like to support everything that the edismax query parser
supports, but I'm guessing this is much more complicated than it seems
at first. And it feels like re-inventing the wheel, since the edismax
parser already does this.

Mike

Toby White

Nov 27, 2011, 3:02:24 PM
to python-...@googlegroups.com
Hi Michael,

firstly, you can replicate a .complete_query() call very simply, just by doing:

results = si.search(q="my dismax search terms")

(assuming that the /select/ endpoint is configured with a dismax parser)

However, rather than formalizing that pattern in the API straightaway,
can I ask what is probably a very naive question - why do you want to
do this in sunburnt?

What I mean is, the biggest reason I wrote sunburnt (rather than reuse
an existing solr/python library) was precisely in order not to have to
create the query string (or other parameters) by hand; much of the
rest of the code is just scaffolding around that functionality. If
all you need to do is pass a search string straight through to Solr, I
could argue that you don't really need a library to do it; Solr is
such a well-written service you can just call the HTTP endpoint
directly, maybe using solr's own python output format
(http://wiki.apache.org/solr/SolPython).
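For anyone wondering what calling the endpoint directly involves, here is a minimal sketch using only the Python standard library. It asks for Solr's JSON output (wt=json) rather than the python format mentioned above, simply because the json module can parse it safely. The base URL, the function names, and the sample response body are assumptions; the actual HTTP request (for example with urllib.request.urlopen) is left out so the sketch runs without a Solr server.

```python
import json
from urllib.parse import urlencode


def select_url(base_url, user_query, rows=10):
    """Build a /select URL that passes the user's query through unchanged."""
    params = {"q": user_query, "wt": "json", "rows": rows}
    return "%s/select?%s" % (base_url.rstrip("/"), urlencode(params))


def parse_docs(body):
    """Pull the list of matching documents out of a JSON response body."""
    return json.loads(body)["response"]["docs"]


# Assumed local Solr address; adjust for your own installation.
url = select_url("http://localhost:8983/solr", '"cat dog"~3')
print(url)

# A hand-written sample body in the shape of Solr's JSON response.
sample_body = '{"response": {"numFound": 1, "docs": [{"id": "1"}]}}'
print(parse_docs(sample_body))
```

urlencode takes care of the URL-level escaping (quotes, spaces), but the query itself reaches Solr exactly as the user typed it, so any Lucene-level mistakes are Solr's to report.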

Now I'm guessing that sunburnt is actually offering you other things
that you want, but it would be really helpful to know what they are.
The better we understand your use case, the better we can ensure that
any API for direct querying interacts well with other parts of
sunburnt that interest you. I can imagine it might be that you
particularly like sunburnt's parsed results object, or you like the
way that other (non-query) parameters are specified; or maybe it's
something else.

Either way - it would be very helpful to understand why you want to do
this *in sunburnt*.

Toby


Michael Lissner

Nov 27, 2011, 3:34:40 PM
to Python Sunburnt
Thanks for the reply. I really appreciate it.

> firstly, you can replicate a .complete_query() call very simply, just by doing:
> results = si.search(q="my dismax search terms")

Oh! That's so obvious and great. So I can pass through things like the
Solr proximity queries ("cat dog"~3), nested queries ("((foo not bar)
or baz) and bongo)") and negation ("foo -baz") and all the other stuff
through this syntax? If so, that's really great.

> (assuming that the /select/ endpoint is configured with a dismax parser)

Well, the edismax, but yeah.

> can I ask what is probably a very naive question - why do you want to
> do this in sunburnt?

The site I'm building is used primarily by lawyers, who are
ridiculously well-trained searchers - you should see my query logs.
They literally take courses and trainings on how to search, so I
expect them to do some strange queries.

As for why I'm using Sunburnt, essentially it rounds out a lot of the
complexity that I'd have to figure out myself otherwise. Things like
you mentioned: chaining filters, building python objects, getting
facet counts in handy dictionaries, handling connections and common
cases. There's also the benefit of the community that's here too. I
should mention also that I came here after giving up on Haystack. It
got my attention at first, but it drove me nuts with things it was
doing magically (like imports), so I wanted something lighter-weight.
Maybe I need something even lighter-weight, but Sunburnt seems
like a good compromise between being able to access Solr directly if I
need to, and having convenient APIs most of the time.

Thanks for asking about the use case. It's good to know you're
thinking about how people are using Sunburnt.

Mike

Michael Lissner

Nov 30, 2011, 2:26:09 AM
to Python Sunburnt
I looked into this today:
results = si.search(q="dismax search terms")

It looks like the search function returns a SolrResponse object, which
means that it can't be chained to other functions such as the facet_by
function.

Unless there's a way to chain the facet_by and search functions
together, I don't think I can use the search function, because
otherwise I might be passing a self-parsed query to facet_by and an
unparsed one to search, potentially creating different results.

Or am I missing something? Lots of questions tonight, sorry about
that.

Mike

Michael Lissner

Dec 23, 2011, 1:07:55 AM
to Python Sunburnt
Just following up to this message again.

I ended up making a function called raw_query that allowed me to pass
a dict of parameters. I found this to be a huge benefit when
developing my app because I could simply pass whatever complicated
Solr query I needed to directly into Sunburnt.
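The raw_query function itself is not shown in this thread, so the following is only a hypothetical sketch of the idea: a plain dict of Solr parameters, with the query and the facet settings sent together so that results and facet counts come from the same unparsed query. The parameter names q, defType, facet and facet.field are standard Solr parameters; the field names court and status and the helper raw_query_string are made up for illustration.

```python
from urllib.parse import parse_qs, urlencode


def raw_query_string(params):
    """Encode a dict of Solr parameters; list values become repeated keys."""
    return urlencode(params, doseq=True)


params = {
    "q": '"cat dog"~3 -baz',            # passed through untouched
    "defType": "edismax",               # let Solr's parser do the work
    "facet": "true",
    "facet.field": ["court", "status"],  # hypothetical field names
}
encoded = raw_query_string(params)
print(encoded)

# Decoding gives back exactly what was put in, one list per key.
print(parse_qs(encoded))
```

The encoded string can be appended to a /select URL as is. Passing doseq=True is what lets a single key such as facet.field appear more than once.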

In a way, this means I'm not using most of Sunburnt's features, but
I still find a lot of its features very useful. Would be great to see
this officially supported.

Mike

Michael Lissner

Sep 21, 2016, 1:35:16 PM
to Python Sunburnt
For anybody else that arrives here, this landed in October 2015 with this PR: https://github.com/tow/sunburnt/pull/80/