Basic Search Engine

49 views
Skip to first unread message

Nicholas Dudfield

unread,
May 10, 2009, 6:13:29 AM5/10/09
to sqlal...@googlegroups.com
Greetings,

I have been using SQLA for a few months.

For admin CRUD index pages I have been using a naive search_keywords
function as
seen at end of message.

Instead of using a primitive shlex.split, which incidentally is not unicode
friendly, and one crude search_type (AND|OR) I'd like to use something
that will
lex/parse a search string and build the queries.

eg.
((x or z or y) and q) or not "h m"

I imagine this would be a fairly common requirement however I can't seem
to find
any implementation anywhere.

I used google code search with the query "pyparsing sqlalchemy lang:python"
however found no useful results.

Before I set off attempting to learn about lexing/[py]parsing I was
wondering if
any one has some code like this laying about underneath their bed.

Cheers.

================================================================================
=
=
================================================================================

ensure_list = lambda l: l if isinstance(l, list) else [l]

def shlex_split(line):
""" shlex.split does not handle unicode properly so must be codecd """
if isinstance(line, unicode):
line = line.encode('utf-8')
return [ w.strip().decode('utf-8') for w in shlex.split(line) ]

def like_escape(s):
return ( s.replace('\\', '\\\\')
.replace('%', '\\%')
.replace('_', '\\_') )

def search_keywords(q, model, key_words='', fields=[], search_type='and',
default_fields=[], **kw):

if not key_words:
return q

# Escape the search string
# TODO: this should be done by formencode validators
if isinstance(key_words, basestring):
key_words = shlex_split(key_words)

# Space delimited keyword search
key_words = ['%'+ like_escape(w) +'%' for w in key_words]

# Make sure fields is a list and if none specified use default
fields = ensure_list(fields) if fields else default_fields # TODO

# WHERE ($X OR $Y) AND|OR ($Z OR $Q) ...
search_type = and_ if search_type == 'and' else or_

if key_words:
q = q.filter (
search_type ( * (
or_( *( getattr(model, field).like(key_word, escape=r'\\')
for field in fields ) )
for key_word in key_words
)
)
)

return q

================================================================================
=
=
================================================================================

Michael Trier

unread,
May 10, 2009, 11:14:25 AM5/10/09
to sqlal...@googlegroups.com, sqlal...@googlegroups.com
It may not be what you want but have you explored our full text search
support through the use of the match operator?

On May 10, 2009, at 6:13 AM, Nicholas Dudfield <ndud...@gmail.com>
wrote:

Paul McGuire

unread,
May 10, 2009, 6:06:46 PM5/10/09
to sqlalchemy
On May 10, 5:13 am, Nicholas Dudfield <ndudfi...@gmail.com> wrote:
> Greetings,
>
> I have been using SQLA for a few months.
>
> For admin CRUD index pages I have been using a naive search_keywords
> function as
> seen at end of message.
>
> Instead of using a primitive shlex.split, which incidentally is not unicode
> friendly, and one crude search_type (AND|OR) I'd like to use something
> that will
> lex/parse a search string and build the queries.
>
> eg.
>   ((x or z or y) and q) or not "h m"
>
> I imagine this would be a fairly common requirement however I can't seem
> to find
> any implementation anywhere.
>
> I used google code search with the query "pyparsing sqlalchemy lang:python"
> however found no useful results.

Google for "pyparsing query parser" and you'll find some helpful
links:
http://pyparsing.wikispaces.com/file/view/searchparser.py (source is
on the pyparsing wiki, plus it ships with the pyparsing source
distribution)
http://rephrase.net/days/07/04/pyparsing (a nice walkthrough of a
query parser and evaluator)

The O'Reilly shortcut "Getting Started with Pyparsing" (not free, but
extremely well-written) culminates with the development of a search
query parser to search for recipes by querying for matching
ingredients ("noodles and (tuna or ham)", "pineapple and not 'pizza
crust'", etc.).

-- Paul
(author of "Getting Started with Pyparsing")
Message has been deleted

fluence

unread,
May 10, 2009, 8:11:57 PM5/10/09
to sqlalchemy
@Michael

No, I'm about as new to sql as I am to sqlalchemy. I'll have to look
into it. Thanks

@Paul

Thanks, I'll look into that. I had found searchparser.py but was just
wondering if anyone
had already adapted it to work with sqlalchemy querys.


On May 11, 8:21 am, Paul McGuire <pt...@austin.rr.com> wrote:
> On May 10, 5:13 am, Nicholas Dudfield <ndudfi...@gmail.com> wrote:
>
>
>
> > Greetings,
>
> > I have been using SQLA for a few months.
>
> > For admin CRUD index pages I have been using a naive search_keywords
> > function as
> > seen at end of message.
>
> > Instead of using a primitive shlex.split, which incidentally is not unicode
> > friendly, and one crude search_type (AND|OR) I'd like to use something
> > that will
> > lex/parse a search string and build the queries.
>
> > eg.
> >   ((x or z or y) and q) or not "h m"
>
> > I imagine this would be a fairly common requirement however I can't seem
> > to find
> > any implementation anywhere.
>
> > I used google code search with the query "pyparsing sqlalchemy lang:python"
> > however found no useful results.
>
> Google for "pyparsing query parser" and you'll find some helpful
> links:http://pyparsing.wikispaces.com/file/view/searchparser.pyhttp://rephrase.net/days/07/04/pyparsing
>
> The O'Reilly shortcut "Getting Started with Pyparsing" ends with the
> development of a search query parser to search for recipes by querying
> for matching ingredients.

fluence

unread,
May 11, 2009, 7:20:39 AM5/11/09
to sqlalchemy
@Paul

I have been having a play with [py]parsing. What a nifty little
library!

I read those 2 free tutes and liked what I saw so bought a
subscription to safari just so I could read your short cut.

For my purposes (a few k objects at most, generally a few hundred) a
non indexed and inefficient like search will do fine until I work out
the match operator.


Here is what I have so far. Any tips welcome.

#################################### IMPORTS
###################################

# PyParsing
from pyparsing import ( CaselessLiteral, Literal, Word, alphas,
quotedString,
removeQuotes, operatorPrecedence,
ParseException,
stringEnd, opAssoc )

# SqlAlchemy
from sqlalchemy import and_, not_, or_

################################## LIKE ESCAPE
#################################

LIKE_ESCAPE = r'\\'

def like_escape(s):
return '%' + ( s.replace('\\', '\\\\')
.replace('%', '\\%')
.replace('_', '\\_') ) + '%'

############################### REUSABLE ACTIONS
###############################

class UnaryOperation(object):
def __init__(self, t):
self.op, self.a = t[0]

def __repr__(self):
return "%s:(%s)" % (self.name, str(self.a))

def express(self):
return self.operator[0](self.a.express())

class BinaryOperation(object):
def __init__(self, t):
self.op = t[0][1]
self.operands = t[0][0::2]

def __repr__(self):
return "%s:(%s)" % ( self.name,
",".join(str(oper) for oper in
self.operands) )

def express(self):
return self.operator[0](*( oper.express() for oper in
self.operands ))

class SearchAnd(BinaryOperation):
name = 'AND'
operator = [and_]

class SearchOr(BinaryOperation):
name = 'OR'
operator = [or_]

class SearchNot(UnaryOperation):
name = 'NOT'
operator = [not_]

############################### REUSABLE GRAMMARS
##############################

AND_ = CaselessLiteral("and") | Literal('+')
OR_ = CaselessLiteral("or") | Literal('|')
NOT_ = CaselessLiteral("not") | Literal('!')

searchTermMaster = (
Word(alphas) | quotedString.copy().setParseAction
( removeQuotes ) )

########################## THREAD SAFE PARSER FACTORY
##########################

def like_parser(model, fields=[]):
class SearchTerm(object):
def __init__(self, tokens):
self.term = tokens[0]

def express(self):
return or_ (
*( getattr(model, field).like( like_escape(self.term),
escape = LIKE_ESCAPE)
for field in fields )
)

def __repr__(self):
return self.term

searchTerm = searchTermMaster.copy().setParseAction(SearchTerm)

searchExpr = operatorPrecedence( searchTerm,
[ (NOT_, 1, opAssoc.RIGHT, SearchNot),
(AND_, 2, opAssoc.LEFT, SearchAnd),
(OR_, 2, opAssoc.LEFT, SearchOr) ] )

return searchExpr + stringEnd

########################### SEARCH FIELDS LIKE HELPER
##########################

def search_fields_like(s, model, fields):
if isinstance(fields, basestring): fields = [fields]
parser = like_parser(model, fields)
return parser.parseString(s)[0].express()

################################################################################

fluence

unread,
May 11, 2009, 7:28:39 AM5/11/09
to sqlalchemy
I apologise for the formatting. How does one go about posting
snippets inline properly? In the future I think I'll just post links
to pastes.

Paste of the above code: http://pastie.org/474342

Paul McGuire

unread,
May 11, 2009, 9:50:25 PM5/11/09
to sqlalchemy
On May 11, 6:20 am, fluence <ndudfi...@gmail.com> wrote:
> @Paul
>
> I have been having a play with [py]parsing. What a nifty little
> library!
>
> I read those 2 free tutes and liked what I saw so bought a
> subscription to safari just so I could read your short cut.
>

Glad to hear that pyparsing is giving you a jump start! I downloaded
sqlalchemy, and eventually got the parameters straight to call your
code (passing the search string, the "c" attribute of a Table,
followed by a list of column names). Here's what I got:

from sqlalchemy import Table, Column, Integer, String, MetaData,
ForeignKey

metadata = MetaData()
prod_table = Table('product', metadata,
Column('id', Integer, primary_key=True),
Column('name', String),
Column('color', String),
Column('size', String),
Column('material', String),
)

print search_fields_like("RED OR GREEN AND NOT BLUE", prod_table.c,
"color size".split())

Gives:

product.color LIKE :color_1 ESCAPE '\\' OR product.size LIKE :size_1
ESCAPE '\\' OR (product.color LIKE :color_2 ESCAPE '\\' OR
product.size LIKE :size_2 ESCAPE '\\') AND NOT (product.color
LIKE :color_3 ESCAPE '\\' OR product.size LIKE :size_3 ESCAPE '\\')

(Where do the parsed values, like RED, GREEN, and BLUE go?


You may at some point need to go beyond just "Word(alphas)" for search
terms, such as "Word(alphanums)" (words made up of alphas or letters),
or "Word(alphas, alphanums)" (words made up of alphas or letters, but
must start with an alpha).

Since your search string just takes search values, this is what makes
it necessary for you to qualify the call with a list of potential
search columns. This is okay if you are searching fields of an
article (like say, title, subject, abstract, and/or body). But if the
columns are dissimilar, such as fields of the products in a catalog,
then you'll be searching fields like size and color with impossible or
even misleading options ("size like '%RED%'" is a wasted search, but
"color like '%L%'" will return YELLOW items, whether they are Large or
not). For an application like the product catalog, then you could
recast your query grammar to search for boolean combinations of
conditional expressions like "field like value". Then you wouldn't
need the additional parameter listing the fields to search, you can
parse them out of the query string itself. This would also support
conditional tests other than like, such as "price < 200".

In any event, I hope pyparsing will help support your experimenting,
and let you try some different application ideas while pyparsing
offloads some of the dull parsing stuff.

-- Paul

fluence

unread,
May 11, 2009, 11:50:00 PM5/11/09
to sqlalchemy
@Paul

Thanks for the tips. Much appreciated.

>>> In any event, I hope pyparsing will help support your experimenting,

If it weren't for pyparsing I think I'd be up the proverbial creek
without a high level abstraction paddle :) I think it will be prove
very handy in learning parsing concepts.

It's all very fuzzy at the moment but hopefully will become clearer.
I have lots of ideas to practice on in any case.

>>> this is what makes it necessary for you to qualify the call with a list of potential search columns

Yeah, not ideal huh. In my application they are sourced from html
checkboxes on the search form.

>>> (Where do the parsed values, like RED, GREEN, and BLUE go?

SqlAlchemy uses bind params/ placeholders. I'm not sure how you can
actually see the full interpolated SQL generated. Any one out there
know?

>>> You may at some point need to go beyond just "Word(alphas)" s

Unicode too ideally.

>>> "price < 200".

The possibilities! pyparsing ftw!

Thanks again.
Reply all
Reply to author
Forward
0 new messages