Initial Thoughts on Montezuma

Robert Uhl

Jul 17, 2006, 7:28:57 PM

Well, I saw the announcement of Montezuma on Planet Lisp (it's a
full-text search engine, a Lisp port of the Ruby port of Lucene), and
tried to install it; unfortunately, due to the CLiki outage, I wasn't
able to until this morning. But once CLiki was back up & running I had
it installed very quickly.

I've a PostgreSQL database with tasting notes on the beers I have
drunk; a Python web interface is available at
<http://latakia.dyndns.org/tasting-notes>. One of my rainy-day projects
is porting this over to Lisp, and so I figured that Montezuma might be
cool to evaluate as a search engine.

The API is mostly well thought-out, although I have a few quibbles.
Essentially, you have one or more indices, each of which indexes a set
of documents, and each document is a set of fields. A field consists of
two strings: a name and a value. In effect, a document is just a hash
table. When you search the index, you can constrain the search to
certain fields or search all of them.
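
To make that data model concrete, here is a minimal sketch in Python (not Lisp, and not Montezuma's API). The class name ToyIndex, its methods, and the sample documents are all invented for illustration; it only shows documents as field-to-string maps and a search that can be limited to certain fields.

```python
from collections import defaultdict


class ToyIndex:
    """Toy inverted index; documents are dicts of field name -> string value."""

    def __init__(self):
        self.docs = []                    # document number -> document
        self.postings = defaultdict(set)  # (field, term) -> document numbers

    def add_document(self, doc):
        """Store the document and index each whitespace-separated term."""
        doc_num = len(self.docs)
        self.docs.append(doc)
        for field, value in doc.items():
            for term in value.lower().split():
                self.postings[(field, term)].add(doc_num)
        return doc_num

    def search(self, term, fields=None):
        """Return sorted document numbers; fields=None means every field."""
        term = term.lower()
        if fields is None:
            fields = {field for (field, _term) in self.postings}
        hits = set()
        for field in fields:
            hits |= self.postings.get((field, term), set())
        return sorted(hits)


# Made-up sample documents, purely for illustration.
index = ToyIndex()
index.add_document({"beer": "Old Stout", "notes": "roasty imperial character"})
index.add_document({"beer": "Imperial Pilsner", "notes": "crisp and hoppy"})

print(index.search("imperial"))                   # all fields -> [0, 1]
print(index.search("imperial", fields=["beer"]))  # "beer" only -> [1]
```

A real engine also analyzes text, stores term positions, and scores the hits; this sketch skips all of that.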

It's pretty simple to set up; once it's installed you make an index by
instantiating montezuma:index. There are some keyword arguments, but
it's unclear which of them are required for a given task and which
merely improve efficiency (for example, what effect does specifying
:fields have?). Indices can be persistent, which is pretty cool.

Adding items ('documents,' in Montezuma's parlance) is fairly easy; you
can create them as a simple list of conses (representing field-text
pairs), or you can go whole-hog and create a document, add fields to it,
then add the document to the index.

Searching returns a score and a document number; given the number
you can retrieve the fields you want. It's not clear what the score
is--it's not a 0..1 range, but higher scores are better.

My sample size was not over-large, but searches seemed to be quite
speedy.

Sub-word searching (e.g. returning 'eggdrop' on a search for 'egg')
doesn't appear to be implemented, although it's possible that I'm
missing an option.

All in all, for a 0.1.1 release Montezuma is pretty cool; there's a lot
of potential there. If your project needs this type of search
capability, it's worth taking a look.

--
Robert Uhl <http://public.xdi.org/=ruhl>
I believe in life, and I also believe in love, but the world in which
I live in keeps trying to prove me wrong. --P. Weller

Robert Uhl

Jul 17, 2006, 10:29:41 PM

Robert Uhl <eadm...@NOSPAMgmail.com> writes:
>
> Sub-word searching (e.g. returning 'eggdrop' on a search for 'egg')
> doesn't appear to be implemented, although it's possible that I'm
> missing an option.

I missed that it supports wildcards; 'egg*' would return 'egg,' 'eggs' &
'eggdrop'; 'egg?' would return 'egg' and 'eggs.'

--
Robert Uhl <http://public.xdi.org/=ruhl>

I've never understood how anyone can like showers--my experience has
always been that in a shower the water always gets into your Gin&Tonic,
something that only occasionally troubles me while relaxing in the bath.
--Tanuki

jjwi...@gmail.com

Jul 18, 2006, 6:02:30 PM

Hi, Robert.

Robert Uhl wrote:

> The API is mostly well thought-out, although there are a few
> quibbles I have.
>

> It's pretty simple to set up; once it's installed you make an index
> by instantiating montezuma:index. There are some keyword arguments,
> but it's unclear which of those are actually needed for some tasks
> or improve efficiency (for example, what effect does specifying
> :fields have?).

My goal for the first release was to make it work (more or less). In
subsequent releases I'd like to polish the API, and then add
documentation.

This isn't ideal, but since Montezuma is a port of Ferret it can be
useful to look at the Ferret code/docs for a point of reference. For
example, there's a pretty good description[1] of many of the options
that can be used when creating an index, and it's even mostly relevant
to the Lisp version.

(The :fields option lets you specify which fields will be searched by
default by a query that doesn't otherwise specify a field.)
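
The idea of default search fields can be sketched as follows, in Python rather than Lisp. This is not Montezuma's or Ferret's code: the function search, its default_fields parameter, and the sample documents are hypothetical, and the sketch only shows a query falling back to a preset list of fields when it names none.

```python
def search(docs, term, fields=None, default_fields=("title", "notes")):
    """Return numbers of documents whose searched fields contain the term.

    When the query names no fields, fall back to default_fields.
    """
    searched = fields if fields is not None else default_fields
    term = term.lower()
    return [
        num
        for num, doc in enumerate(docs)
        if any(term in doc.get(field, "").lower().split() for field in searched)
    ]


# Made-up sample documents, purely for illustration.
docs = [
    {"title": "porter", "notes": "smoky", "brewer": "acme"},
    {"title": "acme lager", "notes": "clean", "brewer": "other"},
]

print(search(docs, "acme"))                     # default fields only -> [1]
print(search(docs, "acme", fields=["brewer"]))  # explicit field -> [0]
```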

> Adding items ('documents,' in Montezuma's parlance) is fairly easy;
> you can create them as a simple list of conses (representing
> field-text pairs), or you can go whole-hog and create a document,
> add fields to it, then add the document to the index.

Right, for convenience you can use an association list or a hash
table, or for full control you can create a document object.

> All in all, for a 0.1.1 release Montezuma is pretty cool; there's a lot
> of potential there.

Thank you.

John

[1] http://ferret.davebalmain.com/trac/browser/trunk/lib/ferret/index/index.rb

Tin Gherdanarra

Jul 21, 2006, 11:46:44 AM

Robert Uhl wrote:
> Well, I saw the announcement of Montezuma on Planet Lisp (it's a
> full-text search engine, a Lisp port of the Ruby port of Lucene), and
> tried to install it; unfortunately due to the CLiki outage I wasn't able
> to until this morning. But once CLiki was back up & running I very
> quickly had it installed.
>

Sounds pretty cool. I think the name is a little unfortunate,
because it invites all sorts of tasteless puns:
http://en.wikipedia.org/wiki/Montezuma%27s_Revenge_%28medicine%29

--
Lisp can't scratch, because Lisp is liquid