MongoDB for a chemical property search engine


Loïc d'Anterroches

May 29, 2010, 5:02:34 AM
to mongodb-user
Hello,

just a short message to announce Cheméo, http://www.chemeo.com, a
search engine for chemical properties built on top of MongoDB. I wrote
down a fairly extensive explanation of the tools and software used,
including quite a few MongoDB tips, here:
http://chemeo.com/doc/technology

In short, the most interesting part with respect to MongoDB is that I
need to index a large number of properties for each chemical component
and search them by min/max value ranges. This caused trouble with
indexing (70+ properties at the moment, and the number will grow). So in
the end I followed the advice of the MongoDB developers and created,
for each component, a special key for indexing. Here is a copy/paste of
the relevant part of the post:

{i: [
    {k: 'myprop',
     n: 10.1,    // Min value
     x: 123.0},  // Max value
    {k: 'otherprop',
     n: -1234.1,
     x: 254.0},
    // you can add more properties
]}
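
As a rough illustration of how such a document could be built, here is a minimal JavaScript sketch. The helper name `toIndexArray` and the input shape (a map from property key to a list of measured values) are assumptions made for this example, not taken from Cheméo's code; only the field names `i`, `k`, `n` and `x` come from the structure above.

```javascript
// Hypothetical helper: turns { prop: [values...] } into the indexable
// array of { k, n, x } entries described above.
function toIndexArray(properties) {
  return Object.keys(properties)
    .filter((key) => properties[key].length > 0) // skip properties without data
    .map((key) => ({
      k: key,                          // property key
      n: Math.min(...properties[key]), // min value
      x: Math.max(...properties[key]), // max value
    }));
}

// Example values are made up for illustration.
const component = {
  name: "example component",
  i: toIndexArray({ myprop: [10.1, 123.0, 57.3], otherprop: [-1234.1, 254.0] }),
};

console.log(JSON.stringify(component.i));
```

Every property ends up as one entry of the same array, so the number of indexes stays fixed no matter how many properties are added later.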

Then you need to think about the query. It will always need to know
about the key and then min or max or both. So we need two indexes:

* Index 1 on ('i.k', 'i.n', 'i.x'), which can also be used to
  search on the key only and on the key plus the min value;
* Index 2 on ('i.k', 'i.x'), which can be used to search on the key
  plus the max value.
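
Why Index 1 covers three query shapes while key plus max still needs its own index follows from the prefix behaviour of compound indexes. The sketch below walks through that reasoning in JavaScript. The functions `usablePrefix` and `pickIndex` are hypothetical and deliberately simplified; they are not how MongoDB's query planner is implemented.

```javascript
// The two compound indexes from the list above, as ordered field lists.
const INDEX_1 = ["i.k", "i.n", "i.x"];
const INDEX_2 = ["i.k", "i.x"];

// Simplified rule: a compound index helps with the leading fields that the
// query constrains, and stops at the first indexed field the query omits.
function usablePrefix(indexFields, queryFields) {
  const prefix = [];
  for (const field of indexFields) {
    if (!queryFields.includes(field)) break;
    prefix.push(field);
  }
  return prefix;
}

// Hypothetical helper: pick the index with the longest usable prefix,
// preferring Index 1 on a tie.
function pickIndex(queryFields) {
  const p1 = usablePrefix(INDEX_1, queryFields);
  const p2 = usablePrefix(INDEX_2, queryFields);
  return p2.length > p1.length
    ? { index: "Index 2", prefix: p2 }
    : { index: "Index 1", prefix: p1 };
}

console.log(pickIndex(["i.k"]));               // key only
console.log(pickIndex(["i.k", "i.n"]));        // key plus min
console.log(pickIndex(["i.k", "i.x"]));        // key plus max
console.log(pickIndex(["i.k", "i.n", "i.x"])); // key plus min and max
```

In a current mongo shell the two indexes would be declared with `createIndex` (`ensureIndex` in 2010), for example `db.components.createIndex({ "i.k": 1, "i.n": 1, "i.x": 1 })`, where the collection name `components` is an assumption.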

This means that now, when looking for a component with the $all and
$elemMatch operators, you will always hit the indexes, yeah! But then
someone will run a search that translates to something like this:

{ i: { $all: [
    { $elemMatch: { k: "mw", n: { $lte: 400.0 } } },
    { $elemMatch: { k: "tc", x: { $gte: 500.0 } } },
    { $elemMatch: { k: "hf", n: { $lte: 500.0 }, x: { $gte: 50.0 } } }
] } }
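
A query of this shape can be assembled from a list of constraints. The following is a minimal JavaScript sketch; the helpers `clause` and `buildQuery` are hypothetical names introduced here, and only the operators and field names come from the query above.

```javascript
// Hypothetical helper: builds one $elemMatch clause for a property key with
// optional conditions on the stored min (n) and max (x) values.
function clause(key, bounds) {
  const match = { k: key };
  if (bounds.n !== undefined) match.n = bounds.n;
  if (bounds.x !== undefined) match.x = bounds.x;
  return { $elemMatch: match };
}

// Combines the clauses with $all, so each clause must match some entry of i.
function buildQuery(clauses) {
  return { i: { $all: clauses } };
}

const query = buildQuery([
  clause("mw", { n: { $lte: 400.0 } }),
  clause("tc", { x: { $gte: 500.0 } }),
  clause("hf", { n: { $lte: 500.0 }, x: { $gte: 50.0 } }),
]);

console.log(JSON.stringify(query, null, 2));
```

Note that `$all` takes the array of clauses directly, and each `$elemMatch` keeps the key and its bounds tied to the same array entry.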

And your server will fall over, because mw is the molecular weight, and
Mongo will use the index only for the first clause of the $all query and
then do a standard scan of those hits for the other properties, without
using the index. In that case, even if only 50 components match the hf
value, if mw matches 50,000 components, Mongo will scan 50,000
components. Oops, the wrong part of the index is used. You need to know
your data to order your query so that the most selective clause comes
first and the index is hit correctly.
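
One way to apply this is to sort the clauses by an estimate of how many components each one matches before sending the query. The JavaScript sketch below assumes, as described above, that only the first clause of `$all` drives the index scan; that is the behaviour reported in this post and may not hold for other server versions. The function `orderBySelectivity` and the estimate table are hypothetical.

```javascript
// Hypothetical estimates of how many components match each property range.
// The mw and hf figures echo the example above; the tc figure is made up.
const estimatedMatches = { mw: 50000, tc: 3000, hf: 50 };

// Returns a copy of the $elemMatch clauses sorted so that the one expected
// to match the fewest components comes first. Keys without an estimate are
// treated as the least selective.
function orderBySelectivity(clauses, estimates) {
  const cost = (c) => {
    const est = estimates[c.$elemMatch.k];
    return est === undefined ? Infinity : est;
  };
  return [...clauses].sort((a, b) => {
    const ca = cost(a);
    const cb = cost(b);
    return ca < cb ? -1 : ca > cb ? 1 : 0;
  });
}

const clauses = [
  { $elemMatch: { k: "mw", n: { $lte: 400.0 } } },
  { $elemMatch: { k: "tc", x: { $gte: 500.0 } } },
  { $elemMatch: { k: "hf", n: { $lte: 500.0 }, x: { $gte: 50.0 } } },
];

const ordered = { i: { $all: orderBySelectivity(clauses, estimatedMatches) } };
console.log(ordered.i.$all.map((c) => c.$elemMatch.k));
```

With these estimates the hf clause moves to the front, so the index scan starts from about 50 candidates instead of 50,000.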

The rest of the post, http://chemeo.com/doc/technology, covers things
like Node.js/Python/Open Babel/pyparselet and the other tools used.

The indexing stuff is really what took me some time to get right as I
am coming from the RDBMS world.

loïc

--
Indefero, project management & code hosting - http://www.indefero.net
Pluf PHP5 Framework inspired by Django - http://www.pluf.org
Cheméo, high quality chemical properties - http://www.chemeo.com


nwhite

May 29, 2010, 5:20:37 AM
to mongod...@googlegroups.com
simply beautiful. Your explanation of mongo and how to think about indexes (with real world examples) is probably the best article on the subject I have seen to date.

Again well done!



--
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To post to this group, send email to mongod...@googlegroups.com.
To unsubscribe from this group, send email to mongodb-user...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/mongodb-user?hl=en.


Loïc d'Anterroches

May 29, 2010, 6:40:11 AM
to mongodb-user
On May 29, 11:20 am, nwhite <changereal...@gmail.com> wrote:
> simply beautiful. Your explanation of mongo and how to think about indexes
> (with real world examples) is probably the best article on the subject I
> have seen to date.
>
> Again well done!

Thank you Nathan, I am happy it can help somebody with the switch from
RDBMS thinking to document store thinking.

loïc

Kyle Banker

May 29, 2010, 9:36:49 AM
to mongod...@googlegroups.com
Very nice writeup. The indexing example would make a great addition to the MongoDB cookbook: http://cookbook.mongodb.org/

If you'd like to add it, feel free to fork the project. Otherwise, I can do a writeup and give you credit.


Loïc d'Anterroches

May 30, 2010, 3:48:07 AM
to mongodb-user


On May 29, 3:36 pm, Kyle Banker <k...@10gen.com> wrote:
> Very nice writeup. The indexing example would make a great addition to the
> MongoDB cookbook: http://cookbook.mongodb.org/
>
> If you'd like to add, feel free to fork the project. Otherwise, I can do a
> writeup and give you credit.

Thank you Kyle. I am on the way for a one week conference to present
Cheméo, so I will do that when coming back.

loïc

GVP

May 30, 2010, 3:58:58 PM
to mongodb-user
Loïc, this product and write-up are absolutely awesome.

Serious props, not just as a MongoDB user, but as someone who respects
how much effort and work goes into something like chemeo.

Mathias Stearn

May 30, 2010, 2:30:04 PM
to mongodb-user
You may want to vote for this case:
http://jira.mongodb.org/browse/SERVER-1000. We have a few ideas on how
to optimize $all, but we haven't had time to implement them yet.


Meghan Gill

Jun 1, 2010, 10:33:41 AM
to mongod...@googlegroups.com