best NoSQL for building an inverted index

jenna_s

Apr 17, 2012, 3:01:28 AM

to NOSQL

Hi,

I'm working on a small project where I need to build an inverted index
and then run similarity algorithms on it based on a user query - basic
information retrieval. Is there one NoSQL product that stands out when
it comes to building & searching inverted indices?
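
For readers new to the terms, here is a minimal sketch of the task described above: build an inverted index, then rank documents against a query by similarity. It assumes whitespace tokenization and TF-IDF cosine similarity, neither of which is specified in this thread, and all function names and sample documents are illustrative only.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase + whitespace split; real systems also stem, drop stop words, etc.
    return text.lower().split()


def build_index(docs):
    """Inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index


def idf(index, n_docs, term):
    # Smoothed inverse document frequency; stays positive even for terms in every document.
    return math.log(1 + n_docs / len(index[term]))


def rank(index, n_docs, query):
    """Return [(doc_id, cosine similarity)] sorted best-first."""
    q_weights = {
        term: tf * idf(index, n_docs, term)
        for term, tf in Counter(tokenize(query)).items()
        if term in index
    }
    if not q_weights:
        return []
    # Length of each document's TF-IDF vector (recomputed per query for brevity).
    norms = defaultdict(float)
    for term, postings in index.items():
        weight = idf(index, n_docs, term)
        for doc_id, tf in postings.items():
            norms[doc_id] += (tf * weight) ** 2
    # Only documents that share a term with the query are ever touched.
    dots = defaultdict(float)
    for term, q_w in q_weights.items():
        weight = idf(index, n_docs, term)
        for doc_id, tf in index[term].items():
            dots[doc_id] += q_w * tf * weight
    q_norm = math.sqrt(sum(w * w for w in q_weights.values()))
    scored = [(d, dot / (math.sqrt(norms[d]) * q_norm)) for d, dot in dots.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical sample data, for illustration only.
docs = {
    "d1": "redis is a fast key store",
    "d2": "lucene is a text search library",
    "d3": "solr wraps lucene as a web service",
}
index = build_index(docs)
for doc_id, score in rank(index, len(docs), "lucene search"):
    print(doc_id, round(score, 3))
```

The point of the inverted layout is that a query only visits the postings of its own terms, so documents sharing no term with the query are never scored.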

Thanks so much,
J

Itamar Syn-Hershko

Apr 17, 2012, 11:14:43 AM

to nosql-di...@googlegroups.com

RavenDB builds on Lucene and provides all of this out of the box: it produces an inverted index from the properties you define and also supports suggestions.

But why would you need a NoSQL database? Just use Lucene directly.

--
You received this message because you are subscribed to the Google Groups "NOSQL" group.
To post to this group, send email to nosql-di...@googlegroups.com.
To unsubscribe from this group, send email to nosql-discussi...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/nosql-discussion?hl=en.

Emin Gun Sirer

Apr 17, 2012, 11:30:25 AM

to nosql-di...@googlegroups.com

Hi J,

I assume that your data is too large to fit on a single host, so you cannot use a local database or inverted index. I urge you to look at HyperDex, which has a unique SEARCH primitive that can perform the kind of information retrieval you mentioned efficiently while also supporting sharding and scaling out. The URL is: http://www.hyperdex.org

- egs

Eric Bloch

Apr 17, 2012, 11:41:33 AM

to nosql-di...@googlegroups.com

MarkLogic is based on inverted indexes: every document inserted or edited (XML, JSON, text, etc.) has its full text indexed in real time, with no subsequent indexing phase. Free academic (and express) licenses are available at http://community.marklogic.com. From "Inside MarkLogic":

When people think of MarkLogic they often think of its text search capabilities. The
founding team has a deep background in search: Chris Lindblad was the architect of the
Ultraseek Server, while Paul Pederson was the VP of Enterprise Search at Google.
MarkLogic supports numerous search features including word and phrase search, boolean
search, proximity, wildcarding, stemming, tokenization, decompounding, case-sensitivity
options, punctuation-sensitivity options, diacritic-sensitivity options, document quality
settings, numerous relevance algorithms, individual term weighting, topic clustering,
faceted navigation, custom-indexed fields, and more.

It gets used for all sorts of IR and text analysis/analytics. It was designed and architected from scratch and does not rely on some version of Lucene being bolted on later.

There's also an easy-to-use, powerful REST API for it, available at http://github.com/marklogic/Corona

Eric

--
Eric Bloch
2305 Forest View Avenue, Hillsborough CA 94010
Email: eric....@gmail.com
Web page: http://www.virginia-avenue.com/
Phone: 650-339-0376

Konstantin Osipov

Apr 17, 2012, 11:48:42 AM

to nosql-di...@googlegroups.com

Sphinx?

--
http://tarantool.org - an efficient, extensible in-memory data store

David Bayliss

Apr 17, 2012, 2:59:46 PM

to nosql-di...@googlegroups.com

If you want rather more control over your inverted index (and/or ever want to scale to bigger data) - you might want to check out HPCC Systems: http://hpccsystems.com/

It gives complete control over the index build (and usage) process.

I give an example of how to build an inverted index using it here: http://www.dabhand.org/ECL/construct_a_simple_bible_search.htm

hth

David

Martin Bruse

Apr 17, 2012, 4:55:15 PM

to nosql-di...@googlegroups.com

Check out elasticsearch as well; lots of people use it simply as a
linearly scalable and fully indexed database.

It is based on Lucene, with all its power, but it is also operationally
simple and scales out really well.

//Martin

jenna_s

Apr 17, 2012, 7:37:23 PM

to NOSQL

Thanks, everyone. I didn't expect so many replies, or such varied
ones. I guess I should start looking at each one.

A friend has suggested that I use Redis, given that my user interface
is implemented in Rails. Does anyone have experience with Redis and
inverted indices, and can you say why I should or should not use Redis
vs. Lucene/HyperDex/MarkLogic/Sphinx/HPCC/ElasticSearch?

Thanks,
J

Brian Bulkowski

Apr 17, 2012, 8:07:04 PM

to nosql-di...@googlegroups.com

Jenna ---

Those two choices are apples and oranges.

Redis is a simple, fast key store that allows you to build your own
reverse index.

Lucene/Hyperdex/MarkLogic/Sphinx/... *ARE* text oriented reverse indexes.

Choose whichever you need!

Good luck!
-brian

Ran Tavory

Apr 18, 2012, 12:24:04 AM

to nosql-di...@googlegroups.com

But wait, there's more...
Solr is Lucene wrapped as a web service.
Solandra is a Solr implementation over Cassandra for exceptionally high scale.
For high scale, SolrCloud is also worth a look.

Ran

jenna_s

Apr 18, 2012, 1:25:12 AM

to NOSQL

After reading about all these technologies, I realized I didn't ask
the right question: what's the best NoSQL to roll your own inverted
index? My project is not only about IR. The algorithms I need to
implement are based on other data that's in addition to text indexing.

I do appreciate everyone's answers.

John D. Mitchell

Apr 18, 2012, 1:34:00 AM

to nosql-di...@googlegroups.com

On Apr 17, 2012, at 22:25 , jenna_s wrote:
[...]

> After reading about all these technologies, I realized I didn't ask
> the right question: what's the best NoSQL to roll your own inverted
> index? My project is not only about IR. The algorithms I need to
> implement are based on other data that's in addition to text indexing.

If you're doing this to learn and expect a fair bit of exploration in composing various data structures into custom solutions, Redis is quite nice for that, since it gives you building blocks that are easy to use.
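
As a rough illustration of the "building blocks" idea, the sketch below keeps one set of document ids per term and per extra attribute, then answers a query by intersecting sets. It is only a sketch under assumptions: `SetStore` is a hypothetical in-memory stand-in, not a Redis client, with method names chosen to mirror the Redis SADD and SINTER commands, and the `term:` and `attr:` key names are made up for the example.

```python
from collections import defaultdict


class SetStore:
    """Hypothetical in-memory stand-in for a key -> set store (not a Redis client)."""

    def __init__(self):
        self._sets = defaultdict(set)

    def sadd(self, key, *members):
        # Add members to the set stored at key, creating it if needed.
        self._sets[key].update(members)

    def sinter(self, *keys):
        # Intersection of the sets at the given keys; a missing key acts as an empty set.
        sets = [self._sets.get(key, set()) for key in keys]
        return set.intersection(*sets) if sets else set()


def index_document(store, doc_id, text, attrs):
    # One posting set per distinct term (assumed key scheme: "term:<word>").
    for term in set(text.lower().split()):
        store.sadd(f"term:{term}", doc_id)
    # Non-text data is indexed the same way (assumed key scheme: "attr:<name>:<value>").
    for name, value in attrs.items():
        store.sadd(f"attr:{name}:{value}", doc_id)


def query(store, terms, attrs=None):
    # Documents that contain every term AND match every attribute.
    keys = [f"term:{term.lower()}" for term in terms]
    keys += [f"attr:{name}:{value}" for name, value in (attrs or {}).items()]
    return store.sinter(*keys)


store = SetStore()
index_document(store, "d1", "building an inverted index", {"lang": "en"})
index_document(store, "d2", "inverted index on rails", {"lang": "fr"})
print(query(store, ["inverted", "index"], {"lang": "en"}))
```

Because text terms and other attributes end up as the same kind of set, they can be combined in a single intersection, which is the sort of composition a roll-your-own index needs.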

The bigger, commercial "databases" (nosql or not) all have their own, baked-in approaches to dealing with things so it's more of a question of whether they fit your needs or not based on how/what they implement under the hood.

So, without knowing a lot more about what you're trying to do, it's hard to say much more.

Have fun,
John
