ThriftDB - Why not ElasticSearch or Riak Search? Why didn't existing services meet Octopart's search needs?


andres

Jun 5, 2011, 9:09:41 PM
to ThriftDB
This post is in response to a question on convore:
https://convore.com/hacker-news/thriftdb-why-not-elasticsearch-or-riak_search/

ThriftDB shares many features with ElasticSearch and Riak Search as
well as several other products that have launched recently (e.g.
IndexTank, Cloudant's CouchDB search) so I understand why people are
curious about why we didn't use them for Octopart.

The reason is that we were solving a different problem. At
Octopart, we have 15M parts in our database and our schema changes
frequently, so we wanted a way to add, delete, or modify object
attributes on the fly without taking down the site or reindexing data.

The problem we found with the current generation of search solutions
is that the notion of "schema-less" JSON is a bit of a misnomer. Each
JSON document encapsulates its own schema (its field names), so you
end up storing your schema on a per-document basis. That means you
still have to iterate through all of your data to make schema changes.
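A minimal sketch of that cost, assuming a hypothetical set of part records (the field names `mpn` and `manufacturer` are made up for illustration). Because every JSON document spells out its own field names, renaming one field means parsing, editing, and re-serializing every record:

```python
import json

# Hypothetical part records: each serialized document embeds its own
# field names, so the "schema" is effectively stored once per document.
documents = [
    json.dumps({"mpn": "LM358", "manufacturer": "TI"}),
    json.dumps({"mpn": "NE555", "manufacturer": "ST"}),
]

def rename_field(docs, old, new):
    """Rename a field across a set of JSON documents.

    The field name lives inside each document, so every record must be
    decoded, modified, and re-encoded: an O(N) pass over the whole data set.
    """
    rewritten = []
    for raw in docs:
        obj = json.loads(raw)
        if old in obj:
            obj[new] = obj.pop(old)
        rewritten.append(json.dumps(obj))
    return rewritten

documents = rename_field(documents, "manufacturer", "brand")
print(documents[0])  # {"mpn": "LM358", "brand": "TI"}
```

With 15M parts, that loop (plus reindexing the results) is the kind of full pass the post is trying to avoid.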

When we came across Facebook's Thrift protocol, we were impressed with
its approach to flexible schemas, and we realized that we could use it
to solve some of our own development bottlenecks. Unlike JSON, Thrift
allows you to store the document schema independently of the
serialized data. That makes it much easier to maintain a consistent
schema across all of your objects.
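The difference can be sketched in a few lines, assuming a simplified model of Thrift rather than the real library: Thrift's binary protocol writes each field's numeric ID and type, not its name, so names live only in the IDL. Below, a hypothetical schema dict (field ID to name) plays the role of the IDL, and records are keyed by field ID:

```python
# Hypothetical central schema: field ID -> field name, loosely analogous
# to numbered fields in a Thrift IDL struct (e.g. "1: string mpn").
schema = {1: "mpn", 2: "manufacturer"}

# Records refer to fields only by numeric ID, similar to how Thrift's
# binary protocol encodes field IDs and types rather than field names.
records = [
    {1: "LM358", 2: "TI"},
    {1: "NE555", 2: "ST"},
]

def decode(record, schema):
    """Resolve field IDs to names using the *current* schema.

    IDs not present in the schema are skipped, much like a Thrift reader
    skips fields it does not recognize.
    """
    return {schema[fid]: value for fid, value in record.items() if fid in schema}

# Renaming a field is a schema-only change; no stored record is rewritten.
schema[2] = "brand"
# Adding a field is also schema-only; existing records simply lack it.
schema[3] = "lifecycle_status"

print(decode(records[0], schema))  # {'mpn': 'LM358', 'brand': 'TI'}
```

The stored records never change; only the mapping used to interpret them does.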

ThriftDB is essentially a wrapper around a key-value store and a
search index that makes it possible to modify your schema on-the-fly.
We use Thrift to maintain object schemas independently of the data
stored in the key-value store and the search index. We're actually
using Solr internally as a search index but you could imagine
switching out Solr for ElasticSearch or a similar engine.
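To make the layering concrete, here is a toy sketch of the idea, not ThriftDB's actual implementation (which uses Solr internally). The class `SchemaWrapper` and its methods are hypothetical. A dict stands in for the key-value store and a simple inverted index stands in for the search index. Both are keyed by field ID, and field names are resolved through a separate schema, so renaming a field touches neither store:

```python
from collections import defaultdict

class SchemaWrapper:
    """Hypothetical toy: a KV store plus an inverted index, both keyed by
    field ID, with human-readable field names kept in a separate schema."""

    def __init__(self, schema):
        self.schema = dict(schema)     # field ID -> field name
        self.kv = {}                   # doc ID -> {field ID: value}
        self.index = defaultdict(set)  # (field ID, token) -> set of doc IDs

    def _field_id(self, name):
        for fid, fname in self.schema.items():
            if fname == name:
                return fid
        raise KeyError(name)

    def put(self, doc_id, **fields):
        # Store by field ID and index each whitespace-separated token.
        # (Re-putting a doc ID leaves stale index entries; omitted for brevity.)
        record = {self._field_id(name): value for name, value in fields.items()}
        self.kv[doc_id] = record
        for fid, value in record.items():
            for token in str(value).lower().split():
                self.index[(fid, token)].add(doc_id)

    def rename_field(self, old, new):
        # Schema-only change: stored records and index entries are untouched.
        self.schema[self._field_id(old)] = new

    def search(self, field, term):
        return self.index.get((self._field_id(field), term.lower()), set())

    def get(self, doc_id):
        record = self.kv[doc_id]
        return {self.schema[fid]: v for fid, v in record.items() if fid in self.schema}

db = SchemaWrapper({1: "mpn", 2: "description"})
db.put("p1", mpn="LM358", description="Dual op amp")
db.rename_field("description", "summary")
print(db.search("summary", "amp"))  # {'p1'}
print(db.get("p1"))                # {'mpn': 'LM358', 'summary': 'Dual op amp'}
```

Swapping the dict or the toy index for a real store or search engine would not change the schema layer, which matches the point that Solr could be replaced by ElasticSearch or a similar engine.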

Hopefully that explains why we didn't just use ElasticSearch or
something similar. I don't want anybody to think that we've invented a
new full-text indexing method. We just needed an abstraction on top of
the search index/database to get rid of some development bottlenecks
we encountered.

That being said, this conversation is about why we didn't use
ElasticSearch internally. Another question to ask is what makes hosted
ThriftDB different from ElasticSearch, et al. Besides the fact that
it's on-demand and cloud-based, another benefit to ThriftDB is that it
has a security model so you can use it on the open internet.
ElasticSearch and Riak Search are meant to be run on private networks.
For example, the webapp at hnsearch.com sends AJAX requests directly
to ThriftDB.

Hope that helps! Let me know if you have any other questions.

Andres