autoassign index values?

Jonathan Rosen

Jul 28, 2012, 4:03:41 PM
to thri...@googlegroups.com
Is there a way to auto-assign index values in schemas?

Andres Morey

Jul 28, 2012, 7:01:40 PM
to thri...@googlegroups.com, thri...@googlegroups.com
No. You assign the values yourself, which comes in handy later when you want to add, delete, or modify attributes.

Andres

Jonathan Rosen

Aug 9, 2012, 10:43:00 PM
to thri...@googlegroups.com
I think it's questionable how handy this is. If you want to avoid overwriting a previous item, you first have to check whether an item with that id exists. Much like autoincrement in an RDBMS, it is sometimes useful to just add an object and have the db return the id. I think this would be a useful feature, as sometimes you might not care what the id value is. If the id is returned when you add, you can still add/delete/modify.
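The two approaches being contrasted here can be sketched with a toy in-memory store. This is not ThriftDB's API: `ToyStore`, `put`, `exists`, `add`, and `get` are hypothetical names used only to illustrate the difference between caller-assigned and store-assigned ids.

```python
import itertools


class ToyStore:
    """Hypothetical in-memory key-value store; not ThriftDB's real API."""

    def __init__(self):
        self._items = {}
        self._next_id = itertools.count(1)

    def put(self, item_id, obj):
        """Caller picks the id; an existing item at that id is overwritten."""
        self._items[item_id] = obj

    def exists(self, item_id):
        return item_id in self._items

    def add(self, obj):
        """Store picks the id (like autoincrement) and returns it."""
        item_id = next(self._next_id)
        while item_id in self._items:  # skip ids the caller already used
            item_id = next(self._next_id)
        self._items[item_id] = obj
        return item_id

    def get(self, item_id):
        return self._items[item_id]


store = ToyStore()

# Caller-assigned id: check first to avoid clobbering an existing item.
if not store.exists(1):
    store.put(1, {"name": "resistor"})

# Store-assigned id: no check is needed, and the returned id can be used
# later to read, modify, or delete the item.
new_id = store.add({"name": "capacitor"})
print(new_id, store.get(new_id))
```

With a single process this is trivial; the hard part is doing the same thing safely when several nodes accept writes at once.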

Andres Morey

Aug 10, 2012, 7:19:46 AM
to thri...@googlegroups.com, thri...@googlegroups.com
The drawback to autoincremented index ids is that you have less control over the schema. A spelling error or 'extra' data in an object will trigger a new attribute. New attributes will also trigger new full text index fields which will decrease write performance and increase memory usage.
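As a rough illustration of the drawback described above, the sketch below auto-assigns an index to any key it has not seen before. The function `auto_assign_indexes` and the schema layout (attribute name mapped to index) are assumptions made for this example, not ThriftDB's actual schema format.

```python
def auto_assign_indexes(schema, obj):
    """Hypothetical: give every unseen key in obj the next free index."""
    next_index = max(schema.values(), default=0) + 1
    for key in obj:
        if key not in schema:
            schema[key] = next_index
            next_index += 1
    return schema


schema = {"name": 1, "price": 2}  # indexes assigned by hand

# One well-formed object, then one with a misspelled key ("pirce").
auto_assign_indexes(schema, {"name": "widget", "price": 3.5})
auto_assign_indexes(schema, {"name": "gadget", "pirce": 4.0})

# The typo has silently become a third attribute in the schema.
print(schema)
```

With hand-assigned indexes, the misspelled key would have no index to map to, so the mistake can be caught instead of growing the schema.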

ThriftDB is optimized for enabling search across a set of objects with a uniform schema. If you're trying to store arbitrary JSON objects, MongoDB might be more appropriate.

Andres Morey

Aug 10, 2012, 7:31:41 AM
to Andres Morey, thri...@googlegroups.com
Sorry, I misunderstood your question. I thought you were referring to autoincrementing the attribute index id ('thrift_index') in schemas.

Implementing auto-assigned '_id' attributes is a good idea, except that it's actually tricky to implement on a horizontally scalable architecture.

Are you trying to use ThriftDB as a primary datastore or as a secondary datastore for search?
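On the '_id' point: a likely reason auto-assignment is tricky across many nodes is that a single incrementing counter has to be coordinated between them. One common workaround, shown here only as a sketch and not as anything ThriftDB is stated to do, is to have the client generate a random id so that no shared counter is needed. The helper name `new_item_id` is made up for this example.

```python
import uuid


def new_item_id():
    """Return a random 128-bit id as a 32-character hex string."""
    return uuid.uuid4().hex


# Each writer can generate ids independently, with no coordination.
ids = [new_item_id() for _ in range(1000)]
print(ids[0])
```

The trade-off is that such ids are not sequential, so they carry no insertion order the way an autoincrement key does.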

Jonathan Rosen

Aug 10, 2012, 2:46:02 PM
to thri...@googlegroups.com, Andres Morey
I'd like to use ThriftDB as a primary datastore. Can you explain some of the advantages/disadvantages to doing this? Octopart is a similar use case to what I'm trying to build. Does Octopart only use ThriftDB for search purposes?

Andres Morey

Aug 10, 2012, 2:56:19 PM
to thri...@googlegroups.com
At Octopart we keep our data normalized in an RDBMS and use ThriftDB to power the frontend. The flexibility and feature set of an RDBMS are very convenient for day-to-day data processing.

Jonathan Rosen

Aug 10, 2012, 3:40:54 PM
to thri...@googlegroups.com
One of the things that attracted me to ThriftDB was the flexibility of updating the schema. I understand that transactional data should be kept in an RDBMS. However, for something like a catalog of products (a la Octopart), where the products may have an undetermined number of attributes across the data set, is ThriftDB not appropriate? The ability to easily update the schema in a multi-dimensional way to adjust for future unknowns is very appealing.

It's hard for me to see a compelling use case for a key-value store that interfaces with an RDBMS solely for the purposes of search. One of the main advantages of doc databases is that you can store your objects as you would in code, so reads and writes are fast and easy. If I have to create an RDBMS and then map that to ThriftDB, where am I gaining any leverage? It is easy enough to search an RDBMS once the structure is created. The problem as I see it is the "impedance mismatch" between code and database, and having to create the relational structure in the first place.

I guess I'm missing the value prop of ThriftDB if it can't be relied on as a primary datastore. I see that it has the potential to simplify search across individual database elements, but search isn't the main problem that needs solving in an RDBMS.

Andres Morey

Aug 10, 2012, 4:38:23 PM
to thri...@googlegroups.com
The advantage of using an RDBMS as your primary datastore is that you can keep your data highly normalized and access useful features such as autoincrement primary keys and full table scans, which are essential for performing data management functions. RDBMSs have some full-text search capabilities but don't support more advanced search modes (e.g. full-text search on nested objects, faceting).

Eventually ThriftDB will be useful as a primary datastore, but it will take time to implement many features that developers take for granted in RDBMSs (e.g. autoincrement primary keys).
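For readers who want to see the primary/secondary split concretely, here is a minimal sketch using SQLite from the Python standard library. It shows the two RDBMS features mentioned above, an autoincrement primary key and a full table scan, with the scan feeding a secondary store used for search. The table layout, the sample rows, and the `search_index` dict (a stand-in for the search datastore) are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE parts ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " name TEXT NOT NULL,"
    " manufacturer TEXT NOT NULL)"
)

# The database assigns the primary key; lastrowid reports it back.
cur = conn.execute(
    "INSERT INTO parts (name, manufacturer) VALUES (?, ?)",
    ("10k resistor", "Acme"),
)
first_id = cur.lastrowid
cur = conn.execute(
    "INSERT INTO parts (name, manufacturer) VALUES (?, ?)",
    ("100nF capacitor", "Acme"),
)
second_id = cur.lastrowid

# Full table scan: copy every row into the secondary store for search.
search_index = {}  # hypothetical stand-in for the search datastore
for row_id, name, manufacturer in conn.execute(
    "SELECT id, name, manufacturer FROM parts"
):
    search_index[row_id] = {"name": name, "manufacturer": manufacturer}

print(first_id, second_id, search_index)
```

The relational table stays the source of truth, and the search copy can be rebuilt from it at any time.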