So what happens if I change a schema?

Richard Rodger

Jun 7, 2011, 5:28:51 PM
to ThriftDB
Hi Guys,

Let's say I PUT a given schema to a collection.

Then I add a few items.

Then later I PUT a new schema, let's say with a new attribute.

What happens?

Do all subsequent requests for older items get a default value?

What happens if I PUT a completely different schema?

Richard

andres

Jun 7, 2011, 5:50:01 PM
to ThriftDB
ThriftDB is designed to handle schema changes gracefully. Here are a
few common scenarios:

- Add a new attribute
If you upload a schema with a new attribute, all items (including old
ones) will gain that attribute as well. The default value for any
attribute is null.

- Remove an attribute
If you upload a new schema with an attribute removed, then all items
(including old ones) will lose that attribute as well. You won't be
able to search on the attribute anymore either.

- Change an attribute name
If you upload a new schema with an attribute name changed, then all
items (including old ones) will have that attribute name changed as
well. You can filter on the new attribute seamlessly.

These common scenarios are handled very well by ThriftDB. However, you
will run into problems if you change a datatype (e.g. DateTimeType ->
IntegerType) so be careful with that.

ThriftDB uses the Thrift serialization protocol to implement the
flexible schema feature so it might be helpful to read up on that in
more detail (http://thrift.apache.org/).

Andres

Richard Rodger

Jun 7, 2011, 7:24:13 PM
to thri...@googlegroups.com
Hi Andres,

Sounds good, and of course I wasn't expecting "magic" :)

How do the serialization rules help disambiguate behavior with respect to schema changes?
Section 5 of the whitepaper seems the most relevant. It would be useful to understand how its cases relate to the ThriftDB service.

thanks,
Richard

Richard Rodger
CEO, Chartaca.com
ric...@chartaca.com
@rjrodger
+353 87 6827135

andres

Jun 7, 2011, 8:28:58 PM
to ThriftDB
The Thrift software suite consists of the serialization protocol plus
some other features designed to make it easy to implement
cross-language services. Much of section 5 assumes that the client and
the server both have copies of the schema. In the case of ThriftDB,
there's only one schema definition, so most of the special cases don't
apply.

When you add an object to ThriftDB, we use the current schema to
serialize it to disk. If the object has extra attributes that aren't
defined in the schema they just get ignored. If the object is missing
attributes defined in the schema, then they get added to the object
and set to null.

When you request an object from ThriftDB we use the current schema to
deserialize it from disk. If the object is missing attributes that are
defined in the schema then they get added to the object and set to
null.
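
To make the write and read rules above concrete, here is a minimal Python sketch. It is not ThriftDB code: `apply_schema` is a hypothetical helper, and the schema is reduced to a plain list of attribute names (a real schema also carries datatypes and a thrift_index per attribute).

```python
def apply_schema(record, schema_attrs):
    """Shape a record by the current schema (hypothetical helper).

    Attributes that are not in the schema are ignored; attributes that
    are in the schema but missing from the record are set to None (null).
    """
    return {name: record.get(name) for name in schema_attrs}


# Write path: "color" is not in the schema, "created" is missing.
schema_v1 = ["title", "created"]
stored = apply_schema({"title": "hello", "color": "red"}, schema_v1)
print(stored)  # {'title': 'hello', 'created': None}

# Read path after the schema gained "rating": the old item picks up
# the new attribute with a null value.
schema_v2 = ["title", "created", "rating"]
loaded = apply_schema(stored, schema_v2)
print(loaded)  # {'title': 'hello', 'created': None, 'rating': None}
```

The same shaping step is used in both directions here only to keep the sketch short; it assumes nothing about how the service actually lays data out on disk.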

Thrift identifies each attribute by a unique integer id
('thrift_index') instead of hard-coding its name. That's what allows
attribute names to change on the fly.
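
A small Python sketch of why integer ids make renames cheap. Everything here is an assumption for illustration: the schemas are modelled as a mapping from thrift_index to attribute name, and `encode` / `decode` are hypothetical stand-ins for the real Thrift serialization.

```python
# Hypothetical schemas: thrift_index -> attribute name.
schema_v1 = {1: "title", 2: "created"}
schema_v2 = {1: "headline", 2: "created", 3: "rating"}  # rename id 1, add id 3
schema_v3 = {1: "headline", 3: "rating"}                # remove id 2


def encode(obj, schema):
    """Store each value under its integer id rather than its name."""
    return {idx: obj.get(name) for idx, name in schema.items()}


def decode(stored, schema):
    """Rebuild an object using the names the current schema gives each id."""
    return {name: stored.get(idx) for idx, name in schema.items()}


# Item written under the first schema.
stored = encode({"title": "hello", "created": "2011-06-07"}, schema_v1)

print(decode(stored, schema_v2))
# {'headline': 'hello', 'created': '2011-06-07', 'rating': None}
print(decode(stored, schema_v3))
# {'headline': 'hello', 'rating': None}
```

Because the stored data only knows about ids 1 and 2, renaming "title" to "headline" changes nothing on disk, a new id reads back as null, and a removed id is simply never looked up.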

Hope that helps. Let me know where/if I can be more specific.

Andres