Dynamic data model best practice?

51 views
Skip to first unread message

Brendan Duddridge

unread,
Jun 4, 2015, 9:22:47 PM6/4/15
to mobile-c...@googlegroups.com
How can Couchbase deal with dynamic data models?

I have a database app right now using SQLite and FMDB on iOS and Mac which lets the user define their own forms to capture whatever kind of data they need. I have over 20 different types of data they can capture including Text, Notes, Numbers, Audio Recordings, Photos, and relationships to other forms.

The way my database is organized is I have a Form object which has an array of Field objects. A separate object holds the records. Each record can contain one or more value indexed by the field_id. So that way a user can decide at any moment to add or remove any field from their form or add as many forms that they'd like. It's very flexible.

I'm currently trying to convert over from FMDB / SQLite to CouchbaseLite in order to take advantage of your awesome built-in sync mechanism.

My question is, what is the best way to structure my Couch Document given the fact that I don't really have a fixed schema from a user's perspective. So right now if I want to get a value, I first need to know the form, the record, and the field. All those 3 things will give me the value. I currently store the values in a separate table indexed by form, record, and field. But I find that's very inefficient space and query wise. I was hoping that perhaps using CouchbaseLite (and server) would help me to speed that up and also have sync support.

Any pointers would be greatly appreciated.

Oh, and I have done some initial work on writing a conversion routine, so this is what the JSON looks like for an individual record:

{
"01347DCD-9BE7-41AB-8B46-6588F6421FC8": "Brendan Duddridge",
"12BE6A9B-C5F3-4281-A659-DFA787357FD8": "sup...@tapforms.com",
"14AF6617-A724-4C77-A84B-F33496D68519": "H0H 0H0",
"22F6AE40-08A8-43E2-BE0E-E9AD86252C3D": "Duddridge",
"2879D926-CE7A-44EC-BD8D-610E6AB505E0": "Brendan",
"5FF7E2CB-700D-49D5-8723-04DDA79B8AD4": "Calgary",
"6C15F333-DCFA-43DC-B0A8-61EC02085C17": "Canada",
"6D82B007-62AB-46A0-B5DE-ACCBBA154CCF": "",
"6DAD7D2D-EFF8-44F5-9322-A95DE3C8E3D2": "Tap Zapp Software Inc.",
"85316537-6802-437D-A6E4-37E311400D09": "(403) 555-1212",
"91EEC079-3DA3-4F45-B8D2-42F9D3591B43": "10",
"BBEA5EF1-1AB9-4CE0-8F11-9C23D0BE7ADE": "123 Tap Zapp Place",
"C4E83E9F-4B4C-4268-9372-A2114E8CB833": "http://www.tapforms.com",
"DB647E15-9006-4DA5-8E29-F076657FA72C": "AB",
"dateCreated": "2014-09-28 09:04:30 +0000",
"dateModified": "2015-04-11 05:24:33 +0000",
"formId": "715BAC6E-6D39-4B65-9E4E-ECDCD45F591D",
"hasAlarm": "0",
"isFavourite": "0",
"primaryKey": "45A98F59-82CF-45E9-BAF9-ED7B496F960F",
"sortOrder": "0",
"type": "FormEntry"
}

Those UUIDs to the left of the values are the Field UUIDs which are actually the Document IDs for the Field type entries in the CouchbaseLite database.

What I'm not sure about is how to build views on this kind of data in order to efficiently query it. Right now in my current model I have each value in its own row in the database. I have an FTS database I use to search quickly. It's pretty quick, but the problem is the size of the database is enormous for the amount of data stored because for each value I store a bunch of other meta data and essentially each value is stored in the main table as well as the FTS index table so it can get pretty huge. 

I also need to be able to query by any of the fields in the database in a variety of ways. For example, date ranges, numeric ranges, full text queries, empty values, not empty values, possibly all combined. An example might be, give me a list of movies with a rating > 3 that have the word "Terminator" OR "Hunger" in the movie title. Or another might be a search for all the records that do or don't have an "I saw this movie" checkbox selected.

Any help would be greatly appreciated.

Thanks,

Brendan


Jens Alfke

unread,
Jun 5, 2015, 11:53:43 AM6/5/15
to mobile-c...@googlegroups.com
Interesting — so you have sort of a build-your-own-database app, where users can define their own fields and then populate records with them?

The JSON you showed looks reasonable, although it’ll help performance if you can avoid using UUIDs for the field identifiers. The shorter the better. Have you considered making the field identifier be the user-visible name of the field, perhaps with a prefix added to disambiguate it from the built-in fields like “sortOrder”? The only drawback would be that if you allow these fields to be renamed, it would require updating every document containing that field.

I also need to be able to query by any of the fields in the database in a variety of ways. For example, date ranges, numeric ranges, full text queries, empty values, not empty values, possibly all combined.

Couchbase Lite isn’t super good at this. It’s easy to do in SQL because you can piece together a SELECT statement at runtime and the database will evaluate it, and even if the data isn’t indexed for that query it’ll do its best by linear searching. It also knows how to combine results from multiple indexes. Couchbase Lite’s map/reduce system requires that you have an index for what you want to search on, and if you need to merge results from multiple indexes you have to do it by hand. We’d like to improve this in the future (possibly by adopting the N1QL query language) but that doesn’t help you now.

Take a look at the new CBLQueryBuilder class available in 1.1 for iOS/Mac. It’s similar to Core Data’s NSFetchRequest: it lets you define a query using NSPredicate, NSExpression and NSSortDescriptor. Behind the scenes it will create a view to generate the necessary index. If you’re using this for ad-hoc queries, it may end up creating a lot of views, which can reduce performance, so you may want to manually delete them from time to time.

—Jens

Brendan Duddridge

unread,
Jun 5, 2015, 12:48:06 PM6/5/15
to mobile-c...@googlegroups.com
Thanks for the response Jens.

Yes, the app I wrote is called Tap Forms Organizer. www.tapforms.com and you can define your own forms with as many fields as you like and then create records within those forms to populate the data. It's all very dynamic and ad-hoc. Think Bento or FileMaker Pro. In fact, I even have a Bento importer to import native Bento template files into Tap Forms.

I'm hoping that Couchbase will help me with improving performance and also the biggest thing is sync. Right now I sync via generating change files (stored in encrypted SQLite files using SQLCipher) based on the modification date of everything in the database. I upload those to either iCloud Drive or Dropbox, depending on the user's choice of sync services. Then I merge the data from those change files with the data in the local database when the devices receive notification of updated sync files. But it's fraught with complications. I'm really impressed with what you've done with CBL. I am also excited to be able to offer peer to peer syncing, which is something I can't do yet (other than beaming a single record at a time using Multipoint Connectivity).

I can make the field keys based on the names of the fields and even convert them into a more JSON friendly structure. For example, "The Movie Title" becomes "the_movie_title". But yes, customers can rename fields as they wish. They can even have fields with the same name in the form. When Tap Forms adds a new field, it will create a default field name based on the type of field the user chooses. So for example, a Photo field would have the name "Photo" by default. If they added 5 Photo fields, they'd have the same named field 5 times unless they decided to rename them manually. So that's why I chose the primary key / UUID of the field object to use for the key.

I actually currently am using NSPredicate to store user defined queries in the database. I break those down and manually convert them into SQL statements, combining multiple predicates from any compound predicates into SQL INTERSECTION statements to reduce the number of results. I will need to somehow convert those into something usable by CouchbaseLite. I have read about CBLQueryBuilder and it sounds very promising. What I didn't understand about it is that it dynamically creates views for you. Is that right? So I don't have to bother making my own views then for any user generated dynamic queries? The user can also just type in a search term into the search field and it will currently search all Number, Text, and Note fields simultaneously and return the results. Plus there's the more advanced search where they have to specify which fields to query against, combining multiple search rules together which are ANDed (via the INTERSECTION mentioned above). Sorry if I'm being redundant here. I do that sometimes :-)

I will need some built-in views of course for regular things like fetching all the fields for a form or fetching all the saved searches for a form. Or if they delete a field, I will need to delete not only the field, but everywhere the field is referenced. It could be something as simple as nilling out a to-one relationship or just removing the field's key from each and every value where it's referenced from all documents. It also supports linking forms together via a one-to-many and many-to-many relationship. So if you delete a record, I have to remove all references to that record from all documents.

I'm hoping that it will be possible for me to still provide all this using Couchbase. I know it's going to be a challenge. But your sync strategy is just too compelling for me not to give it a shot.

Thanks!

Brendan

Brendan Duddridge

unread,
Jun 7, 2015, 4:25:21 AM6/7/15
to mobile-c...@googlegroups.com
The only drawback would be that if you allow these fields to be renamed, it would require updating every document containing that field.

Is there a built-in function for renaming keys? That would be very handy.

If a user renames a key on device A, and then changes the value for the old key on device B, what would happen? You would essentially want the new key from device A along with the new value from device B. But I don't see how that could be automated unless I had a mapping from the old key to the new key. Either that or it would result in a conflict I presume.

On Friday, June 5, 2015 at 9:53:43 AM UTC-6, Jens Alfke wrote:
Reply all
Reply to author
Forward
0 new messages