Re: Migrating a high volume MongoDB prototype to Couchbase; problems with views/indexes

95 views
Skip to first unread message

Tom Green

unread,
Apr 11, 2014, 7:20:45 AM4/11/14
to couc...@googlegroups.com
Hi Niels,

I’m a solutions engineer at Couchbase, based in the UK.
I’ll reach out to you directly so that we can work with you on getting a good design that’s well suited to the Couchbase architecture.

I just wanted to highlight straight away that although Couchbase provides these locking mechanisms which can be used to provide safe concurrent access, they are not providing a ‘transaction’ in the sense you might be familiar with from an RDBMS perspective.

Even with the locking in place, updates are atomic on a per document basis only.

Choosing the right data model will be key to making this work well for you, and it’s something we can look at in conjunction with the team here.

Cheers,
Tom

Solutions Engineer, UK/EMEA
Couchbase





Hello all,

As I am new to Couchbase I've been making some mistakes while trying to migrate a system from MongoDB to Couchbase, particularly with regards to the design of views/indexes. I hope someone can help.
This is quite a long post, sorry for that; I wanted to present a reasonably whole picture of the context, problem and my own considerations thus far.

System context
--------------
The problem domain consists of just two objects: an Account and Transaction. In relational terms an Account has 0..N Transactions. The system handles up to a thousand of inserts and retrievals per minute and should have the potential to grow to a capacity of tens of thousands of inserts and retrievals per minute. I do not know exact numbers but my guess is the inserts/retrieval-ratio is fairly even (50/50).

Current state
--------------
Currently the system is implemented in Oracle DB and I've been prototyping a MongoDB version the last couple of weeks. This prototype is successfull and has received positive feedback. However, the company I work for has selected Couchbase due to its mobile data synchronization. From what I've read Couchbase should be able to scale to a level that would satisfy my use case as well. As a result I've been trying to migrate the MongoDB prototype to Couchbase thereby limiting the number of database types that we need to support within the company.

Document design
--------------
Simplified, a transaction has five interesting properties (I use abbreviations to make them better recognizable):
- TX_ID: a unique identifier defined by the client system. 
- TX_NR: a transaction number which is is a counter starting from 1 and incremented by 1 for each new transaction _per_ account (i.e. the first transaction of any new account will have TX_NR = 1).
- TX_MUT: a balance mutation value.
- TX_ACC_ID: the account identifier (see below)
- TX_BAL: the (historical) account balance at the time the transaction was processed.

An account has two interesting properties;
- ACC_ID: a unique identifier that is defined by the client system. I selected this as the document key.
- ACC_BAL: the balance contains the available funds for the account. In MongoDB I cannot do a Transaction insert and Account update atomically within one transaction (and MongoDB support no locking out-of-the-box) so in the current MongoDB prototype this ACC_BAL property is virtual and in reality retrieved by fetching the TX_BAL from the last transaction).

The documents contain quite a bit more data but I don't think that matters for the purpose of this discussion.

Problem
--------------
When I tried to apply my MongoDB design to Couchbase I got strong suspicions I wasn't following best practices, particularly with regard to views. Views are eventual consistent by default, I suspect for performance reasons. 
In my use case multiple transactions should be processable in batch, asynchronously or synchronously. So concurrency issues are a concern (and have caused us plently of problems even in a RDBMS (Oracle DB) due to faulty application implementation). Also, the room for data integrity issues is almost non-existent; the data should be correct.

Data access patterns
--------------
First scenario: retrieving a (paginated) subset of an account transaction list
In the design of MongoDB I used the TX_ACC_ID reference within the Transaction as the sharding key and as a result all queries in normal use cases would go to a single shard. In Couchbase I have no control over where my data ends up and retrieving a list of transactions (a common use case) I'd need to do a multi-get I believe. Given the issues with couchbase views (scatter/gather queries, eventual consistent, cumbersome to update as part of automated CI (and perhaps not very performant?)) I prefer not to use them. My thought was to make document keys predictable and not use TX_ID as part of the object key but instead use "ACC_ID + TX_NR" as the Transaction document key. This allows me to do a fetch of say 10 records and skipping the first 30 without additional lookups.

However, as part of the response I also need to return the total number of Transactions within the account. Because I can do locking in Couchbase (I'm considering pessimistic locking here) I can maintain a value such ACC_LAST_TX_NR within the Account document and update it each time when inserting a new transaction.

Second scenario: inserting a transaction
In MongoDB I've limited the insert Transaction to creating a new document and that's it. Several indexes make sure the document is available in a strong consistent manner. To do the actual insert I lookup the TX_BAL and TX_NR from the last known transaction, retrieve the Account, increment TX_NR and insert the new transaction. If in the meantime another Transaction was persisted to MongoDB a unique index on TX_NR will prevent inserting the new transaction (this unique index basically acts as an optimistic lock).
In Couchbase the query "get last transaction for specified account" would require a view that is strongly consistent. That is problematic. However, because Couchbase supports locking I can update the balance within the Account like I used to do in Oracle DB (the MongoDB approach was more of a workaround anyway). Inserting a transaction in Couchbase would involve three steps:
1. Retrieve Account and take lock
2. Insert Transaction with TX_NR = ACC_LAST_TX_NR + 1
3. Update the Account with the new ACC_BAL and set ACC_LAST_TX_NR + 1

Questions
--------------
1. When inserting a new transaction, how do I guarantee the TX_ID is unique? The TX_ID is determined by the client system and prevents duplicate transaction processing. The document key is already used by "ACC_ID + TX_NR" to do easy lookups. Couchbase does not support unique indexes, nor do I suspect updating a view for this purpose is sufficiently performant.
2. I could not find a resource describing how updates work, and I do know some databases handle them quite poorly in certain circumstances (MongoDB reserves some additional space, but if you grow your document it needs to be moved). Some say you're better of deleting the old document and inserting a new one is better in some situations. While MongoDB is insert-only, Couchbase would become insert and update heavy. Has this a significant impact on how it will behave?
3. I've been using Mongoose (an ODM) with Node.js for my MongoDB integration. For Couchbase I didn't find a mature and well-documented ODM thus far and I am considering rolling my own minimalistic ODM untill I can replace it with something else. Given the future with N1QL I don't want to "buy in" into any ODM that may be ill supported (a common theme for Node.js modules... *sigh*). If anyone has any tips on this, much appreciated.

One solution I was considering for my first problem was maintaining a map of "TX_NR => TX_ID" within the Account and effectively maintain my own index in each Account. The proposed insert TX scenario retrieves the Account in step (1); checking for duplicates would be a breeze and in step (3) I can update the Account index. This approach does introduce some scalability concers. currently Accounts may have several hundreds of transactions and may even grow to thousands or even tens of thousands of transactions in the future. Putting all those values in a single Account document doesn't feel right (lots of data transfer, processing, and potentially huge documents).

Sorry for the long post. Currently the prototype in MongoDB works and performs well, but having to support just one type of database would be much simpler (and compared to Couchbase the MongDB infrastructure is quite complex).

As I am new to Couchbase I might be making quite a few wrong assumptions about Couchbase. 

I hope someone can help.
Regards,
Niels
Always Scalable, Always On NoSQL


Tom Green
Solutions Engineer
Phone: +447590 57 1818
Skype: tomgreen000
Email: t...@couchbase.com

Reply all
Reply to author
Forward
0 new messages