MongoDB storage engine requirements


aayushg...@gmail.com

Jan 20, 2017, 5:00:31 PM
to mongodb-dev
Hi,

I have a few queries regarding the pluggable storage engine requirements.

1. The pluggable storage engine README states that Storage Engines must ensure atomicity and snapshot isolation guarantees across all record stores. Is this needed for all the record stores for a specific collection or across all collections?

2. Am I right in understanding that all operations within a recovery unit will pertain to a single collection or are there cases where operations can span collections?

Thanks,
Aayush

Geert Bosch

Jan 20, 2017, 5:57:38 PM
to mongo...@googlegroups.com
On Jan 20, 2017, at 4:46 PM, aayushg...@gmail.com wrote:

1. The pluggable storage engine README states that Storage Engines must ensure atomicity and snapshot isolation guarantees across all record stores. Is this needed for all the record stores for a specific collection or across all collections?

This is indeed needed for all record stores across all collections. A very common case includes operations that write to the "oplog" collection in the "local" database as well as the actual collection being updated. Also, at least for sharded clusters, the config server needs to update multiple collections in a single transaction.
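
To make the oplog case concrete, here is a minimal sketch of a unit of work that spans two record stores. It is a toy model, not MongoDB's storage engine API: the names `RecordStore`, `RecoveryUnit` and `insert_with_oplog` are hypothetical and chosen only for illustration, and the in-memory commit stands in for whatever journaling or locking a real engine would need.

```python
# Toy model only: these class and function names are hypothetical and are
# not MongoDB's storage engine API.


class RecordStore:
    """An in-memory stand-in for one record store (one collection's data)."""

    def __init__(self, name):
        self.name = name
        self.records = {}  # record id -> document


class RecoveryUnit:
    """Buffers writes to any number of record stores until commit or abort."""

    def __init__(self):
        self._pending = []  # list of (store, record_id, doc)

    def write(self, store, record_id, doc):
        # Nothing is visible in the store until commit() runs.
        self._pending.append((store, record_id, doc))

    def commit(self):
        # Single-threaded sketch: a real engine would have to make this step
        # atomic with respect to crashes and concurrent readers.
        for store, record_id, doc in self._pending:
            store.records[record_id] = doc
        self._pending.clear()

    def abort(self):
        # Dropping the buffer discards the writes to every store at once.
        self._pending.clear()


def insert_with_oplog(ru, collection, oplog, record_id, oplog_id, doc):
    """Queue the document write and its oplog entry in the same unit."""
    ru.write(collection, record_id, doc)
    ru.write(oplog, oplog_id, {"op": "i", "ns": collection.name, "o": doc})


users = RecordStore("test.users")
oplog = RecordStore("local.oplog.rs")

ru = RecoveryUnit()
insert_with_oplog(ru, users, oplog, record_id=1, oplog_id=1, doc={"name": "a"})
ru.commit()  # both stores change together, or neither does on abort()
```

The point of the sketch is only that the unit of work, not the individual record store, is the boundary for atomicity.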


2. Am I right in understanding that all operations within a recovery unit will pertain to a single collection or are there cases where operations can span collections?

The latter. These cases will likely become more common over time. We essentially try to push the complexity of concurrency control to the storage engine, so it can use whatever methods it wants to ensure this consistency, whether MVCC, locking or a combination of both.
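
As one illustration of the MVCC option, the sketch below shows a snapshot that stays consistent across several record stores. It is a toy under stated assumptions, not how any particular engine does it: `VersionedStore` and `SnapshotEngine` are hypothetical names, commits are assumed to be serialized, and a single engine-wide counter plays the role of the commit timestamp.

```python
# Toy MVCC model: hypothetical names, single-threaded, no write conflicts.


class VersionedStore:
    """Keeps every committed version of each record, tagged by commit timestamp."""

    def __init__(self):
        self._versions = {}  # record id -> list of (commit_ts, value), ascending

    def put(self, record_id, commit_ts, value):
        self._versions.setdefault(record_id, []).append((commit_ts, value))

    def get(self, record_id, snapshot_ts):
        # Return the newest version committed at or before the snapshot.
        visible = None
        for ts, value in self._versions.get(record_id, []):
            if ts <= snapshot_ts:
                visible = value
        return visible


class SnapshotEngine:
    """Hands out one timestamp per commit, shared by all stores it touches."""

    def __init__(self):
        self.last_commit_ts = 0

    def begin_snapshot(self):
        # A reader sees everything committed up to this timestamp, in any store.
        return self.last_commit_ts

    def commit(self, writes):
        # All writes in one transaction get the same timestamp, so a snapshot
        # sees either all of them or none of them.
        ts = self.last_commit_ts + 1
        for store, record_id, value in writes:
            store.put(record_id, ts, value)
        self.last_commit_ts = ts
        return ts


engine = SnapshotEngine()
accounts = VersionedStore()
audit = VersionedStore()

engine.commit([(accounts, "a", 100), (audit, "a", "created")])
snap = engine.begin_snapshot()
engine.commit([(accounts, "a", 50), (audit, "a", "debited")])

# The older snapshot still sees the first transaction in both stores.
old_view = (accounts.get("a", snap), audit.get("a", snap))
new_view = (accounts.get("a", engine.begin_snapshot()),
            audit.get("a", engine.begin_snapshot()))
```

A locking engine could meet the same requirement differently; the only property the sketch relies on is that visibility is decided per transaction rather than per record store.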

Regards,

  -Geert

aayushg...@gmail.com

Jan 21, 2017, 9:49:13 AM
to mongodb-dev

Thanks a lot for the clarifications, Geert! A quick followup below.

On Friday, January 20, 2017 at 2:57:38 PM UTC-8, Geert Bosch wrote:

On Jan 20, 2017, at 4:46 PM, aayushg...@gmail.com wrote:

1. The pluggable storage engine README states that Storage Engines must ensure atomicity and snapshot isolation guarantees across all record stores. Is this needed for all the record stores for a specific collection or across all collections?
This is indeed needed for all record stores across all collections. A very common case includes operations that write to the "oplog" collection in the "local" database as well as the actual collection being updated. Also, at least for sharded clusters, the config server needs to update multiple collections in a single transaction.

Just to clarify: in the case where the config server updates multiple collections in a single transaction, is it guaranteed that all these operations are on the same shard, i.e. there are no distributed transactions across servers?

Dan Pasette

Jan 21, 2017, 2:14:19 PM
to mongo...@googlegroups.com
On Sat, Jan 21, 2017 at 9:49 AM, <aayushg...@gmail.com> wrote:

Just to clarify: in the case where the config server updates multiple collections in a single transaction, is it guaranteed that all these operations are on the same shard, i.e. there are no distributed transactions across servers?

Correct. These are updates to config data, which is stored in a single replica set. No distributed transactions.
 
