WiredTiger showing lots of transactions rolled back


Dai Shi

Oct 1, 2015, 7:40:57 PM
to mongodb-user
Hi, first time posting here.

We've been converting our fleet over to WiredTiger, and it has mostly been smooth. However, I am noticing that db.serverStatus().wiredTiger.transaction shows many transactions rolled back. "transactions rolled back" / "transaction begins" is sometimes as high as 65%. We haven't noticed any issues from our application, but this seems concerning. A few questions I'm hoping some of you might be able to answer:

 - Has anyone else experienced this?
 - Is this normal, or am I right to be concerned?
 - Is there any way to tell which transactions were rolled back and why? I don't see any messages in the logs.

Here is an example output:

db.serverStatus().wiredTiger.transaction
{
    "transaction begins" : 1492108654,
    "transaction checkpoints" : 7486,
    "transaction checkpoint generation" : 7486,
    "transaction checkpoint currently running" : 0,
    "transaction checkpoint max time (msecs)" : 11141,
    "transaction checkpoint min time (msecs)" : 5,
    "transaction checkpoint most recent time (msecs)" : 6806,
    "transaction checkpoint total time (msecs)" : 46095554,
    "transactions committed" : 552630338,
    "transaction failures due to cache overflow" : 0,
    "transaction range of IDs currently pinned by a checkpoint" : 0,
    "transaction range of IDs currently pinned" : 0,
    "transactions rolled back" : 939433194
}

We are running v3.0.6.
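For anyone who wants to quantify this on their own server, the ratio can be computed from the counters shown above. A minimal sketch in plain JavaScript, using the numbers pasted above (in the mongo shell you would pass db.serverStatus().wiredTiger.transaction instead of the hard-coded object):

```javascript
// Compute the rollback ratio from a serverStatus()-style
// wiredTiger.transaction document.
function rollbackRatio(txn) {
  const begins = txn["transaction begins"];
  const rolledBack = txn["transactions rolled back"];
  if (!begins) return 0; // avoid division by zero on a fresh server
  return rolledBack / begins;
}

// Numbers copied from the output above.
const txn = {
  "transaction begins": 1492108654,
  "transactions committed": 552630338,
  "transactions rolled back": 939433194,
};

console.log((rollbackRatio(txn) * 100).toFixed(1) + "%"); // prints "63.0%"
```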

MARK CALLAGHAN

Oct 1, 2015, 10:10:38 PM
to mongod...@googlegroups.com
Is this caused by write-write conflict detection in WiredTiger?
Has the write-write conflict detection been explained in public?




--
Mark Callaghan
mdca...@gmail.com

michael...@10gen.com

Oct 2, 2015, 1:37:50 AM
to mongodb-user
There's a more prosaic explanation for this one...

WiredTiger used to have a strict blanket rule that if there was any error during a transaction, the transaction must be rolled back.  There were some rare MongoDB operations that "failed" in some circumstances -- I can't remember the specifics, but it was something along the lines of attempting to open a cursor on a non-existent table, which isn't an unreasonable thing to do.

Rather than track whether every query had hit one of these errors to figure out whether to commit or roll back, MongoDB 3.0 always rolls back queries.  So the numbers you see in the statistics reflect updates (commits) vs queries (rolled back) to a first approximation.

We have since relaxed WiredTiger so that committing a read-only transaction that hit an error is permitted, but MongoDB master has not been updated to take advantage of the change.

Mark is correct that concurrent updates to the same document can conflict and require operations to retry. MongoDB takes care of this transparently, retrying with back-off; you can see a count of write conflicts reported in the MongoDB log. These retries will be included in the "rolled back" count, but unless there is something pathological going on, the query count will dominate.
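The retry-with-back-off pattern Michael describes can be sketched as follows. This is a general illustration at the application level, not MongoDB's actual internal code; the error name, attempt limit, and delay values are all made up for the example:

```javascript
// Sketch: retry an operation when it fails with a write conflict,
// doubling a back-off delay between attempts. A real implementation
// would actually sleep for delayMs before each retry.
function retryOnConflict(op, maxAttempts = 5) {
  let delayMs = 1; // hypothetical starting back-off
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return op();
    } catch (err) {
      // Only conflicts are retried; other errors propagate immediately.
      if (err.message !== "WriteConflict" || attempt === maxAttempts) throw err;
      delayMs *= 2;
    }
  }
}

// Demo: an operation that conflicts twice, then succeeds.
let calls = 0;
const flaky = () => {
  calls++;
  if (calls < 3) throw new Error("WriteConflict");
  return "ok";
};
console.log(retryOnConflict(flaky)); // prints "ok" after two retries
```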

Michael.

Rhys Campbell

Oct 2, 2015, 3:56:58 AM
to mongodb-user
Just to chip in: I see this too. No associated issues known.

MARK CALLAGHAN

Oct 2, 2015, 10:54:17 AM
to mongod...@googlegroups.com
I need to ask Igor how MongoRocks does this. He probably told me at least once.

Does WT use optimistic concurrency control for findAndModify? So that...
* concurrent clients can each read a matching document at time T, but nothing is locked internally
* client A commits the change at time T+1
* client B attempts to commit the change at time T+2, internally WT detects a write from client A between times T & T+2 --> rollback
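That timeline can be illustrated with a toy version-checked store. This is a sketch of optimistic concurrency control in general, not WiredTiger's implementation:

```javascript
// Toy optimistic-concurrency store: a commit succeeds only if the
// document's version is unchanged since the snapshot was taken.
class OccStore {
  constructor(doc) { this.doc = doc; this.version = 0; }
  read() { return { doc: { ...this.doc }, version: this.version }; }
  commit(snapshot, newDoc) {
    if (snapshot.version !== this.version) return false; // conflict -> rollback
    this.doc = newDoc;
    this.version++;
    return true;
  }
}

const store = new OccStore({ count: 0 });
const a = store.read(); // client A reads at time T
const b = store.read(); // client B reads at time T, nothing locked
const aOk = store.commit(a, { count: a.doc.count + 1 }); // T+1: succeeds
const bOk = store.commit(b, { count: b.doc.count + 1 }); // T+2: detects A's write
console.log(aOk, bOk); // prints "true false"
```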


On Fri, Oct 2, 2015 at 12:56 AM, Rhys Campbell <rhys.jame...@gmail.com> wrote:
Just to chip in.. I see this too. No associated issues known.





Ernest Staszuk

Oct 3, 2015, 10:38:57 PM
to mongodb-user
We encountered problems with WiredTiger this week. It saturated all CPU cores.

Failing scenario:
* There were many documents updated from time to time to record statistics about requests for data that was not present in our data source (so that we could, for example, make up missing entries)
* One of our data sources is degenerate (it has few entries and many requests)
* This generated a massive number of updates to a single document
* That caused one machine to freak out (all cores occupied by a MongoDB shard)
* A livelock-like condition was observed: the system was very busy, but no progress was observable

We suspect that because WiredTiger implements optimistic locking on writes, updates to this one document entered a tight loop that filled all CPU cores, with lots of retries due to optimistic locking conflicts. Probably without any sleeping or timeout (I would like to get confirmation on that).

The question is: would it be possible to implement a heuristic that would make this condition less likely to happen?

Alexander Gorrod

Oct 5, 2015, 6:52:14 PM
to mongodb-user


On Saturday, October 3, 2015 at 12:54:17 AM UTC+10, MarkCallaghan wrote:
Does WT use optimistic concurrency control for findAndModify? So that...
* concurrent clients can each read a matching document at time T, but nothing is locked internally
* client A commits the change at time T+1
* client B attempts to commit the change at time T+2, internally WT detects a write from client A between times T & T+2 --> rollback

Yes - that is correct. It is the reason why having a multi-threaded workload updating a single document in parallel will result in lots of write-conflicts. If you are running a workload that primarily updates a single document from many threads at once, mmapv1 may be a better choice of storage engine in MongoDB for now. 
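A common application-level mitigation for a single hot document (a general sketch, not advice given in this thread) is to spread a heavily updated counter across several documents, picking one at random on write and summing over all of them on read, so concurrent writers rarely touch the same document:

```javascript
// Sketch: split one hot counter into N shards so that concurrent
// increments rarely conflict. Reads sum over all shards.
const NUM_SHARDS = 8; // tuning knob; size it to your write concurrency

// In-memory stand-in for N counter documents. With MongoDB, each
// shard would be a document { _id: shardId, n: ... } updated via $inc.
const shards = new Array(NUM_SHARDS).fill(0);

function increment() {
  const shardId = Math.floor(Math.random() * NUM_SHARDS);
  shards[shardId] += 1;
}

function total() {
  return shards.reduce((a, b) => a + b, 0);
}

for (let i = 0; i < 1000; i++) increment();
console.log(total()); // prints 1000
```

The trade-off is that reads become slightly more expensive (they touch N documents), in exchange for far fewer write conflicts on the hot path.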

Alexander Gorrod

Oct 5, 2015, 6:58:47 PM
to mongodb-user
Hi Ernest,


On Sunday, October 4, 2015 at 1:38:57 PM UTC+11, Ernest Staszuk wrote:
Failing scenario:
* There were many documents updated from time to time to record statistics about requests for data that was not present in our data source (so that we could, for example, make up missing entries)
* One of our data sources is degenerate (it has few entries and many requests)
* This generated a massive number of updates to a single document
* That caused one machine to freak out (all cores occupied by a MongoDB shard)
* A livelock-like condition was observed: the system was very busy, but no progress was observable

We suspect that because WiredTiger implements optimistic locking on writes, updates to this one document entered a tight loop that filled all CPU cores, with lots of retries due to optimistic locking conflicts. Probably without any sleeping or timeout (I would like to get confirmation on that).

MongoDB has heuristics to avoid such tight loops; it will back off retrying operations if it notices that there are a lot of conflicts. The symptoms you are describing sound unexpected - are you running the latest released version of MongoDB? If so, could you open a JIRA ticket (https://jira.mongodb.org) in the SERVER project describing your system configuration and the details of the issue?