Update problem with getLastError (shardRawGLE)


florian0927

Apr 23, 2012, 4:46:27 AM
to mongodb-user
Hi all,

We are running a sharded cluster in which each shard is a replica set,
and we have recently hit a strange error.

When all servers are online, everything is OK. However, if the master
of one replica set goes offline for a short time (and some operations
are executed during its downtime), it is re-elected as the master of
the shard as soon as it comes back online.
After that, the problem appears:
In our application log, we see that getLastError after an update
operation sometimes returns a shardRawGLE with "n" : 0 (yet the update
has in fact been applied, without being logged).
Because getLastError returns n = 0, our application considers that the
update failed and retries the same update operation with the same
conditions. For these retries, the log shows a different getLastError,
in which updatedExisting is false (which would mean that the update
was already applied by the first try?).

------------------
Here is the code for the update and the getLastError call:
m_pDBConnection->GetConnection()->update(tableName, m_objQuery, m_objFields, false, false);
m_pDBConnection->GetConnection()->runCommand(m_pDBConnection->GetDBName(),
    BSON("getlasterror" << 1 << "w" << "majority" << "wtimeout" << 10000), objError);
int iRows = objError.getIntField("n");
if (iRows > 0)
{
    if (pcRowsAffected)
        *pcRowsAffected = iRows;
    EDPTRACE_DBG_INF(*********); // Application log for a normal execution
}
else
{
    if (pcRowsAffected)
        *pcRowsAffected = 0;
    EDPTRACE_DBG_ERR(**********); // Application log showing that an error occurred
}
--------------------
Here are the two different getLastError results:
1) First try :
Error description : {
  "shards" : [
    "Shard1/10.1.6.130:2222,10.1.64.207:2222",
    "Shard2/10.1.64.102:2223,10.1.64.207:2223",
    "Shard3/10.1.6.130:2224,10.1.64.102:2224,10.1.64.207:2224"
  ],
  "shardRawGLE" : {
    "Shard1/10.1.6.130:2222,10.1.64.207:2222" : {
      "n" : 0, "lastOp" : 0, "connectionId" : 52600,
      "wnote" : "no write has been done on this connection",
      "wtime" : 0, "err" : null, "ok" : 1 },
    "Shard2/10.1.64.102:2223,10.1.64.207:2223" : {
      "n" : 0, "lastOp" : 0, "connectionId" : 20070,
      "wnote" : "no write has been done on this connection",
      "wtime" : 0, "err" : null, "ok" : 1 },
    "Shard3/10.1.6.130:2224,10.1.64.102:2224,10.1.64.207:2224" : {
      "writeback" : { "$oid" : "4f9459054657a7c59703e612" },
      "instanceIdent" : "LY-YANG:2224",
      "n" : 0, "lastOp" : 0, "connectionId" : 58,
      "wnote" : "no write has been done on this connection",
      "wtime" : 0, "err" : null, "ok" : 1 }
  },
  "n" : 0,
  "err" : null,
  "ok" : 1
}
2) All tries after the first :
Error description : {
  "singleShard" : "Shard3/10.1.6.130:2224,10.1.64.102:2224,10.1.64.207:2224",
  "updatedExisting" : false,
  "n" : 0,
  "lastOp" : 5734306103559192599,
  "connectionId" : 51,
  "err" : null,
  "ok" : 1,
  "writeback" : { "$oid" : "4f9459194657a7c59703e614" },
  "instanceIdent" : "LY-YANG:2224",
  "wnote" : "no write has been done on this connection",
  "wtime" : 0,
  "writebackGLE" : {
    "singleShard" : "Shard3/10.1.6.130:2224,10.1.64.102:2224,10.1.64.207:2224",
    "updatedExisting" : false,
    "n" : 0,
    "lastOp" : 5734306103559192599,
    "connectionId" : 51,
    "err" : null,
    "ok" : 1
  },
  "initialGLEHost" : "Shard3/10.1.6.130:2224,10.1.64.102:2224,10.1.64.207:2224"
}
---------------------
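One way to read the two replies above: both carry "err" : null, so
neither of them reports a server-side error, and "n" : 0 only says that
no document was reported as affected on that connection. A minimal
sketch of keeping those cases apart, assuming a hypothetical GleReply
struct that holds a few of the fields seen in the logs (it is not the
C++ driver's BSONObj, and the enum names are made up for illustration):

```cpp
#include <string>

// Hypothetical plain-struct view of a getLastError reply.
// Not the C++ driver's BSONObj; only the fields discussed in this thread.
struct GleReply {
    std::string err;    // empty string stands for "err" : null
    int n = 0;          // value of the "n" field
    std::string wnote;  // e.g. "no write has been done on this connection"
};

// Illustrative outcome categories (names are assumptions, not driver terms).
enum class WriteOutcome {
    Applied,          // err is null and n > 0
    NothingReported,  // err is null and n == 0: no error was reported
    Failed            // err is set: the server reported an error
};

// Check "err" first, and only then look at "n".
WriteOutcome classify(const GleReply& reply) {
    if (!reply.err.empty()) {
        return WriteOutcome::Failed;
    }
    if (reply.n > 0) {
        return WriteOutcome::Applied;
    }
    return WriteOutcome::NothingReported;
}
```

With this split, a reply like the first one above (err null, n = 0)
would be classified as NothingReported rather than Failed, so the
caller could decide separately whether a blind retry is appropriate.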
We are using 2.0.2 with the C++ driver.
Here is a similar issue, but I could not find the cause there. (It is
said that the error could be caused by data migration, but in my case
a long time passed between the master coming back online and the
update operations, so the synchronisation had certainly finished
before the updates.):
http://groups.google.com/group/mongodb-user/browse_thread/thread/600d3abcc6122b7e

florian0927

Apr 23, 2012, 11:14:53 AM
to mongodb-user
Bump.
Does anyone have an idea why this is happening?

Mathias Stearn

Apr 23, 2012, 12:28:24 PM
to mongod...@googlegroups.com
It looks like you are getting a new connection. Note that the connectionIds are different on the first and second calls. Is it possible that the update is going on a separate connection than the getLastError? It might be useful to try "priming" the connection by doing a find with the same query as the update to make sure that the connection is in a good state.
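
The "priming" idea can be sketched as follows: run a find with the
update's query, then the update, then getLastError, all through the
same connection object. ConnectionOps here is a hypothetical stand-in
built from callbacks, not the C++ driver's connection class, and the
string parameters merely stand in for the query and modification
objects:

```cpp
#include <functional>
#include <string>

// Hypothetical stand-in for ONE driver connection, expressed as callbacks.
// None of these names come from the MongoDB C++ driver.
struct ConnectionOps {
    std::function<bool(const std::string& query)> findOne;  // true if a document matched
    std::function<void(const std::string& query,
                       const std::string& mods)> update;
    std::function<int()> getLastErrorN;                     // "n" from getLastError
};

struct UpdateResult {
    bool matchedBefore;  // did the priming find see a matching document?
    int n;               // "n" reported by getLastError after the update
};

// Prime the connection with a find, then update and read getLastError,
// issuing all three calls through the same ConnectionOps instance.
UpdateResult primedUpdate(const ConnectionOps& conn,
                          const std::string& query,
                          const std::string& mods) {
    UpdateResult result;
    result.matchedBefore = conn.findOne(query);  // priming read
    conn.update(query, mods);                    // the actual write
    result.n = conn.getLastErrorN();             // same connection as the write
    return result;
}
```

Comparing matchedBefore with n also gives the application one more
signal when n comes back as 0.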

florian0927

Apr 24, 2012, 5:55:04 AM
to mongodb-user
The getLastError and the update always use the same connection.
However, if the update fails, our application takes a new connection,
calls the Mongo driver again and retries the same operation (same
query and same modifications). That is why the connectionIds are
different. But I am sure that each getLastError is associated with the
right update operation.
Every Mongo update is logged by our application, and according to the
log none of these operations succeeded, even though they were executed
in the database. I just want to understand why this kind of problem
occurs.

Thanks, Mathias, for your suggestion. "Priming" can help with tests,
but in production I think it will slow down the update operations? Or
will the query leave its results in a buffer, so that performance is
not affected much?

Mathias Stearn

Apr 27, 2012, 3:49:32 PM
to mongod...@googlegroups.com
It will slow down updates, but not significantly, since it will ensure that the connections are good and the data is in RAM before the update runs.