Re: Update problem with getLastError (shardRawGLE)


Mathias Stearn Apr 23, 2012 9:28 AM
Posted in group: mongodb-user
It looks like you are getting a new connection. Note that the connectionIds are different on the first and second calls. Is it possible that the update is going out on a different connection from the getLastError? It might be useful to try "priming" the connection by doing a find with the same query as the update, to make sure that the connection is in a good state.
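
To make the distinction concrete, here is a minimal sketch of how the application could tell "this getLastError never saw my write" apart from "the update matched nothing", instead of treating every n = 0 as a failure. It does not use the driver: `GleReply`, `GleOutcome`, `classifyGle` and `sameConnection` are hypothetical names invented for illustration, and the rule (lastOp of 0 plus the "no write has been done on this connection" wnote) is an assumption drawn only from the two replies quoted below, not documented server behaviour.

```cpp
#include <string>

// Hypothetical, simplified view of the getLastError fields that matter here.
// This is NOT a type from the MongoDB C++ driver; fill it from the BSON reply.
struct GleReply {
    long long n;             // "n"
    long long lastOp;        // "lastOp"
    long long connectionId;  // "connectionId" reported by the shard
    std::string wnote;       // "wnote", empty if absent
    bool hasErr;             // true if "err" is not null
};

enum class GleOutcome { Updated, NoMatch, NoWriteSeen, Error };

// Assumed heuristic: a reply with lastOp == 0 and the "no write" wnote is
// describing a connection that never carried the update, so its n == 0
// says nothing about whether the update was applied.
GleOutcome classifyGle(const GleReply& r) {
    static const std::string kNoWrite =
        "no write has been done on this connection";
    if (r.hasErr) return GleOutcome::Error;
    if (r.n > 0) return GleOutcome::Updated;
    if (r.lastOp == 0 && r.wnote.find(kNoWrite) != std::string::npos)
        return GleOutcome::NoWriteSeen;
    return GleOutcome::NoMatch;  // the write was seen but matched no document
}

// Two replies from the same shard should report the same connectionId if the
// write and the getLastError travelled over the same connection.
bool sameConnection(const GleReply& a, const GleReply& b) {
    return a.connectionId == b.connectionId;
}
```

Fed with the Shard3 values from the logs below, the first reply (lastOp 0, connectionId 58) would classify as NoWriteSeen and the retry (lastOp 5734306103559192599, connectionId 51) as NoMatch. Only the second kind really tells you the query matched nothing, so blindly retrying on the first kind is what produces the confusing second log.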

On Monday, April 23, 2012 11:14:53 AM UTC-4, florian0927 wrote:
Bump.
Does anyone have an idea why this is happening?

On 23 Apr, 10:46, florian0927 <florian0...@gmail.com> wrote:
> Hi all,
>
> We are using a sharding + replica set configuration, and recently we
> have been facing a strange error.
>
> When all servers are online, everything is OK. However, if the master
> of a replica set goes offline for a short time (with some operations
> taking place while it is offline), it is re-elected as the master of
> the shard as soon as it comes back online.
> After that, the problem appears:
> In the log of our application, we see that the getlasterror for update
> operations sometimes returns a shardRawGLE with "n" : 0 (but the
> update has somehow already been applied, without any log entry).
> As the getlasterror returns n = 0, our application considers that the
> update failed, and it retries the same update operation
> with the same conditions. For these retries, we get a different
> getLastError in the log, showing that updatedExisting is false (which
> means that the update was done by the first try?).
>
> ------------------
> Here's the code of updates and getLastErrors :
> m_pDBConnection->GetConnection()->update(tableName, m_objQuery,
>         m_objFields, false, false);
> m_pDBConnection->GetConnection()->runCommand(
>         m_pDBConnection->GetDBName(),
>         BSON("getlasterror" << 1 << "w" << "majority" << "wtimeout" << 10000),
>         objError);
> int iRows = objError.getIntField("n");
> if (iRows > 0)
> {
>         if( pcRowsAffected )
>                 *pcRowsAffected = iRows;
>         EDPTRACE_DBG_INF(*********); // Application log for a normal execution
> }
> else
> {
>         if( pcRowsAffected )
>                 *pcRowsAffected = 0;
>         EDPTRACE_DBG_ERR(**********); // Application log showing that an error occurred
> }
>
> --------------------
> Here are the two different getLastErrors:
> 1) First try :
>    Error description : { "shards" :
> [ "Shard1/10.1.6.130:2222,10.1.64.207:2222",
> "Shard2/10.1.64.102:2223,10.1.64.207:2223",
> "Shard3/10.1.6.130:2224,10.1.64.102:2224,10.1.64.207:2224" ],
> "shardRawGLE" : { "Shard1/10.1.6.130:2222,10.1.64.207:2222" : { "n" :
> 0, "lastOp" : 0, "connectionId" : 52600, "wnote" : "no write has been
> done on this connection", "wtime" : 0, "err" : null, "ok" : 1 },
> "Shard2/10.1.64.102:2223,10.1.64.207:2223" : { "n" : 0, "lastOp" : 0,
> "connectionId" : 20070, "wnote" : "no write has been done on this
> connection", "wtime" : 0, "err" : null, "ok" : 1 },
> "Shard3/10.1.6.130:2224,10.1.64.102:2224,10.1.64.207:2224" :
> { "writeback" : { "$oid" : "4f9459054657a7c59703e612" },
> "instanceIdent" : "LY-YANG:2224", "n" : 0, "lastOp" : 0,
> "connectionId" : 58, "wnote" : "no write has been done on this
> connection", "wtime" : 0, "err" : null, "ok" : 1 } }, "n" : 0, "err" :
> null, "ok" : 1 }
> 2) All tries after the first :
>   Error description : { "singleShard" :
> "Shard3/10.1.6.130:2224,10.1.64.102:2224,10.1.64.207:2224",
> "updatedExisting" : false, "n" : 0, "lastOp" : 5734306103559192599,
> "connectionId" : 51, "err" : null, "ok" : 1, "writeback" : { "$oid" :
> "4f9459194657a7c59703e614" }, "instanceIdent" : "LY-YANG:2224",
> "wnote" : "no write has been done on this connection", "wtime" : 0,
> "writebackGLE" : { "singleShard" :
> "Shard3/10.1.6.130:2224,10.1.64.102:2224,10.1.64.207:2224",
> "updatedExisting" : false, "n" : 0, "lastOp" : 5734306103559192599,
> "connectionId" : 51, "err" : null, "ok" : 1 }, "initialGLEHost" :
> "Shard3/10.1.6.130:2224,10.1.64.102:2224,10.1.64.207:2224" }
> ---------------------
> We are using 2.0.2 with the C++ driver.
> Here's a similar issue, but I couldn't find the cause there (it is said
> that the error could be caused by data migration, but in my case a long
> time passed between the master coming back online and the update
> operations, so the synchronisation had certainly finished before the
> updates):
> http://groups.google.com/group/mongodb-user/browse_thread/thread/600d...