java.lang.IllegalArgumentException: invalid hexadecimal representation of an ObjectId: [MinKey]

tjacks...@gmail.com

Aug 21, 2015, 10:03:38 AM

to mongodb-user

I am following the approach used by the mongo connector to obtain chunk information from a sharded MongoDB cluster:

https://github.com/mongodb/mongo-hadoop/blob/master/core/src/main/java/com/mongodb/hadoop/splitter/ShardChunkMongoSplitter.java

However, when I use the min and max keys obtained this way to query the collection, some min keys seem to be empty, and the query throws

java.lang.IllegalArgumentException: invalid hexadecimal representation of an ObjectId: [MinKey]

where MinKey is the printed value of the min key, which looks empty.

What might lead to this problem? Or what value (e.g. -1?) can I use to replace it?
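One plausible cause, stated as an assumption since the thread never confirms it: in a sharded collection the first chunk's lower bound is the special MinKey value and the last chunk's upper bound is MaxKey. Neither is an ObjectId, so passing their string form to the ObjectId constructor would fail with exactly this kind of message. The Java sketch below uses only the standard library. The Bound enum and the rangeFilter helper are hypothetical stand-ins (the driver's own sentinel types are org.bson.types.MinKey and org.bson.types.MaxKey), the filter is modeled as a plain Map rather than a driver query object, and the hex strings are placeholders.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ChunkRange {
    // Hypothetical stand-ins for the driver's MinKey/MaxKey sentinel types.
    enum Bound { MIN_KEY, MAX_KEY }

    // Builds a filter shaped like {_id: {$gte: min, $lt: max}}, leaving out
    // any side that is unbounded so that a sentinel is never parsed as an
    // ObjectId hex string.
    static Map<String, Object> rangeFilter(Object min, Object max) {
        Map<String, Object> range = new LinkedHashMap<>();
        if (min != Bound.MIN_KEY) {
            range.put("$gte", min);
        }
        if (max != Bound.MAX_KEY) {
            range.put("$lt", max);
        }
        Map<String, Object> filter = new LinkedHashMap<>();
        if (!range.isEmpty()) {
            filter.put("_id", range);
        }
        return filter;
    }

    public static void main(String[] args) {
        // First chunk: no lower bound, so only $lt is emitted.
        System.out.println(rangeFilter(Bound.MIN_KEY, "55cf6a4cafd6a72e4bb33e1e"));
        // Middle chunk: both bounds are ordinary keys.
        System.out.println(rangeFilter("55cf6a4cafd6a72e4bb33e1e", "55cf6a4cafd6a72e4bb33f69"));
        // Last chunk: no upper bound, so only $gte is emitted.
        System.out.println(rangeFilter("55cf6a4cafd6a72e4bb33f69", Bound.MAX_KEY));
    }
}
```

With the real driver, the same check would be done on the type of the bound value read from the chunk document before any conversion to ObjectId is attempted.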

Thanks

Luke Lovett

Aug 21, 2015, 12:58:03 PM

to mongodb-user

Hello,

Can you please share:

1. The full stack trace
2. The options you are setting on the MongoDB Hadoop connector

Thanks!

tjacks...@gmail.com

Aug 21, 2015, 2:25:18 PM

to mongodb-user

Sorry, I think I was looking at this the wrong way. After double-checking, I am able to query the data with a range query (also using the Java driver), like

db.mycollection.find({"_id": { $gte: ObjectId("55cf6a4cafd6a72e4bb33e1e"), $lt: ObjectId("55cf6a4cafd6a72e4bb33f69")}})

But I have a few more questions now. Apologies for changing to questions that may not match my original one.

Basically, I wrote my own logic based on the mongo connector: find the chunks in each shard, etc. (It's interesting to learn how MongoDB's internal logic works.) I compared the output of my logic (chunk and shard info) with db.printShardingStatus(), and the numbers are the same. For instance,

db.printShardingStatus() result: (shard key is "_id")

{
  shard key: { "_id" : 1 }
  chunks:
    shard0000 17
    shard0001 17
    shard0002 16
    shard0003 16
}

My code output result:

4 shards with 66 chunks
shard0003 16
shard0000 17
shard0002 16
shard0001 17

But I find that when looping through those chunks (connecting to each shard directly), the total number of records doesn't match the count in the mongo shell. For example, db.mycollection.find().count() reports 2640 records, but the same query through the mongo driver finds only around 340 records (two processes that handle 200 + 140). Does that mean one can't simply obtain the data through chunk ranges? Are additional steps needed so that separate processes can each read part of the data from MongoDB, achieving an effect like MapReduce?

Thanks

Luke Lovett

Aug 21, 2015, 3:00:07 PM

to mongodb-user

If the balancer is still running or if there are any orphaned documents lying around on the shards, then a count() executed on the mongos may not return the same number of documents as the sum of the count()s on each shard individually. Check out the Chunk Migration Procedure here for more details on this: http://docs.mongodb.org/manual/core/sharding-chunk-migration/#chunk-migration-procedure

I'm not sure that I understand the specific situation you're describing, though. Is either the mongo shell or your application connecting to a mongos? Is the other connected to a single shard? With the balancer turned off, the sum of the counts of each of the chunks determined manually should be about equal to the count through the mongos.
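To make that comparison concrete, here is a minimal Java sketch that adds up the counts seen by each reader and reports the gap against the count returned through the mongos. It is pure arithmetic and does not talk to MongoDB. CountCheck and its methods are hypothetical names, and the figures are the approximate ones reported earlier in this thread (two processes reading about 200 and 140 documents, against 2640 from the shell).

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CountCheck {
    // Sums the document counts reported by each reader (process or shard).
    static long total(Map<String, Long> counts) {
        long sum = 0;
        for (long c : counts.values()) {
            sum += c;
        }
        return sum;
    }

    // Positive: the mongos count is higher than what the chunk scan read.
    // Negative: the scan read more documents than the mongos count reports.
    static long discrepancy(long mongosCount, Map<String, Long> counts) {
        return mongosCount - total(counts);
    }

    public static void main(String[] args) {
        // Approximate figures from the thread, keyed by hypothetical labels.
        Map<String, Long> seen = new LinkedHashMap<>();
        seen.put("process-1", 200L);
        seen.put("process-2", 140L);
        long mongosCount = 2640L;

        System.out.println("scanned=" + total(seen)
                + " mongos=" + mongosCount
                + " gap=" + discrepancy(mongosCount, seen));
    }
}
```

The sketch only quantifies the gap; it does not explain it. Running the same check per chunk range rather than per process would show which ranges account for the difference.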

As a side note, if your goal is to use range queries to build splits with the Hadoop connector, you can use this option to do so: https://github.com/mongodb/mongo-hadoop/wiki/Configuration-Reference#mongoinputsplituse_range_queries. Note that 'mongo.input.query' cannot contain the shard key if you use this option, since the connector will build a range query using the shard key in this case.
