java.lang.IllegalArgumentException: invalid hexadecimal representation of an ObjectId: [MinKey]


tjacks...@gmail.com

Aug 21, 2015, 10:03:38 AM
to mongodb-user
I am following the mongo connector's approach to obtain chunk information from a sharded MongoDB cluster.

However, when I use the min and max keys obtained this way to query the collection, some min keys appear to be empty, and the query throws

java.lang.IllegalArgumentException: invalid hexadecimal representation of an ObjectId: [MinKey]

where the printed MinKey is the min key value, which looks empty.

What might cause this problem? Or what value (e.g. -1?) can I use to replace it?

Thanks 
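
For context: in a sharded collection the lowest chunk is bounded below by the special BSON value MinKey and the highest chunk is bounded above by MaxKey. Neither is an ObjectId, so neither can be parsed from a hexadecimal string. Chunk ranges are half-open (min inclusive, max exclusive), so one option is to leave out the bound rather than substitute a value. The sketch below is driver-free Python; MIN_KEY, MAX_KEY, and chunk_range_filter are hypothetical names for illustration, and plain strings stand in for real ObjectId values.

```python
# Driver-free sketch: MIN_KEY and MAX_KEY are hypothetical stand-ins for the
# BSON MinKey/MaxKey sentinels that bound the first and last chunk.
MIN_KEY = object()
MAX_KEY = object()


def chunk_range_filter(shard_key, chunk_min, chunk_max):
    """Build a query filter for the half-open chunk range [chunk_min, chunk_max)."""
    bounds = {}
    if chunk_min is not MIN_KEY:
        bounds["$gte"] = chunk_min  # lower bound is inclusive
    if chunk_max is not MAX_KEY:
        bounds["$lt"] = chunk_max  # upper bound is exclusive
    # A chunk spanning MinKey..MaxKey needs no restriction on the key at all.
    return {shard_key: bounds} if bounds else {}


# First chunk: no lower bound is emitted, only the exclusive upper bound.
first = chunk_range_filter("_id", MIN_KEY, "55cf6a4cafd6a72e4bb33e1e")
# first == {"_id": {"$lt": "55cf6a4cafd6a72e4bb33e1e"}}
```

With a real driver, the string placeholders would be ObjectId instances built only from bounds that are actual ObjectIds.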

Luke Lovett

Aug 21, 2015, 12:58:03 PM
to mongodb-user
Hello,

Can you please share:

1. The full stack trace
2. The options you are setting on the MongoDB Hadoop connector

Thanks!

tjacks...@gmail.com

Aug 21, 2015, 2:25:18 PM
to mongodb-user


Sorry, I think I was looking at this the wrong way. After double-checking, I am able to query the data by range (also using the Java driver), like

db.mycollection.find({"_id": { $gte: ObjectId("55cf6a4cafd6a72e4bb33e1e"), $lt: ObjectId("55cf6a4cafd6a72e4bb33f69")}})

But I have a few more questions now. Apologies for changing to questions that may not match my original one.

Basically, I wrote my own logic based on the mongo connector: finding chunks in shards, etc. (It's interesting to learn how MongoDB's internal logic works.) I compared my output (chunk and shard info) with that of db.printShardingStatus(), and the numbers are the same. For instance,

db.printShardingStatus() result: (shard key is "_id")

{
shard key: { "_id" : 1 }
chunks:
shard0000 17
shard0001 17
shard0002 16
shard0003 16
}

My code output result:
4 shards with 66 chunks
shard0003 16
shard0000 17
shard0002 16
shard0001 17

But I find that when looping through those chunks (connecting to each shard directly), the total number of records doesn't match the count in the mongo shell. For example, db.mycollection.find().count() reports 2640 records, but the same query through the mongo driver returns only around 340 records (two processes handling 200 + 140). Does that mean one can't simply read data by chunk range? Are additional steps needed so that separate processes can each read part of the data from MongoDB, achieving an effect like MapReduce?

Thanks

Luke Lovett

Aug 21, 2015, 3:00:07 PM
to mongodb-user
If the balancer is still running or if there are any orphaned documents lying around on the shards, then a count() executed on the mongos may not return the same number of documents as the sum of the count()s on each shard individually. Check out the Chunk Migration Procedure here for more details on this: http://docs.mongodb.org/manual/core/sharding-chunk-migration/#chunk-migration-procedure

I'm not sure that I understand the specific situation you're describing, though. Is either the mongo shell or your application connecting to a mongos? Is the other connected to a single shard? With the balancer turned off, the sum of the counts of each of the chunks determined manually should be about equal to the count through the mongos.
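
To illustrate why the per-chunk counts should add up, here is a small driver-free sketch in Python. It assumes chunk ranges are half-open and, taken together, cover the whole key space; integers stand in for ObjectId shard keys, None stands in for the MinKey/MaxKey bounds, and all function names are hypothetical.

```python
def in_chunk(key, lo, hi):
    """True if key falls in the half-open range [lo, hi); None means unbounded."""
    return (lo is None or key >= lo) and (hi is None or key < hi)


def count_per_chunk(keys, chunks):
    """Count how many keys fall into each (lo, hi) chunk range."""
    return [sum(in_chunk(k, lo, hi) for k in keys) for lo, hi in chunks]


def covers_key_space(chunks):
    """Check that ordered chunks run from unbounded to unbounded with no gaps."""
    if not chunks or chunks[0][0] is not None or chunks[-1][1] is not None:
        return False
    return all(a[1] == b[0] for a, b in zip(chunks, chunks[1:]))


# Toy data: 2640 documents split across three contiguous chunks.
keys = list(range(2640))
chunks = [(None, 1000), (1000, 2000), (2000, None)]

counts = count_per_chunk(keys, chunks)
# counts == [1000, 1000, 640], which sums to 2640
```

In this toy model every key lands in exactly one chunk, so the per-chunk counts sum to the total. If the chunk list does not pass a check like covers_key_space (for example, because the first or last chunk was skipped), the sum comes up short.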

As a side note, if your goal is to use range queries to build splits with the Hadoop connector, you can use this option to do so: https://github.com/mongodb/mongo-hadoop/wiki/Configuration-Reference#mongoinputsplituse_range_queries. Note that 'mongo.input.query' cannot contain the shard key if you use this option, since the connector will build a range query using the shard key in this case.