Error while using mongo hadoop connector with input splits enabled


TC

unread,
Jan 6, 2015, 6:12:59 PM1/6/15
to mongod...@googlegroups.com
I have a pig script that reads from a sharded mongodb collection using the mongo hadoop connector, and it works fine as long as mongo.input.split.create_input_splits is set to 'false'. When I set mongo.input.split.create_input_splits to 'true', I get the following error:
 
2015-01-06 22:30:59,352 [JobControl] INFO  com.mongodb.hadoop.splitter.MongoSplitterFactory  - Retrieved Collection stats:{ "serverUsed" : <...> , "ok" : 1.0 }
  2015-01-06 22:30:59,359 [JobControl] INFO  org.apache.hadoop.mapreduce.JobSubmitter  - Cleaning up the staging area /user/hdfs/.staging/job_1419282352684_0171
  2015-01-06 22:30:59,367 [JobControl] WARN  org.apache.hadoop.security.UserGroupInformation  - PriviledgedActionException as:hdfs (auth:SIMPLE) cause:org.apache.pig.backend.executionengine.ExecException: ERROR 2118: not authorized for query on config.chunks
  2015-01-06 22:30:59,367 [JobControl] INFO  org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob  - PigLatin:pigtest.pig got an error while submitting
  org.apache.pig.backend.executionengine.ExecException: ERROR 2118: not authorized for query on config.chunks
  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:288)
  at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:493)
  at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:510)
  at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:394)
  at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1295)
  at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1292)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:415)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
  at org.apache.hadoop.mapreduce.Job.submit(Job.java:1292)
  at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
  at org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:191)
  at java.lang.Thread.run(Thread.java:745)
  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:270)
  Caused by: com.mongodb.MongoException: not authorized for query on config.chunks
  at com.mongodb.MongoException.parse(MongoException.java:82)
  at com.mongodb.DBApiLayer$MyCollection.__find(DBApiLayer.java:292)
  at com.mongodb.DBApiLayer$MyCollection.__find(DBApiLayer.java:273)
  at com.mongodb.DBCursor._check(DBCursor.java:368)
  at com.mongodb.DBCursor._hasNext(DBCursor.java:459)
  at com.mongodb.DBCursor.hasNext(DBCursor.java:484)
  at com.mongodb.hadoop.splitter.ShardChunkMongoSplitter.calculateSplits(ShardChunkMongoSplitter.java:90)
  at com.mongodb.hadoop.MongoInputFormat.getSplits(MongoInputFormat.java:58)
  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:274)
 
The MongoDB cluster is version 2.4.12, running against CDH 5.1.0 with mongo-hadoop connector 1.3.0.
 
I have set the mongo.auth.uri to use an admin user and the admin database. In the mongo.input.uri, I am using the same admin user credentials. I have connected directly to the mongo cluster using those credentials, and I can read the config database so I know the credentials are correct.
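The check I ran from the mongo shell looked roughly like this (the server name and credentials are placeholders for my real ones):

```javascript
// Rough sketch of how I verified access, authenticating as the same
// admin user used in mongo.auth.uri (names/credentials are placeholders):
var conn = new Mongo("server:27017");
var admin = conn.getDB("admin");
admin.auth("adminuser", "pw");

// The shard-chunk splitter queries config.chunks to build input splits,
// so this read should succeed with the same credentials:
conn.getDB("config").chunks.find().limit(1).forEach(printjson);
```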
 
The pig script looks like this:
 
set mongo.input.split.create_input_splits true;
set mongo.input.split.read_shard_chunks true;
set mongo.auth.uri 'mongodb://adminuser:pw@server:27017/admin';
set mongo.input.uri 'mongodb://adminuser:pw@server:27017/myDB.myCollection';
mydata = LOAD 'mongodb://server:27017/myDB.myCollection' USING com.mongodb.hadoop.pig.MongoLoader();
 
What am I doing wrong here?
 
Thanks,
TC
 

Justin Lee

unread,
Jan 7, 2015, 8:26:25 AM1/7/15
to mongod...@googlegroups.com

--------------------------------

{ name     : "Justin Lee",
  title    : "Software Engineer",
  twitter  : "@evanchooly",
  web      : [ "10gen.com", "antwerkz.com" ],
  location : "New York, NY" }


TC

unread,
Jan 7, 2015, 11:17:10 AM1/7/15
to mongod...@googlegroups.com
Hi Justin,
 
I've already tried creating a local user on the input database, but even after assigning the clusterAdmin role to that user, I still get the same error. If I try to create the user with read access on the config database (using "otherDBRoles"), I get an error saying that "otherDBRoles" is only valid for admin.system.users. I also believe the clusterAdmin role has no effect for users that aren't in admin.system.users.
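As far as I can tell from the 2.4 docs, "otherDBRoles" is only accepted for users defined in the admin database itself, so the user would have to be created there, something like this (the user name and roles are made up):

```javascript
// Hypothetical sketch: on MongoDB 2.4, "otherDBRoles" is only accepted
// for users stored in admin.system.users, so the user must be created
// on the admin database itself (run from the mongo shell):
var admin = db.getSiblingDB("admin");
admin.addUser({
    user: "hadoopUser",          // made-up user name
    pwd: "secret",
    roles: ["readAnyDatabase"],  // admin-database role
    otherDBRoles: {
        config: ["read"],        // lets the splitter read config.chunks
        myDB:   ["read"]         // read access to the input collection
    }
});
```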
 
I figure that using the admin credentials in the mongo.input.uri should work, since the admin user has access to everything. Also, I can't easily upgrade to MongoDB 2.6 at the moment.

TC

unread,
Jan 7, 2015, 2:45:17 PM1/7/15
to mongod...@googlegroups.com
Could this be the same bug as the one described here: https://jira.mongodb.org/browse/HADOOP-171?
 
 


TC

unread,
Jan 12, 2015, 10:19:17 AM1/12/15
to mongod...@googlegroups.com
I just updated to the latest code branch of the mongo hadoop connector, and now everything seems to work fine.