Starter having trouble starting.

maxsap

May 20, 2012, 8:03:26 AM

to mongodb-user

Hello all, I have an existing application that uses MongoDB as the
data layer. Currently I am using MongoDB's MapReduce mechanism, but I
am facing some performance issues, so I thought of using Hadoop to
implement that logic.
I ran the treasury-yield example successfully and thought of
creating a simple project just to get to know the mongo-hadoop driver.
So I created a project, added the appropriate jar files to the build
path, and ran it.
This is my Java code:
"final Configuration conf = new Configuration();
MongoConfigUtil.setInputURI( conf, "mongodb://username:pass...@192.168.1.198/locations" );
MongoConfigUtil.setOutputURI( conf, "mongodb://localhost/test.out" );
System.out.println( "Conf: " + conf );

final Job job = new Job( conf, "word count" );

job.setJarByClass( WordCount.class );

job.setMapperClass( TokenizerMapper.class );

job.setCombinerClass( IntSumReducer.class );
job.setReducerClass( IntSumReducer.class );

job.setOutputKeyClass( Text.class );
job.setOutputValueClass( IntWritable.class );

job.setInputFormatClass( MongoInputFormat.class );
job.setOutputFormatClass( MongoOutputFormat.class );

System.exit( job.waitForCompletion( true ) ? 0 : 1 );"

but I am getting this error:

"Conf: Configuration: core-default.xml, core-site.xml
12/05/20 14:12:03 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/05/20 14:12:03 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/05/20 14:12:03 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
12/05/20 14:12:03 INFO mapred.JobClient: Cleaning up the staging area file:/tmp/hadoop-maximos/mapred/staging/maximos1261801897/.staging/job_local_0001
Exception in thread "main" java.lang.NullPointerException
at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:796)
at com.mongodb.DBApiLayer.doGetCollection(DBApiLayer.java:116)
at com.mongodb.DBApiLayer.doGetCollection(DBApiLayer.java:43)
at com.mongodb.DB.getCollection(DB.java:81)
at com.mongodb.hadoop.util.MongoSplitter.calculateSplits(MongoSplitter.java:51)
at com.mongodb.hadoop.MongoInputFormat.getSplits(MongoInputFormat.java:51)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:962)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:979)
at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:897)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
at com.mongodb.hadoop.examples.wordcount.WordCount.main(WordCount.java:100)"

What am I doing wrong? Is this a MongoDB, Hadoop, or mongo-hadoop
problem?
Thanks in advance, maxsap.

maxsap

May 24, 2012, 3:30:33 AM

to mongodb-user

Does anyone have any idea?

Asya Kamsky

May 26, 2012, 11:16:35 AM

to mongod...@googlegroups.com

It looks like you forgot to specify the name of the collection you are reading the data from.

In the example the line looks like this:

MongoConfigUtil.setInputURI( conf, "mongodb://localhost/test.in" );

However, in your code I see:

MongoConfigUtil.setInputURI( conf, "mongodb://username:passw...@192.168.1.198/locations" );

I'm not sure whether locations is the collection name or the database name. If it's the collection, try prefixing it with the database name. If it's the database, add .yourcollectionname to the end of it.
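The path part of the input URI is a namespace of the form database.collection. Because MongoDB database names cannot contain dots, everything before the first dot is the database and everything after it, dots included, is the collection. The plain-Java sketch below illustrates that rule with a hypothetical splitNamespace helper; it is not part of the Java driver or mongo-hadoop, which do their own URI parsing.

```java
import java.util.Arrays;

public class NamespaceExample {

    // Hypothetical helper (not a driver or mongo-hadoop API): splits a
    // "database.collection" namespace. Database names cannot contain '.',
    // so the first dot is the boundary and later dots belong to the collection.
    static String[] splitNamespace(String namespace) {
        int dot = namespace.indexOf('.');
        if (dot <= 0 || dot == namespace.length() - 1) {
            throw new IllegalArgumentException("No collection in namespace '" + namespace
                    + "'; expected database.collection");
        }
        return new String[] { namespace.substring(0, dot), namespace.substring(dot + 1) };
    }

    public static void main(String[] args) {
        // The example's input URI path: database "test", collection "in".
        System.out.println(Arrays.toString(splitNamespace("test.in")));
        // Database "locations", collection "venues.Newsletter".
        System.out.println(Arrays.toString(splitNamespace("locations.venues.Newsletter")));
        // Database only, as in the original URI: no collection is named.
        try {
            splitNamespace("locations");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With only "locations" in the URI there is no collection name at all, which is consistent with the NullPointerException coming out of DB.getCollection in the first stack trace.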

maxsap

May 26, 2012, 1:49:30 PM

to mongodb-user

OK, I changed it to /locations.venues.Newsletter, but now I get this error:
"Conf: Configuration: core-default.xml, core-site.xml
12/05/26 20:47:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/05/26 20:47:40 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/05/26 20:47:40 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
12/05/26 20:47:40 INFO util.MongoSplitter: Calculate Splits Code ... Use Shards? false, Use Chunks? true; Collection Sharded? false
12/05/26 20:47:40 INFO util.MongoSplitter: Creation of Input Splits is enabled.
12/05/26 20:47:40 INFO util.MongoSplitter: Using Unsharded Split mode (Calculating multiple splits though)
12/05/26 20:47:40 INFO util.MongoSplitter: Calculating unsharded input splits on namespace 'locations.venues.Newsletter' with Split Key '{ "_id" : 1}' and a split size of '8'mb per
12/05/26 20:47:40 INFO mapred.JobClient: Cleaning up the staging area file:/tmp/hadoop-maximos/mapred/staging/maximos627483422/.staging/job_local_0001
Exception in thread "main" java.lang.IllegalArgumentException: Unable to calculate input splits: db assertion failure
at com.mongodb.hadoop.util.MongoSplitter.calculateUnshardedSplits(MongoSplitter.java:106)
at com.mongodb.hadoop.util.MongoSplitter.calculateSplits(MongoSplitter.java:75)
at com.mongodb.hadoop.MongoInputFormat.getSplits(MongoInputFormat.java:51)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:962)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:979)
at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:897)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
at com.mongodb.hadoop.examples.wordcount.WordCount.main(WordCount.java:100)"

I think it has to do with this -> { "_id" : 1}, but how can I change the
split key in mongo-hadoop?

Asya Kamsky

May 27, 2012, 1:07:07 AM

to mongod...@googlegroups.com

On Saturday, May 26, 2012 10:49:30 AM UTC-7, maxsap wrote:

I think it has to do with this ->{ "_id" : 1} but how can I change the
splitKey in mongo-hadoop?

It looks like you may be trying to run the example "as is" after changing only the URI strings; is that right?

The example is using _id as the split key, which may not make sense with your data.

You can change any of the defaults in two places: in the XML config file, or programmatically in your code. I believe the example directory demonstrates both methods.

To make sure nothing is missing, can you post your exact database name and collection name, what sort of documents are in the collection, and exactly what changes you made to the example code/configuration to run it? There may be something else that needs to be changed.

Asya

maxsap

May 29, 2012, 3:04:33 PM

to mongodb-user

Hello Asya, and sorry for the late response.
I have changed the code to this:
"
Comments c = new Comments();
MongoConfigUtil.setInputSplitKey(conf, c);
MongoConfigUtil.setInputURI( conf, "mongodb://user:pass...@192.168.1.198:27017/locations.Comments" );"

where the Comments class is declared as "public class Comments extends BasicDBObject",
but I still get the IllegalArgumentException, and the message has changed to:
"12/05/29 21:58:28 INFO util.MongoSplitter: Calculating unsharded input splits on namespace 'locations.Comments' with Split Key '{ }' and a split size of '8'mb per"

Correct me if I am wrong, but I think the driver is trying to split the
data with the given key, and something is wrong with the key.
The layout shown in MongoVUE is:
-locations
------+Collections
------------+Comments
Thanks in advance, maxsap.
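A split key is a key-pattern document naming the field(s) to split on, the same shape as an index specification such as { "VenueID" : 1 }. Comments extends BasicDBObject, but nothing is ever put into it, so it serializes to { } and names no field, which matches the "Split Key '{ }'" log line. The sketch below models the key as an ordered Map instead of a BasicDBObject so it runs without the driver; requireSplitFields is a hypothetical check, not something mongo-hadoop provides.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SplitKeyExample {

    // Hypothetical pre-flight check (not a mongo-hadoop API): a split key
    // pattern such as { "VenueID" : 1 } must name at least one field.
    static List<String> requireSplitFields(Map<String, Integer> keyPattern) {
        if (keyPattern.isEmpty()) {
            throw new IllegalArgumentException("Split key '{ }' names no field to split on");
        }
        // LinkedHashMap keeps insertion order, like the fields of a BSON document.
        return new ArrayList<>(keyPattern.keySet());
    }

    public static void main(String[] args) {
        // Stand-in for "new Comments()": a BasicDBObject subclass with no fields.
        Map<String, Integer> empty = new LinkedHashMap<>();

        // Stand-in for a key document holding VenueID with direction 1.
        Map<String, Integer> byVenue = new LinkedHashMap<>();
        byVenue.put("VenueID", 1);

        System.out.println(requireSplitFields(byVenue)); // [VenueID]
        try {
            requireSplitFields(empty);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With the Java driver, the non-empty key would be new BasicDBObject( "VenueID", 1 ), the same document the later query.put( "VenueID", 1 ) snippet builds.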

maxsap

May 30, 2012, 2:25:46 PM

to mongodb-user

I just changed the split key to this:
"
com.mongodb.BasicDBObject query = new com.mongodb.BasicDBObject();
query.put( "VenueID",1);
MongoConfigUtil.setInputSplitKey(conf, query);
"
but no luck; the error remains the same.

Asya Kamsky

May 30, 2012, 9:23:45 PM

to mongod...@googlegroups.com

Maxsap:

As a sanity check, could you provide the following information (or confirm or correct it where I have it wrong):

Your database is called "locations"

Your collection is called "Comments"

The field in this collection on which you would like to split your job is "VenueID"

You have an index on this field
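The index question matters if, as assumed here, the unsharded split mode asks the server for split points with MongoDB's splitVector command, which requires an index whose key pattern begins with the split key. For the settings in this thread the request would look roughly like the document built below. It is modelled as an ordered Map so the sketch runs without the driver; splitVectorCommand is a hypothetical helper, and the exact fields mongo-hadoop sends may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SplitVectorExample {

    // Hypothetical helper: builds a splitVector-style command document.
    // Field order matters in MongoDB commands, so the command name goes first.
    static Map<String, Object> splitVectorCommand(String namespace,
                                                  Map<String, Integer> keyPattern,
                                                  int maxChunkSizeMb) {
        Map<String, Object> cmd = new LinkedHashMap<>();
        cmd.put("splitVector", namespace);       // full "database.collection" name
        cmd.put("keyPattern", keyPattern);       // must be a prefix of an existing index
        cmd.put("maxChunkSize", maxChunkSizeMb); // target split size in MB
        return cmd;
    }

    public static void main(String[] args) {
        Map<String, Integer> key = new LinkedHashMap<>();
        key.put("VenueID", 1);
        // Mirrors the log: namespace 'locations.Comments', split size of 8 MB.
        System.out.println(splitVectorCommand("locations.Comments", key, 8));
    }
}
```

If you have shell access, running the equivalent db.runCommand( { splitVector: "locations.Comments", keyPattern: { VenueID: 1 }, maxChunkSize: 8 } ) against the locations database should surface the server's own error message, which is likely to be more specific than "db assertion failure".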

Please attach the full output of the last thing you tried (with setInputSplitKey on VenueID), including the full stack trace (make sure you include the line starting with "Calculating unsharded input splits on namespace").

Thanks,

Asya

maxsap

Jun 1, 2012, 3:21:39 PM

to mongodb-user

Yes, that is correct: the database is called locations and the
collection is called Comments.
The only difference is that the field does not have an index.

maxsap

Jun 5, 2012, 4:34:03 PM

to mongodb-user

I have added an index on VenueID, but no luck. The console
output is:
"Conf: Configuration: core-default.xml, core-site.xml
12/06/05 23:30:13 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/06/05 23:30:13 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/06/05 23:30:13 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
12/06/05 23:30:13 INFO util.MongoSplitter: Calculate Splits Code ... Use Shards? false, Use Chunks? true; Collection Sharded? false
12/06/05 23:30:13 INFO util.MongoSplitter: Creation of Input Splits is enabled.
12/06/05 23:30:13 INFO util.MongoSplitter: Using Unsharded Split mode (Calculating multiple splits though)
12/06/05 23:30:13 INFO util.MongoSplitter: Calculating unsharded input splits on namespace 'locations.Comments' with Split Key '{ "VenueID" : 1}' and a split size of '8'mb per
12/06/05 23:30:13 INFO mapred.JobClient: Cleaning up the staging area file:/tmp/hadoop-maximos/mapred/staging/maximos-1696961709/.staging/
106)"

Brendan W. McAdams

Jun 6, 2012, 10:33:01 AM

to mongod...@googlegroups.com

Is your MongoDB setup sharded? This error typically occurs when you have a sharded MongoDB setup but your input collection is unsharded. Current releases of MongoDB do not allow Hadoop to calculate new splits in that situation.

You'll need to disable "Calculate Input Splits" in this case and/or shard your input collection.
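The advice above can be summarized as a small decision table. The sketch below is a simplified, hypothetical model of that reasoning (the SplitPlan enum and chooseSplitPlan method are illustrative names, not mongo-hadoop's actual splitter logic): with input-split creation disabled the job reads the collection as one split, a sharded collection can be split along the cluster's own partitioning, and an unsharded collection in a sharded setup cannot get calculated splits, which is the failing case here.

```java
public class SplitPlanExample {

    enum SplitPlan { SINGLE_SPLIT, SHARD_OR_CHUNK_SPLITS, CALCULATED_SPLITS, UNSUPPORTED }

    // Simplified model of the advice in this thread, not mongo-hadoop's code.
    static SplitPlan chooseSplitPlan(boolean createInputSplits,
                                     boolean shardedSetup,
                                     boolean collectionSharded) {
        if (!createInputSplits) {
            return SplitPlan.SINGLE_SPLIT;           // whole collection as one input split
        }
        if (collectionSharded) {
            return SplitPlan.SHARD_OR_CHUNK_SPLITS;  // reuse the cluster's own partitioning
        }
        if (shardedSetup) {
            return SplitPlan.UNSUPPORTED;            // unsharded collection in a sharded setup
        }
        return SplitPlan.CALCULATED_SPLITS;          // unsharded deployment: compute split points
    }

    public static void main(String[] args) {
        // This thread, if the setup is sharded: splits enabled, "Collection Sharded? false".
        System.out.println(chooseSplitPlan(true, true, false));
        // The two workarounds: disable split calculation, or shard the input collection.
        System.out.println(chooseSplitPlan(false, true, false));
        System.out.println(chooseSplitPlan(true, true, true));
    }
}
```

The log line "Creation of Input Splits is enabled." suggests split creation is a configurable option in the mongo-hadoop version being used; check that version's MongoConfigUtil and configuration keys for the exact setting name before relying on one.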
