"Wrong FS" error using MongoStorage with Pig

Andrea Leistra

Nov 24, 2010, 11:31:54 AM
to mongodb-user
I am trying to use MongoStorage to write to MongoDB from Pig. I have
MongoDB and mongo-hadoop installed and running (in my own directory; I
do not have root permissions on the machine). We are using the
Cloudera distribution of Hadoop.

When I try to run the modified pigtutorial.pig from the download, I
get the following error after it gets to 100% completion:


"ERROR org.apache.pig.backend.tools.grunt.Grunt - ERROR 2999:
Unexpected internal error. Wrong FS: mongodb://localhost/test.pig.output,
expected: hdfs://usredhdp00-priv"

This occurs whether or not I give the full path and/or machine name to
the output file rather than "localhost" (though the target changes, as
expected). How do I fix this?

Thank you,
Andrea Leistra

Brendan W. McAdams

Nov 24, 2010, 12:32:32 PM
to mongod...@googlegroups.com
Andrea,

Hopefully we can sort this out.  

The last line is:

STORE ordered_uniq_frequency INTO 'mongodb://localhost/demo.pig.output' USING com.mongodb.hadoop.pig.MongoStorage;

Right?

The HDFS error usually occurs when MongoStorage doesn't get loaded, as Pig then falls back to the default (HDFS) storage.

What directory are you running the script from?  

Lines 26-28 of the script do a REGISTER of the necessary jar files, and if any of those isn't found or loaded correctly, this issue can occur.

I'd recommend setting the paths in those REGISTER statements to point at exactly where the jars live, relative to where you're running Pig from.
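
For example, a minimal sketch of what those REGISTER lines might look like with explicit paths (the jar names and locations below are assumptions; substitute the actual filenames from your mongo-hadoop build and MongoDB Java driver):

-- Hypothetical jar names/paths: substitute your actual files
REGISTER /full/path/to/mongo-hadoop-core.jar;
REGISTER /full/path/to/mongo-hadoop-pig.jar;
REGISTER /full/path/to/mongo-java-driver.jar;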

Which version of Cloudera?

Finally - I have to check how Pig behaves in cluster mode: I believe the jar files you REGISTER must be available on each node in the Cloudera system.  I'm going to fire up a VM and verify that.


(It might take me a few passes to figure out what's going on here, so forgive the 20 questions.)





Andrea Leistra

Nov 30, 2010, 12:15:45 PM
to mongodb-user
Sorry for the delayed response.

Brendan,

I tried the last line both exactly as written and substituting the
full directory path and machine name; they gave the same error.

I am running the script from ~/mongodb/mongo-hadoop (where "examples"
is a subdirectory), and have specified the full path for all of the
jars that I register.

Not sure which version of Cloudera; I've asked our cluster
administrator. The Hadoop version is 0.20.2.

Full stack trace of the error message from the pig log file:

Backend error message
---------------------
Error: com.mongodb.Mongo.<init>(Lcom/mongodb/MongoURI;)V

Pig Stack Trace
---------------
ERROR 2999: Unexpected internal error. Wrong FS: mongodb://usredhdp00-priv/home/CONCUR/andreal/mongodata/test.pig.output, expected: hdfs://usredhdp00-priv

java.lang.IllegalArgumentException: Wrong FS: mongodb://usredhdp00-priv/home/CONCUR/andreal/mongodata/test.pig.output, expected: hdfs://usredhdp00-priv
    at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:385)
    at org.apache.hadoop.hdfs.DistributedFileSystem.checkPath(DistributedFileSystem.java:106)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:162)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:515)
    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:723)
    at org.apache.pig.StoreFunc.cleanupOnFailureImpl(StoreFunc.java:172)
    at org.apache.pig.StoreFunc.cleanupOnFailure(StoreFunc.java:158)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1012)
    at org.apache.pig.PigServer.execute(PigServer.java:1000)
    at org.apache.pig.PigServer.access$100(PigServer.java:112)
    at org.apache.pig.PigServer$Graph.execute(PigServer.java:1252)
    at org.apache.pig.PigServer.executeBatch(PigServer.java:324)
    at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:110)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:167)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:139)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
    at org.apache.pig.Main.main(Main.java:414)

Brendan W. McAdams

Nov 30, 2010, 12:33:53 PM
to mongod...@googlegroups.com
Andrea,

Pig acts a bit weird when the Storage Engine throws an error and "Wrong FS" seems to be a pretty generic message.

Looking at the trace you supplied, Pig is throwing an error in the MongoURI parser, and it's trying to use the following Mongo URI:

mongodb://usredhdp00-priv/home/CONCUR/andreal/mongodata/test.pig.output

I'm assuming that this isn't the URI you provided and that Pig translated it, but can you confirm?  I'm setting up a testbed here with the same version of Hadoop to see if I can reproduce the issue.

-b





Andrea Leistra

Nov 30, 2010, 2:49:28 PM
to mongodb-user
The path quoted was what I tried after just using "localhost" failed.
I get the same error with mongodb://localhost/mongodata/test.pig.output.


Brendan W. McAdams

Nov 30, 2010, 2:50:36 PM
to mongod...@googlegroups.com
OK. I'm working on trying to replicate this on my end with a cluster.  I suspect I know where the issue may be, but I'm not certain yet.

Obviously, if you are on a cluster you need the address for Mongo to be one that can be seen from the remote machines.  E.g., if Mongo isn't on the same box as your cluster nodes, localhost won't work.
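
A quick sanity check, assuming you have the mongo shell available on one of the job-nodes ("gateway-host" below is a placeholder for wherever mongod actually runs):

# Hypothetical hostname; use the address the job-nodes can actually resolve
mongo gateway-host:27017/test --eval "printjson(db.stats())"

If that can't connect, the Pig job won't be able to write there either.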


Brendan W. McAdams

Nov 30, 2010, 3:31:22 PM
to mongod...@googlegroups.com
Andrea,

I'm having trouble reproducing the issue - I set up a Hadoop cluster and connected Pig to an external Mongo process, but it runs OK.

What command are you giving Pig to run with? 

Also, can you confirm the pig version with:

pig --version

We've been testing with 0.7.0, and I'm not certain how it will behave with older revisions.




Andrea Leistra

Dec 1, 2010, 10:37:27 AM
to mongodb-user
That may be the problem, since I have mongo set up on the gateway
node. Is there any way around this, or should I ask for Mongo to be
installed on a cluster node?


Brendan W. McAdams

Dec 1, 2010, 11:32:42 AM
to mongod...@googlegroups.com
It's not a problem per se - but think of it this way.  Look at the attached image, which is a rough diagram of how a clustered Pig job might look, assuming Mongo is on the same gateway box that Pig is running from.

The log file is on HDFS, which is spread across one or more of the job-node boxes, depending on how big it is.

When you run Pig, it converts the Pig script to a MapReduce job and hands the job off to the JobTracker.

The JobTracker passes the MapReduce job to one or more job-node machines.

When each job-node finishes its MapReduce task, it writes *from that machine* to MongoDB.

So you don't need to move MongoDB to the cluster, but you need to:

a) Make sure the job-node machines can connect to MongoDB where it lives (i.e. they have access to the IP and port Mongo is on).

b) Specify the hostname and port for MongoDB as seen from the job-nodes.   Localhost will only work if you run the MapReduce and Pig and Mongo all in the same place; you can test THAT by running pig -x local <scriptname>.

If you can access Mongo by a hostname and port from the job-nodes you should be OK --- just adjust your Mongo address in Pig accordingly.
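
For example, a sketch of the adjusted STORE line ("gateway-host" is a placeholder for a hostname your job-nodes can resolve; 27017 is the assumed default mongod port):

STORE ordered_uniq_frequency INTO 'mongodb://gateway-host:27017/demo.pig.output' USING com.mongodb.hadoop.pig.MongoStorage;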

I suspect this might be the problem, although Pig seems to be swallowing the exception info associated with why Mongo failed to connect.

Does this make sense?




[Attachment: pig-expl.png]