Mongo Pig storage support

Ayon Sinha

Dec 29, 2011, 5:42:27 PM
to mongodb-user
Hi,
I'm trying the Pig-to-Mongo storage and I can't store a relation that
has a tuple in it.
I understand this is not officially released yet, but I wanted to report
this early.

I get
java.lang.IllegalArgumentException: can't serialize class org.apache.pig.data.BinSedesTuple
    at org.bson.BasicBSONEncoder._putObjectField(BasicBSONEncoder.java:234)
    at org.bson.BasicBSONEncoder.putIterable(BasicBSONEncoder.java:259)
    at org.bson.BasicBSONEncoder._putObjectField(BasicBSONEncoder.java:198)
    at org.bson.BasicBSONEncoder.putObject(BasicBSONEncoder.java:140)
    at org.bson.BasicBSONEncoder.putObject(BasicBSONEncoder.java:86)
    at com.mongodb.DefaultDBEncoder.writeObject(DefaultDBEncoder.java:27)
    at com.mongodb.OutMessage.putObject(OutMessage.java:142)
    at com.mongodb.DBApiLayer$MyCollection.insert(DBApiLayer.java:252)
    at com.mongodb.DBApiLayer$MyCollection.insert(DBApiLayer.java:211)
    at com.mongodb.DBCollection.insert(DBCollection.java:57)
    at com.mongodb.DBCollection.insert(DBCollection.java:87)
    at com.mongodb.DBCollection.save(DBCollection.java:716)
    at com.mongodb.DBCollection.save(DBCollection.java:691)
    at com.mongodb.hadoop.output.MongoRecordWriter.write(MongoRecordWriter.java:98)
    at com.mongodb.hadoop.pig.MongoStorage.putNext(MongoStorage.java:77)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:138)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:97)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:498)
    at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:239)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:232)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)

I have tried the 2.7.2 driver as well, with no luck.

Brendan W. McAdams

Dec 29, 2011, 5:52:48 PM
to mongod...@googlegroups.com
Which version of pig?

As far as I know, we don't support Pig tuples; you'll need to expand the tuple out into a type Mongo supports.

rjurney

Dec 29, 2011, 8:12:35 PM
to mongodb-user
A tuple is just an object with named fields. A bag is an array. A
bag of tuples is an array of objects. MongoDB supports fields in
objects that reference arrays of objects; BSON can represent this.

So the Mongo driver should also support this. You can look at the
AvroStorage conversion for reference. The driver is of limited value
otherwise.

Russ
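
To illustrate the mapping described above (tuple to object, bag to array, bag of tuples to array of objects), here is a minimal Java sketch. It deliberately uses only java.util collections as stand-ins: NamedTuple and convert are hypothetical names, not part of Pig or the MongoDB driver. A real implementation would presumably take field names from the Pig schema and build driver objects rather than plain Maps and Lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TupleToDocument {

    /** Hypothetical stand-in for a Pig tuple: positional values plus field names. */
    static final class NamedTuple {
        final List<String> names;
        final List<Object> values;

        NamedTuple(List<String> names, List<Object> values) {
            if (names.size() != values.size()) {
                throw new IllegalArgumentException("names and values must have the same length");
            }
            this.names = names;
            this.values = values;
        }
    }

    /**
     * Recursively converts a value into document-friendly types:
     * a NamedTuple becomes a Map (object), a List (bag) becomes a List (array),
     * and anything else is passed through unchanged.
     */
    static Object convert(Object value) {
        if (value instanceof NamedTuple) {
            NamedTuple tuple = (NamedTuple) value;
            Map<String, Object> doc = new LinkedHashMap<>();
            for (int i = 0; i < tuple.names.size(); i++) {
                doc.put(tuple.names.get(i), convert(tuple.values.get(i)));
            }
            return doc;
        }
        if (value instanceof List) {
            List<Object> array = new ArrayList<>();
            for (Object element : (List<?>) value) {
                array.add(convert(element));
            }
            return array;
        }
        return value; // scalars (strings, numbers, ...) need no conversion here
    }

    public static void main(String[] args) {
        NamedTuple item1 = new NamedTuple(Arrays.asList("sku", "qty"), Arrays.<Object>asList("a-1", 2));
        NamedTuple item2 = new NamedTuple(Arrays.asList("sku", "qty"), Arrays.<Object>asList("b-7", 1));
        List<Object> bag = Arrays.<Object>asList(item1, item2);

        // A tuple whose second field is a bag of tuples.
        NamedTuple order = new NamedTuple(Arrays.asList("id", "items"), Arrays.<Object>asList(42, bag));

        // Prints a nested map/list structure mirroring the tuple/bag nesting.
        System.out.println(convert(order));
    }
}
```

Because convert recurses on both tuples and bags, nesting depth is not limited in this sketch; swapping the Map and List for the driver's own object and list types would be the remaining step.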

Brendan W. McAdams

Dec 29, 2011, 8:53:13 PM
to mongod...@googlegroups.com
That's all well and good, but at the moment there isn't support for any specific Pig types. It will likely be in a forthcoming release.

None of this code has been released in a packaged form yet, and it is still in flight.

rjurney

Dec 29, 2011, 8:55:26 PM
to mongodb-user
Any idea what kind of timeline that will be?

Brendan W. McAdams

Dec 29, 2011, 9:07:43 PM
to mongod...@googlegroups.com
I'd like to get these kinds of Pig improvements into the 1.0 release, expected sometime in January.

rjurney

Dec 29, 2011, 9:36:22 PM
to mongodb-user
That is great to hear! I'm looking at the Java libs for MongoDB
itself... are complex types - arrays, objects, etc. supported yet?

rjurney

Dec 31, 2011, 5:08:26 PM
to mongodb-user
Just to follow up, because I'm looking at fixing this myself: does
BasicDBObjectBuilder have the methods required to build complex
objects, including nested objects and arrays?

referencing https://github.com/mongodb/mongo-java-driver/blob/master/src/main/com/mongodb/BasicDBObjectBuilder.java

Russ

Eliot Horowitz

Dec 31, 2011, 5:36:48 PM
to mongod...@googlegroups.com
Yes.
You can embed other BasicDBObject instances.

rjurney

Dec 31, 2011, 6:26:14 PM
to mongodb-user
Thanks!

rjurney

Dec 31, 2011, 10:45:42 PM
to mongodb-user
I went ahead and fixed this myself, using
org.apache.pig.builtin.JsonStorage and Alan Gates' book, Programming
Pig, for reference.

I'll send a pull request on github tomorrow, after I clean it up. It
only goes one level deep, but should help you add arbitrarily complex
nesting in your final implementation.

rjurney

Dec 31, 2011, 11:21:37 PM
to mongodb-user
Third self-reply :)

Here's the pull request: https://github.com/mongodb/mongo-hadoop/pull/29