Message from discussion Dumbo & MongoDB
From: Nathan <nbyl...@gmail.com>
Date: Sun, 3 Jul 2011 17:45:10 -0700 (PDT)
Local: Sun, Jul 3 2011 8:45 pm
Subject: Re: Dumbo & MongoDB
OK, everything is now reading from and writing to MongoDB using the dumbo
wordcount demo. The columns it writes to are hard-coded for now, but I
will add a configurable property in the XML file that specifies where the
values are written. Also, right now it will probably only let you write to
one collection, as a single key/value pair. If it becomes necessary to
save actual BSONObjects with multiple k/v pairs, I will try that
next.

But it's working. Woop woop!
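
The configurable property mentioned above could look something like the following. This is only a minimal sketch: it parses a Hadoop-style `<configuration>` XML snippet with the JDK's DOM parser, and the property names `example.output.key.field` and `example.output.value.field` are hypothetical, not names defined by mongo-hadoop or dumbo.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class XmlJobProperties {

    /** Reads every <property><name>/<value> pair from a Hadoop-style XML string. */
    static Map<String, String> parse(String xml) {
        Map<String, String> props = new LinkedHashMap<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element property = (Element) nodes.item(i);
                Node name = property.getElementsByTagName("name").item(0);
                Node value = property.getElementsByTagName("value").item(0);
                // Skip malformed entries that lack a name or a value.
                if (name != null && value != null) {
                    props.put(name.getTextContent().trim(), value.getTextContent().trim());
                }
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse configuration XML", e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical property names, used for illustration only.
        String xml = "<configuration>"
                + "<property><name>example.output.key.field</name><value>word</value></property>"
                + "</configuration>";
        Map<String, String> props = parse(xml);
        // Fall back to a hard-coded default when the property is absent.
        String keyField = props.getOrDefault("example.output.key.field", "key");
        String valueField = props.getOrDefault("example.output.value.field", "value");
        System.out.println(keyField + " / " + valueField);
    }
}
```

Inside a real job the values would normally come from the job configuration (for example `JobConf.get(name, defaultValue)`) rather than from hand-rolled parsing; the point here is only the fallback from a configured name to the current hard-coded one.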

On Jul 2, 9:31 pm, Nathan <nbyl...@gmail.com> wrote:

> OK, I got it reading records just fine. It completes the M/R job, but
> it's not writing the output to the database. I am not getting errors though.
> It says output written to test.out (the db.collection_name I am trying
> to write to in MongoDB), but there is nothing in that hadoop fs folder
> except an empty _SUCCESS file and a bunch of logs.

> So I don't know where my output is going.

> On Jul 2, 8:03 pm, Nathan <nbyl...@gmail.com> wrote:

> > I feel so close. This class mimics theirs, but uses
> > TypedBytesWritable instead of BSONObjects.

> > @SuppressWarnings("deprecation")
> > public class TypedBytesTableInputFormat
> >         implements InputFormat<TypedBytesWritable, TypedBytesWritable> {

> >     @Override
> >     public RecordReader<TypedBytesWritable, TypedBytesWritable> getRecordReader(
> >             InputSplit split, JobConf job, Reporter reporter) {

> >         if (!(split instanceof MongoInputSplit))
> >             throw new IllegalStateException(
> >                     "Creation of a new RecordReader requires a MongoInputSplit instance.");

> >         final MongoInputSplit mis = (MongoInputSplit) split;

> >         return (RecordReader<TypedBytesWritable, TypedBytesWritable>)
> >                 new TypedBytesMongoRecordReader(mis);
> >     }
> > ....
> > ....
> > ....
> > ....

> > public class TypedBytesMongoRecordReader
> >         extends RecordReader<TypedBytesWritable, TypedBytesWritable> {

> >     public TypedBytesMongoRecordReader(MongoInputSplit mis) {
> >         _cursor = mis.getCursor();
> >     }
> > ...
> > ...
> > ...
> > ...

> > Unfortunately I get this error:

> > java.lang.ClassCastException: com.mongodb.hadoop.input.TypedBytesMongoRecordReader cannot be cast to org.apache.hadoop.mapred.RecordReader
> >         at com.mongodb.hadoop.TypedBytesTableInputFormat.getRecordReader(TypedBytesTableInputFormat.java:31)
> >         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:370)
> >         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:324)
> >         at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
> >         at java.security.AccessController.doPrivileged(Native Method)
> >         at javax.security.auth.Subject.doAs(Subject.java:396)
> >         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
> >         at org.apache.hadoop.mapred.Child.main(Child.java:262)

> > I feel so close! Not sure why I get a ClassCastException when my
> > TypedBytesMongoRecordReader is a child of the RecordReader. Any Java
> > people care to chime in?
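
One likely explanation, judging only from the snippets above: the input format is written against the old `org.apache.hadoop.mapred` API, where `RecordReader` is an interface, while a class that `extends RecordReader` is probably extending the new-API abstract class `org.apache.hadoop.mapreduce.RecordReader`. The two types share a simple name but are unrelated, so the cast compiles and then fails at runtime. The sketch below reproduces that with stand-in types (`OldRecordReader`, `NewRecordReader` and both reader classes are hypothetical, not Hadoop classes).

```java
final class ApiMismatchDemo {

    /** Stand-in for the old-API interface org.apache.hadoop.mapred.RecordReader. */
    interface OldRecordReader<K, V> {
        boolean next(K key, V value);
    }

    /** Stand-in for the new-API abstract class org.apache.hadoop.mapreduce.RecordReader. */
    abstract static class NewRecordReader<K, V> {
        abstract boolean nextKeyValue();
    }

    /** Hypothetical reader written against the new API only. */
    static class NewStyleReader extends NewRecordReader<String, String> {
        @Override
        boolean nextKeyValue() {
            return false;
        }
    }

    /** Hypothetical reader that implements the old interface. */
    static class OldStyleReader implements OldRecordReader<String, String> {
        @Override
        public boolean next(String key, String value) {
            return false;
        }
    }

    /** Returns true when casting the reader to the old interface fails at runtime. */
    @SuppressWarnings("unchecked")
    static boolean castFails(Object reader) {
        try {
            // Compiles for any non-final class, but is checked when it runs.
            OldRecordReader<String, String> r = (OldRecordReader<String, String>) reader;
            return r == null;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(castFails(new NewStyleReader())); // prints "true"
        System.out.println(castFails(new OldStyleReader())); // prints "false"
    }
}
```

If that is the cause, the usual fix is for the reader to implement the old-API interface (`next`, `createKey`, `createValue`, `getPos`, `getProgress`, `close`) rather than extend the new-API class.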

> > On Jul 2, 3:02 pm, Nathan <nbyl...@gmail.com> wrote:

> > > The odd thing is that it can't find this package when I try to import it
> > > (I have all my jars in the build path, including hadoop-streaming):

> > > import org.apache.hadoop.typedbytes.TypedBytesWritable;

> > > It says there is no typedbytes package in hadoop. Eclipse tries to
> > > resolve this error by importing the hadoop-streaming.jar from the
> > > lasthbase project. I have looked, and this is definitely not
> > > deprecated, so it should be there. I don't know what the
> > > problem is.

> > > On Jul 2, 1:35 pm, Nathan <nbyl...@gmail.com> wrote:

> > > > I get what you are saying though. I am going to try to create a
> > > > wrapper this weekend, but don't expect much success since I am not a
> > > > Java guy. :)

> > > > They have a lot of the same methods in their input & output formats,
> > > > but are there specific methods that must be overridden? Are there very
> > > > specific things that MUST happen in the input & output formats? Any
> > > > tips are appreciated. Hopefully this is pretty straightforward, as
> > > > there are only two classes to mess with.

> > > > On Jul 2, 1:09 pm, Nathan <nbyl...@gmail.com> wrote:

> > > > > Thanks for your reply. As of the last message I posted, it's reading from
> > > > > MongoDB just fine, and their mongodb-hadoop driver uses TypedBytes as
> > > > > well. This is the error I am currently struggling with:

> > > > > java.io.IOException: Can't write: 4e0e98380bfb6ce2d9091ea6 as class
> > > > > org.bson.types.ObjectId

> > > > > 4e0e98380bfb6ce2d9091ea6 is the mongodb objectId string of the first
> > > > > record in my test collection, so I know it's able to access the data.
> > > > > Also, in the error stack trace, it outputs this:

> > > > > org.apache.hadoop.io.ObjectWritable.writeObject(ObjectWritable.java:162)
> > > > > at org.apache.hadoop.io.ObjectWritable.write(ObjectWritable.java:70)
> > > > > at org.apache.hadoop.typedbytes.TypedBytesWritableOutput.writeWritable(TypedBytesWritableOutput.java:217)

> > > > > So I know their driver is trying to use typed bytes. They have working
> > > > > examples in pure Java, but I have grown accustomed to dumbo, and would
> > > > > like to use it and help this project grow. Supposedly the project
> > > > > supports streaming jobs too, so there should be no problem working
> > > > > with dumbo as is once everything is figured out. I am not sure what is
> > > > > happening yet, but I will share as soon as I have something working. I
> > > > > also encourage anyone else interested to please take a look or share
> > > > > their opinions. :)
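
The `Can't write: ... as class org.bson.types.ObjectId` error suggests that the serializer in the trace only handles a fixed set of types and that `ObjectId` is not one of them. A common workaround, sketched here under that assumption, is to turn such values into strings before they reach the typed bytes layer. `FakeObjectId` stands in for `org.bson.types.ObjectId`, and `simplify` / `simplifyAll` are hypothetical helpers, not part of mongo-hadoop or Hadoop.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class TypedBytesFriendly {

    /** Stand-in for org.bson.types.ObjectId; only toString() matters here. */
    static final class FakeObjectId {
        private final String hex;

        FakeObjectId(String hex) {
            this.hex = hex;
        }

        @Override
        public String toString() {
            return hex;
        }
    }

    /**
     * Passes strings, numbers and booleans through unchanged and converts
     * any other non-null value to its string form. Null is returned as-is
     * and left for the caller to handle.
     */
    static Object simplify(Object value) {
        if (value == null || value instanceof String
                || value instanceof Number || value instanceof Boolean) {
            return value;
        }
        return value.toString();
    }

    /** Applies simplify() to every value of a document-like map, keeping key order. */
    static Map<String, Object> simplifyAll(Map<String, ?> doc) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, ?> entry : doc.entrySet()) {
            out.put(entry.getKey(), simplify(entry.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("_id", new FakeObjectId("4e0e98380bfb6ce2d9091ea6"));
        doc.put("count", 3);
        System.out.println(simplifyAll(doc));
    }
}
```

Which value types the real typed bytes writer accepts should be checked against the Hadoop streaming source; the pass-through list above is a guess at the simple cases.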

> > > > > On Jul 2, 12:03 pm, Klaas Bosteels <klaas.boste...@gmail.com> wrote:

> > > > > > Hi Nathan,

> > > > > > Based on what you told us, I don't think there's a real difference between
> > > > > > how the two take configuration params. The mongodb example probably just
> > > > > > makes use of the possibility that Hadoop provides for putting the params in
> > > > > > an xml file and reading them from that file instead of passing them
> > > > > > directly.

> > > > > > To make mongo input or output work, you will need to write a custom input or
> > > > > > output format that writes or reads typed bytes writables. I haven't looked
> > > > > > at the code much, but you might be able to do this by wrapping the
> > > > > > mongo-hadoop formats. You should be able to figure out how to work with
> > > > > > typed bytes writables by having a look at the lasthbase code.

> > > > > > Also, to use (Java) input or output formats you need to run on Hadoop.
> > > > > > That's the reason why the local run you pasted in one of your emails failed
> > > > > > miserably.

> > > > > > Sorry for the late answer, and please share your code if you figure out how
> > > > > > to do this!

> > > > > > Regards,
> > > > > > -Klaas

> > > > > > On Thu, Jun 30, 2011 at 8:34 PM, Nathan <nbyl...@gmail.com> wrote:
> > > > > > > I was using HBase for a while and was happy when I found the lasthbase
> > > > > > > driver on github that worked great with dumbo. Recently I have started
> > > > > > > working with MongoDB and found a mongodb-hadoop driver here:

> > > > > > > https://github.com/mongodb/mongo-hadoop/

> > > > > > > I asked a friend of mine who is much more familiar with Java to
> > > > > > > compare the two, to see if we can use the mongodb classes easily in
> > > > > > > the same way dumbo uses the lasthbase.jar. For reference, here are the
> > > > > > > Input & Output format classes for both the HBase & mongodb projects:

> > > > > > > https://github.com/mongodb/mongo-hadoop/tree/master/src/main/com/mong...

> > > > > > > https://github.com/tims/lasthbase/tree/master/src/java/fm/last/hbase/...

> > > > > > > With lasthbase, the input & output information is specified on the
> > > > > > > command line, but in mongodb-hadoop they have a WordCountXML example
> > > > > > > that reads all connection, query, and other configurable information
> > > > > > > from an XML file. I liked this approach, but had some questions. It
> > > > > > > seems as though the lasthbase classes extended a JobConfigurable
> > > > > > > class, but it's been a long time since it's been updated.
> > > > > > > Mongodb-hadoop does not have this. A LOT of the setup looks the same,
> > > > > > > but I was looking for a good starting point on making their classes
> > > > > > > work with dumbo.

> > > > > > > What is dumbo expecting, or better yet, what is lasthbase sending to
> > > > > > > dumbo? What does dumbo need from the jar file to start streaming the
> > > > > > > data to the map/reduce job(s)? And how should it be streamed? I don't
> > > > > > > know Java, but my friend is willing to try and help get it going if I
> > > > > > > can get him all the information possible. To him it SEEMS some things
> > > > > > > can be moved around and into the input & output format classes on
> > > > > > > mongodb-hadoop, tell it to read the xml file, and then you have
> > > > > > > another driver that connects to a document database for use with
> > > > > > > dumbo.

> > > > > > > But he has no understanding of dumbo, and we could use some assistance.

> > > > > > > --
> > > > > > > You received this message because you are subscribed to the Google Groups
> > > > > > > "dumbo-user" group.
> > > > > > > To post to this group, send email to dumbo-user@googlegroups.com.
> > > > > > > To unsubscribe from this group, send email to
> > > > > > > dumbo-user+unsubscribe@googlegroups.com.
> > > > > > > For more options, visit this group at
> > > > > > > http://groups.google.com/group/dumbo-user?hl=en.


 