wordcount demo. The columns it writes to are hard-coded for now, but I
the values. Also, right now it will probably only let you write to one
collection, with a key / value pair. If it becomes necessary to try
But it's working. Woop woop!
> OK, I got it reading records just fine. It completes the M/R job, but
> it's not writing the output to the database. I am not getting any errors, though.
> It says the output was written to test.out (the db.collection_name I am trying
> to write to in MongoDB), but there is nothing in that HDFS folder
> except an empty _SUCCESS file and a bunch of logs,
> so I don't know where my output is going.
> On Jul 2, 8:03 pm, Nathan <nbyl...@gmail.com> wrote:
> > I feel so close. This class mimics theirs, but uses
> > TypedBytesWritable instead of BSONObjects.
> > @SuppressWarnings("deprecation")
> > public class TypedBytesTableInputFormat
> >         implements InputFormat<TypedBytesWritable, TypedBytesWritable> {
> >     @Override
> >     public RecordReader<TypedBytesWritable, TypedBytesWritable> getRecordReader(
> >             InputSplit split, JobConf job, Reporter reporter) {
> >         if (!(split instanceof MongoInputSplit))
> >             throw new IllegalStateException(
> >                     "Creation of a new RecordReader requires a MongoInputSplit instance.");
> >         final MongoInputSplit mis = (MongoInputSplit) split;
> >         return (RecordReader<TypedBytesWritable, TypedBytesWritable>)
> >                 new TypedBytesMongoRecordReader(mis);
> >     }
> > ....
> > ....
> > ....
> > ....
> > public class TypedBytesMongoRecordReader
> >         extends RecordReader<TypedBytesWritable, TypedBytesWritable> {
> >     public TypedBytesMongoRecordReader(MongoInputSplit mis) {
> >         _cursor = mis.getCursor();
> >     }
> > ...
> > ...
> > ...
> > ...
> > Unfortunately I get this error:
> > java.lang.ClassCastException:
> > com.mongodb.hadoop.input.TypedBytesMongoRecordReader cannot be cast to
> > org.apache.hadoop.mapred.RecordReader
> >     at com.mongodb.hadoop.TypedBytesTableInputFormat.getRecordReader(TypedBytesTableInputFormat.java:31)
> >     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:370)
> >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:324)
> >     at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
> >     at java.security.AccessController.doPrivileged(Native Method)
> >     at javax.security.auth.Subject.doAs(Subject.java:396)
> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
> >     at org.apache.hadoop.mapred.Child.main(Child.java:262)
> > I feel so close! Not sure why I get a ClassCastException when my
> > TypedBytesMongoRecordReader is a subclass of RecordReader. Any Java
> > people care to chime in?
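A likely cause, sketched under assumptions: in Java a class can only `extends` another class, and `org.apache.hadoop.mapred.RecordReader` (the old-API type that `getRecordReader(InputSplit, JobConf, Reporter)` must return) is an interface. So the posted `extends RecordReader<...>` presumably resolves to `org.apache.hadoop.mapreduce.RecordReader`, the new-API abstract class. The two types are unrelated, so the cast compiles (the class is not final, so a subclass could in principle implement the interface) but fails at run time. The stand-in types below are hypothetical and only mimic that relationship without depending on Hadoop:

```java
public class RecordReaderCastDemo {
    // Hypothetical stand-in for org.apache.hadoop.mapred.RecordReader (old API): an interface.
    interface OldApiRecordReader { }

    // Hypothetical stand-in for org.apache.hadoop.mapreduce.RecordReader (new API): an abstract class.
    abstract static class NewApiRecordReader { }

    // Shaped like the posted TypedBytesMongoRecordReader: extends only the new-API class.
    static class ReaderExtendingNewApi extends NewApiRecordReader { }

    // The alternative: implement the old-API interface that getRecordReader has to return.
    static class ReaderImplementingOldApi implements OldApiRecordReader { }

    // Performs the same kind of cast as getRecordReader. A cast to an interface is only
    // checked at run time; in the original code the compiler also accepts it because
    // TypedBytesMongoRecordReader is not final.
    static boolean castSucceeds(Object reader) {
        try {
            OldApiRecordReader r = (OldApiRecordReader) reader;
            return r != null;
        } catch (ClassCastException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(castSucceeds(new ReaderExtendingNewApi()));    // false
        System.out.println(castSucceeds(new ReaderImplementingOldApi())); // true
    }
}
```

Implementing `org.apache.hadoop.mapred.RecordReader` (with its `next`, `createKey`, `createValue`, `getPos`, `close` and `getProgress` methods) instead of extending the new-API class should make the cast in `getRecordReader` unnecessary.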
> > On Jul 2, 3:02 pm, Nathan <nbyl...@gmail.com> wrote:
> > > The odd thing is that it can't find this package when I try to import it
> > > (I have all my jars on the build path, including the Hadoop streaming jar):
> > > import org.apache.hadoop.typedbytes.TypedBytesWritable;
> > > It says there is no typedbytes package in Hadoop. Eclipse tries to
> > > resolve this error by importing the hadoop-streaming.jar from the
> > > lasthbase project. I have looked, and this is definitely not a
> > > deprecated class, so it should be there; I don't know what the
> > > problem is.
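As far as I know, the `org.apache.hadoop.typedbytes` package ships with Hadoop Streaming (the contrib streaming jar in 0.20-era releases) rather than the core Hadoop jar, which would explain why Eclipse offers the lasthbase copy of hadoop-streaming.jar. A quick way to see whether a class is visible at all, and which jar it actually comes from, is a small check like this sketch (the class name `ClasspathCheck` is just for illustration):

```java
import java.security.CodeSource;

public class ClasspathCheck {
    // Returns where a class was loaded from, "<bootstrap>" for classes without a code source
    // (such as core JDK classes), or null when the class is not visible to this class loader.
    static String locate(String className) {
        try {
            // initialize = false: only locate the class, do not run its static initializers.
            Class<?> cls = Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            return (src == null || src.getLocation() == null) ? "<bootstrap>" : src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "org.apache.hadoop.typedbytes.TypedBytesWritable";
        String where = locate(name);
        System.out.println(where == null ? name + " is NOT on the classpath" : name + " loaded from " + where);
    }
}
```

Run it with the same classpath as the job (or the Eclipse launch configuration), optionally passing another fully qualified class name as the first argument.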
> > > On Jul 2, 1:35 pm, Nathan <nbyl...@gmail.com> wrote:
> > > > I get what you are saying, though. I am going to try to create a
> > > > wrapper this weekend, but don't expect much success since I am not a
> > > > Java guy. :)
> > > > They have a lot of the same methods in their input & output formats,
> > > > but are there specific methods that must be overridden? Are there very
> > > > specific things that MUST happen in the input & output formats? Any
> > > > tips are appreciated. Hopefully this is pretty straightforward, as
> > > > there are only two classes to mess with.
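For reference, the old (`org.apache.hadoop.mapred`) API that the posted `getRecordReader(InputSplit, JobConf, Reporter)` belongs to requires these methods: `InputFormat.getSplits` and `getRecordReader`; `RecordReader.next`, `createKey`, `createValue`, `getPos`, `close` and `getProgress`; `OutputFormat.getRecordWriter` and `checkOutputSpecs`; `RecordWriter.write` and `close`. The sketch below covers the output side with simplified stand-ins (the real signatures also take `FileSystem`, `JobConf`, `Progressable` or `Reporter` arguments and throw `IOException`). The `_id`/`value` field names and the in-memory list standing in for a MongoDB collection are hypothetical:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class OldApiOutputSketch {
    // Simplified stand-in for org.apache.hadoop.mapred.RecordWriter
    // (the real close takes a Reporter and both methods throw IOException).
    interface RecordWriter<K, V> {
        void write(K key, V value);
        void close();
    }

    // Simplified stand-in for org.apache.hadoop.mapred.OutputFormat
    // (the real methods also take FileSystem, JobConf and, for getRecordWriter, a Progressable).
    interface OutputFormat<K, V> {
        RecordWriter<K, V> getRecordWriter(String name);
        void checkOutputSpecs();
    }

    // Hypothetical format: each (key, value) pair becomes {"_id": key, "value": value}
    // in an in-memory list that stands in for a MongoDB collection.
    static class KeyValueDocumentOutputFormat implements OutputFormat<String, Object> {
        final List<Map<String, Object>> collection = new ArrayList<>();

        @Override
        public void checkOutputSpecs() {
            // A real format would validate its output settings here and fail fast if they are missing.
        }

        @Override
        public RecordWriter<String, Object> getRecordWriter(String name) {
            return new RecordWriter<String, Object>() {
                private final List<Map<String, Object>> pending = new ArrayList<>();

                @Override
                public void write(String key, Object value) {
                    Map<String, Object> doc = new LinkedHashMap<>();
                    doc.put("_id", key);
                    doc.put("value", value);
                    pending.add(doc);
                }

                @Override
                public void close() {
                    // Buffered documents are only persisted here.
                    collection.addAll(pending);
                    pending.clear();
                }
            };
        }
    }

    public static void main(String[] args) {
        KeyValueDocumentOutputFormat format = new KeyValueDocumentOutputFormat();
        format.checkOutputSpecs();
        RecordWriter<String, Object> writer = format.getRecordWriter("part-00000");
        writer.write("hello", 3);
        writer.write("world", 1);
        writer.close();
        System.out.println(format.collection); // [{_id=hello, value=3}, {_id=world, value=1}]
    }
}
```

Because this writer buffers, nothing is persisted until `close()` is called; a real writer could just as well write each document immediately.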
> > > > On Jul 2, 1:09 pm, Nathan <nbyl...@gmail.com> wrote:
> > > > > Thanks for your reply. As of my last message, it's reading from
> > > > > MongoDB just fine, and their mongodb-hadoop driver uses TypedBytes as
> > > > > well. This is the error I am currently struggling with:
> > > > > java.io.IOException: Can't write: 4e0e98380bfb6ce2d9091ea6 as class
> > > > > org.bson.types.ObjectId
> > > > > 4e0e98380bfb6ce2d9091ea6 is the mongodb objectId string of the first
> > > > > record in my test collection, so I know it's able to access the data.
> > > > > Also, in the error stack trace, it outputs this:
> > > > >     at org.apache.hadoop.io.ObjectWritable.writeObject(ObjectWritable.java:162)
> > > > >     at org.apache.hadoop.io.ObjectWritable.write(ObjectWritable.java:70)
> > > > >     at org.apache.hadoop.typedbytes.TypedBytesWritableOutput.writeWritable(TypedBytesWritableOutput.java:217)
> > > > > So I know their driver is trying to use typed bytes. They have working
> > > > > examples in pure Java, but I have grown accustomed to dumbo, and would
> > > > > like to use it and help this project grow. Supposedly the project
> > > > > supports streaming jobs too, so there should be no problem working
> > > > > with dumbo as is once everything is figured out. I am not sure what is
> > > > > happening yet, but I will share as soon as I have something working. I
> > > > > also encourage anyone else interested to please take a look or share
> > > > > their opinions. :)
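Reading the trace above, the value reaches `TypedBytesWritableOutput.writeWritable` as an `ObjectWritable`, and `ObjectWritable` has no way to serialize an `org.bson.types.ObjectId`, hence `Can't write: ... as class org.bson.types.ObjectId`. One workaround, sketched here as an assumption rather than mongo-hadoop's own approach, is to convert each document into types typed bytes can encode natively (strings, numbers, booleans, byte arrays, lists and maps) before wrapping it, turning anything else, such as an `ObjectId`, into its `toString()` form (the 24-character hex string). It assumes documents arrive as `java.util.Map`/`java.util.List` instances; the class name `TypedBytesFriendly` is made up:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TypedBytesFriendly {
    // Recursively converts a document value into types with a native typed bytes encoding
    // (String, Byte, Integer, Long, Float, Double, Boolean, byte[], List, Map).
    // Any other type, e.g. org.bson.types.ObjectId, is replaced by its toString() form.
    // null is passed through unchanged; deciding how to represent it is left open here.
    static Object convert(Object value) {
        if (value == null || value instanceof String || value instanceof Byte
                || value instanceof Integer || value instanceof Long || value instanceof Float
                || value instanceof Double || value instanceof Boolean || value instanceof byte[]) {
            return value;
        }
        if (value instanceof Map) {
            Map<Object, Object> out = new LinkedHashMap<>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                out.put(convert(e.getKey()), convert(e.getValue()));
            }
            return out;
        }
        if (value instanceof List) {
            List<Object> out = new ArrayList<>();
            for (Object item : (List<?>) value) {
                out.add(convert(item));
            }
            return out;
        }
        return value.toString();
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for an ObjectId: only its toString() matters here.
        Object fakeId = new Object() {
            @Override
            public String toString() {
                return "4e0e98380bfb6ce2d9091ea6";
            }
        };
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("_id", fakeId);
        doc.put("count", 3);
        doc.put("tags", List.of("a", "b"));
        System.out.println(convert(doc)); // {_id=4e0e98380bfb6ce2d9091ea6, count=3, tags=[a, b]}
    }
}
```

In a record reader this would be applied to each document before it is handed to a `TypedBytesWritable` (for example through its `setValue(Object)` method).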
> > > > > On Jul 2, 12:03 pm, Klaas Bosteels <klaas.boste...@gmail.com> wrote:
> > > > > > Hi Nathan,
> > > > > > Based on what you told us, I don't think there's a real difference between
> > > > > > how the two take configuration params. The mongodb example probably just
> > > > > > makes use of the possibility that Hadoop provides for putting the params in
> > > > > > an xml file and reading them from that file instead of passing them
> > > > > > directly.
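To illustrate that point: a Hadoop XML configuration file is just a list of `<property>` name/value pairs. In Hadoop itself `Configuration.addResource` loads such a file, while `-D name=value` on the command line sets the same keys directly. The sketch below mimics that with the JDK's DOM parser so it runs without Hadoop. The `mongo.input.uri`/`mongo.output.uri` keys are assumed to be the ones mongo-hadoop reads and should be checked against its WordCountXML example; `test.in` and `test.out2` are made-up collection names:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class XmlConfigSketch {
    // Reads <property><name>..</name><value>..</value></property> entries, the layout of
    // Hadoop's *-site.xml style configuration files, into an ordered map.
    static Map<String, String> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new LinkedHashMap<>();
            NodeList properties = doc.getElementsByTagName("property");
            for (int i = 0; i < properties.getLength(); i++) {
                Element p = (Element) properties.item(i);
                String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
                String value = p.getElementsByTagName("value").item(0).getTextContent().trim();
                props.put(name, value);
            }
            return props;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse configuration XML", e);
        }
    }

    // Applies "name=value" definitions (what -D name=value passes) on top of the file values,
    // returning a new map and leaving the base map untouched.
    static Map<String, String> withOverrides(Map<String, String> base, String... definitions) {
        Map<String, String> merged = new LinkedHashMap<>(base);
        for (String d : definitions) {
            int eq = d.indexOf('=');
            if (eq > 0) {
                merged.put(d.substring(0, eq), d.substring(eq + 1));
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        String xml = "<configuration>"
                + "<property><name>mongo.input.uri</name><value>mongodb://localhost/test.in</value></property>"
                + "<property><name>mongo.output.uri</name><value>mongodb://localhost/test.out</value></property>"
                + "</configuration>";
        Map<String, String> conf = withOverrides(parse(xml), "mongo.output.uri=mongodb://localhost/test.out2");
        conf.forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

In this sketch later overrides simply replace file values; Hadoop's own precedence rules (for example properties marked `final`) are not modelled.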
> > > > > > To make mongo input or output work, you will need to write a custom input or
> > > > > > output format that reads or writes typed bytes writables. I haven't looked
> > > > > > at the code much, but you might be able to do this by wrapping the
> > > > > > mongo-hadoop formats. You should be able to figure out how to work with
> > > > > > typed bytes writables by having a look at the lasthbase code.
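One reading of the wrapping suggestion, sketched with hypothetical stand-ins rather than the real Hadoop or mongo-hadoop classes: an old-API reader that delegates to a new-API reader and copies each current key and value into the reusable objects that `next(key, value)` receives. `Holder` stands in for a reusable writable such as `TypedBytesWritable`, and `ListReader` stands in for a cursor-backed reader:

```java
import java.util.Iterator;
import java.util.List;

public class ReaderAdapterSketch {
    // Stand-in for org.apache.hadoop.mapreduce.RecordReader (new API). The real class also has
    // initialize(InputSplit, TaskAttemptContext) and its methods throw checked exceptions.
    abstract static class NewApiReader<K, V> {
        public abstract boolean nextKeyValue();
        public abstract K getCurrentKey();
        public abstract V getCurrentValue();
        public abstract float getProgress();
        public abstract void close();
    }

    // Stand-in for org.apache.hadoop.mapred.RecordReader (old API), minus the IOException clauses.
    interface OldApiReader<K, V> {
        boolean next(K key, V value);
        K createKey();
        V createValue();
        long getPos();
        void close();
        float getProgress();
    }

    // Mutable holder standing in for a reusable Writable such as TypedBytesWritable.
    static final class Holder {
        Object value;
    }

    // Old-API facade over a new-API reader.
    static final class OldApiAdapter<K, V> implements OldApiReader<Holder, Holder> {
        private final NewApiReader<K, V> inner;
        private long pos = 0;

        OldApiAdapter(NewApiReader<K, V> inner) { this.inner = inner; }

        public boolean next(Holder key, Holder value) {
            if (!inner.nextKeyValue()) return false;
            key.value = inner.getCurrentKey();     // copy into the caller's reusable objects
            value.value = inner.getCurrentValue();
            pos++;
            return true;
        }
        public Holder createKey() { return new Holder(); }
        public Holder createValue() { return new Holder(); }
        public long getPos() { return pos; }       // record count, not a byte offset
        public void close() { inner.close(); }
        public float getProgress() { return inner.getProgress(); }
    }

    // Hypothetical new-API reader over an in-memory list of documents.
    static final class ListReader extends NewApiReader<Integer, String> {
        private final List<String> docs;
        private final Iterator<String> it;
        private int index = -1;
        private String current;

        ListReader(List<String> docs) { this.docs = docs; this.it = docs.iterator(); }

        public boolean nextKeyValue() {
            if (!it.hasNext()) return false;
            current = it.next();
            index++;
            return true;
        }
        public Integer getCurrentKey() { return index; }
        public String getCurrentValue() { return current; }
        public float getProgress() { return docs.isEmpty() ? 1f : (index + 1f) / docs.size(); }
        public void close() { }
    }

    public static void main(String[] args) {
        OldApiReader<Holder, Holder> reader = new OldApiAdapter<>(new ListReader(List.of("{a: 1}", "{a: 2}")));
        Holder key = reader.createKey();
        Holder value = reader.createValue();
        while (reader.next(key, value)) {
            System.out.println(key.value + "\t" + value.value);
        }
        reader.close();
    }
}
```

`getPos()` returns a record count here because a database cursor has no natural byte offset; that is a design choice of the sketch, not something either API prescribes.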
> > > > > > Also, to use (Java) input or output formats you need to run on Hadoop.
> > > > > > That's the reason why the local run you pasted in one of your emails failed
> > > > > > miserably.
> > > > > > Sorry for the late answer, and please share your code if you figure out how
> > > > > > to do this!
> > > > > > Regards,
> > > > > > -Klaas
> > > > > > On Thu, Jun 30, 2011 at 8:34 PM, Nathan <nbyl...@gmail.com> wrote:
> > > > > > > I was using HBase for a while and was happy when I found the lasthbase
> > > > > > > driver on github that worked great with dumbo. Recently I have started
> > > > > > > working with MongoDB and found a mongodb-hadoop driver here:
> > > > > > > https://github.com/mongodb/mongo-hadoop/
> > > > > > > I asked a friend of mine who is much more familiar with Java to
> > > > > > > compare the two, to see if we can use the mongodb classes easily in
> > > > > > > the same way dumbo uses the lasthbase.jar. For reference, here are the
> > > > > > > Input & Output format classes for both the HBase & mongodb projects:
> > > > > > > https://github.com/mongodb/mongo-hadoop/tree/master/src/main/com/mong...
> > > > > > > https://github.com/tims/lasthbase/tree/master/src/java/fm/last/hbase/...
> > > > > > > With lasthbase, the input & output information is specified on the
> > > > > > > command line, but in the mongodb, they have a WordCountXML example
> > > > > > > that reads all connection, query, and other configurable information
> > > > > > > from an XML file. I liked this approach, but had some questions. It
> > > > > > > seems as though the lasthbase classes implemented the JobConfigurable
> > > > > > > interface, but it's been a long time since lasthbase was updated. Mongodb-hadoop
> > > > > > > does not have this. A LOT of the setup looks the same, but I was
> > > > > > > looking for a good starting point on making their classes work with
> > > > > > > dumbo.
> > > > > > > What is dumbo expecting, or better yet, what is lasthbase sending to
> > > > > > > dumbo? What does dumbo need from the jar file to start streaming the
> > > > > > > data to the map/reduce job(s)? And how should it be streamed? I don't
> > > > > > > know Java, but my friend is willing to try and help get it going if I
> > > > > > > can get him all the information possible. To him it SEEMS some things
> > > > > > > can be moved around and into the input & output format classes on
> > > > > > > mongodb-hadoop, tell it to read the xml file, and then you have
> > > > > > > another driver that connects to a document database for use with
> > > > > > > dumbo.
> > > > > > > But he has no understanding of dumbo, and we could use some assistance.
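On the question of what dumbo receives: as I understand it, dumbo talks to Hadoop Streaming using typed bytes, where each record is a typed bytes key followed by a typed bytes value, and every value starts with a one-byte type code (2 boolean, 3 int, 4 long, 6 double, 7 string, 9 list terminated by 255, 10 map). The encoder below covers only that subset and is an illustration of the format, not the `org.apache.hadoop.typedbytes` implementation; the class name `TypedBytesSketch` is made up:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;

public class TypedBytesSketch {
    // Encodes one value as <type code byte><payload>; DataOutputStream writes numbers big-endian.
    static void write(DataOutputStream out, Object v) throws IOException {
        if (v instanceof Integer) {
            out.writeByte(3); out.writeInt((Integer) v);
        } else if (v instanceof Long) {
            out.writeByte(4); out.writeLong((Long) v);
        } else if (v instanceof Double) {
            out.writeByte(6); out.writeDouble((Double) v);
        } else if (v instanceof Boolean) {
            out.writeByte(2); out.writeBoolean((Boolean) v);
        } else if (v instanceof String) {
            byte[] utf8 = ((String) v).getBytes(StandardCharsets.UTF_8);
            out.writeByte(7); out.writeInt(utf8.length); out.write(utf8);
        } else if (v instanceof List) {
            out.writeByte(9);
            for (Object item : (List<?>) v) write(out, item);
            out.writeByte(255); // end-of-list marker
        } else if (v instanceof Map) {
            Map<?, ?> m = (Map<?, ?>) v;
            out.writeByte(10); out.writeInt(m.size());
            for (Map.Entry<?, ?> e : m.entrySet()) {
                write(out, e.getKey());
                write(out, e.getValue());
            }
        } else {
            throw new IllegalArgumentException("unsupported value in this sketch: " + v);
        }
    }

    // Encodes a key followed by its value, the order in which a record is streamed.
    static byte[] encodePair(Object key, Object value) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            write(out, key);
            write(out, value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (byte b : encodePair("a", 1)) sb.append(b & 0xff).append(' ');
        System.out.println(sb.toString().trim()); // 7 0 0 0 1 97 3 0 0 0 1
    }
}
```

So `encodePair("a", 1)` yields 11 bytes: the string key (code 7, a 4-byte length of 1, then `a`) followed by the int value (code 3 and four bytes).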
> > > > > > > --
> > > > > > > You received this message because you are subscribed to the Google Groups
> > > > > > > "dumbo-user" group.
> > > > > > > To post to this group, send email to dumbo-user@googlegroups.com.
> > > > > > > To unsubscribe from this group, send email to
> > > > > > > dumbo-user+unsubscribe@googlegroups.com.
> > > > > > > For more options, visit this group at
> > > > > > > http://groups.google.com/group/dumbo-user?hl=en.