Dumbo & MongoDB
17 messages
From: Nathan <nbyl...@gmail.com>
Date: Thu, 30 Jun 2011 11:34:12 -0700 (PDT)
Local: Thurs, Jun 30 2011 2:34 pm
Subject: Dumbo & MongoDB
I was using HBase for a while and was happy when I found the lasthbase
driver on github that worked great with dumbo. Recently I have started
working with MongoDB and found a mongodb-hadoop driver here:

https://github.com/mongodb/mongo-hadoop/

I asked a friend of mine who is much more familiar with Java to
compare the two, to see if we can use the mongodb classes easily in
the same way dumbo uses the lasthbase.jar. For reference, here are the
input and output format classes for both the HBase and MongoDB projects:

https://github.com/mongodb/mongo-hadoop/tree/master/src/main/com/mong...

https://github.com/tims/lasthbase/tree/master/src/java/fm/last/hbase/...

With lasthbase, the input and output information is specified on the
command line, but mongo-hadoop has a WordCountXML example
that reads all connection, query, and other configurable information
from an XML file. I liked this approach, but had some questions. It
seems as though the lasthbase classes extend a JobConfigurable
class, but it's been a long time since it was updated. mongo-hadoop
does not have this. A LOT of the setup looks the same, but I was
looking for a good starting point on making their classes work with
dumbo.

What is dumbo expecting, or better yet, what is lasthbase sending to
dumbo? What does dumbo need from the jar file to start streaming the
data to the map/reduce job(s)? And how should it be streamed? I don't
know Java, but my friend is willing to try and help get it going if I
can get him all the information possible. To him it SEEMS some things
can be moved around and into the input & output format classes on
mongodb-hadoop, tell it to read the xml file, and then you have
another driver that connects to a document database for use with
dumbo.

But he has no understanding of dumbo, and we could use some assistance.


 
From: Nathan <nbyl...@gmail.com>
Date: Thu, 30 Jun 2011 20:51:45 -0700 (PDT)
Local: Thurs, Jun 30 2011 11:51 pm
Subject: Re: Dumbo & MongoDB
For instance, I compiled the mongo-hadoop.jar file, and I wanted to
just see what happened. I put the file in my /usr/lib/hadoop-0.20
folder. Then I ran this command just to see what would happen:

dumbo test-in.py -libjar mongo-hadoop.jar \
    -inputformat com.mongodb.hadoop.mapred.MongoInputFormat \
    -outputformat com.mongodb.hadoop.mapred.MongoOutputFormat \
    -input mongodb://localhost/demo.yield_historical.in \
    -output mongodb://localhost/demo.yield_historical.out

EXEC: PYTHONPATH="/usr/local/lib/python2.7/dist-packages/dumbo-0.21.30-py2.7.egg:$PYTHONPATH" python -m dumbo.cmd encodepipe -file mongodb://localhost/demo.yield_historical.in | PYTHONPATH="/usr/local/lib/python2.7/dist-packages/dumbo-0.21.30-py2.7.egg:$PYTHONPATH" dumbo_mrbase_class='dumbo.backends.common.MapRedBase' dumbo_jk_class='dumbo.backends.common.JoinKey' dumbo_runinfo_class='dumbo.backends.common.RunInfo' python -m test-in map 0 262144000  > 'mongodb://localhost/demo.yield_historical.out'
/bin/sh: cannot create mongodb://localhost/demo.yield_historical.out: Directory nonexistent
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/local/lib/python2.7/dist-packages/dumbo-0.21.30-py2.7.egg/dumbo/cmd.py", line 170, in <module>
    sys.exit(dumbo())
  File "/usr/local/lib/python2.7/dist-packages/dumbo-0.21.30-py2.7.egg/dumbo/cmd.py", line 53, in dumbo
    retval = encodepipe(parseargs(sys.argv[2:]))
  File "/usr/local/lib/python2.7/dist-packages/dumbo-0.21.30-py2.7.egg/dumbo/cmd.py", line 133, in encodepipe
    for file in files:
  File "/usr/local/lib/python2.7/dist-packages/dumbo-0.21.30-py2.7.egg/dumbo/cmd.py", line 130, in <genexpr>
    files = (open(f) for f in addedopts['file'])
IOError: [Errno 2] No such file or directory: 'mongodb://localhost/demo.yield_historical.in'

From the error above, it doesn't seem to be picking up the JAR file I
passed on the command line. I just installed dumbo from github today and
Cloudera's CDH3 from their repo. Any tips? Does -libjar still work? I
looked at the source and only found references to libegg, unless I was
looking in the wrong place.

Anyone else interested in using mongodb as their source/sink for
hadoop? :)

From: Nathan <nbyl...@gmail.com>
Date: Fri, 1 Jul 2011 12:56:36 -0700 (PDT)
Local: Fri, Jul 1 2011 3:56 pm
Subject: Re: Dumbo & MongoDB
I changed my cli argument to this:

dumbo test-in.py -hadoop /usr/lib/hadoop -libjar mongo-hadoop.jar \
    -inputformat com.mongodb.hadoop.mapred.MongoInputFormat \
    -outputformat com.mongodb.hadoop.mapred.MongoOutputFormat \
    -input mongodb://localhost/demo.yield_historical.in \
    -output mongodb://localhost/demo.yield_historical.out

That is, I added the -hadoop path. It can't find the mongo-hadoop.jar now. I
believe I just need to update the HADOOP_CLASSPATH in my install, but
the file IS located in /usr/lib/hadoop along with all the default
jars. My original questions still remain, though, as I stumble my way
through this. What is the interaction between dumbo, the
mongo-hadoop.jar, and hadoop? Are there specific methods that need to be in
place that do certain things? Can the MongoInputClass be altered to
look for an xml file fed in through the cli (passed through dumbo, of
course)?

I am guessing dumbo would need to be altered. But not sure how all the
communication works, and where in the code. If I can get a better
understanding, I am going to fork the project, and create an "addon"
that allows for mongo access.

This group seems pretty dead lately though...

From: Nathan <nbyl...@gmail.com>
Date: Fri, 1 Jul 2011 20:53:55 -0700 (PDT)
Local: Fri, Jul 1 2011 11:53 pm
Subject: Re: Dumbo & MongoDB
OK, a little bit farther. I added the mongo-java driver and the
mongo-hadoop.jar to the HADOOP_CLASSPATH and added the -conf wordcount.xml
file from their example project. Now I am getting this error:

2011-07-01 22:49:13,688 INFO org.apache.hadoop.mapred.Task: Cleaning up job
2011-07-01 22:49:13,688 INFO org.apache.hadoop.mapred.Task: Aborting job with runstate : FAILED
2011-07-01 22:49:13,729 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2011-07-01 22:49:13,731 WARN org.apache.hadoop.mapred.Child: Error running child
java.io.IOException: No FileSystem for scheme: mongodb
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1511)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:67)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:1548)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1530)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:228)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:183)
        at org.apache.hadoop.mapred.FileOutputCommitter.cleanupJob(FileOutputCommitter.java:94)
        at org.apache.hadoop.mapred.FileOutputCommitter.abortJob(FileOutputCommitter.java:112)
        at org.apache.hadoop.mapred.OutputCommitter.abortJob(OutputCommitter.java:185)
        at org.apache.hadoop.mapred.Task.runJobCleanupTask(Task.java:948)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:309)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
        at org.apache.hadoop.mapred.Child.main(Child.java:262)
2011-07-01 22:49:13,734 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task
2011-07-01 22:49:13,735 WARN org.apache.hadoop.mapred.FileOutputCommitter: java.io.IOException: No FileSystem for scheme: mongodb
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1511)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:67)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:1548)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1530)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:228)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:183)
        at org.apache.hadoop.mapred.FileOutputCommitter.getTempTaskOutputPath(FileOutputCommitter.java:234)
        at org.apache.hadoop.mapred.FileOutputCommitter.abortTask(FileOutputCommitter.java:179)
        at org.apache.hadoop.mapred.OutputCommitter.abortTask(OutputCommitter.java:233)
        at org.apache.hadoop.mapred.Task.taskCleanup(Task.java:933)
        at org.apache.hadoop.mapred.Child$5.run(Child.java:300)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
        at org.apache.hadoop.mapred.Child.main(Child.java:297)

2011-07-01 22:49:13,735 WARN org.apache.hadoop.mapred.FileOutputCommitter: Error discarding outputjava.io.IOException: No FileSystem for scheme: mongodb
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1511)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:67)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:1548)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1530)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:228)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:183)
        at org.apache.hadoop.mapred.FileOutputCommitter.abortTask(FileOutputCommitter.java:182)
        at org.apache.hadoop.mapred.OutputCommitter.abortTask(OutputCommitter.java:233)
        at org.apache.hadoop.mapred.Task.taskCleanup(Task.java:933)
        at org.apache.hadoop.mapred.Child$5.run(Child.java:300)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
        at org.apache.hadoop.mapred.Child.main(Child.java:297)
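
For anyone reading these traces later: every one of them passes through FileOutputCommitter, which asks Hadoop for a FileSystem matching the output path, and nothing is registered for the mongodb scheme. The sketch below is a simplified Python model of that lookup, not Hadoop's code. It assumes the Hadoop 0.20-era convention of resolving a scheme through an fs.<scheme>.impl configuration key; the function name and the sample configuration are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical stand-in for the relevant part of a Hadoop configuration.
SAMPLE_CONF = {
    "fs.file.impl": "org.apache.hadoop.fs.LocalFileSystem",
    "fs.hdfs.impl": "org.apache.hadoop.hdfs.DistributedFileSystem",
}


def filesystem_class_for(uri, conf=SAMPLE_CONF):
    """Return the filesystem class name registered for the URI's scheme."""
    scheme = urlparse(uri).scheme
    impl = conf.get("fs.%s.impl" % scheme)
    if impl is None:
        # Same message as in the traces: no entry exists for this scheme.
        raise IOError("No FileSystem for scheme: %s" % scheme)
    return impl
```

Under this model an hdfs:// path resolves, while mongodb://localhost/demo.yield_historical.out raises the error seen above, which suggests the output URI is still being treated as a filesystem path somewhere in the job.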

From: Nathan <nbyl...@gmail.com>
Date: Fri, 1 Jul 2011 22:11:17 -0700 (PDT)
Local: Sat, Jul 2 2011 1:11 am
Subject: Re: Dumbo & MongoDB
Even closer. I'm doing a simple word count on the "in" collection of
the test db (from the mongo-hadoop README) using a simple dumbo
map/reduce job. It starts up just fine, but fails on the map job(s). It
never gets to reducing; instead it throws this error. The
"4e0e98380bfb6ce2d9091ea6" ObjectId is the id from the db.in
collection in the test db.

java.io.IOException: Can't write: 4e0e98380bfb6ce2d9091ea6 as class org.bson.types.ObjectId
        at org.apache.hadoop.io.ObjectWritable.writeObject(ObjectWritable.java:162)
        at org.apache.hadoop.io.ObjectWritable.write(ObjectWritable.java:70)
        at org.apache.hadoop.typedbytes.TypedBytesWritableOutput.writeWritable(TypedBytesWritableOutput.java:217)
        at org.apache.hadoop.typedbytes.TypedBytesWritableOutput.write(TypedBytesWritableOutput.java:136)
        at org.apache.hadoop.streaming.io.TypedBytesInputWriter.writeTypedBytes(TypedBytesInputWriter.java:57)
        at org.apache.hadoop.streaming.io.TypedBytesInputWriter.writeKey(TypedBytesInputWriter.java:47)
        at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:108)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:390)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:324)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
        at org.apache.hadoop.mapred.Child.main(Child.java:262)
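
The trace shows the streaming layer failing while it serializes the map input key to typed bytes. As a rough illustration of why an ObjectId does not fit, here is a minimal Python sketch of a typed bytes encoder for a few basic types. It assumes the type codes documented for Hadoop's typedbytes package (2 = boolean, 3 = 32-bit integer, 6 = double, 7 = string); FakeObjectId is a hypothetical stand-in for org.bson.types.ObjectId, and none of this is code from dumbo or mongo-hadoop.

```python
import struct


def encode_typed_bytes(value):
    """Encode a value as a one-byte type code followed by a big-endian payload."""
    if isinstance(value, bool):  # test bool before int: bool is an int subclass
        return struct.pack(">bb", 2, int(value))
    if isinstance(value, int):   # 32-bit signed integer only in this sketch
        return struct.pack(">bi", 3, value)
    if isinstance(value, float):
        return struct.pack(">bd", 6, value)
    if isinstance(value, str):   # 4-byte length, then UTF-8 bytes
        data = value.encode("utf-8")
        return struct.pack(">bi", 7, len(data)) + data
    raise TypeError("Can't write: %s as %s" % (value, type(value)))


class FakeObjectId(object):
    """Hypothetical stand-in for a driver-specific id type."""

    def __init__(self, hex_string):
        self.hex_string = hex_string

    def __str__(self):
        return self.hex_string
```

In this sketch encode_typed_bytes(FakeObjectId(...)) raises, while encode_typed_bytes(str(oid)) succeeds. One way around the error is therefore to convert such keys to a supported type (a string, say) on the Java side before they reach the streaming layer.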

From: Klaas Bosteels <klaas.boste...@gmail.com>
Date: Sat, 2 Jul 2011 19:03:36 +0200
Local: Sat, Jul 2 2011 1:03 pm
Subject: Re: Dumbo & MongoDB

Hi Nathan,

Based on what you told us, I don't think there's a real difference between
how the two take configuration params. The mongodb example probably just
makes use of the possibility that Hadoop provides for putting the params in
an xml file and reading them from that file instead of passing them
directly.
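
To make that point concrete, here is a small Python sketch showing that a Hadoop-style configuration file is just a list of name/value pairs, which could equally be given one at a time as Hadoop's generic -D name=value options. The property names mongo.input.uri and mongo.output.uri are assumed to be the ones mongo-hadoop reads, the URIs are example values, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Example configuration in Hadoop's <configuration>/<property> layout.
SAMPLE_XML = """<configuration>
  <property>
    <name>mongo.input.uri</name>
    <value>mongodb://localhost/test.in</value>
  </property>
  <property>
    <name>mongo.output.uri</name>
    <value>mongodb://localhost/test.out</value>
  </property>
</configuration>
"""


def read_hadoop_conf(xml_text):
    """Return the name/value pairs of a Hadoop-style configuration document."""
    root = ET.fromstring(xml_text)
    return {prop.findtext("name"): prop.findtext("value")
            for prop in root.findall("property")}


def as_d_options(conf):
    """Render the same settings as generic '-D name=value' options."""
    return ["-D %s=%s" % (name, value) for name, value in sorted(conf.items())]
```

Either form ends up as the same entries in the job configuration, so an input or output format does not need to know which one was used.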

To make mongo input or output work, you will need to write a custom input or
output format that writes or reads typed bytes writables. I haven't looked
at the code much, but you might be able to do this by wrapping the
mongo-hadoop formats. You should be able to figure out how to work with
typed bytes writables by having a look at the lasthbase code.

Also, to use (Java) input or output formats you need to run on Hadoop.
That's the reason why the local run you pasted in one of your emails failed
miserably.
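
The traceback from the local run points at the line files = (open(f) for f in addedopts['file']), so without Hadoop each input is simply opened as a local file. A minimal Python sketch of that behaviour follows; the function name is hypothetical and this is not dumbo's actual implementation.

```python
def read_local_inputs(paths):
    """Mimic a local run: every input value is opened as a plain file."""
    for path in paths:
        with open(path) as handle:
            for line in handle:
                yield line
```

Passing mongodb://localhost/demo.yield_historical.in to this function fails with the same "No such file or directory" error, because no input format is involved and nothing interprets the URI.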

Sorry for the late answer, and please share your code if you figure out how
to do this!

Regards,
-Klaas


 
From: Nathan <nbyl...@gmail.com>
Date: Sat, 2 Jul 2011 11:09:50 -0700 (PDT)
Local: Sat, Jul 2 2011 2:09 pm
Subject: Re: Dumbo & MongoDB
Thanks for your reply. As of the last message I posted, it's reading from
MongoDB just fine, and their mongodb-hadoop driver uses TypedBytes as
well. This is the error I am currently struggling with:

java.io.IOException: Can't write: 4e0e98380bfb6ce2d9091ea6 as class org.bson.types.ObjectId

4e0e98380bfb6ce2d9091ea6 is the MongoDB ObjectId string of the first
record in my test collection, so I know it's able to access the data.
Also, the error stack trace contains this:

org.apache.hadoop.io.ObjectWritable.writeObject(ObjectWritable.java:162)
at org.apache.hadoop.io.ObjectWritable.write(ObjectWritable.java:70)
at org.apache.hadoop.typedbytes.TypedBytesWritableOutput.writeWritable(TypedBytesWritableOutput.java:217)

So I know their driver is trying to use typed bytes. They have working
examples in pure Java, but I have grown accustomed to dumbo, and would
like to use it and help this project grow. Supposedly the project
supports streaming jobs too, so there should be no problem working
with dumbo as is once everything is figured out. I am not sure what is
happening yet, but I will share as soon as I have something working. I
also encourage anyone else interested to please take a look or share
their opinions. :)

From: Nathan <nbyl...@gmail.com>
Date: Sat, 2 Jul 2011 11:35:59 -0700 (PDT)
Local: Sat, Jul 2 2011 2:35 pm
Subject: Re: Dumbo & MongoDB
I get what you are saying though. I am going to try and create a
wrapper this weekend, but don't expect much success since I am not a
Java guy. :)

They have a lot of the same methods in their input and output formats,
but are there specific methods that must be overridden? Are there very
specific things that MUST happen in the input and output formats? Any
tips are appreciated. Hopefully this is pretty straightforward, as
there are only two classes to mess with.

From: Nathan <nbyl...@gmail.com>
Date: Sat, 2 Jul 2011 13:02:51 -0700 (PDT)
Local: Sat, Jul 2 2011 4:02 pm
Subject: Re: Dumbo & MongoDB
The odd thing is it can't find this package when I try to import it
(I have all my jars in the build path, including the hadoop streaming jar):

import org.apache.hadoop.typedbytes.TypedBytesWritable;

It says there is no typedbytes package in hadoop. Eclipse tries to
resolve this error by importing the hadoop-streaming.jar from the
lasthbase project. I have looked, and this is definitely not a
deprecated package, so it should be there. I don't know what the
problem is.

From: Nathan <nbyl...@gmail.com>
Date: Sat, 2 Jul 2011 18:03:40 -0700 (PDT)
Local: Sat, Jul 2 2011 9:03 pm
Subject: Re: Dumbo & MongoDB
I feel so close. This class mimics theirs, but uses
TypedBytesWritable instead of BSONObjects.

@SuppressWarnings("deprecation")
public class TypedBytesTableInputFormat implements InputFormat<TypedBytesWritable, TypedBytesWritable> {

        @Override
        public RecordReader<TypedBytesWritable, TypedBytesWritable> getRecordReader(InputSplit split, JobConf job, Reporter reporter) {

                if (!(split instanceof MongoInputSplit))
                        throw new IllegalStateException("Creation of a new RecordReader requires a MongoInputSplit instance.");

                final MongoInputSplit mis = (MongoInputSplit) split;

                return (RecordReader<TypedBytesWritable, TypedBytesWritable>) new TypedBytesMongoRecordReader(mis);
        }
....
....
....
....

public class TypedBytesMongoRecordReader extends RecordReader<TypedBytesWritable, TypedBytesWritable> {

        public TypedBytesMongoRecordReader(MongoInputSplit mis) {
                _cursor = mis.getCursor();
        }
...
...
...
...

Unfortunately I get this error:

java.lang.ClassCastException: com.mongodb.hadoop.input.TypedBytesMongoRecordReader cannot be cast to org.apache.hadoop.mapred.RecordReader
        at com.mongodb.hadoop.TypedBytesTableInputFormat.getRecordReader(TypedBytesTableInputFormat.java:31)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:370)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:324)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
        at org.apache.hadoop.mapred.Child.main(Child.java:262)

I feel so close! Not sure why I get a ClassCastException when my
TypedBytesMongoRecordReader is a child of the RecordReader. Any Java
people care to chime in?
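
One possible explanation, offered as an assumption because the full source is not shown in the thread: Hadoop ships two unrelated types named RecordReader. The old API has the interface org.apache.hadoop.mapred.RecordReader, and the new API has the abstract class org.apache.hadoop.mapreduce.RecordReader. A class declared with "extends RecordReader" most likely extends the new-API class, so casting it to the old-API type fails even though the simple names match. The Python sketch below models the idea with two same-named classes in different namespaces; all names are stand-ins, not Hadoop code.

```python
class mapred(object):
    """Stand-in namespace for the old API (org.apache.hadoop.mapred)."""

    class RecordReader(object):
        pass


class mapreduce(object):
    """Stand-in namespace for the new API (org.apache.hadoop.mapreduce)."""

    class RecordReader(object):
        pass


class TypedBytesMongoRecordReader(mapreduce.RecordReader):
    """Subclass of the new-API type, as 'extends RecordReader' would suggest."""


def cast(obj, target_type):
    """Mimic a checked cast: fail unless obj is an instance of target_type."""
    if not isinstance(obj, target_type):
        raise TypeError("%s cannot be cast to %s.%s" % (
            type(obj).__name__, target_type.__module__, target_type.__qualname__))
    return obj
```

Here cast(TypedBytesMongoRecordReader(), mapred.RecordReader) raises, mirroring the ClassCastException. If the assumption holds, the reader would need to implement the old-API interface for use with a mapred-style InputFormat.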

From: Nathan <nbyl...@gmail.com>
Date: Sat, 2 Jul 2011 19:31:03 -0700 (PDT)
Local: Sat, Jul 2 2011 10:31 pm
Subject: Re: Dumbo & MongoDB
OK, I got it reading records just fine. It completes the M/R job, but
it's not writing the output to the database. I am not getting errors though.
It says output written to test.out (the db.collection_name I am trying
to write to in MongoDB), but there is nothing in that hadoop fs folder
except an empty _SUCCESS file and a bunch of logs.

So I don't know where my output is going.

From: Nathan <nbyl...@gmail.com>
Date: Sun, 3 Jul 2011 17:45:10 -0700 (PDT)
Local: Sun, Jul 3 2011 8:45 pm
Subject: Re: Dumbo & MongoDB
OK, everything is reading from and writing to MongoDB using the dumbo
wordcount demo. The fields it writes to are hard-coded for now, but I
will add a configurable property in the XML file where you can specify
the output fields. Also, right now it will probably only let you write to one
collection, with a single key/value pair. If it becomes necessary to
save actual BSONObjects with multiple k/v pairs, I will try that
next.

But it's working. Woop woop!
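
For readers who want a picture of what such a job looks like, here is a minimal Python sketch in the shape of dumbo's word count, plus a helper that maps each output key/value pair to a document with fixed field names. It assumes the mapper receives text values; the field names "word" and "count" and the helpers to_document and run_locally are hypothetical and are not taken from the driver described above.

```python
from itertools import groupby


def mapper(key, value):
    """Emit (word, 1) for every word in a text value."""
    for word in value.split():
        yield word, 1


def reducer(key, values):
    """Sum the counts collected for one word."""
    yield key, sum(values)


def to_document(key, value, key_field="word", value_field="count"):
    """Turn one output pair into a document with hard-coded field names."""
    return {key_field: key, value_field: value}


def run_locally(records):
    """Tiny in-memory stand-in for the map, sort, and reduce phases."""
    mapped = sorted(pair for key, value in records for pair in mapper(key, value))
    for word, group in groupby(mapped, key=lambda pair: pair[0]):
        for out_key, out_value in reducer(word, (count for _, count in group)):
            yield to_document(out_key, out_value)

# With dumbo installed, the same functions would be started with:
#     import dumbo
#     dumbo.run(mapper, reducer)
```

Making the two field names configurable is then a matter of reading them from the job configuration instead of relying on the defaults.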

From: Nathan <nbyl...@gmail.com>
Date: Mon, 4 Jul 2011 09:53:09 -0700 (PDT)
Local: Mon, Jul 4 2011 12:53 pm
Subject: Re: Dumbo & MongoDB
Haha. Feels like a long journey just in this thread from "I don't know
Java" to "Hey I got it working!"

Anyways, I am going to try and do some tweaks to it so you can store
the output document structure in the XML file and have all the data
loaded into the driver instead of on the command line. I have it
checked in on github right now, but it only works if I hard-code the
output fields in the driver. Working on making it more robust.

From: Klaas Bosteels <klaas.boste...@gmail.com>
Date: Mon, 4 Jul 2011 18:54:28 +0200
Local: Mon, Jul 4 2011 12:54 pm
Subject: Re: Dumbo & MongoDB
Cool, thanks for sharing!

-K

From: Jon Eisen <yanata...@gmail.com>
Date: Fri, 31 Aug 2012 09:08:47 -0700 (PDT)
Local: Fri, Aug 31 2012 12:08 pm
Subject: Re: Dumbo & MongoDB

Hey Nathan, did you ever publish your code to get that working? I'm working
on the same thing right now.

From: Paul DeCoursey <pdecour...@gmail.com>
Date: Wed, 31 Oct 2012 11:55:08 -0700 (PDT)
Local: Wed, Oct 31 2012 2:55 pm
Subject: Re: Dumbo & MongoDB

I'm also curious about sample code.  I can't get dumbo to talk to mongo
for the life of me.

From: Paul DeCoursey <pdecour...@gmail.com>
Date: Wed, 7 Nov 2012 11:28:46 -0800 (PST)
Local: Wed, Nov 7 2012 2:28 pm
Subject: Re: Dumbo & MongoDB

Ok, I've got it working, but it won't do splits... and why the heck would
I even want to use Hadoop if I can't do splits!?!?
