Developing a transformation component

Rakesh

Jan 25, 2012, 4:04:13 PM
to javaposse
Hi,

I need to develop an app that will take data in one database and
transform it and put it into another database.

The databases are MongoDB, storing JSON data.

I could just do it in Java: convert the JSON into Java objects,
transform them, and then convert the resulting objects back into JSON
for inserting into the other database.
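
A minimal sketch of that plain-Java approach, kept free of any driver code so it can be unit tested on its own. The thread never says what the real transform is, so the field names (firstName, lastName, fullName) and the class name are made up for illustration, and documents are modelled as plain Map<String, Object> values. Reading from the source database and writing to the target one is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical transformer: the class and field names are illustrative only.
final class DocumentTransformer {

    // Builds a new target document from a source document.
    // The source map is not modified.
    static Map<String, Object> transform(Map<String, Object> source) {
        Map<String, Object> target = new LinkedHashMap<>();
        // Carry the identifier across unchanged.
        target.put("_id", source.get("_id"));
        // Merge two source fields into one target field.
        target.put("fullName", source.get("firstName") + " " + source.get("lastName"));
        return target;
    }
}
```

Because transform is a pure function of its input, a unit test only needs to build a map, call it, and check the result; no running MongoDB instance is required.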

Anyone have any ideas for doing it any other way? The requirements:

1. Developed using proper tooling - none of this Vi/Emacs/Non-GUI
crap. I use IntelliJ.
2. Needs to be unit testable
3. Performance is key


Thanks

Rakesh

Ryan Schipper

Jan 25, 2012, 4:50:23 PM
to java...@googlegroups.com
I'm not familiar with MongoDB, but if it was an Oracle database I'd be
writing a stored procedure in PL/SQL. The proc would run on the target
DB and perform remote selects on the source.

Is remote connection capability common in this brand-new NoSQL world?

Regards,

Ryan Schipper


Rakesh

Jan 25, 2012, 5:38:36 PM
to java...@googlegroups.com
Pretty sure there's no concept of stored procs in MongoDB.

Kevin Wright

Jan 25, 2012, 7:51:34 PM
to java...@googlegroups.com
Transforming one entire data structure to another...  You've landed yourself right in the middle of functional programming's sweet spot. :)

There are a number of options, so grabbing the available drivers from here (http://www.mongodb.org/display/DOCS/Drivers) and filtering to just the Java platform:

Java: Officially supported and mature.  Would seem at first glance to move you out of your comfort zone, but to get FP you'd need either google-guice or lambdaj.  Juice is enough of a paradigm shift that you're practically learning another language anyway, and lambdaj needs you to set up support in your IDE and build system.  Most boilerplatey of the alternatives.

Scala: Also officially supported (via the Casbah driver) and extensively used within 10gen (the MongoDB maintainers).  Hands down offers the best range of collection-manipulating options compared to the alternatives, and its pattern-matching capabilities are perfectly suited to the task.  Expect the same performance as Java.  Requires you to learn another (Java-esque) language, which should be a similar level of difficulty to learning a broad framework like guice and the associated FP paradigms.

Clojure: Community supported. Should yield the most concise solution, and dynamic typing fits well with untyped JSON data.  Will be slower than either Scala or Java and has a heavy up-front cost of first learning Clojure (which is a Lisp variant).

Groovy: Will be the least performant option, with the least support for richer FP constructs.  Worth considering if you already have a lot of in-house Groovy talent, but certainly not worth learning the language for this task, especially not given the advantages of the alternatives.
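
As a rough point of reference for the Java option above: on Java 8 and later (which post-dates this thread) the standard library's lambdas and streams cover the basic filter-and-map style without a third-party library. The sketch below assumes documents are plain Map<String, Object> values; the field names (status, total) and the "SHIPPED" value are invented for illustration, not taken from the original poster's data.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical example: class, method, and field names are illustrative only.
final class OrderSummaries {

    // Keeps only the shipped orders and projects each one to a smaller document.
    static List<Map<String, Object>> shippedSummaries(List<Map<String, Object>> orders) {
        return orders.stream()
                .filter(order -> "SHIPPED".equals(order.get("status")))
                .map(OrderSummaries::summarise)
                .collect(Collectors.toList());
    }

    // Copies just the fields the target collection is assumed to need.
    private static Map<String, Object> summarise(Map<String, Object> order) {
        Map<String, Object> summary = new LinkedHashMap<>();
        summary.put("orderId", order.get("_id"));
        summary.put("total", order.get("total"));
        return summary;
    }
}
```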


I could probably whip up a couple of comparable examples if you can explain the exact transform in a bit more depth.

Ryan Schipper

Jan 26, 2012, 3:17:11 AM
to java...@googlegroups.com

Graham Allan

Jan 26, 2012, 12:50:41 PM
to java...@googlegroups.com


On 26 January 2012 00:51, Kevin Wright <kev.lee...@gmail.com> wrote:
[snip]
Java: Officially supported and mature.  Would seem at first glance to move you out of your comfort zone, but to get FP you'd need either google-guice or lambdaj.  Juice is enough of a paradigm shift that you're practically learning another language anyway, and lambdaj needs you to set up support in your IDE and build system.  Most boilerplatey of the alternatives.

[snip]

Kevin, just to clarify, do you mean google-guava? Guava being the library that introduces some functional concepts, Guice being a dependency injection framework.

Kind regards,
Graham

Kevin Wright

Jan 26, 2012, 1:02:03 PM
to java...@googlegroups.com
Oh, drat, indeed I did.

Wayne Fay

Jan 26, 2012, 1:18:14 PM
to java...@googlegroups.com
> I need to develop an app that will take data in one database and
> transform it and put it into another database.
>
> The databases are MongoDb storing JSon data.
> ...

> Anyone have any ideas for doing it any other way? The requirements:

You didn't say if this is a one-time "moving from one db to another"
kind of thing or an ongoing need. If one time, then just hack
something together and be done with it.

If this is going to be an ongoing need, maybe you should think about
integrating Mule ESB, Spring Integration, Apache Camel (which has a
Scala DSL for Kevin) or a similar ESB framework that abstracts out
some of the Json and Mongodb pieces and lets you focus on the
transformation that needs to occur.

I assume your goal is not to spend a lot of time on the annoying bits
at either end (getting data in, adapting it to an object you can
interact with, pushing data back out) but rather focus energies on the
important part (business logic/transformation) in the middle. This may
also be total overkill depending on the specifics of your project but
something to consider.

Wayne

Rakesh

Jan 26, 2012, 3:19:31 PM
to java...@googlegroups.com
It's ongoing for sure. The stack you mention seems a bit heavyweight...

Wayne Fay

Jan 26, 2012, 4:07:56 PM
to java...@googlegroups.com
> its ongoing for sure. The stack you mention seems a bit heavy weight...

These ESB platforms generally only load what you need (à la OSGi) so
while they have a long list of supported components, you will rarely
load more than what you're actually using for your specific
implementation. I can't make specific claims about performance for
your situation but we have been happy with our ESB experiences.

And I did say...


>> important part (business logic/transformation) in the middle. This may
>> also be total overkill depending on the specifics of your project but
>> something to consider.

If this is a single app to move data from one place to another, an ESB
is probably overkill. If you anticipate a future need to pipe the same
data (via another transformer or the same one etc) to another
destination (endpoint), or do some other analysis of the data, you may
find this to be a useful foundation block that you can build on (as we
have).
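
The idea of one transformer feeding several endpoints can be sketched in plain Java, without any framework. This is not the Mule, Spring Integration, or Camel API; the Pipeline class and its method names are hypothetical, and a real integration framework adds routing, error handling, and monitoring on top of this shape.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical pipeline: a source, one transformer, and any number of sinks.
final class Pipeline<I, O> {
    private final Function<I, O> transformer;
    private final List<Consumer<O>> sinks;

    Pipeline(Function<I, O> transformer, List<Consumer<O>> sinks) {
        this.transformer = transformer;
        this.sinks = sinks;
    }

    // Transforms each input once and hands the result to every sink in order.
    void run(Iterable<I> source) {
        for (I item : source) {
            O result = transformer.apply(item);
            for (Consumer<O> sink : sinks) {
                sink.accept(result);
            }
        }
    }
}
```

Adding another destination later then means adding one more sink to the list; the transformer itself does not change.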

Wayne

Rakesh

Jan 26, 2012, 4:11:09 PM
to java...@googlegroups.com
Well, we may want to take the data and also send it to a data warehouse
for reporting at some point...

Kevin Wright

Jan 26, 2012, 7:31:07 PM
to java...@googlegroups.com
Do you have an example of the sort of transformations you're after?  With names changed to protect the innocent, obviously.

I'm sure you could then elicit a few snippets of code from this list to demonstrate the different approaches, which should help you decide.

morten hattesen

Feb 9, 2012, 10:22:18 AM
to The Java Posse
I would recommend looking at Apache Camel, too, but running it "bare"
and leaving out Mule (or any other ESB, for that matter), since that
seems to be overkill for your needs.

The Camel solution will leave you with a decoupled messaging solution
that is easily expandable and monitorable (JMX/MBeans).

Rakesh

Feb 10, 2012, 8:24:58 AM
to java...@googlegroups.com
The current front runner is to use a map-reduce function (written in
JavaScript) that is run by MongoDB...

It actually seems like a good choice, as it's almost like a stored
procedure, so performance should be excellent.
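
For readers unfamiliar with the map-reduce shape, here is an in-memory Java illustration of the two phases. It is not MongoDB's API and not the JavaScript that would actually run on the server; it only mirrors the pattern in which a map step emits (key, value) pairs per document and a reduce step combines all values emitted under one key. The field names (customer, amount) are assumptions made up for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory illustration of map-reduce; names are illustrative only.
final class MapReduceSketch {

    // Returns the summed amount per customer.
    // Assumes every order has a String "customer" and a non-null Integer "amount".
    static Map<String, Integer> totalPerCustomer(List<Map<String, Object>> orders) {
        // "Map" phase: emit one (customer, amount) pair per order document.
        Map<String, List<Integer>> emitted = new LinkedHashMap<>();
        for (Map<String, Object> order : orders) {
            String key = (String) order.get("customer");
            Integer value = (Integer) order.get("amount");
            emitted.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }
        // "Reduce" phase: collapse the values emitted under each key into one sum.
        Map<String, Integer> reduced = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> entry : emitted.entrySet()) {
            int sum = 0;
            for (int amount : entry.getValue()) {
                sum += amount;
            }
            reduced.put(entry.getKey(), sum);
        }
        return reduced;
    }
}
```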

R
