Hadoop dependencies


Rainer Gemulla

Sep 21, 2009, 12:35:21 PM
to jaql-...@googlegroups.com
One thing I noticed is that we have a lot of dependencies on Hadoop in
our main code line (e.g., MapReduceFn or the io.hadoop package). We have
to find a way to keep the code in the main package but push the Hadoop
dependencies to the vendor directory. Otherwise, things will break as
soon as the old Hadoop API is abandoned.

Rainer

Vuk Erecegovac

Sep 21, 2009, 1:28:47 PM
to jaql-...@googlegroups.com
For the most part, it should be isolated to the packages that are specific to Hadoop (e.g., com.ibm.jaql.*.hadoop.*). When you say that we have a lot of dependencies in the main line, how much is outside of these directories? With regard to the vendor branches, so far we've only put files from Jaql there that require different APIs or handling from one version of Hadoop or HBase to the next. As the changes become larger (as we see in Hadoop trunk), I expect all of Jaql's *.hadoop.* dirs to go into vendor IF we want to support (<= 0.20) Hadoop APIs and (> 0.20) Hadoop APIs at the same time.

Rainer Gemulla

Sep 21, 2009, 2:49:23 PM
to jaql-...@googlegroups.com
At a quick glance, dependencies exist only in the packages

com.ibm.jaql.io.hbase
com.ibm.jaql.io.hadoop
com.ibm.jaql.io.registry
com.ibm.jaql.lang.expr.hadoop
com.ibm.jaql.util

One solution to this problem would be to put only the Hadoop-specific
parts of the code into the vendor directory. See
com.ibm.jaql.hadoop.io.FromDelConverter in the vendor/hadoop directory
for an example.


Rainer

Vuk Erecegovac

Sep 21, 2009, 5:27:12 PM
to jaql-...@googlegroups.com
OK, not too dispersed. I like the approach you took for FromDelConverter to isolate the differences, so that we can limit the amount of copying even further (copying will only increase if we need to support both <= 0.20 and > 0.20 Hadoop APIs). Let's do the following:
 
1. Use separate packages wherever Hadoop or HBase APIs are used. The violations of this in your package list are in registry and util; we should try to separate these out.
 
2. Try Hadoop trunk to see what the best approach is to limit copying (as a result of Hadoop API changes). Abstract classes, as used for FromDelConverter, seem like a good approach.

I made a new issue for the project that cross-references this discussion: http://code.google.com/p/jaql/issues/detail?id=44
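
To make the abstract-class idea from point 2 a bit more concrete, here is a rough sketch of how the split could look. This is not the actual FromDelConverter code: every class name below is hypothetical, and the two record classes are plain-Java stand-ins for the record types of an old and a new Hadoop API, so the example runs without Hadoop on the classpath. The assumption is that the abstract base class would live in the main tree with no Hadoop imports, and only the small subclasses would be copied per version under vendor.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch; none of these names come from the Jaql code base.

/** Main tree: version-independent logic, no Hadoop imports. */
abstract class DelLineConverter<R> {
    private final String delimiter;

    protected DelLineConverter(String delimiter) {
        this.delimiter = delimiter;
    }

    /** The only version-specific step: get the line text out of a record. */
    protected abstract String textOf(R record);

    /** Shared logic: split the record's text into its delimited fields. */
    public final List<String> convert(R record) {
        List<String> fields = new ArrayList<>();
        // Pattern.quote treats the delimiter literally; -1 keeps trailing empty fields.
        for (String field : textOf(record).split(Pattern.quote(delimiter), -1)) {
            fields.add(field);
        }
        return fields;
    }
}

/** Stand-in for a record type of the old API (assumed to wrap raw bytes). */
final class OldApiRecord {
    final byte[] bytes;
    OldApiRecord(String s) { this.bytes = s.getBytes(StandardCharsets.UTF_8); }
}

/** Stand-in for a record type of the new API (assumed to wrap characters). */
final class NewApiRecord {
    final CharSequence chars;
    NewApiRecord(CharSequence chars) { this.chars = chars; }
}

/** Would live in the vendor directory for the old API. */
final class OldApiDelLineConverter extends DelLineConverter<OldApiRecord> {
    OldApiDelLineConverter(String delimiter) { super(delimiter); }
    @Override protected String textOf(OldApiRecord r) {
        return new String(r.bytes, StandardCharsets.UTF_8);
    }
}

/** Would live in the vendor directory for the new API. */
final class NewApiDelLineConverter extends DelLineConverter<NewApiRecord> {
    NewApiDelLineConverter(String delimiter) { super(delimiter); }
    @Override protected String textOf(NewApiRecord r) {
        return r.chars.toString();
    }
}

class VendorSplitSketch {
    public static void main(String[] args) {
        // Both versions share the parsing code and differ only in textOf().
        System.out.println(new OldApiDelLineConverter(",").convert(new OldApiRecord("a,b,c")));
        System.out.println(new NewApiDelLineConverter("|").convert(new NewApiRecord("x|y|")));
    }
}
```

With this layout, supporting another Hadoop version would mean copying only a subclass of a few lines rather than the whole converter, which is the kind of reduction in copying we are after. Whether the real API differences can be squeezed into such a narrow abstract method is something trying Hadoop trunk would have to show.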