Maven vs ANT/Ivy


Evgeny Minkevich

Sep 21, 2013, 2:14:36 AM
Good day,

Just wanted to check whether Maven has been given any consideration as the project build tool.

Are there any issues that would make such a move impractical?


Matt Casters

Sep 21, 2013, 4:14:47 AM
to Kettle Developers mailing list

We're indeed considering maven. If you would like to help out, please don't hesitate to start experiments.




Evgeny Minkevich

Sep 22, 2013, 2:04:31 AM

Not an issue. 

A few considerations that I would like to mention.

As I'm not 100% familiar with the project structure and the build process specifics (if there are any), I would appreciate some guidance.
Could you please check the questions below?

Switching to Maven requires adopting the Maven folder structure, and this needs to be addressed carefully.

One approach could be to switch to the new structure in working sets,
i.e. the core modules (core, engine, dbdialog, ui, jdbc) come first, then plugins, etc.

In the interim, could the existing build cycle be adapted to invoke Maven for the migrated components until everything has been switched over?
As an example I have migrated the first set - core modules - to maven for revision 

To use git I had to address one issue: git converts EOLs back and forth between formats depending on the client configuration.
Mixed EOLs wreak havoc with patch generation if they are not consistent within the project.

Empty folders were dropped.

The project can be built and I have replaced the libs within the client to see how they would work. Seems ok. 
However the tests have not been migrated yet. 

The code is available here:

I have included a build.xml that automatically converts the ivy dependencies,
which I then copied over into manually created poms (the converter cannot generate the required structure).

The structure

The project is composed of a number of modules, similar to how the current model is configured:



The root project pom is intended to be responsible for:

- global declarations (parameters, reused components)
- black box testing <not done>
- final assembly <not done>
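
For illustration, a root pom that aggregates the core working set could look roughly like the sketch below. The groupId, artifactId and version are placeholders rather than the values used in the actual migration, and the module directory names are assumed to match the core module names mentioned earlier (core, engine, dbdialog, ui, jdbc).

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <!-- Placeholder coordinates, not the real project values -->
  <groupId>org.example.kettle</groupId>
  <artifactId>kettle-parent</artifactId>
  <version>0.0.1-SNAPSHOT</version>

  <!-- An aggregator/parent pom must use "pom" packaging -->
  <packaging>pom</packaging>

  <!-- Each module is a sub-directory with its own pom.xml (directory names assumed) -->
  <modules>
    <module>core</module>
    <module>engine</module>
    <module>dbdialog</module>
    <module>ui</module>
    <module>jdbc</module>
  </modules>
</project>
```

Plugins and other working sets would be added as further module entries as they are migrated.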

Changes that have been done to all modules:

  • *.java is moved into src/main/java
  • *.html into src/main/javadoc
  • Everything else (anything other than java and html) is moved into src/main/resources

Transitive dependencies:

I was very conservative with disabling transitive dependencies and kept transitive resolution on.
The only changes made were those needed to enable the build.
As there are duplicates between components, it would make sense to consolidate the dependencies in the parent pom later on.
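
A minimal sketch of what that consolidation could look like: the parent pom pins the version once in dependencyManagement, and the child modules declare the dependency without a version. poi-ooxml 3.9 is used here only because it is the revision mentioned for the engine module further down; which dependencies actually belong in the parent is still an open question.

```xml
<!-- Parent pom.xml fragment: the version is pinned in one place -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.apache.poi</groupId>
      <artifactId>poi-ooxml</artifactId>
      <version>3.9</version>
    </dependency>
  </dependencies>
</dependencyManagement>

<!-- Child module pom.xml fragment: no version, it is inherited from the parent -->
<dependencies>
  <dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
  </dependency>
</dependencies>
```

A version pinned this way also applies when the same artifact arrives transitively, which helps when several modules pull in different revisions.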

ivy's "changing=true" should probably be addressed with SNAPSHOT versioning?
Anyway, I am not addressing this with Maven for now, so if the parent module has been refreshed without a version change, the repo needs to be purged.
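
If SNAPSHOT versioning is adopted, the rough equivalent of ivy's changing=true could look like the sketch below. The coordinates, repository id and URL are placeholders, not real project values. With a -SNAPSHOT version Maven re-checks the remote repository according to updatePolicy, and running mvn with -U forces that check on demand, which should remove the need to purge the repo by hand.

```xml
<!-- A dependency on a module that changes without a version bump (placeholder coordinates) -->
<dependency>
  <groupId>org.example.kettle</groupId>
  <artifactId>kettle-core</artifactId>
  <version>0.0.1-SNAPSHOT</version>
</dependency>

<!-- Repository declaration that allows snapshots and checks for updates on every build -->
<repositories>
  <repository>
    <id>example-snapshots</id>
    <url>https://repo.example.org/snapshots</url>
    <releases>
      <enabled>false</enabled>
    </releases>
    <snapshots>
      <enabled>true</enabled>
      <updatePolicy>always</updatePolicy>
    </snapshots>
  </repository>
</repositories>
```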

Noticed that a few dependencies are sourced from the Pentaho repo and have been renamed.
janino is an example: there is a package with the group id from codehaus, yet Pentaho uses the one from janino.
I am not changing this unless it is required for the build or the POM is incomplete (as for wstx in the engine module).

Again, the whole dependency tree probably needs to be re analyzed.

Changes that I had to make for ENGINE module only:
  • antlr runtime changed version from 3.4-complete to 3.4
  • wstx-asl group id changed from woodstox to org.codehaus.woodstox, hence downloading NOT from Pentaho
  • Duplicate dependency declaration for poi-ooxml: revisions 3.8 and 3.9. Removed 3.8
  • Disabled drools compiler dependency on janino 2.5.15, as 2.5.16 is explicitly defined
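
In pom terms, the wstx-asl and janino changes above could be sketched as follows. The ${wstx-asl.version} and ${drools.version} properties are hypothetical stand-ins because the actual versions are not stated here, and the janino coordinates assume the janino group id mentioned earlier.

```xml
<!-- wstx-asl resolved under the codehaus group id (version property is hypothetical) -->
<dependency>
  <groupId>org.codehaus.woodstox</groupId>
  <artifactId>wstx-asl</artifactId>
  <version>${wstx-asl.version}</version>
</dependency>

<!-- drools compiler with its transitive janino excluded (version property is hypothetical) -->
<dependency>
  <groupId>org.drools</groupId>
  <artifactId>drools-compiler</artifactId>
  <version>${drools.version}</version>
  <exclusions>
    <exclusion>
      <groupId>janino</groupId>
      <artifactId>janino</artifactId>
    </exclusion>
  </exclusions>
</dependency>

<!-- janino 2.5.16 declared explicitly instead of the transitive 2.5.15 -->
<dependency>
  <groupId>janino</groupId>
  <artifactId>janino</artifactId>
  <version>2.5.16</version>
</dependency>
```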

A number of dependencies are picked up by ivy, yet are not declared in the poms and so are missing. I am defining them explicitly.
Introduced a dependency.pentaho-libs.revision variable to track their version:

pentaho-library: libbase
pentaho-library: libformula
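
As a sketch, the variable and the two explicit declarations could be wired up like this. The property value is a placeholder (the actual revision is not given here); the group and artifact ids are taken from the list above.

```xml
<properties>
  <!-- Placeholder value: set this to the pentaho-library revision in use -->
  <dependency.pentaho-libs.revision>REPLACE_WITH_REVISION</dependency.pentaho-libs.revision>
</properties>

<dependencies>
  <dependency>
    <groupId>pentaho-library</groupId>
    <artifactId>libbase</artifactId>
    <version>${dependency.pentaho-libs.revision}</version>
  </dependency>
  <dependency>
    <groupId>pentaho-library</groupId>
    <artifactId>libformula</artifactId>
    <version>${dependency.pentaho-libs.revision}</version>
  </dependency>
</dependencies>
```

Keeping the revision in a single property means both libraries move together when it is bumped.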


Version management going forward
The version numbers are managed with the Maven versions plugin.
When the revision number needs to be updated, it can be done in two ways:

  1. Set the new version from the command line:

mvn versions:set -DnewVersion=<new version>

and then either roll the change back or accept it:

mvn versions:revert
mvn versions:commit

  2. Change the version in the root pom.xml and execute:

mvn -N versions:update-child-modules
mvn versions:commit


Manifest - are the custom fields needed? Or can we make do with the ones that will be generated by Maven?
How do we take it from here? Would you like me to continue working on it, or would you rather migrate the trunk yourself?
In either case I will gladly try to help.

Please feel free to contact me directly if needed.

p.s. Myself I prefer BitBucket to github - functionality and the UI are much better to my taste. Plus one can get private repos with a free account, which is handy.

Matt Burgess

Sep 22, 2013, 1:49:45 PM
Speaking of experimenting, I've started working with Gradle for my Kettle plugins and Kettle itself. It supports Ant, Ivy, Maven, as well as its own idioms (plus it has the full scripting power of Groovy), so there's a very natural migration story to tell.

For example, we could have a Gradle script that simply imports the Ant build scripts/properties (like Subfloor) and resolves the dependencies from the Ivy files (while using the resolvers specified in ivysettings.xml). On the other hand, if we went with Maven, Gradle could sit atop that someday if/when we need more power from our project management tools. Or we could just swap out everything with Gradle, like replacing Subfloor with a Pentaho plugin and using Gradle's dependency management system instead of Ivy or Maven. Or we can slowly migrate using these steps as milestones.

The build guys get nervous when I mention Gradle because with great power comes great responsibility, and they are worried about everyone creating their own "works of art" for the build process. I think we could show them that each project's build.gradle would be VERY simple, and for tasks common to subprojects, you can "push down" the functionality from the top-level project/repo or even resolve a build.gradle from some common area.

Anyway, I'm not opposed to Maven as a replacement for Ant/Ivy/Subfloor because it is a definite improvement, although the growing pains are great (as Evgeny has described). Gradle's default project structure mimics Maven's, but is easily overridden (that's what I did for Kettle core and engine).  Gradle (like antcontrib but not Ant or Maven IIRC) can also provide a way for the project builder to download Gradle if it is not already on the machine, via the Gradle Wrapper.

I can live without Gradle for this, but there are other cases I will make to Pentaho to go this way. For example, it's possible to do dependency analysis on the project or project(s) and generate a custom Jenkins job that will build the exact assembly I want.  This would alleviate (among other things) the issue where two fixes go into a project, where one breaks the build and the other does not.  The assembly does not proceed until this build is fixed, even though one could've been built with the single (working) change instead of just polling source control and kicking off a build on change.

Having said all that, I very much appreciate Evgeny's work in this area, it is excellent information and would create a situation in which we could more easily adopt Maven or Gradle :)


Evgeny Minkevich

Sep 22, 2013, 7:44:12 PM
I'm not familiar with Gradle at all, so I can not comment on it.

Broadly speaking, for an open source project, getting as much attention to the code as possible is "a good thing".
So anything that can be done to simplify things and get one going quicker probably should be done.

One approach is to stick with the familiar - with the tools and practices that are already widely used.
As a fair number of the projects I deal with use Maven/Git, it was just the easiest pattern for me to implement.


A public repo like this allows one to start looking at the code _comfortably_ even before checking out.

And from the point of view of sticking with the community, GitHub wins hands down (even though I myself prefer BitBucket).
Everyone is using it - easier to get contributors?


Git allows for a local repo. 

I like to do a lot of incremental commits to be able to revert to "that thing I had two revisions ago".
When I'm happy, I squash and push upstream. And if I do not (with a public repo), a familiar tool is one thing less to convert.
I do check out the SVN repo with git, but it is somewhat finicky with incremental updates.


Predefined project structure and dependency management. Widely used.
Once one is familiar with it, starting work on a new project is just a checkout and an import into one's favorite IDE - one is already somewhat familiar with the layout.
So for a public project, the restrictiveness imposed by Maven could potentially be a good thing - forcing the crowd into the same way of working.

I use IntelliJ IDEA. As it gets things done so much faster, I prefer to convert projects to be used with it. 
The original SVN repo was quite easy to import and start.
The current ivy one I was not able to get going. A lot of configuration was required to get ivy right.

Not ivy's fault at all, but the tooling availability does contribute to the whole idea "get going faster" and following pre-learned patterns.

So to summarize:

It is critical to onboard new users. Tooling should:

- Let one get started with the code quickly (either by looking through a UI or by checking out)
- Accommodate a greater variety of development tools
- Reduce the learning curve

Git/Maven is just one of the many combinations. 

I'm not sure where Maven stands with regard to building the distribution package, as opposed to working with code/supporting IDEs.
There might be a plugin for it - will check and report.

Whatever does this better and leaves freedom in the choice of dev methods/tools - I'm all for it.
Again - I do like to be restrictive about the core, and to have some flexibility with everything else.

Matt's idea to combine those two does appeal to me (provided my assumption about Gradle is correct).

So my next step is to check Gradle, thank you for the hint. 

Evgeny Minkevich

Sep 24, 2013, 6:35:37 AM
I have updated the pentaho-kettle github project - now everything is there, including plugins.
The tests cycle has not been migrated though. The junit source code is in, but the testfiles are not and I have disabled the test phase for the time being.

Started to work on simplifying the dependencies. Will look at the test cycle next.
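
One common way to disable the test phase temporarily is a surefire setting along these lines. This is only a sketch of the general technique; it is an assumption that the migrated poms do something similar.

```xml
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-surefire-plugin</artifactId>
      <configuration>
        <!-- Test sources are still compiled, but no tests are executed -->
        <skipTests>true</skipTests>
      </configuration>
    </plugin>
  </plugins>
</build>
```

Removing the skipTests setting re-enables the tests once the test files have been migrated.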

Matt Casters

Sep 24, 2013, 7:14:19 AM
to Kettle Developers mailing list
Looking good Evgeny.  I'm confident that in the next couple of weeks we should see the official git repo appear on

Thanks for looking into the pom.xml :-)


Matt Casters <>
Chief Data Integration, Kettle founder, Author of Pentaho Kettle Solutions (Wiley)
Fonteinstraat 70 - 9400 OKEGEM - Belgium - Cell : +32 486 97 29 37
Pentaho  -  Powerful Analytics Made Easy


Evgeny Minkevich

Sep 24, 2013, 7:25:12 AM
No problem.

Do you still want me to continue working on it?

I would like to optimize dependencies at least.
I think some complexity can be offloaded to the transitive resolution.

My goal is to produce a library set identical to the one available in the downloadable package.

Evgeny Minkevich

Oct 2, 2013, 6:36:45 AM

Good day.


The next step has been completed - all dependencies are managed transitively and full compliance with the distribution pack has been enforced. In other words, if a transitive dependency tried to bring in a version of a jar different from the one found in the distribution pack, it was overridden and the distribution pack version was explicitly declared.


Testing and assembly have not been implemented yet.


Issues overview.

There are a few common issue patterns. I have fixed them, but someone would need to validate if the approach is correct. 


  1. Server and Client distribution pack differences


In a few cases the server library set would be different from the client.

I gave precedence to the server versions.


  2. Missing Dependencies


Some transitive dependencies were missing altogether.

Missing from the server:

Missing from the client and the server distribution packs:



  3. Plugin Discrepancies


Missing dependencies:


Where a dependency was not required for compilation, it was excluded to stick with the distribution lib sets.

Otherwise it was left there.


HL7 (included):

  • hapi-*


OpenERP plugin (included):

  • ws-commons-util
  • openerp-java-api
  • xmlrpc-client
  • xmlrpc-common


Palo (excluded):


  • palojlib


Star Modeler

Missing from the client: geronimo-stax-api_1.0_spec


Missing from both client and the server:

  • woden-api
  • woden-impl-dom
  • geronimo-activation_1.1_spec
  • geronimo-javamail_1.4_spec
  • geronimo-jms_1.1_spec
  • httpcore-nio





  4. Unused dependencies


Some declared dependencies were not used directly or indirectly by any of the components. Such dependencies were removed. If there is a need to have them explicitly declared, a separate submodule can be created with those dependencies.



Jackson-* lib pack: the client has 1.9.2 and the server 1.9.3, yet this library pack is not used by any of the dependencies, so it was removed.


  5. Version


Quite often the distribution pack would contain a later version of a library than the one brought in transitively.

There were two exceptions:


Star Modeler: neethi 2.0.4, when the packs contain 2.0.1

HL7: jdom 1.1, when the packs contain 1.0


Those two were excluded from being resolved.





The project compiles with mvn compile/package/install.

The project compiles in IntelliJ IDEA - just import it as a Maven project.

Before importing into Eclipse, please run mvn eclipse:eclipse to generate the project files.


Bear in mind that until the new jar modules have been pushed into the local .m2 repo (i.e. until mvn install has been run), the ones from the Pentaho Artifactory will be resolved - hence building projects individually is not possible before that.


Hope that helps.
