Maven vs ANT/Ivy


Evgeny Minkevich

Sep 21, 2013, 2:14:36 AM
to kettle-d...@googlegroups.com
Good day,

Just wanted to check whether Maven has been given any consideration as the project build tool.

Are there any issues that would make such a move impractical?

Thanks.

Matt Casters

Sep 21, 2013, 4:14:47 AM
to Kettle Developers mailing list

We're indeed considering Maven. If you would like to help out, please don't hesitate to start experimenting.

TIA!

Matt


Evgeny Minkevich

Sep 22, 2013, 2:04:31 AM
to kettle-d...@googlegroups.com, mcas...@pentaho.org

Not an issue. 

A few considerations that I would like to mention.

As I'm not 100% familiar with the project structure and the specifics of the build process (if there are any), I would appreciate some guidance.
Could you please check the questions below?

Switching to Maven requires adopting the Maven folder structure, and this needs to be handled carefully.

One approach could be to switch to the new structure in working sets,
i.e. the core modules (core, engine, dbdialog, ui, jdbc) come first, then plugins, etc.

In the interim, the existing build cycle could be adapted to invoke Maven for the migrated components until everything has been switched over.

As an example, I have migrated the first set (the core modules) to Maven for revision 5.0.0.1.

To use Git I had to address this issue:
http://jira.pentaho.com/browse/PDI-8166 - Git converts EOLs back and forth between formats depending on the client configuration.
Mixed EOLs wreak havoc with patch generation if they are not consistent within the project.

Empty folders were dropped.

The project builds, and I have replaced the libs within the client to see how they would work. Seems OK.
However, the tests have not been migrated yet.

The code is available here:

I have included a build.xml that automatically converts the Ivy dependencies.
I copied its output into manually created POMs (the converter cannot generate the required structure).

The structure

The project is composed of a number of modules, similar to how the current model is configured:

/pom.xml
    /core/pom.xml
    /engine/pom.xml

etc
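
A minimal sketch of what a root pom.xml for this layout might look like. The groupId and artifactId are placeholders invented for illustration and are not taken from the actual repository; the module names are the core modules listed earlier in this message.

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>

  <!-- Placeholder coordinates (assumption), not the project's real ones -->
  <groupId>org.example.kettle</groupId>
  <artifactId>kettle-parent</artifactId>
  <version>5.0.0.1</version>

  <!-- "pom" packaging marks this as an aggregator/parent project -->
  <packaging>pom</packaging>

  <!-- Each module is a subdirectory with its own pom.xml -->
  <modules>
    <module>core</module>
    <module>engine</module>
    <module>dbdialog</module>
    <module>ui</module>
    <module>jdbc</module>
  </modules>
</project>
```

Maven works out the build order from the inter-module dependencies, so the order of the module entries is mostly a matter of readability.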

The root project POM is meant to be responsible for:

- global declarations (parameters, reused components)
- black box testing <not done>
- final assembly <not done>


Changes that have been done to all modules:

  • *.java is moved into src/main/java
  • *.html is moved into src/main/javadoc
  • Everything else (anything other than Java and HTML) is moved into src/main/resources

Transitive dependencies:

I was very conservative about disabling transitive dependencies and kept transitive resolution on.
The only changes made were those needed to get the build working.
As there are duplicates between components, it would make sense to consolidate the dependencies in the parent POM later on.
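
One common way to do such a consolidation is a dependencyManagement section in the parent POM. The fragments below are only a sketch of that mechanism, not the configuration actually used in the repository; poi-ooxml 3.9 is taken from the engine module changes listed further down in this message.

```xml
<!-- Parent pom.xml: pins the version once for all modules -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.apache.poi</groupId>
      <artifactId>poi-ooxml</artifactId>
      <version>3.9</version>
    </dependency>
  </dependencies>
</dependencyManagement>

<!-- Child module pom.xml: no version element, it comes from the parent -->
<dependencies>
  <dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
  </dependency>
</dependencies>
```

A managed version also applies when the artifact arrives transitively, which helps keep duplicates between components on a single version.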

Ivy's "changing=true" should probably be addressed with SNAPSHOT versioning?
Anyway, I'm not addressing this in Maven for now, so if the parent module has been refreshed without a version change, the local repo needs to be purged.

I noticed that a few dependencies are sourced from the Pentaho repo and have been renamed.
Janino is an example: there is a package with a group ID from Codehaus, yet Pentaho uses the one from janino.
I'm not changing this unless it is required for the build or the POM is incomplete (as for wstx in the engine module).

Again, the whole dependency tree probably needs to be re-analyzed.


Changes that I had to make for ENGINE module only:
  • antlr runtime version changed from 3.4-complete to 3.4
  • wstx-asl group ID changed from woodstox to org.codehaus.woodstox, so it is NOT downloaded from the Pentaho repo
  • Duplicate dependency declaration for poi-ooxml (revisions 3.8 and 3.9): removed 3.8
  • Disabled the drools compiler's dependency on janino 2.5.15, as 2.5.16 is explicitly defined

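Disabling a single transitive dependency, as in the last item, is typically done with an exclusion. The sketch below assumes the coordinates org.drools:drools-compiler and janino:janino and a hypothetical drools.version property; none of these are confirmed by this thread.

```xml
<dependencies>
  <dependency>
    <!-- Assumed coordinates for the drools compiler -->
    <groupId>org.drools</groupId>
    <artifactId>drools-compiler</artifactId>
    <!-- Hypothetical property; the real version is not stated here -->
    <version>${drools.version}</version>
    <exclusions>
      <!-- Stop drools from pulling in its own janino (2.5.15) -->
      <exclusion>
        <groupId>janino</groupId>
        <artifactId>janino</artifactId>
      </exclusion>
    </exclusions>
  </dependency>

  <!-- The explicitly defined janino that should be used instead -->
  <dependency>
    <groupId>janino</groupId>
    <artifactId>janino</artifactId>
    <version>2.5.16</version>
  </dependency>
</dependencies>
```
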
A number of dependencies are picked up by Ivy, yet they are not declared in the POMs and are therefore missing. I am defining them explicitly.
I introduced the dependency.pentaho-libs.revision variable to track their version:

pentaho-library: libbase
pentaho-library: libformula
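
A sketch of how such a property could be wired up, assuming that "pentaho-library" is the groupId and that libbase and libformula are the artifactIds. REVISION is a placeholder, since the actual revision is not stated in this thread.

```xml
<properties>
  <!-- Placeholder value (assumption); replace with the real revision -->
  <dependency.pentaho-libs.revision>REVISION</dependency.pentaho-libs.revision>
</properties>

<dependencies>
  <dependency>
    <groupId>pentaho-library</groupId>
    <artifactId>libbase</artifactId>
    <version>${dependency.pentaho-libs.revision}</version>
  </dependency>
  <dependency>
    <groupId>pentaho-library</groupId>
    <artifactId>libformula</artifactId>
    <version>${dependency.pentaho-libs.revision}</version>
  </dependency>
</dependencies>
```

With this arrangement, bumping the libraries means changing one property instead of every dependency entry.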

 

Version management going forward
The version numbers are managed with the Versions Maven Plugin.
When the revision number needs to be updated, it can be done in two ways.

Either set the new version from the command line:

mvn versions:set -DnewVersion=<new version>

and then either roll the change back or confirm it:

mvn versions:revert
mvn versions:commit

Or change the version in the root pom.xml and execute:

mvn -N versions:update-child-modules
mvn versions:commit



Questions:

Manifest: are the custom fields needed, or will the ones generated by Maven suffice?
How do we take it from here? Would you like me to continue working on 5.0.0.1 while you migrate the trunk yourselves?
In either case I will gladly try to help.

Please feel free to contact me directly if needed.

P.S. Personally I prefer BitBucket to GitHub: the functionality and the UI are much more to my taste. Plus one can get private repos with a free account, which is handy.

Matt Burgess

Sep 22, 2013, 1:49:45 PM
to kettle-d...@googlegroups.com
Speaking of experimenting, I've started working with Gradle for my Kettle plugins and Kettle itself. It supports Ant, Ivy, Maven, as well as its own idioms (plus it has the full scripting power of Groovy), so there's a very natural migration story to tell.

For example, we could have a Gradle script that simply imports the Ant build scripts/properties (like Subfloor) and resolves the dependencies from the Ivy files (while using the resolvers specified in ivysettings.xml). On the other hand, if we went with Maven, Gradle could sit atop that someday if/when we need more power from our project management tools. Or we could just swap out everything with Gradle, like replacing Subfloor with a Pentaho plugin and using Gradle's dependency management system instead of Ivy or Maven. Or we can slowly migrate using these steps as milestones.

The build guys get nervous when I mention Gradle because with great power comes great responsibility, and they are worried about everyone creating their own "works of art" for the build process. I think we could show them that each project's build.gradle would be VERY simple, and for tasks common to subprojects, you can "push down" the functionality from the top-level project/repo or even resolve a build.gradle from some common area.

Anyway, I'm not opposed to Maven as a replacement for Ant/Ivy/Subfloor because it is a definite improvement, although the growing pains are great (as Evgeny has described). Gradle's default project structure mimics Maven's, but is easily overridden (that's what I did for Kettle core and engine).  Gradle (like antcontrib but not Ant or Maven IIRC) can also provide a way for the project builder to download Gradle if it is not already on the machine, via the Gradle Wrapper.

I can live without Gradle for this, but there are other cases I will make to Pentaho to go this way. For example, it's possible to do dependency analysis on the project or project(s) and generate a custom Jenkins job that will build the exact assembly I want.  This would alleviate (among other things) the issue where two fixes go into a project, where one breaks the build and the other does not.  The assembly does not proceed until this build is fixed, even though one could've been built with the single (working) change instead of just polling source control and kicking off a build on change.

Having said all that, I very much appreciate Evgeny's work in this area, it is excellent information and would create a situation in which we could more easily adopt Maven or Gradle :)

Cheers,
Matt(B)


Evgeny Minkevich

Sep 22, 2013, 7:44:12 PM
to kettle-d...@googlegroups.com
I'm not familiar with Gradle at all, so I cannot comment on it.

Broadly speaking, for an open source project, getting as much attention to the code as possible is "a good thing".
So anything that can be done to simplify things and get people going quicker probably should be done.

One approach is to stick with the familiar: the tools and practices that are already widely used.
As a fair number of the projects I deal with use Maven/Git, it was simply the easiest pattern for me to implement.

GitHub

A public repo like this lets people start looking at the code _comfortably_ even before checking it out.

And from the point of view of sticking with the community, GitHub wins hands down (even though I personally prefer BitBucket).
Everyone is using it, so it should be easier to get contributors?

Git

Git allows for a local repo.

Scenario:
I like to make a lot of incremental commits so that I can revert to "that thing I had two revisions ago".
When I'm happy, I squash and push upstream. And if I do not (with a public repo), a familiar tool is one less thing to convert.
I do check out the SVN repo with Git, but it is somewhat finicky with incremental updates.

Maven

Predefined project structure and dependency management. Widely used.
Once one is familiar with it, starting work on a new project is just a checkout and an import into one's favorite IDE; the layout is already somewhat familiar.
So for a public project, the restrictiveness of Maven could potentially be a good thing, steering everyone the same way.

Scenario:
I use IntelliJ IDEA. As it gets things done so much faster, I prefer to convert projects to be used with it.
The original SVN repo was quite easy to import and get started with.
The current Ivy one I was not able to get going; a lot of configuration was required to get Ivy right.

Not Ivy's fault at all, but tooling availability does contribute to the whole idea of "getting going faster" and following pre-learned patterns.


So to summarize:

It is critical to onboard new users. Tooling should:

- Let people get started with the code quickly (either by browsing it in a UI or by checking it out)
- Accommodate a greater variety of development tools
- Reduce the learning curve

Git/Maven is just one of many combinations.

I'm not sure where Maven stands with regard to building the distribution package, as opposed to working with code and supporting IDEs.
There might be a plugin for it; I will check and report.

Whatever does this better and leaves freedom in the choice of dev methods and tools, I'm all for it.
Again, I do like to be restrictive about the core and to have some flexibility with everything else.

Matt's idea to combine those two does appeal to me (provided my assumption about Gradle is correct).

So my next step is to check out Gradle; thank you for the hint.

Evgeny Minkevich

Sep 24, 2013, 6:35:37 AM
to kettle-d...@googlegroups.com
I have updated the pentaho-kettle GitHub project: now everything is there, including plugins.
The test cycle has not been migrated though. The JUnit source code is in, but the test files are not, and I have disabled the test phase for the time being.

Started to work on simplifying the dependencies. Will look at the test cycle next.
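
The thread does not say how the test phase was disabled. One standard option, shown here purely as an assumption, is the skipTests flag of the Surefire plugin, which still compiles the test sources but does not run them.

```xml
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-surefire-plugin</artifactId>
      <configuration>
        <!-- Test sources are compiled, but no tests are executed -->
        <skipTests>true</skipTests>
      </configuration>
    </plugin>
  </plugins>
</build>
```

Removing this configuration (or setting the flag to false) re-enables the tests once the test files have been migrated.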


Matt Casters

Sep 24, 2013, 7:14:19 AM
to Kettle Developers mailing list
Looking good Evgeny.  I'm confident that in the next couple of weeks we should see the official git repo appear on github.com/pentaho

Thanks for looking into the pom.xml :-)

Matt


--
Matt Casters <mcas...@pentaho.org>
Chief Data Integration, Kettle founder, Author of Pentaho Kettle Solutions (Wiley)
Fonteinstraat 70 - 9400 OKEGEM - Belgium - Cell : +32 486 97 29 37
Pentaho  -  Powerful Analytics Made Easy



Evgeny Minkevich

Sep 24, 2013, 7:25:12 AM
to kettle-d...@googlegroups.com
No problem.

Do you still want me to continue working on it?

I would like to optimize the dependencies at least.
I think some complexity can be offloaded to transitive resolution.

My goal is to produce a library set identical to the one in the 5.0.0.1 downloadable package.

Evgeny Minkevich

Oct 2, 2013, 6:36:45 AM
to kettle-d...@googlegroups.com

Good day.

 

The next step has been completed: all dependencies are managed transitively, and full compliance with the 5.0.0.1 distribution pack has been enforced. In other words, if a transitive dependency tried to bring in a version of a jar different from the one found in the distribution pack, it was overridden and the distribution pack version was explicitly declared.
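
A sketch of what such an explicit declaration might look like, using commons-io 2.1 (the server version listed in the table below) as the example. Whether the actual POMs declare it directly like this or pin it elsewhere is an assumption.

```xml
<dependencies>
  <!-- Declared directly in the module, so this version takes precedence
       over any commons-io version arriving transitively -->
  <dependency>
    <groupId>commons-io</groupId>
    <artifactId>commons-io</artifactId>
    <version>2.1</version>
  </dependency>
</dependencies>
```

This relies on Maven's "nearest definition" rule: a direct declaration sits closer to the module than any transitive path. Running mvn dependency:tree shows which version was finally selected.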

 

Testing and assembly have not been implemented yet.

 

Issues overview.


There are a few common issue patterns. I have fixed them, but someone would need to validate if the approach is correct. 

 

  1. Server and Client distribution pack differences

 

In a few cases the server library set would be different from the client.

 

Library              CLIENT    SERVER
Icu4j                missing   4_4_1_1
commons-dbcp         1.2.1     1.4
commons-fileupload   1.2       1.2.1
Log4j                1.2.16    1.2.17
commons-io           1.4       2.1
slf4j-api            1.6.3     1.7.3

 

I gave precedence to the server versions.

 

  2. Missing Dependencies

 

Some transitive dependencies were missing altogether.

 

 Missing from the server:

 

ant

ant-launcher

validation-api

 

Missing from the client and the server distribution packs:

 

 

  3. Plugin Discrepancies

 

Missing dependencies:

 

Where a dependency was not required for compilation, it was excluded to stick with the distribution lib sets.

Otherwise it was left there.

 

HL7 (included):

  • hapi-*

 

OpenERP plugin (included):

  • ws-commons-util
  • openerp-java-api
  • xmlrpc-client
  • xmlrpc-common

 

Palo (excluded):

 

  • palojlib

 

Star Modeler

Missing from the client: geronimo-stax-api_1.0_spec

 

Missing from both client and the server:

  • woden-api
  • woden-impl-dom
  • geronimo-activation_1.1_spec
  • geronimo-javamail_1.4_spec
  • geronimo-jms_1.1_spec
  • httpcore-nio

 

Excluded

 

 

  4. Unused dependencies

 

Some declared dependencies were not used, directly or indirectly, by any of the components. Such dependencies were removed. If there is a need to have them explicitly declared, a separate submodule can be created with those dependencies.

 

Example:

The Jackson-* lib pack: the client has 1.9.2 and the server 1.9.3, yet this library pack is not used by any of the dependencies, so it was removed.

 

  5. Version

 

Quite often the distribution pack would contain a later version of a library than the one brought in transitively.

There were two exceptions:

 

Star Modeler: neethi 2.0.4, while the packs contain 2.0.1

HL7: jdom 1.1, while the packs contain 1.0

 

Those two were excluded from being resolved.

 

 

Compilation

 

The project compiles with mvn compile/package/install.

The project compiles in IntelliJ IDEA: just import it as a Maven project.

Before importing into Eclipse, please run mvn eclipse:eclipse to generate the project files.

 

Bear in mind that until the new jar modules have been installed into the local .m2 repo (i.e. until mvn install has been run), the ones from the Pentaho Artifactory will be resolved; hence building modules individually is not possible before then.

 

Hope that helps.
