Glean for large projects

Evgeny

Dec 9, 2007, 4:44:56 AM
to glean-co...@googlegroups.com
I am trying to apply Glean to a large J2EE multi-project build, where the components are separated into directories and built into separate jars.

Since Glean does not really support multiple source and class directories, applying it to this project is a bit of a problem.
So I had an idea that I would love some feedback on, if possible ...


There are continuous integration builds for the project that use the multi-directory structure to build everything, and that seems to work. I think I will add another compilation just for Glean: a javac task that dumps all the Java sources from all over the place into one big bucket. Two buckets actually, one for sources and one for classes.

With these two large buckets I'll run Glean and generate all the reports ... and then clean the buckets up, I guess.


Any ideas/suggestions/whatever about this?


Evgeny

Dec 9, 2007, 4:51:11 AM
to glean-co...@googlegroups.com
Oh, and I will copy all the (normally built) jars into another bucket just for kicks .... (and for jar dependency analyzers)

Evgeny

Dec 9, 2007, 4:55:00 AM
to glean-co...@googlegroups.com
Ok. Sorry for the spam. But it looks something like this:
http://dpaste.com/hold/27374/

Jose Noheda

Dec 9, 2007, 6:28:29 AM
to glean-co...@googlegroups.com
In our case, we just generate independent reports for each project/jar/war. There was a discussion here about multi-project configurations and IIRC it's on the wish list. Don't know about its status though.

Evgeny

Dec 9, 2007, 6:57:37 AM
to glean-co...@googlegroups.com
As I get deeper into this, it turns out there are more things to do, for example mappers that strip the project names and copy all sources into a single package hierarchy.
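
A minimal sketch of what such a mapper could look like in Ant, assuming every module keeps its sources under `ModuleName/src/`. The project name, target name and `bucket.dir` property are hypothetical and not taken from the original build file:

```xml
<project name="glean-sources" default="gather-sources" basedir=".">
  <!-- Hypothetical bucket location, kept outside the project tree so the
       fileset below never picks up files that were already copied. -->
  <property name="bucket.dir" location="${java.io.tmpdir}/glean-buckets"/>

  <target name="gather-sources">
    <copy todir="${bucket.dir}/src">
      <!-- Every .java file under any top-level module's src directory. -->
      <fileset dir="${basedir}" includes="*/src/**/*.java"/>
      <!-- Drop the leading "ModuleName/src/" so only the package path remains.
           handledirsep makes "/" in the pattern match "\" on Windows too. -->
      <regexpmapper from="^[^/]+/src/(.*)" to="\1" handledirsep="true"/>
    </copy>
  </target>
</project>
```

If two modules contain a file with the same package path and name, the copies collide in the bucket, so this only works when fully qualified class names are unique across modules.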

For some reason (beyond my control), the developers of this particular project decided to have one Test package for all the other components, rather than tests per component.

Also, having a larger code base to work on provides better feedback imho. One example is "cpd", which can show fragments copied from one component to another.

jbrugge

Dec 9, 2007, 10:40:05 PM
to glean-code-users
Evgeny,

I think that your approach of gathering everything into one place is
perhaps the simplest right now. If I were to add support for that to
Glean right now, that's probably what I'd offer: tell me what the
structure of the subprojects is, and I'll build a big temp structure
to work on.

A big part of what makes it tricky is that the Ant tasks for a number
of the tools don't offer that much flexibility - if you could give
them a <fileset>, or a fileset reference, that would make a world of
difference. When the only parameter you can provide is the name of a
directory, well, that's what you can do.

I did talk a bit more about this earlier (see
http://groups.google.com/group/glean-code-users/t/8b5f0119a8be2da2),
and I haven't gone any farther with it since, sorry to say. I would
like to hear how your attempts turn out. One concern I might guess at
is that in blending the entire codebase together, you lose
landmarks within the project - "okay, that class/package is showing
some issues. Which project is that in again?" I agree with you,
though, that tools like CPD can take on greater value, finding
copy/paste maneuvers that span a team's efforts, not just a code project.

Hope that helps,
John

Evgeny

Dec 10, 2007, 2:17:31 AM
to glean-co...@googlegroups.com
For this "huge" J2EE project I wrote just one template build file that all the separate modules "import". Some modules need to do things a little differently, so before the import a module can either override an Ant target or use a hook, like <target name="compile.local.pre">. These hooks run before and after each target in the build template. One module uses a hook to generate code before compilation, and another to create a "*.rar" file after the .jar is created. There is also a dependency mechanism that I won't explain here.

The projects look like this after the build is complete:
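
The hook arrangement might look roughly like the sketch below. Only the hook name `compile.local.pre` comes from the post. The file name `build-template.xml`, the `-compile` and `compile.local.post` targets, and the property names are assumptions made for illustration. The template declares empty hook targets and runs them around the real work:

```xml
<!-- build-template.xml (hypothetical name), imported by every module -->
<project name="module-template">
  <!-- Relative locations resolve against the importing module's basedir. -->
  <property name="src.dir" location="src"/>
  <property name="classes.dir" location="build/classes"/>

  <!-- Empty hooks: they do nothing unless a module redefines them. -->
  <target name="compile.local.pre"/>
  <target name="compile.local.post"/>

  <!-- Dependencies run left to right: pre hook, real compile, post hook. -->
  <target name="compile"
          depends="compile.local.pre, -compile, compile.local.post"/>

  <target name="-compile">
    <mkdir dir="${classes.dir}"/>
    <javac srcdir="${src.dir}" destdir="${classes.dir}"/>
  </target>
</project>
```

A module that needs something extra redefines a hook and then imports the template:

```xml
<!-- ModuleName/build.xml -->
<project name="ModuleName" default="compile" basedir=".">
  <!-- A target declared in the importing file takes precedence over the
       target of the same name in the imported template. -->
  <target name="compile.local.pre">
    <echo message="generating code before compilation"/>
  </target>

  <import file="../build-template.xml"/>
</project>
```

With this layout the template stays untouched when a single module needs special handling, which is the property the post relies on.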

ModuleName/
 +- build.xml
 +- src/
 |     +- p/k/g/name
 +- build/
      +- ModuleName.jar
      +- classes/
             +- p/k/g/name

And there is one TestModule project that contains tests for all the other modules.

To use Glean on all of this, I first thought of compiling again after copying all the files into the buckets, but because of the per-project hooks that would be duplication and hard to keep in sync. So I settled on compiling the whole project as usual, which right now includes compiling the TestModule (with the right dependencies), and afterwards copying all the Java files, classes and jars produced into 3 huge buckets that Glean can use.
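
As a rough sketch, the post-build gathering step for the compiled output could be written as follows, assuming the module layout shown above (`ModuleName/build/classes` and `ModuleName/build/ModuleName.jar`). The target names and the `bucket.dir` property are hypothetical:

```xml
<project name="glean-buckets" default="gather-binaries" basedir=".">
  <!-- Hypothetical bucket root, outside the project tree. -->
  <property name="bucket.dir" location="${java.io.tmpdir}/glean-buckets"/>

  <target name="gather-binaries">
    <!-- Classes bucket: strip "ModuleName/build/classes/" and keep the
         package path. -->
    <copy todir="${bucket.dir}/classes">
      <fileset dir="${basedir}" includes="*/build/classes/**/*.class"/>
      <regexpmapper from="^[^/]+/build/classes/(.*)" to="\1"
                    handledirsep="true"/>
    </copy>

    <!-- Jar bucket: all module jars side by side in one directory. -->
    <copy todir="${bucket.dir}/lib" flatten="true">
      <fileset dir="${basedir}" includes="*/build/*.jar"/>
    </copy>

    <!-- The source bucket follows the same pattern as the classes bucket,
         with "src" in place of "build/classes" and *.java instead of
         *.class. -->
  </target>

  <!-- Remove the buckets once the reports have been generated. -->
  <target name="clean-buckets">
    <delete dir="${bucket.dir}"/>
  </target>
</project>
```

Because this runs after the normal build, the per-module hooks have already done their work and nothing has to be compiled a second time.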

It comes out rather well.
cpd - shows more results.
emma/cobertura - work correctly even with that TestModule that tests all the other modules
javadoc - creates per/package docs, like it should basically.
findbugs/pmd/javancss - cover the whole project in one run ... 

If I used Glean on each separate module, several problems would come out of it:
1. it's a lot of writing and tweaking on my part
2. it does not catch duplication of code across modules, so cpd is not as effective
3. it allows a developer to narrow his scope of responsibility to a module instead of the project
4. pmd/findbugs find the same bugs in separate places, with a higher chance that they will not be fixed in all places.

And regarding context: since the packages are not changed, I guess that developers using Eclipse can just look up a package name in the whole "workspace", not just in a "project" (= module). So losing context is not much of an issue. Also, in this project the package name hierarchy is not reflected in the module directory hierarchy, so several modules can have a package xx.yy.aa.*, while others have xx.yy.bb.*, and all these modules sit in a flat structure at the top of the project.




Hope this is interesting to someone :)


- evgeny