Re: Maven

Story Henry

Jun 26, 2008, 4:12:01 AM

to us...@sommer.dev.java.net, bae...@googlegroups.com

Hi Fabrizio,

Nice to see that we are very much in agreement here. :-)

[cc'ing this to the baetle mailing list, as this is where some of that
work is going on. This email follows on from my post at
http://blogs.sun.com/bblfish/entry/webifying_integrated_development_environments
]

On 25 Jun 2008, at 20:12, Fabrizio Giudici wrote:

> I don't think that Maven is relevant here - well, I admit, I'm one
> of the big detractors :-) Seriously, the big disadvantage of Maven
> lies in its great complexity (when compared e.g. with ant) and
> could not be related to what we're discussing now. But I think it is
> not related for the way it works. With Maven you basically can
> download automatically required dependencies from a repository. BTW,
> AFAIK you can't do this in a versioned fashion, which in my opinion
> is a really bad practice.

That would indeed be a problem. Not so surprising in a way since as I
understand from one of the commenters on the blog they use the class
name as a URN to identify things. And since classes are not versioned,
things get to be tricky.

> The way I do things is that everything, code and dependencies
> (libraries), are checked into a Subversion repository. Now, for the
> sources, they are in their home place, I mean I wrote them, they
> belong to the project, so their URL in Subversion _is_ the most
> appropriate id. Things are different for libraries, since they have
> been copied from some other place. So, as a starter, in this
> scenario I think that people publishing libraries should put the
> binary deliverables into a Subversion repository, so they are well
> identified by a URL.

Yes, though as long as they have a URL things should be ok. Subversion
is just built right from the core with this in mind, which of course
helps.

> Furthermore, the revision id should be part of the URL (maybe with
> #rev syntax), a thing that AFAIK is not in Subversion.

Subversion keeps a URL for every single version, for every branch,
etc. But it would not be good semweb practice to rely on URL patterns
to decide whether a URL refers to a version, a tag, a branch, etc. For
one, it would tie one too much to a single version control system.
Purely practical considerations should help people decide in favor of
Subversion. No need to build that into a build system :-)

I know because I tried to extract all the information from the
NetBeans repository and later, with a slightly different ontology, all
the information from the openrdf.org repository over a year ago. I
placed the results of this in the subdirectories of this resource

http://bblfish.net/work/baetle/mappings/

It takes a little remembering how I did this, but if you look at
http://bblfish.net/work/baetle/mappings/openrdf.org/openrdf.sesame.release.2.0beta2.source.ntriples.bz2

It contains triples such as

@prefix svnbc: <https://src.aduna-software.org/svn/org.openrdf/!svn/bc/> .
@prefix s102: <https://src.aduna-software.org/svn/org.openrdf/!svn/ver/102/> .

svnbc:102/ a btl:CheckIn;
    :modified
        s102:sandbox/cvs2svn/trunk/src/org/openrdf/sesame/repository/Repository.java,
        s102:sandbox/cvs2svn/trunk/src/org/openrdf/sesame/sail/util/SailUtil.java,
        s102:sandbox/cvs2svn/trunk/src/org/openrdf/sesame/sailimpl/inferencer/ForwardChainingRDFSInferencer.java,
        s102:sandbox/cvs2svn/trunk/src/org/openrdf/sesame/sailimpl/memory/MemoryStore.java,
        s102:sandbox/cvs2svn/trunk/src/org/openrdf/sesame/sailimpl/memory/MemoryStoreRDFSInferencer.java,
        s102:sandbox/cvs2svn/trunk/src/org/openrdf/sesame/sailimpl/memory/model/MemStatementIterator.java .

The above says that the check-in svnbc:102/ modified the 6 files
listed. Those are the versioned files.

So though I am not sure I correctly interpreted which URLs in
Subversion stand for check-ins, etc., this can be done quite precisely.
Then one can use a simple ontology to describe the type of the
resources described, so that one does not have to work out the type of
the resource by looking into the URL. (Some of these could perhaps be
inferred from the meaning of :modified in the ontology.)

Of course one of the nice things with RDF is that if another version
control system does not make a URI available for a check-in, one could
use a blank node.

Ok, so that shows how one can give URIs to pretty much anything in a
version controlled way.

Now there may be shorthands required to make such descriptions easier
in an RDF project description file at a higher level that IDEs could
use to get started. It may be that some notion of URL patterns would
be useful, such that from the root of a package source hierarchy one
should be able to find all the sources by moving through the '/' .

> Third, instead of putting a copy of a library in a repository, at
> this point people could put a special "link" that is just the URL to
> the original file (including revision, as stated). Thinking of this
> as a virtual filesystem, the "link" word means exactly what a link
> is for a filesystem.

Yes, exactly. This could easily be described in an RDF file. Amazingly
easily in fact.
If it catches on one could imagine repositories coming standard with
SPARQL endpoints. Then you could ask questions such as where all the
versions of a class are to be found.

> I second the need for a local cache - I can't live without it.

Yep, that just follows nicely from REST. You can cache what you want.

> Summing up, I think that Subversion, not Maven, is an excellent
> starting point.
>

Thanks. CVS works too, you just have to put a web front end above it.
I extracted all the metadata from NetBeans CVS.
Or one could use some kind of CVS URL, but that would at times be
very slow, since to get to certain files CVS has to check out all the
parents.

But Subversion is a lot nicer to work with :-)

> On Jun 25, 2008, at 11:31 , Story Henry wrote:
>
>> Home page: http://bblfish.net/
>>> - I only have 2nd hand information on Maven, but it does have its
>>> share of detractors. Those who dislike it, seem to like Ivy [1].
>>> Can anyone knowledgeable about both tell us the pros and cons?
>>> I've found one such list [2], but it is bound not to say anything
>>> negative about Ivy. ;-)
>>
>> Thanks for those pointers. I had not heard of Ivy at all. Spending
>> too much time in Semantic Web land perhaps...
>>
>> I wonder what others think of that, or if anyone has any experience
>> working with Ivy.
>>
>>> - As for "No need to download source code: it's on the web! You
>>> don't therefore need a local cache of it.". I think the web-
>>> enabled applications of the future [3] should replicate/
>>> synchronize data and not depend on its immediate availability: I'm
>>> always nervous when I plan ahead for programming on an airplane,
>>> because I can never be entirely sure that I'll have all the
>>> necessary documentation with me.
>>
>> I completely agree. I would be very interested in my browser
>> keeping a full cache of every single resource I ever saw on the
>> internet, so that I could zoom back to a time in the past and just
>> see what I was looking at on the web.
>>
>> To get that, though, software needs to be designed at the level of
>> URLs. And that is the point I was trying to get across.
>>
>> My point "you don't *need* a local cache" should perhaps have
>> emphasised the word "need" more clearly. It should of course be
>> possible, and easy to cache everything you want. Before you go on a
>> trip, you should get all the pieces you need to work on a plane. I
>> have to do that as is now, so having IDE's that can be clever about
>> fetching stuff off the web, is not going to make things more
>> difficult.
>>
>> But as it happens even without this caching piece, you find there
>> is always something out there on the web that would have been
>> useful for what you wanted to do. The best solution is to put wifi
>> on planes, or just to read a few books. :-/ Anyway most planes I am
>> on, don't even have electricity.
>>
>> Henry
>>
>>>
>>>
>>> [1] http://ant.apache.org/ivy/
>>> [2] http://ant.apache.org/ivy/m2comparison.html
>>> [3] http://2ality.blogspot.com/2008/04/online-eclipse-e4-lack-of-imagination.html
>>
>
> --
> Fabrizio Giudici, Ph.D. - Java Architect, Project Manager
> Tidalwave s.a.s. - "We make Java work. Everywhere."
> weblogs.java.net/blog/fabriziogiudici - www.tidalwave.it/blog
> Fabrizio...@tidalwave.it - mobile: +39 348.150.6941

Story Henry

Jun 26, 2008, 6:28:34 AM

to us...@sommer.dev.java.net, bae...@googlegroups.com

Introductions
-------------

Thanks Arjohn, for the insight. You must be one of the few people in
this world who understand both the semantic web and Maven. :-) This is
really helpful.

We are really lucky to be joined in this conversation by Fabrizio
Giudici, who is on the NetBeans Dream Team, and knows NetBeans inside
out. I really recommend you look at his beautiful Blue Marine
application at
http://bluemarine.tidalwave.it/
Fabrizio is learning about RDF, and wishes to build it into Blue
Marine. So I recommend he use Sesame, of course :-)

( I should probably try to introduce people as they join this list )
Arjohn: http://www.openrdf.org/people/foaf-arjohn.rdf
Fabrizio: http://weblogs.java.net/blog/fabriziogiudici/
Axel: http://www.pst.ifi.lmu.de/~rauschma/

The Architectural Issue
-----------------------

Arjohn, I would be interested to have your view of the longer term
architectural picture I was trying to describe here

http://blogs.sun.com/bblfish/entry/webifying_integrated_development_environments

In that article I was trying to express what I think should be an
architectural constraint (is that the right word?) on how to build
IDEs. It is clearly a long term, high level vision, I am trying to
develop there. Clearly Maven has got a lot right by getting things
done now. So my criticism of it would only be forward looking, by
trying to see how one could do even better. That would not subtract
from its achievements. At the same time I understand I need to get a
better grip on what Maven is good at, to make my point more clearly.

Maven and the SW
----------------

From what you say below Maven is very declarative, which should make it
GRDDLable. So that could allow one to understand what it is doing from
a semantic web perspective.

You point to an interesting FAQ http://maven.apache.org/guides/introduction/introduction-to-repositories.html
that argues against storing JARs in CVS, in favor of a user local
repository:

[[
It is not recommended that you store your JARs in CVS. Maven tries to
promote the notion of a user local repository where JARs, or any
project artifacts, can be stored and used for any number of builds.
Many projects have dependencies such as XML parsers and standard
utilities that are often replicated in typical builds. With Maven
these standard utilities can be stored in your local repository and
shared by any number of builds.

This has the following advantages:

1. It uses less storage - while a repository is typically quite
large, because each JAR is only kept in the one place it is actually
saving space, even though it may not seem that way
2. It makes checking out a project quicker - initial checkout,
and to a small degree updating, a project will be faster if there are
no large binary files in CVS. While they may need to be downloaded
again afterwards anyway, this only happens once and may not be
necessary for some common JARs already in place.
3. No need for versioning - CVS and other source control systems
are designed for versioning files, but external dependencies typically
don't change, or if they do their filename changes anyway to indicate
the new version. Storing these in CVS doesn't have any added benefit
over keeping them in a local artifact cache.
]]

That is true. And Maven does fix that problem. But it does not answer
the question as to why there should be a central repository at all. To
take the points one by one:

1. Uses less storage: If we had a declarative build system that
used URLs to identify resources, then we would not need
any more space than it takes to refer to that resource (the
characters in the URL).
2. Checkout speed: Once one identifies every external resource with a
URL, checkouts are going to be very fast too.
3. No need for versioning of the external resources: Right. Just
point to the versioned external resource using a URL.

Furthermore one could create a :copyOf relation so that one could
specify a number of different places to get those jars, in case of
network downtime. Perhaps some of those would end up at archive.org,
so one could use those urls.
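
A minimal sketch of how a build tool might walk such :copyOf alternatives, written in Python. The function name, the example hosts and the injected fetcher are all assumptions made for illustration; none of this is part of Maven or of any existing tool.

```python
from typing import Callable, Iterable, Optional


def fetch_first_available(
    urls: Iterable[str],
    fetch: Callable[[str], bytes],
) -> Optional[bytes]:
    """Try each location of the same jar in order; return the first that works.

    `urls` is the primary URL followed by its (hypothetical) :copyOf mirrors.
    `fetch` is injected so the sketch runs without network access; a real
    tool might pass a small wrapper around urllib.request.urlopen instead.
    """
    for url in urls:
        try:
            return fetch(url)
        except OSError:
            # Network failures surface as OSError subclasses (e.g. URLError),
            # so move on to the next copy.
            continue
    return None


# Stand-in fetcher: the primary host is "down", the mirror answers.
def fake_fetch(url: str) -> bytes:
    if "primary.example" in url:
        raise OSError("host unreachable")
    return b"jar-bytes"


jar = fetch_first_available(
    ["http://primary.example/junit.jar", "http://mirror.example/junit.jar"],
    fake_fetch,
)
```

The list order stands in for whatever preference the project description would express; an archive.org copy would simply be one more URL at the end of the list.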

Maven and So(m)mer
------------------

Ok all of the above just deals with Maven and future semantic web
improvements of it.

There remains the question as to how useful Maven could be to us now
on this project. I appreciate declarativeness. So that could be a
point in favor of Maven over Apache Ivy.

I also wonder if having either would help me write project
descriptions that would make it easy for me and others to link Maven
source code to jars. Currently I am having real problems at that
level, and perhaps others are too. I know NetBeans has Maven
integration now. It probably does not have Ivy integration.

Then there is the question of how complex it is to do. And since I
really should be developing the AddressBook, I can't spend time
immediately on this. But discussing it may allow us to come to a
decision, and perhaps spur someone to do it.

Finally I suppose is the question of how many of the other jars being
used have Maven poms.

(note: This may well be the reason why it has taken Maven longer to
develop than necessary, because it requires POMs to be built and
placed on remote servers.)

On 26 Jun 2008, at 11:16, Arjohn Kampman wrote:

> Story Henry wrote:
>> On 25 Jun 2008, at 20:12, Fabrizio Giudici wrote:
>>> I don't think that Maven is relevant here - well, I admit, I'm one
>>> of the big detractors :-) Seriously, the big disadvantage of Maven
>>> lies in its great complexity (when compared e.g. with ant) and
>>> could not be related to what we're discussing now. But I think it is
>>> not related
>

> IMHO, Maven is much more relevant here than Ant is. Ant is mostly
> procedural, whereas Maven is much more declarative. Maven also
> includes
> a publishing model that is used for automatic resolving and fetching
> of resources ("artifacts") from the web, recursively if necessary.
> AFAIK, Ant has no features for this.

I like declarative. This should mean that Maven would be very
GRDDL-able, right?
I.e., Maven may just be a specialised RDF format. :-)

>>> for the way it works. With Maven you basically can download
>>> automatically required dependencies from a repository. BTW, AFAIK
>>> you can't do this in a versioned fashion, which in my opinion is a
>>> really bad practice.
>

> On the contrary, all maven artifacts are versioned. Pom files for
> released artifacts commonly also point to the relevant source URL,
> which
> can be an SVN-tag URL, a CVS URL with a tag label, etc. This pom file
> also specifies the source dir, so one can easily resolve the source
> file
> for a specific class. More common, however, is to also download the
> (versioned) source artifacts for a specific library.

Thanks for that clarification.

>
>> That would indeed be a problem. Not so surprising in a way since as
>> I understand from one of the commenters on the blog they use the
>> class name as a URN to identify things. And since classes are not
>> versioned, things get to be tricky.
>

> That's news to me. In what context are URNs used to identify classes?
> Artifacts are identified by three parameters: a groupId, an artifact
> name and a version number. For example: "org.openrdf.sesame",
> "sesame-runtime" and "2.1.2".

Well I was going from one of the comments on my blog posting.
somewhere here:
http://blogs.sun.com/bblfish/entry/webifying_integrated_development_environments#comment-1214355205000

I was in fact making a short cut here. One can think of the
org.openrdf.sesame.RDF as being a URN

<urn:java:org.openrdf.sesame.RDF>

But of course one can easily create versions like this

[] java:class "org.openrdf.sesame.RDF";
   java:version "2.1.2" .

or something like that.

Still they are Names rather than locators.... And my question was more
why not use URLs to locate that blank node. And then why would they
need to be in one repository?
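
For comparison, here is a small Python sketch of how the name triple Arjohn describes turns into a locator under the conventional Maven 2 repository layout (group id with dots replaced by slashes, then artifact id, then version). The base URL below is a made-up placeholder, and the function name is only for illustration.

```python
def artifact_url(base: str, group_id: str, artifact_id: str, version: str,
                 packaging: str = "jar") -> str:
    """Build the URL of an artifact under a Maven 2 style repository root."""
    group_path = group_id.replace(".", "/")
    filename = f"{artifact_id}-{version}.{packaging}"
    return f"{base.rstrip('/')}/{group_path}/{artifact_id}/{version}/{filename}"


# Hypothetical repository root; any host serving the same layout would do.
BASE = "http://repo.example.org/maven2"

jar_url = artifact_url(BASE, "org.openrdf.sesame", "sesame-runtime", "2.1.2")
pom_url = artifact_url(BASE, "org.openrdf.sesame", "sesame-runtime", "2.1.2",
                       packaging="pom")
```

Seen this way the name only becomes a URL once a repository root is chosen, which is the step a plain URL in the project description would skip.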

>
>>> I second the need for a local cache - I can't live without it.
>> Yep, that just follows nicely from REST. You can cache what you want.
>

> Maven has a local cache.

Yes. We are all in agreement here :-) We all want local caches.
So if I had to choose something now, it looks like Maven should be
really useful. If I wanted to build something better, I'd still wonder
why one can't use the more generic mechanisms built into REST.

>>> Summing up, I think that Subversion, not Maven, is an excellent
>>> starting point.
>>>
>> Thanks. CVS works too, you just have to put a web front end above
>> it. I extracted all the metadata from NetBeans CVS.
>> Or one could use some kind of cvs url, but that would at some times
>> be very slow, as to get to certain files, cvs has to check out all
>> the parents.
>

> For more information and some reasons why NOT
> to store jar files in SVN/CVS, see:
>
> http://maven.apache.org/guides/introduction/introduction-to-repositories.html

That is a very helpful pointer.

>
>
> Just my 2 euro cents. I would suggest that you read a bit more about
> Maven before discarding it completely. It has a lot of nice features
> that are relevant in this context.

No desire to discard it completely by the way. As I mention above
there are two things going on.
It could well be the best thing available now. And there is how it
could be improved.

>
>
> Cheers,
>
> Arjohn

Story Henry

Jun 26, 2008, 1:21:58 PM

to us...@sommer.dev.java.net, bae...@googlegroups.com

On 26 Jun 2008, at 20:56, Fabrizio Giudici wrote:
> Ok, it's something I read some time ago - so let's turn back to
> versioning. Is this the only versioning support?
>
> <dependencies>
>   <dependency>
>     <groupId>junit</groupId>
>     <artifactId>junit</artifactId>
>     <version>3.8.1</version>
>     <scope>test</scope>
>   </dependency>
> </dependencies>

This also makes me think that it could be a lot simpler. We could say

---
@prefix mvn3: <http://maven.apache.org/v3/ont#> .

:ABv0.5 rdfs:label "AddressBook version 0.5";
    mvn3:dependency <http://downloads.sourceforge.net/junit/junit3.8.1.jar> .
---

So that would also remove the need for the maven repository.
I suppose if one wanted to be able to get information about how that
resource was built one would have to add something relating the jar to
something that described the build procedure.

<http://downloads.sourceforge.net/junit/junit3.8.1.jar> mvn3:buildBy ...

I imagine that is where studying Maven carefully would be very helpful
to gather some ideas.
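
To make the translation concrete, here is a rough Python sketch that reads the dependency fragment quoted above and prints the corresponding statement. The lookup table from Maven coordinates to jar URLs is a hand-written assumption, since nothing in the POM itself gives the URL, and mvn3:dependency is just the made-up property from the example above.

```python
import xml.etree.ElementTree as ET

# The fragment quoted above. A real pom.xml declares a default XML namespace,
# which this sketch does not handle.
POM_FRAGMENT = """
<dependencies>
  <dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>3.8.1</version>
    <scope>test</scope>
  </dependency>
</dependencies>
"""

# Hypothetical, hand-maintained mapping from coordinates to a download URL.
JAR_URLS = {
    ("junit", "junit", "3.8.1"):
        "http://downloads.sourceforge.net/junit/junit3.8.1.jar",
}


def dependencies_to_turtle(pom_xml, subject, jar_urls):
    """Return one Turtle statement per dependency with a known jar URL."""
    root = ET.fromstring(pom_xml)
    statements = []
    for dep in root.iter("dependency"):
        key = (dep.findtext("groupId"),
               dep.findtext("artifactId"),
               dep.findtext("version"))
        url = jar_urls.get(key)
        if url is None:
            continue  # no known location for this artifact, skip it
        statements.append(f"{subject} mvn3:dependency <{url}> .")
    return statements


statements = dependencies_to_turtle(POM_FRAGMENT, ":ABv0.5", JAR_URLS)
for line in statements:
    print(line)
```

A GRDDL transform would do the same job in XSLT; the Python version is only meant to show how little information is involved.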

> Because I don't call it versioning, it's just a label :-) I mean, it
> depends on the guy building the maven repository for being sure that
> this is really 3.8.1 - which is absolutely not good for me (and for
> a large deal of customers I've had). I mean: somebody can commit an
> error and assign the wrong label. In contrast, with a Subversion
> file system, every change you make creates a new revision number
> (for instance, with Subversion I don't use tags any more to mark a
> release, I just take note of the revision number of subversion).
>
> On Jun 26, 2008, at 14:04 , Arjohn Kampman wrote:
>
>> Fabrizio Giudici wrote:

>>> On Jun 26, 2008, at 11:16 , Arjohn Kampman wrote:
>>>>
>>>> Just my 2 euro cents. I would suggest that you read a bit more
>>>> about
>>>> Maven before discarding it completely. It has a lot of nice
>>>> features
>>>> that are relevant in this context.

>>> Well, it seems I have to agree on the need of more reading about
>>> Maven :-) To my (partial) discharge, I discussed some of those
>>> points multiple times in recent months (especially about
>>> versioning) with some friends who're using Maven, and they clearly
>>> didn't give me the right answers. So, it's probably better for the
>>> sake of this discussion that you guys drop everything I've said
>>> about it.
>>
>> Javaworld has a nice 3-page intro to maven:
>> http://www.javaworld.com/javaworld/jw-12-2005/jw-1205-maven.html
>>
>> Doesn't cover all the details, of course, but will surely help to
>> get a
>> basic understanding of the concepts.
>

Story Henry

Jun 26, 2008, 6:23:53 PM

to us...@sommer.dev.java.net, bae...@googlegroups.com

On 26 Jun 2008, at 20:27, Fabrizio Giudici wrote:
>> So that would also remove the need for the maven repository.
>> I suppose if one wanted to be able to get information about how
>> that resource was built one would have to add something relating
>> the jar to a something that described the build procedure.
>>
>> <http://downloads.sourceforge.net/junit/junit3.8.1.jar>
>> mvn3:buildBy ...
>

> A MD5 fingerprint would be necessary and sufficient. People are not
> interested in knowing the process behind the artifact, they just
> want to be sure that they can arbitrarily download exactly the same
> artifact bit by bit.

Yes, an MD5 sum would be sufficient to be able to find the same jar
from different mirrors.
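
As a small illustration, this is roughly what the check could look like in Python with the standard hashlib module. The function names are invented for the example, and MD5 is used only because it is the fingerprint named in this thread; hashlib.sha256 could be swapped in the same way.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Hex MD5 fingerprint of a downloaded artifact's bytes."""
    return hashlib.md5(data).hexdigest()


def matches_fingerprint(data: bytes, expected_hex: str) -> bool:
    """True if the bytes fetched from any mirror have the published sum."""
    return md5_hex(data) == expected_hex.strip().lower()


# Two copies fetched from different (imaginary) mirrors, plus a corrupted one.
copy_a = b"pretend these are the bytes of junit3.8.1.jar"
copy_b = b"pretend these are the bytes of junit3.8.1.jar"
corrupted = copy_a + b"!"

published = md5_hex(copy_a)  # would normally come from the project description
same = matches_fingerprint(copy_b, published)
tampered = matches_fingerprint(corrupted, published)
```

Here `same` is true and `tampered` is false, so a jar can be accepted from whichever mirror answers first as long as its sum matches.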

I was thinking one would want to also link the jar to the build file
or a repository, so that the IDE could find some way to download the
source code, find the source code associated with the classes in the
jar, etc... So that would have to be expressed somewhere. As Arjohn
says:

> The maven identifier can be used both to get the artifact as well as
> the
> meta data (in the form of the pom file). It also allows mirrors to be
> used transparently.

So there is quite a lot to learn from the way Maven does things.

My guess is that by finding or building an ontology for Maven
and writing a GRDDL transform for it one could learn a lot, and find
ways to extend it clearly and easily. Linked Data solutions would also
become a lot clearer really fast.

Something to do when I get some more time on my hands, or for someone
else who is looking for a summer project.

My guess is that when the semantic web has really taken off and
everyone wants to be part of this, this will be a natural thing for
everyone to think of doing. But it won't be what gets the SemWeb
widely accepted in the immediate future. It may even cause some people
to think we are trying to reinvent the wheel.

The AddressBook I am sure on the other hand is solving a real problem
that people do want to solve now. So I'll get back to working on that,
and try not to get distracted by all these other very interesting
projects. It's worth keeping in the back of our mind....

It's really tricky sorting out between all these possibly very
interesting projects.

It's midnight here, time for me to go to bed :-)

Henry

Erling Wegger Linde

Jun 28, 2008, 5:01:25 AM

to bae...@googlegroups.com, us...@sommer.dev.java.net

I think http://maven.apache.org/plugins/maven-doap-plugin/ is relevant
for this discussion too :D

- Erling
