Analysis of Chempound software

Peter Murray-Rust

Nov 15, 2011, 7:58:52 AM

to quixote-...@googlegroups.com, Sam Adams

I am exploring Chempound to see what the process of ingestion is. Currently in quixote-utils we have

Deposit/Anna/Cif/CLI/Gau/Henry/NWChem
Import/Anna/Henry/ (extends GaussianLogImporter)
Process/Anna/Henry

Are all of these still required? There is clearly room for refactoring here, as I imagine that Henry and Anna only add small customisations to the basic functionality.

Sam, if you can answer this briefly when you have a minute, it would be very useful. As we are using DepositGau, I will continue my explorations there.

P.

--
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069

Sam Adams

Nov 15, 2011, 10:25:24 AM

to Peter Murray-Rust, quixote-...@googlegroups.com

Hi Peter,

I created the Anna/Henry classes to do batch processing of the calculations they donated. The quixote-utils project is really a set of personal utilities for my use, but parts of it are now being more widely used... I probably ought to separate them into a different project.

Sam

Peter Murray-Rust

Nov 15, 2011, 12:49:48 PM

to quixote-...@googlegroups.com

On Tue, Nov 15, 2011 at 3:25 PM, Sam Adams <s.e....@gmail.com> wrote:

> Hi Peter,
>
> I created the Anna/Henry classes to do batch processing of the calculations they donated.

Thanks - I guessed so. I am assuming that we should go down the DepositGau/NWChem route.

> The quixote-utils project is really a set of personal utilities for my use, but parts of it are now being more widely used... I probably ought to separate them into a different project.

OK. I'll leave this to you and start looking at the DepositGau procedure. My guess is that DepositFoo classes could use a superclass for tidiness.
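
To make the superclass idea concrete, here is a minimal sketch of how the DepositFoo classes might share a base class. It is only an illustration: AbstractDeposit, DepositGauSketch, accepts() and convert() are hypothetical names, not taken from quixote-utils, and the "conversion" is a placeholder rename rather than a real call into the converters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these names come from quixote-utils.
abstract class AbstractDeposit {

    // Template method: the shared batch workflow is written once here.
    final List<String> depositAll(List<String> inputFiles) {
        List<String> deposited = new ArrayList<>();
        for (String file : inputFiles) {
            if (accepts(file)) {
                deposited.add(convert(file));
            }
        }
        return deposited;
    }

    // Format-specific hooks that each DepositFoo subclass would supply.
    abstract boolean accepts(String fileName);

    abstract String convert(String fileName);
}

// Hypothetical stand-in for a Gaussian-specific subclass.
class DepositGauSketch extends AbstractDeposit {

    @Override
    boolean accepts(String fileName) {
        return fileName.endsWith(".log");
    }

    @Override
    String convert(String fileName) {
        // Placeholder for the real conversion step: only the suffix changes.
        return fileName.substring(0, fileName.length() - ".log".length()) + ".cml";
    }

    public static void main(String[] args) {
        // Only the .log file is accepted; notes.txt is skipped.
        System.out.println(new DepositGauSketch().depositAll(Arrays.asList("job1.log", "notes.txt")));
    }
}
```

With this shape, a DepositNWChem or DepositCif variant would only need to override the two hooks, which is the kind of tidiness I have in mind.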

Peter Murray-Rust

Nov 18, 2011, 1:19:44 AM

to quixote-...@googlegroups.com

I am going through Chempound and starting the documentation. The first phase is to create package-info.java for each package (including "nested" ones). At present these just say
"(PMR) TBD"
I shall preface all my docs with (PMR), as that indicates they are my guesses.

**SAM** There appear to be places where the same Java PACKAGE occurs in more than one Eclipse/Maven PROJECT. Javadoc reports this as a warning. A typical example is:

PROJECT chempound-webapi contains
   package uk.ac.cam.ch.wwmm.chempound.webapp;
     ChempoundWebApp.java (Interface)
     ...

and so does PROJECT chempound-webapp
   package uk.ac.cam.ch.wwmm.chempound.webapp;
     AddressFilter.java (class)

Is this deliberate or a typo? It seems to me that having a package split across two different places makes maintenance and exploration harder. Javadoc also cannot cope easily (it only wants one package-info per package).
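
As a rough illustration of the check I mean, the sketch below inverts a project-to-packages listing and keeps only the packages that turn up in more than one project. The class and method names (SplitPackageCheck, findSplitPackages) are my own invention, and the input is typed in by hand from the example above rather than read from the Maven projects.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper, not part of Chempound.
class SplitPackageCheck {

    // Returns package -> projects, keeping only packages found in two or more projects.
    static Map<String, Set<String>> findSplitPackages(Map<String, List<String>> packagesByProject) {
        Map<String, Set<String>> projectsByPackage = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : packagesByProject.entrySet()) {
            for (String pkg : entry.getValue()) {
                projectsByPackage.computeIfAbsent(pkg, k -> new TreeSet<>()).add(entry.getKey());
            }
        }
        // Drop every package that lives in a single project only.
        projectsByPackage.values().removeIf(projects -> projects.size() < 2);
        return projectsByPackage;
    }

    public static void main(String[] args) {
        // Hand-entered from the example in this message.
        Map<String, List<String>> input = new LinkedHashMap<>();
        input.put("chempound-webapi", Arrays.asList("uk.ac.cam.ch.wwmm.chempound.webapp"));
        input.put("chempound-webapp", Arrays.asList("uk.ac.cam.ch.wwmm.chempound.webapp"));
        System.out.println(findSplitPackages(input));
    }
}
```

A real version would presumably collect the package names by walking each project's src/main/java tree; I have left that out to keep the sketch short.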

P.

Peter Murray-Rust

Nov 18, 2011, 1:34:36 AM

to quixote-...@googlegroups.com

Here's Javadoc's analysis of where packages occur in more than one project. It only throws one warning so you have to find the duplicate packages by browsing.

Constructing Javadoc information...
E:\workspace\chempound-aggregator\chempound\chempound-webapp\src\main\java\uk\ac\cam\ch\wwmm\chempound\webapp\package-info.java:4: warning: [package-info] a package-info.java file has already been seen for package uk.ac.cam.ch.wwmm.chempound.webapp
package uk.ac.cam.ch.wwmm.chempound.webapp;
                                   ^
javadoc: warning - Multiple sources of package comments found for package "uk.ac.cam.ch.wwmm.chempound.webapp"
E:\workspace\chempound-aggregator\compchem\compchem-importer\src\main\java\uk\ac\cam\ch\wwmm\chempound\compchem\package-info.java:4: warning: [package-info] a package-info.java file has already been seen for package uk.ac.cam.ch.wwmm.chempound.compchem
package uk.ac.cam.ch.wwmm.chempound.compchem;
                                   ^
javadoc: warning - Multiple sources of package comments found for package "uk.ac.cam.ch.wwmm.chempound.compchem"
E:\workspace\chempound-aggregator\chempound\chempound-app\src\main\java\uk\ac\cam\ch\wwmm\chempound\package-info.java:4: warning: [package-info] a package-info.java file has already been seen for package uk.ac.cam.ch.wwmm.chempound
package uk.ac.cam.ch.wwmm.chempound;
                         ^
javadoc: warning - Multiple sources of package comments found for package "uk.ac.cam.ch.wwmm.chempound"
E:\workspace\chempound-aggregator\chempound\chempound-webapp\src\main\java\uk\ac\cam\ch\wwmm\chempound\webapp\search\package-info.java:4: warning: [package-info] a package-info.java file has already been seen for package uk.ac.cam.ch.wwmm.chempound.webapp.search
package uk.ac.cam.ch.wwmm.chempound.webapp.search;
                                          ^
javadoc: warning - Multiple sources of package comments found for package "uk.ac.cam.ch.wwmm.chempound.webapp.search"
E:\workspace\chempound-aggregator\chemistry\chemistry-common\src\main\java\uk\ac\cam\ch\wwmm\chempound\chemistry\package-info.java:4: warning: [package-info] a package-info.java file has already been seen for package uk.ac.cam.ch.wwmm.chempound.chemistry
package uk.ac.cam.ch.wwmm.chempound.chemistry;
                                   ^
javadoc: warning - Multiple sources of package comments found for package "uk.ac.cam.ch.wwmm.chempound.chemistry"
E:\workspace\chempound-aggregator\chempound\chempound-webapi\src\main\java\uk\ac\cam\ch\wwmm\chempound\webapp\guice\package-info.java:4: warning: [package-info] a package-info.java file has already been seen for package uk.ac.cam.ch.wwmm.chempound.webapp.guice
package uk.ac.cam.ch.wwmm.chempound.webapp.guice;
                                          ^
javadoc: warning - Multiple sources of package comments found for package "uk.ac.cam.ch.wwmm.chempound.webapp.guice"
E:\workspace\chempound-aggregator\chempound\chempound-app\src\main\java\uk\ac\cam\ch\wwmm\chempound\config\package-info.java:4: warning: [package-info] a package-info.java file has already been seen for package uk.ac.cam.ch.wwmm.chempound.config
package uk.ac.cam.ch.wwmm.chempound.config;
                                   ^
javadoc: warning - Multiple sources of package comments found for package "uk.ac.cam.ch.wwmm.chempound.config"
E:\workspace\chempound-aggregator\chempound\chempound-app\src\main\java\uk\ac\cam\ch\wwmm\chempound\datastore\package-info.java:4: warning: [package-info] a package-info.java file has already been seen for package uk.ac.cam.ch.wwmm.chempound.datastore
package uk.ac.cam.ch.wwmm.chempound.datastore;
                                   ^
javadoc: warning - Multiple sources of package comments found for package "uk.ac.cam.ch.wwmm.chempound.datastore"
javadoc: warning - Multiple sources of package comments found for package "org.mockito"
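
Rather than reading this log by eye, the affected package names can be pulled out with a few lines of Java. This is only a sketch: JavadocWarningScan and duplicatedPackages are hypothetical names, and the regular expression simply mirrors the "Multiple sources of package comments" line as it appears in the output above, so it may need adjusting for other Javadoc versions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Chempound.
class JavadocWarningScan {

    // Mirrors the summary line javadoc printed in the log above.
    private static final Pattern MULTIPLE_SOURCES = Pattern.compile(
            "Multiple sources of package comments found for package \"([^\"]+)\"");

    // Returns the sorted, de-duplicated package names named in those warnings.
    static Set<String> duplicatedPackages(List<String> logLines) {
        Set<String> packages = new TreeSet<>();
        for (String line : logLines) {
            Matcher m = MULTIPLE_SOURCES.matcher(line);
            if (m.find()) {
                packages.add(m.group(1));
            }
        }
        return packages;
    }

    public static void main(String[] args) {
        // Three lines copied from the log above; the middle one should be ignored.
        List<String> sample = Arrays.asList(
                "javadoc: warning - Multiple sources of package comments found for package \"uk.ac.cam.ch.wwmm.chempound.webapp\"",
                "package uk.ac.cam.ch.wwmm.chempound.webapp;",
                "javadoc: warning - Multiple sources of package comments found for package \"org.mockito\"");
        for (String pkg : duplicatedPackages(sample)) {
            System.out.println(pkg);
        }
    }
}
```

Note that this only lists the package names; which projects share each package still has to be worked out from the file paths in the preceding warning lines.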

Sam Adams

Nov 19, 2011, 6:10:53 AM

to quixote-...@googlegroups.com

Hi Peter,

You're right, the package layout has gotten a bit out of date - it hasn't really kept up as the rest of the architecture has evolved. I'm in the process of some refactoring to reduce the coupling between components at the moment, and hopefully make it simpler for other people to start contributing to the project. I'll have a think about the best way of updating the package structure while I'm at it.

Cheers,

Sam

Peter Murray-Rust

Nov 19, 2011, 9:37:23 AM

to quixote-...@googlegroups.com

On Sat, Nov 19, 2011 at 11:10 AM, Sam Adams <s.e....@gmail.com> wrote:

> Hi Peter,
>
> You're right, the package layout has gotten a bit out of date - it hasn't really kept up as the rest of the architecture has evolved. I'm in the process of some refactoring to reduce the coupling between components at the moment, and hopefully make it simpler for other people to start contributing to the project. I'll have a think about the best way of updating the package structure while I'm at it.

Good,

What I am trying to do is get a feel of the overall process of ingesting an entry. There are the following aspects:
* convention over configuration (http://en.wikipedia.org/wiki/Convention_over_configuration). I assume there is an element of this and I am happy to see it formalized/documented. (We are a coherent group of users/developers and can agree on general principles.)
* where Jumbo-converters is invoked
* what RDF triples are (a) fundamental (b) generated differently for each domain
* where the web pages are generated and where the templates are

My intention is that we create an overview of the ingestion process so it becomes clear where the configuration actions (if any) are.

I think that our growing community will have sufficient marginal resource to start adding to Chempound, whether as domain-specific plugins (e.g. spectra), base functionality or bugfixes.

**SAM** are you happy for Chempound discussion to continue on quixote-dev (it makes sense at present) or should we move to a chempound list?

P.

Sam Adams

Nov 19, 2011, 11:28:10 AM

to quixote-...@googlegroups.com

On 19 November 2011 14:37, Peter Murray-Rust <pm...@cam.ac.uk> wrote:
>
>
> What I am trying to do is get a feel of the overall process of ingesting an entry. There are the following aspects:
> * convention over configuration (http://en.wikipedia.org/wiki/Convention_over_configuration). I assume there is an element of this and I am happy to see it formalized/documented. (We are a coherent group of users/developers and can agree on general principles.)
> * where Jumbo-converters is invoked
> * what RDF triples are (a) fundamental (b) generated differently for each domain
> * where the web pages are generated and where the templates are
>
> My intention is that we create an overview of the ingestion process so it becomes clear where the configuration actions (if any) are.

The ingestion process is the most complex / least well written part of Chempound at the moment, and one of the main focuses of my current refactoring. It is likely to change significantly over the next couple of weeks, so I wouldn't spend a lot of time trying to understand/document it right now!

>
>
> **SAM** are you happy for Chempound discussion to continue on quixote-dev (it makes sense at present) or should we move to a chempound list?

I'm happy to keep using quixote-dev for now.

Sam

Peter Murray-Rust

Nov 19, 2011, 12:02:12 PM

to quixote-...@googlegroups.com

On Sat, Nov 19, 2011 at 4:28 PM, Sam Adams <s.e....@gmail.com> wrote:

> On 19 November 2011 14:37, Peter Murray-Rust <pm...@cam.ac.uk> wrote:
> >
> >
> > What I am trying to do is get a feel of the overall process of ingesting an entry. There are the following aspects:
> > * convention over configuration (http://en.wikipedia.org/wiki/Convention_over_configuration). I assume there is an element of this and I am happy to see it formalized/documented. (We are a coherent group of users/developers and can agree on general principles.)
> > * where Jumbo-converters is invoked
> > * what RDF triples are (a) fundamental (b) generated differently for each domain
> > * where the web pages are generated and where the templates are
> >
> > My intention is that we create an overview of the ingestion process so it becomes clear where the configuration actions (if any) are.
>
> The ingestion process is the most complex / least well written part of Chempound at the moment, and one of the main focuses of my current refactoring. It is likely to change significantly over the next couple of weeks, so I wouldn't spend a lot of time trying to understand/document it right now!

OK

I started to fork the bitbucket/chempound repo. It looks as if I also have to fork each of the six subrepos:
* chemistry
* chempound
* chempound-parent
* compchem
* crystallography
* deposit-client

Would that be correct?

(Given that you are refactoring these, I will probably go slow on this anyway!)

P.
