The first chempound repository is up online!
http://quixote.ch.cam.ac.uk/
Right now it contains several hundred Gaussian calculations from Anna
Croft, and will soon have some calculations from Henry Rzepa, and some
crystal structures. Over the next few days I'm going to be putting
quite a lot of effort into improving the search tools, and early next
week we should be in a position where I can give you a simple script to
launch a local repository on your machine, together with a CLI tool to
deposit data to it.
Right now the calculations' index pages just contain a few fairly
arbitrary properties and parameters... as the parsers get developed
further, and more information is formatted into compchem CML it will be
straightforward to update these.
This is where we need your help!
What information do you want to see indexed / searchable / presented on
the web pages? And can you find examples of it in log files (and even
better, help extend the parsers to capture it and format it into
compchem CML!)? Also, are there any properties that can easily be
calculated / derived from the information in the log files that it would
be useful for the repository to generate?
I'll be presenting all of this on Wednesday at Open Repositories 2011,
in Austin, Texas.
Sam
That is great news!
>
> http://quixote.ch.cam.ac.uk/
>
> Right now it contains several hundred Gaussian calculations from Anna Croft,
> and will soon have some calculations from Henry Rzepa, and some crystal
> structures. Over the next few days I'm going to be putting quite a lot of
> effort into improving the search tools, and early next week we should be in
> a position where I can give you a simple script to launch a local repository
> on your machine, together with a CLI tool to deposit data to it.
>
> Right now the calculations' index pages just contain a few fairly arbitrary
> properties and parameters... as the parsers get developed further, and more
> information is formatted into compchem CML it will be straightforward to
> update these.
>
> This is where we need your help!
>
What I am most interested in - is there a web API I can use yet? I
would love a simple API to upload, search, retrieve results etc. I could
then start working on a plugin to interact with this from Avogadro. I
think that would be very powerful for me. Are the raw log files, input
files etc stored with these too?
Great work Sam, look forward to any more details/documentation on an available API.
Marcus
On 03/06/2011 21:48, Marcus D. Hanwell wrote:
> What I am most interested in - is there a web API I can use yet? I
> would love simple API to upload, search, retrieve results etc. I could
> then start working on a plugin to interact with this from Avogadro. I
> think that would be very powerful for me.
There are parts of a web api right now. All the data for the
collections / items is available via RDF. I haven't done anything about
JSON yet, but I haven't forgotten about it. I'll try to have a look at
adding something basic on the plane tomorrow.
There's a SPARQL endpoint at http://quixote.ch.cam.ac.uk/sparql/ that
can be used to explore and search the RDF, and you can get the complete
RDF (xml/turtle/n3) descriptions of individual collections / items using
content negotiation.
e.g.
curl -H "Accept: text/n3" -L http://quixote.ch.cam.ac.uk/
<http://quixote.ch.cam.ac.uk/content/>
a <http://www.openarchives.org/ore/terms/Aggregation> .
<http://www.openarchives.org/ore/terms/aggregates>
<http://quixote.ch.cam.ac.uk/content/compchem/> ,
<http://quixote.ch.cam.ac.uk/content/crystallography/> .
...
curl -H "Accept: application/rdf+xml" -L http://quixote.ch.cam.ac.uk/content/compchem/anna/1_50/
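The same content negotiation can be done from a script. What follows is only a minimal Python sketch using the standard library. The function names are made up for illustration, and the "text/turtle" MIME type is an assumption, since the message says turtle is available but not which type to request.

```python
import urllib.request

# MIME types for the RDF serialisations mentioned in the message.
# "text/n3" and "application/rdf+xml" come from the curl examples;
# "text/turtle" is an assumption, not something confirmed for Chempound.
RDF_MIME_TYPES = {
    "n3": "text/n3",
    "rdfxml": "application/rdf+xml",
    "turtle": "text/turtle",
}


def build_rdf_request(url, fmt="n3"):
    """Build a GET request asking for the given RDF serialisation."""
    return urllib.request.Request(url, headers={"Accept": RDF_MIME_TYPES[fmt]})


def fetch_rdf(url, fmt="n3", timeout=30):
    """Fetch the RDF description of a collection or item as text.

    urlopen follows redirects by default, which corresponds to curl's -L.
    Decoding as UTF-8 is an assumption about the server's output.
    """
    request = build_rdf_request(url, fmt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling fetch_rdf("http://quixote.ch.cam.ac.uk/content/compchem/anna/1_50/", "rdfxml") would then correspond to the second curl command, provided the server is reachable.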
There is also an upload API (using SWORD - an extension of AtomPub to
handle packages of files) that I'll describe in more detail soon.
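For the SPARQL endpoint mentioned above, a query can presumably be sent as a plain HTTP GET. The Python sketch below only builds the request URL. It assumes the endpoint accepts the standard SPARQL Protocol "query" parameter, and that collections point to their contents via ore:aggregates as in the N3 output quoted earlier in this message. Neither assumption has been confirmed for Chempound, and the function names are hypothetical.

```python
from urllib.parse import urlencode

# Endpoint URL as given in the message.
SPARQL_ENDPOINT = "http://quixote.ch.cam.ac.uk/sparql/"


def aggregates_query(collection_uri):
    """SPARQL query listing everything a collection aggregates.

    The ore:aggregates predicate is taken from the N3 output in the
    message; the overall shape of the graph is an assumption.
    """
    return (
        "PREFIX ore: <http://www.openarchives.org/ore/terms/>\n"
        "SELECT ?item WHERE { <%s> ore:aggregates ?item }" % collection_uri
    )


def sparql_get_url(query, endpoint=SPARQL_ENDPOINT):
    """Encode a query into a GET URL using the 'query' parameter.

    'query' is the parameter name defined by the SPARQL Protocol; whether
    this particular endpoint honours it is an assumption.
    """
    return endpoint + "?" + urlencode({"query": query})
```

The resulting URL can be opened in a browser or passed to curl; for example, sparql_get_url(aggregates_query("http://quixote.ch.cam.ac.uk/content/")) asks for the top-level collections.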
The big task so far has been getting the basic functionality and core
architecture sorted. As the system has developed it's had to go through
a number of fairly major refactorings, but I'm pretty happy with the way
the system is structured now, and it has become much easier to add new
functionality.
> Are the raw log files, input files etc stored with these too?
The log files are all there (there's a bug in the templates right now,
so they're not linked from the HTML pages - but that will be fixed
soon), and they're listed in the RDF. I've only uploaded the log + CML
files so far, but other files can be included - the examples I've got at
the moment have about a dozen files per calculation, and I don't know
what most of them are!
> Great work Sam, look forward to any more details/documentation an available API.
Nothing much documented yet - that's my first task once I'm back from
the US, in a week or so's time.
Sam
On Fri, Jun 3, 2011 at 10:36 PM, Sam Adams <se...@cam.ac.uk> wrote:
> The first chempound repository is up online!
>
> http://quixote.ch.cam.ac.uk/
Looks great!
Can you please add OpenData icons and/or CCZero waivers?
Where can I file issues, or leave comments? The following entry has a
single chlorine atom, but the MF is reported as Cl2...
http://quixote.ch.cam.ac.uk/content/compchem/anna/1_50/anna_3/index.html
Is searching by InChI deliberately missing on the Search page?
Egon
--
Dr E.L. Willighagen
Postdoctoral Researcher
Institutet för miljömedicin
Karolinska Institutet (http://ki.se/imm)
Homepage: http://egonw.github.com/
LinkedIn: http://se.linkedin.com/in/egonw
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers
On Sun, Jun 5, 2011 at 11:06 AM, Peter Murray-Rust <pm...@cam.ac.uk> wrote:
>> Can you please add OpenData icons and/or CCZero waivers?
>
> SAM, can you please look into this?
Thanx :)
> I assumed that stoichiometry meant this, but it appears more complex and I
> now assume that (...) means spinMultiplicity. This brings me to a new idea
> (for a Shapado-like site for compchem) which I will mail separately. That is
> where we should be asking these questions.
Just as the Blue Obelisk is not just about cheminformatics, neither is
the Blue Obelisk eXchange. If the wording is off on this Q&A
website, blame me, and we'll fix it.
>> Is searching by InChI deliberately missing on the Search page?
>
> Sam ?
Thanx,
My name is Pablo de Castro and I work at Carlos III University Madrid.
I am a quixotian (you may have seen me on the mailing list) and I recently
contributed a (tiny) bit to the ChemInf paper on the idea of building a
preliminary DSpace-based data repository that could be used to store
raw compchem data prior to it being processed and transferred into
Chempound (the way things look, I'm not quite sure the idea makes sense
anymore, as Chempound could well become the default Quixote repository).
I've been following the mail exchange on Chempound this weekend and I
was wondering whether it might get a somewhat more sophisticated homepage. I
happen to have recently created a spin-off at my university to provide
support to repositories (look-and-feel design is among the services
provided), http://www.grandir.com/EN/, and I was thinking we might
(just might) put in some contribution to the Chempound homepage design.
It's true our know-how is focused on DSpace, but if you tell me what
software Chempound is based upon we could certainly give it a try. I
know researchers don't care much about the homepage's look, but it
would do the project no harm to have a nice-looking interface for its
data repository, similar to the way IRs look.
Hope it's useful.
Best wishes,
Pablo
----
Pablo de Castro
SONEX Workgroup for Scholarly Output Notification & Exchange
http://sonexworkgroup.blogspot.com/
&
Servicio de Recursos Electrónicos
Biblioteca de la Escuela Politécnica "Rey Pastor"
Universidad Carlos III de Madrid
Avda. de la Universidad, 30 28911 Leganés (Madrid)
Tfno. 91 624 90 81
e-mail: pca...@db.uc3m.es
Peter, Sam,
I don’t know the meaning of the (2) in Stoichiometry Cl(2); it would be rather odd for Gaussian to put the spin there. Perhaps something related to symmetry/special positions? For now I would suggest leaving out any numbers in parentheses and treating the rest as a compchem stoichiometry.
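The suggestion above (drop the parenthesised part, read the rest as a formula) can be sketched in a few lines of Python. This is only an illustration of that workaround, not the actual Chempound parser: the function name is hypothetical, and the code deliberately does not try to interpret what the number in parentheses means.

```python
import re

# A trailing "(...)" group, e.g. the "(2)" in "Cl(2)". Its meaning is left
# open here; the suffix is simply discarded, as suggested in the message.
_PAREN_SUFFIX = re.compile(r"\([^)]*\)\s*$")
# One element symbol followed by an optional count, e.g. "C2", "H", "Cl".
_ELEMENT = re.compile(r"([A-Z][a-z]?)(\d*)")
# A whole formula made up only of such element/count tokens.
_FORMULA = re.compile(r"(?:[A-Z][a-z]?\d*)+")


def parse_stoichiometry(text):
    """Return element counts for a stoichiometry string such as "Cl(2)"."""
    core = _PAREN_SUFFIX.sub("", text.strip())
    if not _FORMULA.fullmatch(core):
        raise ValueError("unrecognised stoichiometry: %r" % text)
    counts = {}
    for symbol, number in _ELEMENT.findall(core):
        # A missing count means a single atom of that element.
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts
```

With this, "Cl(2)" yields a single chlorine atom instead of Cl2, which is the discrepancy reported for the anna_3 entry.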
My first reaction on looking at the repository (apart from the obvious thought that it is great to see such progress) is that I’d like to be able to see a view of the raw output from the code particularly while the system is under development.
I would suggest also having a Help menu/page, explaining the project in brief terms, in the simplest case just linking to Quixote pages so that the URL can be more widely circulated and it will self-document. The process of ingesting data could be mentioned (e.g. what data you seek now (if any), and/or a timetable for accepting data or creating new repositories).
Paul
--
Scanned by iCritical.
Peter Murray-Rust <pm...@cam.ac.uk> wrote:
> I am strongly of the opinion that we need domain repositories and Chempound
> is the first chemistry repo and breaks lots of new ground. It is still a
> prototype (but designed for flexible growth). We are developing the
> functionality for displaying and searching the chemistry. It is at a
> relatively early stage and so things like descriptions, etc. are part of the
> communal knowledge of the Quixote project. So, for example, we are
> concentrating on things like:
> * analysing the incoming documents. This is hard. Traditional IRs don't do
> any document analysis
> * indexing on textual *and* non-textual stuff (scalars, arrays, numbers,
> booleans, dates, etc.). IRs don't do this
> * displaying aggregations of data (facets, pivots, etc.). Again traditional
> IRs are useless for this - they assume repositories are museums or rare book
> collections. (I cannot even get my data out of DSpace in Cambridge!)
Even if most of the characteristic shortcomings of IRs you mention are
just circumstantial (maybe they don't do certain things because they
weren't planned to do them, not because they can't: a proper data
librarian might deal with most of those), I am not exactly advocating
an IR for storing research data here, but in principle
just using some of the features IRs provide by default, such as a
friendlier homepage design or search interface.
>> I was thinking we might -just might- put
>> in some contribution to the Chempound homepage design. It's true our
>> know-how is focused on DSpace, but if you tell me what software Chempound is
>> based upon we could certainly give it a try. I know researchers don't care
>> much about the homepage outlook, but it would do the project no harm to have
>> a nice-looking interface for its data repository, similar to the way IRs
>> look.
>>
>
> You are right - but we are still finding out what Chempound will do. It is
> much more like (say) Sourceforge or mercurial/git than
> DSpace/Fedora/ePrints.
>
> There is now a push for scientists to publish data. Chempound makes it easy
> to do this for large chunks of chemistry. So tying this into the publication
> process rather than de facto deposition is a key strategy.
>
> Will you be at OR11. Because if so, suggest you meet with Sam Adams.
I'm afraid I won't be in Austin - hopefully we'll be able to meet
somewhere after OR11 (OAI7 Geneva?)
>> Hope it's useful.
>>
>> Suggest you join the next Quixote skype meeting.
I'll try to be there, thanks.
Pablo
Thanks for the offer of assistance. The simplest thing you could do
right now is probably mock up some HTML pages showing your ideas.
Chempound uses the FreeMarker template system to generate its pages, and
can insert standard headers and footers on each page, making it pretty
straightforward to reskin the site. I'll be doing some work on
documenting the system next week.
Regarding the search pages - the version up right now is really just a
proof of concept. I'll be working on the search functions a lot over
the next few days.
Best regards,
Sam
Hi Sam,
Just noticed that it is still failing to search for me.
No problem if you are still working on it, maybe you could let us know when to try again.
I can give more details if needed.
Cheers,
Paul
We've been doing some preliminary work on the Chempound homepage
design, see http://bit.ly/ltd9lW. The main idea would be to reuse some
standard IR features (usage, for instance) for Chempound.
We can discuss it at the skype meeting later today.
Best wishes,
Pablo
Pablo Echenique <echen...@gmail.com> wrote:
> Hi everyone,
>
> let us fix the next Skype meeting this Thursday (tomorrow) at 19:00 Madrid,
> 18:00 London, 17:00 UTC.
> I will try to have an etherpad ready before that.
>
> There are lots of exciting things to discuss! Please try to be there,
> everybody is welcome.
>
> Best,
> Pablo.
>