Re: Fwd: URLs of workflows for publication


Carole Goble

Nov 11, 2013, 6:22:43 AM
to Sarah Bourlat, Sonja Leidenberger, myexperimen...@googlegroups.com, Carole Goble
Sarah

We are moving to DOIs for myExperiment,
but there is an immediate issue here for persistent URLs.
Finn, Don and Sean -- this is urgent.

Carole



Dear Carole,

We have just got reviewers' comments back for our paper, and one of them was:

'I was unable to access the workflows at the supplied URLs (ll.147 &
149) to evaluate the details of the workflows. I would suggest
providing PURLs to be able to move web-content in the future without
breaking the link between the information in the manuscript and the
website.'

Now everybody is discussing this and there does not seem to be a consensus, see the emails below. Do you have a simple solution for this?

Many thanks,

Sarah & Sonja

Dr. Sarah J. Bourlat
Department of Biological and Environmental Sciences
University of Gothenburg
Box 463
SE-405 30 Göteborg
Sweden

Mobile: +46 (0)702147811
Office: +46 (0)317863827


Begin forwarded message:

From: Alan R Williams <alan.r....@manchester.ac.uk>
Subject: Re: URLs of workflows for publication
Date: November 11, 2013 11:24:00 AM GMT+01:00
Cc: Sonja Leidenberger <sonja.lei...@bioenv.gu.se>, Alex Hardisty <Hardi...@cardiff.ac.uk>, Matthias Obst <matthi...@bioenv.gu.se>, Jonathan Giddy <J.P....@cs.cardiff.ac.uk>, Abraham Nieva de la Hidalga <a_n...@hotmail.com>

We are discussing this now (as in this second) on the myExperiment Skype call.

Alan

On 11/11/2013 10:20, Robert Haines wrote:
Hi all,

I've added Alan to the CC list.

To keep my reply short for now, I think we should be looking at DOIs for
this (http://www.doi.org/).

DOI is an ISO standard, libraries use them, publications have them and
we should be treating our published workflows like published papers. But
I do mean only *published* workflows. Assigning a DOI would be part of
the formal publishing process for our workflows and would get us in the
mindset of versioning things more carefully.

The SEEK platform can (or soon will be able to) assign DOIs and register
them so we will have experience of working with them in the myGrid team
soon.

Rob

On 08/11/2013 17:04, Francisco Quevedo wrote:
Hi Sarah,

Good, constructive feedback from the reviewer, although, as I will
explain later on, I'm not sure that even PURLs can solve all the
problems of this type that we might have. For that reason, I've just
added Jon, Abraham and Rob to the conversation so they can give us
their opinion.

To be honest, I didn't know anything about PURLs, but after reading a
little I can see how, by using PURLs, we can give our users a
permanent URL (PURL) to access the content we want them to access,
while still being able to move the content to a different location
without changing the URL that we gave to our users.
According to its definition:

"A /Persistent/ URL is an address on the World Wide Web that points to
other Web resources. If a Web resource changes location (and hence URL),
a PURL pointing to it can be updated. A user of a PURL always uses the
same Web address, even though the resource in question may have moved. "
http://purl.oclc.org/docs/help.htm

Technically this can be achieved by using a PURL server (basically a URL
resolver): our users access the PURL server and, depending on the
path, it automatically redirects the request to where the content
is. This type of architecture is used by, amongst others, the U.S.
Government Printing Office to provide stable URLs to online Federal
information (http://purl.fdlp.gov/docs/index.html).

As an example, imagine that we publish a paper in which we say that
the BioVeL workflows can be accessed in the BioVeL portal by going to
the HTTP address "http://tavlite1.biovel.eu/workflows". Then, after a
while, we decide that this name is not suitable and we change it to
"http://portal.biovel.eu". If we then shut down the tavlite1 server, a
reader of our paper would not be able to access the workflows described
in the paper, because the URL we gave them no longer exists. (This is
not the situation we are in, because we have not shut down tavlite1.)
But if, instead of the specific address, we had given them a PURL
address, e.g. http://purl.biovle.eu/portal/workflows, they would still
be able to access the workflows regardless of whether we had moved the
machine. For example, initially the PURL server maps that address to
"http://tavlite1.biovel.eu/workflows", so any user who types
http://purl.biovle.eu/portal/workflows is redirected to
http://tavlite1.biovel.eu/workflows. If we change the portal URL, it
is enough to update the PURL entry, and users will still have access
by typing the same PURL address.
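
A minimal sketch of this resolver idea, in Python. The table, the
function name `resolve` and the use of a 302 status are assumptions for
illustration only; a real PURL server stores and manages its entries
differently. The paths and targets are the example URLs used above.

```python
# Hypothetical PURL table: persistent path -> current location of the content.
purl_table = {
    "/portal/workflows": "http://tavlite1.biovel.eu/workflows",
}


def resolve(path):
    """Return (status, location): a 302 redirect if the path is known, else 404."""
    target = purl_table.get(path)
    if target is None:
        return 404, None
    return 302, target


# Before the move, the PURL points at tavlite1.
before = resolve("/portal/workflows")

# Moving the portal only requires updating the table entry;
# the address printed in the paper stays the same.
purl_table["/portal/workflows"] = "http://portal.biovel.eu"
after = resolve("/portal/workflows")

print(before)
print(after)
```

The point of the sketch is that readers only ever see the key of the
table, so the value can change without breaking the published link.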

However, I want to say that we have achieved something similar to this
feature in the new portals that we have deployed in Amazon by using DNS.
Jon and Rob have set up a DNS service in Amazon (Amazon Route 53) that
directs the users' requests to the exact machine where we have our
applications. For example, when we type "http://portal.biovel.eu" the
DNS resolves that name and sends the request to
"https://portal1.at.biovel.eu/", which is the real machine (well, not
exactly, but for our case imagine it is). So basically the DNS server
is doing something similar to what the PURL server would do. With this
DNS it is easy to change the machine to a different one. Imagine we
move portal1 to a more powerful machine called portal5: by changing the
entry in the DNS table, users can keep using the URL
"http://portal.biovel.eu", but this time they will be sent to
"https://portal5.at.biovel.eu/" instead of portal1.

Anyway, why did I say at the beginning of the email that even by using
PURLs or DNS we cannot solve all the problems? Well, I don't know the
specific case of the URLs (ll.147 & 149) in the paper. What were they?
But I can imagine a scenario in which a URL can be inaccessible even if
we have used PURLs or DNS: basically, when the resource is deleted
instead of moved. Imagine the following:

1) Renato publishes version 18 of his ENM workflow in myExperiment,
and after the evaluation process (a process that we haven't fully
defined yet) the workflow passes in myExperiment from the BioVeL
internal group to the BioVeL group, making it publicly available on the
23rd of October. At that point, the workflow is accessible at
http://www.myexperiment.org/workflows/3355.html for anyone who wants to
see it or download it.

2) Then let's say that on the 25th of October Matthias adds this
workflow from myExperiment to the BioVeL portal as one of his private
workflows. Every time a new workflow is added to the portal, it is
given a unique workflow id. Let's suppose the portal gives the new
workflow the id 87, so the workflow can be reached at
"https://portal1.at.biovel.eu/workflows/87" or its equivalent
"permanent" address "http://portal.biovel.eu/workflows/87". Matthias
then spends some days testing the workflow in the portal and, let's
say, on the 1st of November he decides to make it public. Now any user
who enters the portal will see that workflow in the list of public
workflows and can run it. Around that time Matthias also writes a paper
in which he says that the results shown in that paper were obtained by
executing the workflow "https://portal1.at.biovel.eu/workflows/87".

Here is my first point: the URL we should have written in the paper
is the "permanent" one, "http://portal.biovel.eu/workflows/87", and
not the "temporary" one, "https://portal1.at.biovel.eu/workflows/87",
because if tomorrow, for example, we move the machine from portal1 to
portal5, then once the DNS entry (or the PURL entry, if we used PURLs)
is updated, the "permanent" URL will still be valid whereas the
temporary one will not. But let's suppose that we have written the
"permanent" URL in the paper.

3) Then imagine that on the 15th of December Renato releases a new
version of the ENM, version 19, which has some really nice new
features, like allowing the user to set the number of cross-validations
to be made depending on the number of unique occurrence points
provided. This new version is uploaded to myExperiment and after a
while it is made public. Matthias then decides to have a go with it and
test it in the portal, adding it as a new workflow in his private
workflows. He adds it as a private workflow because, until it is fully
tested, he still wants the rest of the users to see the previous
version of the ENM (v18) in the public workflows. So the portal assigns
a new id to this new workflow, let's say the id 93. After some testing,
Matthias is happy with the workflow and he makes it public. But now we
have two public ENM workflows in the portal, one for ENM v18
(http://portal.biovel.eu/workflows/87) and another for ENM v19
(http://portal.biovel.eu/workflows/93), so we decide to delete v18.
However, this is something we shouldn't do. Instead we should make it
private, or perhaps create a new status like "superseded", in which
the workflow can still be run by anyone who types its URL but is not
shown in the list of public workflows. I suggest this superseded status
because I'm not sure whether, if it is a private workflow, any user
could run it by typing the URL or only its owner.

Anyway, if we delete the workflow from the portal, there is no way for
users to access that workflow again, even if we use DNS or PURLs,
simply because it has been deleted and not moved, so we cannot point to
it. OK, we could add an entry in the DNS or in the PURLs so that if the
user wants workflow 87 (ENM v18) we give them workflow 93 (ENM v19),
but I think this is wrong and shouldn't be done. What happens if the
paper says that the workflow gave 10 outputs and now it only gives 7
because the new version processes them differently?
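
The difference between deleting and superseding can be sketched in
Python as follows. Everything here is hypothetical: the portal has no
"superseded" status today, the record layout and function names are
invented for illustration, and treating private workflows as
owner-only (403 for everyone else) is an assumption, since that
behaviour is exactly what is unclear above.

```python
from enum import Enum


class Status(Enum):
    PUBLIC = "public"
    PRIVATE = "private"
    SUPERSEDED = "superseded"  # hypothetical status, as suggested above


# Hypothetical portal records keyed by workflow id (ids from the example).
workflows = {
    87: {"name": "ENM v18", "status": Status.SUPERSEDED},
    93: {"name": "ENM v19", "status": Status.PUBLIC},
}


def public_list():
    """Ids shown in the list of public workflows: superseded ones are hidden."""
    return sorted(wid for wid, wf in workflows.items()
                  if wf["status"] is Status.PUBLIC)


def fetch(wid):
    """Return (status code, name) for a request to /workflows/<wid>."""
    wf = workflows.get(wid)
    if wf is None:
        return 404, None  # deleted or never existed: nothing to point to
    if wf["status"] is Status.PRIVATE:
        return 403, None  # assumption: only the owner may access it
    return 200, wf["name"]  # public and superseded both still resolve


print(public_list())
print(fetch(87))
```

With this model the URL printed in a paper keeps returning ENM v18,
while new users browsing the public list only see ENM v19. Had
workflow 87 been deleted instead, `fetch(87)` would return 404 and no
DNS or PURL entry could fix that.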

My question is: is that the situation with the URLs in the paper? Is
it because the workflow has been deleted, or because the machine has
been changed?

I have been talking with Abraham about all of this, and he thinks, and I
agree with him, that perhaps what we should reference in a paper is
where the workflow is in myExperiment, or the service in BioCatalogue,
rather than the portal, mainly for two reasons:

a) The portal was conceived as a pilot project to show how to run
workflows easily in a web browser environment, and although it can keep
different versions of workflows, its goal was not to act as a repository
where the workflows can be found forever. That is the aim of
myExperiment and BioCatalogue (for workflows and services respectively).

b) Apart from being repositories, myExperiment and BioCatalogue
also offer long-term support. In other words, after the BioVeL project
finishes we have no commitment to keep the server running for longer
than is specified in the DoW, whereas myExperiment and
BioCatalogue should still be there longer than that.

So perhaps the best thing is to say that the results of the paper were
obtained using version X of workflow M, which can be found at
http://www.myexperiment.org/workflows/XXXX.html
and that it was run using the BioVeL portal at "http://portal.biovel.eu",
and perhaps also mention a wiki page that describes how to run it in the
portal or in the workbench. This way a reviewer or a reader of our
papers will always be able to get the workflow and, if they want, follow
the documentation and run the workflow in the environment they prefer.
However, this implies that workflows shouldn't be deleted from
myExperiment under any circumstances, unless we are 100% sure they are
not referred to anywhere. Anyway, this is only a suggestion.

To conclude, I don't know whether this long text has clarified your
doubts about PURLs or created new ones. However, to sum up: by using
PURLs or our current DNS system, we can partially solve the problem of
the URLs for the workflows in the BioVeL portal, as long as the
resource is moved but not deleted.

Best wishes,
Fran



On 08/11/2013 10:01, Sarah Bourlat wrote:
Dear Fran,

We just got reviewers' comments back for one of our papers on niche
modelling, which we need to return within 2 months. One comment was:

'I was unable to access the workflows at the supplied URLs (ll.147 &
149) to evaluate the details of the workflows. I would suggest
providing PURLs to be able to move web-content in the future without
breaking the link between the information in the manuscript and the
website.'

What is a PURL and how can we provide it to avoid breaking the link
between the information in the manuscript and the website, every time
a page gets updated?

Many thanks for your help,

Sarah

Dr. Sarah J. Bourlat
Department of Biological and Environmental Sciences
University of Gothenburg
Box 463
SE-405 30 Göteborg
Sweden

Mobile: +46 (0)702147811
Office: +46 (0)317863827
Email: sarah....@bioenv.gu.se <mailto:sarah....@bioenv.gu.se>

http://www.bioenv.gu.se/english/staff/sarah-bourlat/

www.mg4u.eu <http://www.mg4u.eu>
www.biovel.eu <http://www.biovel.eu>








-- 
Professor Carole Goble FREng FBCS CITP
School of Computer Science
University of Manchester
Manchester, UK

tel: +44 161 275 6195
email: carole...@manchester.ac.uk