So I downloaded PRO
(ftp://ftp.pir.georgetown.edu/databases/ontology/pro_obo/pro.obo) and
looked at it with Protege. I expected to find a deep hierarchy with
lots of generic classes and many properties. Instead, I see a very flat
hierarchy with mostly highly specific classes and only one property.
Am I missing something? Is PRO just another catalogue?
So PRO is designed for the single molecule level. Do they intend
stoichiometric coefficients to be properties of restrictions on
reactions rather than properties of reactions themselves? Is it useful
at all for the ensemble level?
Take care
Oliver
--
Oliver Ruebenacker, Computational Cell Biologist
BioPAX Integration at Virtual Cell (http://vcell.org/biopax)
Center for Cell Analysis and Modeling
http://www.oliver.curiousworld.org
On Tue, Mar 24, 2009 at 2:52 AM, Alan Ruttenberg
<alanrut...@gmail.com> wrote:
> You need the annotations file too.
Which file do you mean? PAF.txt?
> They've coded stuff as xrefs that
> should be changed to relations.
You mean, you don't like the way PRO is now either?
> Also you should read some of their web
> site and paper.
I read some of their website and am scratching my head over why the PRO
class hierarchy seems to be only two levels deep after "protein" when
they talk about four levels.
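One way to check the depth claim directly is to parse the is_a lines of the OBO file and measure the longest chain below a root term. The sketch below assumes the usual OBO flat-file layout ("[Term]" stanzas with "id:" and "is_a: ID ! name" lines); the sample ids and names are made up and are not real PRO terms.

```python
# Hypothetical OBO fragment: the ids and names below are made up for illustration
# and are not real PRO terms.
SAMPLE_OBO = """\
format-version: 1.2

[Term]
id: X:0000001
name: protein

[Term]
id: X:0000002
name: example family
is_a: X:0000001 ! protein

[Term]
id: X:0000003
name: example gene product
is_a: X:0000002 ! example family

[Term]
id: X:0000004
name: example modified form
is_a: X:0000003 ! example gene product
"""


def parse_is_a(obo_text):
    """Map each [Term] id to the list of ids named in its is_a lines."""
    parents = {}
    current = None
    in_term = False
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are of interest here.
            in_term = line == "[Term]"
            current = None
        elif in_term and line.startswith("id:"):
            current = line[len("id:"):].strip()
            parents.setdefault(current, [])
        elif in_term and current is not None and line.startswith("is_a:"):
            # Drop the trailing "! name" comment after the parent id.
            parent = line[len("is_a:"):].split("!")[0].strip()
            parents[current].append(parent)
    return parents


def max_depth(root, parents):
    """Length of the longest is_a chain below root (0 if root has no subclasses)."""
    children = {}
    for child, ps in parents.items():
        for p in ps:
            children.setdefault(p, []).append(child)

    def walk(node):
        return 1 + max((walk(c) for c in children.get(node, [])), default=-1)

    return walk(root)


print(max_depth("X:0000001", parse_is_a(SAMPLE_OBO)))  # 3
```

To apply it to the downloaded file, one would read pro.obo into a string and pass the id of its "protein" term as the root. The sketch ignores "relationship:" lines and obsolete terms, so it measures only the is_a hierarchy.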
Anyway, they say PRO is single molecule level. Is it then any use to
Systems Biologists who work on the ensemble level?
What exactly is PRO supposed to teach us? How is their class
hierarchy special?
Oliver, you should see by now that I don't like the way a lot of
things are. Otherwise I wouldn't be working to fix things. So I try to
identify which projects and strategies are most likely to advance
where we are.
I have a date with the PRO developers to address their representation.
Their curation is very good and this is a matter of rendering to be
fixed. But the underlying principle, that proteins as entities need
to be identified as distinct from records about them, is sound.
For example, consider the section marked sequence annotations in
http://www.uniprot.org/uniprot/P04637
What are the entities that are being described? Are they all the sort
of thing that coexist in one person? Consider the region described as
associated with 1-44. What do the "natural variations" with indices
between 1 and 44 have to do with the function ascribed to 1-44? Are
there papers that describe the molecular machine that has the natural
variation at 35?
>
>> Also you should read some of their web
>> site and paper.
>
> I read some of their website and am scratching my head over why the PRO
> class hierarchy seems to be only two levels deep after "protein" when
> they talk about four levels.
>
> Anyway, they say PRO is single molecule level. Is it then any use to
> Systems Biologists who work on the ensemble level?
SBML and others describe species that are quite often identified as
collections of molecules of a certain sort. The molecules implied by
the natural variations lines behave quite differently than those
without the variations (namely they occur in cancer and are perhaps
involved in mechanisms unique to cancer). A model of cancer ought to
behave differently than a model in which cancer does not occur. The
differences that explain it need to be somewhere.
> What exactly is PRO supposed to teach us? How is their class
> hierarchy special?
At the base PRO gives identifiers for distinct molecular machines or
their parts. This is a sound basis upon which to record information
about function. Without anything else this is useful.
Regards,
Alan
On Tue, Mar 24, 2009 at 9:21 AM, Alan Ruttenberg
<alanrut...@gmail.com> wrote:
> Oliver, you should see by now that I don't like the way a lot of
> things are. Otherwise I wouldn't be working to fix things. So I try to
> identify which projects and strategies are most likely to advance
> where we are.
Sure, we all want to change the world. So the point of interest is
less what PRO is than what it has the potential to become?
> But the underlying principle - that proteins as entities need
> to be identified as distinct from records about them is sound.
I understand entities are distinct from records, but I don't see how
their identifications are different from records. So I don't
understand what the difference effectively is.
> For example, consider the section marked sequence annotations in
> http://www.uniprot.org/uniprot/P04637
>
> What are the entities that are being described? Are they all the sort
> of thing that coexist in one person? Consider the region described as
> associated with 1-44. What do the "natural variations" with indices
> between 1 and 44 have to do with the function ascribed to 1-44? Are
> there papers that describe the molecular machine that has the natural
> variation at 35?
What is a molecular machine?
I understand there is missing information, but I don't understand
how that relates to the entity versus record distinction. Besides, is
it possible that the information is missing because it is not known?
> SBML and others describe species that are quite often identified as
> collections of molecules of a certain sort. The molecules implied by
> the natural variations lines behave quite differently than those
> without the variations (namely they occur in cancer and are perhaps
> involved in mechanisms unique to cancer). A model of cancer ought to
> behave differently than a model in which cancer does not occur. The
> differences that explain it need to be somewhere.
Sure, that information needs to be somewhere. How about reporting
the UniProt variant number?
> At the base PRO gives identifiers for distinct molecular machines or
> their parts. This is a sound basis upon which to record information
> about function. Without anything else this is useful.
I don't understand how that is different from what UniProt intends to do.
On Tue, Mar 24, 2009 at 1:28 PM, Alan Ruttenberg
<alanrut...@gmail.com> wrote:
> I'm not sure what to say other than "think harder".
I'm not sure what to say other than "try harder to communicate what you mean".
On Tue, Mar 24, 2009 at 2:19 PM, Alan Ruttenberg
<alanrut...@gmail.com> wrote:
>
> We're meeting in a couple of hours. Let's talk then.
I'm sure we will.
> I'm willing to
> try harder, though I'm mystified as to why this particular topic comes
> up over and over.
Maybe because the issue has never been resolved?
> Can you bring a use case to the table so we have
> something concrete that we can discuss the matter in reference to?
OK, how about:
http://www.reactome.org/cgi-bin/eventbrowser?DB=gk_current&FOCUS_SPECIES=Homo%20sapiens&ID=177934&
> Remember, the BioPAX OBO effort is an effort to build an *ontology*,
> and the job of an ontology, at least the sort we work on in the
> foundry, is to make it very clear what we are talking about. And it is
> very clear that some bits and a protein are different sorts of things.
Well, I've never heard about a project with the goal of being unclear.
> As one more example, consider the fact that there are many resources
> that have records about protein. How do you suggest that we say that
> both some of the protein identifiers enumerated at
>
> http://www.ncbi.nlm.nih.gov/sites/entrez?Db=gene&Cmd=ShowDetailView&TermToSearch=7157&ordinalpos=1&itool=EntrezSystem2.PEntrez.Gene.Gene_ResultsPanel.Gene_RVDocSum#refseq
>
> and
>
> http://www.uniprot.org/uniprot/P04637
>
> refer to the same thing? With owl:sameAs? Seems that would be wrong,
> wouldn't it?
>
> Shouldn't there be something that they are both *about*? And shouldn't
> we document exactly what those things are?
BioPAX already has XREF and its subproperties, with entities as the
domain and records as the range. So BioPAX clearly does distinguish
between entity and record.
What is the alternative? Have PRO include a class for any variation
of a protein you may possibly be interested in? Is PRO going to
provide a class for UniProt's P04637 with 44 subclasses? Or 44*35?
What if a protein has 10 potential phosphorylation sites; is PRO
going to provide 2^10 subclasses?
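The arithmetic behind the 2^10 figure: if each of n sites can independently be phosphorylated or not, there are 2^n distinct modification states, each of which would need its own named subclass if every form were enumerated in advance. A minimal sketch, with made-up site labels:

```python
from itertools import product


def modification_states(sites):
    """Enumerate every on/off combination of the given modification sites."""
    return [
        frozenset(site for site, on in zip(sites, flags) if on)
        for flags in product((False, True), repeat=len(sites))
    ]


# Hypothetical site labels; not taken from any real protein record.
sites = [f"site{i}" for i in range(1, 11)]
states = modification_states(sites)
print(len(states))  # 1024 == 2**10
```

Each state here is described by the set of modified sites it carries, which is the kind of compositional description that avoids precomputing one class per combination.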
We're meeting in a couple of hours. Let's talk then. I'm willing to
try harder, though I'm mystified as to why this particular topic comes
up over and over. Can you bring a use case to the table so we have
something concrete that we can discuss the matter in reference to?
Remember, the BioPAX OBO effort is an effort to build an *ontology*,
and the job of an ontology, at least the sort we work on in the
foundry, is to make it very clear what we are talking about. And it is
very clear that some bits and a protein are different sorts of things.
As one more example, consider the fact that there are many resources
that have records about protein. How do you suggest that we say that
both some of the protein identifiers enumerated at
http://www.ncbi.nlm.nih.gov/sites/entrez?Db=gene&Cmd=ShowDetailView&TermToSearch=7157&ordinalpos=1&itool=EntrezSystem2.PEntrez.Gene.Gene_ResultsPanel.Gene_RVDocSum#refseq
and
http://www.uniprot.org/uniprot/P04637
refer to the same thing? With owl:sameAs? Seems that would be wrong,
wouldn't it?
Shouldn't there be something that they are both *about*? And shouldn't
we document exactly what those things are?
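A toy sketch of the contrast drawn here, using plain Python triples instead of an RDF library. The "ex:" predicates and the record ids are hypothetical and are not BioPAX, PRO, or OWL vocabulary. Linking each record to the entity it describes lets one ask which records are about the same thing; merging the records themselves, as owl:sameAs would, makes statements about one record hold of the other.

```python
# Hypothetical identifiers: "ex:" predicates and the record ids are made up for
# illustration; they are not BioPAX, PRO, or OWL vocabulary.
TRIPLES = {
    ("refseq:REC1", "ex:isAbout", "ex:protein_X"),
    ("uniprot:REC2", "ex:isAbout", "ex:protein_X"),
    ("refseq:REC1", "ex:maintainedBy", "NCBI"),
    ("uniprot:REC2", "ex:maintainedBy", "EBI"),
}


def records_about(triples, entity):
    """Records linked to the entity they describe."""
    return sorted(s for s, p, o in triples if p == "ex:isAbout" and o == entity)


def objects(triples, subject, predicate):
    """All values of predicate for subject."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def merge_same_as(triples, a, b):
    """Naive owl:sameAs-style merge: copy statements about a onto b and vice versa.

    Only the subject position is handled; a full OWL reasoner would also
    substitute a and b where they appear as objects.
    """
    merged = set(triples)
    for s, p, o in triples:
        if s == a:
            merged.add((b, p, o))
        elif s == b:
            merged.add((a, p, o))
    return merged


print(records_about(TRIPLES, "ex:protein_X"))
# ['refseq:REC1', 'uniprot:REC2']

merged = merge_same_as(TRIPLES, "refseq:REC1", "uniprot:REC2")
print(objects(merged, "refseq:REC1", "ex:maintainedBy"))
# ['EBI', 'NCBI']: the NCBI record now also appears to be maintained by the EBI
```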
-Alan
On Tue, Mar 24, 2009 at 2:06 PM, Oliver Ruebenacker <cur...@gmail.com> wrote:
>
> Hello Alan, All,
>
> On Tue, Mar 24, 2009 at 1:28 PM, Alan Ruttenberg
> <alanrut...@gmail.com> wrote:
>> I'm not sure what to say other than "think harder".
>
> I'm not sure what to say other than "try harder to communicate what you mean".
>
I am sure that, for the intents and purposes of the NCBI and the EBI,
the two are not equivalent. Certainly neither of them considers itself
redundant. Does "equivalent" mean that one should be shut down? So I
contest the "all" in your claim.
Are you suggesting that one *does* use sameAs on these two identifiers?
-Alan
Michel,
You are flaming away at this issue, about which there has been much
discussion. If you meant to speak only for yourself, you could have
made that clearer.
s/They are, for all intensive purposes (of the perspective of a
biological scientist), equivalent/They are, for all intensive purposes
(from my perspective as a biological scientist), equivalent/
If I misinterpreted this message, I apologize. However, you seem to be
speaking rather globally about this issue, here and on other forums.
For example:
"In the life sciences, scientists don't care about database records -
they care about the molecules and the biological processes for which
facts have been collected about."
Even this is overbroad. For the computational biologists I have worked
with, for whom repeatable results are only possible against a
particular version of a record or genome build, records *are* important.
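A minimal sketch of what pinning a record version can look like, assuming identifiers carry an "accession.version" suffix (RefSeq uses this convention). The accession "ABC123" is made up, and the helper names are hypothetical.

```python
from typing import NamedTuple


class RecordRef(NamedTuple):
    accession: str
    version: int


def parse_versioned(ref: str) -> RecordRef:
    """Split an 'accession.version' string into its two parts."""
    accession, sep, version = ref.rpartition(".")
    if not sep or not accession or not version.isdigit():
        raise ValueError(f"not a versioned accession: {ref!r}")
    return RecordRef(accession, int(version))


def same_record_version(a: str, b: str) -> bool:
    """True only if both references pin the same accession and the same version."""
    return parse_versioned(a) == parse_versioned(b)


# "ABC123" is a made-up accession used only for illustration.
print(parse_versioned("ABC123.2"))                  # RecordRef(accession='ABC123', version=2)
print(same_record_version("ABC123.2", "ABC123.3"))  # False
```

An analysis that records the full versioned reference can later be rerun against exactly the same record, even after the record has been revised.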
I also want to clarify a comment you made about shared names (please
consider writing to our mailing list, where discussion of the effort
happens):
"I, like several others, am interested to see how the committee will
"make sure that its URIs ... resolve to information that is useful". I
expect that this will be challenging to establish utility,
particularly in the context of a term contained in an expressive
ontology."
First, the domain of the shared names effort is database records, not
entities named in ontologies. The "useful" information we have
specified and agreed upon concerns how to find a type assertion,
pointers to specific encodings of the record, and pointers to
third-party metadata about the record. It is not within scope, for
instance, to translate Entrez Gene to RDF. The scope of shared names is
simply to point to any such encodings in a way that lets clients decide
which of them they want to retrieve.
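A sketch of the three kinds of information listed above (a type assertion, pointers to encodings, pointers to third-party metadata) and of a client choosing which encoding to fetch. The field names, media types, and example.org URLs are hypothetical; this is not the shared names specification.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ResolvedName:
    """Hypothetical shape of what a shared-names URI resolves to."""
    uri: str
    record_type: str                 # the type assertion about the record
    encodings: dict[str, str]        # media type -> URL of one encoding of the record
    third_party_metadata: list[str] = field(default_factory=list)


def choose_encoding(resolved: ResolvedName, preferred: list[str]) -> str | None:
    """Return the URL of the first encoding whose media type the client prefers."""
    for media_type in preferred:
        if media_type in resolved.encodings:
            return resolved.encodings[media_type]
    return None


name = ResolvedName(
    uri="http://example.org/record/123",
    record_type="ex:GeneRecord",
    encodings={
        "application/rdf+xml": "http://example.org/record/123.rdf",
        "text/html": "http://example.org/record/123.html",
    },
)
print(choose_encoding(name, ["text/turtle", "application/rdf+xml"]))
# http://example.org/record/123.rdf
```

The resolver only points at encodings; producing an RDF translation of a record stays outside its job, which is left to whoever publishes that encoding.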
I'm happy to talk with you in more detail about the effort. But I
would prefer that you attempt to understand its scope and content
before passing judgement.
Incidentally, Marc-Alexandre is on our steering committee,
representing, to the extent he can, Bio2RDF.