last one for tonight

Alan Ruttenberg

Jun 1, 2006, 1:21:25 AM

to Ken I Fukuda, BioPAX Manchester, Emek Demir

I promise.

There are many empty sequenceLocations, such as the following.
Apparently BioPAX is not expressive enough to support the site
description (C-terminal serine phosphorylation).

I think one of two things should happen:
a) Don't create a sequence location instance, and leave the feature
location blank.

b) Create the sequence location, but move the Site:Ser and COOH-
terminal comments to the sequenceLocation, since they really are
commenting on the location of the feature.

Are these types of sequence location specifications handled in the
new state/generic proposals?

<sequenceFeature rdf:ID="id1716583517_R_smad_pS_sf1">
<FEATURE-LOCATION rdf:resource="#id1716583517_R_smad_pS_sl1"/>
<FEATURE-TYPE rdf:resource="#phosphorylated"/>
<COMMENT>AttributeSetNumber:1</COMMENT>
<COMMENT>Site:Ser</COMMENT>
<COMMENT>COOH-terminal</COMMENT>
</sequenceFeature>
<sequenceLocation rdf:ID="id1716583517_R_smad_pS_sl1"/>
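
To find these cases in an export, something like the following Python sketch may help. It is not part of any BioPAX tooling. It lists each sequenceFeature whose FEATURE-LOCATION points at a sequenceLocation with no content, along with the comments that option b) would move. The rdf:RDF wrapper and the default namespace are assumptions, because the fragment above omits its namespace declarations, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Assumption: BioPAX Level 2 default namespace; the fragment above does not show it.
BP = "http://www.biopax.org/release/biopax-level2.owl#"

# The fragment from the message, wrapped in an assumed rdf:RDF root so it parses.
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns="{BP}">
  <sequenceFeature rdf:ID="id1716583517_R_smad_pS_sf1">
    <FEATURE-LOCATION rdf:resource="#id1716583517_R_smad_pS_sl1"/>
    <FEATURE-TYPE rdf:resource="#phosphorylated"/>
    <COMMENT>AttributeSetNumber:1</COMMENT>
    <COMMENT>Site:Ser</COMMENT>
    <COMMENT>COOH-terminal</COMMENT>
  </sequenceFeature>
  <sequenceLocation rdf:ID="id1716583517_R_smad_pS_sl1"/>
</rdf:RDF>"""


def empty_location_features(xml_text):
    """Map feature ID -> comments, for features whose location element is empty."""
    root = ET.fromstring(xml_text)
    # A sequenceLocation with no children and no text carries no position data.
    empty_ids = {
        loc.get(f"{{{RDF}}}ID")
        for loc in root.iter(f"{{{BP}}}sequenceLocation")
        if len(loc) == 0 and not (loc.text or "").strip()
    }
    result = {}
    for feat in root.iter(f"{{{BP}}}sequenceFeature"):
        ref = feat.find(f"{{{BP}}}FEATURE-LOCATION")
        if ref is None:
            continue
        # rdf:resource holds "#<id>"; strip the fragment marker to compare IDs.
        target = ref.get(f"{{{RDF}}}resource", "").lstrip("#")
        if target in empty_ids:
            result[feat.get(f"{{{RDF}}}ID")] = [
                c.text for c in feat.findall(f"{{{BP}}}COMMENT")
            ]
    return result


if __name__ == "__main__":
    for feature_id, comments in empty_location_features(SAMPLE).items():
        print(feature_id, comments)
```

The sketch only reports candidates; whether to drop the empty location (option a) or move the comments onto it (option b) is left to the exporter.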

Regards,
Alan

Emek Demir

Jun 1, 2006, 12:17:54 PM

to Alan Ruttenberg, Ken I Fukuda, BioPAX Manchester

Alan,

This can have two meanings:

Any of many serines on the C-terminal (see the generic proposal,
semiquantitative modifications, for a repeated phosphorylation element
example). Call this "I don't care".

One particular serine on the C-terminal, but I do not know which.
Call this "I don't know".

You are right, this should be addressed on the sequenceLocation rather
than on the sequenceFeature. In the latter case, we must create a sequence
location object with unknown start and end points, and with the comment
"c-terminal serine". As for the former, I do not have a good solution,
but that case is quite rare. I am not overly concerned about it, though,
as we can always extend sequence location in the future if we have a lot
of data in that form. We are not stuck, but it would be nice to be able
to address those cases.

I hope this helps,
ED

Alan Ruttenberg wrote:
> I promise.
>
> There are many empty sequenceLocations, such as the following.
> Apparently the situation is that there is not enough expressivity in
> BioPAX to support the site description (C terminal Serine
> phosphorylation)
>
> I think 1 of 2 things should happen.
> a) Don't create a sequence location instance and leave feature
> location blank
>
> b) Create the sequence location, but move the Site:Ser and

> COOH-terminal comments to the sequenceLocation, since they really are

Alan Ruttenberg

Jun 1, 2006, 12:38:12 PM

to Emek Demir, Ken I Fukuda, BioPAX Manchester, biopax-...@cbio.mskcc.org

On Jun 1, 2006, at 12:17 PM, Emek Demir wrote:

> This can have two meanings:
>
> Any of many serines on the C-terminal. ( see generic proposal-
> semiquantitative modifications for a repeated phosphorylation
> element example) ( I don't care)

Is there a generic proposal to see? Have you posted it yet?

> One particular serine on the C-terminal, but I do not know which.
> ( I don't know)

It usually means this. Often things that sound like generics are
actually underspecified. Does the generic proposal address this
distinction?

> You are right, this should be addressed in the sequenceLocation but
> not on sequenceFeature. In the latter case, we must create a
> sequence location object with unknown start - end points, and with
> comment "c-terminal serine". As for the former, I do not have a
> good solution, but this particular one is quite rare.

Sorry, am not sure which "former" you are referring to. What is it
that is rare?

> I am not overly concerned about it though, as we can always extend
> sequence location in the future, if we have a lot of data in that
> form. We are not getting stuck. But it would be nice to be able to
> address those.

There is a fair amount of information that talks about sequence
location in fuzzy terms. I think we should either find an existing
ontology of such terms (sequence ontology has some subclasses of
region, but not the term we are looking for) or define enough of them
so that, at a minimum, none of INOH needs to be put in comments.

Ken, is it easy to put together a list of which location terms you use?

-Alan

Emek Demir

Jun 1, 2006, 1:02:07 PM

to Alan Ruttenberg, Ken I Fukuda, BioPAX Manchester, biopax-...@cbio.mskcc.org

> Is there a generic proposal to see? Have you posted it yet?
>

Sorry, I was not thinking clearly. I meant
http://biopaxwiki.org/cgi-bin/moin.cgi/GenericParticipants. The Generic
proposal (merged with the states proposal) will be ready soon; I am
working on it around the clock.

>> One particular serine on the C-terminal, but I do not know which. ( I
>> don't know)
>
> It usually means this. Often things that sound like generics are
> actually underspecified. Does the generic proposal address this
> distinction?

It is not explicitly stated, but I think my discussion below is a
solution within the existing BioPAX.

>
>
>> You are right, this should be addressed in the sequenceLocation but
>> not on sequenceFeature. In the latter case, we must create a sequence
>> location object with unknown start - end points, and with comment
>> "c-terminal serine". As for the former, I do not have a good
>> solution, but this particular one is quite rare.
>
> Sorry, am not sure which "former" you are referring to. What is it
> that is rare?

The former is "I don't care"; the latter is "I don't know". "I don't
care" is rare, as you also pointed out.

>
>> I am not overly concerned about it though, as we can always extend
>> sequence location in the future, if we have a lot of data in that
>> form. We are not getting stuck. But it would be nice to be able to
>> address those.
>
> There is a fair amount of information that talks about sequence
> location in fuzzy terms.

OK, I was not clear again. I was not overly concerned about not
representing "I don't care" statements of this form. As for the "I don't
know" statements, I think we can either:
a) just leave the sequence location empty, or
b) introduce a fuzzy location class. This can be done in two ways:
specifying a CV term that defines the region (a domain name or region
name, e.g. C-terminal), or specifying a "somewhere in between" location
(e.g. somewhere between 30 and 60).
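
As a rough illustration of the fuzzy location idea, here is a minimal Python sketch. It is only one possible shape for such a class: the name FuzzyLocation and all of its fields are hypothetical and do not exist in BioPAX. It covers both variants mentioned above, a CV term naming the region and a "somewhere between" range.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FuzzyLocation:
    """Hypothetical "I don't know exactly where" location; not a BioPAX class."""

    region_term: Optional[str] = None  # CV term for the region, e.g. "C-terminal"
    residue: Optional[str] = None      # residue type if known, e.g. "Ser"
    earliest: Optional[int] = None     # lowest position the site could occupy
    latest: Optional[int] = None       # highest position the site could occupy

    def __post_init__(self):
        # A range needs both bounds, in order.
        if (self.earliest is None) != (self.latest is None):
            raise ValueError("give both range bounds or neither")
        if self.earliest is not None and self.earliest > self.latest:
            raise ValueError("earliest must not exceed latest")
        # An entirely empty location says nothing; require at least one hint.
        if self.region_term is None and self.earliest is None:
            raise ValueError("need a region term or a position range")

    def may_contain(self, position: int) -> bool:
        """True unless the position range rules this position out."""
        if self.earliest is None:
            return True  # only a region term is known, so nothing is excluded
        return self.earliest <= position <= self.latest


# Variant 1: a CV term defining the region.
c_terminal_serine = FuzzyLocation(region_term="C-terminal", residue="Ser")
# Variant 2: somewhere between two positions.
between_30_and_60 = FuzzyLocation(earliest=30, latest=60)
```

Rejecting a location with neither a term nor a range is a design choice made for this sketch; under option a) above, the exporter would simply omit the location instead.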

> I think we should either find an existing ontology of such terms
> (sequence ontology has some subclasses of region, but not the term we
> are looking for) or define enough of them so that, at a minimum, none
> of INOH needs to be put in comments.
>

Such a CV would be good. I believe the best thing would be to adopt the
UniProt <FT> line keywords; they are fairly well thought out. What do you
think?

Here is the link :
http://ca.expasy.org/sprot/userman.html#FT_line

Best,
ED

Ken I Fukuda

Jun 1, 2006, 1:54:32 PM

to Alan Ruttenberg, BioPAX Manchester, Emek Demir

Hi there,

> There are many empty sequenceLocations, such as the following.
> Apparently the situation is that there is not enough expressivity in
> BioPAX to support the site description (C terminal Serine
> phosphorylation)
>
> I think 1 of 2 things should happen.
> a) Don't create a sequence location instance and leave feature
> location blank
>
> b) Create the sequence location, but move the Site:Ser and COOH-
> terminal comments to the sequenceLocation, since they really are
> commenting on the location of the feature.

Yes, I agree 100%. For the moment, we will take option a).

> Are these types of sequence location specifications handled in the
> new state/generic proposals?
>
> <sequenceFeature rdf:ID="id1716583517_R_smad_pS_sf1">
> <FEATURE-LOCATION rdf:resource="#id1716583517_R_smad_pS_sl1"/>
> <FEATURE-TYPE rdf:resource="#phosphorylated"/>
> <COMMENT>AttributeSetNumber:1</COMMENT>
> <COMMENT>Site:Ser</COMMENT>
> <COMMENT>COOH-terminal</COMMENT>
> </sequenceFeature>
> <sequenceLocation rdf:ID="id1716583517_R_smad_pS_sl1"/>

There are other types of sequence locations.
I will attach a list in a separate e-mail (by replying to Emek's post).

Ken

>
> Regards,
> Alan
>

---------------------------------------------
Ken Ichiro Fukuda, Ph.D.
Computational Biology Research Center (CBRC)
National Institute of
Advanced Industrial Science and Technology (AIST)
AIST Tokyo Waterfront Bio-IT Research Bldg. 10F
2-42 Aomi, Koutou-ku, Tokyo 135-0064 JAPAN
Phone: +81-3-3599-8049 FAX: +81-3-3599-8081
fukud...@aist.go.jp - http://www.cbrc.jp/~fukuda/index.html
- INOH Pathway Database Project -
- Integrating Network Objects with Hierarchies
- http://www.inoh.org

Ken I Fukuda

Jun 1, 2006, 2:26:26 PM

to Emek Demir, Alan Ruttenberg, BioPAX Manchester, biopax-...@cbio.mskcc.org

Hi,

Firstly, for those who have not tried our INOH client
http://www.inoh.org/inohblog/main/2006/05/inoh_client_ver106.html
I will elaborate a bit on the INOH original data.

A protein node has a "SequenceFeature" attribute.
The value of this attribute has the following 6 columns.
"AttributeSetNumber":"Type":"Position":"Site":"Status":"Description"

"AttributeSetNumber" is a local ID for the "SequenceFeature" value.
It is required because you can have multiple "SequenceFeature" values.

Currently, "Type" can have the following types:
ADP-ribosylated
cholesterol modified
phosphorylated
DNA_bind
protein_bind
ubiquitinated
glycosylated
palmitoylated
cleavage site

"Position" is position.

For "Site" I attached a list. This information appears in a
BioPAX COMMENT in the form "Site:...".

"Status" comes from PSI-MI and we are not using this column.

A list of "Description" values is also attached to this mail.
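
To make the six-column layout concrete, here is a small Python sketch that splits one "SequenceFeature" value into named fields. The exact serialization is an assumption: the message names the columns but does not show a raw value, so the sketch assumes colon-separated text with empty strings for unused columns. The sample value is made up from the Smad comments seen earlier in the thread, and the names InohSequenceFeature and parse_sequence_feature are hypothetical.

```python
from typing import NamedTuple

COLUMNS = ("AttributeSetNumber", "Type", "Position", "Site", "Status", "Description")


class InohSequenceFeature(NamedTuple):
    attribute_set_number: int  # local ID, needed when a node has several values
    feature_type: str          # e.g. "phosphorylated"
    position: str              # kept as text; may be empty when unknown
    site: str                  # e.g. "Ser"
    status: str                # from PSI-MI; currently unused
    description: str           # e.g. "COOH-terminal"


def parse_sequence_feature(value: str) -> InohSequenceFeature:
    """Split an assumed colon-separated value into its six columns."""
    # Split at most 5 times so colons inside the description are preserved.
    parts = [part.strip() for part in value.split(":", len(COLUMNS) - 1)]
    if len(parts) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {len(parts)}")
    number, *rest = parts
    return InohSequenceFeature(int(number), *rest)


# Made-up sample value: Position and Status are left empty.
example = parse_sequence_feature("1:phosphorylated::Ser::COOH-terminal")
```

With such a record in hand, an exporter could attach "Site" and "Description" to the sequence location rather than to the feature, as discussed earlier in the thread.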

> > I think we should either find an existing ontology of such terms
> > (sequence ontology has some subclasses of region, but not the term we
> > are looking for) or define enough of them so that, at a minimum, none
> > of INOH needs to be put in comments.
> >
> Such a CV would be good. I believe best thing would be adopting Uniprot
> <FT> line keywords, they are fairly well thought. What do you think ?
>
> Here is the link :
> http://ca.expasy.org/sprot/userman.html#FT_line

Actually, INOH's "Type" value is based on UniProt with some modifications.

We have a CV for binding-sites, such as transcription factor binding sites,
motif, etc. But I have to check if we are updating this list.

> Ken, is it easy to put together a list of which location terms you use?
>
> -Alan

So here is the list for site and description.

Ken

all_site_description.txt

Ken I Fukuda

Jun 1, 2006, 2:36:22 PM

to Alan Ruttenberg, Emek Demir, BioPAX Manchester, biopax-...@cbio.mskcc.org

Hi,

About SO (sequence ontology).

> There is a fair amount of information that talks about sequence
> location in fuzzy terms. I think we should either find an existing
> ontology of such terms (sequence ontology has some subclasses of
> region, but not the term we are looking for) or define enough of them
> so that, at a minimum, none of INOH needs to be put in comments.

In Nov 2004 I requested that well-known binding sites such as
"AP-1 binding element", "BoxA element", "CarG box", "CRE binding element",
and "SRE element" be added under SO's "TF_binding_site".

But this was declined for the following reason (quoting Karen's reply):
> 3.
> I personally think that enumerating the different instances of TF_binding_site
> is a bad idea as it will lead to an ever expanding ontology.
> If we look at 'gene' as an example, we do not enumerate all of the different
> genes. Genes have names and ids and are captured outside of the ontology.
> This is the example from GFF3:
>
> ctg123 . gene 1000 9000 . + . ID=gene00001;Name=EDEN
> This example also shows how a TF_binding_site may be marked up:
> ctg123 . TF_binding_site 1000 1012 . + . ID=tfbs00001;Parent=gene00001
>
> It would be possible to give this TF_binding_site a name.
> There exists a profile system called JASPAR which could be used to name the
> sites.
> http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D91
>

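The quoted point is that the name of a site lives in the attributes column and not in the ontology term. A minimal Python sketch of reading such lines follows. It uses the two feature lines from the quoted example, written as nine whitespace-separated columns. Real GFF3 is tab-separated, so splitting on any whitespace is a simplification that fails if an attribute value contains spaces. The function name parse_gff3_line is hypothetical.

```python
GFF3_COLUMNS = (
    "seqid", "source", "type", "start", "end",
    "score", "strand", "phase", "attributes",
)


def parse_gff3_line(line):
    """Parse one GFF3 feature line into a dict, with attributes as a nested dict."""
    # Simplification: GFF3 proper uses tabs; whitespace split tolerates email reflow.
    fields = line.split()
    if len(fields) != len(GFF3_COLUMNS):
        raise ValueError(f"expected {len(GFF3_COLUMNS)} columns, got {len(fields)}")
    record = dict(zip(GFF3_COLUMNS, fields))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    # Column 9 is a ';'-separated list of tag=value pairs such as ID, Name, Parent.
    record["attributes"] = dict(
        pair.split("=", 1) for pair in record["attributes"].split(";") if pair
    )
    return record


gene = parse_gff3_line("ctg123 . gene 1000 9000 . + . ID=gene00001;Name=EDEN")
tfbs = parse_gff3_line(
    "ctg123 . TF_binding_site 1000 1012 . + . ID=tfbs00001;Parent=gene00001"
)

# The site's type is the generic SO term; its identity and parent gene are attributes.
print(tfbs["type"], tfbs["attributes"])
```

Giving the TF_binding_site a name, as the quoted reply suggests, would amount to adding a Name=... pair to that attributes column.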
Ken Fukuda

Jun 1, 2006, 2:40:30 PM

to BioPAX-M...@googlegroups.com

(There were some errors. Apologies for the duplicate posts.)
