Complex protein bond 'locations' in FALDO

Peter Cock

Dec 9, 2013, 6:33:49 AM

to fa...@googlegroups.com

Hi all,

One minor quibble I have with the current FALDO text is the disulfide
bond example implicitly seems to assign a start and end to a bond,
yet to my mind that ordering is arbitrary. (Which end of a simple
bond is the start, and which is the end? Surely it is symmetric, which
means there are two equally valid orientations, and thus two ways
to describe the bond in FALDO - which seems bad.)
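
To illustrate the symmetry point: if a bond is stored as an ordered (start, end) pair, the same bond has two distinct encodings, whereas an unordered collection of endpoints has exactly one. A minimal Python sketch; `Bond` is a hypothetical helper (not part of FALDO) and positions 74 and 77 are arbitrary example residues:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bond:
    """Hypothetical helper: a bond stored as an unordered set of endpoints."""

    endpoints: frozenset

    @classmethod
    def between(cls, *positions):
        # frozenset ignores argument order, so both orientations compare equal
        return cls(frozenset(positions))


# Ordered (start, end) tuples: two encodings for one bond.
print((74, 77) == (77, 74))  # False

# Unordered endpoints: a single canonical encoding (equal and same hash).
print(Bond.between(74, 77) == Bond.between(77, 74))  # True
```

Note that a frozenset also collapses repeated endpoints, so on its own it would not be enough for the Hg features below, where the same residue appears more than once.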

Here's a more complex example of a protein modification where
heterogens are attached by one or more bonds to protein chains.

I was looking at this specific example because a Biopython parser
problem was recently reported for this sequence, where we have
an error with some of the bond locations:
http://lists.open-bio.org/pipermail/biopython-dev/2013-December/010995.html
...
http://lists.open-bio.org/pipermail/biopython-dev/2013-December/010996.html

See:

http://www.ncbi.nlm.nih.gov/protein/494379 (1MRR chain A)
http://www.ncbi.nlm.nih.gov/protein/494380 (1MRR chain B)

I was wondering what INSDC locations like this mean,
and how we would express them with FALDO, e.g.:

Het join(bond(284),bond(305),bond(309),bond(305))
/heterogen="( HG,1006 )"

or,

Het join(bond(268),bond(268),bond(272),bond(268))
/heterogen="( HG,1014 )"

I believe these are saying there is a heterogen mercury II ion (Hg)
attached to the protein with four bonds, some of which are to the
same residues.
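
Following that reading, each location is a multiset of bonded residues: the order of the bond() terms carries no meaning, but repeats do. A rough Python sketch, assuming the location only ever contains plain bond(N) terms (the regex does not handle ranges, complement() or remote accessions):

```python
import re
from collections import Counter

# Matches simple bond(N) terms inside an INSDC-style join(...)
BOND_TERM = re.compile(r"bond\((\d+)\)")


def bond_sites(location):
    """Return a Counter mapping residue position -> number of bonds to it.

    Assumes a join() of simple bond(N) terms, as in the 1MRR Het features.
    """
    return Counter(int(pos) for pos in BOND_TERM.findall(location))


hg_1006 = bond_sites("join(bond(284),bond(305),bond(309),bond(305))")
hg_1014 = bond_sites("join(bond(268),bond(268),bond(272),bond(268))")
print(sorted(hg_1006.items()))  # [(284, 1), (305, 2), (309, 1)]
print(sorted(hg_1014.items()))  # [(268, 3), (272, 1)]
```

Both features come out as four bonds in total, with residue 305 (HG,1006) and residue 268 (HG,1014) bonded more than once.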

(I also noted the NCBI webpage's highlighting of the
locations in the protein sequence seems to get confused
on many of these examples).

This seems to be the matching UniProt entry, but it doesn't
appear to try to capture this heterogen information:
http://www.uniprot.org/uniprot/P69924

Here's the PDB file, where the Hg shows up as HETATM lines,
along with LINK and/or CONECT lines for the bonds:

http://www.pdb.org/pdb/explore/explore.do?structureId=1MRR
http://www.pdb.org/pdb/files/1MRR.pdb

Jerven - are there any similar complex bond examples in UniProt
we could compare this to?

Note we can't give a FALDO position to the Hg ion here
(can we?).

My feeling is rather than describing chemical bonds as
faldo:InBetweenPositions with a start and end, we need
something symmetric, and more general. Perhaps that
would be outside FALDO's scope, and it would just need
to reuse the FALDO position definitions when bonding to
a protein or nucleotide sequence?

(Also, can anyone think of any 'shared' bonds in the protein
or nucleotide setting which might need more than two end
points to describe - like the benzene ring?)

Regards,

Peter

Michel Dumontier

Dec 9, 2013, 1:41:50 PM

to Peter Cock, fa...@googlegroups.com

The disulfide bond isn't really a region as much as an interaction. An interaction can be an object (has-part), a process (has-participant), or an association (refers-to). SIO lists the disulfide bond as an object, so the relation is has-part.

--

Michel Dumontier

Associate Professor of Medicine (Biomedical Informatics), Stanford University

Chair, W3C Semantic Web for Health Care and the Life Sciences Interest Group

http://dumontierlab.com

Jerven Bolleman

Dec 9, 2013, 3:36:58 PM

to Michel Dumontier, Peter Cock, fa...@googlegroups.com

Hi Peter, Michel,

First, I just want to note my commit from last Sunday:

commit 3b1a5c12231ddb4fd424019364dd932a893262f5
Author: Jerven Bolleman <m...@jerven.eu>
Date: Sun Dec 1 20:06:40 2013 +0100

Removed section about disulfide bridge as it confuses a region with a feature

I agree with Michel that a bond is not a region; it is a 3D concept with its own rules.
The information we have about most bonds is very limited but something like this could work in UniProt.

SHA:9ED..8EC rdf:type up:Disulfide_Bond_Annotation ;
rdfs:comment "Redox-active" ;
up:disulfideBond … .
… rdf:type up:Disulfide_Bond ;
up:bindsSequenceAt range:22859153199216429tt74tt74, range:22859153199216429tt77tt77 .
range:22859153199216429tt74tt74 a faldo:ExactPosition
etc...
More examples inline

What we see here is four positions and one feature. I'm not exactly sure about the
feature, but it could be something like this:

SHA:9ED..8EC rdf:type insdc:Bond ;
insdc:between protein:494379p284, protein:494379p305, protein:494379p309 ;
insdc:heterogen protein:494379heterogenHGp10006 .

What this representation loses is the fact that the heterogen binds twice to the protein:494379p305 position.
If you want to model this, then each bond must be separate, i.e.:

protein:494379heterogenHGp10006Bond rdf:type insdc:Bond ;
insdc:heterogen protein:494379heterogenHGp10006 ;
rdfs:comment "HG,1006" .
protein:494379heterogenHGp10006 rdf:type insdc:Heterogen ;
insdc:binding protein:494379heterogenHGp10006n1, protein:494379heterogenHGp10006n2 , protein:494379heterogenHGp10006n3,protein:494379heterogenHGp10006n4 .
protein:494379heterogenHGp10006n1 rdf:type insdc:HeterogenBinding ;
insdc:to protein:494379p284 .
protein:494379heterogenHGp10006n2 rdf:type insdc:HeterogenBinding ;
insdc:to protein:494379p305 .
protein:494379heterogenHGp10006n3 rdf:type insdc:HeterogenBinding ;
insdc:to protein:494379p309 .
protein:494379heterogenHGp10006n4 rdf:type insdc:HeterogenBinding ;
insdc:to protein:494379p305 .
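
The one-node-per-bond pattern can be generated mechanically from the flatfile location, which keeps the repeated residue without any special casing. A Python sketch that emits Turtle in the style of the example; the insdc: and protein: prefixes, the class and property names and the node naming scheme are all hypothetical, taken from this email rather than from any published vocabulary:

```python
import re

# Matches simple bond(N) terms inside an INSDC-style join(...)
BOND_TERM = re.compile(r"bond\((\d+)\)")


def heterogen_bindings(accession, het_id, location):
    """Emit one insdc:HeterogenBinding node per bond() term (hypothetical vocabulary).

    Repeated positions get separate binding nodes, so bonding twice
    to the same residue is not collapsed.
    """
    het = f"protein:{accession}heterogen{het_id}"
    positions = [int(p) for p in BOND_TERM.findall(location)]
    nodes = [f"{het}n{i}" for i in range(1, len(positions) + 1)]
    lines = [f"{het} rdf:type insdc:Heterogen ;",
             f"    insdc:binding {', '.join(nodes)} ."]
    for node, pos in zip(nodes, positions):
        lines.append(f"{node} rdf:type insdc:HeterogenBinding ;")
        lines.append(f"    insdc:to protein:{accession}p{pos} .")
    return "\n".join(lines)


print(heterogen_bindings("494379", "HGp1006",
                         "join(bond(284),bond(305),bond(309),bond(305))"))
```

Because the binding nodes are numbered by their order in the join(), the output still has an arbitrary ordering in the node names, but no ordering is asserted between the bonded residues themselves.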

> or,
>
>
> Het join(bond(268),bond(268),bond(272),bond(268))
> /heterogen="( HG,1014 )"
>
> I believe these are saying there is a heterogen mercury II ion (Hg)
> attached to the protein with four bonds, some of which are to the
> same residues.
>
> (I also noted the NCBI webpage's highlighting of the
> locations in the protein sequence seems to get confused
> on many of these examples).
>
> This seems to be the matching UniProt entry, but it doesn't
> appear to try to capture this heterogen information:
> http://www.uniprot.org/uniprot/P69924
>
> Here's the PDB file, where the Hg shows up as HETATM lines,
> along with LINK and/or CONECT lines for the bonds:
>
> http://www.pdb.org/pdb/explore/explore.do?structureId=1MRR
> http://www.pdb.org/pdb/files/1MRR.pdb
>
> Jerven - are there any similar complex bond examples in UniProt
> we could compare this to?

Yes, but the flatfile annotation process has removed some of the
interesting information. I will ask our 3D expert tomorrow; she should be in.

>
> Note we can't give a FALDO position to the Hg ion here
> (can we?).
>
> My feeling is rather than describing chemical bonds as
> faldo:InBetweenPositions with a start and end, we need
> something symmetric, and more general. Perhaps that
> would be outside FALDO's scope, and it would just need
> to reuse the FALDO position definitions when bonding to
> a protein or nucleotide sequence?

I agree it gets into the 'what', which is important but deliberately
out of scope for FALDO, since we end up modelling INSDC, UniProt & Ensembl
plus many others.

>
> (Also, can anyone think of any 'shared' bonds in the protein
> or nucleotide setting which might need more than two end
> points to describe - like the benzene ring?)
>
> Regards,
>
> Peter

Regards,
Jerven

PS. I have to admit I had never heard the word heterogen before.

Peter Cock

Dec 9, 2013, 4:02:23 PM

to Jerven Bolleman, Michel Dumontier, fa...@googlegroups.com

On Mon, Dec 9, 2013 at 8:36 PM, Jerven Bolleman <m...@jerven.eu> wrote:
> Hi Peter, Michel,
>
> First, I just want to note my commit from last Sunday:
>
> commit 3b1a5c12231ddb4fd424019364dd932a893262f5
> Author: Jerven Bolleman <m...@jerven.eu>
> Date: Sun Dec 1 20:06:40 2013 +0100
>
> Removed section about disulfide bridge as it confuses
> a region with a feature

Oh good - I missed that :)

> I agree with Michel that a bond is not a region; it is
> a 3D concept with its own rules.

Yes, that was my concern.

> The information we have about most bonds is very limited
> but something like this could work in UniProt.
>
> SHA:9ED..8EC rdf:type up:Disulfide_Bond_Annotation ;
> rdfs:comment "Redox-active" ;
> up:disulfideBond … .
> … rdf:type up:Disulfide_Bond ;
> up:bindsSequenceAt range:22859153199216429tt74tt74, range:22859153199216429tt77tt77 .
> range:22859153199216429tt74tt74 a faldo:ExactPosition
> etc...

Looks better :)

>> http://www.ncbi.nlm.nih.gov/protein/494379 (1MRR chain A)
>> http://www.ncbi.nlm.nih.gov/protein/494380 (1MRR chain B)
>>
>> I was wondering what INSDC locations like this mean,
>> and how we would express them with FALDO, e.g.:
>>
>> Het join(bond(284),bond(305),bond(309),bond(305))
>> /heterogen="( HG,1006 )"
>
> What we see here is four positions and one feature. I'm not exactly
> sure about the feature, but it could be something like this:
>
> SHA:9ED..8EC rdf:type insdc:Bond ;
> insdc:between protein:494379p284, protein:494379p305, protein:494379p309 ;
> insdc:heterogen protein:494379heterogenHGp10006 .
>
> What this representation loses is the fact that the heterogen
> binds twice to the protein:494379p305 position.
> If you want to model this, then each bond must be separate, i.e.:

Yes - lots of things to be careful of :(

>> Jerven - are there any similar complex bond examples in
>> UniProt we could compare this to?
>
> Yes, but the flatfile annotation process has removed some of the
> interesting information. I will ask our 3D expert tomorrow; she
> should be in.

Fingers crossed.

>> Note we can't give a FALDO position to the Hg ion here
>> (can we?).
>>
>> My feeling is rather than describing chemical bonds as
>> faldo:InBetweenPositions with a start and end, we need
>> something symmetric, and more general. Perhaps that
>> would be outside FALDO's scope, and it would just need
>> to reuse the FALDO position definitions when bonding to
>> a protein or nucleotide sequence?
>
> I agree it gets into the 'what', which is important but deliberately
> out of scope for FALDO, since we end up modelling INSDC,
> UniProt & Ensembl plus many others.