Hi Peter, Michel,
First, I just want to note my commit from last Sunday:
commit 3b1a5c12231ddb4fd424019364dd932a893262f5
Author: Jerven Bolleman <m...@jerven.eu>
Date: Sun Dec 1 20:06:40 2013 +0100
Removed section about disulfide brige as it confuses a region with a feature
I agree with Michel that a bond is not a region, i.e. it lives in three dimensions and has its own rules.
The information we have about most bonds is very limited, but something like this could work in UniProt:
SHA:9ED..8EC rdf:type up:Disulfide_Bond_Annotation ;
rdfs:comment "Redox-active" ;
up:disulfideBond … .
… rdf:type up:Disulfide_Bond ;
up:bindsSequenceAt range:22859153199216429tt74tt74, range:22859153199216429tt77tt77 .
range:22859153199216429tt74tt74 a faldo:ExactPosition
etc...
More examples inline
What we see here is four positions and one feature. I am not exactly sure about the
feature, but it could be something like this:
SHA:9ED..8EC rdf:type insdc:Bond ;
insdc:between protein:494379p284, protein:494379p305, protein:494379p309 ;
insdc:heterogen protein:494379heterogenHG,10006 .
What this representation loses is the fact that the heterogen binds twice to the protein:494379p309 position.
If you want to model this, then each bond has to be described separately, i.e.:
protein:494379heterogenHGp10006Bond rdf:type insdc:Bond ;
insdc:heterogen protein:494379heterogenHG,10006 ;
rdfs:comment "HG,1006 " .
protein:494379heterogenHGp10006 rdf:type insdc:Heterogen ;
insdc:binding protein:494379heterogenHGp10006n1, protein:494379heterogenHGp10006n2 , protein:494379heterogenHGp10006n3,protein:494379heterogenHGp10006n4 .
protein:494379heterogenHGp10006n1 rdf:type insdc:HeterogenBinding ;
insdc:to protein:494379p284 .
protein:494379heterogenHGp10006n2 rdf:type insdc:HeterogenBinding ;
insdc:to protein:494379p305 .
protein:494379heterogenHGp10006n3 rdf:type insdc:HeterogenBinding ;
insdc:to protein:494379p309 .
protein:494379heterogenHGp10006n4 rdf:type insdc:HeterogenBinding ;
insdc:to protein:494379p309 .
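To make the difference between the two models concrete, here is a minimal sketch in Python rather than RDF. The names `flat_bond` and `bindings` are mine and not part of any INSDC or FALDO vocabulary; the positions are taken from the example above. An RDF graph is a set of triples, so listing the same position twice on one bond changes nothing, whereas one node per binding keeps the count.

```python
from collections import Counter

# Residue positions bound by the heterogen, one entry per bond
# (positions taken from the example above).
bond_targets = [284, 305, 309, 309]

# Flat model: a single Bond pointing at a set of positions,
# like the insdc:between list. The repeated 309 collapses.
flat_bond = set(bond_targets)

# Per-binding model: one HeterogenBinding node (n1..n4) per bond,
# each pointing at exactly one position.
bindings = {f"n{i}": pos for i, pos in enumerate(bond_targets, start=1)}

# The number of bonds per residue can only be recovered from the second model.
bonds_per_position = Counter(bindings.values())

print(sorted(flat_bond))
print(dict(bonds_per_position))
```

The price of the second model is four extra nodes for one ion, which is probably acceptable given how rare these cases are.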
> or,
>
>
> Het join(bond(268),bond(268),bond(272),bond(268))
> /heterogen="( HG,1014 )"
>
> I believe these are saying there is a heterogen mercury II ion (Hg)
> attached to the protein with four bonds, some of which are to the
> same residues.
>
> (I also noted the NCBI webpage's highlighting of the
> locations in the protein sequence seems to get confused
> on many of these examples).
>
> This seems to be the matching UniProt entry, but it doesn't
> appear to try to capture this heterogen information:
>
> http://www.uniprot.org/uniprot/P69924
>
> Here's the PDB file, where the Hg shows up as HETATM lines,
> along with LINK and/or CONECT lines for the bonds:
>
>
> http://www.pdb.org/pdb/explore/explore.do?structureId=1MRR
>
> http://www.pdb.org/pdb/files/1MRR.pdb
>
> Jerven - are there any similar complex bond examples in UniProt
> we could compare this to?
Yes, but the flat-file annotation process has removed some of the
interesting information. I will ask our 3D expert tomorrow; she should be in.
>
> Note we can't give a FALDO position to the Hg ion here
> (can we?).
>
> My feeling is rather than describing chemical bonds as
> faldo:InBetweenPositions with a start and end, we need
> something symmetric, and more general. Perhaps that
> would be outside FALDO's scope, and it would just need
> to reuse the FALDO position definitions when bonding to
> a protein or nucleotide sequence?
I agree. It gets into the "what", which is important but deliberately
out of scope for FALDO, as otherwise we would end up modelling INSDC, UniProt & Ensembl
plus many others.
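As a rough illustration of that split, the sketch below keeps the position type separate from a bond type that only reuses it. Everything here is an assumption for the sake of the example: `ExactPosition` is a plain stand-in for faldo:ExactPosition, `Bond` is a hypothetical class that exists in no vocabulary, and the reference name "seq" is a placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExactPosition:
    """Stand-in for a faldo:ExactPosition on a reference sequence."""
    reference: str
    position: int


@dataclass(frozen=True)
class Bond:
    """Hypothetical bond: an unordered group of endpoints, no start or end."""
    endpoints: frozenset

    @classmethod
    def between(cls, *positions):
        # A frozenset makes the bond symmetric and allows more than two endpoints.
        return cls(frozenset(positions))


# "seq" is a placeholder reference; 74 and 77 are the disulfide positions above.
c74 = ExactPosition("seq", 74)
c77 = ExactPosition("seq", 77)

# The order of the endpoints does not matter.
print(Bond.between(c74, c77) == Bond.between(c77, c74))
```

Note that a plain set of endpoints cannot say that two bonds go to the same residue, so the mercury case would still need one node per binding.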
>
> (Also, can anyone think of any 'shared' bonds in the protein
> or nucleotide setting which might need more than two end
> points to describe - like the benzene ring?)
>
> Regards,
>
> Peter
Regards,
Jerven
PS. I have to note that I had never heard of the word heterogen before.