MCOFFEE results page


HOUSTON Douglas

Oct 20, 2020, 2:47:30 AM
to tco...@googlegroups.com
Dear T-COFFEE Google group,

Can someone please explain the colours in the highlighting you get on the M-Coffee results page? Large stretches of no alignment are coloured "good", for example (see attached). I've looked at the paper but it says "sequences in red correspond to alignment portions with a strong support in the primary library" ... what does this mean?

Also, what are these numbers?
pdb|6nru|Shigel   :  85
sp|Q16875|F263_   :  58
cons              :  65

And can someone please tell me where I can find an explanation of the SCORE, and what is a high score and a low score? Again, the paper says "A value of a 100 means full agreement between the considered alignment and its associated primary library" - so what does it mean if the score is 652, for example?

Best regards,

_________________________________________________________________
Dr. Douglas R. Houston
Senior Lecturer in Computational Biochemistry
Institute of Quantitative Biology, Biochemistry and Biotechnology
Room 2.12, Waddington 1 Building
King's Buildings
University of Edinburgh
Edinburgh, EH9 3BF, UK
Tel. 07986875743

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
screenshot.jpg

Cedric Notredame

Oct 20, 2020, 5:15:03 AM
to HOUSTON Douglas, tco...@googlegroups.com

Dear Doug,


Thanks for your query. I realize we never properly documented the graphic rendering of the score. It is explained and validated in the recent TCS paper (https://academic.oup.com/mbe/article/31/6/1625/2925802) as well as in the original M-Coffee paper (https://academic.oup.com/nar/article/34/6/1692/2401531), but to be fair, I am not sure we have ever posted a proper tutorial. I am putting this on the to-do list.


Let me start with the first discrepancy: the scores are normalized to 1-100, but because this score is compressed on large alignments, we extended the scale to 1-1000 for the score of the entire alignment. Hence the 652, which should be read as 65.2. We could have made it a float, but several third-party packages, including our own, assume an integer and would have required substantial re-engineering to deal with a float. Unfortunately, this was a very early format decision taken more than 20 years ago.
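
To make the arithmetic concrete, here is a minimal Python sketch of the conversion described above. The function name is hypothetical and is not part of T-Coffee; the only thing it assumes is that the whole-alignment SCORE is an integer on the extended 1-1000 scale.

```python
def alignment_score_to_percent(raw_score: int) -> float:
    """Rescale a whole-alignment SCORE (extended 1-1000 scale) to a 1-100 value.

    Hypothetical helper for illustration only; not a T-Coffee function.
    """
    return raw_score / 10


# The SCORE of 652 from the question reads as 65.2 on the 1-100 scale.
print(alignment_score_to_percent(652))
```

The per-sequence and cons numbers (85, 58, 65 in the example) are already on the 1-100 scale and need no rescaling.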


Now with respect to your example, you have aligned two sequences, and the consistency between the alternative MSAs of these sequences (these alternative alignments ARE the library) is 65.2%. This consistency is a combination of the sequence identity and the actual agreement between these alternative alignments.


In T-Coffee, the library can be whatever one considers relevant (all pairwise alignments in T-Coffee, alternative MSAs in M-Coffee, etc.).


This score reflects the overall stability of the alignments across the various alternative alignments contained in the library. Unaligned regions have a high score when they are consistently unaligned across the library.
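
The idea of agreement across a library, including why an unaligned stretch can still score well, can be illustrated with a toy Python sketch. This is a deliberate simplification and not the actual T-Coffee/TCS computation: it assumes each alternative alignment of two sequences A and B is reduced to a set of aligned residue-index pairs, and it measures support as a plain fraction of library members. All names and data below are hypothetical.

```python
from typing import List, Set, Tuple

Pair = Tuple[int, int]  # (residue index in sequence A, residue index in sequence B)


def pair_support(pair: Pair, library: List[Set[Pair]]) -> float:
    """Fraction of library alignments that contain this aligned pair."""
    return sum(pair in aln for aln in library) / len(library)


def unaligned_support(residue_a: int, library: List[Set[Pair]]) -> float:
    """Fraction of library alignments that leave this residue of A unaligned."""
    unaligned = [all(i != residue_a for i, _ in aln) for aln in library]
    return sum(unaligned) / len(library)


# Toy library: four alternative alignments of the same two sequences.
library = [
    {(0, 0), (1, 1), (2, 2)},
    {(0, 0), (1, 1), (2, 2)},
    {(0, 0), (1, 2)},
    {(0, 0), (1, 1), (2, 2)},
]

print(pair_support((0, 0), library))   # supported by all four alignments
print(pair_support((1, 1), library))   # supported by three of the four
print(unaligned_support(3, library))   # residue 3 of A is never aligned
```

In this toy setting, residue 3 of sequence A gets full support for being unaligned because no alignment in the library pairs it with anything, which is the sense in which a long unaligned region can be coloured "good".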


The cons line gives you an estimate of MSA stability across columns, while the global numbers near the top give an indication of the stability of a given sequence (including cons). These numbers can be normalized in various ways. In the current scheme, the normalization takes place against the sequence length, hence the asymmetry.
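
As a rough illustration of how normalizing against sequence length alone can make two sequences in the same pairwise alignment score differently, consider the following Python sketch. The formula is an assumption made for illustration, not the exact T-Coffee normalization, and the numbers are invented; they are not meant to reproduce the 85 and 58 from the question.

```python
def length_normalized_score(total_support: float, seq_length: int) -> int:
    """Hypothetical per-sequence score: summed residue support over length, scaled to 100."""
    return round(100 * total_support / seq_length)


# Assume both sequences accumulate the same summed support from the
# aligned region (50.0 here), but one sequence is longer than the other.
shared_support = 50.0
print(length_normalized_score(shared_support, 60))  # shorter sequence
print(length_normalized_score(shared_support, 90))  # longer sequence
```

Under this assumption the shorter sequence ends up with the higher number, simply because the same support is divided by a smaller length.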


To be honest, this scheme was never developed for pairwise sequence alignments, but it may give you a (weak) clue as to the most reliable parts of your alignment.


Hope this helps, and thanks for using T-Coffee. Do not hesitate to ask any more questions.


Cheers,


Cedric

-- 
##########################################
Dr Cedric Notredame, PhD
Group Leader
Notredame's lab - Comparative Bioinformatics Group
Bioinformatics and Genomics Programme
Room 440.03

Centre de Regulació Genòmica (CRG)
Dr. Aiguader, 88
08003 Barcelona
Spain

Ph#     + 34 93 316 02 71
Fax#    + 34 93 316 00 99
Mobile# + 34 66 250 47 82

email   cedric.n...@crg.eu
url     www.tcoffee.org
blog    cedricnotredame.blogspot.com
ORC-ID: 0000-0003-1461-0988
###########################################