VectorSiteContainer::getSequencePosition()


omotoso olatunde

May 20, 2021, 12:05:37 AM
to Bio++ Usage Help Forum
Good morning,

I am having trouble running CoMap with my "option_file" for a co-substitution analysis.
Here is the standard output:
***********************************************************
* This is CoMap        version 1.5.2       date: 11/03/15 *
*     A C++ shell program to detect co-evolving sites.    *
***********************************************************

Parsing options:
Parsing file option_file for options.


-*- Retrieve data and model -*-

Tree file..............................: /media/science/enhancer/workfile/myedit/comaptree/ENSG00000284701.txt.tree
Number of leaves.......................: 25
Number of sons at root.................: 2
Alphabet type .........................: DNA
Sequence file .........................: /media/science/enhancer/workfile/myedit/comapseq/ENSG00000284701_NT_filter.fasta
Sequence format .......................: FASTA file
Sites to use...........................: all
Remove sites with gaps.................:
[======================================] 100%Done.
Remove unresolved sites................:
[======================================] 100%Done.
Heterogeneous model....................: no
Substitution model.....................: GTR
External frequencies initialization for: None
Parameter found........................: GTR.a=1
Parameter found........................: GTR.b=1
Parameter found........................: GTR.c=1
Parameter found........................: GTR.d=1
Parameter found........................: GTR.e=1
Constraint match at parameter GTR.theta, badValue = 1.000000 ]1e-06; 0.999999[
Parameter found........................: GTR.theta=0.999999
Constraint match at parameter GTR.theta1, badValue = 1.000000 ]1e-06; 0.999999[
Parameter found........................: GTR.theta1=0.999999
Constraint match at parameter GTR.theta2, badValue = 1.000000 ]1e-06; 0.999999[
Parameter found........................: GTR.theta2=0.999999
Distribution...........................: Gamma
Number of classes......................: 4
Parameter found........................: Gamma.alpha=1
- Category 0 (Pr = 0.25) rate..........: 0.136954
- Category 1 (Pr = 0.25) rate..........: 0.476752
- Category 2 (Pr = 0.25) rate..........: 1
- Category 3 (Pr = 0.25) rate..........: 2.38629
Rate distribution......................: Gamma
Number of classes......................: 4
SequenceNotFoundException: VectorSiteContainer::getSequencePosition().(Panthera_tigris)

First, I don't know what "VectorSiteContainer" means, and I don't know how to fix this error.
Second, is my substitution model input correct? What I want is GTR + Gamma + I, to replicate the MBE (2005) paper on CoMap. All help is highly appreciated.

Thank you very much

Julien Y. Dutheil

May 20, 2021, 1:33:13 AM
to Bio++ Usage Help Forum
Hi,

This error message typically means that the sequence names in the input tree do not match the ones in the sequence file. Here, it seems that the tree has a leaf named Panthera_tigris, which is not found in the alignment. Sometimes this error is due to a wrong file format (like a Windows file on a Linux machine), which adds invisible carriage return characters at the end of the sequence names.

I hope this helps,

Julien.
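Both causes Julien mentions (a renamed leaf and hidden carriage returns) can be checked before CoMap is run. The following is a minimal Python sketch, not part of CoMap or Bio++; `newick_leaves`, `fasta_names` and `check_names` are hypothetical helpers. It assumes a plain Newick tree without quoted labels or comments, and assumes the whole FASTA header line after `>` is the sequence name (Bio++ may parse headers differently).

```python
import re
import sys


def newick_leaves(tree_text: str) -> list[str]:
    """Leaf labels from a Newick string.

    A leaf label follows '(' or ',' and ends at ':', ',', ')', ';' or whitespace.
    Labels after ')' (internal nodes, e.g. bootstrap values) are skipped.
    Quoted labels and [comments] are not handled in this sketch.
    """
    return re.findall(r"[(,]\s*([^(),:;\s]+)", tree_text)


def fasta_names(fasta_text: str) -> list[str]:
    """Sequence names from FASTA text, keeping any trailing carriage return.

    Assumption: the whole header line after '>' is the sequence name.
    """
    return [line[1:] for line in fasta_text.split("\n") if line.startswith(">")]


def check_names(tree_text: str, fasta_text: str) -> tuple[list[str], list[str]]:
    """Return (tree leaves absent from the FASTA, FASTA names containing '\\r')."""
    names = set(fasta_names(fasta_text))
    missing = [leaf for leaf in newick_leaves(tree_text) if leaf not in names]
    with_cr = sorted(n for n in names if "\r" in n)
    return missing, with_cr


if __name__ == "__main__" and len(sys.argv) == 3:
    # newline="" disables newline translation, so Windows '\r\n' endings stay visible.
    with open(sys.argv[1], newline="") as f:
        tree = f.read()
    with open(sys.argv[2], newline="") as f:
        fasta = f.read()
    missing, with_cr = check_names(tree, fasta)
    for leaf in missing:
        print(f"In tree but not in alignment: {leaf}")
    for name in with_cr:
        print(f"Carriage return in sequence name: {name!r}")
```

Usage: `python check_names.py ENSG00000284701.txt.tree ENSG00000284701_NT_filter.fasta`. The files are opened with `newline=""` on purpose: Python's default universal-newline mode would silently turn `\r\n` into `\n` and hide exactly the problem being looked for.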

omotoso olatunde

May 20, 2021, 11:52:17 PM
to Bio++ Usage Help Forum
Thank you, Professor Julien, you were right: some names had been modified by TimeTree, and I forgot to change them in the FASTA file.
I just used a single file to test-run the program; I plan to iterate over some 10,000 files using a "for" loop. However, my current concern is the substitution model: how do I include the invariant sites (I) as part of the model? I plan to run GTR + G + I, but can't figure out how to specify the I from the manual.

From the standard output, this indicates the Gamma:

Distribution...........................: Gamma
Number of classes......................: 4
Parameter found........................: Gamma.alpha=1
- Category 0 (Pr = 0.25) rate..........: 0.136954
- Category 1 (Pr = 0.25) rate..........: 0.476752
- Category 2 (Pr = 0.25) rate..........: 1
- Category 3 (Pr = 0.25) rate..........: 2.38629
Rate distribution......................: Gamma
Number of classes......................: 4

Looking forward to your feedback; I am very grateful.
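Before looping over ~10,000 genes, it can help to pair every tree with its alignment by gene ID, so that genes missing one of the two files are reported up front instead of failing mid-run. This is a minimal Python sketch under stated assumptions: the two naming patterns are inferred from the paths in the log above (`ENSG00000284701.txt.tree` and `ENSG00000284701_NT_filter.fasta`), and `pair_inputs` is a hypothetical helper. It does not invoke CoMap itself.

```python
import re
import sys
from pathlib import Path

# Assumed naming scheme, inferred from the two file paths in the CoMap log.
TREE_RE = re.compile(r"^(ENSG\d+)\.txt\.tree$")
FASTA_RE = re.compile(r"^(ENSG\d+)_NT_filter\.fasta$")


def pair_inputs(tree_files, fasta_files):
    """Match tree and alignment file names on their gene ID.

    Returns (pairs, unpaired): pairs maps gene ID -> (tree name, fasta name);
    unpaired lists IDs that have only one of the two files.
    Names matching neither pattern are ignored.
    """
    trees = {m.group(1): name for name in tree_files if (m := TREE_RE.match(name))}
    fastas = {m.group(1): name for name in fasta_files if (m := FASTA_RE.match(name))}
    shared = sorted(trees.keys() & fastas.keys())
    pairs = {gid: (trees[gid], fastas[gid]) for gid in shared}
    unpaired = sorted(trees.keys() ^ fastas.keys())
    return pairs, unpaired


if __name__ == "__main__" and len(sys.argv) == 3:
    tree_dir, fasta_dir = map(Path, sys.argv[1:])
    pairs, unpaired = pair_inputs([p.name for p in tree_dir.iterdir()],
                                  [p.name for p in fasta_dir.iterdir()])
    print(f"{len(pairs)} gene(s) with both files; unpaired IDs: {unpaired}")
```

Each resulting pair can then be passed through a name check and on to CoMap, one gene at a time.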