string index out of range

Alexandre

Oct 22, 2012, 1:40:41 PM
to partiti...@googlegroups.com
Hi Rob,

I've been trying to use PF (using the right path this time), and after the program's initial steps (loading the configuration from the .cfg file), it gets stuck with this message:

Removing Schemes in './analysis/schemes' (they will be recalculated from existing subset data)
Reading alignment file './Mit_Concat.phy'
Reading alignment file './analysis/start_tree/source.phy'
 Traceback (most recent call last):
   File "/Applications/PartitionFinderV1.0.1_Mac/PartitionFinder.py", line 23, in <module>
     sys.exit(main.main("PartitionFinder", "v1.0.1", "DNA"))
   File "/Applications/PartitionFinderV1.0.1_Mac/partfinder/main.py", line 156, in main
     options.processes)
   File "/Applications/PartitionFinderV1.0.1_Mac/partfinder/analysis.py", line 74, in __init__
     self.make_tree(cfg.user_tree_topology_path)
   File "/Applications/PartitionFinderV1.0.1_Mac/partfinder/analysis.py", line 120, in make_tree
     subset_with_everything)
   File "/Applications/PartitionFinderV1.0.1_Mac/partfinder/alignment.py", line 225, in __init__
     new_sequence = ''.join([old_sequence[i] for i in subset.columns])
IndexError: string index out of range

Could you tell me what is wrong, if anything, with my .cfg file (I already sent it to you off-list)? I should stress that I've tried many different schemes, including much simpler ones, to see whether I was pushing the limits too hard, but the message is always the same (same lines called).

Thank you enormously.

Alex

Alexandre

Oct 22, 2012, 2:01:10 PM
to partiti...@googlegroups.com
Just an update:

I've managed to run PF with a similar but much simpler .cfg file on a nuclear dataset (considering only two schemes: all together, and each gene separately). I can't identify my mistake with the earlier mitochondrial set, or why I can't use more complex schemes such as the 3rd position of each gene.

Thanks a lot :)

Rob Lanfear

Oct 22, 2012, 6:09:28 PM
to partiti...@googlegroups.com
Hi Alex,

There's a problem in your .cfg file. You have 3747 sites in your alignment, but in your .cfg file you have some data_blocks that ask for sites up to 3749.

Thanks for sending this through though - we thought that PF would give a sensible error message in this case, but it seems it doesn't. I'll fix that for the next release!
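
To illustrate the mismatch: the traceback's `old_sequence[i] for i in subset.columns` fails as soon as a column index points past the end of a sequence. Below is a minimal Python sketch of the kind of up-front check that would catch this. It is not PartitionFinder's actual code; `check_block_bounds`, the block name `last_gene`, and its start site 3001 are hypothetical. Only the numbers 3747 and 3749 come from this thread.

```python
def check_block_bounds(blocks, alignment_length):
    """Return messages for data blocks that fall outside the alignment.

    blocks maps a (hypothetical) block name to a (start, end) pair of
    1-based, inclusive site numbers, as written in a .cfg file.
    """
    problems = []
    for name, (start, end) in blocks.items():
        if start < 1 or end > alignment_length:
            problems.append(
                f"data_block '{name}' asks for sites {start}-{end}, "
                f"but the alignment only has {alignment_length} sites"
            )
    return problems


# Numbers from this thread: 3747 sites in the alignment, blocks up to 3749.
# The block name and its start site are made up for illustration.
problems = check_block_bounds({"last_gene": (3001, 3749)}, 3747)
for message in problems:
    print(message)

# Without such a check, plain indexing reproduces the IndexError:
sequence = "A" * 3747
columns = range(3000, 3749)  # 0-based indices for sites 3001-3749
error_text = None
try:
    "".join(sequence[i] for i in columns)
except IndexError as err:
    error_text = str(err)
print("IndexError:", error_text)
```

Comparing the largest site number in the .cfg file with the alignment length before running is usually enough to rule out this particular failure.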

Cheers,

Rob
--
Rob Lanfear
Research Fellow,
Ecology, Evolution, and Genetics,
Research School of Biology,
Australian National University

www.robertlanfear.com

July-September
National Evolutionary Synthesis Center,
2024 W. Main Street,
Suite A200,
Durham, NC 27705-4667,
USA


alexandre pedro

Oct 22, 2012, 7:23:35 PM
to partiti...@googlegroups.com
Oh, thank you so much for the quick answer. I assume you don't need the log file anymore, but if you do, please let me know and I'll send it to you early tomorrow from my lab.

Thanks for letting me know what mistake I was making.

Bests,

Alex
--
Alexandre Pedro Selvatti Ferreira Nunes

Laboratório de Biologia Evolutiva Teórica e Aplicada
Departamento de Genética - Instituto de Biologia (UFRJ)
Prédio do CCS, Bloco A, Sala A2-095
Universidade Federal do Rio de Janeiro
Rua Prof. Rodolpho Paulo Rocco, S/N
Cidade Universitaria, Ilha do Fundão
Rio de Janeiro, RJ, Brasil
CEP: 21941-617


Rob Lanfear

Oct 23, 2012, 2:19:24 AM
to partiti...@googlegroups.com
Hi Alex,

No - I don't need the log file.

Do keep the emails coming though. It's only from people telling us how they are using the program that we can see how we can make the error messages more helpful.

Cheers,

Rob

Alexandre

Oct 24, 2012, 12:13:14 PM
to partiti...@googlegroups.com


Hi Rob, may I ask an additional question in this thread, since it is still about the same analysis?

Could you check if these blocks are determined in the right way? I mean, in an alignment of 3 genes, I want to test which partition scheme is better: 

-everything as a single block (one single or no partition)
-separate the third codon position from the rest (1+2)
-separate each gene (including the third position of each gene)
-separate each gene and each third codon position (max number of partitions)

I deliberately did not separate the 1st and 2nd positions, both because my alignment is quite big and because there does not seem to be variation even among the third codon positions. This is the very reason for my question: could you please check whether my configuration of those schemes is right? The result from PF has always been the first one (everything as a single partition). I'm using the BIC as the criterion.

Here are the specific lines from the .cfg file:


charset gene1 = 1-2973\3;

charset gene2 = 2974-4349\3;

charset gene3 = 4350-5949\3;

charset gene1_3 = 3-2973\3;

charset gene2_3 = 2976-4349\3;

charset gene3_3 = 4352-5949\3;


## SCHEMES ##

allsame123 = (gene1, gene2, gene3, gene1_3, gene2_3, gene3_3);
allsame_3 = (gene1, gene2, gene3) (gene1_3, gene2_3, gene3_3);
each_gene123 = (gene1, gene1_3) (gene2, gene2_3) (gene3, gene3_3);
each_gene12_3 = (gene1) (gene2) (gene3) (gene1_3) (gene2_3) (gene3_3);


Thanks a million for this,

bests,

Alex

Rob Lanfear

Oct 24, 2012, 7:36:45 PM
to partiti...@googlegroups.com
Hi Alex,

Your partitions are not correctly defined. Right now you're ignoring second codon positions (assuming your data are all in frame, and that the 1st base of each gene is codon position 1). PF should be warning you about sites that are missing from your data_blocks - you should take a look and make sure you understand the warnings you get!
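
To see which sites are left out, a definition such as `1-2973\3` can be expanded by hand: it means every third site starting at site 1. The following Python sketch does this for the two gene1 charsets from the question. `expand` is a hypothetical helper written for this illustration, not part of PartitionFinder.

```python
def expand(block):
    r"""Expand a definition like '1-2973\3' into a set of 1-based sites.

    Several space-separated ranges are allowed; a missing '\step' means
    every site in the range.
    """
    sites = set()
    for part in block.split():
        span, _, step = part.partition("\\")
        start, _, end = span.partition("-")
        end = end or start  # a single site such as '7'
        sites.update(range(int(start), int(end) + 1, int(step or 1)))
    return sites


# The two gene1 definitions from the question above.
blocks = {
    "gene1": r"1-2973\3",
    "gene1_3": r"3-2973\3",
}
covered = set().union(*(expand(b) for b in blocks.values()))
missing = sorted(set(range(1, 2974)) - covered)
print(missing[:5])   # [2, 5, 8, 11, 14]
print(len(missing))  # 991
```

The missing sites are exactly the second codon positions of gene1, a third of the gene.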

The simplest (at least, most error-free) way to do it would be to define each codon position in each gene as a data_block, then describe the schemes you want to compare. 

E.g. 

gene1_pos1 = 1-2973\3;
gene1_pos2 = 2-2973\3;
gene1_pos3 = 3-2973\3;

etc.

Then you would define schemes that lump things together however you want.
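
Writing nine such lines by hand is where typos creep in, so they can also be generated. This is a small Python sketch using the gene boundaries from the charsets earlier in the thread; `codon_blocks` is a hypothetical helper, and it assumes that every gene starts at codon position 1.

```python
def codon_blocks(genes):
    """Build one data_block line per gene and codon position.

    genes maps a gene name to (start, end), 1-based and inclusive.
    ASSUMPTION: each gene starts at codon position 1.
    """
    lines = []
    for name, (start, end) in genes.items():
        for pos in (1, 2, 3):
            lines.append(f"{name}_pos{pos} = {start + pos - 1}-{end}\\3;")
    return lines


# Gene boundaries as given in the charsets earlier in this thread.
genes = {"gene1": (1, 2973), "gene2": (2974, 4349), "gene3": (4350, 5949)}
lines = codon_blocks(genes)
print("\n".join(lines))

# Gene lengths that are not a multiple of 3 hint at a reading-frame issue.
odd_lengths = {n: e - s + 1 for n, (s, e) in genes.items() if (e - s + 1) % 3}
print(odd_lengths)
```

With these boundaries, gene2 (1376 sites) and gene3 (1600 sites) are not a multiple of three long, so it may be worth checking the reading frame before trusting the codon positions.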

By the way, if I were you I would also run the greedy algorithm on this. It almost always finds a scheme that's at least as good as anything you can think of, and it looks like it should be pretty quick on your dataset.

Cheers,

Rob

alexandre pedro

Oct 24, 2012, 8:21:58 PM
to partiti...@googlegroups.com
Ok, Rob, thank you for the answers. So I must consider every codon position, even if I don't want to distinguish between positions 1 and 2. And thanks for the greedy tip. I was afraid it could get stuck in sub-optimal combinations due to the computational effort, but after this I'll certainly give it a try and report any awkward results.

Thank you again,

bests,

Alex

Rob Lanfear

Oct 25, 2012, 6:06:00 PM
to partiti...@googlegroups.com
Hi Alex,

No, you don't have to consider every codon position. You can also have composite data_blocks, like this:

gene1_pos1_and2 = 1-2973\3 2-2973\3;
gene1_pos3 = 3-2973\3;

However, in general I would advise against this for two reasons. (1) it's more likely to introduce human error; and (2) it precludes any computational methods (like considering all schemes, or using the greedy algorithm) from considering any schemes that split the first and second codon positions.
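
To put reason (2) in numbers: if "all schemes" means every way of grouping the data_blocks, the count for n blocks is the Bell number B(n). For three genes, lumping positions 1 and 2 leaves 6 data_blocks instead of 9. The Python sketch below computes both counts; `bell` is a hypothetical helper, and how PartitionFinder itself enumerates schemes is not shown in this thread.

```python
def bell(n):
    """Number of ways to split n items into non-empty groups (Bell number).

    Uses the Bell triangle: each row starts with the last entry of the
    previous row, and each further entry adds the entry above-left.
    """
    row = [1]
    for _ in range(n - 1):
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[-1]


print(bell(6))  # 203 possible schemes with composite pos1+pos2 blocks
print(bell(9))  # 21147 possible schemes with one block per codon position
```

None of the extra schemes available with 9 blocks can be reached once positions 1 and 2 are fused into one data_block.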

Cheers,

Rob

Alexandre

Oct 31, 2012, 2:09:21 PM
to partiti...@googlegroups.com
Hello Rob,

After inserting the greedy search and the second codon position in this very example, I got a very conservative result: don't partition anything.

I just wanted to confirm with you whether this is perfectly possible. Assuming I have three nuclear genes for almost 1000 OTUs and I am considering all three codon positions, can it still give this result? I mean, is it possible that my three nuclear genes have had a similar enough evolutionary rate/substitution pattern in this particular group that I don't need to worry about partitioning this alignment?

Lastly, I'm running it with a mitochondrial dataset and it's taking a remarkably longer time. I'll let you know about its results so we can discuss the two together, if needed.


Thanks for your attention,

Alex

Rob Lanfear

Oct 31, 2012, 5:08:17 PM
to partiti...@googlegroups.com
Hi Alexandre,

Any result is possible. It just depends on your dataset! It will also depend on which information-theoretic measure you use.
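
As an illustration of how the measure can change the answer, here is a Python sketch with the standard formulas for AIC, AICc and BIC. The likelihoods and parameter counts are invented for the example, and using the alignment length (5949 sites, from the charsets earlier in this thread) as the sample size n is an assumption.

```python
import math


def information_criteria(lnL, k, n):
    """Standard AIC, AICc and BIC for log-likelihood lnL, k parameters, n sites."""
    aic = 2 * k - 2 * lnL
    aicc = aic + (2 * k * (k + 1)) / (n - k - 1)
    bic = k * math.log(n) - 2 * lnL
    return {"AIC": aic, "AICc": aicc, "BIC": bic}


n_sites = 5949  # ASSUMPTION: alignment length used as sample size

# Hypothetical numbers: partitioning improves lnL by 50 units
# at the cost of 20 extra parameters.
single = information_criteria(lnL=-50000.0, k=10, n=n_sites)
partitioned = information_criteria(lnL=-49950.0, k=30, n=n_sites)

for name in ("AIC", "AICc", "BIC"):
    better = "partitioned" if partitioned[name] < single[name] else "single"
    print(name, round(single[name], 1), round(partitioned[name], 1), better)
```

In this made-up case the AIC prefers the partitioned scheme while the BIC, which penalises extra parameters more heavily at this sample size, prefers the single partition.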

Cheers,

Rob

Karen Salazar

Oct 16, 2020, 7:44:48 PM
to PartitionFinder
Hello everybody,
Could someone explain how I can set up a partitioning strategy for mitochondrial PCGs: the 13 protein-coding genes excluding the third codon positions, combined with the 2 RNA genes (non-PCGs)? I have difficulty understanding this, although I have already read many papers that do it to reduce nucleotide heterogeneity.
Previously I defined partitions by gene + codon position, covering both PCGs and non-PCGs, but with all codon positions included.

Scheme for PartitionFinder:

gen1_pos1= 1-100/3;
gen1_pos2= 2-100/3;
gen1_pos3= 3-100/3;
gen2_pos1= 101-200/3;
gen2_pos1= 102-200/3;
gen2_pos1= 102-200/3;
gen3=201-400;

Thank you so much for the advice,
Karen

Karen Salazar

Oct 16, 2020, 7:46:04 PM
to PartitionFinder
I mean:
gen1_pos1= 1-100/3;
gen1_pos2= 2-100/3;
gen1_pos3= 3-100/3;
gen2_pos1= 101-200/3;
gen2_pos2= 102-200/3;
gen2_pos3= 102-200/3;
gen3=201-400;
