Query length error in EPA-NG

277 views
Skip to first unread message

Alicia S. Arroyo

unread,
Nov 4, 2017, 1:03:15 PM11/4/17
to raxml
Hello, 

I'm trying to run the new epa-ng with a small environmental 18S dataset and it pops up the same error. 

terminate called after throwing an instance of 'std::runtime_error'
  what():  Tried to insert sequence to MSA of unequal length: TCCCAGCAGCCGCGGTAATTCCAGCTCCAATAGCGTATATTAAAGTTGTTGCAGTTAAAACGCTCGTAGTTGGACGTTTGCGTTTCGCGATCTGGTTCGCCGGAGGGCGGGAGGCATGTGGGGCTGAGGGCAACCTGGGGCCCCTTGTTACTGTGAAGAAATTAGAGTGTTTAAAGCGGGCGTTTGCTTGAATACGAGAGCATGGAATAACGGCAGTGTTGTCGTTGGTCAGGACAAGATGATTGATAGGGACAGGCGGGGGCATTTGTATTTGGCAGTCAGAGGTGAAATTCTTGGATTTGGCAAAGACGAACTAGTGCAAAGGCATTTGCCAAGGATGTTTTCATTAATCAAGAACGAAAGTTAG
Aborted (core dumped)

It doesn't matter what query file I put, it always says the same error, which is always about the second sequence of the query file... 

I don't get the problem, any idea?

Thank you very much, 


A. 





Lucas Czech

unread,
Nov 4, 2017, 3:22:05 PM11/4/17
to ra...@googlegroups.com

Hi Alicia,

that looks like your alignment is not an alignment, at least according to that message. Did you check whether all sequences (references and queries) are aligned to each other, i.e., if each sequences involved has the same length?

From the exemplary sequence that you posted, it looks there are no gaps in this sequences. This hints that it is not aligned yet. If you haven't aligned your queries yet, you can use PaPaRa for this: https://sco.h-its.org/exelixis/web/software/papara/index.html

Cheers
Lucas

--
You received this message because you are subscribed to the Google Groups "raxml" group.
To unsubscribe from this group and stop receiving emails from it, send an email to raxml+un...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Alicia S. Arroyo

unread,
Nov 6, 2017, 5:08:13 AM11/6/17
to raxml

Hi Lucas,

Yes, that was the problem. Sorry xD 

However, I got a weird non-sense .jplace. All the placements go to the same node with the same LWR more or less... I run the same dataset with RAxML and the -f v option, just for checking, and the jplace is more logical. So I don't get why epa-ng doesn't compute the same output. 
I want to use epa-ng instead of RAxML -f v because I want to discard the low LWR placements, and I saw that epa-ng has this --filter-min-lwr option, which is perfect for what I need. 

This is the commands I used:
epa-ng -s ssu_globalEuk_ALtrimAl.fasta -t EukTree.nwk -q papara_alignment.Sanabria1000_90_reduced.fasta -O --filter-min-lwr 0.3
(it doesn't matter if I used the --filter-min-lwr and -O option or not, the jplace is exactly the same -epa_result0.3.jplace-. I attach all the files) 

For the regular RAxML-EPA invocation, I typed:
raxmlHPC-PTHREADS -f v -s papara_altr_Sanabria1000_90.fasta -t EukTree.nwk -m GTRGAMMA -n SanabriaPlacement90
(I also attach the final jplace coming from this command: RAxML_portableTree.SanabriaPlacement90.jplace)


The only difference is that in RAxML-EPA I used the trimmed alignment of the OTUs, while in epa-ng I used the alignment without trimming. In any case, if I try epa-ng with the trimmed alignment, it pops the following error:
terminate called after throwing an instance of 'std::runtime_error'
  what(): Set tip states during placement failed!
Aborted (core dumped)

Thank you so much for your help, 


A. 
RAxML_portableTree.SanabriaPlacement90.jplace
EukTree.nwk
epa_result0.3.jplace
ssu_globalEuk_ALtrimAl.fasta
papara_alignment.Sanabria1000_90_reduced.fasta

Lucas Czech

unread,
Nov 6, 2017, 6:35:33 AM11/6/17
to ra...@googlegroups.com

Hi Alicia,

that sounds alarming, the results should not differ that much! Unfortunately, Pierre, who is the developer of EPA-ng, is away for a few days. When he returns, we will investigate this.

So, please be patient, we will get back to you once we know what's going on with your data.

Best
Lucas

Alexey Kozlov

unread,
Nov 6, 2017, 7:28:53 AM11/6/17
to ra...@googlegroups.com
Hi Alicia,

just a side note: old RAxML-EPA also supports filtering by LWR, here is the quote from the online help:

--epa-keep-placements=number specify the number of potential placements you want to keep for each read in the EPA
algorithm.
Note that, the actual values printed will also depend on the settings for
--epa-prob-threshold=threshold !

DEFAULT: 7

--epa-prob-threshold=threshold specify a percent threshold for including potential placements of a read depending
on the
maximum placement weight for this read. If you set this value to 0.01 placements that have a
placement weight of 1 per cent of
the maximum placement will still be printed to file if the setting of --epa-keep-placements allows for it

DEFAULT: 0.01

--epa-accumulated-threshold=threshold specify an accumulated likelihood weight threshold for which different
placements of read are printed
to file. Placements for a read will be printed until the sum of their placement weights has reached
the threshold value.
Note that, this option can neither be used in combination with --epa-prob-threshold nor with
--epa-keep-placements!




Cheers,
Alexey
>> <javascript:>.
>> For more options, visit https://groups.google.com/d/optout <https://groups.google.com/d/optout>.
>
> --
> You received this message because you are subscribed to the Google Groups "raxml" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to raxml+un...@googlegroups.com
> <mailto:raxml+un...@googlegroups.com>.

Pierre Barbera

unread,
Nov 8, 2017, 9:50:22 AM11/8/17
to ra...@googlegroups.com

Hi Alicia,

to reiterate what I sent you in an accidental personal mail instead of to the list:

- the default for epa-ng is to do the heuristic, whereas raxml doesn't (use --no-heur with epa-ng to do the same)

- still this is strange

I've had some time today to look at your data, and I can indeed replicate the strange results you show.

I noticed that the 119 reference sequences, which are also at the beginning of the papara_alignment.Sanabria1000_90_reduced.fasta file, differ from the ones you supply with
ssu_globalEuk_ALtrimAl.fasta. This should not be the case! The query sequences need to be aligned against a fixed reference, out of which the reference tree was inferred.

So one explanation would be that the tree you supplied may be based on the "wrong" alignment. I pryed apart papara_alignment.Sanabria1000_90_reduced.fasta into query and reference part, and fed them separately to epa-ng, and the results are more sane. However the tree log-likelihood is still weird (-70k compared to raxml's -35k), indicating that there is something wrong still.

As for the trimming: what matters most is that the queries are aligned to the fixed reference, and that the number of sites for the query and reference alignment are identical.

To reiterate the complete pipeline (just to make sure):
- obtain your reference MSA through any MSA tool of your choice
- use that MSA to infer a tree (maybe also try the new raxml-ng?)
- use EITHER the original MSA OR the reduced one usually printed by raxml, together with the tree that was inferred as your reference MSA/tree
- align your queries to this fixed reference using a tool such as papara
- feed the reference msa (-s), tree (-t) and aligned queries (-q) into epa-ng (here is a quirk: epa-ng will currently assume that the query file only contains the queries, without the reference as papara outputs them. Worst case it means that your reference sequences are also placed against the tree, which doesn't affect the placements of your OTUs)
- currently, use the -O option, as the model parameters need to be re-optimized (this can be streamlined better using raxml-ng to infer the tree and passing the model parameters directly to epa-ng)

Hope this helps,
Pierre


On 06.11.2017 11:08, Alicia S. Arroyo wrote:
-- 
MSc Pierre Barbera

Phone: +49 6221 533 258
Fax: +49 6221 533 298
E-Mail: pierre....@h-its.org

HITS gGmbH
Schloss-Wolfsbrunnenweg 35
D-69118 Heidelberg
Amtsgericht Mannheim / HRB 337446
Managing Director: Dr. Gesa Schönberger
Scientific Director: Prof. Dr. Michael Strube
Reply all
Reply to author
Forward
0 new messages