Hi Haris,
the tree file might contain some parts that do not work in RAxML:
If this still does not work, please send your cleaned tree again, so that we can look for any remaining issues.
Best
Lucas
--
You received this message because you are subscribed to the Google Groups "raxml" group.
To unsubscribe from this group and stop receiving emails from it, send an email to raxml+un...@googlegroups.com.
Sorry, the situation seems a bit more complex. First, my sed command contained a little error and removed too many commas. Furthermore, there are also ":" characters in the taxon labels (in Newick, ":" separates a label from its branch length), which makes it hard to just remove the quotation marks around them. And lastly, there are duplicate sequences in the tree, for example "CP003297, Escherichia coli O104:H4 str. 2009EL-2050".
How did you obtain this tree? It seems a bit messy. By using some
sed-regex-magic, it should be possible to get rid of the
conflicting characters, but the duplicates are still suspicious.
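To see how many labels are affected before attempting any regex cleanup, a small script can list the duplicated tip labels. This is only a rough sketch, not a full Newick parser: it assumes labels are wrapped in single quotes (with '' as an escaped quote, as in the Newick convention), it ignores bracketed comments, and the function names are made up for illustration.

```python
import re
from collections import Counter

# One token is either a quoted label, an unquoted word, or a structural character.
# Quoted labels are consumed whole, so commas and colons inside them are harmless.
TOKEN = re.compile(r"'(?:[^']|'')*'|[^\s'(),:;]+|[(),:;]")
STRUCTURAL = {"(", ")", ",", ":", ";"}

def tip_labels(newick):
    """Return all tip labels, i.e. labels that directly follow '(' or ','."""
    labels, prev = [], None
    for tok in TOKEN.findall(newick):
        if tok not in STRUCTURAL and prev in ("(", ","):
            if tok.startswith("'"):
                # Strip the surrounding quotes and unescape doubled quotes.
                labels.append(tok[1:-1].replace("''", "'"))
            else:
                labels.append(tok)
        prev = tok
    return labels

def duplicate_labels(newick):
    """Map each tip label that occurs more than once to its count."""
    counts = Counter(tip_labels(newick))
    return {label: n for label, n in counts.items() if n > 1}

if __name__ == "__main__":
    tree = ("('CP003297, Escherichia coli O104:H4 str. 2009EL-2050':0.1,"
            "'CP003297, Escherichia coli O104:H4 str. 2009EL-2050':0.2,B:0.3);")
    for label, n in duplicate_labels(tree).items():
        print(n, label)
```

Branch lengths (after ":") and inner node labels (after ")") are skipped on purpose, so only terminal branches are counted.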
Lucas
For more options, visit https://groups.google.com/d/optout.
Hi Haris,
hm, I don't think that just removing taxa from the tree is a good approach. If you want to run EPA using this tree, you will need the alignment of the sequences in the tree as well. Thus, the sequences and the taxa of the tree have to be the same. That is, for each sequence in the alignment, there needs to be exactly one taxon (terminal branch) in the tree, and all of them need to have unique names. The default pipeline is to have a set of "reference" sequences that you want to use, which you then align to each other, and then use those to infer a tree. This way, consistency is ensured.
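The one-to-one requirement above can be checked mechanically before running EPA. The following is a minimal sketch under a few assumptions: the alignment is in FASTA format, the sequence name is the first word after ">", and the tip labels have already been extracted from the tree into a list. The function names are hypothetical.

```python
from collections import Counter

def fasta_names(lines):
    """Return the sequence names (first word after '>') from FASTA lines."""
    return [line[1:].split()[0]
            for line in lines
            if line.startswith(">") and line[1:].strip()]

def compare_names(tree_tips, alignment_names):
    """Report duplicates and names that occur on only one side."""
    tips, seqs = Counter(tree_tips), Counter(alignment_names)
    return {
        "duplicate_tips": sorted(n for n, c in tips.items() if c > 1),
        "duplicate_seqs": sorted(n for n, c in seqs.items() if c > 1),
        "only_in_tree": sorted(set(tips) - set(seqs)),
        "only_in_alignment": sorted(set(seqs) - set(tips)),
    }

if __name__ == "__main__":
    alignment = [">A first sequence", "ACGT", ">B", "AC-T", ">B", "ACGT"]
    report = compare_names(["A", "C"], fasta_names(alignment))
    for key, names in report.items():
        print(key, names)
```

Tree and alignment are consistent only if all four lists come back empty.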
Now, in your case, inferring a tree that large from the full alignment is a bit too much, with roughly 600k sequences... So that is not an option. As a side remark: I'm not sure why you actually want to have a tree this big as your reference tree. This will give you a lot of trouble with further analyses, memory requirements for EPA, visual inspection, etc. Is there a particular reason that you want a tree with ALL Silva sequences? Usually, people use a subset of those, namely the ones they are interested in and which they expect to be related to the sequences that they want to place on that tree. (Alternatively, you can wait a few days for our new paper on automatic reference trees, in which we use Silva to do an automatic selection of reference sequences for building reference trees - out soon!)
Long story short:
Hope that helps
Lucas
Hi Lucas!
Sorry for bothering you again! I am quite new and I am trying to get my tree right, but I have some duplicates. Do you know what the best approach is to remove them?
Thank you very much for your time.
Haris
Hi Haris,
you could infer a tree from the SILVA alignment, yes, but building a tree from that many sequences will not only take substantial time and compute resources, but will almost certainly cause a lot of downstream tools to fail. Even if in the end you manage to place your eDNA sequences against something like a 600k-taxon tree, you would have a hard time visualizing or even post-processing the results.
I would say in this case the technique that Lucas mentioned he will soon publish could be your best bet, if you insist on using the full SILVA tree. In essence, it does a multi-stage placement, where it first breaks down the tree into an inner backbone tree, against which you can feasibly perform placement. Then, if a sequence was placed on a branch where the full tree had been pruned, placement recurses into that particular sub-tree.
However, be somewhat wary of the quality of the results: this is essentially a heuristic, a shortcut. I would view this as a pre-study to find out what kind of genera to include in your final reference tree, against which you do your actual placement. This could be a more rigorous approach to what is typically (in my impression) done for the type of study you are attempting: selecting a set of reference taxa by hand, using knowledge from literature. Examples: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0037818 and more recently https://www.nature.com/articles/s41559-017-0091 (The reference sizes there are: 797 taxa / 2763 sites, and 512 taxa / 3374 sites).
Another thing to consider is the level of detail you want in your result. If you just want to classify your eDNA sequences to genus level, then you need a lot fewer of those SILVA sequences. Even picking just one representative sequence per species will probably shrink the database significantly. Additionally, we found that placement has a very hard time distinguishing between strains, especially for something like 16S.
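As an illustration of the "one representative per species" idea, here is a rough sketch. It rests on assumptions that you should check against your own file: that each FASTA header has the form "ACCESSION.start.stop Domain;...;Genus;Species name", with the last semicolon-separated field taken as the species, and that the longest ungapped sequence is an acceptable representative. Both the selection rule and the function names are only for illustration.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def species_of(header):
    """ASSUMPTION: the last ';'-separated field after the first space is the species."""
    _, _, taxonomy = header.partition(" ")
    if not taxonomy:
        return header  # no taxonomy string: treat the record as its own group
    return taxonomy.rstrip(";").split(";")[-1].strip()

def one_per_species(records):
    """Keep, per species, the record with the most non-gap characters."""
    best = {}
    for header, seq in records:
        key = species_of(header)
        length = sum(1 for c in seq if c not in "-.")
        if key not in best or length > best[key][2]:
            best[key] = (header, seq, length)
    return [(h, s) for h, s, _ in best.values()]

if __name__ == "__main__":
    with open("silva_subset.fasta") as handle:  # hypothetical file name
        for header, seq in one_per_species(read_fasta(handle)):
            print(">" + header)
            print(seq)
```

If you reduce the alignment this way, the reference tree has to be inferred from the reduced alignment as well, so that tree and alignment stay consistent.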
Lastly: consider looking at https://github.com/lczech/genesis for
possible (and blazingly fast :) ) post analysis of placements. We
are also currently developing a program wrapping the most common
post analysis steps (like taxonomic assignment!) in a new tool
called gappa, but it is not well documented yet.
-- MSc Pierre Barbera Phone: +49 6221 533 258 Fax: +49 6221 533 298 E-Mail: pierre....@h-its.org HITS gGmbH Schloss-Wolfsbrunnenweg 35 D-69118 Heidelberg Amtsgericht Mannheim / HRB 337446 Managing Director: Dr. Gesa Schönberger Scientific Director: Prof. Dr. Michael Strube