It looks like one of the email addresses I mailed bounced back; sorry if this email finds you twice. We have quite a bit of analysis wrapped up in Lefse. Any help would be greatly appreciated.
Thanks,
Todd
Todd N. Wylie
Assistant Professor
Department of Pediatrics | Division of Infectious Diseases
McDonnell Genome Institute
Washington University School of Medicine
660 S. Euclid Avenue
Campus Box 8208
St. Louis, MO 63110
314.747.4069 (Pediatrics office)
Begin forwarded message:
From: Todd Wylie <twy...@wustl.edu>
Subject: Lefse taxa names
Date: January 23, 2018 at 2:43:25 PM CST
Cc: Todd Wylie <twy...@wustl.edu>, "Wylie, Kristine" <kwy...@wustl.edu>
Greetings, Nicola:
I have a few questions regarding the naming convention for taxa fields in Lefse input files. I would be very grateful for any guidance you may provide. For the examples outlined below, I've included all of my command line instructions and associated files in the attached zip file.
My taxa names are formatted at a specific level (genus) without any pipes or hierarchical information. As such, I notice I get varying (but reproducible) results depending on the formatting of the taxa names, though class, subject ids, and abundance values all remain the same. For example, the following naming conventions alter LDA results (see attached PDF):
1) taxa names with underscores (e.g. Clostridium_sensu_stricto_g)
2) taxa names with underscores removed (e.g. Clostridiumsensustrictog)
3) taxa names with underscores changed to "q" characters (e.g. Clostridiumqsensuqstrictoqg)
4) taxa names with underscores changed to "x" characters (e.g. Clostridiumxsensuxstrictoxg)
5) taxa names with underscores changed to double underscores (e.g. Clostridium__sensu__stricto__g)
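For reference, the five naming variants listed above can be generated from a single taxon name with a short Python sketch. The function name `name_variants` and the dictionary labels are hypothetical, chosen only for illustration; they are not part of Lefse, and this is not necessarily how the attached files were produced.

```python
def name_variants(taxon: str) -> dict:
    """Return the five naming variants described above for one taxon name.

    Hypothetical helper for illustration only; not part of Lefse.
    """
    return {
        "underscores": taxon,                            # 1) name as-is
        "removed": taxon.replace("_", ""),               # 2) underscores removed
        "q": taxon.replace("_", "q"),                    # 3) underscores -> "q"
        "x": taxon.replace("_", "x"),                    # 4) underscores -> "x"
        "double_underscores": taxon.replace("_", "__"),  # 5) underscores doubled
    }


variants = name_variants("Clostridium_sensu_stricto_g")
for label, name in variants.items():
    print(f"{label}\t{name}")
```

Only the feature names differ between the variants; class, subject ids, and abundance values are untouched.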
Also, I made a version of taxa names with full lineage (e.g. Bacteria_k|Firmicutes_p|Clostridia_c|Clostridiales_o|Clostridiaceae_1_f|Clostridium_sensu_stricto_g) and results vary from the specific genus-level LDA results.
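To make the relationship between the two name forms concrete, the sketch below splits the pipe-delimited lineage into its levels and recovers the genus-only name from the last field. The variable names are hypothetical, and the sketch makes no claim about how Lefse treats either form internally.

```python
# Full-lineage name from the example above.
lineage = (
    "Bacteria_k|Firmicutes_p|Clostridia_c|Clostridiales_o|"
    "Clostridiaceae_1_f|Clostridium_sensu_stricto_g"
)

# Split on the pipe character to get one entry per taxonomic level.
levels = lineage.split("|")

# The genus-only name used in the first set of runs is the last level.
genus_only = levels[-1]

# Every ancestor clade implied by the lineage, from kingdom down to genus.
clades = ["|".join(levels[: i + 1]) for i in range(len(levels))]

print(genus_only)
for clade in clades:
    print(clade)
```

A full-lineage name therefore carries six nested clade names, whereas the genus-only name carries one.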
How does taxa naming convention alter LDA results? What is the best practice?
I'm afraid I'm missing something fundamental on my end... apologies in advance.
Very best,
Todd
PS: I'm running from the command line using https://hub.docker.com/r/biobakery/lefse/ docker version.
1. Bacteria|Acidobacteria|Acidobacteriia|Acidobacteriales|Acidobacteriaceae
2. Bacteria|Acidobacteria|Acidobacteriia|Acidobacteriales|Acidobacteriaceae|;__|;__