Using Ion Reporter's fasta file output to do alpha/beta diversity analysis with QIIME


juliesm...@gmail.com

Mar 3, 2015, 2:25:44 PM
to qiime...@googlegroups.com

Hello, 

I am very new to metagenomics and would very much appreciate any advice/help I can get. I have results (fasta file) from an Ion Torrent metagenomics analysis by Ion Reporter. I would like to ask how to do alpha and beta diversity analysis on these results. 

The details of the Ion Torrent and Ion reporter software can be read here: 

https://www.lifetechnologies.com/content/dam/LifeTech/Documents/PDFs/Ion-16S-Metagenomics-Kit-Software-Application-Note.pdf

To summarize, the Ion Reporter software generates a FASTA file for a given sample that gives the species-level identification of the reads. I want to take this one FASTA file and perform alpha and beta diversity analysis using QIIME. Any suggestions on how one can do alpha and beta diversity analysis using this one FASTA file?

I am reading the QIIME help. I have installed QIIME and am currently going through the QIIME tutorials. I saw that the scripts required for alpha and beta diversity analysis need OTU tables and mapping files. Is it possible to generate an OTU table and a mapping file for this one FASTA file I have? The content of the FASTA file produced by Ion Reporter looks like the text I am pasting below.

example.fasta output from Ion Reporter. (This is the file I want to use as input to alpha and beta diversity analysis in QIIME): 

>MG|11|100.0|Bacteroidetes|Bacteroidia|Bacteroidales|Bacteroidaceae|Bacteroides|vulgatus|

CACGTATCCAACCTGCCGTCTACTCTTGGATAGCCTTCTGAAAGGAAGATTAATACAAGATGGCATCATGAGTCCGCATGTTCACATGATTAAAGGTATTCCGGTAGACGATGGGGATGCGTTCCATTAGATAGTAGGCGGGGTAACGGCCCACCTAGTCTTCGATGGATAGGGGTTCTGAGAGGAAGGTCCCCCACATTGGAACTGAGACACGGTCCAA

>MG|13|99.55|Bacteroidetes|Bacteroidia|Bacteroidales|Bacteroidaceae|Bacteroides|vulgatus|

TTGGACCGTGTCTCAGTTCCAATGTGGGGGACCTTCCTCTCAGAACCCCTATCCATCGAAGACTAGGTGGGCCGTTACCCCGCCTACTATCTAATGGAACGCATCCCCATCGTCTACCGGAATACCTTTAATCATGTGAACATGCGGACTCATGATGCCATCTTGTACTTAATCTTCCTTTCAGAAGGCTGTCCAAGAGTAGACGGCAGGTTGGATACGTG

I would very much appreciate any advice and suggestions you may give.  

Thank you,

Julie 

Adam robbins-pianka

Mar 3, 2015, 3:53:01 PM
to qiime...@googlegroups.com
Hi Julie,

To my knowledge, there is no very good way to do this. The taxonomy assignments that are in the FASTA headers in your file cannot be parsed by any current QIIME script.

Are these shotgun reads? Are the reads already quality filtered? How many samples did you take, and is the sample that each read originated from indicated in the FASTA headers somehow?

Best,
Adam

--

---
You received this message because you are subscribed to the Google Groups "Qiime Forum" group.
To unsubscribe from this group and stop receiving emails from it, send an email to qiime-forum...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

julie smith

Mar 3, 2015, 5:52:28 PM
to qiime...@googlegroups.com
Hi Adam,

Thank you very much for your quick reply. These reads were generated as described below with the 16S metagenomics workflow from Ion Torrent. This is the info listed on the Ion Torrent website:

The Ion 16S™ Metagenomics Kit is designed for rapid, comprehensive, and broad-range research analyses of mixed microbial populations using the Ion PGM™ System. The kit uses two primer pools to amplify seven hypervariable regions (V2, V3, V4, V6, V7, V8, and V9) of bacterial 16S rRNA. The combination of the two primer pools enables broad-range, sequence-based identification of bacteria from complex mixed populations.

Above text is from this link:
https://www.lifetechnologies.com/us/en/home/life-science/sequencing/dna-sequencing/microbial-sequencing/microbial-identification-ion-torrent-next-generation-sequencing/ion-16s-metagenomics-solution.html?icid=npiGP-ion16s-100214

The information about the samples is given here. I ran Ion Torrent demo metagenomics workflow with Ion Torrent demo data:


Text from the link above says the following about the samples: 
Two mock bacterial community samples were amplified using both primer pools: a balanced community [21] with equal representation of ribosomal DNA (rDNA) copies for 20 bacterial species, and a staggered community [22] with variable amounts of rDNA copies for 20 bacterial species (10^3 to 10^6 copies per organism per µL) (Table 1, supplementary information).


There is no way to tell from the headers of the resulting fasta file which read belongs to which sample.

What would you suggest I do? Basically, what would be the best way to use this fasta file for alpha and beta diversity analysis?

For example, would you suggest that I write a script to generate a new file (from this fasta file with taxonomy headers) that can be read by QIIME's alpha diversity analysis? If so, could you please tell me what kind of files QIIME needs? Is the information in the fasta file headers sufficient for me to generate a QIIME-format file? What format would this file need to have to serve as input to QIIME's alpha diversity analysis Python script?

Thank you for your help,
Julie




julie smith

Mar 3, 2015, 6:00:06 PM
to qiime...@googlegroups.com
Also, to add to my email above, the reads are quality filtered. Basically, all I want to do is use this fasta file, which already has taxonomy assignments (reformatting it with a conversion script if needed), to do alpha and beta diversity analysis with QIIME. I would very much appreciate any advice I can get, since I am very new to the field.

Thank you,
Julie

Adam robbins-pianka

Mar 4, 2015, 11:59:46 AM
to qiime...@googlegroups.com
Hi Julie,

1. To reiterate, there is currently no QIIME script that can parse the taxonomy assignments from the FASTA headers in your file, so if you want to use those taxonomy assignments, you will have to parse them out yourself and generate a counts table -- I don't recommend this, but it should be possible.

2. In order to do alpha (within-sample) or beta (between-sample) diversity analysis, you will need to be able to determine which reads came from which samples. Unless you have additional files you haven't mentioned, I don't see how you are going to be able to do this. How many samples did you take/how many samples' sequences are in this file?
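
For point 1, a minimal Python sketch of parsing the headers by hand might look like the following. It assumes the header layout seen in the example earlier in this thread (`MG|<number>|<percentage>|phylum|class|order|family|genus|species|`), counts every FASTA record as one observation, and leaves the second and third fields uninterpreted because their meaning is not stated in this thread. The function name `count_taxa` is hypothetical.

```python
from collections import Counter

RANKS = 6  # phylum, class, order, family, genus, species (assumed layout)

def count_taxa(fasta_lines):
    """Count FASTA records per taxonomy string parsed from '>' header lines."""
    counts = Counter()
    for line in fasta_lines:
        line = line.strip()
        if not line.startswith(">"):
            continue  # skip sequence lines and blank lines
        fields = line[1:].split("|")
        # fields[0:3] are 'MG', a number, and a percentage (left uninterpreted);
        # the next six fields are assumed to be the taxonomy ranks.
        taxonomy = ";".join(fields[3:3 + RANKS])
        counts[taxonomy] += 1
    return counts

example = [
    ">MG|11|100.0|Bacteroidetes|Bacteroidia|Bacteroidales|Bacteroidaceae|Bacteroides|vulgatus|",
    "CACGTATCCAACC",
    ">MG|13|99.55|Bacteroidetes|Bacteroidia|Bacteroidales|Bacteroidaceae|Bacteroides|vulgatus|",
    "TTGGACCGTGTCT",
]
table = count_taxa(example)
for taxonomy, n in table.items():
    print(f"{taxonomy}\t{n}")
```

Running this once per sample file would give one column of counts per sample; turning those columns into a BIOM table is a separate step that is not shown here.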

Best,
Adam

julie smith

Mar 4, 2015, 1:56:46 PM
to qiime...@googlegroups.com
Hi Adam, 

Thank you again for your quick reply.  I really appreciate your help and advice. 

There is only 1 sample in the particular fasta file that I sent you. For multiple samples, I would have a separate fasta file per sample, similar to the example fasta file I gave above. To summarize, all reads in one given fasta file belong to one sample.

So, the question is how to do alpha diversity analysis with this 1 fasta file? 

By counts table, do you mean the OTU biom file? (I am sorry, I am very new to this topic, so please excuse my ignorance) 

I have been reading the QIIME website and info on alpha diversity below:

Since the alpha diversity script needs a BIOM file, I will need to generate it, right? Could you please advise me on how I could generate the OTU biom file from the fasta file that I have? What would be the best way to do this?

From what I have been reading so far, I thought of the following approach. What do you think of it?

1. Provide the  example.fasta file as input to the pick otus script (below) to generate OTU table txt file 


2. Provide the OTU table output from step 1 above as input to the make_otu_table script (below) to generate the OTU BIOM file:

3. Provide the OTU BIOM file from step 2 above as input to the alpha diversity analysis script below. 

Do you think the above steps would work? 

Thank you,
Julie



Adam robbins-pianka

Mar 5, 2015, 12:35:34 PM
to qiime...@googlegroups.com
Thanks for clarifying. If your sequences are one file per sample, then they are already demultiplexed, and you say they are already quality filtered. So, I would recommend you use the add_qiime_labels.py script to get them into the format that QIIME requires for downstream processing.

From there, you will be able to perform the steps you mention, although I would recommend one of the OTU picking workflows, probably pick_open_reference_otus.py.

That will generate an OTU table in BIOM format, which you can then pass to alpha_diversity.py.

Note that if you do this with just ONE sample (one FASTA file), then you will have a very small OTU table. If you pass all of your samples to add_qiime_labels.py, then you will be able to process all of your samples at once, which will be more efficient. Also, you will not be able to do beta diversity on a single sample, but you would be able to if you use all your samples.
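
As a rough illustration of what "the format that QIIME requires" means here, the sketch below rewrites each FASTA header so that it begins with `<SampleID>_<counter>`, which is the convention QIIME 1.x uses to tie a read to its sample. This is not the add_qiime_labels.py implementation, only an approximation of its effect; the function name `relabel` is hypothetical, and the exact output of the real script may differ in detail.

```python
def relabel(fasta_lines, sample_id, start=0):
    """Yield FASTA lines with headers rewritten as '>SampleID_N original_header'."""
    n = start
    for line in fasta_lines:
        line = line.rstrip("\n")
        if not line:
            continue  # drop blank lines between records
        if line.startswith(">"):
            # Keep the original header text after the new label.
            yield f">{sample_id}_{n} {line[1:]}"
            n += 1
        else:
            yield line

records = [
    ">MG|11|100.0|Bacteroidetes|",
    "",
    "CACGTATCC",
    ">MG|13|99.55|Bacteroidetes|",
    "TTGGACCGT",
]
labeled = list(relabel(records, "S1"))
print("\n".join(labeled))
```

With several samples, each file would be relabeled with its own sample ID and the results concatenated into one file, which is what allows the downstream scripts to build one OTU table column per sample.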

Best,
Adam

julie smith

Mar 10, 2015, 6:17:49 PM
to qiime...@googlegroups.com

Hi Adam, 

Thank you very much again for your quick reply and all your help to me. I followed the steps that you suggested in your last email. Here is a summary of what I did so far: 


Step 1) First, I generated a mapping file as below, in which I named the sample S1:

vi map.txt:

SampleID       BarcodeSequence LinkerPrimerSequence    Treatment       DOB     Description

S1                      S1-Demo NA      NA
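
A minimal sketch of writing such a mapping file programmatically, assuming the QIIME 1.x conventions of tab-separated columns, a header line that starts with `#SampleID`, and `Description` as the last column. The `InputFileName` column, the second sample, and the file names are hypothetical; they illustrate a layout in which the `-c` option of add_qiime_labels.py points at a column holding the fasta file names.

```python
import csv
import io

def write_mapping(samples, handle):
    """Write a tab-separated mapping file: one row per (sample_id, fasta_name)."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["#SampleID", "BarcodeSequence", "LinkerPrimerSequence",
                     "InputFileName", "Description"])
    for sample_id, fasta_name in samples:
        # Barcode and primer are left empty: the reads are already demultiplexed.
        writer.writerow([sample_id, "", "", fasta_name, f"{sample_id}-Demo"])

buffer = io.StringIO()
write_mapping([("S1", "S1.fasta"), ("S2", "S2.fasta")], buffer)
mapping_text = buffer.getvalue()
print(mapping_text)
```

Writing to a real file instead of the in-memory buffer only requires passing an open file handle (opened with `newline=""`) as `handle`.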



Step 2) Then, I named my fasta file above S1 and put it in a directory named "fasta_dir". I ran the command below to generate a fasta file with QIIME labels:

add_qiime_labels.py -i fasta_dir/ -m map.txt -c SampleID



Step 3) The "add_qiime_labels.py" command in Step 2 above generated a file named "combined_seqs.fna", which is the QIIME version of the fasta file, with labels added to the fasta headers.



Step 4) Then, I took the "combined_seqs.fna" file from Step 3 above and ran the command "pick_open_reference_otus.py" with the new_refseqs.fna file that I found under the Illumina tutorial files. (I couldn't find another file named "refseqs.fna".) The command below worked with the "new_refseqs.fna" file:

 pick_open_reference_otus.py -i /home/QIIMETEST/combined_seqs.fna -r /home/QIIME_tutorial/moving_pictures_tutorial-1.9.0/illumina/precomputed-output/otus/new_refseqs.fna -o /home/output --suppress_taxonomy_assignment --suppress_align_and_tree


Step 5) The command in step4 generated a BIOM file named otu_table_mc2.biom. I took this file as my OTU biom table.  Then, I ran the alpha diversity script as below.

alpha_diversity.py -i /home/otu_table_mc2.biom -m chao1 -o outputAlphaDiversity.txt 

The alpha diversity script above generates a file with a value of 62. 


vi outputAlphaDiversity.txt:

chao1

S1 62


How do I interpret this number of 62? Does it mean that there are 62 different species in the sample fasta, and that QIIME's alpha diversity estimate is 62? However, my input sample contained 20 species. So, why does QIIME generate a value of 62? If you could please give me some insight on this when you have a chance, I would really appreciate it.

Thank you. 

Best,

Julie

ps: I also noticed that my OTU biom file contains 62 rows. Is the alpha diversity number always equal to the number of rows in the OTU table? Is this expected? Thank you.
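
One possible explanation for the two numbers matching, offered as a sketch and not as a confirmed diagnosis: as far as I know, the `mc2` suffix in `otu_table_mc2.biom` reflects the workflow's default minimum OTU size of 2, so with a single sample no singleton OTUs remain in the table. The bias-corrected Chao1 estimator is S_obs + F1(F1 - 1) / (2(F2 + 1)), where F1 and F2 are the numbers of singleton and doubleton OTUs; with F1 = 0 it reduces to the observed OTU count. Note also that OTUs are clusters of similar sequences, not species, so their number need not equal the 20 species in the mock community. The counts below are made up for illustration.

```python
def chao1(counts):
    """Bias-corrected Chao1 richness estimate from per-OTU counts in one sample."""
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))

with_singletons = [1, 1, 1, 2, 5, 10]   # hypothetical counts, 3 singleton OTUs
without_singletons = [2, 2, 5, 10, 40]  # hypothetical counts after a min-count-2 filter

print(chao1(with_singletons))     # estimate is above the 6 observed OTUs
print(chao1(without_singletons))  # estimate equals the number of observed OTUs
```

Under these assumptions, Chao1 equals the number of rows only when there are no singletons (or exactly one); on a table that keeps singletons the two numbers would generally differ.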



julie smith

Mar 11, 2015, 1:35:26 PM
to qiime...@googlegroups.com
Hi Adam, 

Adding to my last email, I am attaching the biom file mentioned there, named "otu_table_mc2.biom". I ran the alpha_diversity script on this biom file with three different alpha diversity metrics (chao1, simpson and shannon), and each metric gives a different result. Since I am very new to the metagenomics field, I don't understand why the results are so different or what the different numbers signify.

Could you please suggest which of the 3 metrics below would be best to use for the biom file that I have attached? Or is there another metric that you would recommend based on this biom file? The alpha diversity results that I get with each metric are listed below:

1. I ran script:  alpha_diversity.py -i /home/otu_table_mc2.biom -m chao1 -o /home/outputAlphaDiversitywithChao1metric.txt 
Alpha diversity result with chao1 metric = 62

2. I ran script:  alpha_diversity.py -i /home/otu_table_mc2.biom -m simpson -o /home/outputAlphaDiversitywithSimpsonmetric.txt  
Alpha diversity result with simpson metric = 0.977

3. I ran script:  alpha_diversity.py -i /home/otu_table_mc2.biom -m shannon -o /home/outputAlphaDiversitywithShannonmetric.txt  
Alpha diversity result with shannon metric = 5.69 


If you could please advise me on what these metrics mean, or point me to some references where I can learn what the different alpha diversity metrics represent, I would really appreciate it. For example, which of the 3 metrics above is most commonly used in the metagenomics field? Which metric should I use?
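
As a sketch of what the three numbers measure: Chao1 estimates richness (how many OTUs there are), while Shannon and Simpson also reflect evenness (how evenly the reads are spread across OTUs). The definitions below are the textbook ones, with Shannon computed in base 2 and Simpson as 1 - sum(p_i^2); it is an assumption that these match what alpha_diversity.py reports. Both example samples are hypothetical and were chosen only to show the range of values for a table with 62 OTUs.

```python
import math

def shannon(counts, base=2):
    """Shannon entropy of the OTU relative abundances."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p, base) for p in props)

def simpson(counts):
    """Simpson's index as 1 - sum(p_i^2): chance that two random reads differ in OTU."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

even = [10] * 62           # hypothetical: 62 OTUs, perfectly even
skewed = [571] + [1] * 61  # hypothetical: 62 OTUs, one dominant OTU

for name, sample in (("even", even), ("skewed", skewed)):
    print(name, round(shannon(sample), 3), round(simpson(sample), 3))
```

For 62 OTUs the upper bounds under these definitions are log2(62), about 5.95, for Shannon and 1 - 1/62, about 0.984, for Simpson, so the reported 5.69 and 0.977 would be consistent with reads spread fairly evenly across the OTUs. The metrics answer different questions, so none is "best" in general; it is common to report a richness measure alongside an evenness-aware one.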

Thank you,
Julie
otu_table_mc2.biom

Anna I

Jun 1, 2015, 1:48:21 PM
to qiime...@googlegroups.com
Hi Julie,

I'm also currently trying to run QIIME core diversity analyses on data from the IonTorrent Metagenomics workflow (also with the Ion 16S Metagenomics Kit), and I'm hoping you can help me get started!
Have you made any progress with this? What qiime script commands did you end up using for converting the fasta files generated by the IonTorrent workflow into a qiime-workable format? And how did you set up your mapping file, if the primers used in the kit are not disclosed?

(Also, did you use the S001_Demo_Metagenomics_Mock_Community_reads_genus_species.fasta file, or one of the other fasta files?)

Best regards,
Anna