How to use BLAST outputs in QIIME

Christian Gray

Apr 5, 2017, 2:07:50 PM
to Qiime 1 Forum
Hi everyone,

I have nifH sequences that I BLASTed using command-line BLAST+ because, for whatever reason, QIIME's assign_taxonomy.py always returned 0 BLAST matches, even though BLAST+ returns matches > 97% for most of the sequences.
I used BLAST+ blastn (megablast) with outfmt 7. Here is a link to the NCBI outfmt descriptions: https://www.ncbi.nlm.nih.gov/books/NBK279675/
I generated an accession-ID-to-taxonomy file using Dr. Chris Baker's entrez_qiime.py script. I was hoping to use this BLAST-assigned taxonomy in QIIME to process my data. Is there a way to manipulate the BLAST output file to get an OTU or BIOM table? I can rerun BLAST if there is a better output format for me to use.

Thanks in advance!

Jai Ram Rideout

Apr 5, 2017, 6:50:34 PM
to Qiime 1 Forum
Hi Christian,

I'm not sure if this is your goal, but it sounds like you have a tab-separated file (generated by BLAST) containing taxonomy assignments for your representative sequences, and you want to add those taxonomy assignments to your .biom file. To do that, check out the `biom add-metadata` command and its corresponding tutorial. You'll want to add your taxonomy assignments as "observation metadata". You may need to make some minor formatting changes to your BLAST output file to match the format expected by the BIOM software. The tutorial I linked to describes the expected format in more detail.
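
A minimal sketch of that reformatting step, in plain Python. It assumes the BLAST output uses the default outfmt 7 columns (query ID first, subject ID second, tab-separated, with `#` comment lines), that the first hit listed for each query is the one to keep, and that the BLAST query IDs match the observation IDs in the .biom file. The function names and the "Unassigned" label are hypothetical and not part of QIIME or BIOM.

```python
def best_hits(blast_lines):
    """Return (query_id, subject_id) pairs, keeping the first hit per query."""
    order, hits = [], {}
    for line in blast_lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue  # skip outfmt 7 comment lines such as "# 0 hits found"
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # not a hit line
        query_id, subject_id = fields[0], fields[1]
        if query_id not in hits:
            hits[query_id] = subject_id
            order.append(query_id)
    return [(q, hits[q]) for q in order]


def load_taxonomy(tax_lines):
    """Parse 'accession<whitespace>taxonomy string' lines into a dict."""
    taxonomy = {}
    for line in tax_lines:
        parts = line.strip().split(None, 1)  # taxonomy itself may contain spaces
        if len(parts) == 2:
            taxonomy[parts[0]] = parts[1]
    return taxonomy


def observation_metadata(blast_lines, tax_lines, unassigned="Unassigned"):
    """Build a tab-separated table with a '#OTUID<TAB>taxonomy' header."""
    taxonomy = load_taxonomy(tax_lines)
    rows = ["#OTUID\ttaxonomy"]
    for query_id, subject_id in best_hits(blast_lines):
        rows.append(query_id + "\t" + taxonomy.get(subject_id, unassigned))
    return "\n".join(rows) + "\n"
```

Saving the returned text to a file (for example obs_md.tsv, a hypothetical name) should give something that can be passed to `biom add-metadata` with `--observation-metadata-fp obs_md.tsv --sc-separated taxonomy`. Queries with no BLAST hits are left out of the table entirely in this sketch.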

Please let me know if you have any issues or if I misunderstood what you are trying to accomplish.

Best,
Jai

Christian Gray

Apr 7, 2017, 10:06:55 AM
to Qiime 1 Forum
Hi Jai,

Thank you for your advice. That was exactly what I wanted to accomplish. However, I thought the reformatting would be too difficult to do in the short time I have before my deadline (a little over a week), so I looked for another route and discovered that I could use pick_open_reference_otus.py and supply a reference file via -r to check my sequences against. Even though the sequences aren't BLASTed against it, I am hoping that this will be alright. I am pretty new to QIIME, so I didn't realize this workaround existed beforehand. However, I have opened a new thread because I once again ran into issues with biom add-metadata, and I have linked my new thread here.

Thanks Jai for your advice with this!
Best,

Christian

Jai Ram Rideout

Apr 7, 2017, 6:51:02 PM
to Qiime 1 Forum
Hi Christian,

Let's keep the discussion on this thread since these issues are all related. There's a lot going on here, so let's back up a bit. Ultimately, it sounds like you want to perform open-reference OTU picking (and taxonomy assignment) against a nifH reference database. Please correct me if I am wrong.

Assuming this is your goal, let's get pick_open_reference_otus.py working. Can you please post the following information?

1. The output from running print_qiime_config.py -t.

2. The first few lines of your nifH reference sequences.

3. The first few lines of your nifH reference taxonomy annotations.

This will give me the information necessary to determine whether your reference database is formatted correctly, and if QIIME is discovering it.
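
As a rough self-check along the same lines, the sketch below compares the sequence IDs in a reference FASTA file with the IDs in the taxonomy file. It assumes the sequence ID is the first whitespace-delimited token after `>` and that each taxonomy line starts with the ID followed by whitespace. The function names are hypothetical, and this does not reproduce any validation that QIIME itself performs.

```python
def fasta_ids(fasta_lines):
    """Collect sequence IDs: the first token after '>' on each header line."""
    ids = []
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.append(tokens[0])
    return ids


def taxonomy_ids(tax_lines):
    """Collect the ID column (first token) of each taxonomy line."""
    ids = set()
    for line in tax_lines:
        tokens = line.split(None, 1)
        if tokens:
            ids.add(tokens[0])
    return ids


def check_reference(fasta_lines, tax_lines):
    """Report reference IDs without taxonomy and IDs that appear twice."""
    known = taxonomy_ids(tax_lines)
    seen, duplicates, missing = set(), [], []
    for seq_id in fasta_ids(fasta_lines):
        if seq_id in seen:
            duplicates.append(seq_id)
            continue
        seen.add(seq_id)
        if seq_id not in known:
            missing.append(seq_id)
    return {"missing_taxonomy": missing, "duplicate_ids": duplicates}
```

With real files, the two arguments can be open file handles, for example `check_reference(open("ref.fasta"), open("ref_tax.txt"))` (both file names are placeholders). Empty lists in the result suggest that the IDs in the two files line up.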

By default, pick_open_reference_otus.py will attempt to align your sequences with PyNAST against a 16S template alignment -- this is why you're getting so many alignment failures with your nifH sequences. You'll need to skip alignment and tree building by passing --suppress_align_and_tree to pick_open_reference_otus.py.

If you need to build a phylogenetic tree (e.g. for use with phylogenetically-aware diversity metrics), you can use align_seqs.py, filter_alignment.py, and make_phylogeny.py to build your alignment and tree after running pick_open_reference_otus.py. You'll need to use a different alignment method than PyNAST, and you also may need to play around with other parameters depending on your sequence data (e.g. minimum length cutoffs, etc.).

Best,
Jai

Christian Gray

Apr 7, 2017, 8:42:32 PM
to Qiime 1 Forum
Hi Jai,

Here are the results from print_qiime_config.py:

System information

==================

         Platform: darwin

   Python version: 2.7.13 |Anaconda 2.2.0 (x86_64)| (default, Dec 20 2016, 23:05:08)  [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]

Python executable: /macqiime/anaconda/bin/python


QIIME default reference information

===================================

For details on what files are used as QIIME's default references, see here:

 https://github.com/biocore/qiime-default-reference/releases/tag/0.1.2


Dependency versions

===================

          QIIME library version: 1.9.1

           QIIME script version: 1.9.1

qiime-default-reference version: 0.1.2

                  NumPy version: 1.9.2

                  SciPy version: 0.15.1

                 pandas version: 0.16.1

             matplotlib version: 1.4.3

            biom-format version: 2.1.4

                   h5py version: 2.4.0 (HDF5 version: 1.8.14)

                   qcli version: 0.1.1

                   pyqi version: 0.3.2

             scikit-bio version: 0.2.3

                 PyNAST version: 1.2.2

                Emperor version: 0.9.51

                burrito version: 0.9.1

       burrito-fillings version: 0.1.1

              sortmerna version: SortMeRNA version 2.0, 29/11/2014

              sumaclust version: SUMACLUST Version 1.0.00

                  swarm version: Swarm 1.2.19 [Jun  2 2015 14:40:16]

                          gdata: Installed.


QIIME config values

===================

For definitions of these settings and to learn how to configure QIIME, see here:

 http://qiime.org/install/qiime_config.html

 http://qiime.org/tutorials/parallel_qiime.html


                     blastmat_dir: None

      pick_otus_reference_seqs_fp: /macqiime/anaconda/lib/python2.7/site-packages/qiime_default_reference/gg_13_8_otus/rep_set/97_otus.fasta

                         sc_queue: all.q

      topiaryexplorer_project_dir: None

     pynast_template_alignment_fp: /macqiime/anaconda/lib/python2.7/site-packages/qiime_default_reference/gg_13_8_otus/rep_set_aligned/85_otus.pynast.fasta

                  cluster_jobs_fp: start_parallel_jobs.py

pynast_template_alignment_blastdb: None

assign_taxonomy_reference_seqs_fp: /macqiime/anaconda/lib/python2.7/site-packages/qiime_default_reference/gg_13_8_otus/rep_set/97_otus.fasta

                     torque_queue: friendlyq

                    jobs_to_start: 1

                       slurm_time: None

            denoiser_min_per_core: 50

assign_taxonomy_id_to_taxonomy_fp: /macqiime/anaconda/lib/python2.7/site-packages/qiime_default_reference/gg_13_8_otus/taxonomy/97_otu_taxonomy.txt

                         temp_dir: /tmp/

                     slurm_memory: None

                      slurm_queue: None

                      blastall_fp: blastall

                 seconds_to_sleep: 60


QIIME base install test results

===============================

.........

----------------------------------------------------------------------

Ran 9 tests in 0.225s


OK


Here are the first couple lines from my reference sequences (before pick_otus.py)


>NC_020272.1 Bacillus amyloliquefaciens IT-45, complete genome

AAATTATCCACATGTCGTTTTTTCGGGGGAGCGCGGCTTTTTTGTGCGTTAAAAAAAGGAATATGAAAAA

TAAGGTTTCTAATCTGGTTAAGGTATGTTATCCTATTATGGTTGTAAGAAATAAAAGCACTGCTGAAGTT

GACAATGAATAGGCAGCACAAATATAATAAGTAAGACTGTCTTTAACAGCTATTCCTCGAGGGAGGTGTC

ATAAATGAAAAGAACATTCCAACCGAATAACCGTAAACGCAGTAAAGTTCATGGCTTCAGAAGCCGTATG

AGTTCAAAAAACGGTCGTCTAGTATTAGCACGCCGTCGCCGCAAAGGCAGAAAAGTATTATCAGCTTAGG

CCACTGAATAATGTCAGTGGTCTTTTTTCACATTAAGAGAAAAGAGATGTCATGCGTCGCCCGGTACTGG

(there are many more nucleotides after this because my file just happens to start with a complete genome)


... and after pick_otus.py


>GU193617.1

TGTGATCCGAAGGCTGACTCCACCCGGCTTATACTCCACGCCAAGGCACAGAATACAGTCATGGACCTGGTGCGGGAATTGGGAACTGTCGAGGATCTGGAACTTGAAGATGTATTGAAAGTCGGCTACGGCGATACCAAGTGTGTTGAGTCCGGCGGCCCGGAGCCAGGAGTCGGTTGTGCCGGCCGTGGTGTCATCACTGCCATCAACTTCCTTGAAGAGAACGGTGCATATACCGATGATCTAGATTTTGTTTTTTACGATGTTCTCGGCGACGTTGTCTGCGGCGGGTTTGCCATGCCGATTCGTGAAGGTAAGGCTGAAGAGATTTACATCGTCTGCTCCGGCGAGATGATGGC

>KX525048.1

GGTGGAATCGGAAAGTCGACCACCACACAGAATCTAACAGCAGCTCTGTCCACGAGGGGAAAGAAAATCATGCAGATAGGCTGCGATCCCAAGGCAGACTCGGTAAAGTTTCTGATGAACGGAAAGAAGCAGCCCTCGGTCCTGGACACACTCAGGAAAGAGGGCGAGGTCAAGCTTGAGGACGTGATGAAGACCGGCTTTGGCGGAATTCATTGCGTCGAGTCCGGCGGCCCTGAGCCAGGAGTAGGCTGCGCTGGAAGAGGCATCATCACATCCATCGGTTTGCTGGAGAACCTGGGAGCCTACACCGACGACCTCGACTACGTCTTCTACGATGTGCTCGGCGACGTGGTCTGCGGCGGATTTGCTATG

>KX458414.1

GGAGGAATTGGAAAGTCCACCACGACCCAAAATACCGTCGCGGGTTTGGCGGAGATGGGAAAAAAGGTAATGGTGGTGGGCTGCGACCCCAAAGCGGATTCAACCAGGTTGTTGCTGGGCGGGCTGGCCCAAAAATCAGTTCTGGATACGCTACGCGAGGAAGGCGAAGACATCGAACTGGATTACGTCATGAAAGAGGGCTTTTGTAAGACCTTATGCGTGGAATCCGGCGGCCCGGAGCCCGGAGTCGGCTGCGCCGGGCGGGGTATCATTACCTCGGTCAACTTGCTGGAACAGTTGGGGGCGTATGAAGAAGACAAGAATCTGGATTATGTGTTCTACGATGTTTTGGGAGACGTGGTGTGCGGCGGATTCGCAATG


And here are the first couple lines from the taxonomy assignment (entrez_qiime.py)


HM750539.1 NA;NA;NA;NA;NA;uncultured bacterium

HQ436385.1 NA;NA;NA;NA;NA;uncultured bacterium

KF846970.1 NA;NA;NA;NA;NA;uncultured bacterium

JN578829.1 Proteobacteria;Alphaproteobacteria;Rhizobiales;Bradyrhizobiaceae;Bradyrhizobium;Bradyrhizobium sp. SUTN2_1

AY221776.1 NA;NA;NA;NA;NA;uncultured bacterium

JN578858.1 Proteobacteria;Alphaproteobacteria;Rhizobiales;Bradyrhizobiaceae;Bradyrhizobium;Bradyrhizobium sp. DOA9


Thanks, Jai! That explains why PyNAST didn't work at all with these samples!


Thank you for your help!


Christian

Jai Ram Rideout

Apr 10, 2017, 7:03:33 PM
to Qiime 1 Forum
Thanks for all the details Christian! The formatting of your reference database looks good, so here's what I would do:

First, create a parameters file to configure assign_taxonomy.py with your nifH reference sequences and taxonomic annotations. This parameters file will instruct pick_open_reference_otus.py to use your reference database when it performs taxonomy assignment. See my forum post here for details and an example (the post is for an ITS reference database, but the same strategy applies to your reference database).

Next, pass your nifH reference sequences via -r to pick_open_reference_otus.py, and supply your parameters file via -p. Supply --suppress_align_and_tree to avoid the PyNAST steps I noted earlier.
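
The two steps above can be sketched as follows. The parameters file uses QIIME 1's `script_name:option value` line format. The helper names and every file path in the usage note are hypothetical placeholders, and the command is only assembled as a list here, not executed.

```python
def params_text(ref_seqs_fp, id_to_taxonomy_fp):
    """Render parameters-file lines that point assign_taxonomy.py at a custom reference."""
    lines = [
        "assign_taxonomy:reference_seqs_fp " + ref_seqs_fp,
        "assign_taxonomy:id_to_taxonomy_fp " + id_to_taxonomy_fp,
    ]
    return "\n".join(lines) + "\n"


def open_reference_command(seqs_fp, ref_seqs_fp, params_fp, out_dir):
    """Assemble the pick_open_reference_otus.py call as an argument list."""
    return [
        "pick_open_reference_otus.py",
        "-i", seqs_fp,               # input sequences
        "-r", ref_seqs_fp,           # nifH reference sequences
        "-p", params_fp,             # parameters file rendered above
        "-o", out_dir,               # output directory
        "--suppress_align_and_tree", # skip the 16S-specific PyNAST alignment and tree
    ]
```

For example, `params_text("/path/to/nifH_ref.fasta", "/path/to/nifH_tax.txt")` returns the text to save as the parameters file, and `" ".join(open_reference_command("seqs.fna", "/path/to/nifH_ref.fasta", "params.txt", "open_ref_otus"))` gives a command line to paste into a terminal, assuming none of the paths contain spaces.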

Note: the steps I outlined don't change your default reference database in QIIME (yours is Greengenes, which ships with QIIME 1). If you want all further analyses to automatically use your nifH database, you can create a QIIME config file and set these values to your reference database files: pick_otus_reference_seqs_fp, assign_taxonomy_id_to_taxonomy_fp, assign_taxonomy_reference_seqs_fp.
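
A small sketch of what those config entries could look like, rendered from Python. It assumes the QIIME 1 config file is a plain-text file of `key value` lines, normally read from `~/.qiime_config`. The helper name and the paths in the usage note are hypothetical.

```python
def qiime_config_text(ref_seqs_fp, id_to_taxonomy_fp):
    """Render the three config entries that select a custom reference database."""
    entries = [
        ("pick_otus_reference_seqs_fp", ref_seqs_fp),
        ("assign_taxonomy_reference_seqs_fp", ref_seqs_fp),
        ("assign_taxonomy_id_to_taxonomy_fp", id_to_taxonomy_fp),
    ]
    return "".join(key + "\t" + value + "\n" for key, value in entries)
```

Calling `qiime_config_text("/path/to/nifH_ref.fasta", "/path/to/nifH_tax.txt")` returns the text to place in the config file. Rerunning print_qiime_config.py afterwards should show the new paths under "QIIME config values" if the file was picked up.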

Let us know how it goes!

Jai