Regarding QIIME


diypri...@gmail.com

Aug 24, 2017, 2:07:12 AM
to Qiime 1 Forum
Hi,
Could somebody please help me out?
I am running split_libraries.py on multiple fasta files, passed as a comma-separated list, but after the command finishes the output seqs.fna is empty. Please help me out.

diypri...@gmail.com

Aug 24, 2017, 2:12:02 AM
to Qiime 1 Forum
Hi Colin,
Please advise here.

TonyWalters

Aug 24, 2017, 6:12:48 AM
to Qiime 1 Forum
Hello,

In this case, you would want to look at the .log file that is being created. It should show counts of input sequences and counts of why the sequences are failing (primer mismatches, failing to match barcodes, etc.).

-Tony

diypri...@gmail.com

Aug 24, 2017, 6:53:29 AM
to Qiime 1 Forum
Hi Tony

The log file shows this
Number raw input seqs   28410

Length outside bounds of 200 and 1000  1741
Num ambiguous bases exceeds limit of 6  0
Max homopolymer run exceeds limit of 6  428
Num mismatches in primer exceeds limit of 0: 26241

Sequence length details for all sequences passing quality filters:
No sequences passed quality filters for writing.

Barcodes corrected/not  0/0
Uncorrected barcodes will not be written to the output fasta file.
Corrected barcodes will be written with the appropriate barcode category.
Corrected but unassigned sequences will not be written unless --retain_unassigned_reads is enabled.
Total valid barcodes that are not in mapping file   0
Sequences associated with valid barcodes that are not in the mapping file will not be written.
Barcodes in mapping file
Sample     Sequence Count    Barcode 
ASHSOIL2            0              ATGCATAA
ASHSOIL1            0              ATCAGTCA
ASHSOIL3            0              AGCCTAAT
Total number seqs   written                      0

TonyWalters

Aug 24, 2017, 7:09:30 AM
to Qiime 1 Forum
Most of your reads are being lost here:
Num mismatches in primer exceeds limit of 0: 26241

To troubleshoot this, I would compare the LinkerPrimerSequence values in your mapping file to those in some of your actual reads in your input fasta file(s). Your reads should first start with your barcode (looks like 8 base pairs above) and then be followed by the primer sequence. Do these match up?
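
For anyone reading a similar log, a small sketch of how the per-filter counts could be ranked to see where reads are lost. It assumes each count line ends in an integer, as in the log quoted above; the function name `filter_counts` is made up for this illustration and is not part of QIIME.

```python
import re

# Filter lines copied from the log quoted earlier in this thread.
LOG_TEXT = """\
Number raw input seqs   28410

Length outside bounds of 200 and 1000  1741
Num ambiguous bases exceeds limit of 6  0
Max homopolymer run exceeds limit of 6  428
Num mismatches in primer exceeds limit of 0: 26241
"""


def filter_counts(log_text):
    """Return {description: count} for every line that ends in an integer."""
    counts = {}
    for line in log_text.splitlines():
        # Description, then a colon and/or whitespace, then the trailing count.
        match = re.match(r"^(.*?)[:\s]+(\d+)\s*$", line)
        if match:
            counts[match.group(1).strip()] = int(match.group(2))
    return counts


counts = filter_counts(LOG_TEXT)
total = counts.pop("Number raw input seqs")
worst = max(counts, key=counts.get)  # filter that rejected the most reads
print(f"{worst}: {counts[worst]} of {total} reads ({counts[worst] / total:.1%})")
```

Here the primer-mismatch filter accounts for nearly all of the rejected reads, which is why the mapping file's LinkerPrimerSequence is the first thing to check.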

diypri...@gmail.com

Aug 24, 2017, 8:14:12 AM
to Qiime 1 Forum
Hi Tony,

Actually, my boss gave me these files for analysis, and I am quite sure that the barcode sequences and linker primer sequences in the mapping file do not match those in the sequence files. However, validating the mapping file does not show any error.

TonyWalters

Aug 24, 2017, 8:25:41 AM
to Qiime 1 Forum
Hello,

The mapping file validation only makes sure the mapping data are internally correct (e.g. no duplicate barcodes, no non-DNA characters in primers, etc). You'll have to manually examine the reads to see how or if the data in the mapping file can be matched to the reads. The source publication may also describe the particular primers/barcodes used.

Now, another possibility is that whoever got these files downloaded them from SRA and they are *already* split according to sample, e.g. one fasta file per sample, with the barcodes/primers already removed. Can you check if this is the case? There would be a different approach to handling already-demultiplexed/quality filtered data like these.

-Tony

diypri...@gmail.com

Aug 24, 2017, 8:30:06 AM
to Qiime 1 Forum
Hi,
How can I check that?

TonyWalters

Aug 24, 2017, 8:41:59 AM
to Qiime 1 Forum
As for finding out if the fasta files came from SRA, you'd probably have to check with whoever gave them to you about their source (unless there's some obvious tipoff, like a .log file or text file with them that discusses SRA). For examining the files directly to compare the beginning reads to your mapping barcodes/primers, you can open the file in a plain text editor and look at the beginning of the reads that way. Alternatively, you can use a "less" command on the terminal to browse the file (see https://en.wikipedia.org/wiki/Less_(Unix) for information about the less command).
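
As an alternative to browsing by eye, a minimal sketch of printing the leading bases of each read, split into the expected barcode and primer positions. The example read and the primer length are made up for illustration; with real data you would pass an open file handle to the hypothetical helper `read_fasta` instead of the inline text.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_prefix(seq, barcode_len, primer_len):
    """Return the bases where the barcode and the primer are expected."""
    return seq[:barcode_len], seq[barcode_len:barcode_len + primer_len]


# Made-up read: an 8-base barcode, then a 15-base primer, then the amplicon.
EXAMPLE = """>example_read
ATGCATAACAGCAGCCGCGGTAATACGGAGG
"""

for header, seq in read_fasta(EXAMPLE.splitlines()):
    barcode, primer = split_prefix(seq, barcode_len=8, primer_len=15)
    print(header, barcode, primer)
```

If the first column does not look like any BarcodeSequence in the mapping file and the second does not look like the LinkerPrimerSequence, the mapping file and the reads do not belong together as they stand.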

diypri...@gmail.com

Aug 24, 2017, 8:54:35 AM
to Qiime 1 Forum
Thanks, Tony.
So this means that if I do not have the proper barcode and primer sequences, I will face this error.
Once the barcode and primer sequences are correct, I will not face this problem.

diypri...@gmail.com

Aug 24, 2017, 9:00:43 AM
to Qiime 1 Forum
Hi Tony,
There is no match with the beginning of the reads. I think this means the error is caused by wrong barcode and primer sequences.

TonyWalters

Aug 24, 2017, 9:06:50 AM
to Qiime 1 Forum
Do you see the right primer sequence if you start past the barcode (8 base pairs in)?

If not, I would ask your boss 1. where he got the reads and 2. if there is a publication (may be easier to get the barcodes/primer sequences here) associated with them.

diypri...@gmail.com

Aug 25, 2017, 6:55:35 AM
to Qiime 1 Forum
Hi Tony,

I have talked to my boss and he gave me the primer sequences, both forward and reverse. How can I now make a linker primer sequence out of them?

diypri...@gmail.com

Aug 25, 2017, 6:58:19 AM
to Qiime 1 Forum
Hi,
I am also attaching the run summary file.
Please help me prepare the mapping file.
Run summary 2013_07_16.xlsx

diypri...@gmail.com

Aug 25, 2017, 7:04:06 AM
to Qiime 1 Forum
Please help me out here

TonyWalters

Aug 25, 2017, 7:56:56 AM
to Qiime 1 Forum
Hello,

The run summary doesn't really have any details about the identity of the primers/barcodes.

Are you able to take the barcodes/primers from your boss and match them up to some of the sequences in the reads? The first 8 bases should be the barcode, the next bases should match up with the primer sequence (you might try starting with matching the primer sequence, since that should be found in every read after the first 8 bases).

-Tony

diypri...@gmail.com

Aug 25, 2017, 8:08:30 AM
to Qiime 1 Forum
Yes, Tony.
I am sending you the primer sequences and the barcode sequences that my boss gave me, plus the sequence files.

diypri...@gmail.com

Aug 25, 2017, 12:53:48 PM
to Qiime 1 Forum
Hi Tony,

This is the primer pair: E517F (5’-CAGCAGCCGCGGTAA-3’) and E969-984 (5’-GTAAGGTTCYTCGCGT-3’). I will attach the three sample files. Please help me make the mapping file from them.

diypri...@gmail.com

Aug 25, 2017, 12:59:07 PM
to Qiime 1 Forum
Please find the sample files attached.
The barcode sequences are not available. I think Roche 454 usually uses these 14 MID sequences, so the barcodes may come from this set.

MID ID   Sequence
MID1     ACGAGTGCGT
MID2     ACGCTCGACA
MID3     AGACGCACTC
MID4     AGCACTGTAG
MID5     ATCAGACACG
MID6     ATATCGCGAG
MID7     CGTGTCTCTA
MID8     CTCGCGTGTC
MID9     TAGTATCAGC
MID10    TCTCTATGCG
MID11    TGATACGTCT
MID12    TACTGAGCTA
MID13    CATAGTAGTG
MID14    CGAGAGATAC

BongaT11.zip
BongaT10.zip
BongaT9.zip

diypri...@gmail.com

Aug 25, 2017, 1:02:51 PM
to Qiime 1 Forum
Please help me make the mapping file now.

TonyWalters

Aug 25, 2017, 1:35:50 PM
to Qiime 1 Forum
Hello,

I really don't have any ability to create an accurate mapping file for you. The barcodes don't seem to be in the reads, neither do the primers.

Here is what I was asking you to do earlier: take a read and examine it to see where the primer fits. E.g. (with just the first part of a read from the BongaT9.zip file):

>ICYH0ZI01C2CP5 length=436 xy=1139_3419 region=1 run=R_2013_07_18_16_13_37_
TGCATCGAATTAAACCACATGCTCCACCGCTTGTGCGGGCCCCCGTCAATTCCTTTGAGT
                                      

For the primers, you have these options
CAGCAGCCGCGGTAA
and 
GTAAGGTTCYTCGCGT
Also, just to be sure, the reverse complement of these two primers:
TTACCGCGGCTGCTG
and
ACGCGARGAACCTTAC

These do not match bases in the read. This indicates that either the primers are wrong, or you're dealing with reads that already have the barcodes and primers removed from them. It sure looks like you got one fasta file per sample from your boss, such as BongaT9, BongaT10, and BongaT11. Will your boss not tell you whether he/she downloaded them from SRA, got them from an author of a paper, or what the source was? I strongly suspect that the barcodes were already removed and the data were already split into a fasta file per sample, because blasting some of the reads hits NCBI results (mostly chloroplasts) and the hits extend all the way to the ends of the reads. If barcodes were still present, the reads wouldn't match the reference genes in GenBank at the beginning and/or end of the reads.
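
The same comparison can be scripted. The sketch below builds the reverse complements, expands IUPAC ambiguity codes such as Y and R into regular-expression character classes, and searches the read excerpt quoted above for each primer orientation. The helper names are made up for this illustration; the primers and the read are the ones quoted in this thread.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse-complement a sequence, ambiguity codes included."""
    return seq.translate(COMPLEMENT)[::-1]


def find_primer(primer, read):
    """Return the 0-based position of the primer in the read, or -1."""
    pattern = "".join(IUPAC[base] for base in primer)
    match = re.search(pattern, read)
    return match.start() if match else -1


READ = "TGCATCGAATTAAACCACATGCTCCACCGCTTGTGCGGGCCCCCGTCAATTCCTTTGAGT"
PRIMERS = ["CAGCAGCCGCGGTAA", "GTAAGGTTCYTCGCGT"]

for primer in PRIMERS:
    for candidate in (primer, reverse_complement(primer)):
        print(candidate, find_primer(candidate, READ))
```

A result of -1 for all four candidates means none of the primer orientations occurs in this excerpt. This only looks for exact matches (allowing the ambiguity codes), so it will not report a primer that is present with a sequencing error in it.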

There is no way that I can say with certainty that these reads are already split per sample, but they may be. I would *strongly* suggest that the source of these reads be tracked down, so that the nature of the processing is known. You wouldn't be able to publish on these data without knowing this in any case.

If it turns out to be the case that the data are already split up according to sample, the approach to getting the separate fasta files into a single fasta file that can be used by QIIME for OTU picking is to use the add_qiime_labels.py script ( see http://qiime.org/scripts/add_qiime_labels.html). You'll have to take your existing mapping file, and put in the file names of the fasta files, e.g.
#SampleID BarcodeSequence LinkerPrimerSequence InputFileName Description
BongaT9      CCCCCCCC          CAGCAGCCGCGGTAA  BongaT9.fasta  BongaT9
BongaT11     AAAAAAAAA         CAGCAGCCGCGGTAA  BongaT11.fasta BongaT11

and so on for all of the separate fasta files, then call add_qiime_labels.py as shown in the example on the scripts page.
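
To make the idea concrete, here is a rough sketch of the kind of relabeling involved: each read gets a label that starts with its sample ID and a running number, and all samples end up in one list of records. This is only an illustration of the concept, not the implementation of add_qiime_labels.py; the exact label layout and numbering that the script writes are assumptions here and should be checked against its documentation. The read names are made up.

```python
def relabel(sample_id, records, start=0):
    """Yield (new_header, sequence) with headers like '>SampleID_0 old_id'."""
    for offset, (header, seq) in enumerate(records):
        yield f">{sample_id}_{start + offset} {header}", seq


# Made-up per-sample reads standing in for the separate fasta files.
per_sample = {
    "BongaT9": [("readA", "ACGTACGT"), ("readB", "GGCCGGCC")],
    "BongaT11": [("readC", "TTAATTAA")],
}

combined = []
for sample_id, records in per_sample.items():
    combined.extend(relabel(sample_id, records))

for header, seq in combined:
    print(header)
    print(seq)
```

Because nothing is filtered, the combined output should contain exactly as many reads as the input files together.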

-Tony

diypri...@gmail.com

Aug 25, 2017, 1:49:36 PM
to Qiime 1 Forum
Thanks, Tony.

Yes, in fact I got one fasta file per sample, and I also suspect that the reads are already split. Going by your approach, can I prepare my own mapping file with arbitrary barcode and linker primer sequences and combine the files with add_qiime_labels.py? Furthermore, these are 454 data: will the add_qiime_labels.py command work for them, or is there another command? Please advise.

TonyWalters

Aug 25, 2017, 1:56:37 PM
to Qiime 1 Forum
Hello,

Yes, the add_qiime_labels.py script is fine for 454 data (it doesn't do any filtering of sequences, it just combines the reads and makes the labels of the reads QIIME-compatible).

-Tony

diypri...@gmail.com

Aug 25, 2017, 2:21:50 PM
to Qiime 1 Forum
Thanks, Tony.

I really don't know how to thank you. It worked, but I am a bit confused: the combined sequence file has 56820 reads in total, while the individual files have 136817, 72723, and 30346 reads respectively. Please clarify this.

TonyWalters

Aug 25, 2017, 2:27:15 PM
to Qiime 1 Forum
How are you counting them? With count_seqs.py? http://qiime.org/scripts/count_seqs.html
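
If count_seqs.py is not at hand, a quick cross-check is to count the header lines, since every FASTA record starts with ">". The sketch below does that on made-up inline text; with real data you would pass an open file handle instead. If the relabeling step does no filtering, the combined count should equal the sum of the per-file counts.

```python
import io


def count_fasta_records(lines):
    """Count FASTA records by counting lines that start with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


# Made-up stand-ins for two per-sample files and the combined file.
sample_a = io.StringIO(">r1\nACGT\n>r2\nGGCC\n")
sample_b = io.StringIO(">r3\nTTAA\n")
combined = io.StringIO(">A_0 r1\nACGT\n>A_1 r2\nGGCC\n>B_0 r3\nTTAA\n")

per_file_total = count_fasta_records(sample_a) + count_fasta_records(sample_b)
combined_total = count_fasta_records(combined)
print(per_file_total, combined_total)
```

Counting lines or file sizes instead of header lines gives misleading numbers when sequences are wrapped over several lines.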

diypri...@gmail.com

Aug 25, 2017, 2:38:29 PM
to Qiime 1 Forum
Thanks for the help.
It is all right now. I will proceed with further analysis, and if I face any error during the analysis I will write to you. Thanks again.

diypri...@gmail.com

Aug 26, 2017, 2:26:07 AM
to Qiime 1 Forum
Hi Tony,

I am running pick_de_novo_otus.py. It runs successfully, but after the command finishes there is only one line in the OTU biom table: id table biom biological matrix.....................generated by ...... After that the page is completely blank.
Please suggest where the error is.

TonyWalters

Aug 26, 2017, 2:56:18 AM
to Qiime 1 Forum
Post the log file please.

diypri...@gmail.com

Aug 26, 2017, 3:04:49 AM
to Qiime 1 Forum
Please find the attachment
log_20170826024104.txt

TonyWalters

Aug 26, 2017, 3:11:15 AM
to Qiime 1 Forum
This is the command in the log file for creating the OTU table:
/home/qiime/qiime_software/python-2.7.3-release/bin/python /home/qiime/qiime_software/qiime-1.8.0-release/bin/make_otu_table.py -i Soil_otus/uclust_picked_otus/combined_seqs_otus.txt -t Soil_otus/uclust_assigned_taxonomy/combined_seqs_rep_set_tax_assignments.txt -o Soil_otus/otu_table.biom 

How big are the files referenced in this command?

diypri...@gmail.com

Aug 26, 2017, 3:25:06 AM
to Qiime 1 Forum
Hi Tony,
I do not follow you.
I simply ran the pick_de_novo_otus.py command and got the PyNAST-aligned sequences, rep_set, uclust_assigned_taxonomy, uclust_picked_otus, the log .txt file, otu_table.biom, and rep_set.tre, but the otu_table.biom I am getting is empty.
I am attaching the biom table file.
otu_table.biom

TonyWalters

Aug 26, 2017, 3:29:44 AM
to Qiime 1 Forum
Okay, that file has data in it. It's in JSON format, so it is not as easy to read by hand as a tab-delimited file, but it does have data. I'm confused about the prior statement that the file only had one line?
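
A JSON biom file is often written as one very long line, which can look like a single header line followed by nothing in a text viewer. One way to check that it holds data is to load it with a JSON parser and summarize it. The sketch below assumes the BIOM 1.0 JSON layout (keys `shape`, `matrix_type`, `data`, `rows`, `columns`) and uses a tiny made-up table rather than the attached file.

```python
import json

# Tiny made-up table in the BIOM 1.0 JSON layout (2 OTUs x 3 samples).
EXAMPLE_BIOM = json.dumps({
    "id": None,
    "format": "Biological Observation Matrix 1.0.0",
    "type": "OTU table",
    "generated_by": "example",
    "matrix_type": "sparse",
    "matrix_element_type": "int",
    "shape": [2, 3],
    "rows": [{"id": "denovo0", "metadata": None},
             {"id": "denovo1", "metadata": None}],
    "columns": [{"id": "BongaT9", "metadata": None},
                {"id": "BongaT10", "metadata": None},
                {"id": "BongaT11", "metadata": None}],
    # Sparse entries are [row_index, column_index, count].
    "data": [[0, 0, 5], [0, 2, 1], [1, 1, 3]],
})


def summarize_biom(text):
    """Return (number of OTUs, number of samples, total count)."""
    table = json.loads(text)
    n_otus, n_samples = table["shape"]
    if table["matrix_type"] == "sparse":
        total = sum(value for _row, _col, value in table["data"])
    else:  # dense tables store one list of counts per OTU
        total = sum(sum(row) for row in table["data"])
    return n_otus, n_samples, total


print(summarize_biom(EXAMPLE_BIOM))
```

With a real file, the text would come from reading otu_table.biom; a non-zero OTU count and total indicate the table is not empty.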

diypri...@gmail.com

Aug 26, 2017, 3:46:58 AM
to Qiime 1 Forum
Please advise: can I proceed with this file, or do I need to change the format? If so, please mention the command for changing it.

TonyWalters

Aug 26, 2017, 3:52:13 AM
to Qiime 1 Forum
You don't need to change the format. You can use that OTU table for downstream analyses.

See this tutorial for analyses you can do: http://qiime.org/tutorials/tutorial.html

You should be through these steps already: http://qiime.org/tutorials/tutorial.html#de-novo-otu-picking
but your log file didn't indicate that it had completed the alignment/tree building, so that may still be running, unless you already have a rep_set.tre file with data in it.

You can summarize the table as described here: http://qiime.org/tutorials/tutorial.html#summarize-the-otu-table
I'd skip the next step about OTU networking, unless you are specifically asking questions about network analyses, and proceed to the taxonomy summary and other steps described in the tutorial.


diypri...@gmail.com

Aug 26, 2017, 5:40:13 AM
to Qiime 1 Forum
Hi Tony,
I am doing the analysis and everything is going smoothly. However, while making the heatmap I am getting "filter by counts per otu". What does this mean?

TonyWalters

Aug 27, 2017, 2:28:56 AM
to Qiime 1 Forum
Can you post the exact message you are getting? Usually filtering by counts per OTU is done to remove specific OTUs, e.g., OTUs that are very small. It may be that there are too many OTUs to create the .pdf file with the heatmap, and it's asking you to filter out low-abundance OTUs first. To filter OTUs out, see filter_otus_from_otu_table.py (http://qiime.org/scripts/filter_otus_from_otu_table.html). You could filter out OTUs that have fewer than 100 sequences associated with them by running filter_otus_from_otu_table.py with --min_count 100.
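
Conceptually, a minimum-count filter sums each OTU's counts across all samples and keeps the OTU only if the total reaches the threshold. The sketch below illustrates that idea on made-up counts; it is not the script's code, and whether the real threshold is inclusive is an assumption here.

```python
def filter_by_min_count(otu_counts, min_count):
    """Keep OTUs whose total count across samples is at least min_count."""
    return {
        otu: counts
        for otu, counts in otu_counts.items()
        if sum(counts) >= min_count
    }


# Made-up per-sample counts for three OTUs.
otu_counts = {
    "denovo0": [120, 30, 5],   # total 155, kept
    "denovo1": [2, 0, 1],      # total 3, removed
    "denovo2": [40, 40, 20],   # total 100, kept (threshold assumed inclusive)
}

kept = filter_by_min_count(otu_counts, min_count=100)
print(sorted(kept))
```

Fewer OTUs means fewer rows in the heatmap, which is usually what makes the plot feasible to render and read.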

James

Aug 27, 2017, 4:04:28 AM
to Qiime 1 Forum
Hi,
I am new to QIIME and to bioinformatics as well. I have the same problem: when I run the split_libraries.py command I get an empty seqs.fna. I actually have only the Illumina R1.fastq from the sequencing company. Please help me out too.

The log file shows this.

Number raw input seqs 14301

Length outside bounds of 200 and 1000 0
Num ambiguous bases exceeds limit of 6 0
Missing Qual Score 0
Mean qual score below minimum of 25 414
Max homopolymer run exceeds limit of 6 13699
Num mismatches in primer exceeds limit of 0: 188

Sequence length details for all sequences passing quality filters:
No sequences passed quality filters for writing.

Barcodes corrected/not 0/0
Uncorrected barcodes will not be written to the output fasta file.
Corrected barcodes will be written with the appropriate barcode category.
Corrected but unassigned sequences will not be written unless --retain_unassigned_reads is enabled.

Total valid barcodes that are not in mapping file 0
Sequences associated with valid barcodes that are not in the mapping file will not be written.

Barcodes in mapping file
Sample Sequence Count Barcode
Y01  0 ACACGCTCTTTT
Y04  0 ACACCGTTTTTT
Y02  0 ACACCGTCTATG
Y11  0 ACACCGGGTGCT
Y10  0 ACACCGGGCTGG
Y18  0 ACACCGGCTGGT
Y07  0 ACACCGCTTTGT
Y08  0 ACACCGCTTACC
Y05  0 ACACCGCAACTT
Y15  0 ACACCGATTCTT
Y06  0 ACACCGACTTGA
Y03  0 ACACCGACTCAA
Y09  0 ACACCGACATGT

Total number seqs written 0

James.

leah reshef

Aug 28, 2017, 3:21:37 AM
to Qiime 1 Forum
Are you running split_libraries.py on fastq files? split_libraries.py was intended for 454 data: it expects the sequence information in a fasta file and a quality file of the kind output by 454. In your case, you can see it's throwing all your sequences out because they don't pass the quality filter.

For raw Illumina fastq reads, you want to use the split_libraries_fastq.py command. If you have more than one fastq file, as is the usual case for Illumina, you'll want multiple_split_libraries_fastq.py.
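
When it is unclear which kind of file you were given, the first character of the file is a quick hint: FASTA records start with ">" and FASTQ records start with "@". A minimal sketch with a made-up helper name (it only inspects the first line and does not validate the rest of the file):

```python
def sniff_sequence_format(first_line):
    """Guess 'fasta' or 'fastq' from the first line of a sequence file."""
    if first_line.startswith(">"):
        return "fasta"
    if first_line.startswith("@"):
        return "fastq"
    return "unknown"


# Made-up first lines standing in for real files.
print(sniff_sequence_format(">ICYH0ZI01C2CP5 length=436"))
print(sniff_sequence_format("@M00001:1:000000000-A1B2C:1:1101:1000:2000 1:N:0:1"))
```

With a real file you would read the first line from an open file handle and pass it to the function.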

James

Aug 30, 2017, 8:56:07 AM
to Qiime 1 Forum
Hi Leah,

split_libraries_fastq.py command for multiple fasta files worked well

 Thanks.

James

diypri...@gmail.com

Aug 31, 2017, 4:25:24 AM
to Qiime 1 Forum
Hi,
I am drawing a phylogenetic tree using FigTree and it works well enough. However, the tree is labelled with OTU names like "denovo". I want to replace these with the sample names. Please suggest how I can do this.

Jose

Aug 31, 2017, 8:31:57 AM
to Qiime 1 Forum
Hi,

This question does not seem to be related to this thread. Can you please post a new one with your question and some more details (specify the exact command, errors you are getting, etc.)? That way it will be easier to manage each topic independently.

Thanks,
Jose

diypri...@gmail.com

Aug 31, 2017, 1:31:57 PM
to Qiime 1 Forum
Hi Jose,

My problem is simply that the taxa are named like "denovo123". I want to name them in terms of samples and microorganisms. How can I do that?

Jose

Sep 1, 2017, 5:13:49 PM
to Qiime 1 Forum
Hi,

Please start a new post (not a reply to this, but a separate post) with a more detailed explanation of what your problem is: What commands exactly did you run? What is your output? What is it that you would like to do? It will be easier for us to help you that way.

Thank you,
Jose

diypri...@gmail.com

Sep 2, 2017, 10:16:43 AM
to Qiime 1 Forum
Hi Jose,
I have run the command and now I want to draw a diagram of the tree using FigTree. It works, but the branches are labelled as denovo and I want to label them with sample IDs.

diypri...@gmail.com

Sep 2, 2017, 10:29:05 AM
to Qiime 1 Forum
This is the tree. I now want to view it using FigTree, but while viewing it I face a problem: the branches are labelled as denovo, and I want to label them with sample IDs.

rep_set.tre

TonyWalters

Sep 2, 2017, 10:30:08 AM
to Qiime 1 Forum
Hello,

Will you please go to https://groups.google.com/forum/#!forum/qiime-forum and post a new question (hit the red "new question" button in the top left of the screen) about the tree there?

-Tony