Creating mapping file from fastq files

zillur rahman

Nov 5, 2016, 4:22:24 PM
to Qiime 1 Forum
Hi there,

I have fastq files from 61 samples. Can I create a mapping file from these fastq files? Thanks.

Best Regards
Zillur

Colin Brislawn

Nov 5, 2016, 8:04:21 PM
to Qiime 1 Forum
Hello Zillur,

Are you trying to demultiplex these 61 samples into a qiime compatible format? I think this qiime script might be a good way to demultiplex: 

After using this script, you will have valid qiime sample names based on the file (or folder) names of those 61 samples. You can still put these sample names into a metadata file. See this guide and example: http://qiime.org/documentation/file_formats.html#mapping-file-overview 

Of course, your mapping file will not have info in the barcode and primer columns, but that's OK! All qiime scripts will still work, and you can even validate the file by passing the -b and -p flags to validate_mapping_file.py.

Let me know if that helps, and tell me if you have other questions,
Colin

zillur rahman

Nov 7, 2016, 5:41:44 PM
to Qiime 1 Forum
Thank you very much for your reply. I used the multiple_split_libraries_fastq.py script. It gave me a fasta file, split_library_log.txt, log_29358729384.txt and histogram.txt as output. But I don't have any information about barcodes and primers. So how can I generate a mapping file? I have a file (attached). How can I modify this file to create a mapping file? I tried to run validate_mapping_file.py, but it always gives me an error.

Best Regards
Zillur
Lopez16SRun.xlsx

Colin Brislawn

Nov 7, 2016, 6:08:27 PM
to Qiime 1 Forum
Hello Zillur,

You will have to build a mapping file by hand. This can be a tricky process, but you can do it! (We are always here to help.)

This page shows what all the columns mean, in detail: http://qiime.org/documentation/file_formats.html#mapping-file-overview

Basically, you will process your xlsx file into that format, and your #SampleID values will be the names listed in the split_library_log.txt file.

For the columns called BarcodeSequence and LinkerPrimerSequence, you can just leave them blank, then pass the -p and -b flags when running validate_mapping_file.py. 
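For anyone following along, here is a minimal sketch of what "building it by hand" could look like in plain Python (no qiime imports). The function name, the "Treatment" column, and the "NA" placeholder are made up for illustration; only the #SampleID, BarcodeSequence, LinkerPrimerSequence and Description headers come from the file format page linked above. Treat it as a starting point, not the official way to do this.

```python
import re

# Header fields described on the QIIME 1 file formats page.
# Description must be the last column.
REQUIRED_START = ["#SampleID", "BarcodeSequence", "LinkerPrimerSequence"]


def build_mapping_file(sample_ids, metadata=None, extra_columns=()):
    """Return mapping-file text with blank barcode and primer columns.

    sample_ids    -- sample names, e.g. copied from split_library_log.txt
    metadata      -- optional dict: sample ID -> dict of column values
    extra_columns -- hypothetical metadata column names, e.g. ("Treatment",)
    """
    metadata = metadata or {}
    header = REQUIRED_START + list(extra_columns) + ["Description"]
    lines = ["\t".join(header)]
    for sid in sample_ids:
        # Sample IDs may contain only letters, digits, and periods.
        if not re.match(r"[A-Za-z0-9.]+\Z", sid):
            raise ValueError("invalid sample ID: %r" % sid)
        row_meta = metadata.get(sid, {})
        # "NA" for missing values is an assumption, not a QIIME requirement.
        extras = [row_meta.get(col, "NA") for col in extra_columns]
        description = row_meta.get("Description", sid)
        # Barcode and primer fields are deliberately left empty.
        lines.append("\t".join([sid, "", ""] + extras + [description]))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = build_mapping_file(
        ["Sample.1", "Sample.2"],
        metadata={"Sample.1": {"Treatment": "control"}},
        extra_columns=("Treatment",),
    )
    print(text)
```

Save the returned text to a .txt file and run validate_mapping_file.py on it with -p and -b, since the barcode and primer columns are empty on purpose.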

Let me know if you have questions along the way,
Colin

zillur rahman

Nov 7, 2016, 7:05:33 PM
to Qiime 1 Forum
Thank you very much for your kind reply. I ran validate_mapping_file.py, got the attached file. But if I want to run split_libraries.py with this mapping file it shows me that my mapping file is not correct. What should I do now?

Best Regards
Zillur
mapping_file_corrected.txt

Colin Brislawn

Nov 7, 2016, 8:28:05 PM
to Qiime 1 Forum
Hello Zillur,

Because you have already used multiple_split_libraries_fastq.py on your fastq files, I don't think you need to run split_libraries.py also. (split_libraries.py is used for fasta files, maybe from the old 454 sequencing platform.)

Looks like your mapping file is off to a good start. You can also add metadata like treatment and patient number to that file so you can compare treatments or patients (as examples).

So validate_mapping_file.py is the next step. When you ran this script, did you pass the -p and -b flags as I suggested? What warnings and errors did it show you?

Colin

zillur rahman

Nov 7, 2016, 8:36:14 PM
to Qiime 1 Forum
Thank you very much. I ran validate_mapping_file.py passing -p and -b flags and it gave me a corrected mapping file (previously attached) and a log file (herewith)

I want to detect alpha and beta diversity. Suggestions ??? Thanks again.

Best Regards
Zillur
mapping_file.log

Colin Brislawn

Nov 7, 2016, 8:57:18 PM
to Qiime 1 Forum
OK great. That log file looks ok. It is still showing an error, but that just has to do with the lack of barcodes.

> I want to detect alpha and beta diversity. Suggestions ??? Thanks again.

This script will calculate alpha diversity metrics for each of your samples:

This is one of the beta diversity workflow scripts:

Zillur, can you tell me more about your biological question? What are you hoping to discover with this experiment?

Colin

zillur rahman

Nov 9, 2016, 11:45:29 AM
to Qiime 1 Forum
Thank you very much. I was trying to run these scripts. I got this error report (attached):
What should I do now?

Best regards
Zillur 
log_20161108155824.txt

Colin Brislawn

Nov 9, 2016, 12:42:02 PM
to Qiime 1 Forum
Good morning Zillur,

Did you look through your log file to find the error? I found the error near the bottom of the file:
ValueError: max() arg is an empty sequence

I then searched the qiime forum for that error, and found this thread:

Does that seem like it could be your problem? I remembered that you just made your mapping file, so maybe the sample names don't match up.

Colin

zillur rahman

Nov 9, 2016, 6:06:14 PM
to Qiime 1 Forum
Thank you very much. Maybe I have the same problem. But how can I resolve this? My out_table.biom is like 93MB and I can't read it (google drive link). What should I do now? 

Best Regards
Zillur

Colin Brislawn

Nov 9, 2016, 6:09:16 PM
to Qiime 1 Forum
Hello Zillur,

You can still get the sample names from your biom file using this command.
biom summarize-table -i rich_sparse_otu_table.biom -o rich_sparse_otu_table_summary.txt

That output file will list all your sample names, so that you can check if they are the same as the ones in your mapping file.
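A minimal sketch of that comparison in plain Python, assuming you have copied the sample names out of the summary file into a list. The function names and the example IDs are made up for illustration and are not part of qiime or biom.

```python
def mapping_sample_ids(mapping_text):
    """Collect sample IDs (first tab-separated field) from mapping-file text."""
    ids = []
    for line in mapping_text.splitlines():
        # Skip blank lines, the #SampleID header, and any comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        ids.append(line.split("\t")[0].strip())
    return ids


def compare_sample_ids(mapping_ids, table_ids):
    """Report which sample IDs are shared and which appear on one side only."""
    mapping_ids, table_ids = set(mapping_ids), set(table_ids)
    return {
        "only_in_mapping": sorted(mapping_ids - table_ids),
        "only_in_table": sorted(table_ids - mapping_ids),
        "shared": sorted(mapping_ids & table_ids),
    }


if __name__ == "__main__":
    mapping_text = (
        "#SampleID\tBarcodeSequence\tLinkerPrimerSequence\tDescription\n"
        "Sample.1\t\t\tfirst\n"
        "Sample.2\t\t\tsecond\n"
    )
    # Hypothetical names copied from the biom summary file.
    table_ids = ["Sample.1", "Sample2"]
    report = compare_sample_ids(mapping_sample_ids(mapping_text), table_ids)
    for key in sorted(report):
        print(key, report[key])
```

If "shared" comes back empty, none of the names match, which could explain an error like the one above when a script tries to use the mapping file together with the table.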

Colin

zillur rahman

Nov 9, 2016, 7:10:24 PM
to Qiime 1 Forum
Thank you very much. I have changed the sample names. Now I have another problem. The script (alpha_rarefaction.py) asks me for a reference tree (value error: no phylogenetic tree supplied). How can I get a tree for my samples? I have tried make_phylogeny.py, but it requires all the sequences to be the same length. So I tried to use align_seqs.py with muscle, but it gives me "Segmentation fault (core dumped)". What should I do now?

Best Regards
Zillur

Colin Brislawn

Nov 10, 2016, 11:22:23 AM
to Qiime 1 Forum
Hello Zillur,

Thanks for sticking with me on this. I'm glad you got the sample names figured out, and think we are making real progress!

For specific alpha and beta diversity metrics, you will need that phylogenetic tree. These are faith_PD for alpha div, and UniFrac for beta div. If you are comfortable using other metrics (not faith_PD and UniFrac), you don't need to make this tree at all.

Did you perform closed-ref or open-ref OTU picking? If you do want to use these metrics, and you used closed-ref OTU picking, you can pass the greengenes tree. I can show you where to find that in your qiime distribution, or just email you a copy. 

If you used open-ref OTU picking, we will have to make the tree, using these three scripts:
(You are on the right track by trying make_phylogeny.py! The other two scripts, run beforehand, will make your reads the same length.)


Thanks for staying with me as we solve these problems. Sorry that qiime has had so many problems. I really appreciate your dedication as we fix them.
Keep in touch,
Colin

zillur rahman

Nov 10, 2016, 12:40:15 PM
to Qiime 1 Forum
Thank you very much for helping me all the way. Sorry for sending too many messages. I just got the phylogenetic tree (attached). Now I realize I don't actually need to create the phylogeny myself; the pick_otus.py script does this for me (see the attached log)! Am I on the right track? Now I need to put organism names instead of denovo .... How can I do this? And how can I add colors to the tree? Thank you again. I appreciate it.

Best Regards
Zillur
log_20161107192136.txt
phylo.tre.jpg

Colin Brislawn

Nov 10, 2016, 1:27:52 PM
to Qiime 1 Forum
Hello Zillur,

We are definitely on the right track! Glad you found the tree created by pick_otus.py. Sometimes that script does not make the tree, so I'm glad it works for you.

Now that you have the tree, you can pass it to the diversity scripts to get UniFrac distances. I'm a big fan of weighted UniFrac + PCoA visualized in Emperor. You get cool looking 3D ordinations of samples.

Qiime does not have many commands for graphing trees. I'm not sure it's able to change colors based on taxonomy, or relabel the tips using the taxonomy given in the .biom table. I make trees in R using the Phyloseq package, but that's fully outside of qiime...


Keep up the good work!
Colin

zillur rahman

Nov 10, 2016, 1:53:39 PM
to Qiime 1 Forum
Thank you very much. I tried to run diversity scripts. For alpha I got some plots (too many, which one I need? attached) and Warning: "element wise comparison failed; returning scalar instead, but in the future will perform element wise comparison". What should I do? For beta I got few files but no plot "The result contains negative eigenvalues.... (added at the bottom of the log file.)". What should I do now? How can I put organism name on the tip of the tree? Thanks a lot again.

Best Regards
Zillur 
output_div.zip

Colin Brislawn

Nov 10, 2016, 2:36:05 PM
to Qiime 1 Forum
Hello Zillur,

Lots of good stuff here.

> For alpha I got some plots (too many, which one I need? attached)
Alpha diversity metrics are different ways of estimating 'how many different microbes are here?' Take a look at this page for an overview:
The different plots estimate it in different ways.
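To make "different ways" a bit more concrete, here is a small sketch of two common alpha diversity metrics computed by hand in plain Python from one sample's OTU counts. The function names and the example counts are made up, and the log base of 2 for Shannon is an assumption about what qiime reports, so compare against your own output before relying on it.

```python
import math


def observed_otus(counts):
    """Richness: the number of OTUs seen at least once in the sample."""
    return sum(1 for c in counts if c > 0)


def shannon(counts, base=2):
    """Shannon index: H = -sum(p_i * log(p_i)) over OTUs with p_i > 0.

    Unlike richness, this also reflects how evenly reads are spread
    across the OTUs. The default base of 2 is an assumption.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / float(total)
            h -= p * math.log(p, base)
    return h


if __name__ == "__main__":
    even = [10, 10, 10, 10]   # four OTUs, evenly abundant
    skewed = [37, 1, 1, 1]    # four OTUs, one dominant
    for name, counts in (("even", even), ("skewed", skewed)):
        print(name, observed_otus(counts), round(shannon(counts), 3))
```

Both example samples contain four OTUs, so observed OTUs cannot tell them apart, while Shannon is lower for the skewed one. That difference is why the plots for different metrics do not look the same.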
 
> Warning: "element wise comparison failed; returning scalar instead, but in the future will perform element wise comparison". What should I do?
That's a warning (not an error), so you can ignore it.

> For beta I got few files but no plot "The result contains negative eigenvalues.... (added at the bottom of the log file.)".
That's another warning. As long as the negative eigenvalues are much smaller than the largest eigenvalues, we are OK.
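One rough way to check "much smaller" is to compare the size of the most negative eigenvalue with the largest one. This is only a sketch in plain Python: the function name and the example numbers are made up, and no particular cutoff comes from qiime, so the ratio is something to judge yourself.

```python
def negative_eigenvalue_ratio(eigenvalues):
    """Return |most negative eigenvalue| / largest eigenvalue.

    Returns 0.0 when no eigenvalue is negative. A ratio close to 0 means
    the negative eigenvalues are tiny next to the main axes.
    """
    largest = max(eigenvalues)
    most_negative = min(eigenvalues)
    if largest <= 0:
        raise ValueError("expected at least one positive eigenvalue")
    if most_negative >= 0:
        return 0.0
    return abs(most_negative) / largest


if __name__ == "__main__":
    # Hypothetical eigenvalues copied by hand from a PCoA output file.
    eigvals = [0.52, 0.21, 0.08, 0.0, -0.004]
    print(negative_eigenvalue_ratio(eigvals))
```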

> How can I put organism name on the tip of the tree?
I don't know how to do that in qiime :-(


The next step depends on the biological question. What was the goal of this study?
Colin
 

zillur rahman

Nov 10, 2016, 2:58:15 PM
to Qiime 1 Forum
Thank you very much for your suggestions. I didn't get any plot from the beta_diversity_through_plots.py script. What can I do?

Best Regards
Zillur

Colin Brislawn

Nov 10, 2016, 4:57:34 PM
to Qiime 1 Forum
The output of that script is an Emperor graph, an interactive 3D graph in the file index.html.

Did you get an index.html file when you ran that script? 



Colin

zillur rahman

Nov 10, 2016, 6:30:08 PM
to Qiime 1 Forum
Thank you very much. I got everything!! I am fine for now. I am grateful for your cooperation. I will write if anything comes up. Thanks a lot again.

Best Regards
Zillur 

Colin Brislawn

Nov 10, 2016, 7:56:35 PM
to Qiime 1 Forum
Great! Thanks Zillur. We are always here to help.
Colin

chuanwu wang

Nov 15, 2016, 10:05:54 PM
to Qiime 1 Forum
Hi Colin,
Do you know whether I can combine the forward barcode and the reverse barcode together in the second column (the one whose header must be “BarcodeSequence”), as listed at this link? http://qiime.org/documentation/file_formats.html

Colin Brislawn

Nov 16, 2016, 1:37:48 PM
to Qiime 1 Forum
Hello Chuanwu,

I'm not sure how to work with dual-indexed reads. I think Zech can help you over in the thread you posted.

Colin
