IndexError: list index out of range


Syed najeeb ashraf

May 2, 2017, 1:18:10 PM
to Qiime 1 Forum
I was running core diversity analyses using the command below:
core_diversity_analyses.py -o cdout/ -i pick_open_reference_otus_Step2/otu_table_mc2_w_tax_no_pynast_failures.biom -m temp.tsv -t pick_open_reference_otus_Step2/rep_set.tre -e 1114

*** ERROR RAISED DURING STEP: Make emperor plots, weighted_unifrac)
Command run was:
 make_emperor.py -i cdout//bdiv_even1114//weighted_unifrac_pc.txt -o cdout//bdiv_even1114//weighted_unifrac_emperor_pcoa_plot/ -m temp.tsv
Command returned exit status: 1
Stdout:

Stderr:
Traceback (most recent call last):
  File "/gpfs/home/nsyed/anaconda3/envs/qiime1/bin/make_emperor.py", line 643, in <module>
    main()
  File "/gpfs/home/nsyed/anaconda3/envs/qiime1/bin/make_emperor.py", line 468, in main
    sids_intersection, include_repeat_cols=True)
  File "/gpfs/home/nsyed/anaconda3/envs/qiime1/lib/python2.7/site-packages/emperor/qiime_backports/filter.py", line 113, in filter_mapping_file
    headers.append(map_header[i+1])
IndexError: list index out of range

I have run the validate_mapping_file.py script and it did show errors with my TSV file, which was pretty much expected. So what else might be the issue? Any ideas?

And another problem: I am running closed-reference OTU picking with 4 parallel jobs, and it has been running for 4 days without appearing to do anything. What might be the problem? Should I kill the jobs and run again? Something to do with the parallel option? The seqs.fna file is around 30 GB.

Jai Ram Rideout

May 2, 2017, 4:48:30 PM
to Qiime 1 Forum
Hello,

You noted there are errors in your mapping file identified by validate_mapping_file.py. What types of errors are you receiving? If you see any errors related to the #SampleID column or Description column you'll need to fix those (possibly others).

Additionally, make sure that the sample IDs in your mapping file match up with the sample IDs in the OTU table. It's possible all data is being filtered out if none of the IDs match.
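For example, here is a quick way to compare the two sets of IDs (a minimal sketch, assuming the biom-format 2.x Python API that ships with QIIME 1; substitute your own file paths):

# Compare mapping-file sample IDs with OTU-table sample IDs.
from biom import load_table

table = load_table('pick_open_reference_otus_Step2/otu_table_mc2_w_tax_no_pynast_failures.biom')
otu_ids = set(table.ids(axis='sample'))

# Sample IDs are in the first column of the tab-delimited mapping file;
# the header and any comment lines start with '#'.
with open('temp.tsv') as f:
    map_ids = set(line.split('\t')[0] for line in f
                  if line.strip() and not line.startswith('#'))

print('In mapping file but not OTU table: %s' % sorted(map_ids - otu_ids))
print('In OTU table but not mapping file: %s' % sorted(otu_ids - map_ids))

Any IDs showing up in either list indicate a mismatch that would cause those samples to be filtered out.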

You're using -e 1114, which is the even sampling depth used in the Moving Pictures tutorial. You probably don't want to use this number, as an appropriate even sampling depth will vary for every dataset/study. Use biom summarize-table to see how many sequences you have in each sample in order to choose a reasonable sampling depth. It is possible all of your samples are being dropped due to the sampling depth you're using.
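For example (with the OTU table path from your command above):

biom summarize-table -i pick_open_reference_otus_Step2/otu_table_mc2_w_tax_no_pynast_failures.biom -o table_summary.txt

The per-sample counts written to table_summary.txt are what you would base the sampling depth on.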

And another problem: I am running closed-reference OTU picking with 4 parallel jobs, and it has been running for 4 days without appearing to do anything. What might be the problem? Should I kill the jobs and run again? Something to do with the parallel option? The seqs.fna file is around 30 GB.

It may still be running; you can use top or htop to see whether the process is using CPU or memory. Even with 4 parallel jobs, OTU picking can take quite some time. You can also check the output directory to see if it contains any files (the log file may be helpful for seeing which step the workflow is at). In the log file, verify that 4 parallel jobs are being supplied to the workflow (this is noted at the top of the log file).
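For example (a sketch; substitute your actual output directory for the hypothetical closed_ref_otus/):

top -u $USER                    # look for python/uclust workers consuming CPU
ls -lR closed_ref_otus/         # see what the workflow has written so far
tail closed_ref_otus/log_*.txt  # the last lines show the current step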

If the process seems to have stalled (i.e. not consuming CPU or memory), please send me the exact command you're running, the output from print_qiime_config.py -t, and the contents of the output directory.

Best,
Jai

Syed najeeb ashraf

May 4, 2017, 2:37:33 PM
to Qiime 1 Forum
I got the error below from this command:
 validate_mapping_file.py -m temp.tsv
Errors and/or warnings detected in the mapping file.  Please check the log and html file for details.

In fact, I tried using the corrected mapping file produced by that script as well, but I still got the same problem.

Syed najeeb ashraf

May 4, 2017, 3:11:55 PM
to Qiime 1 Forum
This is the output of the summarize command. Can you suggest the optimum sampling depth in this case?

biom summarize-table -i closed_open_refrences_Step2/otu_table.biom
Num samples: 7
Num observations: 2211
Total count: 5763607
Table density (fraction of non-zero values): 0.253

Counts/sample summary:
 Min: 2939.0
 Max: 3342885.0
 Median: 8123.000
 Mean: 823372.429
 Std. dev.: 1317124.248
 Sample Metadata Categories: None provided
 Observation Metadata Categories: taxonomy

Counts/sample detail:
200076691V4: 2939.0
200076751V4: 3116.0
200076701V4: 5195.0
8100080231V4: 8123.0
8100080241V4: 10910.0
8100080241V1-V3: 2390439.0
200076701V1-V3: 3342885.0

Syed najeeb ashraf

May 4, 2017, 3:51:55 PM
to Qiime 1 Forum
I am attaching my mapping file as well.

I ran core diversity analyses again and got the warning below:
core_diversity_analyses.py -o closed_open_refrences_Step3/ -i closed_open_refrences_Step2/otu_table.biom -m temp.tsv_corrected.txt -t closed_open_refrences_Step2/97_otus.tree -e 2300
/gpfs/home/nsyed/anaconda3/envs/qiime1/lib/python2.7/site-packages/skbio/stats/ordination/_principal_coordinate_analysis.py:107: RuntimeWarning: The result contains negative eigenvalues. Please compare their magnitude with the magnitude of some of the largest positive eigenvalues. If the negative ones are smaller, it's probably safe to ignore them, but if they are large in magnitude, the results won't be useful. See the Notes section for more details. The smallest eigenvalue is -0.000681506979748 and the largest is 0.576738476103.

I am getting the same error again.

*** ERROR RAISED DURING STEP: Make emperor plots, weighted_unifrac)
Command run was:
 make_emperor.py -i closed_open_refrences_Step3//bdiv_even2300//weighted_unifrac_pc.txt -o closed_open_refrences_Step3//bdiv_even2300//weighted_unifrac_emperor_pcoa_plot/ -m temp.tsv_corrected.txt
Command returned exit status: 1
Stdout:

Stderr:
Traceback (most recent call last):
  File "/gpfs/home/nsyed/anaconda3/envs/qiime1/bin/make_emperor.py", line 643, in <module>
    main()
  File "/gpfs/home/nsyed/anaconda3/envs/qiime1/bin/make_emperor.py", line 468, in main
    sids_intersection, include_repeat_cols=True)
  File "/gpfs/home/nsyed/anaconda3/envs/qiime1/lib/python2.7/site-packages/emperor/qiime_backports/filter.py", line 115, in filter_mapping_file

(Attachment: temp.tsv_corrected.txt)

Greg Caporaso

May 5, 2017, 3:24:57 PM
to Qiime 1 Forum
Hello,
The validate_mapping_file.py script will correct what it can in your mapping file, but sometimes there are errors that it can't correct. In this case it looks like you're missing data in the BarcodeSequence and LinkerPrimerSequence columns. validate_mapping_file.py isn't able to provide that information for you, which is why when you re-run validate_mapping_file.py on the "corrected" file, it is still generating errors. 

Since it seems like you're not using QIIME for demultiplexing, you can delete those two columns from the mapping file and try re-running core_diversity_analyses.py. validate_mapping_file.py will still give you an error about those columns being missing, but in this case it can be safely ignored. 
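If you'd rather script the column removal than edit the file by hand, here is a minimal sketch in plain Python (it assumes the exact header names BarcodeSequence and LinkerPrimerSequence, and the output file name map_no_barcodes.txt is just a placeholder):

# Drop the BarcodeSequence and LinkerPrimerSequence columns from a
# tab-delimited QIIME mapping file.
import csv

with open('temp.tsv_corrected.txt') as fin:
    rows = list(csv.reader(fin, delimiter='\t'))

header = rows[0]
drop = [i for i, name in enumerate(header)
        if name in ('BarcodeSequence', 'LinkerPrimerSequence')]

with open('map_no_barcodes.txt', 'w') as fout:
    writer = csv.writer(fout, delimiter='\t', lineterminator='\n')
    for row in rows:
        # Only filter rows that have the full set of columns; leave any
        # short comment lines untouched.
        if len(row) == len(header):
            row = [v for i, v in enumerate(row) if i not in drop]
        writer.writerow(row)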

In response to your other questions:

RuntimeWarning: The result contains negative eigenvalues. Please compare their magnitude with the magnitude of some of the largest positive eigenvalues. If the negative ones are smaller, it's probably safe to ignore them, but if they are large in magnitude, the results won't be useful. See the Notes section for more details. The smallest eigenvalue is -0.000681506979748 and the largest is 0.576738476103.

This warning can be safely ignored (see the text in the warning message).  

Can you suggest the optimum sampling depth in this case?

 There isn't a great choice here, since you have such a large range of values, but since there are only seven samples I would probably go with 2939 to retain all of the samples. 
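Based on the command you posted earlier, that would look something like this (the output directory name here is just a placeholder):

core_diversity_analyses.py -o closed_open_refrences_Step3_even2939/ -i closed_open_refrences_Step2/otu_table.biom -m temp.tsv_corrected.txt -t closed_open_refrences_Step2/97_otus.tree -e 2939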

I notice another potential issue with your analysis. It looks like some of your sequences are V4 and some are V1-V3, but you're using pick_open_reference_otus.py. If your reads were not all generated from the same primer pair, you need to use pick_closed_reference_otus.py. This is discussed in the documentation here.
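A closed-reference run would look something like this (a sketch: it assumes the default reference database configured in your QIIME install; otherwise add -r/-t to point at your reference sequences and taxonomy):

pick_closed_reference_otus.py -i seqs.fna -o closed_ref_otus/ -a -O 4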

Best,
Greg


Syed najeeb ashraf

May 5, 2017, 5:49:12 PM
to Qiime 1 Forum
Thanks a lot, Greg.

Your suggestion to remove the BarcodeSequence and LinkerPrimerSequence columns worked.

Actually, I followed this link: http://qiime.org/documentation/file_formats.html. It says: "In some circumstances, users may need to generate a mapping file that does not contain barcodes and/or primers. To generate such a mapping file, fields for “BarcodeSequence” and “LinkerPrimerSequence” can be left empty. An example of such a file is below (note that the tabs are still present for the empty “BarcodeSequence” and “LinkerPrimerSequence” fields):" So I just left those fields empty (tabs only), but it did not work for me. I think I tried this empty-tab option earlier and it worked, but it doesn't work anymore.
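The layout it describes looks like this (a reconstruction; the columns are separated by literal tab characters, so the empty BarcodeSequence and LinkerPrimerSequence fields are just consecutive tabs, and the sample row is hypothetical):

#SampleID	BarcodeSequence	LinkerPrimerSequence	Description
sample1			sample1_description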

Yes, I am using closed-reference OTU picking only.
Thanks

Syed najeeb ashraf

May 6, 2017, 7:29:48 AM
to Qiime 1 Forum
I have a few more silly questions:

1. Since I have already used pick_closed_reference_otus.py, I don't need to run the assign taxonomy step again, right?
2. My fastq data showed a high level of duplication, around 90-95%, so is it OK if I remove duplicate reads? I read somewhere that removing duplicate reads will affect the downstream analysis. What's your suggestion?


Thanks for the wonderful help!

Thanks
Najeeb


Greg Caporaso

May 8, 2017, 6:20:43 PM
to Qiime 1 Forum
Hello,

1. Since I have already used pick_closed_reference_otus.py, I don't need to run the assign taxonomy step again, right?

That is correct. 

2. My fastq data showed a high level of duplication, around 90-95%, so is it OK if I remove duplicate reads? I read somewhere that removing duplicate reads will affect the downstream analysis. What's your suggestion?

I'm not certain that I understand the question. It sounds like you're saying that 90-95% of the reads in your fastq files are duplicates of other observed sequences, and you're asking if you should therefore remove them. That information will be important for downstream analyses that pay attention to OTU abundance, so you shouldn't try to remove those reads from your fastq file. Those will get collapsed into OTUs (each of which will have a single representative sequence) in QIIME, so downstream steps won't need to, for example, perform taxonomy assignment on a lot of duplicated sequence reads.  
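If you just want to quantify the duplication rather than remove it, here is a minimal sketch (the seqs.fna path is a placeholder for your post-split-libraries FASTA file):

# Count exact-duplicate reads in a FASTA file (records may span lines).
from collections import Counter

counts = Counter()
with open('seqs.fna') as f:
    seq = []
    for line in f:
        if line.startswith('>'):
            if seq:
                counts[''.join(seq)] += 1
            seq = []
        else:
            seq.append(line.strip())
    if seq:
        counts[''.join(seq)] += 1

total = sum(counts.values())
print('total reads: %d' % total)
print('unique sequences: %d' % len(counts))
print('duplication rate: %.1f%%' % (100.0 * (1 - float(len(counts)) / total)))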
 
