filter negative controls after denoise_wrapper.py

jespenshade

Sep 21, 2012, 5:52:19 PM
to qiime...@googlegroups.com
Hello,

I'd like to know if there is a way to filter all the OTUs present in my negative controls out of the other samples, as if I were subtracting background noise. I've run denoise_wrapper.py on several files and inflated them all into one file.

Thanks!

Jordan

Jose Carlos Clemente

Sep 21, 2012, 6:09:17 PM
to qiime...@googlegroups.com
Jordan,

do you mean removing OTUs found in your controls from the other
samples? For instance, if control_1 has OTU X, you'd want to remove X
from all samples. If this is what you are trying to do, you could:

1) get the IDs of the OTUs found in your controls and write them to a
list. I don't think we have a script that does this easily, but here
is one way to do it:

+ filter all non-control samples from the OTU table:
filter_samples_from_otu_table.py -i otu_table.biom -o otu_table.control_only.biom -m MappingFile.txt -s Treatment:NoControl
(make sure your mapping file contains a 'Treatment' column that
distinguishes controls from non-controls)
+ convert the resulting biom file to 'classic' format:
convert_biom.py -i otu_table.control_only.biom -o otu_table.control_only.txt -b
+ get the OTU ids found in controls:
awk '{print $1}' otu_table.control_only.txt | tail +3 > control_otu_ids.txt

2) remove the OTUs found in controls:
filter_otus_from_otu_table.py -i otu_table.biom -e control_otu_ids.txt -o otu_table_filtered.biom

This is kinda complicated, but I think it should work. If others have
better ideas, please share.

Jose
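For readers who want the logic behind steps 1) and 2) spelled out, here is a minimal Python sketch that works on the tab-separated 'classic' table written by convert_biom.py -b. It only puts an OTU on the exclusion list if that OTU has a nonzero count in at least one control sample. The awk step above prints the ID from every row, so any zero-count rows left in the control-only table would be excluded as well. The function names, the sample names (Neg1, S1, S2) and the example table are hypothetical, and the file layout (comment lines starting with '#', a '#OTU ID' header, an optional trailing taxonomy column) is an assumption about the classic format, not something stated in this thread.

```python
# Sketch only: hypothetical helpers mirroring steps 1) and 2) on a classic table.
def parse_classic_table(text):
    """Parse a tab-separated 'classic' OTU table.

    Returns (sample_ids, counts) where counts maps OTU ID -> list of floats,
    one per sample. Assumes '#'-prefixed comment lines, a header starting with
    '#OTU ID', and an optional last column named 'Consensus Lineage' or
    'taxonomy' (an assumption), which is ignored.
    """
    sample_ids, counts = None, {}
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        if line.startswith("#OTU ID"):
            sample_ids = fields[1:]
            if sample_ids and sample_ids[-1] in ("Consensus Lineage", "taxonomy"):
                sample_ids = sample_ids[:-1]
            continue
        if line.startswith("#"):
            continue  # e.g. '# Constructed from biom file'
        if sample_ids is None:
            raise ValueError("no '#OTU ID' header line before the data rows")
        counts[fields[0]] = [float(x) for x in fields[1:1 + len(sample_ids)]]
    return sample_ids, counts


def remove_control_otus(text, control_samples):
    """Drop every OTU with a count > 0 in at least one control sample.

    Returns (kept, removed): kept maps OTU ID -> counts, removed is a set of IDs.
    """
    sample_ids, counts = parse_classic_table(text)
    # list.index raises ValueError if a control sample name is misspelled
    idx = [sample_ids.index(s) for s in control_samples]
    removed = {otu for otu, row in counts.items() if any(row[i] > 0 for i in idx)}
    kept = {otu: row for otu, row in counts.items() if otu not in removed}
    return kept, removed


TABLE = (
    "# Constructed from biom file\n"
    "#OTU ID\tNeg1\tS1\tS2\n"
    "0\t3.0\t10.0\t0.0\n"
    "1\t0.0\t5.0\t7.0\n"
    "2\t0.0\t0.0\t4.0\n"
)
kept, removed = remove_control_otus(TABLE, ["Neg1"])
print(sorted(removed), sorted(kept))  # ['0'] ['1', '2']
```

On real data you would read the classic table into a string and write the IDs in `removed` one per line, which matches the one-ID-per-line file the awk step produces.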

jespenshade

Sep 26, 2012, 3:55:53 PM
to qiime...@googlegroups.com
Jose,

I finally got back to this question and have been following your instructions; however, I'm stuck at getting the OTU ids found in controls. Here is the command and result:

qiime@qiime-VirtualBox:~/Desktop/16_Stability/Inflated/Oral_Combined$ awk '{print $1}' otu_table.control_only.txt | tail +3 > control_otu_ids.txt
tail: cannot open `+3' for reading: No such file or directory

Should I have a file/directory called +3, or did I enter the command incorrectly?

Jordan

Jose Carlos Clemente

Sep 26, 2012, 4:35:21 PM
to qiime...@googlegroups.com
Jordan,

sorry, there was a typo in my reply. It should be

tail -n +3

Jose
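The corrected pipeline prints the first field of every line and then starts output at line 3; without -n, the tail in this VM treated "+3" as a file name, as the error above shows. Starting at line 3 skips the two header lines of a classic table. As a sketch of what the pipeline does, here is a rough Python equivalent (the function name is hypothetical). Note that it prints every data row's ID regardless of that row's counts.

```python
def awk_first_field_tail(text, start_line=3):
    """Rough Python equivalent of: awk '{print $1}' FILE | tail -n +START_LINE

    awk prints the first whitespace-separated field of every line (an empty
    line if there is none); tail -n +3 then starts output at line 3.
    """
    out = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if lineno < start_line:
            continue  # skip what tail -n +START_LINE would skip
        fields = line.split()
        out.append(fields[0] if fields else "")
    return out


TABLE = "# Constructed from biom file\n#OTU ID\tNeg1\tS1\n0\t3.0\t10.0\n1\t0.0\t5.0\n"
print(awk_first_field_tail(TABLE))  # ['0', '1']
```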

jespenshade

Sep 26, 2012, 8:03:16 PM
to qiime...@googlegroups.com
This is how it ended up:

qiime@qiime-VirtualBox:~/Desktop/16_Stability/Inflated/Oral_Combined$ filter_otus_from_otu_table.py -i otu_table_no_pynast_failures.biom -e control_otu_ids.txt -o oral_otu_table_filtered.biom
Traceback (most recent call last):
  File "/home/qiime/qiime_software/qiime-1.5.0-release/bin/filter_otus_from_otu_table.py", line 120, in <module>
    main()
  File "/home/qiime/qiime_software/qiime-1.5.0-release/bin/filter_otus_from_otu_table.py", line 115, in main
    negate_ids_to_exclude)
  File "/home/qiime/qiime_software/qiime-1.5.0-release/lib/qiime/filter.py", line 229, in filter_otus_from_otu_table
    return otu_table.filterObservations(filter_f)
  File "/home/qiime/qiime_software/biom-format-0.9.3-release/python-code/biom/table.py", line 790, in filterObservations
    raise TableException, "All obs filtered out!"
biom.exception.TableException: All obs filtered out!

The oral_otu_table_filtered.biom file is empty. Is it more likely that I've done something wrong or that this approach won't work?

Thanks!

Jordan
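The last line of the traceback says every OTU in the input table matched the exclusion list given with -e, so there was nothing left to write. Before rerunning the filter, it can help to compare the exclusion list against the table's OTU IDs. A minimal sketch, assuming both lists have already been read into Python as strings; the function name and summary keys are hypothetical:

```python
def check_exclusion_list(table_otu_ids, excluded_ids):
    """Summarize what an exclusion list would do to an OTU table.

    Blank entries are ignored; IDs that do not occur in the table are
    reported separately because they have no effect on the filtering.
    """
    table_ids = set(table_otu_ids)
    excluded = {i.strip() for i in excluded_ids if i.strip()}
    kept = table_ids - excluded
    return {
        "table": len(table_ids),
        "excluded": len(table_ids & excluded),
        "kept": len(kept),
        "not_in_table": sorted(excluded - table_ids),
    }


summary = check_exclusion_list(["0", "1", "2"], ["0", "1", "2", ""])
print(summary)
if summary["kept"] == 0:
    print("every OTU is on the exclusion list; the filtered table would be empty")
```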

Jose Carlos Clemente

Sep 27, 2012, 10:48:03 AM
to qiime...@googlegroups.com
Jordan,

let's try checking intermediate results: can you please post the
results of these commands?

wc otu_table.control_only.txt
wc control_otu_ids.txt

Thanks,
Jose

jespenshade

Sep 27, 2012, 4:59:59 PM
to qiime...@googlegroups.com
Jose,

296 13326 54102 otu_table.control_only.txt
295  295 1070 control_otu_ids.txt

- Jordan

Jose Carlos Clemente

Sep 27, 2012, 5:09:29 PM
to qiime...@googlegroups.com
Ok, if you run per_library_stats.py -i otu_table.biom, how many OTUs do
you have in total? Maybe what is happening is that your controls contain
every OTU in the table...

jespenshade

Sep 28, 2012, 4:13:26 PM
to qiime...@googlegroups.com
qiime@qiime-VirtualBox:~/Desktop/16_Stability/Inflated/Oral_Combined$ per_library_stats.py -i otu_table_no_pynast_failures.biom
Num samples: 48
Num otus: 295
Num observations (sequences): 255493.0

Seqs/sample summary:
 Min: 21.0
 Max: 25250.0
 Median: 3480.0
 Mean: 5322.77083333
 Std. dev.: 6161.00814004
 Median Absolute Deviation: 2786.0
 Default even sampling depth in
  core_qiime_analyses.py (just a suggestion): 21.0
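"Num otus: 295" matches the 295 lines in control_otu_ids.txt, which fits Jose's hypothesis. Line counts alone, however, cannot tell "the controls really contain every OTU" apart from "the ID list includes OTUs with zero counts in the controls". One way to distinguish the two is to count, per group, the OTUs actually observed (count > 0). A minimal sketch on a hypothetical in-memory table; the function names, sample names and counts are made up for illustration:

```python
def observed_otus(counts, samples):
    """OTU IDs with a count > 0 in at least one of the given samples.

    counts maps OTU ID -> {sample ID: count}; missing samples count as 0.
    """
    return {otu for otu, row in counts.items() if any(row.get(s, 0) > 0 for s in samples)}


def overlap_report(counts, controls, others):
    """Compare the OTUs observed in control samples with those in the rest."""
    in_controls = observed_otus(counts, controls)
    in_others = observed_otus(counts, others)
    return {
        "total": len(counts),
        "controls": len(in_controls),
        "others": len(in_others),
        "shared": len(in_controls & in_others),
        "others_only": len(in_others - in_controls),
    }


COUNTS = {
    "0": {"Neg1": 3, "S1": 10, "S2": 0},
    "1": {"Neg1": 0, "S1": 5, "S2": 7},
    "2": {"Neg1": 0, "S1": 0, "S2": 4},
}
report = overlap_report(COUNTS, ["Neg1"], ["S1", "S2"])
print(report)  # {'total': 3, 'controls': 1, 'others': 3, 'shared': 1, 'others_only': 2}
```

If "controls" equals "total", removing control OTUs will always empty the table; "others_only" is the number of OTUs that would survive.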

Jose Carlos Clemente

Sep 28, 2012, 4:18:57 PM
to qiime...@googlegroups.com
Hm, maybe the filtering was done the other way around: what if you run
the same procedure but specify -s Treatment:Control in
filter_samples_from_otu_table.py? Sorry this is taking so long. If
other users feel this functionality is important, please add a feature
request on our site.

Jose
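Jose's suggestion reads as if -s names the samples to keep; if so, Treatment:NoControl would have kept the non-control samples rather than the controls. To check which sample IDs a given Treatment:value pair matches, the mapping file can be scanned directly. A sketch, assuming a tab-separated mapping file whose header line starts with #SampleID and whose first column is the sample ID; the function name and the simplified example mapping are hypothetical, and this does not reproduce QIIME's own parser:

```python
def samples_matching(mapping_text, column, value):
    """Return sample IDs whose `column` equals `value` in a QIIME-style mapping file."""
    header = None
    matches = []
    for line in mapping_text.splitlines():
        if not line.strip():
            continue
        fields = [f.strip() for f in line.split("\t")]
        if line.startswith("#SampleID"):
            header = fields
            continue
        if line.startswith("#"):
            continue  # other comment lines
        if header is None:
            raise ValueError("no '#SampleID' header line found")
        # header.index raises ValueError if the column does not exist
        if fields[header.index(column)] == value:
            matches.append(fields[0])
    return matches


MAPPING = (
    "#SampleID\tBarcodeSequence\tTreatment\n"
    "Neg1\tACGT\tControl\n"
    "S1\tTGCA\tNoControl\n"
    "S2\tGGCC\tNoControl\n"
)
print(samples_matching(MAPPING, "Treatment", "Control"))    # ['Neg1']
print(samples_matching(MAPPING, "Treatment", "NoControl"))  # ['S1', 'S2']
```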

Jorrit-Jan Hofstra

Oct 14, 2012, 1:58:55 PM
to qiime...@googlegroups.com
Hi Jespe and Jose,

I have been looking for the same functionality in QIIME. Thank you both for posting this discussion. I have tried this method and got the same results as Jespe. I also tried it the other way around, but all OTUs were still removed and I ended up with an empty biom table. Maybe you have some more pointers on how to work around this problem?

You said in your last post, Jose, that if other users feel this functionality is important they should file a 'feature request'. So I did. I expect many people would like to be able to subtract control-sample OTUs, and maybe also to subtract or filter out OTUs between different time points of a treatment. My research field is medical microbiology, and this kind of feature would be extremely useful there.

It would be great if it eventually became possible to subtract/filter/compare different samples both qualitatively (filtering out OTUs based on presence) AND quantitatively (actually subtracting an OTU's counts in one sample from its counts in another). I hope this makes sense.
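The two kinds of subtraction described above differ in what happens to an OTU that appears in both a sample and a control. A sketch of both on plain dictionaries of counts (hypothetical function names and data; this is not an existing QIIME feature): presence-based filtering drops the OTU entirely, while count-based subtraction removes the control's count and keeps whatever is left, dropping OTUs that would go to zero or below.

```python
def subtract_presence(sample, control):
    """Qualitative: drop any OTU that has a count > 0 in the control."""
    return {otu: n for otu, n in sample.items() if control.get(otu, 0) <= 0}


def subtract_counts(sample, control):
    """Quantitative: subtract the control's count per OTU; drop results <= 0."""
    result = {}
    for otu, n in sample.items():
        remaining = n - control.get(otu, 0)
        if remaining > 0:
            result[otu] = remaining
    return result


sample = {"0": 10, "1": 5, "2": 4}
control = {"0": 3, "1": 8}
print(subtract_presence(sample, control))  # {'2': 4}
print(subtract_counts(sample, control))    # {'0': 7, '2': 4}
```

One open question for the count-based version is normalization: raw counts are only comparable between samples sequenced to a similar depth (the per-sample totals earlier in this thread range from 21 to 25250), so a real implementation would probably need to rarefy or convert to relative abundances first.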

I would love to help develop such functionality. However, I'm a relative newbie in the whole bioinformatics field and am still trying to learn to write code in Python, and unfortunately I'm not very good at it yet. But if you have any ideas on how I can contribute, I'd be happy to.

Best regards,
Jorrit (in Amsterdam)

On Friday, September 28, 2012 22:19:19 UTC+2, Jose wrote: