How to highlight which OTUs are shared between particular samples?


joanito

Jan 27, 2013, 11:29:50 AM
to qiime...@googlegroups.com
Hi everybody,

I'm analyzing the microbial communities of some ants and would like to highlight which OTUs are shared between particular samples. How would you do this? I know that in Mothur one can produce Venn diagrams showing how many OTUs are shared between samples. Something like that would be a good start, but I'd also like to see exactly which OTUs are shared (not just how many), and ideally at what abundance. It would also be great if I could apply statistical tests to see whether the presence of those bacteria is related to a particular ecological niche of the ants (my samples are divided into two lifestyle types, and I'd like to see whether any OTU is correlated with lifestyle). Help would be greatly appreciated!

Thanks!
Joanito

Tony Walters

Jan 27, 2013, 3:02:44 PM
to qiime...@googlegroups.com
Hello Joanito,

I'd recommend looking at the Cytoscape networks built from the output of make_otu_network.py (see http://qiime.org/tutorials/making_cytoscape_networks.html). If you only want to look at abundant, shared OTUs, you could first filter the input OTU table to remove low-abundance OTUs (use filter_otus_from_otu_table.py, http://qiime.org/scripts/filter_otus_from_otu_table.html). To look for OTUs associated with given categories, use the otu_category_significance.py script (http://qiime.org/scripts/otu_category_significance.html?highlight=otu_category_significance), but rarefy the OTU table to an even sampling depth first (single_rarefaction.py) so that differences in sequencing depth don't drive the result.
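
For readers who want to list the shared OTUs directly rather than read them off a network, here is a minimal sketch in plain Python. It is not a QIIME script: the toy table, the sample names and the `shared_otus` helper are all made up for illustration, and a real table would first have to be exported to a tab-delimited file and parsed.

```python
# Toy OTU table: OTU id -> counts per sample (hypothetical data and names).
otu_table = {
    "OTU_1": {"AntA": 120, "AntB": 45, "AntC": 0},
    "OTU_2": {"AntA": 0, "AntB": 3, "AntC": 17},
    "OTU_3": {"AntA": 8, "AntB": 0, "AntC": 0},
}


def shared_otus(table, samples, min_count=1):
    """Return {otu_id: {sample: count}} for OTUs seen in every listed sample."""
    shared = {}
    for otu_id, counts in table.items():
        # An OTU is "shared" if it reaches min_count in all requested samples.
        if all(counts.get(s, 0) >= min_count for s in samples):
            shared[otu_id] = {s: counts.get(s, 0) for s in samples}
    return shared


# Which OTUs are shared between AntA and AntB, and at what abundance?
for otu_id, counts in sorted(shared_otus(otu_table, ["AntA", "AntB"]).items()):
    print(otu_id, counts)
```

Raising `min_count` restricts the result to OTUs that are shared at a given abundance, which is the same idea as filtering low-abundance OTUs before building the network.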

-Tony

On Sun, Jan 27, 2013 at 9:47 AM, Tony Walters <william....@gmail.com> wrote:
Heya Luke,

Can you answer this one?  Part of the answer is the make_otu_network, possibly combined with filtering of low abundance samples to get only shared and abundant OTUs.


joanito

Jan 30, 2013, 7:51:09 AM
to qiime...@googlegroups.com
Hi Tony,
thanks a lot for your reply.
I've been trying to make the network as you suggested, but I'm having trouble understanding how to color the samples according to different categories. The help says:
-b COLORBY, --colorby=COLORBY
                        This is the categories to color by in the plots from
                        the user-generated mapping file. The categories must
                        match the name of a column header in the mapping file
                        exactly and multiple categories can be list by comma
                        separating them without spaces. The user can also
                        combine columns in the mapping file by separating the
                        categories by "&&" without spaces [default=none]
I'm not sure what this means. What I tried was to add a column to my mapping file with "category" as the header. My samples fall into three categories: fungivorous hosts, fungivorous parasites and carnivorous predators. Is it possible to color them by these three categories? I entered the values in the column like this example: fungivorous,parasite and then ran the command with -b category
I also tried giving them a number (1, 2 or 3) instead, but the information simply isn't picked up when I generate the network and open the files in Cytoscape. When I add the network attributes and want to color the samples differently, the categories are not there. Am I doing something wrong?

Thanks!
Joanito

Jai Ram Rideout

Jan 30, 2013, 10:52:14 AM
to qiime...@googlegroups.com
Hi Joanito,

Your approach seems correct (i.e. creating a column in your mapping file so that you can label each of your samples with one of the three values). What you'll want to do is pass the new column name using the --colorby option. For example, if your new column is named SampleType:

make_otu_network.py -i ... -o ... -m ... -b SampleType

You'll then need to open the resulting OTU network in Cytoscape and color by that category. Please refer to the following references for how to do this (the tutorials use the Treatment column, which has two values 'Control' and 'Fast', but the same principles should apply to your new column as well).

http://qiime.org/tutorials/making_cytoscape_networks.html
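
Because the help text says the category must match a mapping-file column header exactly, a quick sanity check before running the script can rule out a simple naming mismatch. The sketch below assumes a tab-separated mapping file whose first line is the header; the file contents, the column names and the `missing_colorby_columns` helper are hypothetical.

```python
import csv
import io

# Hypothetical mapping file content (tab-separated); all values are made up.
mapping_text = (
    "#SampleID\tBarcodeSequence\tLinkerPrimerSequence\tSampleType\tDescription\n"
    "Ant1\tACGT\tGTGCCAGC\tfungivorous_host\tcolony 1\n"
    "Ant2\tTGCA\tGTGCCAGC\tcarnivorous_predator\tcolony 2\n"
)


def missing_colorby_columns(mapping_file, colorby):
    """Return the --colorby categories that are not exact header matches."""
    header = next(csv.reader(mapping_file, delimiter="\t"))
    header = [h.lstrip("#") for h in header]
    # Commas separate categories; "&&" combines columns, so check each part.
    wanted = [part for cat in colorby.split(",") for part in cat.split("&&")]
    return [name for name in wanted if name not in header]


print(missing_colorby_columns(io.StringIO(mapping_text), "SampleType"))  # []
print(missing_colorby_columns(io.StringIO(mapping_text), "sampletype"))  # ['sampletype']
```

With a real file you would pass `open("mapping.txt", newline="")` instead of the `io.StringIO` object. An empty list means every requested category matches a header; it does not by itself guarantee that Cytoscape will show the column.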

Hope this helps,
Jai



joanito

Jan 30, 2013, 11:07:25 AM
to qiime...@googlegroups.com
Ok, I've found a way around this: I can decide which colours to assign to the samples in Cytoscape even without the category column, so no worries about that. But I have another question, or actually a few. Right now the network is quite messy and hard to read, so I'd like to simplify it by removing all the low-abundance OTUs from the OTU table, as you said. Then I'd like to map the OTU names to the different OTU nodes, which I can easily do by giving them the taxonomy assignment column from the OTU table. But this way I get something like
k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Pseudonocardiaceae
as the name for each OTU. As you might imagine, this makes the network completely unreadable. Is there a way to keep only the last name of the assignment (in this case only Pseudonocardiaceae) instead of the complete assignment, without having to rename every single OTU in the network manually? One workaround I've thought of is to open the OTU table in Megan, so that it clusters OTUs belonging to the same taxon together, and then export the assignments; the export file will then have only the last level of the taxonomy. But is it then possible to convert this back into the biom format so that QIIME can read it again? I also don't know if this would be the best way, because if, for example, there are originally two Pseudonocardia OTUs, one shared between two samples and the other not, I would lose that kind of detail.

Another question, but this is more general. When one removes singletons with this command: filter_otus_from_otu_table.py -i otu_table.biom -o otu_table_no_singletons.biom -n 2
this removes all OTUs that are represented by only a single sequence across the whole dataset. My doubt is this: if I have an OTU that is abundant in one sample but represented by only 1 sequence in another sample, should I remove that 1 sequence and consider that sample negative for that particular OTU? In other words, how likely is it that the 1 sequence is just an artifact in that particular sample, when the OTU really is present in other samples? What could be the reason for this? Could it be carryover during the sequencing run? Or could a barcode mismatch mean that the single sequence really belongs to the other sample, where that OTU is abundant? I have this issue in mind particularly because I have done some PCR work on some of these bacteria, and I fail to amplify them from specific samples for which the pyrosequencing data tell me they are there, but with only 1-2 or at most 4 sequences.

Sorry for all these questions, maybe I should have opened a new discussion... hope this is fine and what I wrote is understandable...
Thanks a lot!!
Joanito

Luke Ursell

Jan 30, 2013, 11:16:46 AM
to qiime...@googlegroups.com
Hi Joanito,

Just to add, you'll need to run the make_otu_network.py script with your new mapping file in order to include the metadata column. Make sure you are importing Node Attributes as per the instructions on the Make Cytoscape Networks tutorial page that Jai linked to. Also, that tutorial was made with Cytoscape 2.8.3 (I believe), rather than the new Cytoscape 3.0 beta. I find importing the node and edge attributes to be more intuitive in 2.8.3, but then I usually switch to 3.0 for its improved visual resolution and rendering.

The make_otu_network.py script will also produce your node table. Note that this table can be modified in any way you see fit, including adding new metadata to describe the OTUs and the samples. So if you are having difficulty finding your category of interest, make sure it is present in the node attribute table that you import into Cytoscape.

Hope this helps,
Luke

joanito

Jan 30, 2013, 11:17:59 AM
to qiime...@googlegroups.com
Hi Jai,

thanks! But this is exactly what I had done (which is also what you do when you want to sort the OTU table according to a new column in the mapping file, right? I did that with no problems). For some reason that column is not reported in the network output files, so Cytoscape doesn't see it and I can't color the samples by it. As I wrote before, I have worked around this by manually coloring the samples with three different colours, so no worries, even though either I'm still doing something wrong or there is a bug somewhere in the process.

Joanito

Luke Ursell

Jan 30, 2013, 11:28:28 AM
to qiime...@googlegroups.com
Hi Joanito,

One thing that might work for you is to:
1) Convert your OTU biom table into the classic (tab-delimited) format, preserving the taxonomy information by passing -b --header_key='taxonomy'
2) Open the classic table in Excel, and then split the taxonomy string into individual levels, so that k__ has its own column, p__ has its own column, etc.
3) You can do that by selecting Data -> Text to Columns, and then splitting on ;
4) Now you've got taxonomic info for each of your OTU ids
5) You'll now want to merge this taxonomic info into your node table: open the node table in Excel and paste in the new columns, making sure you are pasting them against the correct OTU ids. I recommend the =VLOOKUP function in Excel for this task.
6) You should now be able to import the new node table into Cytoscape
7) Cytoscape will let you color and label, or even group, nodes based on your node metadata. I suggest reading some more in-depth manuals on Cytoscape's functions, as it is a very powerful tool and I'm only familiar with a handful of its features. Perhaps post on the Cytoscape Google Groups help forum.
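
If you would rather script steps 2 to 5 than do them in Excel, the split and the VLOOKUP-style merge can be sketched in plain Python as below. The node-table column names (`node_name`, `node_type`), the OTU ids and the helper functions are assumptions made for illustration; check the header of the node table that make_otu_network.py actually wrote before adapting this.

```python
# Hypothetical inputs: a taxonomy lookup keyed by OTU id, and node-table rows.
taxonomy = {
    "12": "k__Bacteria; p__Actinobacteria; c__Actinobacteria; "
          "o__Actinomycetales; f__Pseudonocardiaceae",
    "57": "k__Bacteria; p__Proteobacteria",
}
node_rows = [
    {"node_name": "12", "node_type": "otu"},
    {"node_name": "57", "node_type": "otu"},
    {"node_name": "Ant1", "node_type": "sample"},
]

LEVELS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def split_taxonomy(tax_string):
    """Split 'k__X; p__Y; ...' into {level: name}, padding missing levels."""
    parts = [p.strip() for p in tax_string.split(";") if p.strip()]
    names = [p.split("__", 1)[-1] for p in parts]  # drop the 'k__' style prefix
    names += [""] * (len(LEVELS) - len(names))
    return dict(zip(LEVELS, names))


def add_taxonomy_columns(rows, tax_lookup):
    """Return new rows with one column per level; non-OTU nodes get blanks."""
    merged = []
    for row in rows:
        levels = split_taxonomy(tax_lookup.get(row["node_name"], ""))
        merged.append({**row, **levels})
    return merged


for row in add_taxonomy_columns(node_rows, taxonomy):
    print(row["node_name"], row["phylum"], row["family"])
```

Looking each node up by its id, instead of pasting columns side by side, avoids the risk of rows drifting out of alignment with their OTU ids.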

Best,
Luke

joanito

Jan 30, 2013, 12:19:40 PM
to qiime...@googlegroups.com
Hi Luke,

thanks for your suggestion! There is still a problem with doing this: not all OTUs have been assigned to the same taxonomic level. Many reached the genus, but a lot stop at the order, or even the phylum or kingdom, so even after splitting the taxonomy as you suggested I can't simply copy and paste one column. I would have to create another column with the most specific assignment for each OTU. I can do this, but it will take quite some time manually as I have 700 OTUs. Is there a way around it that you can think of?
Also, do you have an answer to the other question I asked before, about singletons in individual samples? I would really like to understand why I can't get those samples to amplify with specific PCR when pyrosequencing tells me there are 1 or a few reads of those OTUs. And I'd like to decide how to treat those singletons (or OTUs with just a few reads) in my OTU table, whether to keep them and consider those OTUs present or to discard them, before creating the network and running other analyses.

Thanks a lot!
Joanito

Luke Ursell

Jan 30, 2013, 12:27:38 PM
to qiime...@googlegroups.com
Hi Joanito,

I think some manual curation will be required in order to get the taxonomy working for Cytoscape, although there may be an easier way by directly using Cytoscape (though you'd have to contact them). 
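
One way to cut down the manual work for OTUs assigned to different depths is to pick the most specific non-empty name from each taxonomy string programmatically. The sketch below assumes strings in the `k__...; p__...` style shown earlier in the thread; the `deepest_label` and `node_label` helpers and the example OTU ids are hypothetical. Appending the OTU id keeps two OTUs from the same taxon distinguishable in the network.

```python
def deepest_label(tax_string):
    """Return the most specific non-empty name in a 'k__X; p__Y; ...' string."""
    label = "Unassigned"
    for part in tax_string.split(";"):
        name = part.strip().split("__", 1)[-1]
        if name:  # skip empty levels such as 'g__'
            label = name
    return label


def node_label(otu_id, tax_string):
    """Short label that stays unique per OTU, e.g. 'Pseudonocardiaceae (12)'."""
    return "{} ({})".format(deepest_label(tax_string), otu_id)


examples = {
    "12": "k__Bacteria; p__Actinobacteria; c__Actinobacteria; "
          "o__Actinomycetales; f__Pseudonocardiaceae",
    "57": "k__Bacteria; p__Proteobacteria; c__; o__",
    "90": "",
}
for otu_id, tax in sorted(examples.items()):
    print(node_label(otu_id, tax))
```

The resulting labels could be written into an extra column of the node table and used as the node label in Cytoscape.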

Another question, but this is more general... when one removes singletons with this command filter_otus_from_otu_table.py -i otu_table.biom -o otu_table_no_singletons.biom -n 2
this removes all otus that are represented by only a single sequence across all the dataset. Now my doubt is... if I have for example one OTU that is abundant in one sample but represented with only 1 sequence in another sample, should I remove that 1 sequence and consider that sample to be instead negative regarding that particular OTU? In other words how likely it is that that 1 sequence is just an artifact in that particular sample when in fact that otu is present in other samples? What could be the reason for this? Is it possible that this is due to carryover during the sequencing run? Or because of a barcode mismatch that single sequence should instead belong to the other sample where that otu is abundant? I have this issue in mind particularly because I have done some PCR work on some of these bacteria and I fail to amplify them from some specific samples from which I have pyrosequencing data that tells me they are there but maybe just with 1-2 or maximum 4 sequences…

When you pass the -n 2 option, you are saying that across all samples there must be at least 2 sequences that map back to that OTU. Thus, if the OTU was represented by 100 sequences in one sample, and only 1 sequence in another sample, that OTU would be retained because it was represented by 101 sequences overall. If you pass -s 2 in addition to -n 2, you are also requiring that each OTU be observed in at least 2 samples. As far as I can tell from the script's help text, -s counts the samples in which the OTU appears at all; it does not set a minimum number of sequences per sample. In that case the OTU with 100 seqs in sample 1 and only one sequence in sample 2 would still be retained, whereas an OTU seen in only one sample would be discarded.
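
A minimal sketch of one reading of these two options, assuming -n is a minimum total count across all samples and -s is a minimum number of samples in which the OTU is observed at any count. That reading is an assumption based on the script's help text, so please confirm it with filter_otus_from_otu_table.py -h for your QIIME version. The `keep_otu` function is hypothetical and is not QIIME code.

```python
def keep_otu(counts, min_count=0, min_samples=0):
    """Decide whether an OTU passes the two thresholds.

    counts      -- per-sample sequence counts for one OTU
    min_count   -- assumed meaning of -n: minimum total count over all samples
    min_samples -- assumed meaning of -s: minimum number of samples with count > 0
    """
    total = sum(counts)
    n_observed = sum(1 for c in counts if c > 0)
    return total >= min_count and n_observed >= min_samples


print(keep_otu([100, 1], min_count=2))                 # True
print(keep_otu([100, 1], min_count=2, min_samples=2))  # True
print(keep_otu([100, 0], min_count=2, min_samples=2))  # False
print(keep_otu([1, 0], min_count=2))                   # False
```

Under these assumptions neither option removes a single read from an individual sample; both only decide whether the whole OTU row is kept.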

Does this fully answer your question?
Luke

joanito

Jan 30, 2013, 12:50:44 PM
to qiime...@googlegroups.com
Thanks, but my question was more about the biological meaning. If I have an OTU with 100 sequences in one sample and only 1 sequence in another, how likely is it that the bacterium is truly present in both samples? Could the single sequence in the other sample be due to other causes (for example carryover between samples during the sequencing run, or an error in the barcode of that particular read so that it was assigned to the wrong sample)?

I ask because I have screened my samples with PCR. Say I had 8 samples of these ants and looked for presence/absence of this particular bacterium in all of them. The pyrosequencing data for the same DNA extractions tell me that the bacterium is present in 6 of those samples: abundant in 4 of them, while the other 2 have only 1 and 2 sequences respectively. With PCR I get a positive band for the 4 samples that were abundantly infected, but a completely negative result for the rest, even though I tried all sorts of things (dilutions, magnesium gradients, other polymerases, etc.). So what should I conclude from this? Is PCR less sensitive than pyrosequencing, so the bacteria are actually there but too few to amplify? Or is there an error in one of the steps of the 454 data analysis (or in the sequencing), so that I should treat those samples as negative and set those 1 or 2 sequences to 0?

More generally, what should one do with singletons? I have already removed the OTUs that are represented by only one sequence in total across all samples, but should I also remove the singletons within each sample, even if that OTU is represented in other samples by more sequences?

Hope it's more understandable!!
Thank you very much for your help!
Joanito

Luke Ursell

Jan 30, 2013, 12:55:51 PM
to qiime...@googlegroups.com
Hi Joanito,

The question you are asking is very similar to trying to distinguish faint bands on a western blot, or other signal-poor situations. In the end it is going to be up to you to assess whether or not you are comfortable with the data. You might re-sequence. But, to my mind, if I was only interested in the presence / absence of a specific OTU, and that OTU was only represented by one sequence in a sample, and that sample failed to yield the OTU through PCR, I would probably conclude that it was not present in the sample. I routinely remove all singletons from my data before downstream analysis.

Luke

joanito

Jan 30, 2013, 1:19:38 PM
to qiime...@googlegroups.com
Ok, thanks Luke! So you not only remove OTUs that are represented by one sequence in total across samples, but you also remove singletons within samples? For example, if you have an OTU with counts 65, 43, 12, 1, 0, 500, would you change it to 65, 43, 12, 0, 0, 500? I'm sorry to bother you with this, but it is very important for me to decide before all downstream analyses. I'm just puzzled about what might cause singletons to appear and what the consensus among scientists is on how to treat them.

Thank you!!

Luke Ursell

Jan 30, 2013, 1:24:31 PM
to qiime...@googlegroups.com
I don't remove singletons within samples, only singletons across samples. In the example you provided, I would keep the OTU in the sample where it was present at the level of 1 sequence, because I would be confident that the OTU was actually present and could be detected by my sequencing, given its high prevalence in the other samples. The removal of singletons is mainly used to remove sequencing artifacts, not OTUs that are simply present at low abundance in some samples.
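
To make the distinction concrete, here is a small sketch using the counts from the question above. Both helper functions are hypothetical illustrations and not part of QIIME: the first tests for a singleton across samples (total count of 1), the second shows what zeroing singletons within samples would do to the row.

```python
counts = [65, 43, 12, 1, 0, 500]  # one OTU across six samples, from the question


def is_singleton_across_samples(counts):
    """True if the OTU is supported by exactly one sequence in the whole dataset."""
    return sum(counts) == 1


def zero_within_sample_singletons(counts):
    """Set every per-sample count of 1 to 0 (the step described as NOT done here)."""
    return [0 if c == 1 else c for c in counts]


print(is_singleton_across_samples(counts))               # False
print(is_singleton_across_samples([0, 0, 1, 0, 0, 0]))   # True
print(zero_within_sample_singletons(counts))             # [65, 43, 12, 0, 0, 500]
```

With the approach described in this reply, only rows for which the first function returns True are dropped, and the single read in the fourth sample is left in place.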

Luke