MED to phyloseq

sfuent...@gmail.com

Jun 27, 2016, 7:51:37 AM

to Oligotyping and MED

Hi,

I'm building a phyloseq object from MED output, and I have a couple of questions about taxonomy.

To build the phyloseq object I have:

- OTU table: a matrix of sequence nodes (e.g. 5006, 30004, etc.) by samples

- sample data: a data frame with my metadata, samples by variables

For the taxonomy, I split the RDP assignment of each node into its separate levels, which gives:

- taxonomy table: a matrix of sequence nodes by taxonomic ranks (from order to genus), to which I added an extra level for the nodes.
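A minimal sketch of this layout, written in Python for illustration only (phyloseq itself is an R package). It assumes the RDP lineage is a semicolon-separated string; the rank names, the separator, the `unclassified` placeholder, and the example lineages are all assumptions, not details taken from the actual MED or RDP output.

```python
# Hypothetical rank names and lineage format; adjust both to the real RDP output.
RANKS = ["Order", "Family", "Genus"]


def lineage_to_row(node_id, lineage, ranks=RANKS, sep=";", missing="unclassified"):
    """Split one lineage string into one value per rank, padding short lineages."""
    parts = [p.strip() for p in lineage.split(sep) if p.strip()]
    # Truncate overly long lineages and pad short ones so every row has the same columns.
    parts = parts[:len(ranks)] + [missing] * (len(ranks) - len(parts))
    row = dict(zip(ranks, parts))
    row["Node"] = node_id  # extra level holding the MED node ID
    return row


def build_taxonomy_table(lineages):
    """Map {node_id: lineage} to {node_id: {rank: value}}, one row per node."""
    return {node: lineage_to_row(node, lineage) for node, lineage in lineages.items()}


# Made-up lineages keyed by MED node ID.
example = {
    "5006": "Clostridiales;Lachnospiraceae;Blautia",
    "30004": "Clostridiales;Ruminococcaceae",
}
table = build_taxonomy_table(example)
```

Keying the rows by node ID keeps them aligned with the rows of the OTU table, and the placeholder keeps unclassified levels visible rather than dropping them.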

Building the phyloseq object works perfectly, but I'm unsure whether this is the correct way, as I might be having some issues with taxonomy assignment in downstream analyses.

So my questions would be:

- How do I make a taxonomy matrix from MED output that is compatible with phyloseq?

- Should unclassified sequences be excluded? (I think they might be informative, though.)

- Any other tips for going from MED to phyloseq would be highly appreciated.

Thanks

Susana

A. Murat Eren

Jun 27, 2016, 9:26:08 AM

to Oligotyping and MED

Dear Susana,

Unfortunately, I have zero experience with phyloseq (even though I know how awesome it is), so I am not sure I should even attempt to answer these questions.

When you find out the best practice for preparing your package, please consider letting me know! :)

Best wishes,

--

A. Murat Eren (meren)
http://merenlab.org :: gpg

sfuent...@gmail.com

Jun 28, 2016, 9:27:37 AM

to Oligotyping and MED

Dear Murat (Meren?),

I've been discussing this with a former colleague of mine (the developer of the microbiome package, which is also quite cool: https://github.com/microbiome/microbiome/blob/master/vignettes/vignette.md), as he also uses the phyloseq format for input data.

He suggested, as I had already thought, creating a taxonomy string for each row, say from Order to Species (where possible), and using it as the row names, then adding one extra level to the phyloseq tax_table for the nodes.

We would then have a matrix of taxonomic strings by levels (8).
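A rough sketch of how such taxonomy strings could be built and checked, again in Python for illustration only. The rank names, the `;` separator, and the example rows are assumptions. The check matters because a string shared by several nodes would not identify a row unambiguously.

```python
from collections import Counter

# Hypothetical rank names; use whichever levels the real table contains.
RANKS = ["Order", "Family", "Genus", "Species"]


def taxonomy_label(row, ranks=RANKS, sep=";"):
    """Join the rank values of one node into a single taxonomy string."""
    return sep.join(row[rank] for rank in ranks)


def shared_labels(tax_rows, ranks=RANKS):
    """Return each taxonomy string used by more than one node, with its node count."""
    counts = Counter(taxonomy_label(row, ranks) for row in tax_rows.values())
    return {label: n for label, n in counts.items() if n > 1}


# Made-up rows keyed by MED node ID.
tax_rows = {
    "5006": {"Order": "Clostridiales", "Family": "Lachnospiraceae",
             "Genus": "Blautia", "Species": "unclassified"},
    "30004": {"Order": "Clostridiales", "Family": "Lachnospiraceae",
              "Genus": "Blautia", "Species": "unclassified"},
    "7112": {"Order": "Bacteroidales", "Family": "Bacteroidaceae",
             "Genus": "Bacteroides", "Species": "unclassified"},
}
labels = {node: taxonomy_label(row) for node, row in tax_rows.items()}
collisions = shared_labels(tax_rows)
```

If `collisions` is not empty, one option is to append the node ID to the string so that every row name stays unique.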

I also see that some nodes have exactly the same taxonomy output but differ in the number of hits. Should I group these, or is the number of hits informative for other analyses?
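If grouping turns out to be appropriate, it could look roughly like the sketch below (Python, for illustration only). It assumes the table holds per-sample counts for each node and that each node already has a taxonomy string; the function name and the data are hypothetical. Whether to merge at all is a judgment call, since merging discards the node-level resolution.

```python
def merge_nodes_by_taxonomy(counts, labels):
    """Sum per-sample counts of nodes that share the same taxonomy string.

    counts: {node_id: [count in sample 1, count in sample 2, ...]}
    labels: {node_id: taxonomy string}
    """
    merged = {}
    for node, row in counts.items():
        label = labels[node]
        if label in merged:
            # Add this node's counts to the running total, sample by sample.
            merged[label] = [a + b for a, b in zip(merged[label], row)]
        else:
            merged[label] = list(row)
    return merged


# Made-up counts for three nodes across two samples.
counts = {"5006": [10, 0], "30004": [3, 7], "7112": [1, 1]}
labels = {
    "5006": "Clostridiales;Lachnospiraceae;Blautia",
    "30004": "Clostridiales;Lachnospiraceae;Blautia",
    "7112": "Bacteroidales;Bacteroidaceae;Bacteroides",
}
merged = merge_nodes_by_taxonomy(counts, labels)
```

Keeping the unmerged table alongside the merged one leaves both views available for downstream analyses.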

I'm sorry, I'm a bit new to this sort of data, and the company that provides the output is very secretive about its pipeline, so I can't really say which parameters were used to get there.

All I know is that the nodes were classified using RDP, and that I have one column with Lineage (which can be subdivided into the different levels) and another with Species (I think this might not be a very realistic level, but maybe it is possible to go that low?).