Merging samples in OTU table (Biom file/phyloseq object)

Peter Atanackov

Nov 11, 2015, 4:06:37 PM

to Qiime 1 Forum

I have data from 4 separate 454 runs. I merged all the *.fna files with the cat command right after "split_libraries.py" and before "pick_otus.py". I did this because the mapping files contain duplicate barcodes.

At the end of the process I'm left with a nice biom file which has 70 samples. The samples were taken at three different times of the year, and I would like to analyze them grouped by the date on which they were taken. My problem is how to group the samples by date: the "summarize_otu_by_cat.py" command isn't much use here because it requires a mapping file, and I cannot merge the samples in phyloseq because the phyloseq object does not contain sample variables.

Is there a way to just merge all the samples in an OTU table? Then a simple Python script could use "filter_samples_from_otu_table.py" to make three separate biom files, each containing the samples taken on the same date (the sample names have the date in them), merge all the samples within each of the three biom files, and then merge the three biom files into a single one. Or is there another way?

Also, is it possible to add a sample variable to the phyloseq object?

Any input is welcomed.

Thank you, Peter

Colin Brislawn

Nov 11, 2015, 4:21:55 PM

to Qiime 1 Forum

Hello Peter,

You need a metadata mapping file for your combined runs. You should be able to make one by merging the mapping files you used for the 4 separate 454 runs.
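As a rough illustration of that merge, here is a minimal Python sketch. It is not a QIIME script: the function name `merge_mapping_files` is hypothetical, and it assumes each mapping file is tab-separated with a single header line that starts with `#SampleID`, with no extra comment lines. It takes the union of all columns, keeps `Description` as the last column, and refuses duplicate sample IDs.

```python
import csv
import io


def merge_mapping_files(texts):
    """Merge tab-separated mapping files (passed as strings) into one string.

    Assumes one header line per file and a '#SampleID' column in each.
    Columns are the union of all headers in first-seen order, with
    'Description' moved to the end. Missing values are left empty.
    """
    header = []
    rows = []
    seen_ids = set()
    for text in texts:
        reader = csv.DictReader(io.StringIO(text), delimiter="\t")
        for name in reader.fieldnames or []:
            if name not in header:
                header.append(name)
        for row in reader:
            sample_id = row["#SampleID"]
            if sample_id in seen_ids:
                raise ValueError("duplicate sample ID: " + sample_id)
            seen_ids.add(sample_id)
            rows.append(row)

    # Keep 'Description' as the final column if it is present.
    if "Description" in header:
        header.remove("Description")
        header.append("Description")

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=header, delimiter="\t",
                            restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

To use it on real files, read each mapping file into a string, pass the list of strings to the function, and write the result to a new file.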

Once you have this mapping file, which lists information for each sample, including date, you can use it with qiime scripts or with phyloseq. There are ways of merging samples in both qiime and phyloseq, but first you need that mapping file so you can tell the script which samples to merge.

Let me know if you need help making this file!
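To show what merging samples by date amounts to, here is a small Python sketch that sums counts per group. It is only an illustration, not the QIIME or phyloseq implementation (phyloseq's `merge_samples`, for example, does this kind of grouping for you). The function name and the dictionary layout are assumptions made for the example.

```python
from collections import defaultdict


def collapse_by_category(otu_counts, sample_to_group):
    """Sum per-sample OTU counts into per-group counts.

    otu_counts:      {otu_id: {sample_id: count}}
    sample_to_group: {sample_id: group label, e.g. the sampling date}
    Returns          {otu_id: {group label: summed count}}
    """
    collapsed = {}
    for otu_id, per_sample in otu_counts.items():
        totals = defaultdict(int)
        for sample_id, count in per_sample.items():
            # Every sample must have a group; a missing one raises KeyError.
            totals[sample_to_group[sample_id]] += count
        collapsed[otu_id] = dict(totals)
    return collapsed
```

The `sample_to_group` dictionary is exactly the information the mapping file provides: one date per sample ID.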

Colin

Peter Atanackov

Nov 11, 2015, 4:42:55 PM

to qiime...@googlegroups.com

Hello Colin,

That would be so much easier since the mapping files contain actual sample variables, but won't the fact that some samples have the same barcode sequence pose a problem?

Colin Brislawn

Nov 11, 2015, 6:17:49 PM

to Qiime 1 Forum

Having the same barcodes when demultiplexing would be a big problem! For downstream analysis, you can leave the barcode column blank because barcodes are not used after demultiplexing (you still need the BarcodeSequence header at the top of the empty column).

When you run validate_mapping_file.py, you can pass --not_barcoded so the barcode check is avoided.
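If you would rather not blank the column by hand, something like the following Python sketch would do it. The function name `blank_barcodes` is hypothetical, and the sketch assumes a tab-separated mapping file whose first line is the header. It keeps the BarcodeSequence header and empties the value in every sample row, leaving any other `#` comment lines untouched.

```python
import csv
import io


def blank_barcodes(mapping_text):
    """Return the mapping file text with every BarcodeSequence value emptied.

    The header line (including 'BarcodeSequence') is kept as-is.
    """
    rows = list(csv.reader(io.StringIO(mapping_text), delimiter="\t"))
    header = rows[0]
    col = header.index("BarcodeSequence")

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    for row in rows[1:]:
        # Only touch real sample rows: long enough and not a '#' comment.
        if len(row) > col and not row[0].startswith("#"):
            row[col] = ""
        writer.writerow(row)
    return out.getvalue()
```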

Colin

Peter Atanackov

Nov 12, 2015, 9:49:34 AM

to Qiime 1 Forum

Thank you Colin, I tried it out and it worked like a charm.

Cheers!
