CATEGORY_METADATA


Leo FR

Aug 24, 2020, 4:25:25 PM
to Microbiome Helper
Hello,

I am doing step 7, and have a question about CATEGORY_METADATA. How should I format this file?

My main metadata file is formatted as follows, and it worked in the previous step (checked using the Google Sheets tool):

Sample-id      treatment      location ...
categorical    categorical    categorical ...
N101PF         control        campus ...
N101Pi         treatment      field ...
...
How should my CATEGORY_METADATA be formatted?

Thank you!
Leo

Gavin Douglas

Aug 24, 2020, 9:24:46 PM
to microbio...@googlegroups.com
Hey there,

Are you running into an error? The metadata file needs to be formatted as described on the QIIME 2 website (see: https://docs.qiime2.org/2020.8/tutorials/metadata/). The “CATEGORY_METADATA” variable just refers to the name of a column, like “treatment” in your table.


Best,

Gavin 


Leo FR

Aug 25, 2020, 2:10:39 PM
to Microbiome Helper
Hi Gavin

Thank you for replying. I am referring to these commands:

qiime feature-table group \
--i-table deblur_output/deblur_table_final.qza \
--p-axis sample \
--p-mode sum \
--m-metadata-file $METADATA \
--m-metadata-column CATEGORY \
--o-grouped-table deblur_output/deblur_table_final_CATEGORY.qza

qiime taxa barplot \
--i-table deblur_output/deblur_table_final_CATEGORY.qza \
--i-taxonomy taxa/classification.qza \
--m-metadata-file CATEGORY_METADATA \
--o-visualization taxa/taxa_barplot_CATEGORY.qzv

The first command runs just fine, and I use my variable "group" as CATEGORY. However, in the second part, I run into an error. The protocol says "Also you will need to create a new metadata file (called CATEGORY_METADATA below) to make a barplot of this grouped data." Is this new CATEGORY_METADATA file just a copy of my metadata file?

My codes:
qiime feature-table group \
   --i-table deblur_output/deblur_table_final.qza \
   --p-axis sample \
   --p-mode sum \
   --m-metadata-file $METADATA \
   --m-metadata-column group \
   --o-grouped-table deblur_output/deblur_table_final_group.qza

qiime taxa barplot \
   --i-table deblur_output/deblur_table_final_group.qza \
   --i-taxonomy taxa/classification.qza \
   --m-metadata-file group \
   --o-visualization taxa/taxa_barplot_group.qzv

Location of my metadata file: /home/qiime2/metadata.TXT
output from command 1: /home/qiime2/deblur_output/deblur_table_final_group.qza

metadata:
sample-id    barcode-sequence    treatment    location    Sampling    _population    group
categorical    categorical    categorical    categorical    categorical    categorical    Categorical
N101PF        control    campus    post_planting    low    C/C/POST
N101Pi        treatment    field    pre_planting    low    T/F/PRE

I think it is a problem with my files, as the report indicates "There was a problem with the command: (1/1) Missing option '--m-metadata-file'".

I am still learning QIIME, so this may be a simple problem.

Leo

André Comeau

Aug 25, 2020, 4:42:48 PM
to Microbiome Helper
Leo,
The sequence of commands + metadata file (I think from your description) should be:

qiime taxa barplot \
   --i-table deblur_output/deblur_table_final.qza \
   --i-taxonomy taxa/classification.qza \
   --m-metadata-file metadata.txt \
   --o-visualization taxa/taxa_barplot.qzv

...if you want to see a barplot with all the samples represented, one bar per sample, within your categories (i.e., not the sum of similar samples into one bar per category). If you are OK with this, then you do not have to do the sample regrouping below.

In order to instead see a barplot with your, for example, GROUP replicates all summed together:

qiime feature-table group \
   --i-table deblur_output/deblur_table_final.qza \
   --p-axis sample \
   --p-mode sum \
   --m-metadata-file metadata.txt \
   --m-metadata-column group \
   --o-grouped-table deblur_output/deblur_table_final_GROUP.qza

...which will regroup your table so that all GROUP samples are summed. The important point here, which affects your metadata, is that your original sample IDs are now replaced by the labels you used in your GROUP category, so your new metadata file needs to be redone to match. Something like this:

sample-id    new_or_old_category_1  new_or_old_category_2  etc.
#q2:types    categorical            categorical            categorical
C/C/POST     label_1                label_3                label_5
T/F/PRE      label_2                label_4                label_6

...so you see what is happening here? You are forced to make a new metadata file because your previous categories may no longer be valid if, in this example, your "C/C/POST" samples do not all share the same values for the other categories. For example, the C/C/POST samples may not all have the same Treatment, Location, etc., which is why you have to remodel the whole metadata table.

Once done, you can run the new barplot command as (for example):

qiime taxa barplot \
   --i-table deblur_output/deblur_table_final_GROUP.qza \
   --i-taxonomy taxa/classification.qza \
   --m-metadata-file metadata_GROUP.txt \
   --o-visualization taxa/taxa_barplot_GROUP.qzv

...this should generate the correct new barplot, with samples summed into the GROUP category (i.e., the GROUP labels will be the new x-axis labels and GROUP will no longer be an option in the pull-down menus), and with the correct new categories in the pull-down menus.
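
For a grouped table, the simplest matching metadata file lists each GROUP label once as the new sample ID. A minimal sketch of generating such a file with awk follows, assuming a tab-separated metadata file with the column names on the first line and any directive lines (such as #q2:types) starting with "#". The function name make_group_metadata and the extra group_label column are illustrative choices, not part of the SOP or of QIIME 2.

```shell
# Hypothetical helper (not part of QIIME 2 or the SOP).
# Usage: make_group_metadata <metadata.tsv> <column> <output.tsv>
# Writes one row per distinct value of <column>, using that value as the
# new sample ID, as described above for the regrouped table.
make_group_metadata() {
  awk -F'\t' -v col="$2" '
    NR == 1 {
      # Locate the grouping column by name in the header line.
      for (i = 1; i <= NF; i++) if ($i == col) c = i
      if (!c) { print "column not found: " col > "/dev/stderr"; exit 1 }
      print "sample-id\t" col "_label"
      print "#q2:types\tcategorical"
      next
    }
    /^#/ || $c == "" { next }         # skip directive lines and empty values
    !seen[$c]++ { print $c "\t" $c }  # emit each label the first time it appears
  ' "$1" > "$3"
}

# Example with the file names used in this thread:
# make_group_metadata metadata.txt group metadata_GROUP.txt
```

Any further columns would still have to be added by hand, since only you know which values apply to a whole group.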

Leo FR

Aug 26, 2020, 11:04:09 AM
to microbio...@googlegroups.com
Hi Andre,

Thank you so much for the detailed answer. I updated my
CATEGORY_METADATA file and it worked like a charm.

I also had to change some things in the metadata file. For example it
does not accept "/", so it would not work for C/C/POST, for example.
The google sheet validation tool is really helpful to find these
issues.
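
This kind of cleanup can also be scripted. The sketch below replaces every "/" in one named column with "_", assuming a tab-separated file with the column names on the first line. sanitize_column is a hypothetical helper name, and "_" is just one possible replacement character.

```shell
# Hypothetical helper: print <metadata.tsv> with "/" replaced by "_" in <column>.
# Usage: sanitize_column <metadata.tsv> <column> > cleaned.tsv
sanitize_column() {
  awk -F'\t' -v OFS='\t' -v col="$2" '
    NR == 1 {
      # Locate the column by name, then print the header unchanged.
      for (i = 1; i <= NF; i++) if ($i == col) c = i
      print; next
    }
    /^#/ || !c || NF < c { print; next }  # leave directive and short lines untouched
    { gsub("/", "_", $c); print }         # rewrite only the chosen column
  ' "$1"
}

# Example: sanitize_column metadata.txt group > metadata_clean.txt
```

The cleaned file would need to be used for the grouping command as well, so that the grouped sample IDs and the new metadata file agree.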

Thank you all for the great work preparing these scripts and helping
here in the group!

Leo

Brian Simison

Dec 16, 2021, 2:58:41 PM
to Microbiome Helper
I ran into the same issue as Leo, but I have not been able to remodel my new metadata file properly. I must be misunderstanding. I do not understand what the " new_or_old_category_1 " represents.  I have thousands of samples, so it is difficult to keep editing manually.

one of my metadata files has the following columns:
#SampleID        BarcodeSequence        LinkerPrimerSequence        FileInput        Location        Twindex        Description
100N6                        100N6_S63_L001.assembled_filtered.nonchimera.fasta        Nose        B        100
100T6                        100T6_S36_L001.assembled_filtered.nonchimera.fasta        Throat        B        100
101N6                        101N6_S64_L001.assembled_filtered.nonchimera.fasta        Nose        A        101
101T6                        101T6_S37_L001.assembled_filtered.nonchimera.fasta        Throat        A        101

If I wanted to plot the 'Location' column, do 'Nose' and 'Throat' become 'SampleID' values?
Is there another method to generate this new CATEGORY_METADATA file?

thank you

Andre Comeau

Dec 16, 2021, 4:08:52 PM
to Microbiome Helper
Brian,
The main way to figure this out is to see what the layout of your new ASV feature table looks like after running the grouping command (qiime feature-table group) using the category you have chosen (such as Location in your case)...can you make a summary file of that new table (using qiime feature-table summarize --i-table grouped_table.qza --o-visualization grouped_table_summary.qzv) and then share the file? The names of the resulting "samples" along the left/first column of this new table (see in the Q2viewer) will then dictate how you have to reformat your new metadata file (since the new IDs have to match the first column of the metadata file).
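
The expected new sample IDs can also be previewed from the metadata file itself, since the grouped IDs come from the values of the chosen column. A rough sketch, assuming a tab-separated file with the column names on the first line; list_group_values is a hypothetical name. Values that occur only for samples absent from the feature table will not appear in the grouped table, so the summary described above remains the real check.

```shell
# Hypothetical helper: print each distinct value of <column> and the number
# of samples carrying it, in order of first appearance.
# Usage: list_group_values <metadata.tsv> <column>
list_group_values() {
  awk -F'\t' -v col="$2" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) c = i; next }
    /^#/ || !c || $c == "" { next }   # skip directive lines and empty values
    { if (!(($c) in n)) order[++k] = $c; n[$c]++ }
    END { for (j = 1; j <= k; j++) print order[j] "\t" n[order[j]] }
  ' "$1"
}

# Example: list_group_values metadata.txt Location
```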



ANDRÉ M. COMEAU, PhD
Manager  Integrated Microbiome Resource (IMR)
T: 902.494.2684 | E: andre....@dal.ca 

Address for deliveries:
Dept. of Pharmacology
Tupper Med. Bldg., room 5D
Dalhousie University
5850 College St.
Halifax NS B3H 4R2 

Research Associate (Lab Manager)

Morgan Langille Lab  Dept. of Pharmacology
ResearchGate Profile GoogleScholar Publications


"Without fantasy, there is no science. Without fact, there is no art." - Nabokov
"The good thing about science is that it's true whether or not you believe in it." - Neil deGrasse Tyson 




Brian Simison

Dec 17, 2021, 1:54:26 PM
to Microbiome Helper
Thank you André!

I think I understand your instructions. I ran the following on a subset of our data and have attached the qza output. I tried to upload  the .qzv output file, but Google group said I cannot upload this file type, so I gzipped it, and that file type is also banned. So I attach a screengrab from qiime2view.

qiime feature-table group \
   --i-table deblur_output/deblur_table_final.qza \
   --p-axis sample \
   --p-mode sum \
   --m-metadata-file metadata.txt \
   --m-metadata-column Location \
   --o-grouped-table deblur_output/deblur_table_Location.qza

qiime feature-table summarize \
   --i-table deblur_output/deblur_table_Location.qza \
   --o-visualization grouped_table_Location_summary.qzv

Q2_feat_tble_Locationpng.png

Andre Comeau

Dec 20, 2021, 5:45:54 PM
to Microbiome Helper
OK, so now you see that your samples have been regrouped into just 3 samples (your locations of Nose/Throat/X, with the last one being something I couldn't see in your original table snippet - a screenshot of the Sample Interactive Detail tab was what I was referring to so we could see the collapsed "sample names"), so now your new metadata file you are going to make will only have three lines/samples in it to match the Nose/Throat/X values.

Therefore, you just have to adjust any other columns you had in the metadata file to be the "all Nose" value in those columns for the Nose sample line, the "all Throat" values in the second line, etc. for the third line. That new 3-line metadata file should then work with this new regrouped feature table you've made in any downstream QIIME2 commands.
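
Part of this adjustment can be automated: a column is only meaningful in the collapsed file if every sample in a group shares the same value for it. The sketch below keeps exactly those columns and drops the rest, assuming a tab-separated file with the column names on the first line and directive lines starting with "#". collapse_metadata is a hypothetical helper name, not a QIIME 2 command.

```shell
# Hypothetical helper: print a collapsed metadata table with one row per
# distinct value of <column>, keeping only the columns whose value is
# identical for every sample within each group.
# Usage: collapse_metadata <metadata.tsv> <column> > metadata_grouped.tsv
collapse_metadata() {
  awk -F'\t' -v OFS='\t' -v col="$2" '
    NR == 1 {
      ncol = NF
      for (i = 1; i <= NF; i++) { name[i] = $i; if ($i == col) c = i }
      next
    }
    /^#/ || !c || $c == "" { next }   # skip directive lines and empty values
    {
      g = $c
      if (!(g in seen)) {
        # First sample of this group: remember its values.
        seen[g] = 1; order[++k] = g
        for (i = 2; i <= ncol; i++) val[g, i] = $i
      } else {
        # Later samples: flag any column whose value differs within the group.
        for (i = 2; i <= ncol; i++) if (val[g, i] != $i) mixed[i] = 1
      }
    }
    END {
      line = "sample-id"
      for (i = 2; i <= ncol; i++) if (i != c && !(i in mixed)) line = line OFS name[i]
      print line
      for (j = 1; j <= k; j++) {
        g = order[j]; line = g
        for (i = 2; i <= ncol; i++) if (i != c && !(i in mixed)) line = line OFS val[g, i]
        print line
      }
    }
  ' "$1"
}

# Example: collapse_metadata metadata.txt Location > metadata_Location.txt
```

Columns that are empty for every sample count as constant here and are kept with empty values. If no column survives, the output contains only the ID column, which is worth checking before passing the file to a downstream command.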




Brian Simison

Dec 31, 2021, 3:11:20 PM
to Microbiome Helper
Thank you again André, 
Here is the grouped_table_Location_summary.qzv "Interactive Sample Detail":

Q2_feat_tble_inter_samp_Detail_Location.png


And here are the first few lines of the "Feature Detail" tab:

Q2_feat_tble_Feat_Detail_Location.png
 
The new matrix will have three rows, one each for my three LOCATIONs. I assume I will remove most of the original columns; Barcode, Linker, FileInput, and Location. But not sure what to do with 'Twindex', or 'Description'. Twindex is just twin 'A' or twin 'B'.

#SampleID   BarcodeSequence   LinkerPrimerSequence   FileInput   Location    Twindex    Description
Nose
Throat
Hand


Again, the original has hundreds of rows. The SampleID code works as follows: each individual is assigned a number like "100"; 100N6 is a Nose sample from individual 100, 100T6 is a Throat sample from the same individual, and for Hand samples we use "H". Individual 101 is the identical twin of 100, 102 and 103 are twins, etc.

#SampleID        BarcodeSequence        LinkerPrimerSequence        FileInput        Location        Twindex        Description
100N6    100N6_S63_L001.assembled_filtered.nonchimera.fasta        Nose        B        100
100T6    100T6_S36_L001.assembled_filtered.nonchimera.fasta        Throat        B        100
101N6    101N6_S64_L001.assembled_filtered.nonchimera.fasta        Nose        A        101
101T6    101T6_S37_L001.assembled_filtered.nonchimera.fasta        Throat        A        101
102N6    102N6_S65_L001.assembled_filtered.nonchimera.fasta        Nose        B        102
102T6    102T6_S38_L001.assembled_filtered.nonchimera.fasta        Throat        B        102
103N6    103N6_S66_L001.assembled_filtered.nonchimera.fasta        Nose        A        103
103T6    103T6_S39_L001.assembled_filtered.nonchimera.fasta        Throat        A        103
104H6    104H6_S243_L001.assembled_filtered.nonchimera.fasta        Hand        B        104
104N6    104N6_S67_L001.assembled_filtered.nonchimera.fasta        Nose        B        104



I think I am misunderstanding something here. Am I analyzing my data incorrectly given this is a twin study (Amplicon SOP v2)? Ultimately, what we are looking for are correlations between twins. Do twins have similar ASV composition under various conditions? Our full matrix has hundreds of columns, like antibacterial resistance, twins raised separately vs raised together, city vs rural, etc. 

I have successfully gone through the entire Amplicon SOP v2 pipeline, except this step. 

Happy New Year!


Andre Comeau

Jan 6, 2022, 11:45:09 AM
to Microbiome Helper
Brian,
So yes, you have now collapsed your ASV table into only three samples, so the rest of your metadata columns/categories will probably have to disappear, since they now represent a mishmash of many things that are in Throat, etc. (i.e., there is probably no longer one unique category label that applies to all Throat samples for any of them).

However, remember that this whole thing of collapsing was a very small side-command simply used in the one Step 7 in case you wanted to see the bar charts with all the samples cumulated for specific categories (ie: instead of a chart from the first part of Step 7 which would have one bar per sample, you'd have only three bars in this example). This collapsed ASV table is then not used again in any of our other steps/analyses from the SOP.

You continue to use instead the full ASV table (with full metadata file) for the subsequent diversity calculations (such as Shannon analysis between all your groups/metadata columns), for drawing the PCoA plot and for doing the optional ANCOM analysis (or whichever other QIIME2 module you want) to test for significant features between groups (of the category(ies) of your choosing). You do not need to collapse the ASV table above if you want to test for the differences between body locations, since the commands can do this without the need for collapsing (ie: they can read the metadata file and know to regroup those samples in that way for the test).

So, for example, you could run the ANCOM analysis using either of the below options (after doing the mandatory pseudocount version of the whole ASV table):
qiime composition ancom \
   --i-table deblur_output/deblur_table_final_pseudocount.qza \
   --m-metadata-file $METADATA \
   --m-metadata-column Location \
   --output-dir ancom_output_Location
qiime composition ancom \
   --i-table deblur_output/deblur_table_final_pseudocount.qza \
   --m-metadata-file $METADATA \
   --m-metadata-column Twindex \
   --output-dir ancom_output_Twindex
...to assess differences between body sites or twins. However, note that these analyses, by the nature of how the data are organized in your experiment and the way "population" analyses in microbiome studies are done, will be the synthesis of all the samples tagged with Throat vs. other locations, regardless of how they are further broken down within that category (i.e., all twins/individuals with a Throat tag are mashed together). Similarly, testing the Twindex category will compare all individuals marked as TwinA to all TwinBs, regardless of which individual they are or which location the samples came from. In other words, this is not comparing each linked TwinA to its TwinB, so it can't tell you specifically whether the twins in each individual pair differ from each other.

You might be able to get a sense of that in the PCoA plot if you color the dots by "TwinPair" (a new column you would need in your metadata file that regroups, for example from your table, all 100+101 samples into TwinPair1) and see whether the four dots (e.g., 100N+100T+101N+101T, sometimes 5-6 dots if you have Hand too) shared by those two twins/one pair are generally together in the plot. This might be difficult to see, since you have hundreds of samples (so colors will repeat).
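
With thousands of samples, a TwinPair column is easier to derive than to type. The sketch below appends one, assuming a tab-separated file in which one column (Description in the table earlier in this thread) holds the individual number, and assuming twins have consecutive numbers starting at 100 (100+101, 102+103, and so on). add_twin_pair is a hypothetical helper name, and the numbering rule should be checked against the real study design before use.

```shell
# Hypothetical helper: append a TwinPair column derived from the individual number.
# Assumes twins have consecutive numbers starting at <first> (default 100),
# i.e. 100+101 -> TwinPair1, 102+103 -> TwinPair2.
# Usage: add_twin_pair <metadata.tsv> <id_column> [first] > with_pairs.tsv
add_twin_pair() {
  awk -F'\t' -v OFS='\t' -v col="$2" -v first="${3:-100}" '
    NR == 1 {
      ncol = NF
      for (i = 1; i <= NF; i++) if ($i == col) c = i
      if (!c) { print "column not found: " col > "/dev/stderr"; exit 1 }
      $(ncol + 1) = "TwinPair"; print; next
    }
    $1 == "#q2:types" { $(ncol + 1) = "categorical"; print; next }
    /^#/ { print; next }                                 # other comment lines pass through
    $c !~ /^[0-9]+$/ { $(ncol + 1) = ""; print; next }   # no usable number: leave blank
    { $(ncol + 1) = "TwinPair" (int(($c - first) / 2) + 1); print }
  ' "$1"
}

# Example: add_twin_pair metadata.txt Description > metadata_twinpair.txt
```

Assigning to field ncol + 1 pads short rows with empty cells, so the new column stays aligned even where trailing values are missing.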

The type of analysis you are describing for the twins sounds more like "paired analysis" and so there is a plug-in and the associated paper (plus QIIME2 forum posts for help) you might want to look at:


You're probably going to have to modify your metadata file with something like the TwinPair above in order for the longitudinal analysis to be able to identify which sets of individuals are paired together into "twin sets", but they'll have instructions in the plug-in pages about how to do this.




Brian Simison

Jan 7, 2022, 12:17:38 PM
to Microbiome Helper
Thank you so much André!
Very helpful explanations. 
