ForcedPredictors question


juliana soto

May 17, 2018, 10:53:56 AM
to MaAsLin-users

Hello,

I have a question regarding the option "ForcedPredictors" in the config file.


My metadata variables are Location, Age, Race, and Clinical Status. I want to control for the effects of Age and Race on abundance and evaluate only the effects of Location and Clinical Status. To do that, should I use:

  1. fAllvAll = TRUE
  2. strForcedPredictors = c("Age", "Race")
Thanks,


Juliana

Himel Mallick

May 22, 2018, 6:51:11 AM
to juliana soto, MaAsLin-users
Thanks, Juliana. Posting this email thread to the MaAsLin users group for future reference.

Himel

HIMEL MALLICK | Postdoctoral Associate
Medical and Population Genetics | Broad Institute of MIT and Harvard
415 Main Street | Cambridge, MA 02142
Email: hmal...@broadinstitute.org
Website: https://www.hsph.harvard.edu/himel-mallick/

On Fri, May 18, 2018 at 11:45 AM, juliana soto <dendr...@gmail.com> wrote:
Hi Himel,

Thanks a lot for this explanation. It is extremely useful to understand how MaAsLin works. Now, I feel more confident about the results I got from the analysis.

Best regards,

Juliana



On Thu, May 17, 2018 at 6:48 PM, Himel Mallick <hmal...@broadinstitute.org> wrote:
Hi Juliana,

From the analysis standpoint, the variables we want to adjust for are those that we are not primarily interested in but want to include in the model in order to ‘cancel out’ their effect on the primary variables of interest. In that sense, you are absolutely right that one would not expect to see these variables in the output file. But if we look carefully at the statistical model under the hood, those variables are not estimated separately or independently. In fact, they are treated exactly like the primary variables in the model, and all the variables are jointly estimated to produce individual coefficient estimates and p-values. That being said, in most practical scenarios, we are not interested in anything beyond our primary variables of interest. This viewpoint is incorporated in many statistical software packages that allow you to specify your main and adjusting variables separately so that only the main variables are included in the output table. MaAsLin does not separate the two for a couple of reasons: (i) it accommodates the general case of multivariable analysis in which users are interested in all the variables, and (ii) users can always subset their output file to the specific variables they are interested in. Note that these two philosophies of reporting significant associations do not undermine the individual p-values and coefficients, as they come from the exact same multivariable model.

Sorry for the long explanation, but the short version for your purpose is: you can ignore the variables that you are not interested in but want to adjust for; doing so does not affect your conclusions regarding the main variables of interest. Hope this helps.
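
[Editor's sketch] The "subset the output file" step can be done in a few lines. The Python below is only an illustration, not part of MaAsLin: it assumes the results table is tab-delimited with the column names quoted later in this thread (Variable, Feature, Value, Coefficient, N, N.not.0, P.value, Q.value), and the rows, taxon names, and the helper keep_variables are made up for the example.

```python
import csv
import io

# Hypothetical excerpt of a MaAsLin-style results table (tab-delimited).
# Column names follow the output quoted in this thread; the rows are made up.
EXAMPLE_OUTPUT = (
    "Variable\tFeature\tValue\tCoefficient\tN\tN.not.0\tP.value\tQ.value\n"
    "Location\ttaxon_A\tLocationB\t0.021\t40\t31\t0.001\t0.02\n"
    "Race\ttaxon_A\tRaceB\t-0.013\t40\t31\t0.004\t0.05\n"
    "Age\ttaxon_B\tAge\t0.002\t40\t25\t0.010\t0.09\n"
    "Location\ttaxon_C\tLocationB\t-0.030\t40\t18\t0.015\t0.11\n"
)


def keep_variables(table_text, variables_of_interest):
    """Return only the rows whose 'Variable' column is a variable of interest."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    wanted = set(variables_of_interest)
    return [row for row in reader if row["Variable"] in wanted]


# Age and Race were only adjusted for, so their rows are dropped from the report.
# "ClinicalStatus" is listed for completeness; the toy table has no such rows.
main_results = keep_variables(EXAMPLE_OUTPUT, ["Location", "ClinicalStatus"])
for row in main_results:
    print(row["Variable"], row["Feature"], row["Coefficient"], row["Q.value"])
```

The coefficients and p-values of the retained rows are untouched by the filtering, which is the point made above: they come from the same joint model whether or not the adjustment variables are displayed.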

Many thanks,
Himel

On May 17, 2018, at 2:53 PM, juliana soto <dendr...@gmail.com> wrote:

it does, thanks!!!

Another question: I ran the same variables, but this time with another abundance table from different samples, and I got a list of taxa with significant associations in the output file $PROJECTNAME.txt.
If this output is the result of the interaction between the effects of the variables (abundance~Age+Location+Race) but with the effect of Race fixed, why did I get results for this variable (3 significant associations)? My understanding is that if I fix a variable, its effects should be "canceled" and not added to the total significant associations.

Also, I ran it again without controlling for "race", to see the all-vs-all effects, and I got only one significant association for race. I am a little confused about the interpretation of the results.

I have attached the two output files, one controlling race (Results.race) and the other one without restrictions (results.All)


This is the config file I am using:
Matrix: Metadata
Delimiter: TAB
Name_Row_Number: 1
Name_Column_Number: 1
Read_PCL_Rows: 2-4
Read_PCL_Columns: Q101-


Matrix: Abundance
Delimiter: TAB
Name_Row_Number: 1
Name_Column_Number: 1
Read_PCL_Rows: 5-
Read_PCL_Columns: Q101-


fAllvAll = TRUE
strForcedPredictors = "race"



Thanks a lot Himel and sorry to bother you with many questions!!!!

Juliana



On Thu, May 17, 2018 at 1:52 PM, Himel Mallick <hmal...@broadinstitute.org> wrote:
Hi Juliana,

Your intuition about "No significant data found" is correct. To answer your other question, MaAsLin by default fits a multivariable linear model and returns output for each significant feature-metadata pair from that model. For your specific example, MaAsLin is fitting the following per-feature model: Feature ~ Age + Location + Race. Therefore, it will return significant results per metadata variable, including Race. Let me know if that answers your question.
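
[Editor's sketch] To make "Feature ~ Age + Location + Race" concrete, here is a toy ordinary least-squares fit in plain Python. It is not MaAsLin's code: the samples, the 0/1 coding of Location and Race, the effect sizes, and the helper fit_ols are all invented for illustration, and the steps a real run also applies (for example the transform and feature selection listed in Run_Parameters.txt further down) are left out. It only shows that a single joint fit yields one coefficient per predictor, including the one being adjusted for.

```python
def fit_ols(predictors, response):
    """Least-squares fit of response on the predictor columns plus an intercept.

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination.
    """
    X = [[1.0] + list(row) for row in predictors]  # prepend intercept column
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(p)]
         + [sum(x[i] * yi for x, yi in zip(X, response))]
         for i in range(p)]
    for col in range(p):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Made-up samples: (Age in years, Location coded 0/1, Race coded 0/1).
samples = [(25, 0, 0), (32, 1, 0), (47, 0, 1), (51, 1, 1),
           (38, 1, 0), (29, 0, 1), (60, 0, 0), (44, 1, 1)]
# Noise-free toy "abundance" built from known effects so the fit is checkable.
true_effects = {"Intercept": 0.10, "Age": 0.002, "Location": 0.05, "Race": -0.03}
abundance = [true_effects["Intercept"] + true_effects["Age"] * age
             + true_effects["Location"] * loc + true_effects["Race"] * race
             for age, loc, race in samples]

# One joint fit returns a coefficient for every term, Race included.
coefficients = dict(zip(true_effects, fit_ols(samples, abundance)))
for name, value in coefficients.items():
    print(f"{name}: {value:.4f}")
```

Because every term is estimated in the same fit, forcing Race into the model does not remove it from the list of estimated effects; it only guarantees that the other coefficients are adjusted for it.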

Many thanks,
Himel



On Thu, May 17, 2018 at 12:54 PM, juliana soto <dendr...@gmail.com> wrote:
Hi Himel,
Thanks for your response.

I ran it controlling only for "Race" and I got output files for each of the variables: Age and Location, but also Race. My question is: why does it still generate output for the variable that I am controlling for?

Also, the output file $PROJECTNAME.txt says: "No significant data found." Does that mean that I don't have significant data when the model takes all the variables into account? For instance: abundance~Age+Location+Race

Thanks for your help

Juliana




On Thu, May 17, 2018 at 11:19 AM, Himel Mallick <hmal...@broadinstitute.org> wrote:
Hi Juliana,

This seems correct based on what you want to do.

Many thanks,
Himel


--
You received this message because you are subscribed to the Google Groups "MaAsLin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to maaslin-users+unsubscribe@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.




<results.All.txt>
<Results.race.txt>


Himel Mallick

May 22, 2018, 7:26:59 AM
to 王志峰, MaAsLin-users
Hello there,

This problem arises if you don't have the correct versions of the required packages. In particular, MaAsLin requires the following R packages: agricolae, gam (version 1.14), gamlss, gbm, glmnet, inlinedocs, logging, MASS, nlme (version 3.1-127), optparse, outliers, penalized, pscl, robustbase. Can you make sure you have the correct versions for nlme and gam? If not, can you try installing the correct versions and re-install MaAsLin and see if the problem still remains?

Many thanks,
Himel


On Tue, May 22, 2018 at 7:14 AM, 王志峰 <wzfa...@gmail.com> wrote:
Hello,
I have a question regarding the option "strRandomCovariates" in MaAsLin (v0.0.4), running under R (v3.5.0).


Run_Parameters.txt 
Parameters used in the MaAsLin run
Optional input read.config file= maaslin_wgs_all/data.conf
Optional R file= 
FDR threshold for pdf generation= 0.25
Minimum relative abundance= 1e-04
Minimum percentage of samples with measurements= 0.1
The fence used to define outliers with a quantile based analysis. If set to 0, the Grubbs test was used= 0
Ignore if the Grubbs test was not used. The significance level used as a cut-off to define outliers= 0.05
These covariates are treated as random covariates and not fixed covariates= subjectID
These covariates are treated as random covariates and not fixed covariates= mgx_pool
The type of multiple testing correction used= BH
Zero inflated inference models were turned on= FALSE
Feature selection step= boost
Statistical inference step= lm
Numeric transform used= asinsqrt
Quality control was run= TRUE
These covariates were forced into each model= 
These features' data were not changed by QC processes= 
Output verbosity= DEBUG
Log file was generated= TRUE
Data plots were inverted= FALSE
Ignore unless boosting was used. The threshold for the rel.inf used to select features= NA
All verses all inference method was used= FALSE
Ignore unless penalized feature selection was used. Alpha to determine the type of penalty= 0.95

When I set "strRandomCovariates", the p-values in the results are all negative (< 0). The command (using library("microbiomics")) was:

Maaslin.wrapper(metaphlan2_species, metadata, strOutputDIR = "maaslin_wgs_all2", strRandomCovariates = c("subjectID", "mgx_pool"))

E.g.:
Variable Feature Value Coefficient N N.not.0 P.value Q.value
1 after_abx p__Firmicutes_c__Bacilli_o__Lactobacillales_f__Lactobacillaceae_g__Lactobacillus_s__Lactobacillus_rhamnosus after_abxFALSE -0.0201593470101126 651 135 -3.05452881162841 -238.253247307016
2 after_abx p__Firmicutes_c__Clostridia_o__Clostridiales_f__Clostridiaceae_g__Clostridium_s__Clostridium_symbiosum after_abxFALSE -0.00555081353636654 651 233 -2.61083393129052 -130.199292114521
3 after_abx p__Proteobacteria_c__Gammaproteobacteria_o__Enterobacteriales_f__Enterobacteriaceae_g__Klebsiella_s__Klebsiella_unclassified after_abxFALSE -0.00390176332187698 651 195 -2.56863524152464 -124.028387376476
4 after_abx p__Firmicutes_c__Clostridia_o__Clostridiales_f__Clostridiaceae_g__Clostridium_s__Clostridium_nexile after_abxFALSE -0.0111570950512306 651 218 -2.5535799653611 -119.507542378899
5 after_abx p__Actinobacteria_c__Actinobacteria_o__Bifidobacteriales_f__Bifidobacteriaceae_g__Bifidobacterium_s__Bifidobacterium_breve after_abxFALSE -0.0058402594963034 651 437 -2.16024373367262 -78.2316837837155
6 after_abx p__Bacteroidetes_c__Bacteroidia_o__Bacteroidales_f__Bacteroidaceae_g__Bacteroides_s__Bacteroides_xylanisolvens after_abxFALSE -0.0175852736236893 651 189 -1.84884094292473 -56.2417414837704


But when I remove "strRandomCovariates", it runs correctly (0 < p < 1).
E.g.:
Variable Feature Value Coefficient N N.not.0 P.value Q.value
1 after_abx p__Firmicutes_c__Bacilli_o__Lactobacillales_f__Lactobacillaceae_g__Lactobacillus_s__Lactobacillus_rhamnosus after_abxFALSE -0.0215524758018919 651 135 0.00171090103340321 0.0505297178991512
2 after_abx p__Firmicutes_c__Clostridia_o__Clostridiales_f__Lachnospiraceae_g__Dorea_s__Dorea_formicigenerans after_abxFALSE 0.00822216938684956 651 200 0.00253431269849346 0.0696246433408385
3 after_abx p__Firmicutes_c__Clostridia_o__Clostridiales_f__Ruminococcaceae_g__Ruminococcus_s__Ruminococcus_lactaris after_abxFALSE 0.0164624961838384 651 185 0.00599507756818736 0.138159287594136
4 after_abx p__Firmicutes_c__Clostridia_o__Clostridiales_f__Clostridiaceae_g__Clostridium_s__Clostridium_symbiosum after_abxFALSE -0.00555619011918235 651 233 0.00937439777581176 0.185174792428697
5 after_abx p__Firmicutes_c__Clostridia_o__Clostridiales_f__Ruminococcaceae_g__Subdoligranulum_s__Subdoligranulum_unclassified after_abxFALSE 0.0306915498141874 651 556 0.0101818540733212 0.196935285023005
6 after_abx p__Firmicutes_c__Clostridia_o__Clostridiales_f__Lachnospiraceae_g__Roseburia_s__Roseburia_hominis after_abxFALSE 0.00925085951911881 651 243 0.0103582003956873 0.196935285023005

This script comes from Vatanen T, Kostic A D, d’Hennezel E, et al. Variation in microbiome LPS immunogenicity contributes to autoimmunity in humans[J]. Cell, 2016, 165(4): 842-853.

I don't know what is wrong, and sorry to bother you.



Luisa Hugerth

Aug 12, 2019, 2:19:19 AM
to MaAsLin-users
Could you please make sure that the Docker image has all the correct versions installed? It's currently on NLME 3.1.131 and behaving as described above (p/q values outside the range [0, 1]).
Thank you!
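
[Editor's sketch] A quick way to spot the symptom described above is to scan the P.value and Q.value columns for anything outside [0, 1]. The Python below is a minimal illustration, not a MaAsLin feature: the helper out_of_range is hypothetical, and the two rows reuse the Lactobacillus rhamnosus values quoted earlier in this thread (first with strRandomCovariates set, then without), with the feature name shortened.

```python
def out_of_range(rows, columns=("P.value", "Q.value")):
    """Return (row_index, column, value) for every value outside [0, 1].

    NaN also fails the range check, so it is reported as a problem too.
    """
    problems = []
    for i, row in enumerate(rows):
        for col in columns:
            value = float(row[col])
            if not 0.0 <= value <= 1.0:
                problems.append((i, col, value))
    return problems


# Values copied from the two result tables quoted in this thread.
rows = [
    {"Feature": "s__Lactobacillus_rhamnosus",   # run with strRandomCovariates
     "P.value": "-3.05452881162841", "Q.value": "-238.253247307016"},
    {"Feature": "s__Lactobacillus_rhamnosus",   # run without strRandomCovariates
     "P.value": "0.00171090103340321", "Q.value": "0.0505297178991512"},
]

problems = out_of_range(rows)
for index, column, value in problems:
    print(f"row {index}: {column} = {value} is not a valid probability")
```

Any hit from such a check means the table should not be interpreted until the package-version issue mentioned earlier in the thread has been ruled out.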

Long Nguyen

Aug 13, 2019, 9:15:16 AM
to MaAsLin-users
Hi Luisa,
Thank you for writing. Please note, MaAsLin is no longer actively supported. We encourage you to use the updated version, MaAsLin2.

Long 