Hello,

I have a question regarding the option "ForcedPredictors" in the config file. My metadata are Location, Age, Race, and Clinical status. I am interested in controlling for the effect of Age and Race on abundance and evaluating only the effects of the variables Location and Clinical status. To do that, should I use:

- fAllvAll = TRUE
- strForcedPredictors = c("Age", "Race")

Thanks,
Juliana

Hi Himel,

Thanks a lot for this explanation. It is extremely useful for understanding how MaAsLin works. Now I feel more confident about the results I got from the analysis.

Best regards,
Juliana

On Thu, May 17, 2018 at 6:48 PM, Himel Mallick <hmal...@broadinstitute.org> wrote:

Hi Juliana,

From the analysis standpoint, the variables we want to adjust for are those that we are not primarily interested in but that we include in the model in order to ‘cancel out’ their effect on the primary variables of interest. In that sense, you are absolutely right that one would not expect to see these variables in the output file. But if we look carefully at the statistical model under the hood, those variables are not estimated separately or independently. In fact, they are treated exactly like the primary variables in the model, and all the variables are jointly estimated to produce individual coefficient estimates and p-values.

That being said, in most practical scenarios we are not interested in anything beyond our primary variables of interest. This viewpoint is built into many statistical software packages, which let you specify your main and adjusting variables separately so that only the main variables are included in the output table. MaAsLin does not separate the two for a couple of reasons: (i) it accommodates the general case of multivariable analysis, in which users are interested in all the variables, and (ii) users can always subset their output file to the specific variables they are interested in. Note that these two philosophies of reporting significant associations do not undermine the individual p-values and coefficients, as they come from exactly the same multivariable model.

Sorry for the long explanation, but the short version for your purpose is: you can ignore the variables that you are not interested in but want to adjust for, and doing so does not affect your conclusions regarding the main variables of interest.
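
For point (ii), subsetting the output file to the variables of interest, a minimal sketch in Python (standard library only) might look like the following. It assumes the output table is tab-delimited and has a header naming every column, including a "Variable" column as in the tables posted in this thread. The function name, the sample rows, and their values are made up for illustration and are not part of MaAsLin.

```python
import csv
import io


def subset_by_variable(tsv_text, keep_variables):
    """Return the rows whose 'Variable' column is one of keep_variables."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    keep = set(keep_variables)
    return [row for row in reader if row["Variable"] in keep]


# Hypothetical miniature output table; every value here is invented.
example = (
    "Variable\tFeature\tValue\tCoefficient\tN\tN.not.0\tP.value\tQ.value\n"
    "Location\ttaxon_A\tLocationB\t0.02\t50\t30\t0.001\t0.04\n"
    "Race\ttaxon_B\tRaceX\t-0.01\t50\t25\t0.003\t0.06\n"
    "Age\ttaxon_C\tAge\t0.005\t50\t40\t0.010\t0.12\n"
)

# Keep only the primary variables; Age and Race were adjusted for in the
# model but are simply not reported here.
rows = subset_by_variable(example, ["Location", "Clinical_status"])
for row in rows:
    print(row["Variable"], row["Feature"], row["P.value"], row["Q.value"])
```

The p-values and q-values of the kept rows are left untouched, since they already come from the full multivariable model. If the real file was written with unnamed row indices, the header would need adjusting before this sketch applies.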
Hope this helps.

Many thanks,
Himel

It does, thanks!!!

Another question: I ran the same variables, but this time with another abundance table from different samples, and I got a list of taxa with significant associations in the output file $PROJECTNAME.txt.

If this output is the result of the interaction between the effects of the variables (abundance ~ Age + Location + Race) but with the effect of Race fixed, why did I get results for this variable (3 significant associations)? My understanding is that if I fix a variable, its effects should be "canceled" and not added to the total of significant associations.

Also, I ran it again without controlling for "race", to see the all-vs-all effects, and I got only one significant association for race. I am a little confused about how to interpret the results.

I have attached the two output files, one controlling for race (Results.race) and the other without restrictions (results.All).

This is the config file I am using:

Matrix: Metadata
Delimiter: TAB
Name_Row_Number: 1
Name_Column_Number: 1
Read_PCL_Rows: 2-4
Read_PCL_Columns: Q101-
Matrix: Abundance
Delimiter: TAB
Name_Row_Number: 1
Name_Column_Number: 1
Read_PCL_Rows: 5-
Read_PCL_Columns: Q101-
fAllvAll = TRUE
strForcedPredictors = "race"

Thanks a lot, Himel, and sorry to bother you with many questions!!!!

Juliana

On Thu, May 17, 2018 at 1:52 PM, Himel Mallick <hmal...@broadinstitute.org> wrote:

Hi Juliana,

Your intuition about "No significant data found" is correct. To answer your other question, MaAsLin by default fits a multivariable linear model and returns output for each significant feature-metadata pair from that model. For your specific example, MaAsLin is fitting the following per-feature model: Feature ~ Age + Location + Race. Therefore, it will return significant results per metadata variable, including Race. Let me know if that answers your question.

Many thanks,
Himel

HIMEL MALLICK | Postdoctoral Associate
Medical and Population Genetics | Broad Institute of MIT and Harvard
415 Main Street | Cambridge, MA 02142
Email: hmallick@broadinstitute.org
Website: https://www.hsph.harvard.edu/himel-mallick/

On Thu, May 17, 2018 at 12:54 PM, juliana soto <dendr...@gmail.com> wrote:

Hi Himel,

Thanks for your response.

I ran it controlling only for "Race" and I got output files for each of the variables: Age and Location, but also Race. My question is: why does it still generate output for the variable that I am controlling for?

Also, the output file $PROJECTNAME.txt says: "No significant data found." Does that mean that I don't have significant data when the model takes all the variables into account? For instance: abundance ~ Age + Location + Race

Thanks for your help,
Juliana

On Thu, May 17, 2018 at 11:19 AM, Himel Mallick <hmal...@broadinstitute.org> wrote:

Hi Juliana,

This seems correct based on what you want to do.

Many thanks,
Himel
--
You received this message because you are subscribed to the Google Groups "MaAsLin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to maaslin-users+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
<results.All.txt><Results.race.txt>

Hello,

I have a question regarding the option "strRandomCovariates". I am using R (v3.5.0) and MaAsLin (v0.0.4).

Run_Parameters.txt:

Parameters used in the MaAsLin run
Optional input read.config file= maaslin_wgs_all/data.conf
Optional R file=
FDR threshold for pdf generation= 0.25
Minimum relative abundance= 1e-04
Minimum percentage of samples with measurements= 0.1
The fence used to define outliers with a quantile based analysis. If set to 0, the Grubbs test was used= 0
Ignore if the Grubbs test was not used. The significance level used as a cut-off to define outliers= 0.05
These covariates are treated as random covariates and not fixed covariates= subjectID
These covariates are treated as random covariates and not fixed covariates= mgx_pool
The type of multiple testing correction used= BH
Zero inflated inference models were turned on= FALSE
Feature selection step= boost
Statistical inference step= lm
Numeric transform used= asinsqrt
Quality control was run= TRUE
These covariates were forced into each model=
These features' data were not changed by QC processes=
Output verbosity= DEBUG
Log file was generated= TRUE
Data plots were inverted= FALSE
Ignore unless boosting was used. The threshold for the rel.inf used to select features= NA
All verses all inference method was used= FALSE
Ignore unless penalized feature selection was used. Alpha to determine the type of penalty= 0.95

When I set "strRandomCovariates", the p-values in the result are all negative (< 0). The command (using library("microbiomics")) is:

    Maaslin.wrapper(metaphlan2_species, metadata, strOutputDIR = "maaslin_wgs_all2", strRandomCovariates = c("subjectID","mgx_pool"))

E.g.:

   Variable  Feature  Value  Coefficient  N  N.not.0  P.value  Q.value
1  after_abx  p__Firmicutes_c__Bacilli_o__Lactobacillales_f__Lactobacillaceae_g__Lactobacillus_s__Lactobacillus_rhamnosus  after_abxFALSE  -0.0201593470101126  651  135  -3.05452881162841  -238.253247307016
2  after_abx  p__Firmicutes_c__Clostridia_o__Clostridiales_f__Clostridiaceae_g__Clostridium_s__Clostridium_symbiosum  after_abxFALSE  -0.00555081353636654  651  233  -2.61083393129052  -130.199292114521
3  after_abx  p__Proteobacteria_c__Gammaproteobacteria_o__Enterobacteriales_f__Enterobacteriaceae_g__Klebsiella_s__Klebsiella_unclassified  after_abxFALSE  -0.00390176332187698  651  195  -2.56863524152464  -124.028387376476
4  after_abx  p__Firmicutes_c__Clostridia_o__Clostridiales_f__Clostridiaceae_g__Clostridium_s__Clostridium_nexile  after_abxFALSE  -0.0111570950512306  651  218  -2.5535799653611  -119.507542378899
5  after_abx  p__Actinobacteria_c__Actinobacteria_o__Bifidobacteriales_f__Bifidobacteriaceae_g__Bifidobacterium_s__Bifidobacterium_breve  after_abxFALSE  -0.0058402594963034  651  437  -2.16024373367262  -78.2316837837155
6  after_abx  p__Bacteroidetes_c__Bacteroidia_o__Bacteroidales_f__Bacteroidaceae_g__Bacteroides_s__Bacteroides_xylanisolvens  after_abxFALSE  -0.0175852736236893  651  189  -1.84884094292473  -56.2417414837704

But when I remove "strRandomCovariates", it runs correctly (0 < p < 1). E.g.:

   Variable  Feature  Value  Coefficient  N  N.not.0  P.value  Q.value
1  after_abx  p__Firmicutes_c__Bacilli_o__Lactobacillales_f__Lactobacillaceae_g__Lactobacillus_s__Lactobacillus_rhamnosus  after_abxFALSE  -0.0215524758018919  651  135  0.00171090103340321  0.0505297178991512
2  after_abx  p__Firmicutes_c__Clostridia_o__Clostridiales_f__Lachnospiraceae_g__Dorea_s__Dorea_formicigenerans  after_abxFALSE  0.00822216938684956  651  200  0.00253431269849346  0.0696246433408385
3  after_abx  p__Firmicutes_c__Clostridia_o__Clostridiales_f__Ruminococcaceae_g__Ruminococcus_s__Ruminococcus_lactaris  after_abxFALSE  0.0164624961838384  651  185  0.00599507756818736  0.138159287594136
4  after_abx  p__Firmicutes_c__Clostridia_o__Clostridiales_f__Clostridiaceae_g__Clostridium_s__Clostridium_symbiosum  after_abxFALSE  -0.00555619011918235  651  233  0.00937439777581176  0.185174792428697
5  after_abx  p__Firmicutes_c__Clostridia_o__Clostridiales_f__Ruminococcaceae_g__Subdoligranulum_s__Subdoligranulum_unclassified  after_abxFALSE  0.0306915498141874  651  556  0.0101818540733212  0.196935285023005
6  after_abx  p__Firmicutes_c__Clostridia_o__Clostridiales_f__Lachnospiraceae_g__Roseburia_s__Roseburia_hominis  after_abxFALSE  0.00925085951911881  651  243  0.0103582003956873  0.196935285023005

This script comes from Vatanen T, Kostic A D, d’Hennezel E, et al. Variation in microbiome LPS immunogenicity contributes to autoimmunity in humans[J]. Cell, 2016, 165(4): 842-853.

I don't know what is wrong, and sorry to bother you.
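
As a quick sanity check for the symptom described above, the following Python sketch flags rows whose reported p-value falls outside [0, 1]. It only detects the problem and says nothing about its cause. The function name is hypothetical, and the two sample rows reuse the first P.value from each of the two tables in this post.

```python
def out_of_range_p_values(rows, column="P.value"):
    """Return the rows whose p-value is not a number within [0, 1]."""
    flagged = []
    for row in rows:
        p = float(row[column])
        # A valid p-value lies in [0, 1]; NaN also fails this comparison.
        if not 0.0 <= p <= 1.0:
            flagged.append(row)
    return flagged


# First row of each table above, reduced to the columns needed here.
rows = [
    {"Feature": "s__Lactobacillus_rhamnosus", "P.value": "-3.05452881162841"},
    {"Feature": "s__Lactobacillus_rhamnosus", "P.value": "0.00171090103340321"},
]

flagged = out_of_range_p_values(rows)
print(len(flagged), "row(s) with a p-value outside [0, 1]")
```

Running a check like this on both output files would confirm whether every row from the run with strRandomCovariates is affected or only some of them.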
2018-05-22 18:51 GMT+08:00 Himel Mallick <hmal...@broadinstitute.org>:
Thanks, Juliana. Posting this email thread to the MaAsLin users group for future reference.

Himel

On Thu, May 17, 2018 at 10:53 AM, juliana soto <dendr...@gmail.com> wrote:

Hello,

I have a question regarding the option "ForcedPredictors" in the config file. My metadata are Location, Age, Race, and Clinical status. I am interested in controlling for the effect of Age and Race on abundance and evaluating only the effects of the variables Location and Clinical status. To do that, should I use:

- fAllvAll = TRUE
- strForcedPredictors = c("Age", "Race")

Thanks,
Juliana