glm sample size vs OBS_CT

Mariona Bustamante

May 29, 2023, 5:24:05 PM
to plink2...@googlegroups.com
Dear all,

I am running --glm with PLINK 2. The log is below:

PLINK v2.00a3.6 AVX2 (14 Aug 2022)             www.cog-genomics.org/plink/2.0/
(C) 2005-2022 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /PROJECTES/HELIX_OMICS/analyses/GWAS_chemical_exposure_ZB/results/phthalates_mb/mecpp/mecpp.log.
Options in effect:
  --bfile /PROJECTES/HELIX_OMICS/data_final/gwas/child/8y/GSA_QC3.HRCimp_20210615/HELIX.impQC.rs.05.EUR
  --covar /PROJECTES/HELIX_OMICS/analyses/GWAS_chemical_exposure_ZB/db/HELIX_phtalates_cov_plink.txt
  --covar-name PC1, PC2, PC3, PC4, PC5, PC6, PC7, PC8, PC9, PC10, e3_sex, hs_child_age_days_None
  --covar-variance-standardize PC1, PC2, PC3, PC4, PC5, PC6, PC7, PC8, PC9, PC10, hs_child_age_days_None
  --glm
  --no-sex
  --out /PROJECTES/HELIX_OMICS/analyses/GWAS_chemical_exposure_ZB/results/phthalates_mb/mecpp/mecpp
  --pheno /PROJECTES/HELIX_OMICS/analyses/GWAS_chemical_exposure_ZB/db/HELIX_phtalates_cov_plink.txt
  --pheno-name hs_mecpp_cadj_Log2

Start time: Mon May 29 22:57:15 2023
515269 MiB RAM detected; reserving 257634 MiB for main workspace.
Using up to 72 threads (change this with --threads).
1155 samples (0 females, 0 males, 1155 ambiguous; 1155 founders) loaded from
/PROJECTES/HELIX_OMICS/data_final/gwas/child/8y/GSA_QC3.HRCimp_20210615/HELIX.impQC.rs.05.EUR.fam.
4614947 variants loaded from
/PROJECTES/HELIX_OMICS/data_final/gwas/child/8y/GSA_QC3.HRCimp_20210615/HELIX.impQC.rs.05.EUR.bim.
1 quantitative phenotype loaded (1044 values).
12 covariates loaded from /PROJECTES/HELIX_OMICS/analyses/GWAS_chemical_exposure_ZB/db/HELIX_phtalates_cov_plink.txt.
--covar-variance-standardize: 1 covariate transformed.
Calculating allele frequencies... done.
--glm linear regression on phenotype 'hs_mecpp_cadj_Log2': 8%...

When I read the results in R, I get the following table:
   #CHROM    POS           ID REF ALT A1          TEST OBS_CT       BETA
1:      1 752721 1:752721:A:G   G   A  A           ADD    797  0.0259112
2:      1 752721 1:752721:A:G   G   A  A e3_sex=female    797 -0.0352766
3:      1 752721 1:752721:A:G   G   A  A           PC1    797  4.1562400
4:      1 752721 1:752721:A:G   G   A  A           PC2    797 11.8837000
5:      1 752721 1:752721:A:G   G   A  A           PC3    797  0.3141430
6:      1 752721 1:752721:A:G   G   A  A           PC4    797  2.8399700

I would like to know why OBS_CT (797) is different from the initial sample size (1044). I have checked and I do not have any missing values in the covariates.
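One way to double-check the "no missing values" claim is to count the rows that are complete across the phenotype and every covariate used in the regression. The following is a minimal sketch, not PLINK's own logic: the function name `count_complete_rows` is hypothetical, the file is assumed to be whitespace-delimited with a header line, and the set of missing tokens (`NA`, `nan`) is an assumption that may not match every token PLINK 2 treats as missing.

```python
def count_complete_rows(path, columns, missing=("na", "nan")):
    """Return (complete, total) row counts for the given columns.

    Assumptions (not taken from PLINK itself): the file is
    whitespace-delimited, has a header line, and a value is missing
    when it matches one of `missing`, compared case-insensitively.
    """
    missing = {m.lower() for m in missing}
    complete = total = 0
    with open(path) as fh:
        # Strip a leading '#' so '#IID' and 'IID' are treated alike.
        header = [h.lstrip("#") for h in fh.readline().split()]
        absent = [c for c in columns if c not in header]
        if absent:
            raise KeyError(f"columns not in header: {absent}")
        for line in fh:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            total += 1
            row = dict(zip(header, fields))
            # A short row leaves some columns unset; count it as incomplete.
            if all(c in row and row[c].lower() not in missing for c in columns):
                complete += 1
    return complete, total
```

Calling it with `hs_mecpp_cadj_Log2` plus the twelve names passed to `--covar-name` gives a count that can be compared against OBS_CT. If the two disagree, the difference is probably not explained by these missing tokens alone.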

I have read this explanation from PLINK but I do not understand it: "Finally, if PLINK 2 determines that any samples and covariates are irrelevant to all regressions (e.g. a covariate could be zero-valued for all but one sample), they are removed before any variants are processed. You can use the 'pheno-ids' modifier to make PLINK 2 report the remaining samples to (per-phenotype) .id files. (When the sample set changes on chrX or chrY, .x.id and/or .y.id files are also written.)"
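The 'pheno-ids' modifier mentioned in that passage makes PLINK 2 write the retained samples to an .id file, which can then be compared with the .fam file to see exactly who was dropped. The following is a rough sketch of that comparison. The helper names are hypothetical, both paths are supplied by the caller, and the layout of the .id file (a `#`-prefixed header containing an `IID` column, or else headerless) is an assumption to check against the actual output.

```python
from pathlib import Path


def read_iids(path):
    """Return the set of sample IDs found in a whitespace-delimited file.

    Assumed layouts (hypothetical, verify against your files):
      * first line starts with '#': it is a header containing 'IID';
      * otherwise the file is headerless, with IID in column 2
        (as in a .fam file) or in the only column present.
    """
    rows = [ln.split() for ln in Path(path).read_text().splitlines() if ln.strip()]
    if not rows:
        return set()
    if rows[0][0].startswith("#"):
        header = [h.lstrip("#") for h in rows[0]]
        col = header.index("IID")
        rows = rows[1:]
    else:
        col = min(1, len(rows[0]) - 1)
    return {r[col] for r in rows}


def dropped_samples(fam_path, id_path):
    """Samples present in the .fam file but absent from the .id file."""
    return sorted(read_iids(fam_path) - read_iids(id_path))
```

Looking up the returned IDs in the phenotype/covariate file usually shows which column caused the exclusion.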

Thank you in advance,

Mariona

Mariona Bustamante

May 29, 2023, 7:07:18 PM
to plink2...@googlegroups.com
Hi,
I think I have found the problem. The database contained many variables, and one of them had a level with a "-". After I removed this variable from the database (I did not use it in the analyses) and also replaced --linear with --glm, it worked. Now I get 1044 in OBS_CT.
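A quick scan can locate columns like the one described above before PLINK is run. This is a minimal sketch only: the function name `find_token_columns` is hypothetical, the file is assumed to be whitespace-delimited with a header line and no empty fields, and the default token `-` is taken from this thread rather than from any documented PLINK rule.

```python
def find_token_columns(path, tokens=("-",)):
    """Map column name -> number of rows whose value is one of `tokens`.

    Assumes a whitespace-delimited file with a header line and no
    empty fields (hypothetical layout, adjust for your file).
    """
    tokens = set(tokens)
    counts = {}
    with open(path) as fh:
        header = fh.readline().split()
        for line in fh:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            for name, value in zip(header, fields):
                if value in tokens:
                    counts[name] = counts.get(name, 0) + 1
    return counts
```

Any column reported here that is not needed for the analysis can be dropped from the file, as was done in this case.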
Regards,
Mariona