Increase Genomic Inflation Factor with subset of GWAS

eke...@gmail.com

Apr 4, 2016, 1:08:27 AM

to plink2-users

Hi, I have a question about the GC factor.

When I ran the GWAS analysis with 1.3M SNPs, the genomic inflation factor was 1.03. But when I extracted 143 SNPs, it was 9.3.

I used plink --logistic and included principal components 1, 2, and 3 as covariates.

Why is it different even though it is from the same dataset?

My sample comes from a very diverse population (Miami, FL), and I am wondering whether I need to include more PCs to reduce the inflation factor.

Or what is the best way to control genomic inflation in admixed populations?

When I ran the cluster analysis, there were 36 clusters. See the attached MDS plot for our populations.

Here is the log file for each analysis.

########## log file 1 #############################

1372519 variants loaded from .bim file.

416 people (0 males, 416 females) loaded from .fam.

359 phenotype values present after --pheno.

Using 1 thread (no multithreaded calculations invoked).

--covar: 8 out of 15 covariates loaded.

Before main variant filters, 416 founders and 0 nonfounders present.

Calculating allele frequencies... done.

Total genotyping rate is 0.99782.

1372519 variants and 416 people pass filters and QC.

Among remaining phenotypes, 81 are cases and 278 are controls. (57 phenotypes

are missing.)

Writing logistic model association results to

f:\gwas_analysis_lee\1372519.assoc.logistic

... done.

--adjust: Genomic inflation est. lambda (based on median chisq) = 1.03824.

--adjust values (1372514 variants) written to

######### log file 2 #############################

1372519 variants loaded from .bim file.

416 people (0 males, 416 females) loaded from .fam.

359 phenotype values present after --pheno.

--extract: 143 variants remaining.

Using 1 thread (no multithreaded calculations invoked).

--covar: 8 out of 15 covariates loaded.

Before main variant filters, 416 founders and 0 nonfounders present.

Calculating allele frequencies... done.

Total genotyping rate is 0.997176.

143 variants and 416 people pass filters and QC.

Among remaining phenotypes, 81 are cases and 278 are controls. (57 phenotypes

are missing.)

Writing logistic model association results to

f:\gwas_analysis_lee\143.assoc.logistic

... done.

--adjust: Genomic inflation est. lambda (based on median chisq) = 9.33367.

--adjust values (143 variants) written to

Christopher Chang

Apr 4, 2016, 1:51:45 PM

to plink2-users

The genomic inflation factor computation assumes that you're scanning the entire genome. It should be ignored when you're only looking at the most significant SNPs.
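To illustrate the point, here is a minimal sketch (not PLINK's own code) of a median-based lambda: the median observed 1-df chi-square statistic divided by its expected value under the null. The data are simulated, and taking the 143 largest statistics as the subset is an assumption made only for illustration.

```python
import random
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom (about 0.4549).
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2

def genomic_inflation(chisq_stats):
    """Median observed 1-df chi-square divided by its null expectation."""
    return median(chisq_stats) / CHI2_1DF_MEDIAN

# Simulated null scan: squared standard normals are 1-df chi-square statistics.
rng = random.Random(0)
genome_wide = [rng.gauss(0.0, 1.0) ** 2 for _ in range(100_000)]

# Hypothetical subset: the 143 largest statistics from the same scan.
top_subset = sorted(genome_wide)[-143:]

lambda_all = genomic_inflation(genome_wide)
lambda_top = genomic_inflation(top_subset)
print(f"genome-wide lambda: {lambda_all:.3f}")
print(f"top-143 lambda:     {lambda_top:.3f}")
```

Even with no inflation at all in the full scan, the lambda of a subset selected for significance is far above 1, because the median of that subset is no longer the median of a null distribution.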

Till Andlauer

Apr 6, 2016, 3:24:53 AM

to eke...@gmail.com, plink2-users

Hi,

Your inflation factor is in fact good (as said, only the genome-wide
inflation counts) and I don't think you really have separate clusters
here. It seems like you have a major population with positive values for
C1 and individuals with a partially different ethnic background
spreading along the negative values of C1. You might consider scaling
the axes of your MDS plot so that they show standard deviations and then
remove individuals that exceed a certain threshold, e.g. 4 or 5. For
example, EIGENSTRAT might help you with that. If you want to know how
many components to include as covariates, you need to inspect additional
plots (e.g. C3 vs C4, C5 vs C6 etc). You can also inspect whether
components are associated with your phenotype or with other covariates
and how much variation they explain. If you have a very diverse
population (which, by the way, is not really the case for your plot), it
often works best to use 8 of 10 generated MDS components as covariates
to have 80 % of variation covered.
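The scaling-and-threshold idea above could be sketched roughly as follows. This is a simplified one-pass illustration, not EIGENSOFT's own outlier procedure; the function name `flag_outliers`, the input layout (sample ID mapped to a list of component values), and the toy data are all hypothetical.

```python
from statistics import mean, stdev

def flag_outliers(components, threshold=4.0):
    """Return IDs of samples more than `threshold` SDs from the mean
    on any component. `components` maps sample ID -> [C1, C2, ...]."""
    ids = list(components)
    n_comp = len(components[ids[0]])
    outliers = set()
    for k in range(n_comp):
        values = [components[i][k] for i in ids]
        mu, sd = mean(values), stdev(values)
        if sd == 0:
            continue  # a constant component cannot have outliers
        for i in ids:
            # Distance from the mean in units of standard deviations.
            if abs(components[i][k] - mu) / sd > threshold:
                outliers.add(i)
    return sorted(outliers)

# Toy data (hypothetical): 50 tightly clustered samples plus one far out on C1.
toy = {f"S{i}": [0.01 * (i % 5), 0.01 * (i % 3)] for i in range(50)}
toy["S_far"] = [-1.0, 0.0]
print(flag_outliers(toy, threshold=4.0))
```

In practice the component values would come from the MDS or PCA output file, and the threshold (4 or 5 SDs, as suggested above) is a judgment call.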

Best,

Till

On 04.04.16 07:12, eke...@gmail.com wrote:
> I forgot to attach the mds plot.

eke...@gmail.com

Apr 8, 2016, 7:59:56 PM

to plink2-users, eke...@gmail.com, till_a...@psych.mpg.de

Hi,

Thank you so much for your advice.

I have another question.

I have two GWAS datasets, one with about 400 patients and the other with about 750 patients, and we know that the population composition of the two is quite different.

First, we ran the GWAS analysis for each dataset separately, computing PC1, 2, and 3 for each dataset with EIGENSOFT.

For some reason, we were not able to find any SNPs with genome-wide significance.

Because we used the same genotyping platform and the same questionnaire to collect the outcome and confounding factors, I am wondering whether pooling the two populations and running the GWAS again would yield any significant findings.

If I pool the two GWAS datasets, how can I combine the two sets of PLINK binary files (bed, bim, fam)? Can I use plink --bmerge?

Also, do I need to run EIGENSOFT again to get the genomic substructure (PC1, 2, 3...) for the new combined dataset?

Thank you in advance.

Best,

Eunkyung

Till Andlauer

Apr 9, 2016, 6:08:33 AM

to eke...@gmail.com, plink2-users

Hi Eunkyung,

It is difficult to judge whether it makes sense to merge two data sets
without seeing the data. If they form separate clusters after merging,
I'd rather analyze them separately and conduct a meta-analysis afterwards.
Yes, you can use --bmerge to merge the data sets after some basic QC and
you might also have to flip the orientation of some of the variants.
Then you need to redo QC steps and you definitely need to recalculate
the GRM and to rerun EIGENSOFT.
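For the separate-analysis route, one common way to combine per-cohort results is an inverse-variance-weighted fixed-effect meta-analysis. The sketch below shows that calculation for a single SNP. It assumes both cohorts report the effect for the same allele, and the input numbers are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def fixed_effect_meta(estimates):
    """Inverse-variance-weighted fixed-effect meta-analysis.

    `estimates` is a list of (beta, se) pairs, e.g. log odds ratios and
    their standard errors for one SNP in each cohort, all coded for the
    same effect allele.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    se = sqrt(1.0 / sum(weights))
    z = beta / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))  # two-sided p-value
    return beta, se, z, p

# Hypothetical log odds ratios and standard errors from two cohorts.
beta, se, z, p = fixed_effect_meta([(0.30, 0.15), (0.20, 0.10)])
print(f"beta={beta:.4f} se={se:.4f} z={z:.3f} p={p:.4g}")
```

The cohort with the smaller standard error gets more weight, so the combined estimate sits closer to it.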

Best,
Till
