Which relatedness matrix, per gene or whole genome, to use when fitting BSLMM to quantitative traits


Yue Liu

Jun 14, 2019, 5:12:24 PM
to gemma-discussion
Hi,
Thanks very much for the great package. I am planning to predict quantitative traits (such as gene expression) from genetic data, using BSLMM or other models. This requires a relatedness matrix as input. My question is: which matrix should I use when I fit the model to a particular gene? Should I use the cis SNPs near the gene or the whole-genome SNPs to create the relatedness matrix? The whole-genome SNPs would surely be more accurate in determining the relatedness of individuals. But since my goal is to fit a model for each gene, isn't a relatedness matrix generated per gene more appropriate? Thanks a lot for your advice on this.

Yue
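
For readers following the thread, the two options in the question can be sketched in a few lines. This is only a minimal illustration, assuming the centered relatedness matrix K = (1/p) * sum over SNPs of (x_s - mean_s)(x_s - mean_s)^T. The toy genotypes, the function name `centered_relatedness`, and the `cis_idx` indices are made up for the example; this is not GEMMA's own code.

```python
def centered_relatedness(genotypes):
    """Centered relatedness matrix from a list of SNP rows.

    genotypes: list of p SNPs, each a list of n per-individual dosages (0/1/2).
    Returns an n x n list of lists: K = (1/p) * sum_s c_s c_s^T,
    where c_s is SNP s with its mean subtracted.
    """
    p = len(genotypes)
    n = len(genotypes[0])
    K = [[0.0] * n for _ in range(n)]
    for snp in genotypes:
        mean = sum(snp) / n
        centered = [g - mean for g in snp]
        for i in range(n):
            for j in range(n):
                K[i][j] += centered[i] * centered[j] / p
    return K


# Toy data (made up): 6 SNPs x 4 individuals.
snps = [
    [0, 1, 2, 1],
    [2, 2, 0, 0],
    [1, 0, 1, 2],
    [0, 0, 1, 1],
    [2, 1, 1, 0],
    [1, 1, 0, 2],
]
cis_idx = [0, 1]  # hypothetical: the SNPs that lie near the gene of interest

K_genome = centered_relatedness(snps)                     # option 1: all SNPs
K_cis = centered_relatedness([snps[i] for i in cis_idx])  # option 2: cis SNPs only

# Compare the estimated relatedness of individuals 0 and 1 under each option.
print("whole genome:", K_genome[0][1])
print("cis only:    ", K_cis[0][1])
```

With only a handful of cis SNPs, each entry of the matrix is an average over very few terms, so the per-gene estimate is much noisier than the whole-genome one.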

Pjotr Prins

Jun 17, 2019, 3:18:16 PM
to Yue Liu, gemma-discussion
This is a rather large question. I would suggest reading the relevant
papers. Specialized kinship matrices, in general, appear to perform
less well. Think about it: what are you trying to capture?

Pj.

Yue Liu

Jun 18, 2019, 12:10:03 PM
to Pjotr Prins, gemma-discussion
Thanks very much for your reply, Pj. This is human data. I forgot to reply to the Google group.

On Tue, Jun 18, 2019 at 10:12 AM Pjotr Prins <pjot...@thebird.nl> wrote:
Try both. Also try running without Kinship correction.

Is it human data?

On Tue, Jun 18, 2019 at 09:56:45AM -0400, Yue Liu wrote:
>    Thanks very much for the reply.
>    I am new to association studies and GEMMA.
>    We are trying to use GEMMA BSLMM to correlate SNPs with a
>    quantitative trait, such as gene expression, on a per-gene basis, and
>    then associate the genes with the phenotype.
>    For example, for a gene with 1000 SNPs nearby, we are trying to
>    correlate the SNPs with its expression levels. So we will run GEMMA for
>    each gene that has expression data available. When running GEMMA,
>    there is a step that creates the relatedness matrix used in the
>    subsequent steps. I can use the genotype data from all the genes
>    to calculate this relatedness matrix, and use the same matrix for the
>    model fit of every gene. Or I can calculate the relatedness matrix on
>    a per-gene basis, using only the SNPs near the gene, which are also
>    the SNPs used for the subsequent model fit of that gene.
>    I am wondering which of the above two options I should take. Thanks a
>    lot for your help.
>    Yue
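
The per-gene option described above depends on picking "the SNPs near the gene". A minimal sketch of that selection step follows, assuming SNPs are kept when they fall within a fixed window around the gene body. The `Snp` record, the `cis_snps` helper, the 1 Mb default window, and the example positions are all assumptions made for illustration; none of them come from GEMMA or from the thread.

```python
from typing import NamedTuple


class Snp(NamedTuple):
    snp_id: str
    chrom: str
    pos: int  # base-pair position


def cis_snps(snps, chrom, gene_start, gene_end, window=1_000_000):
    """Return the SNPs on `chrom` within `window` bp of the gene body."""
    lo = max(0, gene_start - window)
    hi = gene_end + window
    return [s for s in snps if s.chrom == chrom and lo <= s.pos <= hi]


# Made-up SNPs and gene coordinates.
snps = [
    Snp("rs_a", "6", 1_200_000),
    Snp("rs_b", "6", 2_100_000),
    Snp("rs_c", "6", 9_000_000),  # same chromosome, too far from the gene
    Snp("rs_d", "7", 2_000_000),  # different chromosome
]
selected = cis_snps(snps, chrom="6", gene_start=2_000_000, gene_end=2_050_000)
print([s.snp_id for s in selected])
```

The selected IDs could then be used to subset the genotype files for the per-gene model fit, while the relatedness matrix is built either from the same subset or from all SNPs.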

Yue Liu

Jun 18, 2019, 2:56:16 PM
to Pjotr Prins, gemma-discussion
Hi Pjotr,
We found the BMC Genomics paper that used GEMMA BSLMM to predict gene expression from SNP data. We would like to perform similar experiments with <200 individuals, on possibly 8000 genes. But for each gene, probably only the hundreds or thousands of SNPs near the gene would be used to predict its expression.
Thanks,
Yue
On Tue, Jun 18, 2019 at 12:55 PM Prins <pjot...@thebird.nl> wrote:
How many individuals and what is the 'experiment'?



Yue Liu

Jun 18, 2019, 3:34:26 PM
to Pjotr Prins, gemma-discussion
Thanks a lot for your help on this.
I am still quite new to this, and the study is at an early stage. We are interested in predicting quantitative traits of a gene, such as gene expression or any quantitative -omics data, using cis-SNPs near the gene. So the goal now is to fit a model on a per-gene basis, and hopefully hundreds of samples are enough to fit a model with only hundreds to thousands of SNPs near the gene. Once this is done, quantitative traits could be predicted for more samples, and there might then be enough power to associate the quantitative traits with disease traits.
We saw the BMC Genomics paper that used BSLMM for such a task. It also compared BSLMM with LMM and LASSO (ENET). When we ran BSLMM on the GEMMA example data, it required a relatedness matrix, which made us think about which matrix we should use for the real data.
I guess a possibly similar scenario is this: someone is only interested in finding significant SNPs in a particular region of chr6, and plans to test only these SNPs against a trait using an LMM or BSLMM model. However, they actually have SNP data for the whole genome, which should provide a more accurate relatedness matrix among the samples. Which set of SNPs should they use to build the matrix: the chr6 regional SNPs or the whole-genome SNPs?

Yue   

On Tue, Jun 18, 2019 at 2:59 PM Pjotr Prins <pjot...@thebird.nl> wrote:
With 200 individuals you won't have enough power though you may find
some 'suggestive' associations. It may be interesting to mine the
mouse/rat data in genenetwork depending on your phenotype and tissue.

