K Matrix Adjustments


Aaron Jackson

Sep 14, 2011, 1:49:13 PM
to tas...@googlegroups.com

Hello,

I have a question about how to correctly adjust kinship values for a TASSEL matrix.

In this shortened example, when we compute the Loiselle kinship matrix in SPAGeDi, the output is:

 

ALL LOCI     2169     2490     7155     7404     9032     9049     9979
2169               -0.0252  -0.3345  -0.2807   0.0802    0.211  -0.0273
2490      -0.0252           -0.2681  -0.2214   0.1174  -0.0659   0.0148
7155      -0.3345  -0.2681            0.5775  -0.3243    -0.26  -0.2637
7404      -0.2807  -0.2214   0.5775           -0.3315  -0.3107  -0.2938
9032       0.0802   0.1174  -0.3243  -0.3315           -0.0181   0.1187
9049        0.211  -0.0659    -0.26  -0.3107  -0.0181           -0.0088
9979      -0.0273   0.0148  -0.2637  -0.2938   0.1187  -0.0088

 

 

 

According to the manual, the blanks are changed to 2 and the negative values are set to zero.

If this is done, we get Matrix 2:

 

Matrix 2     2169     2490     7155     7404     9032     9049     9979
2169            2        0        0        0   0.0802    0.211        0
2490            0        2        0        0   0.1174        0   0.0148
7155            0        0        2   0.5775        0        0        0
7404            0        0   0.5775        2        0        0        0
9032       0.0802   0.1174        0        0        2        0   0.1187
9049        0.211        0        0        0        0        2        0
9979            0   0.0148        0        0   0.1187        0        2

 

In the older manual you warn about problems that can occur when the data are adjusted and the off-diagonals are not changed. Would it be better to assign the diagonals a value of 1 and then multiply the entire matrix by 2, so that the off-diagonals are adjusted as well?

 

ALL LOCI     2169     2490     7155     7404     9032     9049     9979
2169            1  -0.0252  -0.3345  -0.2807   0.0802    0.211  -0.0273
2490      -0.0252        1  -0.2681  -0.2214   0.1174  -0.0659   0.0148
7155      -0.3345  -0.2681        1   0.5775  -0.3243    -0.26  -0.2637
7404      -0.2807  -0.2214   0.5775        1  -0.3315  -0.3107  -0.2938
9032       0.0802   0.1174  -0.3243  -0.3315        1  -0.0181   0.1187
9049        0.211  -0.0659    -0.26  -0.3107  -0.0181        1  -0.0088
9979      -0.0273   0.0148  -0.2637  -0.2938   0.1187  -0.0088        1

  X  2   =

             2169     2490     7155     7404     9032     9049     9979
2169            2  -0.0504   -0.669  -0.5614   0.1604    0.422  -0.0546
2490      -0.0504        2  -0.5362  -0.4428   0.2348  -0.1318   0.0296
7155       -0.669  -0.5362        2    1.155  -0.6486    -0.52  -0.5274
7404      -0.5614  -0.4428    1.155        2   -0.663  -0.6214  -0.5876
9032       0.1604   0.2348  -0.6486   -0.663        2  -0.0362   0.2374
9049        0.422  -0.1318    -0.52  -0.6214  -0.0362        2  -0.0176
9979      -0.0546   0.0296  -0.5274  -0.5876   0.2374  -0.0176        2

 

The negative values are then set to 0, which gives Matrix 3:

 

Matrix 3     2169     2490     7155     7404     9032     9049     9979
2169            2        0        0        0   0.1604    0.422        0
2490            0        2        0        0   0.2348        0   0.0296
7155            0        0        2    1.155        0        0        0
7404            0        0    1.155        2        0        0        0
9032       0.1604   0.2348        0        0        2        0   0.2374
9049        0.422        0        0        0        0        2        0
9979            0   0.0296        0        0   0.2374        0        2

 

 

We have tried running our data both ways (the Matrix 2 method versus the Matrix 3 method) and get slightly different results. Which way of generating the matrix makes the most sense: the method used for Matrix 2, or multiplying everything by 2 as in Matrix 3?
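
For anyone who wants to reproduce the two constructions, here is a minimal Python sketch. It is not TASSEL or SPAGeDi code. The helper name `build_matrix` and its parameters are made up for illustration, and it assumes the diagonal ends up at 2 in both methods, as in the tables above.

```python
# Upper triangle of the SPAGeDi output above, one row per individual
# (order: 2169, 2490, 7155, 7404, 9032, 9049, 9979). The diagonal is blank.
upper_rows = [
    [-0.0252, -0.3345, -0.2807, 0.0802, 0.211, -0.0273],  # 2169 vs. the rest
    [-0.2681, -0.2214, 0.1174, -0.0659, 0.0148],          # 2490
    [0.5775, -0.3243, -0.26, -0.2637],                    # 7155
    [-0.3315, -0.3107, -0.2938],                          # 7404
    [-0.0181, 0.1187],                                    # 9032
    [-0.0088],                                            # 9049
]


def build_matrix(upper_rows, diagonal, scale):
    """Hypothetical helper: fill a symmetric matrix, multiply each
    off-diagonal value by `scale`, then set negative values to zero."""
    n = len(upper_rows) + 1
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        matrix[i][i] = diagonal
    for i, row in enumerate(upper_rows):
        for offset, value in enumerate(row, start=1):
            j = i + offset
            adjusted = max(0.0, scale * value)
            matrix[i][j] = adjusted
            matrix[j][i] = adjusted
    return matrix


# Matrix 2: diagonal 2, off-diagonals left as kinship, negatives set to 0.
matrix2 = build_matrix(upper_rows, diagonal=2.0, scale=1.0)
# Matrix 3: diagonal 1 and everything doubled (so the diagonal is 2 again),
# negatives set to 0.
matrix3 = build_matrix(upper_rows, diagonal=2.0, scale=2.0)
```

In this sketch the two matrices share the same diagonal and differ only in the off-diagonals, which are exactly doubled in Matrix 3. So the two are not a constant multiple of each other.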

 

Thanks,

Aaron

Zhiwu Zhang

Sep 14, 2011, 2:28:00 PM
to tas...@googlegroups.com

Dear Aaron,

 

Thank you very much for presenting your questions so clearly. Here are some thoughts that may help in constructing the matrix.

 

1. The coefficient matrix used in the mixed model is twice the coancestry (kinship). The diagonals of the coefficient matrix are equal to 1 + the inbreeding coefficient, so for inbreds the diagonals are 2. Provided that the estimate from SPAGeDi is kinship, your Matrix 2 does the right thing.

2. The key to this condition is the choice of base population. The ideal base is a population in which no individuals are related. Obviously this is not true biologically; it only means that at some time in the past the population was large enough that kinship did not differ among individuals.

3. Treating all negative values as the same and setting them to zero is debatable. In that case, the base is the center of the negatives, and the differences among them are completely ignored.

4. A good alternative is to set the smallest kinship value as the base. This means shifting every element so that none is below 0.
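
One possible reading of point 4 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `shift_to_minimum_base` is made up, the shift is a plain subtraction of the smallest off-diagonal value, and the diagonals are left untouched. Other rescalings are possible, and the thread does not say which one is intended.

```python
def shift_to_minimum_base(kinship):
    """Hypothetical helper: subtract the smallest off-diagonal value from
    every off-diagonal entry, so the least related pair becomes the base (0).
    Diagonal entries are returned unchanged (an assumption)."""
    n = len(kinship)
    least = min(kinship[i][j] for i in range(n) for j in range(n) if i != j)
    return [
        [kinship[i][j] if i == j else kinship[i][j] - least for j in range(n)]
        for i in range(n)
    ]


# Three individuals from the thread (2169, 2490, 7155), diagonal set to 2.
kinship = [
    [2.0, -0.0252, -0.3345],
    [-0.0252, 2.0, -0.2681],
    [-0.3345, -0.2681, 2.0],
]
shifted = shift_to_minimum_base(kinship)
```

Unlike setting negatives to zero, this keeps the differences among the negative estimates: the pair 2169/7155 becomes the zero point and every other pair sits above it.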

 

Hope this helps,

 

Zhiwu Zhang

 


Peter Bradbury

Sep 15, 2011, 9:42:07 AM
to TASSEL - Trait Analysis by Association, Evolution and Linkage
There has been a lot of research directed at methods for estimating an
IBD (identity by descent) relationship matrix from markers. Using
marker information directly gives you an IBS (identity by state)
matrix instead, and how to use IBS information to infer IBD has been
the subject of much of that research. The different methods can have a
big impact on the estimate
of genetic variance yet have only a small impact on model fit and
marker testing for association analysis. The reason is that the model
is fitting A, the additive genetic relationship matrix (or K or 2*K,
depending on how you define the terms), times the additive genetic
variance. If you multiply all the values in A by a constant C then you
just end up dividing the estimate of genetic variance by C. Some of
the different methods of estimating A differ mainly in how it is
scaled. For example, if you use K = matrix of coefficients of
relationship vs. A = 2*K, the only difference will be that the
estimate of the additive genetic variance will change.
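
The scaling argument above can be checked numerically. The following is a minimal Python sketch, not TASSEL code. It assumes the usual mixed-model covariance V = (genetic variance) * A + (error variance) * I, and the variance values are made up for illustration.

```python
def covariance(a, genetic_var, error_var):
    """V = genetic_var * A + error_var * I for a square matrix A."""
    n = len(a)
    return [
        [genetic_var * a[i][j] + (error_var if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]


# K for individuals 7155 and 7404 with the diagonal set to 1, and A = C * K.
k = [[1.0, 0.5775], [0.5775, 1.0]]
c = 2.0
a = [[c * value for value in row] for row in k]

# Hypothetical variance components, chosen only for the demonstration.
v_from_k = covariance(k, genetic_var=3.0, error_var=1.0)
v_from_a = covariance(a, genetic_var=3.0 / c, error_var=1.0)
```

Both calls give the same V, so a model fitted with K or with 2*K describes the same covariance among individuals; only the reported genetic variance changes by the factor C.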

Habier et al. (2007), Genetics 177: 2389–2397, show that if X is the
marker incidence matrix, then XX' times a scaling factor approximates
A. This is a somewhat different approach, which justifies using markers
directly rather than trying to estimate IBD.

Stich et al. 2008, Genetics 178: 1745–1754, investigate the effect of
re-scaling A and setting negative numbers to zero. Rescaling A alone
had little impact on model fit. Setting some terms to zero improved
model fit. The observation that using zero for estimates of
relationship for the more distant relatives can improve fit is only
empirical. I know of no theoretical work explaining why that is so.

Other useful references include:
GCTA (Yang, J et al. (2011) Am. J. Hum. Gen. 88:76-82). GCTA is
software from the Peter Visscher lab, which calculates a kinship
matrix among other things.

COANCESTRY: Wang, J. (2011) COANCESTRY: a program for simulating,
estimating and analysing relatedness and inbreeding coefficients.
Molecular Ecology Resources 11: 141–145

CoCoa: Steven Maenhout et al. (2009). CoCoa: a software tool for
estimating the coefficient of coancestry from multilocus genotype
data. Bioinformatics 25: 2753–2754.

I am providing no guidance on which is the best approach. My
experience though is that the impact of your choice on the marker
tests is likely to be small.

Another thing to pay attention to is whether the kinship matrix that
you calculate is positive semi-definite. For some software (e.g.
EMMA/R and GAPIT), it must be. TASSEL gets around the problem by only
considering values for the variances (genetic and error) for which the
likelihood function is defined.
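
A quick way to screen a matrix before passing it to such software is to attempt a Cholesky factorization. The sketch below is plain Python and not part of TASSEL, EMMA, or GAPIT. The function name and tolerance are illustrative. Note that it tests positive definiteness, which is slightly stricter than positive semi-definiteness, so a singular but valid matrix would be rejected.

```python
import math


def is_positive_definite(matrix, tol=1e-10):
    """Try a Cholesky factorization of a symmetric matrix.
    Returns False as soon as a pivot is not larger than `tol`."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - partial
                if pivot <= tol:
                    return False
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return True


# Matrix 3 from earlier in the thread.
matrix3 = [
    [2, 0, 0, 0, 0.1604, 0.422, 0],
    [0, 2, 0, 0, 0.2348, 0, 0.0296],
    [0, 0, 2, 1.155, 0, 0, 0],
    [0, 0, 1.155, 2, 0, 0, 0],
    [0.1604, 0.2348, 0, 0, 2, 0, 0.2374],
    [0.422, 0, 0, 0, 0, 2, 0],
    [0, 0.0296, 0, 0, 0.2374, 0, 2],
]
# A made-up symmetric matrix with a negative eigenvalue, for contrast.
not_psd = [[1.0, 2.0], [2.0, 1.0]]

matrix3_ok = is_positive_definite(matrix3)
not_psd_ok = is_positive_definite(not_psd)
```

Matrix 3 passes because each diagonal value exceeds the sum of the off-diagonal values in its row; the contrast matrix fails at the second pivot.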

Peter Bradbury