Highly variable results from BreedR and disagreement with other mixed model solvers!?

96 views
Skip to first unread message

Marnin Wolfe

unread,
Dec 19, 2017, 10:44:42 AM12/19/17
to breedR
Hi again,

I'm having some highly variable results from BreedR / AIREMLF90. I fit a model with two generic components, each with the same design matrix but with different covariance matrices. The two covariances correspond with the kinship matrix for different regions of the genome. I've been careful to ensure these two kinships have a low correlation so they aren't just redundant / collinear. There are also two random effects that are IID (a LOC-YEAR and REP effect). 

Previously, I fit this model with two different R packages (EMMREML and SOMMER). Since I have ~26K observations on ~1500 individuals, those models took days to fit. Both results were identical, giving me a total heritability of ~0.4, split about 0.1 vs. 0.3 among the two genome regions. 

I was really excited that BreedR can fit the model in one hour using a tiny fraction of the RAM used by the other packages I traditionally use. 

Unfortunately, BreedR gave me a total heritability of 1.0 with all of the variance on one of the two components. I proceeded to try a range of initial values, plus different AIREMLF90 convergence options (number of EM-REML steps, conv_crit, tolerance parameter). Results are crazy. Sometimes, the model supposedly converges and I get NA for -2logLik and AIC. But mostly I get h2=1 with all weight on one of the kernels. For another trait, I'm getting the same issue with highly variable logLik at convergence AND highly variable variance components / heritabilities at convergence in addition. All depending on the starting values, convergence criteria and the IID random effects included also. Some of the parameter values I've tried just seem to run forever (are super slow to converge). 

Why should I get such different results from these other packages. EMMREML uses an EMM-REML algorithm. SOMMER uses a Newton-Raphson algorithm. They both agree with each other. No messing around with starting parameters or convergence criteria required. But they take days and lots of RAM.

Any advice would be welcome.

Best regards,

Marnin

Facundo Muñoz

unread,
Dec 19, 2017, 3:24:36 PM12/19/17
to bre...@googlegroups.com

Wow, that looks really bad...

Have you tried method = "em"? AIREML is faster, but it can be very inestable when one or more of the variance components are very small (sensitivity to initial values, NAs, etc.). EM will be slower, but it will eventually converge (giving perhaps a somewhat inflated estimation of the negligible variance).

Also, try to fit each generic component individually, to see whether one of them is causing troubles.

I'm guessing there is numerical inestability somewhere. Look for very small variances and consider removing those effects. Try to re-scale values to work with reasonable values of variances.

If nothing works, I can give a look if you want. We can email privately.

ƒacu.-

--
Report issues at https://github.com/famuvie/breedR/issues
---
You received this message because you are subscribed to the Google Groups "breedR" group.
To unsubscribe from this group and stop receiving emails from it, send an email to breedr+un...@googlegroups.com.
To post to this group, send email to bre...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/breedr/c1473dfa-46e1-4581-8e91-14d88423a9ea%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Marnin Wolfe

unread,
Dec 26, 2017, 6:59:05 PM12/26/17
to breedR
Dear Facundo,

Since your suggestions, I spent a bit of time making more comparisons. In addition, a colleague of mine tried it too. What both of us have found (with totally different data) is that BreedR/BLUPF90 always seem to produce identical results to other R packages when fitting only a single genetic variance component (with a genomic relationship matrix). However, when fitting two kernels, the results become much more unreliable. 

My two genetic variance components have a correlation of only ~0.3. Below is a summary of variance components I got using BreedR vs. SOMMER. Note that the SOMMER model took ~155GB RAM and several days to fit, while BreedR took a few hours and a few hundred MB of RAM. 

BreedR with EM was actually faster than AI. You can see that AI gives crazy high variances (h2~1.0) and EM gets close to the SOMMER answer but just puts the same weight on both components.

I've done additional tests with other traits and varied the starting values with similar disagreements. 

Any ideas would be most welcome? I can provide more information if desired.

Best regards,

Marnin

BreedR

Single Kernel, genome not partitioned (AI algorithm)

TRIAL                                      0.168

REP                                         0.001

GENOTYPE                          1.1897

Residual                                 0.4566


Single Kernel, genome not partitioned (EM algorithm)

TRIAL                0.168400

REP                   0.001159

GENOTYPE       1.19

Residual             0.457


BreedR

Two Kernel, genome partitioned (AI algorithm)

TRIAL              1.7165e-01

REP                 1.1420e-03

GENOTYPE_region1     2.0449e+04

GENOTYPE_region2    1.5510e+09

Residual    4.5507e-01


Two Kernel, genome partitioned (EM algorithm)

TRIAL                0.167600

REP                  0.001152

GENOTYPE_region1                           1.199

GENOTYPE_region2                        1.199

Residual                      0.458600


sommer::mmer()

Two Kernel, genome partitioned (Newton-Raphson algorithm)

TRIAL          0.168424

REP             0.001112

GENOTYPE_region1                0.181813

GENOTYPE_region2             1.014361

Residual                             0.456645


Facundo Muñoz

unread,
Dec 28, 2017, 5:05:36 PM12/28/17
to bre...@googlegroups.com

Dear Martin,

Would you please repeat the tests with each of the following modifications?

1. Remove the REP effect (which is negligible, anyway)

2. Multiply the response by a factor of 10

3. Both 1 and 2


As you can see, I'm trying to confirm some of my earlier hypotheses of numerical inestabilities.

Thanks

        ƒacu.-

Reply all
Reply to author
Forward
0 new messages