variance estimates by levels of a fixed effect

lefevref...@gmail.com

unread,

Jun 10, 2022, 10:17:27 AM6/10/22

to breedR

Dear Facundo,

sorry for a basic question (just coming back to breedR after a long period not using it...):

what would be the syntax to obtain variance estimates by levels of a fixed effect?

for example, with the globulus data set:

remlf90(fixed = phe_X ~ 1+gg, random = ~ mum, data = globulus)

estimates one general variance term

how would you write to obtain variance estimates for each provenance gg?

(I tried something around random =~0+gg|mum but it does not work)

Thank you,

François

Facundo Muñoz

unread,

Jun 10, 2022, 10:45:11 AM6/10/22

to bre...@googlegroups.com

Cher François,

What you tried would certainly be the most sensible interface. But unfortunately is not implemented that way.

You can do it manually, though, by creating ancillary variables derived from mum, for each level of the fixed effect. You can check a worked out example in these slides [1] (starting from slide 9).

I hope it helps.

ƒacu.-

[1] https://famuvie.github.io/breedR/workshop_biofora/slides/5_GxE.html#9

--
Report issues at https://github.com/famuvie/breedR/issues
---
You received this message because you are subscribed to the Google Groups "breedR" group.
To unsubscribe from this group and stop receiving emails from it, send an email to breedr+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/breedr/9a258962-ec32-4e2b-91a1-62b365ecdde3n%40googlegroups.com.

lefevref...@gmail.com

unread,

Jun 10, 2022, 11:05:00 AM6/10/22

to breedR

Many thanks Facundo for such a quick response !!

(and sorry for technical error with the reply)

Cheers,

François

lefevref...@gmail.com

unread,

Jun 16, 2022, 11:33:05 AM6/16/22

to breedR

Hi again Facundo,

actually it seems more complex than we thought and we still need some help.

Here is what we tried to do, inspired from the example your gave for heterogeneous spatial effects (but wrongly inspired apparently) .

Objective: We have 4 populations and want to estimate 4 additive variances, one in each population.

For each population, we created a matrix (M1 to M4) with the same number of lines as the full_dataset, with 3 columns ("self", "sire", "dam") with the actual values in the appropriate lines and zeros in all lines corresponding to the other populations. We did so because we understood that all "substructure matrices" should have the same length as the full dataset.

For each matrix, we built an additive model (am1 to am4) :

am1 <- list(model = 'add_animal',
pedigree = M1,
id = 'self')

Then, we tried the following models :

1) with generic random effect :

fm <- remlf90( fixed = ht.2018 ~ pop4,
random = ~ bloc,
generic = list(pop1=am1, pop2=am2, pop=am3, pop4=am4),
data = full_dataset)

We get the following error message :

Erreur dans validate_generic_element(model = "add_animal", pedigree = list( :
arguments inutilisés (model = "add_animal", pedigree = list(c(1001, 0, 1003, 0, 0, 1006, 0, 1008, 0, 0, 1011, 0, 0, 1014, 0, 0, 0, 0, 0, 1020, 0, 1022, 0, 0, 0, 0, 0, 0, 1029, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1041, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1056, 1057, 1058, 0, 0, 0, 0, 0, 1064, 0, 0, 1067, 1068, 0, 0, 0, 1072, 0, 0, 0, 1076, 0, 0, 1079, 0, 0, 0, 0, 0, 0, 0, 0, 1088, 0, 0, 0, 0, 1093, 0, 0, 0, 0, 0, 0, 0, 1101, 1102, 0, 0, 0, 0, 0, 1108, 0, 0, 1111, 0, 0, 0, 0, 0, 1117, 0, 0, 0, 1121, 0, 0, 0, ...

2) we also tried with genetic random effect :

fm <- remlf90(fixed = ht.2018 ~ pop4,
random = ~ bloc,
genetic = list( pop1=am1, pop2=am2, pop=am3, pop4=am4 ),
data = full_dataset)

We get the following error message :

Erreur dans check_genetic(G0 = list(model = "add_animal", pedigree = list( :
Argument model required in the genetic component.

=> Question 1 :
Is one of these approaches correct? Which type of random effect should we use : generic or genetic?

=> Question 2 :
With this syntax, in each additive model, MX is not a pedigree but a data frame and we assumed that pedigree = MX calls the "build_pedigree" function.
Could there be a problem at this stage because of the zero values in the MX data frame?

We get the following error message when we try to build the pedigree with the build_pedigree function:

Erreur dans (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : les arguments impliquent des nombres de lignes différents : 251, 252
De plus : Message d'avis :
Dans map[pedx[order(ord), 1]] <- 1:nrow(pedx) le nombre d'objets à remplacer n'est pas multiple de la taille du remplacement

If this is the pb we did not find how to solve it...

Many thanks in advance for your help,

François Lefèvre & Alice Copie

lefevref...@gmail.com

unread,

Jun 17, 2022, 4:52:58 AM6/17/22

to breedR

sorry, in my post I simplified the names of the objects compared to the actual R code (i.e. the names in the post are not the actual names used in the R code), but doing so I introduced some confusion in the model (pop4 appears twice in the post, this is not the case in the actual Rcode) => do not care of these kinds of confusion in my post, this is not the source of the errors.

Facundo Muñoz

unread,

Jul 5, 2022, 5:08:56 AM7/5/22

to bre...@googlegroups.com

Dear François,

I can see the difficulty. Let's see if we can work it out.

First, help me understand your intended model. Then we can see whether it can be fit with breedR.

You have 4 populations with 4 different additive-genetic variances. Are they completely independent or rather do they have some ancestors in common ?

If they are completely independent, you would in principle fit separate models. But you may still want to analyse them jointly in order to estimate common effects (e.g. spatial) more accurately. This can be done with breedR using the generic model as in the spatial effect example.

On the other hand, if they have some non-observed ancestors in common, what do the variances represent? Why not estimate the genetic variance of the meta-population? or even estimate the population as a fixed effect?

Some fake data with the actual structure would help providing working code that you can adapt later.

ƒacu.-

To view this discussion on the web visit https://groups.google.com/d/msgid/breedr/1b2d7464-b835-41ce-9efd-95ea06e3cdd6n%40googlegroups.com.

lefevref...@gmail.com

unread,

Jul 6, 2022, 4:15:16 AM7/6/22

to breedR

Dear Facundo,

Thanks for helping !

We have progenies from 4 "populations". Three of them are successive cohorts in a forest resulting from an initial plantation 160years ago: we have the first generation (i.e. the founders G0), the second generation G1 and the third G2. These three populations not only have common (recent) ancestors but also the pollen pool is supposed common to all progenies. The 4th population (TG) is sampled in the initial area of distribution where the founders come from. Therefore G0, G1 and G2 might not have recent ancestors in common with TG. We want to estimate genetic variances within each of these 4 populations (we are not seeking for estimates of between population variances).

All progenies from the 4 populations are compared in a common garden experiment, in a block design (progenies of the 4 populations in each block).

We also set a fixed effect "population" in the model.

Kind regards,

François

PS: the dataset is structured as follows:

self sire dam pop variable block

Facundo Muñoz

unread,

Jul 6, 2022, 6:30:28 AM7/6/22

to bre...@googlegroups.com

Ok, so to summarise, these populations are indeed related, but you don't know how exactly (e.g. there are no sire or dam codes that are ancestors of evaluated individuals from different populations) and you don't want to assume equal genetic variance (perhaps G1 and G2 have undergone selection, or whatever). So you want to consider them independently. Still they are evaluated jointly in the same trial, so that the block effects are common.

This situation is analogous to the example of multi-site trial. So, you can use the same tricks. But there were a few mistakes in your earlier code.

1. You are going to use the `generic` model.

2. You need to provide a list of incidence and covariance matrices (instead of 'add_animal' model specifications as you did before).

3. You can use the internal function `breedR:::additive_genetic_animal()` to build the incidence and covariance matrices for you.

Find below a little example using fake data with the structure you have. I also attach the script for convenience.

Hope it helps

ƒacu.-

library(tidyverse)
library(breedR)
#> Loading required package: sp

set.seed(20220706)
N <- 100L
dat <- tibble(
  self = seq.int(N),
  sire = sample(N + 1:10L, size = N, replace = TRUE),
  dam  = sample(2 * N + 1:10L, size = N, replace = TRUE),
  population = factor(sample(1:4L, size = N, replace = TRUE)),
  variable = rnorm(N),
  block = factor(sample(1:20L, size = N, replace = TRUE))
)

## We want to do the following, for each population.
## I do it here for the first population

## Data for the first population
dat1 <- filter(dat, population == 1L)

## Build the pedigree
ped1 <- build_pedigree(1:3, data = filter(dat, population == 1))
#> Warning in build_pedigree(1:3, data = filter(dat, population == 1)): The
#> pedigree has been recoded. Check attr(ped, 'map').

## Build the additive-genetic effect
add1 <- breedR:::additive_genetic_animal(ped1, dat1$self)

## From which we can extract the incidence and covariance matrices easily
incmat1 <- add1$incidence.matrix
covmat1 <- add1$structure.matrix

## Finally, we need to embed these partial incidence matrices in a larger
## matrix with 0 on the rows corresponding to observations from other
## populations. We can make a custom function for that.
## Let x be a partial incidence matrix, and idx a binary (0/1) or logical vector
## indicating the relevant rows in the full data set.
augment_incmat <- function(x, idx) {
  ans <- Matrix::Matrix(0, nrow = length(idx), ncol = ncol(x))
  ans[as.logical(idx), ] <- x
  return(ans)
}
# augment_incmat(incmat1, dat$population == 1)

## Now do the same for all 4 populations at oncek
pop_dat <- dat |> 
  group_by(population) |> 
  nest() |> 
  arrange(population) |> 
  mutate(
    ped = map(
      data,
      ~suppressWarnings(  ## The pedigree will be recoded, ok.
        build_pedigree(x = 1:3, data = .)
      )
    ),
    add_gen = map2(
      data, ped,
      ~breedR:::additive_genetic_animal(.y, idx = .x$self)
    ),
    partial_incmat = map(add_gen, ~.$incidence.matrix),
    incmat = map2(
      data, partial_incmat,
      ~augment_incmat(.y, dat$population == population)
    ),
    covmat = map(add_gen, ~.$structure.matrix)
  )

# ## Check
# identical(dat1 |> select(-population), pop_dat$data[[1]])
# identical(ped1, pop_dat$ped[[1]])
# identical(add1, pop_dat$add_gen[[1]])
# identical(incmat1, pop_dat$partial_incmat[[1]])
# identical(augment_incmat(incmat1, dat$population == 1), pop_dat$incmat[[1]])
# identical(covmat1, pop_dat$covmat[[1]])


## We can now use these incidence and covariance matrices to feed a
## generic model. Remember to use "EM".
res <- remlf90(
  fixed  = variable ~ population,
  random = ~ block,
  generic = list(gp1 = list(pop_dat$incmat[[1]], pop_dat$covmat[[1]]),
                 gp2 = list(pop_dat$incmat[[2]], pop_dat$covmat[[2]]),
                 gp3 = list(pop_dat$incmat[[3]], pop_dat$covmat[[3]]),
                 gp4 = list(pop_dat$incmat[[4]], pop_dat$covmat[[4]])),
  data = dat,
  method = 'em'
)
#> Using default initial variances given by default_initial_variance()
#> See ?breedR.getOption.

summary(res)
#> Formula: variable ~ 0 + population + block 
#>    Data: dat 
#>  AIC     BIC logLik
#>  306 unknown   -147
#> 
#> Parameters of special components:
#> 
#> 
#> Variance components:
#>          Estimated variances
#> block               0.000604
#> gp1                 0.732900
#> gp2                 0.005950
#> gp3                 0.636000
#> gp4                 0.580000
#> Residual            0.730600
#> 
#> Fixed effects:
#>                  value   s.e.
#> population.1 -0.195512 0.3021
#> population.2 -0.076930 0.1552
#> population.3 -0.166961 0.2827
#> population.4  0.019729 0.2925

^{Created on 2022-07-06 by the reprex package
(v2.0.1)}

To view this discussion on the web visit https://groups.google.com/d/msgid/breedr/02fed69e-1b07-480f-881d-de38e67168afn%40googlegroups.com.

example.R

lefevref...@gmail.com

unread,

Sep 12, 2022, 7:53:15 AM9/12/22

to breedR

Dear Facundo,

We are sorry for the delayed answer, we got a little caught up in our analysis. The script you sent works perfectly and we were able to achieve what we wanted.

Thank you for your useful help!

Kind regards,

François Lefèvre & Alice Copie

Facundo Muñoz

unread,

Sep 12, 2022, 7:59:41 AM9/12/22

to bre...@googlegroups.com

Great! thanks for the feedback.

ƒacu.-

To view this discussion on the web visit https://groups.google.com/d/msgid/breedr/f0b11a51-6b9e-4d56-85b4-b5dc19d652cen%40googlegroups.com.

Reply all

Reply to author

Forward