Causes of genetic map expansion?

676 views
Skip to first unread message

Ryan McCormick

unread,
Feb 19, 2014, 7:10:49 PM2/19/14
to rqtl...@googlegroups.com
Hi all,

We have a ~8k markers across 400 individuals (plants), ordered by physical position, that we're trying to estimate the genetic map for prior to QTL mapping. The cross is an F4 resulting from repeated selfing, and we're using the BCsFt tools, treating the cross as an BC0F4. Typically, the maps we get at lower marker densities and fewer individuals (e.g. ~1k and ~90 individuals) for F2s and F4s (and others) are generally around the expected 1250 cM for the whole genome. 

For the 8k marker map, the resulting map is greatly expanded when we treat it as an F4, around 4800 cM. The map size is reduced back to around 1250 cM when we treat it as a RIL. The large F4 map is otherwise well behaved in terms of linkage, there aren't any single markers or groups of markers that seem responsible for the expansion. The larger map also seems to map known QTL just fine.

We thought it could be related to genotyping error rates, although we still obtained the large maps using a 3k subset of the highest confidence markers with the highest confidence genotypes. Is it possible that with many individuals and many markers that errors can accumulate and expand the map (maybe that's why setting the heterozygotes to missing when treating as a RIL caused the map size to return to normal)?

Does anyone have any suggestions as to what might cause map expansion? Is it theoretically known that genetic map size is determined by the number of markers and/or the number of individuals? Is this genetic map expansion expected?

Thank you,
Ryan McCormick

Kevin Emerson

unread,
Feb 20, 2014, 11:24:58 AM2/20/14
to rqtl-disc
Ryan,

I have had a similar issue arise when I was using RADseq genotype data (I assume you are using RADseq data as well since I see you on the STACKS google group too! :))

We noticed when we were mapping in Wyeomyia that we had many genotype calls that were suspect.  These were calls where you would have a string of P1 alleles, then a single P2 allele and then another long string of P1 alleles within a single individual along a chromosome (we saw them pretty easily using plot.geno).  If you make the assumption that a double recombination event surrounding only a single marker should be a very rare event, and remove these from the data (the genotype calls, not the loci), things are a bit more as they are expected to be.  We ended up assuming that these single-marker double recombination events represented genotype errors from STACKS rather than an absurdly large number of very-close double recombination events and removed them, leading to a map that made quite a bit more sense.

The code that I used to remove such genotype calls is below if you want to give it a shot (run removeDoubleXO).  You can at least try it and see how many suspect calls there are:

Kevin Emerson


#####

# markersInInterval(cross, chr, min, max)

# returns a list of marker names that fall within a range of locations

# (in cM) along a linkage group (chr). Requires a single linkage group

# output is a character vector

#####


markersInInterval <- function(cross, chr, min, max) {

 names(which(pull.map(cross=cross, chr=chr)[[chr]] < max &

             pull.map(cross=cross, chr=chr)[[chr]] > min))

}


#####

# singleMarkerInInterval(cross, chr, min, max)

# returns the marker name if only a single marker falls in a given

# interval (including the possible case of NAs surrounding marker

# on a linkage group, else returns value FALSE

#####


singleMarkerInInterval <- function(cross, chr, min, max) {

  tmp <- markersInInterval(cross,chr,min,max)  

  val <- ifelse(sum(!is.na(tmp) == 1), tmp[!is.na(tmp)], FALSE)

}

#####

# removeDoubleXO(cross,chr,verbose)

# Takes a cross object, returns a cross object with all double-recombinant

# genotypes removed.  This is useful to determine if the genotyping errors

# from stacks influence the genetic maps.  if verbose = 1 (TRUE), minimal

# reporting happens, if verbose > 1, all changed genotypes are written to

# STDOUT

#####


removeDoubleXO <- function(cross, chr, verbose=TRUE) { 

  if (!missing(chr)) 

      chr <- matchchr(chr, names(cross$geno))

  else chr <- names(cross$geno)

  for (ch in chr) {  # loop through all linkage groups

    if (verbose)

      cat("Starting linkage group",ch,"\n")


    # find all recombination events in cross. xo is a list spanning

    # individuals.  each element of xo is either NULL or a numeric

    # vector with estimated crossover locations

    

    xo <- locateXO(cross, ch)


    # initialize some variables to keep track of the number

    # of changes

    

    total_removed <- 0

    tot_genotypes <- length(cross$geno[[ch]]$data[,]) -

                          sum(is.na(cross$geno[[ch]]$data[,]))


    for (ind in 1:length(xo)) {            # loop through individuals

      if (length(xo[[ind]]) <= 1

        next  # skip individuals with one or fewer recombination events

      # walk along the linkage groups recombination events

      for (location in 2:length(xo[[ind]])) {


        # Determine if there is a single marker between each recombination

        # event and the next (location -1 thru location)

        

        sMar <- singleMarkerInInterval(cross,ch,xo[[ind]][location-1],xo[[ind]][location])

        if (sMar!=FALSE) { # if there are double recombination events

          oldValue <- cross$geno[[ch]]$data[ind,sMar] # original genotype call

          cross$geno[[ch]]$data[ind,sMar] <- NA # assign genotype to NA

          total_removed <- total_removed + 1 # count removed genotypes

          if (verbose>1)

            cat("individual", ind, "marker", sMar, "oldvalue=", oldValue,

                "newvalue = ", cross$geno[[ch]]$data[ind,sMar], "\n")

        }

      }

    }

    if (verbose) {

      cat("Removed",total_removed,"of",tot_genotypes,

          "genotyped markers on linkage group",ch,"\n")

    }

  }

  cross

}



--
You received this message because you are subscribed to the Google Groups "R/qtl discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rqtl-disc+...@googlegroups.com.
To post to this group, send email to rqtl...@googlegroups.com.
Visit this group at http://groups.google.com/group/rqtl-disc.
For more options, visit https://groups.google.com/groups/opt_out.

Ryan McCormick

unread,
Feb 20, 2014, 12:12:19 PM2/20/14
to rqtl...@googlegroups.com
Hi Kevin,

Thanks for your reply and for sharing your code! It is indeed RAD-seq data (although we recently transitioned over to using the GATK to call genotypes).

So does this suggest that the map expansion is due mostly to the accumulation of these errors (which look like close double crossovers), and that, in datasets with larger numbers of markers and individuals, there exist a relatively larger number of these errors which causes more noticeable map expansion?

Does that also suggest that there shouldn't be a strong relationship between the number of markers and individuals and the genetic map size (e.g. an F2 map of 1,000 markers and 100 individuals vs an F2 map of 10,000 markers and 500 individuals)?

Thanks again,
Ryan McCormick

Karl Broman

unread,
Feb 21, 2014, 7:22:30 AM2/21/14
to rqtl...@googlegroups.com
R/qtl also has a function cleanGeno() that can be used for to remove genotypes leading to tight double-crossovers.

I suspect, though, that there may be a problem with the BCsFt code; it may give biased estimates of the genetic map even in the absence of genotyping errors. I'll study the issue.

Map length should be independent of number of markers/individuals.

karl

Kevin Emerson

unread,
Feb 21, 2014, 8:10:14 AM2/21/14
to rqtl-disc
Ryan

Whether there are some issues with the  BCsFt code, it is true that genotyping errors will cause greater expansion in maps with larger numbers of markers.  Each genotyping error will count as two 'false' recombination events, increasing the number of recombinants linearly with the number of genotype errors.

As far as I see it, the genetic map size should only be dependent on the rate of recombination, the actual genome size, and the number of generations allowed for recombination, and not the number of markers considered.

Best,
Kevin

Ryan McCormick

unread,
Feb 21, 2014, 8:45:25 AM2/21/14
to rqtl...@googlegroups.com
Karl and Kevin,

Thank you for the responses! I think this has given me some traction on the problem.

Sincerely,
Ryan McCormick

Karl Broman

unread,
Feb 21, 2014, 9:37:36 AM2/21/14
to rqtl...@googlegroups.com
I agree, though I prefer to think of a genetic map as concerning properties of meiotic recombination and so the map shouldn't be dependent on the number of generations in the cross; that should be something we take into account in the analysis.

karl

Jim Specht

unread,
Feb 21, 2014, 9:58:57 AM2/21/14
to rqtl...@googlegroups.com
Colleagues: I have seen this problem pop up repeatedly when genotyping-by-sequencing is performed to generate "parental alleles for markers in RIL or F2 populations". The GBS output seems to always have a small percentage of these single-base call errors in long strings of parental bases. I have found it necessary to identify and remove these base call errors because their presence greatly expand the map distance in chromosomes wherein they occur, sometimes to the extent that portions of the chromosome cannot even be joined. Jim

K.JK

unread,
Apr 8, 2014, 5:49:33 PM4/8/14
to rqtl...@googlegroups.com
All:

Are there any insights on how this type of data (RAD-seq, GBS, etc.) affects QTL detection? I seem to have ~3x more double crossovers than I would expect, but removing some of these bad markers seems to have little effect on the the location or LOD score.

Also, the cleanGeno() function only work for cross type 'bc'. Is it possible to run this function for 'riself'?

Karl Broman

unread,
Apr 9, 2014, 10:40:48 AM4/9/14
to rqtl...@googlegroups.com
> Are there any insights on how this type of data (RAD-seq, GBS, etc.) affects QTL detection? I seem to have ~3x more double crossovers than I would expect, but removing some of these bad markers seems to have little effect on the the location or LOD score.

If you use some error.prob>0 in calc.genoprob, R/qtl should smooth over errors.

> Also, the cleanGeno() function only work for cross type 'bc'. Is it possible to run this function for 'riself'?

I've revised it to work with any cross with two possible genotypes (i.e., not just 'bc' but also riself, risib, dh, and haploid).

If you have the tools to compile the package, you may be able to install the development version with:

install.packages("devtools")
library(devtools)
install_github("qtl", "kbroman", "devel")

I also placed just the cleanGeno function at the following:

https://gist.github.com/kbroman/10277099

karl

Karl Broman

unread,
Apr 9, 2014, 11:14:07 AM4/9/14
to rqtl...@googlegroups.com
I've also placed Windows and Mac binaries of the development version of the package, with the change to cleanGeno, at http://www.rqtl.org/download/devel

karl

K.JK

unread,
Apr 10, 2014, 9:22:51 PM4/10/14
to rqtl...@googlegroups.com
Thanks Karl. I appreciate that.
Reply all
Reply to author
Forward
0 new messages