Breaking up large data sets in linkage map creation?

583 views
Skip to first unread message

Elisa Zhang

unread,
Jun 6, 2012, 2:43:37 PM6/6/12
to R/qtl discussion
Hi all,

I have a very large F2 cross dataset I am using to build a linkage
map. Following "Genetic map construction with R/QTL", I've gone
through all the recommended cleaning steps and have a final cross
object of ~9000 markers and 51 individuals. formLinkageGroups() put
the markers into about 20 different linkage groups (and a number of
singletons), which is what we expect from this species.

However, I've found that ordering the markers on each linkage group is
very time-consuming. I am fortunate to have access to a large computer
cluster to run these commands, but ordering the markers on a linkage
group which had ~300 markers took 2.5 hours.

Considering I have 17 more linkage groups which I need to order the
markers of, including 1 linkage group with ~1000 markers on it, I'm
wondering if it's possible to split up my current cross object into 17
different cross objects, each containing 1 linkage group. Then, I
could submit parallel jobs to our cluster to order markers on
different linkage groups at the same time, and once each linkage group
is done ordering, I would hope to be able to put the linkage groups
back together again into one cross object.

From what I've read so far about the internal data structure of R/QTL
in Karl's book, I want to venture a guess that this may be feasible,
although I could be entirely wrong. I mainly don't know what commands
I would use to put the linkage groups back together again...I may have
to try and code something in R myself for that I'm guessing.

Any thoughts or insights on this would be appreciated (especially from
Karl :)

Thanks!

Best,

Elisa

Karl Broman

unread,
Jun 7, 2012, 4:10:29 AM6/7/12
to rqtl...@googlegroups.com
You can work with each chromosome separately and then use switch.order() to re-order the markers in the original cross object to match those you've discovered.

Here's a bit of code, as an example. First, I load the hyper data and re-order markers on chromosome 4. We'll pretend that hyper.new has the marker order we want

data(hyper)
hyper.new <- switch.order(hyper, chr=4, c(3,1,2,5,4,6:nmar(hyper)[4]))

Now I want to get the markers in hyper to match the new order in hyper.new. I use match and then switch.order

m <- match(markernames(hyper.new, chr=4), markernames(hyper, chr=4))
hyper.reordered <- switch.order(hyper, 4, m)

I would also suggest using findDupMarkers() to identify markers with matching genotypes, and then drop.markers to omit them before you do the map construction. Markers with the same data will end up in the same spot anyway, and their presence will just slow everything down.

dupmar <- findDupMarkers(hyper)
hyper.nodup <- drop.markers(hyper, unlist(dupmar))

karl
> --
> You received this message because you are subscribed to the Google Groups "R/qtl discussion" group.
> To post to this group, send email to rqtl...@googlegroups.com.
> To unsubscribe from this group, send email to rqtl-disc+...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/rqtl-disc?hl=en.
>

Elisa Zhang

unread,
Jun 18, 2012, 9:06:34 PM6/18/12
to rqtl...@googlegroups.com
Hi Karl,

Running this strategy on my large data set has been working pretty well for me--thanks so much.

I'm running into another problem however. I'm not sure if it has to do with the large numbers of markers in my linkage groups,  but I have been getting maximum intermarker distances > 0.5 even though I set max.rf=0.125. 

Below are some of the commands I ran which illustrates this:

 
aa_clean_lg25<-formLinkageGroups(aa_clean_est, max.rf=0.125, min.lod=7, reorgMarkers=TRUE)

#ordering each chromosome separately as you suggested above:
aa_clean_lg25_14<-subset(aa_clean_lg25, chr=14) 
aa_clean_lg25_14<-orderMarkers(aa_clean_lg25_14, chr=14, error.prob=0.005)

 #chromosome 14:
> summary.map(aa_clean_lg25_14)
        n.mar length ave.spacing max.spacing
14        374  465.3         1.2        99.2
overall   374  465.3         1.2        99.2

#chromosome 15:
            > summary.map(aa_clean_lg25_15)
        n.mar length ave.spacing max.spacing
15        355  302.9         0.9        81.7
overall   355  302.9         0.9        81.7
 

It doesn't happen on all my linkage groups. 

Do you have any thoughts on why this may be happening? max.rf=0.125 should correspond to a maximum intermarker distance of 25cM, if I'm using the Haldane function (the default) in orderMarkers(), right?

I can send you my dataset through a private email if you want. I have attached some plots associated with the cross object which has had its linkage groups formed (aa_clean_lg25).

Thank you so much for your help!

Elisa
aa_clean_lg25lod_rf.jpg
aa_clean_lg25rf_heatmap.jpg
aa_clean_lg25summary.txt

Karl Broman

unread,
Jun 19, 2012, 11:08:59 AM6/19/12
to rqtl...@googlegroups.com
I would interpret these results as indicating some additional problems in marker order that need to be investigated. Genetic map construction cannot be fully automated. Look at the estimated map of the chromosome as well as the inter-marker recombination fractions to decide whether markers need to be moved around or just removed from the linkage group. I also will look at a lot of two-locus genotype tables, as provided by geno.crosstab().

karl
> --
> You received this message because you are subscribed to the Google Groups "R/qtl discussion" group.
> To view this discussion on the web visit https://groups.google.com/d/msg/rqtl-disc/-/0ZMqiRS50goJ.
> To post to this group, send email to rqtl...@googlegroups.com.
> To unsubscribe from this group, send email to rqtl-disc+...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/rqtl-disc?hl=en.
> <aa_clean_lg25summary.txt><aa_clean_lg25lod_rf.jpg><aa_clean_lg25rf_heatmap.jpg>

Elisa Zhang

unread,
Jun 19, 2012, 5:07:08 PM6/19/12
to rqtl...@googlegroups.com
Hi Karl,

Thanks so much for the advice. 

Once I find any problematic markers, do you recommend A) removing them from the current cross object which has undergone est.rf(), formLinkageGroups(), and orderMarkers(), or B) removing them from the original spreadsheet and starting the map-creation process over from scratch, (i.e. beginning with read.cross())?

I just ask this because I have tried option A before with a smaller data set but the large intermarker distances still remained, although in that case my max.rf may have been 0.35.

Thank you!

Elisa
> To unsubscribe from this group, send email to rqtl-disc+unsubscribe@googlegroups.com.

Karl Broman

unread,
Jun 20, 2012, 9:34:56 AM6/20/12
to rqtl...@googlegroups.com

I wouldn't start over from scratch, but would remove them from the current cross object or move them to a "linkage group" of problematic markers, like "un":

mycross <- movemarker(mycross, badmarker, "un")

karl
> > To unsubscribe from this group, send email to rqtl-disc+...@googlegroups.com.
> > For more options, visit this group at http://groups.google.com/group/rqtl-disc?hl=en.
> > <aa_clean_lg25summary.txt><aa_clean_lg25lod_rf.jpg><aa_clean_lg25rf_heatmap.jpg>
>
>
> --
> You received this message because you are subscribed to the Google Groups "R/qtl discussion" group.
> To view this discussion on the web visit https://groups.google.com/d/msg/rqtl-disc/-/pxPJWYnaE64J.
> To post to this group, send email to rqtl...@googlegroups.com.
> To unsubscribe from this group, send email to rqtl-disc+...@googlegroups.com.

Elisa Zhang

unread,
Jun 21, 2012, 7:39:08 PM6/21/12
to rqtl...@googlegroups.com
Thanks Karl, and after removing the bad markers, would you re-run est.rf() and orderMarkers()?
> > To unsubscribe from this group, send email to rqtl-disc+unsubscribe@googlegroups.com.
> > For more options, visit this group at http://groups.google.com/group/rqtl-disc?hl=en.
> > <aa_clean_lg25summary.txt><aa_clean_lg25lod_rf.jpg><aa_clean_lg25rf_heatmap.jpg>
>
>
> --
> You received this message because you are subscribed to the Google Groups "R/qtl discussion" group.
> To view this discussion on the web visit https://groups.google.com/d/msg/rqtl-disc/-/pxPJWYnaE64J.
> To post to this group, send email to rqtl...@googlegroups.com.
> To unsubscribe from this group, send email to rqtl-disc+unsubscribe@googlegroups.com.

Karl Broman

unread,
Jun 21, 2012, 10:24:59 PM6/21/12
to rqtl...@googlegroups.com
I wouldn't start over with orderMarkers, but I'd definitely look at the results of est.rf and would also do a bunch with ripple. 

karl
To view this discussion on the web visit https://groups.google.com/d/msg/rqtl-disc/-/4y1ZoIh_CpgJ.

To post to this group, send email to rqtl...@googlegroups.com.
To unsubscribe from this group, send email to rqtl-disc+...@googlegroups.com.

Elisa Zhang

unread,
Jun 27, 2012, 1:18:32 PM6/27/12
to rqtl...@googlegroups.com, dario riccardo valenzano
Hi Karl,

Thanks for your advice.

On a different note, I am wondering if there is an upper limit to the number of markers on a linkage group that orderMarkers() can order. 

So far, I've been successfully able to order linkage groups with up to 637 markers, using the method you described above. However, I just tried submitting a job on our computer cluster to order the biggest linkage group, one with 1168 markers, but after 13 hours of run-time, I got this error message:

Error in rbind(orders, cbind(left, temp + i - 1, right)) : 
  allocMatrix: too many elements specified
Calls: orderMarkers -> orderMarkers.sub -> summary -> ripple -> rbind
Execution halted

Is there anything you would recommend doing for this linkage group so that I can successfully run orderMarkers()?

Thanks so much--your help is really appreciated.

Elisa




On Wednesday, June 6, 2012 11:43:37 AM UTC-7, Elisa Zhang wrote:

Karl Broman

unread,
Jun 27, 2012, 4:02:34 PM6/27/12
to rqtl...@googlegroups.com
orderMarkers() can take an arbitrarily large number of markers, but you would want to reduce the window argument from 7 to something much smaller (like 2), or set use.ripple=FALSE.

karl
> --
> You received this message because you are subscribed to the Google Groups "R/qtl discussion" group.
> To view this discussion on the web visit https://groups.google.com/d/msg/rqtl-disc/-/t143foXobg8J.

Elisa Zhang

unread,
Nov 12, 2012, 6:35:43 PM11/12/12
to rqtl...@googlegroups.com
Hi Karl,

Is there any way to keep the window argument at 7 and run this command without getting this error message?

Thanks!

Elisa

Karl Broman

unread,
Nov 12, 2012, 11:06:04 PM11/12/12
to rqtl-disc@googlegroups.com discussion
I think the only solution is to work on a computer with more RAM.

karl
> To view this discussion on the web visit https://groups.google.com/d/msg/rqtl-disc/-/WaURJT0CxFgJ.

Elisa Zhang

unread,
Nov 14, 2012, 8:52:57 PM11/14/12
to rqtl...@googlegroups.com, rqtl-disc@googlegroups.com discussion
Hi Karl,

I ran this job on a computer cluster and tried allocating more RAM to it, but it always crashes out after using the same amount of RAM (48Gb), even though the cluster has successfully used greater amounts of RAM in running orderMarkers() .

Interestingly, I've ran orderMarkers() on a different linkage group/data set that had also had >700 markers and it also crashed out at 48Gb with the error.

Elisa

Karl Broman

unread,
Nov 14, 2012, 9:03:05 PM11/14/12
to rqtl...@googlegroups.com
With that many markers in a linkage group, using window=7 just won't work, with the way the software is written. 

karl
To view this discussion on the web visit https://groups.google.com/d/msg/rqtl-disc/-/0M9T5p3pzOoJ.
Reply all
Reply to author
Forward
0 new messages