The BLINK user manual v1.01 (page 14, section 7.4, Optimization) states that bin_size and bin_selection can be optimized. Could you please explain how these values are derived and used?
My understanding is that the bin_size array gives the number of bp in the whole genome, based on cumulative positions across all chromosomes, followed by the bin lengths. In the example "3 50 5 0.5", this would mean a whole genome of 3e+6 bp, with bin lengths of 50e+6, 5e+6 and 0.5e+6 bp. Is this correct? If so, how can a bin be longer than the genome?
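To make this question concrete, here is a minimal sketch of one possible reading. It assumes (hypothetically; this is not taken from the BLINK manual or source code) that each SNP is assigned to a bin by integer division of its position on its chromosome by the bin length. Under that assumption, a bin longer than a chromosome simply collapses to a single bin for that chromosome. The function `assign_bins` and the toy positions are made up for illustration.

```python
def assign_bins(snps, bin_size_bp):
    """Hypothetical binning: map each (chromosome, position) pair to a bin ID.

    Assumes fixed-length windows starting at position 0 on each chromosome.
    This is an illustration only, not BLINK's actual code.
    """
    return [(chrom, pos // bin_size_bp) for chrom, pos in snps]


# Toy SNP positions as (chromosome, bp); made-up values for illustration.
snps = [(1, 120_000), (1, 480_000), (1, 2_600_000), (2, 90_000), (2, 1_700_000)]

# Bin lengths read from "50 5 0.5" as megabases, as in the question.
for bin_size_mb in (50, 5, 0.5):
    bins = assign_bins(snps, int(bin_size_mb * 1_000_000))
    print(f"{bin_size_mb} Mb bins -> {len(set(bins))} occupied bins")
# 50 Mb bins -> 2 occupied bins
# 5 Mb bins -> 2 occupied bins
# 0.5 Mb bins -> 4 occupied bins
```

With these toy chromosomes (both under 3 Mb), the 50 Mb and 5 Mb bins each cover a whole chromosome, so only the 0.5 Mb bins actually split the data.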
Similarly, I read the bin_selection example "3 10 20 30" as 3e+6 bp for the whole genome (as above), with 10, 20 and 30 QTNs (SNPs) selected from each bin. Is this correct?
If LD (Pearson correlation?) is used to remove QTNs from bins, shouldn't each bin be a portion of a chromosome, perhaps no longer than the LD decay distance (for example, 300-1000 kb)? Is this the idea behind bin size optimization?
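A minimal sketch of the LD-based removal the question describes, assuming (as the question guesses, not as confirmed by the BLINK manual) that LD is the Pearson correlation of genotype codes (0/1/2), and that within one bin SNPs are visited from most to least significant, keeping a SNP only if its |r| with every SNP already kept is at or below a threshold. The use of |r| rather than r², the function names and the toy genotypes are all hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two genotype vectors coded 0/1/2.

    Assumes neither vector is constant (zero variance would divide by zero).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def prune_bin(snps, threshold=0.7):
    """Hypothetical within-bin LD pruning.

    snps: list of (snp_id, p_value, genotypes) for the SNPs in one bin.
    Returns the IDs of SNPs kept, most significant first.
    """
    kept = []
    for snp_id, _p, geno in sorted(snps, key=lambda s: s[1]):
        # Keep this SNP only if it is not in high LD with any SNP already kept.
        if all(abs(pearson_r(geno, g)) <= threshold for _, g in kept):
            kept.append((snp_id, geno))
    return [snp_id for snp_id, _ in kept]


# Toy bin: snp_b is almost a copy of snp_a (high LD); snp_c is less related.
bin_snps = [
    ("snp_a", 1e-8, [0, 1, 2, 2, 1, 0, 2, 1]),
    ("snp_b", 1e-6, [0, 1, 2, 2, 1, 0, 2, 2]),
    ("snp_c", 1e-5, [2, 0, 1, 0, 2, 1, 0, 2]),
]
print(prune_bin(bin_snps))  # ['snp_a', 'snp_c']
```

Here r(snp_a, snp_b) is about 0.92, so snp_b is dropped, while |r(snp_a, snp_c)| is about 0.55, so snp_c is retained.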
I have also read the GigaScience paper, which states that BLINK optimizes only the number of QTNs.
I also understand that the LD threshold can be adjusted, with a default of 0.7 (LD=0.7). Is this correct? If it is relaxed to, say, LD=0.8, more QTNs would be retained and tested in each bin. This makes a large difference to the output if the bins are not also optimized.
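The effect of relaxing the threshold can be illustrated with a small sketch, assuming (hypothetically) that a candidate is removed when its |r| with an already-kept, more significant candidate exceeds the threshold. The |r| values below are made up, and `greedy_keep` is a hypothetical helper, not a BLINK function.

```python
def greedy_keep(ranked_ids, abs_r, threshold):
    """Hypothetical LD filter on candidates already sorted by significance.

    ranked_ids: candidate IDs, most significant first.
    abs_r: dict mapping frozenset({id1, id2}) -> |r| between the two SNPs.
    A candidate is kept only if its |r| with every kept candidate is
    at or below `threshold`.
    """
    kept = []
    for snp in ranked_ids:
        if all(abs_r[frozenset((snp, k))] <= threshold for k in kept):
            kept.append(snp)
    return kept


# Made-up |r| values among four candidate QTNs, for illustration only.
abs_r = {
    frozenset(("q1", "q2")): 0.75,
    frozenset(("q1", "q3")): 0.30,
    frozenset(("q1", "q4")): 0.85,
    frozenset(("q2", "q3")): 0.20,
    frozenset(("q2", "q4")): 0.40,
    frozenset(("q3", "q4")): 0.10,
}
ranked = ["q1", "q2", "q3", "q4"]

for t in (0.7, 0.8):
    print(t, greedy_keep(ranked, abs_r, t))
# 0.7 ['q1', 'q3']
# 0.8 ['q1', 'q2', 'q3']
```

Under this rule, raising the threshold from 0.7 to 0.8 lets q2 (|r| = 0.75 with q1) survive, so more candidates are carried forward, consistent with the expectation stated above.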
It seems to me that optimizing bin size is critical, but my understanding is either completely wrong or not clear enough to use this properly. I would appreciate any advice or explanation. Many thanks.
Dr Garth M. Sanewski
Principal Horticulturist,
Horticulture & Forestry Science
Department of Agriculture and Fisheries
----------------------------------------------------------------------------------------------------------
T 07 53811333 Fax 54535901
E garth.s...@daf.qld.gov.au W www.daf.qld.gov.au
47 Mayers Rd,
Nambour QLD 4560
SCMC 5083, Nambour QLD 4560