Hello,
I've encountered 2 bugs while using --R on the current development build (31 Mar.).
First, in `--R debug` there's an error when writing the values for "cluster": the statement that recodes missing values gets inserted before the last element of the vector of values. E.g.:
COVAR<-NA
CLUSTER <- c( 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, CLUSTER[CLUSTER==-1] <- NA
0 )
l <- 100
As best I can tell, this happens at lines 226-231 of plink_rserve.c, where "CLUSTER[CLUSTER==-1] <- NA" gets written directly to the file by fputs() before the last buffer has actually been written by fwrite_checked(). I think replacing the fputs() on line 228 with:
bufptr = memcpya(bufptr, "CLUSTER[CLUSTER==-1] <- NA\n", 27);
would fix this, but I haven't recompiled to test.
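To illustrate the ordering problem, here is a minimal standalone sketch. It is a simplified model of the behavior described above, not the actual plink_rserve.c code: `append_str()` is a hypothetical stand-in for memcpya(), plain fwrite() stands in for fwrite_checked(), and the vector is shortened to three elements.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for memcpya(): copy a string into the buffer
   and return a pointer just past the copied bytes. */
static char* append_str(char* dst, const char* src) {
  size_t len = strlen(src);
  memcpy(dst, src, len);
  return dst + len;
}

/* Write a 3-element CLUSTER vector to `out`.
   use_fputs != 0: the recode line goes straight to the stream while the
   tail of the vector is still pending in buf (the reported ordering).
   use_fputs == 0: the recode line is appended to buf, so it is written
   after the tail of the vector. */
static void write_cluster(FILE* out, int use_fputs) {
  char buf[256];
  char* bufptr = buf;
  bufptr = append_str(bufptr, "CLUSTER <- c( 0, 0, ");
  /* Simulate a mid-loop flush of a full buffer. */
  fwrite(buf, 1, (size_t)(bufptr - buf), out);
  bufptr = buf;
  /* The last element is buffered but not yet written. */
  bufptr = append_str(bufptr, "0 )\n");
  if (use_fputs) {
    fputs("CLUSTER[CLUSTER==-1] <- NA\n", out);
  } else {
    bufptr = append_str(bufptr, "CLUSTER[CLUSTER==-1] <- NA\n");
  }
  /* Final flush of whatever is left in the buffer. */
  fwrite(buf, 1, (size_t)(bufptr - buf), out);
}

/* Run write_cluster() against a temporary file and copy the result
   into dst as a NUL-terminated string. Returns 0 on success. */
static int render_cluster(int use_fputs, char* dst, size_t dst_size) {
  assert(dst_size > 0);
  FILE* tmp = tmpfile();
  if (!tmp) {
    return -1;
  }
  write_cluster(tmp, use_fputs);
  rewind(tmp);
  size_t n = fread(dst, 1, dst_size - 1, tmp);
  dst[n] = '\0';
  fclose(tmp);
  return 0;
}
```

Calling `render_cluster(1, ...)` reproduces the misplaced line in this model, while `render_cluster(0, ...)` puts the recode statement after the closing parenthesis. Whether the real code path behaves the same way is an assumption until someone recompiles and tests.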
The second bug involves tracking blocks of SNPs fed to Rserve in the presence of multiallelics. Specifically, if the number of SNPs input for --R is a multiple of 100 (i.e. RPLUGIN_BLOCK_SIZE) plus 1, and the last two SNPs are a split multiallelic with the same chr/bp position, then results will not be returned for the last SNP. The last SNP's info is printed to the .auto.R output file, but the output from the R script is not.
Sample output from end of .auto.R with 101 SNPs:
3 rs12054164 109224522 G 1005 11
3 rs143394206 109224797 AAC 998 11
3 rs34855976 109224950 CA 1005 11
3 rs34855976:109224950:C:CG 109224950 CG
Compare output with 100 SNPs (same is observed with 102 SNPs):
3 rs12054164 109224522 G 1005 11
3 rs143394206 109224797 AAC 998 11
3 rs34855976 109224950 CA 1005 11
3 rs34855976:109224950:C:CG 109224950 CG 1005 11
I haven't been able to fully parse what's causing this issue, but my hunch is it's related to tracking marker_idx vs. marker_uidx?
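To pin down the trigger condition, here is a small sketch that only encodes the input pattern described above. It says nothing about the cause inside plink. `BLOCK_SIZE` mirrors the reported RPLUGIN_BLOCK_SIZE value of 100, and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_SIZE 100  /* reported value of RPLUGIN_BLOCK_SIZE */

/* Number of variants that end up in the final block when variant_ct
   variants are processed in blocks of BLOCK_SIZE. */
static size_t final_block_size(size_t variant_ct) {
  if (variant_ct == 0) {
    return 0;
  }
  size_t rem = variant_ct % BLOCK_SIZE;
  return rem ? rem : BLOCK_SIZE;
}

/* Hypothetical predicate: returns 1 when the input matches the reported
   pattern, i.e. the final block holds exactly one variant and that
   variant shares chr/bp with the variant immediately before it. */
static int matches_reported_trigger(const unsigned* chr, const unsigned* bp,
                                    size_t variant_ct) {
  assert(chr != NULL && bp != NULL);
  if (variant_ct < 2 || final_block_size(variant_ct) != 1) {
    return 0;
  }
  return chr[variant_ct - 1] == chr[variant_ct - 2] &&
         bp[variant_ct - 1] == bp[variant_ct - 2];
}
```

With 101 SNPs where the last two share a position the predicate is true. With 100 or 102 SNPs it is false, which lines up with the cases where the output looked correct.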
Thanks for the help (and thanks for porting the --R interface to plink2 despite all of its rough edges!)
Cheers,