R plugin problems

56 views
Skip to first unread message

Raymond Walters

unread,
Apr 5, 2016, 4:37:50 PM4/5/16
to plink2-dev
Hello,
I've encountered 2 bugs while using --R on the current development build (31 Mar.). 

First, in `--R debug` there's an error with recording the values for "cluster", such that the text for recoding missing values gets inserted before the last element of the vector of values. E.g.:

COVAR<-NA
CLUSTER <- c( 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, CLUSTER[CLUSTER==-1] <- NA
0 )
l <- 100

Best I can find, this occurs lines 226-231 of plink_rserve.c where "CLUSTER[CLUSTER==-1] <- NA" gets written directly to file by fputs() before the last buffer is actually written for the fwrite_checked(). I think replacing the fputs() on line 228 with:
bufptr = memcpya(bufptr, "CLUSTER[CLUSTER==-1] <- NA\n", 27);
would fix this, but I haven't recompiled to test.


The second bug involving tracking blocks of SNPs fed to Rserve in the presence of multiallelics. Specifically, if the number of SNPs input for --R is a multiple of 100 (i.e. RPLUGIN_BLOCK_SIZE) plus 1, and the last two SNPs are a split multiallelic with the same chr/bp position, then results will not be returned for the last SNP. The last SNP's info will be printed to the .auto.R output file, but not the output from the R script. 

Sample output from end of .auto.R with 101 SNPs:
   3                  rs12054164  109224522    G 1005 11
   3                 rs143394206  109224797  AAC 998 11
   3                  rs34855976  109224950   CA 1005 11
   3   rs34855976:109224950:C:CG  109224950   CG

Compare output with 100 SNPs (same is observed with 102 SNPs):
   3                  rs12054164  109224522    G 1005 11
   3                 rs143394206  109224797  AAC 998 11
   3                  rs34855976  109224950   CA 1005 11
   3   rs34855976:109224950:C:CG  109224950   CG 1005 11

I haven't been able to fully parse what's causing this issue, but my hunch is it's related to tracking marker_idx vs. marker_uidx?

Thanks for the help (and thanks for porting the --R interface to plink2 despite all of its rough edges!)
Cheers,
Raymond

Christopher Chang

unread,
Apr 6, 2016, 11:19:43 AM4/6/16
to plink2-dev
Thanks for the note; I'll try to fix this within the next few days.

Christopher Chang

unread,
Apr 16, 2016, 2:03:22 PM4/16/16
to plink2-dev
The first bug should be fixed in the 16 April development build.

I haven't been able to reproduce the second issue.  Could you send me the .log file from your buggy run, and a script and dataset I can use to replicate?


On Tuesday, April 5, 2016 at 1:37:50 PM UTC-7, Raymond Walters wrote:

Raymond Walters

unread,
Apr 19, 2016, 5:59:40 PM4/19/16
to Christopher Chang, plink2-dev
Hi Chris,
Unfortunately those logs got overwritten by re-runs to work around this issue in the past couple, so will need some time to pull together a toy example with data I can share. I don’t recall the plink log’s showing any errors for those runs though; all reported writing output to .auto.R and completing successfully.

One potentially useful datapoint: while putting together a workaround for this I did see some rare instances of an entire block of 100 SNPs returning in .auto.R with no results (just the 4 fields of SNP information), so it may not be restricted to the last block. Will try to keep an eye on that while assembling the shareable toy example, hopefully later this week.

Cheers,
Raymond



--
You received this message because you are subscribed to a topic in the Google Groups "plink2-dev" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/plink2-dev/QxhKdKmgnZ0/unsubscribe.
To unsubscribe from this group and all its topics, send an email to plink2-dev+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply all
Reply to author
Forward
0 new messages