Hi Khoa,
Thank you for the information.
It turns out that this is a bug in the code base.
I can confirm that there is an eQTL mapping to this gene
that SHOULD not be because its RDB score is 5 which is greater than your specified threshold of 3a.
where eqtl$uniqID[annot$RDB<=eqtlMapRDBth] works properly ONLY when the number of rows are the same between the 2 dataframes. When they are not the same, they result in erroneous results.
As a test, I reproduced this:
eqtl <- eqtl[eqtl$uniqID %in% eqtl$uniqID[annot$RDB<=eqtlMapRDBth],]
write.table(eqtl, "original_code.txt", quote=FALSE, row.names=FALSE, sep = "\t")
grep 32975869 original_code.txt | grep LEMD2
6:32975869:A:G eQTLGen eQTLGen_cis_eQTLs ENSG00000161904 A 1.3509e-07 -5.2718 0.00045107129432402 G + 6 32975869 LEMD2 1
Fixed code:
merged_df <- merge(eqtl, annot, by = "uniqID")
filtered_ids <- merged_df$uniqID[merged_df$RDB <= eqtlMapRDBth]
eqtl <- eqtl[eqtl$uniqID %in% filtered_ids, ]
write.table(eqtl, "corrected_code.txt", quote=FALSE, row.names=FALSE, sep = "\t")
grep 32975869 corrected_code.txt | grep LEMD2 #return nothing
So to answer your question:
the issue you observed is due to bugs in the code.
To properly fix this, I will need to identify all the places where this is coded incorrectly and fix them, and push them to production. Be aware that it is the summer holiday at the moment so there is going to be a delay. I will let you know once it is fixed.
Thank you for reporting the issue.
Best,
Tanya