pgenlibr: what is the fastest way to get the index of a variant by variant-ID?
N.
Jan 24, 2021, 3:37:55 PM
to plink2-dev
Hi,
I have been using pgenlibr in R to load specific variants from .pgen files. However, to get a variant's index I have to load the entire .pvar (e.g. with fread), since it is a plain-text file. Because the .pvar files are quite large, this is the speed bottleneck.
Is there a more efficient way to know the index of a variant by name, without loading the entire pvar file?
Thanks,
Niek
Christopher Chang
Jan 24, 2021, 4:14:14 PM
How are you reading the .pvar? pgenlibr includes its own .pvar loader which should be pretty fast, especially if the file is BGZF- or Zstd-compressed, and extraneous columns (e.g. INFO) have been removed from the .pvar.
If that isn't enough, you'd need to write your own software for this. It can be done by constructing a name-based index.
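One way such a name-based index could look in plain R is sketched below. It is only an illustration, not what pgenlibr does internally: the `ids` vector is made-up data standing in for the ID column of a .pvar, and `id_index` and `find_variants` are hypothetical names. The index is a hashed environment mapping each ID to all of its 1-based positions, so it is built once and then queried repeatedly.

```r
# Made-up variant IDs in .pvar order (in practice, the ID column of the file).
ids <- c("rs1", "rs2", "rs3", "rs2", "rs4")

# Build the index once: group the 1-based positions by ID, then store the
# groups in a hashed environment so that lookups by name are cheap.
id_index <- list2env(split(seq_along(ids), ids), envir = new.env(hash = TRUE))

# Look up several IDs at once; an ID that is not present yields integer(0).
find_variants <- function(index, query) {
  mget(query, envir = index, ifnotfound = list(integer(0)))
}

hits <- find_variants(id_index, c("rs2", "rs9"))
```

Each element of `hits` is a vector rather than a single number because the same ID can appear on more than one line of a .pvar.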
N.
Jan 24, 2021, 5:42:05 PM
That is great, could you post an example?
I am now using something like this:
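library(data.table)
snps <- c("rs1234", "rs54123")
# Read only the ID column (column 3) of the .pvar, then find the rows whose ID is in snps
i <- which(fread(f.pvar, select = 3)$ID %in% snps)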
Is there a way to use pgenlibr to query the index using the ID?
Thanks so much
Christopher Chang
Jan 24, 2021, 9:08:27 PM
I've added a pgenlibr::GetVariantsById() function for this purpose (note that it returns a list, since there can be more than one index corresponding to a single ID). It'll be slow the first time you call it (since that's when it constructs the ID string -> variant index lookup table), but subsequent queries are fast.
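A rough sketch of how the new function might be combined with the rest of pgenlibr follows. It assumes the package's NewPvar(), NewPgen(), ReadList(), ClosePgen() and ClosePvar() functions, and that GetVariantsById() takes the pvar object and a single ID; check the installed version's help pages before relying on it. `read_variants_by_id` is a hypothetical wrapper name, not part of pgenlibr.

```r
# Hypothetical helper: load genotypes for a set of variant IDs from a
# PLINK 2 fileset. Requires the pgenlibr package at call time.
read_variants_by_id <- function(pgen_path, pvar_path, ids) {
  pvar <- pgenlibr::NewPvar(pvar_path)
  pgen <- pgenlibr::NewPgen(pgen_path, pvar = pvar)
  # Release both file handles when the function exits.
  on.exit({
    pgenlibr::ClosePgen(pgen)
    pgenlibr::ClosePvar(pvar)
  })

  # One lookup per ID; the first call builds the lookup table. Each call can
  # return several indices, so flatten the results into one vector.
  idx <- unlist(lapply(ids, function(id) pgenlibr::GetVariantsById(pvar, id)))
  if (length(idx) == 0L) {
    return(NULL)  # none of the requested IDs were found
  }

  # Read only the matching variants.
  pgenlibr::ReadList(pgen, idx)
}
```

With placeholder file names, a call might look like read_variants_by_id("data.pgen", "data.pvar", c("rs1234", "rs54123")). If many batches of IDs are queried, keeping the pvar object open between calls avoids rebuilding the lookup table each time.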
N.
Jan 25, 2021, 4:22:22 AM