Hi,
I’ve noticed that if you run the liftOver executable from a nodejs process, it takes ~800ms longer than running the same command from the command line.
An example:
in.bed contains one line: chr1 1000000 2000000
out.bed and unmapped.bed are empty files before liftOver is run.
> time ./liftOver in.bed hg19ToHg38.over.chain.gz out.bed unmapped.bed
real 0m0.022s
user 0m0.004s
sys 0m0.012s
> node -e "const { execSync } = require('child_process'); console.time('liftover'); execSync('./liftOver in.bed hg19ToHg38.over.chain.gz out.bed unmapped.bed'); console.timeEnd('liftover')"
liftover: 853.078ms
This happens regardless of how many variants are in in.bed (i.e. if there are loads of variants and it takes ~1 second via the command line, it will take ~1.8 seconds via node). I have tried this using node’s execSync, exec, and spawn commands, and all of them result in the same issue. I have tested this with node v10 and v12.
Using spawn it is possible to see that the slowness occurs between the output of “Reading liftover chains” and the output of “Mapping coordinates”:
> node -e "const { spawn } = require('child_process'); let i = 1; console.time('liftover'); console.time('liftover1'); console.time('liftover2'); const proc = spawn('./liftOver', [ 'in.bed', 'hg19ToHg38.over.chain.gz', 'out.bed', 'unmapped.bed' ]); proc.stderr.on('data', data => { console.timeEnd('liftover' + i); i++; console.log(data.toString()); }); proc.on('close', () => console.timeEnd('liftover'));"
liftover1: 5.524ms
Reading liftover chains
liftover2: 845.090ms
Mapping coordinates
liftover: 853.873ms
I was hoping you could figure out what is causing this and release a new version of the liftOver executable which rectifies the issue.
Thanks,
Simon Brent
Dear Simon,
Thank you for using the UCSC Genome Browser and bringing to our attention that the liftOver executable is slow when run via nodejs.
Could you please provide more information about why you are asking for improvements to the liftOver feature? For example, what is the purpose of running liftOver in the nodejs environment instead of a native OS environment? Also, what is the advantage of running the program in the nodejs environment? If we can better understand your objective we may be able to provide better feedback.
To the best of our knowledge, there is nothing in the liftOver program itself that should cause such a delay, and we suspect it has more to do with the details of the node environment. One of our engineers tried running liftOver on nodejs with a larger input (226K rows) and noted that the overhead of node.js did NOT dominate the time at all.
One thing to try would be to decompress the file hg19ToHg38.over.chain.gz (so the liftOver would be invoked on hg19ToHg38.over.chain; this prevents the liftOver process from spawning its own gunzip child) and/or closing stdin to the spawned process. Another option might be to make a shell script wrapper for liftOver, and invoke that from node.
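As a rough illustration of the first two suggestions, the following sketch decompresses the chain file once and then spawns liftOver on the uncompressed copy with stdin ignored. The helper names ensureUncompressedChain and runLiftOver are made up for this example, and the code assumes the .gz file fits comfortably in memory; it is one possible approach, not a tested recipe.

```javascript
const fs = require('fs');
const zlib = require('zlib');
const { spawn } = require('child_process');

// Hypothetical helper: decompress a .gz chain file once (e.g. at start-up)
// and return the path of the plain-text copy. If the copy already exists,
// it is reused rather than rewritten.
function ensureUncompressedChain(gzPath) {
  const plainPath = gzPath.replace(/\.gz$/, '');
  if (plainPath === gzPath) return gzPath; // no .gz suffix: nothing to do
  if (!fs.existsSync(plainPath)) {
    fs.writeFileSync(plainPath, zlib.gunzipSync(fs.readFileSync(gzPath)));
  }
  return plainPath;
}

// Hypothetical helper: run liftOver with stdin and stdout ignored and
// stderr collected. Resolves with the stderr text on exit code 0.
function runLiftOver(liftOverPath, inBed, chainPath, outBed, unmappedBed) {
  return new Promise((resolve, reject) => {
    const proc = spawn(liftOverPath, [inBed, chainPath, outBed, unmappedBed], {
      stdio: ['ignore', 'ignore', 'pipe'],
    });
    let stderr = '';
    proc.stderr.on('data', chunk => { stderr += chunk; });
    proc.on('error', reject);
    proc.on('close', code => {
      if (code === 0) resolve(stderr);
      else reject(new Error(`liftOver exited with code ${code}: ${stderr}`));
    });
  });
}
```

Usage would look something like runLiftOver('./liftOver', 'in.bed', ensureUncompressedChain('hg19ToHg38.over.chain.gz'), 'out.bed', 'unmapped.bed'), so that liftOver is always handed the plain .chain file.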
If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.
--
---
You received this message because you are subscribed to the Google Groups "UCSC Genome Browser Public Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email to genome+un...@soe.ucsc.edu.
To post to this group, send email to gen...@soe.ucsc.edu.
Visit this group at https://groups.google.com/a/soe.ucsc.edu/group/genome/.
To view this discussion on the web visit https://groups.google.com/a/soe.ucsc.edu/d/msgid/genome/2ed835994c8742b9a7f66d9e071f5065%40sanger.ac.uk.
For more options, visit https://groups.google.com/a/soe.ucsc.edu/d/optout.
Hi Conner,
Your suggestion of unzipping the chain files has solved the problem, thank you very much! (For reference, I had already tried closing stdin, and it had no effect).
For context, I have a nodejs web server with an endpoint that does liftover for a single position, and relies on the UCSC liftover executable to do this.
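For anyone with a similar setup, here is a minimal sketch of how a single position could be lifted from node. It assumes the chain file has already been decompressed, that positions are 1-based (BED intervals are 0-based and half-open), and that the node version provides fs.rmSync (newer than the v10/v12 mentioned above). The function names are hypothetical and this is not the actual server code.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { execFileSync } = require('child_process');

// BED intervals are 0-based and half-open, so the 1-based position p
// becomes the interval [p - 1, p).
function positionToBedLine(chrom, pos) {
  return `${chrom}\t${pos - 1}\t${pos}\n`;
}

// Read the first non-empty BED line back into a 1-based position.
// Returns null when the text has no such line (e.g. nothing was mapped).
function bedToPosition(bedText) {
  const line = bedText.split('\n').find(l => l.trim() !== '');
  if (!line) return null;
  const [chrom, , end] = line.trim().split(/\s+/);
  return { chrom, pos: Number(end) };
}

// Hypothetical wrapper: write a one-line BED file into a temporary
// directory, run liftOver on it, and parse the mapped output.
// chainPath is assumed to point at an uncompressed .chain file.
function liftOverPosition(liftOverPath, chainPath, chrom, pos) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'liftover-'));
  try {
    const inBed = path.join(dir, 'in.bed');
    const outBed = path.join(dir, 'out.bed');
    const unmappedBed = path.join(dir, 'unmapped.bed');
    fs.writeFileSync(inBed, positionToBedLine(chrom, pos));
    execFileSync(liftOverPath, [inBed, chainPath, outBed, unmappedBed], {
      stdio: 'ignore',
    });
    return bedToPosition(fs.readFileSync(outBed, 'utf8'));
  } finally {
    // Clean up the temporary files whether or not liftOver succeeded.
    fs.rmSync(dir, { recursive: true, force: true });
  }
}
```

A call such as liftOverPosition('./liftOver', 'hg19ToHg38.over.chain', 'chr1', 1000000) would return an object with chrom and pos, or null if the position ended up in unmapped.bed.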
Thanks again,
Simon
--
Conner Powell
UCSC Genomics Institute, Quality Assurance and Support Analyst