liftOver executable is slow when run via nodejs


Simon Brent

May 22, 2019, 11:28:52 AM
to gen...@soe.ucsc.edu

Hi,

 

I’ve noticed that if you run the liftOver executable from a nodejs process, it takes ~800ms longer than running the same command from the command line.

 

An example:

 

in.bed contains one line: chr1 1000000 2000000

out.bed and unmapped.bed are empty files before liftOver is run.

 

> time ./liftOver in.bed hg19ToHg38.over.chain.gz out.bed unmapped.bed
real    0m0.022s
user    0m0.004s
sys     0m0.012s

 

> node -e "const { execSync } = require('child_process'); console.time('liftover'); execSync('./liftOver in.bed hg19ToHg38.over.chain.gz out.bed unmapped.bed'); console.timeEnd('liftover')"

liftover: 853.078ms

 

This happens regardless of how many variants are in in.bed: if there are many variants and the run takes ~1 second from the command line, it takes ~1.8 seconds via node. I have tried node’s execSync, exec, and spawn functions, and all of them show the same delay. I have tested this with node v10 and v12.

 

Using spawn it is possible to see that the slowness occurs between the output of “Reading liftover chains” and the output of “Mapping coordinates”:

 

> node -e "const { spawn } = require('child_process'); let i = 1; console.time('liftover'); console.time('liftover1'); console.time('liftover2'); const proc = spawn('./liftOver', [ 'in.bed', 'hg19ToHg38.over.chain.gz', 'out.bed', 'unmapped.bed' ]); proc.stderr.on('data', data => { console.timeEnd('liftover' + i); i++; console.log(data.toString()); }); proc.on('close', () => console.timeEnd('liftover'));"

liftover1: 5.524ms
Reading liftover chains

liftover2: 845.090ms
Mapping coordinates

liftover: 853.873ms
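For readability, the one-liner above can also be written as a small script. This is only a sketch: timeStderrStages is a hypothetical helper name, and it assumes liftOver writes its progress messages to stderr, as in the output shown. Unlike the one-liner, it does not open stdin or stdout pipes to the child.

```javascript
const { spawn } = require('child_process');

// Hypothetical helper: run a command and record how many milliseconds had
// elapsed when each chunk of stderr output arrived, plus the total time
// until the child process closed.
function timeStderrStages(command, args) {
  return new Promise((resolve, reject) => {
    const start = process.hrtime.bigint();
    const elapsedMs = () => Number(process.hrtime.bigint() - start) / 1e6;
    const stages = [];

    // stdin and stdout are ignored; only stderr is piped back to node.
    const proc = spawn(command, args, { stdio: ['ignore', 'ignore', 'pipe'] });
    proc.stderr.on('data', (chunk) => {
      stages.push({ ms: elapsedMs(), text: chunk.toString().trim() });
    });
    proc.on('error', reject);
    proc.on('close', (code) => resolve({ code, totalMs: elapsedMs(), stages }));
  });
}

// Example call, using the file names from this post:
// timeStderrStages('./liftOver',
//   ['in.bed', 'hg19ToHg38.over.chain.gz', 'out.bed', 'unmapped.bed'])
//   .then((result) => console.log(result));
```

Each entry in stages gives the time since the child was spawned, so the gap between two consecutive entries is the time spent in that stage.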

 

I was hoping you could figure out what is causing this and release a new version of the liftOver executable which rectifies the issue.

 

Thanks,

 

Simon Brent


-- The Wellcome Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.

Conner Powell

May 23, 2019, 2:48:04 PM
to Simon Brent, gen...@soe.ucsc.edu

Dear Simon,

Thank you for using the UCSC Genome Browser and bringing to our attention that the liftOver executable is slow when run via nodejs.

Could you please provide more information about why you are asking for improvements to the liftOver feature? For example, what is the purpose of running liftOver in the nodejs environment instead of a native OS environment? Also, what is the advantage of running the program in the nodejs environment? If we can better understand your objective we may be able to provide better feedback.

To the best of our knowledge, there is nothing in the liftOver program itself that should cause such a delay, and we suspect it has more to do with the details of the node environment. One of our engineers tried running liftOver from node.js with a larger input (226K rows) and noted that the node.js overhead did NOT dominate the run time at all:

hgsql hg38 -BN -e 'select chrom, txStart, txEnd from knownGene' > ~/in.bed

[hgwdev:~> wc -l in.bed
226811 in.bed

[hgwdev:~> time /cluster/home/user/bin/x86_64/liftOver in.bed /gbdb/hg19/liftOver/hg19ToHg38.over.chain.gz out.bed unmapped.bed                                                                                      
Reading liftover chains
Mapping coordinates
2.960u 0.025s 0:03.06 97.3%     0+0k 0+11768io 0pf+0w

This is about 3.06 seconds.

[hgwdev:~> node -e "const { execSync } = require('child_process'); console.time('liftover'); execSync('/cluster/home/user/bin/x86_64/liftOver in.bed /gbdb/hg19/liftOver/hg19ToHg38.over.chain.gz out.bed unmapped.bed'); console.timeEnd('liftover')" 
Reading liftover chains
Mapping coordinates
liftover: 3206.437ms

This is about 3.21 seconds.

The time that liftOver spends on processing large input files will likely be more significant than the unexplained delay from running within node.

One thing to try would be to decompress hg19ToHg38.over.chain.gz and invoke liftOver on the resulting hg19ToHg38.over.chain, which prevents the liftOver process from spawning its own gunzip child, and/or to close stdin for the spawned process. Another option might be to write a shell script wrapper for liftOver and invoke that from node.
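A minimal sketch of the first two suggestions from node, assuming the chain file is small enough to decompress in memory. ensureUncompressedChain and runLiftOver are hypothetical helper names, not part of liftOver or node; the only liftOver usage assumed is the four-argument invocation shown earlier in this thread.

```javascript
const fs = require('fs');
const zlib = require('zlib');
const { execFileSync } = require('child_process');

// Hypothetical helper: decompress a .gz chain file next to the original,
// once, and return the path of the uncompressed copy.
function ensureUncompressedChain(gzPath) {
  const plainPath = gzPath.replace(/\.gz$/, '');
  if (plainPath === gzPath) return gzPath; // no .gz suffix: nothing to do
  if (!fs.existsSync(plainPath)) {
    fs.writeFileSync(plainPath, zlib.gunzipSync(fs.readFileSync(gzPath)));
  }
  return plainPath;
}

// Hypothetical wrapper: run liftOver on the uncompressed chain file with
// stdin closed. execFileSync throws if the process exits with a non-zero code.
function runLiftOver(liftOverPath, inBed, chainGz, outBed, unmappedBed) {
  const chain = ensureUncompressedChain(chainGz);
  execFileSync(liftOverPath, [inBed, chain, outBed, unmappedBed], {
    stdio: ['ignore', 'ignore', 'ignore'],
  });
}

// Example call, using the file names from this thread:
// runLiftOver('./liftOver', 'in.bed', 'hg19ToHg38.over.chain.gz',
//             'out.bed', 'unmapped.bed');
```

Decompressing once and reusing the plain file means the cost is paid at setup time rather than on every liftOver call.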

If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.




--
Conner Powell
UCSC Genomics Institute, Quality Assurance and Support Analyst 

Simon Brent

May 24, 2019, 12:26:09 PM
to gen...@soe.ucsc.edu, Conner Powell

Hi Conner,

 

Your suggestion of unzipping the chain files has solved the problem, thank you very much! (For reference, I had already tried closing stdin, and it had no effect).

 

For context, I have a nodejs web server with an endpoint that does liftover for a single position, and relies on the UCSC liftover executable to do this.

 

Thanks again,

Simon


 
