Script with goroutines is faster the second time it runs


jpofmars

Jan 21, 2020, 8:24:27 AM
to golang-nuts
Hi everybody,

I am developing a script that parses a file and, for each row, performs several processing steps that use an index to access another file.

For each row I create a goroutine. The number of goroutines is limited to 8 using the package "github.com/korovkin/limiter", and GOMAXPROCS=8.
Each goroutine gets its own index and its own open file handle to avoid conflicts (8 indexes are created at the beginning and shared through a channel).

When I launch the script, it takes ~2 seconds for 1000 rows (CPU < 100% during the whole run).
When it is done, if I relaunch it, it takes ~300 ms for 1000 rows (CPU > 700% during the whole run).

If I launch the script with only one goroutine, it also takes 1-2 seconds for 1000 rows.

It is difficult to share the script.
Have you seen the same behaviour in your scripts?
Do you have any idea why the performance is so much better (2 seconds versus 300 ms) the second time I launch my script?

For information, the behaviour is the same on macOS and CentOS.

Thank you for your help.

Brian Candler

Jan 21, 2020, 8:40:33 AM
to golang-nuts
Could it be that the files from disk are "hot" in cache on the second run?

Try emptying the cache before the second run. On Linux (as root):

sync; echo 3 > /proc/sys/vm/drop_caches

jpofmars

Jan 22, 2020, 4:14:27 AM
to golang-nuts
I tried on Linux and it seems that you are right.
For information, I use the package biogo/hts, and the "problem" happens when I use the function NewChunkReader, which takes file positions and retrieves data from the big (bgzipped) file.

Thank you for your answer.