Pkg.update() timings

Kevin Squire

Jun 4, 2014, 2:25:07 PM
to juli...@googlegroups.com
I'm finding `Pkg.update()` excruciatingly slow, and I'm wondering if there might have been a recent regression?

With 73 packages installed, `Pkg.update()` takes almost 6 minutes on my Linux box and leaves Julia using 1.4 GB of RAM (resident).

Now, `Pkg` has been slow since inception, and there are a lot of things that could affect speed (OS, processor(s), network speed).  Even so, this seems a little excessive to me.

I'm wondering if anyone could corroborate this, or tell me it was faster (or the same) on a less recent Julia?  I'm running on 0.3.0-prerelease+3423 (2014-06-03 04:27 UTC).

Cheers,
   Kevin

Gustavo Lacerda

Jun 4, 2014, 2:45:28 PM
to juli...@googlegroups.com
hi Kevin,

For me it takes less than 1 minute, but I only have 18 packages installed.

Gustavo


julia> Pkg.update()
INFO: Updating METADATA...
INFO: Updating BSplines...
INFO: Updating Plotly...
INFO: Updating SubsetSelection...
INFO: Updating METADATA.jl...
INFO: Computing changes...
INFO: No packages to install, update or remove



julia> Pkg.status()
3 required packages:
- BSplines 0.0.3 master
- Images 0.2.36
- JSON 0.3.5
15 additional packages:
- BinDeps 0.2.12
- Cartesian 0.1.5
- Color 0.2.9
- HTTPClient 0.1.4
- Homebrew 0.0.6
- LibCURL 0.1.3
- METADATA.jl 0.0.0- metadata-v2 (unregistered)
- Plotly 0.0.0- master (unregistered)
- SIUnits 0.0.1
- SubsetSelection 0.0.0- master (unregistered)
- TexExtensions 0.0.1
- URIParser 0.0.2
- Zlib 0.1.7
- src 0.0.0- non-repo (unregistered)
- test 0.0.0- non-repo (unregistered)


Version 0.3.0-prerelease+3131 (2014-05-20 21:00 UTC)

Milan Bouchet-Valat

Jun 4, 2014, 4:44:20 PM
to juli...@googlegroups.com
I'm seeing this too. It looks like it's hanging when updating METADATA
and package caches. For example, it took a few minutes just to update the
cache of Distributions. Packages with no cache to update were then
processed quickly. (I'm not sure what the difference is between "Updating
cache of X" and "Updating Y".)

FWIW it's with an older version:
Version 0.3.0-prerelease+3298 (2014-05-29 20:54 UTC)



Regards

Jacob Quinn

Jun 4, 2014, 4:47:12 PM
to juli...@googlegroups.com

Rick Graham

Jun 5, 2014, 12:22:52 AM
to juli...@googlegroups.com
Don't know if the added time info is useful, but with 63 packages installed, `./julia -E 'Pkg.update()'` takes 6:12.09 on my 32-bit dual core Fedora 19 system.

$ /usr/bin/time -v ./julia -E 'Pkg.update()'
INFO: Updating METADATA...
INFO: Updating JuMP...
INFO: Updating Ipopt...
INFO: Updating Color...
INFO: Computing changes...
INFO: No packages to install, update or remove
nothing
Command being timed: "./julia -E Pkg.update()"
User time (seconds): 41.56
System time (seconds): 332.16
Percent of CPU this job got: 100%
Elapsed (wall clock) time (h:mm:ss or m:ss): 6:12.09
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 483536
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 31
Minor (reclaiming a frame) page faults: 4796485
Voluntary context switches: 36585
Involuntary context switches: 64905
Swaps: 0
File system inputs: 31648
File system outputs: 2072
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0

In addition, `strace` reports that 98.4% of the system time is spent in the `futex` syscall and that there were 10428 `execve("/usr/bin/git", ...)` syscalls.

Kevin Squire

Jun 5, 2014, 2:50:03 AM
to juli...@googlegroups.com
Well, it seems consistent, so a recent regression is unlikely, and it's probably not much different from when I complained about this the first time in https://github.com/JuliaLang/julia/issues/4158 (I have a bad memory :-).

I'm really looking forward to the results of Alessandro Andrioni's GSOC project on this...

Cheers,
   Kevin

Simon Kornblith

Jun 5, 2014, 9:44:40 AM
to juli...@googlegroups.com
The slowness of Pkg.update() on Linux is largely due to fork(), and fork() speed is inversely correlated to memory consumption, so anything that makes Julia consume more memory makes Pkg.update() slower. (Memory consumption has a much smaller impact on fork() speed on OS X, and as a result, our spawn performance there is generally much better.)
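
(Illustrative sketch, not part of the original measurements: one way to check the fork()/memory relationship on your own machine is to time fork() while the parent process holds progressively more resident memory. The Python below is only a stand-in for what Julia's spawn path does; the function names and the ballast sizes are made up for this example, and it assumes a POSIX system where `os.fork` is available.)

```python
import os
import time


def time_fork(n_forks=20):
    """Return the mean wall-clock seconds for one fork() + wait cycle."""
    start = time.perf_counter()
    for _ in range(n_forks):
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child: exit immediately without running cleanup
        os.waitpid(pid, 0)  # parent: reap the child before forking again
    return (time.perf_counter() - start) / n_forks


def fork_cost_by_footprint(sizes_mb=(0, 64, 256)):
    """Map each ballast size in MB (hypothetical values) to mean fork time."""
    results = {}
    for mb in sizes_mb:
        ballast = bytearray(mb * 1024 * 1024)
        # Write one byte per 4 KiB so the pages are likely to become resident.
        for i in range(0, len(ballast), 4096):
            ballast[i] = 1
        results[mb] = time_fork()
        del ballast
    return results
```

Calling `fork_cost_by_footprint()` returns a dict of seconds per fork keyed by ballast size. If the explanation above holds on your system, the larger footprints should tend to show higher per-fork times, but the absolute numbers will depend on kernel, allocator, and hardware.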

https://github.com/JuliaLang/libuv/pull/24 will make Pkg.update() substantially faster (>10x on my system), assuming it doesn't break anything. If you want to try it now, check out the julia-uv0.11.26 branch of libuv, stop make from trying to check out the submodule commit, and build with that. There may be a preferred way to pin the commit that I don't know about, but either commenting out the line of deps/Makefile that checks out the submodule or running `git add deps/libuv` (and being careful not to push that change to master) should work.

Pkg.update() still takes ~15 seconds for me with no packages to update, which is still too long, but at least it's tolerable. Hopefully libgit2 will take this down another order of magnitude.

Simon

Kevin Squire

Jun 5, 2014, 10:40:06 AM
to juli...@googlegroups.com
Thanks Simon--I'll try it!

Kevin 

niclas....@gmail.com

May 3, 2015, 5:11:53 PM
to juli...@googlegroups.com

Revisiting an old thread here.

I did some digging into Pkg internals today and found that, at least for my Windows version (reasonably fresh master, 32-bit build), the bulk of the time spent in Pkg.update() seems to be related to prefetching to the cache. In particular, it seems that the Git.iscommit() call is run multiple times for each version of each installed package. Each of these calls takes around 0.2 s. With 54 packages (7 required packages) and a total of about 1000 revisions, this adds significantly to the runtime.
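
(A rough back-of-the-envelope estimate of that overhead, written as a small Python sketch. The function name is made up for illustration, and `calls_per_revision` is an assumption: the observation above only says "multiple times", so 1 gives a lower bound.)

```python
def iscommit_overhead_seconds(n_revisions, seconds_per_call, calls_per_revision=1):
    """Estimate total time spent in per-revision git calls."""
    return n_revisions * seconds_per_call * calls_per_revision


# About 1000 revisions at roughly 0.2 s per call, one call each (lower bound).
lower_bound = iscommit_overhead_seconds(1000, 0.2)

# Hypothetical: if each revision were probed three times.
with_repeats = iscommit_overhead_seconds(1000, 0.2, calls_per_revision=3)
```

Even the lower bound comes to roughly 200 seconds, i.e. more than three minutes spent only on checking whether commits exist.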

I'm wondering about the role of the cache here. At least for registered packages I guess we already have a record of versions and dependencies in metadata, which should allow us to resolve the needed versions without probing all historical versions of every package. Am I missing something?

Niclas

Tony Kelman

May 3, 2015, 8:30:23 PM
to juli...@googlegroups.com
I don't think you're missing anything. Pkg could really use a redesign to use libgit2 instead of shelling out, and avoid hitting the filesystem so hard for all versioning information. It needs someone who's willing to take on the project and work on it.

niclas....@gmail.com

May 5, 2015, 4:55:43 AM
to juli...@googlegroups.com
FWIW, I'm trying out some local changes to Pkg that look promising. I made a vectorized version of iscommit() which dumps all hashes it can find using "git log --all" and matches those against the input vector of hashes. Using this I could significantly reduce the number of calls to git. In my testing environment (synchronous, additional debug output) the runtime of a no-change Pkg.update() went from coffee break (> 10 min) to bathroom break (couple of minutes).
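
(A minimal sketch of the batching idea, in Python rather than Julia and not the code from the actual change: list every commit hash with a single `git log --all` call, then answer all membership questions from an in-memory set. The function names `known_commits` and `are_commits` are hypothetical.)

```python
import subprocess


def known_commits(repo_dir):
    """Return the set of commit hashes reachable from any ref, using one git call.

    Note: `git log` exits with an error in a repository that has no commits yet.
    """
    out = subprocess.run(
        ["git", "log", "--all", "--format=%H"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return set(out.split())


def are_commits(hashes, known):
    """Vectorized membership check: one boolean per input hash, in order."""
    return [h in known for h in hashes]
```

With this shape, checking N hashes costs one process spawn per repository instead of N. One caveat of the sketch: a commit object that exists in the repository but is not reachable from any ref will not appear in `git log --all`, so it would be reported as missing.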

I'll put up an RFC pull request once I've cleaned it up a bit.

Kevin Squire

May 5, 2015, 9:41:15 AM
to juli...@googlegroups.com
Hi Niclas,

That sounds promising!  Is that change against 0.3 or 0.4?

Kevin

niclas....@gmail.com

May 5, 2015, 3:42:41 PM
to juli...@googlegroups.com
Hi Kevin,

It's against 0.4, but it should be more or less trivial to backport since I don't think that the surrounding code has changed since 0.3.

Here's the pull request: https://github.com/JuliaLang/julia/pull/11137

Niclas

wil...@gmail.com

May 5, 2015, 5:13:02 PM
to juli...@googlegroups.com
I made a quick port of the 'Read' functions to libgit2 calls (https://github.com/JuliaLang/julia/pull/11143) and got a 10x speed improvement on the `installed` call.

-- Art