Julia with fewer dependencies; is that possible?


Ryan Northrup

Aug 11, 2014, 11:08:01 PM
to juli...@googlegroups.com
Julia has a lot of external dependencies for the base installation.  OpenBLAS/LAPACK, SuiteSparse, ARPACK, dSFMT, FFTW, OpenSpecFun... the works.  This is great, but it introduces some issues:

  • Compilation takes longer; when compiling from source is the recommended way to get/install Julia, that becomes a usability issue.
  • Porting to new platforms (operating systems, architectures) is more difficult, since the dependencies have to be ported as well (if they even can be ported...).
  • The distribution is larger.

From a quick glance at the current source, it looks like these math dependencies are mostly separate from the core language, and are more or less extra libraries implemented in Julia accompanying the standard library.  Am I correct in reckoning that it would be possible to split the advanced math functionality using these dependencies into standalone Julia packages?  Is that something already being worked on?  Has anyone managed to trim them out?

Basically, what I'm going for is a minimal Julia: just enough for basic functionality, while allowing for the extra math functionality when wanted/needed.  Is that doable?

Viral Shah

Aug 11, 2014, 11:32:28 PM
to juli...@googlegroups.com
This is very much planned for. We will start separating out these things and go for a slimmer base, once we have some static compilation support. Once we separate this functionality into packages, we will also simultaneously want to get to a point where we can continue shipping binaries with all these libraries.

-viral

John Myles White

Aug 11, 2014, 11:33:05 PM
to juli...@googlegroups.com
There’s a long history of discussion about this in the GitHub issues: https://github.com/JuliaLang/julia/issues/5155

 — John

Jameson Nash

Aug 11, 2014, 11:36:55 PM
to juli...@googlegroups.com
sure it's possible. in theory, the only dependencies you need are llvm, libuv, and utf8proc. it's just unclear what benefit would result. it would make it harder for the end user to use the libraries (since they would not have them by default). the distribution file is only marginally larger, and we don't count hard drive space in megabytes anymore. in short, nobody has shown a truly compelling reason to put effort into creating and maintaining such an option.

John Myles White

Aug 11, 2014, 11:38:33 PM
to juli...@googlegroups.com
It might be worth noting that make is parallelizable over all of the dependencies you’re mentioning. On a machine with 32 cores, compiling Julia from scratch takes less than 5 minutes.

 — John

Tobi

Aug 12, 2014, 6:13:18 AM
to juli...@googlegroups.com
Jameson,

First, I don't think that shrinking base should be done in 0.4.

However, as I outlined in https://github.com/JuliaLang/julia/issues/5155, one main point is that it is difficult to define what should be in base, and my concern with the current model is that it does not really scale. We regularly have pull requests where it is discussed whether a specific functionality should be in base or not. And people cannot really test these things because base code is bound to releases (or one has to compile Julia oneself).

All this may not be super critical at the moment. But look at the thread where the naming of some sparse matrix functions was discussed. In the end it was over 100 messages, and I think it would have been much better to discuss this in a "sparse" package.
Further, when an interface in that area is changed, there is a much higher barrier because one has to add deprecations. If it were a package, one could simply rely on an older version of the package when the interface changes.

So, my feeling is that it would make everything cleaner if Julia base only provided the core infrastructure. Further, please note that the proposal is to ship with default packages so that the end user would have an even better experience than now. And the default packages should definitely be included in the downloads so that for the regular end user the binary dependencies would be resolved.

Cheers

Tobi

P.S. I started to think about shrinking base when I worked on compiling libjulia with MSVC. So there was a much simpler motivation for it :-)

Patrick O'Leary

Aug 12, 2014, 8:54:59 AM
to juli...@googlegroups.com
On Monday, August 11, 2014 10:36:55 PM UTC-5, Jameson wrote:
> sure it's possible. in theory, the only dependencies you need are llvm, libuv, and utf8proc. it's just unclear what benefit would result. it would make it harder for the end user to use the libraries (since they would not have them by default). the distribution file is only marginally larger, and we don't count hard drive space in megabytes anymore. in short, nobody has shown a truly compelling reason to put effort into creating and maintaining such an option.

I really wasted my time cutting down the OpenCV build options for this little ARM board I'm using at work, then.

There's a world between the deep embedded world of microcontrollers and modern consumer machines, and we're often targeting ten or fifteen year old architectures (hello, RAD750!). Of course we'll need static compilation to a POWER target first...

(I'm not going to dictate a solution here, so if the preference is statically linking the runtime and giving up having a REPL, so be it. Just don't dismiss the use case.)

Ryan Northrup

Aug 12, 2014, 5:03:45 PM
to juli...@googlegroups.com
Thanks for the info, everyone.  Sounds like this is something that's already actively being worked on (I'm excited, to say the least).


On Monday, August 11, 2014 8:38:33 PM UTC-7, John Myles White wrote:
> It might be worth noting that make is parallelizable over all of the dependencies you’re mentioning. On a machine with 32 cores, compiling Julia from scratch takes less than 5 minutes.
>
> — John

 I guess it's just one more reason to get myself a 32-core machine, then ;)


On Monday, August 11, 2014 8:36:55 PM UTC-7, Jameson wrote:
> sure it's possible. in theory, the only dependencies you need are llvm, libuv, and utf8proc. it's just unclear what benefit would result. it would make it harder for the end user to use the libraries (since they would not have them by default). the distribution file is only marginally larger, and we don't count hard drive space in megabytes anymore. in short, nobody has shown a truly compelling reason to put effort into creating and maintaining such an option.

My Julia installation (in /opt/programs/julia) takes up 1.7GB of disk space according to du; 1.3GB of that is consumed by the deps/ folder alone.  The bulk of that seems to come from OpenBLAS/LAPACK (683M) and LLVM (252M), but the others add up, too (SuiteSparse, the single and double versions of FFTW, and GMP are each at around 40MB).  That might not seem like much if you're working with massive petabyte-scale SANs and such, but in the embedded realm (for example), that's likely several times larger than the operating system itself (and on par with desktop and server versions of GNU/Linux distros).

I count hard drive space in megabytes because I'm used to the feeling of having limited resources at my disposal.  I'd reckon that most embedded and mobile developers do the same.  Hence my question; being able to trim those things out makes Julia much more accessible to all platforms, not just the powerful ones :)

Beyond that, having a small "minimum" base (accompanied by a peripheral set of default packages for things like BLAS and advanced matrix operations and Fourier transforms and such) would (I think) make porting to new platforms much easier, since there's less to initially port in order to have a working Julia on, say, a Blue Gene/L installation or a datacenter full of pre-x86 Crays or the HTC One M8 sleeping in my pocket.

Jameson Nash

Aug 12, 2014, 5:12:00 PM
to juli...@googlegroups.com
Just semantics, but typically you don't put a compiler on an embedded system, so all you really would care about is the size of make dist. 

John Myles White

Aug 12, 2014, 6:20:39 PM
to juli...@googlegroups.com
To echo Jameson's point, it's worth noting that a binary install of Julia uses about 50 MB of space.

 -- John

Jameson Nash

Aug 12, 2014, 6:28:23 PM
to juli...@googlegroups.com
Right, also note that the size breakdown from that, IIRC, is about:
12MB julia sysimg and dylib
25MB openblas (drops to 10MB without dynamic arch)
12MB llvm
The rest is just noise

Ryan Northrup

Aug 12, 2014, 9:17:08 PM
to juli...@googlegroups.com
On Tuesday, August 12, 2014 3:28:23 PM UTC-7, Jameson wrote:
> Right, also note that the size breakdown from that, IIRC, is about:
> 12MB julia sysimg and dylib
> 25MB openblas (drops to 10MB without dynamic arch)
> 12MB llvm
> The rest is just noise

Alright, fair enough.

Still, OpenBLAS takes up half the storage footprint by that metric, too (a bit less than a third without dynamic arch).  That's pretty significant nonetheless.
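
The two shares mentioned here can be checked with a few lines of arithmetic. The sketch below uses only the approximate figures quoted in this thread (about 50 MB for a binary install, 25 MB for OpenBLAS with dynamic arch, 10 MB without). The variable names are made up for illustration, and the idea that dropping dynamic arch shrinks the whole install by the same 15 MB is an assumption, not something anyone stated.

```python
# Approximate sizes in MB, as quoted in the thread (not measured here).
binary_install_mb = 50        # total binary install
openblas_dynamic_mb = 25      # OpenBLAS built with dynamic arch
openblas_static_mb = 10       # OpenBLAS built without dynamic arch

# Share of the install taken by OpenBLAS with dynamic arch enabled.
share_dynamic = openblas_dynamic_mb / binary_install_mb

# Assumption: turning off dynamic arch shrinks the total by the same 15 MB.
saved_mb = openblas_dynamic_mb - openblas_static_mb
install_without_dynamic_mb = binary_install_mb - saved_mb
share_static = openblas_static_mb / install_without_dynamic_mb

print(f"with dynamic arch:    {share_dynamic:.0%}")
print(f"without dynamic arch: {share_static:.0%}")
```

Under those assumptions the shares come out at 50% and roughly 29%, which matches "half" and "a bit less than a third".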

Jameson Nash

Aug 12, 2014, 10:44:17 PM
to juli...@googlegroups.com
Quite true. Once we add the ability to use static-compilation, so that llvm isn't needed at runtime, it may make a lot more sense to investigate reducing the other dependencies too. (note that for Apple/iPhone, the Accelerate BLAS is already bundled with the phone OS, so openblas isn't actually needed at all). However, without a compelling use case immediately being targeted, it's hard to see the benefit in providing multiple versions of julia to download and potentially confusing users trying to pick which options they need (this one has some SweetSparse stuff, but this other one comes in blue – how to decide?!). With more distros picking up Julia also, it is even less necessary to worry about the time necessary to build a from-scratch copy.

Similarly, you can cut down on your build time significantly by turning off dynamic arch in a Make.user file, but then your build can't be reused on another machine with a potentially different processor, which is why this isn't the default option.

Tobi

Aug 13, 2014, 4:05:12 AM
to juli...@googlegroups.com
Jameson, 

I agree with you that it's not worth providing different versions to download. And since most developers don't care how many binary dependencies Julia has, it would indeed require someone with a need to move this forward.

Still, I think it is a nice long-term goal to make Julia base more modular. Deployment of standalone Julia applications (think of a Gtk.jl GUI) is also something where a more modular Julia would be nice. When embedding Julia into a larger C/C++ application, it would also be great if one could select only those components that are needed. In my opinion Julia is an absolutely awesome language to provide scripting capabilities to an existing application.

But for me the modularization is more important from the development point of view. Julia base is a mix of several things and very biased toward the numerical programming point of view. This is not wrong, but often it is quite unclear why things are in base or why not. Think of fft vs. imfilter (Images.jl). Also, I think that it would be a lot easier to experiment with the implementation of different sparse matrix formats if this did not live in base. People could simply test things out without compiling branches from source.

Cheers 

Tobi

gael....@gmail.com

Aug 13, 2014, 6:49:04 AM
to juli...@googlegroups.com
+1

Providing different versions is not a good idea but fortunately, that wouldn't be required: people needing a trimmed-down, non-linalg-oriented version of Julia, for instance, could just download the sources and change a few switches. But by default, everything is included.

In addition to making Julia more portable and cleaning up the namespace a little bit, this would allow easier module swapping (as Tobi wrote), which in turn could help to avoid the dead batteries problem:

http://radar.oreilly.com/2013/10/dead-batteries-included.html

which I find important in the long term. That also means that if one wants to improve one of the newly created LinAlg/Optimize/... modules, one can clone it, work on that, and switch between implementations using imports/usings (instead of having two different versions of Julia itself).


As I've read elsewhere, this is also a way to have people consider Julia more as a general-purpose programming language (with first class numerical processing constructs and abilities!): no more math stuff in the namespace (even if it is just **one** 'using' away ;).

Seeing Julia as a general-purpose language may not seem very interesting at first sight (and some people here may not like the idea) but this could (yes could, not should) trigger many new contributions in fields far from science and engineering.

Having people in their respective fields working on (possibly native) crypto, ORMs, network IO/web frameworks, GUIs, etc. implementations would be fantastic.

You've built far more than a DSL, now accept the consequences. :)

Stefan Karpinski

Aug 13, 2014, 11:23:57 AM
to juli...@googlegroups.com
I'm all for Julia as a general purpose language. Not sure how I feel about an ORM though. This post was quite insightful:


I suspect that what you really want is a library that lets you write portable, composable database queries. Portable here meaning that you can swap out databases; composable meaning that the result of one query expression can be made the input of another (similar to Arel – https://github.com/rails/arel).

Jason Riedy

Aug 13, 2014, 2:00:34 PM
to juli...@googlegroups.com
And Stefan Karpinski writes:
> I suspect that what you really want is a library that lets you write
> portable, composable database queries.

See also Hadley Wickham's dplyr for R for an interesting take:
http://cran.r-project.org/web/packages/dplyr/index.html

Stefan Karpinski

Aug 13, 2014, 2:37:16 PM
to Julia Dev
Yes, very much so.

Tony Kelman

Aug 13, 2014, 6:11:36 PM
to juli...@googlegroups.com
50 MB on a Mac. Less in Linux distro packages I imagine, when openblas is separate.

On Windows we have 250 MB from bundled Git, 50 MB each from libjulia and libjulia-debug, 35 MB from openblas, 30 MB of other dll's, 17 MB sysimg, and 8 MB everything else. At least we'll soon be getting rid of the biggest contributor.
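
To put those Windows numbers side by side, here is a small sketch that adds them up and shows how much of the total is the bundled Git. The sizes are the approximate ones listed in this post; the dictionary keys are just labels chosen for illustration.

```python
# Approximate Windows install components in MB, as listed in the post above.
components_mb = {
    "bundled Git": 250,
    "libjulia": 50,
    "libjulia-debug": 50,
    "openblas": 35,
    "other DLLs": 30,
    "sysimg": 17,
    "everything else": 8,
}

total_mb = sum(components_mb.values())
without_git_mb = total_mb - components_mb["bundled Git"]
git_share = components_mb["bundled Git"] / total_mb

print(f"total:            {total_mb} MB")
print(f"without Git:      {without_git_mb} MB")
print(f"Git share of all: {git_share:.0%}")
```

By these figures the install is about 440 MB, of which Git alone is a bit more than half, so dropping it would leave roughly 190 MB.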

gael....@gmail.com

Aug 13, 2014, 6:49:39 PM
to juli...@googlegroups.com
As far as my future web-oriented pet project is concerned, I think I'll stay simple for DB access. ORMs are not perfect but raw SQL queries aren't either (I'd be curious to read Mr Wozniak's next post on ORMs in a few months from now... It's like programming paradigms, they come in and out of fashion pretty easily/quickly).

But those were really just a few examples of the !(number crunching) set. The whole point is that if Julia can be *seen* as a general purpose language instead of a MATLAB replacement, people will naturally come and write things like database handlers or whatever for the others. If, as a user, I think Julia is a good fit for a web server backend, that means that YOU wouldn't have to write a DB handler if you don't need it because I would do it. That also means that then YOU can spend more time on things you actually enjoy/need/want.


Back to the thread: that also means that BLAS, for instance, has to go away should the developer request it at compile time. Same thing for any non-required lib.

But as I understand it, that would currently require keeping a different version of Base for each combination of missing libs (or using conditional branches everywhere). Whereas if everything related to BLAS/libX is in its own Julia module, it's much easier: if you don't need the features provided by a lib, you don't need the corresponding Julia module (for BLAS, it could be named LinAlg.jl). Should you need a few pieces not directly depending on the given lib, you could just reimplement a minimal LinAlg.jl module. Should you find a better implementation of libX with a different API, you could write and release a brand new module as a replacement for the classical one instead of trying to have your pervasive changes to Base.jl included in the next major release in X months from now.
