binary dependencies


Daniel Vigovszky

Aug 31, 2015, 7:20:39 AM
to haskell-stack
Hi all,

I've been trying out stack and would love to use it in our projects at Prezi, so I've been thinking about how we could integrate it into our workflow. To understand my proposal, let me first explain what we do today. We have several Haskell projects that depend on each other, often also consuming and producing artifacts for other languages (such as C++ and Haxe). To fit our projects into the company's build infrastructure, we developed a Gradle plugin for Haskell development and use it for all our Haskell projects, running them both on our development machines and on Jenkins nodes. All these builds produce binary artifacts for Linux and OS X, stored on an Artifactory server. The plugin uses Gradle's dependency management to fetch these binary artifacts, and wraps cabal to set up the appropriate package database chain.
The idea is that each binary dependency is a GHC package database. To build a project, we pass all its dependencies' package DBs, in the appropriate order, on top of the global package DB (we skip the user package DB). Our "bottom" project (the one that goes immediately on top of the global package DB in every build) is a special project that only installs commonly used libraries and has no custom code, a bit similar to stack's snapshots.

Stack would fit into this system quite well if we had a way to put additional package DB layers between the snapshot DB and the local package DB. If a few constraints are enforced, this would not break the reproducibility of stack builds:
- all the binary deps must be built against the same snapshot
- they must be layered on each other in the correct order
- they must be built on the same platform (OS version and GHC version)

If we assume that these constraints are enforced by the user (in our case, by our Gradle plugin), it would be easy to add a new block to stack.yaml, such as 'extra-bin-deps' or 'extra-package-dbs', and simply use the listed package DBs in addition to the snapshot and local DBs everywhere.
A fuller solution would be to give binary dependencies first-class support in stack; in that case it could package its local package DB together with metadata, and use those packages to work out the correct layering order and do more checks.
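To make the proposal concrete, a stack.yaml using the suggested block might look something like this (the key name 'extra-package-dbs', the resolver, and the paths are purely illustrative; this option does not exist in stack today):

resolver: lts-3.1
packages:
- '.'
# Binary dependencies produced by earlier builds. Order matters: each DB
# may only depend on the global/snapshot DBs and the DBs listed before it.
extra-package-dbs:
- /artifacts/lib1/packages
- /artifacts/lib2/packages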

So, what do you think? If this is something that could be added to stack, I'd of course be happy to work on it. I'm also interested in other ideas for how we could use stack in our build environment without giving up our binary artifacts.

Michael Snoyman

Aug 31, 2015, 8:20:16 AM
to Daniel Vigovszky, haskell-stack
There are a number of ways a package database added to the database set could in theory break invariants: it's built from different source code, has different Cabal flags or GHC options set, etc. However, I'd agree with leaving it in the user's hands not to mess that up. stack already handles this situation very well for the global package database; as I see it, we're essentially talking about extending the concept of a global package database to a global DB plus zero or more global-ish databases that are included by all snapshots. Is that accurate?

All that said, I'm not sure what advantage this has over just caching the ~/.stack directory itself. That's what the Travis instructions[1] do, and it results in very fast builds in practice. Would that work for you?
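For reference, the caching part of those Travis instructions boils down to a small fragment in .travis.yml along these lines (sketch; $HOME/.stack is stack's default root directory):

cache:
  directories:
  - $HOME/.stack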

In theory, if we have such an "extra global" concept, we could easily extend it with the ability to download these bundles remotely, and stack would get downloadable binary caches, which has been an open feature request for a while[2].


Daniel Vigovszky

Sep 1, 2015, 3:42:49 AM
to haskell-stack, mic...@snoyman.com
I think we're not on the same page yet; let me try to show a small example (using cabal).

Let's say we have three projects, lib1, lib2 and app, with the following (stripped) contents:

lib1.cabal:

name:    lib1
version: 0.1
library
  exposed-modules:     Lib1
  build-depends:       base >=4.7 && <4.8

Lib1.hs:

module Lib1 where

hello :: String -> String
hello name = "hello " ++ name

lib2.cabal:

name:                lib2
version:             0.3
library
  exposed-modules:     Lib2
  build-depends:       base >=4.7 && <4.8
                     , ansi-wl-pprint ==0.6.*
                     , lib1 ==0.1.*

Lib2.hs:

module Lib2 where

import Text.PrettyPrint.ANSI.Leijen

printGreeting :: String -> IO ()
printGreeting greeting = putDoc $ text greeting <> linebreak

app.cabal:

name: app
executable app
  main-is:             Main.hs
  build-depends:       base >=4.7 && <4.8
                     , lib1 ==0.1.*
                     , lib2 ==0.3

Main.hs:

import Lib1
import Lib2

main :: IO ()
main = do
  printGreeting $ hello "world"

Now, stack could build this if all three projects were put into the same stack project, assuming that ansi-wl-pprint is available in the snapshot used.
But with my binary dependency approach, you can compile the three projects separately, with the following package database chaining:

lib1: --package-db=clear --package-db=global --package-db=lib1-local-db
lib2: --package-db=clear --package-db=global --package-db=lib1-local-db --package-db=lib2-local-db
app: --package-db=clear --package-db=global --package-db=lib1-local-db --package-db=lib2-local-db --package-db=app-local-db

In this case, every '*-local-db' package database would only contain the packages needed to build that particular project, assuming that all the dependent projects' package databases are passed too, as above:

lib1/build/sandbox/packages
   lib1-0.1.0.0

lib2/build/sandbox/packages
   ansi-terminal-0.6.2.1
   ansi-wl-pprint-0.6.7.2
   lib2-0.3

and app's DB is empty (because it only builds an executable).

If I understand stack correctly, the only difference would be that the '--package-db=clear --package-db=global' part is managed by stack and extended with the snapshot package database, and that the local package database at the end of each chain is also managed by stack. So we'd only need a way to pass additional package databases in the middle of this chain.

Michael Snoyman

Sep 1, 2015, 6:57:36 AM
to Daniel Vigovszky, haskell-stack
That is different from what I'd originally understood. And yes, adding the ability for stack to take arbitrary additional databases on top of the global database should solve it, and I see no problem with that change.

However, just to throw out one more idea: if the situation is that you have a few proprietary packages on top of the Stackage snapshots, you could create a custom snapshot that includes those extra packages, compile them into your .stack directory, and then cache that entire directory. Or, if you want to go more extreme: create a Docker image with all of the snapshot libraries precompiled, so that both CI and devs get to use the precompiled packages. (In theory, the extra-database approach you're talking about could also be extended to allow usage of precompiled binary databases.)
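For the Docker variant, stack's docker integration can be pointed at such an image from stack.yaml; a sketch, with a purely illustrative image name:

docker:
  enable: true
  image: mycompany/stack-build:lts-3.1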

Daniel Vigovszky

Sep 1, 2015, 8:43:40 AM
to haskell-stack, mic...@snoyman.com
These are valid ideas, but they don't really fit our use case, because we have to fit the whole Haskell part into our Gradle build system. So in our case, support for extra databases on the stack side would already mean "precompiled binary databases", because Gradle would pack, publish, and download them on top of it. We are also using Docker, but in a different way, and mixing the two approaches would be difficult.
So if you agree, I'll try to create a pull request adding support for extra package databases, and then we can move on to testing the integration. Is that OK?

Michael Snoyman

Sep 1, 2015, 8:46:59 AM
to Daniel Vigovszky, haskell-stack
Yup, absolutely, sounds great. Let me know if you need any guidance. It should mostly be a matter of:

* In Stack.Build.Installed, check these extra databases after the global (see: loadDatabase')
* In Stack.Types.Build.configureOptsDir, include the additional --package-db calls

There may be other things I'm forgetting right now...