|Overview of current scoutess architecture and philosophy||Jeremy Shaw||11/3/12 1:47 PM|
There are a lot of questions about why scoutess is the way it is, and whether that is even the way we want it. In this post I will try to outline some of the ideas behind scoutess.
The primary goal of scoutess is to allow project maintainers to automate many of the tedious parts of maintaining a project. The name 'scoutess' was chosen to reflect the idea that a lot of the work done by scoutess is collecting information and creating reports about the findings. Scoutess includes buildbot-type functionality, but is not limited to just buildbot stuff. Additionally, unlike general purpose systems like Jenkins, scoutess is specialized for Cabal and Haskell development. Here are some of the many things we would like scoutess to be able to do:
- watch source repositories and rebuild packages on new commits
- watch for build-dependency changes (on hackage or commits to source repos) and test that your packages still build
- automatically notify you when your upper bounds are out of date (similar to what packdeps does)
- automatically build cross-linked haddock documentation and upload it to your project site
- integration with an ircbot that reports important information
- ability to test builds against multiple versions of the haskell platform and multiple OSes
- automatically check that the *lower* bounds in your .cabal file are still valid
- scan communities like reddit, irc, stackoverflow for relevant discussions that you may want to take part in
- check for reverse dependencies that are broken as a result of changes to your package
- check that a package you upload to hackage is actually going to build (not missing other dependencies)
- notify you if you bumped the .cabal version in source, but forgot to upload a new version
- notify you if you uploaded a new version to hackage but forgot to push your patch to the repo
- commit notification on irc
When looking at the tasks, it is clear that there are a lot of reusable parts. If you want to build the haddock documentation for the latest dev sources, you have to start by checking out the sources. If you want to check that the latest dev sources build, you have to start by doing a checkout.
Now, let's say you want to create a new module that runs hlint on the source and reports coding 'violations'. Once again, you need to start by getting the source.
We would like to design scoutess so that it is very modular and not at all monolithic. We would like to provide a bunch of 'black boxes' that you can wire together in different ways to get customized functionality. If you want to add new functionality, it should be easy to create a new self-contained module that does that.
Implicit external state sucks, make it explicit
One beautiful aspect of purely functional programming is referential transparency. It is much easier to reason about a function when all the information it needs is passed in explicitly via its arguments. Not having to worry about global variables, or about getting a different answer for the same inputs, makes things much easier to understand.
This is especially useful when using libraries that you don't really understand fully.
The problem with something like scoutess is that it involves loads of external, implicit state. For example, whether a darcs repository is up to date or not.
The solution in scoutess is that we always make any implicit dependencies explicit in the type signature. For example, if a function 'foo' requires that a darcs repository be up to date, it should take as an argument a value that can only be produced by first running the function that updates the darcs repository.
The idea is simple: when someone wants to call 'foo', they probably have no idea what needs to be done before 'foo' can be run. And they aren't really going to read the docs to find out either. However, since they can't call 'foo' without all its required arguments, they will have to look for the function that generates the arguments they need to call 'foo'. And, in that way, they will be assured that they always call the prerequisite functions in the correct order.
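Here is a minimal sketch of that idea. All of the names (UpToDateRepo, updateRepo, and the body of foo) are hypothetical and are not taken from the scoutess sources; the update step just prints a message instead of really calling darcs.

```haskell
module Main where

-- Hypothetical names throughout; none of this is copied from scoutess.

-- | Evidence that a darcs repository has been brought up to date.
-- In a real module the constructor would not be exported, so the only
-- way to obtain a value of this type would be to call 'updateRepo'.
newtype UpToDateRepo = UpToDateRepo FilePath

-- | Stand-in for the step that would update the repository on disk.
-- Here it only prints a message (assumption: no real darcs call).
updateRepo :: FilePath -> IO UpToDateRepo
updateRepo dir = do
  putStrLn ("updating " ++ dir)
  return (UpToDateRepo dir)

-- | 'foo' needs an up-to-date repository, and its type says so.
foo :: UpToDateRepo -> IO String
foo (UpToDateRepo dir) = return ("building from " ++ dir)

main :: IO ()
main = do
  repo <- updateRepo "/tmp/my-project"  -- must happen first
  msg  <- foo repo                      -- cannot be called without 'repo'
  putStrLn msg
```

Someone who only knows about 'foo' has to go looking for a function that returns an UpToDateRepo, and that search leads them straight to the prerequisite step.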
In the DataFlow module, we use the Arrow syntax. Yet if you look at the type, you will see that it just uses Kleisli to turn a monad into an Arrow. So what is the point?
The point is really just to make the idea of 'black boxes' being wired together even more explicit. You basically have something like:
outputs <- blackbox arg1 arg2 -< inputs
This clearly shows the output from the blackbox, shows what inputs come from other black boxes, and shows what arguments are just normal arguments. We only use this at the top-level for building configurations.
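As a rough illustration of that shape, here is a small self-contained pipeline. The boxes (fetchSource, build) and the Box type synonym are made up for this example and do not come from Scoutess.DataFlow; the only real library pieces are Kleisli, runKleisli and returnA from Control.Arrow, plus GHC's Arrows extension.

```haskell
{-# LANGUAGE Arrows #-}
module Main where

import Control.Arrow (Kleisli (..), returnA)

-- Hypothetical black boxes; names and types are illustrative only.
type Box a b = Kleisli IO a b

-- | Pretends to fetch the sources for a package and returns a directory.
fetchSource :: Box String FilePath
fetchSource = Kleisli $ \name -> do
  putStrLn ("fetching " ++ name)
  return ("/tmp/" ++ name)

-- | 'flags' is a normal argument; the FilePath input is wired in
-- from another box.
build :: [String] -> Box FilePath String
build flags = Kleisli $ \dir ->
  return ("built " ++ dir ++ " with " ++ unwords flags)

-- | Top-level wiring: output <- box args -< input
pipeline :: Box String String
pipeline = proc name -> do
  dir    <- fetchSource   -< name
  report <- build ["-O2"] -< dir
  returnA -< report

main :: IO ()
main = runKleisli pipeline "my-package" >>= putStrLn
```

Written with plain do-notation this would behave the same way; the arrow syntax only makes the distinction between wired inputs and ordinary arguments visible.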
Haskell as config language
Like xmonad and other projects, we do not have a separate configuration language. Instead your configuration is written in a plain old Haskell file. This is because scoutess configs will often want to apply Haskell functions to generate values.
If you look at Scoutess.DataFlow, you will see that it contains a function 'standard'. This is supposed to represent a fairly standard 'build' process. But it is not intended to be the only build process. It is expected that advanced users will create their own functions similar to this one to describe the kind of workflow they want for different runs.
We are currently thinking a lot about the design of scoutess to ensure that it can actually do all the things we want it to do. We don't want to spend a lot of time writing code, and then find out it can't actually do what we need.
One thing we are working on is creating a list of all the weird and/or useful things that someone might want to do. Then for each thing, we try to identify all the steps that would be needed to do that.
You can see that list here:
Contributing new tricky cases would be incredibly useful.
Prototype with types
As said earlier, we want to make all the state explicit in the type signature. Once we have our list of build cases, and the steps we think are required, we can start turning those into code. But our goal is to largely avoid writing function bodies at first. We want to start by writing down the names of the functions we will need and their types. Basically, create the haddock docs first, and then implement the functions second. The reason behind this is that in trying to write down the types for the functions, you realize what information you are going to need that you didn't think about. If we can get the types correct first, then writing the function bodies will be a lot easier, and we won't have to rewrite as much code.
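A tiny sketch of what that workflow can look like in practice. The types and function names below are invented for the example and are not the real scoutess API; the point is only that the file type-checks, and haddock can document it, before any body is written.

```haskell
module Main where

-- Hypothetical types and functions, for illustration only.
-- The data types have no constructors yet; we only need their names.
data Source
data BuildResult
data Report

-- | Check out (or update) the sources for the named package.
fetch :: String -> IO Source
fetch = undefined

-- | Build previously fetched sources.
build :: Source -> IO BuildResult
build = undefined

-- | Summarize the results of several builds.
summarize :: [BuildResult] -> Report
summarize = undefined

-- | The wiring can already be written and checked by the compiler,
-- even though none of the steps above are implemented.
buildAll :: [String] -> IO Report
buildAll names = do
  results <- mapM (\n -> fetch n >>= build) names
  return (summarize results)

main :: IO ()
main = putStrLn "the prototype type-checks"
```

If 'build' turns out to need more than a Source (say, information from a previous run), that shows up as a type error in 'buildAll' long before any real code has to be rewritten.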
Now we will walk through the 'standard' function and try to figure out what it does, why it does that, and if it is actually going to work.
Here is 'standard' for reference:
First we will look at the type signature:
SourceSpec is a type that tells us where we can find Cabal packages. It is a Set of SourceLocation where a SourceLocation can be an RCS like darcs, or a cabal repository such as hackage.
standard takes two filters, one that filters sources and one that filters versions. We will see what those do in a second. The result of standard is an arrow.
TargetSpec specifies what we actually want to build. For example, SourceSpec might just contain hackage. And TargetSpec would be the specific packages that we want to build. PriorRun provides the information about what happened in previous runs. This is how we can determine if a package needs to be rebuilt or not. Note that when we specify what targets we want to build, there are varying levels of precision available. We might just provide a package name, or a package name and version, or a package name and source location. That is why we need a resolution step -- if we just specify a package name, then we need to look at all the locations and find out what the most recent version available is.
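To make the resolution step concrete, here is a small sketch. Everything in it (Location, Target, Available, matches, resolve, and the sample index) is a hypothetical stand-in rather than the actual scoutess types, and versions are modelled as plain lists of Int to keep the example self-contained.

```haskell
module Main where

import Data.List  (sortOn)
import Data.Maybe (listToMaybe)
import Data.Ord   (Down (..))

-- Hypothetical stand-ins for the real scoutess types.
data Location = Hackage | Darcs String
  deriving (Eq, Show)

type Version = [Int]

-- | A target can be given with varying levels of precision.
data Target
  = ByName String
  | ByNameVersion String Version
  | ByNameLocation String Location
  deriving (Eq, Show)

-- | One concrete package version found at one location.
data Available = Available
  { avName     :: String
  , avVersion  :: Version
  , avLocation :: Location
  } deriving (Eq, Show)

-- | Does an available package satisfy the target?
matches :: Target -> Available -> Bool
matches (ByName n)           a = avName a == n
matches (ByNameVersion n v)  a = avName a == n && avVersion a == v
matches (ByNameLocation n l) a = avName a == n && avLocation a == l

-- | Resolution: among everything that matches, pick the newest version.
resolve :: [Available] -> Target -> Maybe Available
resolve avail t =
  listToMaybe (sortOn (Down . avVersion) (filter (matches t) avail))

-- | Made-up sample data.
index :: [Available]
index =
  [ Available "foo" [1, 2, 0]  Hackage
  , Available "foo" [1, 10, 0] (Darcs "http://example.org/foo")
  , Available "bar" [0, 1]     Hackage
  ]

main :: IO ()
main = do
  print (resolve index (ByName "foo"))
  print (resolve index (ByNameLocation "foo" Hackage))
  print (resolve index (ByNameVersion "bar" [9, 9]))
```

A bare package name has to consider every location, while the more precise forms narrow the candidates before the newest version is chosen.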
The result is a BuildReport that summarizes the findings.
Next we have: