Compilation through Scion

Thomas ten Cate

Jun 27, 2009, 8:18:46 AM

to Development discussion, scion-l...@googlegroups.com

Hi Leif and nominolo and whoever else is lurking,

I've been thinking about compilation and building of Haskell projects.
Currently, EclipseFP invokes "ghc --make" (or another compiler) for
the project's main executable through a command line, and parses its
output. There are several problems with this approach:
1. Output parsing is a hack.
2. The output parsing in the current version was done through Cohatoe,
but is now commented out because I wanted to remove that dependency.
This would need to be rewritten in Java. But it's only ~100 lines of
Haskell code, so it wouldn't be too bad.
3. There is no progress report until the entire thing is done.
Especially in large projects with many source files, this would be a
problem.
4. The files that are currently open in editors need to be loaded
(parsed/desugared/typechecked) into Scion anyway, so it's duplicate
effort.
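To make point 2 concrete, here is a minimal Java sketch of what a replacement for the Cohatoe-based parser might look like. It assumes the GHC 6.x layout of `ghc --make` diagnostics: a `file:line:col:` header, optionally followed by indented continuation lines, with warnings whose text starts with `Warning:`. The class and method names are hypothetical, and the message format differs between GHC versions, so the pattern is an assumption to check against real compiler output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical line-oriented parser for "ghc --make" diagnostics.
 * Assumes a "file:line:col:" header, optionally followed by indented
 * continuation lines that belong to the same message.
 */
final class GhcMessageParser {

    /** One error or warning at a source location. */
    static final class Message {
        final String file;
        final int line;
        final int column;
        final boolean warning;
        final String text;

        Message(String file, int line, int column, boolean warning, String text) {
            this.file = file;
            this.line = line;
            this.column = column;
            this.warning = warning;
            this.text = text;
        }
    }

    // Headers start at column 0; the lazy file group lets "C:\src\Foo.hs:3:1:" split correctly.
    private static final Pattern HEADER =
            Pattern.compile("^(\\S.*?):(\\d+):(\\d+):\\s*(.*)$");

    private final List<Message> messages = new ArrayList<>();
    private String file;
    private int line;
    private int column;
    private StringBuilder text; // null while no message is being collected

    /** Feeds one line of compiler output, e.g. as it is read from the stream. */
    void feed(String outputLine) {
        Matcher m = HEADER.matcher(outputLine);
        if (m.matches()) {
            flush();
            file = m.group(1);
            line = Integer.parseInt(m.group(2));
            column = Integer.parseInt(m.group(3));
            text = new StringBuilder(m.group(4));
        } else if (text != null && !outputLine.isEmpty()
                && Character.isWhitespace(outputLine.charAt(0))) {
            // Indented continuation line of the current message.
            if (text.length() > 0) {
                text.append('\n');
            }
            text.append(outputLine.trim());
        } else {
            // Blank lines and other output ("[1 of 2] Compiling ...") end a message.
            flush();
        }
    }

    /** Emits the last pending message and returns everything parsed so far. */
    List<Message> finish() {
        flush();
        return messages;
    }

    private void flush() {
        if (text != null) {
            String body = text.toString();
            messages.add(new Message(file, line, column, body.startsWith("Warning:"), body));
            text = null;
        }
    }
}
```

Because `feed` takes one line at a time, the same object can sit on a live output stream instead of waiting for the whole build to finish.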

I've also been thinking about alternatives, involving Scion.

It might be possible to extend Scion with a couple of "compile"
commands, that produce .hi and .o files on disk. But this also has its
problems:
- Scion commands are executed sequentially, due to the
single-threadedness of GHC. This means that all UI features that
depend on Scion would stop working during compilation. However, the
commands that do not interface with GHC (e.g. lookups in precomputed
Maps) could theoretically be done in parallel with compiling. Scion
would need to be modified to be multithreaded, and it sounds a little
scary.
- The GHC API function compileCoreToObj that we would probably need
"has only so far been tested with a single self-contained module",
which doesn't give me much hope for the stability and usability of
this part of the API.
- EclipseFP would become nearly useless without Scion.

All in all, I'm not sure if this is the way to go.

Also, I'm not sure yet if, and how, Cabal fits into this picture.

This is a fairly big issue, and I'm not sure I grasp all the
implications. What are your thoughts on this?

Thomas

Thomas Schilling

Jun 28, 2009, 3:14:22 PM

to Thomas ten Cate, Development discussion, Leif Frenzel, Duncan Coutts, scion-l...@googlegroups.com

On 27 Jun 2009, at 17:12, Thomas ten Cate wrote:

> All very good points Leif, thanks! Right after I sent my previous
> e-mail I rewrote the output parser in Java. I will modify it to work
> on streams, and to clear and create markers whenever the corresponding
> output scrolls by. This sounds like the best option.
>
> What is currently in the way of invoking "cabal build" instead of
> "ghc --make"?
>
> Thomas
>
> On Sat, Jun 27, 2009 at 17:21, Leif Frenzel<him...@leiffrenzel.de> wrote:
>> Hi Thomas,

>>
>>> I've been thinking about compilation and building of Haskell projects.
>>> Currently, EclipseFP invokes "ghc --make" (or another compiler) for
>>> the project's main executable through a command line, and parses its
>>> output. There are several problems with this approach:
>>> 1. Output parsing is a hack.
>>> 2. The output parsing in the current version was done through Cohatoe,
>>> but is now commented out because I wanted to remove that dependency.
>>> This would need to be rewritten in Java. But it's only ~100 lines of
>>> Haskell code, so it wouldn't be too bad.

>> I had the long-term hope to replace running the compiler directly with
>> running Cabal on the .cabal file. The output parsing would have been
>> pretty similar, though. Normally it's a requirement for an IDE to ensure
>> independence of the build process, i.e. you should be able to run a
>> build even without the IDE, but of course, when you are in the IDE, it
>> can have all sorts of extra convenience. So in the long run, being able
>> to run Cabal from within EclipseFP easily and seeing the results would
>> be a useful thing. (And for that, you'd need at least console linking,
>> if not direct transformation of error/warning messages into markers.)

Scion can do everything that "ghc --make" can. Generating binaries
just requires setting a different hscTarget and ghcLink options (see
Scion.Session.initialScionDynFlags and GHC.DynFlags). The problem is
that if we need to change static flags (e.g., enable profiling or
threaded builds) then we need to start a new process. Feel free to
talk to Simon Marlow on the #ghc IRC channel (his nick is JaffaCake)
or the ghc-cvs mailing list for hints on how much work that would be.
GHCi uses quite a lot of global variables as well, but that's just
another GHC API client, so we don't need to worry about that.

In GHC 6.11/12 I added an optional callback to the --make facility.
The callback is called after compiling each module with the messages
generated from it. Ideally, the GHC API shouldn't contain its own
make-facility, but should just provide the functionality to compile a
single module. The 'ghc' command line utility and other API clients can
then use another library for providing --make functionality (and other
preprocessing tasks). This library is being worked on, but it's too
far from production-ready to use it at this point.

>>
>> So maybe you could consider this as an option. It might also help with
>> the progress reporting and threading issues you mention below.

>>
>>> 3. There is no progress report until the entire thing is done.
>>> Especially in large projects with many source files, this would be a
>>> problem.

>> Indeed. Although I think that reading the output stream and at least
>> reporting back which files have been processed, and with which
>> messages, should be possible, right? It would require an extra thread
>> that consumes the messages and notifies the progress monitor and/or
>> generates markers.

A simple solution would be to just print the GHC output to a console.
This can be done by calling the 'ghc' command line utility and just
showing its output in some window. Alternatively, you can set the
'log_action' callback from within the GHC API.
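Leif's suggestion of an extra thread that consumes the output might look roughly like the following Java sketch: every line is forwarded (for example to a console), and progress lines are reported separately. It assumes `ghc --make` announces each module with a line of the form `[n of m] Compiling Module ( ... )`, where `n` may be padded with spaces. `OutputPump` and `ProgressListener` are hypothetical names; in EclipseFP the listener would presumably wrap Eclipse's progress monitor.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.function.Consumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical background consumer for compiler output. Each line goes to
 * lineSink; "[n of m] Compiling M" lines are also reported as progress.
 */
final class OutputPump extends Thread {

    /** Receives one call per module that the compiler starts on. */
    interface ProgressListener {
        void compiling(int current, int total, String module);
    }

    private static final Pattern PROGRESS =
            Pattern.compile("^\\[\\s*(\\d+) of (\\d+)\\] Compiling (\\S+).*$");

    private final BufferedReader reader;
    private final Consumer<String> lineSink;
    private final ProgressListener progress;

    OutputPump(Reader output, Consumer<String> lineSink, ProgressListener progress) {
        super("ghc-output-pump");
        setDaemon(true); // never keep the IDE alive just for this thread
        this.reader = new BufferedReader(output);
        this.lineSink = lineSink;
        this.progress = progress;
    }

    @Override
    public void run() {
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                lineSink.accept(line);
                Matcher m = PROGRESS.matcher(line);
                if (m.matches()) {
                    progress.compiling(Integer.parseInt(m.group(1)),
                            Integer.parseInt(m.group(2)), m.group(3));
                }
            }
        } catch (IOException e) {
            lineSink.accept("error reading compiler output: " + e.getMessage());
        }
    }
}
```

One way to feed it is to start the compiler with `new ProcessBuilder("ghc", "--make", "Main.hs").redirectErrorStream(true).start()` and pass `new InputStreamReader(process.getInputStream())`, so that the messages GHC writes to stderr arrive on the same stream as the progress lines.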

>>
>>> 4. The files that are currently open in editors need to be loaded
>>> (parsed/desugared/typechecked) into Scion anyway, so it's duplicate
>>> effort.
>>>
>>> I've also been thinking about alternatives, involving Scion.
>>>
>>> It might be possible to extend Scion with a couple of "compile"
>>> commands, that produce .hi and .o files on disk. But this also has its
>>> problems:
>>> - Scion commands are executed sequentially, due to the
>>> single-threadedness of GHC. This means that all UI features that
>>> depend on Scion would stop working during compilation. However, the
>>> commands that do not interface with GHC (e.g. lookups in precomputed
>>> Maps) could theoretically be done in parallel with compiling. Scion
>>> would need to be modified to be multithreaded, and it sounds a little
>>> scary.

>> Yes, correct.

Using two concurrent Scion-servers shouldn't be such a big deal. The
question is whether they should be managed transparently by Scion or
by the front-end. I'm leaning towards the former.

>>
>>> - The GHC API function compileCoreToObj that we would probably need
>>> "has only so far been tested with a single self-contained module",
>>> which doesn't give me much hope for the stability and usability of
>>> this part of the API.

I don't think that function is meant for that purpose. The
compile/typecheck/loadModule functions do generate output if the
DynFlags are set correctly. I think that's what you should use.

>>> - EclipseFP would become nearly useless without Scion.

>> Yes, though that might be a price worth paying, especially if many
>> other UI functions rely on Scion anyway.

I agree. If you don't want to rely on Scion, then you're doomed to
reimplement its functionality in the frontend, which is exactly the
problem that Scion tries to solve.

If Scion is too unreliable then it should be fixed. Thomas, do you
have any concrete concerns?

>>> Also, I'm not sure yet if, and how, Cabal fits into this picture.

Cabal provides facilities to:

- parse .cabal files
- find build-dependencies such as the command line "ghc" and
preprocessors
- resolve package dependencies

The above features are needed to "configure" a .cabal project.

The somewhat unsatisfactory feature is its build support. This part
is pretty much single-threaded, because it relies on 'ghc --make' and
has very simple preprocessing facilities.

Another problem is that Cabal doesn't yet play well as a library,
because on any error it tries to exit the program.

I CC'd Duncan, Cabal's main developer, because he probably has some
thoughts on that matter.

/ Thomas
--
Push the envelope. Watch it bend.

Thomas ten Cate

Jun 29, 2009, 6:08:28 AM

to Thomas Schilling, Development discussion, Leif Frenzel, Duncan Coutts, scion-l...@googlegroups.com

On Sun, Jun 28, 2009 at 21:14, Thomas Schilling<nomi...@googlemail.com> wrote:
> Scion can do everything that "ghc --make" can. Generating binaries just
> requires setting a different hscTarget and ghcLink options (see
> Scion.Session.initialScionDynFlags and GHC.DynFlags). The problem is that
> if we need to change static flags (e.g., enable profiling or threaded
> builds) then we need to start a new process.

But these kinds of flags require a full recompile anyway, right? In
that case, the overhead of restarting the Scion server is negligible.

> In GHC 6.11/12 I added an optional callback to the --make facility. The
> callback is called after compiling each module with the messages generated
> from it.

That would be good. We could keep the client up to date about
compilation status. It would sort of break the request/response model
that we currently have, but that is not a problem.

>>> So maybe you could consider this as an option. It might also help with
>>> the progress reporting and threading issues you mention below.

I reimplemented the output parser in Java for the time being. It works.

> A simple solution would be to just print the GHC output to a console. This
> can be done by calling the 'ghc' command line utility and just showing its
> output in some window. Alternatively, you can set the 'log_action' callback
> from within the GHC API.

Yes, GHC output is *also* printed to a console in the current
EclipseFP version. But then we don't have error highlighting and such,
which is why it needs to be parsed. And consider automatic background
recompiling; it's no good to distract the user with a scrolling
console every time some file is saved.
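For silent background rebuilds, one possible approach, in the spirit of clearing and creating markers as the output scrolls by, is to drop a file's old problems as soon as GHC announces that it is recompiling that file. This is a minimal sketch assuming the `[n of m] Compiling M ( M.hs, ... )` progress format; `ProblemTable` is a hypothetical stand-in for Eclipse markers, not EclipseFP code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical per-file problem table for background builds. A "Compiling"
 * progress line clears the stale problems of the file being rebuilt; new
 * problems are added as the parser reports them.
 */
final class ProblemTable {
    // "[1 of 2] Compiling Main             ( Main.hs, Main.o )" -> "Main.hs"
    private static final Pattern COMPILING =
            Pattern.compile("^\\[\\s*\\d+ of \\d+\\] Compiling \\S+\\s+\\(\\s*([^,]+),.*$");

    private final Map<String, List<String>> problemsByFile = new HashMap<>();

    /** Returns true if the line announced a recompilation (and cleared that file). */
    boolean onOutputLine(String line) {
        Matcher m = COMPILING.matcher(line);
        if (!m.matches()) {
            return false;
        }
        problemsByFile.remove(m.group(1).trim());
        return true;
    }

    /** Records one problem, e.g. a parsed error or warning, for a source file. */
    void addProblem(String file, String text) {
        problemsByFile.computeIfAbsent(file, f -> new ArrayList<>()).add(text);
    }

    /** Read-only view of the current problems for a file (empty if none). */
    List<String> problems(String file) {
        return Collections.unmodifiableList(
                problemsByFile.getOrDefault(file, Collections.<String>emptyList()));
    }
}
```

In this sketch, files that GHC does not recompile keep their previous entries, so nothing is shown in a console and the highlighting only changes for files that were actually rebuilt.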

> Using two concurrent Scion-servers shouldn't be such a big deal. The
> question is whether they should be managed transparently by Scion or by the
> front-end. I'm leaning towards the former.

If we were to have two separate instances, would they be able to use
each other's compiled results? In other words, will the "load" command
load the compiled result from disk, if it is present and up-to-date?

If this is not the case, then two instances again seems like we're
compiling everything twice, which is exactly what I would like to
prevent.

> I don't think that function is meant for that purpose. The
> compile/typecheck/loadModule do generate output if the DynFlags are set
> correctly. I think that's what you should use.

That's great! Scion would just need one extra command to change the
hscTarget and maybe ghcLink. Or, better IMHO, add a parameter to the
"load" command to indicate whether code generation should be done.
Makes the whole thing a bit less stateful and easier to manage for the
client.
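A request that carries the code-generation choice with it, instead of relying on earlier session state, might look like the sketch below. Only the method name `load` comes from this thread; the JSON field names (`id`, `method`, `params`, `component`, `generateCode`) are hypothetical placeholders, not Scion's actual wire format.

```java
/**
 * Hypothetical client-side request builder. The field names and the
 * "generateCode" parameter are illustrative assumptions, not Scion's protocol.
 */
final class LoadRequest {
    private LoadRequest() {}

    /** Builds a single self-contained "load" request. */
    static String toJson(int id, String component, boolean generateCode) {
        return "{\"id\":" + id
                + ",\"method\":\"load\""
                + ",\"params\":{\"component\":" + quote(component)
                + ",\"generateCode\":" + generateCode + "}}";
    }

    /** Minimal JSON string escaping: quotes, backslashes, control characters. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }
}
```

Since every request states whether code should be generated, the client never has to remember which target mode the server was last switched to.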

> If Scion is too unreliable then it should be fixed. Thomas, do you have any
> concrete concerns?

Not regarding the reliability of Scion, no :) But I'm slightly worried
that pinning ourselves down on Scion too much will also create a
dependency on GHC. I think we can assume that the user has GHC
installed, and thus can install Scion too. However, I think we should
*not* assume that everyone always wants to compile their code through
GHC. E.g. even when developing on GHC, people may want to test their
code using a different compiler, to check that it works for everyone.

> The somewhat unsatisfactory feature is its build support. This part is
> pretty much single-threaded, because it relies on 'ghc --make' and has very
> simple preprocessing facilities.
>
> Another problem is that Cabal doesn't play well as a library, yet, because
> any error tries to exit the program.
>
> I CC'd Duncan, Cabal's main developer, because he probably has some thoughts
> on that matter.

Well, as Duncan replied, this will be fixed with Cabal 1.8 / GHC 6.12,
which is great! Combined with nominolo's new "ghc --make" hook, we
could call Distribution.Simple.Build.build and keep the client up to
date as well.

Distribution.Simple.Configure.configure would also need to be run at
some point, but there we can probably get away with piping the output
to a console. By parsing and checking the Cabal file first
(Distribution.PackageDescription.Parse and
Distribution.PackageDescription.Check) we could catch many errors in
advance and highlight them in the IDE.

So we now have several alternatives for building stuff from the IDE:
1. invoke ghc --make through a command line, parse output
2. invoke ghc --make through Scion
3. invoke cabal configure && cabal build through a command line, parse
output (does not necessarily use GHC)
4. invoke cabal configure && cabal build through Scion (does this
necessarily use GHC...?)
5. invoke some other compiler through the command line, parse output

1 or 2 are good options in case no Cabal file is available. This would
be the case for simple programs, for example in education. In these
cases, the IDE needs to figure out (or be told) what the main file(s)
is/are, and the main function therein (of course, this should default
to Main.main). Currently, EclipseFP does not do this, but simply
invokes "ghc --make" on every file. GHC flags should be configurable
from the IDE (they already are).

3 or 4 would be good for package developers. The options to be passed
to "cabal configure" need to be configurable from the IDE, and there
should be a way to specify different sets of options (build
configurations).

5 is just something to be designed for: I will probably not implement
support for any other compilers, but the design should allow for it.

So there can be different "builders" for a project (GHC Builder, Cabal
Builder), and every builder can have multiple "configurations" (flag
settings, etc.). This is something that Eclipse supports nicely.
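The builder/configuration split can be sketched in a few Java types. This only models the command lines for alternatives 1 and 3 above (`ghc --make` on a main file, and `cabal configure` followed by `cabal build`); the type names are hypothetical and deliberately do not use Eclipse's own builder API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical sketch of per-project builders with named configurations.
 * Only the command lines are modelled; running them and parsing the
 * output is left to the caller.
 */
final class Builders {

    /** A named set of flags, e.g. "debug" or "profiling". */
    static final class Configuration {
        final String name;
        final List<String> flags;

        Configuration(String name, String... flags) {
            this.name = name;
            this.flags = Arrays.asList(flags);
        }
    }

    interface Builder {
        /** Commands to run in order; a caller would stop at the first failure. */
        List<List<String>> commands(Configuration config);
    }

    /** Alternative 1: "ghc --make" on the main file, no .cabal file needed. */
    static final class GhcBuilder implements Builder {
        private final String mainFile;

        GhcBuilder(String mainFile) {
            this.mainFile = mainFile;
        }

        @Override
        public List<List<String>> commands(Configuration config) {
            List<String> cmd = new ArrayList<>(Arrays.asList("ghc", "--make"));
            cmd.addAll(config.flags);
            cmd.add(mainFile);
            return Collections.singletonList(cmd);
        }
    }

    /** Alternative 3: "cabal configure" with the configuration's flags, then "cabal build". */
    static final class CabalBuilder implements Builder {
        @Override
        public List<List<String>> commands(Configuration config) {
            List<String> configure = new ArrayList<>(Arrays.asList("cabal", "configure"));
            configure.addAll(config.flags);
            List<List<String>> steps = new ArrayList<>();
            steps.add(configure);
            steps.add(Arrays.asList("cabal", "build"));
            return steps;
        }
    }
}
```

A project could then hold one `Builder` plus several `Configuration`s, and switching configurations only changes which flags end up on the command line.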

Thomas

Thomas Schilling

Jun 29, 2009, 6:50:17 AM

to Thomas ten Cate, Development discussion, Leif Frenzel, scion-l...@googlegroups.com, Duncan Coutts

On 29 Jun 2009, at 11:08, Thomas ten Cate wrote:

> On Sun, Jun 28, 2009 at 21:14, Thomas Schilling<nomi...@googlemail.com> wrote:
>> Scion can do everything that "ghc --make" can. Generating binaries just
>> requires setting a different hscTarget and ghcLink options (see
>> Scion.Session.initialScionDynFlags and GHC.DynFlags). The problem is that
>> if we need to change static flags (e.g., enable profiling or threaded
>> builds) then we need to start a new process.
>
> But these kinds of flags require a full recompile anyway, right? In
> that case, the overhead of restarting the Scion server is negligible.

Right, the issue is the overhead of needing to use inter-process
communication, which again requires some wire protocol. Then again,
since we now use JSON everywhere, this has become less of an issue.

>
>> In GHC 6.11/12 I added an optional callback to the --make facility. The
>> callback is called after compiling each module with the messages generated
>> from it.
>
> That would be good. We could keep the client up to date about
> compilation status. It would sort of break the request/response model
> that we currently have, but that is not a problem.

Yes, I'm not sure whether that's a good idea. At the very least there
should be an option to turn it on/off. Some clients might not like
these asynchronous events and prefer a poll-based model (e.g., a web-
browser-based client).
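One way to offer both delivery styles behind an on/off switch is a small buffer that pushes events while a listener is registered and otherwise queues them until the client polls. The idea does not depend on the implementation language; it is sketched here in Java with hypothetical names, and the event strings stand in for whatever per-module messages the --make callback would produce.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical notification channel supporting both delivery styles:
 * with a listener registered, events are pushed as they occur; without
 * one, they are buffered until the client calls poll().
 */
final class CompilationEvents {
    private final List<String> pending = new ArrayList<>();
    private Consumer<String> listener; // null means poll mode

    /** Switches to push mode (non-null) or back to poll mode (null). */
    synchronized void setListener(Consumer<String> listener) {
        this.listener = listener;
        if (listener != null) {
            // Hand over anything that accumulated while in poll mode.
            for (String e : pending) {
                listener.accept(e);
            }
            pending.clear();
        }
    }

    /** Called by the compilation side, e.g. once per finished module. */
    synchronized void publish(String event) {
        if (listener != null) {
            listener.accept(event);
        } else {
            pending.add(event);
        }
    }

    /** Returns and clears all buffered events (poll mode). */
    synchronized List<String> poll() {
        List<String> out = new ArrayList<>(pending);
        pending.clear();
        return out;
    }
}
```

A poll-based client such as a browser front-end would simply never register a listener and call `poll()` whenever it refreshes.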

>>>> So maybe you could consider this as an option. It might also help with
>>>> the progress reporting and threading issues you mention below.
>
> I reimplemented the output parser in Java for the time being. It works.

Right, it's probably not too difficult.

>
>> A simple solution would be to just print the GHC output to a console. This
>> can be done by calling the 'ghc' command line utility and just showing its
>> output in some window. Alternatively, you can set the 'log_action' callback
>> from within the GHC API.
>
> Yes, GHC output is *also* printed to a console in the current
> EclipseFP version. But then we don't have error highlighting and such,
> which is why it needs to be parsed. And consider automatic background
> recompiling; it's no good to distract the user with a scrolling
> console every time some file is saved.
>
>> Using two concurrent Scion-servers shouldn't be such a big deal. The
>> question is whether they should be managed transparently by Scion or by the
>> front-end. I'm leaning towards the former.
>
> If we were to have two separate instances, would they be able to use
> each other's compiled results? In other words, will the "load" command
> load the compiled result from disk, if it is present and up-to-date?
>
> If this is not the case, then two instances again seems like we're
> compiling everything twice, which is exactly what I would like to
> prevent.

Ah! I think that should be possible. If Scion is in HscNothing mode
and you give it the paths to your .hi and .o files, it will *use* them
but not *overwrite* them. So it should be possible to have two GHC
instances: one that is permanent and is used for background
compilation, and one that gets started whenever a re-build is required.

>> I don't think that function is meant for that purpose. The
>> compile/typecheck/loadModule do generate output if the DynFlags are set
>> correctly. I think that's what you should use.
>
> That's great! Scion would just need one extra command to change the
> hscTarget and maybe ghcLink. Or, better IMHO, add a parameter to the
> "load" command to indicate whether code generation should be done.
> Makes the whole thing a bit less stateful and easier to manage for the
> client.

Yeah, well. I did fix some bugs related to automatically recompiling
previously loaded files if they were built using the wrong mode, but
I'm not sure I caught all of the issues. ATM, I would reset the
session when switching targets.

>
>> If Scion is too unreliable then it should be fixed. Thomas, do you have any
>> concrete concerns?
>
> Not regarding the reliability of Scion, no :) But I'm slightly worried
> that pinning ourselves down on Scion too much will also create a
> dependency on GHC. I think we can assume that the user has GHC
> installed, and thus can install Scion too. However, I think we should
> *not* assume that everyone always wants to compile their code through
> GHC. E.g. even when developing on GHC, people may want to test their
> code using a different compiler, to check that it works for everyone.

Fair enough. I had some requests for a back-end independent Scion.
Maybe it would be worth considering implementing this abstraction in
Scion itself. The argument is, again, duplication of effort in the
front-ends. For things like output parsing, it should be fairly
straightforward to interact with command line programs from Scion and
parse the text output there, then send it over to the client. In any
case, this requires a fair amount of design work first.

Right, 2 or 4 is what Scion currently does, depending on whether
there's a Cabal file present. 4 does not necessarily require GHC; in
fact, calling cabal build through Scion wouldn't even work at the
moment, since Cabal provides no way to call back into a running
instance of GHC. Scion currently emulates Cabal's build commands by
using its preprocessing features and collecting the command line flags
from Cabal (that's pretty much a hack, though).

>
> 1 or 2 are good options in case no Cabal file is available. This would
> be the case for simple programs, for example in education. In these
> cases, the IDE needs to figure out (or be told) what the main file(s)
> is/are, and the main function therein (of course, this should default
> to Main.main). Currently, EclipseFP does not do this, but simply
> invokes "ghc --make" on every file. GHC flags should be configurable
> from the IDE (they already are).
>
> 3 or 4 would be good for package developers. The options to be passed
> to "cabal configure" need to be configurable from the IDE, and there
> should be a way to specify different sets of options (build
> configurations).
>
> 5 is just something to be designed for, because I will probably not
> implement support for any other compilers.
>
> So there can be different "builders" for a project (GHC Builder, Cabal
> Builder), and every builder can have multiple "configurations" (flag
> settings, etc.). This is something that Eclipse supports nicely.

I agree that this is a reasonable abstraction. I just think that
Eclipse isn't the only client that would like to have this
flexibility, so I was hoping that such things could go into Scion and
thus be shared by other clients. The Eclipse classes would then just
call out to the different server commands. I don't know any details,
so I'm just guessing here.