the removed file problem

Evan Laforge

Apr 4, 2018, 2:16:20 PM
to Shake build system
I have a strong feeling that this has already been discussed, and there was
a solution, though the solution had other side-effects that made me not
want to do it. But I just looked through the archives and I couldn't find
the discussion, so here it is again.

The problem is that when a file is moved or removed, shake doesn't know
to delete its output files. I think the solution was something like
tracking all the inputs explicitly... actually I don't really remember it.

The problem with leaving stale .o files is that you don't catch un-updated
import lines, since ghc will happily link in the stale .o file. Eventually
you get some mysterious crash or if you're lucky some terrible link error
due to duplicated symbols, and if you're still lucky you think to try
rebuilding from scratch before anything else. To avoid all this, I
instinctively rebuild from scratch whenever I delete or move a module.
This is sort of ok locally, but it means CI can't cache shake output and be
reliable.

The other problem I think is GHC-related, which is that even if I do update
all imports, I can still get link errors. I also get that "unknown modules
below me" thing from ghci.

I haven't really figured out exactly what's going on with this second
problem, but my guess is that bits of the old module are still inlined into
other modules. Though it's surely recompiling the file since I updated the
import line... maybe I'm not sure then. Maybe I'm not tracking
dependencies quite right, but it seems like there's not that much to get
wrong. I say that to produce either X.hs.o or X.hi, run ghc X.hs. Before
I run ghc, I find the imports, map them to ["Y.hi", "Z.hi"] and need those.
Simplified:

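(\fn -> any (`List.isSuffixOf` fn) [".hs.o", ".hi"]) ?>> \fns -> do
    -- The rule builds both outputs at once; pick out the object file.
    let Just obj = List.find (".hs.o" `List.isSuffixOf`) fns
    let hs = objToSrc obj
    imports <- chaseImports hs
    -- Depend on the interface file of every imported module.
    let his = map (objToHi . srcToObj) imports
    need $ hs : his
    ghc hs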
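
For readers following along, here is a rough sketch of what the helpers named above might look like. None of this is the actual code from the build in question: it assumes outputs sit next to their sources (X.hs, X.hs.o, X.hi), and `importedModules` and `moduleToSrc` are hypothetical names for a naive, line-based stand-in for `chaseImports`. A real version would also have to skip imports that have no local source file, such as library modules.

```haskell
import Data.Char (isAlphaNum)
import Data.List (stripPrefix)
import Data.Maybe (mapMaybe)

-- | Module names imported by a source file, found by a naive line scan.
-- Does not handle package imports or imports split across lines.
importedModules :: String -> [String]
importedModules = mapMaybe (moduleName . words) . lines
  where
    moduleName ("import" : "qualified" : m : _) = Just (clean m)
    moduleName ("import" : m : _) = Just (clean m)
    moduleName _ = Nothing
    -- Cut off an import list written without a space, as in "Y(z)".
    clean = takeWhile (\c -> isAlphaNum c || c `elem` "._'")

-- | "Util.Log" -> "Util/Log.hs"
moduleToSrc :: String -> FilePath
moduleToSrc m = map (\c -> if c == '.' then '/' else c) m ++ ".hs"

-- | "X.hs" -> "X.hs.o"
srcToObj :: FilePath -> FilePath
srcToObj = (++ ".o")

-- | "X.hs.o" -> "X.hs"
objToSrc :: FilePath -> FilePath
objToSrc = dropSuffix ".o"

-- | "X.hs.o" -> "X.hi"
objToHi :: FilePath -> FilePath
objToHi obj = dropSuffix ".hs.o" obj ++ ".hi"

-- | Remove a suffix if present, otherwise return the input unchanged.
dropSuffix :: String -> String -> String
dropSuffix suf s = maybe s reverse (stripPrefix (reverse suf) (reverse s))
```

With these, `map (objToHi . srcToObj . moduleToSrc) (importedModules contents)` gives the list of .hi files to `need` before running ghc.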

There is some extra hackery where I don't track the *.hi produced by Main
modules. Their module lines don't necessarily match up with the
filename, so while the .hs.o is specified explicitly and comes out right,
the .hi is implicit and so comes out unpredictably. But I think it should
be unrelated, since no one imports Main modules. I always found ghc's
handling of main modules awkward.

Anyway, has anyone else noticed this second problem? I haven't been able
to reproduce it reliably, but I haven't really tried very hard either.
I just delete the build output and get on with life. But again, if
I want to be able to cache build output in CI I feel like I should have a
more reliable build.

Neil Mitchell

Apr 4, 2018, 3:42:59 PM
to Evan Laforge, Shake build system
Hi Evan,

I suspect you mean:
http://neilmitchell.blogspot.co.uk/2015/04/cleaning-stale-files-with-shake.html

As you accurately remember, the solution is a bit vomit inducing...

My personal feeling here is that GHC has bugs with moving modules, and
the real solution is to fix GHC, not try and have the build system
clean up files that GHC shouldn't be looking at. With ghcid I bump
into the problem quite regularly, but I see it with normal ghc too. As
you say, tracking down the bugs into reliable test cases is a pain,
and each instance can be trivially worked around, so I can see why the
incentives haven't aligned to cause anyone to track it down yet. I can
slightly tip the incentives by offering a beer to anyone who reports
such an issue!

Thanks, Neil

Evan Laforge

Apr 4, 2018, 4:42:10 PM
to Neil Mitchell, Shake build system
On Wed, Apr 4, 2018 at 12:42 PM, Neil Mitchell <ndmit...@gmail.com> wrote:
> Hi Evan,
>
> I suspect you mean:
> http://neilmitchell.blogspot.co.uk/2015/04/cleaning-stale-files-with-shake.html

Ah yes, that's it. If I recall correctly, the reason I didn't want to
do that is that building all files is a bit of a bigger deal for a
large complicated build system with many targets than for a small one
with only one target.

> My personal feeling here is that GHC has bugs with moving modules, and
> the real solution is to fix GHC, not try and have the build system
> clean up files that GHC shouldn't be looking at. With ghcid I bump
> into the problem quite regularly, but I see it with normal ghc too. As
> you say, tracking down the bugs into reliable test cases is a pain,
> and each instance can be trivially worked around, so I can see why the
> incentives haven't aligned to cause anyone to track it down yet. I can
> slightly tip the incentives by offering a beer to anyone who reports
> such an issue!

Maybe the problem is that all the people who spend their time tracking
down recompilation problems have been driven to harder stuff :) I
reported the "somewhere below me" bug on trac many many years ago but
as you say no one has really looked into it.

But then, I'll be soon going to a new job where I'll be doing build
infrastructure for a haskell-oriented company, so maybe I'll suddenly
get some motivation. The thing is, while each case can be worked
around manually, I'd think this would scuttle any attempt to have
reliable CI, unless you want to include the hack of "if it doesn't
link, rm -rf * and try again." The tweag.io guys are doing stuff with
Bazel, and I gather you've had at least a bit of communication with
them, do you know if they run into the same ghc issues? If you want
to do mono-repo style builds you definitely need a solid incremental
build.

I don't even know how they work around ghc not being totally
deterministic quite yet. Hm, maybe that's related to why it seems
hard to reproduce.

Neil Mitchell

Apr 4, 2018, 4:47:35 PM
to Evan Laforge, Shake build system
> Ah yes, that's it. If I recall correctly, the reason I didn't want to
> do that is that building all files is a bit of a bigger deal for a
> large complicated build system with many targets than for a small one
> with only one target.

It's also grim because it deals with files, e.g. writing a live file,
parsing it, and going again.
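
To make the "write a live file, parse it, and go again" step concrete, here is a minimal sketch of just the pruning decision. It assumes the live file holds one path per line and that all outputs live under a single build directory. `parseLiveFile` and `staleOutputs` are hypothetical names, and only base is used, so the actual file deletion and the Shake side (the `shakeLiveFiles` option) are left out.

```haskell
import Data.List (isPrefixOf, sort)

-- | Parse the contents of a live file, assumed to be one path per line.
parseLiveFile :: String -> [FilePath]
parseLiveFile = filter (not . null) . lines

-- | Files present under the build directory that the last build did not
-- report as live. These are the candidates for deletion.
staleOutputs :: FilePath -> [FilePath] -> [FilePath] -> [FilePath]
staleOutputs buildDir live present =
    sort [f | f <- present, inBuildDir f, f `notElem` live]
  where
    -- Only ever consider files inside the build directory.
    inBuildDir = ((buildDir ++ "/") `isPrefixOf`)
```

The catch mentioned earlier in the thread still applies: the live list is only complete if the run built every target, otherwise outputs of targets that were simply not requested would look stale too.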

> But then, I'll be soon going to a new job where I'll be doing build
> infrastructure for a haskell-oriented company, so maybe I'll suddenly
> get some motivation.

Awesome!

> The thing is, while each case can be worked
> around manually, I'd think this would scuttle any attempt to have
> reliable CI, unless you want to include the hack of "if it doesn't
> link, rm -rf * and try again." The tweag.io guys are doing stuff with
> Bazel, and I gather you've had at least a bit of communication with
> them, do you know if they run into the same ghc issues? If you want
> to do mono-repo style builds you definitely need a solid incremental
> build.

I do talk to them now and again, but they haven't mentioned that
(although they wouldn't necessarily do so). I have had experience of
GHC build outputs with incorrect module dependencies getting
into a Buck cache. It wasn't pleasant!

> I don't even know how they work around ghc not being totally
> deterministic quite yet. Hm, maybe that's related to why it seems
> hard to reproduce.

I was under the impression GHC was totally deterministic. Certainly
Buck has issues if it isn't, which is why it has been such a priority.

Thanks, Neil

Evan Laforge

Apr 4, 2018, 5:20:23 PM
to Neil Mitchell, Shake build system
>> I don't even know how they work around ghc not being totally
>> deterministic quite yet. Hm, maybe that's related to why it seems
>> hard to reproduce.
>
> I was under the impression GHC was totally deterministic. Certainly
> Buck has issues if it isn't, which is why it has been such a priority.

Going from this it looks like not yet:
https://ghc.haskell.org/trac/ghc/wiki/DeterministicBuilds

A fair amount of work was done working towards that goal, which I
think is what the wiki page is talking about, but it was always just
dealing with the particularly egregious cases, and was never planned
to get to totally deterministic. The guy who was working on it made a
lot of progress a year or so back, but I don't know if he got to full
.hi determinism. I haven't seen any commits about it for quite a
while, so presumably it's back on the back-burner.

Neil Mitchell

Apr 4, 2018, 5:34:51 PM
to Evan Laforge, Shake build system
>> I was under the impression GHC was totally deterministic. Certainly
>> Buck has issues if it isn't, which is why it has been such a priority.
>
> Going from this it looks like not yet:
> https://ghc.haskell.org/trac/ghc/wiki/DeterministicBuilds
>
> A fair amount of work was done working towards that goal, which I
> think is what the wiki page is talking about, but it was always just
> dealing with the particularly egregious cases, and was never planned
> to get to totally deterministic. The guy who was working on it made a
> lot of progress a year or so back, but I don't know if he got to full
> .hi determinism. I haven't seen any commits about it for quite a
> while, so presumably it's back on the back-burner.

Seems that https://ghc.haskell.org/trac/ghc/ticket/12935 identifies an
issue, and that the last comment was only 4 weeks ago, so it's not that
far on the back burner.

Thanks, Neil

Evan Laforge

Apr 4, 2018, 5:42:42 PM
to Neil Mitchell, Shake build system
On Wed, Apr 4, 2018 at 2:34 PM, Neil Mitchell <ndmit...@gmail.com> wrote:
> Seems that https://ghc.haskell.org/trac/ghc/ticket/12935 identifies an
> issue, and that the last comment was only 4 weeks ago, so not that on
> the back burner.

Oh, good news. But still not deterministic yet. That comment by
simonmar is right on; if I recall correctly, shake intentionally
randomizes the build order.

Neil Mitchell

Apr 4, 2018, 5:47:33 PM
to Evan Laforge, Shake build system
> Oh, good news. But still not deterministic yet. That comment by
> simonmar is right on, if I recall correctly shake intentionally
> randomizes the build order.

It does indeed - unless you use -j1 in which case it does its best to
be deterministic.
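
For a build that should always run this way, for example in CI, the same setting can be fixed in the options instead of passed on the command line. A minimal sketch, assuming the shake package is available. The empty `want` is a placeholder for real targets and rules.

```haskell
import Development.Shake

main :: IO ()
main = shakeArgs shakeOptions { shakeThreads = 1 } $ do
    -- Placeholder: real targets and rules go here.
    want []
```

This trades parallelism for a stable build order, so it is probably more useful for chasing a hard-to-reproduce failure than as a default.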