[Question] Why does elixir not check source content to detect modifications?

Marc-André Lafortune

Jun 21, 2021, 2:20:50 AM
to elixir-lang-core
I'm a newbie, sorry if this has been asked before but I couldn't find anything.

Currently, elixir uses last modification dates to determine files that need to be recompiled.

```
$ mix compile
# ...
$ touch lib/some/file.ex
$ mix compile
Compiling <n> files (.ex)
```

Could Elixir not verify, in addition to the modification time change, that the actual text content has changed, by comparing a hash of it? In the example above, the file `lib/some/file.ex` would be hashed a second time, but no recompilation would occur.
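To make the idea concrete, here is a minimal sketch. It is not what Mix does today: the `TouchDemo` module, its helpers, and the temp file are all made up for illustration. It only shows that moving a file's mtime leaves a SHA-256 digest of its contents unchanged.

```elixir
defmodule TouchDemo do
  # Hypothetical helpers, not part of Mix.
  def digest(path), do: :crypto.hash(:sha256, File.read!(path))
  def mtime(path), do: File.stat!(path, time: :posix).mtime
end

path = Path.join(System.tmp_dir!(), "touch_demo_#{System.unique_integer([:positive])}.ex")
File.write!(path, "defmodule Some.File do\nend\n")

digest_before = TouchDemo.digest(path)
mtime_before = TouchDemo.mtime(path)

# Same effect as `touch`, but with an explicit later timestamp (POSIX seconds)
# so the mtime change does not depend on clock resolution.
File.touch!(path, mtime_before + 60)

mtime_changed = TouchDemo.mtime(path) != mtime_before
digest_changed = TouchDemo.digest(path) != digest_before
File.rm!(path)

IO.inspect(%{mtime_changed: mtime_changed, digest_changed: digest_changed})
```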

Currently, many operations may change the last modification date even if no content at all has changed:
* switch to a different branch, switch back
* do an interactive rebase:
  - to rewrite a commit message,
  - to reorder some commits
  - to squash some commits together
  - etc.

While the content of the files does not change (at all) in these examples, some very long recompilations may occur because Elixir relies only on modification times and not on file hashes.
Hashing a file is orders of magnitude faster than compiling it, and realizing that the potentially large number of compile-time dependencies of a given file do not need to be recompiled is infinitely faster than recompiling them all for nothing.

I am presuming that hash collisions are deemed so incredibly unlikely as to be acceptable, but if not, then I'd ask the same question using the complete content of the source file instead of a hash of it. Copying a source file is again an order of magnitude faster than recompiling it.

José Valim

Jun 21, 2021, 2:54:43 AM
to elixir-l...@googlegroups.com
Hi Marc-André!

There is no particular reason, this is a functionality that could be added. In particular we can continue checking the mtime and file size but compare the contents if the mtime changed but the file size is the same.

I also think we should update the mtime anyway, even if the hash is the same. WDYT?
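A rough sketch of that check, under stated assumptions: `StaleCheck` and the per-file record `%{mtime, size, digest}` are hypothetical and do not reflect the actual manifest format, and the content comparison is done with a SHA-256 digest rather than by keeping a full copy of the file.

```elixir
defmodule StaleCheck do
  # Hypothetical per-file record: %{mtime: integer, size: integer, digest: binary}.
  def snapshot(path) do
    stat = File.stat!(path, time: :posix)
    %{mtime: stat.mtime, size: stat.size, digest: digest(path)}
  end

  # Returns :unchanged, :changed, or {:touched, updated_record}.
  def check(path, recorded) do
    stat = File.stat!(path, time: :posix)

    cond do
      # Fast path: same mtime and size, no need to read the file.
      stat.mtime == recorded.mtime and stat.size == recorded.size ->
        :unchanged

      # A different size means different content; no need to hash.
      stat.size != recorded.size ->
        :changed

      # mtime moved, size is the same: compare the contents via a digest.
      digest(path) == recorded.digest ->
        # Store the new mtime so the next check takes the fast path again.
        {:touched, %{recorded | mtime: stat.mtime}}

      true ->
        :changed
    end
  end

  defp digest(path), do: :crypto.hash(:sha256, File.read!(path))
end

path = Path.join(System.tmp_dir!(), "stale_check_#{System.unique_integer([:positive])}.ex")
File.write!(path, "a = 1\n")
recorded = StaleCheck.snapshot(path)

# Touch only: mtime moves, content is identical.
File.touch!(path, recorded.mtime + 60)
touched = StaleCheck.check(path, recorded)

# Same-size edit with a newer mtime: the digest comparison catches it.
File.write!(path, "b = 2\n")
File.touch!(path, recorded.mtime + 120)
edited = StaleCheck.check(path, recorded)

File.rm!(path)
IO.inspect({touched, edited})
```

Returning the updated record in the `:touched` case is what "update the mtime anyway" amounts to here: without it, the file would be re-hashed on every subsequent check.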

Feel free to open up an issue or, if you want to tackle it, even send a PR!

Thanks!

--
You received this message because you are subscribed to the Google Groups "elixir-lang-core" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elixir-lang-co...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elixir-lang-core/e7a68d8d-0102-483d-a4ec-58849eb88ef6n%40googlegroups.com.

Marc-André Lafortune

Jun 21, 2021, 3:29:25 AM
to elixir-l...@googlegroups.com
On Mon, Jun 21, 2021 at 2:54 AM José Valim <jose....@dashbit.co> wrote:
> Hi Marc-André!
>
> There is no particular reason, this is a functionality that could be added. In particular we can continue checking the mtime and file size but compare the contents if the mtime changed but the file size is the same.

Yep 👍

> I also think we should update the mtime anyway, even if the hash is the same. WDYT?

Yes, otherwise we might keep re-hashing the file over and over again.

> Feel free to open up an issue or, if you want to tackle it, even send a PR!

Awesome. I'll check if I can come up with a PR in the next few days, and if not I'll create an issue.

Thanks. It's so awesome to get such quick feedback 💛
 

Allen Madsen

Jun 21, 2021, 10:01:56 AM
to elixir-l...@googlegroups.com
+1

I've noticed mtime changes in some scenarios related to CI and Docker, which cause things to be recompiled even though there's no other change to the files.

Zach Daniel

Jun 21, 2021, 11:03:01 AM
to elixir-lang-core
This question may be tangentially related, but I'll bring it up here:
It seems that, even if it is just the last module that has to compile that fails, nothing is cached/saved, and on subsequent compilations, the entire application will need to be recompiled. I'm unsure if there are technical limitations that cause that to be true, though.

José Valim

Jun 21, 2021, 11:06:18 AM
to elixir-l...@googlegroups.com
I can't reproduce it. I introduced a compile-time error:

~/ML/livebook[jv-eval-compile]$ mix test
Compiling 1 file (.ex)

== Compilation error in file lib/livebook_cli.ex ==
** (ArithmeticError) bad argument in arithmetic expression
    lib/livebook_cli.ex:1: (module)
    (stdlib 3.15) erl_eval.erl:685: :erl_eval.do_apply/6

Then I fixed it:

~/ML/livebook[jv-eval-compile *]$ mix test
Compiling 1 file (.ex)
.................................................................................................................................................

José Valim

Jun 21, 2021, 11:08:26 AM
to elixir-l...@googlegroups.com
But if you mean that if 10 files need to be compiled, then 1 fails, then all 10 have to be compiled again, then that's correct. There are compiler passes that need to run on all modules, such as undefined function warnings, so we can't save intermediate files without introducing reasonably complex checkpoints.

Zach Daniel

Jun 21, 2021, 11:14:08 AM
to elixir-lang-core
Got it, yeah that makes sense. I (well, an Ash user) was dealing with an issue that caused one of their resources to fail at compile-time, and they had to do a fresh recompile of their project on a very underpowered machine (which is the real problem, TBH). But what that meant is that we had to wait *eleven* minutes in between each compile to determine if we had fixed the problem. There are clever things to do like just comment out the code and any referencing code if they get into that situation again, but it did bring up the question of whether or not there was a way to make that process smoother. Perhaps some way to get the compiler to attempt to compile a specific module first? Naturally, any dependencies would still need to be compiled, but it could help short-circuit that process. This likely isn't a big enough problem worth spending someone's time on, but I thought it was worth bringing up.
Message has been deleted

José Valim

Jun 22, 2021, 12:31:30 PM
to elixir-l...@googlegroups.com
I mean, even the impression of ordering is wrong. The compiler is running in parallel, so it may be that all of them are being compiled at the same time and none of them actually completes. :)

On Tue, Jun 22, 2021 at 6:27 PM Marc-André Lafortune <marc-...@marc-andre.ca> wrote:
> It is a separate topic, but I find that very frustrating also.
>
> Typical scenario:
> A is a compile-time dependency of a lot of files B to Z, which themselves are not dependencies of anything.
> 1) Change A
> 2) `mix compile` => compiles A, B, C, ... Y, and then produces an error in Z.
> 3) Fix Z
> 4) `mix compile` => recompiles A, B, C, ... Y and finally Z. If there is still an error, goto 3. A-Y should already be compiled. I haven't double-checked, but it is my impression that the compiler doesn't prioritize the latest modified file either?

Xavier Noria

Jun 22, 2021, 12:44:03 PM
to elixir-l...@googlegroups.com
On Tue, Jun 22, 2021 at 6:30 PM José Valim <jose....@dashbit.co> wrote:

> I mean, even the impression of ordering is wrong. The compiler is running in parallel, so it may be that all of them are being compiled at the same time and none of them actually completes. :)

Obligatory plug for this post of mine, which gives an overview of it: https://medium.com/@fxn/the-elixir-parallel-compiler-53a1be353049
 

Marc-André Lafortune

Jun 22, 2021, 1:18:01 PM
to elixir-lang-core
I probably gave an overly simplified scenario.

Still, is it not true that when recompiling <n> files, any compilation error occurring at any point in the process will ensure that all the <n> files will have to be recompiled next time, for any kind of interdependence of these <n> files?

Xavier Noria: thanks for your post. Do I read correctly that the number of processes involved depends on the number of cores? That sounds more tricky to implement than having one process per file to compile. If that's the case, files within a process could be prioritized (e.g. if they failed last time and/or by modification date), no?

José Valim

Jun 22, 2021, 1:26:44 PM
to elixir-l...@googlegroups.com
> Still, is it not true that when recompiling <n> files, any compilation error occurring at any point in the process will ensure that all the <n> files will have to be recompiled next time, for any kind of interdependence of these <n> files?

Correct. All I am saying is that the notion of Z being last is not necessarily the case. In practice we likely have several files completing about the same time. :)

We spawn one process per file, correct, and the number of processes is based on the number of cores. Indeed we could prioritize the file that failed first, this seems considerably simpler than any kind of checkpointing. It is mostly a matter of where to store this information. We likely don't want to touch the manifest in case of failures.


José Valim

Jun 22, 2021, 1:29:53 PM
to elixir-l...@googlegroups.com
Oh, just now I understand your previous point: we could simply order the files by their last modified time. The most recently modified is likely the one we should try first anyway. This seems like a very trivial change to make, I am on it.
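For illustration only, ordering by last modified time could look like the sketch below. `RecentFirst` is a hypothetical helper, not the change that went into the compiler, and the files are throwaway ones with hand-set mtimes.

```elixir
defmodule RecentFirst do
  # Hypothetical helper: most recently modified path first.
  def sort(paths) do
    Enum.sort_by(paths, fn path -> File.stat!(path, time: :posix).mtime end, :desc)
  end
end

dir = Path.join(System.tmp_dir!(), "recent_first_#{System.unique_integer([:positive])}")
File.mkdir_p!(dir)
now = System.os_time(:second)

# Three files with explicit, distinct mtimes (POSIX seconds).
paths =
  for {name, offset} <- [{"a.ex", 10}, {"b.ex", 30}, {"z.ex", 20}] do
    path = Path.join(dir, name)
    File.write!(path, "")
    File.touch!(path, now + offset)
    path
  end

order = paths |> RecentFirst.sort() |> Enum.map(&Path.basename/1)
File.rm_rf!(dir)
IO.inspect(order)
```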

José Valim

Jun 22, 2021, 2:07:15 PM
to elixir-l...@googlegroups.com

Xavier Noria

Jun 22, 2021, 2:21:17 PM
to elixir-l...@googlegroups.com
On Tue, Jun 22, 2021 at 7:18 PM Marc-André Lafortune <marc-...@marc-andre.ca> wrote:

> Xavier Noria: thanks for your post. Do I read correctly that the number of processes involved depends on the number of cores?

From memory, as a rule of thumb, you can think of it as roughly as many compilers as cores.

Within that rule of thumb, there's some flexibility needed, because if all compilers are waiting on something, you may need to spawn another one to unblock them. So the queue size may adapt if required.

It's been a while though, José can confirm (or refute :).
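As a loose illustration of the rule of thumb (and explicitly not how the parallel compiler is implemented), here is a sketch using `Task.async_stream/3`: one process per file, with at most as many running at once as there are schedulers. `BoundedCompile` and `compile_fun` are hypothetical stand-ins, and the sketch does not model the case where every compiler is waiting and an extra one has to be spawned.

```elixir
defmodule BoundedCompile do
  # `compile_fun` is a hypothetical stand-in for compiling a single file.
  def run(files, compile_fun) do
    files
    |> Task.async_stream(compile_fun,
      max_concurrency: System.schedulers_online(),
      ordered: false,
      timeout: :infinity
    )
    |> Enum.map(fn {:ok, result} -> result end)
  end
end

files = ["a.ex", "b.ex", "c.ex", "z.ex"]
results = BoundedCompile.run(files, fn file -> {file, :compiled} end)
IO.inspect(results)
```

With `ordered: false`, results arrive in completion order, which is why there is no guarantee that any particular file finishes last.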

Marc-André Lafortune

Jun 22, 2021, 10:52:45 PM
to elixir-lang-core
On Tue, Jun 22, 2021 at 1:29 PM José Valim wrote:
> Oh, just now I understand your previous point: we could simply order the files by their last modified time. The most recently modified is likely the one we should try first anyway. This seems like a very trivial change to make, I am on it.

Super, thanks, this should help :-)

The question remains: why can't we add the successfully compiled files to the manifest and thus avoid recompiling them completely in the future?

José Valim

Jun 23, 2021, 1:39:02 AM
to elixir-l...@googlegroups.com
Because compiling a file may trigger further work, such as export dependencies, and they also need to be verified by the group pass, which is only invoked after all files have been compiled.

So compiling a file is only part of the work, and in order to support partial compilation we would need to be able to record that a file was compiled but has not yet gone through all the hops. I believe this is going to be non-trivial. Plus, if we get it wrong, it will likely lead to hard-to-debug bugs and scenarios.
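A toy model of that all-or-nothing behaviour, purely as an assumption-laden sketch: `AllOrNothing`, the `MapSet` manifest, `compile_fun`, and `group_pass` are all hypothetical and much simpler than the real compiler. The point is only that the manifest is updated when every file compiles and the group pass succeeds, and left alone otherwise.

```elixir
defmodule AllOrNothing do
  # Toy model. `manifest` is a MapSet of files recorded as fully compiled.
  # `compile_fun` and `group_pass` are hypothetical stand-ins returning :ok or :error.
  def compile(manifest, stale, compile_fun, group_pass) do
    per_file_ok? = Enum.all?(stale, fn file -> compile_fun.(file) == :ok end)

    if per_file_ok? and group_pass.(stale) == :ok do
      {:ok, MapSet.union(manifest, MapSet.new(stale))}
    else
      # Nothing is recorded, so every stale file is compiled again next time.
      {:error, manifest}
    end
  end
end

manifest = MapSet.new(["x.ex"])
stale = ["a.ex", "b.ex", "z.ex"]
group_pass = fn _files -> :ok end

# First attempt: z.ex fails, so a.ex and b.ex are not recorded either.
failing = fn file -> if file == "z.ex", do: :error, else: :ok end
failed = AllOrNothing.compile(manifest, stale, failing, group_pass)

# Second attempt, after fixing z.ex: all three are recorded together.
fixed = AllOrNothing.compile(manifest, stale, fn _file -> :ok end, group_pass)

IO.inspect({failed, fixed})
```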
