Continuous Integration update and future

Paulo Matos

Nov 14, 2019, 3:05:27 AM

to racke...@googlegroups.com

Summary: We currently have 5 CI systems: Travis, Azure, Gitlab, AppVeyor,
and DrDr. I explain what I have done so far in Gitlab and propose
unifying this into a single solution in the future. Request concerns,
suggestions and comments.

Long version:
A few years ago, CI for GitHub projects through Travis lacked important
features (for a start, it only supported two OS images and one
architecture), and many other systems provided much better solutions. I
had worked in industry with many systems, including Gitlab[1],
buildbot[2] and Jenkins[3]. Jenkins-pre2 was pretty buggy: I went
through the pain of setting up a large CI system for internal
development tools at one of my previous clients and it scarred me for
life. Jenkins2 had just come out by then and looked better and shinier,
but I never gave it a go. Buildbot, OTOH, is not a CI system per se but
more of a framework from which you create your own system in Python. I
wrote a prototype of a buildbot-based CI system for GCC around 2017, and
many other projects use buildbot, like llvm[4], webkit[5] and gdb[6], to
name just a few. A little later, Gitlab integrated a CI solution and I
started to use it with Racket in a Gitlab fork. Later, when Gitlab
released CI for Github projects, I worked with SamTH to get a Gitlab
solution for the official Racket tree. Nowadays you can see it working
under gitlab.com/racket/racket with its configuration in
https://github.com/racket/racket/blob/master/.gitlab-ci.yml.

Here's what it does at this point (currently all configurations are
running on Linux):
On x86_64:
1. Builds RacketCGC -> Racket3M -> RacketCS
2. Runs tests on all three variants (similar to those run by Travis)
3. Builds and tests all the variants once again with --enable-ubsan,
keeping a record of the runtime errors (for example:
https://gitlab.com/racket/racket/-/jobs/350271624/artifacts/file/runtime-errors.log)
4. Builds once again all the variants with the llvm static analyser and
keeps a record of the failures (for example:
https://gitlab.com/racket/racket/-/jobs/350271564/artifacts/file/scan-report_mmm/2019-11-14-032030-5552-1/index.html).
This step requires us to build LLVM with Z3 enabled so we can use the
work from ICSE'19[7].

On armv7l (arm32 with hard float):
1. Builds RacketCGC -> Racket3M
2. Runs tests

The above pipeline takes 1h10m[8] to run through on my machines.

Every night, besides the above, we extend the pipeline with:
1. Emulated builds of CGC and 3M on arm64, armel (32-bit little endian,
soft floating point), armhf (32-bit little endian, hard floating point),
i386, mips (32-bit big endian), mips64el (64-bit little endian) and
mipsel (32-bit little endian).
2. Same as above but configured with --enable-generations=no (since
https://github.com/racket/racket/commit/7c3a207f36dc25baaac4afdf7ecedc18bf9ff49c).

The above pipeline takes 4h to run since it also compiles QEMU 3.1.0
beforehand (the QEMU in the Debian container is too old). Both this
build and the LLVM build are cached, so they are only really rebuilt
when the versions change. QEMU 4.1.0 shows good results[9], so I will
upgrade soon.

The biggest problem with the Gitlab pipeline is that it worked _really_
well until I started wanting to optimize it: for example, to have a
stage-less pipeline where a job only waits for the few jobs it depends
on instead of waiting for every job in the previous stages. Gitlab is
finally catching up to this with the `needs` keyword, but the interface
becomes a mess. In all the time I have spent with Gitlab CI, I spent
almost as much time trying to figure out why some things break or don't
work[10, 11] as I spent configuring Racket. Gitlab CI is great for
simple projects, but it just gets harder and harder as the pipeline
complexity grows.
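
To make the stage-less idea concrete, here is a minimal sketch in plain
Python (not Gitlab configuration). The job names, stages and durations
are made up for illustration, and it assumes there are always enough
runners to start a job as soon as it is allowed to start. It compares
when each job finishes under stage-based scheduling and under
`needs`-style scheduling:

```python
# Hypothetical jobs: name -> (stage, duration in minutes, jobs it needs).
JOBS = {
    "build-cgc": (0, 20, []),
    "build-3m":  (1, 20, ["build-cgc"]),
    "test-cgc":  (1, 30, ["build-cgc"]),
    "build-cs":  (2, 25, ["build-3m"]),
    "test-3m":   (2, 30, ["build-3m"]),
    "test-cs":   (3, 30, ["build-cs"]),
}


def finish_times_staged(jobs):
    """Stage-based: a job starts once every job of the previous stage is done."""
    finish = {}
    stage_start = 0
    for stage in sorted({stage for stage, _, _ in jobs.values()}):
        in_stage = [name for name, (s, _, _) in jobs.items() if s == stage]
        for name in in_stage:
            finish[name] = stage_start + jobs[name][1]
        # The next stage cannot start before the slowest job of this one ends.
        stage_start = max(finish[name] for name in in_stage)
    return finish


def finish_times_needs(jobs):
    """Needs-based: a job starts as soon as the jobs it needs are done."""
    finish = {}

    def visit(name):
        if name not in finish:
            _, duration, needs = jobs[name]
            start = max((visit(dep) for dep in needs), default=0)
            finish[name] = start + duration
        return finish[name]

    for name in jobs:
        visit(name)
    return finish


staged = finish_times_staged(JOBS)
dag = finish_times_needs(JOBS)
print("staged pipeline ends at minute", max(staged.values()))
print("needs-based pipeline ends at minute", max(dag.values()))
```

With these made-up numbers, build-cs no longer has to wait for test-cgc,
which is the kind of saving I was after.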

I have slowly been gathering machines to test Racket and other
work-related projects, so I have quite a few machines/boards of varied
architectures (arm32, arm64, x86_64, mipsel). Two months ago I also got
a Windows machine to test Racket on Windows 10 (also something coming
soon[12]). For you to do
Gitlab CI of Racket on your machine, you simply need to install `gitlab-runner`,
and connect it to the project appropriately. However, I just got a rpi4
with 4GB of RAM and just found out I cannot use it because gitlab-runner
doesn't run on arm64 yet (apparently there is no arm64 build of it yet).
So that's another bummer.

Lately I noticed that Gitlab CI for Github projects (what we use for
Racket) doesn't support, afaict, running pipelines on PRs. And if it
did, it probably wouldn't support running a special, faster pipeline so
that the PR author can quickly tell whether the PR breaks something.

All in all, we have outgrown Gitlab CI and I would like to spend more of
my free time working on an improved GC for RacketCS than on fixing
GitlabCI or working around it. I also think it is a waste of resources
to run so many CI systems simultaneously, sometimes doing the same thing.

My next CI project was to support benchmarking and develop a Racket
Dashboard Webapp that displays the important results of CI in a visually
appealing way that's easy to understand. I take some time every day to
look at the Gitlab CI pipeline and ensure that all the yellows (expected
failures) are the ones we already knew were going to fail, rather than
some new failure that ends up being categorized in the same way. Again,
the interface doesn't help when you have 20+ jobs.
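
The daily check described above is essentially a set comparison that a
dashboard could automate. A minimal sketch, assuming a hypothetical list
of known expected failures and a hypothetical mapping from job name to
pass/fail (none of these job names come from the real pipeline):

```python
# Hypothetical set of jobs that are currently allowed to fail ("yellows").
EXPECTED_FAILURES = {"test-cs-ubsan", "build-mips64el"}


def classify(results, expected_failures):
    """Split job results into new failures, known failures and surprises.

    `results` maps a job name to True (passed) or False (failed).
    """
    failed = {job for job, passed in results.items() if not passed}
    return {
        # Failures nobody has seen before: these need attention.
        "new_failures": sorted(failed - expected_failures),
        # Failures we already knew about.
        "known_failures": sorted(failed & expected_failures),
        # Expected failures that passed: the list can be trimmed.
        "unexpected_passes": sorted(
            job for job in expected_failures if results.get(job) is True
        ),
    }


results = {
    "build-cgc": True,
    "test-3m": False,
    "test-cs-ubsan": False,
    "build-mips64el": True,
}
report = classify(results, EXPECTED_FAILURES)
print(report)
```

Only the "new_failures" entry would need a human to look at it.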

My proposal is to rewrite the current Gitlab CI pipeline using Buildbot
and take it from there. This means writing Python, but maybe with some
luck parts of it can be written in Racket and interfaced with Python if
necessary (can Pycket help here?).

Buildbot runs on all the architectures Python runs on, which covers all
the ones we are interested in, and deploying it is as easy as it is with
Gitlab. Granted, the configuration won't look like a yaml file anymore,
but I am pretty sure that by now the Python code would be more readable
than the current ~1300 line yaml file we have to configure our pipeline.
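
To illustrate why code can be shorter than yaml here, this is a rough
sketch of generating the job list described earlier instead of spelling
every job out. It is plain Python and deliberately does not use the
Buildbot API; the dictionary layout and the job names are assumptions
made for the example. Only the variants, architectures and configure
flags are taken from this email:

```python
from itertools import product


def make_jobs(archs, variants, flag_sets):
    """Return one job description per (arch, flag set, variant) combination."""
    return [
        {
            "name": f"{arch}-{variant}-{label}",  # hypothetical naming scheme
            "arch": arch,
            "variant": variant,
            "configure_flags": list(flags),
        }
        for arch, (label, flags), variant in product(
            archs, flag_sets.items(), variants
        )
    ]


# Per-commit jobs on x86_64: all three variants, plain and with ubsan.
native = make_jobs(
    ["x86_64"],
    ["cgc", "3m", "cs"],
    {"plain": [], "ubsan": ["--enable-ubsan"]},
)

# Nightly emulated jobs: CGC and 3M, with and without generations.
nightly = make_jobs(
    ["arm64", "armel", "armhf", "i386", "mips", "mips64el", "mipsel"],
    ["cgc", "3m"],
    {"default": [], "nogen": ["--enable-generations=no"]},
)

print(len(native), "per-commit jobs,", len(nightly), "nightly jobs")
```

Adding an architecture or a configure flag is then a one-line change
rather than another block of copied yaml.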

Once Buildbot has the same features as Gitlab CI, I will extend it to
ensure architectures tested with Azure, Travis, and AppVeyor are
covered. At this point we could potentially switch off other systems.
DrDr seems to be a different beast, much harder to replace so
before I go there, I will sync with the rest of the team but I still
think that having a unified system and interface could be the way to
go.

If you are happy with my proposal, I will go ahead and start a new
project on GitHub: racket-buildbot. Once we get this to a stable point,
we could merge this into the racket tree and remove .gitlab, etc.

At this point, I welcome any comments and suggestions. Having good CI
means that in the long term we'll have ensured that Racket keeps running
on all supported platforms (and once benchmarking is done - how Racket's
performance changes over time). So having good CI is important. However,
it is only relevant if it is useful to the Racket team and
contributors. It would be great if everyone involved could chime in with
what they would like to have/see. Feel free to request whatever you
want; I cannot promise to implement all of it, but I can make a list.

Refs:
[1] https://gitlab.com
[2] https://buildbot.net
[3] https://jenkins.io
[4] http://lab.llvm.org:8011/
[5] https://build.webkit.org/
[6] https://gdb-buildbot.osci.io/#/
[7] https://dl.acm.org/citation.cfm?id=3339673
[8] https://gitlab.com/racket/racket/pipelines/95803835
[9] https://github.com/LinkiTools/racket/tree/pmatos-qemu-410
[10] https://gitlab.com/gitlab-org/gitlab/issues?scope=all&utf8=%E2%9C%93&state=opened&author_username=pmatos
[11] https://gitlab.com/gitlab-org/gitlab-runner/issues?scope=all&utf8=%E2%9C%93&state=opened&author_username=pmatos
[12] https://github.com/LinkiTools/racket/tree/pmatos-ci-win10

Thanks for reading this,
--
Paulo Matos

Paulo Matos

Nov 14, 2019, 3:50:44 AM

to racke...@googlegroups.com

Paulo Matos writes:

> Summary: We currently have 5 CI systems: Travis, Azure, Gitlab, AppVeyor,
> and DrDr. I explain what I have done so far in Gitlab and propose
> unifying this into a single solution in the future. Request concerns,
> suggestions and comments.
>

Also - Racket would be IMO a great foundation for a Buildbot-like CI
framework given its DSL capabilities (#lang ci-pipeline?!?). However, it
doesn't exist yet.
But here's a hint for someone interested in CI and with some time on
their hands - I'd be happy to contribute. Until then we have to play
with what we have.

--
Paulo Matos

Robby Findler

Nov 14, 2019, 7:03:34 AM

to Paulo Matos, Racket Developers

I think it is wonderful that you're doing this. Thank you!

Robby

> --
> You received this message because you are subscribed to the Google Groups "Racket Developers" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to racket-dev+...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/racket-dev/87ftiq6dv6.fsf%40linki.tools.

Matthias Felleisen

Nov 14, 2019, 5:22:29 PM

to Paulo Matos, racke...@googlegroups.com

> On Nov 14, 2019, at 3:05 AM, Paulo Matos <pma...@linki.tools> wrote:

Let me echo Robby with 1000*thanks :)

>
> 4. Builds once again all the variants with the llvm static analyser and
> keeps a record of the failures (for example:
> https://gitlab.com/racket/racket/-/jobs/350271564/artifacts/file/scan-report_mmm/2019-11-14-032030-5552-1/index.html).
> This step requires us to build LLVM with Z3 enabled so we can use the
> work from ICSE'19[7].

Is this about “flaky tests”? Jon Bell at George Mason (DC) has done work on this idea that could possibly supplement or replace this: https://www.jonbell.net/icse18-deflaker.pdf

— Matthias

Sage Gerard

Nov 15, 2019, 10:38:56 AM

to Robby Findler, Paulo Matos, Racket Developers

Raising my hand to volunteer when development starts on a Racket-centric CI solution. I was thinking about that being my next big project once existing efforts are self-sustaining, but I can still chip in on an existing project.

~slg


Paulo Matos

Nov 15, 2019, 11:18:02 AM

to Sage Gerard, Robby Findler, Racket Developers

Sage Gerard writes:

> Raising my hand to volunteer when development starts on a Racket-centric CI solution. I was thinking about that being my next big project once existing efforts are self-sustaining, but I can still chip in on an existing project.
>
>

Hi Sage,

Thanks for the offer to help. That's very kind.
As soon as we reach a consensus on how to move forward, I will contact
you and we can discuss the next steps. There are certainly several
aspects that will need work, not only on the CI specification itself but
on the dashboard side as well.

Paulo

Paulo Matos

Nov 15, 2019, 11:36:40 AM

to Matthias Felleisen, racke...@googlegroups.com

Matthias Felleisen writes:

>> On Nov 14, 2019, at 3:05 AM, Paulo Matos <pma...@linki.tools> wrote:
>
>
> Let me echo Robby with 1000*thanks :)
>
>

Thanks.

>>
>> 4. Builds once again all the variants with the llvm static analyser and
>> keeps a record of the failures (for example:
>> https://gitlab.com/racket/racket/-/jobs/350271564/artifacts/file/scan-report_mmm/2019-11-14-032030-5552-1/index.html).
>> This step requires us to build LLVM with Z3 enabled so we can use the
>> work from ICSE'19[7].
>
>
> Is this about “flaky tests”? Jon Bell at George Mason (DC) has done work on this idea that could possibly supplement or replace this: https://www.jonbell.net/icse18-deflaker.pdf
>

Thanks for the reference. I will take a look. The work for ICSE'19 was
very good because the amount of reasoning the static analyzer was doing
was limited, generating quite a few false positives during code analysis
(unrelated to testing). With Z3, for each warning generated by the
static analyzer, Z3 is called to try to find a counterexample. If such a
counterexample is found, then the warning is a false positive and is not
shown to the user. This was integrated into LLVM early this year but,
for licensing reasons, LLVM cannot be distributed with Z3 integration
enabled. We therefore build it ourselves and cache the result so we
don't need to rebuild it for every commit.

Kind regards,
--
Paulo Matos

Paulo Matos

Nov 15, 2019, 12:01:12 PM

to racke...@googlegroups.com

Thank you for all your kind emails. It is, as always, a pleasure to
contribute to Racket.

I spent some time discussing this with Sam today, who mentioned GitHub
Actions. I had actually been part of the GitHub Actions Beta and
dismissed it straight away because initially they did not support
self-hosted runners and there weren't many details on how free the offer
was for public repos. Sam's point made me revisit the current status of
GitHub Actions CI, which just recently left Beta.

It turns out, there are quite a few changes. A lot of the information I
picked up comes from these references:

https://www.youtube.com/watch?v=9EoNqyxtSRM
https://help.github.com/en/actions/automating-your-workflow-with-github-actions/about-self-hosted-runners
https://github.com/features/actions
https://twitter.com/pocmatos/status/1195100718270681088

My understanding is as follows:
- GitHub Actions provides CI free of charge (unlimited minutes) on
public repositories on a variety of operating systems (Linux, Windows
and macOS) on x86_64;
- GitHub Actions provides a self-hosted runner that runs on x86_64 and
ARM devices;
- GitHub Actions is highly integrated, as expected, in the workflow and
uses, as other services do, a yaml configuration file;
- Pietro Albini, working on porting the Rust CI from Azure Pipelines to
GitHub Actions, mentioned that Azure and Actions are very similar.

This is certainly a compelling argument towards using GitHub
Actions. Also, we wouldn't have to resort to Python code for CI (which
we would if we used Buildbot).

As mentioned, GitHub Actions is part of GitHub and very well integrated
with the workflow. On one hand, a part of me worries about vendor
lock-in with GitHub, but then again we are already so invested in GitHub
that adding CI to the mix won't (I think) worsen the situation.

There is, however, the issue of supporting more architectures than
GitHub Actions provides. My suggestion comes from my own experience with
QEMU and what Rust is also doing.

Let's come up with support tiers. We can have a tier where we compile,
test and benchmark and other tiers where we only ensure Racket builds
(for example with Rust, see
https://forge.rust-lang.org/infra/docs/rustc-ci.html).
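
As a rough sketch of what such tiers could look like as data (the tier
names and the platform-to-tier assignment below are only placeholders
for discussion, nothing has been decided):

```python
from enum import IntEnum


class Tier(IntEnum):
    """Support tiers; a lower number means stronger guarantees."""
    BUILD_TEST_BENCH = 1  # we compile, test and benchmark
    BUILD_ONLY = 2        # we only ensure Racket builds


# What CI runs for each tier.
STEPS = {
    Tier.BUILD_TEST_BENCH: ["build", "test", "benchmark"],
    Tier.BUILD_ONLY: ["build"],
}

# Hypothetical assignment of platforms to tiers, for illustration only.
PLATFORM_TIER = {
    "x86_64-linux": Tier.BUILD_TEST_BENCH,
    "x86_64-windows": Tier.BUILD_TEST_BENCH,
    "mipsel-linux": Tier.BUILD_ONLY,
    "riscv64-linux": Tier.BUILD_ONLY,
}


def steps_for(platform):
    """Return the CI steps to run for a platform, based on its tier."""
    return STEPS[PLATFORM_TIER[platform]]


for platform in sorted(PLATFORM_TIER):
    print(platform, "->", ", ".join(steps_for(platform)))
```

Moving a platform between tiers is then a one-line change, and the same
table could be shown on the dashboard.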

We can certainly use GitHub Actions hosting to test on x86_64 on all
supported OSes. We can then also use my own CI server (to compile
natively but also to run cross-compilations and emulated test runs) and
the self-hosted runner on my ARM boards. For architectures that are not
supported by the GitHub runner, we can cross-compile. For those
architectures for which I have boards (for example mipsel and riscv64),
we can run scripts that compile and test natively, say, weekly. In the
long run, as initially mentioned, the idea is to gather the information
in a Racket webapp dashboard (I discussed this briefly today with Jack
Firth, who actually works in this area, and we will be discussing the
architecture of this dashboard and its integration with the CI system in
the coming week).

With a setup like this, there are few compelling reasons to design a
system from scratch using Buildbot, so I am sold on the idea of moving
forward with Actions. It is actually a good time. Next week, for family
reasons, I will have extra free time on my hands and I am sure I can
bring this proposal up to speed and have some more information towards
the end of next week.

What are everyone's thoughts? If nobody complains loudly, I will follow
through and open a draft PR with a GitHub Actions CI system next week.

Jack Firth

Nov 16, 2019, 2:50:35 AM

to Racket Developers

I'm relatively in favor of a GitHub Actions based approach. I was also in the beta and wrote an action for building and testing Racket packages. I would be happy to help with this effort, especially the Docker-based parts of implementing an action.

Sam Tobin-Hochstadt

Nov 18, 2019, 11:55:05 AM

to Jack Firth, Racket Developers

This sounds great. I see also that Bogdan has developed an action for
downloading Racket.

Let's continue this work by organizing in the #ci channel on Slack,
which I've just created. Anyone else who's interested should join that
channel, and we'll work out next steps there.

Sam


Philip McGrath

Nov 18, 2019, 12:35:57 PM

to Sam Tobin-Hochstadt, Jack Firth, Racket Developers

While it's about CI of Racket programs rather than of Racket itself, I wanted to point to a recent discussion on Greg Hendershott's "travis-racket" repository: https://github.com/greghendershott/travis-racket/issues/37. I think it's a very important piece of tooling for the Racket environment, and, given that it's used in `raco pkg new`, I think it's worth considering it a quasi-official piece of infrastructure.

In particular, based on my quick-and-dirty work on a sort of port of the repository to AppVeyor (to test Windows-specific issues), I agree with this comment of Greg's:

One quick thought: Much of what install-racket.sh does is fill a void -- the absence of a redirect server with logical URLs, which the core Racket team has so far been unwilling or unable to provide themselves.

That is, something like download.racket-lang.org/{version}/{platform}/{flavor like minimal} would redirect to NEU or NW or wherever.
If the core team releases a new version, they add it.
If the core team is having a server issue (e.g. "Winooski"), they just change the redirect.
As opposed to:
Having a redirect server in the form of a big case statement in install-racket.sh.
Someone like you or me having to notice and eventually update it days or weeks later.
Maybe after such a change install-racket.sh and this repo still exist in a simplified form, or not, I don't know.
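
For concreteness, the redirect idea in Greg's comment could be sketched
roughly as follows. This is only an illustration: the mirror URLs, the
version list and the function name are made up, and the path layout
simply follows the {version}/{platform}/{flavor} pattern from the
comment.

```python
# Hypothetical mirrors; switching mirrors is a one-word change.
MIRRORS = {
    "primary": "https://mirror-a.example.org/racket",
    "fallback": "https://mirror-b.example.org/racket",
}

# Versions the core team has published (illustrative values).
KNOWN_VERSIONS = {"7.4", "7.5"}


def resolve(path, active_mirror="primary"):
    """Map a logical path like /7.5/macos/minimal to a mirror URL.

    Returns None when the path is malformed or the version is unknown.
    """
    parts = path.strip("/").split("/")
    if len(parts) != 3:
        return None
    version, platform, flavor = parts
    if version not in KNOWN_VERSIONS:
        return None
    return f"{MIRRORS[active_mirror]}/{version}/{platform}/{flavor}"


print(resolve("/7.5/macos/minimal"))
print(resolve("/7.5/macos/minimal", active_mirror="fallback"))
```

A new release means adding one entry to the version list, and a server
problem means changing which mirror is active, instead of editing a big
case statement in every client script.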

(I'm posting here instead of on Slack for persistence.)

-Philip


John Clements

Nov 18, 2019, 2:52:43 PM

to Philip McGrath, Sam Tobin-Hochstadt, Jack Firth, Racket Developers

This makes sense to me, but it would be easy to choose the wrong path protocol. Things are definitely interesting around source builds—what do I put for {platform} there? … and of course CS, which is an alternative now but should soon become the default. Also, there are things like “.tgz” bundling options for people who don’t want e.g. “.dmg” files.

Of course, you don’t necessarily have to include everything, and you can create space to be wrong later by putting a prefix in there; you could just go with e.g.

https://download.racket-lang.org/archived/v7.5/macos/racket.dmg

… and leave out the source builds & the CS builds & the mac .tgz bundles for now, with the idea that you’d change “archived” to something else if you discover that we got it all wrong.

John


Sam Tobin-Hochstadt

Nov 18, 2019, 4:44:08 PM

to John Clements, Philip McGrath, Jack Firth, Racket Developers

I think there are a few issues worth separating here:

1. snapshot.racket-lang.org is not a download site, but a page with two links.
We could fix this by having some redirects/URL rewrites, perhaps
from download.racket-lang.org.

2. when one of the snapshot builds fails, there's no notice/sometimes
the whole snapshot site just fails to be useful.
This is a harder problem to fix, and would require more commitment
than I think anyone running the snapshot builds is planning on right
now.

3. the different snapshot sites don't use consistent build names
between each other or with download.racket-lang.org.
Some of this is because the Northwestern and Utah sites build
slightly different things -- the Northwestern site builds test
packages as well, plus they use different Linux distributions.

Depending on which issues are most important to fix, we could
try to address these in various ways. Alternatively, if there are just
some redirects that download.racket-lang.org should implement, that's
relatively easy.

Sam

