Re: [racket-dev] support for arm64 / aarch64


David Bremner

Jun 29, 2015, 3:33:52 AM
to Matthew Flatt, dev, 774...@bugs.debian.org
Matthew Flatt <mfl...@cs.utah.edu> writes:

> It looks like this patch was submitted for v6.1. Version 6.1.1 (the
> current release), uses SGC instead of Boehm's GC during the build
> process by default. So, it at least avoids this immediate problem.
>
> I can't think of any other problem that would turn up in v6.1.1, but
> I'm not sure it will work. We'd definitely welcome feedback on whether
> Racket 6.1.1 builds on AArch64, or where it gets stuck if not.
>

Uh, sorry, I dropped the ball on this.

Racket 6.2 is failing to build on AArch64

https://buildd.debian.org/status/fetch.php?pkg=racket&arch=arm64&ver=6.2-2&stamp=1435538908

Scrolling to the end of the log:

mkdir xsrc
/usr/bin/make ../gracket3m
make[6]: Entering directory '/«PKGBUILDDIR»/build/gracket/gc2'
../../racket/racket3m -cqu /«PKGBUILDDIR»/src/gracket/gc2/../../racket/gc2/xform.rkt --setup ../../racket/gc2 --cpp "gcc -E -I/«PKGBUILDDIR»/src/gracket/gc2/../../racket/gc2 -I./../../racket/ -I/«PKGBUILDDIR»/src/gracket/gc2/../../racket/include/ -DUSE_SENORA_GC -D_FORTIFY_SOURCE=2 -Dwx_xt -MMD" --keep-lines -o xsrc/grmain.c +D INITIAL_COLLECTS_DIRECTORY='"'"`cd /«PKGBUILDDIR»/src/gracket/gc2/../../../collects; pwd`"'"' +D INITIAL_CONFIG_DIRECTORY='"'"`cd /«PKGBUILDDIR»/src/gracket/gc2/../../..; pwd`/etc"'"' /«PKGBUILDDIR»/src/gracket/gc2/../grmain.c
E: Caught signal ‘Terminated’: terminating immediately
make[3]: *** [gracket-3m] Terminated

At least superficially, the build failures for ppc64el and s390x look similar:

https://buildd.debian.org/status/fetch.php?pkg=racket&arch=ppc64el&ver=6.2-2&stamp=1435539483

https://buildd.debian.org/status/fetch.php?pkg=racket&arch=s390x&ver=6.2-2&stamp=1435538877

Juan Francisco Cantero Hurtado

Jun 29, 2015, 8:45:06 PM
to racke...@googlegroups.com, d...@racket-lang.org
Similar bug:
http://bugs.racket-lang.org/query/?cmd=view%20audit-trail&database=default&pr=15079

David, does Debian have machines with those architectures available to
give shell accounts to upstream developers? I asked for a shell account
for Matthew on the OpenBSD mailing list a long time ago, but nobody replied.

James McCoy

Jun 29, 2015, 9:58:15 PM
to racke...@googlegroups.com
On Tue, Jun 30, 2015 at 02:44:46AM +0200, Juan Francisco Cantero Hurtado wrote:
> Similar bug: http://bugs.racket-lang.org/query/?cmd=view%20audit-trail&database=default&pr=15079
>
> David, does Debian have machines with those architectures available to give
> shell accounts to upstream developers? I asked for a shell account for
> Matthew on the OpenBSD mailing list a long time ago, but nobody replied.

Yes, David or I could sponsor a request to get Matthew guest access on
relevant porterboxes[0] if he'd like to debug the build failures[1].

Matthew would just need to follow the “non-DMs” instructions[2] to
provide one of us with the relevant information we can use to make the
request.

[0]: https://db.debian.org/machines.cgi Systems where purpose=porterbox
and hostname matches debian.org.
[1]: https://buildd.debian.org/status/package.php?p=racket
[2]: https://dsa.debian.org/doc/guest-account/ DMUP refers to [3]
[3]: https://www.debian.org/devel/dmup

Cheers,
--
James
GPG Key: 4096R/331BA3DB 2011-12-05 James McCoy <jame...@debian.org>

Matthew Flatt

Jul 2, 2015, 6:31:48 PM
to James McCoy, Juan Francisco Cantero Hurtado, racke...@googlegroups.com
I was able to get an AArch64 installation running with Qemu, and I
think I've found the main problem with these failing builds.

The end of "gc2.h" has a preprocessor test for `__x86_64__` or `WIN64`,
but it should have been a more general test for
`SIXTY_FOUR_BIT_INTEGERS`.

With that repair, the AArch64 build still fails for me due to a GC
problem. If I disable generational GC, then the build seems ok. I think
I've had trouble in the past, where the signal handlers that implement
the write barrier didn't work correctly inside Qemu. The problem might
not be Qemu-specific, though, and I'm interested to hear whether the
"gc2.h" change to use `SIXTY_FOUR_BIT_INTEGERS` fixes the problem on
real machines.

Here's the commit to fix "gc2.h":

https://github.com/plt/racket/commit/0cda0c98b085dc289bbb40cb37325042b35eea07

Juan Francisco Cantero Hurtado

Jul 2, 2015, 7:12:12 PM
to racke...@googlegroups.com, public-racket-dev-/J...@plane.gmane.org
Racket builds fine on linux/ppc64 with your change.

The tests fail:
$ racket -f quiet.rktl
Section(basic)


TIMEOUT -- ABORTING!

David Bremner

Jul 2, 2015, 11:32:01 PM
to Matthew Flatt, James McCoy, Juan Francisco Cantero Hurtado, racke...@googlegroups.com
Matthew Flatt <mfl...@cs.utah.edu> writes:

> I was able to get an AArch64 installation running with Qemu, and I
> think I've found the main problem with these failing builds.
>
> The end of "gc2.h" has a preprocessor test for `__x86_64__` or `WIN64`,
> but it should have been a more general test for
> `SIXTY_FOUR_BIT_INTEGERS`.
>
> With that repair, the AArch64 build still fails for me due to a GC
> problem. If I disable generational GC, then the build seems ok. I think
> I've had trouble in the past, where the signal handlers that implement
> the write barrier didn't work correctly inside Qemu. The problem might
> not be Qemu-specific, though, and I'm interested to hear whether the
> "gc2.h" change to use `SIXTY_FOUR_BIT_INTEGERS` fixes the problem on
> real machines.

Looks like it's not Qemu-specific? This is on a real (APM X-Gene Mustang) arm64
machine:

racket/racket3m -X "/home/bremner/racket-6.2/debian/tmp/usr/share/racket/collects" -G "/home/bremner/racket-6.2/debian/tmp/etc/racket" -N "raco" -l- setup --no-user -j --no-launcher --no-install --no-post-install
raco setup: bootstrapping from source...
SIGSEGV MAPERR si_code 1 fault on addr 0x4000
Aborted
Makefile:160: recipe for target 'install-3m' failed
make[2]: *** [install-3m] Error 134
make[2]: Leaving directory '/home/bremner/racket-6.2/build'

d

Matthew Flatt

Jul 3, 2015, 2:49:00 PM
to David Bremner, James McCoy, Juan Francisco Cantero Hurtado, racke...@googlegroups.com
Thanks for your help!

I now have a build that works on my Qemu installation. The problem was
that the signal-handling stack installed via sigaltstack() was too
small. Making the stack 10*SIGSTKSZ bytes instead of SIGSTKSZ bytes
solves the problem.

In retrospect, it's clear how this is related to generational GC: no
write barrier means one less signal handler. But it wasn't simply that
the write-barrier signal handler was running out of space; overflow
required the combination of that one plus a nested SIGPROF handler
(so, not so easy to track down).

I expect that 64-bit platforms other than x86_64 have larger stack
frames, which explains why we only see the problem on other
architectures.

Here's the commit to increase the stack size:

https://github.com/plt/racket/commit/d6fa581a4c487cd55ca62b853a36842e2fd381a3

Matthew Flatt

Jul 3, 2015, 2:49:53 PM
to Juan Francisco Cantero Hurtado, racke...@googlegroups.com
It's possible that the recent change will fix this problem, but I'm not
optimistic. Assuming that the test still fails, is it possible to get
output from `racket -f basic.rktl` so we can see more specifically the
point of failure?

Juan Francisco Cantero Hurtado

Jul 4, 2015, 2:55:34 PM
to racke...@googlegroups.com, public-racket-dev-/J...@plane.gmane.org
Apparently, racket's threads have problems on linux/ppc64. I ran each
test manually, and only "thread.rktl" failed.

Here is the output: http://git.io/vqm18

quiet.rktl contains this code:

;; -- set up a timeout
(set! timeout-thread
      (thread
       (lambda ()
         (sleep 1200)
         (fprintf errp "\n\n~aTIMEOUT -- ABORTING!\n" Section-prefix)
         (exit 3)
         ;; in case the above didn't work for some reason
         (sleep 60)
         (custodian-shutdown-all cust)))))))

quiet.rktl prints the error very quickly, in less than a second, so
(sleep 1200) is not working there.

David Bremner

Jul 14, 2015, 10:49:23 AM
to Matthew Flatt, James McCoy, Juan Francisco Cantero Hurtado, racke...@googlegroups.com
Matthew Flatt <mfl...@cs.utah.edu> writes:

> Here's the commit to increase the stack size:
>
> https://github.com/plt/racket/commit/d6fa581a4c487cd55ca62b853a36842e2fd381a3
>

With this patch (and the previous one), 6.2 builds on arm64, but only
with a single process build. With -j2 or more, I get errors.
I _think_ this happens the first time the build runs racket3m
in a non-trivial way.

I attach a truncated log for a -j3 build; to my eyes the -j2 looked
similar, both failing at compiler-lib/compiler with "collection not
found". It seems to go in the same vein for another 10k lines; I
truncated most of this before sending to the list, but let me know if
the full build log would be useful.

build3.log.gz

David Bremner

Dec 18, 2015, 12:21:40 PM
to Matthew Flatt, Juan Francisco Cantero Hurtado, racke...@googlegroups.com
David Bremner <da...@tethera.net> writes:

> Matthew Flatt <mfl...@cs.utah.edu> writes:
>
>> Here's the commit to increase the stack size:
>>
>> https://github.com/plt/racket/commit/d6fa581a4c487cd55ca62b853a36842e2fd381a3
>>
>
> With this patch (and the previous one), 6.2 builds on arm64, but only
> with a single process build. With -j2 or more, I get errors.
> I _think_ this happens the first time the build runs racket3m
> in a non-trivial way.
>

Just to update, the default (-j4) build is still failing on ppc64el /
arm64 with racket 6.3

https://buildd.debian.org/status/fetch.php?pkg=racket&arch=ppc64el&ver=6.3-2&stamp=1450447398

I can try a single threaded build later, but I wondered if the huge
number of warnings from the JIT code was normal.

d

Matthew Flatt

Dec 18, 2015, 3:42:35 PM
to David Bremner, Juan Francisco Cantero Hurtado, racke...@googlegroups.com
At Fri, 18 Dec 2015 13:21:29 -0400, David Bremner wrote:
> Just to update, the default (-j4) build is still failing on ppc64el /
> arm64 with racket 6.3
>
> https://buildd.debian.org/status/fetch.php?pkg=racket&arch=ppc64el&ver=6.3-2&stamp=1450447398
>
> I can try a single threaded build later, but I wondered if the huge
> number of warnings from the JIT code was normal.

Those warnings are normal for some variants of gcc, but the JIT should
not be used at all for PPC64, so that seems likely to be the problem.

The PPC JIT is currently enabled by

# if defined(__powerpc__) && !defined(__powerpc64__)

What should it be?


From here, it looks like the arm64 build is ok:

https://buildd.debian.org/status/logs.php?pkg=racket&ver=6.3-2

The x32 failure makes sense, though. Currently, Racket in various ways
assumes that x86_64 means 64-bit pointers and longs.

David Bremner

Dec 18, 2015, 4:21:55 PM
to Matthew Flatt, Juan Francisco Cantero Hurtado, racke...@googlegroups.com
Matthew Flatt <mfl...@cs.utah.edu> writes:

> At Fri, 18 Dec 2015 13:21:29 -0400, David Bremner wrote:
>> Just to update, the default (-j4) build is still failing on ppc64el /
>> arm64 with racket 6.3
>>
>> https://buildd.debian.org/status/fetch.php?pkg=racket&arch=ppc64el&ver=6.3-2&stamp=1450447398
>>
>> I can try a single threaded build later, but I wondered if the huge
>> number of warnings from the JIT code was normal.
>
> Those warnings are normal for some variants of gcc, but the JIT should
> not be used at all for PPC64, so that seems likely to be the problem.
>
> The PPC JIT is currently enabled by
>
> # if defined(__powerpc__) && !defined(__powerpc64__)

Unless there are some hijinks with undefining preprocessor symbols, that
looks like it should be correct:

bremner@plummer ~ % gcc -E -dM - < /dev/null | grep -i 'power\|ppc'
#define _ARCH_PPCGR 1
#define __PPC64__ 1
#define _ARCH_PPCSQ 1
#define _ARCH_PPC 1
#define __powerpc64__ 1
#define __PPC__ 1
#define __powerpc__ 1
#define __POWER8_VECTOR__ 1
#define _ARCH_PPC64 1

That's gcc 5.3.1 on the porterbox.

d

David Bremner

Dec 18, 2015, 6:04:37 PM
to Juan Francisco Cantero Hurtado, Matthew Flatt, racke...@googlegroups.com
Juan Francisco Cantero Hurtado <i...@juanfra.info> writes:

> On Friday 18 December 2015 13:42:32 Matthew Flatt wrote:
>>At Fri, 18 Dec 2015 13:21:29 -0400, David Bremner wrote:
>>>
>>> I can try a single threaded build later, but I wondered if the huge
>>> number of warnings from the JIT code was normal.
>>
>>Those warnings are normal for some variants of gcc, but the JIT should
>>not be used at all for PPC64, so that seems likely to be the problem.
>>
>>The PPC JIT is currently enabled by
>>
>> # if defined(__powerpc__) && !defined(__powerpc64__)
>
> I added that change to sconfig. I tested racket on fedora ppc64 big
> endian and fedora ppc64 little endian. Both are IBM POWER systems.

I don't understand: what did you have to add? According to Matthew, that
test is already there, although I didn't find the exact test anywhere in
the 6.3 source.

> Everything works fine, and only one bug remains on big endian. I
> can reproduce the same bug on OpenBSD powerpc (32-bit big endian).
> Anyway, racket always builds fine.

Can you build successfully with -j4 ?

d

Matthew Flatt

Dec 18, 2015, 6:10:04 PM
to David Bremner, Juan Francisco Cantero Hurtado, racke...@googlegroups.com
At Fri, 18 Dec 2015 19:04:32 -0400, David Bremner wrote:
> Juan Francisco Cantero Hurtado <i...@juanfra.info> writes:
>
> > On Friday 18 December 2015 13:42:32 Matthew Flatt wrote:
> >>At Fri, 18 Dec 2015 13:21:29 -0400, David Bremner wrote:
> >>>
> >>> I can try a single threaded build later, but I wondered if the huge
> >>> number of warnings from the JIT code was normal.
> >>
> >>Those warnings are normal for some variants of gcc, but the JIT should
> >>not be used at all for PPC64, so that seems likely to be the problem.
> >>
> >>The PPC JIT is currently enabled by
> >>
> >> # if defined(__powerpc__) && !defined(__powerpc64__)
> >
> > I added that change to sconfig. I tested racket on fedora ppc64 big
> > endian and fedora ppc64 little endian. Both are IBM POWER systems.
>
> I don't understand: what did you have to add? According to Matthew, that
> test is already there, although I didn't find the exact test anywhere in
> the 6.3 source.

Ah --- looking again, I see that the condition above wasn't in v6.3.
Juan Francisco supplied the patch in October, but after the branch for
v6.3.

Juan Francisco Cantero Hurtado

Dec 18, 2015, 6:52:00 PM
to racke...@googlegroups.com, public-racket-dev-/J...@plane.gmane.org
git master builds fine and racket 6.3 needs this patch:
https://github.com/racket/racket/commit/e957a7d.patch

David Bremner

Dec 24, 2015, 2:37:07 PM
to Juan Francisco Cantero Hurtado, racke...@googlegroups.com, public-racket-dev-/J...@plane.gmane.org
Just to confirm: that patch fixes the racket 6.3 build on
Debian. There are now problems on armel, which I will follow up on if I
can reproduce them.