Aranym for RPi


philippe.noble

Oct 10, 2022, 11:37:57 AM
to ARAnyM
Hi,

I tried to build the latest commit for a Raspberry Pi 400 and got bad results.
Aranym standard builds correctly but is 2 times slower than the version from 2014 that I use in BeePi.
Aranym-JIT can't be built at all.

For now BeePi is still stuck with commit 7373ac9 which is the last one working correctly on a RPi.

Have you had better luck than me?

Chris Jenkins

Oct 10, 2022, 4:39:27 PM
to philippe.noble, ARAnyM
Hi Philippe,

(Apologies, my first reply only went to you Philippe because I forgot I have to click Reply-All in Google Groups :-/ )

I have successfully built aranym-mmu on my PI 4. It seems to work ok (I can boot FreeMiNT and I'm currently watching it play Doom in a window). So at least it works after I build it.

What is it that is 2 times slower than the version from 2014 that is used in BeePi? And how did you measure it (if it isn't obvious)?

Re Aranym-JIT, what happens when you try to build it?

When I run
../configure --prefix= --enable-addressing=direct --enable-usbhost --disable-sdl2 --enable-jit-compiler --enable-jit-fpu
I get the following error:

configure: error: Sorry, extended segfault handler not supported on your platform


Is that what you get?

Looking at configure.ac, it looks like I'm getting this because it doesn't recognise my CPU_TYPE. It recognises the arm architecture but unfortunately I'm running a 64-bit OS and my CPU_TYPE is therefore aarch64. I imagine there is some work to figure out whether arm 64 actually supports an extended segfault handler (not sure what that is yet tbh) and what command line flags are needed in order to use it.
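
As a rough illustration of the check described above, the following shell sketch maps a host CPU string to the outcome one might expect from the JIT configure step. The function name `jit_host_check` and the list of accepted CPU names are assumptions made for this example; the real logic lives in configure.ac and may differ.

```shell
#!/bin/sh
# Hypothetical helper (not part of ARAnyM): guess whether the JIT configure
# check for the extended segfault handler would pass for a given host CPU.
jit_host_check() {
    case "$1" in
        i?86|x86_64)   echo "passes (x86)" ;;
        aarch64|arm64) echo "fails (no extended segfault handler)" ;;
        arm*)          echo "passes (32-bit ARM)" ;;
        *)             echo "unknown" ;;
    esac
}

# Example: classify the machine this script runs on.
jit_host_check "$(uname -m)"
```

On a Pi running a 64-bit OS, `uname -m` reports aarch64, which lands in the failing branch of this sketch.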

I've run out of time to look at this now and I will be travelling for the next few days but I'm keen to look at this further and try to figure it out when I can.

Cheers,
Chris

--
You received this message because you are subscribed to the Google Groups "ARAnyM" group.
To unsubscribe from this group and stop receiving emails from it, send an email to aranym+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/aranym/376d5358-f085-4500-93e3-7b2d87e2bc4fn%40googlegroups.com.

Thorsten Otto

Oct 10, 2022, 11:29:36 PM
to ara...@googlegroups.com

On Monday, 10 October 2022 22:39:14 CEST Chris Jenkins wrote:

> I imagine there is some work to figure out whether arm 64 actually supports
> an extended segfault handler (not sure what that is yet tbh) and what command
> line flags are needed in order to use it.

 

A bit simplified:

 

When generating "normal" code, every access to Atari memory is checked for accesses to I/O memory (as in https://github.com/aranym/aranym/blob/master/src/uae_cpu/memory-uae.h#L197). When JIT code is generated, that check is omitted, and it relies on such accesses to generate a segfault. The segfault handler will then check whether that was actually a real segfault caused by accessing invalid memory, or an access to I/O space. In the latter case, it calls the HW_get*/HW_put* functions, and then actually has to perform emulation of the generated (ARM) instructions. This has to take into account not only instructions generated by the JIT compiler, but also instructions generated by the C compiler, since before JIT code is generated, the normal emulation routines are used.

 

So the handler has to know how to emulate the ARM instructions, and how to access the registers. This is system dependent.

 

But I expect that not to be so difficult; the code for 32-bit and 64-bit x86 is also very similar in this area.

 

Much more work has to be put IMHO into the code generation. This is rather tricky, and is simplified a lot on x86 by the fact that you can restrict every single instruction to use 32-bit addresses only, even on x86_64. This is absolutely necessary, since there are a lot of places which assume that processor registers can be used for m68k registers. If they must use real 64-bit addresses, this is not true anymore (they are sometimes written to the regs struct, which only has room for the 32-bit m68k d0/a0 registers).

 

It also assumes that you can reach some global variables like the regs struct using 32bit offsets from the JIT generated code. I don't think the aarch64 has similar constructs.

 

But of course, supporting aarch64 would be a big goal. Every modern ARM device nowadays has such an architecture.



Chris Jenkins

Oct 11, 2022, 1:48:27 AM
to ARAnyM
Many thanks for the explanation, Thorsten. So was my problem simply that I'm trying to build Aranym-JIT on an architecture (aarch64) that it doesn't support?

I'm assuming that the best plan for me when I try again to build it would be to use an aarch32 toolchain. I'm guessing that there won't be a need to run Aranym as a 64-bit binary so long as the CPU and the OS still support 32-bit.

I do take your point that supporting aarch64 would be a good goal. It would be really nice to have Aranym-JIT running natively on my M1 Mac one day...


Philippe Noble

Oct 11, 2022, 4:12:59 AM
to Chris Jenkins, ARAnyM

Hi Chris,

> What is it that is 2 times slower than the version from 2014 that is used in BeePi? And how did you measure it (if it isn't obvious)?

I did some measurements with KRONOS. Builds are done with: ./configure --disable-sdl2
Aranym std 1.0.2-7373ac9 : CPU speed vs TT = 690
Aranym std 1.1.0-cur : CPU speed vs TT = 300

I tried with SDL2: same result.
 
> Re Aranym-JIT, what happens when you try to build it?
 
Aranym JIT 1.1.0-cur : ./configure --enable-jit-compiler --enable-jit-fpu --disable-sdl2

make[3]: *** [Makefile:1871: uae_cpu/compiler/compemu_fpp.o] Error 1
make[3]: *** Attente des tâches non terminées....
make[3] : on quitte le répertoire « /home/pi/Builds/aranym/src »
make[2]: *** [Makefile:2315: all-recursive] Error 1
make[2] : on quitte le répertoire « /home/pi/Builds/aranym/src »
make[1]: *** [Makefile:482: all-recursive] Error 1
make[1] : on quitte le répertoire « /home/pi/Builds/aranym »
make: *** [Makefile:422: all] Error 2


> When I run
> ../configure --prefix= --enable-addressing=direct --enable-usbhost --disable-sdl2 --enable-jit-compiler --enable-jit-fpu
> I get the following error:
>
> configure: error: Sorry, extended segfault handler not supported on your platform
>
> Is that what you get?

I use 32-bit Raspbian Buster, so that's not the same issue.

Thorsten Otto

Oct 11, 2022, 7:02:38 AM
to ara...@googlegroups.com

On Tuesday, 11 October 2022 10:12:46 CEST Philippe Noble wrote:

> Aranym std 1.0.2-7373ac9 : CPU speed vs TT = 690
> Aranym std 1.1.0-cur : CPU speed vs TT = 300

 

IIRC, there might be some issue due to building for a not-so-optimal architecture (e.g. armv6 vs. armv7). Maybe you can try to specify CFLAGS & CXXFLAGS before invoking configure, or if that does not work, change configure.ac (and then regenerate configure, of course).

 

>I tried with SDL2 : same result.

 

I don't think that SDL2 vs. SDL has much impact on the emulation speed.

 

>make[3]: *** Attente des tâches non terminées....

 

Google translates that to "Waiting for unfinished tasks....", so some error must have occurred before that.

 

Eero Tamminen

Oct 11, 2022, 3:06:33 PM
to ara...@googlegroups.com
Hi,

On 11.10.2022 14.02, Thorsten Otto wrote:
> On Dienstag, 11. Oktober 2022 10:12:46 CEST Philippe Noble wrote:
>
>> Aranym std 1.0.2-7373ac9 : CPU speed vs TT = 690 Aranym std 1.1.0-cur : CPU
>> speed vs TT = 300
>
> IIRC, there might be some issue due to using a not-so-optimal architecture
> (eg. arm6 vs arm7). Maybe you can try to specify CFLAGS & CXXFLAGS before
> invoking configure, or if that does not work, change the configure.ac (and
> then rebuild configure of course)
>
>> I tried with SDL2 : same result.

Unlike SDL1, SDL2 assumes HW acceleration.

If the host is missing a real GL HW driver, rendering can end up using
a software GL driver.


- Eero

Chris Jenkins

Nov 9, 2022, 4:25:34 PM
to Philippe Noble, ARAnyM
Hi Philippe,

I finally found some time to try to build Aranym on my Pi400 (running a 32-bit OS). Sorry for the delay!

I can build aranym and aranym-mmu ok. I now need to install an OS and then run KRONOS to get some benchmarks. I assume you ran the benchmarks against aranym (sans MMU). Is that correct?

Cheers,
Chris


On Mon, 7 Nov 2022 at 19:32, Chris Jenkins <cdpje...@gmail.com> wrote:

 
Absolutely, that's one solution. You can also buy another SD card and use the same Pi 4.
Aranym standard can be built. The issue is building Aranym-JIT

I'll get another SD card and install it in my Pi400 (sadly this means I can't run BeePi while I'm working on it but never mind!)

I'll try building AranymJIT and then I'll let you know how it goes.

Cheers,
Chris

 

And if I can do that successfully, what next?

Well, we are entering the most interesting part, I guess.

Aranym standard runs 50% slower than the version from 2014 I use in BeePi. According to Thorsten, it looks like an optimization issue with the compilation for armv7.
Aranym-JIT has had several compatibility issues since 2015! So the first step will be to test whether they are still there.


(Once again, I confess that the only platform I've successfully built Aranym on is Mac, using Xcode, so I have a bit of work to do to figure it out on Linux.)

I don't have a lot of time this week but I can put this on my list and try to do it this time!

Don't worry, we have time.
Thanks again for your help.


On Mon, 7 Nov 2022 at 15:55, Philippe Noble <philipp...@gmail.com> wrote:
Hi Chris,

Have you had time to look into this?

Best regards,

Philippe

Chris Jenkins

Nov 9, 2022, 4:28:31 PM
to Philippe Noble, ARAnyM
Regarding aranym-jit, I think the build fails for me in the same way that it fails for you. See [0] build failure below.

I'm going to try Thorsten's suggestion of trying to specify a different arm architecture using CFLAGS/CXXFLAGS in the build, though I confess my gcc skills are not strong so let's see if I can figure it out.


[0]
make  all-recursive
make[1]: Entering directory '/home/cdpj/GitProjects/aranym/jit'
Making all in src
make[2]: Entering directory '/home/cdpj/GitProjects/aranym/jit/src'
Making all in uae_cpu
make[3]: Entering directory '/home/cdpj/GitProjects/aranym/jit/src/uae_cpu'
make  all-am
make[4]: Entering directory '/home/cdpj/GitProjects/aranym/jit/src/uae_cpu'
make[4]: Nothing to be done for 'all-am'.
make[4]: Leaving directory '/home/cdpj/GitProjects/aranym/jit/src/uae_cpu'
make[3]: Leaving directory '/home/cdpj/GitProjects/aranym/jit/src/uae_cpu'
make[3]: Entering directory '/home/cdpj/GitProjects/aranym/jit/src'
  CXX      uae_cpu/compiler/compemu_fpp.o
../../src/uae_cpu/compiler/compemu_fpp.cpp: In function ‘int get_fp_value(uint32, uint16)’:
../../src/uae_cpu/compiler/compemu_fpp.cpp:140:4: error: ‘fmovi_rm’ was not declared in this scope
  140 |    fmovi_rm(FS1, (uintptr) temp_fp);
      |    ^~~~~~~~
../../src/uae_cpu/compiler/compemu_fpp.cpp:156:4: error: ‘fmovs_rm’ was not declared in this scope
  156 |    fmovs_rm(FS1, (uintptr) temp_fp);
      |    ^~~~~~~~
../../src/uae_cpu/compiler/compemu_fpp.cpp:272:3: error: ‘fmovi_rm’ was not declared in this scope
  272 |   fmovi_rm(FS1, (uintptr) temp_fp);
      |   ^~~~~~~~
../../src/uae_cpu/compiler/compemu_fpp.cpp:278:3: error: ‘fmovs_rm’ was not declared in this scope
  278 |   fmovs_rm(FS1, (uintptr) temp_fp);
      |   ^~~~~~~~
../../src/uae_cpu/compiler/compemu_fpp.cpp:292:3: error: ‘fmov_ext_rm’ was not declared in this scope; did you mean ‘mov_b_rm’?
  292 |   fmov_ext_rm(FS1, (uintptr) (temp_fp));
      |   ^~~~~~~~~~~
      |   mov_b_rm
../../src/uae_cpu/compiler/compemu_fpp.cpp:310:3: error: ‘fmov_rm’ was not declared in this scope
  310 |   fmov_rm(FS1, (uintptr) (temp_fp));
      |   ^~~~~~~
../../src/uae_cpu/compiler/compemu_fpp.cpp: In function ‘int put_fp_value(int, uint32, uint16)’:
../../src/uae_cpu/compiler/compemu_fpp.cpp:340:3: error: ‘fmov_rr’ was not declared in this scope
  340 |   fmov_rr(dest_reg, val);
      |   ^~~~~~~
../../src/uae_cpu/compiler/compemu_fpp.cpp:356:4: error: ‘fmovi_mr’ was not declared in this scope
  356 |    fmovi_mr((uintptr) temp_fp, val);
      |    ^~~~~~~~
../../src/uae_cpu/compiler/compemu_fpp.cpp:371:4: error: ‘fmovs_mr’ was not declared in this scope
  371 |    fmovs_mr((uintptr) temp_fp, val);
      |    ^~~~~~~~
../../src/uae_cpu/compiler/compemu_fpp.cpp:480:3: error: ‘fmovi_mr’ was not declared in this scope
  480 |   fmovi_mr((uintptr) temp_fp, val);
      |   ^~~~~~~~
../../src/uae_cpu/compiler/compemu_fpp.cpp:486:3: error: ‘fmovs_mr’ was not declared in this scope
  486 |   fmovs_mr((uintptr) temp_fp, val);
      |   ^~~~~~~~
../../src/uae_cpu/compiler/compemu_fpp.cpp:492:3: error: ‘fmov_ext_mr’ was not declared in this scope; did you mean ‘mov_b_mr’?
  492 |   fmov_ext_mr((uintptr) temp_fp, val);
      |   ^~~~~~~~~~~
      |   mov_b_mr
../../src/uae_cpu/compiler/compemu_fpp.cpp:512:3: error: ‘fmov_mr’ was not declared in this scope
  512 |   fmov_mr((uintptr) temp_fp, val);
      |   ^~~~~~~
../../src/uae_cpu/compiler/compemu_fpp.cpp: In function ‘void comp_fscc_opp(uint32, uint16)’:
../../src/uae_cpu/compiler/compemu_fpp.cpp:620:2: error: ‘fflags_into_flags’ was not declared in this scope
  620 |  fflags_into_flags(S2);
      |  ^~~~~~~~~~~~~~~~~
../../src/uae_cpu/compiler/compemu_fpp.cpp: In function ‘void comp_fbcc_opp(uint32)’:
../../src/uae_cpu/compiler/compemu_fpp.cpp:756:2: error: ‘fflags_into_flags’ was not declared in this scope
  756 |  fflags_into_flags(S2);
      |  ^~~~~~~~~~~~~~~~~
../../src/uae_cpu/compiler/compemu_fpp.cpp: In function ‘void comp_fpp_opp(uint32, uint16)’:
../../src/uae_cpu/compiler/compemu_fpp.cpp:1170:8: error: ‘fmov_ext_mr’ was not declared in this scope; did you mean ‘mov_b_mr’?
 1170 |        fmov_ext_mr((uintptr) temp_fp, reg);
      |        ^~~~~~~~~~~
      |        mov_b_mr
../../src/uae_cpu/compiler/compemu_fpp.cpp:1190:8: error: ‘fmov_ext_mr’ was not declared in this scope; did you mean ‘mov_b_mr’?
 1190 |        fmov_ext_mr((uintptr) temp_fp, reg);
      |        ^~~~~~~~~~~
      |        mov_b_mr
../../src/uae_cpu/compiler/compemu_fpp.cpp:1265:8: error: ‘fmov_ext_rm’ was not declared in this scope; did you mean ‘mov_b_rm’?
 1265 |        fmov_ext_rm(reg, (uintptr) (temp_fp));
      |        ^~~~~~~~~~~
      |        mov_b_rm
../../src/uae_cpu/compiler/compemu_fpp.cpp:1285:8: error: ‘fmov_ext_rm’ was not declared in this scope; did you mean ‘mov_b_rm’?
 1285 |        fmov_ext_rm(reg, (uintptr) (temp_fp));
      |        ^~~~~~~~~~~
      |        mov_b_rm
../../src/uae_cpu/compiler/compemu_fpp.cpp:1434:5: error: ‘fmov_pi’ was not declared in this scope
 1434 |     fmov_pi(reg);
      |     ^~~~~~~
../../src/uae_cpu/compiler/compemu_fpp.cpp:1437:5: error: ‘fmov_log10_2’ was not declared in this scope
 1437 |     fmov_log10_2(reg);
      |     ^~~~~~~~~~~~
../../src/uae_cpu/compiler/compemu_fpp.cpp:1443:5: error: ‘fmov_rm’ was not declared in this scope
 1443 |     fmov_rm(reg, (uintptr) & const_e);
      |     ^~~~~~~
../../src/uae_cpu/compiler/compemu_fpp.cpp:1447:5: error: ‘fmov_log2_e’ was not declared in this scope
 1447 |     fmov_log2_e(reg);
      |     ^~~~~~~~~~~
../../src/uae_cpu/compiler/compemu_fpp.cpp:1457:5: error: ‘fmov_0’ was not declared in this scope
 1457 |     fmov_0(reg);
      |     ^~~~~~
../../src/uae_cpu/compiler/compemu_fpp.cpp:1460:5: error: ‘fmov_loge_2’ was not declared in this scope
 1460 |     fmov_loge_2(reg);
      |     ^~~~~~~~~~~
../../src/uae_cpu/compiler/compemu_fpp.cpp:1470:5: error: ‘fmov_1’ was not declared in this scope
 1470 |     fmov_1(reg);
      |     ^~~~~~
../../src/uae_cpu/compiler/compemu_fpp.cpp:1511:4: error: ‘dont_care_fflags’ was not declared in this scope; did you mean ‘dont_care_flags’?
 1511 |    dont_care_fflags();
      |    ^~~~~~~~~~~~~~~~
      |    dont_care_flags
../../src/uae_cpu/compiler/compemu_fpp.cpp:1518:4: error: ‘fmov_rr’ was not declared in this scope
 1518 |    fmov_rr(reg, src);
      |    ^~~~~~~
../../src/uae_cpu/compiler/compemu_fpp.cpp:1590:4: error: ‘fsqrt_rr’ was not declared in this scope
 1590 |    fsqrt_rr(reg, src);
      |    ^~~~~~~~
../../src/uae_cpu/compiler/compemu_fpp.cpp:1673:4: error: ‘fsin_rr’ was not declared in this scope
 1673 |    fsin_rr(reg, src);
      |    ^~~~~~~
../../src/uae_cpu/compiler/compemu_fpp.cpp:1701:4: error: ‘fetox_rr’ was not declared in this scope
 1701 |    fetox_rr(reg, src);
      |    ^~~~~~~~
../../src/uae_cpu/compiler/compemu_fpp.cpp:1718:4: error: ‘ftwotox_rr’ was not declared in this scope
 1718 |    ftwotox_rr(reg, src);
      |    ^~~~~~~~~~
../../src/uae_cpu/compiler/compemu_fpp.cpp:1768:4: error: ‘flog2_rr’ was not declared in this scope
 1768 |    flog2_rr(reg, src);
      |    ^~~~~~~~
../../src/uae_cpu/compiler/compemu_fpp.cpp:1787:4: error: ‘fabs_rr’ was not declared in this scope
 1787 |    fabs_rr(reg, src);
      |    ^~~~~~~
../../src/uae_cpu/compiler/compemu_fpp.cpp:1817:4: error: ‘fneg_rr’ was not declared in this scope
 1817 |    fneg_rr(reg, src);
      |    ^~~~~~~
../../src/uae_cpu/compiler/compemu_fpp.cpp:1845:4: error: ‘fcos_rr’ was not declared in this scope
 1845 |    fcos_rr(reg, src);
      |    ^~~~~~~
../../src/uae_cpu/compiler/compemu_fpp.cpp:1886:4: error: ‘fdiv_rr’ was not declared in this scope
 1886 |    fdiv_rr(reg, src);
      |    ^~~~~~~
../../src/uae_cpu/compiler/compemu_fpp.cpp:1904:4: error: ‘frem_rr’ was not declared in this scope
 1904 |    frem_rr(reg, src);
      |    ^~~~~~~
../../src/uae_cpu/compiler/compemu_fpp.cpp:1923:4: error: ‘fadd_rr’ was not declared in this scope; did you mean ‘add_b’?
 1923 |    fadd_rr(reg, src);
      |    ^~~~~~~
      |    add_b
../../src/uae_cpu/compiler/compemu_fpp.cpp:1942:4: error: ‘fmul_rr’ was not declared in this scope
 1942 |    fmul_rr(reg, src);
      |    ^~~~~~~
../../src/uae_cpu/compiler/compemu_fpp.cpp:1979:4: error: ‘frem1_rr’ was not declared in this scope
 1979 |    frem1_rr(reg, src);
      |    ^~~~~~~~
../../src/uae_cpu/compiler/compemu_fpp.cpp:2025:4: error: ‘fsub_rr’ was not declared in this scope; did you mean ‘sub_b’?
 2025 |    fsub_rr(reg, src);
      |    ^~~~~~~
      |    sub_b

make[3]: *** [Makefile:1871: uae_cpu/compiler/compemu_fpp.o] Error 1
make[3]: Leaving directory '/home/cdpj/GitProjects/aranym/jit/src'

make[2]: *** [Makefile:2315: all-recursive] Error 1
make[2]: Leaving directory '/home/cdpj/GitProjects/aranym/jit/src'
make[1]: *** [Makefile:484: all-recursive] Error 1
make[1]: Leaving directory '/home/cdpj/GitProjects/aranym/jit'
make: *** [Makefile:424: all] Error 2

Chris Jenkins

Nov 9, 2022, 7:05:39 PM
to Philippe Noble, ARAnyM
Hi Philippe,

I get similar results to you:

latest Aranym from git (880e24dd), speed vs TT == 363
aranym 7373ac9 from git (from 2014), speed vs TT == 651

So the latest Aranym is about half the speed of the version from 2014. I built both myself on my Pi400.
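
To put a number on "about half", here is a small sketch that expresses one KRONOS "CPU speed vs TT" score as a percentage of another. The helper name `relative_speed` is made up for this example; the scores are the ones reported in this thread.

```shell
#!/bin/sh
# Express a new KRONOS "CPU speed vs TT" score as a percentage of an old one.
# relative_speed is a hypothetical helper name, not an existing tool.
relative_speed() {
    awk -v new="$1" -v old="$2" 'BEGIN { printf "%.1f%%\n", new / old * 100 }'
}

relative_speed 363 651   # Chris: latest git (880e24dd) vs. 7373ac9
relative_speed 300 690   # Philippe: 1.1.0-cur vs. 1.0.2-7373ac9
```

With these figures the current builds reach roughly 56% and 43% of the 2014 build's score.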

One other thing that I noticed, which might or might not be relevant: Aranym built from the latest git commit seemed to be less stable. It froze several times when I tried to run the KRONOS benchmark and was pegged at 100% CPU, even when idle. The version from 2014 did not do that; it's currently idling at 8.3% CPU on my Pi.

So I'm not really sure where to go with this but there is definitely a problem here :-/

Cheers,
Chris



Philippe Noble

Nov 10, 2022, 12:56:09 PM
to Chris Jenkins, ARAnyM
Hi Chris,

Thanks. At least there are two of us with the same result. I feel a bit less alone ;-)
There is definitely something wrong with RPi compatibility, and there has been since 2014.
Where to look now? I don't know.

Philippe

Chris Ridd

Nov 10, 2022, 1:15:56 PM
to Philippe Noble, ARAnyM, Chris Jenkins
Normally you’d use `git bisect` to track down a regression like this. Hopefully it won’t take too many builds… how long does a build take on a Pi?

Chris

Philippe Noble

Nov 10, 2022, 2:14:47 PM
to Chris Ridd, ARAnyM
I already did that years ago.
The version used in BeePi is the last one working correctly : https://github.com/aranym/aranym/issues/9
Searching with git bisect I have found that the last good commit was 7373ac9
and the first bad a030b4c
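
For readers who have not used it, here is a self-contained toy run of `git bisect run`. It builds a throwaway repository (not the ARAnyM tree) in which the fifth of eight commits introduces a "regression", then lets git find it. The file names and the test command are invented for this sketch; for ARAnyM, each step would instead mean a rebuild and a KRONOS run judged by hand with `git bisect good` / `git bisect bad`.

```shell
#!/bin/sh
# Toy `git bisect run` session in a throwaway repository (not the ARAnyM tree).
repo=$(mktemp -d) || exit 1
cd "$repo" || exit 1
git init -q .
git config user.name "Bisect Demo"
git config user.email "demo@example.invalid"

# Eight commits; from commit 5 onwards speed.txt says "slow" (the regression).
i=1
while [ "$i" -le 8 ]; do
    if [ "$i" -ge 5 ]; then echo slow > speed.txt; else echo fast > speed.txt; fi
    echo "$i" > counter.txt
    git add speed.txt counter.txt
    git commit -q -m "commit $i"
    if [ "$i" -eq 1 ]; then good=$(git rev-parse HEAD); fi
    if [ "$i" -eq 5 ]; then expected=$(git rev-parse HEAD); fi
    i=$((i + 1))
done

# Mark HEAD as bad and commit 1 as good, then let git drive the search.
# The test command must exit 0 for a good revision and non-zero for a bad one.
git bisect start HEAD "$good" >/dev/null
git bisect run grep -q fast speed.txt >/dev/null
first_bad=$(git rev-parse refs/bisect/bad)

echo "first bad commit: $(git log -1 --format=%s "$first_bad")"
git bisect reset >/dev/null 2>&1
```

With eight commits the search needs only about three test runs, which is why bisecting stays practical even when every step is a slow build on a Pi.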

Chris Jenkins

Nov 10, 2022, 3:03:52 PM
to Philippe Noble, Chris Ridd, ARAnyM
Well this is what changed in that commit. The commit was supposed to fix a faulty case (and maybe it did) but it definitely affects handling of different arm arch versions.

commit a030b4cbda1cf4a815def8349815185f034ca290
Author: Jens Heitmann <jh_dr...@users.sourceforge.net>
Date:   Sat Nov 1 15:00:56 2014 +0100

    Fixed faulty case

diff --git a/ChangeLog b/ChangeLog
index 245fa5f3..1961b430 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,7 @@
+
+2014/11/01 - Jens
+configure.ac - Fixed faulty case
+
 2014/11/01 - Jens
 configure.ac - Enable ARMv7 to 9 and set -marm to disable thumb instructions, not covered by sigsegv handler
 
diff --git a/configure.ac b/configure.ac
index 642bb67c..2fa6e3f7 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1345,7 +1345,7 @@ elif [[ "x$GCC" = "xyes" -a "x$HAVE_ARM" = "xyes" ]]; then
  dnl ARM CPU
   if [[ "x$HAVE_GAS" = "xyes" ]]; then
     case "$host_cpu" in
-       armv[6-9]*)
+       armv6*|armv7*|armv8*|armv9*)
            ASM_OPTIMIZATIONS="ARM/V6 architecture w optimized flags"
           DEFINES="$DEFINES -DARMV6_ASSEMBLY -DARM_ASSEMBLY -DOPTIMIZED_FLAGS"
            CFLAGS="$CFLAGS -march=armv6 -marm"


Chris Ridd

Nov 10, 2022, 6:01:44 PM
to Chris Jenkins, ARAnyM, Philippe Noble
That’s odd, to me the 2 case patterns seem like they would match the same strings.

Chris

Miro Kropáček

Nov 11, 2022, 2:36:37 AM
to ARAnyM
Unless I have overlooked something, the original version would match any string starting with armv (6-9 can also be used zero times), while the new one would match only strings starting with armv6, ... armv9.

Thorsten Otto

Nov 11, 2022, 7:03:29 AM
to ara...@googlegroups.com

On Friday, 11 November 2022 08:36:23 CET Miro Kropáček wrote:

> the original version would match any string starting with armv (6-9 can be
> used also zero times)

 

No, it doesn't. The shell does not use regular expressions here but glob patterns, so the "*" stands for any run of following characters and does not quantify the bracket expression. You can simply verify it with

 

 

$ case armv in armv[6-9]*) echo match;; *) echo no match;; esac

 

output:

no match

 

So I agree that, IMHO, both cases match the same strings. But there must be some reason for that patch; maybe different behaviour of the shell used on the RPi? Is that a ksh maybe?
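
The same check can be extended to compare both patterns from commit a030b4c side by side. This is only a sketch: the helper names `old_pattern` and `new_pattern` are invented, and the sample `host_cpu` values are illustrative (only armv7l and aarch64 are reported in this thread).

```shell
#!/bin/sh
# Compare the two case patterns from commit a030b4c on sample host_cpu values.
# old_pattern/new_pattern are hypothetical helper names for this demo.
old_pattern() { case "$1" in armv[6-9]*) echo match ;; *) echo "no match" ;; esac; }
new_pattern() { case "$1" in armv6*|armv7*|armv8*|armv9*) echo match ;; *) echo "no match" ;; esac; }

for cpu in armv armv5tel armv6l armv7l armv8l aarch64; do
    printf '%-9s old: %-9s new: %s\n' "$cpu" "$(old_pattern "$cpu")" "$(new_pattern "$cpu")"
done
```

In a POSIX shell both helpers agree on every sample, which supports the reading that the patch did not change which hosts are matched.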

 

In any case, I suspect the bad performance results from the "-march=armv6" switch. Maybe someone can try different settings there?

 

If that fixes it, it would IMHO be better to remove those switches from configure.ac and use the compiler's defaults instead. If you really need them for whatever reason, you can always pass them to configure by using CFLAGS="-march=armv6" ./configure etc. But getting rid of such switches is not possible without hacking the configure script.
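
A small sketch of why hard-coded switches cannot be overridden from the command line: the 2014 configure.ac appends its own flags after whatever the user passed in. The function name `configure_cflags` is invented, and `-O2 -march=armv7-a` is just an example of user-supplied flags.

```shell
#!/bin/sh
# Simulate how configure.ac extended CFLAGS in commit a030b4c:
#   CFLAGS="$CFLAGS -march=armv6 -marm"
# configure_cflags is a hypothetical stand-in for that step; it runs in a
# subshell so the caller's own CFLAGS variable is left untouched.
configure_cflags() (
    CFLAGS="$1"
    CFLAGS="$CFLAGS -march=armv6 -marm"
    echo "$CFLAGS"
)

# The user asks for armv7-a, but the script's own -march is appended after it.
configure_cflags "-O2 -march=armv7-a"
```

As far as I know, GCC lets the last of several conflicting `-march` options take effect, so the appended `-march=armv6` would override the user's choice in this scenario.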

 

Chris Jenkins

Nov 11, 2022, 8:31:01 AM
to Thorsten Otto, ara...@googlegroups.com
Pi uses bash. (Well, mine with relatively recent Raspberry Pi OS aka Raspbian does.) It's basically just a slightly customised Debian Linux that shouldn't do anything very different from any other Linux.

I'll play with this stuff when I have the chance to switch it on.

From reading the code last night, the two things that are actually affected by that check are:

- setting the ARMV6_ASSEMBLY macro (which then causes a lot of more specialised-looking instructions to be used in the UAE code)
- adding -march=armv6 -marm to the CFLAGS/CXXFLAGS


> In any case, I suspect the bad performance results from the "-march=armv6" switch. Maybe someone can try different settings there?


I'll try this and report back.

> If that fixes it, it would IMHO be better to remove those switches from configure.ac and use the compiler's defaults instead. If you really need them for whatever reason, you can always pass them to configure by using CFLAGS="-march=armv6" ./configure etc. But getting rid of such switches is not possible without hacking the configure script.


Sounds like a bigger task. I really hate autoconf/autogen but will see what I can do.
 

 


Chris Jenkins

Nov 11, 2022, 6:07:06 PM
to Thorsten Otto, ara...@googlegroups.com
I have found what triggered the slowdown (though I don't understand why): It was the ARMV6_ASSEMBLY macro.

If I remove it in configure.ac, build Aranym and run kronos, CPU performance goes up to 661.7, which is the highest that I've achieved on this Pi 400.

I confess I still don't understand precisely how it has this effect. That macro controls a lot of hand-crafted ARM assembly code in the UAE code but my ARM asm skills aren't great and I don't know which part is causing the problem or how to investigate it further.

BTW, host_cpu seems to be set to armv7l on my Pi400. My understanding is that the CPU is actually a Cortex-A72 that supports ARM v8-A but I guess the instruction set gets detected as armv7l when running in 32-bit mode. (On a Pi 4 that is running in 64-bit mode, host_cpu gets set to aarch64).

So anyway, it appears that not setting ARMV6_ASSEMBLY results in a much faster Aranym. Below [0] is a patch - which I could turn into a pull request - that removes it. I don't know if I'm ready to make that change, though, given that I don't understand the relevant UAE code yet.

What do you lot think?

Cheers,
Chris



[0]
---
 configure.ac | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/configure.ac b/configure.ac
index a082491b..f91ba5fd 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1365,7 +1365,7 @@ elif test "$GCC" = yes && $HAVE_ARM; then
    ;;
  armv7*|armv8*|armv9*)
            ASM_OPTIMIZATIONS="ARM/V7+ architecture w optimized flags"
-   DEFINES="$DEFINES -DARMV6_ASSEMBLY -DARM_ASSEMBLY -DOPTIMIZED_FLAGS"
+   DEFINES="$DEFINES -DARM_ASSEMBLY -DOPTIMIZED_FLAGS"
            CFLAGS="$CFLAGS -marm"
            CXXFLAGS="$CXXFLAGS -marm"
    ;;
--
2.30.2

Paul Wratt

Nov 12, 2022, 2:50:30 AM
to ARAnyM-list
(sry I am slow with this)

> adding -march=armv6 -marm

this now is wrong, but it was fine for original RPi's A & B

verify the output (--cflags and --libs) of sdl2-config or sdl-config
if you --disable-sdl2 (this should be what ./configure sets up), it
should be in the Makefile after you run ./configure

for modern RPi, to get the right arch build values, check the output
of your OS's GCC, it should be something like -march=armv7l-neon and
-mfloat-abi=hard or similar for 32-bit, and -march=armv8-neon for
64-bit (ARMv8/9 don't need the "float" setting)

note: if you want to tweak the FPU instructions:
-mfpu=vfp (generic, works on all RPi and 99% of ARMv6/v7/v8/v9)
-mfpu=vfp2 RPiA/A+B/2B/2B+ all RPi3/4 and CM3/4 and 90% of ARMv6/v7/v8 **
-mfpu=vfp3 Modern RPi2B+ and RPi3/3A+/3B+ and CM3 and any RPi4 in
32-bit mode (not ARMv6, some ARMv7)
-mfpu=vfp4 RPi4 and RPi400 and CM4 (not ARMv6 or ARMv7)

** not 100% sure on this, I dont have access to my original tests,
there are 2 common ARMv6 & ARMv7 that have Neon but no useful VFP (or
flipped) - one is an iMX chip

this should at least tell you what you need to know to make changes:
gcc -march=native -E -v - </dev/null 2>&1 | grep cc1

unfortunately the "cc1" output can't be used directly without
manipulating it first: pick out the -m... options (single dash), which
can be passed to gcc as they are, as far as I know

If you want to know what defines get set with the above output, look
thru the output of:
echo | gcc -dM -E - -march=native

you dont "turn off thumb", instead you dont "turn on ARMv6", just
"turn on ARM", the compiler should sort the rest, but it is not the
proper way to build.

Sorry I never had RPi Zero or Zero2 to test against

If you build on RPi 32bit OS _without_ setting any GCC specific
ARM/Float settings, it will run on 99% of RPi OS and most ARM Debian
OS too

NOTE:

"armhf" OS is ARMv6 compatible Hardware-Float, which will run on any
ARMv7 (incl RPi 4 series),
and is 32bit

"aarch64" OS is ARMv8/v9/v10 and can _not_ run any ARMv6 (Thumb)
code, not all ARMv8 have ARMv7 capability (Mac M1/M2)



Thorsten Otto

Nov 12, 2022, 4:53:43 AM
to ara...@googlegroups.com

On Saturday, 12 November 2022 08:50:26 CET Paul Wratt wrote:

> for modern rpi, to get the right arch build values, check the output
> of your OS's GCC, it should be something like -march=armv7l-neon and
> -mfloat-abi=hard or similar for 32-bit, and -march=armv8-neon for
> 64-bit (Armv8/9 dont need "float" setting)

 

That's why I suggested removing the flags from configure.ac. I vaguely remember that Jens added them some years ago, also for performance reasons. That might only have been valid for older RPi models at that time.

 

Removing those settings from configure.ac would allow the flags to be overridden at configure time. The downside: we might need two different builds, tweaked for different processors, to get acceptable performance in both cases.

 

And of course we also have to check what influence that ARMV6_ASSEMBLY setting has on newer processors. Maybe it should only be activated when really using armv6, but not when using armv7 or better. Ultimately, someone would have to check whether it makes a noticeable difference on armv6.
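
A possible shape for that idea, written as a plain shell sketch rather than an actual configure.ac change: choose the defines per architecture so that ARMV6_ASSEMBLY is only set on real armv6 hosts. The function name `arm_defines` is hypothetical, and whether the armv6 branch should keep the macro is exactly the open question above.

```shell
#!/bin/sh
# Hypothetical selection logic (not the current configure.ac): enable
# ARMV6_ASSEMBLY only when host_cpu really is armv6.
arm_defines() {
    case "$1" in
        armv6*)               echo "-DARMV6_ASSEMBLY -DARM_ASSEMBLY -DOPTIMIZED_FLAGS" ;;
        armv7*|armv8*|armv9*) echo "-DARM_ASSEMBLY -DOPTIMIZED_FLAGS" ;;
        *)                    echo "" ;;
    esac
}

arm_defines armv6l   # keeps the macro (assumption: still useful on armv6)
arm_defines armv7l   # the host_cpu value reported for a Pi 400 with a 32-bit OS
```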

 

 

 

 

Chris Jenkins

Nov 12, 2022, 5:34:09 AM
to Paul Wratt, ARAnyM-list
Hi,

tl;dr I think the Aranym compiler settings are already correct for RPi4[00]. The only thing that seems to make the performance worse is the ARMV6_ASSEMBLY macro. Interested in any opinions on the details below.

Questions:
1) Given that ARMV6_ASSEMBLY seems to be the one thing I've found that affects the performance, I intend to look into this a bit further (starting from a very low level of skill and experience, unfortunately) and understand what the UAE code is actually doing with it. Does that sound sensible and are there any other suggestions from folks?
2) Meta question, where do I go to learn about this stuff? I confess I haven't programmed C professionally for 18 years and I've never coded for ARM systems before this year so interested in any advice that might help me to level up.



More details in response to your suggestions, Paul:
 
>  adding -march=armv6 -marm

> this now is wrong, but it was fine for original RPi's A & B

Apologies, I got this wrong last night. The configure.ac in the latest Aranym adds precisely these CFLAGS (and not armv6):

           CFLAGS="$CFLAGS -marm"

           CXXFLAGS="$CXXFLAGS -marm"


So it's not adding -march=armv6 on my modern RPi (which I'm reliably informed contains an ARMv8 CPU).

> verify the output (--cflags and --libs) of sdl2-config or sdl-config
> if you --disable-sdl2 (this should be what ./configure sets up), it
> should be in the Makefile after you run ./configure

FYI sdl-config on my Pi 400 gives:

$ sdl-config --cflags

-I/usr/include/SDL -D_GNU_SOURCE=1 -D_REENTRANT


$ sdl-config --libs

-L/usr/lib/arm-linux-gnueabihf -lSDL


> If you want to know what defines get set with the above output, look
> thru the output of:
>  echo | gcc -dM -E - -march=native
FYI this gives the following on my Pi400:

gcc -march=native -E -v - </dev/null 2>&1 | grep cc1

 /usr/lib/gcc/arm-linux-gnueabihf/10/cc1 -E -quiet -v -imultilib . -imultiarch arm-linux-gnueabihf - -mfloat-abi=hard -mfpu=vfp -mtls-dialect=gnu -marm -march=armv8-a+crc+simd

 
TBH the above does look sane to me. In particular, -mfpu=vfp looks sensible if it'll work on all RPis.

I'm therefore assuming that I don't need to change anything in those flags; correct me if I'm wrong.

> If you build on RPi 32bit OS _without_ setting any GCC specific
> ARM/Float settings, it will run on 99% of RPi OS and most ARM Debian
> OS too

Yup, that's what it looks like it's doing; building without any GCC specific ARM/float settings now. I can't see any problems with it.

Cheers,
Chris







Chris Jenkins

Nov 12, 2022, 5:47:58 AM11/12/22
to Thorsten Otto, ara...@googlegroups.com
Hi,

> That's why I suggested removing the flags from configure.ac. I vaguely remember that Jens added them some years ago, also for performance reasons. That might only have been valid for the older RPi models of that time.


There was one more flag left in C[XX]FLAGS: -marm. Given what Paul explained to me, it looks like that's redundant and gcc on my RPi OS will pull in that flag anyway.

FYI I did one more build of Aranym with -marm removed from the configure.ac. Kronos CPU is 654.2 i.e. no change from the previous build, which is what I would expect.
 

> And of course we also have to check what influence the ARMV6_ASSEMBLY setting has on newer processors. Maybe it should only be activated when really building for armv6, but not when building for armv7 or better. Ultimately, someone would have to check whether it makes a noticeable difference on armv6.


I'm planning on reading the UAE code and trying to understand the weird instructions that ARMV6_ASSEMBLY pulls in but I don't feel very confident, given my low level of ARM knowledge. Do you have any other suggestions?

I'd be happy to raise a little PR that _only_ sets ARMV6_ASSEMBLY if you're building on an ARMv6 box. That would have the result that anyone building on a modern RPi gets better performance but I don't have an RPi 1 (or other ARMv6 box) to test on and ensure that I've not broken it. TBH I think it would be reasonable at this point to say "Aranym only supports ARMv7 onwards and if you've got an older ARM processor then proceed at your own risk."
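The gating proposed above (use the ARMv6-specific path only when actually targeting ARMv6) could also be expressed at source level, roughly as follows. This is a minimal sketch and not how the build system does it: it assumes a GCC or Clang recent enough to predefine `__ARM_ARCH`, and `EXAMPLE_USE_ARMV6_PATH`, `example_selected_path` and `example_path_self_check` are hypothetical names, not ARAnyM identifiers.

```c
#include <assert.h>
#include <stddef.h>

/* __ARM_ARCH is predefined by recent GCC and Clang when targeting Arm.
 * On other targets it is absent, so "undefined" is treated as "not ARMv6". */
#if defined(__ARM_ARCH) && (__ARM_ARCH == 6)
#define EXAMPLE_USE_ARMV6_PATH 1   /* hypothetical name, for illustration */
#else
#define EXAMPLE_USE_ARMV6_PATH 0
#endif

/* Reports which path this translation unit was compiled with. */
const char *example_selected_path(void)
{
    return EXAMPLE_USE_ARMV6_PATH ? "armv6-specific" : "generic";
}

/* Sanity check: the macro is always 0 or 1 and a label is always returned. */
void example_path_self_check(void)
{
    assert(EXAMPLE_USE_ARMV6_PATH == 0 || EXAMPLE_USE_ARMV6_PATH == 1);
    assert(example_selected_path() != NULL);
}
```

Built with -march=armv6 this would select the ARMv6 path; built with the Pi 400's default -march=armv8-a+crc+simd it would fall back to the generic one.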
 
Cheers,
Chris


 

 

 

 


Thorsten Otto

Nov 12, 2022, 6:28:55 AM11/12/22
to ara...@googlegroups.com

On Saturday, 12 November 2022 11:47:46 CET, Chris Jenkins wrote:

>Do you have any other suggestions?

 

No, sorry, not really. I'm not very experienced in ARM assembly either, and have to read the manual every time I try to understand the code ;)

 

 



Thorsten Otto

Nov 12, 2022, 6:33:48 AM11/12/22
to ara...@googlegroups.com

On Saturday, 12 November 2022 11:47:46 CET, Chris Jenkins wrote:

> TBH I think it would be reasonable at this point to say "Aranym only supports ARMv7 onwards and if you've got an older ARM processor then proceed at your own risk."

 

I think it would be safe to keep the define when building just for armv6. Especially as it does not break the build completely; in the worst case you would get poorer performance (which could be considered a regression, of course, but it is better to get good performance on *current* processors than on ancient ones).



Chris Jenkins

Nov 12, 2022, 7:17:30 AM11/12/22
to Thorsten Otto, ara...@googlegroups.com
Here is a PR that removes -DARMV6_ASSEMBLY and -marm from armv7 and later but leaves armv6 unchanged:


Feedback gratefully received.

> I think it would be safe to keep the define when building just for armv6. Especially as it does not break the build completely; in the worst case you would get poorer performance (which could be considered a regression, of course, but it is better to get good performance on *current* processors than on ancient ones).

Agreed, so that's unchanged. I'd be interested to hear if there's anyone here that is interested in building Aranym on ARMv6 (and how well it works etc). Sadly my first Raspberry Pi was a Pi 3 so I have no idea.

Cheers,
Chris


 




Philippe noble

Nov 12, 2022, 11:49:06 AM11/12/22
to Chris Jenkins, ara...@googlegroups.com
I have tested your commit.
I have slower results on the Pi 400 than you (Kronos CPU @ 580) but that's a nice improvement :-)
The main issue with this version is that it freezes with TOS 4.04, whereas it works well with EmuTOS. I don't know whether it is related or a separate issue.

Philippe


On 12 Nov 2022 at 13:17, Chris Jenkins <cdpje...@gmail.com> wrote:



Chris Jenkins

Nov 12, 2022, 11:59:49 AM11/12/22
to Philippe noble, ARAnyM






> I have slower results on the pi400 than you ( Kronos cpu@580) but that's a nice improvement :-)

That's good news. (Though I wonder why yours is a little slower...)


> The main issue with this version is that it freezes with TOS 4.04, whereas it works well with emutos. I don't know if it is related or if it's another issue.

I haven't used Atari TOS with Aranym. Is there anything special that I need to do in order to test it? I'm happy to try on my Pi when I have time.

Jens Heitmann

Nov 12, 2022, 12:04:11 PM11/12/22
to ARAnyM-list
ARMV6_ASSEMBLY enables the use of instructions that are not available
before ARMv6. Under normal circumstances these should be better and faster
implementations, but maybe something is implemented wrongly in the ARMv6 case.

E.g. SXTH isn't available before ARMv6. But SXTH needs only one clock
cycle, while ASR needs one plus an additional cycle per shift count. So in normal
cases SXTH is much faster.

static inline void SIGNED16_IMM_2_REG(W4 r, IMM v) {
#if defined(ARMV6_ASSEMBLY)
    MOV_ri8(r, (uint8) v);
    ORR_rri8RORi(r, r, (uint8)(v >> 8), 24);
    SXTH_rr(r, r);
#else
    MOV_ri8(r, (uint8)(v << 16));
    ORR_rri8RORi(r, r, (uint8)(v >> 8), 8);
    ASR_rri(r, r, 16);
#endif
}

I only used this function as an example. The fault may be in any other instruction
or function.
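The two ways of sign-extending a 16-bit value that Jens contrasts can be illustrated in plain C. This is only a sketch of the arithmetic, not of the JIT emitter macros above: the function names are made up for illustration, and it relies on GCC/Clang behaviour for narrowing conversions and for right-shifting negative values, which the C standard leaves implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/* What a single SXTH computes: keep the low 16 bits and sign-extend them. */
int32_t sign_extend16_cast(uint32_t v)
{
    return (int32_t)(int16_t)(uint16_t)v;
}

/* The shift-based alternative: move the halfword into the top 16 bits,
 * then shift it back down arithmetically so the sign bit is replicated. */
int32_t sign_extend16_shift(uint32_t v)
{
    uint32_t top = v << 16;
    /* GCC and Clang implement >> on a negative int as an arithmetic shift. */
    return (int32_t)top >> 16;
}

/* Both variants must agree for every possible 16-bit pattern. */
void sign_extend16_self_check(void)
{
    for (uint32_t v = 0; v <= 0xFFFFu; v++) {
        assert(sign_extend16_cast(v) == sign_extend16_shift(v));
    }
}
```

Since both variants produce the same value, a correct implementation of either path should only differ in speed, never in results.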



Jens Heitmann
Tel. 0511 47322186

----
This e-mail may contain confidential and/or privileged information.
If you are not the intended recipient (or have received this e-mail in
error) please notify the sender immediately and destroy this e-mail. Any
unauthorised copying, disclosure or distribution of the material in this
e-mail is strictly forbidden.

Thorsten Otto

Nov 12, 2022, 12:41:18 PM11/12/22
to ara...@googlegroups.com

Hi Jens,

 

glad to hear from you again ;)

>ARMV6_ASSEMBLY enables the use of instructions that are not available Before Arm v6.



By grepping through the source, I found that (except for sysdeps.h, where it is also used for generic memory accesses) it is only used in the JIT compiler. Would it maybe be possible to replace it there with a runtime flag instead of a compile-time define? I think similar things are already done in the x86 version, where it detects the availability of cmov instructions etc.

 

I also wonder why instructions that were only introduced with armv6 (rev & revsh vs. a series of several other instructions in the case of memory accesses) should cause such a performance loss on armv7 and better.

 

 

Philippe Noble

Nov 12, 2022, 1:00:50 PM11/12/22
to Chris Jenkins, ARAnyM

>> I have slower results on the pi400 than you ( Kronos cpu@580) but that's a nice improvement :-)
>
> That's good news. (Though I wonder why yours is a little slower...)


Quite strange indeed. I have tried different setups but the results are the same.
With the BeePi version of Aranym I get 690, so this build is about 15% slower, but I don't know Kronos's margin of error.
Have you configured anything special? My build was done with the default autogen.


>> The main issue with this version is that it freezes with TOS 4.04, whereas it works well with emutos. I don't know if it is related or if it's another issue.
>
> I haven't used Atari TOS with Aranym. Is there anything special that I need to do in order to test it? I'm happy to try on my Pi when I have time.


To test the freeze, you just need to boot Aranym with TOS 4.04. Aranym freezes just after loading TOS 4.04 and doesn't even boot.

Otherwise, you can use the TOS setup of Beepi. The config file and hd image are stored in /h/.system/aranym-tos/
If you run it on the console use config.pi3

Jens Heitmann

Nov 12, 2022, 1:21:35 PM11/12/22
to Thorsten Otto, ARAnyM-list
On 12.11.2022 at 18:41, Thorsten Otto <ad...@tho-otto.de> wrote:

> Hi Jens,
>
> glad to hear from you again ;)
>
>> ARMV6_ASSEMBLY enables the use of instructions that are not available before Arm v6.
>
> By grepping through the source, I found that (except for sysdeps.h, where it is also used for generic memory accesses) it is only used in the JIT compiler. Would it maybe be possible to replace it there with a runtime flag instead of a compile-time define? I think similar things are already done in the x86 version, where it detects the availability of cmov instructions etc.

if (have_cmov)
CMOVLrr(cc, s, d);
else { /* 

Right. On Intel it is done with an if statement. But the intention behind ARMV6_ASSEMBLY was
more like that of the X86 or X64 defines in the past. It could be extended to v7 or v8 too, with each case
guarded by an "if", but that would cause more overhead at compile time. Another option would be
to initialise function pointers pointing to the version optimised for the CPU. But this
requires a good way to find out the current CPU at runtime.
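The function-pointer option Jens describes might look roughly like the sketch below. It is an illustration of the dispatch pattern only: the names are hypothetical, the two variants are ordinary C stand-ins rather than JIT emitters, and how the `cpu_is_armv6_or_later` flag would be obtained at runtime is deliberately left open, since that is exactly the unsolved part mentioned above.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t (*sx16_fn)(uint32_t);

/* Shift-based variant: works everywhere (stands in for the pre-ARMv6 path).
 * Relies on GCC/Clang doing an arithmetic shift for negative values. */
static int32_t sx16_generic(uint32_t v)
{
    return (int32_t)(v << 16) >> 16;
}

/* Cast-based variant: stands in for the path that would use SXTH. */
static int32_t sx16_v6(uint32_t v)
{
    return (int32_t)(int16_t)(uint16_t)v;
}

/* Selected once at startup; later calls need no further CPU checks. */
static sx16_fn sx16_impl = sx16_generic;

/* The caller supplies the result of whatever runtime CPU detection is used. */
void example_init_dispatch(int cpu_is_armv6_or_later)
{
    sx16_impl = cpu_is_armv6_or_later ? sx16_v6 : sx16_generic;
}

int32_t example_sx16(uint32_t v)
{
    return sx16_impl(v);
}

/* Whichever variant is selected, the visible result must be identical. */
void example_dispatch_self_check(void)
{
    example_init_dispatch(0);
    assert(example_sx16(0x8000u) == -32768);
    example_init_dispatch(1);
    assert(example_sx16(0x8000u) == -32768);
}
```

Compared with an "if" at every use site, the decision is taken once, at the cost of an indirect call per use.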

 

> I also wonder why instructions that were only introduced with armv6 (rev & revsh vs. a series of several other instructions in the case of memory accesses) should cause such a performance loss on armv7 and better.


This can happen if an instruction is not implemented well. Maybe it updates some flags incorrectly, so that a loop runs longer than necessary, or something like that.


 

 



Thorsten Otto

Nov 12, 2022, 1:57:14 PM11/12/22
to ara...@googlegroups.com
On Saturday, 12 November 2022 19:21:30 CET, Jens Heitmann wrote:

>This can happen if an instruction is not implemented well. Maybe it updates
>some flags incorrectly, so that a loop runs longer than necessary, or
>something like that.

Still strange.

For the memory access macros, I experimented a bit. I have an older
cross-compiler for arm-linux-gnueabi installed here (version 7.2). The first thing
I noticed is that using just arm-linux-gnueabihf-gcc (without specifying any
architecture) seems to be different from using "arm-linux-gnueabihf-gcc
-march=armv6", although the compiler seems to generate code for armv6 by
default. In the latter case, I immediately get an error:

/usr/arm-linux-gnueabihf/libc/usr/include/bits/stdio.h:37:1: sorry,
unimplemented: Thumb-1 hard-float VFP ABI

... unless I also specify -marm. So for some reason, that flag seems to be
needed.

Then I tried some simple functions:

static inline unsigned int do_byteswap_32(unsigned int v)
{
    return __builtin_bswap32(v);
}

unsigned int test(unsigned int v)
{
    return do_byteswap_32(v);
}


When I compile that for armv6, I get:

test:
    rev r0, r0
    bx lr

When compiled for armv5, I get:

test:
    eor r3, r0, r0, ror #16
    lsr r3, r3, #8
    bic r3, r3, #65280
    eor r0, r3, r0, ror #8
    bx lr

That looks pretty much identical to the inline asm macros in sysdeps.h. So I guess
we can just drop them there, and let the compiler use its builtin functions.
That should avoid any hassle with wrong and/or non-optimal constraints on the
asm directives.
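The equivalence of the two code sequences above can be checked in C. The sketch below transcribes the four-instruction armv5 sequence step by step and compares it with the compiler builtin; the helper names are made up for illustration, and it assumes GCC or Clang for `__builtin_bswap32`.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate right by n bits; only used with n = 8 and n = 16 here. */
static uint32_t ror32(uint32_t v, unsigned n)
{
    return (v >> n) | (v << (32u - n));
}

/* Lets the compiler pick the best instruction (a single rev on armv6+). */
uint32_t bswap32_builtin(uint32_t v)
{
    return __builtin_bswap32(v);
}

/* C transcription of the armv5 sequence shown above. */
uint32_t bswap32_armv5_style(uint32_t v)
{
    uint32_t t = v ^ ror32(v, 16);  /* eor r3, r0, r0, ror #16 */
    t >>= 8;                        /* lsr r3, r3, #8          */
    t &= ~0x0000FF00u;              /* bic r3, r3, #65280      */
    return t ^ ror32(v, 8);         /* eor r0, r3, r0, ror #8  */
}

/* Both variants must reverse the byte order in the same way. */
void bswap32_self_check(void)
{
    assert(bswap32_builtin(0x11223344u) == 0x44332211u);
    assert(bswap32_armv5_style(0x11223344u) == 0x44332211u);
}
```

With the builtin, the choice between rev and the longer sequence follows from -march alone, so no ARMV6_ASSEMBLY-style define is needed for the byte swap itself.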





Jens Heitmann

Nov 12, 2022, 4:20:56 PM11/12/22
to ARAnyM-list
https://github.com/lubomyr/uae4arm/tree/master/src/jit

It might be interesting to see how Lubomyr improved UAE.
It even seems to have support for ARM A64.

Jens Heitmann
Tel. 0511 47322186


Chris Jenkins

Nov 13, 2022, 4:23:35 PM11/13/22
to Philippe Noble, ARAnyM
Hi Philippe,
 
> Quite strange indeed. I have tried different setups but the results are the same.
> With Beepi aranym version I get 690, so it is 15% below, but I don't know the margin of error of Kronos.
> Have you configured something special ?  My build was done with the default autogen.

To build it, I ran the following (inspired by the steps from the .travis/build.sh and by your instructions in an earlier email):

NO_CONFIGURE=1 ./autogen.sh

../configure $common_opts

make depend # actually I don't think there is a depend step any more
make

> To test the freeze, you just need to boot aranym with Tos 4.04. Aranym freezes just after loading TOS 4.04 and doesn't even boot.
>
> Otherwise, you can use the TOS setup of Beepi. The config file and hd image are stored in /h/.system/aranym-tos/
> If you run it on the console use config.pi3

Many thanks. I'm away for a few days and will try these when I'm back home with my Pi.

Cheers,
Chris

Paul Wratt

Nov 16, 2022, 2:46:11 AM11/16/22
to ARAnyM-list
If I understand rightly what the ARMV6 define allows, I would say that
when compiling for Armv8 it has to switch _down_ to 24 bit
(data/address) from 64 bit, and so the slowdown is noticeable, and
somewhat worse than the slowdown on an Armv7 build (dropping down
from 32bit)

I would leave the define set _only_ for Armv6. That said, that code
should be easily extendable for other Armv7/v8

It is possible (probable?) that Thorsten's gcc is building for
"generic" (read "safe") Armv6 code by default, which is why there is a
difference. RPi Armv6 is a very specific type of Armv6.

I made an issue on RetroPi GH page for SDL when RPi2B+ was 1st
released, and I am pretty sure I wrote the Armv6 settings DebHelper
used, I'll see if I can find it. I have a couple of RPiB+ (the same
CPU as v1 RPiZero) that I used to test pTOS with, but I am not sure
that I have any usable SD-cards for it atm (like RPi3 they have narrow
limits on what SD-cards you can use). Actually, there might also be
something on the pTOS GH page issues on Armv6 (or in a Makefile), as
then RPiOS did not have the newer gcc being used to cross-compile for
Qemu at the time.

As soon as I get my current power problems resolved, I'll dig them out
of storage and test them, but that may take a few months, sorry ..

.. and for Nezha D1 Risc-V too .. and for cross-build too ..

Cheers

Sorry I cant be more prompt, and more helpful ATM, but you guys seem
to have figured out most of it.

Cheers

Paul

PS yeah I too have looked into UAE4ARM usage too, but they were still
developing/tweaking it at the time, so I put it off (and never got
back to it)

Philippe Noble

Jan 16, 2023, 9:14:43 AM1/16/23
to ARAnyM-list
Hi,

Happy new year to you all !

Do you have any news regarding Aranym JIT for RPi ?

Best regards,

Philippe

Philippe Noble

Feb 11, 2023, 11:46:34 AM2/11/23
to ARAnyM-list
Hi,

No answer : -( I don’t know what it means. Lack of interest, time or equipment ?
It’s sad because I may have to give up on BeePi development sooner or later if I am not able to update aranym.

I am ready to give away a RPi4 with its power supply and a SD card setup with BeePi to anybody willing to fix aranym JIT build for a RPi.

Regards,

Philippe

Chris Jenkins

Feb 12, 2023, 8:55:16 AM2/12/23
to Philippe Noble, ARAnyM-list
[Sending this email to the list because I accidentally sent my last reply directly to Philippe:]

Hi Philippe,

I'm really sorry, I did see this email a few weeks ago but I didn't do anything about it; I was hoping that someone with more skills than me would be interested in working on this problem. I hope you don't have to give up working on BeePi because I still love it; my Pi 400 with BeePi is the only machine that I have that boots straight into an Atari OS.

I'm happy to try to work on trying to build/test Aranym JIT, with the caveat that I probably don't have some of the necessary skills (I last wrote C or asm professionally over 15 years ago and have never worked with ARM asm outside of toy projects). But I can try. I just found my Raspbian 32 SD card and put it back into my Pi 400 and will try building Aranym JIT again and see what happens.

Would folks on this list mind if I spam the list with noob questions?

Cheers,
Chris



Chris Jenkins

Feb 12, 2023, 12:57:02 PM2/12/23
to Philippe Noble, ARAnyM-list
Hi Philippe,

Do you know what commit you used to build the last version of BeePi (the one where aranym-jit compiled ok)? I want to find a version that compiles ok to see what that looks like, and then I want to bisect to find the first commit where it broke. But I haven't even found a commit that compiles yet.

Cheers,
Chris

Philippe Noble

Feb 12, 2023, 1:17:39 PM2/12/23
to ARAnyM-list
Hi Chris,

The last commit that worked fine was 7373ac9; that's the one used in BeePi.
Later commits could be built, but with strange bugs and a serious slowdown. I don't know when building JIT started to be broken.

I recently tried to build commit 7373ac9 again, but without success. I guess that this old version is no longer compatible with recent libs.

Chris Jenkins

Feb 12, 2023, 4:47:12 PM2/12/23
to Philippe Noble, ARAnyM-list
Hi Philippe,

tl;dr I have managed to build aranym-jit on my Pi 400. I had to build it without --enable-jit-fpu (because I couldn't see any evidence that the 68k's floating point instructions were implemented in the version of UAE that is contained within Aranym). Not everything is working perfectly (for example, I tried to run Kronos and it doesn't start up correctly and leaves TerraDesk in a bad state) but it's progress. And some programs run OK.

This is the configure command that I used:

../configure --prefix=/home/cdpj/opt/aranym-jit --enable-addressing=direct --enable-usbhost --enable-jit-compiler

If I configure with --enable-jit-fpu then I get a whole load of failures like this (either on the latest commit from master or on commit 7373ac9 from many years ago):

../src/uae_cpu/compiler/compemu_fpp.cpp: In function ‘int get_fp_value(uint32, uint16)’:
../src/uae_cpu/compiler/compemu_fpp.cpp:87:17: error: ‘nop’ was not declared in this scope
   87 | #define delay2  nop() ;nop()
      |                 ^~~
../src/uae_cpu/compiler/compemu_fpp.cpp:128:6: note: in expansion of macro ‘delay2’
  128 |      delay2;
      |      ^~~~~~
../src/uae_cpu/compiler/compemu_fpp.cpp:129:6: error: ‘fmovi_rm’ was not declared in this scope
  129 |      fmovi_rm(FS1,(uintptr)temp_fp);
      |      ^~~~~~~~
../src/uae_cpu/compiler/compemu_fpp.cpp:145:6: error: ‘fmovs_rm’ was not declared in this scope
  145 |      fmovs_rm(FS1,(uintptr)temp_fp);
      |      ^~~~~~~~
../src/uae_cpu/compiler/compemu_fpp.cpp:87:17: error: ‘nop’ was not declared in this scope
   87 | #define delay2  nop() ;nop()
      |                 ^~~
<snip>

At least some of the macros that are not found look like they implement 68k floating point instructions (I never had a 68k machine that could do floating point so I never learned them!). As far as I can tell, those macros are defined in a header called compemu_midfunc_x86.h which (as the name suggests) only gets pulled in when building on x86. I can't see any equivalent definitions for ARM, so I'm wondering one thing: is it possible that Aranym is using a version of UAE that doesn't support JITing floating point instructions on ARM devices?

Does this match what you get? Or are you getting any different failures?

Cheers,
Chris




Philippe Noble

Feb 13, 2023, 12:23:05 PM2/13/23
to ARAnyM-list
Hi Chris,

Great!!! I have been able to build the latest commit of Aranym JIT successfully with your flags.
What does --enable-addressing=direct do? I had never used this flag before.

It boots OK, but I have seen some strange behavior (tests done with BeePi):
- Indeed, Kronos freezes
- Works: graphical bugs and scroll bars of the wrong size
- Zview: error "jpg.ldg has a bad format"
- ProCalc: gives weird results, e.g. 45+10=54.999999998
- Litchi, Cresus, KKcommander suffer from the same wrong number format as ProCalc
- PhGmap can't find codec …

This weird behavior doesn't happen with the standard Aranym build.

Regards,

Philippe

Chris Jenkins

Feb 13, 2023, 2:44:16 PM2/13/23
to Philippe Noble, ARAnyM-list

> What does --enable-addressing=direct do? I had never used this flag before.

I confess I don't fully understand what it does and I only turned it on because the Travis (Linux x86) build turns it on. It turns on the macro DIRECT_ADDRESSING which in turn controls some behaviour (that I don't yet understand) in main_unix.cpp. Can anyone explain what the intent of that flag is?

I can try different values for that flag when I have some time but, again, I don't yet understand what it is supposed to do.

> It boots OK, but I have seen some strange behavior (tests done with BeePi):
> - Indeed, Kronos freezes
> - Works: graphical bugs and scroll bars of the wrong size
> - Zview: error "jpg.ldg has a bad format"
> - ProCalc: gives weird results, e.g. 45+10=54.999999998
> - Litchi, Cresus, KKcommander suffer from the same wrong number format as ProCalc
> - PhGmap can't find codec …

One hypothesis that I tried to test is that floating point maths is completely broken in this build... but I _can_ run a simple program that uses 68040 floating point instructions (this one https://github.com/cdpjenkins/hello_gem/blob/master/float-test/src/main.c) successfully. Does anyone have any ideas?
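A narrower test than a general floating point program is to reproduce the exact ProCalc case. The sketch below is not the linked program, only an illustration of such a check: it separates "the addition itself is wrong" from "the number-to-text conversion is wrong", since 45, 10 and 55 are all exactly representable and a correct FPU must give exactly 55. It is meant to be cross-compiled for the guest side (an assumption; here it is plain portable C), and all names are made up.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* volatile keeps the compiler from folding the sum at build time, so the
 * addition is really executed by the (possibly emulated) FPU at run time. */
int example_sum_is_exact(void)
{
    volatile double a = 45.0;
    volatile double b = 10.0;
    volatile double sum = a + b;
    return sum == 55.0;
}

/* Checks the binary-to-decimal conversion separately from the addition. */
int example_sum_prints_correctly(void)
{
    volatile double a = 45.0;
    volatile double b = 10.0;
    char buf[32];
    snprintf(buf, sizeof buf, "%.1f", a + b);
    return strcmp(buf, "55.0") == 0;
}

/* Fails loudly at whichever step the emulation gets wrong. */
void example_float_self_check(void)
{
    assert(example_sum_is_exact());
    assert(example_sum_prints_correctly());
}
```

If the first check passes under the JIT build but the second fails, the problem would more likely lie in the instructions used for formatting numbers than in the addition.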
 

philippe.noble

Oct 31, 2023, 1:03:45 PM10/31/23
to ARAnyM
Hi Chris and all,

I am just reactivating this thread, hoping that a solution could be found for Aranym JIT and RPi.
Is there any news ?

Regards,

Philippe

Chris Jenkins

Oct 31, 2023, 3:58:56 PM10/31/23
to philippe.noble, ARAnyM
Hi Philippe,

I did see the query from TheNameOfTheGame on Atari Forum today and remembered this thread. I'm afraid I didn't get any further after the last time we spoke about this. My recollection is that I was able to build and run Aranym JIT on my Pi 400 but that it mysteriously gave incorrect results and I wasn't able to figure out why. In the meantime, I just run Aranym with no JIT.

I still love Aranym and BeePi but I need more skill and more free time in order to help with this. (I confess I did make an effort to learn AArch32 assembly language this year to see if it would help me to debug the issue but I have a lot more to learn.)

I still have my Pi 400 and I'll try to find the time to play with the Aranym build again and try to understand what is happening... I might figure something out but I can't make any promises.

Cheers,
Chris


philippe.noble

Oct 31, 2023, 7:25:57 PM10/31/23
to ARAnyM
Thank you Chris, I appreciate any help.

As discussed before, I know that most members of this group don't work on RPi,  so I am ready to give away a RPi4 to anyone willing to work on it.
I really hope that a solution could be found and have a working and optimized version of Aranym for the RPi for 2024 :-)

Chris Jenkins

Nov 2, 2023, 5:42:23 PM11/2/23
to philippe.noble, ARAnyM
FYI I have managed to build Aranym JIT again on my Pi 400. I got similar results as last time: FreeMiNT (taken from EasyAraMint on my Mac) boots OK but has serious problems. For example, I find that Aranym crashes with a bus error when I run PmDoom 1 from EasyAraMint. I don't get this problem if I run plain Aranym or Aranym MMU. I'll try to spend some time debugging and see if I can make some progress figuring out what is going wrong.

I also hit another problem building Aranym. I'm running Raspbian Bullseye 32 bit. This OS used a 32 bit kernel when I first installed it but, since an update earlier this year, it has decided to switch to a 64-bit kernel instead (with the userland remaining 32 bit). Unfortunately, this confuses Aranym's configure script (which I think ultimately gets the CPU_TYPE from `uname -m` or something like that) and causes it to think it's building for a 64 bit AArch64 target, even though the toolchain (and indeed the whole userland of the OS) is still 32 bit AArch32.

I don't understand autoconf/automake very well so I just worked around this by reverting to a 32 bit kernel; I set arm_64bit=0 in my /boot/config.txt as advised by this page https://forums.raspberrypi.com/viewtopic.php?t=349070. I mention this because 32 bit support on the Pi is only going to get worse in the future and it will be necessary to find a proper solution to this. (I _think_ the Pi 5 can still run 32 bit code in user mode but my understanding is that its Arm Cortex-A76 processor cannot run a 32 bit kernel.)
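The mismatch described above comes from asking the kernel instead of the toolchain. As a sketch of the alternative, the compiler's predefined macros describe the target it actually generates code for, whatever the running kernel reports. The function names below are hypothetical, and whether ARAnyM's configure script could use a check like this is not established here.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the architecture the compiler is generating code for.
 * These macros are predefined by GCC and Clang for the respective targets. */
const char *example_compiler_target(void)
{
#if defined(__aarch64__)
    return "aarch64";
#elif defined(__arm__)
    return "arm";
#elif defined(__x86_64__)
    return "x86_64";
#elif defined(__i386__)
    return "i386";
#else
    return "unknown";
#endif
}

/* Pointer width of the code being built: 32 for an AArch32 userland,
 * even when the kernel underneath is 64 bit. */
size_t example_pointer_bits(void)
{
    return sizeof(void *) * 8;
}

/* Sanity check: a target name is always returned and the width is non-zero. */
void example_target_self_check(void)
{
    assert(example_compiler_target() != NULL);
    assert(example_pointer_bits() > 0);
}
```

On a 32 bit Raspbian userland this would report "arm" and 32 regardless of the arm_64bit setting.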

I'll report back if I get any further.


philippe.noble

Dec 28, 2023, 1:29:04 PM12/28/23
to ARAnyM
Season's greetings to you all.

I am working on an update of BeePi. I have done some testing with Debian Bookworm and I ran into problems building Aranym JIT too: configure ended with the error "extended segfault handler not supported on your platform".
After setting arm_64bit=0 as you recommended, I was able to build Aranym 1.1.0 of 27/12/2023.
Standard Aranym runs at full speed (even faster than the 1.02 version I use in BeePi) and I can see that the hostfs drives are mounted by Aranym, but hostfs can't be accessed under MiNT or EmuTOS. The same configuration and versions of MiNT and EmuTOS can access hostfs with Aranym 1.02.
Aranym JIT has the same serious problems as before and the same issue with hostfs.

Chris, do you have news ?


Paul Wratt

Dec 28, 2023, 4:39:45 PM12/28/23
to ARAnyM
Hey guys, nice to see you are still trying to push this along

> I had to build it without --enable-jit-fpu (because I couldn't see any evidence that the 68k's
> floating point instructions were implemented in the version of UAE that is contained within
> Aranym)

you are sorta right. From memory, you have to use a specific FPU
"enable" flag with `./configure`, the IEEE one I think - but I also
thought this was already fixed at some point when someone did some M1
work on ARAnyM too, so I don't know (that's a 64bit OS)

> I also hit another problem building Aranym. I'm running Raspbian Bullseye 32 bit. This OS
> used a 32 bit kernel when I first installed it but, since an update earlier this year, it has
> decided to switch to a 64-bit kernel instead (with the userland remaining 32 bit).

This is how I run my Buster desktop, the 64 kernel does not have a
problem with a 32 OS. This is the configuration you want if you are
going to run "heavy" 32bit apps. It allows the OS (kernel
multitasking) to remain responsive. It also means the binaries are
smaller, and can never consume more than 4GB of address space. (BTW
the 64bit kernel is a "config.txt" setting)

NOTE: you can run and build 64bit apps with the 32bit OS, but you
basically need a 64bit chroot to do it (I made a post on RPi Forums
about it a couple of years ago) - it can be "installed" via "apt-get
install raspbian-nspawn-64" if you have the room and want both 64bit &
32bit on the 32bit Raspberrypi OS

https://forums.raspberrypi.com/viewtopic.php?p=1974866#p1974866

> this confuses Aranym's configure script (which I think ultimately gets the CPU_TYPE from
> `uname -m` or something like that) and causes it to think it's building for a 64 bit AArch64
> target, even though the toolchain (and indeed the whole userland of the OS) is still 32 bit
> AArch32.

yeah this is about the only drawback for a 32bit OS with a 64bit kernel,
and a known problem - unset the 64bit setting in "config.txt", reboot, compile
what you want, then set it again when you want the (safer/faster)
64bit kernel

> I can see that the Hostfs drives are mounted by Aranym, but hostfs can't be accessed under
> Mint or Emutos. The same configuration / versions of Mint and Emutos can access the hostfs
> with Aranym 1.02. Aranym JIT has the same serious problems as before and the same issue
> with hostfs.

I think this might be a 64bit / 32bit incompatible API thing (don't
quote me on that tho). Although I did have one problem similar to this
at some point, which turned out to be the ARAnyM side using "user:root" while
the host side was using "user:pi" (sorry, it was more than 3 years ago,
so my memory is not so accurate)

My Build Server Update:

no luck so far, had a problem installing the OS, which has "mostly" been
fixed, but it is still not 100% usable for me yet. Also I (again) lost 5
years worth of "work" on a 1TB drive; not upset about it this time, it just
means I have to start from scratch again, with no reference material
(can't find it using internet "search" any more)

I should be able to afford new drive for RPi next week, if so, I will
throw Bookworm on it too

Other "Real Life (tm)" stuff also got in the way, again (whats new?)

I will see if I can dig out my original Wheezy sd-card, which has
ARAnyM & STEemSSE built on it too, might be able to just copy Makefile
from them and "work-out-of-the-box" ..

Anyway, keep up the effort, looks like almost there ..

Cheers

Paul