In <2022Jan2...@mips.complang.tuwien.ac.at> I presented
measurements of FORWARD vs. DEFER performance on Gforth.
Here I do the same for a native-code system, in particular VFX. AFAIK
VFX has no FORWARD, but the code resulting from a FORWARD would use a
non-inlined direct call, so I measure that, using the following
microbenchmark:
defer foo2
: bar1 noop ;     \ direct call to NOOP
: bar2 foo2 ;     \ indirect call through the deferred word FOO2
' noop is foo2    \ FOO2 now executes NOOP, too
: bench1 100000000 0 do bar1 bar1 bar1 bar1 bar1 bar1 bar1 bar1 bar1 bar1 loop ;
: bench2 100000000 0 do bar2 bar2 bar2 bar2 bar2 bar2 bar2 bar2 bar2 bar2 loop ;
On VFX64 BAR1 and BAR2 become:
see bar1
BAR1
( 004E3F00 E8C329F3FF ) CALL 004168C8 NOOP
( 004E3F05 C3 ) RET/NEXT
( 6 bytes, 2 instructions )
ok
see bar2
BAR2
( 004E3F40 48FF1579FFFFFF ) CALL FFFFFF79 [RIP] @004E3EC0
( 004E3F47 C3 ) RET/NEXT
( 8 bytes, 2 instructions )
BAR1 and BAR2 are inlined in the BENCH words.
for i in 1 2; do LC_NUMERIC=en_US.utf8 perf stat -B -e cycles:u -e instructions:u vfx64 "include bench-defer.fs bench$i bye"; done
On a Ryzen 5800X this takes the following time and instructions (per
call to BAR1/BAR2):
cyc  inst
4.1  2.3   bar1
4.2  2.3   bar2
So performance is not a reason to introduce FORWARD.
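For reference, a sketch of how such per-call figures can be derived from the perf stat totals, assuming each total is simply divided by the 10^9 calls a benchmark performs (100000000 iterations times 10 calls per iteration). The per_call helper is hypothetical, and the totals passed to it are round illustrative numbers, not measured values:

```shell
#!/bin/sh
# per_call TOTAL CALLS: print TOTAL/CALLS rounded to one decimal place.
# Hypothetical helper; LC_ALL=C keeps "." as the decimal separator.
per_call() {
  LC_ALL=C awk -v total="$1" -v calls="$2" \
    'BEGIN { printf "%.1f\n", total / calls }'
}

# 100000000 loop iterations, 10 calls to BAR1/BAR2 per iteration
calls=$((100000000 * 10))

# Illustrative totals only (not taken from a real perf stat run)
per_call 4100000000 "$calls"   # prints 4.1
per_call 2300000000 "$calls"   # prints 2.3
```

Totals taken from perf stat cover the whole vfx64 run, so figures computed this way also include startup and loop overhead spread over the calls.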
- anton
--
M. Anton Ertl
http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs:
http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard:
http://www.forth200x.org/forth200x.html
EuroForth 2021:
https://euro.theforth.net/2021