On 06/05/2014 16:14, Mark Wills wrote:
> Well, I bit the bullet and hacked together an optimising Brainfuck compiler.. It reads text files of BF code, and compiles a Forth representation (optimising as it goes) directly to memory.
>
> It's written for my Forth system (Forth 83). I'm not sure if it's possible to take the approach I have taken in an ANS system, especially with systems that generate machine code.
>
> It reads a line of text from the file, and then starts processing. When instructions repeat (e.g. ++++) it counts the number of repetitions and only emits code for it (LIT 4 BF+) when a new instruction arrives - so a change in instruction causes the previous instruction to be emitted/flushed.
>
> There's some 'look-back' type of optimisation going on with reference to a single instruction sequence: [-] this sequence decrements the current location to 0 and is obviously ripe for optimising. When it sees ] it looks back at the (compiled) code to see if we've just compiled [-] and if so it backs up HERE and overwrites the code with a call to a ~ instruction, which zeros the current location. I don't know if this technique is doable on a native compiling system? It's easy on an ITC, you can read back the code you just compiled and compare them to CFAs and identify a code-sequence. Not so easy when you compile Forth, but read back machine code, which is what happens on a native system. Some other system would have to be employed.
>
> I think for a native code generating Forth system, the simplest approach for this type of thing is to write an interpreter, rather than a compiler. It then becomes very similar to an ITC system; you're compiling tokens to memory space, which will be read and looked up with a CASE...ENDCASE or similar.
>
> Ah well, I think that ends my temporary interest in BF!
>
[...]
Perhaps so but here's a non-optimising BF compiler in ANS Forth which is
a bit more compact. I'll show how to implement an optimiser after the
code. Incidentally this follows the desciption in the Wikipedia article
and ignores any character in the BF program that isn't one of the 8
commands. Hence the 2 versions of Hello World below, both of whch work.
A BF program is contained within bf[ and ]
The final ] is unmatched and therefore illegal in BF and is used to
terminate compilation.
32000 constant bf-mem_size
create bf-mem bf-mem_size chars allot
variable pointer
variable #nested \ Count the number of nested [ ] pairs
variable bf-end \ Flag to indicate end of BF program
: reset ( -- ) \ zero bf memory and reset pointer
bf-mem bf-mem_size 0 fill
bf-mem pointer ! 0 #nested ! 0 bf-end !
;
: refill? ( caddr u -- caddr u | caddr' u' )
begin
dup 0=
while
2drop refill 0= abort" Early end of file"
source
repeat
;
wordlist constant commands \ For definition of the BF commands
: next-cmd ( caddr u -- caddr' u' xt )
begin
refill? 1 /string
over 1 chars - 1 commands search-wordlist
until
;
: peek ( -- n ) pointer @ c@ ;
: poke ( n -- ) pointer @ c! ;
: (>) ( -- ) 1 pointer +! ; \ increment the pointer
: (<) ( -- ) -1 pointer +! ; \ decrement the pointer
: (+) ( -- ) peek 1+ poke ; \ increment the byte at the pointer
: (-) ( -- ) peek 1- poke ; \ decrement the byte at the pointer
: (.) ( -- ) peek emit ; \ output the byte at the pointer
: (,) ( -- ) key poke ; \ input a byte, store at the pointer
\ The BF commands in a separate wordlist
get-current commands set-current
: > ( -- ) postpone (>) ;
: < ( -- ) postpone (<) ;
: + ( -- ) postpone (+) ;
: - ( -- ) postpone (-) ;
: . ( -- ) postpone (.) ;
: , ( -- ) postpone (,) ;
: [ ( caddr u ) \ jump past the matching ] if the byte at the pointer = 0
1 #nested +!
2>r postpone begin postpone peek postpone while 2r>
;
\ An unmatched ] indicates the end of the BF program
: ] ( caddr u ) \ jump backward to the matching [ or end of program
#nested @
if -1 #nested +! 2>r postpone repeat 2r> else -1 bf-end ! then
;
set-current
: bf[ ( -- ) \ Start of BF program
reset :noname ( -- xt )
source >in @ /string ( -- xt caddr u )
begin
next-cmd execute bf-end @
until
source rot - >in ! 2drop \ Update >in for the Forth interpreter
postpone ; cr execute
;
bf[ \ Hello world compressed
++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++.
]
bf[ \ Hello World with comments
+++++ +++ Set Cell #0 to 8
[
>++++ Add 4 to Cell #1; this will always set Cell
#1 to 4
[ as the cell will be cleared by the loop
>++ Add 4*2 to Cell #2
>+++ Add 4*3 to Cell #3
>+++ Add 4*3 to Cell #4
>+ Add 4 to Cell #5
<<<<- Decrement the loop counter in Cell #1
] Loop till Cell #1 is zero
>+ Add 1 to Cell #2
>+ Add 1 to Cell #3
>- Subtract 1 from Cell #4
>>+ Add 1 to Cell #6
[<] Move back to the first zero cell you find;
this will
be Cell #1 which was cleared by the
previous loop
<- Decrement the loop Counter in Cell #0
] Loop till Cell #0 is zero
The result of this is:
Cell No : 0 1 2 3 4 5 6
Contents: 0 0 72 104 88 32 8
Pointer : ^
>>. Cell #2 has value 72 which is 'H'
>---. Subtract 3 from Cell #3 to get 101 which is 'e'
+++++ ++..+++. Likewise for 'llo' from Cell #3
>>. Cell #5 is 32 for the space
<-. Subtract 1 from Cell #4 for 87 to give a 'W'
<. Cell #3 was set to 'o' from the end of 'Hello'
+++.----- -.----- ---. Cell #3 for 'rl' and 'd'
>>+. Add 1 to Cell #5 gives us an exclamation point
>++. And finally a newline from Cell #6
]
\ --- End of Code -----
You've said elsewhere you can't see the point of quotations, well the
definition of is:
: + ( -- ) postpone (+) ;
where (+) is defined earlier. If we use a quotation we can write
: + ( -- ) [: peek 1+ poke ;] compile, ;
which looks a lot clearer to me.
Optimisation can be added in this way e.g. to optimise a sequence of
+'s, add this code in the appropriate places
: n+ ( n -- ) peek + poke ; \ Optimised sequence of +'s
0 value +xt
\ Replace the original definition of + with
: + ( caddr u -- caddr' u' ) \ Optimises sequence of +'s
0 >r
begin
next-cmd dup +xt =
while
drop r> 1+ >r
repeat
r> ?dup if 1+ postpone literal postpone n+ else postpone (+) then
execute
;
get-order commands swap 1+ set-order
' + to +xt
previous
--
Gerry