This month I want to do something a bit more free form. Your exercise
for this month is to do something interesting with the brainfuck
language (http://www.muppetlabs.com/~breadbox/bf/). This is a very
simple Turing-complete
(https://en.wikipedia.org/wiki/Turing_completeness) language that is
easy to pick up, but you will probably find it incredibly difficult to
write non-trivial code. The point of this exercise is to think more
deeply about how programming languages work and how the features of
higher level languages are implemented.
As I said, it is hard to write code in this language, but you will
hopefully learn something and come out the other side as a better
programmer. There are a lot of brainfuck resources around the internet
and feel free to write to the CPB list with questions or ideas.
Have fun :)
Since Matt has shared brainfuck, I thought I'd throw in a couple of
other frivolous 'programming languages' that are out there in the form
of games by Zach of Zachtronics.
(http://www.zachtronicsindustries.com). The two games of his that I'm
the biggest fan of basically involve programming manipulators to move
atoms and molecules around. In particular, SpaceChem is one that I
really enjoyed, it's a very physical take on programming, but you do
have modification, flip-flops and decision-making so you can make
systems that do surprisingly complex things.
One of them (and a reason I brought SpaceChem up in the first place)
is that someone has written a brainfuck interpreter in the SpaceChem
sandbox, using molecules to represent bits and different instructions.
I'd recommend having a look at either SpaceChem (Steam) or the Codex
of Alchemical Engineering (a little flash game) if you're interested
in something that will stretch your mind. The best thing about
SpaceChem is that it's a puzzle game, but at the end of each puzzle it shows a
histogram of how efficiently people have solved that puzzle on a
number of different metrics, so you can drive yourself to distraction
trying to optimise the hell out of one of your processes.
Hope this doesn't eat too much of anyone's time, but I think it's a
great way for some1 with a programming mindset to unwind.
-- Dave
--
-- DKW
In the spirit of this month, I've been trying to write an x86
brainfuck compiler as a sed invocation. What I've got so far follows
below. It doesn't work at all yet (produces seg faulting code for
everything I've used it on so far), but I thought I'd share the design
I'm working with and what I've learned so far.
sed \
-e 's/[^][<>\+\-\.,]*//g' \
-e 's/\,/\tcmpl $0,%ebx\n\tjne 1f\n\tpushl %ebx\n\tpushl
%ecx\n\tcall getchar\n\tpopl %ecx\n\tpopl %ebx\n\tmovl
%eax,(%ecx)\n1:\n\n/g' \
-e 's/>/\tcmpl $0,%ebx\n\tjne 1f\n\tincl %ecx\n1:\n\n/g' \
-e 's/</\tcmpl $0,%ebx\n\tjne 1f\n\tdecl %ecx\n1:\n\n/g' \
-e 's/\[/\tcall 2f\n2:\tcmpl $0,%ebx\n\tjne 1f\n\tcmpl
$0,(%ecx)\n\tje 3f\n1:\tincl %ebx\n3:\n\n/g' \
-e 's/\]/\tpopl %edx\n\tcmpl $0,%ebx\n\tjne 1f\n\tjmp
*(%edx)\n1:\tdecl %ebx\n\n/g' \
-e 's/\+/\tcmpl $0,%ebx\n\tjne 1f\n\tincl (%ecx)\n1:\n\n/g' \
-e 's/\-/\tcmpl $0,%ebx\n\tjne 1f\n\tdecl (%ecx)\n1:\n\n/g' \
-e 's/\./\tcmpl $0,%ebx\n\tjne 1f\n\tpushl %ebx\n\tpushl
%ecx\n\tpushl (%ecx)\n\tcall putchar\n\taddl $4,%esp\n\tpopl
%ecx\n\tpopl %ebx\n1:\n\n/g' \
-e '1i\\t\n\t.local p\n\t.comm p,30000,32\n\t.text\n.globl
main\n\n\t.text\nmain:\n\tpushl %ebp\n\tmovl %esp,%ebp\n\tandl
$-16,%esp\n\tsubl $16,%esp\n\tmovl $0,%ebx\n\tmovl p,%ecx\n\n' \
-e '$a\\tmovl $0,%eax\n\tleave\n\tret\n' \
The first -e removes all irrelevant symbols, the next eight translate
the relevant symbols and the final two introduce prelude and epilogue.
To understand what on earth is going on, I'm using ECX as a pointer
into the array, EBX as an indicator of whether to execute the current
instruction (0 = yes, otherwise no) and the stack to stash the EIP of
the start of the current loop context. With the caveat that x86
assembly is not at all my forte, here's my musings so far:
- This exercise would be trivial without [ and ]. There would be none
of this weird EBX testing and jumps all over the place. Most
instructions would translate directly to one assembly instruction.
- The complexity in this is trying to write a compiler without any
state. Because there is no context of what loop depth you are at I had
to make each instruction manipulate and/or check this state within the
program at runtime.
- Sed has some weird (and seemingly undocumented) behaviour when
using [ and ] within a regex block.
- X86 sucks. Namely:
1. You can't directly read or write EIP from user code meaning I
needed to do silly tricks like "call 2f; 2:" to push EIP onto the
stack.
2. There are no predicated instructions like ARM, meaning I had to
cmpl and jmp all the time which will play havoc with the branch
prediction and destroy performance (not to mention looking
ridiculous).
3. Why can I not use je to jump through a register? Seriously, why?
I can "call *(%eax)" and "jmp *(%eax)", but no "je *(%eax)". I suspect
those in the know will say this is an instruction length optimisation.
X86 has a whole raft of these oddities that do not make sense by
modern standards. For a prime example, see the answer to
http://stackoverflow.com/questions/1396527/any-reason-to-do-a-xor-eax-eax.
I'm reasonably sure the design of the whole thing I have currently can
work, but one thing I could use advice on is if you think there's a
better design that would simplify the whole thing. I realise what I've
just described is probably pretty confusing (I wrote it and *I'm*
pretty confused by it) so please ask me to go into more detail about
anything that's unclear.