I understand that it is basically about structuring your code so that the CPU can process multiple instructions simultaneously, and I've seen some examples in the archives, but I would really like to learn how it works and how to apply it properly.
Thanks
Bram
Following up on my own post: I just found a very nice article
about how MMX works, which incidentally answers my own
question (and some other latent questions I had ;-).
http://webster.cs.ucr.edu/AoA/Windows/HTML/TheMMXInstructionSet.html