Hi Lewis,
1. MT models are built from parallel training data, as you know. This is the --train argument.
2. In addition, MT models need to be tuned. A model has many parameters that tell it how much to trust each of its components, and these have to be set so that the system's translations score well on an automatic metric (typically BLEU, for MT). Defaults can't really be used, because the right values differ for every language pair. This is the --tune argument. Tuning data is typically just a few thousand sentences, whereas training data can run to many millions. If you just have one big parallel dataset, you can randomly select a few thousand sentences (I'd suggest 3–5k) and reserve them for tuning, making sure to remove them from train; see the first sketch after this list.
3. Often in research, we want to test how well a tuned model generalizes, so we typically hold out a separate test set as well. If you're just building a language pack, you probably don't need one, and you can tell the pipeline to stop at tuning ("--last-step tune"); see the second sketch below.
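For the random split in (2), here's a minimal sketch using standard Unix tools. I'm assuming sentence-aligned plain-text files corpus.en and corpus.fr (the filenames are placeholders), one sentence per line, with no tab characters inside the sentences:

    # Pair the two sides line-by-line so shuffling keeps them aligned
    paste corpus.en corpus.fr | shuf > corpus.shuf

    # Reserve the first 3000 pairs for tuning ...
    head -n 3000 corpus.shuf | cut -f1 > tune.en
    head -n 3000 corpus.shuf | cut -f2 > tune.fr

    # ... and everything from line 3001 onward for training
    tail -n +3001 corpus.shuf | cut -f1 > train.en
    tail -n +3001 corpus.shuf | cut -f2 > train.fr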
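And for (3), a language-pack run that stops after tuning would look something like this. This is only a sketch: the script name and file arguments are placeholders, and a real run will need more options (language pair, paths, etc.); the three flags are the ones described above:

    # Placeholder script name; substitute your actual pipeline entry point
    ./pipeline.pl \
        --train train \
        --tune tune \
        --last-step tune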
matt