Hi Baskaran,
Thanks for the mail and your interest in pialign! To answer your questions:
1) I actually don't remember 100%, but I think that for the lexicalized tables with multiple samples I simply used "cat" to concatenate the .samp files together, then ran itgstats.pl to calculate the reordering probabilities from the combined file.
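In Python terms, the pooling step is just file concatenation (the equivalent of `cat`); the combined file is then fed to itgstats.pl. This is a minimal sketch with made-up file names, not pialign's actual workflow:

```python
import shutil

# Stand-ins for the .samp files written by separate pialign runs
# (in practice these already exist on disk).
for name, line in [("run1.samp", "sample derivation A\n"),
                   ("run2.samp", "sample derivation B\n")]:
    with open(name, "w") as f:
        f.write(line)

# Equivalent of `cat run1.samp run2.samp > combined.samp`.
with open("combined.samp", "wb") as out:
    for name in ["run1.samp", "run2.samp"]:
        with open(name, "rb") as f:
            shutil.copyfileobj(f, out)
```

combined.samp would then be passed to itgstats.pl to compute the probabilities over the pooled samples.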
2) The -noqueue option doesn't avoid beam search; it just uses a different data structure during search. The results should be essentially the same with or without this option (although they will differ a little because the order of sampling changes). If you want to run exhaustive search, you can set -probwidth to zero. In this case the time required grows as O(n^6) in the length of the sentence. This may be feasible for sentences of up to 10 words or so, but anything longer will take forever.
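To get a feel for that O(n^6) growth, here is a back-of-the-envelope calculation (the helper function is purely illustrative, not part of pialign):

```python
def relative_cost(n, base=10):
    """Cost of exhaustive ITG parsing for an n-word sentence,
    relative to a base-length sentence, under the O(n^6) bound."""
    return (n / base) ** 6

# Doubling the sentence length from 10 to 20 words multiplies the
# work by 2^6 = 64; a 40-word sentence costs 4096x a 10-word one.
print(relative_cost(20))  # → 64.0
print(relative_cost(40))  # → 4096.0
```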
3) As pialign's search is approximate, it does not sample directly from the true probability distribution. One way to fix this is to perform a Metropolis-Hastings accept/reject step after parsing (see "Bayesian Inference for PCFGs via Markov Chain Monte Carlo"). However, I have found that this actually hurts accuracy somewhat, so it is not enabled by default.
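The accept/reject step itself is simple; here is a minimal log-space sketch (the function and its arguments are illustrative, not pialign's actual API). p is the true model distribution and q the approximate proposal, e.g. the beam-pruned parser; if the test fails, the old derivation for that sentence is kept.

```python
import math
import random

def mh_accept(logp_new, logp_old, logq_new, logq_old, rng=random.random):
    """Metropolis-Hastings acceptance test in log space.

    logp_*: log probability of the derivation under the true model.
    logq_*: log probability under the (approximate) proposal
            distribution that actually generated the sample.
    """
    # Acceptance ratio: [p(new)/p(old)] * [q(old)/q(new)].
    log_ratio = (logp_new - logp_old) - (logq_new - logq_old)
    if log_ratio >= 0.0:
        return True
    # Accept with probability exp(log_ratio).
    return rng() < math.exp(log_ratio)
```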
4) pialign normally performs training by sampling sentences in random order. With -noshuffle you can sample sentences in corpus order instead. I haven't found a huge difference in accuracy either way, particularly if you run for many iterations, but shuffling usually helps accuracy a little when the number of iterations is small.
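The difference between the two orders amounts to something like this sketch (the function is a hypothetical stand-in, not pialign's internals):

```python
import random

def training_order(num_sents, noshuffle=False, seed=0):
    """Return the order in which sentences are visited in one
    training iteration: corpus order, or a random permutation."""
    order = list(range(num_sents))
    if not noshuffle:
        # Default behavior: visit sentences in random order.
        random.Random(seed).shuffle(order)
    return order

# With -noshuffle the corpus order is kept as-is.
print(training_order(5, noshuffle=True))  # → [0, 1, 2, 3, 4]
```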
5) -noremnull was inspired by Section 4.3 of the paper "Sampling Alignment Structure under a Bayesian Translation Model," and causes the model not to remember null alignments, which prevents common words from being aligned to "null" all the time. I didn't see a big difference in accuracy either way when using this option, though.
Graham