Dear Murat,
Thank you very much for your detailed reply! I now understand much better what is going on with my dataset.
Basically, the sequences I took are the ones right after the trim.seqs step in mothur (the step after denoising); I took them that early in the pipeline to maximize the number of initial sequences for oligotyping. Those sequences have been filtered to remove reads shorter than 200 bp or with homopolymers longer than 8 bp, and the forward primer was removed but NOT the reverse one. So when I align them with PyNAST, two things happen: 1) the ends of the alignment have variable lengths because the reverse primer sequence is still there, and 2) some sequences are longer than 200 bp but not long enough to reach the reverse primer.
As I see it, I have two (or three) options (tell me if I'm wrong):
I could simply trim the reverse primer from this dataset and perhaps filter out the reads that are not long enough to reach that position.
Or I could start again with sequences from a bit further down the pipeline, after align.seqs and screen.seqs, so that they have already been filtered against the SILVA database to remove misaligned sequences and trimmed at both ends to a precise position in the 16S alignment. Or I could go even further down and take the sequences after chimera checking.
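To make the first option concrete, here is a rough sketch of what I have in mind. It assumes the reads run from the forward primer towards the reverse primer, so the reverse primer shows up as its reverse complement near the 3' end. The primer passed in the usage note is a made-up placeholder (not my real primer), the function names are just illustrative, and the matching is exact, so a real run would need to tolerate mismatches and ambiguous bases.

```python
from typing import Dict, Optional

# Complement table for unambiguous bases only (an assumption: real primers
# often contain IUPAC ambiguity codes, which this sketch does not handle).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def trim_reverse_primer(read: str, reverse_primer: str) -> Optional[str]:
    """Cut the read just before the reverse primer site.

    Returns the trimmed read, or None when the read never reaches the
    primer (exact match only, no mismatches allowed).
    """
    target = reverse_complement(reverse_primer).upper()
    pos = read.upper().rfind(target)  # last occurrence, closest to the 3' end
    if pos == -1:
        return None
    return read[:pos]


def trim_and_filter(reads: Dict[str, str], reverse_primer: str) -> Dict[str, str]:
    """Keep only reads that reach the reverse primer, with the primer removed."""
    kept = {}
    for name, seq in reads.items():
        trimmed = trim_reverse_primer(seq, reverse_primer)
        if trimmed is not None:
            kept[name] = trimmed
    return kept
```

For example, calling trim_and_filter(reads, "AAGGCC") on a dictionary of read names and sequences would drop every read that does not contain GGCCTT and cut the others just before it, so all the remaining reads end at the same biological position.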
I see that there may be a trade-off here: precision and quality of the dataset versus the total number of sequences that I feed into the analysis.
What would you suggest I do?
Thanks a lot again for your kind help!
Best,
Joanito