After running the pick_open_reference_otus.py script with the Greengenes database on some 16S MiSeq data, I looked at the taxa summaries and found a very high percentage of unassigned taxa in most of the samples: the range is 50% to 90%, with most samples around 70%. I have not worked with bacterial data before, but it is my understanding that this number is unusually high.
I do not think it is due to the samples themselves or the MiSeq run, because ITS primers were used on the same samples and the fungal data look pretty good, with low numbers of unassigned reads. Additionally, these samples are not from soil or ocean (which I know are more likely to have unassigned taxa with this database); they are from a controlled processing experiment.
Does anyone have any insight into what the issue might be, or whether this is actually a typical amount of unassigned reads for 16S?