Hi All - and especially André. Thank you very much for all the ideas. It is exactly what I needed to hear. Embarrassing that I didn't already know this, but I'm not an expert - more of an interloper with a mission to educate others.
I have thought about using the barcodes themselves to sort different samples, but I don't know the bioinformatics for this. Do you have a tool that we could incorporate into our demultiplexing? Sorry to be naive. It makes perfect sense to me, and I had been discussing this possibility; I just didn't know someone else had already solved the problem.
Given that, I don't need more indexes, I just need refinement to my pipeline in DNA Subway - easy to do with the help of our programmer, who handles all the things I'm inept at. That really simplifies things as we build out.
Can you share your favourite primer sets, or point me to what works and what doesn't? I have run amplifications both with and without a first amplification that uses primers lacking the extra sequences, and it is great to know you have figured out what works and what doesn't. I'm always looking to use proven approaches, so I appreciate all the details. I chose to drop the primary amplification without adapters for 16S because it seemed not to matter, then struggled with other primer sets where things didn't work, without realizing that the effect would be primer-set specific.
I'll look at details more tomorrow, and look forward to hearing more.
Thank you very much! I hope you are Francophone and that my French is no cause for despair.
Finally, if you are interested, I'd be happy to pool resources to grow the index set. I'd also love to know the IDT primer set for avoiding shared indexes in multiple samples. I have plenty of evidence of index mixing between adjacent or nearby wells, although at low levels most of the time. This varies a lot with novice or near-novice users, including students in classes taught by faculty we support.
Money is available on my end. Knowledge of what best to do is lacking. Maybe we can order together and get some bigger sets for bigger Illumina platforms? Seems like it isn't really necessary, though, and your insights fit my instincts to use the barcode region to bin reads for the same indexes but different barcodes.
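To make the binning idea concrete, here is a minimal sketch of sorting reads that share the same index pair into separate bins according to the amplicon primer at the start of each read. It is an illustration under stated assumptions, not the method André uses or anything currently in DNA Subway: the function names, the marker names, and the primer sequences in `PRIMERS` are placeholders to be replaced with your own, and it requires an exact (degeneracy-aware) match at the 5' end. A production pipeline would more likely rely on an established tool such as cutadapt, which tolerates mismatches.

```python
import re

# IUPAC degenerate-base codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_to_regex(primer):
    """Compile a (possibly degenerate) primer into a regex anchored at the read start."""
    return re.compile("^" + "".join(IUPAC[base] for base in primer.upper()))


def bin_reads_by_primer(reads, primers):
    """Group (read_id, sequence) pairs by the first primer matching the read's 5' end.

    Reads that match no primer are collected under the key "unassigned".
    """
    patterns = {name: primer_to_regex(seq) for name, seq in primers.items()}
    bins = {name: [] for name in primers}
    bins["unassigned"] = []
    for read_id, seq in reads:
        for name, pattern in patterns.items():
            if pattern.match(seq.upper()):
                bins[name].append((read_id, seq))
                break
        else:
            # No primer matched: keep the read aside rather than guess.
            bins["unassigned"].append((read_id, seq))
    return bins


# Placeholder primers (assumption): substitute the forward primers you actually use.
PRIMERS = {
    "marker_A": "GTGYCAGCMGCCGCGGTAA",
    "marker_B": "GGWACWGGWTGAACWGTWTAYCCYCC",
}

# Toy reads, all assumed to come from one index pair after standard demultiplexing.
reads = [
    ("r1", "GTGCCAGCAGCCGCGGTAATACGGAGG"),
    ("r2", "GGAACAGGATGAACAGTATACCCCCCTTT"),
    ("r3", "TTTTTTTTTTTTTTTTTTTT"),
]

bins = bin_reads_by_primer(reads, PRIMERS)
for name, members in bins.items():
    print(name, [read_id for read_id, _ in members])
```

The point of the design is that the index pair identifies the well, and the primer at the start of the read identifies the amplicon, so two amplicons from different samples can share one index pair as long as their primers differ.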
By the way, do you or others combine multiple amplicons for the same samples to increase taxonomic power? (I'm pretty sure people do.) If so, how does that work? I haven't had time to tackle that, but it seems almost (but not quite) trivial in concept and super hard bioinformatically. How does the math work to determine the statistical reliability? How does trimming affect this? Presumably, you trim all reads to the same length for each amplicon based on QC, deblur in some way, then combine the information? Do you somehow trim the classifiers (bioinformatically) and weight the value of each barcode? Does this take into account the relative levels of variability in the different regions? Perhaps by making a virtual "combined barcode", weighted by the variable positions and their relative discriminatory power? How are barcode regions that give conflicting info (if that ever happens) handled? Sorry not to know the literature. I'm a C. elegans researcher out of my depth in all this biodiversity and statistics.
Bruce - a Canadian, but in the "United" States - I have my doubts about that from time to time. From Alberta, so with a French of sorts, but not really French at all...