Extended index set

eric nash

Feb 22, 2023, 8:22:19 PM

to Microbiome Helper

Hi all,

First off, thank you to the many people in here! What a great resource.

I use Microbiome Helper primers to support authentic student research. We have a platform for analysis through DNA Subway's Purple Line (a simplified QIIME2 GUI for educational programs). We have great support (I think, but feel free to correct me) for Earth Microbiome primer analyses. We have OK eDNA support for 12S, but I need a better classifier. I have pretty much failed to get COI, ITS, and rbcL to work well. Is there anyone out there with good MiSeq strategies for plants, fungi, and animals that aren't fish?

Hoping for classifiers for taxonomic assignments and for good choices for primers to use.

Also, with over a hundred faculty trained, I need more primer sets. Does anyone have a large number of primers so I can go beyond the 384 I currently use? Also, do you have sets that work well to avoid or, if it happens, identify cross contamination? I'm really hoping to clean things up with the many students of the faculty I support.

Ideally, I would have separate sets of indexes for each barcode region - I would need at least 384 index combinations for each of these, so many more indexes than I have at present. Either that, or some kind of index code, so that the same index could be used over and over but I could still tell that a read was from 16S, not COI, etc. That could work for samples we sequence for multiple regions and, perhaps, more generally as a way to preprocess reads and route them into the right downstream pipeline (sorting by barcode region and so letting the sequence itself choose the classifier).

Thanks for any thoughts... I suspect some of this is out there. Apologies if my thoughts are covered in this or other places. I want to avoid BLAST or large sequence analysis to cluster into barcode regions. Sounds like a lot of extra compute power for nothing....

Bruce

Andre Comeau

Feb 23, 2023, 11:44:34 AM

to Microbiome Helper

Bruce,

I have a few comments on your stuff - can't answer it all, but I realize this was a "blast" out to everyone anyways:

- Our clients have been having quite a bit of sequencing success with the 12S MiFish primers and the various rbcL primers we have been using to generate their data. On the analysis side, I have no idea, since we normally don't process data that need custom databases such as these; the clients are better placed to know which reference sequences are best for the dbase construction.

- We finally have our COI sequencing working quite well, but clients need to do the 1st round PCR using the normal versions of the primers and we then do the 2nd round using our fusion Illumina versions (normally we only do 1 direct PCR for all sequencing) - there seem to be a few primer sets we encounter, such as the COI + aforementioned MiFish, that don't work as well in full fusion versions directly. Again, we have not set up dbases for doing the COI analysis ourselves yet.

- ITS2 sequencing on the MiSeq (and full ITS with PacBio) has been working very well for a long time here at the IMR - that is in our MicrobiomeHelper SOPs already and we use the UNITE dbase for tax assignment which is the community standard for fungi.

- As for the primer/index issue, we don't really deal with more than 384 combos, as we do not want to load more than that number of samples on one whole MiSeq run; beyond that you start to reduce per-sample output. We find that 50k raw reads/sample (the max for a good run) is a perfect "sweet spot": it leaves enough reads after all the typical losses incurred in the pipeline QC (and ASV dereplication, which takes a greater toll than the older OTU process) and after the sample variation that can be encountered in the pool (due to some samples underperforming in the PCRs), so that you generally have adequate numbers of reads for most samples once the run + analysis is done. If you are not running >384 samples in any given run, then you don't really need more combos for other primer sets, as you shouldn't be overlapping the same well locations (barcode combos) for 16S + 18S, for example, on the same run.

- If it is a question of eventual/potential mistakes in a multiuser facility, then you can simply screen the raw reads using the primer sequences instead of the barcode sequences (which you shouldn't be handling anyway, since demux should be happening on-board to generate the raw FASTQ files, F+R per sample). The primer sequences are longer and easier to match than barcodes, and this is what we do when we overlay functional genes (such as nifH) on top of the same client's 16S samples (i.e., use the same barcode combos). We will sometimes do this in-house to save an internal client some money, since the nifH diversity is much lower than universal 16S (and so we mix 1/6th of the 50k reads for the nifH and the remaining 5/6th for the 16S). This is quite easy to implement with simple scripting and/or pre-made tools such as cutadapt that will already do it.

- If you really are interested in >384 indices, then I would recommend you talk to IDT about it. We are still (due to the ~20k start-up fee to order a full UDI set) using dual-indexing for most of our less-popular primer sets (and one-off custom orders), but we switched our most popular universal V4V5 set to full 384 UDIs quite easily through IDT, and I'm pretty sure they mentioned we could have gotten more combos if needed.
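The read-depth reasoning in the points above can be checked with a few lines of arithmetic. This is only a rough sketch: the run total is simply 384 samples x 50k reads taken from the figures quoted above, not a measured MiSeq output, and the function names and the one-sixth share used in the usage note are illustrative assumptions.

```python
def reads_per_sample(run_reads: int, n_samples: int) -> int:
    """Average raw reads per sample if a run is shared evenly."""
    return run_reads // n_samples


def split_reads(per_sample_reads: int, minor_fraction: float) -> tuple:
    """Split one sample's read budget between a low-diversity amplicon
    (minor share) and a high-diversity amplicon (the remainder)."""
    minor = round(per_sample_reads * minor_fraction)
    return minor, per_sample_reads - minor


# Assumption: run total derived from the quoted figures (384 samples x 50k reads).
RUN_READS = 384 * 50_000

depth_384 = reads_per_sample(RUN_READS, 384)   # the quoted "sweet spot"
depth_768 = reads_per_sample(RUN_READS, 768)   # doubling the samples halves the depth
```

For example, `split_reads(depth_384, 1 / 6)` gives the per-sample budget for two amplicons sharing one barcode combo when the smaller one is given a one-sixth share.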

ANDRÉ M. COMEAU, PhD
Manager • Integrated Microbiome Resource (IMR)
T: 902.494.2684 | E: andre....@dal.ca

Address for deliveries:
Dept. of Pharmacology
Tupper Med. Bldg., room 5D
Dalhousie University
5850 College St.
Halifax NS B3H 4R2

Research Associate (Lab Manager)

Morgan Langille Lab • Dept. of Pharmacology
ResearchGate Profile • GoogleScholar Publications

"Without fantasy, there is no science. Without fact, there is no art." - Nabokov
"The good thing about science is that it's true whether or not you believe in it." - Neil deGrasse Tyson

Bruce Nash

Feb 23, 2023, 9:40:02 PM

to microbio...@googlegroups.com

Hi All - and especially André. Thank you very much for all the ideas. This is exactly what I needed to hear. It is embarrassing that I didn't already know this, but I am not an expert - more of an interloper with a mission of educating others.

I have thought about using barcodes themselves to sort different samples, but don't know the bioinformatics for this. Do you have a tool that we could incorporate into our demultiplexing? Sorry to be naive. It makes perfect sense to me, and I had been discussing this possibility; I just didn't know someone else had already solved the problem.

Given that, I don't need more indexes, I just need refinement to my pipeline in DNA Subway - easy to do with the help of our programmer, who handles all the things I'm inept at. That really simplifies things as we build out.

Can you share your favourite primer sets or point me to what works or doesn't? I have done amplifications both with and without a first-round PCR using primers that lack the extra sequences, and it is great to know you have figured out what works and what doesn't. I'm always looking to use proven approaches, so I appreciate all the details. I chose to get rid of the primary amplification without adapters for 16S because it seemed not to matter, then struggled with other primer sets where things didn't work, but didn't realize that this would be primer-set specific.

I'll look at details more tomorrow, and look forward to hearing more.

Thank you very much! I hope you are a Francophone and that my French is not cause for despair.

Finally, if you are interested, I'd be happy to pool resources to grow the index set. I'd also love to know the IDT primer set for avoiding shared indexes in multiple samples. I have plenty of evidence of index mixing between adjacent or nearby wells, although at low levels most of the time. This varies a lot with novice or near-novice users, including students in classes taught by faculty we support.

Money is available on my end. Knowledge of what best to do is lacking. Maybe we can order together and get some bigger sets for bigger Illumina platforms? Seems like it isn't really necessary, though, and your insights fit my instincts to use the barcode region to bin reads for the same indexes but different barcodes.

By the way, do you or others combine multiple amplicons (I'm pretty sure people do) for the same samples to increase taxonomic power? If so, how does that work? I haven't had time to tackle that, but it seems almost (but not quite) trivial in concept and super hard bioinformatically. How does the math work to determine the statistical reliability? How does trimming affect this? Presumably, you trim all reads to the same length for each amplicon based on QC, deblur in some way, then combine the information? Do you somehow trim the classifiers (bioinformatically) and weight the value of each barcode? Does this take into account the relative levels of variability in the different regions, perhaps by making a virtual "combined barcode" with weights based on variable positions and their relative discriminatory power? How do barcode regions that give conflicting info (if ever?) get handled? Sorry to not know the literature... I'm a C. elegans researcher out of my depth in all this biodiversity and statistics.

Bruce - a Canadian, but in the "United" States - I have my doubts about that from time to time. From Alberta, so with some French, but not at all proper French...

Andre Comeau

Mar 28, 2023, 12:42:04 PM

to microbio...@googlegroups.com

Bruce,

Below I assume you meant to say you have thought about using the primer sequences to sort different samples (not barcodes), since that was what we were discussing in the previous email. You can simply use cutadapt to do this as you list the primer sequences as the 5' and 3' "adapters" and then it splits the sequence file into "adapters found" and "no adapters found" files which would then be your corresponding primer-matching reads and rest of the mix, respectively.
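To make the idea concrete, here is a minimal sketch of primer-based binning in plain Python. It is a stand-in for illustration, not cutadapt itself: it only accepts exact matches of a (possibly degenerate) forward primer at the very start of each read, whereas cutadapt also tolerates mismatches and handles read pairs. The function names and the primer sequences in the usage note are placeholders; substitute the primers you actually ordered.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_regex(primer):
    """Compile a degenerate primer into a regex anchored at the read start."""
    return re.compile("^" + "".join(IUPAC[base] for base in primer.upper()))


def bin_reads(reads, primers):
    """Assign each (read_id, sequence) pair to the first primer it starts with.

    Reads matching no primer go to the 'unassigned' bin, the equivalent of
    the "no adapters found" file described above.
    """
    patterns = {name: primer_regex(seq) for name, seq in primers.items()}
    bins = {name: [] for name in primers}
    bins["unassigned"] = []
    for read_id, seq in reads:
        for name, pattern in patterns.items():
            if pattern.match(seq.upper()):
                bins[name].append(read_id)
                break
        else:
            bins["unassigned"].append(read_id)
    return bins
```

A call such as `bin_reads(reads, {"16S": "GTGYCAGCMGCCGCGGTAA", "COI": "GGWACWGGWTGAACWGTWTAYCCYCC"})` returns one list of read IDs per amplicon plus the unassigned remainder, and each list can then be sent to its own downstream pipeline and classifier.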

For primer sets, our Protocols page lists what we use routinely and what is working well (https://imr.bio/protocols.html). So far, the COI + MiFish + AMF (WANDA+AML2) primer sets are the only main ones we have encountered that seem to need the 2-step PCR. Btw, all our protocols are now also on protocols.io: https://www.protocols.io/workspaces/integrated-microbiome-resource-imr/publications

Given that we have a high-throughput facility, I don't think we will need to pool for index set ordering, but I'll keep in mind for the future.

Combining different types of amplicons in one pipeline run-through is generally considered a bad idea - you need to process each amplicon separately because their lengths, error models, etc. are going to be different, and hence they should not be "Deblurred/DADA2ed" together. After you have final ASVs, though, you could imagine some way of "concatenating" those feature tables that might increase your resolution, but it has to be done carefully.
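One very simple way to picture such a "concatenation" is sketched below. It assumes each amplicon has already been denoised on its own and that its feature table is available as nested dictionaries. The per-marker relative-abundance scaling and the marker prefix on feature IDs are assumptions made for this illustration, not an established method, and they do nothing to resolve conflicting taxonomic calls between markers.

```python
def concat_feature_tables(tables):
    """Combine per-amplicon feature tables after separate processing.

    tables: {marker: {feature_id: {sample_id: read_count}}}
    Returns {"marker|feature_id": {sample_id: relative_abundance}}.

    Assumption: counts are scaled to relative abundance within each marker
    and sample, so a deeply sequenced marker does not swamp a shallow one.
    """
    combined = {}
    for marker, table in tables.items():
        # Total reads per sample for this marker only.
        totals = {}
        for counts in table.values():
            for sample, n in counts.items():
                totals[sample] = totals.get(sample, 0) + n
        # Prefix feature IDs so identical IDs from different markers cannot collide.
        for feature, counts in table.items():
            combined[f"{marker}|{feature}"] = {
                sample: n / totals[sample]
                for sample, n in counts.items()
                if totals[sample] > 0
            }
    return combined
```

Keeping the marker name in each feature ID means the combined table can always be split back into its per-amplicon parts.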

PS: Yes, a francophone Acadian from Nova Scotia.