Oligotyping using public consensus sequences

Rodrigo Ortega

Dec 8, 2017, 1:33:17 AM

to Oligotyping and MED

Hi Meren,

First of all, thank you very much for your work on oligotyping. I really enjoyed reading the papers, and it seems like a very promising tool for my work. I can’t wait to work with it on my data.

I have a very naive, newbie question. I have read your tutorials and blog articles, and it is clear to me that this tool is meant to be used with Illumina or 454 data. I have Illumina data that I can work with, but that is not what brings me to write here.

I have a dataset of publicly available marker (consensus) sequences. Many of these were generated using Sanger sequencing… Unfortunately, there does not seem to be SRA FASTQ data available for those sequences either. However, I am very interested in applying the oligotyping approach to them. I am thinking of using snippy to generate pseudo-reads from those marker sequences, which I could then use with oligotyping. As a preliminary analysis, I aligned the FASTA consensus sequences to a reference alignment with PyNAST and used other tools to identify metadata-associated SNPs. I have identified group-specific SNPs this way. However, I think that oligotyping has much more potential for identifying subtle variations in publicly available assembled data.
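For concreteness, the kind of preliminary check described above can be sketched roughly as follows. This is only a minimal illustration in plain Python, not the actual tools used: the function names, the toy sequences, and the group labels are all hypothetical, and it assumes the sequences are already aligned to equal length. It computes the Shannon entropy of each alignment column (the quantity oligotyping uses to pick informative positions) and lists columns where each metadata group is fixed for a single base but the groups differ.

```python
import math
from collections import Counter, defaultdict


def column_entropy(column):
    """Shannon entropy (in bits) of the characters in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def group_specific_columns(aligned, groups):
    """Return 0-based columns where every group carries a single character
    and at least two groups carry different characters.

    aligned: dict mapping sequence name -> aligned sequence (equal lengths)
    groups:  dict mapping sequence name -> metadata group label
    """
    length = len(next(iter(aligned.values())))
    hits = []
    for i in range(length):
        per_group = defaultdict(set)
        for name, seq in aligned.items():
            per_group[groups[name]].add(seq[i])
        fixed_within_groups = all(len(chars) == 1 for chars in per_group.values())
        distinct = {next(iter(chars)) for chars in per_group.values()}
        if fixed_within_groups and len(distinct) > 1:
            hits.append(i)
    return hits


# Toy, made-up alignment and metadata (hypothetical values for illustration).
aligned = {
    "seq1": "ACGTA",
    "seq2": "ACGTA",
    "seq3": "ATGTC",
    "seq4": "ATGTA",
}
groups = {"seq1": "A", "seq2": "A", "seq3": "B", "seq4": "B"}

entropies = [column_entropy([s[i] for s in aligned.values()]) for i in range(5)]
specific = group_specific_columns(aligned, groups)
print(entropies)
print(specific)
```

Note that this sketch treats gap characters like any other character, so a real alignment would probably need gaps and ambiguous bases handled explicitly.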

Do you think that this is a valid approach? What would you recommend in terms of oligotyping parameters if I were to use this approach?