Hi Meren,
First of all, thank you very much for your work on oligotyping.
I really enjoyed reading the papers, and it seems like a very promising
tool for my work. I can’t wait to work with it on my data.
I have a rather naive, beginner's question. I have read your tutorials and blog articles, and it is clear to me that this tool is meant to be used with Illumina or 454 data. I do have Illumina data that I can work with, but that is not what brings me to write here.
I have a dataset of publicly available marker (consensus) sequences.
Many of these were generated using Sanger sequencing. Unfortunately,
there does not seem to be SRA FASTQ data available for those sequences
either. However, I am very interested in applying the oligotyping
approach to them. I am thinking of using snippy to generate pseudo-reads
from those marker sequences, which I could then use with oligotyping.
I have done a preliminary analysis on the FASTA consensus sequences:
I aligned them to a reference alignment using PyNAST and used other
tools to identify metadata-associated SNPs. I have identified
group-specific SNPs this way. However, I think that oligotyping has much more potential for identifying subtle variations in publicly available assembled data.
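To make the idea concrete, here is a minimal sketch of the per-column Shannon entropy calculation that, as I understand it, oligotyping relies on to find informative positions in an alignment. This is not code from the oligotyping pipeline: the function names (column_entropy, entropy_profile) and the toy alignment are hypothetical and only illustrate what I would like to do with the aligned consensus sequences.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the characters in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def entropy_profile(aligned_seqs):
    """Entropy of every column; all sequences must have the same aligned length."""
    lengths = {len(seq) for seq in aligned_seqs}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return [column_entropy(col) for col in zip(*aligned_seqs)]


# Toy alignment (made-up data, standing in for PyNAST-aligned consensus sequences).
toy_alignment = [
    "ACGTAC",
    "ACGTAC",
    "ACCTAC",
    "ACCTAT",
]

profile = entropy_profile(toy_alignment)

# Columns with non-zero entropy are the candidate variable positions (0-based).
variable_positions = [i for i, h in enumerate(profile) if h > 0]
print(variable_positions)
```

With only one consensus sequence per sample, each sequence contributes a single character per column, so the entropy here reflects variation between samples rather than within them.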
Do you think this is a valid approach? What would you recommend in terms of oligotyping parameters if I were to use it?
Thank you very much in advance for your time,
Rodrigo