Re: Using dDocent for population genetic structure with diploid and polyploid populations, polyploids are asexual...

Jon Puritz

May 29, 2018, 9:24:18 AM
to Evan Hersh, ddo...@googlegroups.com

Hi Evan,

Thanks for getting in touch. It’s nice to get an email with a real biological issue/challenge as opposed to troubleshooting installations!

dDocent should be able to handle the mixed ploidy samples for you. However, you will have to edit one line of code and provide a file documenting the ploidy of each sample.

dDocent relies on freebayes to call genotypes and, according to its documentation, freebayes can handle multiple ploidy levels with the following parameter:

 -A --cnv-map FILE
                   Read a copy number map from the BED file FILE, which has
                   either a sample-level ploidy:
                      sample name, copy number
                   or a region-specific format:
                      reference sequence, start, end, sample name, copy number
                   ... for each region in each sample which does not have the
                   default copy number as set by --ploidy.
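
As a rough sketch of the sample-level format described above, the map could be generated from a table of known ploidies. Everything here is an assumption for illustration: the table, the sample names other than C85_1 and C85_2, the ploidy values, the file name cnv_map.bed, and the use of tab-separated fields. The sample names must match whatever names end up in your BAM files.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Scratch directory so the sketch does not touch real data.
workdir=$(mktemp -d)

# Hypothetical input table: one line per sample, "name<TAB>ploidy".
ploidy_table="$workdir/ploidy_table.tsv"
printf 'C85_1\t3\nC85_2\t3\nD12_1\t2\nD12_2\t2\nT07_1\t4\n' > "$ploidy_table"

# Ploidy that freebayes assumes when a sample is not listed (its --ploidy default is 2).
default_ploidy=2
cnv_map="$workdir/cnv_map.bed"

# Keep only samples whose copy number differs from the default,
# written as "sample name<TAB>copy number".
awk -F'\t' -v dp="$default_ploidy" '$2 != dp { print $1 "\t" $2 }' \
    "$ploidy_table" > "$cnv_map"

cat "$cnv_map"
```

Only the non-diploid samples are listed because the help text says the map is for samples that do not have the default copy number.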

Once you have this file, you will need to modify the default dDocent code. In the latest version, change line 387 from:

freebayes -b split.$1.bam -t mapped.$1.bed -v raw.$1.vcf -f reference.fasta -m 5 -q 5 -E 3 --min-repeat-entropy 1 -V --populations popmap -n 10

to

freebayes -b split.$1.bam -t mapped.$1.bed -v raw.$1.vcf -f reference.fasta -m 5 -q 5 -E 3 --min-repeat-entropy 1 -V --populations popmap -n 10 -A your_cnv_file

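You may also want to experiment with turning off the HWE priors in the genotyping model. In freebayes that is the -w (--hwe-priors-off) parameter; -a (--allele-balance-priors-off) disables the allele-balance priors instead. I’m not sure whether the default priors will correctly handle the mixed ploidy.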

I’ve never worked with anything other than diploids, so I cannot give much more guidance on filtering and VCF manipulation. I encourage you to experiment with lots of different types of filtering!

As for combining your duplicate files, you will want to do this before starting dDocent.

Given your names,

cat C85_1A.F.fq.gz C85_1B.F.fq.gz > C85_1.F.fq.gz
cat C85_1A.R.fq.gz C85_1B.R.fq.gz > C85_1.R.fq.gz
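
With many duplicated individuals, the two cat commands above could be wrapped in a loop. This is only a sketch under assumptions: every duplicated individual follows the A/B suffix naming shown in the thread, no other sample name ends in "A", and the combined files are written to a hypothetical combined/ directory so the originals are not picked up as extra samples. The first few lines just create tiny stand-in files so the sketch can be run on its own.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Demo setup (hypothetical data): tiny stand-in read files in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir"
for s in C85_1A C85_1B C85_2A C85_2B; do
  for d in F R; do
    printf '@%s_%s\nACGT\n+\nIIII\n' "$s" "$d" | gzip > "$s.$d.fq.gz"
  done
done

# Combine the A and B files of each individual, forward and reverse separately.
mkdir -p combined
for a in *A.F.fq.gz; do
  [ -e "$a" ] || continue            # no matching files: nothing to do
  base=${a%A.F.fq.gz}                # e.g. C85_1A.F.fq.gz -> C85_1
  for d in F R; do
    first="${base}A.$d.fq.gz"
    second="${base}B.$d.fq.gz"
    if [ ! -e "$first" ] || [ ! -e "$second" ]; then
      echo "Skipping $base ($d): missing A or B file" >&2
      continue
    fi
    # Concatenated gzip files are still valid gzip, so no decompression is needed.
    cat "$first" "$second" > "combined/${base}.$d.fq.gz"
  done
done

ls combined
```

Running dDocent from the combined/ directory (plus the untouched diploid files) keeps the per-barcode files out of the analysis.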

Please keep me posted on how this works out for you. It would be great to include this on the website and documentation.

Jon



-- 
Jon Puritz, PhD

Assistant Professor
Department of Biological Sciences
University of Rhode Island
120 Flagg Road, Kingston, RI 02881

Webpage: MarineEvoEco.com

Email: jpu...@gmail.com

Cell:    401-338-8739
Work:  401-874-9020

"The most valuable of all talents is that of never using two words when one will do.” -Thomas Jefferson


On May 25, 2018 at 5:57:07 PM, Evan Hersh (evan....@botany.ubc.ca) wrote:

Hello Jon,

I’m a PhD candidate at UBC and have just begun my journey into bioinformatics. Multiple people that I’ve talked to about my data have suggested dDocent, so I’m gonna give it a shot! I’ve read through your documentation and am looking forward to working through the tutorials, but my own dataset has a few caveats that I’d like to check with you before I get in too deep…

I’ve got one lane of Illumina HiSeq 2500 data (paired-end ddRAD with combinatorial inline barcodes). I’m working with a non-model plant species (no reference available) that has diploid sexuals and polyploid apomicts (that reproduce clonally through seed). The polyploids are mainly triploid, but there are potentially a few tetraploids as well. The populations mainly have uniform ploidy (either all diploid or all polyploid, except for one mixed-ploidy population), and were previously screened before being included in the library, so we know which samples are polyploid-asexual and which aren’t. Also, it was recommended to me to include each polyploid individual twice (so as to ensure deep enough coverage to confidently call SNPs), so I’m expecting those to have approximately double the coverage of the diploid individuals. The hope in the end is to look at the usual population genetic suspects, such as genetic distance, structure, isolation by distance, etc., especially comparing levels of differentiation (and maybe identifying origins of asexuality?) between our diploid-sexual and polyploid-asexual populations...

Given that info, I’ve got a few (probably naive) questions:

- Do you have any recommendations for using dDocent with polyploids, specifically datasets with both diploids and polyploids? Is this even possible?
- Similarly, do you know of particular parameters I should be careful of when attempting to detect variants in polyploid asexual populations (which may have low levels of differentiation within each population)?
- My polyploid individuals were included twice in my library (each with their own barcode), so after demultiplexing they come out as separate files. They’ll need to be combined at some point, but I’m a newb to all this so I’m not sure at what point in the pipeline this should be done. With my current file-naming format (I attempted to follow the dDocent requirements), the two files from the same individual would be named something like:

population_individual
C85_1A
C85_1B

C85_2A
C85_2B

etc.

I’m sure any tips you may have will be extremely helpful! 

Thanks,

Evan
 
--
Evan Hersh
PhD Candidate
Whitton Lab
UBC






Alfons Weig

Feb 12, 2021, 9:50:20 AM
to dDocent User Help Forum
Hello,
I am going to analyse MIG-seq data from hexaploid trees.
Could you update where in the dDocent code the --cnv-map modification has to be inserted? The line given above does not seem to be the correct line in the most recent release.
I found two possible lines, 489 and 507, in dDocent version 2.8.12, which I installed as a conda environment today.

Maybe you could also answer another question: I would like to analyse the SNP genotypes in STRUCTURE; is there an export function for that particular format in dDocent?

Thanks for that great tool!
Best regards

Jon Puritz

Feb 12, 2021, 10:31:41 AM
to ddo...@googlegroups.com
Any line with a freebayes command, so yes, lines 489 and 507. For STRUCTURE, the best practice would be to convert to haplotypes using rad_haplotyper (https://github.com/chollenbeck/rad_haplotyper/blob/master/README.md). It can output a genepop format, which can easily be converted to STRUCTURE.
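
Since the line numbers shift between releases, one way to find and patch the freebayes calls is with grep and sed on a copy of the script. This is a sketch, not the official procedure: the two-line stand-in file below is a hypothetical excerpt built from the command quoted earlier in the thread, cnv_map.bed is a placeholder file name, and it assumes each call starts with "freebayes -b ". In a real dDocent script, check every match by eye before using the patched copy.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stand-in for the dDocent script (hypothetical excerpt) so the sketch runs anywhere.
workdir=$(mktemp -d)
script="$workdir/dDocent_excerpt.sh"
cat > "$script" <<'EOF'
echo "calling variants"
freebayes -b split.$1.bam -t mapped.$1.bed -v raw.$1.vcf -f reference.fasta -m 5 -q 5 -E 3 --min-repeat-entropy 1 -V --populations popmap -n 10
EOF

# 1) List the line numbers of every freebayes call.
grep -n 'freebayes -b ' "$script"

# 2) Write a patched copy with the copy number map option added to each call.
#    The original file is left untouched.
patched="$workdir/dDocent_excerpt.patched.sh"
sed 's/freebayes -b /freebayes -A cnv_map.bed -b /' "$script" > "$patched"

# 3) Review the change (diff exits non-zero when the files differ, hence "|| true").
diff "$script" "$patched" || true
```

The option is inserted right after the command name rather than appended to the end of the line, in case a call is followed by redirections or other text on the same line.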