Hi Evan,
Thanks for getting in touch. It’s nice to get an email with a real biological issue/challenge as opposed to troubleshooting installations!
dDocent should be able to handle the mixed ploidy samples for you. However, you will have to edit one line of code and provide a file documenting the ploidy of each sample.
dDocent relies on freebayes to call genotypes, and according to its documentation, freebayes can handle multiple ploidy levels with the following parameter:
    -A --cnv-map FILE
        Read a copy number map from the BED file FILE, which has
        either a sample-level ploidy:
            sample name, copy number
        or a region-specific format:
            reference sequence, start, end, sample name, copy number
        ... for each region in each sample which does not have the
        default copy number as set by --ploidy.
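Here is a minimal sketch of building the sample-level map. It assumes a hypothetical tab-separated table, ploidy_table.txt, with one line per sample (sample name, ploidy), and assumes dDocent's popmap is tab-separated (sample, population). Because the help text says only samples that differ from the default --ploidy need an entry, the helper keeps only samples whose ploidy is not 2, which is the freebayes default. It also warns about names missing from popmap, since those are likely typos. The names in the map must match the sample names freebayes sees in the BAM read groups, and you can check those with samtools view -H on one of the BAM files. The file and function names are illustrative only.

```shell
#!/usr/bin/env bash
# make_cnv_map TABLE POPMAP OUT   (hypothetical helper, not part of dDocent)
#   TABLE : tab-separated "sample<TAB>ploidy", one line per sample (assumed layout)
#   POPMAP: dDocent popmap, assumed tab-separated "sample<TAB>population"
#   OUT   : sample-level CNV map to pass to freebayes with -A
make_cnv_map() {
  local table=$1 popmap=$2 out=$3
  # Keep only samples whose ploidy differs from the freebayes default (--ploidy 2)
  awk -F'\t' '$2 != 2 { print $1 "\t" $2 }' "$table" > "$out"
  # Warn (on stderr) about samples in the map that are absent from popmap
  awk -F'\t' 'NR == FNR { seen[$1] = 1; next }
              !($1 in seen) { print "not in popmap: " $1 > "/dev/stderr" }' \
      "$popmap" "$out"
}

# Usage:
#   make_cnv_map ploidy_table.txt popmap cnv_map.txt
```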
Once you have this file, you will need to modify the default dDocent code. In the latest version, this is line 387, which you should change from:
freebayes -b split.$1.bam -t mapped.$1.bed -v raw.$1.vcf -f reference.fasta -m 5 -q 5 -E 3 --min-repeat-entropy 1 -V --populations popmap -n 10
to
freebayes -b split.$1.bam -t mapped.$1.bed -v raw.$1.vcf -f reference.fasta -m 5 -q 5 -E 3 --min-repeat-entropy 1 -V --populations popmap -n 10 -A your_cnv_file
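Line numbers shift between dDocent versions, so you may prefer to script the edit instead of making it by hand. This is only a sketch. It assumes your copy of the dDocent script still contains the freebayes call shown above, ending in "--populations popmap -n 10". The helper name add_cnv_flag is hypothetical. It keeps an untouched .bak copy of the script and fails if the substitution did not happen.

```shell
#!/usr/bin/env bash
# add_cnv_flag SCRIPT CNV_FILE   (hypothetical helper, not part of dDocent)
add_cnv_flag() {
  local script=$1 cnv=$2
  # Keep an untouched copy of the original script
  cp "$script" "$script.bak"
  # Only edit the freebayes call that genotypes the split BAM files;
  # writing into the existing file keeps its executable permission
  sed "/freebayes -b split/ s|-n 10|-n 10 -A $cnv|" "$script.bak" > "$script"
  # Return non-zero if the -A flag did not make it into the script
  grep -q -F -- "-A $cnv" "$script"
}

# Usage (path to your dDocent copy is an assumption):
#   add_cnv_flag ./dDocent cnv_map.txt
```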
You may also want to experiment with adding the -a parameter, which turns off any HWE priors for the genotyping model. I’m not sure whether the model will correctly handle mixed ploidy.
I’ve never worked with anything other than diploids, so I can’t give much more guidance on filtering and VCF manipulation. I encourage you to experiment with lots of different types of filtering!
As for combining your duplicate files, you want to do this before starting dDocent.
Given your naming scheme, for example:
cat C85_1A.F.fq.gz C85_1B.F.fq.gz > C85_1.F.fq.gz
cat C85_1A.R.fq.gz C85_1B.R.fq.gz > C85_1.R.fq.gz
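If you have many duplicated individuals, a loop saves typing. This is a minimal sketch. It assumes every duplicated individual has exactly an A and a B replicate named as above (for example, C85_1A.F.fq.gz and C85_1B.F.fq.gz). It also assumes no single-replicate sample name ends in "A". Concatenating the .gz files directly works because gzip readers decompress concatenated members in sequence. The A/B originals are moved into a replicates/ subdirectory, presumably so dDocent does not pick them up as extra samples. The function name merge_replicates is hypothetical.

```shell
#!/usr/bin/env bash
# merge_replicates   (hypothetical helper; run inside the directory with the reads)
# Merges <name>A.{F,R}.fq.gz with <name>B.{F,R}.fq.gz into <name>.{F,R}.fq.gz
merge_replicates() {
  shopt -s nullglob          # an unmatched glob expands to nothing instead of itself
  mkdir -p replicates
  local a base dir
  for a in *A.F.fq.gz; do
    base=${a%A.F.fq.gz}      # e.g. C85_1A.F.fq.gz -> C85_1
    for dir in F R; do
      local r1=${base}A.$dir.fq.gz r2=${base}B.$dir.fq.gz
      if [[ -f $r1 && -f $r2 ]]; then
        cat "$r1" "$r2" > "$base.$dir.fq.gz"
        mv "$r1" "$r2" replicates/   # keep originals out of dDocent's way
      else
        echo "skipping $base.$dir: missing replicate" >&2
      fi
    done
  done
}
```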
Please keep me posted on how this works out for you. It would be great to include this on the website and documentation.
Jon
On May 25, 2018 at 5:57:07 PM, Evan Hersh (evan....@botany.ubc.ca) wrote:
Hello Jon,

I’m a PhD candidate at UBC and have just begun my journey into bioinformatics. Multiple people I’ve talked to about my data have suggested dDocent, so I’m gonna give it a shot! I’ve read through your documentation and am looking forward to working through the tutorials, but my own dataset has a few caveats that I’d like to check with you before I get in too deep…

I’ve got one lane of Illumina HiSeq 2500 data (paired-end ddRAD with combinatorial inline barcodes). I’m working with a non-model plant species (no reference available) that has diploid sexuals and polyploid apomicts (which reproduce clonally through seed). The polyploids are mainly triploid, but there are potentially a few tetraploids as well. The populations mainly have uniform ploidy (either all diploid or all polyploid, except for one mixed-ploidy population), and were previously screened before being included in the library, so we know which samples are polyploid-asexual and which aren’t. Also, it was recommended to me to include each polyploid individual twice (to ensure deep enough coverage to confidently call SNPs), so I’m expecting those to have approximately double the coverage of the diploid individuals. The hope in the end is to look at the usual population genetic suspects, such as genetic distance, structure, isolation by distance, etc., especially comparing levels of differentiation (and maybe identifying origins of asexuality?) between our diploid-sexual and polyploid-asexual populations...

Given that info, I’ve got a few (probably naive) questions:

- Do you have any recommendations for using dDocent with polyploids, specifically datasets with both diploids and polyploids? Is this even possible?
- Similarly, do you know of particular parameters I should be careful of when attempting to detect variants in polyploid asexual populations (which may have low levels of differentiation within each population)?
- My polyploid individuals were included twice in my library (each with its own barcode), so after demultiplexing they come out as separate files. They’ll need to be combined at some point, but I’m a newb to all this, so I’m not sure at what point in the pipeline this should be done. With my current file-naming format (I attempted to follow the dDocent requirements), the two files from the same individual would be named something like this (population_individual):

    C85_1A
    C85_1B
    C85_2A
    C85_2B
    etc.

I’m sure any tips you may have will be extremely helpful!

Thanks,
Evan

--
Evan Hersh
PhD Candidate
Whitton Lab
UBC
--
You received this message because you are subscribed to the Google Groups "dDocent User Help Forum" group.