Hello Jon,
I am probably running into memory issues with the freebayes variant calling. Which parameters can and should be adjusted?
After all 171 of my samples were mapped successfully to a reference genome, the downstream variant calling failed. So I reran the variant calling step with the following config.file:
Number of Processors
12
Maximum Memory
320G
Trimming
no
Assembly?
no
Type_of_Assembly
PE
Clustering_Similarity%
0.8
Minimum within individual coverage level to include a read for assembly (K1)
2
Minimum number of individuals a read must be present in to include for assembly (K2)
2
Mapping_Reads?
no
Mapping_Match_Value
1
Mapping_MisMatch_Value
4
Mapping_GapOpen_Penalty
6
Calling_SNPs?
yes
After several call_genos errors like this:
ERROR: freebayes instance DID NOT COMPLETE
See below:
/usr/bin/bash: line 1: 20271 Killed freebayes -b cat-RRG.bam -t mapped.$1.bed -v raw.$1.vcf -f reference.fasta -m 5 -q 5 -E 3 --min-repeat-entropy 1 -V --populations popmap -n 10 -F 0.1 &>fb.$1.error.log
parallel: This job failed:
call_genos 449
I got the message: “Multiple instances of freebayes failed. dDocent will now recalibrate run parameters to use less memory.”
And later: “A previous freebayes instance failed again. dDocent will now recalibrate run parameters to use even less memory.”
Finally freebayes was terminated, because it failed a third time.
Subsequently I created a new config.file setting the maximum memory to 0, as suggested elsewhere in this chat, and started dDocent with the new config.file.
The process is still running, but the first two messages have already appeared and freebayes has now started a third time.
Could it help to reduce the number of processors to 10 or fewer (with 320G, that gives at least 32 GB per processor)?
Do you have other suggestions for how to solve this?
Theresa
Hi Theresa,
Truly, this is difficult for me to troubleshoot for you. Mostly, it comes down to the limits of your computing system. The memory used by freebayes scales directly with the number of individuals and the coverage for each individual. One other thing: you should always tell a developer if you are using their software/pipeline for a non-standard purpose. Your example directory says MIG-seq, which is fundamentally very different from RADseq or WGS.
I can offer a couple of suggestions (and I am not sure if either will work).
First, you could try adding the --limit-coverage N flag to the freebayes command in lines 489 and 507 of dDocent. Make sure to replace N with some expected coverage limit (maybe 50). With N set to 50, coverage in regions above 50 is downsampled to 50.
From freebayes help:
--limit-coverage N
Downsample per-sample coverage to this level if greater than this coverage.
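As a rough sketch of that first suggestion, the edit could be made on a copy of the script instead of by hand. The helper name add_limit_coverage and the .limitcov suffix are made up for illustration, and the sketch assumes the freebayes call sits on one line and contains "-F 0.1 " exactly as in the error message above; check the result against lines 489 and 507 of your own dDocent version before using it.

```shell
# Hypothetical helper (not part of dDocent): write a patched copy of a script
# in which every single-line "freebayes -b" call gains --limit-coverage.
add_limit_coverage() {
    local script="$1"   # path to your copy of the dDocent script
    local limit="$2"    # coverage cap, e.g. 50
    local out="${script}.limitcov"

    # Assumption: the call contains "-F 0.1 " as shown in the error log.
    # Only lines with "freebayes -b" are touched; the original file is not modified.
    sed "/freebayes -b/ s/-F 0\.1 /-F 0.1 --limit-coverage ${limit} /" "$script" > "$out"

    echo "$out"
}

# Example call (the path is a placeholder):
# add_limit_coverage ./dDocent 50
```

Comparing the copy with the original (for example with diff) should show only the freebayes lines changed.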
Second, you could try changing line 422 from
SNPNUMProc=$(( $NUMProc * 2 ))
to
SNPNUMProc=$(( $NUMProc * 36 ))
This should help complete genotyping, but may lead to another error when merging all the vcf files together. I can help with that later if it happens.
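To make the size of that change concrete, here is the arithmetic for the 12 processors in the config file above. This only evaluates the two expressions; that a larger SNPNUMProc means more, smaller genotyping jobs (and so more vcf files to merge) is inferred from the note about merging, not checked against the dDocent source.

```shell
# "Number of Processors" from the config file above
NUMProc=12

# Line 422 as shipped versus the suggested edit
default_value=$(( NUMProc * 2 ))
suggested_value=$(( NUMProc * 36 ))

echo "SNPNUMProc with * 2:  ${default_value}"    # 24
echo "SNPNUMProc with * 36: ${suggested_value}"  # 432
```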
Jon
