freebayes failed a third time


the...@puchner.net

Mar 3, 2021, 10:54:08 AM
to dDocent User Help Forum

Hello Jon, 

 

I am probably having memory issues with freebayes variant calling. Which parameters can and should be adjusted?

After all 171 of my samples were successfully mapped to a reference genome, the downstream variant calling failed. So I reran the variant calling step with the following config file:

 

Number of Processors
12
Maximum Memory
320G
Trimming
no
Assembly?
no
Type_of_Assembly
PE
Clustering_Similarity%
0.8
Minimum within individual coverage level to include a read for assembly (K1)
2
Minimum number of individuals a read must be present in to include for assembly (K2)
2
Mapping_Reads?
no
Mapping_Match_Value
1
Mapping_MisMatch_Value
4
Mapping_GapOpen_Penalty
6
Calling_SNPs?
yes
Email

 

After several call_genos errors like this:

 

ERROR: freebayes instance DID NOT COMPLETE

 

See below:

/usr/bin/bash: line 1: 20271 Killed                  freebayes -b cat-RRG.bam -t mapped.$1.bed -v raw.$1.vcf -f reference.fasta -m 5 -q 5 -E 3 --min-repeat-entropy 1 -V --populations popmap -n 10 -F 0.1 &>fb.$1.error.log

parallel: This job failed:

call_genos 449

 

I got the message: “Multiple instances of freebayes failed. dDocent will now recalibrate run parameters to use less memory.”

And later: “A previous freebayes instance failed again.  dDocent will now recalibrate run parameters to use even less memory.”

Finally, freebayes was terminated because it had failed a third time.



Subsequently, I created a new config file with the maximum memory set to 0, as suggested elsewhere in this forum, and restarted dDocent with it.
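The config edit described here can be scripted so the original file stays untouched. A minimal sketch, assuming the usual dDocent config layout of a parameter-name line followed by its value line (blank lines are passed through and skipped); the helper name `set_max_memory` and the output filename are hypothetical:

```shell
# Hypothetical helper: copy a dDocent config file, replacing the value that
# follows the "Maximum Memory" line. Assumes the name-line/value-line layout;
# blank lines are copied unchanged and never treated as the value.
set_max_memory() {
  local src="$1" dst="$2" value="$3"
  awk -v val="$value" '
    { line = $0; sub(/[ \t\r]+$/, "", line) }   # ignore trailing spaces/CR
    line == "" { print; next }                   # keep blank lines, skip them
    prev == "Maximum Memory" { $0 = val }        # swap in the new value
    { print; prev = line }
  ' "$src" > "$dst"
}
```

For example, `set_max_memory config.file config.lowmem.file 0`, then start dDocent with `config.lowmem.file` as before.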

The process is still running, but the first two messages have already appeared, and freebayes has now started a third time.

 

Would it help to reduce the number of processors to 10 or fewer (roughly 30 GB per processor)?

Do you have any other suggestions on how to solve this?
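The per-processor figure in the question is easy to check. A rough sketch, assuming (loudly) that the Maximum Memory value of 320G is shared evenly by one freebayes job per processor; freebayes does not enforce any such share, and a job on a high-coverage interval can need much more. `mem_per_job` is a hypothetical helper:

```shell
# Rough per-job memory budget. ASSUMPTION: "Maximum Memory" is split evenly
# across one freebayes job per processor; freebayes itself enforces no limit.
mem_per_job() {
  # $1 = total memory in GB, $2 = number of concurrent jobs
  awk -v t="$1" -v n="$2" 'BEGIN { printf "%.1f\n", t / n }'
}

for nproc in 12 10 8 6; do
  echo "$nproc processors: $(mem_per_job 320 "$nproc") GB per job"
done
```

With 10 processors the even share is 32 GB per job; with the original 12 it is about 26.7 GB.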



Theresa

the...@puchner.net

Mar 4, 2021, 9:37:19 AM
to dDocent User Help Forum
After running for almost two days, dDocent finished with the following output:

99% 630:1=2m17s 25
100% 631:0=0s 25

FreeBayes has now failed a third  time, likely because of memory issues.  More resources must be allocated to finish this analysis.

Using VCFtools to parse TotalRawSNPS.vcf for SNPs that are called in at least 90% of individuals

dDocent has finished with errors in /MIG-Seq2/prepr-paired

dDocent started Tue Mar 2 18:43:21 CET 2021

dDocent finished Thu Mar 4 15:11:02 CET 2021

Please check log files

After filtering, kept 7663 out of a possible 126679 Sites

dDocent 2.8.13
The 'd' is silent, hillbilly.

Is there anything more I can do to solve this memory issue?
Would it be possible to use the SNPs called up to this point for further analyses (like Structure)?
Could someone explain what exactly is missing if some of the call_genos steps fail? 

I would be very grateful for any advice!
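On the question of what is missing: the failed command in the log restricts freebayes to the targets in `mapped.$1.bed` (`-t mapped.$1.bed`), and `parallel` reports the failed job's argument (`call_genos 449`). Variants inside the intervals of that BED file were therefore not called by the failed job; whether a later retry recovered them is not something this shows. A minimal sketch that totals the bases in those BED files, assuming the `mapped.N.bed` files from the failed run are still in the working directory; `failed_bases` is a hypothetical helper:

```shell
# Hypothetical helper: for each failed call_genos job number N, total the
# bases in mapped.N.bed (the -t target file of that freebayes run).
# Assumes the mapped.N.bed files are still in the current directory.
failed_bases() {
  local n bed len total=0
  for n in "$@"; do
    bed="mapped.$n.bed"
    if [ ! -f "$bed" ]; then
      echo "$bed not found" >&2
      continue
    fi
    # BED intervals are 0-based and half-open, so length = end - start
    len=$(awk '{ s += $3 - $2 } END { printf "%.0f\n", s }' "$bed")
    echo "call_genos $n: $len bp in $bed"
    total=$(( total + len ))
  done
  echo "total: $total bp"
}
```

For example, `failed_bases 449` for the job named in the log above.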

Jon Puritz

Mar 7, 2021, 12:31:27 PM
to ddo...@googlegroups.com, a.w...@uni-bayreuth.de

Hi Theresa,

Truly, this is difficult for me to troubleshoot for you. Mostly, it comes down to the limits of your computing system. The memory used by freebayes scales directly with the number of individuals and the coverage of each individual. One other thing: you should always tell a developer if you are using their software or pipeline in a non-standard way. Your example directory says MIG-seq, which is fundamentally very different from RADseq or WGS.

I can offer a couple of suggestions (and I am not sure if either will work).

First, you could try adding the --limit-coverage N flag to the freebayes commands on lines 489 and 507 of dDocent. Replace N with an expected coverage limit (maybe 50?). This will downsample per-sample coverage in regions above 50 down to 50.

From freebayes help:

 --limit-coverage N
      Downsample per-sample coverage to this level if greater than this coverage.
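Since line numbers differ between dDocent versions (Theresa's run reports dDocent 2.8.13), a guarded edit is safer than editing blind. A minimal sketch with the hypothetical helper name `add_limit_coverage`: it inserts `--limit-coverage N` right after `freebayes ` on the given lines, refuses to change anything if a target line has no freebayes call, and keeps the original as a `.bak` file (`sed -i.bak` works with both GNU and BSD sed):

```shell
# Hypothetical helper: insert "--limit-coverage COV" after "freebayes " on the
# given line numbers of a dDocent script. Aborts without editing if any target
# line lacks a freebayes call. The original is kept as SCRIPT.bak.
add_limit_coverage() {
  local script="$1" cov="$2" ln sed_script=""
  shift 2
  if [ "$#" -eq 0 ]; then
    echo "usage: add_limit_coverage SCRIPT COVERAGE LINE..." >&2
    return 1
  fi
  for ln in "$@"; do
    if ! sed -n "${ln}p" "$script" | grep -q 'freebayes '; then
      echo "line $ln of $script has no freebayes call; nothing changed" >&2
      return 1
    fi
    # Append one substitution per target line, separated by ";"
    sed_script="${sed_script:+$sed_script;}${ln}s/freebayes /freebayes --limit-coverage ${cov} /"
  done
  sed -i.bak "$sed_script" "$script"
}
```

For example, on a working copy: `cp "$(command -v dDocent)" dDocent.lowmem`, confirm the line numbers with `grep -n 'freebayes ' dDocent.lowmem`, then run `add_limit_coverage dDocent.lowmem 50 489 507`.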

Second, you could try changing line 422 from

SNPNUMProc=$(( $NUMProc * 2 ))

to

SNPNUMProc=$(( $NUMProc * 36 ))

This should help complete genotyping, but may lead to another error when merging all the vcf files together. I can help with that later if it happens.
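With 12 processors, the quoted line gives SNPNUMProc = 12 × 2 = 24 before the change and 12 × 36 = 432 after it. The sketch below, a hypothetical helper named `set_snp_multiplier`, makes the change by matching the exact line quoted above rather than trusting line 422, and aborts unless that line occurs exactly once in the script:

```shell
# Hypothetical helper: change the SNPNUMProc multiplier in a dDocent script by
# matching the exact line "SNPNUMProc=$(( $NUMProc * 2 ))" instead of a line
# number. Aborts unless it occurs exactly once; original kept as SCRIPT.bak.
set_snp_multiplier() {
  local script="$1" factor="$2" hits
  local pattern='SNPNUMProc=\$(( \$NUMProc \* 2 ))'
  [ -f "$script" ] || { echo "$script not found" >&2; return 1; }
  hits=$(grep -c "$pattern" "$script")
  if [ "$hits" -ne 1 ]; then
    echo "expected 1 matching line in $script, found $hits" >&2
    return 1
  fi
  # \$ keeps the shell from expanding $(( ... )) in the replacement text
  sed -i.bak "s/$pattern/SNPNUMProc=\$(( \$NUMProc * $factor ))/" "$script"
}
```

For example, `set_snp_multiplier dDocent.lowmem 36`.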

Jon



-- 
Jon Puritz, PhD

Assistant Professor
Department of Biological Sciences
University of Rhode Island
120 Flagg Road,  Kingston, RI 02881
Pronouns: he/him


"The most valuable of all talents is that of never using two words when one will do." -Thomas Jefferson
--------------------------------------------
Q: Why is this email five sentences or less?
--------------------------------------------
I only check and respond to email at 10am and 4:00pm each day. 


the...@puchner.net

Mar 8, 2021, 7:17:37 AM
to dDocent User Help Forum
Thank you Jon!

I will try both suggestions!


On Wednesday, March 3, 2021 at 16:54:08 UTC+1, the...@puchner.net wrote: