Is preprocessing with elprep filter needed?

Stephane Plaisance

Aug 23, 2022, 5:00:49 AM
to elprep
Hi,
I will soon run elprep and would like to optimize the run. I could not find a tutorial about filtering before running elprep.
I have already used the sfm command but not yet the filter command.
Is filtering the BAM with elprep filter before running sfm good practice?
Thanks for any info or primer
Stephane

# my current command on unfiltered BWA mappings is the following:

```
pixeldist=2500
elprep sfm "${inbam}" "${outbam}" \
  --optical-duplicates-pixel-distance "${pixeldist}" \
  --mark-duplicates \
  --mark-optical-duplicates "${workdir}/elprep/${pfx}.output.metrics" \
  --sorting-order coordinate \
  --bqsr "${workdir}/elprep/${pfx}_bqsr.output.recal" \
  --known-sites "${dbsnp}" \
  --reference "${ref}" \
  --haplotypecaller "${workdir}/elprep/${pfx}.g.vcf.gz" \
  --reference-confidence GVCF \
  --nr-of-threads "${thr}" \
  --timed \
  --log-path "${workdir}/elprep" \
  --tmp-path "${workdir}/tmpfiles"
```

I have the latest version (5.1.3) installed with conda.

Charlotte Herzeel (imec)

Aug 23, 2022, 10:10:06 AM
to Stephane Plaisance, elprep
Hi Stéphane,

The elprep sfm and elprep filter commands have identical semantics. The difference is that the filter command executes a pipeline entirely in memory, while the sfm command splits the BAM file into smaller files on disk for processing. The goal of sfm is to reduce memory usage. If you have enough RAM available, using filter instead of sfm may considerably speed up your pipeline. See, for example, “Split and Merge tools” in the README.

It is generally best to include all the filters you want to run on the data in the same elprep command. elprep merges the execution of the different steps in a pipeline, which we have shown greatly speeds up execution compared to running the steps one by one.
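
As a rough sketch of both points, the sfm command from the first message could be written as a single filter invocation. This assumes, as described above, that filter accepts the same step options as sfm. The default file names, the `elprep_cmd` variable, and the `RUN` switch are placeholders added for illustration and are not part of elprep. `--tmp-path` is left out on the assumption that an in-memory run does not need it; please check the README for your version.

```shell
#!/bin/sh
# Placeholder values (assumptions for illustration); replace with real paths.
inbam="${inbam:-input.bam}"
outbam="${outbam:-output.bam}"
workdir="${workdir:-.}"
pfx="${pfx:-sample}"
ref="${ref:-reference.elfasta}"
dbsnp="${dbsnp:-dbsnp.elsites}"
thr="${thr:-8}"
pixeldist="${pixeldist:-2500}"

# All steps stay in one command; only the subcommand changes from sfm to filter.
set -- elprep filter "$inbam" "$outbam" \
  --optical-duplicates-pixel-distance "$pixeldist" \
  --mark-duplicates \
  --mark-optical-duplicates "$workdir/elprep/$pfx.output.metrics" \
  --sorting-order coordinate \
  --bqsr "$workdir/elprep/${pfx}_bqsr.output.recal" \
  --known-sites "$dbsnp" \
  --reference "$ref" \
  --haplotypecaller "$workdir/elprep/$pfx.g.vcf.gz" \
  --reference-confidence GVCF \
  --nr-of-threads "$thr" \
  --timed \
  --log-path "$workdir/elprep"

# Dry run by default: print the assembled command. Set RUN=1 to execute it.
elprep_cmd="$*"
echo "$elprep_cmd"
if [ "${RUN:-0}" = "1" ]; then
  "$@"
fi
```

Printing the command first makes it easy to compare against the sfm version before committing a long run to memory.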

I hope this makes things clearer.

Thanks!

Best,
Charlotte

Stephane Plaisance

Aug 23, 2022, 10:16:53 AM
to elprep
Dear Charlotte, 

I had misunderstood and was using sfm even though I have 512 GB of RAM and could probably benefit from filter. I will definitely try filter now; thanks for the explanation.
I still have a few questions about filters I am not using yet:
Should I apply [--filter-mapping-quality], and is 30 a good value? Or does BQSR already take care of this?
Should I also add [--filter-unmapped-reads] and [--filter-non-exact-mapping-reads] (standard or strict?) to get better results, or will they only reduce the size of the output BAM?
I understand the point of these parameters but do not know how much they could improve or otherwise affect the pipeline.
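
To make the question concrete, this is roughly how I would add those options, shown here on a stripped-down command. It is only a sketch: the file names, the `filter_cmd` variable, and the `RUN` switch are placeholders I made up for illustration, and 30 is simply the value I am asking about, not a recommendation.

```shell
#!/bin/sh
# Placeholder file names (assumptions for illustration).
inbam="${inbam:-input.bam}"
outbam="${outbam:-output.bam}"
# The threshold asked about above; whether 30 is appropriate is the open question.
mapq="${mapq:-30}"

# The three read filters in question, added to a single elprep command.
set -- elprep filter "$inbam" "$outbam" \
  --filter-unmapped-reads \
  --filter-mapping-quality "$mapq" \
  --filter-non-exact-mapping-reads \
  --sorting-order coordinate

# Dry run by default: print the assembled command. Set RUN=1 to execute it.
filter_cmd="$*"
echo "$filter_cmd"
if [ "${RUN:-0}" = "1" ]; then
  "$@"
fi
```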
Thanks for the great tool!
Stephane
