Sorted CRAM output

34 views
Skip to first unread message

Anne Mensink

unread,
Oct 20, 2021, 7:43:45 AM10/20/21
to elprep
Hello,
We use elPrep for BQSR, sorting and duplicate marking. However, when we attempt to run picard CollectMultipleMetrics on the CRAM output files we encounter an error:

"Exception in thread "main" htsjdk.samtools.SAMException: Requesting earlier reference sequence: 11 < 12",

which may point to the CRAM files not being properly sorted.
This made us wonder if we might be using elPrep incorrectly.

The elPrep commands we use in our workflow are the following:


elprep split "$BAMS/" "$SPLIT/" \
            --output-prefix "$sample_ID" \
            --output-type bam \
            --nr-of-threads 16


elprep filter $bam_file "$Results/$base.bam" \
            --mark-duplicates --remove-duplicates \
            --mark-optical-duplicates-intermediate "$Results/$base.metrics" \
            --nr-of-threads 16 \
            --bqsr-tables-only "$Results/$base.elrecal" \
            --bqsr-reference $el_fasta \
            --quantize-levels 0 \
            --known-sites $el_dbsnp,$el_known_indels,$el_mills \
            --sorting-order coordinate
            

elprep merge "$BAMS/" /dev/stdout --nr-of-threads 6 | \
elprep filter /dev/stdin /dev/stdout --bqsr-apply "$RECAL/" --recal-file "$RESULTS/$sample_ID.recal" --nr-of-threads 24 | \
samtools view -C -T $refFasta -o "$RESULTS/$sample_ID.dedup.recal.cram" -@ 4 --write-index -

Charlotte Herzeel (imec)

unread,
Oct 21, 2021, 6:53:06 AM10/21/21
to Anne Mensink, elprep
Hi Anne,

I don’t see anything wrong with your elprep commands. I do have a couple of questions:

* Which version of elPrep are you using? I am assuming it is an elprep 4.x.x version? That is because the option “—bqsr-reference” was renamed to “—reference” in version 5.0.0.

* I think you are passing multiple input files ($BAMS/) to the split command. In earlier versions, elPrep made no attempt at merging the headers of multiple input files. This may be problematic if the input files have different headers. Could this be an issue in your case? This issue is resolved in the latest release of elPrep (v. 5.1.1). Would it be possible to test with the latest release of elPrep?

* Is it possible to test the elprep sfm command instead of the different split/filter/merge calls? 

So something like:

elprep sfm $BAMS/ output.bam —sorting-order coordinate —mark-duplicates —remove-duplicates —mark-optical-dulicates $Results/$base.metrics —bqsr $Results/$base.metrics —known-sites …

(Or directly pipe to samtools to make cram or convert the bam to cram)

I would be curious to know if this works.

* Would it be possible to send the header of your final result?

Thanks!

Kind regards,
Charlotte




--
You received this message because you are subscribed to the Google Groups "elprep" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elprep+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elprep/34d03aff-cab4-49ba-8c33-b9c21aedf851n%40googlegroups.com.

Anne Mensink

unread,
Oct 28, 2021, 3:12:49 AM10/28/21
to elprep
Hi Charlotte,

We were indeed using v4.1.6 and the input BAM files we used contained different read groups.
I repeated the workflow with elPrep v5.1.1 and this seems to have solved the problem.

Thank you for your time and assistance.

Kind regards,
Anne
Reply all
Reply to author
Forward
0 new messages