SAM validation error


mehr...@broadinstitute.org

Aug 23, 2018, 7:15:54 AM
to last-align
Hello,

I followed the paired-end (PE) recipe to align the reads. At a later stage I need to add a read group to the BAM file, and Picard shows the following error:
```
Exception in thread "main" htsjdk.samtools.SAMFormatException: SAM validation error: ERROR: Record 1, Read name M03309:282:000000000-D4876:1:1102:11182:26444, Mapped mate should have mate reference name
at htsjdk.samtools.SAMUtils.processValidationErrors(SAMUtils.java:441)
at htsjdk.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:644)
at htsjdk.samtools.BAMFileReader$BAMFileIterator.<init>(BAMFileReader.java:617)
at htsjdk.samtools.BAMFileReader$BAMFileIterator.<init>(BAMFileReader.java:605)
at htsjdk.samtools.BAMFileReader.getIterator(BAMFileReader.java:313)
at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.iterator(SamReader.java:448)
at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.iterator(SamReader.java:350)
at picard.sam.AddOrReplaceReadGroups.doWork(AddOrReplaceReadGroups.java:124)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:209)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:105)
```
Commands used:
```
~/Research/Programs/LAST/last-948/scripts/fastq-interleave ../../qc_qa/DeDup/D00172_000001_R1.DeDup.fq.gz ../../qc_qa/DeDup/D00172_000001_R2.DeDup.fq.gz | ~/Research/Programs/LAST/last-948/src/lastal -Q1 -D1000 -i1 HumanIDX_last48/last_hg19 >temp.maf

~/Research/Programs/LAST/last-948/src/last-pair-probs temp.maf >out.maf

~/Research/Programs/LAST/last-948/scripts/maf-convert sam out.maf >out.sam

samtools view -T HumanIDX_last48/ucsc.hg19.fasta -b -S out.sam >Sample_D00172_000001_ORF15Test_LAST.bam

java -Xmx2G -jar /modules/ogi-mbc/software/picardtools/1.141/picard.jar AddOrReplaceReadGroups I=Sample_D00172_000001_ORF15Test_LAST.bam O=Sample_D00172_000001_ORF15Test_LAST.ReadGrp.bam RGPL=Illumina RGSM=D00172_000001 RGLB=D00172_000001 RGPU=Illumina RGID=D00172_000001
```
Even when I add VALIDATION_STRINGENCY=LENIENT to Picard, many reads are ignored. Perhaps maf-convert is adhering to an old version of the SAM/BAM standard?

I updated to and used the latest version of LAST: last-948.

Thank you,
Sudeep
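
For readers hitting the same exception: the message suggests that a record is flagged as paired with a mapped mate (FLAG bit 0x1 set, bit 0x8 clear) while its RNEXT column is `*`. The following is a minimal Python sketch for locating such records in a SAM file, assuming that reading of the error is right. The function name `mates_missing_reference` and the example records are hypothetical and not taken from the thread.

```python
PAIRED = 0x1         # SAM FLAG bit: read is part of a pair
MATE_UNMAPPED = 0x8  # SAM FLAG bit: mate is unmapped


def mates_missing_reference(sam_lines):
    """Return names of paired reads whose mate is flagged as mapped but whose RNEXT is '*'."""
    bad = []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        qname, flag, rnext = fields[0], int(fields[1]), fields[6]
        if (flag & PAIRED) and not (flag & MATE_UNMAPPED) and rnext == "*":
            bad.append(qname)
    return bad


# Invented records for illustration only.
example = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t65\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",    # paired, mate "mapped", no RNEXT
    "r2\t65\tchr1\t100\t60\t10M\t=\t300\t0\tACGTACGTAC\tIIIIIIIIII",  # RNEXT present
    "r3\t73\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",    # mate-unmapped bit set
]
flagged = mates_missing_reference(example)
print(flagged)
```

On a real file, one could pass an open file handle for `out.sam` instead of the list; only `r1` in the invented example matches the pattern.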

mehr...@broadinstitute.org

Aug 24, 2018, 8:39:53 AM
to last-align

Hello,
I used the '-r' option in maf-convert to add the read group. Once that was done, I used Picard to validate the SAM/BAM, and it does not validate.

```
## HISTOGRAM java.lang.String
Error Type Count
ERROR:HEADER_RECORD_MISSING_REQUIRED_TAG 1
ERROR:INVALID_FLAG_FIRST_OF_PAIR 3271
ERROR:INVALID_FLAG_SECOND_OF_PAIR 2637
ERROR:MISSING_PLATFORM_VALUE 1
ERROR:POORLY_FORMATTED_HEADER_TAG 5
```

I had to do all this because the SAM output from LAST is converted to a sorted BAM and then passed to GATK to make variant calls. When GATK failed, I started fixing issues based on the errors shown. But even now that the read group is added, GATK still won't run. That is when I tried validating the SAM file.

Thanks,
Sudeep
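
The INVALID_FLAG_FIRST_OF_PAIR and INVALID_FLAG_SECOND_OF_PAIR counts are, as far as I understand Picard's validation, raised when FLAG bit 0x40 or 0x80 is set on a record that lacks the paired bit 0x1. A rough Python sketch for tallying such records independently of Picard follows; the function name `tally_pair_flag_problems` and the example records are hypothetical.

```python
from collections import Counter

PAIRED = 0x1   # read is part of a pair
FIRST = 0x40   # first read of the pair
SECOND = 0x80  # second read of the pair


def tally_pair_flag_problems(sam_lines):
    """Count records that set first/second-of-pair bits without the paired bit."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        if len(fields) < 2:
            continue
        flag = int(fields[1])
        if not flag & PAIRED:
            if flag & FIRST:
                counts["first_of_pair_on_unpaired"] += 1
            if flag & SECOND:
                counts["second_of_pair_on_unpaired"] += 1
    return counts


# Invented records for illustration only.
example = [
    "r1\t64\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",    # 0x40 without 0x1
    "r2\t128\tchr1\t200\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",   # 0x80 without 0x1
    "r3\t65\tchr1\t300\t60\t10M\t=\t500\t0\tACGTACGTAC\tIIIIIIIIII",  # properly paired flags
    "r4\t0\tchr1\t400\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",     # plain single-end read
]
counts = tally_pair_flag_problems(example)
print(dict(counts))
```

If counts like these are nonzero on the real file, the likely candidates are reads whose mate was not reported, so that they carry first/second bits but no paired bit.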

mehr...@broadinstitute.org

Aug 24, 2018, 8:45:26 AM
to last-align

Sorry about the frequent emails.

To check/fix issues, I tried running samtools fixmate:
```
samtools fixmate -O bam out_ReadGrpAdded.sam out_ReadGrpAdded.fixmate.bam
[E::sam_parse1] missing SAM header
[W::sam_read1] Parse error at line 3
```
So my initial suspicion seems correct:
`Perhaps maf-convert is adhering to old standard of SAM/BAMs`, or maybe it's something else.

Best,
Sudeep
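
The "missing SAM header" message together with POORLY_FORMATTED_HEADER_TAG in the earlier histogram points toward the header lines rather than the alignments. The thread does not show the header itself, so the sketch below is only a guess at a useful check: it verifies that each header line is tab-delimited, uses TAG:VALUE fields, and carries the tags the SAM specification requires (VN for @HD, SN and LN for @SQ, ID for @RG and @PG). The function name `check_header` and the malformed @RG line in the example are invented for illustration.

```python
import re

# Required tags per header record type, per the SAM specification.
REQUIRED = {"@HD": {"VN"}, "@SQ": {"SN", "LN"}, "@RG": {"ID"}, "@PG": {"ID"}}
TAG_VALUE = re.compile(r"^[A-Za-z][A-Za-z0-9]:.+$")


def check_header(sam_lines):
    """Return a list of (line_number, message) problems found in the SAM header."""
    problems = []
    saw_sq = False
    for number, line in enumerate(sam_lines, start=1):
        line = line.rstrip("\n")
        if not line.startswith("@"):
            break  # the header ends at the first alignment record
        fields = line.split("\t")
        record_type = fields[0]
        if record_type == "@CO":
            continue  # comment lines are free text
        if record_type not in REQUIRED:
            problems.append((number, "unrecognized record type (or spaces instead of tabs)"))
            continue
        if record_type == "@SQ":
            saw_sq = True
        tags = set()
        for field in fields[1:]:
            if TAG_VALUE.match(field):
                tags.add(field[:2])
            else:
                problems.append((number, f"poorly formatted field {field!r}"))
        for tag in sorted(REQUIRED[record_type] - tags):
            problems.append((number, f"missing required tag {tag}"))
    if not saw_sq:
        problems.append((0, "no @SQ lines found"))
    return problems


# Invented header: the @RG line uses spaces where tabs are expected.
example = [
    "@HD\tVN:1.3\tSO:unsorted",
    "@SQ\tSN:chr1\tLN:1000",
    "@RG ID:1 PL:Illumina",
    "r1\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
]
problems = check_header(example)
for number, message in problems:
    print(number, message)
```

In the invented example only line 3 is reported; whether the real file fails for the same reason cannot be told from the messages quoted above.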

Frith, Martin

Aug 26, 2018, 11:31:48 PM
to mehr...@broadinstitute.org, last-align
Hi Sudeep,

many thanks for your interest in LAST. In here:
There are some comments about getting a "sequence dictionary" and fixing pair information.
Does that help?

Have a nice day,
Martin



mehr...@broadinstitute.org

Dec 13, 2018, 12:13:38 PM
to last-align
Hi Martin,

Sorry for the huge gap.

I tried adding the dictionary, and Picard's SAM validation still fails.

I tried samtools fixmate etc. and still cannot get Picard validation to pass.
When I bypass the validation, GATK fails.

The issue remains at the MAF -> SAM step.

Converting MAF to SAM:
~/Research/Programs/LAST/last-961/scripts/maf-convert sam out.maf -f ../../../LASTIDX/ucsc.hg19.rCRS.fasta.dic >Sample_D00539_000388_1.sam

SAM -> BAM:
samtools view -T ../../../LASTIDX/ucsc.hg19.rCRS.fasta -b -S Sample_D00539_000388_1.sam >Sample_D00539_000388_1_LAST.bam

Removing singletons and unmapped reads:
samtools view -@ 5 -h -b -F 4 -F 8 Sample_*LAST.bam >Sample_D00539_000388_1_LAST.UnMapped_SE_PE_Removed.bam

fixmate needs the file to be sorted by read name:
samtools sort -n -@ 5 -O BAM Sample_D00539_000388_1_LAST.UnMapped_SE_PE_Removed.bam -o Sample_D00539_000388_1_LAST.UnMappedMapped_SE_PE_Removed.NameSorted.bam

samtools fixmate -r -@ 5 -O BAM Sample_D00539_000388_1_LAST.UnMappedMapped_SE_PE_Removed.NameSorted.bam MateFixed.bam

java -Xmx2G -jar /modules/ogi-mbc/software/picardtools/1.141/picard.jar ValidateSamFile I=MateFixed.bam MODE=SUMMARY


## HISTOGRAM java.lang.String
Error Type Count
ERROR:INVALID_FLAG_FIRST_OF_PAIR 3043
ERROR:INVALID_FLAG_SECOND_OF_PAIR 1484
ERROR:MISSING_READ_GROUP 1
ERROR:RECORD_OUT_OF_ORDER 34698
WARNING:RECORD_MISSING_READ_GROUP 504591

Please let me know your thoughts on how to fix the issue.

Thanks,
Sudeep
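
The MISSING_READ_GROUP and RECORD_MISSING_READ_GROUP lines in this histogram would be consistent with a file that has no @RG header line and no RG:Z tags on its records, which fits the commands in this message since none of them adds a read group. A small Python sketch for confirming that on a SAM file is given below; `read_group_summary` and the example records are hypothetical.

```python
def read_group_summary(sam_lines):
    """Summarize @RG header IDs and how many records lack or mismatch an RG:Z tag."""
    header_ids = set()
    without_rg = 0
    unknown_rg = 0
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        if line.startswith("@"):
            if fields[0] == "@RG":
                for field in fields[1:]:
                    if field.startswith("ID:"):
                        header_ids.add(field[3:])
            continue
        # Optional tags start at column 12 of an alignment record.
        rg = next((f[5:] for f in fields[11:] if f.startswith("RG:Z:")), None)
        if rg is None:
            without_rg += 1
        elif rg not in header_ids:
            unknown_rg += 1
    return {
        "header_read_groups": sorted(header_ids),
        "records_without_rg": without_rg,
        "records_with_unknown_rg": unknown_rg,
    }


# Invented records for illustration only: no @RG line in the header.
example = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r2\t0\tchr1\t200\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tRG:Z:sampleA",
]
summary = read_group_summary(example)
print(summary)
```

If the summary shows no header read groups, adding the read group (via the maf-convert option or Picard's AddOrReplaceReadGroups, both mentioned earlier in the thread) would be the step to revisit before validating again.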

Frith, Martin

Dec 28, 2018, 2:29:41 AM
to mehr...@broadinstitute.org, last-align
Hello

sorry for this late and mostly-useless reply.

For the "read group" errors, you might try the read group option mentioned here:

Judging by your email address, you should have some colleagues who are experts in this stuff :-)

Have a nice day,
Martin
