Error Parsing text SAM files


Roshan Sharma Poudel

Jan 20, 2016, 10:26:15 AM
to rna-star
Hi,

I am using the GATK Best Practices workflow for SNP and indel calling on RNA-seq data. Following the pipeline, I used STAR 2-pass alignment to generate the SAM files. However, Picard keeps throwing an error when it reads my SAM files. Has anyone run into this problem and found a solution?

Lines from my FASTQ file:

@NB500921:39:HK5VYBGXX:1:11101:1587:1040 1:N:0:ACAGTG

TTTCCATTCCCTCCTTCCTGGGGAATCCACCAACCATAACCGCAACATTCACACCAGTGCAAGCCTCAACAGGATCAGTTGTT

+

EEEEEEEEEEEEEEEEEEEEEEEEAEEEEEAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE<EEEEEEEEEEEEEAA

@NB500921:39:HK5VYBGXX:1:11101:24524:1040 1:N:0:ACAGTG

GCCTGCTGACTGAGAGTGGCAACACTAAGGATGACCTGAAGCTTCCCACTGATGATGTTCTGCTTGGCCAGATCAAGACTG

+

EEEEEEE/EEEEEEEEE6EEEEEEEEEEEEEE/6EEE/EE/AEEE66EAE/EEEEEEEEEE<EEEE/E<E6/EEAEAEEE<

@NB500921:39:HK5VYBGXX:1:11101:3319:1041 1:N:0:ACAGTG

TGTTACTCGCATATGAAATGAATGGAGAGACTATCAATCGTGACCATGGATATCCACTCCGTGTTGTTGTACCTGGTGTTATAG

+

EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEAEEEEEAAEEEEEAEEEEEEEEEEEEEE/EEEEEEE

@NB500921:39:HK5VYBGXX:1:11101:7988:1041 1:N:0:ACAGTG

TGAGCCCATGACTCCTGGCCAGTGCAATTTGGTCGTGGAGAGGCTTGGCGACTACCTGGTCGAGCAGGGTTTCTAAGCCCACCC

+

EEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE/EEA


The commands I used:


~/STAR/source/STAR --runMode genomeGenerate --genomeDir '/home/bobsgenefinder/Desktop/Link_to_share/STAR/genomeDir' --genomeFastaFiles '/home/bobsgenefinder/Desktop/Link_to_share/STAR/sccl_ref_fasta_for_GATK.fasta' --runThreadN 18


~/STAR/source/STAR --genomeDir '/home/bobsgenefinder/Desktop/Link_to_share/STAR/genomeDIR' --readFilesIn '/home/bobsgenefinder/Desktop/Link_to_share/STAR/runDIR/72_00 trimmed.fastq' --runThread 18


~/STAR/source/STAR --runMode genomeGenerate --genomeDir '/home/bobsgenefinder/Link to share/STAR/runDir/genomeDir' --genomeFastaFiles '/home/bobsgenefinder/Link to share/STAR/sccl_ref_fasta_for_GATK.fasta'  --sjdbFileChrStartEnd '/home/bobsgenefinder/Link to share/STAR/runDir/SJ.out.tab' --sjdbOverhang 150 --runThreadN 20


~/STAR/source/STAR --genomeDir '/home/bobsgenefinder/Desktop/Link_to_share/STAR/72_00 trimmed.fastq/genomeDIR' --readFilesIn '/home/bobsgenefinder/Desktop/Link_to_share/STAR/runDIR/72_00 trimmed.fastq' --runThread 18



Lines from my SAM file:

NB500921:39:HK5VYBGXX:1:11308:20045:6791        0                  Supercontig_2.6       1266086    255             151M1S    *                    0                  0                    GTGTTTGACCAGGGGGGTGACCGAAGGACGAGCGTTGTTGACGACCGTAAGCCAACCGGATACGGCGAAGATTTCATCGATCTCCTCCAACGACCGGTAAGCAGTTTCCGGAAAGAAGAAATATATACTCGGGACAATGAACGCGTTGACG

                    AA/AAEEAEEEEEEEEAAEEEAEEEAEEEEEE<EEEEEEE/EEEE6<EEEEAEEEEEA</EAEEEEEAEEEEEEE<E6EEAEEAEA<EEE/EAAAEEEEEEAEEEEE//E<E<A<A<<EAEAEEE<A6AAAE<AAA<EAE<A6AAAEAEAA

                    NH:i:1        HI:i:1          AS:i:149    nM:i:0

NB500921:39:HK5VYBGXX:1:11308:21585:6791        0                  Supercontig_2.51     543058      255             121M70N30M1S                    *                 0                  0                    CATGACCTATGACCTGAAGGTTTTGAAGCTTCTAGGCTTCAACATGGTTAGAAAACATGTCAAGATCGAGCCGGATCTCTTCTATTATGCCTGCGACAAAATGGGCTTGATGGTATGGCAAGACATGCCCTCGATGAACCCAGATTCGCCA

                    AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEAEEEEEEEEEEEEEEEEEAEEEAEEEEEEEEEEEEEEEEEAAEEEEEEE<EEEEEEEEEEEEEEAEEEAE<EEEEAEE<6<AEEAEE<EAAAAEAAEEAAA<

                    NH:i:1        HI:i:1          AS:i:151    nM:i:0

NB500921:39:HK5VYBGXX:1:11308:24870:6809        16               Supercontig_2.5       153123      255             1S151M    *                    0                  0                 

GAAGAAGCCAAGCAAACCTAGGAAGCGAATCAGGAAGACCAGGCAAAGCATTTCGGAGGAGACCACGCTTTTGGCCGACAATTCCTCCGCTTTGGTAGCTCGCGACGGGGCGGAACCCAGCACACCTGAAGAGATCTTTCAACTGAACCCG                   

/EEEEAAEAEEAAAAEEEEEAEEEAAAE<E<EAAEEEEAEEAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA                    NH:i:1        HI:i:1          AS:i:149    nM:i:0


The error message I get when I use the Picard AddOrReplaceReadGroups tool:

Caused by: htsjdk.samtools.SAMFormatException: Error parsing text SAM file. Not enough fields; File 2Aligned.out.sam; Line 396
Line: NB500921:39:HK5VYBGXX:1:11309:16590:5060 0 Supercontig_2.9 702412 255 151M1S * 0 0 CAGCAAGCTCGATCATCACCAAAACCCGCTCCAAAGACCACCACCACCACCCCAACCGTAAATTCTTCACCATGCACCTTGGACTCTACCTTGCTTGGGCTCTGGTTGTACTCCACTCTGTGGTAGCTCACCCGCAGTGCTACGAACCCTG
at htsjdk.samtools.SAMLineParser.reportFatalErrorParsingLine(SAMLineParser.java:432)
at htsjdk.samtools.SAMLineParser.parseLine(SAMLineParser.java:217)
at htsjdk.samtools.SAMTextReader$RecordIterator.parseLine(SAMTextReader.java:248)
at htsjdk.samtools.SAMTextReader$RecordIterator.next(SAMTextReader.java:236)
at htsjdk.samtools.SAMTextReader$RecordIterator.next(SAMTextReader.java:212)
at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:545)
at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:519)
at htsjdk.samtools.SamFileValidator.validateSamRecordsAndQualityFormat(SamFileValidator.java:263)

Rory Kirchner

Jan 20, 2016, 10:29:46 AM
to Roshan Sharma Poudel, rna-star
Hi Roshan,

It looks like your SAM file might be truncated. If you run

 tail 2Aligned.out.sam

does the last record look complete? From the Picard error, it looks like the record is missing the fields after the sequence.
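
If you would rather check this programmatically, here is a minimal Python sketch of the same idea. It only reads the end of the file, so it should stay fast even on large SAM files. The function name `last_record_is_complete` and the 64 KB chunk size are my own choices, not part of STAR or Picard; the only fact it relies on is that a SAM alignment line has 11 mandatory tab-separated fields (QNAME through QUAL).

```python
import os

# A SAM alignment line has 11 mandatory tab-separated fields (QNAME..QUAL).
SAM_MANDATORY_FIELDS = 11


def last_record_is_complete(path, chunk_size=65536):
    """Return True if the file ends with a newline and its last line
    has at least 11 tab-separated fields.

    Hypothetical helper for illustration. It assumes the last line is
    shorter than chunk_size bytes, and it returns False for a file whose
    last line is a header line (a header has fewer than 11 fields).
    """
    with open(path, "rb") as handle:
        handle.seek(0, os.SEEK_END)
        size = handle.tell()
        if size == 0:
            return False
        # Read only the final chunk instead of the whole file.
        handle.seek(max(0, size - chunk_size))
        tail = handle.read()

    # A file cut off mid-write usually lacks the final newline.
    if not tail.endswith(b"\n"):
        return False

    last_line = tail.rstrip(b"\r\n").split(b"\n")[-1]
    return len(last_line.split(b"\t")) >= SAM_MANDATORY_FIELDS
```

Calling `last_record_is_complete("2Aligned.out.sam")` and getting `False` would point towards truncation at the end of the file. It says nothing about damage earlier in the file.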

Best,

Rory


Roshan Sharma Poudel

Jan 20, 2016, 1:31:12 PM
to rna-star

Thanks Rory for the response. 


I was wondering whether file size has any effect on the alignments. I have 23 samples, and the FASTQ file sizes range from 12 GB to 30 GB.

Rory Kirchner

Jan 20, 2016, 1:32:32 PM
to Roshan Sharma Poudel, rna-star
Hi Roshan,

Different file sizes wouldn't cause the issue you're seeing. It looks like the file is corrupted somehow, so you'll have to re-align to fix it.
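
Once the new alignment finishes, it may be worth scanning the whole SAM file before handing it to Picard, since a damaged record does not have to be the last one. Below is a rough Python sketch under a few assumptions: `find_short_records` is a name I made up, line numbers count header lines too (which I assume is how Picard numbers them), and anything with fewer than 11 tab-separated fields is treated as malformed.

```python
def find_short_records(path, min_fields=11, limit=10):
    """Return up to `limit` (line_number, field_count) pairs for alignment
    lines that have fewer than `min_fields` tab-separated fields.

    Hypothetical helper for illustration; it does not validate field
    contents, only the number of fields per line.
    """
    problems = []
    # newline="\n" keeps line numbering tied to "\n" terminators only.
    with open(path, "r", errors="replace", newline="\n") as handle:
        for line_number, line in enumerate(handle, start=1):
            # Header lines start with "@"; a read name cannot.
            if line.startswith("@"):
                continue
            field_count = len(line.rstrip("\r\n").split("\t"))
            if field_count < min_fields:
                problems.append((line_number, field_count))
                if len(problems) >= limit:
                    break
    return problems
```

An empty list from `find_short_records("2Aligned.out.sam")` means every alignment line has at least the mandatory fields. A non-empty list gives the line numbers to inspect by hand.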

Best,

Rory


Roshan Sharma Poudel

Jan 20, 2016, 1:55:22 PM
to rna-star

Hi Rory,


I have already started the alignment again. I just wanted to make sure that bigger files are not causing the problem.

I hope it works this time. 

Thanks for the suggestions.

Roshan