Error: sort order of reads in BAMs must be the same


Zoe

May 27, 2014, 5:46:25 PM
to rna-...@googlegroups.com
Hi All,

I am new to RNA-seq data analysis. I recently ran the TopHat-Cufflinks pipeline on our data successfully. I also want to try the STAR-Cufflinks and STAR-HTSeq pipelines, but those are not going smoothly.

When I run the STAR-Cufflinks pipeline, I get the error "Error: sort order of reads in BAMs must be the same". These are the steps I followed:

 1. STAR --genomeDir /path/to/genomeDir --outSAMstrandField intronMotif --readFilesIn /path/to/input/read_r1.fastq read_r2.fastq --runThreadN 8 --outFileNamePrefix /output/dir/
 2. samtools view -b -S Aligned.out.sam > Aligned.out.bam
 3. samtools sort -n Aligned.out.bam Aligned.out.bam_sorted.bam
 4. cufflinks -p 8 -o /output/dir /path/to/Aligned.out.bam_sorted.bam

It gave me the error "Error: sort order of reads in BAMs must be the same".

I also tried sorting the data with the command recommended by the Cufflinks user manual:
sort -k3,3 -k4,4n Aligned.out.sam > Aligned.out_sorted.sam
and ran cufflinks as follows:

cufflinks -p 8 -o /output/dir Aligned.out_sorted.sam
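One thing to watch with the plain `sort` approach: if the SAM file still carries header lines (those starting with `@`), they get sorted in with the alignment records. Below is a minimal sketch that keeps the header on top and sorts only the records. The function name `sort_sam_keep_header` is hypothetical, and nothing in this thread confirms that a scrambled header is what triggered the error.

```shell
# Hypothetical helper: coordinate-sort a SAM text file while keeping
# the header lines (those starting with '@') at the top, unsorted.
sort_sam_keep_header() {
  ssk_in=$1
  ssk_out=$2
  ssk_tab=$(printf '\t')
  {
    # Header lines first, in their original order (none is fine too).
    grep '^@' "$ssk_in" || true
    # Alignment records: by reference name (field 3), then by
    # position (field 4, numeric), as in the sort command above.
    grep -v '^@' "$ssk_in" | LC_ALL=C sort -t "$ssk_tab" -k3,3 -k4,4n
  } > "$ssk_out"
}

# Usage (file names as in the post above):
# sort_sam_keep_header Aligned.out.sam Aligned.out_sorted.sam
```

Reference names are ordered lexicographically here (chr1, chr10, chr2, ...), which may differ from the @SQ order in the header; samtools sort follows the header order instead.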

Then I ran cuffmerge as follows:

cuffmerge -p 8 -o ./merged -s /path/to/genomeRef.fa ./assembly_GTF_list.txt

It also gave me a similar error:

Begining transcriptome assembly merge
---------------------------------------------
Preparing output location ./merged/
Warning: no reference GTF provided!
[Tue May 27 17:36:08 2014] Converting GTF files to SAM
[17:36:08] Loading reference annotation.
[17:36:10] Loading reference annotation.
[17:36:12] Loading reference annotation.
[17:36:14] Loading reference annotation.
[17:36:16] Loading reference annotation.
[17:36:17] Loading reference annotation.
[Tue May 27 17:36:19 2014] Assembling transcripts
You are using Cufflinks v2.2.1, which is the most recent release.
Command line:
cufflinks -o ./merged/ -F 0.05 -q --overhang-tolerance 200 --library-type=transfrags -A 0.0 --min-frags-per-transfrag 0 --no-5-extend -p 16 ./merged/tmp/mergeSam_fileAniOQv
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
File ./merged/tmp/mergeSam_fileAniOQv doesn't appear to be a valid BAM file, trying SAM...
Error: sort order of reads in BAMs must be the same
               [FAILED]
Error: could not execute cufflinks

Has anybody experienced the same problem? Can anybody help me figure out what went wrong?

Thank you very much in advance,

Zoe



SBGJansen

May 29, 2014, 4:50:55 AM
to rna-...@googlegroups.com
Hi Zoe,

Why do you name-sort your SAM file?
Use samtools sort without -n and try again.
From the manual: "The SAM file supplied to Cufflinks must be sorted by reference position"

As for cuffmerge, what are you trying to do? Its output is a merged .gtf file, not a tmp/.sam file.
Usually you run cuffmerge on the output of cufflinks, rather than the other way around. Which GTF files are you merging?

You can provide the merged .gtf from cuffmerge as a guide for cufflinks (-g/--GTF-guide option), but the input sam (or bam) files should be the sorted ones you produced previously.
However, if you do plan to use the merged gtf as a guide for cufflinks, I suggest you provide cuffmerge the reference gtf as well (-g/--ref-gtf option).

Hope this helps

Zoe

May 29, 2014, 12:25:36 PM
to rna-...@googlegroups.com
Hi SBGJansen,
Thank you very much for the quick reply.
1. Initially I used samtools sort without -n, and it gave me the error "Error: sort order of reads in BAMs must be the same". I then found other people on this site suggesting -n for the sort.
2. I do not have a reference GTF file. I merged the GTF files from the output of Cufflinks, and I am trying to use the merged GTF file as input for cuffdiff to get differentially expressed genes.
Sorting the data as suggested in the Cufflinks user manual seems to work fine. However, when I tried to merge the GTFs from Cufflinks using the command below:
cuffmerge -p 8 -o ./merged -s /path/to/genomeRef.fa ./all/the/gtf files/ from/previous/step cufflinks (transcripts.gtf), it complains as follows:

"Error: sort order of reads in BAMs must be the same

               [FAILED]

Error: could not execute cufflinks"

The problem is that the Aligned.out.sam file output by STAR somehow cannot be sorted properly, whether I use samtools or the command suggested in the Cufflinks manual.

Zoe

SBGJansen

May 29, 2014, 1:28:04 PM
to rna-...@googlegroups.com
Hi,

No problem,

I am still not certain whether you actually have problems running cuffmerge. I assume not? Cuffmerge does not require BAM files. And you use a manifest file for the GTF files, right?
This suggests your problem lies with running cufflinks as in your first post...
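For reference, cuffmerge takes a plain text file listing one GTF path per line rather than the GTF files themselves. Here is a minimal sketch for building that list, assuming each sample's Cufflinks output directory sits under one parent directory. The directory layout and the function name `make_gtf_manifest` are assumptions, not something stated in this thread.

```shell
# Hypothetical helper: collect every transcripts.gtf under a parent
# directory into a manifest file, one path per line.
make_gtf_manifest() {
  mgm_root=$1
  mgm_manifest=$2
  find "$mgm_root" -type f -name transcripts.gtf | LC_ALL=C sort > "$mgm_manifest"
  # Fail if nothing was found, so an empty list is not passed on.
  if [ ! -s "$mgm_manifest" ]; then
    echo "no transcripts.gtf files found under $mgm_root"
    return 1
  fi
}

# Usage (the ./cufflinks_out directory is a placeholder):
# make_gtf_manifest ./cufflinks_out ./assembly_GTF_list.txt
# cuffmerge -p 8 -o ./merged -s /path/to/genomeRef.fa ./assembly_GTF_list.txt
```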

Have you taken this into account:
If you use unstranded RNA-seq data, you need to add --outSAMstrandField intronMotif to the STAR command. If you have stranded data, you need to add --library-type fr-firststrand to the cufflinks command. Also see the manual.

Other than that, I have no clue why this happens; it seems the BAM files are sorted differently. Aligned.out.sam should sort properly, with no need to name-sort.

Best,

Zoe

May 29, 2014, 5:21:02 PM
to rna-...@googlegroups.com

Thank you SBGJansen,

I used "--outSAMstrandField intronMotif" at the "Generating genomes" step.

BTW, the versions I used are STAR 2.3.0e and samtools 0.1.19.

SBGJansen

May 30, 2014, 5:19:19 AM
to rna-...@googlegroups.com
Hi Zoe,

It should not be used at the "Generating genomes" step. It is an extra option for the STAR aligning step; it will filter out reads for which it cannot identify a strand.

Best,

Sjoert

Zoe

May 30, 2014, 9:32:04 AM
to rna-...@googlegroups.com

Hi Sjoert,

Sorry, it was my mistake. It was actually used at the aligning step, not the "Generating genomes" step.

Thanks,
Zoe

Alexander Dobin

May 30, 2014, 4:56:45 PM
to rna-...@googlegroups.com
Hi Zoe, Sjoert,

Sjoert's suggestions were correct: you need to sort the file by coordinate (no -n option):
 3. samtools sort Aligned.out.bam Aligned.out.bam_sorted

 4. cufflinks -p 8 -o /output/dir /path/to/Aligned.out.bam_sorted.bam

Note that 'samtools sort' will add the .bam suffix to the output sorted file name.

To check the sorting you can try 
samtools index Aligned.out.bam_sorted.bam
If this completes without errors, the file is sorted properly.

Cheers
Alex
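Besides `samtools index`, the order can also be checked directly on the SAM text. The sketch below is an illustration only, and the function name `check_coord_sorted` is hypothetical. It reads SAM records on standard input and fails if a reference name reappears after a different one, or if positions go backwards within a reference. It does not compare the reference order against the @SQ header lines.

```shell
# Hypothetical helper: verify that SAM records on stdin are grouped by
# reference (field 3) and non-decreasing in position (field 4).
check_coord_sorted() {
  awk -F'\t' '
    /^@/ { next }                      # skip header lines
    {
      ref = $3; pos = $4 + 0
      if (ref != prev_ref) {
        if (ref in seen) {
          printf "unsorted: %s reappears at line %d\n", ref, NR
          bad = 1; exit
        }
        seen[ref] = 1; prev_ref = ref; prev_pos = pos
      } else if (pos < prev_pos) {
        printf "unsorted: %s position %d after %d at line %d\n", ref, pos, prev_pos, NR
        bad = 1; exit
      } else {
        prev_pos = pos
      }
    }
    END {
      if (bad) exit 1
      print "coordinate order looks consistent"
    }
  '
}

# Usage (file name as in the post above):
# samtools view -h Aligned.out.bam_sorted.bam | check_coord_sorted
```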

Pingchuan Li

Oct 26, 2016, 10:40:50 AM
to rna-star
Not sure if your problem has been resolved. I recently ran into the same problem with Cufflinks, and it was resolved by sorting the reference genome and rerunning TopHat.

Alexander Dobin

Oct 28, 2016, 11:47:48 AM
to rna-star
Hi Pingchuan

Cufflinks requires coordinate-sorted BAM files, which you can get either by sorting with samtools or by using the STAR option --outSAMtype BAM SortedByCoordinate.
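A minimal sketch of the second route, reusing the options from the first post. The wrapper name `align_and_assemble` and the `cufflinks` output subdirectory are hypothetical, and `--outSAMtype` needs a STAR release newer than the 2.3.0e mentioned earlier in the thread (2.4.0 or later, as far as I know). With this option STAR writes `Aligned.sortedByCoord.out.bam` under the output prefix.

```shell
# Hypothetical wrapper: align with STAR, writing a coordinate-sorted BAM,
# then run cufflinks on it. Thread count and strand option follow the
# first post; adjust for your own data.
align_and_assemble() {
  aa_genome_dir=$1
  aa_r1=$2
  aa_r2=$3
  aa_out_dir=$4
  mkdir -p "$aa_out_dir" || return 1
  STAR --genomeDir "$aa_genome_dir" \
       --readFilesIn "$aa_r1" "$aa_r2" \
       --runThreadN 8 \
       --outSAMstrandField intronMotif \
       --outSAMtype BAM SortedByCoordinate \
       --outFileNamePrefix "$aa_out_dir/" || return 1
  # No separate samtools sort step is needed with the sorted BAM output.
  cufflinks -p 8 -o "$aa_out_dir/cufflinks" \
            "$aa_out_dir/Aligned.sortedByCoord.out.bam"
}

# Usage (paths are placeholders):
# align_and_assemble /path/to/genomeDir read_r1.fastq read_r2.fastq /output/dir
```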