bam header without the sorting field and wiggle file empty

182 views
Skip to first unread message

mbourgey

unread,
Nov 24, 2014, 9:51:57 AM11/24/14
to rna-...@googlegroups.com
Hello Alex,

I noticed 2 issues:
1/ when sorting my bam during the alignment with STAR that the sorting filed info is not present in the header: @HD     VN:1.4  SO:coordinate
I only get: @HD     VN:1.4

2/ second my wiggle output files are empty

my command is :
STAR --runMode alignReads --genomeDir reference.Merged --readFilesIn trim/SRR032238/SRR032238.trim.single.fastq.gz --runThreadN 3 --readFilesCommand zcat --outStd Log --outSAMunmapped Within --outSAMtype BAM SortedByCoordinate --limitGenomeGenerateRAM 25000000000 --outFileNamePrefix alignment/SRR032238/ --limitBAMsortRAM 25000000000 --limitIObufferSize 4000000000 --outWigType wiggle read1_5p --outWigStrand Unstranded --outWigReferencesPrefix chr --chimSegmentMin 21

I have to advocate that I used the 2.4.0e ! So perhaps these issues are already corrected in the last STAR version but I don't see anything that refers to these issues in the change log of the last version.

Thanks in advance

Mathieu

mbourgey

unread,
Nov 24, 2014, 1:17:49 PM11/24/14
to rna-...@googlegroups.com
Update:
the header issue is resolved when the last version of STAR (2.4.0f1) is used. But I still get empty wiggle files

Mathieu

Alexander Dobin

unread,
Nov 25, 2014, 6:46:31 PM11/25/14
to rna-...@googlegroups.com
Hi Mathieu,

I cannot reproduce this problem on my dataset with your parameters. Could you try to cut a few thousand reads from your dataset, re-try the wiggle generation, and if it does not work, send me the BAM for the small subset?

Note that --outWigType wiggle read1_5p --outWigStrand Unstranded will make signal for the left-most coordinate of every read on the +strand, independent of which strand the read was mapped to.

Cheers
Alex

Tom van Schaik

unread,
Jan 19, 2015, 5:22:18 AM1/19/15
to rna-...@googlegroups.com
Hi all,

I experienced a similar problem as described: empty wiggle files. I used the call:

STAR --genomeDir ../reference/bos_taurus/sequence/star_index \
   --readFilesCommand zcat \
   --runThreadN 8 \
   --sjdbGTFfile ../reference/bos_taurus/annotation/Bos_taurus.UMD3.1.78.gtf \
   --readFilesIn test_trim/paired_1.fastq.gz test_trim/paired_2.fastq.gz \
   --outFileNamePrefix test_map/ \
   --outFilterMultimapNmax 20 \
   --genomeLoad LoadAndRemove \
   --outSAMtype BAM SortedByCoordinate \
   --limitBAMsortRAM 15000000000 \
   --outWigType wiggle \
   --outWigStrand Unstranded \
   --outWigReferencesPrefix chr

However, when I removed the last argument from the call ("outWigReferencePrefix chr"), the wiggle file turned up fine. Modifying the wiggle file from "X" to chr"X" shouldn't be a big problem afterwards, just as a quick fix.

Cheers, Tom

Alexander Dobin

unread,
Jan 21, 2015, 2:18:21 PM1/21/15
to rna-...@googlegroups.com
Hi Tom,

it appears that you are using ENSMEBL genome and annotations, which do not contain the "chr" prefix  in the chromosome names.
However, the --outWigReferencesPrefix chr option requires the chromosomes output in wiggles to have this prefix. Omitting this option is exactly the right thing to do, then the wiggles files will contain the same chromosomes as your genome.

This option is useful if you want only some of the references to be present in the wiggle files, e.g. we have normal chromosomes (named with "chr" prefix) and spike-ins, and we only want to output the normal chromosomes.

Cheers
Alex

Tom van Schaik

unread,
Jan 21, 2015, 2:25:14 PM1/21/15
to rna-...@googlegroups.com
Ah, I'm afraid I misinterpreted the manual. I though it would "add" the prefix, to be compatible with the UCSC browser for example. My bad, sorry! Thanks for the explanation though!

Cheers.
Reply all
Reply to author
Forward
0 new messages