STAR crashing on a small FASTA file of simulated reads

Nick Schurch

Jun 1, 2016, 6:27:49 AM
to rna-star
I have simulated a small set (1224) of 100 bp paired-end reads with the polyester R package. When I try to map the resulting FASTA files with STAR (2.4.0f1), I get an error:

STAR --runThreadN 8 --genomeLoad LoadAndRemove --outSAMmode Full --outFilterMultimapNmax 1 --outFilterMismatchNmax 0 --outFilterType BySJout --outSJfilterIntronMaxVsReadN 5000 10000 15000 20000 --outSJfilterOverhangMin 1 1 1 1 --outSJfilterCountUniqueMin 1 1 1 1 --outSJfilterDistToOtherSJmin 1 1 1 1 --scoreDelOpen -20 --alignSJoverhangMin 1 --alignSJDBoverhangMin 1 --outFileNamePrefix mm0_nodel --genomeDir sjdbOverhang23/ --readFilesIn R1.fasta R2.fasta


EXITING because of FATAL ERROR in reads input: short read sequence line: 1
Read Name=>read612/AT3G05870_mod.2;mate1:223-322;mate2:361-459
Read Sequence====
DEF_readNameLengthMax=50000
DEF_readSeqLengthMax=500

This is the last read in the FASTA file, and it looks like this:

>read612/AT3G05870_mod.2;mate1:223-322;mate2:361-459
CCAAGCAATCAAACCAAGGACAAGCTTCAAAAACATTACTAAAGCTCTATCTTGCGTTTCAGCAGATGATCATACTCAACACTAACGGTCCAAATACAAC

This looks perfectly reasonable. Why would STAR crash here and fail to read the sequence from the last line of the file? Does it need a blank line at the end of the file or something?
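
For anyone hitting the same "short read sequence line" error, one way to rule out malformed input is to walk both mate files in step and flag empty sequences, stray characters such as a carriage return, and mismatched read names. This is a minimal Python sketch of that check, not anything shipped with STAR or polyester; `read_fasta` and `check_pair` are hypothetical helper names, and it assumes (as in these polyester files) that mates carry identical names in R1 and R2 and that sequences use only A, C, G, T and N.

```python
from itertools import zip_longest

# Assumption: simulated reads only contain these bases (case-insensitive).
VALID_BASES = set("ACGTN")


def read_fasta(path):
    """Yield (name, sequence) records; multi-line sequences are joined."""
    name, chunks = None, []
    # Binary mode so Python does not silently translate CRLF into LF.
    with open(path, "rb") as handle:
        for raw in handle:
            # Strip only the LF, so a trailing "\r" stays visible to the checks.
            line = raw.decode("ascii", errors="replace").rstrip("\n")
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                name, chunks = line[1:], []
            else:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def check_pair(path1, path2):
    """Return a list of problems found when reading two mate files in step."""
    problems = []
    pairs = zip_longest(read_fasta(path1), read_fasta(path2))
    for i, (rec1, rec2) in enumerate(pairs, start=1):
        if rec1 is None or rec2 is None:
            problems.append(f"record {i}: one mate file has extra reads")
            continue
        (name1, seq1), (name2, seq2) = rec1, rec2
        if name1 != name2:
            problems.append(f"record {i}: names differ ({name1!r} vs {name2!r})")
        for mate, seq in (("mate1", seq1), ("mate2", seq2)):
            bad = sorted(set(seq.upper()) - VALID_BASES)
            if not seq:
                problems.append(f"record {i} ({name1}): empty {mate} sequence")
            elif bad:
                problems.append(f"record {i} ({name1}): unexpected {mate} characters {bad!r}")
    return problems
```

An empty list from `check_pair("R1.fasta", "R2.fasta")` means the files parse cleanly under these assumptions, which would point away from the input and towards STAR itself.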

Alexander Dobin

Jun 1, 2016, 4:52:32 PM
to rna-star
Hi Nick,

please send me the actual file. An invisible character could be causing the error. STAR expects an LF (newline) at the end of every line, including the last one.

Cheers
Alex
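
Both conditions mentioned here, a missing final LF and invisible characters such as the CR of Windows-style CRLF line endings, can be checked at the byte level. This is a minimal Python sketch; `check_line_endings` is a hypothetical helper, and it reads the whole file into memory, which is fine for a file of this size but not for a full sequencing run.

```python
def check_line_endings(path):
    """Report two byte-level problems a reads file can have.

    Returns (ends_with_lf, crlf_lines): whether the last byte is b"\n",
    and the 1-based numbers of lines that end in b"\r\n".
    """
    # Binary mode: text mode would hide "\r" by translating CRLF to LF.
    with open(path, "rb") as handle:
        data = handle.read()
    ends_with_lf = data.endswith(b"\n")
    lines = data.split(b"\n")
    crlf_lines = [i for i, line in enumerate(lines, start=1) if line.endswith(b"\r")]
    return ends_with_lf, crlf_lines
```

A result of `(True, [])` means the file passes both checks; anything else would be worth fixing before re-running the alignment.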

Nick Schurch

Jun 2, 2016, 5:07:45 AM
to rna-star
Hi Alex,

Thanks for looking at this for me, much appreciated. I've included the error output from the STAR run too.

Thanks,

Nick
R1.fasta
R2.fasta
STAR.e479650

Alexander Dobin

Jun 3, 2016, 1:18:36 PM
to rna-star
Hi Nick,

I could not reproduce the problem: I mapped these files without an error.
Could you please run
$ tail -n2 R1.fasta | hexdump -c
$ tail -n2 R2.fasta | hexdump -c
and send me the output? On my system these look normal: all lines are terminated with \n

One thing to try would be to add an empty line at the end.

Cheers
Alex
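
The end-of-file workaround can be applied without opening an editor. This is a minimal Python sketch, assuming the reads file is plain (uncompressed) FASTA; `pad_end_of_file` is a hypothetical helper name. It adds the final newline if it is missing, and with `extra_blank_line=True` it also appends the empty last line suggested above.

```python
import os


def pad_end_of_file(path, extra_blank_line=False):
    """Ensure a reads file ends with b"\n", optionally adding a blank line.

    Returns the number of newline bytes appended.
    """
    with open(path, "rb+") as handle:
        size = handle.seek(0, os.SEEK_END)
        appended = b""
        if size > 0:
            handle.seek(-1, os.SEEK_END)
            if handle.read(1) != b"\n":
                appended += b"\n"  # terminate the final sequence line
        if extra_blank_line:
            appended += b"\n"  # the empty last line suggested as a workaround
        handle.seek(0, os.SEEK_END)
        handle.write(appended)
        return len(appended)
```

The function is idempotent without `extra_blank_line`, so it is safe to run on both mate files before every alignment.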

Nick Schurch

Jun 6, 2016, 4:48:04 AM
to rna-star
Hi Alex,

here is the output:

tail -n2 sim03/R1.fasta | hexdump -c
0000000   >   r   e   a   d   6   1   2   /   A   T   3   G   0   5   8
0000010   7   0   _   m   o   d   .   2   ;   m   a   t   e   1   :   2
0000020   2   3   -   3   2   2   ;   m   a   t   e   2   :   3   6   1
0000030   -   4   5   9  \n   C   C   A   A   G   C   A   A   T   C   A
0000040   A   A   C   C   A   A   G   G   A   C   A   A   G   C   T   T
0000050   C   A   A   A   A   A   C   A   T   T   A   C   T   A   A   A
0000060   G   C   T   C   T   A   T   C   T   T   G   C   G   T   T   T
0000070   C   A   G   C   A   G   A   T   G   A   T   C   A   T   A   C
0000080   T   C   A   A   C   A   C   T   A   A   C   G   G   T   C   C
0000090   A   A   A   T   A   C   A   A   C  \n                        
000009a
tail -n2 sim03/R2.fasta | hexdump -c
0000000   >   r   e   a   d   6   1   2   /   A   T   3   G   0   5   8
0000010   7   0   _   m   o   d   .   2   ;   m   a   t   e   1   :   2
0000020   2   3   -   3   2   2   ;   m   a   t   e   2   :   3   6   1
0000030   -   4   5   9  \n   A   A   T   G   G   G   T   G   A   A   T
0000040   T   C   G   C   A   G   A   C   G   A   G   C   C   A   A   G
0000050   C   T   C   A   T   T   G   C   C   C   A   A   T   G   T   G
0000060   C   A   G   A   A   G   A   G   A   A   T   G   G   C   A   G
0000070   T   T   C   A   A   A   G   A   G   T   A   A   G   C   A   A
0000080   G   T   T   A   T   G   A   G   G   A   G   G   A   T   T   A
0000090   G   T   C   T   G   C   A   G   C  \n                        
000009a

This looks reasonable. I'll try adding a blank line, and I'll also try a more up-to-date version...

Nick

Alexander Dobin

Jun 6, 2016, 11:44:19 AM
to rna-star
Hi Nick,

there is nothing suspicious in the hexdump, so it's not a formatting issue; the next suspect is silent memory corruption.
Could you please send me the Log.out file and the link to the genome fasta/gtf you have been using?
I will run it through valgrind to try to catch the bug.

Cheers
Alex

Nick Schurch

Jun 8, 2016, 8:48:30 AM
to rna-star
Hi Alex,

I tried running this with STAR 2.5 and the reads aligned with no errors, so I'm putting this down to the old version. Do you still want the files for debugging or not? Probably 'or not', I imagine!

Nick

Alexander Dobin

Jun 10, 2016, 1:06:20 PM
to rna-star
Hi Nick,

I think we shall consider it solved. :)
Thanks a lot for reporting the problem and tests!

Cheers
Alex