Alignment hangs on last created thread


Nic Wheeler

Nov 19, 2015, 12:14:18 PM
to rna-star
Hello all,

I have done all the troubleshooting that I can think of, but can't seem to get STAR going. Genome indexing with GTF seems to go as expected, and all necessary files are created. The process breaks down when I begin alignment of paired data.

I am running this on an HPC cluster on my campus, and I have 100 threads available to me, each with 8 GB of RAM. It uses OpenMP to parallelize, but I haven't been able to figure that out yet and so haven't used it (I'm still learning a lot about this system and how to use HPC, and I would love it if anyone would also explain how to parallelize these commands). Anyway, I submit a PBS script asking for 1 compute node and 8 processors, the same parameters that I used to index, and that worked perfectly. Here's the index command, for reference, as well as the first few lines of the GTF file:

STAR --runThreadN 8 --runMode genomeGenerate --genomeDir ... --genomeFastaFiles ... --sjdbGTFfile ... --sjdbOverhang 75

[user]$ head -5 smansoni_annotations.gtf
Smp.Chr_1 WormBase_imported exon 11159 11220 . + . transcript_id "transcript:Smp_186980.1"; gene_id "gene:Smp_186980"; gene_name "Smp_186980";
Smp.Chr_1 WormBase_imported exon 12411 12750 . + . transcript_id "transcript:Smp_186980.1"; gene_id "gene:Smp_186980"; gene_name "Smp_186980";
Smp.Chr_1 WormBase_imported CDS 11159 11220 . + 0 transcript_id "transcript:Smp_186980.1"; gene_id "gene:Smp_186980"; gene_name "Smp_186980";
Smp.Chr_1 WormBase_imported CDS 12411 12750 . + 1 transcript_id "transcript:Smp_186980.1"; gene_id "gene:Smp_186980"; gene_name "Smp_186980";
Smp.Chr_1 WormBase_imported exon 16927 17082 . + . transcript_id "transcript:Smp_197050.1"; gene_id 

However, when I use the command below to align, it consistently hangs after creating the second-to-last thread. It loads the genome correctly and begins creating the threads to be used for aligning, but it never creates the last thread (even if left overnight). My aligning command:

STAR --runThreadN 8 --genomeDir ... --readFilesIn ... ... --outSAMtype BAM SortedByCoordinate

I've attached the Log.out. The Log.progress.out is empty (other than the column headings). It creates the Aligned.sortedByCoord.out.bam file and _STARtemp directory and subdirectories, but everything is empty.

It's possible that this is a question for my system admin, but I haven't gone that route because everything has been performed on a single compute node. Even if I run it on the HOME node and don't submit it to the PBS scheduler (using only 4 threads this time), the same thing happens - it hangs after creating 3 threads, so it doesn't seem to be a problem with the system. I am using the STAR_2.5.0 static executables.

Best,
Nic
Log.out

Alexander Dobin

Nov 19, 2015, 5:19:02 PM
to rna-star
Hi Nic,

the Log.out file looks OK. The threads are numbered from 0, so if you see "created thread # 7", all of the threads started fine.
Please try the following:
1. Run it with one thread and without BAM sorting (i.e. with all default parameters except --readFilesIn and --genomeDir)
2. Run it on a very small dataset, say 100 reads, 1st read only.
3. Try to run the dynamic executable (not static).
4. Try to compile from the source.
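
Steps 1 and 2 could be sketched roughly as follows. The file names (reads_1.fastq, small_1.fastq, genome_index) and the two helper function names are placeholders for illustration, not anything from Nic's setup, and the sketch assumes uncompressed FASTQ with exactly four lines per read.

```shell
# Sketch only: file names and helper names below are hypothetical placeholders.

# Print the first N reads of an uncompressed FASTQ file.
# Assumes the standard layout of exactly 4 lines per read.
first_reads() {
    local fastq="$1" n_reads="$2"
    head -n "$((n_reads * 4))" "$fastq"
}

# Run STAR with one thread and otherwise default parameters
# (so no BAM sorting), on read 1 only.
run_minimal_star() {
    local genome_dir="$1" small_fastq="$2"
    STAR --runThreadN 1 \
         --genomeDir "$genome_dir" \
         --readFilesIn "$small_fastq"
}

# Example usage (commented out so the sketch has no side effects):
# first_reads reads_1.fastq 100 > small_1.fastq
# run_minimal_star genome_index small_1.fastq
```

If this tiny single-threaded run finishes, the remaining suspects are the thread count, the BAM sorting, the input size, or the static binary, which is what steps 3 and 4 help separate.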

Cheers
Alex

Nic Wheeler

Nov 19, 2015, 9:55:41 PM
to rna-star
Am I missing something? The Log.out file reads:

Processing splice junctions database sjdbN=58346,   sjdbOverhang=75 
alignIntronMax=alignMatesGapMax=0, the max intron size will be approximately determined by (2^winBinNbits)*winAnchorDistNbins=589824
Created thread # 1
Created thread # 2
Created thread # 3
Created thread # 4
Created thread # 5
Created thread # 6
Created thread # 7

Starts at #1, not 0. I was sure to check that initially. Unless it doesn't note that the first thread was running? I guess maybe that's it. I'll do as you suggest and see what happens.

Thanks.

Nic Wheeler

Nov 20, 2015, 11:40:33 AM
to rna-star
Ok, I ran it with default parameters with 25 reads, and the mapping finished as expected in 2-3 s. I was then also able to complete it using the PBS scheduler, asking for 16 nodes and 8 processors per node (-l nodes=16:ppn=8). I tried --runThreadN values of 4, 16, 32, 64, and 128. All finished except 128, but it took increasingly longer to create each additional thread, so it's possible that I just didn't wait long enough for 128 (I waited for about 20 minutes). I am now trying several runs with differing parameters, and we'll see if anything is ever written to Log.progress.out.



Nic Wheeler

Nov 20, 2015, 2:06:25 PM
to rna-star
UPDATE: 2.5 hours later, and 3 separate runs still have blank Log.progress.out files. The Log.out files remain hung at the last created thread. Is there some way for me to check to see if they're actually busy aligning?

Alexander Dobin

Nov 20, 2015, 4:54:54 PM
to rna-star
Hi Nic,

the "0th" thread is the master thread that runs as soon as you start STAR, so it does not need to be "created".
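
To make that check less error-prone, something like the following could be used. It is only a sketch: the function name is made up for illustration, and the "Created thread # N" line format is taken from the Log.out excerpt quoted earlier in this thread.

```shell
# Sketch only: the function name is hypothetical; the "Created thread # N"
# line format is taken from the Log.out excerpt earlier in this thread.

# Succeeds if Log.out lists every worker thread expected for a given
# --runThreadN value. The master thread (thread 0) is never logged as
# "created", so N threads should produce N-1 such lines.
all_threads_created() {
    local log_file="$1" run_thread_n="$2"
    local created
    created=$(grep -c 'Created thread # ' "$log_file")
    [ "$created" -eq "$((run_thread_n - 1))" ]
}

# Example usage:
# all_threads_created Log.out 8 && echo "all 8 threads were started"
```

With --runThreadN 8, a log ending at "Created thread # 7" therefore passes the check, so a hang after that line is happening after thread creation, not during it.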

Cheers
Alex

Alexander Dobin

Nov 20, 2015, 5:00:48 PM
to rna-star
I am not sure what happens on your system when you ask for multiple nodes for one run. It may be able to somehow re-distribute the load between the nodes, but I would generally recommend against it.
The best way to parallelize on the nodes is to run different samples on different nodes.
Please try to run it on just one node requesting 8 processors per node and --runThreadN 8.
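
As a rough illustration of that layout, the sketch below prints one PBS job script per sample, each asking for a single node with 8 processors and running STAR with 8 threads. The sample names, the genome_index directory, the FASTQ naming scheme, the walltime, and the function name are all placeholder assumptions, and the resource syntax may need adapting to the local scheduler.

```shell
# Sketch only: sample names, the genome_index directory, the FASTQ naming
# scheme, and the walltime are hypothetical placeholders.

# Print a PBS job script that aligns one sample on a single node,
# with 8 processors requested and 8 STAR threads to match.
make_star_job() {
    local sample="$1"
    cat <<EOF
#!/bin/bash
#PBS -N star_${sample}
#PBS -l nodes=1:ppn=8
#PBS -l walltime=24:00:00
cd "\$PBS_O_WORKDIR"
STAR --runThreadN 8 \\
     --genomeDir genome_index \\
     --readFilesIn ${sample}_1.fastq ${sample}_2.fastq \\
     --outFileNamePrefix ${sample}_
EOF
}

# Example usage: one independent job per sample, so the scheduler can
# place different samples on different nodes.
# for s in sampleA sampleB; do make_star_job "$s" > "$s.pbs"; qsub "$s.pbs"; done
```

Keeping the ppn request equal to --runThreadN means each job uses the cores it was granted, and throughput scales by submitting more jobs, not by spreading one run across nodes.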

Cheers
Alex

Nic Wheeler

Nov 20, 2015, 5:03:02 PM
to rna-star
I can try that, although I was able to run the program successfully on a pair of limited .fastq files with 8 nodes. I guess it may finish but perform more slowly than it would if kept to a single node. Thanks for all your help with this.

Nic Wheeler

Nov 23, 2015, 10:35:41 AM
to rna-star
UPDATE: 48 hours later, I've hit the walltime and the Log.progress.out file is still empty. I'm at a loss as to what to try next.

Alexander Dobin

Nov 23, 2015, 3:15:05 PM
to rna-star
Hi Nic,

have you tried to run it on just one node? Please try it with an increasing number of reads, e.g. 100, 1000, 10000, ..., to see at what point the problem occurs.
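
One possible way to prepare those inputs for paired data is sketched below. The file names and the function name are placeholders; it assumes uncompressed FASTQ with four lines per read and mates in the same order in both files, so taking the same number of lines from each keeps the pairs in sync.

```shell
# Sketch only: file names and the function name are hypothetical.
# Assumes uncompressed FASTQ with 4 lines per read and mates in the same
# order in both files, so equal line counts keep pairs in sync.

# Write paired subsets of increasing size into the current directory,
# e.g. sub_100_1.fastq and sub_100_2.fastq for the first 100 read pairs.
make_paired_subsets() {
    local r1="$1" r2="$2"
    shift 2
    local n
    for n in "$@"; do
        head -n "$((n * 4))" "$r1" > "sub_${n}_1.fastq"
        head -n "$((n * 4))" "$r2" > "sub_${n}_2.fastq"
    done
}

# Example usage, then align each pair in turn on one node and note the
# size at which Log.progress.out stops being updated:
# make_paired_subsets reads_1.fastq reads_2.fastq 100 1000 10000
```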

Cheers
Alex

Nic Wheeler

Nov 23, 2015, 3:20:13 PM
to rna-star
Yes, the previous runs were on 1 node with 8 threads. I have now had the system admin compile it from source and will follow your advice with the compiled executable.