Nodes, Processes and Threads; Cufflinks/CuffDiff version 2.0.2 and 2.1.1 how does -p 8 turn into 4 processes and 14 threads and stay on one node?

Starr Hazard

unread,

May 15, 2013, 5:41:21 PM5/15/13

to tuxedo-to...@googlegroups.com

Hi

I recently tried a CuffDiff computation which kept running many hours and then getting killed.

My jobs were run on a small cluster where each node has 24G ram and 4G swap space and dual quad core CPUs.

These jobs were killed by the system because they seemed to be asking for more swap space than a single node had available.

Accounting information about this job:

CPU_T WAIT TURNAROUND STATUS HOG_FACTOR MEM SWAP

169725.98 2 64381 exit 2.6363 12974M 14480M

I tried to get my LSF scheduler to distribute the jobs to enough nodes to then have an aggregate swap space in excess of the 14G it was asking for.

I attempted to do this by asking LSF to give me 8 nodes (each node a dual quad core unit with 24G ram on each) assuming I would then have the aggregate 32 G swap ( 8x4 G swap per node).

CuffDiff was being asked to use 8 threads (-p 8)

The job always started on one node and stayed there. And crashed.

I then used LSF to restrict the number of processes to a single process per node. Once again the job always started on one node and stayed there. And crashed.

The accounting information for these jobs indicated that a number of processes (4) had been started as well as a number of threads ( say 14) eg

Resource usage summary:

CPU time : 169725.98 sec.

Max Memory : 12974 MB

Max Swap : 14480 MB

Max Processes : 4

Max Threads : 14

So why if threads is set to 8, is the program launching 14 threads? and 4 processes

If I restrict the number of processes per node to one, why does the program stay on one node?

Is it possible to get CuffDIff to run on more than a single node?

Starr

re...@channing.harvard.edu

unread,

Jul 16, 2013, 5:46:26 PM7/16/13

to tuxedo-to...@googlegroups.com, Isaac Houston

I get the following error message with on a machine with 32 GB of RAM and 8 Cores with -p set to 8. Is this the same error message you are getting? Do the developers have a fix or guidance?

===============================================

You are using Cufflinks v2.1.1, which is the most recent release.

[18:15:58] Loading reference annotation.

Warning: No conditions are replicated, switching to 'blind' dispersion method

[18:16:14] Inspecting maps and determining fragment length distributions.

[18:44:18] Modeling fragment count overdispersion.

> Map Properties:

> Normalized Map Mass: 22494699.53

> Raw Map Mass: 21535093.41

> Fragment Length Distribution: Empirical (learned)

> Estimated Mean: 181.42

> Estimated Std Dev: 62.85

> Map Properties:

> Normalized Map Mass: 22494699.53

> Raw Map Mass: 23448522.16

> Fragment Length Distribution: Empirical (learned)

> Estimated Mean: 186.77

> Estimated Std Dev: 70.69

[18:47:58] Calculating preliminary abundance estimates

[18:47:58] Testing for differential expression and regulation in locus.

> Processing Locus chr16:3355483-3368576 [******** ] 34%Killed

==============================================

Katharina Hayer

unread,

Sep 9, 2014, 10:49:37 AM9/9/14

to tuxedo-to...@googlegroups.com

Has there been a solution to this problem? I am running into the similar swap and thread issues with cuffnorm (version: cufflinks-2.2.1.Linux_x86_64). SWAP was at 21G and event though I didn't change the -p flag, it used 4 instead of only one thread.

Any help is appreciated,

Katharina

David Oliver

unread,

Dec 15, 2014, 3:54:03 PM12/15/14

to tuxedo-to...@googlegroups.com

Also having this problem.

Interestingly, I don't get the error at the same place. These are the last two attempts

Processing Locus chr13:39342891-39603528 [****** ] 24%Killed

Processing Locus chr20:19212645-19722937 [************* ] 53%Killed

I'm curious if anyone knows how to solve this issue. As mentioned by Starr Hazard, I also initially ran on distributed nodes (4 nodes, 48 processors, 24Gb/node).

Info:

cufflinks v2.2.1

Architecture: x86_64

CPU op-mode(s): 32-bit, 64-bit

Byte Order: Little Endian

CPU(s): 12

On-line CPU(s) list: 0-11

Thread(s) per core: 1

Core(s) per socket: 6

Socket(s): 2

NUMA node(s): 2

Vendor ID: GenuineIntel

CPU family: 6

Model: 44

Stepping: 2

CPU MHz: 2800.195

BogoMIPS: 5599.87

Virtualization: VT-x

L1d cache: 32K

L1i cache: 32K

L2 cache: 256K

L3 cache: 12288K

NUMA node0 CPU(s): 0,2,4,6,8,10

NUMA node1 CPU(s): 1,3,5,7,9,11

Eric P

unread,

Jun 13, 2015, 5:29:33 PM6/13/15

to tuxedo-to...@googlegroups.com

So is there a fix for this problem yet?? I am getting the same error and I am kind of stuck as I do not any other computing options

prathik kumar

unread,

Jul 12, 2019, 2:10:35 PM7/12/19

to Tuxedo Tools Users

is there any update on this problem?

Reply all

Reply to author

Forward