Nodes, Processes and Threads; Cufflinks/CuffDiff version 2.0.2 and 2.1.1 how does -p 8 turn into 4 processes and 14 threads and stay on one node?

Starr Hazard

May 15, 2013, 5:41:21 PM
to tuxedo-to...@googlegroups.com
Hi 

I recently tried a CuffDiff computation that kept running for many hours and then getting killed.

My jobs were run on a small cluster where each node has 24 GB of RAM, 4 GB of swap space, and dual quad-core CPUs.

These jobs were killed by the system because they seemed to be asking for more swap space than a single node had available.

Accounting information about this job:
    CPU_T      WAIT  TURNAROUND  STATUS  HOG_FACTOR  MEM     SWAP
    169725.98  2     64381       exit    2.6363      12974M  14480M

I tried to get my LSF scheduler to distribute the jobs to enough nodes to then have an aggregate swap space in excess of the 14G it was asking for.

I attempted to do this by asking LSF for 8 nodes (each a dual quad-core unit with 24 GB of RAM), assuming I would then have an aggregate 32 GB of swap (8 x 4 GB of swap per node).
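As a rough sanity check on those numbers, here is a minimal Python sketch. It takes the SWAP figure from the accounting line at face value and compares it with one node's swap and with the 8-node aggregate. It assumes (nothing in this thread confirms it) that a single multithreaded process can only draw on the memory and swap of the node it runs on. The variable names are made up for illustration.

```python
# Figures copied from the post; all names here are illustrative only.
NODES_REQUESTED = 8
SWAP_PER_NODE_GB = 4
PEAK_SWAP_MB = 14480  # "SWAP" column of the LSF accounting line


def mb_to_gb(megabytes: float) -> float:
    """Convert megabytes to gigabytes using 1 GB = 1024 MB."""
    return megabytes / 1024


aggregate_swap_gb = NODES_REQUESTED * SWAP_PER_NODE_GB
peak_swap_gb = mb_to_gb(PEAK_SWAP_MB)

# The aggregate across 8 nodes would be enough on paper...
fits_in_aggregate = peak_swap_gb <= aggregate_swap_gb

# ...but, ASSUMING one process cannot use another node's swap,
# the figure that matters is what a single node offers.
fits_on_one_node = peak_swap_gb <= SWAP_PER_NODE_GB

print(f"peak swap:        {peak_swap_gb:.2f} GB")
print(f"aggregate swap:   {aggregate_swap_gb} GB (fits: {fits_in_aggregate})")
print(f"single-node swap: {SWAP_PER_NODE_GB} GB (fits: {fits_on_one_node})")
```

If that assumption holds, requesting more nodes would not change what one process can use, which would be consistent with the job crashing on a single node.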

CuffDiff was being asked to use 8 threads (-p 8).

The job always started on one node and stayed there. And crashed.

I then used LSF to restrict the number of processes to a single process per node. Once again the job always started on one node and stayed there. And crashed.

The accounting information for these jobs indicated that several processes (4) had been started, as well as a number of threads (14 in this case), e.g.:

Resource usage summary:

    CPU time   : 169725.98 sec.
    Max Memory :     12974 MB
    Max Swap   :     14480 MB

    Max Processes  :         4
    Max Threads    :        14
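To make the gap between the requested and observed thread counts explicit, the summary above can be parsed with a few lines of Python. This is only a sketch: `parse_summary` is a hypothetical helper written for this post, not part of LSF or Cufflinks, and it assumes the summary keeps the "Label : number [unit]" layout shown above.

```python
import re

# Resource usage summary as pasted above.
SUMMARY = """\
    CPU time   : 169725.98 sec.
    Max Memory :     12974 MB
    Max Swap   :     14480 MB

    Max Processes  :         4
    Max Threads    :        14
"""


def parse_summary(text: str) -> dict:
    """Map each 'Label : number [unit]' line to its numeric value."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"\s*(.+?)\s*:\s*([\d.]+)", line)
        if match:
            fields[match.group(1)] = float(match.group(2))
    return fields


usage = parse_summary(SUMMARY)
requested_threads = 8  # value passed to -p
extra_threads = int(usage["Max Threads"]) - requested_threads

print(f"processes seen by LSF: {int(usage['Max Processes'])}")
print(f"threads seen by LSF:   {int(usage['Max Threads'])}")
print(f"threads beyond -p {requested_threads}:   {extra_threads}")
```

This only quantifies the discrepancy (6 threads more than -p 8); it does not explain where the extra threads or processes come from.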


So why, if threads is set to 8, is the program launching 14 threads and 4 processes?

If I restrict the number of processes per node to one, why does the program stay on one node?

Is it possible to get CuffDiff to run on more than a single node?

Starr

re...@channing.harvard.edu

Jul 16, 2013, 5:46:26 PM
to tuxedo-to...@googlegroups.com, Isaac Houston
I get the following error message on a machine with 32 GB of RAM and 8 cores, with -p set to 8. Is this the same error message you are getting? Do the developers have a fix or guidance?

===============================================
You are using Cufflinks v2.1.1, which is the most recent release.
[18:15:58] Loading reference annotation.
Warning: No conditions are replicated, switching to 'blind' dispersion method
[18:16:14] Inspecting maps and determining fragment length distributions.
[18:44:18] Modeling fragment count overdispersion.
> Map Properties:
> Normalized Map Mass: 22494699.53
> Raw Map Mass: 21535093.41
> Fragment Length Distribution: Empirical (learned)
>              Estimated Mean: 181.42
>           Estimated Std Dev: 62.85
> Map Properties:
> Normalized Map Mass: 22494699.53
> Raw Map Mass: 23448522.16
> Fragment Length Distribution: Empirical (learned)
>              Estimated Mean: 186.77
>           Estimated Std Dev: 70.69
[18:47:58] Calculating preliminary abundance estimates
[18:47:58] Testing for differential expression and regulation in locus.
> Processing Locus chr16:3355483-3368576       [********                 ]  34%Killed
==============================================

Katharina Hayer

Sep 9, 2014, 10:49:37 AM
to tuxedo-to...@googlegroups.com
Has there been a solution to this problem? I am running into similar swap and thread issues with cuffnorm (version: cufflinks-2.2.1.Linux_x86_64). Swap was at 21G and, even though I didn't change the -p flag, it used 4 threads instead of only one.

Any help is appreciated,
Katharina 

David Oliver

Dec 15, 2014, 3:54:03 PM
to tuxedo-to...@googlegroups.com
Also having this problem. 

Interestingly, I don't get the error at the same place each time. These are the last two attempts:

 Processing Locus chr13:39342891-39603528     [******                   ]  24%Killed
 Processing Locus chr20:19212645-19722937     [*************            ]  53%Killed

I'm curious if anyone knows how to solve this issue. Like Starr Hazard, I also initially ran on distributed nodes (4 nodes, 48 processors, 24 GB/node).

Info:
cufflinks v2.2.1

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                12
On-line CPU(s) list:   0-11
Thread(s) per core:    1
Core(s) per socket:    6
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 44
Stepping:              2
CPU MHz:               2800.195
BogoMIPS:              5599.87
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              12288K
NUMA node0 CPU(s):     0,2,4,6,8,10
NUMA node1 CPU(s):     1,3,5,7,9,11

Eric P

Jun 13, 2015, 5:29:33 PM
to tuxedo-to...@googlegroups.com
So is there a fix for this problem yet? I am getting the same error and I am kind of stuck, as I do not have any other computing options.

prathik kumar

Jul 12, 2019, 2:10:35 PM
to Tuxedo Tools Users
Is there any update on this problem?
