gc resource usage


Joosep Pata

Nov 15, 2016, 4:57:12 AM
to grid-control
Hi,

Recently, we have started to see strange resource usage from gc-submitted jobs. In particular, the `/bin/bash ./gc-run.sh` command and its children and siblings keep consuming CPU even after the actual payload has started. See the following htop printout.
Has anyone seen this?

Cheers

9954 jpata      20   0  103M  1264  1076 S  0.0  0.0  0:00.01    └─ /bin/bash /gridware/sge/default/spool/t3wn15/job_scripts/6694898
9992 jpata      20   0  104M  1600  1132 S  0.0  0.0  0:00.03       └─ /bin/bash ./gc-run.sh 34
12500 jpata      20   0  104M   896   424 S  0.0  0.0  0:00.00          ├─ /bin/bash ./gc-run.sh 34
12501 jpata      20   0  104M   860   388 S  0.0  0.0  0:00.00            └─ /bin/bash ./gc-run.sh 34
12505 jpata      20   0  103M  1668  1092 S  0.0  0.0  0:00.03               └─ /bin/bash ./meanalysis-heppy.sh
12598 jpata      20   0 1606M  577M 52344 R 99.0  2.4  5:26.68                  └─ python /mnt/t3nfs01/data01/shome/jpata/tth/sw/CMSSW/src/TTH/MEAnalysis/gc/MEAnalysis_h
16409 jpata      20   0 1606M  577M 52344 S  0.0  2.4  0:00.00                     ├─ python /mnt/t3nfs01/data01/shome/jpata/tth/sw/CMSSW/src/TTH/MEAnalysis/gc/MEAnalysi
16408 jpata      20   0 1606M  577M 52344 S  0.0  2.4  0:00.00                     ├─ python /mnt/t3nfs01/data01/shome/jpata/tth/sw/CMSSW/src/TTH/MEAnalysis/gc/MEAnalysi
16407 jpata      20   0 1606M  577M 52344 S  0.0  2.4  0:00.00                     ├─ python /mnt/t3nfs01/data01/shome/jpata/tth/sw/CMSSW/src/TTH/MEAnalysis/gc/MEAnalysi
16406 jpata      21   1 1606M  577M 52344 S  0.0  2.4  0:00.00                     ├─ python /mnt/t3nfs01/data01/shome/jpata/tth/sw/CMSSW/src/TTH/MEAnalysis/gc/MEAnalysi
16405 jpata      20   0 1606M  577M 52344 S  0.0  2.4  0:00.13                     └─ python /mnt/t3nfs01/data01/shome/jpata/tth/sw/CMSSW/src/TTH/MEAnalysis/gc/MEAnalysi
10208 jpata      20   0  104M   984   524 R 55.0  0.0  4:21.25          ├─ /bin/bash ./gc-run.sh 34
10207 jpata      20   0  104M   984   524 R 59.0  0.0  4:17.99          └─ /bin/bash ./gc-run.sh 34

Max Fischer

Nov 15, 2016, 5:05:37 AM
to Joosep Pata, grid-control
Hi Joosep,

gc runs its own small watchdog via gc-run.sh in parallel, so there should be some activity even while the payload is running.
55% CPU seems like a lot, though.

Cheers,
Max
> --
> You received this message because you are subscribed to the Google Groups "grid-control" group.
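Max's description of a watchdog that runs alongside the payload can be sketched in Python as follows. This is a hypothetical illustration, not the logic of grid-control's `gc-run.sh` (which is a bash script): `run_with_watchdog`, `interval_s` and `max_wall_s` are made-up names, and the only check performed is an optional wall-clock limit. Waiting on the child with a timeout doubles as the pause between checks, so the loop stays idle most of the time yet notices at once when the payload exits.

```python
import subprocess
import sys
import time


def run_with_watchdog(cmd, interval_s=20.0, max_wall_s=None):
    """Run cmd as a child process and check on it every interval_s seconds.

    Hypothetical sketch of a gc-run.sh-style watchdog; the checks done by
    grid-control's real watchdog are not reproduced here.
    """
    start = time.monotonic()
    proc = subprocess.Popen(cmd)
    while proc.poll() is None:
        # Cheap periodic check: enforce an optional wall-clock limit.
        if max_wall_s is not None and time.monotonic() - start > max_wall_s:
            proc.kill()
            break
        try:
            # Block for up to interval_s, but return early if the payload exits.
            proc.wait(timeout=interval_s)
        except subprocess.TimeoutExpired:
            pass
    return proc.wait()


if __name__ == "__main__":
    # Stand-in payload: a Python process that sleeps for one second.
    rc = run_with_watchdog([sys.executable, "-c", "import time; time.sleep(1)"],
                           interval_s=0.5)
    print("payload exit code:", rc)
```

Because the watchdog spends nearly all of its time blocked in `wait`, it should show up in htop at close to 0% CPU, unlike the 55% and 59% seen above.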

Joosep Pata

Nov 15, 2016, 5:08:23 AM
to Max Fischer, grid-control
Hi,

Thanks. The problem is that our SGE accounting now reports that our jobs use 3x the CPU they should (and SGE is therefore killing them)... I'll try reverting a few versions, because we have not seen this behaviour in the past. Also, I see here that there are actually 2 watchdogs running, each consuming CPU independently.

>> 10208 jpata 20 0 104M 984 524 R 55.0 0.0 4:21.25 │ │ ├─ /bin/bash ./gc-run.sh 34
>> 10207 jpata 20 0 104M 984 524 R 59.0 0.0 4:17.99 │ │ └─ /bin/bash ./gc-run.sh 34

Cheers
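As a rough cross-check of the "3x" figure, one can add up the TIME+ column from the first htop printout (PIDs 12598, 10208 and 10207) and compare it with the payload alone. This assumes the batch system charges the job for the CPU time of its whole process tree; it ignores the other, nearly idle processes, and the small parser below only handles htop's `M:SS.hh` form. It reflects a single snapshot, not what SGE actually recorded.

```python
def htop_time_to_seconds(t: str) -> float:
    """Convert an htop TIME+ value such as '5:26.68' (min:sec.hundredths) to seconds."""
    minutes, seconds = t.split(":")
    return int(minutes) * 60 + float(seconds)


# TIME+ values copied from the htop printout in the first message.
payload = htop_time_to_seconds("5:26.68")  # PID 12598, the python payload
watchdogs = [htop_time_to_seconds(t) for t in ("4:21.25", "4:17.99")]  # PIDs 10208, 10207

total = payload + sum(watchdogs)
print(f"payload CPU: {payload:.2f}s, all three processes: {total:.2f}s, "
      f"ratio: {total / payload:.2f}x")
```

At the moment of the snapshot the three processes together had used about 2.6 times the payload's CPU time, in the same ballpark as the 3x reported by SGE.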

Joosep Pata

Nov 15, 2016, 5:48:18 AM
to Max Fischer, grid-control
We've found the problem: in our local patches, we had removed various directory size monitors from the watchdog, as they can be very expensive on NFS systems, but along with them we also mistakenly removed the `sleep 20`, causing the watchdog loop to run at full speed.
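To see why dropping the `sleep 20` matters, here is a small Python sketch (not grid-control code) that runs the same polling loop with and without a pause between iterations and measures the CPU time the loop itself consumes via `time.process_time()`. The `watch` function and the interval values are illustrative assumptions.

```python
import time


def watch(duration_s: float, interval_s: float) -> float:
    """Poll for duration_s wall-clock seconds and return the CPU time used.

    interval_s == 0 mimics a watchdog whose sleep has been removed.
    """
    cpu_start = time.process_time()
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        # A real watchdog would run its (cheap) checks here.
        if interval_s > 0:
            time.sleep(interval_s)  # give the CPU back between checks
    return time.process_time() - cpu_start


if __name__ == "__main__":
    busy = watch(1.0, 0.0)  # no sleep: spins on the CPU for the whole second
    idle = watch(1.0, 0.1)  # sleeps between checks
    print(f"busy loop CPU: {busy:.2f}s, sleeping loop CPU: {idle:.2f}s")
```

On an otherwise idle machine the busy loop should burn roughly one full second of CPU per second of wall time, while the sleeping loop uses close to none, which matches the two watchdog processes showing up at 55 to 59% CPU in htop.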

Fred Stober

Nov 15, 2016, 1:50:21 PM
to Joosep Pata, Max Fischer, grid-control
Ok,

I guess the best solution is then to make the watchdog configurable, so
you don't need special patches ...

I'm quite busy this week but I'll try to implement it.

Cheers,

Fred
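One way a configurable watchdog could look is sketched below in Python. Everything here is hypothetical: the `GC_WATCHDOG_INTERVAL` and `GC_WATCHDOG_MONITORS` names, the monitor list and `load_watchdog_config` are not existing grid-control options. The idea is that a site on NFS could switch off the directory size monitors through configuration instead of a local patch, while a minimum interval keeps a bad setting from turning the loop back into a busy-wait.

```python
import os
from dataclasses import dataclass

# Hypothetical monitor names and limits, not real grid-control settings.
ALL_MONITORS = ("wall_time", "scratch_dir_size", "output_dir_size")
MIN_INTERVAL_S = 1.0


@dataclass(frozen=True)
class WatchdogConfig:
    interval_s: float
    monitors: tuple


def load_watchdog_config(env=None):
    """Build watchdog settings from environment-style key/value pairs."""
    env = os.environ if env is None else env
    interval = float(env.get("GC_WATCHDOG_INTERVAL", "20"))
    # Clamp the interval so a misconfiguration cannot remove the pause entirely.
    interval = max(interval, MIN_INTERVAL_S)
    raw = env.get("GC_WATCHDOG_MONITORS")
    if raw is None:
        monitors = ALL_MONITORS  # default: keep every monitor enabled
    else:
        requested = [m.strip() for m in raw.split(",") if m.strip()]
        unknown = [m for m in requested if m not in ALL_MONITORS]
        if unknown:
            raise ValueError(f"unknown watchdog monitor(s): {unknown}")
        monitors = tuple(requested)
    return WatchdogConfig(interval_s=interval, monitors=monitors)
```

For example, `load_watchdog_config({"GC_WATCHDOG_MONITORS": "wall_time"})` keeps the default 20 s interval but disables both directory size monitors, and an interval of `"0"` is raised to 1 s rather than producing a spinning loop.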