One more report, now on version 1.4 and multiple cores.
-------- Original Message --------
Hi!
We have installed CSPP SDR version 1.4 and tested it a bit. Thanks for the
quick delivery!
Unfortunately, the results do not mark any big breakthrough for us at the
moment, at least not yet. We might look more into this after the summer.
As I have explained, we process RDR granules in real time, and because of
the cross-granule issue we run one CSPP instance per group of 3
consecutive granules. Processing three granules with 1.3 takes
approximately 10 minutes in our environment. Remember that this is a server
with 24 cores and 50 GB RAM, but one that also does other work. It has a
smaller CPU cache (12288 KB) than your benchmark server, and as discussed
in Madison this could be an important difference. So, after a while the
server will be busy with several such CSPP instances, and all instances
have finished about 10 minutes after the last RDR granule is received.
Moving to 1.4 and letting each CSPP instance run on several cores would
therefore quickly blow up the memory consumption, so I was prepared for
trouble! ;-)
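A rough back-of-the-envelope sketch of why the instances pile up. It assumes one job is started per incoming granule (the test below produced five timings for five granules), a new granule roughly every 85 seconds, and that "-p N" means N concurrent workers per instance; the function names are hypothetical and not part of CSPP. Memory per worker is not known here, so the sketch only counts workers in flight.

```python
import math


def max_concurrent_jobs(job_time_s: float, interval_s: float) -> int:
    """Peak number of jobs running at once when a new job starts every
    interval_s seconds and each job runs for job_time_s seconds."""
    # Jobs started in the half-open window (t - job_time_s, t] are still
    # running at time t; at most ceil(job_time_s / interval_s) evenly spaced
    # start times fit in that window.
    return math.ceil(job_time_s / interval_s)


def max_worker_processes(job_time_s: float, interval_s: float, procs_per_job: int) -> int:
    """Peak number of workers, assuming each job uses procs_per_job workers."""
    return max_concurrent_jobs(job_time_s, interval_s) * procs_per_job


if __name__ == "__main__":
    GRANULE_INTERVAL_S = 85.0  # assumed arrival interval of new RDR granules
    # 1.3: about 10 minutes per 3-granule job, one worker each
    print(max_worker_processes(600.0, GRANULE_INTERVAL_S, 1))
    # Hypothetical: "-p 3" jobs that still take ~10 minutes on a busy server
    print(max_worker_processes(600.0, GRANULE_INTERVAL_S, 3))
```

With about 10 minutes per job, up to 8 instances overlap. If every instance also ran three workers and the jobs did not get shorter on a loaded server (a hypothetical case), that would become 24 concurrent workers, as many as the server has cores, each needing its own memory.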
Below are some observations. The times are in seconds and measure the real
(wall-clock) time it takes for each 3-granule group to finish. I ran on
only five RDR granules, feeding our runner a new RDR every 85 seconds.
CSPP-1.3
[INFO: 2013-06-19 19:35:58 : npp_sdr_runner] Seconds wall clock time: 346.697037935
[INFO: 2013-06-19 19:40:47 : npp_sdr_runner] Seconds wall clock time: 550.727682114
[INFO: 2013-06-19 19:41:54 : npp_sdr_runner] Seconds wall clock time: 532.846452951
[INFO: 2013-06-19 19:42:03 : npp_sdr_runner] Seconds wall clock time: 366.723992109
[INFO: 2013-06-19 19:43:08 : npp_sdr_runner] Seconds wall clock time: 521.714389086
CSPP-1.4 - one CPU ("-p 1"):
[INFO: 2013-06-19 19:08:10 : npp_sdr_runner] Seconds wall clock time: 349.276097059
[INFO: 2013-06-19 19:12:11 : npp_sdr_runner] Seconds wall clock time: 505.949388981
[INFO: 2013-06-19 19:13:40 : npp_sdr_runner] Seconds wall clock time: 509.758774996
[INFO: 2013-06-19 19:13:45 : npp_sdr_runner] Seconds wall clock time: 339.747205973
[INFO: 2013-06-19 19:14:51 : npp_sdr_runner] Seconds wall clock time: 495.671334982
CSPP-1.4 - three CPUs ("-p 3"):
[INFO: 2013-06-19 20:38:46 : npp_sdr_runner] Seconds wall clock time: 198.783289909
[INFO: 2013-06-19 20:40:50 : npp_sdr_runner] Seconds wall clock time: 237.666941881
[INFO: 2013-06-19 20:42:26 : npp_sdr_runner] Seconds wall clock time: 248.064553976
[INFO: 2013-06-19 20:43:39 : npp_sdr_runner] Seconds wall clock time: 236.428142071
[INFO: 2013-06-19 20:44:33 : npp_sdr_runner] Seconds wall clock time: 199.622745991
The above tests were performed while the server was almost idle.
Then I started a number of other activities (a more normal activity
level) and ran 1.4 with 3 CPUs again:
[INFO: 2013-06-19 21:25:04 : npp_sdr_runner] Seconds wall clock time: 200.303040981
[INFO: 2013-06-19 21:28:27 : npp_sdr_runner] Seconds wall clock time: 318.391680956
[INFO: 2013-06-19 21:31:29 : npp_sdr_runner] Seconds wall clock time: 415.129799843
[INFO: 2013-06-19 21:31:58 : npp_sdr_runner] Seconds wall clock time: 359.149260044
[INFO: 2013-06-19 21:33:15 : npp_sdr_runner] Seconds wall clock time: 346.184264183
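To put the five runs per configuration side by side, here is a small Python sketch that averages the logged wall-clock times and compares each 1.4 setting with 1.3. The numbers are copied from the npp_sdr_runner log lines above; the dictionary keys and the script itself are illustrative only and not part of CSPP.

```python
from statistics import mean

# Wall-clock times in seconds, copied from the npp_sdr_runner log lines
# (five runs per configuration, one per 3-granule group).
RUNS = {
    "1.3": [346.697037935, 550.727682114, 532.846452951, 366.723992109, 521.714389086],
    "1.4 -p 1": [349.276097059, 505.949388981, 509.758774996, 339.747205973, 495.671334982],
    "1.4 -p 3 (idle)": [198.783289909, 237.666941881, 248.064553976, 236.428142071, 199.622745991],
    "1.4 -p 3 (loaded)": [200.303040981, 318.391680956, 415.129799843, 359.149260044, 346.184264183],
}


def summarize(runs):
    """Return {configuration: mean seconds} and print the means in s and min."""
    means = {name: mean(times) for name, times in runs.items()}
    for name, m in means.items():
        print(f"{name:18s} mean {m:7.1f} s ({m / 60:4.2f} min)")
    return means


means = summarize(RUNS)

# Speed-up relative to 1.3, as the ratio of mean wall-clock times.
for name in ("1.4 -p 1", "1.4 -p 3 (idle)", "1.4 -p 3 (loaded)"):
    print(f"{name}: {means['1.3'] / means[name]:.2f}x vs 1.3")
```

With these values the means come out at roughly 7.7 min (1.3), 7.3 min (1.4, "-p 1"), 3.7 min (1.4, "-p 3", idle server) and 5.5 min (1.4, "-p 3", loaded server), i.e. only about a 5% gain from 1.4 on a single CPU.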
From this I concluded that it might be worth trying to run 1.4 with
option "-p 3" in real time on full passes. Using only one CPU, and thus
only taking advantage of the potential speed-up from the new compiler
optimisation, didn't seem to bring much improvement. But what happened
last night was that when we got a full pass of 11 granules, the server was
hit hard and was brought to its knees for 2-3 hours, swapping like mad!
So, since the basic processing speed on one CPU is still slow, much
slower than the duration of one granule, we can't use this concept with
the given hardware.
We will probably soon switch from 1.3 to version 1.4 running on one core.
But as shown above, the improvement will most likely be marginal.
In this respect it might be worthwhile to give instructions on how users
could compile and install CSPP/ADL from source. That would make it more
likely that we could optimise it compiler-wise for our own architecture,
instead of relying on your optimisations done on different hardware. It
is actually quite possible that optimisations done on one architecture
cause a degradation on another.
Best regards
Adam
--
Adam Dybbroe,
Satellite Remote Sensing Scientist,
Numerical models and Remote Sensing,
Core Services, Swedish Meteorological and Hydrological Institute (SMHI)
www.pytroll.org
nwcsaf.smhi.se
www.smhi.se