I have reported some recently observed problems here to the CSPP team.
See the message below, if you are interested.
Any comments or feedback are, as always, very welcome! It would be nice
to know if you have also encountered something like this before.
-------- Original Message --------
Liam and Kathy,
Thank you for a very nice meeting last month! We felt very welcome, and
we found it very useful to meet and discuss with both fellow users and
the CSPP developers!
Say hello to Scot, Ray, David and the entire crew!
We have been running RT-STPS and CSPP in streaming mode for around 3
weeks now, almost since we came back, as I outlined in my presentation.
We are fairly happy with that, though I am not yet fully satisfied with
the timeliness. We are down to 10-12 minutes now (where we were above 20
minutes before) from last scan received to all SDRs available on disk.
We do a lot of redundant processing of course due to the cross-granule
issues, but leave that for now... I will send you a preliminary report
shortly on version 1.4 and how it performs in our setup.
My issue now is that over the last three weeks we have observed three
scenes where we lost one granule due to some error in ADL. Since it
happened almost overhead Norrköping we first thought it was related to
the S-NPP downlink antenna anomaly (what was the name for it again?). But
I have ruled that out for the time being: I could not see any
significant drop in signal strength for those passes/granules. Also,
what is even more weird and annoying, the problem didn't seem to occur in
test. We run the exact same CSPP version in test and production (on
different physical servers, same hardware), so it is easy to compare.
For all three passes the error messages (shown below) appeared only in
production, not in test. The last problematic scene was definitely
okay in test: it had all SDR granules in test but not in production. We
can't say for sure for the other two of the three problematic scenes, as
the data are gone now.
Here is a snippet of one of the log files from production:
[INFO: 2013-06-17 10:38:48 : npp_sdr_runner] INFO:adl_common:
[INFO: 2013-06-17 10:38:48 : npp_sdr_runner] ( Final 3 out of 3 granules processed, 2 successfully )
[INFO: 2013-06-17 10:38:48 : npp_sdr_runner]
[INFO: 2013-06-17 10:38:50 : npp_sdr_runner] INFO:adl_common:
[INFO: 2013-06-17 10:38:50 : npp_sdr_runner] ( Created ADL controller XML sdr_viirs_NPP000521369010.xml for NPP000521369010 )
[INFO: 2013-06-17 10:38:50 : npp_sdr_runner]
[INFO: 2013-06-17 10:38:50 : npp_sdr_runner] INFO:adl_common:
[INFO: 2013-06-17 10:38:50 : npp_sdr_runner] ( Executing ['/local_disk/opt/CSPP/current/common/ADL/bin/ProSdrViirsController.exe','sdr_viirs_NPP000521369010.xml'] with WORK_DIR='/san1/cspp/work/tmpV2Rilv' )
[INFO: 2013-06-17 10:38:50 : npp_sdr_runner]
[INFO: 2013-06-17 10:38:52 : npp_sdr_runner] INFO:adl_common:Normal Completion.
[INFO: 2013-06-17 10:38:52 : npp_sdr_runner] WARNING:adl_asc:Take bigger granule a > b Keep A
[INFO: 2013-06-17 10:38:52 : npp_sdr_runner] WARNING:adl_asc:Take bigger granule a < b Keep B
[INFO: 2013-06-17 10:38:52 : npp_sdr_runner] WARNING:adl_asc:Take bigger granule a > b Keep A
[INFO: 2013-06-17 10:38:52 : npp_sdr_runner] WARNING:adl_asc:Take bigger granule a < b Keep B
[INFO: 2013-06-17 10:38:52 : npp_sdr_runner] ERROR:adl_viirs_sdr:ProSdrViirsController.exe failed on 'sdr_viirs_NPP000521366449.xml': CalledProcessError(). Continuing...
[INFO: 2013-06-17 10:38:52 : npp_sdr_runner] WARNING:adl_viirs_sdr:Had problems running these XML files: ['sdr_viirs_NPP000521366449.xml']
[INFO: 2013-06-17 10:38:52 : npp_sdr_runner] ERROR:adl_viirs_sdr:Key error on blob property ObservedStartTime or VIIRS-DNB-SDR
[INFO: 2013-06-17 10:38:52 : npp_sdr_runner] ERROR:adl_viirs_sdr:Key error on blob property ObservedStartTime or VIIRS-I1-SDR
[INFO: 2013-06-17 10:38:52 : npp_sdr_runner] ERROR:adl_viirs_sdr:Key error on blob property ObservedStartTime or VIIRS-I2-SDR
[INFO: 2013-06-17 10:38:52 : npp_sdr_runner] ERROR:adl_viirs_sdr:Key error on blob property ObservedStartTime or VIIRS-I3-SDR
[INFO: 2013-06-17 10:38:52 : npp_sdr_runner] ERROR:adl_viirs_sdr:Key error on blob property ObservedStartTime or VIIRS-I4-SDR
[INFO: 2013-06-17 10:38:52 : npp_sdr_runner] ERROR:adl_viirs_sdr:Key error on blob property ObservedStartTime or VIIRS-I5-SDR
...
This is running CSPP version 1.3.
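To compare test and production runs, the failed granules can be pulled out of a log like the one above with a small script. This is only a sketch: the two regular expressions are derived from the excerpt above and may not match the wording of other CSPP versions, and the function name summarize_log and the keys of the returned dictionary are made up for illustration. Reading "3 out of 3 granules processed, 2 successfully" as "total 3, succeeded 2" is also an assumption.

```python
import re

# Patterns taken from the log excerpt only (assumption: the wording is stable).
FAILED_RE = re.compile(r"failed on 'sdr_viirs_(NPP\d+)\.xml'")
SUMMARY_RE = re.compile(
    r"(\d+) out of (\d+) granules processed, (\d+) successfully"
)


def summarize_log(text):
    """Return failed granule ids and the processed/successful counts.

    summarize_log is a hypothetical helper, not part of CSPP.
    """
    # Mail clients and terminals may wrap one log message over several
    # lines, so collapse all runs of whitespace before matching.
    flat = " ".join(text.split())

    failed = sorted(set(FAILED_RE.findall(flat)))

    total = ok = None
    match = SUMMARY_RE.search(flat)
    if match:
        # Assumed meaning: second number is the total, third the successes.
        total = int(match.group(2))
        ok = int(match.group(3))

    return {
        "failed_granules": failed,
        "granules_total": total,
        "granules_ok": ok,
    }
```

Running this on the production log and on the test log for the same pass and comparing the two "failed_granules" lists would show directly which granules were lost only in production.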
I went to the place in the code where it emits the above message
and found it was happening inside a rather big try-except block. I think
I will change the LOG.error() to LOG.exception() so that, if there is a
next time, we can better see where in the code it fails.
(In version 1.4 it is at line 365 in adl_viirs_sdr.py).
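The difference between the two calls can be seen with the standard Python logging module alone. The sketch below is not the CSPP code: read_blob_property, make_logger, demo and the logger name are hypothetical stand-ins, and the empty dictionary only simulates a blob that lacks the ObservedStartTime key. Inside an except block, error() logs just the message, while exception() logs the same message at ERROR level and appends the traceback.

```python
import io
import logging


def make_logger(name, stream):
    """Build a logger that writes only to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.handlers = [logging.StreamHandler(stream)]
    logger.propagate = False
    return logger


def read_blob_property(blob, key):
    # Hypothetical stand-in for whatever lookup fails in the real code.
    return blob[key]


def demo():
    """Return (output of LOG.error, output of LOG.exception)."""
    stream = io.StringIO()
    LOG = make_logger("adl_viirs_sdr_demo", stream)
    blob = {}  # simulated blob with no ObservedStartTime

    try:
        read_blob_property(blob, "ObservedStartTime")
    except KeyError:
        # Message only: no hint of where the KeyError was raised.
        LOG.error("Key error on blob property %s", "ObservedStartTime")
    error_out = stream.getvalue()

    # Reset the buffer before the second attempt.
    stream.seek(0)
    stream.truncate(0)

    try:
        read_blob_property(blob, "ObservedStartTime")
    except KeyError:
        # Same message, plus the full traceback of the active exception.
        LOG.exception("Key error on blob property %s", "ObservedStartTime")
    exception_out = stream.getvalue()

    return error_out, exception_out
```

With a large try-except block, the traceback is what tells you which of the many statements inside the block actually raised.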
I wonder if you have a clue what could be the problem? It seems to be
related to the state of the machine when CSPP is running. I of course
re-ran the scene later to see if I could further pin down the problem,
but I was unable to provoke the error! It went fine!
We checked whether any disks were full, swap was exhausted, or something
similar at the time it happened, but couldn't see anything alarming.
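Since the error could not be reproduced afterwards, one option is to record the machine state at the moment a granule fails rather than looking for it later. The following is a rough sketch under several assumptions: it needs Python 3.3 or later for shutil.disk_usage, it reads swap figures from /proc/meminfo and so only reports them on Linux, and the names read_meminfo and machine_snapshot are invented here and are not part of CSPP.

```python
import os
import shutil
import time


def read_meminfo(path="/proc/meminfo"):
    """Parse /proc/meminfo into {name: value in kB}; empty if unavailable."""
    info = {}
    try:
        with open(path) as handle:
            for line in handle:
                key, _, rest = line.partition(":")
                fields = rest.split()
                if fields and fields[0].isdigit():
                    info[key.strip()] = int(fields[0])
    except OSError:
        pass  # not Linux, or file not readable
    return info


def machine_snapshot(paths):
    """Collect disk usage (percent), swap and load average in one dict."""
    mem = read_meminfo()
    snapshot = {
        "time": time.strftime("%Y-%m-%d %H:%M:%S"),
        "disk_used_percent": {},
        "swap_total_kb": mem.get("SwapTotal"),
        "swap_free_kb": mem.get("SwapFree"),
        "load_average": None,
    }
    for path in paths:
        try:
            usage = shutil.disk_usage(path)
        except OSError:
            snapshot["disk_used_percent"][path] = None
            continue
        if usage.total:
            percent = round(100.0 * usage.used / usage.total, 1)
        else:
            percent = None
        snapshot["disk_used_percent"][path] = percent
    try:
        snapshot["load_average"] = os.getloadavg()
    except (AttributeError, OSError):
        pass  # not available on this platform
    return snapshot
```

Calling machine_snapshot(["/san1/cspp/work"]) from the except branch that reports the failure, and writing the result to the log, would leave a record to compare between the production and test servers.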
PS: It will soon be summer vacation time here, so we might not respond
very quickly for the next month and a half! :-)
Best regards
Adam
--
Adam Dybbroe,
Satellite Remote Sensing Scientist,
Numerical models and Remote Sensing,
Core Services, Swedish Meteorological and Hydrological Institute (SMHI)
www.pytroll.org
nwcsaf.smhi.se
www.smhi.se