Fwd: Lost SDR granules and adl errors


Adam Dybbroe

Jun 20, 2013, 10:05:24 AM
to NPP satellite
Hi everyone!

I have reported some recently observed problems here to the CSPP team. See below if you are interested.
Any comments or feedback are, as always, very welcome! It would be nice to know whether you have encountered something like this before.

Best regards
Adam



-------- Original Message --------
Subject: Lost SDR granules and adl errors
Date: Thu, 20 Jun 2013 15:31:48 +0200
From: Adam Dybbroe <Adam.D...@smhi.se>
To: Liam Gumley <Liam....@ssec.wisc.edu>, Kathy Strabala <kathy.s...@ssec.wisc.edu>
CC: a001673 <martin....@smhi.se>


Liam and Kathy,

Thank you for a very nice meeting last month! We felt very welcome, and 
we found it very useful to meet and discuss with both fellow users and 
the CSPP developers!
Say hello to Scot, Ray, David and the entire crew!

We have been running RT-STPS and CSPP in streaming mode for around 3 
weeks now, almost since we came back, as I outlined in my presentation. 
We are fairly happy with that, though I am not yet fully satisfied with 
the timeliness. We are now down to 10-12 minutes (previously above 20 
minutes) from last scan received to all SDRs available on disk. 
We do a lot of redundant processing, of course, due to the cross-granule 
issues, but leave that for now... I will shortly send you a preliminary 
report on version 1.4 and how it performs in our setup.

My issue now is that over the last three weeks we have observed three 
scenes where we lost one granule due to some error in ADL. Since it 
happened almost overhead Norrköping, we first thought it was related to 
the S-NPP downlink antenna anomaly (what was the name for it again?). 
But I have ruled that out for the time being: I could not see any 
significant drop in signal strength for those passes/granules. Also, 
what is even more weird and annoying, the problem didn't seem to occur 
in test. We run the exact same CSPP version in test and production (on 
different physical servers, same hardware), so it is easy to compare. 
For all three passes we had the error messages (shown below) only in 
production, not in test. The last problematic scene was definitely 
okay in test: it had all SDR granules there but not in production. We 
can't say for sure for the other two of the three problematic scenes, 
as the data are gone now.

Here is a snippet of one of the log files from production:


[INFO: 2013-06-17 10:38:48 : npp_sdr_runner] INFO:adl_common:
[INFO: 2013-06-17 10:38:48 : npp_sdr_runner]                  ( Final 3 out of 3 granules processed, 2 successfully )
[INFO: 2013-06-17 10:38:48 : npp_sdr_runner]
[INFO: 2013-06-17 10:38:50 : npp_sdr_runner] INFO:adl_common:
[INFO: 2013-06-17 10:38:50 : npp_sdr_runner]                  ( Created ADL controller XML sdr_viirs_NPP000521369010.xml for NPP000521369010 )
[INFO: 2013-06-17 10:38:50 : npp_sdr_runner]
[INFO: 2013-06-17 10:38:50 : npp_sdr_runner] INFO:adl_common:
[INFO: 2013-06-17 10:38:50 : npp_sdr_runner]                  ( Executing ['/local_disk/opt/CSPP/current/common/ADL/bin/ProSdrViirsController.exe','sdr_viirs_NPP000521369010.xml'] with WORK_DIR='/san1/cspp/work/tmpV2Rilv'  )
[INFO: 2013-06-17 10:38:50 : npp_sdr_runner]
[INFO: 2013-06-17 10:38:52 : npp_sdr_runner] INFO:adl_common:Normal Completion.
[INFO: 2013-06-17 10:38:52 : npp_sdr_runner] WARNING:adl_asc:Take bigger granule a > b Keep A
[INFO: 2013-06-17 10:38:52 : npp_sdr_runner] WARNING:adl_asc:Take bigger granule a < b Keep B
[INFO: 2013-06-17 10:38:52 : npp_sdr_runner] WARNING:adl_asc:Take bigger granule a > b Keep A
[INFO: 2013-06-17 10:38:52 : npp_sdr_runner] WARNING:adl_asc:Take bigger granule a < b Keep B
[INFO: 2013-06-17 10:38:52 : npp_sdr_runner] ERROR:adl_viirs_sdr:ProSdrViirsController.exe failed on 'sdr_viirs_NPP000521366449.xml': CalledProcessError(). Continuing...
[INFO: 2013-06-17 10:38:52 : npp_sdr_runner] WARNING:adl_viirs_sdr:Had problems running these XML files: ['sdr_viirs_NPP000521366449.xml']
[INFO: 2013-06-17 10:38:52 : npp_sdr_runner] ERROR:adl_viirs_sdr:Key error on blob property ObservedStartTime or VIIRS-DNB-SDR
[INFO: 2013-06-17 10:38:52 : npp_sdr_runner] ERROR:adl_viirs_sdr:Key error on blob property ObservedStartTime or VIIRS-I1-SDR
[INFO: 2013-06-17 10:38:52 : npp_sdr_runner] ERROR:adl_viirs_sdr:Key error on blob property ObservedStartTime or VIIRS-I2-SDR
[INFO: 2013-06-17 10:38:52 : npp_sdr_runner] ERROR:adl_viirs_sdr:Key error on blob property ObservedStartTime or VIIRS-I3-SDR
[INFO: 2013-06-17 10:38:52 : npp_sdr_runner] ERROR:adl_viirs_sdr:Key error on blob property ObservedStartTime or VIIRS-I4-SDR
[INFO: 2013-06-17 10:38:52 : npp_sdr_runner] ERROR:adl_viirs_sdr:Key error on blob property ObservedStartTime or VIIRS-I5-SDR

...
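
To compare test and production for a pass, the failed controller XML 
files and the granule summary can be pulled out of the runner log. 
This is only a minimal sketch, assuming the log lines look like the 
snippet above. The function name and the regular expressions are my 
own assumptions and are not part of CSPP.

```python
import re

# Patterns inferred from the log excerpt; they are assumptions, not a CSPP API.
FAILED_XML = re.compile(r"failed on\s+'([^']+\.xml)'")
SUMMARY = re.compile(
    r"Final\s+(\d+)\s+out of\s+(\d+)\s+granules processed,\s+(\d+)\s+successfully"
)


def summarize_log(text):
    """Return (failed_xml_files, counts) extracted from runner log text."""
    # Collapse all whitespace so that hard-wrapped lines still match.
    flat = " ".join(text.split())
    failed = sorted(set(FAILED_XML.findall(flat)))
    counts = None
    match = SUMMARY.search(flat)
    if match:
        _, total, ok = (int(g) for g in match.groups())
        counts = {"processed": total, "successful": ok}
    return failed, counts


sample = """
[INFO: 2013-06-17 10:38:48 : npp_sdr_runner]   ( Final 3
out of 3 granules processed, 2 successfully )
[INFO: 2013-06-17 10:38:52 : npp_sdr_runner]
ERROR:adl_viirs_sdr:ProSdrViirsController.exe failed on
'sdr_viirs_NPP000521366449.xml': CalledProcessError(). Continuing...
"""
failed_files, granule_counts = summarize_log(sample)
```

Running the same function on the test and production logs of one pass 
and comparing the two results would show which granule went missing 
where.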

This is running CSPP version 1.3.
I went to the place in the code that emits the above message and 
found that it happens inside a rather big try-except block. I think 
I would change the LOG.error() to LOG.exception() in the future, to 
better see where in the code it fails if there is a next time.

(In version 1.4 it is at line 365 in adl_viirs_sdr.py).
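
The difference between the two calls can be illustrated with a small 
sketch using Python's standard logging module. The function names and 
the empty property dictionary are hypothetical stand-ins and are not 
taken from adl_viirs_sdr.py.

```python
import io
import logging


def make_logger(name, stream):
    """Build a logger that writes only to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.handlers = [logging.StreamHandler(stream)]
    logger.propagate = False
    return logger


def read_blob_property(props, key):
    # Hypothetical stand-in for the blob lookup; raises KeyError if missing.
    return props[key]


def report_with_error(log, props):
    try:
        read_blob_property(props, "ObservedStartTime")
    except KeyError:
        # Logs the message only; the failing location is lost.
        log.error("Key error on blob property ObservedStartTime")


def report_with_exception(log, props):
    try:
        read_blob_property(props, "ObservedStartTime")
    except KeyError:
        # Logs the same message at ERROR level plus the full traceback.
        log.exception("Key error on blob property ObservedStartTime")


error_out = io.StringIO()
exception_out = io.StringIO()
report_with_error(make_logger("demo.error", error_out), {})
report_with_exception(make_logger("demo.exception", exception_out), {})
```

With LOG.exception() the output contains the traceback down to the 
function that raised, which is what is needed inside a large 
try-except block.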

I wonder if you have a clue what could be the problem? It seems to be 
related to the state of the machine when CSPP is running. I of course 
re-ran the scene later to see if I could further pin down the problem, 
but I was unable to provoke the error! It went fine!

We checked whether any disks or swap were full at the time it 
happened, but couldn't see anything alarming.
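
To capture the machine state at the moment of a run rather than 
afterwards, a small snapshot could be logged just before the 
controller is started. This is a rough sketch under my own 
assumptions: the function names are hypothetical, the swap figure is 
read from /proc/meminfo (Linux only), and nothing here is part of 
CSPP.

```python
import os
import shutil


def disk_free_fraction(path):
    """Fraction of free space on the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return usage.free / usage.total


def swap_free_kb(meminfo_path="/proc/meminfo"):
    """Free swap in kB, or None if it cannot be determined."""
    if not os.path.exists(meminfo_path):
        return None
    with open(meminfo_path) as handle:
        for line in handle:
            if line.startswith("SwapFree:"):
                return int(line.split()[1])
    return None


def snapshot(paths):
    """Collect free disk fractions for `paths` and the free swap."""
    return {
        "disk_free": {p: disk_free_fraction(p) for p in paths},
        "swap_free_kb": swap_free_kb(),
    }


# Example: snapshot the current directory; in practice this would be
# the work directory and the output disk.
state = snapshot([os.getcwd()])
```

Writing such a snapshot to the log for every granule would make it 
possible to compare a failing run with a successful one afterwards.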

PS: It is soon summer vacation time here, so we might not respond very 
promptly for the next month and a half! :-)
Best regards
Adam

-- 
Adam Dybbroe,
Satellite Remote Sensing Scientist,
Numerical models and Remote Sensing,
Core Services, Swedish Meteorological and Hydrological Institute (SMHI)
www.pytroll.org
nwcsaf.smhi.se
www.smhi.se



