insert NULL datavalue for missing hourly data?

138 views
Skip to first unread message

Prokein, Peter

unread,
Apr 28, 2013, 2:20:33 AM4/28/13
to cuahsi-wds-us...@googlegroups.com
Folks,

I am dealing with a real-time telemetry dataset that frequently has missing hourly values, meaning that there is a datavalue transmitted for hour 08:00 but none for 09:00, there may be one for 10:00 or maybe not.  It's also possible that there are entire days of data missing.

Obviously those issues are corrected in the QA/QC'ed dataset but that doesn't help me for real-time plotting.

Has anyone developed a way to insert the correct time stamps and NULL as datavalues for missing hourly data?  Seems to me like this would be something that could go into the UpdateSeriesCatalog stored procedure and just "fix" the DataValues table each time the series catalog gets updated. There is of course the issue with unintended consequences if it goes into UpdateSeriesCatalog for datasets where that additional data isn't desired.

As of now the missing data essentially creates a discontinuous dataset which is a real bear to deal with if one expects a constant DateTime interval.

Thanks in advance for any insight you can provide.

-p

--
Peter Prokein
Institute of Northern Engineering
University of Alaska Fairbanks

office: (907) 474-2720
mobile: (907) 750-0115

Jeff Horsburgh

unread,
Apr 29, 2013, 12:07:15 PM4/29/13
to cuahsi-wds-us...@googlegroups.com
Peter,

Good to hear from you.  I have a couple of thoughts, although maybe not the answers you want.  The convention in ODM is that there are no Null values in the DataValues table.  For time intervals for which there should be values, but for some reason values are not collected, the convention is to insert a NoData value.  But, the ODM data loaders (both the regular data loader and the streaming data loader) key off the time stamps in the files that you give them and so only know that a value should be present if there is a timestamp in the file for the interval that should be recorded.  For example, it is not uncommon in our data to have "NaN" values and "-Inf" values reported by our dataloggers.  The Streaming Data Loader can replace those values in the database with NoData values, recognizing that they are erroneous values.  But, this only works because the timestamp actually exists in the datalogger file.  If there are gaps in the timestamps in your datalogger file, neither ODM data loader has functionality to automatically fill them for you.

The UpdateSeriesCatalog table stored procedure in ODM only affects the SeriesCatalog table and so I'm not sure that's the right place to fix this issue.  There may be a couple of things that could address this:  

1.  Why are the gaps occurring in the first place? Is is possible to change something in the way the data are being collected or transmitted to ensure that the gaps are not occurring?
2.  Could you develop a script that inserts the time steps you want into your data immediately after data are loaded?

Neither of these will solve the fact that the data are discontinuous because they can't create data that was not collected.  In doing real-time plotting you would have to either interpolate on the fly or filter out the NoData values anyway.  But, with all of the missing time intervals filled in with NoData values, it seems like it would be easier then to do the QA/QC.

--------------------------------------------------------------

Jeffery S. Horsburgh

Utah Water Research Laboratory

Utah State University

8200 Old Main Hill

Logan, UT 84322-8200

Phone: (435) 797-2946  Fax: (435) 797-3663

jeff.ho...@usu.edu   http://jeffh.usu.edu

---------------------------------------------------------------


--
You received this message because you are subscribed to the Google Groups "CUAHSI HIS User Group Forums" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cuahsi-wds-user-gro...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Prokein, Peter

unread,
Apr 29, 2013, 12:18:44 PM4/29/13
to cuahsi-wds-us...@googlegroups.com
Thanks for the quick reply Jeff.  Data gaps are often a function of either telemetry problems or sensor/site failure.  We have recently started to migrate to Iridium based telemetry away from Freewave radio over hundreds of miles requiring repeaters to alleviate some of the problems.  However, the sites that connect via GOES are notoriously bad at providing continuous real-time data.

I use Highcharts to plot the data and it requires null for nodata points or the plot won't come out right.  No biggie though, what I ended up doing was to just iterate over the data after I pull it out of the database to insert missing timestamps and null datavalues for those.

It makes perfect sense to not store actual missing data as null in the database.

-p

--
Peter Prokein
Institute of Northern Engineering
University of Alaska Fairbanks

office: (907) 474-2720
mobile: (907) 750-0115


Reply all
Reply to author
Forward
0 new messages