Handling scarce external data

Pila

May 10, 2020, 5:33:26 AM
to weewx-user
I have added one measurement (sea temperature) to WeeWx, which I get probably only 3 times a day. How should I handle such a scarce data source? As there are multiple variables that need to be balanced here, I would appreciate pointers. The small number of data points makes for annoyingly slow testing.

The theoretical maximum is 6 measurements between 7:00 and 17:00, but the frequency depends on the location. The temperature mostly varies by 1 deg C between measurements, rarely by 2. The number of measurements may vary. I fetch new data when it is available, and my system knows when a measurement is new. Currently, measurements are made at 8:00, 14:00 and 15:00. I have two questions.

1) I would like normal day/week/month graphs with all points connected. Data from my weather station is saved every 5 minutes. I follow the example set by pond.py to read the data as extraTemp1. I see 3 ways to supply the data to pond.py.

a) Save the data and have it available for only one reading. E.g. save it at 8:13, it is read by WeeWx at 8:15 and removed from pond.txt at 8:18. Until new data arrives, nothing is read by WeeWx into this field. I just turned this mode on to test it again.

b) Keep one reading available for one hour (or some other duration). The 8:00 reading is received by my system at 8:18. I can leave it available in pond.txt for 60 minutes and remove it at 9:18. No data is available for reading until a new measurement arrives.

c) Let the data be available until it is replaced with new data. I tried this yesterday and it makes for a very jagged and wrong plot. It keeps the last temperature from yesterday (the highest) overnight, and the plot drops suddenly at 8:00 the next morning to the lowest temperature.

Logically, I should collect data using a) and have it plotted with the points connected by a line. I tried it briefly with line_gap_fraction = 0 to connect my scarce data points, but I still had breaks.

2) On the index page, where the current values are shown, I should always see the last standing value. The longest interval with no data is between 15:00 and 8:00 the next day. This is related to methods a) and b) above. In the skin's index.html.tmpl I tried replacing $current.extraTemp1 with $latest.extraTemp1 but I was getting N/A.

I limit yscale to something like 15,25,1 to see the trends better.

Maybe I was too quick in testing a) and something did not refresh or regenerate. Since multiple features need to be balanced, I may have failed to see the desired result using method a), which I am now retesting.

Tom Keffer

May 11, 2020, 3:01:30 PM
to weewx-user
Interesting problem.

1. You should archive only what you observed. That is, when a datum is observed, it should be put in the database. Then you're done with it --- the value should not persist in the database. 

That means your strategy 1a is best --- you put in a value for temperature when it was observed. Otherwise, it's null.

You didn't say what software setup you have, but I can see a couple of options. One would be to include a field for sea temperature (call it 'seaTemperature') in your otherwise weather-oriented database schema. The archive interval would be 5 minutes or so, and the archive record, and database, would be mostly filled with weather data. Most values of seaTemperature would be null, but an occasional row would have a non-null value. In this case, seaTemperature would come from a WeeWX service.

Another option is a weewx instance dedicated to sea temperature. No weather data. Its "LOOP" packets would arrive only rarely, once every few hours, so most of the time this instance would be blocking on its driver, waiting for that rare packet. Its archive interval would be pretty long too, maybe 3 to 6 hours. The resultant database would have very few entries, but most would be non-null. In this case, you'd write a driver for seaTemperature.

Either approach would work.

2. Use aggregate type 'last'. It will return the last non-null value. For example,

<p>The last measured sea temperature is $day.seaTemperature.last at $day.seaTemperature.lasttime</p>

You don't want to use $latest.seaTemperature because that would give you seaTemperature in the last record of the database, which is likely to be null.
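The difference between "the value in the latest record" and "the last non-null value" can be illustrated in plain Python. This is a minimal sketch, not WeeWX code: the record layout and the helper names `latest_value` and `last_non_null` are assumptions made for illustration only.

```python
# Hypothetical archive rows, oldest first. seaTemperature is mostly None
# because it is only observed a few times a day.
records = [
    {"dateTime": 1589094000, "seaTemperature": 18.0},  # an actual observation
    {"dateTime": 1589094300, "seaTemperature": None},
    {"dateTime": 1589094600, "seaTemperature": None},
]


def latest_value(rows, field):
    """Value in the most recent row, whatever it is (may be None)."""
    return rows[-1].get(field) if rows else None


def last_non_null(rows, field):
    """Most recent non-null value and its timestamp, or (None, None)."""
    for row in reversed(rows):
        value = row.get(field)
        if value is not None:
            return value, row["dateTime"]
    return None, None
```

With sparse data the first helper almost always lands on a null row, which matches the N/A seen with `$latest`, while the second keeps returning the 8:00, 14:00 or 15:00 reading until a newer one arrives.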

-tk


--
You received this message because you are subscribed to the Google Groups "weewx-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to weewx-user+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/weewx-user/a0cb5a23-f0dd-494f-99cf-6f0a69bdaddc%40googlegroups.com.

Pila

May 19, 2020, 6:24:57 AM
to weewx-user
To make things shorter: OK, thanks, I will try but this is a slow thing to test :))

This sea temp is not important and I would avoid building anything separate for it. It is displayed together with the external temp. As I have been pulling it for days now, I can see there are 3 values every day: 8:00, 14:00 and 15:00.

I am pulling this data from our national weather service and extracting the temperatures. Then I save them to be read by pond.py. The interval for saving data is 5 minutes for everything.

Last night, I started another instance of the same problem. I am trying to find a better position for my external weather sensors. So, I keep one at its usual place where it is affected by the building (reads too high if sunny). The second sensor I moved to a potentially better place. For reference, I am pulling the official temperature as I do with the sea temperature above, but I get it only once per hour, with a 13-minute delay from the full hour.

So, if I repeat saving this same temp for the entire hour (as I do now), while for the other 2 sensors I get and save new data every 5 minutes, this one plot line gets quite jagged. It would be better if I could save the data as I get it (once per hour) and connect the points with a smooth line. But, since I did not manage to do it with the sea temp, I did not even try it with this hourly temp.

Pila

May 19, 2020, 7:37:42 AM
to weewx-user
2. Use aggregate type 'last'. It will return the last non-null value. For example,

<p>The last measured sea temperature is $day.seaTemperature.last at $day.seaTemperature.lasttime</p>

You don't want to use $latest.seaTemperature because that would give you seaTemperature in the last record of the database, which is likely to be null.

Thanks! I am glad to confirm .last does perfectly what I mistakenly tried to achieve with latest.

I have adjusted my program to provide the sea temp only when a new measurement is available, so by tomorrow I should have a few separate, rare data points to play with.

Doing that, I recalled I am already adding the hourly temperature at this other location, so I added it to the plot. The plot connects the hourly temp points well. I just do not like the square form of the hourly data, so I am off looking for how to make the lines connecting data points round :)

Progress is made :) Many thanks for the nudge in the right direction!

Pila

May 19, 2020, 12:48:55 PM
to weewx-user
I misspoke :( I was pulling the data every hour, but it was kept available for WeeWx to read for the entire hour, not just once per observation. I left it that way as I could not fix the plot gaps.

I adjusted both sporadic observations to be available to WeeWx only once. WeeWx saves all the other data every 5 minutes. But now the graphs do not even show these lone data points. One is the temperature observation every hour (the gap may be several hours) and the second is the sea temperature with 3 observations per day. Two different locations, separate computers and installations, and no new data is plotted for any of these observations. Everything else works as normal. Graphs are updated regularly.

My program places the data in pond.txt when it is available (say at the 13th minute), leaves it to be read by WeeWx at the 15th minute, and then removes it at the 18th minute. The index page correctly shows the last existing observations. Hence, I believe the data is correctly added to the database.
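An alternative to the timed write/read/remove sequence is to let the reader remember which observation it has already consumed, so the file can stay in place. The following is only a sketch under assumptions: the `observation_epoch,value` line format and the `OnceReader` class are hypothetical and are not what pond.txt or pond.py use in this thread.

```python
class OnceReader:
    """Hand out each observation at most once, keyed by its timestamp."""

    def __init__(self):
        self.last_seen = None  # epoch of the newest observation consumed

    def read(self, line):
        """Return the value if the line holds a new observation, else None."""
        parts = line.strip().split(",")
        if len(parts) != 2:
            return None
        try:
            obs_time, value = int(parts[0]), float(parts[1])
        except ValueError:
            # Empty or malformed field: treat as "nothing to read".
            return None
        if self.last_seen is not None and obs_time <= self.last_seen:
            return None  # same observation as before, already consumed
        self.last_seen = obs_time
        return value
```

The design choice here is that freshness is decided by the observation time rather than by when the file happens to be written or cleared, so the reader no longer depends on the producer and the archive interval staying in step.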

As for the plot: I am using line_gap_fraction. The docs are very brief. They say:

... No line_gap_fraction specified ... to connect data points. The only thing I can do is to add:

line_gap_fraction = 0

and it seems to be ignored. No data points are shown and nothing is connected since I changed the availability of the data from permanent to a single reading per new observation.

Another related question arose. WeeWx reads my data using pond.py. Now that I have started serving it empty fields, it logs a line:

pond: cannot read value: could not convert string to float

Data is prepared for pond.py and saved to pond.txt like this:

18,24.4 (both temps available)

or

,24.4   (no sea, only air)

or just a , (only comma in the pond.txt) when no data is to be read by the WeeWx.

Is that a problem? They are not logged as errors, but observation of a fact: no data to read.

Pila

May 19, 2020, 3:22:38 PM
to weewx-user
Not working. I have two problems.

1) First location has this sporadic (hourly) temp as the last temp it is reading from pond.txt. When the field is empty, it logs "pond: cannot read value: could not convert string to float:." as defined in pond.py. When there is a temp, it is properly added to the database. If it matters, new sporadic data is saved at :15 minutes each hour.

The plot is ignoring all the data saved at one-hour intervals. Plotting stopped with the last data point saved at the 5-minute interval. The last plotted data point was at 13:15. All further hourly data points from 14:15 to 20:15 inclusive are not plotted. After I reverted to the old way, data is plotted again from 20:30 on.

That particular plot displays 3 different temperatures: the other 2 are saved at 5-minute intervals.

2) The second location apparently breaks after the empty field, when I save the hourly data into the second position in pond.txt, like ",24.1" without quotes. pond.py never loads the number after the empty field and the comma into the database. As soon as I reverted to saving both temps, both fields are read correctly.

What should I save as an empty field so that pond.py does not stop reading after it? There is no data, so logically I shouldn't save anything there. Seems like a Python problem? I do not know Python. Should I read the data differently?

I am saving data into pond.txt like this:
Location 1:
23.4,21.8,22.4,23.3,18.0,17.4
Location 2:
18,24.1

Reading it with code modified to read multiple comma separated fields:

with open(self.filename) as f:
    line = f.readline()
    value = line.split(',')
    syslog.syslog(syslog.LOG_DEBUG, "pond: found value of %s" % value)
event.record['leafTemp1'] = float(value[0])
event.record['extraTemp2'] = float(value[1])
...
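The symptom in problem 2 is consistent with how Python's `float()` behaves: `float('')` raises `ValueError`, and if all the assignments sit inside a single `try`/`except` (as the stock pond.py example appears to do), the first empty field aborts every assignment after it. A minimal sketch of a more tolerant parser follows; `parse_fields` is a hypothetical helper, not part of pond.py or WeeWX.

```python
def parse_fields(line, names):
    """Map comma-separated text onto observation names.

    Empty or non-numeric fields are skipped individually, so one missing
    value does not prevent the fields after it from being read.
    """
    record = {}
    for name, raw in zip(names, line.strip().split(",")):
        raw = raw.strip()
        if not raw:
            continue  # nothing observed for this field
        try:
            record[name] = float(raw)
        except ValueError:
            continue  # malformed field: skip it, keep going
    return record
```

Inside the service, the result could then be merged with something like `event.record.update(parse_fields(line, ['leafTemp1', 'extraTemp2']))`, so that keys are only set for fields that actually held a number.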



Tom Keffer

May 19, 2020, 3:39:34 PM
to weewx-user
I'm not following, but in the hope that we get lucky with a random answer, in WeeWX, if a data value is not available, maybe because the sensor is offline, or can't produce a value, you set the value to None. If the data value just plain doesn't exist, maybe because you don't own that sensor, you leave it out of the record completely.

-tk


Pila

May 19, 2020, 5:48:06 PM
to weewx-user
The plot will not draw a line when data points are one hour apart. Only when data points are 5 minutes apart is a line plotted.

But, since I am unable to fix Python breaking when a field is missing, I cannot continue.

Tom Keffer

May 19, 2020, 5:57:31 PM
to weewx-user
If a line has an embedded value of None, it is considered to be two separate line segments, regardless of the setting of line_gap_fraction. So, I would suggest leaving the data value out entirely, rather than setting it to None.

Does that make sense?
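How a None value splits a line can be sketched in plain Python. This is not the WeeWX plotting code: `split_segments` and `drawable` are hypothetical helpers, the gap rule assumes the documented meaning of line_gap_fraction (a gap wider than that fraction of the x-axis breaks the line), and the idea that a run of one point draws nothing is an assumption.

```python
def split_segments(points, x_min, x_max, line_gap_fraction=None):
    """Split (x, y) points into runs that would be drawn as one line each.

    A run ends at a None value, or where the x gap to the previous point
    exceeds line_gap_fraction * (x_max - x_min).
    """
    max_dx = None
    if line_gap_fraction is not None:
        max_dx = line_gap_fraction * (x_max - x_min)
    segments, current, prev_x = [], [], None
    for x, y in points:
        if y is None:
            # An embedded None always closes the current run.
            if current:
                segments.append(current)
            current, prev_x = [], None
            continue
        if current and max_dx is not None and x - prev_x > max_dx:
            segments.append(current)
            current = []
        current.append((x, y))
        prev_x = x
    if current:
        segments.append(current)
    return segments


def drawable(segments):
    """Assumption: only runs of two or more points produce a visible line."""
    return [s for s in segments if len(s) >= 2]
```

Under these assumptions, an observation surrounded by null rows forms a run of length one, so it would be neither connected nor drawn, whatever line_gap_fraction is set to.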

-tk

On Tue, May 19, 2020 at 2:48 PM Pila <mrzim...@gmail.com> wrote:
Plot will not draw line when data points are one hour apart. Only when data points are 5 minutes apart, line will be plotted.

But, since I am unable to fix python breaking when a field is missing, I can not continue.


Pila

May 20, 2020, 6:48:37 AM
to weewx-user
If a line has an embedded value of None, it is considered to be two separate line segments, regardless of the setting of line_gap_fraction. So, I would suggest leaving the data value out entirely, rather than setting it to None.

Does that make sense?


I believe I got it! I think my problem is with using the pond.py to read the data and how it reacts when the field is empty. If I understand well, a command:


event.record['leafTemp1'] = float(value[0])

will add the data to the database even if there is nothing to be added. Instead of simply skipping the add operation, it will add empty data represented by None. None breaks the line in the plot. If the above line did not execute, nothing would be placed in the database and the line would not be broken.

Right now I save something in each pass: real data or "Hey, this is empty!". Instead, I want to save the data only once per hour and nothing in between!

I do not know Python but am a programmer. I expected this in pond.py should fix the issue:
if value[0]:
    event.record['leafTemp1'] = float(value[0])

I tried this but it does not seem to work. But, if I am on the right track, I may have been too quick. To test, I can clear the fields between measurements.

 

Tom Keffer

May 20, 2020, 8:06:55 AM
to weewx-user
Sorry. My fault for not thinking this through. It will still end up being a null value in the database, so when it's read back in, it takes on the value of None.

I think you're back to either a separate database, or using large aggregation intervals and using aggregation type ".last".
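The second suggestion, a large aggregation interval combined with `last`, amounts to binning the sparse series. A minimal sketch in plain Python, not WeeWX's implementation; `aggregate_last` is a hypothetical helper, and the bin convention (open on the left, closed on the right, labelled by its end time) is an assumption.

```python
def aggregate_last(points, start, stop, interval):
    """Reduce time-ordered (x, y) points to one value per bin.

    Each bin covers start < x <= start + interval and keeps the last
    non-null value found in it, or None if the bin holds no observation.
    """
    bins = []
    t = start
    while t < stop:
        last = None
        for x, y in points:
            if t < x <= t + interval and y is not None:
                last = y  # later points overwrite earlier ones
        bins.append((t + interval, last))
        t += interval
    return bins
```

Note that a bin with no observation at all still comes out as None, so for the line to be unbroken the interval would have to be at least as long as the longest stretch without data (here, 15:00 to 8:00 the next day).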

-tk



Pila

May 20, 2020, 11:46:19 AM
to weewx-user
Precisely what I found out: there are no (soilTemp2 = '' ) records, records are either: (soilTemp2 IS NULL) or (soilTemp2 > 1). There is no third option.
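The "no third option" observation can be reproduced with SQLite directly. This is a sketch with a deliberately tiny, hypothetical `archive` table (the real WeeWX schema has many more columns): a column that is left out of the INSERT and a column that is explicitly set to None both end up as NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE archive (dateTime INTEGER PRIMARY KEY, outTemp REAL, soilTemp2 REAL)"
)
# Row where the sporadic observation is present.
conn.execute(
    "INSERT INTO archive (dateTime, outTemp, soilTemp2) VALUES (?, ?, ?)",
    (1589976900, 21.5, 20.1),
)
# Row where the field was simply left out of the record.
conn.execute(
    "INSERT INTO archive (dateTime, outTemp) VALUES (?, ?)",
    (1589977200, 21.6),
)
# Row where the field was explicitly set to None.
conn.execute(
    "INSERT INTO archive (dateTime, outTemp, soilTemp2) VALUES (?, ?, ?)",
    (1589977500, 21.7, None),
)

rows = conn.execute(
    "SELECT dateTime, soilTemp2 FROM archive ORDER BY dateTime"
).fetchall()
null_count = conn.execute(
    "SELECT COUNT(*) FROM archive WHERE soilTemp2 IS NULL"
).fetchone()[0]
# The most recent row that actually holds an observation.
last_row = conn.execute(
    "SELECT dateTime, soilTemp2 FROM archive "
    "WHERE soilTemp2 IS NOT NULL ORDER BY dateTime DESC LIMIT 1"
).fetchone()
```

Both of the later rows read back as None, which is why skipping the assignment in the service cannot keep nulls out of a shared 5-minute archive.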

But what still bugs me: the plot does not acknowledge sporadic data in the soilTemp2 field in any way (no dot, no segment, nothing). If the data is present in every record, spaced 5 minutes apart, all is well. As soon as one record has an empty soilTemp2 field, the plot breaks. It does not display a lone dot or a lone segment. Nothing. The plot resumes only when both neighbouring records have data in the soilTemp2 field.

But OK, I can try this variants now :) Thanks.



Pila

May 20, 2020, 2:19:23 PM
to weewx-user
This possibly returns me to my first try. Why does line_gap_fraction have no effect? It describes the problem perfectly. I have a gap in the data. But the data points are neither connected nor even shown when they are in the gap. The docs say:

If there is a time gap in the data, the option line_gap_fraction controls how line plots will be drawn.