Bad Data after Maintenance

160 views
Skip to first unread message

Andre

unread,
Jul 6, 2020, 10:28:14 AM7/6/20
to weewx-user
After maintenance of my Davis Vantage Vue, I have an extremely high value of 13107.0 mm/hr on 4 July 2020 09:33 for the value "highest rain rate per day". How can I delete or correct this high value from the database?

gjr80

unread,
Jul 6, 2020, 10:33:07 AM7/6/20
to weewx-user
Hi,

The wiki article “Cleaning up old ‘bad’ data” (https://github.com/weewx/weewx/wiki/Cleaning-up-old-'bad'-data) would be a good starting point.

Gary

Andre

unread,
Jul 6, 2020, 1:49:39 PM7/6/20
to weewx-user
Thanks, Gary.

I've tried this

echo "SELECT * FROM archive WHERE (RainRate > 13);" | sqlite3 /home/pi/weewxcopy.sdb
1581284100|1|5|28.994532150147|||29.005||||||||||||4.72|43.3322839672562||0.0|||||||||||||||||||||46.1|||||||50.0|69.5||-88.0|-90.0|2.0|8.0||||||||||||90.0|46.1||||||28.8686683318007||0.11811023625||13.3385826805||99.9375|||||||||||||||2.0|8.0|2.0|8.0|-88.0|-90.0|-88.0|-90.0||0.0||||46.1|202.5|10.0|337.5||2.0
1581284400|1|5|28.9915092022014|||29.002||||||||||||4.72|41.9497362666115||0.0|||||||||||||||||||||44.7|||||||50.0|69.5||-88.0|-90.0|2.0|8.0||||||||||||90.0|44.7||||||28.8656567623561||0.18110236225||13.3385826805||99.9375|||||||||||||||2.0|8.0|2.0|8.0|-88.0|-90.0|-88.0|-90.0||0.0||||44.7|45.0|14.0|112.5||3.0

but I can't see which data is bad. Is it possible that Highest Daily RainRate is a calculated value? If so what will be the right way to fix my 13107 mm/hr highest daily rainrate on 4 July 2020 09:33?
Message has been deleted

gjr80

unread,
Jul 7, 2020, 1:14:09 AM7/7/20
to weewx-user
Probably easiest to limit the columns being returned (* returns all columns). Try something like:

echo "SELECT dateTime,rainRate FROM archive WHERE (RainRate > 13);" | sqlite3 /home/pi/weewxcopy.sdb

That should return two columns only; the first will be dateTime and the second rainRate. Remember also that the number used in your query needs to be in the rainRate units used in your database. If your database uses US units (the default) it will be in inches, if METRIC it is cm, if METRICWX it is mm. You can check what unit system your database uses by looking at the target_unit setting under [StdConvert] in weewx.conf or you can add the field usUnits to the above query:

echo "SELECT dateTime,usUnits,rainRate FROM archive WHERE (RainRate > 13);" | sqlite3 /home/pi/weewxcopy.sdb

1 = US, 16 = METRIC, 17 = METRICWX

Any daily rainRate aggregate will be calculated or come directly from the rainRate daily summary table, but for the purposes of fixing 'bad' data that is largely irrelevant. The daily summaries are part of an optimization strategy that improves performance when obtaining aggrgegates over multiples of one day. The daily summary tables are derived from data in the archive table. So what? Well when it comes to fixing 'bad' data it means that you need to fix the 'bad' data in the archive table and then rebuild the daily summaries once you have removed the 'bad' data from the archive tale. If you work on fixing the bad data in the daily summary then it might fix the symptom short term but underlying 'bad' data in the archive table will remain and eventually there will be a daily sumamry rebuild which will overwrite your corrected data with the 'bad'.

Gary

Andre

unread,
Jul 7, 2020, 7:06:06 AM7/7/20
to weewx-user
Thanks again, Gary.
This is my result of the query

 echo "SELECT dateTime,usUnits,rainRate FROM archive WHERE (RainRate > 13);" | sqlite3 /home/pi/weewxcopy.sdb
1581284100|1|13.3385826805
1581284400|1|13.3385826805

I don't understand the dateTime 1581284100 which means 02-09-2020, 22:35:00. But my "bad data" are from 4 July 2020 09:33. I can't see where the "bad data" come from.

Graham Eddy

unread,
Jul 7, 2020, 7:20:04 AM7/7/20
to weewx...@googlegroups.com
usUnits==1 → rainRate is inches/hour. are you looking for > 13 inches/hour?

also, for easier reading of timestamps, you could use
  SELECT datetime(dateTime,’unixepoch’,’localtime'),usUnits,rainRate FROM archive ...

Andre

unread,
Jul 7, 2020, 2:03:27 PM7/7/20
to weewx-user
My first thought was that it would be a good idea to search for "bad data" in rainRate.
So I tried again to search with
 
echo "SELECT datetime(dateTime,'unixepoch','localtime'),usUnits,rainRate FROM archive WHERE (RainRate);" | sqlite3 /home/pi/weewxcopy.sdb

Please see result.txt - As you can see, no "bad data" is apparent. Also my HighCharts images are without any abnormal outlier.
My actual skin is Belchertown and maybe it could be a display issue. Not sure.
You could see it live here www.wetter-norderstedt.de - please take a look at "Stärkster Regen: 13107,0 mm/h" on frontpage.
I'm now not sure where I can take another look for identifying wrong values.
result.txt

Andre

unread,
Jul 7, 2020, 2:41:00 PM7/7/20
to weewx-user
I've noticed in result.txt there is a 5 hour gap between 2020-07-04 07:05:00|1|0.0314960628 and 2020-07-04 12:00:00|1|0.1496062983.
Could this caused my issue?

Graham Eddy

unread,
Jul 7, 2020, 3:14:46 PM7/7/20
to weewx...@googlegroups.com
there would be lots of such gaps - you excluded rainRate==0 from your selection

if there are no bad values now in the 5 minute data, presumably the bad values are in summary data, so rebuild your summary tables (instructions in manual)

after that, if no bad values in 5 minute data or in summary data, it would have to be the graph generation software

--
You received this message because you are subscribed to the Google Groups "weewx-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to weewx-user+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/weewx-user/c744f102-1a69-461b-b69b-4e48faa5e96do%40googlegroups.com.

gjr80

unread,
Jul 8, 2020, 2:28:42 AM7/8/20
to weewx-user
I suspect you will find the offending value in the rainRate daily summary table, the following can be used to check:

echo "SELECT datetime(dateTime,'unixepoch','localtime'),max,datetime(maxtime,'unixepoch','localtime') FROM archive_day_rainRate WHERE (max>13);" | sqlite3 /home/pi/weewxcopy.sdb

One mechanism that could cause this is if under [StdArchive] you have loop_hilo = true (the default). This will cause WeeWX to include loop packet data when determining the high and low value for each observation for the day. The vantage stations emit rainRate in each loop packet. According to Davis application note 28 rainRate is calculated every time the rain gauge bucket tips and is based on the elapsed time since the previous bucket tip. It is not clear how rainRate is calculated in a console generated archive record, but I think it would be fair the say it would be some sort of averaged value. Consequently in a given archive period some loop rainRate values will be higher than the archive record rainRate and some will be lower. If loop_hilo = True then WeeWX will record the highest loop rainRate value in the max field in the applicable days row in the rainRate daily summary. On the other hand the archive table will only record the rainRate value from the corresponding archive record and the max rainRate value is not used (recorded) at all in the archive table (note that if software record generation was used WeeWX would calculate the archive record rainRate value as the average of the loop rainRate values seen in the archive period).

So seeing a rainRate value in the daily summary tables that is substantially higher than the highest rainRate value in the archive table is quite possible, in fact I would say almost certain. A similar effect happens with other observations but the effect tends to not be noticed very much as temperatures, pressures etc tend not to vary much between loop packets whereas rainRate can and does vary significantly. A similar effect will likely exist for wind speeds where there can be a rapid change in speed between loop packets.

So it comes down to is 1300mm/hr an unreasonable figure. Using the the application note 28 informtion and a 0.2mm rain bucket a rain rate of 1300mm/h would equate to 6500 tips/hour or 0.55 seconds between tips. Is it possible to have 0.55 seconds between two tips (remember based on the Davis method of calcualting rainRate it only takes two tips 0.55 seconds apart to register 1300 mm/hr)? I don't know, probably unlikely but I don't think you can say impossible.

In any case I guess it is up to you whether you leave it or clear it. Simply rebuilding the daily sumamries will clear the 1300 mm/hr value and re-calaculate the rainRate max/min/maxtime/mintime based on the archive data.

Of course I could be completey wrong and this culd be a totally bogus value that has somehow appeared in the rainRate daily summary, but given it occurred on a day when you had a good bit of rain.....

Gary


Andre

unread,
Jul 8, 2020, 8:05:10 AM7/8/20
to weewx-user
Gotcha! Many thanks, Gary!

I've tried this

echo "SELECT datetime(dateTime,'unixepoch','localtime'),max,datetime(maxtime,'unixepoch','localtime') FROM archive_day_rainRate WHERE (max>500);" | sqlite3 /home/pi/weewxcopy.sdb

and the result is

2020-07-04 00:00:00|516.0236188995|2020-07-04 09:33:45

This is exactly the bad value.

My question is now, do I NULL this entry or do I delete this value?
What is the correct  SELECT statement for this?

Regards, Andre

gjr80

unread,
Jul 8, 2020, 8:28:17 AM7/8/20
to weewx-user
Just use wee_database to rebuild the daily summaries for that day only (use the —rebuild-daily and —date command line options).

Gary

Andre

unread,
Jul 8, 2020, 12:35:36 PM7/8/20
to weewx-user
Everything is fine now except the monthly maximum rain rate on the homepage. It is done via AJAX and I have no idea how to correct this. Maybe Pat reads along here and can give me a hint. See "Stärkster Regen" on my Homepage www.wetter-norderstedt.de

Vetti52

unread,
Jul 26, 2020, 12:07:43 PM7/26/20
to weewx-user
This thread is a very practical recipe for deleting unwanted data. I could reproduce it step by step, except, that the data was shown in archive_day_rain. Thanks a lot!

One thing, I had to consider, was, that after rebuilding, the wrong data in the archive_day_rain were shown again. I found, that the data existed in the archive filed rain also, which were obviously written back to the archive_day_rain table. As I am not familiar with SQL, I found with sqliteadmin.exe a nice graphical explorer and editor to proceed.
So I just replaced the entries with 0 in both data sets and wrote it back to its original site. I hope, this is ok like that. At least, it looks nice now on the weewx web site.
Well, as I did this in a runnig system, all of it had to be completed within 5 minutes.

Peter

Am Mittwoch, 8. Juli 2020 10:28:17 UTC+2 schrieb gjr80:
Reply all
Reply to author
Forward
0 new messages