Consolidate database interval records?


Pat

Oct 15, 2019, 6:10:21 PM
to weewx-development
Back when I started with weewx in 2015 I had a 1-minute archive interval. I kept it that way for a few months until I came to understand better how the archive interval works.

I am performing some wee_import functions to run calc_missing on some records. wee_import warns about the mismatch in the intervals, which got me thinking:

Is it possible to consolidate all my interval "1" records to match the rest of my database at 5-minute intervals?

Thomas Keffer

Oct 15, 2019, 8:41:41 PM
to Pat, weewx-development
Yes, it's certainly possible, but you would have to write a Python script to do it. You could start with the transfer_db.py script, located in the examples subdirectory. 
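A minimal sketch of what such a script could look like (this is not transfer_db.py; the cut-down schema, the function name, and the use of sqlite3 are assumptions for illustration, and a real script would cover the full archive schema, summing rain/ET and taking max(windGust)):

```python
import sqlite3

def consolidate(conn, src_interval=1, dest_interval=5):
    """Replace interval=src_interval records with averaged dest_interval ones.

    Minimal sketch only: assumes a cut-down archive table of
    (dateTime, usUnits, interval, outTemp). A real script must handle
    the full WeeWX schema, summing rain/ET and taking max(windGust).
    """
    secs = dest_interval * 60
    rows = conn.execute(
        "SELECT dateTime, usUnits, outTemp FROM archive "
        "WHERE interval = ? ORDER BY dateTime",
        (src_interval,)).fetchall()

    buckets = {}
    for ts, us_units, out_temp in rows:
        # Archive records are stamped at the END of their interval, so
        # shift back one source interval before assigning a bucket.
        key = (ts - src_interval * 60) // secs
        buckets.setdefault(key, []).append((ts, us_units, out_temp))

    new_rows = []
    for recs in sorted(buckets.values()):
        last_ts = max(r[0] for r in recs)
        temps = [r[2] for r in recs if r[2] is not None]
        avg_temp = sum(temps) / len(temps) if temps else None
        new_rows.append((last_ts, recs[0][1], dest_interval, avg_temp))

    conn.execute("DELETE FROM archive WHERE interval = ?", (src_interval,))
    conn.executemany("INSERT INTO archive VALUES (?, ?, ?, ?)", new_rows)
    conn.commit()
```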

--
You received this message because you are subscribed to the Google Groups "weewx-development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to weewx-developm...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/weewx-development/99ddc189-5354-4c2f-88d0-27e0df50db5d%40googlegroups.com.

Pat

Oct 15, 2019, 9:31:16 PM
to weewx-development
OTOH, I'm not entirely sure how I'd merge these records successfully and accurately. I could certainly export this subset of records to a CSV and use wee_import. Would wee_import's interval = conf option be a way to accomplish this?



gjr80

Oct 15, 2019, 11:02:45 PM
to weewx-development
wee_import does not do any form of record aggregation; it just reads in the records as they appear in the source and applies one of three methods of determining the interval field for each record:
  1. Derive the interval field as the difference between successive record timestamps. This works well for complete sets of data (ie no missing records), but not for source data that is incomplete (eg WU).
  2. Set the interval field equal to the weewx.conf interval value. This suits importing your own records, or where the weewx.conf interval otherwise matches the source interval, and handles sources with some missing records better.
  3. Force the interval field to some fixed number. Again, this suits a source with a constant interval but some missing records.
At the end of the day, if you have 100 records in your source you will end up with 100 records in your imported data.
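The three methods Gary describes correspond to the interval option in the wee_import import configuration file. A hedged fragment (section name and file path are placeholders from the shipped csv-example.conf; check the wee_import documentation for your version):

```ini
[CSV]
    file = /var/tmp/data.csv
    # One of the three methods above:
    interval = derive   # difference between successive timestamps
    # interval = conf   # use the interval from weewx.conf
    # interval = 5      # force a fixed interval, in minutes
```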

If you had source data with one-minute-spaced timestamps and you imported it with a five-minute interval, you would have an obvious discrepancy between the timestamps and the interval field of your imported data. I expect (not tested) that wee_import would not complain. Subsequent use of the imported data by WeeWX might cause a problem; I would expect aggregates to be fine, but something somewhere might get upset.

My advice: leave it as it is unless it is causing an issue.

Gary

Pat

Oct 16, 2019, 5:26:15 PM
to weewx-development
Thanks. It's not causing an issue; consolidating would just clean up about 20,000 rows and keep things consistent.

Cameron D

Oct 16, 2019, 9:09:26 PM
to weewx-development
Is there any reason you could not just use sqlite's own consolidation functions...

create temporary table cons5min as
select max(dateTime) as dateTime,
    usUnits,
    5 as interval,
    avg(barometer) as barometer,
    avg(inTemp) as inTemp,
    <all the other data>
from archive
where interval = 1
group by (dateTime-60) / 300;
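The bucketing trick can be tried out on a toy archive. A sketch (cut-down column set, sqlite3 driven from Python; names are illustrative) showing five one-minute records collapsing into each 5-minute row:

```python
import sqlite3

# Toy demonstration of the grouping trick on a cut-down archive table.
# (dateTime - 60) / 300 is integer division here (both operands are
# SQLite INTEGERs), so five one-minute records, each stamped at the end
# of its minute, share one 5-minute bucket.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE archive "
             "(dateTime INTEGER, usUnits INTEGER, interval INTEGER, outTemp REAL)")
conn.executemany("INSERT INTO archive VALUES (?, 1, 1, ?)",
                 [(i * 60, float(i)) for i in range(1, 11)])

rows = conn.execute(
    "SELECT max(dateTime), 5 as interval, avg(outTemp) "
    "FROM archive WHERE interval = 1 "
    "GROUP BY (dateTime - 60) / 300").fetchall()
print(rows)  # one row per 5-minute bucket
```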



choosing appropriate aggregate functions, such as max for windGust.

Then delete the rows with interval = 1, and:

insert into archive select * from cons5min;


Thomas Keffer

Oct 16, 2019, 9:16:47 PM
to weewx-development
Very clever!

Thanks, Cameron! Now I know who to email if I get stuck with SQL. :-)

-tk


Pat

Oct 16, 2019, 9:19:19 PM
to weewx-development
This is great! 

I use MySQL primarily as my source, but frequently flip/flop for testing various things. I can export the 20,000 rows to SQLite, run this then import the new rows back to MySQL. Thanks!

Thomas Keffer

Oct 16, 2019, 9:29:35 PM
to Pat, weewx-development
The command should work in MySQL as well.


Pat

Oct 16, 2019, 9:31:12 PM
to weewx-development
Excellent. Good to know. I have a disposable SQLite database already set up to go. Working on finishing the query now as practice. We already know my SQL is a little weak tonight :-)

Cameron D

Oct 16, 2019, 9:32:53 PM
to weewx-development
In that case, just work in mysql directly - it is generally more versatile than sqlite3.

I see no benefit in transferring to sqlite3 as an intermediate, I was just assuming that was where your data was stored.

It goes without saying that you back up your db before embarking on this.

Thomas Keffer

Oct 16, 2019, 9:35:39 PM
to Pat, weewx-development
One thing though: you might have to FLOOR the "GROUP BY" clause:

create temporary table cons5min as
select max(dateTime) as dateTime,
    usUnits, 5 as interval,
    avg(barometer) as barometer,
    avg(inTemp) as inTemp,
    <all the other data>
    from archive
    where interval = 1
    group by FLOOR((dateTime-60) / 300);

Otherwise things get grouped by the floating-point result. That is, they don't get grouped at all.
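The effect is easy to see outside SQL as well: with true division every one-minute timestamp gets a unique group key, so nothing is actually grouped. A quick sketch in plain Python:

```python
import math

timestamps = [i * 60 for i in range(1, 11)]   # ten 1-minute records

# Floating-point division: 0.0, 0.2, 0.4, ... -- every record is its
# own "group", so GROUP BY collapses nothing.
float_keys = {(ts - 60) / 300 for ts in timestamps}

# Floored division: the same records fall into two 5-minute buckets.
floor_keys = {math.floor((ts - 60) / 300) for ts in timestamps}

print(len(float_keys))   # 10 distinct keys: no grouping at all
print(len(floor_keys))   # 2 keys: two proper 5-minute buckets
```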




Cameron D

Oct 16, 2019, 11:02:26 PM
to weewx-development
I had already tried it in sqlite3 - the calcs were done as INT without needing to cast.
Not sure about mysql.
OTOH, a FLOOR will make it doubly safe.

Vince Skahan

Oct 17, 2019, 10:32:02 AM
to weewx-development
On Wednesday, October 16, 2019 at 6:09:26 PM UTC-7, Cameron D wrote:
Is there any reason you could not just use sqlite's own consolidation functions...


oooh that's pretty slick - perhaps wiki-worthy ?????

Cameron D

Oct 18, 2019, 2:07:44 AM
to weewx-development
If it is worth a wiki entry, then much better if Pat could post the full script - my version has lots of bits left to the imagination.

Pat

Oct 18, 2019, 8:46:10 AM
to weewx-development
I'm sure I'm not the only one who was looking to do this, so the wiki makes sense. If this goes into the wiki, a more automated approach would probably be best.

Here's what I ended up doing. A little more manual than I'd like, but I had my MySQL database in SQLite already, so I just worked on that copy as my test database.
  1. I ran the SQL Query below and did a manual QC spot check on the records. 
  2. When I felt like it seemed acceptable, I just exported the temporary table from SQLite to CSV (using DB Browser for SQLite on Windows)
  3. Then on MySQL I did DELETE FROM archive WHERE `interval` = 1; 
  4. Then imported the CSV file to MySQL
If someone could review the aggregation for each observation I chose, that would be appreciated. (I have backups so I can always do this again if it's not right.) 

create temporary table cons5min as
select max(dateTime) as dateTime,
    usUnits,
    5 as interval,
    avg(barometer) as barometer,
    avg(pressure) as pressure,
    avg(altimeter) as altimeter,
    avg(inTemp) as inTemp,
    avg(outTemp) as outTemp,
    avg(inHumidity) as inHumidity,
    avg(outHumidity) as outHumidity,
    avg(windSpeed) as windSpeed,
    max(windDir) as windDir,
    max(windGust) as windGust,
    avg(windGustDir) as windGustDir,
    avg(rainRate) as rainRate,
    avg(rain) as rain,
    avg(dewpoint) as dewpoint,
    avg(windchill) as windchill,
    avg(heatindex) as heatindex,
    avg(ET) as ET,
    avg(radiation) as radiation,
    avg(UV) as UV,
    avg(extraTemp1) as extraTemp1,
    avg(extraTemp2) as extraTemp2,
    avg(extraTemp3) as extraTemp3,
    avg(soilTemp1) as soilTemp1,
    avg(soilTemp2) as soilTemp2,
    avg(soilTemp3) as soilTemp3,
    avg(soilTemp4) as soilTemp4,
    avg(leafTemp1) as leafTemp1,
    avg(leafTemp2) as leafTemp2,
    avg(extraHumid1) as extraHumid1,
    avg(extraHumid2) as extraHumid2,
    avg(soilMoist1) as soilMoist1,
    avg(soilMoist2) as soilMoist2,
    avg(soilMoist3) as soilMoist3,
    avg(soilMoist4) as soilMoist4,
    avg(leafWet1) as leafWet1,
    avg(leafWet2) as leafWet2,
    avg(rxCheckPercent) as rxCheckPercent,
    avg(txBatteryStatus) as txBatteryStatus,
    avg(consBatteryVoltage) as consBatteryVoltage,
    avg(hail) as hail,
    avg(hailRate) as hailRate,
    avg(heatingTemp) as heatingTemp,
    avg(heatingVoltage) as heatingVoltage,
    avg(supplyVoltage) as supplyVoltage,
    avg(referenceVoltage) as referenceVoltage,
    avg(windBatteryStatus) as windBatteryStatus,
    avg(rainBatteryStatus) as rainBatteryStatus,
    avg(outTempBatteryStatus) as outTempBatteryStatus,
    avg(inTempBatteryStatus) as inTempBatteryStatus
from archive where interval = 1
group by (dateTime-60) / 300;

Thomas Keffer

Oct 18, 2019, 9:05:32 AM
to Pat, weewx-development
Almost. The aggregation type is wrong for some of the columns. You want

sum(rain) as rain
sum(ET) as ET
sum(hail) as hail

There is also the problem of windDir and windGustDir. To do these properly, you have to switch from polar notation (scalar, direction) to vector components (xvec, yvec), and do the averaging with the latter. 

Something like (I'm making this up and it is NOT TESTED):

avg(windSpeed * cos((90.0 - windDir) / 57.3)) as xvec,
avg(windSpeed * sin((90.0 - windDir) / 57.3)) as yvec,
avg(windSpeed) as windSpeed,
90.0 - arctan2(yvec, xvec) * 57.3 as windDir

No idea whether SQL offers math functions, but hopefully you get the idea.
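For reference, here is a hedged Python transcription of that idea (the function and variable names are mine, and note that the standard atan2 takes its arguments as (y, x)). Directions that straddle north, such as 350 and 10 degrees, average to north rather than to the naive arithmetic mean of 180:

```python
import math

def avg_wind_dir(speeds, dirs_degrees):
    """Speed-weighted vector average of compass wind directions.

    Transcription of the sketch above; names are illustrative.
    Compass convention: 0 = north, 90 = east. Each (speed, direction)
    pair becomes east/north components; those are averaged, then
    converted back with atan2(east, north).
    """
    xs, ys = [], []
    for spd, wdir in zip(speeds, dirs_degrees):
        rad = math.radians(wdir)
        xs.append(spd * math.sin(rad))   # east component
        ys.append(spd * math.cos(rad))   # north component
    xbar = sum(xs) / len(xs)
    ybar = sum(ys) / len(ys)
    # atan2 takes (y, x); passing (east, north) maps straight back to
    # compass degrees without extra rotation gymnastics.
    return math.degrees(math.atan2(xbar, ybar)) % 360.0

# 350 and 10 straddle north: the result lands at (or within
# floating-point noise of) 0/360, not 180.
print(avg_wind_dir([5.0, 5.0], [350.0, 10.0]))
```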

-tk





Cameron D

Oct 18, 2019, 11:03:57 AM
to weewx-development
SQLite3 has no trig functions, however the recent versions of "DB Browser for sqlite" (sqlitebrowser.org) have them as an extension pack.

I suppose the windGustDir should not be averaged - you need to take the value at the time of the chosen max(windGust).
I think that will take another pass to load into the temporary table, but I'd need to think about it and do a bit of experimentation. Tomorrow (unless someone else can offer the answer).



Thomas Keffer

Oct 19, 2019, 8:36:36 AM
to Cameron D, weewx-development
Very sweet, Cameron! 

Somebody capture this in the Wiki!

-tk

On Sat, Oct 19, 2019 at 5:28 AM Cameron D <Cgo...@davidsoncj.id.au> wrote:
Was a bit more painful than necessary, but...

There are now 3 temporary tables and intermediate calculations.
It might be possible to do it more concisely, but this way gives you intermediate steps to check if something goes wrong.

I spent the most time because I figured it should not need the full direction rotation, reversal, and rotation back again; eventually, swapping X and Y achieves that.
I wasn't helped by Excel swapping the order of arguments to atan2().
The sqlite maths extension also provides the degrees() and radians() functions.

-- create first temporary table for wind vector averaging
-- It would be easier in table c5 except that sqlite3 cannot drop columns later.

create temporary table wind as
select max(dateTime) as dateTime,
    1 as windDir,
    avg(windSpeed * cos(radians(windDir))) as yvec,
    avg(windSpeed * sin(radians(windDir))) as xvec
from archive where interval = 1
group by FLOOR((dateTime-30) / 300);

-- create the temp table containing the newly consolidated results
-- the column names here should match the original database archive table,
-- for ease of re-inserting into archive

create temporary table c5 as
select max(dateTime) as dateTime,
    usUnits,
    5 as interval,
    avg(barometer) as barometer,
    avg(pressure) as pressure,
    avg(altimeter) as altimeter,
    avg(inTemp) as inTemp,
    avg(outTemp) as outTemp,
    avg(inHumidity) as inHumidity,
    avg(outHumidity) as outHumidity,
    avg(windSpeed) as windSpeed,
    1 as windDir,
    max(windGust) as windGust,
    1 as windGustDir,
    avg(rainRate) as rainRate,
    sum(rain) as rain,
    avg(dewpoint) as dewpoint,
    avg(windchill) as windchill,
    avg(heatindex) as heatindex,
    sum(ET) as ET,
    avg(radiation) as radiation,
    avg(UV) as UV,
    avg(extraTemp1) as extraTemp1,
    avg(extraTemp2) as extraTemp2,
    avg(extraTemp3) as extraTemp3,
    avg(soilTemp1) as soilTemp1,
    avg(soilTemp2) as soilTemp2,
    avg(soilTemp3) as soilTemp3,
    avg(soilTemp4) as soilTemp4,
    avg(leafTemp1) as leafTemp1,
    avg(leafTemp2) as leafTemp2,
    avg(extraHumid1) as extraHumid1,
    avg(extraHumid2) as extraHumid2,
    avg(soilMoist1) as soilMoist1,
    avg(soilMoist2) as soilMoist2,
    avg(soilMoist3) as soilMoist3,
    avg(soilMoist4) as soilMoist4,
    avg(leafWet1) as leafWet1,
    avg(leafWet2) as leafWet2,
    avg(rxCheckPercent) as rxCheckPercent,
    avg(txBatteryStatus) as txBatteryStatus,
    avg(consBatteryVoltage) as consBatteryVoltage,
    sum(hail) as hail,
    avg(hailRate) as hailRate,
    avg(heatingTemp) as heatingTemp,
    avg(heatingVoltage) as heatingVoltage,
    avg(supplyVoltage) as supplyVoltage,
    avg(referenceVoltage) as referenceVoltage,
    avg(windBatteryStatus) as windBatteryStatus,
    avg(rainBatteryStatus) as rainBatteryStatus,
    avg(outTempBatteryStatus) as outTempBatteryStatus,
    avg(inTempBatteryStatus) as inTempBatteryStatus
from archive where interval = 1
group by FLOOR((dateTime-30) / 300);

-- now do the vector averaging
update wind
    set windDir = degrees(atan2(xvec, yvec));

update wind
    set windDir = windDir + 360.0 where windDir < 0.0;

update c5
    set windDir = (
        select wind.windDir
        from wind
        where c5.dateTime = wind.dateTime
    );

-- wind gust averaging is not done, since the gust is
-- the maximum over the consolidation period. This
-- extracts the direction at which the maximum was observed.

create temporary table gusts as
select
    c5.dateTime as dateTime,
    archive.dateTime as arcdt,           -- just for checking
    archive.windGustDir as windGustDir,
    archive.windGust as windGust         -- just for checking
from archive join c5
    on archive.windGust = c5.windGust
    and FLOOR((archive.dateTime-30) / 300) = FLOOR((c5.dateTime-30) / 300)
order by dateTime;

update c5
    set windGustDir = (
        select gusts.windGustDir
        from gusts
        where c5.dateTime = gusts.dateTime
    );



Cameron D

Oct 19, 2019, 9:21:22 AM
to weewx-development
OK, I can try that - I presume you are referring to the wiki on GitHub.
I have just created an account, and I see an "edit" button has appeared, so I guess I can mess about with it.
I will add to it after I have had a bit of a play about.

Pat

Oct 19, 2019, 3:50:28 PM
to weewx-development
The latest queries worked well for me, but I had to remove the FLOOR() function, even after loading the math extension pack for SQLite Browser. It didn't give me an error, it just said "modified 0 rows". Removing it made it modify the rows it needed to.

Cameron D

Oct 28, 2019, 11:17:51 AM
to weewx-development
The wiki has been written up. Do not use the code I posted earlier in this thread - there are mistakes in it.

It took a bit longer than expected, as the code needed significant reworking for various reasons.
I (foolishly) decided to test out the code with my database, which has about 3 years of samples at a 1-minute interval - about 1.5 million rows - and to make sure it works on MySQL.

As a point of reference, I later wrote an awk script to consolidate the wind gust data in a CSV file (without the extraneous data fields), and it processed the entire 1.5 million records in 1.5 seconds.
On the other hand, using my original script, I had to kill the MariaDB server process after an hour or so. The MySQL server on Windows and SQLite gave similar times once I cut the database down to sizes I could measure to completion.
It turns out that calculating the rounded date/time inside the join on windGust speed turned the process from linear to N-squared. Adding an extra precalculated time column, rounded to the 5-minute value, returned it to sensible times.

I am at a loss to understand Pat's problem with floor(). It certainly worked for me with the SQLite browser, but it is redundant in any case. I will just mention that I have noticed the SQLite browser sometimes reporting 0 rows modified when I expected changes, and everything still seemed to work as expected.
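A sketch of that precalculated-column fix, using sqlite3 from Python (the bucket column name and cut-down schema are assumptions for illustration; the wiki article is the authoritative version). Instead of computing the rounded time inside the join condition, which defeats any index, the bucket is computed once, indexed, and joined on directly:

```python
import sqlite3

# Sketch of the precalculated-column fix: compute a 5-minute bucket
# column once, index it, and join the per-bucket max gust back on the
# cheap integer column. Table/column names here are assumed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE archive (dateTime INTEGER, interval INTEGER, "
             "windGust REAL, windGustDir REAL)")
conn.executemany(
    "INSERT INTO archive VALUES (?, 1, ?, ?)",
    [(i * 60, float(i % 7), float((i * 37) % 360)) for i in range(1, 11)])

conn.execute("ALTER TABLE archive ADD COLUMN bucket INTEGER")
conn.execute("UPDATE archive SET bucket = (dateTime - 30) / 300 "
             "WHERE interval = 1")
conn.execute("CREATE INDEX idx_archive_bucket ON archive (bucket)")

# For each bucket, find the record holding the maximum gust and pull
# its direction, joining only on the precomputed integer column.
rows = conn.execute(
    "SELECT a.bucket, a.dateTime, a.windGustDir "
    "FROM archive a "
    "JOIN (SELECT bucket, max(windGust) AS windGust FROM archive "
    "      GROUP BY bucket) m "
    "ON a.bucket = m.bucket AND a.windGust = m.windGust "
    "ORDER BY a.dateTime").fetchall()
print(rows)  # (bucket, time of max gust, direction at max gust)
```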

Thomas Keffer

Oct 28, 2019, 11:49:54 AM
to Cameron D, weewx-development
Nicely done. Thanks, Cameron!


Pat

Oct 28, 2019, 8:52:20 PM
to weewx-development
Just tried it again, and everything worked great. Even the floor(). Not sure what I had wrong before. Thanks again! 