'wee_database --calc-missing' takes a long time to execute


Graham Eddy

unread,
Mar 8, 2022, 7:42:56 PM3/8/22
to weewx...@googlegroups.com
presently my vp2 and gw1000 data are merged in one database, and i am splitting them.
i took the weewx.sdb and placed it in a new weewx 4.7.0 environment.
i ran weewx a couple of times configured to create empty vp2.sdb and gw1000.sdb (with vp2.conf and gw1000.conf respectively pointing to their own schemas and dbs).
i then copied the relevant observations (with my own program) from the weewx.sdb columns into vp2.sdb and gw1000.sdb, then ran 'wee_database --conf=__.conf --calc-missing' to fill in the respective derived values.
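(roughly, the split amounted to something like this; db names as above, but the column lists are illustrative rather than the actual schemas:)

# pull the gw1000-sourced observations out of the merged db (columns illustrative)
sqlite3 gw1000.sdb <<'SQL'
ATTACH DATABASE 'weewx.sdb' AS src;
INSERT INTO archive (dateTime, usUnits, interval, extraTemp1, extraHumid1)
SELECT dateTime, usUnits, interval, extraTemp1, extraHumid1
FROM src.archive;
DETACH DATABASE src;
SQL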
the vp2 (with 16 years of data) ran in about 10 mins, as expected.
the gw1000 (with 2 years of data) has been running for 12 hours so far, at about 7000 recs/hour, and at the present rate will take about 2 days in total, which ain’t gonna fly when i try the actual migration of the production environment.
note: when processing the vp2 data, cpu is typically 98%, but with gw1000 data wee_database dawdles on 0.3% and usually lower.

before i dive into a detailed analysis of why --calc-missing is orders of magnitude slower on the gw1000 data than the vp2 data, can someone give me some pointers or guidance on where to look?

(if this doesn’t work, i’ll try again by moving columns around copies of weewx.sdb rather than programmatically splitting the data, and see if i hit the --calc-missing issue again…)
cheers
 Graham Eddy 

vince

unread,
Mar 8, 2022, 8:11:34 PM3/8/22
to weewx-user
Need some info on which computer(s) you are doing this on.   Normally when I see long run times and low cpu, it's out of RAM or waiting for i/o.  What does something like 'htop' show ?

Have you tried to split your input data into smaller pieces ?  Perhaps try a year as a test and see if that's still massively slow ?
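Something like this, with --from/--to limiting the range (dates and config name are just placeholders):

# try a single year first to see whether the slowdown scales with size
wee_database --config=gw1000.conf --calc-missing --from=2021-01-01 --to=2022-01-01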

Graham Eddy

unread,
Mar 8, 2022, 10:03:37 PM3/8/22
to weewx...@googlegroups.com
it’s an 8GB RPi 4B and RAM is abundant, though it’s using a μSD card for the filesystem.
the interesting thing is that there are two datasets, one large and one small, and the large one is quick but the small one is orders of magnitude slower. the larger dataset (vp2-originated) gobbles >95% cpu but the smaller dataset (gw1000-originated) uses <0.3%. it must be something to do with the nature of the data, not the quantity.
 Graham Eddy 

On 9 Mar 2022, at 12:11 pm, vince <vince...@gmail.com> wrote:

[snip]

vince

unread,
Mar 9, 2022, 12:16:34 AM3/9/22
to weewx-user
Well I'd still try either splitting it up into pieces, or running it and measuring its resource usage in another shell. If it's not out of RAM and you're not pegging the CPU, then waiting for i/o is the only thing left, I'd guess. Run htop or the like to see.
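A minimal way to watch it from another shell (assumes procps' vmstat and pidstat from the sysstat package; the pgrep match is only an example and assumes a single wee_database process):

# system-wide: the 'wa' column is time spent waiting on i/o
vmstat 5
# per-process cpu and disk activity for the running wee_database
pidstat -u -d -p "$(pgrep -f wee_database)" 5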

Graham Eddy

unread,
Mar 9, 2022, 7:09:25 PM3/9/22
to weewx...@googlegroups.com
for the gw1000 dataset, i had been using top and iotop previously to ascertain that cpu, memory and i/o usage were extremely low, and ps -efl showed it was spending its time waiting on an interrupt. i would usually just conclude it was a slow disk and the process was spending all its time waiting on i/o completion, *except* this happens only for the smaller gw1000 dataset, not the larger vp2 dataset. it must be something to do with the different nature of the data (perhaps something as simple as different missing data being calc’ed).
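(one quick way to test that hunch, purely a sketch with illustrative derived-column names; the real list depends on each schema:)

# count how many archive rows still lack each derived value in each db
for db in vp2.sdb gw1000.sdb; do
    echo "== $db =="
    sqlite3 "$db" "SELECT
        sum(dewpoint  IS NULL) AS no_dewpoint,
        sum(windchill IS NULL) AS no_windchill,
        sum(ET        IS NULL) AS no_ET
      FROM archive;"
done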

to give an idea of the magnitude of the difference, using the shell’s built-in time to run each command:


                          recs     real /sec   user /sec   sys /sec   Idle /%
vp2    --rebuild-daily   505,336         165         148          2         9
vp2    --calc-missing    505,336         571         525         18         5
gw1000 --rebuild-daily   162,882          86          81          1         5
gw1000 --calc-missing    162,882      23,758         301         13        99
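(idle% above is just (real − user − sys) / real; e.g. for gw1000 --calc-missing, (23758 − 301 − 13) / 23758 ≈ 0.99, i.e. the process spends ~99% of the wall-clock time waiting rather than computing)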

as it stands right now, for migration of production to the split environment, i will have to
  * take a database snapshot and build the equivalent temp gw1000.sdb before migrating
  * do --calc-missing offline on the temp gw1000.sdb (7 hours !!)
  * dump the were-missing values in the temp gw1000.sdb into a file (roughly as sketched below)
  * when the dumped data is available, stop the production system, split the databases, load the dumped were-missing values into gw1000.sdb
  * run --calc-missing only on the interval between dump and now ← hopefully not long, gw1000 data being lost!
  * start the new production system on the split databases
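(a rough sketch of that dump/load step; 'gw1000_temp.sdb' and the derived-column names below are illustrative, and sqlite's ATTACH lets the new db pull the values straight from the temp db rather than via a dump file, if the temp db file itself can be carried across:)

# copy the calc'ed derived values from the temp db into the freshly split gw1000.sdb
# (illustrative column names; substitute whatever --calc-missing actually filled in)
sqlite3 gw1000.sdb <<'SQL'
ATTACH DATABASE 'gw1000_temp.sdb' AS tmp;
UPDATE archive SET
    dewpoint  = (SELECT t.dewpoint  FROM tmp.archive t WHERE t.dateTime = archive.dateTime),
    windchill = (SELECT t.windchill FROM tmp.archive t WHERE t.dateTime = archive.dateTime)
WHERE dateTime IN (SELECT dateTime FROM tmp.archive);
DETACH DATABASE tmp;
SQL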

does anyone have insight into the origin of the wait-for-interrupt plaguing my gw1000 dataset migration? perhaps some wxxtypes in do_calculations() have a realtime delay built in? perhaps the yield in genBatchRecords() is not context switching to another thread effectively (internal python issue)? has anyone seen such behaviour elsewhere?

cheers
 Graham Eddy 

graha...@gmail.com

unread,
Mar 14, 2022, 10:50:36 PM3/14/22
to weewx-user
i reduced the --calc-missing time from 7 hours to 7 minutes by such a simple trick that i kick myself for not seeing it earlier: i moved the database to a ramdisk and symlinked it under archive, ran wee_database, then moved the database back. chalk this one up under 'handy tips'.
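(for the record, the shuffle was roughly this; paths are illustrative, adjust to wherever the .sdb and the archive directory actually live:)

# hypothetical paths: a tmpfs ramdisk at /mnt/ramdisk, the db under /home/weewx/archive
sudo mkdir -p /mnt/ramdisk
sudo mount -t tmpfs -o size=512m tmpfs /mnt/ramdisk
mv /home/weewx/archive/gw1000.sdb /mnt/ramdisk/
ln -s /mnt/ramdisk/gw1000.sdb /home/weewx/archive/gw1000.sdb
wee_database --config=gw1000.conf --calc-missing
rm /home/weewx/archive/gw1000.sdb        # remove the symlink only
mv /mnt/ramdisk/gw1000.sdb /home/weewx/archive/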

[snip] 

William Garber

unread,
Feb 20, 2023, 7:46:29 AM2/20/23
to weewx-user
I am having the same problem.  I have about 130,000 different datetimes (records) in weewx.sdb.  I also tried moving it to a ramdisk.  Still extremely slow.  Any help please?  Should I run calc-missing on individual timeslices like one week periods?  The data covers one year of measurements.

William Garber

unread,
Feb 20, 2023, 9:51:33 AM2/20/23
to weewx-user
This is a year of data. It works much better if you process it in small chunks. I think this is a bug.
I changed the schema, so it was necessary to run --rebuild-daily first.
The following script behaves radically differently depending on the step size (+4 days works well).
If the step size is too large (too many days at once) it predictably gets stuck at around 6,000 records.
Small chunks are effective, but you have to press "y" to continue every 4 days (see the note after the script).
#!/bin/bash
# run wee_database --calc-missing over the past ~392 days in 4-day chunks
rm -f end-date.log
# next line helps a lot
wee_database --rebuild-daily
for (( past=-392; past <= 0; past=past+4 ))
do
    past1=$(( past ))
    past2=$(( past + 4 ))
    DATE1=$(date -d "today $past1 days" +%F)
    DATE2=$(date -d "today $past2 days" +%F)
    echo
    echo "processing $DATE1 to $DATE2"
    # wee_database --calc-missing --date="$DATE1"
    wee_database --calc-missing --from="$DATE1" --to="$DATE2"
    res=$?
    echo "res=$res"
    [[ $res -eq 0 ]] || exit 1
    echo "$DATE2" >> end-date.log    # log progress so a failed run can be resumed
done
# the final full pass runs MUCH FASTER and fixes any glitches
wee_database --calc-missing
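(Side note, untested: assuming wee_database reads its confirmation prompt from stdin, piping yes into it, e.g. 'yes | wee_database --calc-missing --from="$DATE1" --to="$DATE2"', should answer the per-chunk "y" prompts and let the loop run unattended.)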