weewx 4.0.0b16 wee_import - calculate missing data problems

Lucas Heijst

Mar 22, 2020, 8:55:13 AM
to weewx-development
Tom,

After an upgrade of MariaDB 10 my databases got corrupted. My most recent backup was a year old, so I'm missing a year of data. :-(
I am using the wee_import utility from WeeWX 4.0.0b16 to retrieve the missing data from WU, but calc_missing does not seem to work (or I am doing something wrong!).

1. The file ../util/import/wu-example.conf describes a command line option "--calc-missing", which is not recognized by wee_import.

2. My wu.conf file is listed below. The parameter "calc_missing" is set to True, but none of these fields are calculated:
    #   altimeter
    #   ET
    #   pressure
    #   appTemp
    #   cloudbase
    #   humidex
    #   maxSolarRad
    #   windrun

My command line at /home/weewx is:
./bin/wee_import --import-config=wu.conf --config=weewx_wund.conf --from=2019-02-04 --to=2020-03-04

Attached are the first 10 exported records of the data imported from WU.

Luc

====
# EXAMPLE CONFIGURATION FILE FOR IMPORTING FROM THE WEATHER UNDERGROUND
#
# Copyright (c) 2009-2019 Tom Keffer <tke...@gmail.com> and Gary Roderick.
# See the file LICENSE.txt for your rights.

##############################################################################

# Specify our source. Available options are:
#   CSV - import obs from a single CSV format file
#   WU - import obs from a Weather Underground PWS history
#   Cumulus - import obs from one or more Cumulus monthly log files
#   WD - import obs from one or more WD monthly log files
# Format is:
#   source = (CSV | WU | Cumulus | WD)
source = WU

##############################################################################

[WU]
    # Parameters used when importing from a WU PWS

    # WU PWS Station ID to be used for import.
    station_id = lucs_stationid

    # WU API key to be used for import.
    api_key = lucs_apikey

    #
    # When importing WU data the following WeeWX database fields will be
    # populated directly by the imported data (provided the corresponding data
    # exists on WU):
    #   barometer
    #   dateTime
    #   dewpoint
    #   heatindex
    #   outHumidity
    #   outTemp
    #   radiation
    #   rain
    #   rainRate
    #   windchill
    #   windDir
    #   windGust
    #   windSpeed
    #   UV
    #
    # The following WeeWX database fields will be populated from other
    # settings/config files:
    #   interval
    #   usUnits
    #
    # The following WeeWX database fields will be populated with values derived
    # from the imported data provided the --calc-missing command line option is
    # used during import:
    #   altimeter
    #   ET
    #   pressure
    #
    # The following WeeWX fields will be populated with derived values from the
    # imported data provided the --calc-missing command line option is used
    # during import. These fields will only be saved to the WeeWX database if
    # the WeeWX schema has been modified to accept them. Note that the pyephem
    # module is required in order to calculate maxSolarRad; refer to the WeeWX
    # User's Guide.
    #   appTemp
    #   cloudbase
    #   humidex
    #   maxSolarRad
    #   windrun

    # How will the interval field be determined for the imported records.
    # Available options are:
    #   derive - Derive the interval field from the timestamp of successive
    #            records. This setting is best used when the imported records
    #            are equally spaced in time and there are no missing records.
    #   conf   - Use the interval setting from weewx.conf. This setting is
    #            best used if the records to be imported have been produced by
    #            WeeWX using the same archive interval as set in weewx.conf on
    #            this machine.
    #   x      - Use a fixed interval of x minutes for every record. This
    #            setting is best used if the records to be imported are
    #            equally spaced in time but there are some missing records.
    #            This setting is recommended for WU imports.
    # Due to WU frequently missing uploaded records, use of 'derive' may give
    # incorrect or inconsistent interval values. Better results may be
    # achieved by using the 'conf' setting (if WeeWX has been doing the WU
    # uploading and the WeeWX archive_interval matches the WU observation
    # spacing in time) or setting the interval to a fixed value (eg 5). The
    # most appropriate setting will depend on the completeness and (time)
    # accuracy of the WU data being imported.
    # Format is:
    #   interval = (derive | conf | x)
    interval = conf

    # Should the [StdQC] max/min limits in weewx.conf be applied to the
    # imported data. This may be useful if the source has extreme values that
    # are clearly incorrect for some observations. This is particularly useful
    # for WU imports where WU often records clearly erroneous values against
    # obs that are not reported. Available options are:
    #   True  - weewx.conf [StdQC] max/min limits are applied.
    #   False - weewx.conf [StdQC] max/min limits are not applied.
    # Format is:
    #   qc = (True | False)
    qc = True

    # Should any missing derived observations be calculated from the imported
    # data if possible. Available options are:
    #   True  - Any missing derived observations are calculated.
    #   False - Any missing derived observations are not calculated.
    # Format is:
    #   calc_missing = (True | False)
    calc_missing = True

    # Specify how imported data fields that contain invalid data (eg a numeric
    # field containing non-numeric data) are handled. Available options are:
    #   True  - The invalid data is ignored, the WeeWX target field is set to
    #           None and the import continues.
    #   False - The import is halted.
    # Format is:
    #   ignore_invalid_data = (True | False)
    # Default is True.
    ignore_invalid_data = True

    # Imported records are written to archive in transactions of tranche
    # records at a time. Increase for faster throughput, decrease to reduce
    # memory requirements. Format is:
    #   tranche = x
    # where x is an integer
    tranche = 250

    # Lower and upper bounds for imported wind direction. It is possible,
    # particularly for a calculated direction, to have a value (eg -45) outside
    # of the WeeWX limits (0 to 360 inclusive). Format is:
    #
    # wind_direction = lower,upper
    #
    # where :
    #   lower is the lower limit of acceptable wind direction in degrees
    #   (may be negative)
    #   upper is the upper limit of acceptable wind direction in degrees
    #
    # WU has at times been known to store large values (eg -9999) for wind
    # direction, often when no wind direction was uploaded to WU. The wind_direction
    # parameter sets a lower and upper bound for valid wind direction values.
    # Values inside these bounds are normalised to the range 0 to 360. Values
    # outside of the bounds will be stored as None. Default is 0,360
    wind_direction = 0,360
====
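The wind_direction comments above describe a two-step rule: discard values outside the configured bounds, then normalise the values that remain to 0 to 360. Below is a minimal Python sketch of that rule as I read it from the comments; `bound_wind_direction` is a hypothetical name and this is not wee_import's actual code. Values already in 0 to 360 are kept as-is (so 360 stays 360), and anything else inside the bounds is folded with a modulo.

```python
def bound_wind_direction(value, lower=0.0, upper=360.0):
    """Apply a 'wind_direction = lower,upper' setting to one value.

    Hypothetical helper illustrating the rule described in wu.conf:
    values outside [lower, upper] are stored as None, values inside
    are normalised to the range 0 to 360.
    """
    if value is None or not lower <= value <= upper:
        # e.g. the -9999 that WU sometimes stores when no direction was sent
        return None
    if 0.0 <= value <= 360.0:
        # Already a valid WeeWX direction, keep it unchanged
        return value
    # Fold e.g. -45 to 315 or 450 to 90; with a positive modulus,
    # Python's % always returns a value in [0, 360)
    return value % 360.0


if __name__ == "__main__":
    print(bound_wind_direction(-9999))           # None with the default 0,360
    print(bound_wind_direction(-45))             # None with the default 0,360
    print(bound_wind_direction(-45, -360, 360))  # 315.0 with wider bounds
```

With the default `wind_direction = 0,360` a calculated direction of -45 is discarded; widening the lower bound (eg `wind_direction = -360,360`) would keep it as 315.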

Lucas Heijst

Mar 22, 2020, 8:57:38 AM
to weewx-development
archive.sql

Lucas Heijst

Mar 22, 2020, 9:07:21 AM
to weewx-development
Below is the syslog output during startup:

Mar 22 09:12:53 pi37 wee_import[11885] INFO __main__: Starting wee_import...
Mar 22 09:12:53 pi37 wee_import[11885] DEBUG weewx.manager: Daily summary version is 2.0
Mar 22 09:12:53 pi37 wee_import[11885] INFO weeimport.wuimport: Observation history for Weather Underground station 'IPARAMAR5' will be imported.
Mar 22 09:12:53 pi37 wee_import[11885] DEBUG weeimport.wuimport: The following options will be used:
Mar 22 09:12:53 pi37 wee_import[11885] DEBUG weeimport.wuimport:      config=weewx_wund.conf, import-config=/home/weewx/wu.conf
Mar 22 09:12:53 pi37 wee_import[11885] DEBUG weeimport.wuimport:      station=IPARAMAR5, from=2019-02-04, to=2020-03-04
Mar 22 09:12:53 pi37 wee_import[11885] DEBUG weeimport.wuimport:      apiKey=****************************3c8f
Mar 22 09:12:53 pi37 wee_import[11885] DEBUG weeimport.wuimport:      dry-run=None, calc_missing=True, ignore_invalid_data=True
Mar 22 09:12:53 pi37 wee_import[11885] DEBUG weeimport.wuimport:      tranche=250, interval=conf, wind_direction=[0.0, 360.0]
Mar 22 09:12:53 pi37 wee_import[11885] INFO weeimport.wuimport: Using database binding 'wund_binding', which is bound to database 'weewx_wund'
Mar 22 09:12:53 pi37 wee_import[11885] INFO weeimport.wuimport: Destination table 'archive' unit system is '0x10' (METRIC).
Mar 22 09:12:53 pi37 wee_import[11885] INFO weeimport.weeimport: Obtaining raw import data for period 1 ...
Mar 22 09:12:54 pi37 wee_import[11885] INFO weeimport.weeimport: Raw import data read successfully for period 1.
Mar 22 09:12:54 pi37 wee_import[11885] INFO weeimport.weeimport: Mapping raw import data for period 1 ...
Mar 22 09:12:54 pi37 wee_import[11885] INFO weeimport.weeimport: Mapped 275 records.
Mar 22 09:12:54 pi37 wee_import[11885] INFO weeimport.weeimport: Raw import data mapped successfully for period 1.
Mar 22 09:12:54 pi37 wee_import[11885] INFO weeimport.weeimport: Saving mapped data to archive for period 1 ...

gjr80

Mar 22, 2020, 9:30:44 AM
to weewx-development
Luc,

Did you let the wee_import session run to completion? The way wee_import calculates missing derived obs changed in WeeWX 4.0. Previously, missing derived obs were calculated record by record at the time each record was imported; under 4.0 they are calculated in a second pass over the imported data once the import is complete. If you let wee_import run to its conclusion and the missing derived obs were still not calculated, could you post the log or console output for the entire wee_import session?

Gary

Lucas Heijst

Mar 22, 2020, 12:09:52 PM
to weewx-development
Gary,

The derived values were indeed calculated once the import completed. As a test I imported a period of 2 days.
So far so good!

In my case I want to import a year of data!

If something goes wrong during this import (which will take a very long time), the program aborts and no derived values will be calculated.
When I restart the batch from the beginning, the raw records already in the database will be skipped (not overwritten).
The derived values also won't be calculated for those existing records.

A better approach would be to calculate the derived values per tranche (default: 250 records).
Then, if something goes wrong, we could restart from a new date (shortly after the last record successfully imported).

Luc

Lucas Heijst

Mar 22, 2020, 1:27:50 PM
to weewx-development
Gary,

I completed a run from 2019-02-04 to 2019-03-01 and got 6957 records.

The derived data went OK.

23 records had NULL for cloudbase, humidex and appTemp because outHumidity was NULL (so far so good).

159 records had NULL for pressure and altimeter. Can't think of why. Which values are needed for those calculations?

Luc

gjr80

Mar 22, 2020, 9:49:57 PM
to weewx-development

On Monday, 23 March 2020 02:09:52 UTC+10, Lucas Heijst wrote:

> In my case I want to import a year of data!

Given WU's history of flakiness, and given that your year of data will require 350-odd WU API calls over the internet, I would not do this as one big import; you are just asking for a failure. I would do a month at a time, starting with the oldest data and working through to the most recent. If a given month fails to import for some reason, you should be able to re-import that month without deleting any data (untested, and I am relying on memory here). Of course, if you have reason to believe that corrupt data was stored, you need to delete it before re-importing.
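A minimal Python sketch of the month-at-a-time approach, assuming the paths and file names from Luc's command line earlier in the thread (`./bin/wee_import`, `wu.conf`, `weewx_wund.conf`). The helper names are hypothetical, and the script only prints the commands so each month can be run and its log checked by hand. Each batch ends on the first day of the following month, so consecutive batches share a boundary day; I am assuming that overlap is harmless because, as noted in this thread, records already in the database are skipped rather than overwritten.

```python
import datetime
import shlex


def month_batches(start, end):
    """Yield (from_date, to_date) pairs covering start..end one calendar month at a time.

    Each batch ends on the first day of the following month (or on `end`),
    so consecutive batches deliberately share one boundary day.
    """
    current = start
    while current < end:
        if current.month == 12:
            next_month = datetime.date(current.year + 1, 1, 1)
        else:
            next_month = datetime.date(current.year, current.month + 1, 1)
        batch_end = min(next_month, end)
        yield current, batch_end
        current = batch_end


def import_commands(start, end, import_conf="wu.conf", weewx_conf="weewx_wund.conf"):
    """Build one wee_import argument list per monthly batch, oldest first."""
    return [
        [
            "./bin/wee_import",
            "--import-config=" + import_conf,
            "--config=" + weewx_conf,
            "--from=" + frm.isoformat(),
            "--to=" + to.isoformat(),
        ]
        for frm, to in month_batches(start, end)
    ]


if __name__ == "__main__":
    cmds = import_commands(datetime.date(2019, 2, 4), datetime.date(2020, 3, 4))
    for cmd in cmds:
        # Run these one at a time and check each log before starting the next.
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

For the period in this thread (2019-02-04 to 2020-03-04) this produces 14 batches, the first being `--from=2019-02-04 --to=2019-03-01`. Whether `--to` includes the whole of that day is worth confirming in the Utilities Guide; the shared boundary day is there so that nothing falls between batches either way.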
 
> If something goes wrong during this import (which will take a very long time), the program aborts and no derived values will be calculated.
> When I restart the batch from the beginning, the raw records already in the database will be skipped (not overwritten).
> The derived values also won't be calculated for those existing records.

If you look at the 4.0 Utilities Guide you will notice that the wee_database utility now has a --calc-missing action. This allows users to run the routine that calculates missing derived obs separately from wee_import. Note that the calc-missing routine will only update fields that are None or otherwise contain no data (hence the 'missing' in calc-missing); if a field contains data, calc-missing will not change it.
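The "only fill what is missing" rule described above can be sketched in a few lines of Python. `fill_missing` is a hypothetical helper that mirrors the rule as described, not the actual wee_database code; `record` stands for one archive record as a dict and `derived` for freshly calculated values (the numbers in the example are made up).

```python
def fill_missing(record, derived):
    """Return a copy of `record` with only its missing fields filled in.

    A field is updated only when it is absent or None in `record` and the
    calculation produced a value; fields that already hold data are left
    untouched, as described for calc-missing.
    """
    updated = dict(record)
    for field, value in derived.items():
        if updated.get(field) is None and value is not None:
            updated[field] = value
    return updated


if __name__ == "__main__":
    rec = {"dateTime": 1550000000, "pressure": None, "altimeter": 1012.3}
    calc = {"pressure": 1009.8, "altimeter": 1012.9, "humidex": None}
    print(fill_missing(rec, calc))
    # {'dateTime': 1550000000, 'pressure': 1009.8, 'altimeter': 1012.3}
```

Because fields that already hold data are never touched, re-running calc-missing over a range that has already been processed should be harmless: anything filled the first time is simply left alone.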

> A better approach would be to calculate the derived values per tranche (default: 250 records).
> Then, if something goes wrong, we could restart from a new date (shortly after the last record successfully imported).

If you have a look at issue #443 you will see that this approach was considered but was not possible due to some architectural issues. Hence the approach that was adopted: the calc-missing routine was made separate from, but callable from, wee_import. Undertaking imports in smaller chunks (especially from WU) and being able to run the calc-missing routine separately via wee_database help mitigate errors during imports.

> Luc

gjr80

Mar 22, 2020, 10:05:31 PM
to weewx-development
Luc,

Some comments below.

Gary

On Monday, 23 March 2020 03:27:50 UTC+10, Lucas Heijst wrote:
> Gary,
>
> I completed a run from 2019-02-04 to 2019-03-01 and got 6957 records.
>
> The derived data went OK.
>
> 23 records had NULL for cloudbase, humidex and appTemp because outHumidity was NULL (so far so good).

Doesn't surprise me, I never expect 100% data from WU for a number of reasons.


> 159 records had NULL for pressure and altimeter. Can't think of why. Which values are needed for those calculations?

WeeWX field pressure requires outTemp (current and 12 hours ago), barometer, outHumidity and station altitude. WeeWX field altimeter requires pressure and station altitude. I am guessing that one or more of outTemp (current or 12 hours ago), barometer or outHumidity was missing in 159 of the imported records (or perhaps outTemp was missing in some existing records in the 12 hours before the start time of your imported data) and that caused pressure to be left uncalculated. Then, because there was no pressure, altimeter was also left uncalculated. It should be easy enough to check with a few queries on the offending records.
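The "few queries" could look something like this Python sketch. It uses the standard-library sqlite3 module purely so it is self-contained; Luc's data lives in MariaDB, so in practice the same SQL would go through a MySQL/MariaDB DB-API driver, whose placeholders are %s rather than ?. The function name and the 30-minute tolerance used when looking for the temperature 12 hours earlier are my assumptions, not values taken from WeeWX. Since altimeter is derived from pressure, the records it reports also account for the NULL altimeter values.

```python
import sqlite3

TWELVE_HOURS = 12 * 3600  # seconds


def diagnose_missing_pressure(conn, max_delta=1800):
    """Map each archive dateTime whose pressure is NULL to the inputs that were missing.

    Checks barometer, outTemp and outHumidity in the same record, plus a
    non-NULL outTemp within `max_delta` seconds of 12 hours earlier.
    Station altitude comes from weewx.conf, so it is not checked here.
    """
    report = {}
    rows = conn.execute(
        "SELECT dateTime, barometer, outTemp, outHumidity "
        "FROM archive WHERE pressure IS NULL ORDER BY dateTime"
    ).fetchall()
    for ts, barometer, out_temp, out_humidity in rows:
        # Inputs missing from the record itself
        reasons = [
            name
            for name, value in (("barometer", barometer),
                                ("outTemp", out_temp),
                                ("outHumidity", out_humidity))
            if value is None
        ]
        # Is there any usable outTemp around 12 hours earlier?
        earlier = ts - TWELVE_HOURS
        old_temp = conn.execute(
            "SELECT 1 FROM archive WHERE dateTime BETWEEN ? AND ? "
            "AND outTemp IS NOT NULL LIMIT 1",
            (earlier - max_delta, earlier + max_delta),
        ).fetchone()
        if old_temp is None:
            reasons.append("outTemp 12h earlier")
        report[ts] = reasons
    return report
```

Records whose only reason is "outTemp 12h earlier" and that fall within the first 12 hours of an otherwise empty database cannot be filled until older data exists.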
 
> Luc

Lucas Heijst

Mar 22, 2020, 10:18:33 PM
to weewx-development
Thanks Gary,

The need for the temperature 12 hours earlier explains why 144 records could not be calculated: I started with an empty database.
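The 144 figure is consistent with the first 12 hours of imported data having no earlier outTemp to draw on. This assumes a 5-minute archive interval, which is not stated in the thread; it is inferred from roughly 25 days of data yielding 6957 records (25 × 288 = 7200 at 5 minutes). A quick check in Python, with a hypothetical helper name:

```python
def records_per_window(hours, interval_minutes):
    """Number of archive records in a window of `hours` at a fixed interval."""
    return int(hours * 60 // interval_minutes)


if __name__ == "__main__":
    first_12h = records_per_window(12, 5)  # records with no outTemp 12 hours earlier
    print(first_12h)                        # 144
    print(159 - first_12h)                  # 15
```

If the 144 are all among the 159 NULL-pressure records, that leaves 15 to be explained by missing barometer, outTemp or outHumidity values in the records themselves.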
I have to restore the same missing data into two databases, so I will export this database and import it into the databases of my two Vantage stations.

I have set up the WU import as 14 batches of one month each. It is halfway finished so far.

Luc