Import and missing data calculation


Andrea Di Saverio

Aug 6, 2022, 10:23:31 AM
to weewx-user
I imported ~1 year of data from csv, using this `import.conf` configuration:
    source = CSV
    [CSV]
        file = my-data.csv
        interval = 1
        qc = True
        calc_missing = True
        ignore_invalid_data = True
        tranche = 250
        UV_sensor = False
        solar_sensor = False
        raw_datetime_format = %Y-%m-%dT%H:%M
        rain = discrete
        wind_direction = 0,360
        [[FieldMap]]
            dateTime    = dateTime, unix_epoch
            usUnits     =
            interval    =
            barometer   = barometer, hPa
            pressure    =
            altimeter   =
            inTemp      = insideTemp, degree_C
            outTemp     = outsideTemp, degree_C
            inHumidity  = insideHum, percent
            outHumidity = outsideHum, percent
            windSpeed   = windSpeed, km_per_hour
            windDir     = windDirection, degree_compass
            windGust    = hiWindSpeed, km_per_hour
            windGustDir = hiWindDirection, degree_compass
            rainRate    = hiRainRate, mm_per_hour
            rain        = rain, mm
            dewpoint    =
            windchill   =
            heatindex   =
            ET          =
            radiation   =
            UV          =


What I expected was that the following fields would have been calculated, starting from the other available fields:
    altimeter
    dewpoint
    heatindex
    pressure
    windchill

But in the `archive` table, those columns are empty.

Did I do something wrong, or is the behavior correct? So I cannot have such fields calculated...?
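One way to confirm which derived columns are empty is to count NULL rows per column over the imported time span. The sketch below is only an illustration: it runs against a toy in-memory table that stands in for the real `archive` table, and the helper name `null_counts` is made up for this example. Against a real database you would open the actual SQLite file instead (its path is not given in this thread).

```python
import sqlite3

# The five derived fields the poster expected to be filled in.
DERIVED = ["altimeter", "dewpoint", "heatindex", "pressure", "windchill"]

def null_counts(conn, columns, start_ts, stop_ts):
    """Return {column: number of NULL rows} for start_ts <= dateTime < stop_ts."""
    counts = {}
    for col in columns:
        # Column names cannot be bound as SQL parameters, so only pass trusted names.
        sql = ("SELECT COUNT(*) FROM archive "
               f"WHERE dateTime >= ? AND dateTime < ? AND {col} IS NULL")
        counts[col] = conn.execute(sql, (start_ts, stop_ts)).fetchone()[0]
    return counts

# Toy in-memory table standing in for a real WeeWX archive (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE archive (dateTime INTEGER PRIMARY KEY, outTemp REAL, "
    "altimeter REAL, dewpoint REAL, heatindex REAL, pressure REAL, windchill REAL)"
)
conn.executemany(
    "INSERT INTO archive (dateTime, outTemp) VALUES (?, ?)",
    [(1627768860, 27.61), (1627768920, 27.56)],
)

print(null_counts(conn, DERIVED, 1627768860, 1627769160))
```

A count equal to the number of records in the span means the column was never populated for that period.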

gjr80

Aug 6, 2022, 8:09:20 PM
to weewx-user
Your import config file is fine, and provided a couple of conditions were met the missing fields should have been calculated. The two conditions are: (1) Any prerequisites for calculating the fields concerned must exist; for example, for heatindex, windchill and dewpoint you must have a non-None outTemp value. If a prerequisite is missing, the calculated value will be None. (2) Even if the derived values were calculated, there must be no record already in the archive with the same timestamp as the record being imported. If such a record already exists in the archive, the imported record is discarded in its entirety (not just the calculated fields). So if these conditions were met, your imported data, including calculated fields, should have been imported and saved to the archive; if they were not, then something went wrong.
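The two conditions can be pictured with a small toy model. This is not WeeWX code: the archive is a plain dict keyed by timestamp, `import_record` is a hypothetical helper, and the dewpoint formula is the Magnus approximation used as a stand-in (WeeWX's own formula may differ).

```python
import math

def dewpoint_c(temp_c, humidity):
    """Magnus approximation of dewpoint in degrees C (stand-in formula)."""
    if temp_c is None or humidity is None or humidity <= 0:
        return None  # condition 1: a missing prerequisite gives None
    gamma = math.log(humidity / 100.0) + 17.62 * temp_c / (243.12 + temp_c)
    return 243.12 * gamma / (17.62 - gamma)

def import_record(archive, record):
    """Save record into archive (dict keyed by dateTime). Return True if saved."""
    ts = record["dateTime"]
    if ts in archive:
        # condition 2: a record with this timestamp exists, so the whole
        # imported record is discarded, not just its derived fields
        return False
    rec = dict(record)
    if rec.get("dewpoint") is None:
        rec["dewpoint"] = dewpoint_c(rec.get("outTemp"), rec.get("outHumidity"))
    archive[ts] = rec
    return True

archive = {}
print(import_record(archive, {"dateTime": 1627768860, "outTemp": 27.61, "outHumidity": 60}))
print(import_record(archive, {"dateTime": 1627768860, "outTemp": 10.0, "outHumidity": 50}))
print(import_record(archive, {"dateTime": 1627768920, "outTemp": None, "outHumidity": 60}))
```

The first call saves a record with a calculated dewpoint, the second is rejected as a duplicate, and the third is saved with a dewpoint of None because outTemp is missing.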

A few questions to try to narrow down the cause. What does your data look like? Can you post some of the records here or, if you don't want to, reply privately to this post with your raw import data? Did records already exist in your archive for the period covered by the data you were importing? How did you check the imported data? Did you query the database archive table directly or rely on WeeWX generated output? If the former, what command did you use and what results were presented? Did the other non-calculated fields import correctly? What was presented on-screen when you ran wee_import? Were there any errors or warnings? wee_import will have written some entries to the log at the time of the import; they should be clearly evident with a 'wee_import' label. What do these entries say?

Gary

Andrea Di Saverio

Aug 7, 2022, 12:50:21 PM
to weewx-user
Thanks for your answer.
The two conditions you listed are satisfied: the other necessary values are present in my csv, and no records already existed for that period in the database.

Answering your questions:

- This is my data (a sample):
    CSV data (more fields than used in the import process are present)
    dateTime,outsideTemp,hiOutsideTemp,lowOutsideTemp,outsideHum,windSpeed,windDirection,hiWindSpeed,hiWindDirection,barometer,rain,hiRainRate,insideTemp,insideHum,numWindSamples,archiveInterval
    2021-08-01T00:01,27.61,27.61,27.61,60.0,1.61,SW,3.22,SW,1006.30,0.00,0.00,30.33,49.0,22,1
    2021-08-01T00:02,27.56,27.61,27.56,60.0,1.61,W,1.61,SW,1006.16,0.00,0.00,30.33,49.0,24,1
    2021-08-01T00:03,27.50,27.56,27.50,60.0,1.61,W,1.61,W,1006.27,0.00,0.00,30.33,49.0,23,1
    2021-08-01T00:04,27.50,27.50,27.50,60.0,0.00,SW,3.22,SW,1006.33,0.00,0.00,30.33,49.0,22,1
    2021-08-01T00:05,27.50,27.50,27.50,60.0,3.22,WSW,4.83,WSW,1006.37,0.00,0.00,30.33,49.0,24,1


    Resulting db records (extracted by querying the `archive` table; I removed all null columns here for readability)
    dateTime,usUnits,interval,appTemp,barometer,cloudbase,humidex,inDewpoint,inHumidity,inTemp,outHumidity,outTemp,rain,rainRate,windDir,windGust,windGustDir,windrun,windSpeed
    1627768860,16,1,30.5895137231874,1006.3,1167.77458102164,34.4779454220626,18.4017421866142,49,30.33,60,27.61,0,0,225,3.22,225,0.0268333333333333,1.61
    1627768920,16,1,30.5182757269146,1006.16,1167.3911940789,34.3910793180734,18.4017421866142,49,30.33,60,27.56,0,0,270,1.61,225,0.0268333333333333,1.61
    1627768980,16,1,30.4328612401526,1006.27,1166.93122058363,34.286968264113,18.4017421866142,49,30.33,60,27.5,0,0,270,1.61,270,0.0268333333333333,1.61
    1627769040,16,1,30.7459167959586,1006.33,1166.93122058363,34.286968264113,18.4017421866142,49,30.33,60,27.5,0,0,,3.22,225,0,0
    1627769100,16,1,30.1198056843466,1006.37,1166.93122058363,34.286968264113,18.4017421866142,49,30.33,60,27.5,0,0,247.5,4.83,247.5,0.0536666666666667,3.22


- No data were already present for that period
- I checked resulting records by querying the database directly
- The query I ran (for the sample): SELECT * FROM archive WHERE dateTime >= 1627768860 and dateTime < 1627769160 ORDER BY dateTime
- I imported the data a few weeks ago, so I cannot say exactly what the output of the import command was, but as far as I can remember nothing remarkable was printed. I only noticed just now that some fields are missing. Can I still check those logs? If so, how?

Thanks,
Andrea

gjr80

Aug 9, 2022, 7:58:44 AM
to weewx-user
Sorry for the delay, I wanted to run your data and import config through wee_import before replying.

I ran your data and import config through wee_import; the data was imported successfully and dewpoint, windchill and heatindex were calculated correctly. altimeter and pressure were not calculated; the reason is a little complicated. Provided one of the pressures (altimeter, barometer or pressure) is present, WeeWX can calculate the other two; however, when only altimeter or barometer is present, pressure must be calculated first, and pressure requires the temperature from 12 hours earlier. In my case I did not have the temperature from 12 hours earlier, so pressure was not calculated. Calculating altimeter (or barometer) requires pressure to exist (which is why pressure is calculated first), so in my case altimeter could not be calculated either. In a typical import, where the temperature from 12 hours earlier is missing for the first 12 hours of import data, you often see 12 hours of records with two missing pressures before all three are calculated once the temperature from 12 hours earlier becomes available. But I digress.
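The dependency chain described above can be written down as a very simplified availability model. It only encodes the ordering stated in this reply (pressure first, and only when a temperature from 12 hours earlier is available); it does not reproduce any WeeWX formula, and the function name `calculable_pressures` is hypothetical.

```python
def calculable_pressures(present, have_temp_12h):
    """Return the set of pressure fields available after calculation.

    present       -- pressure fields supplied by the source data
    have_temp_12h -- True if a temperature from 12 hours earlier exists
    """
    available = set(present)
    if "pressure" not in available:
        # pressure has to be derived first, and that step needs the
        # temperature from 12 hours earlier
        if available & {"altimeter", "barometer"} and have_temp_12h:
            available.add("pressure")
    if "pressure" in available:
        # once pressure exists, the other two can be derived from it
        available.update({"altimeter", "barometer"})
    return available

# Only barometer in the import data, as in this thread:
print(sorted(calculable_pressures({"barometer"}, have_temp_12h=False)))
print(sorted(calculable_pressures({"barometer"}, have_temp_12h=True)))
```

With no temperature history only barometer remains; once the history exists all three fields become available, which matches the 12-hour gap at the start of an import.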

I guess the next thing is to find your log entries. You are looking for whatever log file WeeWX logs to. Unless you have explicitly changed it, that may be /var/log/syslog or perhaps /var/log/messages; on an RPi it will likely be /var/log/syslog. The wiki page 'Where are my logs?' may help. If WeeWX is running, the log should be easy to spot, as the most recent log file (whichever name it has) will contain clearly labelled WeeWX log entries. The wee_import entries will be clearly labelled as such; you should be looking for something like this:
 
Aug 9 21:50:03 bullseye4 wee_import[1156] INFO __main__: Starting wee_import...
Aug 9 21:50:03 bullseye4 wee_import[1156] INFO weeimport.csvimport: A CSV import from source file '/var/tmp/test_data.csv' has been requested.
Aug 9 21:50:03 bullseye4 wee_import[1156] INFO weeimport.csvimport: Using database binding 'wx_binding', which is bound to database 'weewx.sdb'
Aug 9 21:50:03 bullseye4 wee_import[1156] INFO weeimport.csvimport: Destination table 'archive' unit system is '0x01' (US).
Aug 9 21:50:03 bullseye4 wee_import[1156] INFO weeimport.csvimport: Missing derived observations will be calculated.
Aug 9 21:50:03 bullseye4 wee_import[1156] INFO weeimport.csvimport: All WeeWX UV fields will be set to None.
Aug 9 21:50:03 bullseye4 wee_import[1156] INFO weeimport.csvimport: All WeeWX radiation fields will be set to None.
Aug 9 21:50:03 bullseye4 wee_import[1156] INFO weeimport.weeimport: Obtaining raw import data for period 1 ...
Aug 9 21:50:03 bullseye4 wee_import[1156] INFO weeimport.weeimport: The following imported field-to-WeeWX field map will be used:
Aug 9 21:50:03 bullseye4 wee_import[1156] INFO weeimport.weeimport: source field 'dateTime' in units 'unix_epoch' --> WeeWX field 'dateTime'
Aug 9 21:50:03 bullseye4 wee_import[1156] INFO weeimport.weeimport: source field 'barometer' in units 'hPa' --> WeeWX field 'barometer'
Aug 9 21:50:03 bullseye4 wee_import[1156] INFO weeimport.weeimport: source field 'insideTemp' in units 'degree_C' --> WeeWX field 'inTemp'
Aug 9 21:50:03 bullseye4 wee_import[1156] INFO weeimport.weeimport: source field 'outsideTemp' in units 'degree_C' --> WeeWX field 'outTemp'
Aug 9 21:50:03 bullseye4 wee_import[1156] INFO weeimport.weeimport: source field 'insideHum' in units 'percent' --> WeeWX field 'inHumidity'
Aug 9 21:50:03 bullseye4 wee_import[1156] INFO weeimport.weeimport: source field 'outsideHum' in units 'percent' --> WeeWX field 'outHumidity'
Aug 9 21:50:03 bullseye4 wee_import[1156] INFO weeimport.weeimport: source field 'windSpeed' in units 'km_per_hour' --> WeeWX field 'windSpeed'
Aug 9 21:50:03 bullseye4 wee_import[1156] INFO weeimport.weeimport: source field 'windDirection' in units 'degree_compass' --> WeeWX field 'windDir'
Aug 9 21:50:03 bullseye4 wee_import[1156] INFO weeimport.weeimport: source field 'hiWindSpeed' in units 'km_per_hour' --> WeeWX field 'windGust'
Aug 9 21:50:03 bullseye4 wee_import[1156] INFO weeimport.weeimport: source field 'hiWindDirection' in units 'degree_compass' --> WeeWX field 'windGustDir'
Aug 9 21:50:03 bullseye4 wee_import[1156] INFO weeimport.weeimport: source field 'hiRainRate' in units 'mm_per_hour' --> WeeWX field 'rainRate'
Aug 9 21:50:03 bullseye4 wee_import[1156] INFO weeimport.weeimport: source field 'rain' in units 'mm' --> WeeWX field 'rain'
Aug 9 21:50:03 bullseye4 wee_import[1156] INFO weeimport.weeimport: Raw import data read successfully for period 1.
Aug 9 21:50:03 bullseye4 wee_import[1156] INFO weeimport.weeimport: Mapping raw import data for period 1 ...
Aug 9 21:50:03 bullseye4 wee_import[1156] INFO weeimport.weeimport: Mapped 5 records.
Aug 9 21:50:03 bullseye4 wee_import[1156] INFO weeimport.weeimport: Raw import data mapped successfully for period 1.
Aug 9 21:50:03 bullseye4 wee_import[1156] INFO weeimport.weeimport: Saving mapped data to archive for period 1 ...
Aug 9 21:50:05 bullseye4 wee_import[1156] ERROR weewx.manager: Unable to add record 2021-08-01 00:01:00 AEST (1627740060) to database 'weewx.sdb': UNIQUE constraint failed: archive.dateTime
Aug 9 21:50:05 bullseye4 wee_import[1156] ERROR weewx.manager: Unable to add record 2021-08-01 00:02:00 AEST (1627740120) to database 'weewx.sdb': UNIQUE constraint failed: archive.dateTime
Aug 9 21:50:05 bullseye4 wee_import[1156] ERROR weewx.manager: Unable to add record 2021-08-01 00:03:00 AEST (1627740180) to database 'weewx.sdb': UNIQUE constraint failed: archive.dateTime
Aug 9 21:50:05 bullseye4 wee_import[1156] ERROR weewx.manager: Unable to add record 2021-08-01 00:04:00 AEST (1627740240) to database 'weewx.sdb': UNIQUE constraint failed: archive.dateTime
Aug 9 21:50:05 bullseye4 wee_import[1156] ERROR weewx.manager: Unable to add record 2021-08-01 00:05:00 AEST (1627740300) to database 'weewx.sdb': UNIQUE constraint failed: archive.dateTime
Aug 9 21:50:05 bullseye4 wee_import[1156] INFO weeimport.weeimport: Mapped data saved to archive successfully for period 1.
Aug 9 21:50:05 bullseye4 wee_import[1156] INFO weeimport.weeimport: Calculating missing derived observations ...
Aug 9 21:50:05 bullseye4 wee_import[1156] INFO weewx.engine: StdConvert target unit is 0x1
Aug 9 21:50:05 bullseye4 wee_import[1156] INFO weewx.wxservices: StdWXCalculate will use data binding wx_binding
Aug 9 21:50:05 bullseye4 wee_import[1156] INFO weewx.engine: Archive will use data binding wx_binding
Aug 9 21:50:05 bullseye4 wee_import[1156] INFO weewx.engine: Record generation will be attempted in 'hardware'
Aug 9 21:50:05 bullseye4 wee_import[1156] INFO weewx.engine: Using archive interval of 60 seconds (specified by hardware)
Aug 9 21:50:05 bullseye4 wee_import[1156] INFO weewx.restx: StationRegistry: Registration not requested.
Aug 9 21:50:05 bullseye4 wee_import[1156] INFO weewx.restx: Wunderground: Posting not enabled.
Aug 9 21:50:05 bullseye4 wee_import[1156] INFO weewx.restx: PWSweather: Posting not enabled.
Aug 9 21:50:05 bullseye4 wee_import[1156] INFO weewx.restx: CWOP: Posting not enabled.
Aug 9 21:50:05 bullseye4 wee_import[1156] INFO weewx.restx: WOW: Posting not enabled.
Aug 9 21:50:05 bullseye4 wee_import[1156] INFO weewx.restx: AWEKAS: Posting not enabled.
Aug 9 21:50:05 bullseye4 wee_import[1156] INFO weewx.engine: 'pyephem' detected, extended almanac data is available
Aug 9 21:50:05 bullseye4 wee_import[1156] INFO weewx.wxservices: StdWXCalculate will use data binding wx_binding
Aug 9 21:50:06 bullseye4 wee_import[1156] INFO weewx.manager: Starting backfill of daily summaries
Aug 9 21:50:06 bullseye4 wee_import[1156] INFO weewx.manager: Processed 5 records to backfill 1 day summaries in 0.01 seconds
Aug 9 21:50:06 bullseye4 wee_import[1156] INFO weecfg.database: Processed 1 day consisting of 5 records. 1 day consisting of 5 records were updated in 0.04 seconds.
Aug 9 21:50:06 bullseye4 wee_import[1156] INFO weeimport.weeimport: Finished calculating missing derived observations
Aug 9 21:50:06 bullseye4 wee_import[1156] INFO weeimport.weeimport: Finished import
Aug 9 21:50:06 bullseye4 wee_import[1156] INFO weeimport.weeimport: 5 records were processed and 5 unique records imported in 2.45 seconds.

(Note that in this case the error lines are due to records with the imported data timestamps already being in the archive.) Please post all of the import logs; don't leave anything out.
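As a rough aid for pulling such entries out of a large log file, the sketch below filters lines by the `wee_import[pid]` label and splits out the level, module and message. The regular expression is an assumption based only on the sample lines shown above; other syslog formats may need adjusting. The last sample line is made up to show a non-matching entry.

```python
import re

# Pattern inferred from the sample log lines in this thread (assumption).
LOG_RE = re.compile(
    r"wee_import\[\d+\]\s+(?P<level>[A-Z]+)\s+(?P<module>[\w.]+):\s+(?P<msg>.*)"
)

def wee_import_entries(lines):
    """Return (level, module, message) tuples for wee_import log lines."""
    entries = []
    for line in lines:
        match = LOG_RE.search(line)
        if match:
            entries.append((match.group("level"), match.group("module"), match.group("msg")))
    return entries

sample = [
    "Aug 9 21:50:03 bullseye4 wee_import[1156] INFO __main__: Starting wee_import...",
    "Aug 9 21:50:05 bullseye4 wee_import[1156] ERROR weewx.manager: Unable to add record "
    "2021-08-01 00:01:00 AEST (1627740060) to database 'weewx.sdb': "
    "UNIQUE constraint failed: archive.dateTime",
    "Aug 9 21:50:07 bullseye4 cron[321]: unrelated entry (made up for this example)",
]

entries = wee_import_entries(sample)
errors = [e for e in entries if e[0] == "ERROR"]
print(len(entries), len(errors))
```

To use it on a real log, pass the lines of the file (for example `open("/var/log/syslog")`) instead of `sample`.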

Gary

Andrea Di Saverio

Aug 15, 2022, 1:24:14 PM
to weewx-user
Hello gjr80,
sorry for my late reply, it has been a busy week, and thanks for testing my data.

I was unable to dig anything useful out of the logs: we can consider them gone.
Anyway, checking my `/etc/weewx/weewx.conf` I remembered I have this configuration:

    [StdWXCalculate]
        [[Calculations]]        
            pressure = hardware
            altimeter = hardware
            appTemp = prefer_hardware
            barometer = prefer_hardware
            cloudbase = prefer_hardware
            dewpoint = hardware
            ET = prefer_hardware
            heatindex = hardware
            humidex = prefer_hardware
            inDewpoint = prefer_hardware
            maxSolarRad = prefer_hardware
            rainRate = prefer_hardware
            windchill = hardware
            windrun = prefer_hardware


so I'd bet that is the reason.
I have this configuration because I have a Vantage VUE and I've chosen `loop_request=3` (i.e. alternate between LOOP1 and LOOP2); in that case the configuration above is suggested in the documentation: https://www.weewx.com/docs/usersguide.htm#[Vantage]

I didn't think about it when I imported my data.
So, is there a way now to fix and recalculate those missing values, or do I have to
    1. stop weewx
    2. remove previously imported records from database
    3. change the [StdWXCalculate] -> [[Calculations]] setting everything to `prefer_hardware`
    4. re-run the whole import process
    5. restore everything as before
?

Thanks!

gjr80

Aug 15, 2022, 4:46:53 PM
to weewx-user
Yes, that explains why those five derived obs were not calculated. wee_import uses [StdWXCalculate] [[Calculations]] to calculate derived obs, and in effect the fields in the data being imported are 'hardware' sourced obs. So setting an obs to 'hardware' in [StdWXCalculate] [[Calculations]] prevents that derived obs from being calculated when the import is performed.
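The effect of the three `[[Calculations]]` directives can be sketched as a small decision function. This is a simplified illustration of the documented meaning of `hardware`, `prefer_hardware` and `software`, not the actual StdWXCalculate code; the function name `resolve` and the sample value are made up.

```python
def resolve(directive, hardware_value, calculate):
    """Pick the value for one derived obs according to its directive.

    directive      -- 'hardware', 'prefer_hardware' or 'software'
    hardware_value -- value supplied by the source (None if absent)
    calculate      -- zero-argument callable that computes the value
    """
    if directive == "hardware":
        # never calculate; a missing source value stays missing
        return hardware_value
    if directive == "prefer_hardware":
        # calculate only when the source did not supply a value
        return hardware_value if hardware_value is not None else calculate()
    if directive == "software":
        # always calculate, ignoring any source value
        return calculate()
    raise ValueError(f"unknown directive: {directive!r}")

# During the import the CSV supplied no dewpoint, so hardware_value is None.
print(resolve("hardware", None, lambda: 19.1))
print(resolve("prefer_hardware", None, lambda: 19.1))
```

With `hardware` the missing value stays None, which is what happened to the five fields in this thread; with `prefer_hardware` the value is calculated.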

There are a few possible solutions. One solution is to manually re-calculate the missing obs - though this is not really a practical solution. The second is as you propose; delete the imported records and re-import with updated [StdWXCalculate] settings. The third solution is to use the wee_database utility with the --calc-missing option and updated [StdWXCalculate] settings. wee_import  and wee_database --calc-missing use the same internal routines to calculate derived obs so the results should be identical. wee_database --calc-missing will recalculate derived obs where the derived obs is either missing or None. In essence, in your case (where the derived obs were not calculated) the wee_database approach is identical to re-importing the data but without actually needing to delete and re-import the data. You can use --date or --from and --to to limit the time span of records that wee_database --calc-missing will operate on. Note that if you do not use --date or --from and --to wee_database --calc-missing will recalculate missing derived obs for the entire database. 

I would recommend backing up your database before performing either operation. If you use the wee_database approach I would also recommend stopping WeeWX before using wee_database.
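A minimal sketch of the backup step and of assembling a `wee_database --calc-missing` command line for a limited span follows. The helper names are hypothetical, the database file here is a dummy created in a temporary directory, and the `YYYY-mm-dd` date format is an assumption that should be checked against `wee_database --help`. A plain file copy is only a safe backup of an SQLite database while WeeWX is stopped.

```python
import shutil
import tempfile
from pathlib import Path

def backup_database(db_path, suffix=".bak"):
    """Copy the database file next to itself and return the backup path."""
    src = Path(db_path)
    dst = src.with_name(src.name + suffix)
    if dst.exists():
        raise FileExistsError(f"refusing to overwrite {dst}")
    shutil.copy2(src, dst)  # copy2 also preserves timestamps
    return dst

def calc_missing_args(date=None, from_date=None, to_date=None):
    """Build the argument list for wee_database --calc-missing."""
    if date and (from_date or to_date):
        raise ValueError("use either date or from_date/to_date, not both")
    args = ["wee_database", "--calc-missing"]
    if date:
        args.append(f"--date={date}")
    if from_date:
        args.append(f"--from={from_date}")
    if to_date:
        args.append(f"--to={to_date}")
    return args

# Demonstration on a dummy file, not a real WeeWX database.
with tempfile.TemporaryDirectory() as tmp:
    dummy = Path(tmp) / "weewx.sdb"
    dummy.write_bytes(b"demo")
    print(backup_database(dummy).name)

print(" ".join(calc_missing_args(from_date="2021-08-01", to_date="2021-08-08")))
```

Calling `calc_missing_args()` with no dates gives the unrestricted command, which operates on the entire database as noted above.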

Gary

Andrea Di Saverio

Aug 15, 2022, 4:58:27 PM
to weewx-user
Ok, then I will go for the third solution, and use the `wee_database --calc-missing` utility.
Thank you so much for your support, I'll let you know about the result.

Andrea Di Saverio

Aug 16, 2022, 7:09:53 AM
to weewx-user
I tried, and I ended up with the documented `database is locked` error: https://www.weewx.com/docs/usersguide.htm#'Database_is_locked'_error

I am using a top quality SD card, so I would rule that out as the problem.
I also tried increasing the database timeout: it just hangs for longer before crashing.

The pattern is always the same: it starts smoothly, but after processing exactly 6000 records (the value is printed in the output) it stops, and after a while (once the timeout is reached) it crashes.

According to the documentation there is no way to work around this error... or is there?


This is the exact output:

    pi@raspberrypi:~ $ wee_database --calc-missing
    Using configuration file /etc/weewx/weewx.conf
    Using database binding 'wx_binding', which is bound to database 'archive_sqlite'
    Missing derived observations will be calculated for all records.
    Proceed (y/n)? y
    Calculating missing derived observations...
    Traceback (most recent call last):rd: 2020-09-28 02:07:00 CEST (1601251620)
      File "/usr/share/weewx/weedb/sqlite.py", line 30, in guarded_fn
        return fn(*args, **kwargs)
      File "/usr/share/weewx/weedb/sqlite.py", line 219, in execute
        return sqlite3.Cursor.execute(self, *args, **kwargs)
    sqlite3.OperationalError: database is locked
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/usr/share/weewx/wee_database", line 1170, in <module>
        main()
      File "/usr/share/weewx/wee_database", line 237, in main
        calc_missing(config_dict, db_binding, options)
      File "/usr/share/weewx/wee_database", line 843, in calc_missing
        calc_missing_obj.run()
      File "/usr/share/weewx/weecfg/database.py", line 451, in run
        wxcalculate.do_calculations(record)
      File "/usr/share/weewx/weewx/wxservices.py", line 137, in do_calculations
        new_value = weewx.xtypes.get_scalar(obs_type, data_dict, self.db_manager)
      File "/usr/share/weewx/weewx/xtypes.py", line 77, in get_scalar
        return xtype.get_scalar(obs_type, record, db_manager, **option_dict)
      File "/usr/share/weewx/weewx/wxxtypes.py", line 372, in get_scalar
        return self.pressure(record, dbmanager)
      File "/usr/share/weewx/weewx/wxxtypes.py", line 388, in pressure
        temp_12h_vt = self._get_temperature_12h(record['dateTime'], dbmanager)
      File "/usr/share/weewx/weewx/wxxtypes.py", line 356, in _get_temperature_12h
        record = dbmanager.getRecord(ts_12h, max_delta=self.max_delta_12h)
      File "/usr/share/weewx/weewx/manager.py", line 438, in getRecord
        (time_start_ts, time_stop_ts, timestamp))
      File "/usr/share/weewx/weedb/sqlite.py", line 44, in guarded_fn
        raise weedb.OperationalError(e)
    weedb.OperationalError: database is locked



Thanks

gjr80

Aug 16, 2022, 8:12:14 PM
to weewx-user
Whilst the context of your 'database is locked' error is a little different to the example cited in the Users Guide, it is likely the same mechanism causing the issue. You've tried two of the suggested mitigation strategies (increased timeout and a quality SD card) but you are yet to try the third: reducing the number of queries. The command you used, wee_database --calc-missing, will calculate missing derived obs for every record in the database; this could be many thousands of records. Since the intent of your use of --calc-missing was to correct some improperly imported data, try running wee_database --calc-missing again but with the --from and --to options to limit the number of records processed. You indicated you imported a year of data. I don't know the archive interval used, but if the interval was five minutes, 6000 records is around 21 days (6000 x 5 minutes = 500 hours). So try processing a shorter span than that at a time, and run wee_database --calc-missing a number of times with different --from and --to values until you cover the entire period of the imported data.
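Splitting the imported period into consecutive --from/--to windows can be sketched as below. The function name, the example dates and the window size are illustrative assumptions. Adjacent windows share their boundary date on purpose: since --calc-missing only fills in values that are missing or None, reprocessing the boundary day should be harmless, whereas the exact inclusive/exclusive behaviour of --to is not confirmed here.

```python
from datetime import date, timedelta

def windows(start, stop, days):
    """Yield (from_date, to_date) ISO strings covering start..stop in steps of `days`."""
    if days < 1:
        raise ValueError("days must be at least 1")
    current = start
    while current < stop:
        nxt = min(current + timedelta(days=days), stop)
        yield current.isoformat(), nxt.isoformat()
        current = nxt  # next window starts where this one ended

# Illustrative span only; substitute the real first and last dates of the import.
for from_date, to_date in windows(date(2021, 8, 1), date(2021, 8, 10), days=4):
    print(f"wee_database --calc-missing --from={from_date} --to={to_date}")
```

The window size should be chosen so that each run stays well below the record count at which the lock error appeared.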

The other option is to delete the imported data from your database and use wee_import to re-import the data but with different [StdWXCalculate] [[Calculations]] settings so that the missing derived obs are calculated this time around.

Gary

Tom Keffer

Aug 16, 2022, 8:27:55 PM
to weewx-user
Just a long shot, but are you running weewxd while doing the import?


gjr80

Aug 16, 2022, 8:34:36 PM
to weewx-user
Ah yes, meant to ask that. I think you might mean when running wee_database; the import was OK, it is wee_database --calc-missing that is misbehaving.

Gary

Andrea Di Saverio

Aug 18, 2022, 10:44:39 AM
to weewx-user
In the end I worked around the problem by running the recalculation in batches.
I prepared a bash script, then ran it with

    > yes | ./recalcMissing.sh

The purpose of `yes` is just to answer `y` when prompted for confirmation.
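The contents of `recalcMissing.sh` were not posted. As a rough equivalent, the following sketch runs a list of commands one after another, feeds `y` to each confirmation prompt on standard input (the role played by `yes` above), and stops at the first failure. The function names and the two example commands are hypothetical, and the demonstration uses a dry-run stand-in so that nothing is actually executed.

```python
import subprocess

def run_batches(commands, runner=subprocess.run):
    """Run commands in order, answering 'y' on stdin; stop at the first failure.

    Returns the number of commands that completed successfully.
    """
    completed = 0
    for cmd in commands:
        result = runner(cmd, input="y\n", text=True)
        if result.returncode != 0:
            break  # do not continue past a failed batch
        completed += 1
    return completed

def dry_run(cmd, **kwargs):
    """Stand-in runner that only prints the command instead of executing it."""
    print("would run:", " ".join(cmd))
    return subprocess.CompletedProcess(cmd, 0)

# Example batches; the dates are illustrative only.
batches = [
    ["wee_database", "--calc-missing", "--from=2021-08-01", "--to=2021-08-05"],
    ["wee_database", "--calc-missing", "--from=2021-08-05", "--to=2021-08-09"],
]

print(run_batches(batches, runner=dry_run))
```

Dropping the `runner=dry_run` argument would execute the real commands, so that should only be done with WeeWX stopped and a database backup in place.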

Replying to the other question: no, I was not running the weewxd service while running the import, nor the recalc.


Thanks everybody.