weectl import skip existing data?

204 views
Skip to first unread message

Ryan Stasel

unread,
Mar 16, 2025, 7:56:17 PM3/16/25
to weewx-user
I lost a few days of data due to a network issue, but I have the data in another system. Am trying to import from csv, but I think my csv overlaps a bit with what's already in the weewx database. 

Doing a dryrun, it says it will import every line in the csv... 

Is there a way to get weectl import to NOT import duplicate data, or is there an easy way to see what data is in those dates so I can modify csv to exclude those entries?

Thanks! 

gjr80

unread,
Mar 16, 2025, 8:59:03 PM3/16/25
to weewx-user
The default weectl import action is to not import records where there is already an archive record in the database with the same timestamp. weectl import will appear to import such records but if you look at the WeeWX log you will see entries similar to:

2023-11-04 15:33:01 weectl-import[3795]: ERROR weewx.manager: Unable to add record 2018-09-04 04:20:00 AEST (1535998800) to database 'weewx.sdb': UNIQUE constraint failed: archive.dateTime

indicating a record with the same timestamp (in this case 2018-09-04 04:20:00 AEST (1535998800)) already exists in the database and the imported record was discarded. Long story, but it was too inefficient to check every record before attempting to add it to the archive, hence the somewhat cumbersome import and notification. Note that if you use the (presently) undocumented --update command line option with weectl import pre-existing records will be overwritten.

Bottom line - just run weectl import as per the docs, and whatever you do don't include --update on your weectl import command line. And of course always make a backup of your database before importing.

Gary

Ryan Stasel

unread,
Mar 17, 2025, 12:14:24 PM3/17/25
to weewx-user
Thanks gjr80. I guess the question is, how do I tell easily what data already exists (the time stamps will have to match, I assume). So "2025-02-28 08:15:00" existing (for example) I assume won't match against an import of "2025-02-28 08:16:00" (let alone if there's seconds other than 00). 

Is there an "easy" way to tell? or am I stuck going into sqlite3 and doing some selects (not hard, just was kind of hoping the import process WOULD report on records already existing). =)

vince

unread,
Mar 17, 2025, 12:25:47 PM3/17/25
to weewx-user
Import is so fast you’re going to spend more time worrying about what ifs than just importing a week of data. If it’s really the same actual station then a db dump on system A and a db restore of the dumped data into systemB get you there very quickly.

gjr80

unread,
Mar 17, 2025, 6:07:39 PM3/17/25
to weewx-user
If you want to tell what data already exists you will need to get your SQL hands dirty (you could use Cheetah produce a WeeWX report that lists all the timestamps in some given period, but that's an awful lot of messing around for little gain). WeeWX archive record timestamps are saved as an integers, with the DBMS enforcing the unique constraint. So a source record with one second difference to an existing record would be imported.

The import process does report on what records were imported and what records were not via the log; imported records will be logged as being added and records that were rejected due to duplicate timestamps will be reported with a unique key error. Arguably a little crude, but to do any more detailed user friendly analysis and reporting would have to expensive time wise. Of course the reporting is after the fact not before.

Gary

wfs...@gmail.com

unread,
Mar 18, 2025, 8:11:07 AM3/18/25
to weewx-user
Attached is a simple sqlite query for finding extra database rows.  In it, both databases are weewx databases, so if you're trying to find differences between weewx and some other database, you'll have to modify the query, but the idea remains the same.

Note that the order of the databases is important.  It reports rows that are IN db1 but NOT IN db2.  If you swapped the order of the databases, the counts would still be correct but it would print no rows because every record in the smaller database is in the larger database.

Walt
Log.txt
Query Extra Records 1.txt

Ryan Stasel

unread,
Mar 18, 2025, 11:05:50 PM3/18/25
to weewx-user
I'm drawing a blank here... I'm fine altering my import data to not include overlapping data... but how the heck do I figure out what's missing? 

While I can convert unixepoch to a readable date, is there an easy way to just do something like

select * from archive where datetime -like "2025-02-28";

Then I can see what's missing and alter the import file.

Thanks! 

vince

unread,
Mar 19, 2025, 12:16:49 AM3/19/25
to weewx-user

… where datetime(dateTime,'unixepoch','localtime') … gets you a more human friendly datetime but you’ll have to work out the syntax for comparison to a date string that does what you want.

Still not understanding why you’re working so hard on this since you know which days your network was down. Heck you can pretty close visually just by looking at the graphs. But see if the syntax above helps any.

Away from computer so I can’t give you a complete example…
Message has been deleted

vince

unread,
Mar 19, 2025, 3:33:58 PM3/19/25
to weewx-user
oops - typo'd the field names in multiple places previously.  Sorry.  I was working off an alternate db with different field names than weewx.  This one matches the weewx schema.

select dateTime,datetime(dateTime,'unixepoch','localtime') from archive where datetime(dateTime,'unixepoch','localtime') like "2024-12-28 09%" limit 2;
1735405200|2024-12-28 09:00:00
1735405500|2024-12-28 09:05:00


On Wednesday, March 19, 2025 at 12:23:53 PM UTC-7 vince wrote:
Ryan - here's an example.  It looks for the first two records from 2024-12-18 starting with 9am localtime.

select datetime,datetime(datetime,'unixepoch','localtime') from archive where datetime(datetime,'unixepoch','localtime') like "2024-12-28 09%" limit 2;
1735405200|2024-12-28 09:00:00
1735405500|2024-12-28 09:05:00

wfs...@gmail.com

unread,
Mar 20, 2025, 2:48:20 PM3/20/25
to weewx-user
Attached is an updated query.  See the comments in it.  It can find missing records by comparing two databases.  It reports the extra records in the more complete database.  That's the same as saying it reports the missing records in the less complete database.

The situation with timestamps not matching up exactly between the two databases raises an issue.  If you create a csv of the missing records to import, do you need to correct the timestamps to match the interval Weewx is expecting?  For example, if you determine that the 09:00:00 entry is missing in weewx, but the 9 o'clock entry in the other database is actually timestamped 09:00:10, do you need to fix that before importing it?  Or does Weewx not really care about that?

Walt
Query Extra Records 1A.txt

vince

unread,
Mar 20, 2025, 3:44:15 PM3/20/25
to weewx-user

FWIW, I think you guys are massively overthinking this….
Reply all
Reply to author
Forward
0 new messages