Independent weewx backup script


Jan Stelling

Jan 10, 2021, 7:01:48 AM
to weewx-user
For some time, I was looking for an easy way, independent of weewx, to automatically back up my weewx data, as I do not want to lose data if the micro SD card breaks down.
Recently, I found a small repo on GitHub that contains only a backup script. It mounts a USB drive, stops weewx, creates an archived backup of the most relevant user files and folders on the USB drive, unmounts the drive, and restarts weewx.

This was almost perfect for me, but I had to introduce some changes to make it suitable for my environment. I forked it to make it available for others. It now does the following:
  1. Stop weewx
  2. Create an archived backup on a mounted network drive (under /home/pi/Shares/Temp)
  3. Start weewx
I tested it manually and running via crontab on my RasPi 3B.

Maybe this is useful for some of you...
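For anyone who prefers Python over shell, the three steps above can be sketched roughly like this. The paths, the service name, and the use of systemctl are assumptions, so adjust for your install:

```python
import subprocess
import tarfile
from datetime import date

# Hypothetical paths and service name -- adjust for your install.
BACKUP_SOURCES = ["/home/weewx/archive", "/home/weewx/weewx.conf"]
DEST_DIR = "/home/pi/Shares/Temp"

def make_archive(sources, dest_dir):
    """Tar-and-gzip the given paths into a dated archive file."""
    archive = f"{dest_dir}/weewx-backup-{date.today():%Y%m%d}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        for path in sources:
            tar.add(path)
    return archive

def backup():
    """Stop weewx, archive the files, then restart weewx even if archiving fails."""
    subprocess.run(["sudo", "systemctl", "stop", "weewx"], check=True)
    try:
        return make_archive(BACKUP_SOURCES, DEST_DIR)
    finally:
        # Always bring weewx back up, even if the tar step raised.
        subprocess.run(["sudo", "systemctl", "start", "weewx"], check=True)
```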

Tom Keffer

Jan 10, 2021, 8:39:13 AM
to weewx-user
Your approach will certainly work, but requires stopping weewxd for what could potentially be a long period of time, so you might miss a weather event.

Another approach is to use the sqlite3 ".backup" command. Replace your tar command with

tar czf $dest/$archive_file $backup_files2 $backup_files3 $backup_files4
sqlite3 $backup_files1 ".backup $dest/$backup_files1.backup"

This avoids stopping weewxd, because the sqlite3 utility will take care of any necessary locking. However, it has the disadvantage that if sqlite3 holds on to the lock for too long, a database write will get delayed and, ultimately, could time out, causing weewxd to restart.

Finally, the most sophisticated approach is to incrementally back up the database. Take a look at this page on backing up a running database. It copies a number of database pages, then releases the lock, sleeps for a bit to allow other processes to gain access to the database, then goes on to the next set of pages. This allows the database to be backed up without stopping weewxd, and without the potential hazard of a database timeout.
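As a sketch, the page-at-a-time scheme described above maps directly onto Python's sqlite3 backup API (Python 3.7+). The file paths are placeholders, and the pages/pause values are just example settings:

```python
import sqlite3

def incremental_backup(src_path, dst_path, pages=100, pause=0.25):
    """Copy `pages` database pages per step, sleeping `pause` seconds
    between steps so weewxd can grab the write lock in between."""
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    try:
        # Connection.backup runs the lock/copy/sleep loop internally.
        src.backup(dst, pages=pages, sleep=pause)
    finally:
        dst.close()
        src.close()
```

With pages=100 and a quarter-second pause, the backup never holds the lock for long, which is exactly the property described above.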

Something to think about...

-tk



--
You received this message because you are subscribed to the Google Groups "weewx-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to weewx-user+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/weewx-user/2671c065-719a-4435-9657-06841af8fed7n%40googlegroups.com.

p q

Jan 10, 2021, 9:46:55 AM
to weewx...@googlegroups.com
I have a script that stops weewx, makes a local copy of the db, restarts weewx, and then copies the backup to Google Drive.

The copy takes less than 2 minutes so I don't lose data. 

David Levine

Jan 10, 2021, 10:13:08 AM
to weewx-user
I understand database consistency in a transactional database and I'm wondering about the risk of copying weewx.sdb without stopping weewx first. Would you possibly lose an in-flight transaction or might the entire sdb be inconsistent and unusable? An in-flight copy would seem to be similar to losing power to the device vs a graceful shutdown.  

Tom Keffer

Jan 10, 2021, 10:22:43 AM
to weewx-user
If the backup takes long enough, it could interfere with writing a record to the database. Eventually, the write will time out, causing weewxd to restart from the top. It won't crash weewxd (that is, cause it to exit), nor corrupt the database, but the record would be lost. 

That's the advantage of the incremental backup approach. The backup process never holds a lock on the database for very long. Just a second or two.

BTW, the backup API is now included in Python starting with V3.7. If you have a modern version of Python, you don't have to write a C program to access the API.

I'm hoping to inspire someone to write a simple script that would run once a day using the backup API.
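A minimal once-a-day script along these lines might look like the following; the database and backup paths are assumptions, so adjust them for your install:

```python
import sqlite3

SRC = "/home/weewx/archive/weewx.sdb"          # assumed weewx database path
DST = "/home/weewx/archive/weewx.sdb.backup"   # assumed backup destination

def show_progress(status, remaining, total):
    # Called by sqlite3 after each batch of pages is copied.
    print(f"{total - remaining} of {total} pages copied")

def daily_backup(src=SRC, dst=DST):
    with sqlite3.connect(src) as original, sqlite3.connect(dst) as backup:
        # Copy 100 pages per step, releasing the lock between steps.
        original.backup(backup, pages=100, progress=show_progress)
```

Run it from cron once a day, ideally shortly after an archive interval completes.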

-tk



David Levine

Jan 10, 2021, 11:51:32 AM
to weewx-user
wee_database --backup, leveraging the data bindings in weewx.conf, sounds like a beneficial core utility that could then be called from a script or cron.

Timothy L

Jan 10, 2021, 12:06:24 PM
to weewx...@googlegroups.com
For my own personal understanding as a newcomer: how is it possible to lose a data record when using a logger, such as in the Vantage Pro 2 series, which should report once weewx is restarted after the backup? Wouldn't the logger have recorded the weather event and then transferred it once weewx restarted? Thank you

peterq...@gmail.com

Jan 10, 2021, 2:01:30 PM
to weewx-user
Trying it. Seems very slow compared to a file copy. Doing it one page at a time is really slow. Like 100x slower than a file copy. I don't have any sense as to the page size of this database. I'm probably going to end up doing 100 pages at a time. 

The example here https://docs.python.org/3/library/sqlite3.html#sqlite3.Connection.backup is trivial to implement: cut and paste, change the filenames/paths, and it's done.

I'm going to run the backup continuously in a loop on my dev system for a while today and if it runs without problems, I'll update my backup script to use it rather than a file copy.

Tom Keffer

Jan 10, 2021, 2:32:01 PM
to weewx-user
On Sun, Jan 10, 2021 at 9:06 AM Timothy L <lecoqacr...@gmail.com> wrote:
For my own personal understanding as a newcomer I would like to ask how is it possible to lose a data record using a logger such as in the Vantage Pro 2 series that should report once weewx is restarted after the backup? Wouldn't the logger have recorded the weather event and then transferred that event once weewx has restarted? 

True enough for a Vantage station, but not all stations have a logger. In fact, most don't.

Greg from Oz

Jan 10, 2021, 3:43:10 PM
to weewx-user
I use MySQL and keep a week of files, which are backed up to another server. I name each file after the day of the week, so it overwrites last week's file with the same name and you get seven rolling files.

#keep a weeks worth of weewx backups
/usr/bin/mysqldump --add-drop-table --user=root --password=password weewx  >  $FILELOCATION/weewx-$(date +%A).sql

So you end up with 7 rolling files:
-rw-r--r--  1 root   root   512633446 Jan  8 23:15 weewx-Friday.sql
-rw-r--r--  1 root   root   511735369 Jan  4 23:15 weewx-Monday.sql
-rw-r--r--  1 root   root   512857289 Jan  9 23:15 weewx-Saturday.sql
-rw-r--r--  1 root   root   513079187 Jan 10 23:15 weewx-Sunday.sql
-rw-r--r--  1 root   root   512408576 Jan  7 23:15 weewx-Thursday.sql
-rw-r--r--  1 root   root   511958413 Jan  5 23:15 weewx-Tuesday.sql
-rw-r--r--  1 root   root   512183056 Jan  6 23:15 weewx-Wednesday.sql

Works well. I have restored lots of times with success.

Jan Stelling

Jan 10, 2021, 5:32:47 PM
to weewx-user
Good point. Doing all this without stopping weewx is much nicer. In fact, after a year of data (in my case), the sqlite3 approach will not take that long. I will try that as well.

I see that data backup seems to be a hot topic...

vince

Jan 10, 2021, 6:39:39 PM
to weewx-user
This comes up very frequently as new weewx users come onboard.   Take a look back at the weewx-users archives for lots of previous (excellent) discussions for how to automate backing up your db and 'verifying' that the backup is restorable.

There's an old sysadmin credo saying that if you haven't tried a restore, you didn't really do a backup.  Believe it.  Been there.

There was a long discussion years ago (HERE) where lots of folks went through the pros+cons and how they approach the problem. The short answer is to use the 'pragma integrity_check' command in sqlite3 to validate that your backups are good. The other finding was that you probably don't need to shut weewx down to do a backup.
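That pragma is easy to automate; a minimal sketch, assuming a local sqlite file path:

```python
import sqlite3

def backup_is_ok(path):
    """Return True if sqlite3 reports the database file is intact."""
    con = sqlite3.connect(path)
    try:
        # integrity_check returns a single row containing 'ok' when the
        # file passes, or a list of problems when it doesn't.
        (result,) = con.execute("PRAGMA integrity_check").fetchone()
        return result == "ok"
    finally:
        con.close()
```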

The canonical way at that time seemed to be:
  • copy your database to /var/tmp or something to make a copy of the running db
  • use sqlite3 commands to dump 'that' to a text file
  • gzip the dump file up and save it someplace on another system
  • delete your scratch copy of the db
Lots of people have posted a variety of ways to do this with Dropbox, Amazon S3, and simple scp commands, so dig around in the archives a bit for some options.

FWIW, back then I found experimentally that a simple 'copy the database, then gzip the copy and save it' was good enough.  I went back 100 backups and verified that all the backups were good, so I personally don't bother doing a .dump of the db to save a copy.  I just make a copy of the current db and compress+save the copy. The script I cooked up years ago that has always worked for me is (HERE) on Github.
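The copy-then-compress routine can be sketched like this (the directory names and filenames are placeholders; the actual script is the one linked above):

```python
import gzip
import os
import shutil

def copy_and_compress(db_path, scratch_dir, dest_dir):
    """Snapshot the running db, gzip the snapshot, remove the scratch copy."""
    scratch = os.path.join(scratch_dir, "weewx.sdb.copy")
    shutil.copy2(db_path, scratch)                 # copy the running db
    out = os.path.join(dest_dir, "weewx.sdb.gz")
    with open(scratch, "rb") as fin, gzip.open(out, "wb") as fout:
        shutil.copyfileobj(fin, fout)              # compress the copy
    os.remove(scratch)                             # delete the scratch copy
    return out
```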

Graham Eddy

Jan 11, 2021, 8:03:14 AM
to weewx...@googlegroups.com
looking at https://sqlite.org/backup.html (extract below; my emphasis), the Backup API restarts the backup if an update (not a read) occurs during the backup → might silently never complete if backup takes longer than archive interval.
this could be dealt with by aborting the backup if it runs into end of archive interval → tell user to use some other backup method

did a couple of simple timing tests.
RPi 4B (SD card), weewx.sdb 104.5MB (381k records, dunno how many ‘pages'), no progress reporting:
  1 page at a time, default 25 msec delays: backup took 10.5 secs elapsed
  10 pages ditto: 7.9 secs elapsed
  100 pages ditto: 7.0 secs elapsed
  1000 pages ditto: 6.7 secs elapsed
conclusion: only seriously under-powered boxes would be unable to complete within typical 300 sec archive interval.
would be good if someone with such a box gave it a try
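A small harness to reproduce these timings, using time.perf_counter in place of shell time; the paths are placeholders:

```python
import sqlite3
import time

def time_backup(src_path, dst_path, pages):
    """Time one full backup pass at a given pages-per-step setting."""
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    try:
        t0 = time.perf_counter()
        src.backup(dst, pages=pages)
        return time.perf_counter() - t0
    finally:
        dst.close()
        src.close()
```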


File and Database Connection Locking

During the 250 ms sleep in step 3 above, no read-lock is held on the database file and the mutex associated with pDb is not held. This allows other threads to use database connection pDb and other connections to write to the underlying database file.

If another thread or process writes to the source database while this function is sleeping, then SQLite detects this and usually restarts the backup process when sqlite3_backup_step() is next called. There is one exception to this rule: If the source database is not an in-memory database, and the write is performed from within the same process as the backup operation and uses the same database handle (pDb), then the destination database (the one opened using connection pFile) is automatically updated along with the source. The backup process may then be continued after the sqlite3_sleep() call returns as if nothing had happened.

Whether or not the backup process is restarted as a result of writes to the source database mid-backup, the user can be sure that when the backup operation is completed the backup database contains a consistent and up-to-date snapshot of the original. However:

  • Writes to an in-memory source database, or writes to a file-based source database by an external process or thread using a database connection other than pDb are significantly more expensive than writes made to a file-based source database using pDb (as the entire backup operation must be restarted in the former two cases).
  • If the backup process is restarted frequently enough it may never run to completion and the backupDb() function may never return.

vince

Jan 11, 2021, 12:24:16 PM
to weewx-user
On Monday, January 11, 2021 at 5:03:14 AM UTC-8 graha...@gmail.com wrote:
conclusion: only seriously under-powered boxes would be unable to complete within typical 300 sec archive interval.
would be good if someone with such a box gave it a try


If you can point us at the exact script you used, I can run it versus a test db on a litany of seriously under-powered boxes :-)

I have everything from a Seagate Dockstar (128MB RAM, laptop drive, always on the edge of swapping) to pi using SD (model-B, zero, zeroW, pi3, pi3+, pi4)
 

Graham Eddy

Jan 11, 2021, 6:46:44 PM
to weewx...@googlegroups.com
as a once-off i did nothing fancy, even hard-coding the number of pages and filename of database (trivial to fiddle), using shell time to measure elapsed time

import sqlite3

con = sqlite3.connect('/opt/weewx-4.2.0-test/archive/weewx.sdb')
bck = sqlite3.connect('backup.db')
with bck:
    con.backup(bck, pages=1)
bck.close()
con.close()

Tom Keffer

Jan 11, 2021, 7:15:06 PM
to weewx-user
The program can be simplified even more:

import sqlite3

with sqlite3.connect('/home/weewx/archive/weewx.sdb') as original:
    with sqlite3.connect('/home/weewx/archive/weewx.sdb.backup') as backup:
        original.backup(backup, pages=10)

I'm finding that the time it takes to do the backup depends on whether the backup file already exists. It's much faster if it does not, which makes me think that it's actually doing an incremental backup. I don't see any advantage to that.
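One way to sidestep that incremental behavior is to remove any stale backup file before starting; a sketch with placeholder paths:

```python
import os
import sqlite3

def fresh_backup(src_path, dst_path):
    """Delete any previous backup file so the copy starts from scratch."""
    try:
        os.remove(dst_path)
    except FileNotFoundError:
        pass  # no previous backup -- nothing to remove
    with sqlite3.connect(src_path) as src, sqlite3.connect(dst_path) as dst:
        src.backup(dst, pages=100)
```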




vince

Jan 11, 2021, 8:01:34 PM
to weewx-user
For the 1,357,184 records in my 340 MB database's archive table....
Using Tom's variant from a local file to a backup file in the same working directory....

Dockstar         = 46.8 secs via python, 35.1 secs to cp (usb2 laptop drive)
pi3              = 44.3 secs via python, 40.9 secs to cp (SD)
2012 MacBook Air =  2.2 secs via python, 1.5 secs to cp  (SSD)
2018 i3 NUC      =  0.2 secs to do a cp                  (SSD)

Looks to me like it's simpler to just copy the file using the 'cp' command


Tom Keffer

Jan 11, 2021, 8:26:08 PM
to weewx-user
I actually think 'cp' is pretty safe. When WeeWX writes a record to the database, it has to update the main archive, as well as many daily summaries. To make sure this is all done atomically, it does it as one big transaction.

So, unless your 'cp' has the unfortunate timing of happening at the exact moment when the transaction is being committed, you should be fine. I don't know how big a time window that is. Maybe a couple hundred milliseconds?


Graham Eddy

Jan 11, 2021, 8:55:26 PM
to weewx...@googlegroups.com
i use ‘cp’. heck, i use ‘scp’…
i just make sure i work in the window from 2 mins after archive interval (long after my archive updates are completed) to a few secs before next archive interval.
could make BACKUP a new built-in report like FTP and RSYNC - unless you want to keep backup independent of weewx (e.g. the scenario where a restore is required because weewx itself is screwed up and stuffs up reports)

vince

Jan 11, 2021, 9:46:24 PM
to weewx-user
Way back in 2014 (wow) when we had that long thread about this, many of us checked our hundreds of backups with 'pragma integrity_check' and never found any that were not restorable, so I'm pretty ok with taking that risk still.