Migrating/Importing WD data in MySql DB to WeeWx MySql DB

266 views
Skip to first unread message

Andrew M

unread,
Aug 19, 2020, 6:48:46 PM8/19/20
to weewx-user
I started to use the wee_import process to process all my WD log files to WeeWx MySqlDB, and it was taking a long time. It seems like If I have many years of data it will take that long to import them into WeeWx MySqlDB.

I then thought, oh wait, i already have the WD data in a MySql DB, so why am i doing this process.

Now I have to figure out how I can gracefully import all the WD data I have in a MySQL DB to the one I set up for WeeWx. Both are on the same hosted server. Different DB names.

There is one table for WD and multiple tables for WeeWx so  have no idea on where to begin with this. I have ~60,000 rows of data in the WD MySQL DB table.

Does anyone have a graceful way of migrating the WD MySQL DB into a WeeWX DB?

Am I overlooking something in the documentation or in group?

Thank you.


gjr80

unread,
Aug 19, 2020, 8:37:29 PM8/19/20
to weewx-user
Sorry can't help you with MySQL to MySQL migration.

Regards wee_import though, yes it can be slow. When I wrote the WD import module the WD user who first used it in anger had something like (from memory) 10 years of data to import. The import had to be done in batches (again from memory) of 2-3 years each (wee_import uses transactions on the database but does keep track of duplicate timestamps and a few other things so memory usage does grow as the span of the import grows). I found one email from the user with the results of the first batch import and 1.4 millions records were imported on a Raspberry Pi in 58 minutes. Al things considered I find that reasonable.

There are a couple of things you can do to speed up wee_import. You can tweak a the tranche setting in the import config file, this alters the size of the transactions (in records) that wee_import uses. The default is 250, you could raise this which will result in fewer db transactions but it will likely increase memory usage so you may need to do the import in smaller batches. One other approach if using a slow(ish) RPi as your WeeWX machine is to do just the import on a faster machine and then copy the imported data to the WeeWX RPi. Granted this is simpler when using SQLite but depending on your setup could be adapted for MySQL.

You say you have 60 000 odd MySQL records, that does not seem like much, how does that correlate with the number of entries in the WD log files?

Gary

Andrew M

unread,
Aug 21, 2020, 7:31:36 PM8/21/20
to weewx-user
Thank you for the response.

I actually changed up how I am going about this. My WD ran on a Windows10 box. I had a RaspPi box sitting in a box so i decided to use that for WeeWx.
I was formatting the hard drive on the Win10 machine when i had the thought that I should just put Debian on that and use that for WeeWx. 
Which I did. This machine is a little faster that the RaspPi , so one I have other things straight I will use that for the conversion. Migrating the MySql from WD to the tables that WeeWX has is probably doable, but really don't want to sit down and think about it that much. I will just let the current weather collect on the RaspPi box and then once the import is done on the now Debian box I can combine the two DB much easier.

I have 60k rows on the MySQL DB with my current webhost, but more than that saved external to that on a hard drive from a previous webhost that I never moved. 

Andrew M

unread,
Sep 1, 2020, 5:08:20 PM9/1/20
to weewx-user
After many, many hours the wee-import processed all but a handful of files. Have several that choked on the import.
Ended up with 3,873,678 records.

Next I need to validate that the data imported to look for any potential bad data.

gjr80

unread,
Sep 1, 2020, 5:20:30 PM9/1/20
to weewx-user
If you to provide some details on problems the handful of files experienced I am happy to look at wee_import to see if any changes can be made to improve its handling of such files. The WD import module of wee_import was developed based on a handful of WD log files found on the internet, so it is quite possible there are some corner cases that may cause wee_import to reject a file.

Gary

Andrew M

unread,
Sep 1, 2020, 6:15:51 PM9/1/20
to weewx-user
Please don't take what i have written as a complaint. 
I appreciate all the hard work it took in the creation of WeeWx and all the contributors to the forum.

I have not looked at the content of those files it choked on to see if it has good data in it. Let me first do that. I will reply back once I have done that.

Thank you.

andrew

gjr80

unread,
Sep 1, 2020, 7:12:29 PM9/1/20
to weewx-user
Didn’t take your post as a complaint, sorry if I came across in a manner that gave that impression. Just wanted to point out that development of the WD module was based on a very small sample of data rather than from some written specification. If the problem was a malformed data file that caused wee_import to abort we may be able to harden wee_import skip the malformed lines. If the problem is a new format/structure in the data file that wee_import rejected then we may need to rework the WD module to handle this new format/structure.

If you want to send any relevant log entries/errors and a copy of a misbehaving data file by direct email thats fine by me.

Gary

Andrew M

unread,
Sep 9, 2020, 11:02:01 PM9/9/20
to weewx-user
Gary,

I do appreciate the assistance. Hopefully this will help others in either with what I am doing or identifying a problem with the import process.

I have attached a copy of one Weather Display log file I am having issues with (82013lg.txt.zip) as well as one I did not have any issues with (52013lg.txt.zip).
Also attached is output from wee_debug

I am not seeing any differences in the data between these files that is causing 82013lg.txt to choke.

~20 files out of ~400 files had an issue.

Please let me know if any additional information is needed.



From running wee_import on 82013lg.txt
pi@weather:/var/tmp $ wee_import --import-config=/var/tmp/wd.conf --dry-run --verbose
Using WeeWX configuration file /etc/weewx/weewx.conf
Starting wee_import...
Weather Display monthly log files in the '/var/tmp/WD' directory will be imported
The following options will be used:
     config=/etc/weewx/weewx.conf, import-config=/var/tmp/wd.conf
     from=None, to=None
     dry-run=True, calc_missing=False, ignore_invalid_data=True
     monthly logs are in US units
     tranche=300, interval=60
     UV=True, radiation=True ignore extreme temperature and humidity=True
Using database binding 'wx_binding', which is bound to database 'roundhillvaw_archive'
Destination table 'archive' unit system is '0x01' (US).
This is a dry run, imported data will not be saved to archive.
Starting dry run import ...
Records covering multiple periods have been identified for import.
Obtaining raw import data for period 1 ...
Traceback (most recent call last):
  File "/usr/share/weewx/wee_import", line 900, in <module>
    main()
  File "/usr/share/weewx/wee_import", line 830, in main
    source_obj.run()
  File "/usr/share/weewx/weeimport/weeimport.py", line 368, in run
    _raw_data = self.getRawData(period)
  File "/usr/share/weewx/weeimport/wdimport.py", line 568, in getRawData
    for rec in _reader:
  File "/usr/lib/python3.7/csv.py", line 112, in __next__
    row = next(self.reader)
_csv.Error: line contains NULL byte



From log file
pi@weather:~ $ sudo tail -f /var/log/weewx.log
Sep  9 22:39:01 weather wee_import[16538] DEBUG weeimport.wdimport: The following options will be used:
Sep  9 22:39:01 weather wee_import[16538] DEBUG weeimport.wdimport:      config=/etc/weewx/weewx.conf, import-config=/var/tmp/wd.conf
Sep  9 22:39:01 weather wee_import[16538] DEBUG weeimport.wdimport:      from=None, to=None
Sep  9 22:39:01 weather wee_import[16538] DEBUG weeimport.wdimport:      dry-run=True, calc_missing=False, ignore_invalid_data=True
Sep  9 22:39:01 weather wee_import[16538] DEBUG weeimport.wdimport:      monthly logs are in US units
Sep  9 22:39:01 weather wee_import[16538] DEBUG weeimport.wdimport:      tranche=300, interval=60
Sep  9 22:39:01 weather wee_import[16538] DEBUG weeimport.wdimport:      UV=True, radiation=True ignore extreme temperature and humidity=True
Sep  9 22:39:01 weather wee_import[16538] INFO weeimport.wdimport: Using database binding 'wx_binding', which is bound to database 'roundhillvaw_archive'
Sep  9 22:39:01 weather wee_import[16538] INFO weeimport.wdimport: Destination table 'archive' unit system is '0x01' (US).
Sep  9 22:39:01 weather wee_import[16538] INFO weeimport.weeimport: Obtaining raw import data for period 1 ...
Sep  9 22:42:24 weather wee_import[16712] INFO __main__: Starting wee_import...
Sep  9 22:42:30 weather wee_import[16712] DEBUG weewx.manager: Daily summary version is 2.0
Sep  9 22:42:30 weather wee_import[16712] INFO weeimport.wdimport: Weather Display monthly log files in the '/var/tmp/WD' directory will be imported
Sep  9 22:42:30 weather wee_import[16712] DEBUG weeimport.wdimport: The following options will be used:
Sep  9 22:42:30 weather wee_import[16712] DEBUG weeimport.wdimport:      config=/etc/weewx/weewx.conf, import-config=/var/tmp/wd.conf
Sep  9 22:42:30 weather wee_import[16712] DEBUG weeimport.wdimport:      from=None, to=None
Sep  9 22:42:30 weather wee_import[16712] DEBUG weeimport.wdimport:      dry-run=True, calc_missing=False, ignore_invalid_data=True
Sep  9 22:42:30 weather wee_import[16712] DEBUG weeimport.wdimport:      monthly logs are in US units
Sep  9 22:42:30 weather wee_import[16712] DEBUG weeimport.wdimport:      tranche=300, interval=60
Sep  9 22:42:30 weather wee_import[16712] DEBUG weeimport.wdimport:      UV=True, radiation=True ignore extreme temperature and humidity=True
Sep  9 22:42:30 weather wee_import[16712] INFO weeimport.wdimport: Using database binding 'wx_binding', which is bound to database 'roundhillvaw_archive'
Sep  9 22:42:30 weather wee_import[16712] INFO weeimport.wdimport: Destination table 'archive' unit system is '0x01' (US).
Sep  9 22:42:30 weather wee_import[16712] INFO weeimport.weeimport: Obtaining raw import data for period 1 ...
Sep  9 22:42:36 weather wee_import[16714] INFO __main__: Starting wee_import...
Sep  9 22:42:42 weather wee_import[16714] DEBUG weewx.manager: Daily summary version is 2.0
Sep  9 22:42:42 weather wee_import[16714] INFO weeimport.wdimport: Weather Display monthly log files in the '/var/tmp/WD' directory will be imported
Sep  9 22:42:42 weather wee_import[16714] DEBUG weeimport.wdimport: The following options will be used:
Sep  9 22:42:42 weather wee_import[16714] DEBUG weeimport.wdimport:      config=/etc/weewx/weewx.conf, import-config=/var/tmp/wd.conf
Sep  9 22:42:42 weather wee_import[16714] DEBUG weeimport.wdimport:      from=None, to=None
Sep  9 22:42:42 weather wee_import[16714] DEBUG weeimport.wdimport:      dry-run=True, calc_missing=False, ignore_invalid_data=True
Sep  9 22:42:42 weather wee_import[16714] DEBUG weeimport.wdimport:      monthly logs are in US units
Sep  9 22:42:42 weather wee_import[16714] DEBUG weeimport.wdimport:      tranche=300, interval=60
Sep  9 22:42:42 weather wee_import[16714] DEBUG weeimport.wdimport:      UV=True, radiation=True ignore extreme temperature and humidity=True
Sep  9 22:42:42 weather wee_import[16714] INFO weeimport.wdimport: Using database binding 'wx_binding', which is bound to database 'roundhillvaw_archive'
Sep  9 22:42:42 weather wee_import[16714] INFO weeimport.wdimport: Destination table 'archive' unit system is '0x01' (US).
Sep  9 22:42:42 weather wee_import[16714] INFO weeimport.weeimport: Obtaining raw import data for period 1 ...


wee_debug-output.txt
82013lg.txt.zip
52013lg.txt.zip

Andrew M

unread,
Sep 16, 2020, 6:39:49 PM9/16/20
to weewx-user
Bumping this post to see if anyone has any suggestions on my the attached is failing import.

Thank you in advance.

gjr80

unread,
Sep 17, 2020, 6:20:12 AM9/17/20
to weewx-user
Apologies Andrew, I misread and thought you provided the data files for use in improving wee_import WD imports rather than seeking help with the import. Other things have taken up my time, I will try to have a look Friday.

Gary

gjr80

unread,
Sep 18, 2020, 7:19:47 AM9/18/20
to weewx-user
Andrew,

Can you try the attached wdimport.py in place of your current version. To install:

1. rename your current wdimport.py:

$ mv /usr/share/weewx/weeimport/wdimport.py /usr/share/weewx/weeimport/wdimport_orig.py

2. download the attached wdimport.py to /usr/share/weewx/weeimport/

3. run the import on the problem files again

This version of wdimport.py checks each log file line for null bytes and if found removes them before the line is parsed. Any such lines are listed on the console and saved to log but given the number of lines being imported they could be quite easy to miss in the log. The 82013 file had a lot of duplicate timestamps (which are listed to screen at the end of the import) so it is quite easy to miss the null byte line output to console as well.

Let us know how this goes and if successful I will include it in the next release.

Gary
wdimport.py

Andrew M

unread,
Sep 22, 2020, 4:51:18 PM9/22/20
to weewx-user

All was good with the import with the new wdimport.py file. 

Appreciate all the development efforts that have been done. 

Thank 
Reply all
Reply to author
Forward
0 new messages