Editing db schema, extraTemp1 exception

227 views
Skip to first unread message

Duane Kerzic

unread,
Oct 8, 2020, 9:39:14 PM10/8/20
to weewx-user
Hi,

Thanks for all the help you provided last time around. Thanks in advance this time for your help.

I wanted to clean up weewx.archive table and make it a bit smaller. So I deleted the columns I don't think I'll ever use. But now I'm getting this in the system log.


I'm guessing that extraTemp1 is coded into one of those files but I haven't looked to find out yet.

I've shortened the average row length of the archive table to 126 from 217 bytes. Huge difference when you have 10 years of data.

-dk

Graham Eddy

unread,
Oct 8, 2020, 9:55:35 PM10/8/20
to weewx...@googlegroups.com
i guess ‘huge’ is relative. i have 96MB from 15½ years data → i am totally unconcerned about weewx.sdb size, even if RAM-based

Duane Kerzic

unread,
Oct 8, 2020, 10:22:44 PM10/8/20
to weewx-user
I have over 5. 5 million rows

d k

unread,
Oct 8, 2020, 10:29:14 PM10/8/20
to weewx-user
I tried to post this as an image but it doesn't show. So here is the text.

Oct  8 20:03:19 prometis weewx[271870] ERROR weewx.reportengine: Caught unrecoverable exception in generator 'weewx.imagegenerator.ImageGenerator'
Oct  8 20:03:19 prometis weewx[271870] ERROR weewx.reportengine:         ****  extraTemp1
Oct  8 20:03:19 prometis weewx[271870] ERROR weewx.reportengine:         ****  Traceback (most recent call last):
Oct  8 20:03:19 prometis weewx[271870] ERROR weewx.reportengine:         ****    File "/usr/share/weewx/weewx/reportengine.py", line 197, in run
Oct  8 20:03:19 prometis weewx[271870] ERROR weewx.reportengine:         ****      obj.start()
Oct  8 20:03:19 prometis weewx[271870] ERROR weewx.reportengine:         ****    File "/usr/share/weewx/weewx/reportengine.py", line 280, in start
Oct  8 20:03:19 prometis weewx[271870] ERROR weewx.reportengine:         ****      self.run()
Oct  8 20:03:19 prometis weewx[271870] ERROR weewx.reportengine:         ****    File "/usr/share/weewx/weewx/imagegenerator.py", line 41, in run
Oct  8 20:03:19 prometis weewx[271870] ERROR weewx.reportengine:         ****      self.genImages(self.gen_ts)
Oct  8 20:03:19 prometis weewx[271870] ERROR weewx.reportengine:         ****    File "/usr/share/weewx/weewx/imagegenerator.py", line 176, in genImages
Oct  8 20:03:19 prometis weewx[271870] ERROR weewx.reportengine:         ****      start_vec_t, stop_vec_t ,data_vec_t = weewx.xtypes.get_series(var_type,
Oct  8 20:03:19 prometis weewx[271870] ERROR weewx.reportengine:         ****    File "/usr/share/weewx/weewx/xtypes.py", line 91, in get_series
Oct  8 20:03:19 prometis weewx[271870] ERROR weewx.reportengine:         ****      raise weewx.UnknownType(obs_type)
Oct  8 20:03:19 prometis weewx[271870] ERROR weewx.reportengine:         ****  weewx.UnknownType: extraTemp1
Oct  8 20:03:19 prometis weewx[271870] ERROR weewx.reportengine:         ****  Generator terminated
Oct  8 20:03:19 prometis weewx[271870] DEBUG weewx.reportengine: Report 'SmartphoneReport' not enabled. Skipping.


gjr80

unread,
Oct 8, 2020, 10:54:27 PM10/8/20
to weewx-user
Hi,

First up, thank you for not posting images of text, it’s makes reading/searching logs a real pain.

The error is due to a skin trying to generate a plot that involves extraTemp1 and from the short log extract I would guess that this is from the Seasons skin. If you look in the Seasons skin config file (skins/Seasons/skin.conf) under [ImageGenerator] you will find the daytemp, weektemp, monthtemp and yeartemp plots use extraTemp1 (and extraTemp2 and extraTemp3). Easiest fix is to comment out those plots, eg:

#      [[[daytemp]]]
#           yscale = None, None, 0.5
#           [[[[extraTemp1]]]]
#           [[[[extraTemp2]]]]
#           [[[[extraTemp3]]]]

Save skin.conf and the error should go away on the next report cycle.

Gary

d k

unread,
Oct 8, 2020, 11:02:51 PM10/8/20
to weewx-user
Yup.. I just found that and was about to report back I was trying it that was it. Just restarted the test system to see if it went away. I think I got rid of all of them now.

Gary you are the best.  Thanks so much.

Tom Keffer

unread,
Oct 9, 2020, 9:01:49 AM10/9/20
to weewx-user
Trimming the schema  does not make as big a difference in database size as you might think.

For example, using my own database of 1.4M rows, trimming the schema from 48 observation types to 27, reduces the size from 268MB to 201MB. 

The reason is that most of the space is taken up by the indexes, not the column data.

-tk

--
You received this message because you are subscribed to the Google Groups "weewx-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to weewx-user+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/weewx-user/02d0a56e-c9fc-4e48-a74a-cdb6291474bbn%40googlegroups.com.

d k

unread,
Oct 9, 2020, 11:52:16 AM10/9/20
to weewx-user
The size of the indexes on the archive table are <51mb in both cases. There is no difference here. I totally agree.

I think the reason you don't see a difference in size is because of how null values are stored, I think in 1 byte but haven't found a reference. So yes even if you remove 20 unused types you only remove 20 bytes which as you point out is nothing. But the extra columns still affect read and write performance. Write isn't a big big deal as we don't do lots of writes anyway. But we might do lots of reads depending on what we are doing with our station data and we probably are all running this on inexpensive slow hardware. In my case a RPi but a new one which isn't all that slow other than if you're comparing it to something else that's new. But, for instance it still cut the time to make the daily summiers by more than half. Again not like we do that often so not a huge deal.

This is where the real change probably came from. I also changed the data types of the observations from double (8 bytes) to float (4 bytes). Mysql made the sqllite data type doubles instead of floats. I don't have  REAL_AS_FLOAT set and that's my fault.

I am going to move to FLOAT(n) and set the precision on the columns next which won't change the row length, as the columns are all still 4 bytes, but to make things easier when I use other applications against this data set.

In my case the length of the data went from ~1.1 gb to <650mb in this case. It also reduced the size of the binlogs, which get purged anyway. It also reduced the size of the *ib* files. It cut the time to and size of dumping the table almost by half, I haven't tried restoring yet but expect the same. Queries run faster.

In my opinion there are other reasons to trim the schema to fit your needs other than the size of the data file. But yes it's more work and that depends on how you use your data if it's worth it or not. Obviously I think it's worth it and YMMV.

-dk

Tom Keffer

unread,
Oct 9, 2020, 6:17:00 PM10/9/20
to weewx-user
Sorry. I should have prefaced my comments that they pertain to SQLite. I have no experience with MySQL.

d k

unread,
Oct 10, 2020, 12:40:53 AM10/10/20
to weewx-user
BTW I'm in awe of what you've done with this. It's an amazing effort and I really like what you've done. It works better than many comercial apps I've had to use.

bdf0506

unread,
Oct 11, 2020, 9:09:23 PM10/11/20
to weewx-user
I've found that deleting columns out of the schema of the archive table does more hard than good. I had a 3.x installation for a while, and i had trimmed many columns that I didn't need, and also renamed some. While it would work great for general manual querying of the data, many skins would throw weird errors, mostly when they would expect schemas that weren't there. I moved to 4.x recently, and decided to ditch my custom schema and go with a fresh install with the extended schema. Exporting data from the old column names to then importing to the new column names proved to be trickier than I would have hoped, but eventually got it working and the had to go and manually update many skins I would use. Overall it was a PITA and I wish I would have just stuck with the original schema from the beginning.

tl;dr - stick with the default schemas to save you a headache!

d k

unread,
Oct 13, 2020, 9:17:51 AM10/13/20
to weewx-user
I never used SQLite before and my previous comments were based on mysql. I decided to do an experiment as it was raining yesterday. Read some of the SQLite documentation and installed it. Impressed by how lightweight it is.

I have to agree with everyone that the savings from removing all the unused columns in sqlite is small. In my case 19.3% or from 762,440 kb to 615,352 kb = 147,088 kb with 5,480,150 rows. Not insignificant but it is a small difference.

Since I had the two SQLite database files I decided to try a query. Expected a large time penalty from all the null columns. "SELECT * from archive WHERE datetime BETWEEN 1490563860 and 1546298910;" on both files. I'm doing this on machine other than what I run weewx on, it's much faster.

When executed against the unmodified schema:
Execution finished without errors.
Result: 928677 rows returned in 729ms...

When executed against the modified schema:
Execution finished without errors.
Result: 928677 rows returned in 127ms...

That is without additional columns I need, which drove this to begin with. When they are added in as they are null for all of these records the execution goes to about 429ms for the modified schema.

 A savings of 602ms or a query that runs in 82.6% less time is, well huge when it's a simple select statement.

I haven't tried replacing all the nulls with some value yet. Perhaps '-1' or some other value those columns would never hold. But the next rainy day I just might.

If you aren't doing a lot of queries on the data this also isn't all that big a deal. Which was the reason for installing mysql. The nulls are the reason you can't normalize the data and have it work acceptable.

Tom Keffer

unread,
Oct 13, 2020, 9:28:26 AM10/13/20
to weewx-user
How are you running the two benchmarks? In the same process? SQLite caches pages, so the second query should be much faster.

Try reversing the order.

-tk

Tom Keffer

unread,
Oct 13, 2020, 9:29:18 AM10/13/20
to weewx-user
Oh, one other tip: make sure you VACUUM both databases before comparing sizes.

d k

unread,
Oct 13, 2020, 10:27:18 AM10/13/20
to weewx-user
Thought I vacuumed them but doing it again to be sure. They are separate files and opened in separate processes. I didn't use weewx to create them. Use the following statements. I'll see what happens. They don't all have the same data types. I used integers where that works.

For the unmodified one in file .../weewx_archive.db:
CREATE TABLE "archive" (
"dateTime" int NOT NULL UNIQUE,
"usUnits" int NOT NULL,
"interval" int NOT NULL,
"barometer" REAL DEFAULT NULL,
"pressure" REAL DEFAULT NULL,
"altimeter" REAL DEFAULT NULL,
"inTemp" REAL DEFAULT NULL,
"outTemp" REAL DEFAULT NULL,
"inHumidity" REAL DEFAULT NULL,
"outHumidity" REAL DEFAULT NULL,
"windSpeed" REAL DEFAULT NULL,
"windDir" REAL DEFAULT NULL,
"windGust" REAL DEFAULT NULL,
"windGustDir" REAL DEFAULT NULL,
"rainRate" REAL DEFAULT NULL,
"rain" REAL DEFAULT NULL,
"dewpoint" REAL DEFAULT NULL,
"windchill" REAL DEFAULT NULL,
"heatindex" REAL DEFAULT NULL,
"ET" REAL REAL,
"radiation" REAL DEFAULT NULL,
"UV" REAL REAL,
"extraTemp1" REAL DEFAULT NULL,
"extraTemp2" REAL DEFAULT NULL,
"extraTemp3" REAL DEFAULT NULL,
"soilTemp1" REAL DEFAULT NULL,
"soilTemp2" REAL DEFAULT NULL,
"soilTemp3" REAL DEFAULT NULL,
"soilTemp4" REAL DEFAULT NULL,
"leafTemp1" REAL DEFAULT NULL,
"leafTemp2" REAL DEFAULT NULL,
"extraHumid1" REAL DEFAULT NULL,
"extraHumid2" REAL DEFAULT NULL,
"soilMoist1" REAL DEFAULT NULL,
"soilMoist2" REAL DEFAULT NULL,
"soilMoist3" REAL DEFAULT NULL,
"soilMoist4" REAL DEFAULT NULL,
"leafWet1" REAL DEFAULT NULL,
"leafWet2" REAL DEFAULT NULL,
"rxCheckPercent" REAL DEFAULT NULL,
"txBatteryStatus" REAL DEFAULT NULL,
"consBatteryVoltage" REAL DEFAULT NULL,
"hail" REAL DEFAULT NULL,
"hailRate" REAL DEFAULT NULL,
"heatingTemp" REAL DEFAULT NULL,
"heatingVoltage" REAL DEFAULT NULL,
"supplyVoltage" REAL DEFAULT NULL,
"referenceVoltage" REAL DEFAULT NULL,
"windBatteryStatus" REAL DEFAULT NULL,
"rainBatteryStatus" REAL DEFAULT NULL,
"outTempBatteryStatus" REAL DEFAULT NULL,
"inTempBatteryStatus" REAL DEFAULT NULL,
PRIMARY KEY("dateTime")
);

For the modified one in file .../weewx_elec_archive.db:
CREATE TABLE "archive" (
"dateTime" INTEGER NOT NULL UNIQUE,
"usUnits" INTEGER DEFAULT NULL,
"interval" INTEGER DEFAULT NULL,
"barometer" REAL DEFAULT NULL,
"pressure" REAL DEFAULT NULL,
"altimeter" REAL DEFAULT NULL,
"inTemp" REAL DEFAULT NULL,
"outTemp" REAL DEFAULT NULL,
"inHumidity" INTEGER DEFAULT NULL,
"outHumidity" INTEGER DEFAULT NULL,
"windSpeed" REAL DEFAULT NULL,
"windDir" REAL DEFAULT NULL,
"windGust" REAL DEFAULT NULL,
"windGustDir" REAL DEFAULT NULL,
"rainRate" REAL DEFAULT NULL,
"rain" REAL DEFAULT NULL,
"dewpoint" REAL DEFAULT NULL,
"windchill" REAL DEFAULT NULL,
"heatindex" REAL DEFAULT NULL,
"ET" REAL DEFAULT NULL,
"radiation" INTEGER DEFAULT NULL,
"UV" REAL DEFAULT NULL,
"soilTemp1" REAL DEFAULT NULL,
"leafTemp1" REAL DEFAULT NULL,
"soilMoist1" INTEGER DEFAULT NULL,
"leafWet1" INTEGER DEFAULT NULL,
"rxCheckPercent" REAL DEFAULT NULL,
"txBatteryStatus" INTEGER DEFAULT NULL,
"consBatteryVoltage" REAL DEFAULT NULL,
PRIMARY KEY("dateTime")
);


d k

unread,
Oct 13, 2020, 10:44:20 AM10/13/20
to weewx-user
Maybe this belongs in -development rather than -user?

The whole reason for this exercise is that I want to change the schema anyway to add other data from my ted6000 and perhaps a couple of other items like a sensaphone web600 in addition to the weather envoy. So I need columns that aren't in the wview_extended schema. I've figured out how to fix the reports and having the ability to do that makes this a lot easier.

And I'm still learning about what tk did when he built this. So it's wonderful that he's an email away. You don't get that often.

d k

unread,
Oct 13, 2020, 11:26:24 AM10/13/20
to weewx-user
All vacuumed up. 
 .../weewx_archive.db  770,588,672
 .../weewx_elec_archive.db  630,120,488

Still 140,468 kb smaller or 18.2% Again today this isn't an issue. Rather small amount of space when you can buy a HD for ~$0.04/gb or less and an SD card for ~$0.4/gb or so.

This is a long way from when I started and a mb of HD was $2.79 ($2,790/gb) and had an access time of > 20ms. I guess I old habits are hard to break.

Tom Keffer

unread,
Oct 13, 2020, 12:09:01 PM10/13/20
to weewx-user
I bought a 512 MB disk for a VAX 11 in 1984 for $22,000. It was about the size of a washing machine. These days, I don't think you can buy a disk that small.

d k

unread,
Oct 13, 2020, 2:59:11 PM10/13/20
to weewx-user
I was talking about stuff I had at home. Was doing mechanical engineering in '84. But yes I remember those drives and 8" floppies that came after DEC tapes.

This might be the smallest 'drive' you can buy today but it's a sd card, 128mb, https://smile.amazon.com/Cloudisk-Card-SDXC-Flash-Memory/dp/B07RP7JJFN/ 

The smallest HD is probably about 250gb.

Let me try to do a query again.

tomn...@frontier.com

unread,
Oct 14, 2020, 5:24:12 PM10/14/20
to weewx-user
I dislike one-uppers (me in this case), but when I started doing chip testing the early 80's, we also had PDP-11's (73's to be precise), and with that
full 7' rack, we had 2 of the removable hard disks that were maybe 16" across and if I remember right, 5MB each.  One held the O/S (TSX),
and the other had the chip test programs and test patterns.  My typing skills were pretty embarrassing early on.  We used EDT on a VT-102 at the
time, and TSX was great because we could switch among several terminal sessions.  At some point we got (VMX?) cards so we could hook up
Maxtor 5.25" disks in like 160MB.  Can't remember if that was SCSI or ATA.  I think we also got telnet to work.  Now that was styling...
Chris

d k

unread,
Oct 14, 2020, 10:44:02 PM10/14/20
to weewx-user
I did some more work on this because I was curious about SQLite and how it handled nulls. Other systems I've used in the past didn't do a good job with nulls, but I've been out of this kind of work for a while >10 years now. So I mostly avoided them in past. But they were systems with a lot more users. Not a single user type for use at the edge. It's not like there are going to be lots of concurrent writes to the same table in a SQLite setting or even 1,000's of transactions in an hour.

I tried three files. All with the same ~5.4 million rows.

There is one thing I haven't tested. Inserts. Maybe another time, maybe it isn't worth it.

In one file I filled all the null fields with -1 as a test and compared it to the original. There was no benefit, it ended up being a little slower (about 10% longer) and the table size did grow by 17%. Both on initially pass and on subsequent passes without closing the file. I also tried these two statements as I didn't include pressure previously so there were both nulls and values other than null in that column:

In the table that has nulls;
SELECT * FROM archive where ifnull(pressure,1); # this used to be very slow.

In the table that has -1';
SELECT * FROM archive where pressure = -1;

It seems to able to pick out nulls faster than values, opposite of other systems I've used. I was wrong about nulls in SQLite.

On to my altered schema where I removed all the columns I don't have sensors or calculated values for.

The altered schema is faster on the initial query, ~440ms to ~770ms as I used before, SELECT * from archive WHERE datetime BETWEEN 1490563860 and 1546298910;. But very close to the same after it's cached (about ~20ms) as Tom correctly pointed out. I did make that error. I probably executed the one query 2x and the other one only once without realizing it.

My last test between the shortened schema and the original wview_extended. For the original run of the original query above the smaller scheme is ~470ms compared to ~730ms. Subsequent runs after caching no real difference but still it's still slightly faster, 10ms. I picked a new query to try.
SELECT datetime, windDir FROM archive WHERE windDir = 45;

When cached it's ~1045ms in the shorted schema and ~1141ms in wview_extended. Not huge but on the order of 10%. Again a single user machine probably nothing to write home about. The only thing I found interesting about this one is that there is no real benefit from caching for this one, no idea why, about 30ms longer on initial run.

BTW I sailed on a J/42 on a long time ago. Also had a Flying Scot 4925 for a while.

d k

unread,
Oct 14, 2020, 10:59:52 PM10/14/20
to weewx-user
Chris, Nah, no one upping. Just remembering the old days.

I remember those kinds of drives as well on Wang 2200's in a multiuser set up for my fathers lending business. I don't remember all the model numbers. And old 9 track reel to reel tapes. And yes those 5.25" floppies. And yes being able to switch between terminal session. And the joys of telnet. How things have changed. We have more computing power sitting in our homes now than major research institutions had in huge rooms back then. And it isn't all that long ago.

Tom Keffer

unread,
Oct 15, 2020, 8:05:09 AM10/15/20
to weewx-user
If you feel like it, perhaps you could write a brief summary of your findings and post it in the Wiki. It would be a shame if hard data like this got lost to antiquity.

One lesson to take from this is that database writers, the authors of SQLite in particular, have gotten very good at enhancing performance. As you point out, for a single-user system like this, it's just not worth the effort to try and optimize anything. 

-tk

d k

unread,
Oct 15, 2020, 9:12:00 AM10/15/20
to weewx-user
The only way it would be worth it to optimize this is if this was taking human input rather than logging from some sensors.

I'll also put it in a mysql db and try it there. To see if it makes any difference. The one thing i'm seeing is mysql8 had to have allocated room for the nulls to get filled while SQLite did not. I'm not certain how I have everything set in mysql8. Which totally explains the differences in the file sizes I'm seeing between the two systems. 

Not allocating room for nulls to be filled was one of the big problems with nulls in a db that has a lot of updates in the past, which this one doesn't have. You'd end up very fragmented because there wouldn't be room to update the row where it was. The existing row would get marked deleted and a new row would be appended to the end of the table, and indexes would grow as well. You'd need to do frequent reorgs of the tables. Here as I updated the rows from null to -1 a journal file was created that grew in size larger than the original file. It was also very slow (100s) on the first shot at the table but then speed increased (17s) as I did other columns.

I'll try to write something next rainy day.
Reply all
Reply to author
Forward
0 new messages