Long report generation time when using series


David Bätge

Jun 9, 2022, 12:00:37 PM
to weewx-development
Hi guys,
 
The group description says: "…It is a forum for exchanging architectural ideas, better ways of doing things...", so here I am to ask how to make things better :)

I recently took my first steps in Python and the weewx ecosystem and built a skin for weewx, see here https://groups.google.com/g/weewx-user/c/sDK4kzO4vBw or here: https://github.com/Daveiano/weewx-wdc.

What's bothering me is the long time it takes to generate the report. On my Raspberry Pi 4B with 2GB, it takes about 50s with data back to March 2021. Another user, on what is suspected to be an older Raspberry Pi model, reports about 180s (data from mid-2020).

My skin includes data tables to show observations. If I hide these tables, the generation time goes down to ~10s on my Pi (a reduction of more than 75%), so these tables seem to be the main problem. I implemented them using the series tag; the output is then processed and rendered via JS.

In the Cheetah template (I substituted some variables with values for better understanding; the original code is here: https://github.com/Daveiano/weewx-wdc/blob/1.x/skins/weewx-wdc/includes/data-table-tile.inc):

var tableRows = $get_table_rows(['outTemp', 'barometer', 'rain', 'humidity', ...], $span($hour_delta=some_hour_delta, $week_delta=some_week_delta), 'year');

Search List extension (https://github.com/Daveiano/weewx-wdc/blob/1.x/bin/user/table_util.py#L72):

    # Note: requires `from datetime import datetime` at module level.
    def get_table_rows(self, obs, period, precision):
        """
        Returns table values for use in carbon data table.

        Args:
            obs (list): $DisplayOptions.get("table_tile_..")
            period (obj): Period to use, e.g. $year, $month, $span
            precision (string): Day, week, month, year, alltime

        Returns:
            list: Carbon data table rows.
        """
        carbon_values = []

        for observation in obs:
            if getattr(period, observation).has_data:
                series = getattr(period, observation).series(
                    aggregate_type=some_aggregate_type,
                    aggregate_interval=some_aggregate_interval,
                    time_series='start',
                    time_unit='unix_epoch'
                ).round(some_rounding)

                for start, data in zip(series.start, series.data):
                    cs_time = datetime.fromtimestamp(start.raw)
                    # The current series item by time (a linear scan over
                    # all rows collected so far).
                    cs_item = list(filter(
                        lambda x: (x['time'] == cs_time.isoformat()),
                        carbon_values
                    ))

                    if len(cs_item) == 0:
                        carbon_values.append({
                            "time": cs_time.isoformat(),
                            # Same None fallback as in the update branch below.
                            observation: data.raw if data.raw is not None else "-",
                            'id': start.raw
                        })
                    else:
                        # Mutating the dict in place already updates the list entry.
                        cs_item[0][observation] = (
                            data.raw if data.raw is not None else "-"
                        )

        # Sort rows by time.
        carbon_values.sort(
            key=lambda item: datetime.fromisoformat(item['time'])
        )

        return carbon_values
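As a side note on the merging itself (independent of the DB queries): the filter above rescans carbon_values for every data point, so merging is quadratic in the number of rows. Keying the rows by timestamp keeps it linear. A minimal standalone sketch, with the series reduced to plain lists of (epoch, value) pairs and no weewx types involved:

```python
from datetime import datetime


def merge_series(series_by_obs):
    """Merge {observation: [(epoch, value), ...]} into one row per timestamp."""
    rows = {}  # epoch -> row dict, giving O(1) lookup instead of a list scan

    for observation, points in series_by_obs.items():
        for epoch, value in points:
            row = rows.setdefault(epoch, {
                "time": datetime.fromtimestamp(epoch).isoformat(),
                "id": epoch,
            })
            row[observation] = value if value is not None else "-"

    # Sorting the epoch keys replaces the final sort-by-time step.
    return [rows[epoch] for epoch in sorted(rows)]
```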


Aggregate intervals are calculated via:

    if precision == 'day':
        return 900 * 8  # 2 hours

    if precision == 'week':
        return 900 * 24  # 6 hours

    if precision == 'month':
        return 900 * 48  # 12 hours

    if precision == 'year' or precision == 'alltime':
        return 3600 * 24  # 1 day


You see what I did there: I call the series method for every observation and then merge the values into an array for use in JS. Something like this is returned:

[
{time: '2022-06-08T02:40:00', outTemp: 13.6, id: 1654648800, outHumidity: 92.8, barometer: 1010.5, …},
{time: '2022-06-08T04:40:00', outTemp: 13.4, id: 1654656000, outHumidity: 88.6, barometer: 1010.5, …},
{...},
...
]


I am guessing these multiple series calls cause the long generation time because the DB gets hit many times. I am using the SQLite DB; would using MySQL make any difference? I'm not a database expert, but I suspect it might.

I would very much appreciate any suggestions for making this more performant; perhaps I missed a smart weewx utility helper?

Just for reference: https://github.com/Daveiano/weewx-wdc/issues/14

jterr...@gmail.com

Jun 12, 2022, 12:29:48 PM
to weewx-development
Hi,
Your skin is very nice and complete. I like it!
I haven't looked at all parts of your code in detail yet, but is the skin generating all the tables (even the historical monthly and yearly tables, which are static) at each archive interval?
If so, one alternative would be to store the table data in text files, much the same way the NOAA reports and/or the monthly and yearly HTML pages are generated with your skin. That way, text files containing past monthly or yearly values that will not change are not recalculated and regenerated anymore.
Your JavaScript code could pick up the right text file(s) to populate the tables.
Database queries (I have no experience with SQLite, only with MySQL) on a Raspberry Pi can take time, and more and more time as the database grows...

David Bätge

Jun 12, 2022, 8:06:02 PM
to weewx-development
Thank you very much for the feedback!

but is the skin generating all the tables (even the historical monthly and yearly tables, which are static) at each archive interval?
No, it does not generate everything at each run. The historical month/year pages follow the same generation logic as the NOAA reports (SummaryByMonth and SummaryByYear reports).

I found out that if I disable only the table on the all-time stats page, generation time goes down by ~50%. So I think a possible solution/workaround would be to work with sensible stale_age values for the templates. E.g. the all-time stats page does not need to be regenerated at each run; once or twice per day should be enough for most cases, I guess:

[CheetahGenerator]
    [[ToDate]]
        [[[statistics]]]
            template = statistics.html.tmpl
            stale_age = 43200 # Twice a day

I guess it simply takes this long? But on the other hand: why is the image generator so fast? The same user who reported the 180s says that the Seasons report takes less than 1 second for the same data.

jterr...@gmail.com

Jun 13, 2022, 4:44:51 PM
to weewx-development
I tested your skin, and I observed that even with tables disabled, the "statistics.html" page takes 2 minutes to generate on a Raspberry Pi 3B+, with a MySQL database containing data from 2005 up to today.

As far as I know, the Seasons skin limits the history to the current year, with an aggregation period of 1 day for all parameters.

The statistics.html page of your skin looks for "all-time" values, and if I am not wrong, it does several separate queries: tables with an aggregation time of 1 day for all parameters, and diagrams with various aggregation periods depending on the parameter. This implies that several all-time queries are performed for the same parameter. Why not first choose a given aggregation time, do one query for each parameter, and use it for both charts and table?
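The "one query per parameter, reused everywhere" idea could be sketched like this (the function and the payload shapes are hypothetical, not weewx API; `fetch_series` stands in for a single weewx series call per observation):

```python
def build_payloads(fetch_series, observations):
    """Query each observation once and reuse the result for both the
    table rows and the chart data.

    fetch_series(obs) is assumed to return a list of (epoch, value) pairs.
    """
    # Exactly one query per observation, kept in a cache.
    cache = {obs: fetch_series(obs) for obs in observations}

    # Table: one row per timestamp, one column per observation.
    table = {}
    for obs, points in cache.items():
        for epoch, value in points:
            table.setdefault(epoch, {"id": epoch})[obs] = value

    # Charts: the same cached points, shaped per observation.
    charts = {obs: {"x": [e for e, _ in pts], "y": [v for _, v in pts]}
              for obs, pts in cache.items()}

    return [table[e] for e in sorted(table)], charts
```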

That said, don't forget that the time needed to produce all-time charts and tables will increase over time anyway, since the database size will also keep growing.

The final result will also depend on the archive interval configured by the user. I suspect that with an archive interval of less than 5 minutes (I know some users have a 1-minute interval), the amount of data to process and the time needed to perform the all-time queries will be a problem, even if it is not done at every archive interval.

Tom Keffer

Jun 14, 2022, 7:09:34 PM
to David Bätge, weewx-development
I'm not exactly sure what queries you are doing, but if they involve all time, then you really want to make sure that you don't have to search the whole database and, instead, use the daily summaries.  To do this:
  1. The aggregation time must be a multiple of a day; and
  2. The start time must be either on a midnight boundary or the first timestamp in the database; and
  3. The end time must either be on a midnight boundary, or the last timestamp in the database.
If any one of these is not true, then the main archive table must be used, and the query time will be much longer. 
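In plain Python, snapping a span to those midnight boundaries looks roughly like this (a sketch using local time, since the daily summaries are organized by local day; the helper name is made up):

```python
import datetime


def snap_to_midnight(epoch_start, epoch_stop):
    """Adjust a span so start and stop fall on local midnight, which
    (together with a day-multiple aggregation interval) allows the daily
    summaries to be used instead of scanning the whole archive table."""
    start = datetime.datetime.fromtimestamp(epoch_start)
    stop = datetime.datetime.fromtimestamp(epoch_stop)

    # Round the start down to midnight of its day...
    start = start.replace(hour=0, minute=0, second=0, microsecond=0)
    # ...and the stop up to the next midnight, unless it is already on one.
    if (stop.hour, stop.minute, stop.second, stop.microsecond) != (0, 0, 0, 0):
        stop = stop.replace(hour=0, minute=0, second=0,
                            microsecond=0) + datetime.timedelta(days=1)

    return int(start.timestamp()), int(stop.timestamp())
```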

Also, in general, SQLite is much faster than MySQL for these kinds of queries. See the Wiki article SQLite vs MySQL.


David Bätge

Jun 14, 2022, 9:02:45 PM
to weewx-development
I tested your skin, and I observed that even with tables disabled, the "statistics.html" page takes 2 minutes to generate on a Raspberry Pi 3B+, with a MySQL database containing data from 2005 up to today.
Thank you for taking the time and sharing this info!

As far as I know, the Season skin is limiting the history to the current year, with an aggregation period of 1 day for all parameters.
Yes, it seems so; I was not aware of that.

[...] Why not first choosing a given aggregation time, doing one query for each parameter, and using it for both charts and table?
This is a really good idea, which I also had in the meantime. There are many "duplicate" queries that could be merged into one. It collides with another idea, though: making the aggregation configurable by the user, individually for the graphs and the tables. I will see which direction I take with that; thanks for the good suggestions!

[...] but if they involve all time, then you really want to make sure [...]
Well, I didn't know that! My all-time queries only meet the first condition. I will update this and report back! Many thanks!

David Bätge

Jun 16, 2022, 7:19:35 PM
to weewx-development
@Tom I changed the behaviour according to your list and the generation time has decreased by 50%.

I added the boundary option to all $span calls. I assume the $week, $month, and $year tags automatically use the daily summaries if the aggregation time is a multiple of a day, since there is no boundary option for them?
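For reference, this is roughly what the adjusted template call looks like now, with the delta placeholders as before:

```
var tableRows = $get_table_rows(
    ['outTemp', 'barometer', 'rain', 'humidity', ...],
    $span($hour_delta=some_hour_delta, $week_delta=some_week_delta, $boundary='midnight'),
    'year'
);
```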

Using only one query for both tables and diagrams would give another boost of ~15%. I will keep that in mind for later; it would need a refactoring of the JavaScript part, which I do not have the time for currently.