I have a classic wide table with about a hundred columns, and I need to store some additional data (scores) for some of the rows. These scores must be updatable, so I decided to keep them in an external dictionary:
```
the_data (Clickhouse table)
Id, field1, field2, .... field99
|
| connect using external dictionary
| layout = cache, id = Id
|
scores (PostgreSQL table)
Id, score1, score2, score3
```
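For context, the dictionary is configured roughly like this (the DSN name, cache size, and lifetime below are placeholders, not my exact values):

```xml
<dictionary>
    <name>scores</name>
    <source>
        <odbc>
            <!-- connection to the PostgreSQL scores table via ODBC -->
            <connection_string>DSN=postgres</connection_string>
            <table>scores</table>
        </odbc>
    </source>
    <lifetime>300</lifetime>
    <layout>
        <cache>
            <size_in_cells>1000000</size_in_cells>
        </cache>
    </layout>
    <structure>
        <id>
            <name>Id</name>
        </id>
        <attribute>
            <name>score1</name>
            <type>Float32</type>
            <null_value>0</null_value>
        </attribute>
        <attribute>
            <name>score2</name>
            <type>UInt16</type>
            <null_value>0</null_value>
        </attribute>
    </structure>
</dictionary>
```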
Now I want to build summary reports over those scores, so I run queries like:
```
SELECT
    avg(dictGetFloat32OrDefault('scores', 'score1', Id, toFloat32(0))) AS score1,
    avg(dictGetUInt16OrDefault('scores', 'score2', Id, toUInt16(0))) AS score2
FROM the_data;

SELECT
    field1,
    avg(dictGetFloat32OrDefault('scores', 'score1', Id, toFloat32(0))) AS score1,
    avg(dictGetUInt16OrDefault('scores', 'score2', Id, toUInt16(0))) AS score2
FROM the_data
GROUP BY field1;
```
The problem arises when the number of rows with scores reaches the millions: even these simple reports take too long to finish.
How can I pre-aggregate those values, or should I use a different storage option for this dictionary?
Or is this use case simply not something ClickHouse can implement efficiently?