Yes, FINAL is slow (under-optimized).
It is implemented in following way:
1. Primary key is read from table (to do merge), even if it isn't used in query.
2. Streams of data from different data parts are merged.
3. Data is processed in single thread (because it is simplest way to do merge).
4. Even if different data parts doesn't overlap by primary keys, they are also going through merge.
5. Merge is not fully optimized, it contains row-by-row processing, virtual function calls, doing small copies etc
(althrough performance is not very bad, it could be done much better).
Optimizations are possible:
1. Don't read primary key and don't merge data, if for some range, it is discovered, that no other data within same range exists.
2. Maybe introduce separate unique key in addition to primary key, because it could be more compact. Example: primary key is (ClientID, Date, UserID, LogRowID) and unique key might be just LogRowID.
3. Do (dynamically) partition of data parts to multiple ranges, to merge them in parallel.
4. Further optimize merge.
We have not done these optimizations, because we don't use FINAL for production queries in serious load, only for rare queries or for small load.
Even ReplacingMergeTree was developed by request of another department, who convinced me, that they will use it in small workload.
And we don't recommend FINAL.
Keep in mind that using FINAL leads to a selection
that includes columns related to the primary key, in addition to the
columns specified in the SELECT. Additionally, the query will be
executed in a single stream, and data will be merged during query
execution. This means that when using FINAL, the query is processed more
slowly. In most cases, you should avoid using FINAL.