LowCardinality(String) bug(?) in HA cluster

34 views
Skip to first unread message

Csongor Fagyal

unread,
Jul 7, 2023, 8:56:48 AM7/7/23
to ClickHouse
Hello,

Please help me with this :)

We have this strange behavior that might or might not be a bug (we suspect the later).

- We do an " INSERT INTO <table> SELECT <complex_query> ", that works well both in standalone, and clustered environments.
--- For the cluster environment, we have 3 nodes, and all tables "mirror" mode, using ReplicatedMergeTree
--- We have about 100 columns, and 1 to 3 million rows
- However, if we change the string columns in <table> to LowCardinality(String), our values get "mixed up" for these columns; most(!) values are OK, but about 1-2% are not
--- This only happens in the cluster
--- Only if we use LowCardinality(String) instead of String
--- It happens with multiple columns, regardless of cardinality, but it's more pronounced where cardinality is high (lost of different values)
--- It's reproducible

We could check that the "...SELECT <complex_query>" part always gives good results, so the mix-up happens when the data is written.

Any ideas what could be behind this, and how to avoid it? Using "String" columns is a workaround, but slows down the table / increases the size...

Thank you,
- Csongor

Csongor Fagyal

unread,
Jul 7, 2023, 9:20:26 AM7/7/23
to ClickHouse
To give a pseudo-example, it looks like this:


Table table1, ReplicatedMergeTree
a UInt32,
b String

Table table2, ReplicatedMergeTree
a UInt32,
b LowCardinality(String)

INSERT <data> INTO table1
INSERT <data> INTO table2

SELECT * FROM table1
(correct values)
1 x
2 x
3 x
4 y
5 y
6 y
7 z
8 z

SELECT * FROM table1
(incorrect values)
1 x
2 x
3 x
4 y
5 x   <= incorrect
6 y
7 z
8 z


I have a hunch (which might be totally wrong, of course) that the mapping for LowCardinality is calculated on all nodes, but since this process is not in sync, it's calculated differently, so during insert, some mapping values are (at least temporarily) different, so the end result will also be different.

Another interesting thing to note is that if we put the SELECT branch into a MATERIALIZED VIEW, then we don't see this behavior - we have the same results for regular String / LowCardinality.

- Csongor

Alexey Milovidov

unread,
Jul 15, 2023, 4:00:31 PM7/15/23
to Csongor Fagyal, ClickHouse
Could you please validate it with the latest ClickHouse version?

--
You received this message because you are subscribed to the Google Groups "ClickHouse" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clickhouse+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/clickhouse/f78330b7-077b-45eb-b0aa-94f0df7f4cd3n%40googlegroups.com.
Reply all
Reply to author
Forward
0 new messages