Multiple column in Group By clause

660 views
Skip to first unread message

Anup Tiwari

unread,
Feb 1, 2021, 7:54:54 AM2/1/21
to ksqldb-users
Hi Team,

In confluent 6 , there are some syntactic changes if we want multiple columns in a group by clause and their data too in separate columns(using AS_VALUE function).


The example given in above link is of only single key group by but when i tried to do it with multiple columns(3 columns), it is giving me error saying Non-aggregate SELECT expression(s) not part of GROUP BY: AS_VALUE(col1), AS_VALUE(col2), AS_VALUE(col3)
Either add the column to the GROUP BY or remove it from the SELECT
.


Regards,
Anup Tiwari

Sergio Pena Anaya

unread,
Feb 1, 2021, 11:18:37 AM2/1/21
to Anup Tiwari, ksqldb-users
Hi Anup,

Could you share the SELECT statement? Also, which columns are the keys?

- Sergio

--
You received this message because you are subscribed to the Google Groups "ksqldb-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ksql-users+...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/ksql-users/CAH8KkArp2cpjAxxqsY1iyrDs-S2YbCHzVwLkj3-txkxeMheBbA%40mail.gmail.com.

Anup Tiwari

unread,
Feb 1, 2021, 12:04:26 PM2/1/21
to Sergio Pena Anaya, ksqldb-users
Hi Sergio,

Please find below sample command where i want a composite rowkey of col1,col2 and col3 as well as their individual values in table :-

CREATE TABLE TEST_TABLE WITH (KAFKA_TOPIC='TEST_TABLE', PARTITIONS=10, REPLICAS=2) AS
SELECT
CAST(col1 AS STRING) + '|+|' + CAST(col2 AS STRING) + '|+|' + CAST(col3 AS STRING) as ROWKEY,
AS_VALUE(col1) as col1,
AS_VALUE(col2) as col2,
AS_VALUE(col3) as col3,
latest_by_offset(CAST(dep AS STRING)) as latest_dep
FROM

sample_stream

GROUP BY CAST(col1 AS STRING) + '|+|' + CAST(col2 AS STRING) + '|+|' + CAST(col3 AS STRING)
EMIT CHANGES;



Regards,
Anup Tiwari

Victoria Xia

unread,
Feb 1, 2021, 11:53:02 PM2/1/21
to ksqldb-users
Hi Anup,

The composite rowkey you're creating is the one that ksqlDB will automatically create if you GROUP BY multiple expressions. You can simplify your query to become the following:

CREATE TABLE TEST_TABLE WITH (KAFKA_TOPIC='TEST_TABLE', PARTITIONS=10, REPLICAS=2) AS
SELECT
col1,
col2,
col3,
AS_VALUE(col1) as col1_val,
AS_VALUE(col2) as col2_val,
AS_VALUE(col3) as col3_val,

latest_by_offset(CAST(dep AS STRING)) as latest_dep
FROM

sample_stream

GROUP BY col1, col2, col3
EMIT CHANGES;

This will result in a single key column that is "CAST(col1 AS STRING) + '|+|' + CAST(col2 AS STRING) + '|+|' + CAST(col3 AS STRING)" as desired. The column will be named KSQL_COL_0.

If you want the copies of col1, col2, and col3 in the value to be type STRING specifically, you can cast them if they are not already strings.

Best,
Victoria

P.S. This behavior is changing in ksqlDB 0.15, which will introduce native support for multiple key columns. With ksqlDB 0.15, you'll no longer need to concatenate the different columns together :)

Anup Tiwari

unread,
Feb 2, 2021, 1:16:08 AM2/2/21
to Victoria Xia, ksqldb-users
Hi Victoria,

Thanks for the reply, I will try this and let you know.
Actually the idea behind concatenating was renaming rowkey columns i.e. as you have rightly mentioned that if we go by your approach then the key column will be named as KSQL_COL_0 but i want to rename it to lets say rowkey.

This is from documentation here which has mentioned that "For groupings of multiple expressions, you can’t provide a name for the system-generated key column. You can work around this by combining the grouping columns manually, which enables you to provide an alias, for example:"


Regards,
Anup Tiwari


Victoria Xia

unread,
Feb 2, 2021, 1:38:07 PM2/2/21
to ksqldb-users
In that case you'll have to pursue your original approach instead. You can reconstruct the individual pieces you need (col1, col2, col3) from the new key using the SUBSTRING and INSTR methods. For example, col1 can be replaced with 

SUBSTRING(
CAST(col1 AS STRING) + '|+|' + CAST(col2 AS STRING) + '|+|' + CAST(col3 AS STRING),
1,
INSTR(CAST(col1 AS STRING) + '|+|' + CAST(col2 AS STRING) + '|+|' + CAST(col3 AS STRING), '|+|') - 1
)

and you can construct similar expressions for col2 and col3 as well.

Obviously, this is rather hacky and suboptimal. If you're willing to wait for ksqlDB 0.15, none of this will be necessary anymore as ksqlDB will support multiple key columns (no more string-concatenation necessary).

Best,
Victoria

Reply all
Reply to author
Forward
0 new messages