hyperUnique

lei feng

Jul 22, 2022, 4:09:16 AM

to Druid User

Hi, how do I use hyperUnique?

I want to get the equivalent of the following query using hyperUnique:

`SELECT COUNT(DISTINCT(dimension)) FROM <datasource>`

But I can't understand the result. What does it mean?

Help!

Peter Marshall

Jul 22, 2022, 7:38:29 AM

to Druid User

Hey!

I believe the screenshot you're showing is of the ingestion setup? In that case, it looks like you're telling Druid to ingest a hyperUnique-type column from your existing data.

First, I would use the official Apache DataSketches HLL (HyperLogLog) sketch rather than hyperUnique: https://druid.apache.org/docs/latest/development/extensions-core/datasketches-hll.html

Is there a reason for using hyperUnique?

Secondly, there are two ways to use sketches for approximation in Druid: build them only at query time, or build them at ingestion time so the sketches are stored in the data itself (which is what it looks like you're doing there).

Check out this doc for information on the specific SQL functions to use:

https://druid.apache.org/docs/latest/querying/sql-aggregations.html#sketch-functions

Note that, per that doc, you can use the function on "a regular column or an HLL sketch column". If you run it on a sketch column, (a) it's more efficient in terms of storage for your underlying data, and (b) it's faster :)
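To make the two modes concrete, here is a rough sketch. It assumes the druid-datasketches extension is loaded, and the names `my_datasource`, `user_id` (a regular string column) and `user_id_sketch` (an HLL sketch column built at ingestion) are hypothetical, not taken from your setup. These are two separate queries:

```sql
-- Mode 1: the sketch is built at query time from a regular column.
-- my_datasource and user_id are hypothetical names.
SELECT APPROX_COUNT_DISTINCT_DS_HLL(user_id) AS approx_users
FROM my_datasource

-- Mode 2: the sketch was already built at ingestion time, so the query
-- only merges the stored sketches. user_id_sketch is a hypothetical name.
SELECT APPROX_COUNT_DISTINCT_DS_HLL(user_id_sketch) AS approx_users
FROM my_datasource
```

Either way the result is an estimate of the distinct count, not an exact number.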

Is this helping?!

- pete

Ben Krug

Jul 25, 2022, 5:18:16 PM

to druid...@googlegroups.com

Just to add further detail, the screenshot shows that Druid will ingest a sketch with stats about the dimension that you're rolling up. At query time, you can then query that sketch column to get the (estimated) distinct counts.
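For a hyperUnique column specifically, a query along these lines should return the estimate. This is only a sketch: `my_datasource` and `unique_dim` are hypothetical names standing in for your datasource and for the hyperUnique metric defined in your ingestion spec.

```sql
-- unique_dim is a hypothetical hyperUnique metric created at ingestion.
-- APPROX_COUNT_DISTINCT accepts a regular column or a hyperUnique column.
SELECT APPROX_COUNT_DISTINCT(unique_dim) AS approx_distinct
FROM my_datasource
```

The number that comes back is the approximate equivalent of `COUNT(DISTINCT dimension)`, so it may differ slightly from an exact count.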
