hyperUnique


lei feng

Jul 22, 2022, 4:09:16 AM
to Druid User
Hi, how do I use hyperUnique?

I want to get the equivalent of the following query by using hyperUnique:

`SELECT COUNT(DISTINCT(dimension)) FROM <datasource>`

But I can't understand the result. What does it mean?

Help!

384E5778-BEDC-4b58-AA95-0EFED6D528AC.png



Peter Marshall

Jul 22, 2022, 7:38:29 AM
to Druid User
Hey!

I believe the screenshot you're showing is of the ingestion setup? If so, it looks like you're telling Druid to ingest a hyperUnique-type column from your existing data.

First, I would use the official Apache DataSketches "HyperLogLog" (HLL) sketch rather than hyperUnique: https://druid.apache.org/docs/latest/development/extensions-core/datasketches-hll.html
Is there a reason for using hyperUnique?

Secondly, there are two modes of using sketches for approximation in Druid: either only at query time, or by building sketches into the data itself at ingestion time (which is what it looks like you're doing there).

Check out this doc for information on the specific SQL functions to use:

Note that you can apply the function to "a regular column or an HLL sketch column". If you apply it to a sketch column, (a) it's more efficient for your underlying data capacity, and (b) it's faster :)
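As a rough sketch of those two modes, assuming the `druid-datasketches` extension is loaded: the datasource `my_datasource`, the regular column `user_id`, and the ingestion-time HLL sketch column `user_id_sketch` are hypothetical names for illustration, not taken from the screenshot. Query-time only, where the sketch is built from the raw column values while the query runs:

```sql
-- Hypothetical names: my_datasource, user_id (a regular string dimension).
-- The HLL sketch is built at query time from the raw values.
SELECT APPROX_COUNT_DISTINCT_DS_HLL(user_id) AS unique_users
FROM my_datasource
```

Against a column that already holds HLL sketches built at ingestion time, the same function merges the stored sketches instead:

```sql
-- Hypothetical name: user_id_sketch (an HLL sketch column created at ingestion).
-- The stored sketches are merged and the estimate is read from the result.
SELECT APPROX_COUNT_DISTINCT_DS_HLL(user_id_sketch) AS unique_users
FROM my_datasource
```

Either way the result is an estimate of the distinct count, not an exact figure.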

Is this helping?!

- pete

Ben Krug

Jul 25, 2022, 5:18:16 PM
to druid...@googlegroups.com
Just to add further detail, the screenshot shows that Druid will ingest a sketch holding statistics about the dimension that you're rolling up. At query time, you query that sketch column to get the (estimated) distinct counts.
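For a hyperUnique column specifically, a minimal sketch of such a query might look like the following. It assumes the hyperUnique metric was named `dimension_unique` in the ingestion spec and that the datasource is called `my_datasource`; both names are hypothetical.

```sql
-- Hypothetical names: my_datasource, dimension_unique (a hyperUnique metric column).
-- APPROX_COUNT_DISTINCT accepts a regular column or a hyperUnique column.
SELECT APPROX_COUNT_DISTINCT(dimension_unique) AS estimated_distinct
FROM my_datasource
```

The returned number is an approximation of `COUNT(DISTINCT dimension)`, so small deviations from the exact count are expected.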
