Unexpected behavior with Theta Sketches in Druid 31+


Ben Smithgall

May 1, 2025, 1:34:24 PM
to Druid User
Hello all,

I have a question about Theta Sketch behavior that differs between Druid 29 and Druid 31+.

We generate Theta Sketches in a separate system and then batch-ingest them into Druid. This has worked fine, but while attempting to upgrade we are running into an issue: queries that previously worked are now rejected by the planner with an `INVALID INPUT` error, which I don't think is correct. For example, using a query from the Druid 31 release notes as a baseline, I cannot run Theta Sketch operations on the `user_theta` column (see screenshot one). However, if I ingest the output of that exact query into a datasource, the same operations succeed (see screenshot two). I want to check whether this is expected behavior, because it seems unintentional.

Please note that these screenshots were taken from a Druid cluster running locally with the suggested `docker compose up` setup, with no modifications to the provided sample `environment` or `docker-compose.yml` files.

Thanks,
Ben
Screenshot 1.png
Screenshot 2.png

gi...@imply.io

May 1, 2025, 2:45:18 PM
to Druid User
I think this stopped working after https://github.com/apache/druid/pull/16682. That PR adds a check that is too aggressive: https://github.com/apache/druid/pull/16682/files#diff-c13165d74efe766ced94f9784fba968a9cdc1b1f5b0ba1910e5704f2bd83a725R125-R134. A similar check exists for HLL here: https://github.com/apache/druid/pull/16682/files#diff-ec671a956d4e41999f3d6e190594291446ae7ce434bfb3461607a0fbec8b9cc5R158-R167, but the HLL check only fires if "isValidComplexInputType" returns false.

If you want to try your hand at fixing this, try adding an "isValidComplexInputType" function to ThetaSketchBaseSqlAggregator and using it the same way as the analogous function in HllSketchBaseSqlAggregator. A test case should go in ThetaSketchSqlAggregatorTest. If you raise a PR doing this, I'll take a look. Alternatively, you could file a GitHub issue if you'd rather not write the PR yourself.
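To make the suggested fix concrete, here is a minimal, language-neutral sketch in Python of the guard pattern described for the HLL aggregator: the rejection runs only when an "isValidComplexInputType"-style predicate returns false, so sketch-typed inputs pass through. It does not reproduce Druid's Java code or the exact condition from PR 16682; `ColumnType`, `validate_operand`, `is_valid_complex_input_type`, and the accepted type name `"thetaSketch"` are hypothetical stand-ins chosen for illustration.

```python
# Hypothetical model of the "only reject if not a valid complex input" guard.
# All names are illustrative; they are NOT Druid's classes, methods, or type names.
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class ColumnType:
    kind: str                           # e.g. "STRING", "LONG", or "COMPLEX"
    complex_name: Optional[str] = None  # set only when kind == "COMPLEX"


# Assumption: complex type names the Theta aggregator should accept as input.
THETA_COMPLEX_NAMES: FrozenSet[str] = frozenset({"thetaSketch"})


def is_valid_complex_input_type(t: ColumnType) -> bool:
    """Analogue of the HLL-side "isValidComplexInputType": True for sketch columns."""
    return t.kind == "COMPLEX" and t.complex_name in THETA_COMPLEX_NAMES


def validate_operand(t: ColumnType) -> None:
    """Reject a complex operand only when it is not a valid sketch input."""
    if t.kind == "COMPLEX" and not is_valid_complex_input_type(t):
        raise ValueError(f"INVALID_INPUT: unsupported complex type {t.complex_name!r}")


if __name__ == "__main__":
    validate_operand(ColumnType("COMPLEX", "thetaSketch"))  # sketch input: accepted
    validate_operand(ColumnType("STRING"))                  # non-complex: not checked here
    try:
        validate_operand(ColumnType("COMPLEX", "someOtherSketch"))
    except ValueError as err:
        print(err)  # an unrelated complex type is still rejected
```

The design point is that the type check becomes conditional rather than unconditional: valid sketch columns skip the rejection, while genuinely unsupported complex types still fail at planning time.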

Gian

Ben Smithgall

May 2, 2025, 9:15:39 AM
to Druid User
Hi Gian,

Thanks for the helpful direction; I don't think I could have made any real progress without it. I've submitted a pull request along the lines you described.

Ben
