Hi all, as we were enjoying the seminar so much, we decided to have another talk during the semester break. This Wednesday at 13:00 we will have a talk by Seth Pettie in the theory seminar. Note the different time (we will be starting an hour later than usual).
Title: Everything you always wanted to know about Cardinality Estimation* (*but were afraid to ask)
Abstract:
The Cardinality Estimation/Distinct Elements problem is to approximate the number of distinct elements in a data stream using a small probabilistic data structure called a "sketch". This problem has been studied for 40 years, has many industrial applications, and is featured prominently in most courses on Big Data algorithmics. It is therefore a real puzzle to explain why research on this popular and fundamental problem has been unusually slow.
This talk presents a complete history of the Cardinality Estimation problem from Flajolet and Martin's seminal 1983 paper to the present, and includes an account of how the research community became fractured, delaying many natural developments by decades. I will present our recent efforts to achieve information-theoretically optimal cardinality sketches, which draws on two notions of "information" developed in the 20th century: Fisher information (governing optimal point estimation) and Shannon entropy (governing optimal space/communication).
Joint work with Dingyu Wang: