New synthetic data generator evaluation tools from NIST

Gary Howarth

Sep 26, 2022, 4:09:07 PM
to opendp-community

Dear OpenDP Community:

The NIST Differential Privacy Challenge Series saw huge gains in synthetic data generation performance and the publication of a variety of open-source tools. Yet evaluating and benchmarking synthetic data remains difficult. NIST has now released new data and evaluation tools to investigate the performance of synthetic data generators. The Diverse Community Excerpt Data are real-world, limited-feature data (22 columns), drawn from the American Community Survey and divided into three distinct geographic partitions. The complimentary beta release of the SDNist Report Generator provides a suite of both machine- and human-readable outputs with more than ten metrics, including univariate and multivariate statistics, database distance metrics, principal component analysis, propensity, basic privacy evaluation, and other information-rich tools.

The Excerpt Data are specifically designed to be responsive to the research community's needs: they are small enough to allow for tractable analysis, yet encompass the real-world complexity of diverse communities. These data and evaluations provide a robust platform for analyzing and comparing synthetic data generators. The machine-readable outputs make it easy to compare approaches, test new methods, and analyze performance on varying populations. Measure the performance of your generator today, and sign up for opportunities from NIST to use these tools.

Gary Howarth, Physical Scientist, NIST, &
Christine Task, Lead Privacy Researcher, Knexus Research Corporation
