NIST Demo Day 11/8--Registration, Repos, Recordings, Homework


Christine Task

Nov 8, 2021, 8:42:57 AM
to opendp-c...@g.harvard.edu, Howarth, Gary S. (Fed), Chris Clifton, Christine Task

Hi all,


This is just a quick reminder about this coming Monday's Demo Day for the NIST Differential Privacy Synthetic Data Challenge. We hope you'll be able to join us! But if not--

If you'd like to receive a pointer to the recordings, just fill out the short registration form below (even if you're unable to attend) and we'll send you one. The link will stay live through Tuesday (11/9), so even if you're noticing this a day late, you can still request the recording. The contestants' repos are now live as well--see the links below, and have fun!


Date/Time:  Monday, 11/8, from 11am – 1:30pm ET

Registration Link: Please register here

Agenda:

11:00 am     Welcome -- Gary Howarth, Prize Manager, NIST PSCR

11:10 am     Introduction to the challenge -- Christine Task, Challenge Technical Lead, Knexus Research

11:25 am     Benchmark problems for synthetic data -- Nicolas Grislain, CSO, Sarus Technologies

11:30 am     Demo 1: Minutemen [2nd Place Winner]

12:00 pm     Demo 2: DPSyn [3rd Place Winner]

12:30 pm     Demo 3: Jim King [4th Place Winner]

  1:00 pm     Open Problems Discussion 

  1:30 pm     Conclusion


Topics:
Challenge overview, tutorials on accessing and using the open-sourced winning solutions (so you can experiment with them yourself and try them in your own work and research), and a preview of public benchmark problems and future research. For more background on the challenge, see the previous OpenDP list email with the subject "victory vs sampling error".


The Tools/Code:  
For all three of the developed, open-sourced synthetic data generators, the NIST Differential Privacy Challenge website now lists links to each team's code repository. There you will find executables, source code, quickstart guides with example data, and each team's technical point of contact (for any questions). Our teams spent the summer making sure their data generators would be well documented, fully configurable, and fun to tinker with. We'll give you the formal tour on Monday, but you're welcome to look ahead.

Audience Participation:  
We're planning to take the last half hour for a discussion of a general question of interest: why do these solutions perform so well? We'll review a few tricks that contestants discovered were reliably helpful across all six sprints (covering event data, demographic data, and GPS data) in four years of challenges.

For some of these tricks we can see empirically that they work in diverse contexts, but we're lacking formal analysis: what properties of human data sets enable these techniques, and how can we use them to prove tighter and much more realistic utility performance bounds on algorithms going forward? The plan is to summarize our current findings and outstanding questions in a white paper to support future research. And we'd appreciate your help.

On Monday we'll discuss four things: subsampling/weighting to reduce sensitivity, use of hard public constraints (identifying empty sections of the data space), use of soft public constraints (identifying sparse sections of the data space), and heavy pruning of marginal queries (or vertically partitioning histograms). We'll provide definitions and quick illustrations of what we think are the interesting bits of each technique. But these are just from our own observations, and as great as the challenges have been, we're fully aware that we don't have a monopoly on high-utility DP. So let's make this a potluck. What's the one weird trick you've observed that improves performance in your own work on real-world data? Come with your own explanatory paragraph, citation/reference, technical report/arXiv write-up, or just a captivating anecdote (even if you can't attend), and we'll add it to our collection.
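To make the hard-public-constraint idea concrete before Monday, here is a minimal, entirely hypothetical Python sketch (not any team's actual code): a noisy two-way marginal is released via the standard Laplace mechanism, and then cells that the public schema says are structurally empty are zeroed out in post-processing. The categories, counts, and constraint below are invented for illustration; since the constraint comes from public knowledge rather than the private data, applying it is pure post-processing and costs no additional privacy budget.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-way marginal: age-bracket x status counts.
true_counts = np.array([
    [120,  40,   0],   # under-18: (student, employed, retired)
    [ 60, 300,  15],   # adult
    [  5,  20, 150],   # senior
], dtype=float)

# Hard public constraint (assumed known from the public schema, not the
# private data): "under-18 & retired" is a structurally empty cell.
structural_zeros = np.zeros_like(true_counts, dtype=bool)
structural_zeros[0, 2] = True

epsilon = 1.0
sensitivity = 1.0  # one individual changes one cell count by at most 1

# Standard Laplace mechanism over the full histogram.
noisy = true_counts + rng.laplace(0.0, sensitivity / epsilon, true_counts.shape)

# Post-processing: clamp negative counts and zero the structurally empty
# cells. Post-processing never weakens the DP guarantee, and here it removes
# noise from regions of the data space we publicly know are empty.
cleaned = np.clip(noisy, 0.0, None)
cleaned[structural_zeros] = 0.0
```

The same pattern extends to soft constraints, where publicly known sparse regions get a smaller share of the privacy budget rather than an outright zero.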



Christine Task

Technical Lead, NIST Differential Privacy Synthetic Data Challenges

Lead Privacy Researcher, Knexus Research Corporation

Christi...@knexusresearch.com | https://knexusresearch.com/privacy/
