We’ve now begun the third and final sprint of the NIST Differential Privacy Temporal Map Challenge!
Along with that, we have a few interesting announcements and links to share:
You can check out the Sprint #2 Results on American Community Survey Data:
This sprint used 2012-2018 data, with records linked between years to create simulated individuals with up to seven records each. A simple counting query therefore had sensitivity 7, because adding or removing one individual could change a count by as much as 7. This problem was especially challenging for a variety of very real-world reasons: heterogeneous data, complex variable definitions with structural zeros and consistency rules, sparse map segments, and the sequence length of 7. We found that at some values of epsilon all teams performed equally poorly, but at higher values clever strategies produced some very nice outcomes, including new advances for PGM and marginal-based approaches. If you'd like to see what happened, you can check out the detailed and illustrated Winners' Announcement here. And if you want to try out this benchmark problem for your own research, you can find the Sprint 2 competitor's pack with data, scoring code, and visualizer here: https://github.com/drivendataorg/deid2-runtime/tree/sprint-2
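To make the sensitivity point concrete, here is a minimal sketch of the textbook Laplace mechanism applied to a single count, assuming each individual contributes at most a fixed number of records. It is not any team's solution or the challenge's scoring code, and the helper names (`laplace_noise`, `laplace_scale`, `noisy_count`) are hypothetical. The only idea it illustrates is that the noise scale is sensitivity / epsilon, so seven linked records per person mean seven times more noise than one record would.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponential variables, each with
    mean `scale`, follows a Laplace distribution with that scale.
    """
    rate = 1.0 / scale  # random.expovariate takes a rate (1 / mean)
    return rng.expovariate(rate) - rng.expovariate(rate)


def laplace_scale(epsilon: float, max_records_per_person: int) -> float:
    """Noise scale for a counting query under the Laplace mechanism.

    Adding or removing one person changes a count by at most
    `max_records_per_person`, so that is the query's sensitivity,
    and the Laplace scale is sensitivity / epsilon.
    """
    sensitivity = max_records_per_person
    return sensitivity / epsilon


def noisy_count(true_count: int, epsilon: float,
                max_records_per_person: int, rng: random.Random) -> float:
    """Release a count with epsilon-DP Laplace noise (hypothetical helper)."""
    scale = laplace_scale(epsilon, max_records_per_person)
    return true_count + laplace_noise(scale, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    # Compare one record per person with the seven linked records of Sprint 2.
    for cap in (1, 7):
        print(f"records per person = {cap}: Laplace scale at epsilon=1 is "
              f"{laplace_scale(1.0, cap)}")
    print(noisy_count(100, epsilon=1.0, max_records_per_person=7, rng=rng))
```

At epsilon = 1, the scale is 1 with one record per person and 7 with seven, and it grows further as epsilon shrinks. That helps illustrate why low-epsilon settings were so hard for every team.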
Our Final Sprint just started!
In our third and final sprint we're off to Chicago, and we're getting serious about our temporal data. This data set contains millions of taxi trips across 77 community areas, and the maximum sequence length is now 200 trips per individual taxi driver (one week's work for a real, non-simulated Chicago taxi driver). It's an interesting data set, and a good starting point for tackling synthetic automotive data with realistic individuals. What does our data look like? Some drivers don't work Mondays, some weeks contain St. Patrick's Day, and at least one community area contains O'Hare. Come check it out, and maybe try your hand!
There's no requirement to have participated in previous sprints to join this one, but this is your last chance to join the challenge: final submissions are due May 17 (with algorithm pre-screening due May 10).
A kick-off webinar will take place shortly, on Monday, April 5, 2021, at 12:30 p.m. ET, and you can still join us here. Of course, the recording will also be available for viewing at your leisure on the challenge website.
Open Source and Development Contests: What are you doing with your summer vacation? Would it be improved with $10K? We are inviting anyone who participated in any sprint of the challenge and whose solutions passed differential privacy validation to release their code as open source, and also to put a bit of time in this summer cleaning up their code and making it suitable for use beyond the challenge. We know the hectic pace of the challenge doesn’t leave a lot of room for good software engineering practices, so we're addressing that. We'll be awarding $4K for open source releases (due July 5th), $1K for software development plans (due July 5th), and up to $5K for completed plans and production software (due Oct 8th). We may also help you partner with real-world public safety data analysts to ensure your systems suit their needs. See your hard work pay off and your solutions grow and thrive, and help others well beyond the scope of the challenge. We’ll be posting more details to the Sprint #3 forum soon.
And that’s it for now! We’ll check back in again when it’s time to announce our final winners. If you’d like to take a shot at being one of them (or even if you'd just like to grab our competitor's pack and try playing around with the problem on your own time), then come check us out!