Hello OpenDP Community!
In 2018 and 2020 NIST announced Synthetic Data Challenges here. Now we have something new for you: like a challenge, but collaborative rather than competitive. The Collaborative Research Cycle (CRC) aims to more formally understand data deidentification as a whole (synthetic data methods and others). We provide the data and metrology, the community provides deidentified data, and we all learn together.
Register here for our newsletter, and to see our kick-off webinar on 3/7/23.
Our premise is that the most interesting research problems happen in cycles: first the idea, which many of you already have, then the engineering to implement the idea on a real-world use case, then the engagement with real-world experts, and finally that's when we start to look more closely at the problem and realize "hey, something’s weird here".
And then new things happen. Research, engineering, and engagement lead to better research.
You can learn more about the project here, or check out our project website, but the gist is:
Because, then:
We're hoping for two things from this program. First, some really fun math and data research problems (which we're already seeing). But we are also hoping to accelerate the sort of robust, formal understanding of privacy systems that's necessary to ensure we can deploy them safely, without unexpected negative consequences.
If you'd like to follow along with us as we do all this, you can subscribe to our newsletter. And if you think you might like to participate (either submitting deidentified data samples, or joining in the collaborative research efforts), just register a team.
We already have some great data deidentification techniques in our collection. Do you have one you'd like to add? Register your team, and follow the website directions to submit it!
Christine Task
Lead Privacy Researcher
Knexus Research Corporation
Christi...@knexusresearch.com
Gary Howarth
NIST Scientist, Program Officer
National Institute of Standards and Technology
Hello all--
One minor adjustment to the previous email (below): If you’d like to sign up to the NIST CRC listserv to follow along with our project news and updates, just use this mailto link:
Join CRC listserv for news and updates (send an empty email to subscribe)
Thanks!
--Christine
Hello all,
The National Institute of Standards and Technology CRC program, announced here last month, is well under way using an innovative suite of metrics to collaboratively evaluate/visualize the behaviors of different data deidentification techniques on diverse data. On the off chance you’d like to come collaboratively evaluate it too, we’re sharing preliminary results of interest to the community:
What really is privacy and utility at epsilon 10?
It turns out that can vary significantly, depending on what DP technique you’re using. So far our participants have looked at histograms, marginal methods, GAN and transformer networks, constraint satisfaction methods, and even a genetic algorithm. A nice colorful meta-report comparing these techniques at epsilon 10 is now available on the OpenDP Slack (in the “#crc-office-hours” channel), and we’ll be available there too, to answer questions and chat. Our thanks to OpenDP for hosting our office hours discussions! The Slack link is a simple click-through to check things out; there is no need to have previously been a member of the Slack channel.
If you’d like to learn more about the techniques currently in our collection, see the CRC website. To help contribute new ones (or new samples of existing techniques, with different configs), check our participant instructions. And to follow along with future updates like the one above, you can join our listserv (just send an empty email).
Next month we will be issuing a Call for Papers, soliciting bite-sized (3pg + abstract) workshop papers that identify, explore, and start to analyze some of the fundamental patterns we’re seeing across different algorithms as they attempt to deidentify our diverse communities data. The submission deadline will be 9/29, and we’ll be holding discussions periodically through the summer.
We have interesting observations already in the epsilon 10 report, some of which may have implications for your own research if you’re working in the data deidentification space (and are concerned about performance on diverse populations). If you can take a moment to drop into the Slack and look at our reports, we’d love to have your thoughts.
Feel free to direct any questions to Gary Howarth, NIST or Christine Task, Knexus Research.
Hi all—
What’s your favorite recent privacy research on tabular data (ex: census data, federal agency data—columns of information on individuals)? How do your contributions compare to other people’s?
Want to find out?
This December, NIST is holding a (virtual) debutante ball for our Tabular Benchmark Data, and you’re all invited. The Diverse Communities Excerpts Data is designed to be very challenging but tractable, by experts who are familiar with both requirements. It’s curated from the 2018-2019 American Community Survey, with features and demographically diverse geographies that showcase complex distributions over a manageably small schema. Over the summer we’ve built tools to make it fun and easy to work with, and we’ve collected over 450 deidentified samples of this data from a variety of privacy techniques, research groups and stakeholders.
And we’ve learned a lot already. Grounding diverse research on common benchmark data enables us to efficiently compare, combine, and draw implications across observations from very different groups; we’re accelerating the natural Collaborative Research Cycle.
We’d like to include your research in our work. From now until Nov 7th we’re accepting non-archival 4-page Research Report Submissions (Call for Papers) that apply existing or new research to the Diverse Communities Excerpts benchmark data.
As always, we will use submissions to motivate analysis of opportunities and roadblocks in this research area, and support future programs designed to address them. We welcome you to contribute your perspective. We anticipate this work will result in an improved understanding of data privacy tools and a more comprehensive view of where that understanding is lacking, allowing us to identify new open research problems.
If you have any questions or concerns, or would like to chat or get a tour of our resources, please feel free to reach out to Gary or me!