On May 16, 2020, at 12:55 PM, Vitor Jesus <vitor...@kronikare.ai> wrote:
Hi John, Paul,
I think a BIT that just identifies critical fields is already useful as a reference for good practice, which is always needed. I find myself often having to explain, over and over again in documents, why I am blinding certain fields and not others, when I could just reference community guidance.
However, I am not sure how practical it is to associate a privacy-strength metric with the BIT. There are a few reasons, but chief among them is that resistance to re-identification always depends (as John says) on the underlying de-identification model and the size of the data set.
About models: I think it was Jan who said somewhere that it will help with k-anonymity (absolutely correct, among other models). It will, however, not be very helpful with other "modern" models such as differential privacy (e.g., as boasted by Apple). DP is, in a nutshell, adding noise to the values while keeping them useful.
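To make the "adding noise" point concrete, here is a minimal sketch of the Laplace mechanism, the textbook way DP perturbs a numeric query result. The function names, sensitivity, and epsilon values are illustrative, not anything defined by the BIT or by Apple's implementation:

```python
import math
import random

def laplace_noise(scale: float) -> float:
    # The difference of two i.i.d. exponential draws is Laplace(0, scale).
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def dp_release(true_value: float, sensitivity: float, epsilon: float) -> float:
    # Laplace mechanism: noise with scale = sensitivity / epsilon gives
    # epsilon-differential privacy for a query with that sensitivity.
    # Smaller epsilon => more noise => stronger privacy, less utility.
    return true_value + laplace_noise(sensitivity / epsilon)
```

The released value stays close to the true one for large epsilon and drifts arbitrarily for small epsilon, which is exactly the "useful but noisy" trade-off described above.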
I could imagine adapting the BIT for these cases -- not only identifying such fields but adding a further parameter (or metadata) to say that the fields are also "fuzzed" to within a certain numerical margin.
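As a sketch of what such a parameter might look like, here is a hypothetical BIT entry carrying "fuzzed within a margin" metadata. All field names and keys below are invented for illustration; they are not part of the actual BIT:

```python
# Hypothetical extension of a BIT entry: besides marking a field as
# sensitive, record that released values are fuzzed within a margin.
bit_entry = {
    "field": "date_of_birth",
    "category": "quasi-identifier",   # per the BIT classification
    "treatment": "fuzzed",            # vs. "removed" or "generalised"
    "fuzz_margin": {"unit": "days", "max_deviation": 180},
}

def is_plausible_original(released: float, candidate: float, entry: dict) -> bool:
    # A consumer of the metadata can check whether a candidate true value
    # is consistent with a released (fuzzed) value.
    margin = entry["fuzz_margin"]["max_deviation"]
    return abs(released - candidate) <= margin
```

The point is only that the taxonomy could carry the margin as machine-readable metadata, so downstream users know both that a field was perturbed and by how much at most.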
My 2p
--v
On 16/05/2020 06:16, Paul Knowles wrote:
Great stuff, John. You are 100% correct. I agree that we should specify your deductions in the report.
Paul
On Sat, May 16, 2020 at 3:54 AM John Wunderlich <jo...@jlinclabs.com> wrote:
Hi;
It appears to me that we may be conflating a number of re-identification issues, both on whether to focus on records/fields and on dataset characteristics. As described below, in a post from Privacy Analytics (https://privacy-analytics.com/de-id-university/blog/re-identification-attacks/), one can characterize three kinds of re-identification attacks:
- The first type of attack aims to re-identify a specific person and relies upon preexisting knowledge about a person known to exist in the de-identified database. We term the risk for this as Prosecutor Risk. Attackers can use this to determine previously unknown information about their target.
- The second attack also aims to re-identify an individual but instead uses access to another source of public information about an individual or individuals that are also present in the de-identified dataset. We term this Journalist Risk. Attackers can use this to embarrass or cause reputation damage to either the individual or the owner of the dataset.
- The last attack involves re-identifying as many people as possible from the de-identified data, even if this means some of them will be incorrectly identified. This last one is known as Marketer Risk. This is what most people think of as a privacy breach.
From my point of view, BIT does two things:
- The impact of BIT on individual records in a dataset is to remove identifiers and quasi-identifiers so that any individual record is, by itself, and in the absence of special information about that record, de-identified. This may or may not reduce the individual identity attack risks, depending on what other information the attacker has. In other words, BIT is helpful but not necessarily sufficient for protecting an individual record (the first two attacks described above).
- The impact of BIT on the measurable re-identifiability of the dataset will depend on the content of the non-blinded elements, the techniques used to blind the blinded elements, the overall size of the dataset, and who has access to it (insider, attacker, or the public). Overall, though, BIT can be expected to reduce the risk of dataset attacks to an acceptable level where the owner of the dataset takes other appropriate precautions. This is why I defined blinding in the report the way that I did. It is also the kind of attack that tends to be the focus of regulatory oversight and class-action lawsuits.
The above is why I think it is important to focus the question of re-identification on the characteristics of the dataset that results from the blinding activity, and to measure success using metrics that reflect the percentage or number of records that might be re-identifiable in the dataset as a whole.
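One concrete dataset-level metric along these lines, assuming a k-anonymity-style model as mentioned earlier in the thread, is the fraction of records whose quasi-identifier combination is rarer than k in the dataset. This is only a sketch; the record layout and the choice of k are assumptions for illustration:

```python
from collections import Counter

def reidentifiable_fraction(records, quasi_identifiers, k=2):
    # Fraction of records whose combination of quasi-identifier values
    # occurs fewer than k times in the dataset, i.e. records that fail
    # k-anonymity and are candidates for re-identification.
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    at_risk = sum(1 for key in keys if counts[key] < k)
    return at_risk / len(records)
```

For example, a dataset where one record carries a unique (zip, age) pair while all others share their pair with at least one neighbour would score that one record as at risk, giving exactly the "percentage of re-identifiable records" figure suggested above.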
Thanks,
JW
_____________________________________________________________________________________________________________________________________________
Wg-isi mailing list
Wg-...@kantarainitiative.org
https://kantarainitiative.org/mailman/listinfo/wg-isi
--
Paul Knowles | Stem Cell
Chair of the Advisory Council