[WG-InfoSharing] Important: Webinar around privacy and risk using BIT and ARX


Jim Pasquale

May 14, 2020, 6:50:13 PM
to wg-consent...@kantarainitiative.org, Information Sharing Work Group, wg-...@kantarainitiative.org
Et al: Please be advised, and feel free to pass this along.

This follows a recent conversation with the task force that is formalizing Paul’s work with BIT into a two-part series of Kantara reports: a light version due in the second quarter and a full compliance report, most likely in Q4.

According to Paul Knowles: A few privacy geeks have earmarked the development of a risk overlay which would allow schema issuers to add a unique sensitivity level to any attributes that have been flagged in a schema base (post referencing the BIT). Many industries perform risk analysis on captured data. For example, "Date of Birth" might have a sensitivity level of 6 whereas "Actual Visit Date" might have a sensitivity level of 2. This information could be very useful when people start combining schemas from different industry sectors. Correlation experts could potentially build strong algorithms to give a definitive risk of correlation in these cross-industry cases.
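To make the overlay idea concrete, here is a minimal sketch in Python. Only the two sensitivity levels (6 and 2) come from the message above. The `OVERLAY` mapping, the `combined_sensitivity` helper, the "Postcode" attribute, and the choice to sum levels when schemas are merged are all assumptions made for illustration, not part of BIT or of any proposed overlay format.

```python
from typing import Dict, Iterable

# Hypothetical risk overlay: flagged attribute name -> sensitivity level.
# The two levels below are the ones quoted in the message; nothing else is specified there.
OVERLAY: Dict[str, int] = {
    "Date of Birth": 6,
    "Actual Visit Date": 2,
}

def combined_sensitivity(attributes: Iterable[str], overlay: Dict[str, int]) -> int:
    """Sum the sensitivity levels of the distinct attributes that the overlay flags.

    Attributes missing from the overlay count as 0. Summation is an assumed
    scoring rule, chosen only to show how a merged schema could be scored.
    """
    return sum(overlay.get(name, 0) for name in set(attributes))

# Two schemas from different (hypothetical) sectors being combined.
schema_a = ["Date of Birth", "Postcode"]
schema_b = ["Actual Visit Date", "Date of Birth"]
merged = set(schema_a) | set(schema_b)

score = combined_sensitivity(merged, OVERLAY)
print(score)
```

A real correlation-risk algorithm would need more than a sum, since the risk comes from how attributes combine rather than from each attribute alone.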

With this in mind, Jan is of the opinion: The risk of re-identification is more a question of combining different attributes to be able to re-identify somebody. Through generalization and k-anonymity these risks can be reduced. Determining the risks is an assessment exercise which is not directly related to BIT. I would move it to an assessment exercise. Have you worked with standards that try to define these assessment criteria?
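As a rough illustration of how generalization raises k, the sketch below computes k-anonymity for a tiny made-up dataset before and after bucketing birth years into decades. It uses plain Python, not ARX or KNIME; the field names, the sample records, and the ten-year bucket width are assumptions chosen for the example.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Return k: the size of the smallest group of records that share
    the same values on all quasi-identifiers."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())

def generalize_birth_year(record, width=10):
    """Replace an exact birth year with a range, e.g. 1984 -> '1980-1989'."""
    start = (record["birth_year"] // width) * width
    out = dict(record)  # copy so the original record is left untouched
    out["birth_year"] = f"{start}-{start + width - 1}"
    return out

# Made-up sample data: every record is unique on (birth_year, region).
records = [
    {"birth_year": 1981, "region": "North"},
    {"birth_year": 1984, "region": "North"},
    {"birth_year": 1987, "region": "North"},
    {"birth_year": 1992, "region": "South"},
    {"birth_year": 1995, "region": "South"},
]
qi = ["birth_year", "region"]

k_before = k_anonymity(records, qi)
k_after = k_anonymity([generalize_birth_year(r) for r in records], qi)
print(k_before, k_after)
```

The result depends entirely on which attributes are treated as quasi-identifiers, which is the assessment question Jan raises.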

It might be of interest to the group: next week I will be conducting a webinar together with a colleague of mine on how to anonymize/generalize PII-type data and assess the risks of re-identification using an open source platform called KNIME. We developed some components that help in this process. And so Jan has passed this along for all of us to consider.

Hi Jim,
Here is a link to the webinar. 

Jan is interested to hear what you think of the assessment methods. Note they are based on ARX, another open source component. He plans to make a reference to Kantara and BIT.

BR,
Jan

https://arx.deidentifier.org



Mark Lizar

Sep 12, 2022, 6:11:08 AM
to Vitor Jesus, wg-consent...@kantarainitiative.org, paul.knowles, wg-...@kantarainitiative.org, John Wunderlich, wg-info...@kantarainitiative.org

Hello, 

Very interesting conversation. BiT and Differential Transparency are interesting topics. We have posted a stable draft of the Operational Transparency Specification.

In it, BiT has an important role and is used as a tool for controlling a record (for personal data control), and a Differential Privacy Ethics case study is presented for discussion around these points.

It also presents an example record that uses BiT, Differential Transparency, Differential Privacy and ZKP for specific attributes that are used as identifiers.

It is great to see that others are wrestling with these points.

Mark

Operational Transparency Specification (currently under expert review); it will be formally opened up for comment shortly.



On May 16, 2020, at 12:55 PM, Vitor Jesus <vitor...@kronikare.ai> wrote:

Hi John, Paul,

I think a BIT that just identifies critical fields is already useful as a reference for good practice, which is always needed. I often find myself having to explain over and over again in documents why I am blinding certain fields and not others, when I could just reference community guidance.

However, I am not sure how practical it is to associate a privacy-strength metric with BIT. There are a few reasons, but chief among them is that resistance to re-id always depends (as John says) on the underlying de-id model and data set size.

About models, I think it was Jan who said somewhere that it will help with k-anonymity (absolutely correct, along with others). It will, however, not be too helpful with other "modern" models such as differential privacy (e.g., as boasted by Apple). DP is, in a nutshell, adding noise to the values while keeping the values useful.
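For readers unfamiliar with the idea, one common way to "add noise while keeping the value useful" is the Laplace mechanism applied to a count. The sketch below is a generic textbook illustration, not a description of Apple's deployment or of anything in BIT; the function name, the epsilon value, and the example count are assumptions.

```python
import random

def noisy_count(true_count, epsilon, rng, sensitivity=1.0):
    """Return true_count plus Laplace(0, sensitivity / epsilon) noise.

    The difference of two independent exponential samples with the same
    rate is Laplace-distributed, which avoids writing the inverse CDF by hand.
    Smaller epsilon means more noise (stronger privacy, less accuracy).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = sensitivity / epsilon
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise

rng = random.Random(0)  # seeded only to make the example repeatable
single_release = noisy_count(100, epsilon=0.5, rng=rng)

# Averaged over many draws the noise cancels out, which is why the
# aggregate stays useful even though each release is perturbed.
samples = [noisy_count(100, epsilon=0.5, rng=rng) for _ in range(20000)]
mean_release = sum(samples) / len(samples)
print(single_release, mean_release)
```

Because the protection comes from the noise rather than from hiding specific fields, flagging fields alone says little about how strong the guarantee is, which is the point made above.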

I could imagine adapting the BIT for these cases -- not only identifying such fields but adding a further parameter (or metadata) to say that the fields are also "fuzzed" to within a certain numerical margin.

My 2p
--v


On 16/05/2020 06:16, Paul Knowles wrote:
Great stuff, John. You are 100% correct. I agree that we should specify your deductions in the report.

Paul

On Sat, May 16, 2020 at 3:54 AM John Wunderlich <jo...@jlinclabs.com> wrote:
Hi;

It appears to me that we may be conflating a number of re-identification issues, in particular whether to focus on records/fields or on dataset characteristics. As described in a post from Privacy Analytics (see https://privacy-analytics.com/de-id-university/blog/re-identification-attacks/), one can characterize three kinds of re-identification attacks:
  • The first type of attack aims to re-identify a specific person and relies upon preexisting knowledge about a person known to exist in the de-identified database. We term the risk for this as Prosecutor Risk. Attackers can use this to determine previously unknown information about their target.
  • The second attack also aims to re-identify an individual but instead uses access to another source of public information about an individual or individuals that are also present in the de-identified dataset. We term this Journalist Risk. Attackers can use this to embarrass or cause reputation damage to either the individual or the owner of the dataset.
  • The last attack involves re-identifying as many people as possible from the de-identified data even if this means some of them will be incorrectly identified. This last one is known as Marketer Risk. This is what most people think of as a privacy breach.
From my point of view, BIT does two things:
  1. The impact of BIT on individual records in a dataset is to remove identifiers and quasi-identifiers so that any individual record is, by itself, and in the absence of special information about that record, de-identified. This may or may not reduce the individual identity attack risks, depending on what other information the attacker has. In other words, BIT is helpful but not necessarily appropriate for protecting an individual record (the first two attacks described above).
  2. The impact of BIT on the measurable amount of re-identifiability of the dataset will depend on the content of the non-blinded elements of the dataset, what techniques were used to blind the blinded elements, the overall size of the dataset, and who has access to the dataset (insider, attacker, or publicly available). But overall, BIT can be expected to reduce the risk of dataset attacks to an acceptable level where the owner of the dataset takes appropriate other precautions. This is why I defined blinding in the report the way that I did. This is also the kind of attack that tends to be the focus of regulatory oversight and class action lawsuits.
The above is why I think it is important to focus the question of re-identification on the characteristics of the dataset that is the result of the blinding activity, and to measure success using measures that reflect the percentage or number of records that might be re-identifiable in the dataset as a whole.
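A minimal sketch of such dataset-level measures, assuming the commonly used equivalence-class formulation (records grouped by identical values on the non-blinded quasi-identifiers, with a record's risk taken as 1 divided by its group size). The thread does not prescribe these formulas; the field names, the sample data, and the 0.4 threshold are illustrative assumptions.

```python
from collections import Counter

def dataset_risk(records, quasi_identifiers, threshold=0.2):
    """Summarize re-identification risk for a blinded dataset as a whole.

    highest_risk  : risk of the most exposed record (1 / smallest class size)
    average_risk  : number of equivalence classes / number of records
    share_at_risk : fraction of records whose individual risk exceeds threshold
    """
    classes = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    n = len(records)
    at_risk = sum(size for size in classes.values() if 1.0 / size > threshold)
    return {
        "highest_risk": 1.0 / min(classes.values()),
        "average_risk": len(classes) / n,
        "share_at_risk": at_risk / n,
    }

# Made-up dataset after blinding: only age band and region remain visible.
blinded = (
    [{"age_band": "30-39", "region": "North"}] * 3
    + [{"age_band": "40-49", "region": "North"}] * 2
    + [{"age_band": "50-59", "region": "South"}] * 1
)

report = dataset_risk(blinded, ["age_band", "region"], threshold=0.4)
print(report)
```

Here the single record in the "50-59 / South" class drives the highest risk, while the share of records above the threshold is the kind of percentage measure described above.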


Thanks,
JW
_______________________________________________
Wg-isi mailing list
Wg-...@kantarainitiative.org
https://kantarainitiative.org/mailman/listinfo/wg-isi


--
Paul Knowles | Stem Cell 
Chair of the Advisory Council


