Dear colleagues,
You are invited to attend a PhD Proposal Presentation by Mykola Grechaniuk.
Date: Tuesday 21st August, 2012
Time: 1:00pm
Location: D1-05, Mawson Lakes
Mykola Grechaniuk, PhD Candidate
Data Analytics Group
Principal Supervisor: Prof Jiuyong Li
Title:
Differential privacy and utility framework for privacy preserving data publishing
Abstract:
There are large quantities of digital data about human populations in government and private data repositories. Such data can provide tremendous opportunities for research communities and policy-makers. For example, demographers can study and analyse the relationships between the economic, social, cultural and biological processes influencing human populations, and policy-makers can analyse the data and learn important information benefiting society as a whole.
The major risk of releasing such data is revealing the private information of individuals. This private information can be misused and abused if it becomes known to an unauthorised person (an adversary). The problem is to release useful data (utility) without compromising the privacy of individuals. Utility and privacy are competing goals: perfect privacy can be achieved by publishing nothing at all, but this has no utility; perfect utility can be obtained by publishing the data exactly as received, but this offers no privacy.
The problem of privacy in databases has a long and rich history stretching back to the 1970s. A robust and formal notion of privacy that satisfies most, if not all, requirements is a very tricky proposition, and there have been many attempts at a definition.
Privacy-preserving data publishing, the focus of the proposed research, studies how to transform raw data into a version that is immunized against privacy attacks but that still supports effective data mining tasks.
During the last several years, privacy approaches for statistical databases have been driven by ε-differential privacy, a rigorous privacy model that makes no assumption about an adversary's background knowledge, i.e. knowledge that an adversary can obtain from outside the published dataset in order to maximise their chances of learning sensitive information about targeted individuals. Recently, some researchers identified another kind of knowledge, called foreground knowledge, which can also be used by an adversary to the same end. This knowledge resides inside the published dataset and can be uncovered by an adversary through data mining. Is ε-differential privacy 'immune' to this foreground knowledge attack on privacy? If not, how can a new formal privacy model be devised that is 'immune' to both foreground and background knowledge attacks?
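For readers unfamiliar with the model: the standard way to satisfy ε-differential privacy for a numeric query is the Laplace mechanism, which adds noise calibrated to the query's sensitivity. The sketch below is illustrative only (the function names and the example count query are not part of the proposal):

```python
import random

def laplace_noise(scale):
    # A Laplace(0, scale) sample, drawn as the difference of two
    # independent exponential samples with rate 1/scale.
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def dp_count(records, predicate, epsilon):
    # A counting query has sensitivity 1: adding or removing one
    # individual changes the count by at most 1. Laplace noise with
    # scale 1/epsilon therefore gives epsilon-differential privacy.
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)
```

Smaller ε means stronger privacy but noisier answers, which is the utility-privacy tension described above in miniature.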
Very recently, some researchers proved that ε-differential privacy implies statistical independence of entries in a dataset. In general, a realistic dataset does not satisfy this assumption. Indeed, consider an outbreak of one or several diseases in a highly populated area: many patients with similar symptoms will visit the same hospitals, which will record the patients' data in their datasets and publish anonymised versions of those datasets. How can a new formal privacy model be devised that guarantees privacy protection regardless of whether or not the entries in a dataset are statistically independent?
Suppose that a prescribed level of privacy is guaranteed by our new formal and rigorous privacy model for every individual in a published dataset. How can this dataset then provide maximum utility to legitimate data users?
The major objective of the proposed research is to devise a novel unified formal privacy and utility framework that provides comprehensive answers to the above questions.
Kind regards,
Kerry