Pseudonymization risk analysis in distributed systems

Geoffrey Neumann, Paul Grace, Daniel Burns, Michael Surridge

Research output: Contribution to journal › Article

Abstract

In an era of big data, online services are becoming increasingly data-centric; they collect, process, analyze and anonymously disclose growing amounts of personal data in the form of pseudonymized data sets. It is crucial that such systems are engineered both to protect individual user (data subject) privacy and to give control of personal data back to the user. For pseudonymized data, this means that unwanted individuals should not be able to deduce sensitive information about the user. However, the plethora of pseudonymization algorithms and tuneable parameters that currently exist makes it difficult for a non-expert developer (data controller) to understand and realise strong privacy guarantees. In this paper, we propose a principled Model-Driven Engineering (MDE) framework to model data services in terms of their pseudonymization strategies and to identify the risks of breaches of user privacy. A developer can explore alternative pseudonymization strategies and determine the effectiveness of each in terms of quantifiable metrics: i) violations of privacy requirements for every user in the current data set; ii) the trade-off between conforming to these requirements and the usefulness of the data for its intended purposes. We demonstrate through an experimental evaluation that the information provided by the framework is useful, particularly in complex situations where privacy requirements differ between users, and can inform decisions to optimize a chosen strategy in comparison to applying an off-the-shelf algorithm.
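To make the two metrics concrete, the following Python sketch illustrates them under narrow assumptions; it is not the authors' MDE framework. It assumes k-anonymity over quasi-identifiers as the per-user privacy requirement (metric i) and the fraction of records surviving suppression as a crude usefulness proxy (metric ii). All names here (records, QUASI_IDS, violations, utility_after_suppression, k_required) are hypothetical and chosen for illustration only.

    # Minimal sketch, assuming k-anonymity as the privacy requirement.
    # Not the paper's framework; all identifiers are hypothetical.
    from collections import Counter

    # Toy pseudonymized data set: direct identifiers replaced by pseudonyms,
    # generalized quasi-identifiers retained, per-user k requirements attached.
    records = [
        {"pseudonym": "u1", "zip": "501**", "age_band": "30-39", "k_required": 2},
        {"pseudonym": "u2", "zip": "501**", "age_band": "30-39", "k_required": 2},
        {"pseudonym": "u3", "zip": "502**", "age_band": "40-49", "k_required": 3},
    ]

    QUASI_IDS = ("zip", "age_band")

    def violations(data, quasi_ids=QUASI_IDS):
        # Size of each equivalence class: records sharing the same
        # quasi-identifier values are mutually indistinguishable.
        class_sizes = Counter(tuple(r[q] for q in quasi_ids) for r in data)
        # Metric i): a user's requirement is violated when their class is
        # smaller than their individually chosen k.
        return [r["pseudonym"] for r in data
                if class_sizes[tuple(r[q] for q in quasi_ids)] < r["k_required"]]

    def utility_after_suppression(data, quasi_ids=QUASI_IDS):
        # Metric ii): fraction of records kept once violating rows are
        # suppressed -- a crude proxy for the privacy/usefulness trade-off.
        violating = set(violations(data, quasi_ids))
        return sum(r["pseudonym"] not in violating for r in data) / len(data)

    print(violations(records))                 # ['u3'] -- class of size 1, needs k=3
    print(utility_after_suppression(records))  # 0.666... of the data retained

Note how the per-user k_required captures the paper's point that requirements can differ between users: u3 demands stronger protection than u1 and u2, so a single global parameter would either over-suppress or under-protect.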
Original language: English
Article number: 1
Journal: Journal of Internet Services and Applications
Volume: 10
Issue number: 1
DOIs: https://doi.org/10.1186/s13174-018-0098-z
Publication status: Published - 8 Jan 2019

Fingerprint

  • Data privacy
  • Risk analysis
  • Controllers

Bibliographical note

© The Author(s) 2019. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Keywords

  • Privacy
  • Pseudonymization
  • Risk analysis

Cite this

Neumann, G., Grace, P., Burns, D., & Surridge, M. (2019). Pseudonymization risk analysis in distributed systems. Journal of Internet Services and Applications, 10(1), [1]. https://doi.org/10.1186/s13174-018-0098-z