LoPub: High-Dimensional Crowdsourced Data Publication with Local Differential Privacy

Xuebin Ren, Chia-mu Yu, Weiren Yu, Shusen Yang, Xinyu Yang, Julie A. McCann, Philip S. Yu

Research output: Contribution to journal › Article

Abstract

High-dimensional crowdsourced data collected from numerous users produces rich knowledge about our society. However, it also brings unprecedented privacy threats to the participants. Local differential privacy (LDP), a variant of differential privacy, has recently been proposed as a state-of-the-art privacy notion. Unfortunately, achieving LDP on high-dimensional crowdsourced data publication poses great challenges in terms of both computational efficiency and data utility. To this end, based on the Expectation Maximization (EM) algorithm and Lasso regression, we first propose efficient multi-dimensional joint distribution estimation algorithms with LDP. Then, we develop a Local differentially private high-dimensional data Publication algorithm, LoPub, by taking advantage of our distribution estimation techniques. In particular, correlations among multiple attributes are identified to reduce the dimensionality of crowdsourced data, thus speeding up the distribution learning process and achieving high data utility. Extensive experiments on real-world datasets demonstrate that our multivariate distribution estimation scheme significantly outperforms existing estimation schemes in terms of both communication overhead and estimation speed. Moreover, LoPub maintains, on average, 80% and 60% accuracy over the released datasets in terms of SVM and random forest classification, respectively.
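The abstract names EM-based estimation of a joint distribution from locally privatized reports. As a rough illustration only, and not the LoPub algorithm itself, the sketch below assumes each user holds two binary attributes and perturbs each bit with plain randomized response, splitting the budget epsilon evenly across the two bits. The aggregator then runs EM over the four possible true cells. The function names (`keep_prob`, `perturb`, `likelihood`, `em_joint`) and all parameter values are hypothetical, and the Lasso-based variant mentioned in the abstract is not shown.

```python
import math
import random
from itertools import product

# Hypothetical setup: all possible true values of two binary attributes.
CELLS = list(product((0, 1), repeat=2))


def keep_prob(epsilon, num_bits=2):
    """Probability of reporting a bit truthfully under per-bit randomized
    response, with the total budget epsilon split evenly across num_bits."""
    e = math.exp(epsilon / num_bits)
    return e / (1.0 + e)


def perturb(record, p, rng):
    """Client side: keep each bit with probability p, otherwise flip it."""
    return tuple(bit if rng.random() < p else 1 - bit for bit in record)


def likelihood(observed, true, p):
    """P(observed report | true record) for independent per-bit flips."""
    prob = 1.0
    for o, t in zip(observed, true):
        prob *= p if o == t else 1.0 - p
    return prob


def em_joint(reports, p, max_iters=2000, tol=1e-10):
    """Aggregator side: EM estimate of the true joint distribution over CELLS."""
    counts = {}
    for r in reports:
        counts[r] = counts.get(r, 0) + 1
    n = len(reports)
    theta = {c: 1.0 / len(CELLS) for c in CELLS}  # uniform starting point
    for _ in range(max_iters):
        # E-step: posterior over true cells for each distinct noisy report,
        # weighted by how many users sent that report.
        expected = {c: 0.0 for c in CELLS}
        for obs, k in counts.items():
            weights = {c: theta[c] * likelihood(obs, c, p) for c in CELLS}
            z = sum(weights.values())
            for c in CELLS:
                expected[c] += k * weights[c] / z
        # M-step: new estimate is the expected fraction of users per cell.
        new_theta = {c: expected[c] / n for c in CELLS}
        delta = max(abs(new_theta[c] - theta[c]) for c in CELLS)
        theta = new_theta
        if delta < tol:
            break
    return theta


if __name__ == "__main__":
    rng = random.Random(0)
    truth = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
    p = keep_prob(epsilon=2.0)
    records = rng.choices(CELLS, weights=[truth[c] for c in CELLS], k=50000)
    reports = [perturb(r, p, rng) for r in records]
    for cell, prob in sorted(em_joint(reports, p).items()):
        print(cell, round(prob, 3))
```

Because each bit is perturbed independently with budget epsilon/2, sequential composition gives epsilon-LDP for the pair. The number of cells grows as 2^d with d attributes, which illustrates why the abstract stresses reducing dimensionality before estimating joint distributions.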
Original language: English
Pages (from-to): 2151-2166
Number of pages: 16
Journal: IEEE Transactions on Information Forensics and Security
Volume: 13
Issue number: 9
Early online date: 5 Mar 2018
DOI: 10.1109/TIFS.2018.2812146
Publication status: Published - 1 Sep 2018

Bibliographical note

© 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Keywords

  • local differential privacy
  • high-dimensional data
  • crowdsourced data
  • data publication

Cite this

Ren, Xuebin ; Yu, Chia-mu ; Yu, Weiren ; Yang, Shusen ; Yang, Xinyu ; McCann, Julie A. ; Yu, Philip S. / LoPub: High-Dimensional Crowdsourced Data Publication with Local Differential Privacy. In: IEEE Transactions on Information Forensics and Security. 2018 ; Vol. 13, No. 9. pp. 2151 - 2166.
@article{8f37b4ce680241a6a3b4a8fff36983fe,
title = "LoPub: High-Dimensional Crowdsourced Data Publication with Local Differential Privacy",
abstract = "High-dimensional crowdsourced data collected from numerous users produces rich knowledge about our society. However, it also brings unprecedented privacy threats to the participants. Local differential privacy (LDP), a variant of differential privacy, has recently been proposed as a state-of-the-art privacy notion. Unfortunately, achieving LDP on high-dimensional crowdsourced data publication poses great challenges in terms of both computational efficiency and data utility. To this end, based on the Expectation Maximization (EM) algorithm and Lasso regression, we first propose efficient multi-dimensional joint distribution estimation algorithms with LDP. Then, we develop a Local differentially private high-dimensional data Publication algorithm, LoPub, by taking advantage of our distribution estimation techniques. In particular, correlations among multiple attributes are identified to reduce the dimensionality of crowdsourced data, thus speeding up the distribution learning process and achieving high data utility. Extensive experiments on real-world datasets demonstrate that our multivariate distribution estimation scheme significantly outperforms existing estimation schemes in terms of both communication overhead and estimation speed. Moreover, LoPub maintains, on average, 80{\%} and 60{\%} accuracy over the released datasets in terms of SVM and random forest classification, respectively.",
keywords = "local differential privacy, high-dimensional data, crowdsourced data, data publication",
author = "Ren, Xuebin and Yu, Chia-mu and Yu, Weiren and Yang, Shusen and Yang, Xinyu and McCann, Julie A. and Yu, Philip S.",
note = "{\textcopyright} 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.",
year = "2018",
month = sep,
day = "1",
doi = "10.1109/TIFS.2018.2812146",
language = "English",
volume = "13",
pages = "2151 -- 2166",
journal = "IEEE Transactions on Information Forensics and Security",
issn = "1556-6013",
publisher = "IEEE",
number = "9",

}

LoPub: High-Dimensional Crowdsourced Data Publication with Local Differential Privacy. / Ren, Xuebin; Yu, Chia-mu; Yu, Weiren; Yang, Shusen; Yang, Xinyu; McCann, Julie A.; Yu, Philip S.

In: IEEE Transactions on Information Forensics and Security, Vol. 13, No. 9, 01.09.2018, pp. 2151-2166.

TY - JOUR

T1 - LoPub: High-Dimensional Crowdsourced Data Publication with Local Differential Privacy

AU - Ren, Xuebin

AU - Yu, Chia-mu

AU - Yu, Weiren

AU - Yang, Shusen

AU - Yang, Xinyu

AU - McCann, Julie A.

AU - Yu, Philip S.

N1 - © 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

PY - 2018/9/1

Y1 - 2018/9/1

AB - High-dimensional crowdsourced data collected from numerous users produces rich knowledge about our society. However, it also brings unprecedented privacy threats to the participants. Local differential privacy (LDP), a variant of differential privacy, has recently been proposed as a state-of-the-art privacy notion. Unfortunately, achieving LDP on high-dimensional crowdsourced data publication poses great challenges in terms of both computational efficiency and data utility. To this end, based on the Expectation Maximization (EM) algorithm and Lasso regression, we first propose efficient multi-dimensional joint distribution estimation algorithms with LDP. Then, we develop a Local differentially private high-dimensional data Publication algorithm, LoPub, by taking advantage of our distribution estimation techniques. In particular, correlations among multiple attributes are identified to reduce the dimensionality of crowdsourced data, thus speeding up the distribution learning process and achieving high data utility. Extensive experiments on real-world datasets demonstrate that our multivariate distribution estimation scheme significantly outperforms existing estimation schemes in terms of both communication overhead and estimation speed. Moreover, LoPub maintains, on average, 80% and 60% accuracy over the released datasets in terms of SVM and random forest classification, respectively.

KW - local differential privacy

KW - high-dimensional data

KW - crowdsourced data

KW - data publication

UR - http://ieeexplore.ieee.org/document/8306916/

U2 - 10.1109/TIFS.2018.2812146

DO - 10.1109/TIFS.2018.2812146

M3 - Article

VL - 13

SP - 2151

EP - 2166

JO - IEEE Transactions on Information Forensics and Security

JF - IEEE Transactions on Information Forensics and Security

SN - 1556-6013

IS - 9

ER -