High-dimensional crowdsourced data distribution estimation with local privacy

Xuebin Ren, Chia Mu Yu, Weiren Yu, Shusen Yang, Xinyu Yang, Julie McCann

Research output: Chapter in Book/Report/Conference proceeding › Conference publication

Abstract

High-dimensional crowdsourced data collected from a large number of users may produce rich knowledge for our society, but it also brings unprecedented privacy threats to participants. Recently, differential privacy has been proposed as an effective means to mitigate privacy concerns. However, existing work on differential privacy suffers from the 'curse of high-dimensionality' (data with many attributes) and from scalability problems (data with a large number of records). Moreover, traditional differential privacy methods operate on aggregated results, which cannot guarantee local privacy for distributed users in crowdsourced systems. To deal with these issues, in this paper we propose a novel scheme that can efficiently estimate the multivariate joint distribution of high-dimensional data with local privacy. On the client side, we employ randomized response techniques to locally transform data from distributed users into privacy-preserving bit strings, which can prevent potential insider privacy attacks in crowdsourced systems. On the server side, the crowdsourced bit strings are aggregated for multivariate distribution estimation. Specifically, we first propose a multivariate version of the expectation maximization (EM) based algorithm to estimate the joint distribution of high-dimensional data. To reduce computation time, unlike the EM-based method, which needs to scan each user's bit string, we propose to use Lasso regression to obtain the distribution estimate from the aggregated information in a single pass, which can significantly reduce the computation time for multivariate distribution estimation. Extensive experiments on real-world datasets demonstrate that our multivariate distribution estimation scheme is more efficient than existing estimation schemes.
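The client-side and server-side roles described in the abstract can be illustrated with a minimal sketch. It is not the paper's scheme: it handles a single categorical attribute rather than a multivariate joint distribution, and it replaces the EM and Lasso estimators with a simple closed-form debiasing of the aggregated bit counts. The function names, the one-hot encoding, and the per-bit flip probability 1/(1 + e^(ε/2)) are assumptions chosen for illustration.

```python
import math
import random


def flip_probability(epsilon):
    # Assumed setup: one-hot encodings of two different values differ in
    # two bits, so flipping each bit independently with this probability
    # gives epsilon-local differential privacy for the whole string.
    return 1.0 / (1.0 + math.exp(epsilon / 2.0))


def encode(value, domain_size):
    # Client side: one-hot bit string for a categorical value.
    return [1 if i == value else 0 for i in range(domain_size)]


def perturb(bits, epsilon, rng):
    # Client side: randomized response applied independently to each bit.
    q = flip_probability(epsilon)
    return [b ^ 1 if rng.random() < q else b for b in bits]


def estimate_distribution(reports, epsilon):
    # Server side: uses only the per-position bit counts (requires epsilon > 0).
    # E[count_i / n] = f_i * (1 - q) + (1 - f_i) * q, so
    # f_i = (count_i / n - q) / (1 - 2q).
    n = len(reports)
    d = len(reports[0])
    q = flip_probability(epsilon)
    counts = [sum(r[i] for r in reports) for i in range(d)]
    raw = [(c / n - q) / (1.0 - 2.0 * q) for c in counts]
    # Clip negatives and renormalize so the result is a valid distribution.
    clipped = [max(x, 0.0) for x in raw]
    total = sum(clipped)
    return [x / total for x in clipped] if total > 0 else [1.0 / d] * d


def simulate(true_dist, n_users, epsilon, seed=0):
    # Hypothetical helper: draw user values, perturb locally, then estimate.
    rng = random.Random(seed)
    d = len(true_dist)
    values = rng.choices(range(d), weights=true_dist, k=n_users)
    reports = [perturb(encode(v, d), epsilon, rng) for v in values]
    return estimate_distribution(reports, epsilon)
```

Calling `simulate([0.5, 0.3, 0.2], n_users=20000, epsilon=2.0)` returns an estimate that should be close to the true distribution. The server never sees raw values, only perturbed bit strings, and the estimate is computed from aggregate counts alone, which is the property the abstract attributes to the regression-based approach.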

Original language: English
Title of host publication: Proceedings - 2016 16th IEEE International Conference on Computer and Information Technology, CIT 2016, 2016 6th International Symposium on Cloud and Service Computing, IEEE SC2 2016 and 2016 International Symposium on Security and Privacy in Social Networks and Big Data, SocialSec 2016
Publisher: IEEE
Pages: 226-233
Number of pages: 8
ISBN (Electronic): 978-1-5090-4314-9
DOIs: https://doi.org/10.1109/CIT.2016.57
Publication status: Published - 10 Mar 2017
Event: 16th IEEE International Conference on Computer and Information Technology, CIT 2016: 2016 6th International Symposium on Cloud and Service Computing IEEE SC2 2016 and 2016 International Symposium on Security and Privacy in Social Networks and Big Data SocialSec 2016 - Nadi, Fiji
Duration: 7 Dec 2016 to 10 Dec 2016

Conference

Conference: 16th IEEE International Conference on Computer and Information Technology, CIT 2016
Country: Fiji
City: Nadi
Period: 7/12/16 to 10/12/16

Keywords

  • crowdsourced systems
  • distribution estimation
  • high-dimensional data
  • local privacy

Cite this

    Ren, X., Yu, C. M., Yu, W., Yang, S., Yang, X., & McCann, J. (2017). High-dimensional crowdsourced data distribution estimation with local privacy. In Proceedings - 2016 16th IEEE International Conference on Computer and Information Technology, CIT 2016, 2016 6th International Symposium on Cloud and Service Computing, IEEE SC2 2016 and 2016 International Symposium on Security and Privacy in Social Networks and Big Data, SocialSec 2016 (pp. 226-233). IEEE. https://doi.org/10.1109/CIT.2016.57