A new method for preserving data privacy based on the non-negative matrix factorization clustering

Document Type: Original Article

Authors

1 Department of Computer Science

2 Department of Applied Mathematics, Faculty of Mathematics and Computers, Shahid Bahonar University of Kerman, Kerman, Iran

Abstract

Companies are increasingly concerned about the disclosure and violation of users' privacy, which has led many researchers to focus on developing data privacy methods. These methods transform the original data and publish it in a different form while preserving its features and relationships. This paper proposes an algorithm, based on non-negative matrix factorization clustering, that generates data for publication at different privacy levels from a given original data set. Implementation results with two different approaches on various standard data sets show that the proposed algorithm can satisfy the constraints of the original data in addition to generating data with different privacy levels. The proposed method was implemented on high-dimensional data sets, on which some fuzzy-based microaggregation methods cannot be implemented. In addition, experimental results using fuzzy $c$-means show that the information loss is very small. Therefore, the proposed algorithm can publish data that legitimate users can rely on while preserving privacy.
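As a rough illustration of the general idea of clustering-based microaggregation with non-negative matrix factorization, the following minimal sketch may help. It is not the algorithm of the paper, whose details are not given in this abstract. It assumes a non-negative data matrix, uses the standard multiplicative-update rules for the factorization, assigns each record to the cluster with the largest coefficient, and publishes cluster centroids in place of the records. The function names (`nmf`, `microaggregate`, `information_loss`), the use of the number of clusters `k` as the privacy knob, and the SSE/SST loss measure are all assumptions made for illustration.

```python
import numpy as np


def nmf(X, k, n_iter=200, seed=0, eps=1e-9):
    """Approximate non-negative X (n x d) as W (n x k) @ H (k x d).

    Uses multiplicative updates for the Frobenius-norm objective.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, d)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def microaggregate(X, k, **nmf_kwargs):
    """Replace each record by the centroid of its NMF cluster (assumed scheme)."""
    W, _ = nmf(X, k, **nmf_kwargs)
    labels = W.argmax(axis=1)          # cluster = dominant factor of each record
    masked = np.empty_like(X, dtype=float)
    for c in np.unique(labels):
        members = labels == c
        masked[members] = X[members].mean(axis=0)
    return masked, labels


def information_loss(X, masked):
    """Within-cluster sum of squares divided by total sum of squares (in [0, 1])."""
    sse = ((X - masked) ** 2).sum()
    sst = ((X - X.mean(axis=0)) ** 2).sum()
    return sse / sst


if __name__ == "__main__":
    # Hypothetical data: 30 records with 4 non-negative attributes.
    X = np.random.default_rng(1).random((30, 4))
    for k in (2, 5, 10):
        masked, _ = microaggregate(X, k)
        print(k, information_loss(X, masked))
```

In this sketch, fewer clusters mean larger groups sharing one published record, and therefore typically higher information loss. Replacing records by cluster centroids preserves the column means of the original data. The sketch does not enforce a minimum group size, so it makes no $k$-anonymity guarantee on its own.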

