TY - GEN
T1 - On Different Formulations of a Continuous CTA Model
AU - Lesaja, Goran
AU - Iacob, Ionut
AU - Oganian, Anna
N1 - Publisher Copyright: © 2020, Springer Nature Switzerland AG.
PY - 2020
Y1 - 2020
N2 - In this paper, we consider a Controlled Tabular Adjustment (CTA) model for statistical disclosure limitation of tabular data. The goal of the CTA model is to find the safe (masked) table closest to the original table, which contains sensitive information. Closeness is usually measured using the ℓ1 or ℓ2 norm. However, the norm-based CTA model offers no control over how well the statistical properties of the data in the original table are preserved in the masked table. Hence, we propose a different criterion of “closeness” between the masked and original tables, one that attempts to minimally change certain statistics used in the analysis of the table. The Chi-square statistic is among the most widely used measures for the analysis of data in two-dimensional tables. We therefore propose a Chi-square CTA model that minimizes an objective function depending on the difference between the Chi-square statistics of the original and masked tables. The model is non-linear and non-convex and therefore harder to solve, which prompted us to also consider a modification of this model that can be transformed into a linear programming model and solved more efficiently. We present numerical results for a two-dimensional table that illustrate our novel approach and provide a comparison with norm-based CTA models.
AB - In this paper, we consider a Controlled Tabular Adjustment (CTA) model for statistical disclosure limitation of tabular data. The goal of the CTA model is to find the safe (masked) table closest to the original table, which contains sensitive information. Closeness is usually measured using the ℓ1 or ℓ2 norm. However, the norm-based CTA model offers no control over how well the statistical properties of the data in the original table are preserved in the masked table. Hence, we propose a different criterion of “closeness” between the masked and original tables, one that attempts to minimally change certain statistics used in the analysis of the table. The Chi-square statistic is among the most widely used measures for the analysis of data in two-dimensional tables. We therefore propose a Chi-square CTA model that minimizes an objective function depending on the difference between the Chi-square statistics of the original and masked tables. The model is non-linear and non-convex and therefore harder to solve, which prompted us to also consider a modification of this model that can be transformed into a linear programming model and solved more efficiently. We present numerical results for a two-dimensional table that illustrate our novel approach and provide a comparison with norm-based CTA models.
KW - Chi-square statistic
KW - Controlled tabular adjustment models
KW - Interior-point methods
KW - Linear and non-linear optimization
KW - Statistical disclosure limitation
UR - http://www.scopus.com/inward/record.url?scp=85092085737&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-57521-2_12
DO - 10.1007/978-3-030-57521-2_12
M3 - Conference article
AN - SCOPUS:85092085737
SN - 9783030575205
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 166
EP - 179
BT - Privacy in Statistical Databases - UNESCO Chair in Data Privacy, International Conference, PSD 2020, Proceedings
A2 - Domingo-Ferrer, Josep
A2 - Muralidhar, Krishnamurty
PB - Springer Science and Business Media Deutschland GmbH
T2 - International Conference on Privacy in Statistical Databases, PSD 2020
Y2 - 23 September 2020 through 25 September 2020
ER -