[TOC]
The generalization hierarchies are defined in files in the `conf` folder.
I implement the Datafly heuristic algorithm, whose pseudocode is below:
Detailed comments for each function can be found in the file `k_anonymity.py`.
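As an illustration of the Datafly heuristic named above, here is a minimal Python sketch. It is not the code in `k_anonymity.py`. It assumes each hierarchy is a plain dict mapping a value to its parent one level up (the `parents` structure is hypothetical and is not the format of the files in `conf`), and the sample rows are invented for the example.

```python
from collections import Counter


def datafly(rows, parents, k):
    """Generalize quasi-identifiers until at most k rows violate
    k-anonymity, then suppress the remaining violating rows.

    rows:    list of dicts, attribute name -> value (quasi-identifiers only)
    parents: dict, attribute name -> {value: parent value} (assumed format)
    """
    rows = [dict(r) for r in rows]  # work on a copy
    attrs = list(parents)

    def qi_key(row):
        return tuple(row[a] for a in attrs)

    while True:
        counts = Counter(qi_key(r) for r in rows)
        violating = sum(c for c in counts.values() if c < k)
        if violating <= k:
            break
        # Heuristic step: generalize the attribute with the most distinct values.
        attr = max(attrs, key=lambda a: len({r[a] for r in rows}))
        changed = False
        for r in rows:
            parent = parents[attr].get(r[attr], r[attr])
            if parent != r[attr]:
                r[attr] = parent
                changed = True
        if not changed:
            break  # top of the hierarchy reached, nothing left to generalize

    # Suppress the rows that still occur fewer than k times.
    counts = Counter(qi_key(r) for r in rows)
    return [r for r in rows if counts[qi_key(r)] >= k]


# Invented example data.
parents = {
    "age": {"21": "20-29", "22": "20-29", "35": "30-39", "36": "30-39",
            "20-29": "*", "30-39": "*"},
    "zip": {"12345": "1234*", "12346": "1234*", "1234*": "*"},
}
rows = [
    {"age": "21", "zip": "12345"},
    {"age": "22", "zip": "12345"},
    {"age": "35", "zip": "12346"},
    {"age": "36", "zip": "12346"},
    {"age": "21", "zip": "12346"},
]
anonymized = datafly(rows, parents, k=2)
```

In this example one generalization of `age` is enough: four rows fall into two groups of size 2, and the single remaining outlier is suppressed.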
I evaluate the result for `k = [5, 10, 50, 100]` and calculate the distortion and precision. Both are calculated as given in the lecture, where distortion is:
and precision is:
Query 1000 times for `e` in `[0.5, 1]`.
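A repeated noisy query of this kind might look like the sketch below. It assumes the noise comes from the Laplace mechanism and that the query is the average of ages greater than 25; the sensitivity bound, the `max_age` cap, and the synthetic `ages` list are assumptions made for illustration and are not taken from the original implementation.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_average_age(ages, epsilon, rng, threshold=25, max_age=125):
    selected = [a for a in ages if a > threshold]
    true_avg = sum(selected) / len(selected)
    # Assumed sensitivity: one record moves the average by at most this much.
    sensitivity = (max_age - threshold) / len(selected)
    return true_avg + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)
ages = [rng.randint(17, 90) for _ in range(5000)]  # synthetic stand-in data

# 1000 noisy answers for each privacy budget.
results = {
    eps: [noisy_average_age(ages, eps, rng) for _ in range(1000)]
    for eps in (0.5, 1.0)
}
```

A smaller `e` gives a larger noise scale, so the 1000 answers for `e = 0.5` spread more widely around the true average than those for `e = 1`.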
To prove the indistinguishability of two queries, I gather the outputs into 20 buckets and calculate the probability for each bucket, then take the quotient of the two probabilities bucket by bucket. For two query results, `query1` and `query2`, I calculate both the probability of `query1` over the probability of `query2` and the probability of `query2` over the probability of `query1`. We can see that in both cases the quotient is smaller than $e^{0.5}$.
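The bucket comparison described above can be sketched as follows. The helper names are hypothetical, and the check assumes the bound being tested is the usual $e^{\epsilon}$ on the ratio of bucket probabilities. Buckets that are empty in either result are skipped here, which is a simplification.

```python
import math


def bucket_probabilities(samples, low, high, n_buckets=20):
    """Empirical probability of each of n_buckets equal-width buckets."""
    width = (high - low) / n_buckets
    counts = [0] * n_buckets
    for s in samples:
        idx = min(int((s - low) / width), n_buckets - 1)  # clip the top edge
        counts[idx] += 1
    return [c / len(samples) for c in counts]


def max_quotient(p, q):
    """Largest ratio p/q or q/p over buckets that are non-empty in both."""
    return max(max(a / b, b / a) for a, b in zip(p, q) if a > 0 and b > 0)


def within_bound(p, q, epsilon):
    return max_quotient(p, q) <= math.exp(epsilon)


# Tiny invented example with 2 buckets on [0, 1].
p = bucket_probabilities([0.1, 0.2, 0.6, 0.7], 0.0, 1.0, n_buckets=2)
q = bucket_probabilities([0.1, 0.6, 0.65, 0.9], 0.0, 1.0, n_buckets=2)
worst = max_quotient(p, q)
```

Taking the maximum of both directions covers `query1` over `query2` and `query2` over `query1` in one pass.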
###### 2.1.3 1-Indistinguishable Proof
The proof is the same as the 0.5-indistinguishable proof. We can see for each case, the
To calculate the distortion, I use the RMSE as the metric. First I calculate the groundtruth, which is the true average of ages greater than 25 without any added noise. Then I calculate the RMSE of the query results against the groundtruth. We can see when
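The RMSE against the groundtruth can be computed as in this small sketch. The function name and the sample numbers are made up for illustration.

```python
import math


def rmse(results, groundtruth):
    """Root mean squared error of noisy query results against one true value."""
    return math.sqrt(sum((r - groundtruth) ** 2 for r in results) / len(results))


# Invented example: four noisy answers around a true average of 40.0.
error = rmse([39.0, 41.0, 40.0, 42.0], 40.0)
```

Here the squared errors are 1, 1, 0 and 4, so the result is the square root of 1.5.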
Query 1000 times for `e` in `[0.5, 1]`.
To prove the
The proof is the same as
The metric I use here is `1-precision`. `Precision` is calculated as the number of results that equal the groundtruth divided by the total number of queries. `1-precision` is a measure of distortion, as higher precision implies lower distortion. We can see when
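A minimal sketch of the `1-precision` metric as defined above; the function name and the sample answers are hypothetical.

```python
def one_minus_precision(results, groundtruth):
    """Fraction of query results that differ from the groundtruth."""
    hits = sum(1 for r in results if r == groundtruth)
    return 1 - hits / len(results)


# Invented example: 3 of 4 answers match the groundtruth "A".
distortion = one_minus_precision(["A", "A", "B", "A"], "A")
```

With 3 matches out of 4 queries the precision is 0.75, so the distortion measure is 0.25.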