Remove output cells
mccalluc committed Oct 17, 2024
1 parent bad1db6 commit 3526c1a
Showing 1 changed file with 9 additions and 343 deletions.
352 changes: 9 additions & 343 deletions demo.ipynb
@@ -9,6 +9,15 @@
"First, generate a fake dataset. In the future, let's check it in and use it if the [`--demo` flag](https://github.com/opendp/dp-creator-ii/issues/7) is given."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Make mock \n",
"\n",
"When [Add `--demo` CLI option](https://github.com/opendp/dp-creator-ii/pull/61) is merged, reference that code and delete these cells.\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
@@ -742,349 +751,6 @@
"source": [
"At this point, the privacy budget of the context, configured at the start with `epsilon` and `weights`, is exhausted: attempting to make another release will result in an error."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"# Other exports\n",
"\n",
"Below is a proposal for what the other export formats (text and CSV) would look like. We first make a data structure with everything we need, and then use generic methods to serialize that structure."
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'inputs': {'csv_path': '/tmp/demo.csv',\n",
" 'contributions': 10,\n",
" 'epsilon': 2,\n",
" 'weights': [4, 4, 1, 1],\n",
" 'max_possible_rows': 1000000,\n",
" 'delta': 1e-07,\n",
" 'grade': {'min': 50, 'max': 100, 'bins_count': 10},\n",
" 'class_year': {'min': 1, 'max': 4, 'bins_count': 4}},\n",
" 'outputs': {'grade': {'mean': 84.25140291806959,\n",
" 'histogram': {'(55, 60]': 24,\n",
" '(60, 65]': 0,\n",
" '(65, 70]': 28,\n",
" '(70, 75]': 181,\n",
" '(75, 80]': 227,\n",
" '(80, 85]': 248,\n",
" '(85, 90]': 204,\n",
" '(90, 95]': 110,\n",
" '(95, inf]': 0}},\n",
" 'class_year': {'mean': 1.8125701459034793,\n",
" 'histogram': {'(-inf, 1]': 420,\n",
" '(1, 2]': 311,\n",
" '(2, 3]': 80,\n",
" '(3, inf]': 47}}}}"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"release = {\n",
" 'inputs': {\n",
" 'csv_path': csv_path,\n",
" 'contributions': contributions,\n",
" 'epsilon': epsilon,\n",
" 'weights': weights,\n",
" 'max_possible_rows': max_possible_rows,\n",
" 'delta': delta,\n",
" 'grade': {\n",
" 'min': grade_min,\n",
" 'max': grade_max,\n",
" 'bins_count': grade_bins_count,\n",
" },\n",
" 'class_year': {\n",
" 'min': class_year_min,\n",
" 'max': class_year_max,\n",
" 'bins_count': class_year_bins_count,\n",
" } \n",
" },\n",
" 'outputs': {\n",
" 'grade': {\n",
" 'mean': grade_mean.item(),\n",
" 'histogram': {v['grade_bin']: v['len'] for v in grade_histogram.to_dicts()}\n",
" },\n",
" 'class_year': {\n",
" 'mean': class_year_mean.item(),\n",
" 'histogram': {v['class_year_bin']: v['len'] for v in class_year_histogram.to_dicts()}\n",
" },\n",
" }\n",
"}\n",
"release"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Text export?\n",
"\n",
"Just use YAML, unless there are other requirements?"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"inputs:\n",
" class_year:\n",
" bins_count: 4\n",
" max: 4\n",
" min: 1\n",
" contributions: 10\n",
" csv_path: /tmp/demo.csv\n",
" delta: 1.0e-07\n",
" epsilon: 2\n",
" grade:\n",
" bins_count: 10\n",
" max: 100\n",
" min: 50\n",
" max_possible_rows: 1000000\n",
" weights:\n",
" - 4\n",
" - 4\n",
" - 1\n",
" - 1\n",
"outputs:\n",
" class_year:\n",
" histogram:\n",
" (-inf, 1]: 420\n",
" (1, 2]: 311\n",
" (2, 3]: 80\n",
" (3, inf]: 47\n",
" mean: 1.8125701459034793\n",
" grade:\n",
" histogram:\n",
" (55, 60]: 24\n",
" (60, 65]: 0\n",
" (65, 70]: 28\n",
" (70, 75]: 181\n",
" (75, 80]: 227\n",
" (80, 85]: 248\n",
" (85, 90]: 204\n",
" (90, 95]: 110\n",
" (95, inf]: 0\n",
" mean: 84.25140291806959\n",
"\n"
]
}
],
"source": [
"import yaml\n",
"\n",
"print(yaml.dump(release))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### CSV export?\n",
"\n",
"Flatten the data structure to key-value pairs and make a two-column CSV, unless there are other requirements?"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>0</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>inputs.csv_path</th>\n",
" <td>/tmp/demo.csv</td>\n",
" </tr>\n",
" <tr>\n",
" <th>inputs.contributions</th>\n",
" <td>10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>inputs.epsilon</th>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>inputs.weights</th>\n",
" <td>[4, 4, 1, 1]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>inputs.max_possible_rows</th>\n",
" <td>1000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>inputs.delta</th>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>inputs.grade.min</th>\n",
" <td>50</td>\n",
" </tr>\n",
" <tr>\n",
" <th>inputs.grade.max</th>\n",
" <td>100</td>\n",
" </tr>\n",
" <tr>\n",
" <th>inputs.grade.bins_count</th>\n",
" <td>10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>inputs.class_year.min</th>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>inputs.class_year.max</th>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>inputs.class_year.bins_count</th>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>outputs.grade.mean</th>\n",
" <td>84.251403</td>\n",
" </tr>\n",
" <tr>\n",
" <th>outputs.grade.histogram.(55, 60]</th>\n",
" <td>24</td>\n",
" </tr>\n",
" <tr>\n",
" <th>outputs.grade.histogram.(60, 65]</th>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>outputs.grade.histogram.(65, 70]</th>\n",
" <td>28</td>\n",
" </tr>\n",
" <tr>\n",
" <th>outputs.grade.histogram.(70, 75]</th>\n",
" <td>181</td>\n",
" </tr>\n",
" <tr>\n",
" <th>outputs.grade.histogram.(75, 80]</th>\n",
" <td>227</td>\n",
" </tr>\n",
" <tr>\n",
" <th>outputs.grade.histogram.(80, 85]</th>\n",
" <td>248</td>\n",
" </tr>\n",
" <tr>\n",
" <th>outputs.grade.histogram.(85, 90]</th>\n",
" <td>204</td>\n",
" </tr>\n",
" <tr>\n",
" <th>outputs.grade.histogram.(90, 95]</th>\n",
" <td>110</td>\n",
" </tr>\n",
" <tr>\n",
" <th>outputs.grade.histogram.(95, inf]</th>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>outputs.class_year.mean</th>\n",
" <td>1.81257</td>\n",
" </tr>\n",
" <tr>\n",
" <th>outputs.class_year.histogram.(-inf, 1]</th>\n",
" <td>420</td>\n",
" </tr>\n",
" <tr>\n",
" <th>outputs.class_year.histogram.(1, 2]</th>\n",
" <td>311</td>\n",
" </tr>\n",
" <tr>\n",
" <th>outputs.class_year.histogram.(2, 3]</th>\n",
" <td>80</td>\n",
" </tr>\n",
" <tr>\n",
" <th>outputs.class_year.histogram.(3, inf]</th>\n",
" <td>47</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" 0\n",
"inputs.csv_path /tmp/demo.csv\n",
"inputs.contributions 10\n",
"inputs.epsilon 2\n",
"inputs.weights [4, 4, 1, 1]\n",
"inputs.max_possible_rows 1000000\n",
"inputs.delta 0.0\n",
"inputs.grade.min 50\n",
"inputs.grade.max 100\n",
"inputs.grade.bins_count 10\n",
"inputs.class_year.min 1\n",
"inputs.class_year.max 4\n",
"inputs.class_year.bins_count 4\n",
"outputs.grade.mean 84.251403\n",
"outputs.grade.histogram.(55, 60] 24\n",
"outputs.grade.histogram.(60, 65] 0\n",
"outputs.grade.histogram.(65, 70] 28\n",
"outputs.grade.histogram.(70, 75] 181\n",
"outputs.grade.histogram.(75, 80] 227\n",
"outputs.grade.histogram.(80, 85] 248\n",
"outputs.grade.histogram.(85, 90] 204\n",
"outputs.grade.histogram.(90, 95] 110\n",
"outputs.grade.histogram.(95, inf] 0\n",
"outputs.class_year.mean 1.81257\n",
"outputs.class_year.histogram.(-inf, 1] 420\n",
"outputs.class_year.histogram.(1, 2] 311\n",
"outputs.class_year.histogram.(2, 3] 80\n",
"outputs.class_year.histogram.(3, inf] 47"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from pandas import json_normalize\n",
"\n",
"json_normalize(release).transpose()"
]
}
],
"metadata": {
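
Note on the removed "CSV export?" cell: the idea it describes (flatten the nested `release` structure to key-value pairs, then write a two-column CSV) can be sketched with only the standard library. This is a minimal sketch, not the notebook's code: the helper names `flatten` and `to_two_column_csv` and the `key`/`value` header are assumptions, and the `release` dict here is an abbreviated stand-in using a few values from the removed output.

```python
import csv
import io


def flatten(nested, prefix=""):
    """Yield (dotted_key, value) pairs for every leaf of a nested dict."""
    for key, value in nested.items():
        dotted = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Recurse into sub-dicts, extending the dotted path.
            yield from flatten(value, dotted)
        else:
            # Lists (such as `weights`) are treated as leaves, as in the removed output.
            yield dotted, value


def to_two_column_csv(nested):
    """Serialize the flattened pairs as a two-column CSV string."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["key", "value"])  # header names are an assumption
    writer.writerows(flatten(nested))
    return buffer.getvalue()


# Hypothetical, abbreviated stand-in for the notebook's `release` dict.
release = {
    "inputs": {"epsilon": 2, "delta": 1e-07, "grade": {"min": 50, "max": 100}},
    "outputs": {"grade": {"histogram": {"(55, 60]": 24}}},
}

csv_text = to_two_column_csv(release)
print(csv_text)
```

Histogram bin labels such as `(55, 60]` contain a comma, so the `csv` writer quotes those keys. Values are written with their plain Python string form, so `delta` stays `1e-07` here, whereas the removed table output displayed it as `0.0`, presumably because of display rounding.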