From 8353ded66ac9baf6494d40f2337a9a86ce0159dc Mon Sep 17 00:00:00 2001
From: hayesall
Date: Wed, 2 Nov 2022 17:24:00 -0400
Subject: [PATCH] =?UTF-8?q?=F0=9F=97=91=EF=B8=8F=20Deprecate=20`boston=5Fh?=
 =?UTF-8?q?ousing`?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 srlearn/boston_housing/README.md | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/srlearn/boston_housing/README.md b/srlearn/boston_housing/README.md
index b44b2ea..ea22a97 100644
--- a/srlearn/boston_housing/README.md
+++ b/srlearn/boston_housing/README.md
@@ -1,5 +1,17 @@
 # boston_housing
 
+!!! warning end
+    The Boston Housing dataset is deprecated. It is included here for backwards compatibility and reproducing results in old publications, but should not be used for benchmarking future results.
+
+    The dataset contains a variable `B` which is ethically problematic. The original dataset authors assumed that Black neighbors were undesirable, and that this would affect housing prices. However, this assumption was encoded in a way that makes it impossible to analyze further.
+
+    We recommend the "California Housing" dataset instead.
+
+    **See also**:
+
+    - M Carlisle, "[racist data destruction?](https://medium.com/@docintangible/racist-data-destruction-113e3eff54a8)" Medium.com, retrieved: 2022-11-02
+    - [sklearn.datasets.load_boston (archived)](https://web.archive.org/web/20221014215704/https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_boston.html)
+
 "Boston Housing" is a common benchmark dataset for regression.
 
 ## Task