Bart #539

caioguirado · 2022-08-24T13:09:25Z

Proposed changes

This PR proposes the implementation of Bayesian Additive Regression Trees (BART) as an additional method to the package.
The implementation supports BART in both a classic ML setting and uplift modeling. The classic ML setting is included to make it easier to validate the method on synthetic data.

Currently the method supports regression and binary classification responses, with a binary treatment.

References:
[1] Chipman et al. (2010)
[2] Hill (2011)
[3] Kapelner and Bleich (2014)
[4] Tan and Roy (2019)
BartPy

Types of changes

What types of changes does your code introduce to CausalML?
Put an x in the boxes that apply

Bugfix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation Update (if none of the other choices apply)

Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.

I have read the CONTRIBUTING doc
I have signed the CLA
Lint and unit tests pass locally with my changes
I have added tests that prove my fix is effective or that my feature works
I have added necessary documentation (if appropriate)
Any dependent changes have been merged and published in downstream modules

Further comments

Some next steps are proposed for improvement:

Add parallelization:
The example notebook includes a cProfile analysis of the methods that take the most time inside the fit method. The computation of the individual tree residuals and the prediction step are the biggest opportunities for improvement, and they share very similar logic. Pratola et al. proposed a way of parallelizing them.
Add non-binary treatment support.
Add multi-class classification support.
Add MCMC statistics report and confidence intervals for BART predictions
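
As a rough illustration of the last item, the sketch below computes a pointwise posterior mean and an equal-tailed interval from a matrix of posterior draws. The function name `credible_interval` and the `(n_draws, n_samples)` layout are assumptions made for this example, not part of the current implementation, and the draws are simulated rather than taken from the sampler.

```python
import numpy as np


def credible_interval(draws, level=0.95):
    """Pointwise posterior mean and equal-tailed interval.

    draws: array of shape (n_draws, n_samples), one row per retained
    MCMC iteration (after burn-in). This layout is an assumption.
    """
    alpha = (1.0 - level) / 2.0
    lower, upper = np.quantile(draws, [alpha, 1.0 - alpha], axis=0)
    return draws.mean(axis=0), lower, upper


rng = np.random.default_rng(0)
# Stand-in for sampler output: 500 draws for 3 observations.
draws = rng.normal(loc=[0.0, 1.0, 2.0], scale=0.1, size=(500, 3))
mean, lower, upper = credible_interval(draws)
```

The interval is computed directly from the draws, so it needs no distributional assumption beyond having enough retained iterations.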

jeongyoonlee · 2022-08-26T17:12:56Z

Thanks for the PR for BART, @caioguirado!

I will take a closer look later, but I have two comments for now:

The PR currently fails the lint check. Could you please run black to reformat the code?
The test data set used in the example notebook looks too small. Could you please use more features, e.g. ~10, with a mix of informative and non-informative features?
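
For example, a data set along these lines could be simulated as in the sketch below. The function `make_uplift_data`, its parameters, and all coefficients are hypothetical choices for illustration only. Only the first `n_informative` columns affect the outcome, and the treatment effect varies with the first feature.

```python
import numpy as np


def make_uplift_data(n=1000, n_features=10, n_informative=4, seed=42):
    """Simulate features, a binary treatment, and a continuous outcome.

    Only the first `n_informative` columns of X affect the outcome; the
    remaining columns are pure noise. All coefficients are arbitrary.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, n_features))
    w = rng.integers(0, 2, size=n)  # randomized binary treatment
    beta = np.zeros(n_features)
    beta[:n_informative] = rng.uniform(0.5, 1.5, size=n_informative)
    tau = 1.0 + 0.5 * X[:, 0]  # heterogeneous treatment effect
    y = X @ beta + tau * w + rng.normal(scale=0.5, size=n)
    return X, w, y, tau


X, w, y, tau = make_uplift_data()
```

Returning the true effect `tau` makes it possible to compare estimated and actual treatment effects in the notebook.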

Thanks!

t-tte · 2022-08-29T23:28:35Z

causalml/inference/tree/bart.py

+        if type_ == 'update_leaves':
+            if not root.left and not root.right:
+                self.leaves.append(root)
+                return


It's not clear to me why we need to return None here (and below).
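
To make the question concrete, here is a minimal sketch of a recursive leaf collector with the same shape as the diff above. The `Node` class and `collect_leaves` function are hypothetical stand-ins, not the classes in bart.py. In this sketch the bare `return` only stops the recursion at a leaf, and the `None` it produces is never used by the caller.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical binary tree node; not the class used in bart.py."""
    name: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def collect_leaves(root: Optional[Node], leaves: List[Node]) -> None:
    """Append every leaf under `root` to `leaves`, left to right."""
    if root is None:
        return
    if not root.left and not root.right:
        leaves.append(root)
        return  # early exit: a leaf has no children to visit
    collect_leaves(root.left, leaves)
    collect_leaves(root.right, leaves)


tree = Node("root", Node("a", Node("a1"), Node("a2")), Node("b"))
found: List[Node] = []
collect_leaves(tree, found)
leaf_names = [n.name for n in found]
```

If the recursive calls already guard against missing children, as the `root is None` check does here, the early `return` is a readability choice rather than a requirement.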

t-tte · 2022-08-29T23:47:09Z

causalml/inference/tree/bart.py

+                response = np.where(y.reshape(-1, 1) == 1, max_els, min_els) # Z in the paper
+
+            for j in range(len(self.trees)):
+                residuals = self.compute_residual(X=X, y=response, trees=self.trees, j=j) # opportunity to parallelize according to Pratola et al. (2013) (https://arxiv.org/abs/1309.1906)


Not sure what it says in the reference, but if I understand things correctly, for each tree we compute the residuals by looping over every other tree, which indeed seems wasteful. Is it not possible to store the prediction of each tree to avoid computing the same prediction over and over again?
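
A minimal sketch of that idea, assuming the per-tree predictions can be held in an array of shape `(n_trees, n_samples)`: keep a running total of all tree predictions and adjust it whenever one tree changes. The names `partial_residual`, `backfit_sweep`, and `refit_tree` are hypothetical. In the real sampler the refit step would be an MCMC draw for tree j, which is replaced here by a toy single-leaf fit.

```python
import numpy as np


def partial_residual(y, total, tree_preds, j):
    """Residual of y after removing the fit of every tree except tree j."""
    return y - (total - tree_preds[j])


def backfit_sweep(y, tree_preds, refit_tree):
    """One pass over all trees, reusing cached per-tree predictions.

    tree_preds: array of shape (n_trees, n_samples), updated in place.
    refit_tree: hypothetical callable (j, residual) -> new predictions of tree j.
    """
    total = tree_preds.sum(axis=0)  # sum of all trees, kept up to date below
    for j in range(tree_preds.shape[0]):
        residual = partial_residual(y, total, tree_preds, j)
        new_pred = refit_tree(j, residual)
        total += new_pred - tree_preds[j]  # O(n) update, no loop over other trees
        tree_preds[j] = new_pred
    return total


rng = np.random.default_rng(0)
y = rng.normal(size=8)
tree_preds = rng.normal(size=(4, 8))

# Check the cached residual against the naive sum over all other trees.
naive = y - sum(tree_preds[k] for k in range(4) if k != 2)
cached = partial_residual(y, tree_preds.sum(axis=0), tree_preds, 2)

# Toy "refit": each tree becomes a single leaf predicting the residual mean.
fitted = backfit_sweep(y, tree_preds, lambda j, r: np.full_like(r, r.mean()))
```

With the cache, each residual costs one pass over the samples instead of one pass per tree, at the price of storing one prediction vector per tree.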

t-tte · 2022-08-29T23:49:52Z

causalml/inference/tree/bart.py

+            df_res = pd.DataFrame(np.array(predictions).T, columns=columns)
+
+            # From: https://github.com/uber/causalml/blob/c42e873061eb74ec9c3ca6ea991e113b886245ae/causalml/inference/tree/uplift.pyx
+            df_res['recommended_treatment'] = df_res.apply(np.argmax, axis=1)


Not necessarily something to solve here, but we need to change the terminology of "recommended treatment", because CATE is just one factor that is relevant for whether a treatment should be recommended or not.

t-tte · 2022-08-29T23:54:09Z

Dropped some comments but overall looks great to me. The example notebook fails to render for me at the moment but I'll take a look if/when you've added more predictors as Jeong suggested.

zhenyuz0500 · 2022-09-09T18:41:58Z

One of BART's features is support for continuous treatment. It would be an excellent addition to the package if this were supported. I'm wondering how much effort is needed to support continuous treatment.

Other than that, it would be great to compare BART's performance with some other models and show its advantages and value. Or maybe mention which scenarios BART supports that the other models in the current implementation cannot handle.

ras44 · 2023-11-15T18:41:08Z

@jeongyoonlee happy to take a look at this PR again too

caioguirado added 5 commits August 24, 2022 14:39

Add BART logic

8eff987

Add tests for BART

7f937dc

Add notebook example for BART

e37cd0a

Update README.md with BART method and references

04330c9

Remove commented lines

a32f2ce

jeongyoonlee added the enhancement label Aug 25, 2022

jeongyoonlee requested review from zhenyuz0500, jeongyoonlee, t-tte, huigangchen, paullo0106, vincewu51 and ppstacy August 25, 2022 16:10
