-
Notifications
You must be signed in to change notification settings - Fork 780
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bart #539
base: master
Are you sure you want to change the base?
Bart #539
Conversation
Thanks for the PR for BART, @caioguirado! I will take a look further in details, but have two comments at the moments as follows:
Thanks! |
if type_ == 'update_leaves': | ||
if not root.left and not root.right: | ||
self.leaves.append(root) | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not clear to me why we need to return None here (and below).
response = np.where(y.reshape(-1, 1) == 1, max_els, min_els) # Z in the paper | ||
|
||
for j in range(len(self.trees)): | ||
residuals = self.compute_residual(X=X, y=response, trees=self.trees, j=j) # opportunity to parallelize according to Pratola et al. (2013) (https://arxiv.org/abs/1309.1906) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what it says in the reference, but if I understand things correctly, for each tree we compute the residuals by looping over every other tree, which indeed seems wasteful. Is it not possible to store the prediction of each tree to avoid computing the same prediction over and over again?
df_res = pd.DataFrame(np.array(predictions).T, columns=columns) | ||
|
||
# From: https://github.com/uber/causalml/blob/c42e873061eb74ec9c3ca6ea991e113b886245ae/causalml/inference/tree/uplift.pyx | ||
df_res['recommended_treatment'] = df_res.apply(np.argmax, axis=1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessarily something to solve here, but we need to change the terminology of "recommended treatment", because CATE is just one factor that is relevant for whether a treatment should be recommended or not.
Dropped some comments but overall looks great to me. The example notebook fails to render for me at the moment but I'll take a look if/when you've added more predictors as Jeong suggested. |
One of the BART features is supporting continuous treatment - it will be an excellent add to the package if this is supported. I'm wondering how much effort is needed to support continuous treatment. Other than that, it will be great to compare BART's performance with some other models and show its advantage and value. Or maybe mention BART can support which scenarios that cannot be solved by other models in the current implementation. |
@jeongyoonlee happy to take a look at this PR again too |
Proposed changes
This PR proposes the implementation of Bayesian Additive Regression Trees (BART) as an additional method to the package.
The implementation allows the usage of BART for both a Classic ML problem setting and Uplift Modeling. The reson for including also the Classic ML setting was to allow easier validation of the method with synthetic data.
Currently the method works for regression and binary classification response types, and with binary treatment type.
References:
[1] Chipman et al. (2010)
[2] Hill (2011)
[3] Kapelner and Bleich (2014)
[4] Tan and Roy (2019)
BartPy
Types of changes
What types of changes does your code introduce to CausalML?
Put an
x
in the boxes that applyChecklist
Put an
x
in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.Further comments
Some next steps are proposed for improvement:
Add parallelization:
In the example notebook added, there's a
cProfile
analysis of the methods that take the most time to execute inside thefit
method. Both the computation of the individual tree residuals and prediction step are the top opportunities of improvement. They also have a very similar logic. Pratola et al. proposed a way of parallelizing it.Add non-binary treatment support.
Add multi-class classification support.
Add MCMC statistics report and confidence intervals for BART predictions