Does not work with pipelines #66

Open
dth5 opened this issue Aug 14, 2019 · 4 comments

Comments


dth5 commented Aug 14, 2019

For tuning a single estimator this tool is awesome. But the standard GridSearchCV can also accept a pipeline as the estimator, which lets you treat the classifier itself as a parameter and evaluate several classifiers in one search.

For some reason, this breaks with EvolutionaryAlgorithmSearchCV.

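For example, set up a pipeline like this:

    from sklearn.impute import SimpleImputer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    pipe = Pipeline([
        ('imputer', SimpleImputer(strategy='median')),
        ('scaler', StandardScaler()),
        ('classify', LogisticRegression()),  # placeholder step, replaced by the grid
    ])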

Then define a parameter grid that includes different classifiers:

    from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier

    param_grid_rf_big = [
        {
            'classify': [RandomForestClassifier(), ExtraTreesClassifier()],
            'classify__n_estimators': [500],
            'classify__max_features': ['log2', 'sqrt', None],
            'classify__min_samples_split': [2, 3],
            'classify__min_samples_leaf': [1, 2, 3],
            'classify__criterion': ['gini'],
        }
    ]

When you pass this to EvolutionaryAlgorithmSearchCV, you should be able to set the estimator to 'pipe' and the params to 'param_grid_rf_big' and let it evaluate. This works with GridSearchCV, but not with EvolutionaryAlgorithmSearchCV.
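For reference, here is a minimal sketch of the GridSearchCV side of this, which is the part reported to work. The toy dataset from make_classification, the name param_grid_small, and the reduced values (n_estimators of 10, fewer options per parameter) are assumptions made so the sketch runs quickly; they are not from the original report.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, ParameterGrid
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy data (an assumption; the issue does not include a dataset).
X, y = make_classification(n_samples=120, n_features=8, random_state=0)

pipe = Pipeline([
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler()),
    ('classify', LogisticRegression()),  # replaced by the grid below
])

# Same shape as param_grid_rf_big, shrunk (hypothetical values) for speed.
param_grid_small = [
    {
        'classify': [RandomForestClassifier(random_state=0),
                     ExtraTreesClassifier(random_state=0)],
        'classify__n_estimators': [10],
        'classify__max_features': ['log2', 'sqrt'],
        'classify__min_samples_leaf': [1, 2],
    }
]

# Number of candidate settings the grid expands to: 2 * 1 * 2 * 2.
n_candidates = len(ParameterGrid(param_grid_small))

# The 'classify' step is swapped per candidate, then its
# 'classify__*' parameters are set on the swapped-in estimator.
search = GridSearchCV(pipe, param_grid_small, cv=3)
search.fit(X, y)

best_classifier = search.best_params_['classify']
print(n_candidates, type(best_classifier).__name__)
```

Swapping GridSearchCV for EvolutionaryAlgorithmSearchCV with the same pipe and grid is the case this issue reports as failing.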

Owner

rsteca commented Aug 15, 2019

Hi @dth5! Can you paste the error you receive when you run that code? Thanks.

Author

dth5 commented Aug 15, 2019

Hi, unfortunately I have moved beyond that code, so I no longer have the exact run. However, if you set up a toy classification dataset (X, y) and pass it to EvolutionaryAlgorithmSearchCV with estimator=pipe (from above) and param_grid_rf_big as the parameter grid, you'll see the issue.

This may not really be a bug, because RandomizedSearchCV also does not support this kind of parameter grid. GridSearchCV does, however, and the next release of scikit-learn will fix RandomizedSearchCV to allow it as well.

I suspect it comes down to the fact that usually one passes a single dictionary of parameter lists, { name: [values], ... }, to the search. If you also want the classifier itself to be a parameter (with blocks of different classifiers that each have different parameters), you need to pass a list of such dictionaries, [ { ... }, { ... }, { ... } ], which is currently only possible in GridSearchCV.
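To make the difference between the two grid shapes concrete, here is a small pure-Python sketch of how a search could expand either a single dictionary or a list of dictionaries into candidate settings. The helper expand_grid is hypothetical (it is not scikit-learn's or sklearn-deap's code), the strings stand in for estimator objects, and the second block with 'classify__C' is an invented illustration.

```python
from itertools import product


def expand_grid(param_grid):
    """Yield one {name: value} dict per candidate.

    Accepts either a single dict of value lists or a list of such dicts
    (hypothetical helper, written only to illustrate the two shapes).
    """
    blocks = [param_grid] if isinstance(param_grid, dict) else param_grid
    for block in blocks:
        names = sorted(block)
        # Cartesian product of the value lists within one block only;
        # parameters from different blocks are never combined.
        for values in product(*(block[name] for name in names)):
            yield dict(zip(names, values))


# String placeholders stand in for estimator instances.
grid = [
    {'classify': ['RandomForest', 'ExtraTrees'],
     'classify__n_estimators': [100, 500]},
    {'classify': ['LogisticRegression'],
     'classify__C': [0.1, 1.0, 10.0]},
]

candidates = list(expand_grid(grid))
print(len(candidates))  # 2 * 2 from the first block + 1 * 3 from the second
```

A search that only handles the single-dictionary shape would have to merge both blocks into one product, which would pair 'classify__C' with the tree ensembles; keeping the blocks separate is what avoids that.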

Owner

rsteca commented Aug 15, 2019

OK, thanks. I will try to fix it when I have some free time.

@RNarayan73

Hello,

I have managed to tune hyperparameters for a classifier within a pipeline using the latest version, 0.3. However, when I try to tune hyperparameters for transformers within the same pipeline, it raises an error:
"ValueError: Provided coef_init does not match dataset."
The same pipeline, with hyperparameter spaces for both transformers and classifiers, works fine with GridSearchCV and RandomizedSearchCV, as well as with external tools such as BayesSearchCV, OptunaSearchCV, and TuneSearchCV.
Let me know if you need more information.

Narayan
