Can we exclude certain data and labels based on a condition? #28

katerinakarampasi · 2018-05-16T11:08:15Z

From the instructions, my understanding is that we have to provide two basic functions, FeatureExtractor() and Classifier(). I would like to access the whole dataset and exclude some samples, which means I also have to exclude their corresponding labels. I can exclude the samples based on a condition each time the FeatureExtractor is called, but I can't do the same for the labels there. So my question is whether all of these commands have to be executed before FeatureExtractor is called (because that would solve my problem) or not.

kegl · 2018-05-16T11:24:22Z

You can remove data at training time (in fit) but not at transform/predict time. Providing labels to the FeatureExtractor at transform time would leak the labels of the test data. If you want to leave some points out of the training, you can do it in the fit function of the classifier.

katerinakarampasi · 2018-05-16T11:27:16Z

Ok thank you.
I don't know if I should open a new topic for this, but what is the quality check that we are provided with for the fMRI and the anatomy data?

glemaitre · 2018-05-17T12:31:31Z

The quality check was done manually. Basically, it consisted of a visual inspection of the pre-processing steps (registration, segmentation) and an inspection of the motion parameters.

katerinakarampasi · 2018-05-17T12:33:51Z

Ok thank you.

zh1peng · 2018-05-22T21:07:46Z

Hi,
I am still confused about how to remove bad data in the FeatureExtractor or the Classifier. Sorry if this is a very basic question, but it has been confusing me for a few days.
I tried to impute the bad data during feature extraction, but that seemed to make the model worse. If I understand correctly, the FeatureExtractor is supposed to return only new_X rather than both new_X and new_y, so it is hard to remove bad samples at this stage.

But when I put this step in the fit method of the Classifier, using

def fit(self, X, y):
    # keep only the good samples, and their labels, for training
    X_new = X[some_good_idx]
    y_new = y[some_good_idx]
    self.clf.fit(X_new, y_new)
    return self

def predict(self, X):
    return self.clf.predict(X)

def predict_proba(self, X):
    return self.clf.predict_proba(X)

it crashed when running the CV evaluation with the error `X has a different shape than during fitting`.

kegl · 2018-05-22T21:18:01Z

Can you submit it? I can look at the trace there.

glemaitre · 2018-05-22T21:28:00Z

Modifying the starting kit, it should look something like this.

from sklearn.base import BaseEstimator
from sklearn.base import TransformerMixin


class FeatureExtractor(BaseEstimator, TransformerMixin):
    def fit(self, X_df, y):
        return self

    def transform(self, X_df):
        # get only the anatomical information
        X = X_df[[col for col in X_df.columns if col.startswith('anatomy')]]
        return X 


from sklearn.base import BaseEstimator
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


class Classifier(BaseEstimator):
    def __init__(self):
        self.clf = make_pipeline(StandardScaler(), LogisticRegression())

    def fit(self, X, y):
        # train only on the samples that passed the anatomy quality check
        X_select = X['anatomy_select'] == 1
        self.clf.fit(X[X_select], y[X_select.values])
        return self

    def predict(self, X):
        return self.clf.predict(X)

    def predict_proba(self, X):
        return self.clf.predict_proba(X)

glemaitre · 2018-05-22T21:28:39Z

I tried it, and it works locally with both cross_validate and ramp_test_submission.

zh1peng · 2018-05-22T22:05:43Z

Thank you, guys. I have tested the modified anatomy code, and it works.
So I will double-check my own code to see if I can figure it out.

I think the error was caused by my trying to exclude the QC columns (i.e. anatomy_select) in fit. It should be fine to keep that column, since its values will all be ones and it will be removed by feature selection.
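
One way to read the diagnosis above: if a column is dropped in fit but not in predict, the inner model is fitted on fewer columns than it later receives, and a shape error follows. The sketch below is only an illustration of that idea, not the starting kit's code. It assumes the Classifier receives a pandas DataFrame containing the anatomy_select column from this thread; the helper `_features`, the column names anatomy_a and anatomy_b, and the toy data are all made up for the example. It drops the QC column the same way in fit, predict and predict_proba, and filters rows only in fit.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

QC_COL = "anatomy_select"  # QC flag column mentioned in this thread


class Classifier(BaseEstimator):
    def __init__(self):
        self.clf = make_pipeline(StandardScaler(), LogisticRegression())

    def _features(self, X):
        # Hypothetical helper: drop the QC flag identically in fit and
        # predict, so the inner pipeline always sees the same columns.
        return X.drop(columns=[QC_COL])

    def fit(self, X, y):
        keep = (X[QC_COL] == 1).values  # rows that passed the QC
        self.clf.fit(self._features(X)[keep], np.asarray(y)[keep])
        return self

    def predict(self, X):
        # No row filtering here: one prediction per input row.
        return self.clf.predict(self._features(X))

    def predict_proba(self, X):
        return self.clf.predict_proba(self._features(X))


# Toy data, made up for illustration: two features plus the QC flag.
rng = np.random.RandomState(0)
X = pd.DataFrame({
    "anatomy_a": rng.randn(20),
    "anatomy_b": rng.randn(20),
    QC_COL: [1] * 15 + [0] * 5,
})
y = np.array([0, 1] * 10)

model = Classifier().fit(X, y)
preds = model.predict(X)        # all 20 rows, including the QC failures
proba = model.predict_proba(X)
```

Calling the inner pipeline directly on the unmodified frame (`model.clf.predict(X)`) passes three columns to a model fitted on two, which is the kind of mismatch described above. Keeping the column in both fit and predict, as in the modified starting kit, avoids the problem just as well.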

glemaitre added the question and answered labels · May 17, 2018
