Data with only numeric features - alternative to get_he_preprocessor() to avoid argument category_map #809
Comments
Hey @pranavn91,
Thanks for opening the issue. To avoid categorical variables in the preprocessor, you can pass an empty category_map:
heae_preprocessor, heae_inv_preprocessor = get_he_preprocessor(X=X_train,
    feature_names=['Age', 'Capital Gain', 'Capital Loss', 'Hours per week'],
    category_map={},
    feature_types={"Age": int, "Capital Gain": int, "Capital Loss": int, "Hours per week": int})
If, however, you're happy with your data just being floats, you can use:
heae_preprocessor, heae_inv_preprocessor = get_he_preprocessor(X=X_train,
    feature_names=['Age', 'Capital Gain', 'Capital Loss', 'Hours per week'],
    category_map={},
    feature_types={})
In this case, you can then define the training set as follows:
# Define trainset
trainset_input = heae_preprocessor(X_train).astype(np.float32)
trainset_outputs = {"output_1": trainset_input[:, :4]}
trainset = tf.data.Dataset.from_tensor_slices((trainset_input, trainset_outputs))
trainset = trainset.shuffle(1024).batch(128, drop_remainder=True)
The Encoder and Decoder defined in the example are for categorical data. You can use:
class Encoder(keras.Model):
    def __init__(self, hidden_dim: int, latent_dim: int, **kwargs):
        super().__init__(**kwargs)
        self.fc1 = keras.layers.Dense(hidden_dim)
        self.fc2 = keras.layers.Dense(latent_dim)

    def call(self, x: tf.Tensor, **kwargs) -> tf.Tensor:
        x = tf.nn.relu(self.fc1(x))
        x = tf.nn.tanh(self.fc2(x))
        return x

class Decoder(keras.Model):
    def __init__(self, hidden_dim: int, output_dim, **kwargs):
        super().__init__(**kwargs)
        self.fc1 = keras.layers.Dense(hidden_dim)
        self.fc2 = keras.layers.Dense(output_dim)

    def call(self, x: tf.Tensor, **kwargs) -> List[tf.Tensor]:
        x = tf.nn.relu(self.fc1(x))
        return self.fc2(x)
and then:
from alibi.models.tensorflow import AE
# Define autoencoder path and create dir if it doesn't exist.
ae_path = os.path.join("tensorflow", "autoencoder")
if not os.path.exists(ae_path):
    os.makedirs(ae_path)
# Define constants.
EPOCHS = 50  # epochs to train the autoencoder
HIDDEN_DIM = 128  # hidden dimension of the autoencoder
LATENT_DIM = 15  # define latent dimension
# Define the heterogeneous auto-encoder.
ae = AE(encoder=Encoder(hidden_dim=3, latent_dim=2),
        decoder=Decoder(hidden_dim=3, output_dim=4))
# Define loss functions.
he_loss = keras.losses.MeanSquaredError()
# Compile model.
ae.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3),
           loss=he_loss)
if len(os.listdir(ae_path)) == 0:
    # Fit and save autoencoder.
    ae.fit(trainset, epochs=EPOCHS)
    ae.save(ae_path, save_format="tf")
else:
    # Load the model.
    ae = keras.models.load_model(ae_path, compile=False) |
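To make the role of the empty category_map more concrete, here is a minimal pure-Python sketch of what a numerical-only preprocessor pair might do. It assumes that numerical columns are standardized and that the inverse step undoes the scaling and casts each column to its entry in feature_types (defaulting to float). The helper name make_numeric_preprocessor is hypothetical and is not part of alibi; the library's actual implementation may differ.

```python
from statistics import mean, pstdev
from typing import Dict, List

Rows = List[List[float]]


def make_numeric_preprocessor(X: Rows, feature_names: List[str], feature_types: Dict[str, type]):
    """Hypothetical stand-in for a numerical-only preprocessor / inverse pair."""
    columns = list(zip(*X))
    means = [mean(col) for col in columns]
    # Guard against constant columns, whose standard deviation is zero.
    stds = [pstdev(col) or 1.0 for col in columns]

    def preprocess(rows: Rows) -> Rows:
        # Standardize every column: (value - mean) / std.
        return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

    def inverse(rows: Rows) -> Rows:
        restored = []
        for row in rows:
            out = []
            for name, v, m, s in zip(feature_names, row, means, stds):
                raw = v * s + m
                cast = feature_types.get(name, float)  # assumed default: float
                # Round before casting to int so 24.999... becomes 25, not 24.
                out.append(int(round(raw)) if cast is int else cast(raw))
            restored.append(out)
        return restored

    return preprocess, inverse


names = ["Age", "Capital Gain", "Capital Loss", "Hours per week"]
X_demo = [[25, 0, 0, 40], [35, 5000, 0, 45], [50, 0, 2000, 38]]

# With integer feature types the round trip recovers the original integers.
pre, inv = make_numeric_preprocessor(X_demo, names, {n: int for n in names})
print(inv(pre(X_demo)))
```

With an empty feature_types dictionary, the same round trip returns floats instead, which mirrors the second variant above.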
Note also that for datasets with a small number of numerical features, you might not even want to use the autoencoder, as it's only there for dimensionality reduction. See this example for how to do this. |
Thanks. I used the solution suggested and it is working without errors. |
So if no immutable/categorical features and no ranges are given, can we write it as below?
# Define constants
COEFF_SPARSITY = 0.5  # sparsity coefficient
COEFF_CONSISTENCY = 0.5  # consistency coefficient
TRAIN_STEPS = 10000  # number of training steps -> consider increasing the number of steps
BATCH_SIZE = 100  # batch size
explainer = CounterfactualRLTabular(predictor=predictor,
    encoder=ae.encoder,
    decoder=ae.decoder,
    latent_dim=2,
    encoder_preprocessor=heae_preprocessor,
    decoder_inv_preprocessor=heae_inv_preprocessor,
    coeff_sparsity=COEFF_SPARSITY,
    coeff_consistency=COEFF_CONSISTENCY,
    category_map={},
    feature_names=X.columns,
    train_steps=TRAIN_STEPS,
    batch_size=BATCH_SIZE,
    backend="tensorflow") |
Hey,
Did you try running it? What happened? I think I've made a minor mistake in the above: the Decoder's call should return a list of tensors:
class Decoder(keras.Model):
    def __init__(self, hidden_dim: int, output_dim, **kwargs):
        super().__init__(**kwargs)
        self.fc1 = keras.layers.Dense(hidden_dim)
        self.fc2 = keras.layers.Dense(output_dim)

    def call(self, x: tf.Tensor, **kwargs) -> List[tf.Tensor]:
        x = tf.nn.relu(self.fc1(x))
        return [self.fc2(x)]
This also means that when you train the autoencoder you'll need to use a list of losses:
# Compile model.
ae.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3),
           loss=[he_loss]) |
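As a rough illustration of why the list of losses goes with the list of outputs, the sketch below pairs each loss with one decoder head by position and sums the results. It is plain Python rather than Keras, and the names mse and total_loss are hypothetical helpers introduced only for this example.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
Loss = Callable[[Vector, Vector], float]


def mse(y_true: Vector, y_pred: Vector) -> float:
    """Mean squared error over one output head."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def total_loss(losses: List[Loss], targets: List[Vector], outputs: List[Vector]) -> float:
    """Apply the i-th loss to the i-th head and sum the per-head values."""
    if not (len(losses) == len(targets) == len(outputs)):
        raise ValueError("need exactly one loss and one target per decoder output")
    return sum(loss(t, o) for loss, t, o in zip(losses, targets, outputs))


# Numerical-only case: the decoder has a single head covering all 4 features,
# so outputs, targets and losses are all lists of length one.
outputs = [[0.1, -0.2, 0.0, 0.5]]
targets = [[0.0, -0.2, 0.0, 0.3]]
print(total_loss([mse], targets, outputs))
```

With categorical features there would be one extra head (and one extra loss) per categorical column, which is why the decoder interface uses a list even when only one head is present.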
Yes, it is working, thanks. From my understanding the example given https://docs.seldon.io/projects/alibi/en/stable/examples/cfrl_adult.html.
Am I correct? |
Hey @pranavn91,
It depends on what you're using the counterfactuals for. If you're using the counterfactual to debug the model, then it doesn't matter how good the model is: the counterfactual can still be useful for understanding how the model is failing (although if you're doing this, be careful). Alternatively, if you're using the counterfactual to add functionality to the model, then you'd want the model to be as accurate as possible. So as an example, maybe you have a model that predicts the risk of some disease with some set of features (things like
The autoencoder is a dimensionality reduction step that makes the DDPG algorithm that the method is based on faster to train. We train the actor in the latent space, so yes: because we have to reconstruct the data using the decoder, it's important that the autoencoder is well trained. |
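One simple way to check whether the autoencoder is well trained is to compare preprocessed inputs with their reconstructions feature by feature. The sketch below is plain Python with hard-coded, made-up values; in practice X_rec would come from passing held-out preprocessed data through the encoder and then the decoder. The function name per_feature_mse is a hypothetical helper, not part of alibi.

```python
from typing import List


def per_feature_mse(X: List[List[float]], X_rec: List[List[float]]) -> List[float]:
    """Mean squared reconstruction error for each feature (column)."""
    n_rows, n_features = len(X), len(X[0])
    return [
        sum((x[j] - r[j]) ** 2 for x, r in zip(X, X_rec)) / n_rows
        for j in range(n_features)
    ]


# Made-up standardized inputs and reconstructions, for illustration only.
X_val = [[0.0, 1.0], [1.0, -1.0]]
X_rec = [[0.0, 0.5], [1.0, -1.5]]

errors = per_feature_mse(X_val, X_rec)
print(errors)  # the second feature is reconstructed noticeably worse
```

A feature with a much larger error than the others is a hint that counterfactuals decoded from the latent space may be unreliable for that feature.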
Hi, if I have binary (only values 0 and 1) and numerical features, and I want to use
or
For the encoder and decoder for binary features, can I follow the steps used to define the Encoder and Decoder in the example (Adult census) for categorical data, since binary features can be considered categorical features? Thanks for your help and insights! |
I am trying the tutorial given in https://docs.seldon.io/projects/alibi/en/stable/examples/cfrl_adult.html. However, my data has only numeric values, so I am getting stuck at the line below, as there are no categorical values to pass for the argument category_map. Please suggest what I can do to solve this error.