From 2463b01a1f9bc4ad542a70fb975f4a82ee34d796 Mon Sep 17 00:00:00 2001 From: Manish Kumar Date: Fri, 28 Jun 2024 08:49:13 +0200 Subject: [PATCH] homepage updated --- docs/homepage.md | 107 +++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 104 insertions(+), 3 deletions(-) diff --git a/docs/homepage.md b/docs/homepage.md index 4992cb8..5dcad64 100644 --- a/docs/homepage.md +++ b/docs/homepage.md @@ -12,6 +12,32 @@ Mambular is a Python package that brings the power of Mamba architectures to tab - **PyTorch Lightning Under the Hood**: Built on top of PyTorch Lightning, Mambular models benefit from streamlined training processes, easy customization, and advanced features like distributed training and 16-bit precision. +## Models + +| Model | Description | +|---------------------|--------------------------------------------------------------------------------------------------| +| `Mambular` | An advanced model using Mamba blocks specifically designed for various tabular data tasks. | +| `FTTransformer` | A model leveraging transformer encoders, as introduced by [Gorishniy et al.](https://arxiv.org/abs/2106.11959), for tabular data. | +| `MLP` | A classical Multi-Layer Perceptron (MLP) model for handling tabular data tasks. | +| `ResNet` | An adaptation of the ResNet architecture for tabular data applications. | +| `TabTransformer` | A transformer-based model for tabular data introduced by [Huang et al.](https://arxiv.org/abs/2012.06678), enhancing feature learning capabilities. | + +All models are available for `regression`, `classification` and distributional regression, denoted by `LSS`. +Hence, they are available as e.g. `MambularRegressor`, `MambularClassifier` or `MambularLSS` + + + +## Documentation + +You can find the Mamba-Tabular API documentation [here](https://mamba-tabular.readthedocs.io/en/latest/index.html). + +## Installation + +Install Mambular using pip: +```sh +pip install mambular +``` + ## Preprocessing Mambular simplifies the preprocessing stage of model development with a comprehensive set of techniques to prepare your data for Mamba architectures. Our preprocessing module is designed to be both powerful and easy to use, offering a variety of options to efficiently transform your tabular data. @@ -19,7 +45,6 @@ Mambular simplifies the preprocessing stage of model development with a comprehe ### Data Type Detection and Transformation Mambular automatically identifies the type of each feature in your dataset and applies the most appropriate transformations for numerical and categorical variables. This includes: - - **Ordinal Encoding**: Categorical features are seamlessly transformed into numerical values, preserving their inherent order and making them model-ready. - **One-Hot Encoding**: For nominal data, Mambular employs one-hot encoding to capture the presence or absence of categories without imposing ordinality. - **Binning**: Numerical features can be discretized into bins, a useful technique for handling continuous variables in certain modeling contexts. @@ -102,7 +127,8 @@ from mambular.models import MambularLSS model = MambularLSS( dropout=0.2, d_model=64, - n_layers=8, + n_layers=8, + ) # Fit the model to your data @@ -117,10 +143,81 @@ model.fit( ``` + +### Implement your own model: +mambular allows users to easily integrate their custom models into the existing logic. Simply create a pytorch model and define its forward pass. Instead of inheriting from nn.Module, inherit from mambulars BaseModel. Each mambular model takse three arguments. The number of classes, e.g. = 1 for regression or = 2 for binary classification. For distributional regression, while this argument must be provided, it is determined automatically depending on the chosen distribution. Additionally, it takes two arguments directly passed from preprocessor. The cat_feature_info and num_feature_info for categorical and numerical feature information of e.g. the provided shape. Additionally, you can provide a config argument, which you can either implement similarly to the implemented configs, or simply use one of the Default Configs provided. A custom model could hence look just like this: + + +1. First, define your config + +```python +from dataclasses import dataclass + +@dataclass +class MyConfig: + lr: float = 1e-04 + lr_patience: int = 10 + weight_decay: float = 1e-06 + lr_factor: float = 0.1 +``` + +2. Second, define your model just as you would for a nn.Module. Simply define the architecture and the forward pass + +```python +from mambular.base_models import BaseModel +import torch +import torch.nn + +class MyCustomModel(BaseModel): + def __init__( + self, + cat_feature_info, + num_feature_info, + num_classes: int = 1, + config=None, + **kwargs, + ): + super().__init__(**kwargs) + self.save_hyperparameters(ignore=["cat_feature_info", "num_feature_info"]) + + input_dim = 0 + for feature_name, input_shape in num_feature_info.items(): + input_dim += input_shape + for feature_name, input_shape in cat_feature_info.items(): + input_dim += 1 + + self.linear = nn.Linear(input_dim, num_classes) + + def forward(self, num_features, cat_features): + x = num_features + cat_features + x = torch.cat(x, dim=1) + + # Pass through linear layer + output = self.linear(x) + return output +``` + +3. To leverage the mambular API, you can build a regression, classification or distributional regression model that can leverage all of mambulars built-in methods, by using the following: + +```python +from mambular.models import SklearnBaseRegressor + +class MyRegressor(SklearnBaseRegressor): + def __init__(self, **kwargs): + super().__init__(model=MyCustomModel, config=MyConfig, **kwargs) +``` + +4. Subsequently, you can fit, evaluate and predict with your model just like with any other mambualr model. +To achieve the same for classification or disrtibutional regression, instead of inheriting from the SklearnbaseRegressor, simply inherit from the SklearnBaseClassifier and SklearnBaseLSS. + +```python +regressor = MyRegressor(numerical_preprocessing="ple") +regressor.fit(X_train, y_train, max_epochs=50) +``` + ## Citation If you find this project useful in your research, please consider cite: - ```BibTeX @misc{2024, title={Mambular: Tabular Deep Learning with Mamba Architectures}, @@ -129,3 +226,7 @@ If you find this project useful in your research, please consider cite: year={2024} } ``` + +## License + +The entire codebase is under MIT license. \ No newline at end of file