Mambular is a Python package that brings the power of Mamba architectures to tabular data.
- **PyTorch Lightning Under the Hood**: Built on top of PyTorch Lightning, Mambular models benefit from streamlined training processes, easy customization, and advanced features like distributed training and 16-bit precision.


## Models

| Model | Description |
|---------------------|--------------------------------------------------------------------------------------------------|
| `Mambular` | An advanced model using Mamba blocks specifically designed for various tabular data tasks. |
| `FTTransformer` | A model leveraging transformer encoders, as introduced by [Gorishniy et al.](https://arxiv.org/abs/2106.11959), for tabular data. |
| `MLP` | A classical Multi-Layer Perceptron (MLP) model for handling tabular data tasks. |
| `ResNet` | An adaptation of the ResNet architecture for tabular data applications. |
| `TabTransformer` | A transformer-based model for tabular data introduced by [Huang et al.](https://arxiv.org/abs/2012.06678), enhancing feature learning capabilities. |

All models are available for `regression`, `classification`, and distributional regression, the latter denoted by `LSS`.
Hence, they are available as, e.g., `MambularRegressor`, `MambularClassifier`, or `MambularLSS`; a minimal usage sketch follows.
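
As a hedged illustration of this naming scheme, a classifier variant can be used like a scikit-learn estimator; the hyperparameter values below are assumptions for demonstration, not recommendations:

```python
from mambular.models import MambularClassifier

# Minimal sketch: d_model and n_layers mirror the hyperparameters used
# in the MambularLSS example further down this page
clf = MambularClassifier(d_model=64, n_layers=4)
clf.fit(X_train, y_train)   # X_train, y_train: your tabular training data
preds = clf.predict(X_test)
```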



## Documentation

You can find the Mamba-Tabular API documentation [here](https://mamba-tabular.readthedocs.io/en/latest/index.html).

## Installation

Install Mambular using pip:
```sh
pip install mambular
```

## Preprocessing

Mambular simplifies the preprocessing stage of model development with a comprehensive set of techniques to prepare your data for Mamba architectures. Our preprocessing module is designed to be both powerful and easy to use, offering a variety of options to efficiently transform your tabular data.

### Data Type Detection and Transformation

Mambular automatically identifies the type of each feature in your dataset and applies the most appropriate transformations for numerical and categorical variables. This includes:

- **Ordinal Encoding**: Categorical features are seamlessly transformed into numerical values, preserving their inherent order and making them model-ready.
- **One-Hot Encoding**: For nominal data, Mambular employs one-hot encoding to capture the presence or absence of categories without imposing ordinality.
- **Binning**: Numerical features can be discretized into bins, a useful technique for handling continuous variables in certain modeling contexts.
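
These options are exposed directly on the sklearn-style estimators. As a small sketch, mirroring the `numerical_preprocessing="ple"` argument used in the custom-model example later on this page:

```python
from mambular.models import MambularRegressor

# Sketch: select the numerical preprocessing technique via the
# constructor; "ple" (piecewise linear encoding) is taken from the
# custom-model example below
model = MambularRegressor(numerical_preprocessing="ple")
model.fit(X_train, y_train)
```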
For distributional regression, configure and fit a `MambularLSS` model:

```python
from mambular.models import MambularLSS

# Initialize the model with architecture hyperparameters
model = MambularLSS(
    dropout=0.2,
    d_model=64,
    n_layers=8,
)

# Fit the model to your data; the target distribution is set via `family`
model.fit(X, y, family="normal")
```


### Implement your own model
Mambular allows users to easily integrate their custom models into the existing logic. Simply create a PyTorch model and define its forward pass; instead of inheriting from `nn.Module`, inherit from Mambular's `BaseModel`. Each Mambular model takes three arguments: the number of classes (e.g. 1 for regression or 2 for binary classification; for distributional regression this argument must still be provided, but its value is determined automatically by the chosen distribution) and two arguments passed directly from the preprocessor, `cat_feature_info` and `num_feature_info`, which carry information (e.g. the shapes) of the categorical and numerical features. Additionally, you can provide a `config` argument, which you can either implement analogously to the provided configs or take from one of the provided default configs. A custom model could hence look like this:


1. First, define your config

```python
from dataclasses import dataclass

@dataclass
class MyConfig:
    lr: float = 1e-04
    lr_patience: int = 10
    weight_decay: float = 1e-06
    lr_factor: float = 0.1
```
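
Since `MyConfig` is a plain dataclass, its defaults can be overridden at instantiation, standard dataclass behavior:

```python
# Override individual defaults for a specific run
my_config = MyConfig(lr=5e-4, lr_patience=5)
```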

2. Second, define your model just as you would define an `nn.Module`: specify the architecture and the forward pass.

```python
from mambular.base_models import BaseModel
import torch
import torch.nn as nn

class MyCustomModel(BaseModel):
    def __init__(
        self,
        cat_feature_info,
        num_feature_info,
        num_classes: int = 1,
        config=None,
        **kwargs,
    ):
        super().__init__(**kwargs)
        self.save_hyperparameters(ignore=["cat_feature_info", "num_feature_info"])

        # Input dimension: sum of the numerical feature shapes plus one
        # dimension per categorical feature
        input_dim = 0
        for feature_name, input_shape in num_feature_info.items():
            input_dim += input_shape
        for feature_name, input_shape in cat_feature_info.items():
            input_dim += 1

        self.linear = nn.Linear(input_dim, num_classes)

    def forward(self, num_features, cat_features):
        # num_features and cat_features are lists of tensors;
        # concatenate them into a single input tensor
        x = num_features + cat_features
        x = torch.cat(x, dim=1)

        # Pass through linear layer
        output = self.linear(x)
        return output
```

3. To leverage the Mambular API, build a regression, classification, or distributional regression model that can use all of Mambular's built-in methods:

```python
from mambular.models import SklearnBaseRegressor

class MyRegressor(SklearnBaseRegressor):
    def __init__(self, **kwargs):
        super().__init__(model=MyCustomModel, config=MyConfig, **kwargs)
```

4. Subsequently, you can fit, evaluate, and predict with your model just like with any other Mambular model.
To achieve the same for classification or distributional regression, simply inherit from `SklearnBaseClassifier` or `SklearnBaseLSS` instead of `SklearnBaseRegressor` (a sketch of the classifier case follows the example below).

```python
regressor = MyRegressor(numerical_preprocessing="ple")
regressor.fit(X_train, y_train, max_epochs=50)
```
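
As a hedged sketch of the classification case (`MyClassifier` is a hypothetical name introduced here for illustration), the same wrapper pattern applies:

```python
from mambular.models import SklearnBaseClassifier

# Same pattern as MyRegressor above; an LSS variant would inherit
# from SklearnBaseLSS in the same way
class MyClassifier(SklearnBaseClassifier):
    def __init__(self, **kwargs):
        super().__init__(model=MyCustomModel, config=MyConfig, **kwargs)
```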

## Citation

If you find this project useful in your research, please consider citing it:

```BibTeX
@misc{2024,
  title={Mambular: Tabular Deep Learning with Mamba Architectures},
  year={2024}
}
```

## License

The entire codebase is released under the MIT license.
