InputNorm is a normalization layer capable of learning approximations of common scikit-learn scalers (such as the Yeo-Johnson / Box-Cox [1][2][3] based PowerTransformer) in a fully differentiable and numerically stable way.
To mimic typical data preprocessing pipelines, where scaling is applied before missing-value imputation, InputNorm also accepts NaNs in its input (the missing values are passed through to its output at the same positions).
The normalization is applied feature-wise. However, unlike BatchNorm, no running statistics are tracked, so the layer returns the same output during both training and inference. Another difference from other normalization layers is that InputNorm is intended to be applied only once, immediately after the input.
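Below is a minimal usage sketch. The import path and the constructor signature (`InputNorm(num_features)`) are assumptions for illustration only; the point is that the layer is placed first, directly on the raw (possibly NaN-containing) input, with imputation applied afterwards.

```python
import torch
import torch.nn as nn
from inputnorm import InputNorm  # hypothetical import path

# Hypothetical signature: InputNorm(num_features).
# The layer is applied once, directly on the raw input features.
norm = InputNorm(4)

x = torch.randn(32, 4)
x[0, 2] = float("nan")         # missing value in the raw input

z = norm(x)                    # scaling first; NaNs are passed through
z = torch.nan_to_num(z, 0.0)   # impute missing values after scaling
out = nn.Linear(4, 16)(z)
```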
NOTE:
- The layer is sensitive to learning rate settings at the moment.
- Skip InputNorm's parameters when applying weight decay (see the optimizer sketch after this list)!
- You can apply a dropout layer directly after this layer to mimic random tree-like network structures.
- Although the layer handles most of the necessary steps for data preprocessing, extreme outliers can still hinder training! Consider clipping your inputs as a preprocessing step!
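For the weight-decay note above, one common pattern is to place InputNorm's parameters in a separate optimizer parameter group with decay disabled. This is a sketch assuming PyTorch's `AdamW` and the hypothetical `InputNorm(num_features)` signature from the usage example:

```python
import torch
import torch.nn as nn
from inputnorm import InputNorm  # hypothetical import path

model = nn.Sequential(
    InputNorm(4),        # assumed constructor signature
    nn.Linear(4, 16),
    nn.ReLU(),
    nn.Linear(16, 1),
)

# Put InputNorm's parameters in a group with weight decay disabled.
norm_params = list(model[0].parameters())
norm_param_ids = {id(p) for p in norm_params}
other_params = [p for p in model.parameters() if id(p) not in norm_param_ids]

optimizer = torch.optim.AdamW(
    [
        {"params": other_params, "weight_decay": 1e-2},
        {"params": norm_params, "weight_decay": 0.0},  # no decay on InputNorm
    ],
    lr=1e-3,
)
```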