https://www.aicrowd.com/challenges/insurance-pricing-game
Feature engineering
- Binning
  Split continuous variables into segments (this also clips extreme values implicitly).
  Used in the GLM to help capture non-linear relationships.
- Interactions
  a) Population density
  b) Driver gender combination
  c) Vehicle feature interactions
     vh_value * vh_weight
     present_vh_value (exponential decay by vh_age)
     and more...
- Grouping
  Grouped Med1 and Med2 together in policy type.
- Transformation
  Log transform and power transform of some continuous variables.
- History variables
  Historical claim amount, historical claim count, years since last claim, change in NCD.
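
A minimal sketch of a few of the feature-engineering steps above, written with pandas. Only `vh_value`, `vh_weight`, `vh_age`, `present_vh_value`, `Med1` and `Med2` come from the notes; the column names `drv_age1` and `pol_coverage`, the bin edges, the decay rate and the toy data are assumptions made for illustration, not the values used in the competition.

```python
import numpy as np
import pandas as pd

# Toy data; drv_age1 and pol_coverage are assumed column names.
df = pd.DataFrame({
    "drv_age1": [19, 35, 52, 78, 103],
    "vh_value": [8000.0, 15000.0, 22000.0, 30000.0, 12000.0],
    "vh_weight": [900.0, 1200.0, 1500.0, 1800.0, 1000.0],
    "vh_age": [1, 5, 10, 3, 20],
    "pol_coverage": ["Min", "Med1", "Med2", "Max", "Med1"],
})

# Binning: open-ended outer bins clip extreme values implicitly.
age_bins = [-np.inf, 25, 40, 60, np.inf]          # hypothetical bin edges
age_labels = ["<=25", "26-40", "41-60", "60+"]
df["drv_age1_bin"] = pd.cut(df["drv_age1"], bins=age_bins, labels=age_labels)

# Interaction: product of two vehicle features.
df["vh_value_x_weight"] = df["vh_value"] * df["vh_weight"]

# Present vehicle value: exponential decay of vh_value by vh_age.
decay_rate = 0.1                                   # hypothetical decay rate per year
df["present_vh_value"] = df["vh_value"] * np.exp(-decay_rate * df["vh_age"])

# Grouping: merge Med1 and Med2 into a single level.
df["pol_coverage_grouped"] = df["pol_coverage"].replace({"Med1": "Med", "Med2": "Med"})

# Transformation: log transform of a skewed continuous variable.
df["log_vh_value"] = np.log1p(df["vh_value"])
```

Binned columns such as `drv_age1_bin` would typically be one-hot encoded before being passed to a GLM, so that each segment gets its own coefficient.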
- An XGBoost and a logistic regression model to predict whether a claim would be >3k.
- I stacked 7 base models using a Tweedie GLM as the meta-learner under 5-fold CV.
  Base models:
  - Tweedie GLM
  - LightGBM
  - DeepForest
  - XGBoost
  - CatBoost
  - Neural network with Tweedie deviance as the loss function
  - Neural network with log-normal distribution likelihood as the loss function (learning the mu and sigma of the loss)
  Meta-learner:
  - Tweedie GLM
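
The stacking scheme can be sketched roughly as follows with scikit-learn. This is an illustration under stated assumptions, not the code in this repository: it uses two stand-in base models instead of the seven listed, synthetic data, hypothetical hyperparameters, and it feeds the log of the out-of-fold predictions to the meta-learner, which is one common choice with a log link.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import TweedieRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)

# Synthetic claims data (illustrative only): about 80% of policies have no claim.
n = 1000
X = rng.normal(size=(n, 4))
mu = np.exp(0.3 * X[:, 0] - 0.2 * X[:, 1])
y = np.where(rng.random(n) < 0.2, rng.gamma(2.0, mu), 0.0)

def make_base_models():
    """Fresh, unfitted base models (stand-ins for the seven listed above)."""
    return [
        TweedieRegressor(power=1.5, alpha=0.1, link="log", max_iter=1000),
        RandomForestRegressor(n_estimators=50, min_samples_leaf=20, random_state=0),
    ]

def out_of_fold_predictions(X, y, n_splits=5):
    """Predict each row with base models that never saw that row in training."""
    oof = np.zeros((len(y), len(make_base_models())))
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    for train_idx, valid_idx in kf.split(X):
        for j, model in enumerate(make_base_models()):
            model.fit(X[train_idx], y[train_idx])
            oof[valid_idx, j] = model.predict(X[valid_idx])
    return oof

oof = out_of_fold_predictions(X, y)

# Meta-learner: Tweedie GLM fitted on the (log) out-of-fold predictions.
meta_features = np.log(np.clip(oof, 1e-6, None))
meta = TweedieRegressor(power=1.5, alpha=0.0, link="log", max_iter=1000)
meta.fit(meta_features, y)
stacked = meta.predict(meta_features)
```

Training the meta-learner on out-of-fold predictions rather than in-sample predictions keeps it from rewarding base models that simply overfit the training rows.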
- The script used to produce predictions inside the AIcrowd environment
- The pricing strategy is incorporated in the predict_premium function
The final presentation is also uploaded to this repository.