This is a multi-classification task where you are expected to predict the response variable.
Given the attributes or features of a client, the task of the predictive system is to measure the level of risk in providing insurance to the client. Risk is categorized into 8 levels..
In this dataset, you are provided with over a hundred variables describing attributes of life insurance applicants. The task is to predict the "Response" variable for each Id in the test set. "Response" is an ordinal measure of risk that has 8 levels.
File descriptions
training.csv - the training set, contains the Response values testing.csv - the test set, you must predict the Response variable for all rows in this file sample_submission.csv - a sample submission file in the correct format
Data fields
Variable Description
Id A unique identifier associated with an application.
Product_Info_1-7 A set of normalized variables relating to the product applied for
Ins_Age Normalized age of applicant
Ht Normalized height of applicant
Wt Normalized weight of applicant
BMI Normalized BMI of applicant
Employment_Info_1-6 A set of normalized variables relating to the employment history of the applicant.
InsuredInfo_1-6 A set of normalized variables providing information about the applicant.
Insurance_History_1-9 A set of normalized variables relating to the insurance history of the applicant.
Family_Hist_1-5 A set of normalized variables relating to the family history of the applicant.
Medical_History_1-41 A set of normalized variables relating to the medical history of the applicant.
Medical_Keyword_1-48 A set of dummy variables relating to the presence of/absence of a medical keyword being associated with the application.
Response This is the target variable, an ordinal variable relating to the final decision associated with an application
The following variables are all categorical (nominal):
Product_Info_1, Product_Info_2, Product_Info_3, Product_Info_5, Product_Info_6, Product_Info_7, Employment_Info_2, Employment_Info_3, Employment_Info_5, InsuredInfo_1, InsuredInfo_2, InsuredInfo_3, InsuredInfo_4, InsuredInfo_5, InsuredInfo_6, InsuredInfo_7, Insurance_History_1, Insurance_History_2, Insurance_History_3, Insurance_History_4, Insurance_History_7, Insurance_History_8, Insurance_History_9, Family_Hist_1, Medical_History_2, Medical_History_3, Medical_History_4, Medical_History_5, Medical_History_6, Medical_History_7, Medical_History_8, Medical_History_9, Medical_History_11, Medical_History_12, Medical_History_13, Medical_History_14, Medical_History_16, Medical_History_17, Medical_History_18, Medical_History_19, Medical_History_20, Medical_History_21, Medical_History_22, Medical_History_23, Medical_History_25, Medical_History_26, Medical_History_27, Medical_History_28, Medical_History_29, Medical_History_30, Medical_History_31, Medical_History_33, Medical_History_34, Medical_History_35, Medical_History_36, Medical_History_37, Medical_History_38, Medical_History_39, Medical_History_40, Medical_History_41
The following variables are continuous:
Product_Info_4, Ins_Age, Ht, Wt, BMI, Employment_Info_1, Employment_Info_4, Employment_Info_6, Insurance_History_5, Family_Hist_2, Family_Hist_3, Family_Hist_4, Family_Hist_5
The following variables are discrete:
Medical_History_1, Medical_History_10, Medical_History_15, Medical_History_24, Medical_History_32
Medical_Keyword_1-48 are dummy variables.