Data Mining and Decision Analysis based on Haberman's Data

The attached Haberman's dataset (HabermansSurvivalData.xlsx) contains cases from a study that was conducted between 1958 and 1970 at the University of Chicago's Billings Hospital on the survival of patients who had undergone surgery for breast cancer.

Data

This data set is named "Haberman's Survival Data" because it was first used in a research paper by S. J. Haberman.

This data set has 306 records. The following are the attributes.

• Age of patient at time of operation (numerical)

• Patient's year of operation (year - 1900, numerical)

• Number of positive axillary nodes detected (numerical)

• Survival status (class attribute)

  − 1 = the patient survived 5 years or longer

  − 2 = the patient died within 5 years
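The attribute encoding above can be sketched as a small record type. This is only an illustration: the names `PatientRecord` and `parse_row` are hypothetical, and the column order is assumed to match the order of the attribute list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatientRecord:
    """One row of the data set (hypothetical helper, not part of the project)."""
    age: int              # age of patient at time of operation
    operation_year: int   # stored as (year - 1900), so 64 means 1964
    positive_nodes: int   # number of positive axillary nodes detected
    survival_status: int  # 1 = survived 5 years or longer, 2 = died within 5 years

    @property
    def calendar_year(self) -> int:
        # Undo the (year - 1900) encoding.
        return 1900 + self.operation_year

    @property
    def survived_five_years(self) -> bool:
        if self.survival_status not in (1, 2):
            raise ValueError(f"unexpected class label: {self.survival_status}")
        return self.survival_status == 1


def parse_row(row):
    """Build a record from four values, assumed to be in attribute-list order."""
    age, year, nodes, status = (int(value) for value in row)
    return PatientRecord(age, year, nodes, status)
```

For example, `parse_row([30, 64, 1, 1])` (illustrative values) describes a 30-year-old patient operated on in 1964 with one positive node who survived five years or longer.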

Objective

The objective of the project is to find the best-performing classification model for predicting whether a patient will survive five years or longer after the surgery.

Response

The data set named "Haberman's Survival Data" was analyzed using the following models:

• Decision tree

• kNN

• Naïve Bayesian

• ANN

• SVM

• Ensemble learners (Voting, Bagging and Random Forest)

Some of the models were analyzed with the numerical to binomial operator provided by RapidMiner; others were analyzed without it. Table 1 lists the operators used in each case.
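As a rough illustration of what a numerical-to-binomial conversion does, the sketch below maps every value inside a chosen range to false and every other value to true. The function name and the `lower`/`upper` bounds are assumptions for illustration; the bounds actually configured in this project are not stated here.

```python
def numerical_to_binomial(values, lower=0.0, upper=0.0):
    """Return False for values inside [lower, upper] and True otherwise.

    The default bounds (0.0 to 0.0) are an assumption: with them, only an
    exact zero becomes False.
    """
    return [not (lower <= value <= upper) for value in values]
```

With the default bounds, a node count of 0 becomes false and any positive count becomes true, so the model sees "no positive nodes" versus "some positive nodes" instead of the raw count.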

Each model was analyzed in RapidMiner. Table 2 and Table 3 present the corresponding results. Table 2 compiles the Root Mean Squared Error (RMSE) and Squared Error (SE) results, with tolerances and micro averages, for each applicable model.
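One way to read these two measures for a classifier is sketched below. It assumes that the error of each example is one minus the confidence the model assigned to the true class; that definition is an assumption made for illustration, not something this README states.

```python
import math


def squared_error(true_class_confidences):
    """Mean of (1 - confidence in the true class) squared over all examples."""
    if not true_class_confidences:
        raise ValueError("at least one example is required")
    errors = [(1.0 - c) ** 2 for c in true_class_confidences]
    return sum(errors) / len(errors)


def root_mean_squared_error(true_class_confidences):
    """Square root of the squared error defined above."""
    return math.sqrt(squared_error(true_class_confidences))
```

Under this reading RMSE is simply the square root of SE. The kNN figures reported in this README fit that relationship: 0.237 squared is 0.056169, which rounds to 0.056.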

Table 3 presents the accuracy of the Decision Tree, kNN, and Naïve Bayesian models. All accuracy results are given with tolerances and micro averages where available.
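The difference between an accuracy with a tolerance and a micro average can be sketched as follows. The sketch assumes the tolerance is the standard deviation of the per-fold accuracies in a cross-validation and that the micro average pools all examples across folds; the function name and the use of the population standard deviation are assumptions.

```python
import statistics


def summarize_accuracy(folds):
    """Summarize per-fold results given as (correct, total) pairs.

    Returns (mean accuracy, tolerance, micro average), where the tolerance
    is taken to be the population standard deviation across folds.
    """
    per_fold = [correct / total for correct, total in folds]
    mean_accuracy = statistics.mean(per_fold)
    tolerance = statistics.pstdev(per_fold)
    # The micro average weights every example equally, not every fold.
    micro_average = sum(c for c, _ in folds) / sum(t for _, t in folds)
    return mean_accuracy, tolerance, micro_average
```

When all folds have the same size the mean and the micro average coincide; they drift apart only when fold sizes differ.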

According to the analysis, the kNN model performed best: it had the lowest RMSE and SE values. Its RMSE was 0.237 and its SE was 0.056 with a tolerance of +/- 0.084. Its accuracy was also the highest, at 94.88%.
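For readers unfamiliar with kNN, the idea can be sketched in a few lines. This is not the RapidMiner process used in the project: the value of `k`, the Euclidean distance, and the absence of attribute scaling are all assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict a label by majority vote among the k nearest training examples.

    train is a list of (features, label) pairs; query is a feature tuple of
    the same length as each features tuple.
    """
    if k < 1 or k > len(train):
        raise ValueError("k must be between 1 and the number of training examples")
    # Sort by Euclidean distance to the query and keep the k closest examples.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Here each feature tuple would hold (age, year of operation, positive nodes) and each label would be the survival status, 1 or 2.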


Decision Tree (Numerical)

Decision Tree (Binomial)

kNN (Numerical)

kNN (Binomial)

Naïve Bayesian (Numerical)

Naïve Bayesian (Binomial)

Artificial Neural Network (Numerical)

SVM (Numerical)

Ensemble Learners (Voting)

Ensemble Learners (Bagging)

Ensemble Learners (Random Forest)
