altruist7/IrisDataAnalysis

Kidney Data & IrisDataAnalysis

1. Kidney Disease Data Analysis

Consider the chronic kidney disease dataset.

  • Import the dataset from https://www.kaggle.com/mansoordaku/ckdisease (1 point). (Hint: Convert the txt file to csv for ease of use.)
  • Extract X as all columns except the first column and Y as the first column. (1 point)
  • Visualize the dataset. (2 points)
  • Split the data into a training set and a testing set. (1 point) Perform 10-fold cross-validation. (1 point)
  • Train a Logistic regression model for the dataset. (2 points)
  • Display the coefficients and form the logistic regression equation. (1 point)
  • Compute the accuracy and confusion matrix. (2 points)
  • Plot the decision boundary. (1 point)
  • Create an output .csv file containing the actual test set values of Y (column name: Actual) and the predictions of Y (column name: Predicted). (1 point)
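
The modelling steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses a synthetic stand-in generated with scikit-learn's `make_classification` instead of the Kaggle file (which needs cleaning and encoding first), and the column names `target`, `x1`..`x4` and the output file name `predictions.csv` are hypothetical. Visualization and the decision boundary plot are left out.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split

# Assumption: synthetic numeric data stands in for the cleaned kidney dataset.
features, labels = make_classification(
    n_samples=400, n_features=4, n_informative=3, n_redundant=1, random_state=0
)
df = pd.DataFrame(features, columns=[f"x{i + 1}" for i in range(4)])
df.insert(0, "target", labels)  # label placed in the first column, as in the task

# Y is the first column, X is every other column.
X = df.iloc[:, 1:]
y = df.iloc[:, 0]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = LogisticRegression(max_iter=1000)

# 10-fold cross-validation, run on the training portion only.
cv_scores = cross_val_score(model, X_train, y_train, cv=10, scoring="accuracy")

model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Logistic regression equation: logit(p) = b0 + b1*x1 + ... + bn*xn
terms = " + ".join(
    f"({coef:.3f})*{name}" for name, coef in zip(X.columns, model.coef_[0])
)
equation = f"logit(p) = {model.intercept_[0]:.3f} + {terms}"

accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)

# Output file with the two required columns.
results = pd.DataFrame({"Actual": y_test.to_numpy(), "Predicted": y_pred})
results.to_csv("predictions.csv", index=False)

print(equation)
print("Mean 10-fold CV accuracy:", cv_scores.mean())
print("Test accuracy:", accuracy)
print(cm)
```

With the real data, the `make_classification` call would be replaced by reading the converted csv file and encoding any categorical columns before fitting.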

2. Iris Data Analysis

Consider the Iris flower data, with Class as the response variable.

  • Import the dataset from https://archive.ics.uci.edu/ml/machine-learning-databases/iris/ (1 point)
  • Identify any missing values, and write code to fill them with the mean for numerical attributes and the mode for categorical attributes. (1 point)
  • Extract X as all columns except the Class column and Y as the Class column. (1 point)
  • Split the data into a training set and a testing set. (1 point)
  • Model the classifier using GaussianNB, BernoulliNB and MultinomialNB (3 points)
  • Compute the accuracy and confusion matrix for each model. (3 points)
  • Plot the decision boundary, visualize training and test results of all the models (3 points)
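
A minimal sketch of the imputation and Naive Bayes steps above might look like this. It assumes the copy of the Iris data bundled with scikit-learn (`load_iris`) rather than the UCI download, renames the label column to `Class`, and deliberately blanks a few cells so the fill step has something to do; those injected gaps are purely illustrative. Plotting is omitted.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Assumption: scikit-learn's bundled Iris data stands in for the UCI file.
iris = load_iris(as_frame=True)
df = iris.frame.rename(columns={"target": "Class"})
df["Class"] = df["Class"].map(dict(enumerate(iris.target_names.tolist())))

# Illustrative only: blank a few cells so there is something to impute.
df.loc[[0, 60], "sepal length (cm)"] = np.nan
df.loc[[120], "Class"] = np.nan

missing = df.isna().sum()  # missing-value count per column

# Mean for numerical attributes, mode for the categorical attribute.
numeric_cols = df.select_dtypes(include="number").columns
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].mean())
df["Class"] = df["Class"].fillna(df["Class"].mode()[0])

X = df.drop(columns="Class")
y = df["Class"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# BernoulliNB binarizes features at 0.0 by default, so on raw measurements
# every feature becomes 1; it is kept at its default here for comparison.
# MultinomialNB needs non-negative features, which these measurements are.
models = {
    "GaussianNB": GaussianNB(),
    "BernoulliNB": BernoulliNB(),
    "MultinomialNB": MultinomialNB(),
}

results = {}
for name, clf in models.items():
    clf.fit(X_train, y_train)
    pred = clf.predict(X_test)
    results[name] = (
        accuracy_score(y_test, pred),
        confusion_matrix(y_test, pred, labels=clf.classes_),
    )

for name, (acc, cm) in results.items():
    print(name, "accuracy:", acc)
    print(cm)
```

Filling a missing class label with the mode is what the task asks for, but it can assign the wrong label to a row; dropping such rows is a common alternative.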