Consider the chronic kidney disease dataset.
- Import the dataset from https://www.kaggle.com/mansoordaku/ckdisease (1 point). (Hint: Convert the txt file to csv for ease of use.)
- Extract X as all columns except the first column and Y as the first column. (1 point)
- Visualize the dataset. (2 points)
- Split the data into a training set and a testing set. (1 point) Perform 10-fold cross-validation. (1 point)
- Train a logistic regression model on the dataset. (2 points)
- Display the coefficients and form the logistic regression equation. (1 point)
- Compute the accuracy and confusion matrix. (2 points)
- Plot the decision boundary. (1 point)
- Create an output .csv file containing the actual test-set values of Y (column name: Actual) and the predictions of Y (column name: Predicted). (1 point)
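The modelling steps above (split, 10-fold cross-validation, logistic regression, equation, accuracy, confusion matrix, CSV output) could be sketched roughly as follows. This is only an illustration under stated assumptions: it uses scikit-learn, and because the Kaggle file cannot be read here, a synthetic two-feature dataset from make_classification stands in for the chronic kidney disease data. The names x1, x2 and write_predictions are hypothetical, not part of the assignment.

```python
import csv
import io

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split

# ASSUMPTION: synthetic stand-in for the CKD data (2 features, binary target).
X, y = make_classification(
    n_samples=200, n_features=2, n_informative=2, n_redundant=0,
    n_clusters_per_class=1, random_state=0,
)

# Hold out 25% of the rows as the test set, keeping class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = LogisticRegression()

# 10-fold cross-validation on the training set (one accuracy per fold).
cv_scores = cross_val_score(model, X_train, y_train, cv=10)

# Fit on the full training set and read off the coefficients.
model.fit(X_train, y_train)
b0 = model.intercept_[0]
b1, b2 = model.coef_[0]
equation = f"logit(p) = {b0:.3f} {b1:+.3f}*x1 {b2:+.3f}*x2"

# Evaluate on the held-out test set.
y_pred = model.predict(X_test)
acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)


def write_predictions(fh, actual, predicted):
    """Write two columns, Actual and Predicted, to an open text file."""
    writer = csv.writer(fh)
    writer.writerow(["Actual", "Predicted"])
    writer.writerows(zip(actual.tolist(), predicted.tolist()))


# An in-memory buffer is used here; pass open("output.csv", "w", newline="")
# instead to create a real file.
buffer = io.StringIO()
write_predictions(buffer, y_test, y_pred)

print(equation)
print("mean 10-fold CV accuracy:", cv_scores.mean())
print("test accuracy:", acc)
print(cm)
```

With two features, the decision boundary is the line where logit(p) = 0, that is x2 = -(b0 + b1*x1) / b2, which can be drawn over a scatter plot of the data. The real dataset has more columns, so plotting a boundary there would first require choosing two features or a 2-D projection.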
Consider the Iris flowers data with Class as the response variable.
- Import the dataset from https://archive.ics.uci.edu/ml/machine-learning-databases/iris/ (1 point).
- Identify any missing values, and write code to fill them with the mean for numerical attributes and the mode for categorical attributes. (1 point)
- Extract X as all columns except the Class column and Y as the Class column. (1 point)
- Split the data into a training set and a testing set. (1 point)
- Model the classifier using GaussianNB, BernoulliNB and MultinomialNB. (3 points)
- Compute the accuracy and confusion matrix for each model. (3 points)
- Plot the decision boundary and visualize the training and test results for all the models. (3 points)
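One possible shape for the missing-value check, the split, and the three Naive Bayes classifiers is sketched below. It rests on several assumptions: scikit-learn is used, the copy of Iris bundled with scikit-learn (load_iris) stands in for the UCI file, and the BernoulliNB threshold binarize=3.0 is an arbitrary illustrative choice rather than a recommended value. Plotting is left out.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# ASSUMPTION: scikit-learn's bundled Iris copy replaces the UCI download.
X, y = load_iris(return_X_y=True)

# Count missing entries (NaN) in the numeric feature matrix.
n_missing = int(np.isnan(X).sum())

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Mean imputation for numeric columns, learned from the training set only.
# For categorical columns, strategy="most_frequent" would fill with the mode.
imputer = SimpleImputer(strategy="mean")
X_train = imputer.fit_transform(X_train)
X_test = imputer.transform(X_test)

models = {
    "GaussianNB": GaussianNB(),
    # BernoulliNB expects binary features; binarize thresholds every feature.
    # The value 3.0 is an illustrative assumption.
    "BernoulliNB": BernoulliNB(binarize=3.0),
    # MultinomialNB requires non-negative features, which holds for Iris.
    "MultinomialNB": MultinomialNB(),
}

results = {}
for name, clf in models.items():
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "confusion_matrix": confusion_matrix(y_test, y_pred, labels=[0, 1, 2]),
    }

for name, res in results.items():
    print(name, "accuracy:", res["accuracy"])
    print(res["confusion_matrix"])
```

GaussianNB is the variant designed for continuous measurements like these. BernoulliNB and MultinomialNB are intended for binary and count features respectively, so their results on Iris depend heavily on how the features are thresholded or scaled. To plot decision boundaries, one option is to refit each model on two chosen features and evaluate it on a grid of points.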