UfukBlbn/Data-Mining-Python

Data Mining/Python

In this project, I coded a prediction program over a ready-made dataset to help reduce delays and mistakes in disease diagnosis.

You can see all the graphs by opening Classification.ipynb.

  1. INTRODUCTION The accuracy of a diagnosis greatly affects the treatment of a disease. Many factors affect diagnosis, and time has become the primary one. In addition, serious mistakes are made when diagnosing diseases. Artificial intelligence can be a very effective way to save time and prevent possible mistakes. For example, diagnosis prediction is one of the core research tasks in EHR (Electronic Health Record) data mining; it aims to predict future visit information from historical visit records. If we teach a machine the disease information of previous patients, it can make diagnosing future patients considerably easier. In this project, I coded a prediction program over a ready-made dataset in order to reduce such delays and mistakes.

  2. MATERIALS AND METHODS In this project, I used the data of more than 550 previously diagnosed patients. Whether each patient's tumor cells were benign or malignant had already been determined, using many properties such as cell diameter, area, density and symmetry. In general terms, the goal is to make predictions by classifying the existing data set. The project consists of 2 parts: making predictions with a tool, and making predictions with a programming language. I used more than one method, listed below, to increase the accuracy of the prediction and to find the most accurate approach.
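As a rough illustration of the classification setup described above, the sketch below holds back part of a labelled data set for testing and measures accuracy on it. It is a minimal sketch using only the Python standard library, not the code from Classification.ipynb. The records, the feature choice, and the helper names `train_test_split` and `accuracy` are assumptions made up for this example; the numbers are not taken from the project's data set.

```python
import random

# Hypothetical toy records: (diameter, area, symmetry) -> known diagnosis.
# These values are invented for illustration only.
RECORDS = [
    ((12.0, 450.0, 0.17), "benign"),
    ((11.5, 400.0, 0.16), "benign"),
    ((13.1, 520.0, 0.18), "benign"),
    ((12.4, 470.0, 0.15), "benign"),
    ((19.8, 1200.0, 0.24), "malignant"),
    ((21.0, 1350.0, 0.26), "malignant"),
    ((18.5, 1050.0, 0.22), "malignant"),
    ((20.3, 1280.0, 0.25), "malignant"),
]


def train_test_split(records, test_ratio=0.25, seed=0):
    """Shuffle a copy of the records and split it into (train, test)."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(predict, test_records):
    """Fraction of test records whose predicted label equals the known label."""
    correct = sum(1 for features, label in test_records
                  if predict(features) == label)
    return correct / len(test_records)


train, test = train_test_split(RECORDS)
```

Any classifier can then be passed to `accuracy` as a function that maps a feature tuple to a label, which makes it straightforward to compare several methods on the same held-out records.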

1-) Support vector machines (SVMs) are a set of supervised learning methods used for classification, regression and outlier detection.

2-) The Naive Bayes classifier is based on Bayes' theorem. It is a simple probabilistic algorithm that can also work on imbalanced data sets. For each element, the algorithm calculates the probability of every class and assigns the element to the class with the highest probability.

3-) Decision Trees (DTs) are a non-parametric supervised learning method used for classification and regression.

4-) Random forest is a supervised learning algorithm. It can be used for both classification and regression, and it is flexible and easy to use. A forest is made up of decision trees; in general, the more trees it has, the more robust its predictions are.

5-) K-nearest neighbors (KNN) is a simple algorithm that stores all available cases and classifies new cases based on a similarity measure (e.g., a distance function).
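Two of the methods above are simple enough to sketch from scratch: K nearest neighbors (vote among the closest stored cases) and Naive Bayes (pick the class with the highest probability). The code below is a minimal sketch using only the Python standard library, and it assumes a Gaussian model for each feature in the Naive Bayes part. It is not the code from Classification.ipynb; the training values and the function names `knn_predict`, `fit_gaussian_nb` and `nb_predict` are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training data: (feature_1, feature_2) -> label.
# Invented values, not taken from the project's data set.
TRAIN = [
    ((1.0, 1.1), "benign"),
    ((1.2, 0.9), "benign"),
    ((0.8, 1.0), "benign"),
    ((3.0, 3.2), "malignant"),
    ((3.1, 2.9), "malignant"),
    ((2.9, 3.0), "malignant"),
]


def knn_predict(train, x, k=3):
    """K nearest neighbors: majority label among the k closest stored cases."""
    nearest = sorted(train, key=lambda rec: math.dist(rec[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def fit_gaussian_nb(train):
    """Estimate the prior and the per-feature mean and variance of each class."""
    by_class = defaultdict(list)
    for features, label in train:
        by_class[label].append(features)
    model = {}
    for label, rows in by_class.items():
        columns = list(zip(*rows))
        means = [sum(c) / len(c) for c in columns]
        # A small constant keeps every variance strictly positive.
        variances = [sum((v - m) ** 2 for v in c) / len(c) + 1e-9
                     for c, m in zip(columns, means)]
        model[label] = (len(rows) / len(train), means, variances)
    return model


def nb_predict(model, x):
    """Naive Bayes: return the class with the highest log posterior."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for v, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)


nb_model = fit_gaussian_nb(TRAIN)
```

Note the difference in when the work happens: `knn_predict` only stores the cases and does all its computation at prediction time, while `fit_gaussian_nb` summarizes the training data once and `nb_predict` reuses that summary.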

Figure: SVM plot

Figure: SVM graph
