Classification of gamma and hadron events by training machine learning classifiers on the MAGIC Gamma Telescope dataset.
The MAGIC Gamma Telescope dataset is generated to simulate the registration of high-energy gamma particles in a ground-based atmospheric Cherenkov gamma telescope using the imaging technique. It contains two classes: gammas (signal) and hadrons (background), with 12332 gamma events and 6688 hadron events.
Each event is described by 10 continuous features and 1 binary class label, listed below:
- fLength: continuous # major axis of ellipse [mm]
- fWidth: continuous # minor axis of ellipse [mm]
- fSize: continuous # 10-log of sum of content of all pixels [in #phot]
- fConc: continuous # ratio of sum of two highest pixels over fSize [ratio]
- fConc1: continuous # ratio of highest pixel over fSize [ratio]
- fAsym: continuous # distance from highest pixel to center, projected onto major axis [mm]
- fM3Long: continuous # 3rd root of third moment along major axis [mm]
- fM3Trans: continuous # 3rd root of third moment along minor axis [mm]
- fAlpha: continuous # angle of major axis with vector to origin [deg]
- fDist: continuous # distance from origin to center of ellipse [mm]
- class: g,h # gamma (signal), hadron (background)
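A minimal sketch of loading the data with these column names, assuming pandas and the headerless, comma-separated file distributed by the UCI repository (the default file name `magic04.data` is an assumption). Encoding gamma as 1 and hadron as 0 is an illustrative choice, not necessarily the one used in the notebook.

```python
import pandas as pd

# Column order as listed above; the raw file is assumed to have no header row.
COLUMNS = ["fLength", "fWidth", "fSize", "fConc", "fConc1",
           "fAsym", "fM3Long", "fM3Trans", "fAlpha", "fDist", "class"]


def load_magic(path: str = "magic04.data") -> pd.DataFrame:
    """Read the dataset and attach the column names (file name is an assumption)."""
    return pd.read_csv(path, header=None, names=COLUMNS)


def split_features_labels(df: pd.DataFrame):
    """Return the 10 features as X and the label as y (1 = gamma, 0 = hadron)."""
    X = df[COLUMNS[:-1]].to_numpy()
    y = (df["class"] == "g").astype(int).to_numpy()
    return X, y
```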
Before it can be used for classification, the dataset is preprocessed in the following steps:
The dataset is imbalanced: gamma events outnumber hadron events by roughly 1.8 to 1 (12332 vs. 6688), which can bias a classifier towards the gamma class. To address this, the gamma class is downsampled to 6688 events, matching the number of hadron events.
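The balancing step could look like the sketch below, assuming pandas and random sampling without replacement. The helper `downsample_majority` and the fixed `random_state` are hypothetical, not taken from the notebook.

```python
import pandas as pd


def downsample_majority(df: pd.DataFrame, label_col: str = "class",
                        random_state: int = 42) -> pd.DataFrame:
    """Randomly downsample every class to the size of the smallest class."""
    n_min = df[label_col].value_counts().min()
    # Draw n_min rows (without replacement) from each class
    parts = [group.sample(n=n_min, random_state=random_state)
             for _, group in df.groupby(label_col)]
    # Concatenate and shuffle so that the two classes are interleaved
    balanced = pd.concat(parts).sample(frac=1, random_state=random_state)
    return balanced.reset_index(drop=True)
```

Applied to the full dataset, this keeps all 6688 hadron events and 6688 of the 12332 gamma events.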
The balanced dataset is then split into a training set (80%), used to train the classifiers, and a testing set (20%), used to evaluate them.
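A sketch of the 80/20 split with scikit-learn's `train_test_split`. Stratifying on the label and the fixed `random_state` are assumptions; the README only states the split ratio.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def split_80_20(X: np.ndarray, y: np.ndarray, random_state: int = 42):
    """Return X_train, X_test, y_train, y_test with 20% of the events held out."""
    # stratify=y keeps the gamma/hadron proportions the same in both splits
    # (up to rounding); this is an assumption, not stated in the README
    return train_test_split(X, y, test_size=0.2, stratify=y,
                            random_state=random_state)
```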
Six classification algorithms are used to classify the events as gamma or hadron. The algorithms and their hyperparameters are listed below:
- Decision Tree: default parameters; no hyperparameters are tuned.
- AdaBoost: the number of estimators (N-estimators) is tuned by cross-validation; the selected value is 200.
- KNN: the number of neighbors (K-neighbors) is tuned by cross-validation; the selected value is 21.
- Random Forest: the number of estimators (N-estimators) is tuned by cross-validation; the selected value is 400.
- Naive Bayes: default parameters; no hyperparameters are tuned.
- Neural Network: two hidden layers, with the number of neurons in each layer tuned by cross-validation; the selected sizes are 50 neurons in the first hidden layer and 90 in the second.
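The six models could be instantiated with scikit-learn as sketched below. The library choice, `GaussianNB` for Naive Bayes, `MLPClassifier` for the neural network (with an assumed `max_iter=500`), the fixed `random_state`, and the candidate grid in the hypothetical helper `tune_k` are all assumptions; only the values 200, 21, 400, and (50, 90) come from the list above. `tune_k` illustrates how a hyperparameter such as the number of neighbors can be chosen by 5-fold cross-validation on the training set.

```python
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier


def build_classifiers(random_state: int = 42) -> dict:
    """Return the six classifiers with the hyperparameter values reported above."""
    return {
        "Decision Tree": DecisionTreeClassifier(random_state=random_state),
        "AdaBoost": AdaBoostClassifier(n_estimators=200, random_state=random_state),
        "KNN": KNeighborsClassifier(n_neighbors=21),
        "Random Forest": RandomForestClassifier(n_estimators=400,
                                                random_state=random_state),
        "Naive Bayes": GaussianNB(),  # Gaussian variant assumed for continuous features
        # max_iter=500 is an assumption to give the optimizer room to converge
        "Neural Network": MLPClassifier(hidden_layer_sizes=(50, 90), max_iter=500,
                                        random_state=random_state),
    }


def tune_k(X_train, y_train, candidates=range(1, 32, 2), cv: int = 5) -> int:
    """Choose n_neighbors for KNN by k-fold cross-validation (hypothetical grid)."""
    search = GridSearchCV(KNeighborsClassifier(),
                          {"n_neighbors": list(candidates)}, cv=cv)
    search.fit(X_train, y_train)
    return search.best_params_["n_neighbors"]
```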
Results on the testing set (precision, recall, and F1 are averaged over the two classes):

Algorithm | Accuracy | Precision | Recall | Average F1 |
---|---|---|---|---|
Decision Tree | 0.79 | 0.79 | 0.79 | 0.79 |
AdaBoost | 0.82 | 0.82 | 0.82 | 0.82 |
KNN | 0.77 | 0.78 | 0.77 | 0.77 |
Random Forest | 0.87 | 0.87 | 0.87 | 0.87 |
Naive Bayes | 0.65 | 0.69 | 0.65 | 0.63 |
Neural Network | 0.85 | 0.86 | 0.85 | 0.85 |
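The four metrics can be computed with scikit-learn as sketched below. The averaging mode is an assumption: with `average="weighted"`, recall always equals accuracy (each class's recall is weighted by its share of the test set, so the sum reduces to the fraction of correct predictions), which is consistent with the Accuracy and Recall columns above, but the notebook remains the authority on which averaging was used. `score_predictions` and `evaluate` are hypothetical helpers.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support


def score_predictions(y_true, y_pred, average: str = "weighted") -> dict:
    """Return the table's four metrics, rounded to two decimals."""
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average=average, zero_division=0)
    scores = {"Accuracy": accuracy_score(y_true, y_pred),
              "Precision": precision, "Recall": recall, "F1": f1}
    return {name: round(float(value), 2) for name, value in scores.items()}


def evaluate(model, X_train, y_train, X_test, y_test) -> dict:
    """Fit a classifier on the training split and score it on the test split."""
    model.fit(X_train, y_train)
    return score_predictions(y_test, model.predict(X_test))
```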
This README.md gives an overview of the project; see the notebook for the code and a more detailed discussion of the results.