This machine learning project uses anomaly detection models to detect casting defects in submersible pump impellers from images.
Casting is a manufacturing process in which a liquid material is usually poured into a mould, which contains a hollow cavity of the desired shape, and then allowed to solidify.
Source: Casting.
- Isolation Forest serves as the entry point of the project and contains the feature extraction, the data transformation, and the Isolation Forest (IF) model.
- Local Outlier Factor contains the LOF model.
- One Class SVM contains the one-class SVM model.
- Autoencoder contains the autoencoder model (deep learning).
The image dataset, obtained from Kaggle, consists of two image sets:
- 512*512 greyscale without augmentation
- 300*300 greyscale with augmentation
Source: casting product image data for quality inspection.
Even though casting technology has improved over time, the casting process in industry is never perfect, because external factors such as defects in the mould or the raw materials can still occur. As a result, defective casting products are sometimes produced. Inspecting casting products manually to separate the defective ones from the normal ones is laborious. What if we could automate this process? By applying machine learning to images, a model can help us detect the casting products with defects.
Because the image set consists of greyscale images, the frequency distribution of greyscale intensities, from 0 (pure black) to 255 (pure white), is computed for each image. Each sample therefore consists of 256 features, one per intensity level.
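The histogram features described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `histogram_features` is hypothetical, the image is a synthetic stand-in for a real casting photo, and whether the project normalises the counts is not stated, so raw counts are used here.

```python
import numpy as np

def histogram_features(image: np.ndarray) -> np.ndarray:
    """Return the 256-bin intensity histogram of an 8-bit greyscale image."""
    if image.dtype != np.uint8:
        raise ValueError("expected an 8-bit greyscale image")
    # np.bincount counts how often each integer value occurs;
    # minlength=256 guarantees one bin per intensity level, even if unused.
    return np.bincount(image.ravel(), minlength=256)

# Synthetic 512x512 image standing in for a real casting photo (assumption).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(512, 512), dtype=np.uint8)

features = histogram_features(image)
print(features.shape)  # one feature per grey level
```

Because the histogram discards spatial layout, two images with the same intensity distribution but defects in different places produce identical feature vectors.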
In general, there are two types of anomaly detection:
- Outlier detection: The training data contains outliers which are defined as observations that are far from the others. Outlier detection estimators thus try to fit the regions where the training data is the most concentrated, ignoring the deviant observations.
- Novelty detection: The training data is not polluted by outliers and we are interested in detecting whether a new observation is an outlier. In this context an outlier is also called a novelty.
Source: 2.7. Novelty and Outlier Detection.
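The difference between the two modes can be sketched with scikit-learn's `LocalOutlierFactor`, which supports both. The two-dimensional synthetic data and the choice of `n_neighbors=20` are assumptions for illustration only; in the project the inputs would be the 256-feature histograms.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(200, 2))     # stand-in for normal samples
anomalies = rng.uniform(8.0, 10.0, size=(10, 2))  # stand-in for defective samples

# Outlier detection: the training data is polluted, and the same data
# is both fitted and labelled (1 = inlier, -1 = outlier).
polluted = np.vstack([normal, anomalies])
outlier_labels = LocalOutlierFactor(n_neighbors=20).fit_predict(polluted)

# Novelty detection: fit on clean data only, then label unseen samples.
novelty_model = LocalOutlierFactor(n_neighbors=20, novelty=True).fit(normal)
novelty_labels = novelty_model.predict(anomalies)

print(outlier_labels.shape, novelty_labels)
```

Note that with `novelty=True`, `predict` is meant for new data only, while in the default mode only `fit_predict` is available.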
- Image set: 512*512 greyscale without augmentation.
- Hyperparameter tuning: number of trees.
- Outlier detection: 58% accuracy.
- Image set: 512*512 greyscale without augmentation.
- Hyperparameter tuning: number of neighbours.
- Outlier detection: 71% accuracy.
- Novelty detection: 70% accuracy.
- Image set: 512*512 greyscale without augmentation.
- Hyperparameter tuning: nu (see explanation below).
- Outlier detection: 63% accuracy.
- Novelty detection: 82% accuracy.
'nu' is an upper bound on the fraction of margin errors and a lower bound of the fraction of support vectors. A margin error corresponds to a sample that lies on the wrong side of its margin boundary: it is either misclassified, or it is correctly classified but does not lie beyond the margin.
Source: 1.4.7.3. NuSVC.
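The effect of `nu` can be observed on a one-class SVM with a short sketch. The synthetic data, the RBF kernel, and the two `nu` values are assumptions chosen for illustration, not the settings used in the project. For each value, the fraction of training samples flagged as anomalous should stay at or near `nu`, while the fraction of support vectors should be at least `nu`.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 2))  # stand-in for normal training samples

results = {}
for nu in (0.05, 0.2):
    model = OneClassSVM(kernel="rbf", gamma="scale", nu=nu).fit(X_train)
    # Fraction of training samples the fitted model labels as anomalous (-1).
    flagged = float(np.mean(model.predict(X_train) == -1))
    # Fraction of training samples retained as support vectors.
    support = len(model.support_) / len(X_train)
    results[nu] = (flagged, support)
    print(f"nu={nu}: flagged={flagged:.3f}, support vectors={support:.3f}")
```

A larger `nu` therefore makes the model stricter: more of the training data is allowed to fall outside the learned boundary.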
- Image set: 300*300 greyscale with augmentation (deep learning generally performs better with a larger number of images).
- Hyperparameter tuning: threshold (see explanation below).
- Novelty detection: 94% accuracy.
Anomalies are detected by checking whether the reconstruction loss exceeds a fixed threshold. To set the threshold, we calculate the mean absolute error of the reconstructions of normal samples from the training set, then classify a future example as anomalous (defective) if its reconstruction error is more than one standard deviation above the mean training error.
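The thresholding rule can be sketched with NumPy alone. No autoencoder is trained here: the "reconstructions" are simulated by adding noise to random arrays, and the helper name `mae_per_sample` is hypothetical. Only the threshold logic (mean training error plus one standard deviation) reflects the description above.

```python
import numpy as np

def mae_per_sample(x: np.ndarray, x_hat: np.ndarray) -> np.ndarray:
    """Mean absolute error between each sample and its reconstruction."""
    # Average over every axis except the first (the sample axis).
    return np.mean(np.abs(x - x_hat), axis=tuple(range(1, x.ndim)))

rng = np.random.default_rng(0)

# Simulated normal training images and their (good) reconstructions.
train = rng.random((100, 8, 8))
train_recon = train + rng.normal(0.0, 0.01, size=train.shape)
train_loss = mae_per_sample(train, train_recon)

# Threshold: one standard deviation above the mean training error.
threshold = train_loss.mean() + train_loss.std()

# Simulated new images that the model reconstructs poorly.
new = rng.random((5, 8, 8))
new_recon = new + rng.normal(0.0, 0.2, size=new.shape)
is_defective = mae_per_sample(new, new_recon) > threshold

print(threshold, is_defective)
```

Using a multiple of the standard deviation other than one shifts the trade-off between missed defects and false alarms, which is why the threshold is treated as a tunable hyperparameter.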
On these image sets, the LOF and one-class SVM models perform decently, while IF does not perform well. The autoencoder model has the best performance, although it was evaluated on a different image set (300*300 with augmentation) from the other three models. Because it uses a neural network, it can extract a lot of hidden information from the input features, which then becomes a determining factor in the predictions.