A PyTorch implementation of a supervised machine learning model to classify different morphologies of interference signals in radio telescope data.
Project report: PDF
Contributors:
- Akshay Suresh (mentor, project lead)
- Ryan J. Hill (Cornell University undergraduate research intern, Fall 2019)
- Ethan S. Bair (Cornell University undergraduate research intern, Fall 2019)
Interference signals from human technologies frequently confound searches for exotic astrophysical phenomena. With modern radio telescopes generating data at rates exceeding 100 GB/hr, automated methods are necessary to identify and flag data segments rife with interference. Unflagged data chunks can then be processed by downstream pipelines tuned to specific science cases.
Here, we experiment with multiple toy convolutional neural network (CNN) models to distinguish between various morphologies of interference signals in radio telescope data.
As a first pass, we defined the following 5 classes for our signal classification task.
- llnb: Long-lived narrowband interference + background noise
- slnb: Short-lived narrowband interference + background noise
- llbb: Long-lived broadband interference + background noise
- slbb: Short-lived broadband interference + background noise
- noise: Background noise only
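To make the five class definitions concrete, here is a minimal NumPy sketch of how such frequency-time snippets could be simulated. It is not the project's actual simulator: the function names, the 64 x 64 snippet size, the rectangular signal shapes, the channel widths, the burst duration, and the `snr` amplitude are all assumptions chosen for illustration.

```python
import numpy as np

CLASSES = ["llnb", "slnb", "llbb", "slbb", "noise"]


def simulate_snippet(label, n_chan=64, n_time=64, snr=10.0, rng=None):
    """Return an (n_chan, n_time) frequency-time array for one class label.

    Hypothetical toy simulator: signals are rectangular patches of constant
    amplitude `snr` added on top of unit-variance Gaussian noise.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Background noise is present in every class.
    data = rng.normal(0.0, 1.0, size=(n_chan, n_time))
    if label == "noise":
        return data
    # Assumed widths: narrowband = 2 channels, broadband = half the band.
    width = 2 if label.endswith("nb") else n_chan // 2
    # Assumed durations: long-lived = full snippet, short-lived = 1/8 of it.
    duration = n_time if label.startswith("ll") else n_time // 8
    # Random placement, keeping the patch fully inside the snippet.
    f0 = rng.integers(0, n_chan - width + 1)
    t0 = rng.integers(0, n_time - duration + 1)
    data[f0:f0 + width, t0:t0 + duration] += snr
    return data


def make_balanced_dataset(n_per_class=100, seed=0):
    """Simulate the same number of snippets for every class."""
    rng = np.random.default_rng(seed)
    x = np.stack([simulate_snippet(c, rng=rng)
                  for c in CLASSES for _ in range(n_per_class)])
    y = np.repeat(np.arange(len(CLASSES)), n_per_class)  # integer class labels
    return x.astype(np.float32), y
```

Calling `make_balanced_dataset(n_per_class=100)` would return 500 snippets with exactly 100 examples per class.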
Simulated frequency-time diagrams of the first 4 signal classes are presented below. Slide credit: Ryan J. Hill
NOTE: In our study, we generated simulated data to ensure that our training and validation data are balanced across all classes. This choice allows us to evaluate model performance using the accuracy metric. Refer to the Appendix of our project report for the full confusion matrices obtained with different CNN models.
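As a rough illustration of why balanced classes make accuracy a reasonable summary metric, the sketch below builds a confusion matrix and derives accuracy and per-class recall from it. The helper names and the small label arrays are hypothetical and are not results from the project.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows index the true class, columns the predicted class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    cm = np.zeros((n_classes, n_classes), dtype=int)
    # Unbuffered add so repeated (true, pred) pairs are all counted.
    np.add.at(cm, (y_true, y_pred), 1)
    return cm


def accuracy(cm):
    """Fraction of all examples on the diagonal."""
    return np.trace(cm) / cm.sum()


def per_class_recall(cm):
    """Fraction of each true class that was predicted correctly."""
    return np.diag(cm) / cm.sum(axis=1)


# Made-up labels for a balanced 3-class toy problem (2 examples per class).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, n_classes=3)
```

When every class has the same number of examples, overall accuracy equals the mean of the per-class recalls, so no class dominates the score. With imbalanced data the two can diverge, which is when the full confusion matrix becomes essential.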
Figure credit: Ethan S. Bair
Trialing CNNs of different depths, we observe that network accuracy grows across all signal classes with increasing model depth. However, the incremental gain in accuracy diminishes with every added layer. Setting a 95% accuracy threshold, the above plot suggests that an 8- or 9-layer CNN model would be adequate for our classification problem.
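A depth sweep of this kind can be set up with a model whose number of convolutional layers is a constructor argument. The PyTorch sketch below is one assumed way to do that, not the architecture used in the project: the class name `ToyCNN`, the channel widths, the pooling schedule, and the convention of counting only convolutional layers as "depth" are all illustrative choices. It expects single-channel inputs of at least 16 x 16 pixels.

```python
import torch
from torch import nn


class ToyCNN(nn.Module):
    """Hypothetical CNN with a configurable number of conv layers."""

    def __init__(self, n_conv_layers=4, n_classes=5, base_channels=8):
        super().__init__()
        layers, in_ch = [], 1  # input: one-channel frequency-time image
        for i in range(n_conv_layers):
            # Double the channel count for the first few layers, then hold.
            out_ch = base_channels * 2 ** min(i, 3)
            layers += [nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                       nn.ReLU()]
            # Downsample only in the first four layers so that deeper
            # models do not shrink the feature map to nothing.
            if i < 4:
                layers.append(nn.MaxPool2d(2))
            in_ch = out_ch
        self.features = nn.Sequential(*layers)
        self.pool = nn.AdaptiveAvgPool2d(1)  # global average pooling
        self.classifier = nn.Linear(in_ch, n_classes)

    def forward(self, x):
        x = self.features(x)
        x = self.pool(x).flatten(1)
        return self.classifier(x)  # raw logits, one per class


def count_parameters(model):
    """Total number of trainable parameters."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)
```

Instantiating `ToyCNN(n_conv_layers=d)` for a range of `d`, training each model on the same data, and recording validation accuracy would give an accuracy-versus-depth curve like the one described above.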
- Our definition of interference signal classes is overly simplistic and needs refinement based on inputs from real-world radio telescope data.
- Models do not account for scenarios where multiple signal classes are present in a single frequency-time snippet. For instance, what if an astrophysical signal of interest overlaps in time with two bright interference signals of different bandwidths?
- Multilabel classification may be worth exploring.
- Alternatively, we can take a look at image segmentation problems.
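To illustrate the multilabel idea, the sketch below replaces the single-class target with a multi-hot vector and scores each signal type independently using PyTorch's `nn.BCEWithLogitsLoss`. This is an assumed formulation, not something implemented in the project: the helper names, the 0.5 threshold, the choice to encode pure noise as the all-zeros target, and the example logits are hypothetical.

```python
import torch
from torch import nn

# Assumption: "noise" is represented by the all-zeros target vector,
# so only the four interference types get an output unit.
SIGNAL_LABELS = ["llnb", "slnb", "llbb", "slbb"]


def multilabel_targets(present):
    """Multi-hot encode a list of per-snippet label lists."""
    targets = torch.zeros(len(present), len(SIGNAL_LABELS))
    for i, labels in enumerate(present):
        for name in labels:
            targets[i, SIGNAL_LABELS.index(name)] = 1.0
    return targets


def predict_labels(logits, threshold=0.5):
    """Return every label whose sigmoid probability exceeds the threshold."""
    probs = torch.sigmoid(logits).tolist()
    return [[name for name, p in zip(SIGNAL_LABELS, row) if p > threshold]
            for row in probs]


# Made-up network outputs for two snippets: the first contains both a
# long-lived narrowband and a long-lived broadband signal, the second
# contains noise only.
logits = torch.tensor([[3.0, -2.0, 1.5, -4.0],
                       [-3.0, -2.5, -1.0, -2.0]])
targets = multilabel_targets([["llnb", "llbb"], []])

# Independent binary cross-entropy per label instead of a softmax over classes.
loss = nn.BCEWithLogitsLoss()(logits, targets)
```

Unlike a softmax classifier, which must pick exactly one class, this setup lets a snippet carry zero, one, or several labels at once.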
Please submit an issue to voice any problems or requests. Constructive criticisms are always welcome.