Welcome to our GitHub repository!
These notebooks provide an end-to-end workflow for processing and analyzing the SWAN-SF dataset. They include detailed steps for reading the dataset files, performing full preprocessing, and generating a `.pkl` file of the processed data. Several missing-value imputation techniques are implemented, such as Mean Imputation, Next-Value Imputation, and our novel Fast Pearson Correlation-based K-Nearest Neighbors (FPCKNN) Imputation.
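To make the idea concrete, below is a minimal Python sketch of Pearson correlation-based KNN imputation. It is a simplified illustration rather than the exact FPCKNN implementation from the notebooks, and the helper name `pearson_knn_impute` is hypothetical:

```python
import numpy as np

def pearson_knn_impute(X, k=3):
    """Fill NaNs in a (timesteps x parameters) array.

    Simplified sketch: for each column with gaps, find the k other
    columns most strongly Pearson-correlated with it (computed on
    rows where both are observed) and fill each gap with their mean.
    """
    X = X.copy()
    for j in range(X.shape[1]):
        gaps = np.isnan(X[:, j])
        if not gaps.any():
            continue
        scored = []
        for m in range(X.shape[1]):
            if m == j:
                continue
            both = ~np.isnan(X[:, j]) & ~np.isnan(X[:, m])
            if both.sum() > 2:
                r = np.corrcoef(X[both, j], X[both, m])[0, 1]
                if np.isfinite(r):
                    scored.append((abs(r), m))
        donors = [m for _, m in sorted(scored, reverse=True)[:k]]
        if donors:
            X[gaps, j] = np.nanmean(X[np.ix_(gaps, donors)], axis=1)
    return X
```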
In addition, we address class overlap with the Near Decision Boundary Sample Removal (NDBSR) technique. Various normalization methods are also applied, including Min-Max Scaling, Z-Score Normalization, and our novel LSBZM (Log, Square Root, Box-Cox, Z-Score, and Min-Max) Normalization technique.
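For reference, the standard scalers are one-liners in scikit-learn, and the individual transforms that LSBZM combines are available in NumPy and SciPy. The snippet below shows only these building blocks on stand-in data; the per-feature selection logic of LSBZM itself lives in the notebooks:

```python
import numpy as np
from scipy import stats
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.random.lognormal(size=(100, 4))        # stand-in positive-valued data

X_minmax = MinMaxScaler().fit_transform(X)    # Min-Max Scaling to [0, 1]
X_zscore = StandardScaler().fit_transform(X)  # Z-Score Normalization

# Transforms that LSBZM draws on (applied here to a single feature):
x_log = np.log(X[:, 0])                       # log
x_sqrt = np.sqrt(X[:, 0])                     # square root
x_boxcox, _ = stats.boxcox(X[:, 0])           # Box-Cox (positive values only)
```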
The notebooks further implement multiple over-sampling techniques, such as SMOTE, ADASYN, TimeGAN, and Gaussian Noise Injection (GNI), as well as two under-sampling methods: Random Under-Sampling and Tomek Links. These preprocessing steps collectively enhance classification performance for solar flare prediction.
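SMOTE, ADASYN, Random Under-Sampling, and Tomek Links are all available through `imblearn`; TimeGAN and GNI are implemented in the repository itself. A minimal usage sketch on placeholder data:

```python
from collections import Counter

from imblearn.over_sampling import ADASYN, SMOTE
from imblearn.under_sampling import RandomUnderSampler, TomekLinks
from sklearn.datasets import make_classification

# Placeholder imbalanced data standing in for flattened SWAN-SF features.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

X_sm, y_sm = SMOTE(random_state=0).fit_resample(X, y)               # over-sampling
X_ad, y_ad = ADASYN(random_state=0).fit_resample(X, y)              # over-sampling
X_ru, y_ru = RandomUnderSampler(random_state=0).fit_resample(X, y)  # under-sampling
X_tl, y_tl = TomekLinks().fit_resample(X, y)                        # under-sampling

print(Counter(y), Counter(y_sm), Counter(y_ru))
```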
The classification models used include SVM, Random Forest, k-NN, Multilayer Perceptron, LSTM, GRU, RNN, and 1D-CNN, all designed to predict solar flares within a 24-hour window.
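As an illustration of the sequence-model setup, here is a minimal Keras LSTM for binary flare/no-flare classification. The window length and parameter count are assumptions for the sketch, not the exact architecture or hyperparameters used in the notebooks:

```python
import tensorflow as tf

TIMESTEPS, N_PARAMS = 60, 24  # assumed series length and number of parameters

model = tf.keras.Sequential([
    tf.keras.Input(shape=(TIMESTEPS, N_PARAMS)),     # multivariate time series
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # flare vs. no-flare
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```

The classical models (SVM, Random Forest, k-NN) are available in scikit-learn through the usual `fit`/`predict` interface.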
By using these files, researchers can cut months off the time needed to preprocess the SWAN-SF dataset, while achieving high accuracy in solar flare prediction.
Before you start, make sure you have the following:
- SWAN-SF Dataset: Download it from Harvard Dataverse.
- Python Packages: Ensure the following packages are installed: `pandas`, `numpy`, `matplotlib`, `seaborn`, `tensorflow`, `tqdm`, `pickle`, `sklearn`, `scipy`, and `imblearn`. Note that `pickle` ships with the Python standard library, and `sklearn` and `imblearn` are provided by the `scikit-learn` and `imbalanced-learn` PyPI packages. The code for `timegan` is included in the repository, so no additional installation is required for this package.
- Directory Setup: Modify the following lines in the source code to match your system's directory structure:

  ```python
  data_dir = "<Your path>/SWANSF/Downloaded_Data/"
  data_dir_save = "<Your path>/SWANSF/code/"
  ```
- Sequential Execution: Start from Notebook 1 and proceed in order; each notebook relies on the data prepared in the previous steps.
- Notebook 1: Reads SWAN-SF samples and combines them into a single `.pkl` file (time series samples) and a `.csv` file (labels for each partition); a loading sketch follows this list.
- Notebook 2: Focuses on Missing Value Imputation, utilizing data from Notebook 1.
- Notebook 3: Centers on Near Decision Boundary Sample Removal.
- Notebooks 4 & 5: Concentrate on Normalization.
- Notebook 6: Offers Visualizations of the dataset.
- Notebooks 7 & 8: Implement Classification using eight classifiers.
- Notebook 9: Applies Over-sampling techniques.
- Notebook 10: Combines Over- and Under-sampling techniques.
- Notebooks 11, 12, & 13: Apply post-sampling preprocessing (Normalization).
- Notebooks 14, 15, 16, & 17: Implement Classification using eight classifiers after sampling.
- Notebook 18: Presents Final Visualizations.
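Once Notebook 1 has produced its outputs, the later notebooks reload them. A minimal loading sketch, assuming hypothetical file names (the actual names are defined in the notebooks):

```python
import pickle
import pandas as pd

data_dir_save = "<Your path>/SWANSF/code/"

# Hypothetical file names; the notebooks define the actual ones.
with open(data_dir_save + "samples.pkl", "rb") as f:
    samples = pickle.load(f)                        # time series samples
labels = pd.read_csv(data_dir_save + "labels.csv")  # labels for each partition
```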
The paper associated with these notebooks has been published. If you use this code, please cite it to acknowledge our work. Thank you for your support!
DOI: 10.3847/1538-4365/ad7c4a
```bibtex
@article{EskandariNasab_2024,
  doi = {10.3847/1538-4365/ad7c4a},
  url = {https://dx.doi.org/10.3847/1538-4365/ad7c4a},
  year = {2024},
  month = {oct},
  publisher = {The American Astronomical Society},
  volume = {275},
  number = {1},
  pages = {6},
  author = {MohammadReza EskandariNasab and Shah Muhammad Hamdi and Soukaina Filali Boubrahimi},
  title = {Impacts of Data Preprocessing and Sampling Techniques on Solar Flare Prediction from Multivariate Time Series Data of Photospheric Magnetic Field Parameters},
  journal = {The Astrophysical Journal Supplement Series}
}
```