
Prediction of Specific Star Formation Rate (SSFR) Using Synthetic Data and Identifying Regions of Interest

This work has been accepted at the IEEE Conference on Engineering Informatics 2024 (ICEI 2024), held at IIT-Dharwad, and will be presented shortly.

Project Overview

This project focuses on predicting the Specific Star Formation Rate (SSFR) of galaxies using observed data from the Sloan Digital Sky Survey Data Release 7 (SDSS-DR7) together with synthetic data generated from it. By leveraging machine learning models, we aim to improve predictive performance and to better identify regions of star formation activity.

Key Features

Data Sources: Uses SDSS-DR7 data together with synthetic datasets generated to augment the original data.

Machine Learning Models: Implements Linear Regression, Ridge Regression, Lasso Regression, Random Forest, and Gradient Boosting.

Feature Engineering: Introduces novel features based on celestial position and derived transformations.

Synthetic Data Generation: Augments the dataset while preserving feature correlations, improving model robustness. A Gaussian Copula model is used.

Evaluation Metrics: Assesses models using Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and R-squared (R²) scores.

Clustering Techniques: Applies DBSCAN to identify regions of interest in celestial maps.

Visualizations: Provides interactive tools for user-friendly exploration of SSFR predictions.
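A Gaussian copula preserves feature correlations by separating each feature's marginal distribution from the dependence between features. The sketch below illustrates that idea under our own assumptions: the README does not say which implementation was used, so this version is written from scratch with NumPy and SciPy. It uses each column's empirical distribution as its marginal, and the function names `fit_gaussian_copula` and `sample_gaussian_copula` are hypothetical. It rank-transforms each feature to normal scores, estimates their correlation matrix, draws correlated normal samples, and maps them back through each feature's empirical quantiles.

```python
import numpy as np
from scipy import stats


def fit_gaussian_copula(X):
    """Estimate the copula correlation matrix from an (n_rows, n_features) array."""
    n = X.shape[0]
    # Rank each column and rescale to pseudo-observations strictly inside (0, 1).
    u = stats.rankdata(X, axis=0) / (n + 1)
    # Map the uniforms to standard-normal scores.
    z = stats.norm.ppf(u)
    # Correlation between the normal scores captures the dependence structure.
    return np.corrcoef(z, rowvar=False)


def sample_gaussian_copula(X, corr, n_samples, rng):
    """Draw synthetic rows with the same dependence structure and marginals as X."""
    d = X.shape[1]
    # Correlated standard-normal draws.
    z = rng.multivariate_normal(np.zeros(d), corr, size=n_samples)
    # Back to uniforms, then through each feature's empirical quantile function.
    u = stats.norm.cdf(z)
    return np.column_stack([np.quantile(X[:, j], u[:, j]) for j in range(d)])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-in for real features (hypothetical, not SDSS-DR7 data).
    a = rng.normal(size=2000)
    b = 0.8 * a + 0.6 * rng.normal(size=2000)
    X = np.column_stack([a, b])
    corr = fit_gaussian_copula(X)
    synth = sample_gaussian_copula(X, corr, 5000, rng)
    print("original corr: ", np.corrcoef(X, rowvar=False)[0, 1].round(3))
    print("synthetic corr:", np.corrcoef(synth, rowvar=False)[0, 1].round(3))
```

Because samples are mapped through empirical quantiles, synthetic values stay within the observed range of each feature. Libraries that fit parametric marginals instead can extrapolate beyond that range, which may or may not be desirable for astronomical features.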

Results

The Random Forest model demonstrated the best balance of predictive performance and computational efficiency. The incorporation of synthetic data and advanced feature engineering significantly improved model accuracy and generalizability. The top 10 regions of interest were plotted, and an interactive tool was developed to explore them.
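One way to produce a ranked list of regions like the top 10 described above is to run DBSCAN on the sky positions of objects with high predicted SSFR and rank the resulting clusters. This sketch is illustrative only. The README does not give the clustering features, distance metric, or DBSCAN parameters, so the haversine metric, the `eps_deg` and `min_samples` defaults, the SSFR `threshold`, ranking by cluster size, and the `top_regions` helper are all hypothetical choices. It uses scikit-learn's `DBSCAN`, whose haversine metric expects [latitude, longitude] pairs in radians.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def top_regions(ra_deg, dec_deg, ssfr_pred, threshold,
                eps_deg=1.0, min_samples=5, top_k=10):
    """Cluster high-SSFR objects on the sky and rank clusters by size (hypothetical helper).

    ra_deg, dec_deg: sky positions in degrees.
    ssfr_pred: predicted (log) SSFR per object.
    threshold: assumed cut defining "high" SSFR.
    """
    ra_deg, dec_deg, ssfr_pred = map(np.asarray, (ra_deg, dec_deg, ssfr_pred))
    mask = ssfr_pred >= threshold
    if not mask.any():
        return []
    ra, dec, ssfr = ra_deg[mask], dec_deg[mask], ssfr_pred[mask]

    # Haversine expects [latitude, longitude] in radians; eps is an angle in radians too.
    coords = np.radians(np.column_stack([dec, ra]))
    labels = DBSCAN(eps=np.radians(eps_deg), min_samples=min_samples,
                    metric="haversine", algorithm="ball_tree").fit_predict(coords)

    regions = []
    for label in np.unique(labels):
        if label == -1:  # -1 marks DBSCAN noise points
            continue
        members = labels == label
        regions.append({
            "label": int(label),
            "n_objects": int(members.sum()),
            # Plain mean position; not valid for clusters straddling RA = 0/360.
            "ra_center": float(ra[members].mean()),
            "dec_center": float(dec[members].mean()),
            "mean_ssfr": float(ssfr[members].mean()),
        })
    # Largest clusters of high-SSFR objects first.
    regions.sort(key=lambda r: r["n_objects"], reverse=True)
    return regions[:top_k]


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Toy sky: one dense high-SSFR clump plus a low-SSFR background (not real data).
    ra = np.concatenate([rng.normal(150.0, 0.3, 80), rng.uniform(0, 360, 300)])
    dec = np.concatenate([rng.normal(2.0, 0.3, 80), rng.uniform(-10, 60, 300)])
    ssfr = np.concatenate([rng.normal(-9.5, 0.1, 80), rng.normal(-11.5, 0.1, 300)])
    for region in top_regions(ra, dec, ssfr, threshold=-10.0):
        print(region)
```

Ranking by cluster size favours regions with many star-forming objects; ranking by `mean_ssfr` instead would favour the most intensely star-forming clumps, and which choice matches the project's plots is not stated in the README.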

My Contribution

I carried out the full analysis of the original dataset and developed the synthetic data generation and the regions-of-interest model.