Anomaly Detection in Time Series #987

Merged
merged 12 commits into from
Nov 10, 2024
9 changes: 9 additions & 0 deletions Anomaly Detection in Time Series /Dataset/Readme.md
@@ -0,0 +1,9 @@
### 📊 Dataset
The model uses a synthetic time series dataset generated specifically for anomaly detection. Key details include:

- **Structure**: The dataset consists of a single-variable time series with 1,000 timesteps.
- **Data Scaling**: The values are scaled to the range [0, 1] using `MinMaxScaler`, which keeps the inputs in a consistent range and helps the model train more stably.
- **Time Steps**: A sliding window of 10 time steps is used to build the input sequences for the LSTM model, so that it can learn temporal dependencies.
- **Anomalies**: Certain points in the series represent anomalies, which the model is trained to identify.

This synthetic dataset enables the model to learn from a controlled data source with a known pattern of anomalies, making it suitable for evaluating the accuracy of different anomaly detection algorithms.
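
The preparation steps listed above can be sketched in a few lines. This is a minimal illustration, not the project's actual generator: the sine-plus-noise signal, the seed, the number of injected spikes, and the helper name `make_windows` are all assumptions. The scaling is written out by hand and gives the same mapping that `MinMaxScaler` applies with its default range, and using the next value as the target is one common setup that the README does not confirm.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed (an assumption) so runs are repeatable

# Hypothetical generator: a noisy sine wave with a few injected spikes.
n_steps = 1000
t = np.arange(n_steps)
series = np.sin(2 * np.pi * t / 50) + rng.normal(0.0, 0.1, n_steps)
anomaly_idx = rng.choice(n_steps, size=20, replace=False)  # illustrative count
series[anomaly_idx] += rng.choice([-3.0, 3.0], size=20)

# Min-max scaling to [0, 1], the same mapping MinMaxScaler applies by default.
scaled = (series - series.min()) / (series.max() - series.min())


def make_windows(values, window=10):
    """Return (X, y): each X row holds `window` consecutive values, y is the next value."""
    X = np.stack([values[i:i + window] for i in range(len(values) - window)])
    y = values[window:]
    # The trailing axis gives the (samples, timesteps, features) layout LSTM layers expect.
    return X[..., np.newaxis], y


X, y = make_windows(scaled, window=10)
print(X.shape, y.shape)  # (990, 10, 1) (990,)
```

A series of 1,000 points with a window of 10 yields 990 training sequences, since the last window must leave one value to serve as its target.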

64 changes: 64 additions & 0 deletions Anomaly Detection in Time Series /Model/README.md
@@ -0,0 +1,64 @@
# 📈 Anomaly Detection in Time Series

### 🔴 Goal
The objective of this project is to develop an effective model to detect anomalies in time series data using LSTM networks and other algorithms, aiming to achieve high accuracy and reliability.

### 📊 Dataset
The model uses a synthetic time series dataset generated for anomaly detection.

---

## 📝 Description
This project implements an anomaly detection system for time series data using several algorithms, with LSTM (Long Short-Term Memory) networks as the primary model. Facebook Prophet and Isolation Forest are also applied so that their effectiveness at detecting anomalies can be compared. The process includes:

1. **Data Preprocessing**: Cleaning and preparing time series data for modeling.
2. **Exploratory Data Analysis (EDA)**: Analyzing data distribution, trends, and seasonality.
3. **Model Implementation**: Applying LSTM and other models to identify patterns and detect outliers.
4. **Performance Evaluation**: Assessing model accuracy to determine the best approach.

---

## 💻 Models Implemented
- **LSTM (Long Short-Term Memory) Network**
- **Facebook Prophet**
- **Isolation Forest**
- **Other Anomaly Detection Algorithms**
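
As a rough illustration of the Isolation Forest entry above, the sketch below flags outliers in a one-dimensional series with scikit-learn. The data, the injected outlier positions, and the `contamination` value are assumptions chosen for the example; they are not taken from the project's notebook.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Hypothetical series: values clustered around 0.5 with three injected outliers.
values = rng.normal(0.5, 0.05, 1000)
values[[100, 400, 800]] = [1.5, -0.5, 1.6]

# contamination is the expected share of anomalies (an assumed value here).
model = IsolationForest(contamination=0.01, random_state=0)

# fit_predict expects a 2-D array and returns 1 for inliers, -1 for outliers.
labels = model.fit_predict(values.reshape(-1, 1))
flagged = np.flatnonzero(labels == -1)
print(flagged)
```

Because `contamination` sets the score threshold, it directly controls how many points are flagged, so it is worth tuning against the known anomaly rate of the dataset.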

## 🛠️ Libraries Needed
To run this project, ensure you have the following libraries installed:
- `numpy`
- `pandas`
- `matplotlib`
- `seaborn`
- `tensorflow` (for LSTM models)
- `scikit-learn`
- `prophet` (Facebook Prophet; published on PyPI as `fbprophet` before version 1.0)

Install the libraries using:
```bash
pip install numpy pandas matplotlib seaborn tensorflow scikit-learn prophet
```

## 📊 Exploratory Data Analysis (EDA) Results
In the EDA section, the following analyses were conducted:

- **Trend Analysis**: Understanding the overall trend in the data.
- **Seasonality Check**: Identifying any seasonal patterns.
- **Data Distribution**: Visualizing the spread and outliers in the data.

EDA findings helped in selecting and tuning models to better capture the characteristics of anomalies in time series data.
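
The trend and seasonality checks can be approximated with plain NumPy, as in the sketch below. The example series (a slow linear trend plus a sine wave with period 50) and the helper names `moving_average` and `autocorrelation` are hypothetical; the project's own EDA may have used different tools, such as `statsmodels`.

```python
import numpy as np


def moving_average(values, window):
    """Smooth the series with a flat window to expose the trend."""
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")


def autocorrelation(values, lag):
    """Sample autocorrelation at a positive lag; peaks suggest a seasonal period."""
    centered = values - values.mean()
    return float(np.dot(centered[:-lag], centered[lag:]) / np.dot(centered, centered))


# Hypothetical series: slow upward trend plus a cycle that repeats every 50 steps.
t = np.arange(1000)
series = 0.001 * t + np.sin(2 * np.pi * t / 50)

trend = moving_average(series, window=50)
print("trend rises:", trend[-1] > trend[0])
print("lag 50 vs lag 25:", autocorrelation(series, 50), autocorrelation(series, 25))
```

Averaging over one full cycle cancels the seasonal component, so the smoothed curve shows only the trend, while a strongly positive autocorrelation at lag 50 and a negative one at lag 25 point to a period of 50 steps.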

---

## 📈 Performance of the Models Based on Accuracy Scores
| Model | Accuracy Score |
|------------------------|----------------|
| **LSTM Network** | 79% |
| **Facebook Prophet** | 86% |
| **Isolation Forest** | 88% |

The table above compares model accuracy to highlight the most effective approach for detecting anomalies in this synthetic time series dataset.

---

## ✅ Conclusion
This project shows that, although LSTM networks offer a workable approach to anomaly detection in time series data, Facebook Prophet (86%) and Isolation Forest (88%) achieved higher accuracy than the LSTM (79%) on this synthetic dataset. Each model's classification metrics indicate that it identified non-anomalous points well but struggled with anomaly recall, because anomalies make up only a small share of the dataset. On the accuracy comparison, Isolation Forest appears to be the most effective of the three in this context, making it the preferred choice over LSTM and Facebook Prophet.
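
The gap between accuracy and anomaly recall is easy to reproduce on imbalanced labels. The sketch below uses made-up labels (1,000 points, 50 of them anomalies) and a detector that never flags anything; these numbers are purely illustrative and are not the project's results. The helper name `classification_summary` is hypothetical.

```python
import numpy as np


def classification_summary(y_true, y_pred):
    """Compute accuracy, precision, and recall, treating True as 'anomaly'."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = int(np.sum(y_true & y_pred))
    fp = int(np.sum(~y_true & y_pred))
    fn = int(np.sum(y_true & ~y_pred))
    tn = int(np.sum(~y_true & ~y_pred))
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical labels: 50 anomalies among 1,000 points.
y_true = np.zeros(1000, dtype=bool)
y_true[:50] = True

# A detector that never flags an anomaly.
y_pred = np.zeros(1000, dtype=bool)

summary = classification_summary(y_true, y_pred)
print(summary)  # accuracy 0.95, precision 0.0, recall 0.0
```

A detector that misses every anomaly still scores 95% accuracy here, which is why precision and recall on the anomaly class are worth reporting alongside the accuracy table.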
18 changes: 18 additions & 0 deletions Anomaly Detection in Time Series /requirements.txt
@@ -0,0 +1,18 @@
### 1. **Python Version**
- Python 3.7 or higher

### 2. **Libraries**
The following Python libraries are required for this project. You can install them using pip:

- **TensorFlow** (for LSTM model)
- **Keras** (high-level neural networks API, bundled with TensorFlow 2.x as `tf.keras`)
- **pandas** (for data manipulation)
- **numpy** (for numerical computing)
- **matplotlib** (for plotting)
- **seaborn** (for statistical data visualization)
- **scikit-learn** (for machine learning models and metrics)
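- **prophet** (for the Facebook Prophet model; published as `fbprophet` before version 1.0)
- **IsolationForest** (for the Isolation Forest model; provided by scikit-learn as `sklearn.ensemble.IsolationForest`, so no separate install is needed)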
- **statsmodels** (for statistical modeling)
- **scipy** (for scientific and technical computing)
- **yfinance** (for data retrieval, if needed)
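
A quick way to confirm that the environment is ready is to check which modules can be found before running the notebooks. The sketch below is one possible check; the module list is an assumption based on the libraries above, using import names rather than package names (scikit-learn imports as `sklearn`, and Prophet 1.0+ imports as `prophet`). The helper `missing_modules` is hypothetical.

```python
from importlib.util import find_spec

# Import names (assumed) for the libraries listed above.
REQUIRED_MODULES = [
    "tensorflow",
    "pandas",
    "numpy",
    "matplotlib",
    "seaborn",
    "sklearn",      # installed as scikit-learn
    "prophet",      # installed as prophet (fbprophet before version 1.0)
    "statsmodels",
    "scipy",
    "yfinance",
]


def missing_modules(names):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if find_spec(name) is None]


print("Missing:", missing_modules(REQUIRED_MODULES))
```

An empty list means every module was found; any name that is printed still needs to be installed with pip.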