This project implements Logistic Regression from scratch to predict credit risk using a dataset containing information about individuals, their financial attributes, and loan details. Instead of relying on libraries like sklearn for the machine learning algorithm, all key components—such as data preprocessing, gradient descent, cost function with regularization, and evaluation metrics—are implemented manually using NumPy and Pandas.
- Build a logistic regression model from scratch.
- Preprocess the data effectively with encoding, scaling, and dataset splitting.
- Implement cost function with regularization and gradient descent optimization.
- Plot loss curves to visualize model convergence.
- Evaluate the model using precision and recall metrics.
- Make predictions on a test dataset.
- One-Hot Encoding: Applied to categorical columns (person_home_ownership, loan_intent).
- Boolean Conversion: cb_person_default_on_file transformed into binary integers.
- Z-Score Normalization: Scales numerical features to have mean = 0 and standard deviation = 1.
- Dataset Splitting: 80% training, 20% testing.
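A minimal preprocessing sketch in Pandas/NumPy following the steps above; the CSV filename and the exact target column name `loan_status` are assumptions, while the categorical and boolean column names come from the list above:

```python
import numpy as np
import pandas as pd

# Load the dataset (filename is an assumption)
df = pd.read_csv("credit_risk_dataset.csv")

# One-hot encode the categorical columns
df = pd.get_dummies(df, columns=["person_home_ownership", "loan_intent"], dtype=int)

# Convert the Y/N default flag into binary integers
df["cb_person_default_on_file"] = (df["cb_person_default_on_file"] == "Y").astype(int)

# Separate features and target (target column name is an assumption)
y = df["loan_status"].to_numpy()
X = df.drop(columns=["loan_status"]).to_numpy(dtype=float)

# Z-score normalization: mean = 0, standard deviation = 1 per feature
mu, sigma = X.mean(axis=0), X.std(axis=0)
X = (X - mu) / sigma

# 80% / 20% train/test split after shuffling
rng = np.random.default_rng(seed=42)
idx = rng.permutation(len(X))
split = int(0.8 * len(X))
X_train, X_test = X[idx[:split]], X[idx[split:]]
y_train, y_test = y[idx[:split]], y[idx[split:]]
```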
- Sigmoid Function
- The sigmoid function maps the raw linear score $z = w \cdot x + b$ into a probability between 0 and 1.
- Formula:
$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
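A short NumPy sketch of the sigmoid; the clipping of `z` is a defensive addition to avoid overflow in `exp`, not something stated in the source:

```python
import numpy as np

def sigmoid(z):
    """Map raw scores z to probabilities in (0, 1)."""
    z = np.clip(z, -500, 500)  # guard against overflow for large |z|
    return 1.0 / (1.0 + np.exp(-z))
```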
- Cost Function with Regularization
- Measures the model's fit on the training data while penalizing large weights (L2 regularization).
- Formula:
$$J(w, b) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log(\hat{y}^{(i)}) + (1 - y^{(i)}) \log(1 - \hat{y}^{(i)}) \right] + \frac{\lambda}{2m} \sum_{j=1}^{n} w_j^2$$
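A sketch of the regularized cost in NumPy, reusing the `sigmoid` helper above; the `eps` guard against `log(0)` and the default `lambda_` value are assumptions:

```python
import numpy as np

def compute_cost(X, y, w, b, lambda_=1.0):
    """Regularized binary cross-entropy cost J(w, b)."""
    m = X.shape[0]
    y_hat = sigmoid(X @ w + b)          # predicted probabilities
    eps = 1e-15                         # guard against log(0)
    cross_entropy = -np.mean(
        y * np.log(y_hat + eps) + (1 - y) * np.log(1 - y_hat + eps)
    )
    reg = (lambda_ / (2 * m)) * np.sum(w ** 2)  # bias b is not regularized
    return cross_entropy + reg
```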
- Gradient Descent
- Iteratively updates the weights and bias to minimize the cost function.
- Formula:
$$w = w - \alpha \cdot \frac{\partial J}{\partial w} \quad \text{and} \quad b = b - \alpha \cdot \frac{\partial J}{\partial b}$$
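A batch gradient descent sketch that reuses `sigmoid` and `compute_cost` from the snippets above; the learning rate `alpha` and the iteration count are illustrative assumptions:

```python
import numpy as np

def gradient_descent(X, y, w, b, alpha=0.01, lambda_=1.0, num_iters=1000):
    """Batch gradient descent; returns fitted parameters and the cost history."""
    m = X.shape[0]
    history = []
    for _ in range(num_iters):
        y_hat = sigmoid(X @ w + b)
        error = y_hat - y
        dw = (X.T @ error) / m + (lambda_ / m) * w  # regularized gradient
        db = np.mean(error)
        w -= alpha * dw
        b -= alpha * db
        history.append(compute_cost(X, y, w, b, lambda_))
    return w, b, history
```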
- Auto-Convergence Check
- Stops training early once the change in cost between successive iterations falls below a threshold (epsilon = 0.00001).
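A minimal sketch of that check, interpreting convergence as the cost change between successive iterations dropping below epsilon:

```python
def has_converged(history, epsilon=1e-5):
    """True once the cost change between iterations falls below epsilon."""
    return len(history) >= 2 and abs(history[-1] - history[-2]) < epsilon
```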
- Training: The model is trained using gradient descent.
- Prediction: The trained model predicts loan status on test data.
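A sketch tying these steps together: training on the split produced by the preprocessing sketch, plotting the loss curve (Matplotlib is an assumed dependency), and classifying the test set with an assumed 0.5 probability threshold:

```python
import numpy as np
import matplotlib.pyplot as plt

# Initialize weights at zero and train (alpha and iteration count are assumptions)
w0 = np.zeros(X_train.shape[1])
w, b, history = gradient_descent(X_train, y_train, w0, b=0.0, alpha=0.01, num_iters=1000)

# Loss curve: the cost should decrease and flatten as the model converges
plt.plot(history)
plt.xlabel("Iteration")
plt.ylabel("Cost J(w, b)")
plt.title("Training loss")
plt.show()

# Predict loan status on the held-out test split (0.5 threshold is assumed)
y_pred = (sigmoid(X_test @ w + b) >= 0.5).astype(int)
```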
- Precision: Measures the accuracy of positive predictions (how many predicted positives are truly positive).
$$\text{Precision} = \frac{TP}{TP + FP}$$
- Recall: Measures the ability to detect positive cases.
$$\text{Recall} = \frac{TP}{TP + FN}$$
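A NumPy sketch computing both metrics from binary predictions; the zero-division guards are defensive additions not stated in the source:

```python
import numpy as np

def precision_recall(y_true, y_pred):
    """Compute precision and recall for binary labels."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall
```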