This project implements Logistic Regression from scratch to predict credit risk using a dataset containing information about individuals, their financial attributes, and loan details. Instead of relying on libraries like sklearn for the machine learning algorithm, all key components—such as data preprocessing, gradient descent, cost function with regularization, and evaluation metrics—are implemented manually using NumPy and Pandas.
- Build a logistic regression model from scratch.
- Preprocess the data effectively with encoding, scaling, and dataset splitting.
- Implement cost function with regularization and gradient descent optimization.
- Plot loss curves to visualize model convergence.
- Evaluate the model using precision and recall metrics.
- Make predictions on a test dataset.
- One-Hot Encoding: Applied to categorical columns (person_home_ownership, loan_intent).
- Boolean Conversion: cb_person_default_on_file transformed into binary integers.
- Z-Score Normalization: Scales numerical features to have mean = 0 and standard deviation = 1.
- Dataset Splitting: 80% training, 20% testing.
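A minimal preprocessing sketch in Pandas/NumPy following the steps above; the CSV filename and the exact target column name `loan_status` are assumptions, while the categorical and boolean column names come from the list above:

```python
import numpy as np
import pandas as pd

# Load the dataset (filename is an assumption)
df = pd.read_csv("credit_risk_dataset.csv")

# One-hot encode the categorical columns
df = pd.get_dummies(df, columns=["person_home_ownership", "loan_intent"], dtype=int)

# Convert the Y/N default flag into binary integers
df["cb_person_default_on_file"] = (df["cb_person_default_on_file"] == "Y").astype(int)

# Separate features and target (target column name is an assumption)
y = df["loan_status"].to_numpy()
X = df.drop(columns=["loan_status"]).to_numpy(dtype=float)

# Z-score normalization: mean = 0, standard deviation = 1 per feature
mu, sigma = X.mean(axis=0), X.std(axis=0)
X = (X - mu) / sigma

# 80% / 20% train/test split after shuffling
rng = np.random.default_rng(seed=42)
idx = rng.permutation(len(X))
split = int(0.8 * len(X))
X_train, X_test = X[idx[:split]], X[idx[split:]]
y_train, y_test = y[idx[:split]], y[idx[split:]]
```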
- Sigmoid Function
- The sigmoid function maps the raw linear score $z = w \cdot x + b$ into a probability between 0 and 1.
- Formula:
$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
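A short NumPy sketch of the sigmoid; the clipping of `z` is a defensive addition to avoid overflow in `exp`, not something stated in the source:

```python
import numpy as np

def sigmoid(z):
    """Map raw scores z to probabilities in (0, 1)."""
    z = np.clip(z, -500, 500)  # guard against overflow for large |z|
    return 1.0 / (1.0 + np.exp(-z))
```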
- Cost Function with Regularization
- Measures the model's fit on the training data while penalizing large weights (L2 regularization).
- Formula:
$$J(w, b) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log(\hat{y}^{(i)}) + (1 - y^{(i)}) \log(1 - \hat{y}^{(i)}) \right] + \frac{\lambda}{2m} \sum_{j=1}^{n} w_j^2$$
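A sketch of the regularized cost in NumPy, reusing the `sigmoid` helper above; the `eps` guard against `log(0)` and the default `lambda_` value are assumptions:

```python
import numpy as np

def compute_cost(X, y, w, b, lambda_=1.0):
    """Regularized binary cross-entropy cost J(w, b)."""
    m = X.shape[0]
    y_hat = sigmoid(X @ w + b)          # predicted probabilities
    eps = 1e-15                         # guard against log(0)
    cross_entropy = -np.mean(
        y * np.log(y_hat + eps) + (1 - y) * np.log(1 - y_hat + eps)
    )
    reg = (lambda_ / (2 * m)) * np.sum(w ** 2)  # bias b is not regularized
    return cross_entropy + reg
```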
- Gradient Descent
- Iteratively updates the weights and bias to minimize the cost function.
- Formula:
$$w = w - \alpha \cdot \frac{\partial J}{\partial w} \quad \text{and} \quad b = b - \alpha \cdot \frac{\partial J}{\partial b}$$
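A batch gradient descent sketch that reuses `sigmoid` and `compute_cost` from the snippets above; the learning rate `alpha` and the iteration count are illustrative assumptions:

```python
import numpy as np

def gradient_descent(X, y, w, b, alpha=0.01, lambda_=1.0, num_iters=1000):
    """Batch gradient descent; returns fitted parameters and the cost history."""
    m = X.shape[0]
    history = []
    for _ in range(num_iters):
        y_hat = sigmoid(X @ w + b)
        error = y_hat - y
        dw = (X.T @ error) / m + (lambda_ / m) * w  # regularized gradient
        db = np.mean(error)
        w -= alpha * dw
        b -= alpha * db
        history.append(compute_cost(X, y, w, b, lambda_))
    return w, b, history
```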
- Auto-Convergence Check
- Stops training early once the change in cost between successive iterations falls below a threshold (epsilon = 0.00001).
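A minimal sketch of that check, interpreting convergence as the cost change between successive iterations dropping below epsilon:

```python
def has_converged(history, epsilon=1e-5):
    """True once the cost change between iterations falls below epsilon."""
    return len(history) >= 2 and abs(history[-1] - history[-2]) < epsilon
```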
- Training: The model is trained using gradient descent.
- Prediction: The trained model predicts loan status on test data.
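A sketch tying these steps together: training on the split produced by the preprocessing sketch, plotting the loss curve (Matplotlib is an assumed dependency), and classifying the test set with an assumed 0.5 probability threshold:

```python
import numpy as np
import matplotlib.pyplot as plt

# Initialize weights at zero and train (alpha and iteration count are assumptions)
w0 = np.zeros(X_train.shape[1])
w, b, history = gradient_descent(X_train, y_train, w0, b=0.0, alpha=0.01, num_iters=1000)

# Loss curve: the cost should decrease and flatten as the model converges
plt.plot(history)
plt.xlabel("Iteration")
plt.ylabel("Cost J(w, b)")
plt.title("Training loss")
plt.show()

# Predict loan status on the held-out test split (0.5 threshold is assumed)
y_pred = (sigmoid(X_test @ w + b) >= 0.5).astype(int)
```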
- Precision: Measures the accuracy of positive predictions (how many predicted positives are truly positive).
$$\text{Precision} = \frac{TP}{TP + FP}$$
- Recall: Measures the ability to detect positive cases.
$$\text{Recall} = \frac{TP}{TP + FN}$$
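A NumPy sketch computing both metrics from binary predictions; the zero-division guards are defensive additions not stated in the source:

```python
import numpy as np

def precision_recall(y_true, y_pred):
    """Compute precision and recall for binary labels."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall
```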