Passwords are an essential part of system security. Although there are many alternatives to passwords for access control,
in many applications the password is still the more compelling means of authenticating identity.
Password strength meters provide quick and easy visual feedback on what makes a strong password. A small meter indicates
how strong a proposed password is when creating a new account or changing passwords.
Our approach aims to develop a password strength checker that determines the strength of a password. Some popular
password strength algorithms predict the strength of a password using machine learning. A password strength
checker analyzes the combination of digits, letters, and special symbols in a password. Ours is built by training a
machine learning model on a labelled dataset of passwords made up of different combinations of letters, digits, and special symbols.
The model learns from data which combinations of letters and symbols are considered strong or weak passwords. So, to
create an application that checks the strength of passwords, we need a labelled dataset containing various combinations
of letters and symbols.
Our dataset comes from Kaggle and is used to train a machine learning model that predicts the strength of a password. What distinguishes our
approach from other strength meters? First, it is based entirely on machine learning rather than rules. Second,
the dataset keeps only passwords that were marked as weak “0”, medium “1”, or strong “2” by all three strength meters.
This means that every password in the dataset was judged weak, medium, or strong by all of the meters, with no disagreement between them.
The passwords used in our analysis are from the 000webhost leak, which is available online. A tool called PARS, from Georgia Tech, integrates all the commercial password meters and was used to determine the strength of the passwords. Data Link
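The selection rule described above (keep a password only when every meter assigns it the same label) can be sketched as follows. This is a minimal illustration, not the PARS tool itself: the `unanimous_label` helper, the `meter_labels` dictionary, and the example entries are all made up for this sketch.

```python
from typing import Iterable, Optional


def unanimous_label(labels: Iterable[int]) -> Optional[int]:
    """Return the shared label if every meter agrees, otherwise None."""
    distinct = set(labels)
    return distinct.pop() if len(distinct) == 1 else None


# Made-up meter outputs for illustration only (0 = weak, 1 = medium, 2 = strong).
meter_labels = {
    "example-a": (1, 1, 1),  # all three meters agree: kept with label 1
    "example-b": (0, 1, 1),  # meters disagree: dropped
    "example-c": (2, 2, 2),  # all three meters agree: kept with label 2
}

# Keep only the passwords with a unanimous label.
# "is not None" matters here because the weak label 0 is falsy.
dataset = {
    password: label
    for password, votes in meter_labels.items()
    if (label := unanimous_label(votes)) is not None
}

print(dataset)
```

Dropping the disputed passwords trades dataset size for cleaner labels, which is what lets the model treat the strength column as ground truth.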
The dataset has the following columns:
- password: A string containing a sequence of letters, digits, and special characters.
- strength: A categorical column with a value from 0 to 2: 0 for weak, 1 for medium, 2 for strong.
Here is a snapshot of the data:
password | strength |
---|---|
kzde5577 | 1 |
kino3434 | 1 |
visi7k1yr | 1 |
megzy123 | 1 |
lamborghin1 | 1 |
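The raw data is not in a format that is easy to process, so we derive a few extra columns from each password: its length and its counts of lowercase letters (small), uppercase letters (capital), special characters (special), and digits (numeric). Here is a snapshot after these modifications: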
password | strength | length | small | capital | special | numeric |
---|---|---|---|---|---|---|
kzde5577 | 1 | 8 | 4 | 0 | 0 | 4 |
kino3434 | 1 | 8 | 4 | 0 | 0 | 4 |
visi7k1yr | 1 | 9 | 7 | 0 | 0 | 2 |
megzy123 | 1 | 8 | 5 | 0 | 0 | 3 |
lamborghin1 | 1 | 11 | 10 | 0 | 0 | 1 |
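The derived columns in the table above can be computed with a few character counts. The sketch below is one possible way to do it; the function name `password_features` is hypothetical, and it assumes that "special" means every character that is not a lowercase letter, an uppercase letter, or a digit.

```python
def password_features(password: str) -> dict:
    """Count the character classes used as model features."""
    small = sum(ch.islower() for ch in password)
    capital = sum(ch.isupper() for ch in password)
    numeric = sum(ch.isdigit() for ch in password)
    length = len(password)
    return {
        "length": length,
        "small": small,
        "capital": capital,
        # Assumption: whatever is left over counts as a special character.
        "special": length - small - capital - numeric,
        "numeric": numeric,
    }


for pw in ["kzde5577", "visi7k1yr", "lamborghin1"]:
    print(pw, password_features(pw))
```

Defining `special` as the remainder guarantees that the four counts always add up to `length`.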
To explore the data further, the following questions need to be answered:
- What is the distribution of the strength column?
- What is the distribution of the length column?
- What is the distribution of the small column?
- What is the distribution of the capital column?
- What is the distribution of the special column?
- What is the distribution of the numeric column?
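Each of these questions comes down to counting how often every value occurs in a column. As a minimal sketch using only the five snapshot rows shown earlier (not the full dataset), the counts can be obtained with `collections.Counter`; the `distribution` helper is a hypothetical name introduced for this example.

```python
from collections import Counter

columns = ("password", "strength", "length", "small", "capital", "special", "numeric")

# The five snapshot rows from the table above, not the full dataset.
rows = [
    ("kzde5577", 1, 8, 4, 0, 0, 4),
    ("kino3434", 1, 8, 4, 0, 0, 4),
    ("visi7k1yr", 1, 9, 7, 0, 0, 2),
    ("megzy123", 1, 8, 5, 0, 0, 3),
    ("lamborghin1", 1, 11, 10, 0, 0, 1),
]


def distribution(column: str) -> Counter:
    """Count how many rows hold each distinct value of the given column."""
    idx = columns.index(column)
    return Counter(row[idx] for row in rows)


for name in columns[1:]:
    print(name, sorted(distribution(name).items()))
```

On the full dataset the same counts would normally be plotted as bar charts or histograms, one per column.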
The modelling workflow consists of the following steps:
- Set up the data with the required columns
- Split the data into a train set and a test set
- Scale the features using a standard scaler
- Fit a DecisionTreeClassifier model
  - Plot the tree
  - Classification report (precision, recall, f1-score)
  - Confusion matrix and heatmap
- Fit a Logistic Regression model
  - Classification report (precision, recall, f1-score)
  - Confusion matrix and heatmap
- Fit a Linear Support Vector Classifier model
  - Classification report (precision, recall, f1-score)
  - Confusion matrix and heatmap
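The steps listed above can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the project's actual code: the feature matrix is randomly generated, the labels come from a hypothetical length-based rule chosen only to produce three classes, and the hyperparameters are placeholders. Scores obtained on this toy data say nothing about performance on the real dataset.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the feature table (NOT the real dataset).
rng = np.random.default_rng(0)
counts = rng.integers(1, 6, size=(600, 4))  # small, capital, special, numeric
length = counts.sum(axis=1)
X = np.column_stack([length, counts])  # length, small, capital, special, numeric

# Hypothetical labelling rule so the toy data has three classes:
# length < 10 -> 0 (weak), 10..13 -> 1 (medium), >= 14 -> 2 (strong).
y = np.digitize(length, bins=[10, 14])

# Split into train and test sets, keeping class proportions similar.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Fit the scaler on the training data only, then apply it to both sets.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "logistic regression": LogisticRegression(max_iter=1000),
    "linear SVC": LinearSVC(max_iter=10000),
}

results = {}
for name, model in models.items():
    model.fit(X_train_s, y_train)
    y_pred = model.predict(X_test_s)
    # Rows are true classes, columns are predicted classes.
    results[name] = confusion_matrix(y_test, y_pred, labels=[0, 1, 2])
    print(name)
    print(classification_report(y_test, y_pred, zero_division=0))
    print(results[name])
```

The plotting steps are left out to keep the sketch short; `sklearn.tree.plot_tree` can draw the fitted tree, and each confusion matrix can be passed to a heatmap function such as `seaborn.heatmap`.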