Update documentation
alistratov committed Oct 14, 2024
1 parent c9beaf1 commit d886816
Showing 3 changed files with 43 additions and 16 deletions.
3 changes: 1 addition & 2 deletions README.md
@@ -11,7 +11,6 @@ pip install data-password-entropy

```python-repl
>>> from data_password_entropy import password_entropy
>>> password_entropy('password')
35
>>> password_entropy('Vgk4@HDk6X7gEp7')
```
@@ -20,7 +19,7 @@ pip install data-password-entropy


## Overview
The `data-password-entropy` package provides function...
The `data-password-entropy` package provides a function to calculate the entropy of a password, a measure of its strength against brute-force attacks. Traditional rule-based methods enforce specific criteria, such as minimum length or mandatory punctuation, and can therefore reject strong but unconventional passwords while accepting weak ones like `P@ssw0rd`. Entropy-based evaluation offers a more accurate assessment by assigning a numerical value to a password's complexity and unpredictability. Under the empirical algorithm used here, a password with an entropy score of 80 bits is considered sufficiently secure for most applications.


## Documentation
51 changes: 42 additions & 9 deletions docs/index.md
@@ -1,27 +1,49 @@
The `data-password-entropy` package provides a function to calculate the entropy of a password. The entropy is a measure of the password's strength in resisting brute-force attacks. The function uses a simple, empirical algorithm to determine the password's entropy based on the characters it contains.
The `data-password-entropy` package provides a function to calculate the entropy of a password. Entropy measures a password's strength in resisting brute-force attacks. The function employs a straightforward, empirical algorithm to determine the password's entropy based on the characters it contains.

The common approach to ensuring password quality typically involves enforcing specific rules, such as minimum length requirements and mandatory inclusion of punctuation characters. However, these rigid rules can inadvertently restrict a small but significant portion of users who prefer creating passwords based on their own criteria. For instance, consider why the password `jackslippedonicefellonhisass`, which is highly secure due to its length and unpredictability, should be deemed unacceptable simply because it doesn't include punctuation. Conversely, the password `P@ssw0rd`, which complies with many conventional requirements, frequently appears in public password databases, underscoring its weakness despite adhering to standard rules.

Evaluating a password based on its information entropy provides a more nuanced and accurate measure of its strength. Entropy assigns a numerical value to a password, reflecting its complexity and resistance to brute-force attacks. Unlike rule-based systems, entropy-based evaluation considers the overall unpredictability and diversity of characters within the password. According to the algorithm implemented in this module, a password achieving an entropy score of **80 bits** is deemed sufficiently secure for most applications.

## Table of contents
* [Description](#description)
* [Installation](#installation)
* [Example](#example)
* [API Reference](#api-reference)
* [Performance](#performance)
* [Links](#links)
* [License](#license)


# Description
Information entropy, also known as password quality or password strength when used in a discussion of the information security, is a measure of a password in resisting brute-force attacks.
## Description
Information entropy, also referred to as password quality or password strength, quantifies a password's ability to withstand brute-force attacks.

There are a lot of different ways to determine a password's entropy.

### How It Works
The `data-password-entropy` package uses a simple, empirical algorithm to calculate password entropy through the following steps:

* Character classification:
    * Categorization: each character in the password is assigned to a specific class, such as numbers, lowercase letters, uppercase letters, or others.
    * Assumption: characters within the same class are assumed to have an equal probability of being selected.
    * Symbol base expansion: incorporating characters from multiple classes increases the total number of possible symbols (symbol base), thereby enhancing the password's entropy.

There are a lot of different ways to determine a password's entropy. We use a simple, empirical algorithm: first, all characters in the string are split into several classes, such as numbers and lower- or upper-case letters. All characters within one class have an equal probability of appearing in the password. Mixing characters from different classes extends the number of possible symbols (the symbol base) in the password and thereby increases its entropy. Then, we calculate the effective length of the password according to the following rules:
* Effective length calculation:
    * Orderliness reduction: sequences like `1234` are considered less secure than `1342` because ordered sequences reduce total entropy.
    * Repeating characters: repeating sequences, such as `aaaa`, diminish entropy compared to more varied character arrangements.

* some orderliness decreases total entropy, so '1234' is a weaker password than '1342',
* Character classes:
    * ASCII characters: characters with Unicode code points up to 127 are categorized into predefined classes (e.g., numbers, uppercase letters).
    * Non-ASCII characters: all characters with code points above 127 are grouped into a single class.
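
The effective length step above can be illustrated with a minimal sketch. The function name `effective_length` and the halving rule are assumptions made for illustration only; the package's actual weighting is not documented here and may differ.

```python
def effective_length(password: str) -> float:
    """Hypothetical effective length (illustrative only, not the package's code).

    A character that repeats the previous one, or sits next to it in code
    point order (e.g. '1' then '2'), contributes half the weight of the
    character before it. Any other character resets the weight to 1.
    """
    total = 0.0
    weight = 1.0
    prev = None
    for ch in password:
        if prev is not None and abs(ord(ch) - ord(prev)) <= 1:
            weight /= 2  # repeat or ordered step: diminishing contribution
        else:
            weight = 1.0  # unrelated character: full contribution
        total += weight
        prev = ch
    return total
```

Under this assumed rule, `1234` has a shorter effective length than `1342`, and a long run of one character never counts for more than two characters in total, so a hundred repeats score only slightly higher than four.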

* repeating sequences decrease total entropy, so 'a' x 100 is only insignificantly stronger than 'a' x 4 (this may seem too insignificant).
There is no well-defined approach to processing national or extended Unicode characters. For instance, the Greek letters block in the Unicode Character Database comprises approximately 400 symbols. However, not all of these symbols are used with equal frequency. An attacker who knows that a password may contain Greek letters is unlikely to target the simple α (Greek letter Alpha) with the same probability as the more complex ἆ (Greek small letter Alpha with psili and perispomeni). This disparity in usage patterns makes it impractical to assign distinct probabilities to each individual character within a script or Unicode block.

Do not expect too much: the algorithm does not check the password's weakness with a dictionary lookup, and it cannot evaluate obfuscation like 'p@ssw0rd', sequences from a keyboard row, or personally related information like a date of birth.
Therefore, to maintain simplicity and efficiency, all characters with Unicode code points above 127 are grouped into a single class. This approach strikes a balance between accuracy and practicality, ensuring that the entropy calculation remains both manageable and sufficiently representative of the password's complexity.
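
As a rough sketch of the classification idea, the snippet below assigns each character to a class and sums the sizes of the classes that occur to obtain a symbol base. The class names, the class sizes (in particular `NON_ASCII_CLASS_SIZE`), and the formula `length * log2(base)` are assumptions chosen for illustration; they are not taken from the package's source.

```python
import math
import string

# Hypothetical ASCII classes; the package may define them differently.
ASCII_CLASSES = {
    "digit": set(string.digits),           # 10 symbols
    "lower": set(string.ascii_lowercase),  # 26 symbols
    "upper": set(string.ascii_uppercase),  # 26 symbols
    "punct": set(string.punctuation),      # 32 symbols
}
OTHER_ASCII_SIZE = 34       # space, DEL and control characters (128 - 94)
NON_ASCII_CLASS_SIZE = 128  # assumed size of the single non-ASCII class

CLASS_SIZES = {name: len(members) for name, members in ASCII_CLASSES.items()}
CLASS_SIZES["other_ascii"] = OTHER_ASCII_SIZE
CLASS_SIZES["non_ascii"] = NON_ASCII_CLASS_SIZE


def classify(ch: str) -> str:
    """Return the class name for a single character."""
    if ord(ch) > 127:
        return "non_ascii"  # every code point above 127 shares one class
    for name, members in ASCII_CLASSES.items():
        if ch in members:
            return name
    return "other_ascii"


def symbol_base(password: str) -> int:
    """Sum the sizes of all classes that appear in the password."""
    return sum(CLASS_SIZES[name] for name in {classify(ch) for ch in password})


def naive_entropy_bits(password: str) -> float:
    """Naive estimate: length * log2(symbol base), with no length correction."""
    base = symbol_base(password)
    return len(password) * math.log2(base) if base else 0.0
```

In this sketch both `α` and `ἆ` fall into the same `non_ascii` class, which mirrors the grouping described above without trying to model how often each character is actually used.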

The probability of a character occurring depends only on the capacity of its character class. Perhaps the actual prevalence of each symbol class should also be taken into account: it is very unlikely to find a control character in a password. However, common password policies don't allow control characters, spaces, or extended characters in passwords, so they should not occur in practice.
### Limitations
* No dictionary checks: the algorithm does not verify passwords against known weak passwords or dictionary words.
* No obfuscation evaluation: it cannot assess the complexity introduced by character substitutions like `p@ssw0rd`.
* No sequence detection: keyboard sequences or other patterned inputs are not specifically handled.
* No personal information assessment: the algorithm does not account for passwords containing personally identifiable information, such as names or dates of birth.
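
Because of these limitations, an application may want to combine the entropy score with its own deny-list. The helper below is a hypothetical sketch: `is_acceptable`, its parameters, and the case-insensitive comparison are assumptions, and the entropy function is passed in so that the example does not depend on any particular implementation.

```python
from typing import Callable, Iterable


def is_acceptable(
    password: str,
    entropy_fn: Callable[[str], float],
    denylist: Iterable[str],
    min_bits: float = 80,
) -> bool:
    """Hypothetical policy check: deny-list lookup first, then entropy threshold."""
    known_weak = {entry.lower() for entry in denylist}
    if password.lower() in known_weak:
        return False  # known-weak password, regardless of its entropy score
    return entropy_fn(password) >= min_bits
```

With the package installed, `password_entropy` could be passed as `entropy_fn`; the 80-bit default follows the threshold mentioned in the introduction.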

Similarly, there is no well-defined approach to process national characters. For example, the Greek letters block in Unicode Character Database contains about 400 symbols, but not all of them have equivalent frequency of usage. An intruder, who knows that password may contain Greek letters, will not probe the α (Greek letter Alpha) with the same probability as the ἆ (Greek small letter Alpha with psili and perispomeni), therefore it might be incorrect to consider a whole UCD block or script as a base for calculating probabilities. We consider all characters with codes higher than 127 form one class.

@@ -43,9 +65,20 @@ e = password_entropy('Vgk4@HDk6X7gEp7')
```python
# returns 85
```

## API Reference
### `password_entropy(password: str) -> int`

Calculates the entropy of the provided password.

**Parameters:**
- `password` (`str`): The password string to evaluate.

**Returns:**
- `int`: The entropy value in bits representing the password's strength.


## Performance
On a modern processor core (as of 2024), the module can perform approximately 100,000 calls per second for random passwords of 32 characters.
On a modern processor core (as of 2024), the module can perform approximately 100,000 calls per second for random passwords consisting of 32 characters.
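
A throughput figure like this can be checked locally with a small timing harness. The sketch below is an assumption about how such a measurement might be set up, not the author's benchmark; `calls_per_second` and its parameters are hypothetical names.

```python
import random
import string
import time
from typing import Callable


def calls_per_second(
    fn: Callable[[str], object],
    n_passwords: int = 1000,
    length: int = 32,
    seed: int = 0,
) -> float:
    """Time `fn` over random passwords and return the observed calls per second."""
    rng = random.Random(seed)  # fixed seed so the inputs are reproducible
    alphabet = string.ascii_letters + string.digits + string.punctuation
    passwords = [
        "".join(rng.choices(alphabet, k=length)) for _ in range(n_passwords)
    ]
    start = time.perf_counter()
    for password in passwords:
        fn(password)
    elapsed = time.perf_counter() - start
    return n_passwords / elapsed if elapsed > 0 else float("inf")
```

Passing `password_entropy` from the installed package as `fn` would give a number comparable to the one quoted above, although results depend on the machine and the Python version.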


## Links
5 changes: 0 additions & 5 deletions mkdocs.yml
@@ -3,11 +3,6 @@ site_author: Oleh Alistratov
repo_url: https://github.com/alistratov/password-entropy-py
edit_uri: blob/main/docs/

#nav:
# - Usage: index.md
# - References: references.md
# - License: license.md

theme: readthedocs

markdown_extensions: