Fix sphinx/build_docs warnings for ciphers (#12485)
* Fix sphinx/build_docs warnings for ciphers

* Fix
MaximSmolskiy authored Dec 30, 2024
1 parent 94b3777 commit f45e392
Showing 7 changed files with 170 additions and 125 deletions.
7 changes: 5 additions & 2 deletions ciphers/autokey.py
@@ -1,5 +1,6 @@
"""
https://en.wikipedia.org/wiki/Autokey_cipher
An autokey cipher (also known as the autoclave cipher) is a cipher that
incorporates the message (the plaintext) into the key.
The key is generated from the message in some automated fashion,
@@ -10,8 +11,9 @@

def encrypt(plaintext: str, key: str) -> str:
"""
Encrypt a given plaintext (string) and key (string), returning the
Encrypt a given `plaintext` (string) and `key` (string), returning the
encrypted ciphertext.
>>> encrypt("hello world", "coffee")
'jsqqs avvwo'
>>> encrypt("coffee is good as python", "TheAlgorithms")
@@ -74,8 +76,9 @@ def encrypt(plaintext: str, key: str) -> str:

def decrypt(ciphertext: str, key: str) -> str:
"""
Decrypt a given ciphertext (string) and key (string), returning the decrypted
Decrypt a given `ciphertext` (string) and `key` (string), returning the decrypted
plaintext.
>>> decrypt("jsqqs avvwo", "coffee")
'hello world'
>>> decrypt("vvjfpk wj ohvp su ddylsv", "TheAlgorithms")
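The doctest pairs above are consistent with a keystream made of the key's letters followed by the plaintext's own letters, with spaces and punctuation copied through without consuming a key letter. The following is a minimal sketch under those assumptions, not the repository's implementation: `autokey_encrypt`, `autokey_decrypt` and `_letters` are hypothetical names, only ASCII letters are shifted, output is lowercased, and the key is assumed to contain at least one letter.

```python
from __future__ import annotations

from string import ascii_lowercase


def _letters(text: str) -> list[int]:
    """Alphabet positions (a=0 ... z=25) of the ASCII letters in `text`."""
    return [ascii_lowercase.index(c) for c in text.lower() if c in ascii_lowercase]


def autokey_encrypt(plaintext: str, key: str) -> str:
    # Keystream (assumed): the key's letters, then the plaintext's own letters.
    stream = _letters(key) + _letters(plaintext)
    out, i = [], 0
    for ch in plaintext.lower():
        if ch in ascii_lowercase:
            out.append(ascii_lowercase[(ascii_lowercase.index(ch) + stream[i]) % 26])
            i += 1  # only letters consume a keystream entry
        else:
            out.append(ch)  # spaces and punctuation pass through unchanged
    return "".join(out)


def autokey_decrypt(ciphertext: str, key: str) -> str:
    # Start from the key and append each plaintext letter as soon as it is
    # recovered, because it is the key letter for a later position.
    stream = _letters(key)
    out, i = [], 0
    for ch in ciphertext.lower():
        if ch in ascii_lowercase:
            plain = (ascii_lowercase.index(ch) - stream[i]) % 26
            stream.append(plain)
            out.append(ascii_lowercase[plain])
            i += 1
        else:
            out.append(ch)
    return "".join(out)
```

This sketch reproduces the outputs shown in the doctests, e.g. `autokey_encrypt("hello world", "coffee")` gives `jsqqs avvwo`. Decryption has to rebuild the keystream incrementally, which is the defining property of an autokey cipher.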
77 changes: 47 additions & 30 deletions ciphers/caesar_cipher.py
@@ -7,51 +7,58 @@ def encrypt(input_string: str, key: int, alphabet: str | None = None) -> str:
"""
encrypt
=======
Encodes a given string with the caesar cipher and returns the encoded
message
Parameters:
-----------
* input_string: the plain-text that needs to be encoded
* key: the number of letters to shift the message by
* `input_string`: the plain-text that needs to be encoded
* `key`: the number of letters to shift the message by
Optional:
* alphabet (None): the alphabet used to encode the cipher, if not
* `alphabet` (``None``): the alphabet used to encode the cipher, if not
specified, the standard english alphabet with upper and lowercase
letters is used
Returns:
* A string containing the encoded cipher-text
More on the caesar cipher
=========================
The caesar cipher is named after Julius Caesar who used it when sending
secret military messages to his troops. This is a simple substitution cipher
where every character in the plain-text is shifted by a certain number known
as the "key" or "shift".
Example:
Say we have the following message:
"Hello, captain"
``Hello, captain``
And our alphabet is made up of lower and uppercase letters:
"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
``abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ``
And our shift is "2"
And our shift is ``2``
We can then encode the message, one letter at a time. "H" would become "J",
since "J" is two letters away, and so on. If the shift is ever too large, or
We can then encode the message, one letter at a time. ``H`` would become ``J``,
since ``J`` is two letters away, and so on. If the shift is ever too large, or
our letter is at the end of the alphabet, we just start at the beginning
("Z" would shift to "a" then "b" and so on).
(``Z`` would shift to ``a`` then ``b`` and so on).
Our final message would be "Jgnnq, ecrvckp"
Our final message would be ``Jgnnq, ecrvckp``
Further reading
===============
* https://en.m.wikipedia.org/wiki/Caesar_cipher
Doctests
========
>>> encrypt('The quick brown fox jumps over the lazy dog', 8)
'bpm yCqks jzwEv nwF rCuxA wDmz Bpm tiHG lwo'
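The encoding walk-through above (shift each character along the alphabet, wrapping at the end) can be sketched as follows. It is an illustration, not the repository's implementation: `caesar_encrypt` is a hypothetical name, and it assumes the docstring's default alphabet of lowercase followed by uppercase letters, leaving any character outside the alphabet unchanged.

```python
from __future__ import annotations

from string import ascii_letters


def caesar_encrypt(text: str, key: int, alphabet: str | None = None) -> str:
    """Shift every character of `text` that is in `alphabet` forward by `key`."""
    # Default alphabet: a-z followed by A-Z, so shifting past "z" continues at "A"
    # and shifting past "Z" wraps back to "a".
    alpha = alphabet or ascii_letters
    result = []
    for ch in text:
        index = alpha.find(ch)
        if index == -1:
            result.append(ch)  # characters outside the alphabet are kept as-is
        else:
            result.append(alpha[(index + key) % len(alpha)])
    return "".join(result)
```

With the default alphabet, `caesar_encrypt("Hello, captain", 2)` gives `Jgnnq, ecrvckp`, matching the worked example.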
@@ -85,23 +92,28 @@ def decrypt(input_string: str, key: int, alphabet: str | None = None) -> str:
"""
decrypt
=======
Decodes a given string of cipher-text and returns the decoded plain-text
Parameters:
-----------
* input_string: the cipher-text that needs to be decoded
* key: the number of letters to shift the message backwards by to decode
* `input_string`: the cipher-text that needs to be decoded
* `key`: the number of letters to shift the message backwards by to decode
Optional:
* alphabet (None): the alphabet used to decode the cipher, if not
* `alphabet` (``None``): the alphabet used to decode the cipher, if not
specified, the standard english alphabet with upper and lowercase
letters is used
Returns:
* A string containing the decoded plain-text
More on the caesar cipher
=========================
The caesar cipher is named after Julius Caesar who used it when sending
secret military messages to his troops. This is a simple substitution cipher
where every character in the plain-text is shifted by a certain number known
@@ -110,27 +122,29 @@ def decrypt(input_string: str, key: int, alphabet: str | None = None) -> str:
Example:
Say we have the following cipher-text:
"Jgnnq, ecrvckp"
``Jgnnq, ecrvckp``
And our alphabet is made up of lower and uppercase letters:
"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
``abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ``
And our shift is "2"
And our shift is ``2``
To decode the message, we would do the same thing as encoding, but in
reverse. The first letter, "J" would become "H" (remember: we are decoding)
because "H" is two letters in reverse (to the left) of "J". We would
continue doing this. A letter like "a" would shift back to the end of
the alphabet, and would become "Z" or "Y" and so on.
reverse. The first letter, ``J`` would become ``H`` (remember: we are decoding)
because ``H`` is two letters in reverse (to the left) of ``J``. We would
continue doing this. A letter like ``a`` would shift back to the end of
the alphabet, and would become ``Z`` or ``Y`` and so on.
Our final message would be "Hello, captain"
Our final message would be ``Hello, captain``
Further reading
===============
* https://en.m.wikipedia.org/wiki/Caesar_cipher
Doctests
========
>>> decrypt('bpm yCqks jzwEv nwF rCuxA wDmz Bpm tiHG lwo', 8)
'The quick brown fox jumps over the lazy dog'
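Decoding is the same walk in the opposite direction, so a sketch only needs to subtract the key instead of adding it. As before, `caesar_decrypt` is a hypothetical name and the default alphabet is assumed to be lowercase followed by uppercase letters.

```python
from __future__ import annotations

from string import ascii_letters


def caesar_decrypt(text: str, key: int, alphabet: str | None = None) -> str:
    """Shift every character of `text` that is in `alphabet` back by `key`."""
    alpha = alphabet or ascii_letters
    size = len(alpha)
    # Decoding reverses encoding, so subtract the key; % keeps the index in range
    # (e.g. "a" shifted back by 2 wraps around to "Y" at the end of the alphabet).
    return "".join(
        alpha[(alpha.index(ch) - key) % size] if ch in alpha else ch for ch in text
    )
```

`caesar_decrypt("Jgnnq, ecrvckp", 2)` returns `Hello, captain`, the reverse of the encoding example.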
@@ -150,41 +164,44 @@ def brute_force(input_string: str, alphabet: str | None = None) -> dict[int, str
"""
brute_force
===========
Returns all the possible combinations of keys and the decoded strings in the
form of a dictionary
Parameters:
-----------
* input_string: the cipher-text that needs to be used during brute-force
* `input_string`: the cipher-text that needs to be used during brute-force
Optional:
* alphabet: (None): the alphabet used to decode the cipher, if not
* `alphabet` (``None``): the alphabet used to decode the cipher, if not
specified, the standard english alphabet with upper and lowercase
letters is used
More about brute force
======================
Brute force is when a person intercepts a message or password, does not know
the key, and tries every single combination. This is easy with the caesar
cipher since there are only as many keys as there are letters in the alphabet.
The more complex the cipher, the longer a brute-force attack will take.
Ex:
Say we have a 5 letter alphabet (abcde), for simplicity and we intercepted the
following message:
"dbc"
Say we have a ``5`` letter alphabet (``abcde``), for simplicity and we intercepted
the following message: ``dbc``,
we could then just write out every combination:
ecd... and so on, until we reach a combination that makes sense:
"cab"
``ecd``... and so on, until we reach a combination that makes sense:
``cab``
Further reading
===============
* https://en.wikipedia.org/wiki/Brute_force
Doctests
========
>>> brute_force("jFyuMy xIH'N vLONy zILwy Gy!")[20]
"Please don't brute force me!"
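The brute-force idea above amounts to running the decoder once per possible key and collecting the results. A minimal sketch, assuming keys run from 1 to the alphabet length (consistent with the `[20]` lookup in the doctest) and using a hypothetical `caesar_brute_force` name:

```python
from __future__ import annotations

from string import ascii_letters


def caesar_brute_force(text: str, alphabet: str | None = None) -> dict[int, str]:
    """Map every candidate key to the text decoded with that key."""
    alpha = alphabet or ascii_letters
    size = len(alpha)

    def decode(key: int) -> str:
        # Same backward shift as decryption, applied for one candidate key.
        return "".join(
            alpha[(alpha.index(ch) - key) % size] if ch in alpha else ch for ch in text
        )

    # Keys 1..size; key == size is a full rotation and returns the input unchanged.
    return {key: decode(key) for key in range(1, size + 1)}
```

On the five-letter example, `caesar_brute_force("dbc", "abcde")` maps key 1 to `cab` and key 4 to `ecd`; a human (or a scoring function such as the chi-squared test) then picks the candidate that makes sense.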
99 changes: 51 additions & 48 deletions ciphers/decrypt_caesar_with_chi_squared.py
@@ -11,103 +11,106 @@ def decrypt_caesar_with_chi_squared(
"""
Basic Usage
===========
Arguments:
* ciphertext (str): the text to decode (encoded with the caesar cipher)
* `ciphertext` (str): the text to decode (encoded with the caesar cipher)
Optional Arguments:
* cipher_alphabet (list): the alphabet used for the cipher (each letter is
a string separated by commas)
* frequencies_dict (dict): a dictionary of letter frequencies where keys are
the letters and values are a percentage representation of the frequency as
a decimal/float
* case_sensitive (bool): a boolean value: True if the case matters during
decryption, False if it doesn't
* `cipher_alphabet` (list): the alphabet used for the cipher (each letter is
a string separated by commas)
* `frequencies_dict` (dict): a dictionary of letter frequencies where keys are
the letters and values are a percentage representation of the frequency as
a decimal/float
* `case_sensitive` (bool): a boolean value: ``True`` if the case matters during
decryption, ``False`` if it doesn't
Returns:
* A tuple in the form of:
(
most_likely_cipher,
most_likely_cipher_chi_squared_value,
decoded_most_likely_cipher
)
* A tuple in the form of:
(`most_likely_cipher`, `most_likely_cipher_chi_squared_value`,
`decoded_most_likely_cipher`)
where...
- most_likely_cipher is an integer representing the shift of the smallest
chi-squared statistic (most likely key)
- most_likely_cipher_chi_squared_value is a float representing the
chi-squared statistic of the most likely shift
- decoded_most_likely_cipher is a string with the decoded cipher
(decoded by the most_likely_cipher key)
where...
- `most_likely_cipher` is an integer representing the shift of the smallest
chi-squared statistic (most likely key)
- `most_likely_cipher_chi_squared_value` is a float representing the
chi-squared statistic of the most likely shift
- `decoded_most_likely_cipher` is a string with the decoded cipher
(decoded by the most_likely_cipher key)
The Chi-squared test
====================
The caesar cipher
-----------------
The caesar cipher is a very insecure encryption algorithm; however, it has
been used since the time of Julius Caesar. The cipher is a simple substitution cipher
where each character in the plain text is replaced by a character in the
alphabet a certain number of characters after the original character. The
number of characters away is called the shift or key. For example:
Plain text: hello
Key: 1
Cipher text: ifmmp
(each letter in hello has been shifted one to the right in the eng. alphabet)
| Plain text: ``hello``
| Key: ``1``
| Cipher text: ``ifmmp``
| (each letter in ``hello`` has been shifted one to the right in the eng. alphabet)
As you can imagine, this doesn't provide lots of security. In fact
decrypting ciphertext by brute force is extremely easy, even by hand. However,
one way to do that is the chi-squared test.
one way to do that is the chi-squared test.
The chi-squared test
-------------------
--------------------
Each letter in the english alphabet has a frequency, or the amount of times
it shows up compared to other letters (usually expressed as a decimal
representing the percentage likelihood). The most common letter in the
english language is "e" with a frequency of 0.11162 or 11.162%. The test is
completed in the following fashion.
english language is ``e`` with a frequency of ``0.11162`` or ``11.162%``.
The test is completed in the following fashion.
1. The ciphertext is decoded in a brute force way (every combination of the
26 possible combinations)
``26`` possible combinations)
2. For every combination, for each letter in the combination, the average
amount of times the letter should appear in the message is calculated by
multiplying the total number of characters by the frequency of the letter
multiplying the total number of characters by the frequency of the letter.
| For example:
| In a message of ``100`` characters, ``e`` should appear around ``11.162``
times.
For example:
In a message of 100 characters, e should appear around 11.162 times.
3. Then, to calculate the margin of error (the amount of times the letter
SHOULD appear with the amount of times the letter DOES appear), we use
the chi-squared test. The following formula is used:
3. Then, to calculate the margin of error (the amount of times the letter
SHOULD appear with the amount of times the letter DOES appear), we use
the chi-squared test. The following formula is used:
Let:
- n be the number of times the letter actually appears
- p be the predicted value of the number of times the letter should
appear (see item ``2``)
- let v be the chi-squared test result (referred to here as chi-squared
value/statistic)
Let:
- n be the number of times the letter actually appears
- p be the predicted value of the number of times the letter should
appear (see #2)
- let v be the chi-squared test result (referred to here as chi-squared
value/statistic)
::
(n - p)^2
--------- = v
p
(n - p)^2
--------- = v
p
4. Each chi squared value for each letter is then added up to the total.
The total is the chi-squared statistic for that encryption key.
5. The encryption key with the lowest chi-squared value is the most likely
to be the decoded answer.
Further Reading
================
===============
* http://practicalcryptography.com/cryptanalysis/text-characterisation/chi-squared-
statistic/
* http://practicalcryptography.com/cryptanalysis/text-characterisation/chi-squared-statistic/
* https://en.wikipedia.org/wiki/Letter_frequency
* https://en.wikipedia.org/wiki/Chi-squared_test
* https://en.m.wikipedia.org/wiki/Caesar_cipher
Doctests
========
>>> decrypt_caesar_with_chi_squared(
... 'dof pz aol jhlzhy jpwoly zv wvwbshy? pa pz avv lhzf av jyhjr!'
... ) # doctest: +NORMALIZE_WHITESPACE
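Steps 1 to 5 above can be sketched as follows. This is not the repository's implementation: `chi_squared`, `chi_squared_crack` and `ENGLISH_FREQ` are hypothetical names, the scoring is case-insensitive (output is lowercased), expected counts are based on the number of letters rather than all characters (an assumption), and only the value for ``e`` (0.11162) comes from the docstring; the other frequencies are approximate values assumed for illustration and can be replaced by any table covering a to z.

```python
from __future__ import annotations

from collections import Counter
from string import ascii_lowercase

# Approximate English letter frequencies (assumed for illustration; only the
# value for "e" is taken from the docstring above).
ENGLISH_FREQ = {
    "a": 0.08497, "b": 0.01492, "c": 0.02202, "d": 0.04253, "e": 0.11162,
    "f": 0.02228, "g": 0.02015, "h": 0.06094, "i": 0.07546, "j": 0.00153,
    "k": 0.01292, "l": 0.04025, "m": 0.02406, "n": 0.06749, "o": 0.07507,
    "p": 0.01929, "q": 0.00095, "r": 0.07587, "s": 0.06327, "t": 0.09356,
    "u": 0.02758, "v": 0.00978, "w": 0.02560, "x": 0.00150, "y": 0.01994,
    "z": 0.00077,
}


def chi_squared(text: str, freqs: dict[str, float]) -> float:
    """Sum of (n - p)^2 / p over every letter in `freqs` (steps 2 to 4)."""
    letters = [c for c in text if c in freqs]
    total = len(letters)
    if total == 0:
        raise ValueError("text contains no letters to score")
    counts = Counter(letters)
    score = 0.0
    for letter, freq in freqs.items():
        expected = total * freq  # p: predicted number of occurrences
        observed = counts[letter]  # n: actual number of occurrences
        score += (observed - expected) ** 2 / expected
    return score


def chi_squared_crack(
    ciphertext: str, freqs: dict[str, float] = ENGLISH_FREQ
) -> tuple[int, float, str]:
    """Return (shift, chi-squared value, decoded text) for the best shift."""
    text = ciphertext.lower()
    candidates = []
    for shift in range(26):  # step 1: try every possible key
        decoded = "".join(
            ascii_lowercase[(ascii_lowercase.index(c) - shift) % 26]
            if c in ascii_lowercase
            else c
            for c in text
        )
        candidates.append((shift, chi_squared(decoded, freqs), decoded))
    # step 5: the smallest statistic marks the most likely key
    return min(candidates, key=lambda candidate: candidate[1])
```

For the doctest input above, this sketch picks shift 7 and decodes `why is the caesar cipher so popular? it is too easy to crack!`. Letters that are rare in English (j, q, x, z) have tiny expected counts, so a wrong shift that produces even one of them is penalised heavily, which is why the correct shift stands out even for a short message.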