paul-wolf edited this page Feb 14, 2014 · 4 revisions

Class Methods

  • render(): returns a single randomized string.
  • render_list(): returns a list of randomized strings.
  • dump(): provides some useful information for debugging a template.

Examples

For brevity, we'll alias the StringGenerator:

>>> from strgen import StringGenerator as SG

Random nucleotide sequence of length 80:

>>> SG("[ACGT]{80}").render()
u'GGGGCTGTCTGAATGTGTATGTCAACTTGAATTTCAGGGCCTTTCTCTTCCGCACCGGGTGGCGCTACCCAGTAGTGGGC'

The result is not intended to be a biologically valid sequence.

US interstate Powerball lottery numbers (white balls only). The requirement is for five non-repeating numbers between 0 and 59.

>>> "-".join(SG("[0-5][0-9]").render_list(5,unique=True))
u'47-38-18-45-11'

Note that unique=True only guarantees that the five numbers within one draw differ from each other. It does not make a draw unique against previous draws. A draw is a set, not a permutation: the same five numbers in a different order count as the same result. Guaranteeing a draw that has not occurred before therefore requires additional code.
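One way to handle uniqueness across draws is to treat each draw as a set and remember the sets already issued. The following is a minimal sketch using only the standard library rather than StringGenerator. The helper name draw_unique_set and the seen registry are hypothetical, and the 0 to 59 range simply mirrors the template above.

```python
import random

# SystemRandom draws from os.urandom(); assumed to be available here.
_rng = random.SystemRandom()


def draw_unique_set(seen, count=5, low=0, high=59):
    """Return `count` distinct numbers whose set has not been drawn before.

    `seen` is a set of frozensets recording earlier draws (hypothetical
    registry, not part of StringGenerator).
    """
    while True:
        # sample() never repeats a number within a single draw
        numbers = frozenset(_rng.sample(range(low, high + 1), count))
        # Comparing frozensets ignores order, so permutations count as equal
        if numbers not in seen:
            seen.add(numbers)
            return sorted(numbers)


seen = set()
draws = [draw_unique_set(seen) for _ in range(3)]
for draw in draws:
    print("-".join("%02d" % n for n in draw))
```

Because each draw is stored as a frozenset, 47-38-18-45-11 and 11-18-38-45-47 would be recognized as the same draw and rejected the second time.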

Password Requirements

"a password shall have 6 - 20 characters of which at least one must be a digit and at least one must be a special character"

>>> SG("[\l\d]{4:18}&[\d]&[\p]").render()
    u'3H!BjN'

Note that this template asks for exactly one special character, which is one of:

!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~

Each character class without a quantifier contributes exactly one character. The two extra classes therefore account for two characters, and the quantifier becomes 4:18 instead of 6:20.

The result may contain any number of digits, but at each position a digit is less likely than a letter, because the class holds 52 upper- and lower-case ASCII letters and only 10 digits.

The above specification is ambiguous, as such specifications usually are. It is tempting to assume that:

[\l\d]{20}

is guaranteed to produce a string containing some letters, but it is not. Although unlikely, the string might contain only digits. To guarantee at least one letter as well, append an additional ASCII letter class with the shuffle operator (&), then reduce the quantifier by one to account for the additional character:

>>> SG("[\l\d]{3:17}&[\l]&[\d]&[\p]").render()
    u'oGkgMZME6uY.fRBAF'
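To check generated passwords against the stricter reading of the requirement (at least one letter as well), a small validator can help. This is a sketch using only the standard library. The function name meets_policy and its parameters are illustrative assumptions, and it assumes that \p corresponds to string.punctuation, which matches the list of special characters shown earlier.

```python
import string


def meets_policy(password, min_len=6, max_len=20, require_letter=True):
    """Check length plus at least one digit and one special character.

    With require_letter=True (an assumption about the intended policy),
    at least one ASCII letter is required as well.
    """
    if not (min_len <= len(password) <= max_len):
        return False
    has_digit = any(c in string.digits for c in password)
    has_special = any(c in string.punctuation for c in password)
    has_letter = any(c in string.ascii_letters for c in password)
    if require_letter and not has_letter:
        return False
    return has_digit and has_special


print(meets_policy("3H!BjN"))             # True
print(meets_policy("oGkgMZME6uY.fRBAF"))  # True
print(meets_policy("123456!"))            # False: no letter
```

Passing require_letter=False gives the literal reading of the quoted requirement, under which a password made only of digits and one special character is accepted.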

What if you want roughly as many digits as letters, that is, an approximately equal share of digits and letters across a sample produced by render_list()? With [\l\d] alone, the results are mostly letters with an occasional digit.

You would need to create a class that has approximately as many digits as letters:

>>> from pprint import pprint
>>> pprint(SG("[\l\d\d\d\d]{10}").render_list(10))
[u'J2nhOz5cdE',
 u'5t1BzCHB90',
 u'1OOGmisUi9',
 u'5445466SIX',
 u'kF2pI9d365',
 u'Q9M7K9Pk41',
 u'817ghpC693',
 u'OF5W005a3K',
 u'F81jvv36Y4',
 u'7Bx6u6jWI9']

But, of course, the class from which we are randomly choosing characters is not a unique set, since we are repeating digit characters. You can check what the character set looks like with the dump() method:

>>> SG("[\l\d\d\d\d]{10}").dump()
StringGenerator version: 0.1.3
Python version: 2.7.6 (default, Jan  4 2014, 09:33:05)
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.2.79)]
Random method provider class: SystemRandom
sequence:
-1:10:abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789012345678901234567890123456789
u'7a1FsB19Tp'

Note the repeated series of digits.
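The effect of repeating the digits can be worked out directly from the character pool. The sketch below rebuilds the two pools with the standard library (it does not call StringGenerator) and computes the chance that any single position is a digit. The helper name digit_share is hypothetical.

```python
import string
from fractions import Fraction


def digit_share(pool):
    """Probability that one uniformly chosen character of `pool` is a digit."""
    digits = sum(1 for c in pool if c in string.digits)
    return Fraction(digits, len(pool))


# Pool for [\l\d]: 52 letters and 10 digits
plain = string.ascii_letters + string.digits
# Pool for [\l\d\d\d\d]: 52 letters and 40 digits, as in the dump() output
weighted = string.ascii_letters + string.digits * 4

print(digit_share(plain))     # 5/31, about 0.161
print(digit_share(weighted))  # 10/23, about 0.435
```

Four copies of the digits still leave letters slightly more likely. A fifth copy would bring the share to 50 out of 102, which is about 0.49.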

Large Result Strings

Nothing prevents you from generating large result strings if you have the computing resources:

>>> SG("[\l]{1000000}").render()

How Random Is It?

StringGenerator uses the Python random module. It tries to use the methods of the SystemRandom class, which in turn rely on os.urandom(). Quoting from the Python 3 documentation:

The returned data should be unpredictable enough for cryptographic applications, though its exact quality depends on the OS implementation. On a Unix-like system this will query /dev/urandom, and on Windows it will use CryptGenRandom().

If the OS cannot support this, StringGenerator falls back to the default methods of the random module, about which the documentation says:

Almost all module functions depend on the basic function random(), which generates a random float uniformly in the semi-open range [0.0, 1.0). Python uses the Mersenne Twister as the core generator. It produces 53-bit precision floats and has a period of 2**19937-1. The underlying implementation in C is both fast and threadsafe. The Mersenne Twister is one of the most extensively tested random number generators in existence. However, being completely deterministic, it is not suitable for all purposes, and is completely unsuitable for cryptographic purposes.

Therefore, the quality of the random data depends on the underlying Python methods and on the design of the template.
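The fallback described above can be illustrated with a short standard-library sketch. This is not StringGenerator's actual code. The function name pick_rng is hypothetical, and the sketch relies on SystemRandom raising NotImplementedError when os.urandom() is unavailable.

```python
import random


def pick_rng():
    """Prefer the OS entropy source; fall back to the Mersenne Twister."""
    try:
        rng = random.SystemRandom()
        rng.random()  # forces a read from os.urandom()
        return rng
    except NotImplementedError:
        # Deterministic generator: not suitable for cryptographic purposes
        return random.Random()


rng = pick_rng()
print(type(rng).__name__)

# Same idea as the [ACGT]{80} template, written with the chosen generator
sample = "".join(rng.choice("ACGT") for _ in range(80))
print(sample)
```

Both generators expose the same methods, so the rest of the code does not need to know which one was selected.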
