The library implements a regular expression engine based on nondeterministic
finite automata (NFA). It uses Thompson's construction algorithm to transform
regular expressions into NFAs. The engine does not support backreferences, so
matching can proceed without backtracking, which should make it much faster
than Python's standard re module on pathological patterns.
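To illustrate the idea of matching without backtracking, here is a minimal sketch of NFA simulation. It is not the library's code: the NFA for the pattern a?a?b is built by hand, and the names NFA, epsilon_closure, and nfa_match are hypothetical. The sketch keeps a set of active states and advances all of them one character at a time.

```python
# Hypothetical hand-built NFA for the pattern a?a?b (not the library's data structures).
# Each state maps a symbol to the set of next states; None marks an epsilon move.
EPSILON = None
NFA = {
    0: {"a": {1}, EPSILON: {1}},  # first optional "a"
    1: {"a": {2}, EPSILON: {2}},  # second optional "a"
    2: {"b": {3}},                # mandatory "b"
    3: {},                        # accepting state
}
START, ACCEPT = 0, 3


def epsilon_closure(states):
    """Return all states reachable from `states` through epsilon moves alone."""
    stack = list(states)
    closure = set(states)
    while stack:
        state = stack.pop()
        for nxt in NFA[state].get(EPSILON, ()):
            if nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return closure


def nfa_match(text):
    """Advance the set of active states one character at a time, never backtracking."""
    current = epsilon_closure({START})
    for char in text:
        moved = set()
        for state in current:
            moved |= NFA[state].get(char, set())
        current = epsilon_closure(moved)
        if not current:
            return False  # no active states left, the text cannot match
    return ACCEPT in current


print(nfa_match("ab"))    # True
print(nfa_match("aaab"))  # False
```

Each input character is examined once, and the work per character is bounded by the number of NFA states, which is why this style of matching avoids the exponential blowup that backtracking can cause.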
Supported features:
- common operators like *, +, ?, and |
- ranges, e.g. [a-z]
- groups, e.g. (abc)
- starting and ending position indicators: ^ and $
- the caret ^ operator
Prerequisites:
- Python version 3.6.6
- virtualenv or the Python 3 venv module (optional)
Create a virtual environment with virtualenv or venv in the directory with
the library sources:
$ virtualenv venv # virtualenv approach
or
$ python -m venv venv # venv approach
Activate the environment with
$ . ./venv/bin/activate
and install the library
$ python setup.py install
To check that the installation was successful, run the tests. The library uses pytest as its testing framework. If you followed the instructions from the previous section, pytest is already installed in your virtual environment.
To run the tests:
$ pytest tests/
To get a test coverage report, run:
$ pytest --cov=regex tests/
An example coverage report:
---------- coverage: platform darwin, python 3.6.6-final-0 -----------
Name                                                                       Stmts   Miss  Cover
----------------------------------------------------------------------------------------------
venv/lib/python3.6/site-packages/regex-0.1-py3.6.egg/regex/__init__.py         3      0   100%
venv/lib/python3.6/site-packages/regex-0.1-py3.6.egg/regex/compiler.py        92      4    96%
venv/lib/python3.6/site-packages/regex-0.1-py3.6.egg/regex/exceptions.py       2      0   100%
venv/lib/python3.6/site-packages/regex-0.1-py3.6.egg/regex/executor.py        25      0   100%
venv/lib/python3.6/site-packages/regex-0.1-py3.6.egg/regex/tokenizer.py      177     14    92%
----------------------------------------------------------------------------------------------
TOTAL                                                                        299     18    94%
The API is simple and consists of one function named match. The function
takes a POSIX-like regular expression and a string and returns True or False
depending on whether the string matches the expression.
>>> from regex import match
>>> match(r'a|b', 'a')
True
>>> match(r'[a-z0-9]*(!+|\?+)123', 'aaa000999zzzbbb???123')
True
>>> match(r'foo', 'bar')
False
The function raises the MalformedRegex exception if the regular expression
can't be parsed.
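When patterns come from user input, it can be handy to catch the parsing error in one place. The helper below is a rough sketch, not part of the library: safe_match is a hypothetical name, and the matcher and exception type are passed in as arguments so the example runs without the library installed. fake_match and FakeMalformedRegex are stand-ins that only imitate the documented behaviour.

```python
def safe_match(match_func, error_type, pattern, text):
    """Return the match result, or None when the pattern cannot be parsed."""
    try:
        return match_func(pattern, text)
    except error_type as exc:
        print(f"invalid pattern {pattern!r}: {exc}")
        return None


# Stand-ins so the sketch runs on its own; they are NOT the library's code.
class FakeMalformedRegex(Exception):
    pass


def fake_match(pattern, text):
    # Toy rule: treat unbalanced parentheses as a parse error,
    # otherwise compare the pattern and the text literally.
    if pattern.count("(") != pattern.count(")"):
        raise FakeMalformedRegex("unbalanced parentheses")
    return pattern == text


print(safe_match(fake_match, FakeMalformedRegex, "(abc", "abc"))  # reports the error, then None
print(safe_match(fake_match, FakeMalformedRegex, "foo", "foo"))   # True
```

With the library installed, the same helper would be called as safe_match(match, MalformedRegex, pattern, text). The import location of MalformedRegex is an assumption here; the coverage report lists a regex/exceptions.py module, which is a likely place to look.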
The library ships with a command line tool named regex. It is a simple
app that takes a regular expression as its first argument and a string as
the second. For example:
$ regex "a?a?b" "ab"
The string 'ab' matches
References:
- Regular Expression Matching Can Be Simple And Fast by Russ Cox