C++ interface to PCRE2 library compatible with <regex>
See:
This header-only library implements std::basic_regex, std::sub_match, std::match_results, std::regex_match, std::regex_search, std::regex_replace, std::regex_iterator, std::regex_token_iterator, and std::regex_error interfaces for the PCRE2 library.
Unsupported features:
- regex_traits: collation is performed by the PCRE library itself, and there is no way to affect its behavior
- regex_constants::syntax_option_type::ECMAScript, regex_constants::syntax_option_type::basic, regex_constants::syntax_option_type::extended, regex_constants::syntax_option_type::awk, regex_constants::syntax_option_type::grep, regex_constants::syntax_option_type::egrep are not supported for obvious reasons: if you need another matching engine, there is no need to use PCRE :-)
- regex_constants::match_flag_type::match_not_bow, regex_constants::match_flag_type::match_not_eow: these features are not supported by PCRE
- regex_constants::match_flag_type::match_any is always on (its description says: "If more than one match is possible, then any match is an acceptable result"; I believe this is always true for PCRE)
- regex_constants::match_flag_type::match_prev_avail: if I get the description right, you can just unset the regex_constants::match_flag_type::match_not_bol flag
- pcre2::regex_constants::error_type and std::regex_constants::error_type constants are different: PCRE2 does not officially provide constants for compilation errors (only for match errors), and therefore there is no portable way to map PCRE2 errors to std::regex_constants::error_type; in addition, PCRE2 can return many more distinct errors than stdc++
- std::wregex, std::wcsub_match, std::wssub_match, std::wcmatch, std::wsmatch, std::wcregex_iterator, std::wsregex_iterator, std::wcregex_token_iterator, and std::wsregex_token_iterator are not supported: the size of wchar_t differs across platforms, which makes it unsuitable for PCRE (however, the library does provide pcre2::regex16, pcre2::regex32, pcre2::c16sub_match, pcre2::c32sub_match, pcre2::c16match, pcre2::c32match, pcre2::s16match, pcre2::s32match, pcre2::c16regex_iterator, pcre2::c32regex_iterator, pcre2::s16regex_iterator, pcre2::s32regex_iterator, pcre2::c16regex_token_iterator, pcre2::c32regex_token_iterator, pcre2::s16regex_token_iterator, pcre2::s32regex_token_iterator for char16_t/std::u16string and char32_t/std::u32string types)
NB: char16_t/char32_t are supported only if sizeof(char16_t) == 2 and sizeof(char32_t) == 4 (the standard says that char16_t (char32_t) has the same size as the std::uint_least16_t (std::uint_least32_t) type, and their size may be more than 2 (4) bytes).
Differences from stdc++:
- regex_constants::match_flag_type::format_sed behaves differently than in libstdc++: as far as I can tell, libstdc++ does not handle sed rules properly: there is no way to escape & or \<digit> in the format pattern. PCRE2++, however, follows the rules more strictly
- new option: regex_constants::utf: causes PCRE2 to regard both the pattern and the subject strings that are subsequently processed as strings of UTF characters instead of single-code-unit strings
- new option: regex_constants::ucp: changes the way PCRE2 processes \B, \b, \D, \d, \S, \s, \W, \w, and some of the POSIX character classes. By default, only ASCII characters are recognized, but when this option is set, Unicode properties are used instead to classify characters
The library depends upon the pcre2.h header file (provided by the libpcre2-dev package in Ubuntu).
8-bit features require linking in the pcre2-8 library (libpcre2-8-0 package in Ubuntu).
16-bit features require linking in the pcre2-16 library (libpcre2-16-0 package in Ubuntu).
32-bit features require linking in the pcre2-32 library (libpcre2-32-0 package in Ubuntu).