This is a toy library to match regular expression in ocaml. Real regular expression library such as pcre, re2 or tre are big and complex this is not meant to be used as a serious library. The plan is to use memoization on top of Bzrowosky derivatives to reach an asymptotic O(1) behaviour (derivatives can be used to constructed DFAs). Not quite sure yet how to tie tagged transitions in all this.
Interesting articles:
-
Regular expressions can be simple and fast: Russ cox has a great series of articles on regular expression matching. This article explains the problems with recursive backtrace based engines and gives a great overview of NFA/DFA based methods.
-
Regular expression derivatives re-examined: This is not the first article on Brzowosky derivatives (this would be his Brzowsky 1964 Derivatives of Regular expressions) but it is not behind a paywall and does a good job summarizing the previous research. The author also show that with some small reductions on the derived regular expressions there's a finite number of different regular expressions that can be reached while parsing any regular expression thus making the algorithm sound to generate DFA's
-
NFAs with Tagged Transitions, their Conversion to Deterministic Automata and Application to Regular Expressions: Ville Laurikari's article explaining how to do subgroup matching in NFA and DFA without using backtracking. This technique is used in in Ville's TRE.
-
Hash consing: Hash consig can be used to reduce structural equality test to pointer equality. This, combined with hashing (embedded in the node) can be used to make O(1) smart constructors (they require equality tests) and hashtables.
-
LTU's discussion on Regular expression matching: Plenty of interesting insights.
OCaml specific:
-
Christian Lindig and Norman Ramsey's RX: Derivative based regular expression engine. Small and clean.
-
Jerome Vouillon's RE: Uses a Lazy DFA construction. Probably the most full-featured pure ocaml regular expression library.
-
Claude Marche's Regexp: Standard NFA->DFA construction; uses fancy datastructures.
-
Jean Cristophe Filliatre's hash consing: paper about hash consing.