Interline wildcards #42

Merged
merged 7 commits into from
May 17, 2024
2 changes: 1 addition & 1 deletion Cargo.toml
@@ -7,7 +7,7 @@ authors = ["Edd Barrett <vext01@gmail.com>", "Laurence Tratt <laurie@tratt.net>"
readme = "README.md"
license = "Apache-2.0/MIT"
categories = ["development-tools"]
-edition = "2018"
+edition = "2021"

[dependencies]
regex = "1.8"
126 changes: 110 additions & 16 deletions README.md
@@ -1,21 +1,115 @@
# fm

`fm` is a simple non-backtracking fuzzy text matcher useful for matching
multi-line patterns and text. At its most basic the wildcard operator (`...` by
default) can be used in the following ways:

* If a line consists solely of `...` it means "match zero or more lines of text".
* If a line starts with `...`, the search is not anchored to the start of the line.
* If a line ends with `...`, the search is not anchored to the end of the line.

Note that `...` can appear at both the start and the end of a line; if a line
consists of `......` (i.e. starts and ends with the wildcard with nothing
in between), it will match exactly one line. If the wildcard operator appears in
any other location, it is matched literally. Wildcard matching does not
backtrack, so if a line consists solely of `...` then the next matching line
anchors the remainder of the search.

The following examples show `fm` in action using its defaults:
`fm` is a simple, limited-backtracking fuzzy text matcher useful for matching
multi-line *patterns* and *literal* text. Wildcard operators can be used to
match parts of a line and to skip multiple lines of text. For example, this
*pattern*:

```text
...A
...
D...
```

will successfully match against literals such as:

```text
xyzA
B
C
Dxyz
```


## Intraline matching

The intraline wildcard operator `...` can appear at the start and/or end of a
line. `...X...` matches any literal line that contains "X"; `...X` matches any
literal line that ends with "X"; and `X...` matches any literal line that
starts with "X". `......` matches exactly one literal line (i.e. the contents
of the literal line are irrelevant but this will not match against the end
of the literal text).
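
The intraline rules above can be sketched in a few lines of Rust. This is only an illustrative model, not `fm`'s implementation: the helper name `line_matches` is hypothetical, it assumes no whitespace trimming, and it assumes that a pattern line consisting solely of `...` (an interline wildcard) has already been handled elsewhere.

```rust
/// The intraline wildcard token described above.
const WILDCARD: &str = "...";

/// Hypothetical helper (not part of fm's API): does a single pattern line
/// match a single literal line under the intraline rules?
fn line_matches(pattern: &str, literal: &str) -> bool {
    // Strip a leading wildcard first, so that `......` is seen as a leading
    // wildcard followed by a trailing wildcard with nothing in between.
    let starts = pattern.starts_with(WILDCARD);
    let rest = if starts { &pattern[WILDCARD.len()..] } else { pattern };
    let ends = rest.ends_with(WILDCARD);
    let core = if ends { &rest[..rest.len() - WILDCARD.len()] } else { rest };

    match (starts, ends) {
        // `...X...`: the literal line contains X (`......` matches any line).
        (true, true) => literal.contains(core),
        // `...X`: the literal line ends with X.
        (true, false) => literal.ends_with(core),
        // `X...`: the literal line starts with X.
        (false, true) => literal.starts_with(core),
        // No wildcards: the lines must be identical.
        (false, false) => literal == core,
    }
}
```

With this sketch, `line_matches("...X", "abX")` is true, while `line_matches("...X", "aXb")` is false because the literal line does not end with "X".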


## Interline matching

There are two interline wildcard operators that match zero or more literal
lines until a match for the next *item* is found, at which point the search is
*anchored* (i.e. backtracking will not occur before the anchor). An item is
either:

* A single pattern line.
* A group of pattern lines. A group is the sequence of pattern lines between
  two interline wildcard operators or, if no further wildcard operator is
  found, between a wildcard operator and the end of the pattern.

The interline wildcards are:

* The *prefix match* wildcard `...` matches until it finds a match for the
line immediately after the interline operator ("the prefix"), at which
point the search is anchored. This wildcard does not backtrack.
* The *group match* wildcard `..~` matches until it finds a match for the
next group, at which point the search is anchored. This wildcard
backtracks, though never further than one group.

Interline wildcards cannot directly follow each other, i.e. `...\n..~` is an
invalid pattern. Interline wildcards can appear at the beginning or end of
a pattern: at the end of a pattern, both interline wildcards have identical
semantics.

Consider this pattern:

```text
A
...
B
C
...
```

This will match successfully against the literal:

```text
A
D
B
C
E
```

but fail to match against the literal:

```text
A
D
B
B
C
E
```

because the `...` skips to the first "B", which anchors the search; the next
pattern line, `C`, then fails to match against the second "B".
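
To make the anchoring behaviour concrete, here is a minimal sketch of non-backtracking prefix matching. It is an assumption-laden model rather than `fm`'s code: the function name `matches_prefix` is hypothetical, lines are compared for exact equality (intraline wildcards are ignored), and pattern validation is omitted.

```rust
/// Hypothetical sketch (not fm's implementation) of the `...` prefix match
/// wildcard: skip literal lines until the line after `...` is found, anchor
/// there, and never revisit that decision.
fn matches_prefix(pattern: &str, literal: &str) -> bool {
    let ptn: Vec<&str> = pattern.lines().collect();
    let lit: Vec<&str> = literal.lines().collect();
    let (mut p, mut l) = (0, 0);

    while p < ptn.len() {
        if ptn[p] == "..." {
            p += 1;
            // A trailing `...` matches whatever remains of the literal.
            if p == ptn.len() {
                return true;
            }
            // Skip literal lines until the prefix (the next pattern line)
            // matches. Each literal line is visited at most once.
            while l < lit.len() && lit[l] != ptn[p] {
                l += 1;
            }
            if l == lit.len() {
                return false;
            }
            // Anchor: consume the prefix and continue from here.
            p += 1;
            l += 1;
        } else {
            // Ordinary pattern line: must match the current literal line.
            if l == lit.len() || lit[l] != ptn[p] {
                return false;
            }
            p += 1;
            l += 1;
        }
    }
    // Without a trailing wildcard, the whole literal must be consumed.
    l == lit.len()
}
```

Calling `matches_prefix("A\n...\nB\nC\n...", "A\nD\nB\nC\nE")` succeeds, whereas the same pattern fails against `"A\nD\nB\nB\nC\nE"`, mirroring the two literals above.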

In contrast, the pattern:

```text
A
..~
B
C
...
```

does match the second literal: when `C` fails to match against the second "B",
`..~` backtracks and retries the group (`B` then `C`) from the next literal line.
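
The difference between the two interline wildcards can be sketched by treating the item after a wildcard as a window to search for: one line for `...`, the whole group for `..~`. The following is a simplified model under stated assumptions, not `fm`'s implementation: the names `is_interline` and `matches_sketch` are hypothetical, lines are compared for exact equality, and the only validation is rejecting consecutive interline wildcards.

```rust
/// Hypothetical helper: is this pattern line an interline wildcard?
fn is_interline(line: &str) -> bool {
    line == "..." || line == "..~"
}

/// Hypothetical sketch (not fm's implementation) covering both interline
/// wildcards. After `...` the item searched for is the single next pattern
/// line; after `..~` it is the whole next group.
fn matches_sketch(pattern: &str, literal: &str) -> bool {
    let ptn: Vec<&str> = pattern.lines().collect();
    let lit: Vec<&str> = literal.lines().collect();
    let (mut p, mut l) = (0, 0);

    while p < ptn.len() {
        if is_interline(ptn[p]) {
            let group_match = ptn[p] == "..~";
            p += 1;
            // A trailing interline wildcard matches the rest of the literal.
            if p == ptn.len() {
                return true;
            }
            // Interline wildcards cannot directly follow each other.
            if is_interline(ptn[p]) {
                return false;
            }
            // The item ends at the next interline wildcard (or the end of the
            // pattern) for `..~`, and after exactly one line for `...`.
            let end = if group_match {
                ptn[p..]
                    .iter()
                    .position(|x| is_interline(x))
                    .map_or(ptn.len(), |i| p + i)
            } else {
                p + 1
            };
            let item = &ptn[p..end];
            // Try the item at successive literal positions. For `..~` a failed
            // attempt part-way through the group is retried one line later.
            match lit[l..].windows(item.len()).position(|w| w == item) {
                Some(offset) => {
                    // Anchor the search just past the matched item.
                    l += offset + item.len();
                    p = end;
                }
                None => return false,
            }
        } else {
            if l == lit.len() || lit[l] != ptn[p] {
                return false;
            }
            p += 1;
            l += 1;
        }
    }
    l == lit.len()
}
```

Under this model, `matches_sketch("A\n..~\nB\nC\n...", "A\nD\nB\nB\nC\nE")` succeeds, while replacing `..~` with `...` in the same pattern fails, because the single-line item `B` anchors on the first "B" and `C` then has to match the second "B".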

There are two reasons why you should default to using `...` rather than `..~`.
Most obviously, `...` does not backtrack and has linear performance. Less
obviously, `...` is a more rigorous test, since it cannot skip prefix matches
(i.e. matches for the line immediately after the `...` in the pattern) in the
literal.


## API

```rust
use fm::FMatcher;
```