Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Interline wildcards #42

Merged
merged 7 commits into from
May 17, 2024
Merged

Conversation

ltratt
Copy link
Member

@ltratt ltratt commented May 16, 2024

The most important commit here is ab7a1fe --- hopefully its commit message gives a decent overview of what this PR as a whole aims to do.

If this looks like what you were hoping this feature might be, then I will start to put in place the bits to be able to use it experimentally with yk (I have confirmed locally that with this change all yk's tests pass, but obviously none of them need the new feature yet).

Wildcards within, and between, lines behave very differently, so split
them into two different constants. Right now this doesn't make any
difference (the constants are the same!), but it makes it clearer in the
code which is which at different points.
This is so, when we shortly add new syntax, we can warn users of the
change.
This means only having to edit one file instead of keeping two
nearly-identical things in sync. It also means that the doc strings in
the README are tested.
@vext01
Copy link
Member

vext01 commented May 16, 2024

OK, so lets see if I understand.

This introduces two new bits of syntax ..? and ..*:

  • ..? is what ... on a line on its own used to be
  • ..* means the following (presumably non-wildcard) lines will be grouped.

Is that right?

What does ... on a line on its own now do?

README.md Outdated

will, through backtracing, successfully match the literal.

```text
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this code is not being referenced, and probably doesn't need to be here. Same in the commit message. Also the commit message repeats itself after this paragraph:

There are two reasons why you should default to using `..?` rather than `..*`.
Most obviously `..?` does not backtrack and has linear performance. Less
obviously `..?` prevents literals from matching when they contain multiple
similar sequences. Informally, `..?` makes for more rigorous testing: `..?` can
be thought of as "the next thing that matches must look like X" whereas `..*`
says "skip things that are almost like X until you find something that is
definitely X".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, fixed in efa7f6d. The commit message duplication is a mistake too.

@ptersilie
Copy link
Member

What does ... on a line on its own now do?

From my understanding nothing. This only works at the beginning/end of a line now.

I must admit I find ..* still a bit unintuitive, as it looks a bit like a greedy match, but that isn't what it is at all. It's just saying, interpret the next consecutive lines as a group and make sure to match them full. But I guess it works, and since I have no better idea, I better shut up now.

@ltratt
Copy link
Member Author

ltratt commented May 16, 2024

What does ... on a line on its own now do?
From my understanding nothing. This only works at the beginning/end of a line now.

Yes, using it on a line on its own throws an error.

I'm largely unworried what syntax we choose, so if ..? and ..* are offensive, I'm open to suggestions as to alternatives.

@vext01
Copy link
Member

vext01 commented May 16, 2024

It feels bad to write this, but I'm not sure again.

Two things are niggling me:

  • Your last proposal wasn't a breaking change for anyone, but this is a breaking change for everyone, even those who don't need the new functionality.
  • I don't find the syntax intuitive.

I now wonder if I'm being too picky. I've asked for a feature, you've implemented it twice, and I'm still whining 😆

Having realised that I won't get perfection, I think I'd be OK with one of the following:

  • The old proposal with ((( and ))).
  • Keep the old meaning of ... and introduce only ..* for group matching. At least that way it's not a breaking change.

Perhaps the latter is marginally better, since it's shorter to type? I don't know.

I wonder what @ptersilie thinks.

@vext01
Copy link
Member

vext01 commented May 16, 2024

I wonder what @ptersilie thinks.

Ah, he already posted, sorry.

@ltratt
Copy link
Member Author

ltratt commented May 16, 2024

Your last proposal wasn't a breaking change for anyone, but this is a breaking change for everyone

It breaks things but it errors if you use the things that have broken: ... is now a syntax error. This feels better than changing the semantics of ... to me.

That said, if we want to turn ..? back into ... and have ..* (or some other syntax) for the "backtracking group" thing, I'm fine with that too.

@vext01
Copy link
Member

vext01 commented May 16, 2024

That said, if we want to turn ..? back into ... and have ..* (or some other syntax) for the "backtracking group" thing, I'm fine with that too.

That would be the second bullet point it my last post. I think I'd be OK with this as a compromise.

And again, sorry to have wasted your time.

@ltratt
Copy link
Member Author

ltratt commented May 16, 2024

No time has been wasted --- no need to apologise!

@ltratt
Copy link
Member Author

ltratt commented May 17, 2024

@ptersilie @vext01 8877f30 and 3c1f4f9 change the syntax as per our offline discussion.

6f4acba tries to clarify the documentation to avoid some of the misunderstandings we've previously had. ... is now called "the prefix wildcard" to make (I hope!) clearer that it's not searching for the next group. Does that help?

@vext01
Copy link
Member

vext01 commented May 17, 2024

I think I'm OK with this.

Assuming @ptersilie is, what should we do? Merge this into master and test it for a while?

@ltratt
Copy link
Member Author

ltratt commented May 17, 2024

Yes, I think we merge it into master, patch it into yk, and just see how we go. I'd definitely like to sit on it for a little while before making update fm/lang_tester releases.

@ltratt
Copy link
Member Author

ltratt commented May 17, 2024

It feels like now is probably an appropriate time to squash. OK to do so?

@vext01
Copy link
Member

vext01 commented May 17, 2024

Go ahead.

This allows "group matching", which also implies backtracing.

Consider this pattern:

```text
A
..?
B
C
..?
```

This will match successfully against the literal:

```text
A
D
B
C
E
```

but fail to match against the literal:

```text
A
D
B
B
C
E
```

because the `..?` matched against the first "B", anchored the search, then
immediately failed to match against the second "B".

In contrast the pattern:

```text
A
..*
B
C
..?
```

will, through backtracing, successfully match the literal.

```text
A
..?
B
C
..*
D
E
```

There are two reasons why you should default to using `..?` rather than `..*`.
Most obviously `..?` does not backtrack and has linear performance. Less
obviously `..?` prevents literals from matching when they contain multiple
similar sequences. Informally, `..?` makes for more rigorous testing: `..?` can
be thought of as "the next thing that matches must look like X" whereas `..*`
says "skip things that are almost like X until you find something that is
definitely X".

Consider this pattern:

```text
A
..?
B
C
..?
```

This will match successfully against the literal:

```text
A
D
B
C
E
```

but fail to match against the literal:

```text
A
D
B
B
C
E
```

because the `..?` matched against the first "B", anchored the search, then
immediately failed to match against the second "B".

In contrast the pattern:

```text
A
..*
B
C
..?
```

will, through backtracing, successfully match the literal.

```text
A
..?
B
C
..*
D
E
```

There are two reasons why you should default to using `..?` rather than `..*`.
Most obviously `..?` does not backtrack and has linear performance. Less
obviously `..?` prevents literals from matching when they contain multiple
similar sequences. Informally, `..?` makes for more rigorous testing: `..?` can
be thought of as "the next thing that matches must look like X" whereas `..*`
says "skip things that are almost like X until you find something that is
definitely X".
@ltratt
Copy link
Member Author

ltratt commented May 17, 2024

Squashed.

@vext01
Copy link
Member

vext01 commented May 17, 2024

Are we merging? If so, mark it ready. It's currently a draft PR.

@ltratt ltratt marked this pull request as ready for review May 17, 2024 17:33
@ltratt
Copy link
Member Author

ltratt commented May 17, 2024

Oops, de-drafted!

@vext01 vext01 added this pull request to the merge queue May 17, 2024
Merged via the queue into softdevteam:master with commit e812719 May 17, 2024
2 checks passed
@ltratt ltratt deleted the interline_wildcards branch May 18, 2024 07:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants