-
Notifications
You must be signed in to change notification settings - Fork 310
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New 'completions' command, and various tab-completion fixes and improvements #649
base: main
Are you sure you want to change the base?
Conversation
We establish the readline-equivalent $input word for consistency, so that the $input value that we pass to processCallback() will be equivalent to the value passed by GNU Readline in the same situation. This is not strictly necessary for the present code, but may prove to be useful for subsequent enhancements.
ClassNamesMatcher was the only matcher that was using preg_quote(); all of the others mis-handled regexp special characters.
This will be used in place of tests for the T_STRING token, which is inherently unreliable for completion purposes, because an incomplete identifier can be any of ~70 different tokens.
Unless there are multiple token variables being acted upon, name the completion candidate $input, for consistency with the other Matchers.
We rejected plain '$' as $token, so don't allow self::T_VARIABLE. T_OPEN_TAG is fine, but was already handled.
A T_VARIABLE cannot be completed to a Keyword.
Don't assume that one command isn't the prefix of another. Do the cheap empty() test first.
…tched() methods tokenIsValidIdentifier($token, true) allows the empty string to match.
- Optimisation for ClassAttributesMatcher and ClassMethodsMatcher. - Also remove additional redundant type-checking, already catered for by '===' equality test.
This ensures that a $token which is valid neither as an identifier nor as a variable will only appear as a 'previous' token, and never as the token being actively completed. This reduces the variety of cases which the matchers need to care about. This also ensures that completing against whitespace does not attempt to complete the preceding token (which was already complete, and not what the user asked for).
Doing so prevents a CodeArgument from seeing that trailing whitespace, which in turn meant that the 'completions' command was unable to tell whether the user wanted to complete the previous token, or (after a space) an empty string.
Both indicate that whatever comes next is a new expression.
This includes blacklisting T_NEW and T_NS_SEPARATOR when not completing classes; and blacklisting T_OBJECT_OPERATOR and T_DOUBLE_COLON when not completing attributes/methods for objects and classes (respectively).
If we encounter a needCompleteClass() token, don't continue to prepend the tokens which preceded that.
ClassNamesMatcher::getMatches() passes the incomplete class, so we need to support an incomplete (potentially empty) final component.
Some Style CI issues to resolve there. I'm done for tonight, but I'll try to fix those tomorrow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heh. Sorry about the code style thrashing 😬
I'm a littttle bit worried that all these changes and bug fixes didn't break any tests.
On master
, it looks like fully a third of matchers have no test coverage at all :(
https://codecov.io/gh/bobthecow/psysh/tree/master/src/TabCompletion/Matcher
The remainder are in a decent place, but the ones that are < 100% tend to drop coverage for edge cases, so it wouldn't surprise me if we would have found more of the bugs you've uncovered if we'd tested empty states, etc.
For a change this size, I'd feel a lot better about things if (1) we made sure existing functionality was covered and passing on master
before this pull request, and (2) we added edge case tests for all the bugs fixed, and made sure they fail on master and work after this pull request.
* @return array | ||
*/ | ||
protected function getVariables() | ||
protected function getVariables($dollarPrefix = false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels like an implementation detail for the VariablesMatcher. Will anything else ever need it? Should that matcher do the prefixing itself?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure whether anything else will ever use it; but if they do, I think it would be nicer if it was as simple as passing an argument to the existing function, so I'd be inclined to leave it where it is. (I'm happy to change it if you disagree, though.)
…asMatched() methods
I've pushed some fixup commits (to be squashed later) to deal with the initial style issues. |
Travis says: Fatal error: Default value for parameters with a class type hint can only be NULL in /home/travis/build/bobthecow/psysh/src/TabCompletion/Matcher/AbstractMatcher.php on line 216
Codecov Report
@@ Coverage Diff @@
## master #649 +/- ##
============================================
- Coverage 68.34% 68.32% -0.03%
- Complexity 2287 2302 +15
============================================
Files 131 132 +1
Lines 5447 5566 +119
============================================
+ Hits 3723 3803 +80
- Misses 1724 1763 +39
Continue to review full report at Codecov.
|
Oh good grief. The "resolve conversation" button silently ignores the comment you entered for the purpose of closing the conversation? I'd actually commented on everything I closed :( |
@phil-s what's the status on this PR? |
You were right in that more tests are needed. I started looking at that (which highlighted some bugs), before a combination of work getting busy and myself getting slightly injured stalled my progress. I'll have to check where I got to -- probably some more WIP commits I could push. |
This branch implements the new command
completions
for issue #555,and also fixes most of the pre-existing problems I encountered while testing
the command and my wrapper. The fixes may well resolve some of the other
tab-completion issues which have been raised previously.
The command is not enabled by default; I'm enabling it in my config file by setting:
There are plenty of changes here, but I've kept the individual commits
very focused, so hopefully it's all fairly easy to review.
The
Psy\TabCompletion\Matcher
systems are doing smart things with PHPtokens, but are also regularly getting tripped up by them and failing to
offer any completions at all (while sometimes simultaneously treating huge
numbers of irrelevant thing as completion options in the background).
The completion process starts with:
And then generally the last token is popped from the array and checked to
see whether it could be a valid token for that Matcher's interests, and the
problem is that these tests consider a very limited set of tokens, which
then excludes the large number of PHP keywords which (a) have their own
token, and (b) are completely valid prefixes for an as-yet incomplete
identifier.
So for instance, if you defined a function
abstraction()
and then try touse tab completion to input that name, you can complete from "abstrac" and
"abstracti", but "abstract" gives you no results, because that's the token
T_ABSTRACT
, andFunctionsMatcher::hasMatched()
, like most of thematchers, ignores text which wasn't parsed as
T_STRING
.I think the following would be the current list of un-completable
identifiers, but obviously it's subject to change as the language evolves,
so maintaining a big whitelist doesn't seem like the way forward.
That's 67 tokens (which is nearly 50%). The remaining 72 are:
We need all the Matchers to stop caring whether or not the last token in the
list was parsed as
T_STRING
(in particular), because implicitly that tokenis not yet complete, and therefore the parser can't know what it's
supposed to be. They should ignore the token type, and instead just check
that text to see whether it would be valid as the prefix of an identifier,
which we can do by matching against the
CONSTANT_SYNTAX
regexp[1].In addition to the bug where no completions are supplied, the token
behaviour can also do something akin to the opposite, and cause certain
matchers to return everything they know about as a completion, whether or
not it matches the initial input.
This happens when a matcher's
getMatches()
method callsAbstractMatcher::getInput()
which firstly sets$var
to an empty string,but then (on account of tokens) never gets to change it to anything else.
getMatches()
then 'filters' All Of The Things It Knows About usingAbstractMatcher::startsWith()
, to find which of those things begins withan empty string. That method happily agrees that everything starts with
an empty string, and so All Of The Things are merged into the set of
completions.
You can observe this by adding
print_r($matches);
(or other logging) toAutoCompleter::processCallback()
right before it returns, and thenattempting completion.
var
or$var
for example.GNU Readline is masking this bug by not presenting any of the candidates
which do not actually start with the original incomplete word/prefix[2],
making it seem as if the right thing is happening behind the scenes; but the
new 'completions' command would list all of them, which is undesirable.
(We could make the command do its own additional filtering step, but it's
better to produce the correct set of candidates in the first place.)
Finally, there's a lot of complexity (and consequently some bugs) caused by
not normalising the token sequence to ensure that the Thing To Be Completed
is always the final token. At present, if the user effectively
tab-completes at an empty string, then we end up with a token sequence
ending with the previous (and already-complete) token. Some of the
matchers are attempting to handle this by looking for tokens-of-interest in
both the current and the previous token; but it greatly simplifies the
logic if we instead ensure that the token sequence is more consistent, so
that if we're completing an empty string then the token sequence simply ends
with an empty string 'token'.
[1] I do have one question about
CONSTANT_SYNTAX
andVAR_SYNTAX
:https://www.php.net/manual/en/language.variables.basics.php gives a
near-identical pattern (and suggests that this applies to all PHP
identifiers, rather than just variables); but your pattern includes the
character 0x7f (DEL), whereas the one in the manual starts that range from
0x80. I'm not sure whether this indicates a change/fix to the manual since
the code was written, or if you've intentionally added that DEL char to your
regexps (but I can't think why that would be).
The manual says:
So this may be a bug in the current regexps?
[2] What the incomplete word/prefix will actually be is another source of
confusion. AFAICS, PHP's readline support provides no control over the set
of characters which act as word breaks for the purpose of completing the
current 'word'; and the defaults we're stuck with produce some fairly
inconsistent behaviours for completing PHP code.
The current code was handling things appropriately, but there was absolutely
no documentation of the reasons why certain things were the way they were.
I ended up spending a bunch of time trying to make things more consistent
for the new 'completions' command, only to find that those changes didn't
work in readline itself.
This was a documentation problem in large part (and the new documentation
will be beneficial to anyone looking to hack on this code in the future),
but we also need to define those word break characters in the code so that
the 'completions' command will supply the same arguments that Readline would
supply to the callback function.