Skip to content

Commit

Permalink
Update REPL multiline logic and improve identifier parse error messag…
Browse files Browse the repository at this point in the history
…es (#2824)

The REPL attempts to determine if a line continuation is needed to
create a valid formula. Today, it does this by using the Power Fx parser
and attempting to interpret the error messages, which turns out doesn't
work so well. It also isn't very smart about operators and doesn't
handle named formulas or UDFs.

For example, these would not continue:
- `IsMatch( "asdf", "(?x)` long regex best started on the next line
- `NamedFormula =`     long named formula best started on the next line
- `Func( x: Number ): Text =`   long udf, best started on the next line
- `1 + ` long operand, best started on the next line

The new logic is written by hand specifically to answer this
continuation question and does a much better job. It is always a best
guess and only determines when to send the formula to the parser/eval
loop - annoyance either being too conservative or liberal is all that
this controls. Even if a continuation is detected where there should not
be one, the user can always enter a blank line to conclude the
continuation. Not detecting a continuation will be annoying and should
result an issue being raised, but the user can always remove the extra
newline.

Yes, we'll need to add logic in two places if the parser changes
significantly, although that is not anticipated anytime soon. If this
becomes a burden long term, we can look at enhancing the mainline parser
to detect continuation situations, but at this point that feels like the
REPL tail wagging the Power Fx dog.

In the process of learning how identifiers were being parsed, I found
the error messages were very poor for an identifier parser error. I
didn't change the logic, just updated the messages and added some more
tests.
  • Loading branch information
gregli-msft authored Jan 23, 2025
1 parent 222fbe9 commit 8798153
Show file tree
Hide file tree
Showing 6 changed files with 438 additions and 43 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -692,6 +692,7 @@ internal static class TexlStrings
public static ErrorResourceKey ErrCannotCoerce_SourceType_TargetType = new ErrorResourceKey("ErrCannotCoerce_SourceType_TargetType");
public static ErrorResourceKey ErrNumberOrStringExpected = new ErrorResourceKey("ErrNumberOrStringExpected");
public static ErrorResourceKey ErrClosingBracketExpected = new ErrorResourceKey("ErrClosingBracketExpected");
public static ErrorResourceKey ErrClosingIdentifierExpected = new ErrorResourceKey("ErrClosingIdentifierExpected");
public static ErrorResourceKey ErrEmptyInvalidIdentifier = new ErrorResourceKey("ErrEmptyInvalidIdentifier");
public static ErrorResourceKey ErrIncompatibleTypes = new ErrorResourceKey("ErrIncompatibleTypes");
public static ErrorResourceKey ErrIncompatibleTypesForEquality_Left_Right = new ErrorResourceKey("ErrIncompatibleTypesForEquality_Left_Right");
Expand Down
8 changes: 4 additions & 4 deletions src/libraries/Microsoft.PowerFx.Core/Parser/TexlParser.cs
Original file line number Diff line number Diff line change
Expand Up @@ -1429,13 +1429,13 @@ private Identifier ParseIdentifier(Token at = null)
if (_curs.TidCur == TokKind.Ident)
{
tok = _curs.TokMove().As<IdentToken>();
if (tok.HasDelimiterStart && !tok.HasDelimiterEnd)
if (tok.IsModified)
{
PostError(tok, TexlStrings.ErrClosingBracketExpected);
PostError(tok, TexlStrings.ErrEmptyInvalidIdentifier);
}
else if (tok.IsModified)
else if (tok.HasDelimiterStart && !tok.HasDelimiterEnd)
{
PostError(tok, TexlStrings.ErrEmptyInvalidIdentifier);
PostError(tok, TexlStrings.ErrClosingIdentifierExpected);
}
}
else
Expand Down
266 changes: 242 additions & 24 deletions src/libraries/Microsoft.PowerFx.Repl/Services/MultilineProcessor.cs
Original file line number Diff line number Diff line change
Expand Up @@ -2,11 +2,13 @@
// Licensed under the MIT license.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
using System.Text;
using System.Text.RegularExpressions;
using Microsoft.PowerFx.Core.Localization;
using Microsoft.PowerFx.Core.Utils;
using Microsoft.PowerFx.Repl.Functions;
using Microsoft.PowerFx.Syntax;
using Microsoft.PowerFx.Types;
Expand All @@ -23,37 +25,251 @@ public class MultilineProcessor
// false if we're on a subsequent line.
public bool IsFirstLine => _commandBuffer.Length == 0;

// Return null if we need more input.
// else return string containing multiple lines together.
public virtual string HandleLine(string line, ParserOptions parserOptions)
// Recursively parses a formula to see if there are any open (, {, [, comments, strings, or if the formula ends with a unary prefix or binary operator
// Recursion happens for string interoplation
// An earlier version of this routine attempted to use the Power Fx parser directly, but interpreting erorr messages to determine continuation situations was not accurate enough
private int ParseFormulaToClose(int bufferIndex, bool closeOnCurly, ref bool complete, ref bool error)
{
_commandBuffer.AppendLine(line);
var brackets = new Stack<char>(); // stack of [, {, ( to ensure proper matching
var close = false; // exit the loop as we've found the closing } if cloneOnCurly is true
var leftOpen = false; // an identfier, string, or inline comment was left open when the end of the string was encountered
var lastOperator = false; // was the last non-whitespace, non-comment character an operator?

if (closeOnCurly)
{
brackets.Push('}');
}

for (; !close && !error && !leftOpen && bufferIndex < _commandBuffer.Length; bufferIndex++)
{
switch (_commandBuffer[bufferIndex])
{
// text string
case '"':
var stringInterpolation = bufferIndex > 0 && _commandBuffer[bufferIndex - 1] == '$';

for (bufferIndex++; bufferIndex < _commandBuffer.Length; bufferIndex++)
{
if (_commandBuffer[bufferIndex] == '"')
{
if (bufferIndex + 1 < _commandBuffer.Length && _commandBuffer[bufferIndex + 1] == '"')
{
// skip repeated quote
bufferIndex++;
}
else
{
// end delimiter reached
break;
}
}
else if (stringInterpolation && _commandBuffer[bufferIndex] == '{')
{
if (bufferIndex + 1 < _commandBuffer.Length && _commandBuffer[bufferIndex + 1] == '{')
{
// skip repeated {
bufferIndex++;
}
else
{
// recurse in for string interpolation island
bufferIndex = ParseFormulaToClose(bufferIndex + 1, true, ref complete, ref error);
}
}
}

// reached end of string before we found our ending delimiter
if (bufferIndex == _commandBuffer.Length)
{
leftOpen = true;
}

lastOperator = false;
break;

// delimited identifier, which can't span lines but may have other characters within that should be ignored
case '\'':
for (bufferIndex++; bufferIndex < _commandBuffer.Length; bufferIndex++)
{
if (_commandBuffer[bufferIndex] == '\'')
{
if (bufferIndex + 1 < _commandBuffer.Length && _commandBuffer[bufferIndex + 1] == '\'')
{
// skip repeated quote
bufferIndex++;
}
else
{
// end delimiter reached
break;
}
}
else if (CharacterUtils.IsTabulation(_commandBuffer[bufferIndex]) || CharacterUtils.IsLineTerm(_commandBuffer[bufferIndex]))
{
// invalid in identifier names
error = true;
break;
}
}

// reached end of string before we found our ending delimiter
if (bufferIndex == _commandBuffer.Length)
{
leftOpen = true;
}

lastOperator = false;
break;

// comments or division operator
case '/':
if (bufferIndex + 1 < _commandBuffer.Length)
{
if (_commandBuffer[bufferIndex + 1] == '/')
{
for (bufferIndex += 2; bufferIndex < _commandBuffer.Length && _commandBuffer[bufferIndex] != '\n' && _commandBuffer[bufferIndex] != '\r'; bufferIndex++)
{
}

// no leftOpen, the comment is closed by the end of the buffer without an error
}
else if (_commandBuffer[bufferIndex + 1] == '*')
{
for (bufferIndex += 2; bufferIndex + 1 < _commandBuffer.Length && !(_commandBuffer[bufferIndex] == '*' && _commandBuffer[bufferIndex + 1] == '/'); bufferIndex++)
{
}

// reached end of comment before we found our ending delimiter
if (bufferIndex + 1 >= _commandBuffer.Length)
{
leftOpen = true;
}
else
{
// skip past closing '/'
bufferIndex++;
}
}
else
{
// division operator
lastOperator = true;
}
}
else
{
// division operator at end of buffer
lastOperator = true;
}

// if it was indeed a comment and not division, lastOperator not updated, comments ignored
break;

// table notation
case '[':
brackets.Push(']');
lastOperator = false;
break;

// function parameters and grouping notation
case '(':
brackets.Push(')');
lastOperator = false;
break;

// record notation
case '{':
brackets.Push('}');
lastOperator = false;
break;

// closing notation
case ']':
case ')':
case '}':
if (lastOperator || brackets.Count == 0 || _commandBuffer[bufferIndex] != brackets.Pop())
{
error = true;
}

var commandBufferString = _commandBuffer.ToString();
// if no error, brackets.Count has been decremented after brackets.Pop for this closing bracket in the first test
else if (brackets.Count == 0 && _commandBuffer[bufferIndex] == '}' && closeOnCurly)
{
close = true;
}

// We use the parser results to determine if the command is complete or more lines are needed.
// The Engine features and parser options do not need to match what we will actually use,
// this just needs to be good enough to detect the errors below for multiline processing.
var result = Engine.Parse(commandBufferString, options: parserOptions);
lastOperator = false;
break;

// We will get this error, with this argument, if we are missing closing punctuation.
var missingClose = result.Errors.Any(error => error.MessageKey == "ErrExpectedFound_Ex_Fnd" &&
((TokKind)error.MessageArgs.Last() == TokKind.ParenClose ||
(TokKind)error.MessageArgs.Last() == TokKind.CurlyClose ||
(TokKind)error.MessageArgs.Last() == TokKind.BracketClose));
// whitespace
case ' ':
case '\t':
case '\n':
case '\r':
// lastOperator not updated, whitepace ignored
break;

// However, we will get false positives from the above if the statement is very malformed.
// For example: Mid("a", 2,) where the second error about ParenClose expected at the end is incorrect.
// In this case, more characters will not help and we should complete the command and report the errors with what we have.
var badToken = result.Errors.Any(error => error.MessageKey == "ErrBadToken");
// binary and unary prefix operators that can't end a formula and may well be continued on the next line
case '=':
case '>':
case '<':
case ':':
case '+':
case '-':
case '*': // division is handled above with comments
case '^':
case '&': // string concatenation and &&
case '|': // ||
case '!': // not and old property selector
lastOperator = true;
break;

// everything else
default:
lastOperator = false;
break;
}
}

complete &= brackets.Count == 0 && !leftOpen && !lastOperator;

return bufferIndex;
}

// Return null if we need more input.
// else return string containing multiple lines together.
public virtual string HandleLine(string line, ParserOptions parserOptions)
{
_commandBuffer.AppendLine(line);

var complete = true; // the buffer is complete, no longer continue, any open phrase will make this false
var error = false; // an error was encountered, no longer continue, overriding a false complete

// An empty line (possibly with spaces) will also finish the command.
// This is the ultimate escape hatch for the user if our closing detection logic above fails.
var emptyLine = Regex.IsMatch(commandBufferString, @"\n[ \t]*\r?\n$", RegexOptions.Multiline);
// This is the ultimate escape hatch if our closing detection logic above fails.
var emptyLine = line.Trim() == string.Empty;

if (!missingClose || badToken || emptyLine)
if (!emptyLine)
{
Clear();
if (parserOptions.TextFirst)
{
for (int bufferIndex = 0; complete && bufferIndex >= 0 && bufferIndex < _commandBuffer.Length; bufferIndex++)
{
if ((bufferIndex == 0 || _commandBuffer[bufferIndex - 1] != '$') && _commandBuffer[bufferIndex] == '$' && bufferIndex + 1 < _commandBuffer.Length && _commandBuffer[bufferIndex + 1] == '{')
{
bufferIndex = ParseFormulaToClose(bufferIndex + 2, true, ref complete, ref error);
}
}
}
else
{
ParseFormulaToClose(0, false, ref complete, ref error);
}
}

if (complete || error || emptyLine)
{
string commandBufferString = _commandBuffer.ToString();

// Removes one newline from the end (\r\n or just \n) for the enter provided by the user.
// Important for TextFirst lexer mode where a newline would be significant.
Expand All @@ -63,9 +279,11 @@ public virtual string HandleLine(string line, ParserOptions parserOptions)
}
else if (commandBufferString.EndsWith("\n", StringComparison.InvariantCulture))
{
commandBufferString = commandBufferString.Substring(0, commandBufferString.Length - 1);
commandBufferString = commandBufferString.Substring(0, _commandBuffer.Length - 1);
}


Clear();

return commandBufferString;
}
else
Expand Down
15 changes: 13 additions & 2 deletions src/strings/PowerFxResources.en-US.resx
Original file line number Diff line number Diff line change
Expand Up @@ -3705,12 +3705,23 @@
<value>Edit your formula so that it includes a bracket.</value>
<comment>1 How to fix the error. </comment>
</data>
<data name="ErrorResource_ErrClosingIdentifierExpected_ShortMessage" xml:space="preserve">
<value>Expected closing of identifier. We expect a closing (') at this point in the formula.</value>
<comment>Error Message.</comment>
</data>
<data name="ErrorResource_ErrClosingIdentifierExpected_LongMessage" xml:space="preserve">
<value>A closing single quote indicates the end of an identifier (for example, 'First Name').</value>
</data>
<data name="ErrorResource_ErrClosingIdentifierExpected_HowToFix_1" xml:space="preserve">
<value>Edit your formula so that it includes a closing single quote.</value>
<comment>1 How to fix the error. </comment>
</data>
<data name="ErrorResource_ErrEmptyInvalidIdentifier_ShortMessage" xml:space="preserve">
<value>The identifier has no valid text.</value>
<value>The identifier is empty or contains an invalid character, such as a tab or newline.</value>
<comment>Error Message.</comment>
</data>
<data name="ErrorResource_ErrEmptyInvalidIdentifier_HowToFix_1" xml:space="preserve">
<value>Ensure you have text for your identifier. This error occurs when the identifier is all blanks or spaces.</value>
<value>Ensure you have valid text in your your identifier. This error occurs when the identifier is blank, only has spaces, or contains a tab or newline.</value>
<comment>1 How to fix the error. </comment>
</data>
<data name="ErrUnOrderedTypeForComparison_Type" xml:space="preserve">
Expand Down
Loading

0 comments on commit 8798153

Please sign in to comment.