Is there an easy way to customise ANTLR4 error messages?

Question:

I'm currently working on my own grammar and would like to show specific error messages for NoViableAlternative, InputMismatch, UnwantedToken, MissingToken and LexerNoViableAltException.

I have already extended the Lexer class and overridden notifyListeners to replace the default error message "token recognition error at:" with my own. I have also extended DefaultErrorStrategy and overridden all of its report methods, such as reportNoViableAlternative, reportInputMismatch, reportUnwantedToken and reportMissingToken.

The purpose of all this is to change the messages that are passed to the syntaxError() method of the ANTLRErrorListener.

Here's a small example from the extended Lexer class:

    @Override
    public void notifyListeners(LexerNoViableAltException lexerNoViableAltException) {
        String text = this._input.getText(Interval.of(this._tokenStartCharIndex, this._input.index()));
        String msg = "Operator " + this.getErrorDisplay(text) + " is unknown.";
        ANTLRErrorListener listener = this.getErrorListenerDispatch();
        listener.syntaxError(this, null, this._tokenStartLine, this._tokenStartCharPositionInLine, msg,
            lexerNoViableAltException);
    }

Or for the DefaultErrorStrategy:

    @Override
    protected void reportNoViableAlternative(Parser recognizer, NoViableAltException noViableAltException) {
        TokenStream tokens = recognizer.getInputStream();
        String input;
        if (tokens != null) {
            if (noViableAltException.getStartToken().getType() == Token.EOF) {
                input = "<EOF>";
            } else {
                input = tokens.getText(noViableAltException.getStartToken(), noViableAltException.getOffendingToken());
            }
        } else {
            input = "<unknown input>";
        }

        String msg = "Invalid operation " + input + ".";
        recognizer.notifyErrorListeners(noViableAltException.getOffendingToken(), msg, noViableAltException);
    }

I then read the thread "Handling errors in ANTLR4" and was wondering whether there is an easier way to customise these messages.

Answer:

My strategy for improving the ANTLR4 error messages is a bit different. I override syntaxError in my error listeners (I have one each for the lexer and the parser). By using the values passed in, plus a few other things such as the LL1Analyzer, you can create pretty precise error messages. The lexer error listener's handling is pretty straightforward (hopefully the C++ code is understandable for you):

    void LexerErrorListener::syntaxError(Recognizer *recognizer, Token *, size_t line,
                                         size_t charPositionInLine, const std::string &, std::exception_ptr ep) {
      // The passed in string is the ANTLR generated error message which we want to improve here.
      // The token reference is always null in a lexer error.
      std::string message;
      try {
        std::rethrow_exception(ep);
      } catch (LexerNoViableAltException &) {
        Lexer *lexer = dynamic_cast<Lexer *>(recognizer);
        CharStream *input = lexer->getInputStream();
        std::string text = lexer->getErrorDisplay(input->getText(misc::Interval(lexer->tokenStartCharIndex, input->index())));
        if (text.empty())
          text = " "; // Should never happen.

        switch (text[0]) {
          case '/':
            message = "Unfinished multiline comment";
            break;
          case '"':
            message = "Unfinished double quoted string literal";
            break;
          case '\'':
            message = "Unfinished single quoted string literal";
            break;
          case '`':
            message = "Unfinished back tick quoted string literal";
            break;

          default:
            // Hex or bin string?
            if (text.size() > 1 && text[1] == '\'' && (text[0] == 'x' || text[0] == 'b')) {
              message = std::string("Unfinished ") + (text[0] == 'x' ? "hex" : "binary") + " string literal";
              break;
            }

            // Something else the lexer couldn't make sense of (likely there is no rule that accepts this input).
            message = "\"" + text + "\" is no valid input at all";
            break;
        }
        owner->addError(message, 0, lexer->tokenStartCharIndex, line, charPositionInLine,
                        input->index() - lexer->tokenStartCharIndex);
      }
    }

This code shows that we don't use the original message at all and instead examine the token text to see what's wrong. Here we mostly deal with unclosed comments and string literals.
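For readers working in Java, the classification step of the listener above can be sketched as a plain function. This is only an illustration under assumptions: the class name LexerMessages and the method describe are hypothetical, the ANTLR-specific parts (extracting the offending text from the char stream, reporting the error) are left out, and the text is assumed to be the raw input from the token start up to the current input index.

```java
public final class LexerMessages {
    private LexerMessages() {}

    /** Maps the text the lexer could not tokenize to a readable message. */
    public static String describe(String text) {
        if (text == null || text.isEmpty()) {
            text = " "; // Defensive default, mirroring the C++ version.
        }
        char first = text.charAt(0);
        switch (first) {
            case '/':
                return "Unfinished multiline comment";
            case '"':
                return "Unfinished double quoted string literal";
            case '\'':
                return "Unfinished single quoted string literal";
            case '`':
                return "Unfinished back tick quoted string literal";
            default:
                // Hex or binary string such as x'AB or b'01 without a closing quote.
                if (text.length() > 1 && text.charAt(1) == '\'' && (first == 'x' || first == 'b')) {
                    return "Unfinished " + (first == 'x' ? "hex" : "binary") + " string literal";
                }
                // Anything else: most likely no lexer rule accepts this input.
                return "\"" + text + "\" is no valid input at all";
        }
    }
}
```

Inside a Java error listener you would call such a function from syntaxError() when the passed exception is a LexerNoViableAltException, and ignore the message string ANTLR supplies.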

The parser error listener is much more complicated and too large to post here. It combines several sources to construct the actual error message:

Parser.getExpectedTokens(): uses the LL1Analyzer to get the next possible lexer tokens from a given position in the ATN (the so-called follow-set). Note, however, that it looks through predicates instead of evaluating them, which might be a problem if your grammar uses predicates.

Identifiers & keywords: certain keywords are often allowed as normal identifiers in specific situations. This produces follow-sets containing a list of keywords that are actually meant as identifiers, so an extra check is needed to avoid showing them as expected values.

Parser rule invocation stack: during the call to the error listener the parser still has the current parser rule context (Parser.getRuleContext()), which you can use to walk up the invocation stack and find rule contexts that give you more specific information about the error location (for example, walking up from a * match to a hypothetical expr rule tells you that an expression is actually expected at this point).

The given exception: if this is null, the error is about a single missing or unwanted token, which is pretty easy to handle. If the exception has a value, you can examine it for further details. Worth mentioning here is that the content of the exception is not used (it is pretty sparse anyway); instead we use the values that were collected previously. The most common exception types are NoViableAlt and InputMismatch, both of which you can translate to either "input is incomplete" when the error position is EOF, or something like "input is not valid at this position" otherwise. Both can then be enhanced with an expectation constructed from the rule invocation stack and/or the follow-set as mentioned (and shown in the image) above.
