How do I get ANTLR parse error messages?

Problem description:

I wrote the following grammar with ANTLR 4.4:

grammar CSV;

file
  :  row+ EOF
  ;

row
  :  value (Comma value)* (LineBreak | EOF)
  ;

value
  :  SimpleValue
  |  QuotedValue
  ;

Comma
  :  ','
  ;

LineBreak
  :  '\r'? '\n'
  |  '\r'
  ;

SimpleValue
  :  ~(',' | '\r' | '\n' | '"')+
  ;

QuotedValue
  :  '"' ('""' | ~'"')* '"'
  ;

I then used ANTLR 4.4 to generate the parser and lexer, and that step succeeded.

After generating the classes, I wrote some Java code that uses the grammar:

import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.CommonTokenStream;

public class Main {

    public static void main(String[] args) {
        // Note: the last quoted value is missing its closing quote.
        String source = "\"a\",\"b\",\"c";
        CSVLexer lex = new CSVLexer(new ANTLRInputStream(source));
        CommonTokenStream tokens = new CommonTokenStream(lex);
        tokens.fill(); // run the lexer over the whole input up front
        CSVParser parser = new CSVParser(tokens);
        CSVParser.FileContext file = parser.file(); // parse from the start rule
    }
}

All of the code above is a parser for CSV strings, for example: "a","b",c

Console output:

line 1:8 token recognition error at: '"c'
line 1:10 missing {SimpleValue, QuotedValue} at '<EOF>'

I want to know how I can get these errors in code, from a method (getErrors() or similar), rather than only as text in the output window.

Can anyone help me?

Using ANTLR for CSV parsing is a nuclear option IMHO, but since you're at it...

  • Implement the interface ANTLRErrorListener. You may extend BaseErrorListener for that. Collect the errors and append them to a list.
  • Call parser.removeErrorListeners() to remove the default listeners.
  • Call parser.addErrorListener(yourListenerInstance) to add your own listener.
  • Parse your input.
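
The steps above could be sketched roughly as follows. This is only a minimal illustration, not the one true way to do it: the class name CollectingErrorListener and its getErrors() method are made up for this example and are not part of ANTLR. Only BaseErrorListener and its syntaxError callback come from the ANTLR 4 runtime.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.antlr.v4.runtime.BaseErrorListener;
import org.antlr.v4.runtime.RecognitionException;
import org.antlr.v4.runtime.Recognizer;

// Hypothetical helper (the name is not part of ANTLR): records each
// syntax error in a list instead of printing it to the console.
class CollectingErrorListener extends BaseErrorListener {

    private final List<String> errors = new ArrayList<String>();

    @Override
    public void syntaxError(Recognizer<?, ?> recognizer, Object offendingSymbol,
                            int line, int charPositionInLine,
                            String msg, RecognitionException e) {
        // Mirrors the layout of the default console output:
        // "line <line>:<column> <message>".
        errors.add("line " + line + ":" + charPositionInLine + " " + msg);
    }

    // Read-only view of the errors collected so far.
    public List<String> getErrors() {
        return Collections.unmodifiableList(errors);
    }
}
```

With the Main class from the question, you would create one instance, call parser.removeErrorListeners() and parser.addErrorListener(listener) before parser.file(), and read listener.getErrors() afterwards. Since the lexer is a Recognizer too, the same instance can be registered on lex in the same way, as long as that happens before tokens.fill() runs the lexer.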

Now, for the lexer, you may either do the same thing (removeErrorListeners/addErrorListener), or add the following rule at the end of the grammar, so that it only matches when no other lexer rule does:

UNKNOWN_CHAR : . ;

With this rule, the lexer will never fail: it generates an UNKNOWN_CHAR token whenever it can't do anything else, so all errors are reported by the parser (because it won't know what to do with these UNKNOWN_CHAR tokens). I recommend this approach.