How do I get error messages from an ANTLR parser?

Problem description:

I wrote a grammar with ANTLR 4.4 like this:

grammar CSV;

file
  :  row+ EOF
  ;

row
  :  value (Comma value)* (LineBreak | EOF)
  ;

value
  :  SimpleValue
  |  QuotedValue
  ;

Comma
  :  ','
  ;

LineBreak
  :  '\r'? '\n'
  |  '\r'
  ;

SimpleValue
  :  ~(',' | '\r' | '\n' | '"')+
  ;

QuotedValue
  :  '"' ('""' | ~'"')* '"'
  ;

Then I used ANTLR 4.4 to generate the parser and lexer; this step succeeded.

After generating the classes, I wrote some Java code that uses the grammar:

import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.CommonTokenStream;

public class Main {

    public static void main(String[] args)
    {
        String source =  "\"a\",\"b\",\"c";
        CSVLexer lex = new CSVLexer(new ANTLRInputStream(source));
        CommonTokenStream tokens = new CommonTokenStream(lex);
        tokens.fill();
        CSVParser parser = new CSVParser(tokens);
        CSVParser.FileContext file = parser.file();
    }
}

All of the code above is a parser for CSV strings. The input in this example is "a","b","c (the closing quote of the last value is missing).

Console output:

line 1:8 token recognition error at: '"c'
line 1:10 missing {SimpleValue, QuotedValue} at '<EOF>'

I want to know how I can get these errors from a method (getErrors() or similar) in my code, rather than only as console output.

Can anyone help me?

Answer

Using ANTLR for CSV parsing is a nuclear option IMHO, but since you're at it...

Implement the interface ANTLRErrorListener. You may extend BaseErrorListener for that. Collect the errors and append them to a list.
Call parser.removeErrorListeners() to remove the default listener.
Call parser.addErrorListener(yourListenerInstance) to add your own listener.
Parse your input.
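
The steps above could be sketched roughly as follows. This is a minimal sketch, assuming the ANTLR 4 Java runtime is on the classpath. The class name CollectingErrorListener and its getErrors() accessor are hypothetical names chosen for illustration (they are not part of ANTLR), and the message format simply mimics the default console output.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.antlr.v4.runtime.BaseErrorListener;
import org.antlr.v4.runtime.RecognitionException;
import org.antlr.v4.runtime.Recognizer;

// Hypothetical helper (not part of ANTLR): stores each reported error
// in a list instead of printing it to the console.
class CollectingErrorListener extends BaseErrorListener {

    private final List<String> errors = new ArrayList<String>();

    // Called by the lexer or parser for every syntax error it reports.
    @Override
    public void syntaxError(Recognizer<?, ?> recognizer, Object offendingSymbol,
                            int line, int charPositionInLine,
                            String msg, RecognitionException e) {
        errors.add("line " + line + ":" + charPositionInLine + " " + msg);
    }

    // Read-only view of the messages collected so far.
    public List<String> getErrors() {
        return Collections.unmodifiableList(errors);
    }
}
```

In the Main class from the question, an instance would be attached with parser.removeErrorListeners() and parser.addErrorListener(listener) before calling parser.file(); afterwards listener.getErrors() holds the messages. The lexer offers the same two methods, so the same instance can be shared. Note that the question's code calls tokens.fill() before creating the parser, so a lexer listener would have to be attached before that call.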

Now, for the lexer, you may either do the same thing (removeErrorListeners/addErrorListener), or add the following rule at the end of the grammar:

UNKNOWN_CHAR : . ;

With this rule, the lexer will never fail (it will generate UNKNOWN_CHAR tokens when it can't do anything else), and all errors will be generated by the parser (because it won't know what to do with these UNKNOWN_CHAR tokens). I recommend this approach.
