Problem re-reading a Lucene TokenStream

Problem description:

I am using Lucene 4.6, and am apparently unclear on how to reuse a TokenStream, because I get the exception:

java.lang.IllegalStateException: TokenStream contract violation: reset()/close() call missing, reset() called multiple times, or subclass does not call super.reset(). Please see Javadocs of TokenStream class for more information about the correct consuming workflow.

at the start of the second pass. I've read the Javadoc, but I'm still missing something. Here is a simple example that throws the above exception:

import java.io.StringReader;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;
import org.junit.Test;

@Test
public void list() throws Exception {
  String text = "here are some words";
  TokenStream ts = new StandardTokenizer(Version.LUCENE_46, new StringReader(text));
  listTokens(ts);
  listTokens(ts); // second pass throws the IllegalStateException
}

public static void listTokens(TokenStream ts) throws Exception {
  CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
  try {
    ts.reset();
    while (ts.incrementToken()) {
      System.out.println("token text: " + termAtt.toString());
    }
    ts.end();
  }
  finally {
    ts.close();
  }
}

I've tried not calling TokenStream.end() or TokenStream.close() thinking maybe they should only be called at the very end, but I get the same exception.

Can anyone make a suggestion?

The exception lists, as a possible cause, calling reset() multiple times, which is what you are doing. This is explicitly disallowed in the implementation of Tokenizer. Since the java.io.Reader API does not guarantee that all subclasses support the reset() operation, a Tokenizer cannot assume that the Reader passed in can be rewound, after all.

You may simply construct a new TokenStream, or I believe you could call Tokenizer.setReader(Reader) (in which case you certainly must close() it first).
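As a sketch of the second approach, against the Lucene 4.6 API: the stream is consumed with the usual reset()/incrementToken()/end()/close() workflow, then given fresh input via Tokenizer.setReader(Reader) before the next pass. (The variable must be typed as Tokenizer, not TokenStream, for setReader to be visible; the class and method names below are otherwise your own code from the question.)

```java
import java.io.StringReader;

import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class ReuseTokenizer {
  public static void main(String[] args) throws Exception {
    String text = "here are some words";
    // Declared as Tokenizer so that setReader(Reader) is accessible.
    Tokenizer ts = new StandardTokenizer(Version.LUCENE_46, new StringReader(text));

    listTokens(ts);                       // first pass, fully consumed and closed
    ts.setReader(new StringReader(text)); // hand the closed Tokenizer fresh input
    listTokens(ts);                       // reset() is legal again, so this pass works
  }

  static void listTokens(Tokenizer ts) throws Exception {
    CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
    try {
      ts.reset();
      while (ts.incrementToken()) {
        System.out.println("token text: " + termAtt.toString());
      }
      ts.end();
    } finally {
      ts.close(); // must be called before setReader() gives the stream new input
    }
  }
}
```

Note that close() is called at the end of each pass, before setReader(), which matches the "you certainly must close() it first" caveat above.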