How to correctly use Solr autocomplete on a copy field

Problem description:

I want to use autocomplete for the search engine on my site.

So, I have a field called shortdesc with the following definition:

<field name="shortdesc" type="text_de" indexed="true" stored="false" />

Field type:

<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index"> 
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LengthFilterFactory" min="3" max="20"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt" format="snowball" enablePositionIncrements="true"/>
        <filter class="solr.GermanNormalizationFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt" format="snowball" enablePositionIncrements="true"/>
        <filter class="solr.GermanNormalizationFilterFactory"/>
    </analyzer>
</fieldType>

To implement autocomplete, I need an extra field (field_autocomplete) into which I copy the field shortdesc. It is defined as follows (I don't need to retrieve data from this field):

<field name="field_autocomplete" type="text_autocomplete" indexed="true" stored="false" multiValued="true" />

And the type definition:

<fieldType name="text_autocomplete" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt" enablePositionIncrements="true" />
        <filter class="solr.GermanNormalizationFilterFactory"/>
        <filter class="solr.EnglishPossessiveFilterFactory"/>
        <filter class="solr.EnglishMinimalStemFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" enablePositionIncrements="true" />
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt" enablePositionIncrements="true" />
        <filter class="solr.GermanNormalizationFilterFactory"/>
        <filter class="solr.EnglishPossessiveFilterFactory"/>
        <filter class="solr.EnglishMinimalStemFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" enablePositionIncrements="true" />
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
</fieldType>

Then, to copy the field:

<copyField source="shortdesc" dest="field_autocomplete"/>

OK, now my first question:


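  • Currently, all the content of field_autocomplete (type text_autocomplete) is a copy of shortdesc. Does this mean that the shortdesc value is processed first and then copied to field_autocomplete? In that case, would I not need to repeat in text_autocomplete the filters that are already in text_de, since the source would arrive with those filters already applied? Is that correct, or do I have to specify all the filters (for every field I want to capture)?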

And another question:


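  • When I use the Analysis screen and enter a word that is in the stop word list for the field type text_de, the filter is applied and the word no longer appears:

    But when I do the same with text_autocomplete, the word seems to be there: the term is kept, and the stop filter does nothing...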

Can anybody give me a clue about these two things? They are driving me crazy.


  • You will need to redefine all the filters. The filters from the source field are not applied.

On copyField:

The original text is sent from the "source" field to the "dest" field, before any configured analyzers for the originating or destination field are invoked.
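Applied to the schema in the question, this can be illustrated with a minimal annotated sketch. The three definitions are the question's own; only the comments are added, and they describe the copy-before-analysis behavior quoted above.

```xml
<!-- shortdesc is analyzed by text_de (LengthFilter min="3", StopFilter with format="snowball", ...) -->
<field name="shortdesc" type="text_de" indexed="true" stored="false" />

<!-- field_autocomplete receives the ORIGINAL shortdesc input, not the tokens produced by text_de.
     Only the index analyzer of text_autocomplete runs on it, so none of text_de's filters
     (including LengthFilter) apply here; every filter this field needs must be declared
     in text_autocomplete itself. -->
<field name="field_autocomplete" type="text_autocomplete" indexed="true" stored="false" multiValued="true" />

<!-- The copy takes the raw value from the incoming document, before analysis,
     so it works even though shortdesc is not stored. -->
<copyField source="shortdesc" dest="field_autocomplete"/>
```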


  • The stop filters in text_autocomplete seem to be missing format="snowball" (text_de has it), which seems to be what makes the difference in the analysis.
    Also, it is usually recommended to use the same tokenizer and filters at both index and query time so that the indexed terms match the searched terms. So you may just want to check the configurations again.
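A sketch of text_autocomplete with these suggestions applied. It assumes that lang/stopwords_de.txt is the snowball-format file shipped with Solr, where each entry is followed by a `|` comment. Without format="snowball" those comments are likely not stripped, so entries such as "aber" would never match. It also assumes stopwords_en.txt is the plain one-word-per-line list, which keeps the default format. If your copies differ, adjust `format` accordingly. Apart from `format`, the query analyzer now uses the same tokenizer as the index analyzer and lowercases first. The EdgeNGram filter is deliberately left out of the query side, so the typed prefix is matched against the indexed grams.

```xml
<fieldType name="text_autocomplete" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- format="snowball": assumes the stock snowball-format German list -->
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt" format="snowball" enablePositionIncrements="true" />
        <filter class="solr.GermanNormalizationFilterFactory"/>
        <filter class="solr.EnglishPossessiveFilterFactory"/>
        <filter class="solr.EnglishMinimalStemFilterFactory"/>
        <!-- plain word list: default format -->
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" enablePositionIncrements="true" />
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <!-- same tokenizer and filter order as at index time -->
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt" format="snowball" enablePositionIncrements="true" />
        <filter class="solr.GermanNormalizationFilterFactory"/>
        <filter class="solr.EnglishPossessiveFilterFactory"/>
        <filter class="solr.EnglishMinimalStemFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" enablePositionIncrements="true" />
        <!-- no EdgeNGram here: the user's prefix is matched against the indexed grams -->
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
</fieldType>
```

Because the index-time analyzer changes, the core has to be reloaded and the documents reindexed before field_autocomplete reflects the new stop word handling.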