ElasticSearch - defining a custom alphabetical order for sorting

Problem description:

I'm using ElasticSearch 2.4.2 (via HibernateSearch 5.7.1.Final from Java).

I have a problem with string sorting. The language of my application has diacritics, which have a specific alphabetic ordering. For example Ł goes directly after L, Ó goes after O, etc. So you are supposed to sort the strings like this:

 Dla
 Dła
 Doa
 Dóa
 Dza
 Eza
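
To make the expected ordering concrete, here is a minimal sketch that sorts the same words with the JDK's own `java.text.Collator` for the Polish locale. It is only an illustration of locale-aware ordering on the JVM and is unrelated to how ElasticSearch sorts; the class name `PolishOrderDemo` and the method `sortPolish` are made up for this example.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class PolishOrderDemo {

    // Returns a copy of the input sorted with the JVM's collation rules for Polish.
    static List<String> sortPolish(List<String> words) {
        Collator collator = Collator.getInstance(Locale.forLanguageTag("pl"));
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(collator); // Collator is a Comparator, so it can be passed directly
        return sorted;
    }

    public static void main(String[] args) {
        // Deliberately shuffled input; the printed list should match the order shown above.
        List<String> words = Arrays.asList("Eza", "Dza", "Dóa", "Doa", "Dła", "Dla");
        System.out.println(sortPolish(words));
    }
}
```

This is the ordering the sort field in the index would have to reproduce.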

ElasticSearch sorts by the typical letters first, and moves all the strange letters to the end:

 Dla
 Doa
 Dza
 Dła
 Dóa
 Eza

Can I add a custom letter ordering for ElasticSearch? Maybe there are some plugins for this? Do I need to write my own plugin? How do I start?

I found a plugin for the Polish language for ElasticSearch, but as I understand it, it is for analysis, and analysis is not a solution in my case, because it will ignore diacritics and leave words with L and Ł mixed:

 Dla
 Dłb
 Dlc

This would sometimes be acceptable, but it is not acceptable in my specific use case.

I would appreciate any help with this.

I've never used it, but there is a plugin that could fit your needs: the ICU collation plugin.

You will have to use the icu_collation token filter, which turns the tokens into collation keys. For that reason you will need to use a separate @Field (e.g. myField_sort) in Hibernate Search.
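
To illustrate what "collation keys" buys you, here is a small sketch using the JDK's `java.text.Collator` (not ICU, and not the plugin itself): each word is turned into a byte sequence, and comparing those bytes alone gives the locale's ordering. The idea behind indexing collation keys in a separate sort field is the same, though the actual key format produced by icu_collation is different. The names `CollationKeyDemo`, `keyBytes` and `sortByKeyBytes` are made up for this example, and `Arrays.compareUnsigned` assumes Java 9 or later.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class CollationKeyDemo {

    // Maps a word to the raw bytes of its collation key.
    static byte[] keyBytes(Collator collator, String word) {
        return collator.getCollationKey(word).toByteArray();
    }

    // Sorts words by comparing only their key bytes, never the original text.
    static List<String> sortByKeyBytes(List<String> words, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        Map<String, byte[]> keys = new HashMap<>();
        for (String word : words) {
            keys.put(word, keyBytes(collator, word));
        }
        List<String> sorted = new ArrayList<>(words);
        // Key bytes are most-significant-byte first, so compare them as unsigned values.
        sorted.sort((a, b) -> Arrays.compareUnsigned(keys.get(a), keys.get(b)));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Eza", "Dza", "Dóa", "Doa", "Dła", "Dla");
        System.out.println(sortByKeyBytes(words, Locale.forLanguageTag("pl")));
    }
}
```

Because the keys are opaque bytes rather than readable text, they are only useful for sorting, which is why the sort field has to be separate from the field you search or display.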

You can assign a specific analyzer to your field with @Field(name = "myField_sort", analyzer = @Analyzer(definition = "myCollationAnalyzer")), and define this analyzer (type, parameters) with something like this on one of your entities:

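import javax.persistence.Entity;
import javax.persistence.Id;
import org.hibernate.search.annotations.*;
import org.hibernate.search.elasticsearch.analyzer.ElasticsearchTokenFilterFactory;

@Entity
@Indexed
// Parameter values are passed to Elasticsearch as JSON, hence the single-quoted strings.
@AnalyzerDef(
    name = "myCollationAnalyzer",
    filters = {
        @TokenFilterDef(
            name = "polish_collation",
            factory = ElasticsearchTokenFilterFactory.class,
            params = {
                @Parameter(name = "type", value = "'icu_collation'"),
                @Parameter(name = "language", value = "'pl'")
            }
        )
    }
)
public class MyEntity {
    @Id
    private Long id;

    // Separate field holding collation keys, used only for sorting.
    @Field(name = "myField_sort", analyzer = @Analyzer(definition = "myCollationAnalyzer"))
    @SortableField(forField = "myField_sort")
    private String myField;
}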

See the documentation for more information: https://docs.jboss.org/hibernate/stable/search/reference/en-US/html_single/#_custom_analyzers

It's admittedly a bit clumsy right now, but analyzer configuration will get a bit cleaner in the next Hibernate Search version with normalizers and analyzer definition providers.

Note: as usual, your field will need to be declared as sortable (@SortableField(forField = "myField_sort")).