Elasticsearch using a synonym filter
I have the following documents:
- south africa
- north africa
I want to retrieve my "south africa" document from:
- s africa (a)
- southafrica (b)
- safrica (c)
I defined the following filters and analyzers:
POST test_index
{
  "settings": {
    "analysis": {
      "filter": {
        "synonym_filter": {
          "type": "synonym",
          "synonyms": [
            "south,s",
            "north,n"
          ]
        },
        "shingle_filter": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 3,
          "token_separator": ""
        }
      },
      "analyzer": {
        "my_shingle": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["shingle_filter"]
        },
        "my_shingle_synonym": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["shingle_filter", "synonym_filter"]
        },
        "my_synonym_shingle": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["synonym_filter", "shingle_filter"]
        }
      }
    }
  },
  "mappings": {}
}
1) With my_shingle, south africa will be indexed as south, southafrica, africa
2) With my_shingle_synonym, south africa will be indexed as south, s, southafrica, africa
3) With my_synonym_shingle, south africa will be indexed as south, souths, southsafrica, s, safrica, africa
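To make case 1 concrete, here is a rough Python sketch of what a shingle filter with min_shingle_size 2, max_shingle_size 3 and an empty token_separator does to a plain token stream. The function name `shingle` is made up for illustration, and the sketch ignores token positions, so it does not reproduce what happens once the synonym filter has stacked s on top of south (cases 2 and 3).

```python
def shingle(tokens, min_size=2, max_size=3, separator=""):
    """Emit each token followed by the shingles that start at that token.

    Simplified model: every token is assumed to sit at its own position,
    which is true for case 1 but not after a synonym filter has run.
    """
    out = []
    for i, token in enumerate(tokens):
        out.append(token)  # the original token (unigram) is kept
        for size in range(min_size, max_size + 1):
            if i + size <= len(tokens):
                # join `size` consecutive tokens with the configured separator
                out.append(separator.join(tokens[i:i + size]))
    return out


print(shingle(["south", "africa"]))
```

For the two tokens south and africa this gives the same three tokens as case 1.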
So:
(1) I will find b
(2) I will find a, b
(3) I will find a, c
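The three results above can be checked with a small Python sketch. It uses the token lists from the question as given and assumes a deliberately simple matching model: a query matches when every whitespace-separated term of the query is one of the indexed tokens. That model and the names `INDEXED`, `QUERIES` and `matching_queries` are assumptions for illustration, not how Elasticsearch scores queries.

```python
# Tokens each analyzer produces for "south africa", copied from the question.
INDEXED = {
    "my_shingle": {"south", "southafrica", "africa"},
    "my_shingle_synonym": {"south", "s", "southafrica", "africa"},
    "my_synonym_shingle": {"south", "souths", "southsafrica", "s", "safrica", "africa"},
}

# The three queries from the question, keyed by their labels.
QUERIES = {"a": "s africa", "b": "southafrica", "c": "safrica"}


def matching_queries(tokens):
    """Return the labels of the queries whose terms are all indexed tokens."""
    return [
        label
        for label, text in QUERIES.items()
        if all(term in tokens for term in text.split())
    ]


for analyzer, tokens in INDEXED.items():
    print(analyzer, matching_queries(tokens))
```

No single analyzer covers a, b and c at once under this model, which is the gap the question is about.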
I want south africa to be indexed as: south, s, southafrica, safrica, africa
You do not need to output every possible token to meet your requirement. You can solve the problem by using different analyzers on multi-fields.
You would define the mapping of your desired field like this:
"mappings": {
  "your_mapping": {
    "properties": {
      "name": {
        "type": "string",
        "analyzer": "my_shingle",
        "fields": {
          "synonym": {
            "type": "string",
            "analyzer": "my_synonym_shingle"
          }
        }
      }
    }
  }
}
Sample document to index:
PUT test_index/your_mapping/1
{
  "name": "south africa"
}
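As a rough check of why two fields are enough, the Python sketch below takes the token sets the question lists for my_shingle (field name) and my_synonym_shingle (field name.synonym) and tests each query against both. It assumes a simplified model in which a query matches a field when all of its whitespace-separated terms are indexed tokens, and a document is found when either field matches. The helper names are hypothetical and this is not the Elasticsearch scoring logic.

```python
# Tokens for "south africa" per field, taken from the question's own lists.
NAME_TOKENS = {"south", "southafrica", "africa"}  # name (my_shingle)
SYNONYM_TOKENS = {"south", "souths", "southsafrica", "s", "safrica", "africa"}  # name.synonym


def matches(query, tokens):
    """Assumed model: every whitespace-separated query term must be indexed."""
    return all(term in tokens for term in query.split())


def found_in_any_field(query):
    """A document counts as found when at least one of its fields matches."""
    return matches(query, NAME_TOKENS) or matches(query, SYNONYM_TOKENS)


for query in ["s africa", "southafrica", "safrica"]:
    print(query, found_in_any_field(query))
```

Under these assumptions southafrica is found through name, while s africa and safrica are found through name.synonym, so a search over both fields covers a, b and c.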