elasticsearch之快速上手 一、elasticsearch的简单操作 二、elasticsearch的CURD 三、elasticsearch之查询的两种方式 四、elasticsearch - term和match 五、elasticsearch之排序查询 六、elasticsearch之分页查询 七、elasticsearch之布尔查询 八、elasticsearch之查询结果过滤 九、elasticsearch之高亮查询 十、elasticsearch之聚合函数
前言
现在,让我们启动一个节点和kibana。
接下来的一切操作都在kibana
中Dev Tools
下的Console
里完成。
创建一篇文档
现在,我们试图将小黑的小姨妈的个人信息录入elasticsearch。我们只要输入:
1
{
"name": "小黑的小姨妈",
"age": 18
}
PUT表示创建命令。虽然命令可以小写,但是我们推荐大写。在以REST ful
风格返回的结果中:
{
"_index" : "t1",
"_type" : "type1",
"_id" : "1",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 1
}
结果中的result
则是操作类型,现在是created
,表示第一次创建。如果我们再次点击执行该命令,那么result
则会是updated
。我们细心则会发现_version
开始是1,现在你每点击一次就会增加一次。表示第几次更改。
查询所有索引
现在,我们再来学习一条命令:
GET _cat/indices?v
返回的结果如下图:
上图中,展示当前集群中索引情况,包括,索引的健康状况、UUID、主副分片个数、大小等信息。你发现我们创建的t1
索引了吗?
查询指定的索引信息
我们来单独看看t1
索引:
GET t1
返回的结果如下:
{
"t1" : {
"aliases" : { },
"mappings" : {
"doc" : {
"properties" : {
"age" : {
"type" : "long"
},
"name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
},
"settings" : {
"index" : {
"creation_date" : "1553163739688",
"number_of_shards" : "5",
"number_of_replicas" : "1",
"uuid" : "_7jNW5XATheeK84zKkPwlw",
"version" : {
"created" : "6050499"
},
"provided_name" : "t1"
}
}
}
}
返回了t1
索引的创建信息。
查询文档信息
那我们来查看我们刚才创建的那篇文档:
1
返回的结果如下:
{
"_index" : "t1",
"_type" : "doc",
"_id" : "1",
"_version" : 2,
"found" : true,
"_source" : {
"name" : "小黑的小姨妈",
"age" : 18
}
}
返回了我们刚才创建的文档信息。
我们再来为小黑添加两个姨妈:
2
{
"name": "小黑的二姨妈",
"age": 16
}
PUT t1/doc/3
{
"name": "小黑的三姨妈",
"age": 19
}
刚才,我们学会了查询小黑的一个姨妈,那么该如何查询所有姨妈呢?
GET t1/doc/_search
返回结果如下:
{
"took" : 7,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 1.0,
"hits" : [
{
"_index" : "t1",
"_type" : "doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"name" : "小黑的二姨妈",
"age" : 16
}
},
{
"_index" : "t1",
"_type" : "doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"name" : "小黑的小姨妈",
"age" : 18
}
},
{
"_index" : "t1",
"_type" : "doc",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"name" : "小黑的三姨妈",
"age" : 19
}
}
]
}
}
现在小黑跟他的姨妈们闹了别扭,就想删除这个姨妈,该怎么办呢?
删除指定索引
我们其实直接删除这个t1
索引就可以了:
DELETE /t1
DELETE 是删除命令,返回结果如下:
{
"acknowledged" : true
}
返回结果提示删除确认成功。
如果此时再查询索引情况,则会发现t1
已经不存在了,所有的文档也就不存在了。
二、elasticsearch的CURD
#
我们之前已经学过了elasticsearch的简单操作了。
接下来,洒家要给大家讲述一个真实的故事..........
故事一定是要伴随着赵忠祥老师的声音作为开始,雨季就要来临了,又到了动物们发情的季节了......
#
《知否知否,应是绿肥红瘦之改编》,编剧:张开
让我们将镜头切换到北宋时期某位官人的府邸,府里男主人是:
1
{
"name":"顾老二",
"age":30,
"from": "gu",
"desc": "皮肤黑、武器长、性格直",
"tags": ["黑", "长", "直"]
}
他明处貌似还有俩老婆:
2
{
"name":"大娘子",
"age":18,
"from":"sheng",
"desc":"肤白貌美,娇憨可爱",
"tags":["白", "富","美"]
}
PUT zhifou/doc/3
{
"name":"龙套偏房",
"age":22,
"from":"gu",
"desc":"mmp,没怎么看,不知道怎么形容",
"tags":["造数据", "真","难"]
}
家里红旗不倒,家外彩旗飘摇:
4
{
"name":"石头",
"age":29,
"from":"gu",
"desc":"粗中有细,狐假虎威",
"tags":["粗", "大","猛"]
}
PUT zhifou/doc/5
{
"name":"魏行首",
"age":25,
"from":"广云台",
"desc":"仿佛兮若轻云之蔽月,飘飘兮若流风之回雪,mmp,最后竟然没有嫁给顾老二!",
"tags":["闭月","羞花"]
}
注意:当执行PUT
命令时,如果数据不存在,则新增该条数据,如果数据存在则修改该条数据。
咱们通过GET
命令查询一下:
1
结果如下:
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "1",
"_version" : 1,
"found" : true,
"_source" : {
"name" : "顾老二",
"age" : 30,
"from" : "gu",
"desc" : "皮肤黑、武器长、性格直",
"tags" : [
"黑",
"长",
"直"
]
}
}
查询也没啥问题,但是你可能说了,人家老二是黄种人,怎么是黑的呢?好吧咱改改desc
和tags
:
1
{
"desc":"皮肤很黄,武器很长,性格很直",
"tags":["很黄","很长", "很直"]
}
上例,我们仅修改了desc
和tags
两处,而name
、age
和from
三个属性没有变化,我们可以忽略不写吗?查查看:
1
结果如下:
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "1",
"_version" : 3,
"found" : true,
"_source" : {
"desc" : "皮肤很黄,武器很长,性格很直",
"tags" : [
"很黄",
"很长",
"很直"
]
}
}
哎呀,出事故了!修改是修改了,但结果不太理想啊,因为name
、age
和from
属性都没啦!
注意:PUT
命令,在做修改操作时,如果未指定其他的属性,则按照指定的属性进行修改操作。也就是如上例所示的那样,我们修改时只修改了desc
和tags
两个属性,其他的属性并没有一起添加进去。
很明显,这是病!dai治!怎么治?上车,咱们继续往下走!
#
让我们首先恢复一下事故现场:
1
{
"name":"顾老二",
"age":30,
"from": "gu",
"desc": "皮肤黑、武器长、性格直",
"tags": ["黑", "长", "直"]
}
我们要将黑修改成黄:
1/_update
{
"doc": {
"desc": "皮肤很黄,武器很长,性格很直",
"tags": ["很黄","很长", "很直"]
}
}
上例中,我们使用POST
命令,在id
后面跟_update
,要修改的内容放到doc
文档(属性)中即可。
我们再来查询一次:
1
结果如下:
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "1",
"_version" : 5,
"found" : true,
"_source" : {
"name" : "顾老二",
"age" : 30,
"from" : "gu",
"desc" : "皮肤很黄,武器很长,性格很直",
"tags" : [
"很黄",
"很长",
"很直"
]
}
}
结果如上例所示,现在其他的属性没有变化,只有desc
和tags
属性被修改。
注意:POST
命令,这里可用来执行修改操作(还有其他的功能),POST
命令配合_update
完成修改操作,指定修改的内容放到doc
中。
写了这么多,我也发现我上面有讲的不对
的地方——石头不是跟顾老二不清不楚,石头是跟小桃不清不楚!好吧,刚才那个数据是一个错误示范!我们这就把它干掉!
#
4
很简单,通过DELETE
命令,就可以删除掉那个错误示范了!
删除效果如下:
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "4",
"_version" : 4,
"result" : "deleted",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 4,
"_primary_term" : 1
}
我们再来查询一遍:
4
结果如下:
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "4",
"found" : false
}
上例中,found:false
表示查询数据不存在。
#
我们上面已经不知不觉的使用熟悉这种简单查询方式,通过 GET
命令查询指定文档:
1
结果如下:
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "1",
"_version" : 5,
"found" : true,
"_source" : {
"name" : "顾老二",
"age" : 30,
"from" : "gu",
"desc" : "皮肤很黄,武器很长,性格很直",
"tags" : [
"很黄",
"很长",
"很直"
]
}
}
查询没那么简单,预知后事如何,请听下回分解:复杂查询
that's all
三、elasticsearch之查询的两种方式
#
简单的没挑战,来点复杂的,比如查看来自顾家的都有哪些人怎么查呢?elasticsearch提供两种查询方式:
- 查询字符串(query string),简单查询,就像是像传递URL参数一样去传递查询语句,被称为简单搜索或查询字符串(query string)搜索。
- 另外一种是通过DSL语句来进行查询,被称为DSL查询(Query DSL),DSL是Elasticsearch提供的一种丰富且灵活的查询语言,该语言以json请求体的形式出现,通过restful请求与Elasticsearch进行交互。
#
1
{
"name":"顾老二",
"age":30,
"from": "gu",
"desc": "皮肤黑、武器长、性格直",
"tags": ["黑", "长", "直"]
}
PUT zhifou/doc/2
{
"name":"大娘子",
"age":18,
"from":"sheng",
"desc":"肤白貌美,娇憨可爱",
"tags":["白", "富","美"]
}
PUT zhifou/doc/3
{
"name":"龙套偏房",
"age":22,
"from":"gu",
"desc":"mmp,没怎么看,不知道怎么形容",
"tags":["造数据", "真","难"]
}
PUT zhifou/doc/4
{
"name":"石头",
"age":29,
"from":"gu",
"desc":"粗中有细,狐假虎威",
"tags":["粗", "大","猛"]
}
PUT zhifou/doc/5
{
"name":"魏行首",
"age":25,
"from":"广云台",
"desc":"仿佛兮若轻云之蔽月,飘飘兮若流风之回雪,mmp,最后竟然没有嫁给顾老二!",
"tags":["闭月","羞花"]
}
#
/doc/_search?q=from:gu
还是使用GET
命令,通过_serarch
查询,查询条件是什么呢?条件是from
属性是gu
家的人都有哪些。最后,别忘了_search
和from
属性中间的英文分隔符?
。
结果如下:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 0.6931472,
"hits" : [
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "4",
"_score" : 0.6931472,
"_source" : {
"name" : "石头",
"age" : 29,
"from" : "gu",
"desc" : "粗中有细,狐假虎威",
"tags" : [
"粗",
"大",
"猛"
]
}
},
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "1",
"_score" : 0.2876821,
"_source" : {
"name" : "顾老二",
"age" : 30,
"from" : "gu",
"desc" : "皮肤黑、武器长、性格直",
"tags" : [
"黑",
"长",
"直"
]
}
},
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "3",
"_score" : 0.2876821,
"_source" : {
"name" : "龙套偏房",
"age" : 22,
"from" : "gu",
"desc" : "mmp,没怎么看,不知道怎么形容",
"tags" : [
"造数据",
"真",
"难"
]
}
}
]
}
}
我们来重点说下hits
,hits
是返回的结果集——所有from
属性为gu
的结果集。重点中的重点是_score
得分,得分是什么呢?根据算法算出跟查询条件的匹配度,匹配度高得分就高。后面再说这个算法是怎么回事。
#
我们现在使用DSL方式,来完成刚才的查询,查看来自顾家的都有哪些人。
/doc/_search
{
"query": {
"match": {
"from": "gu"
}
}
}
上例,查询条件是一步步构建出来的,将查询条件添加到match
中即可,而match
则是查询所有from
字段的值中含有gu
的结果就会返回。
当然结果没啥变化:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 0.6931472,
"hits" : [
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "4",
"_score" : 0.6931472,
"_source" : {
"name" : "石头",
"age" : 29,
"from" : "gu",
"desc" : "粗中有细,狐假虎威",
"tags" : [
"粗",
"大",
"猛"
]
}
},
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "1",
"_score" : 0.2876821,
"_source" : {
"name" : "顾老二",
"age" : 30,
"from" : "gu",
"desc" : "皮肤黑、武器长、性格直",
"tags" : [
"黑",
"长",
"直"
]
}
},
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "3",
"_score" : 0.2876821,
"_source" : {
"name" : "龙套偏房",
"age" : 22,
"from" : "gu",
"desc" : "mmp,没怎么看,不知道怎么形容",
"tags" : [
"造数据",
"真",
"难"
]
}
}
]
}
}
see also:[Elasticsearch查询规则(一)match和term](https://www.jianshu.com/p/eb30eee13923) that's all
四、elasticsearch - term和match
前言
现在,是时候学习两种最常用的查询方法了,match和term了。
车速太快,系好安全带,睁大眼,不要在前进的道路上迷失了!
match查询
准备数据
1
{
"name":"顾老二",
"age":30,
"from": "gu",
"desc": "皮肤黑、武器长、性格直",
"tags": ["黑", "长", "直"]
}
PUT zhifou/doc/2
{
"name":"大娘子",
"age":18,
"from":"sheng",
"desc":"肤白貌美,娇憨可爱",
"tags":["白", "富","美"]
}
PUT zhifou/doc/3
{
"name":"龙套偏房",
"age":22,
"from":"gu",
"desc":"mmp,没怎么看,不知道怎么形容",
"tags":["造数据", "真","难"]
}
PUT zhifou/doc/4
{
"name":"石头",
"age":29,
"from":"gu",
"desc":"粗中有细,狐假虎威",
"tags":["粗", "大","猛"]
}
PUT zhifou/doc/5
{
"name":"魏行首",
"age":25,
"from":"广云台",
"desc":"仿佛兮若轻云之蔽月,飘飘兮若流风之回雪,mmp,最后竟然没有嫁给顾老二!",
"tags":["闭月","羞花"]
}
match系列之match(按条件查询)
我们查看来自顾家的都有哪些人。
/doc/_search
{
"query": {
"match": {
"from": "gu"
}
}
}
上例,查询条件是一步步构建出来的,将查询条件添加到match
中即可,而match
则是查询所有from
字段的值中含有gu
的结果就会返回。
结果如下:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 0.6931472,
"hits" : [
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "4",
"_score" : 0.6931472,
"_source" : {
"name" : "石头",
"age" : 29,
"from" : "gu",
"desc" : "粗中有细,狐假虎威",
"tags" : [
"粗",
"大",
"猛"
]
}
},
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "1",
"_score" : 0.2876821,
"_source" : {
"name" : "顾老二",
"age" : 30,
"from" : "gu",
"desc" : "皮肤黑、武器长、性格直",
"tags" : [
"黑",
"长",
"直"
]
}
},
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "3",
"_score" : 0.2876821,
"_source" : {
"name" : "龙套偏房",
"age" : 22,
"from" : "gu",
"desc" : "mmp,没怎么看,不知道怎么形容",
"tags" : [
"造数据",
"真",
"难"
]
}
}
]
}
}
match系列之match_all(查询全部)
除了按条件查询之外,我们还可以查询zhifou
索引下的doc
类型中的所有文档,那就是查询全部:
/doc/_search
{
"query": {
"match_all": {}
}
}
match_all
的值为空,表示没有查询条件,那就是查询全部。就像select * from table_name
一样。
查询结果如下:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 5,
"max_score" : 1.0,
"hits" : [
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "5",
"_score" : 1.0,
"_source" : {
"name" : "魏行首",
"age" : 25,
"from" : "广云台",
"desc" : "仿佛兮若轻云之蔽月,飘飘兮若流风之回雪,mmp,最后竟然没有嫁给顾老二!",
"tags" : [
"闭月",
"羞花"
]
}
},
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"name" : "大娘子",
"age" : 18,
"from" : "sheng",
"desc" : "肤白貌美,娇憨可爱",
"tags" : [
"白",
"富",
"美"
]
}
},
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "4",
"_score" : 1.0,
"_source" : {
"name" : "石头",
"age" : 29,
"from" : "gu",
"desc" : "粗中有细,狐假虎威",
"tags" : [
"粗",
"大",
"猛"
]
}
},
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"name" : "顾老二",
"age" : 30,
"from" : "gu",
"desc" : "皮肤黑、武器长、性格直",
"tags" : [
"黑",
"长",
"直"
]
}
},
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"name" : "龙套偏房",
"age" : 22,
"from" : "gu",
"desc" : "mmp,没怎么看,不知道怎么形容",
"tags" : [
"造数据",
"真",
"难"
]
}
}
]
}
}
返回的是zhifou
索引下doc
类型的所有文档!
match系列之match_phrase(短语查询)
我们现在已经对match有了基本的了解,match查询的是散列映射,包含了我们希望搜索的字段和字符串。也就说,只要文档中只要有我们希望的那个关键字,但也因此带来了一些问题。
首先来创建一些示例:
1
{
"title": "中国是世界上人口最多的国家"
}
PUT t1/doc/2
{
"title": "美国是世界上军事实力最强大的国家"
}
PUT t1/doc/3
{
"title": "北京是中国的首都"
}
现在,当我们以中国
作为搜索条件,我们希望只返回和中国
相关的文档。我们首先来使用match
查询:
/doc/_search
{
"query": {
"match": {
"title": "中国"
}
}
}
结果如下:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 0.68324494,
"hits" : [
{
"_index" : "t1",
"_type" : "doc",
"_id" : "1",
"_score" : 0.68324494,
"_source" : {
"title" : "中国是世界上人口最多的国家"
}
},
{
"_index" : "t1",
"_type" : "doc",
"_id" : "3",
"_score" : 0.5753642,
"_source" : {
"title" : "北京是中国的首都"
}
},
{
"_index" : "t1",
"_type" : "doc",
"_id" : "2",
"_score" : 0.39556286,
"_source" : {
"title" : "美国是世界上军事实力最强大的国家"
}
}
]
}
}
虽然如期的返回了中国
的文档。但是却把和美国
的文档也返回了,这并不是我们想要的。是怎么回事呢?因为这是elasticsearch在内部对文档做分词的时候,对于中文来说,就是一个字一个字分的,所以,我们搜中国
,中
和国
都符合条件,返回,而美国的国
也符合。
而我们认为中国
是个短语,是一个有具体含义的词。所以elasticsearch在处理中文分词方面比较弱势。后面会讲针对中文的插件。
但目前我们还有办法解决,那就是使用短语查询:
/doc/_search
{
"query": {
"match_phrase": {
"title": {
"query": "中国"
}
}
}
}
这里match_phrase
是在文档中搜索指定的词组,而中国
则正是一个词组,所以愉快的返回了。
那么,现在我们要想搜索中国
和世界
相关的文档,但又忘记其余部分了,怎么做呢?用match
也不行,那就继续用match_phrase
试试:
/doc/_search
{
"query": {
"match_phrase": {
"title": "中国世界"
}
}
}
返回结果也是空的,因为没有中国世界
这个短语。
我们搜索中国
和世界
这两个指定词组时,但又不清楚两个词组之间有多少别的词间隔。那么在搜的时候就要留有一些余地。这时就要用到了slop
了。相当于正则中的中国.*?世界
。这个间隔默认为0,导致我们刚才没有搜到,现在我们指定一个间隔。
/doc/_search
{
"query": {
"match_phrase": {
"title": {
"query": "中国世界",
"slop": 2
}
}
}
}
现在,两个词组之间有了2个词的间隔,这个时候,就可以查询到结果了:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.7445889,
"hits" : [
{
"_index" : "t1",
"_type" : "doc",
"_id" : "1",
"_score" : 0.7445889,
"_source" : {
"title" : "中国是世界上人口最多的国家"
}
}
]
}
}
slop
间隔你可以根据需要适当改动。
match系列之match_phrase_prefix(最左前缀查询)
现在凌晨2点半,单身狗小黑为了缓解寂寞,就准备搜索几个beautiful girl
来陪伴自己。但是由于英语没过2级,但单词beautiful
拼到bea
就不知道往下怎么拼了。这个时候,我们的智能搜索要帮他啊,elasticsearch就看自己的词库有啥事bea
开头的词,结果还真发现了两个:
1
{
"title": "maggie",
"desc": "beautiful girl you are beautiful so"
}
PUT t3/doc/2
{
"title": "sun and beach",
"desc": "I like basking on the beach"
}
但这里用match
和match_phrase
都不太合适,因为小黑输入的不是完整的词。那怎么办呢?我们用match_phrase_prefix
来搞:
/doc/_search
{
"query": {
"match_phrase_prefix": {
"desc": "bea"
}
}
}
结果如下:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 0.39556286,
"hits" : [
{
"_index" : "t3",
"_type" : "doc",
"_id" : "1",
"_score" : 0.39556286,
"_source" : {
"title" : "maggie",
"desc" : "beautiful girl,you are beautiful so"
}
},
{
"_index" : "t3",
"_type" : "doc",
"_id" : "2",
"_score" : 0.2876821,
"_source" : {
"title" : "sun and beach",
"desc" : "I like basking on the beach"
}
}
]
}
}
前缀查询是短语查询类似,但前缀查询可以更进一步的搜索词组,只不过它是和词组中最后一个词条进行前缀匹配(如搜这样的you are bea
)。应用也非常的广泛,比如搜索框的提示信息,当使用这种行为进行搜索时,最好通过max_expansions
来设置最大的前缀扩展数量,因为产生的结果会是一个很大的集合,不加限制的话,影响查询性能。
/doc/_search
{
"query": {
"match_phrase_prefix": {
"desc": {
"query": "bea",
"max_expansions": 1
}
}
}
}
但是,如果此时你去尝试加上max_expansions
测试后,你会发现并没有如你想想的一样,仅返回一条数据,而是返回了多条数据。max_expansions
执行的是搜索的编辑(Levenshtein)距离。那什么是编辑距离呢?编辑距离是一种计算两个字符串间的差异程度的字符串度量(string metric)。我们可以认为编辑距离就是从一个字符串修改到另一个字符串时,其中编辑单个字符(比如修改、插入、删除)所需要的最少次数。俄罗斯科学家Vladimir Levenshtein于1965年提出了这一概念。
我们再引用elasticsearch官网的一段话:该max_expansions设置定义了在停止搜索之前模糊查询将匹配的最大术语数,也可以对模糊查询的性能产生显着影响。但是,减少查询字词会产生负面影响,因为查询提前终止可能无法找到某些有效结果。重要的是要理解max_expansions查询限制在分片级别工作,这意味着即使设置为1,多个术语可能匹配,所有术语都来自不同的分片。此行为可能使其看起来好像max_expansions没有生效,因此请注意,计算返回的唯一术语不是确定是否有效的有效方法max_expansions。。
我想你也没看懂这句话是啥意思,但我们只需知道该参数工作于分片层,也就是Lucene部分,超出我们的研究范围了。
我们快刀斩乱麻的记住,使用前缀查询会非常的影响性能,要对结果集进行限制,就加上这个参数。
match系列之multi_match(多字段查询)
现在,我们有一个50个字段的索引,我们要在多个字段中查询同一个关键字,该怎么做呢?
1
{
"title": "maggie is beautiful girl",
"desc": "beautiful girl you are beautiful so"
}
PUT t3/doc/2
{
"title": "beautiful beach",
"desc": "I like basking on the beach,and you? beautiful girl"
}
我们先用原来的方法查询:
/doc/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"title": "beautiful"
}
},
{
"match": {
"desc": "beautiful"
}
}
]
}
}
}
使用must
来限制两个字段(值)中必须同时含有关键字。这样虽然能达到目的,但是当有很多的字段呢,我们可以用multi_match
来做:
/doc/_search
{
"query": {
"multi_match": {
"query": "beautiful",
"fields": ["title", "desc"]
}
}
}
我们将多个字段放到fields
列表中即可。以达到匹配多个字段的目的。
除此之外,multi_match
甚至可以当做match_phrase
和match_phrase_prefix
使用,只需要指定type
类型即可:
/doc/_search
{
"query": {
"multi_match": {
"query": "gi",
"fields": ["title"],
"type": "phrase_prefix"
}
}
}
GET t3/doc/_search
{
"query": {
"multi_match": {
"query": "girl",
"fields": ["title"],
"type": "phrase"
}
}
}
小结:
- match:返回所有匹配的分词。
- match_all:查询全部。
- match_phrase:短语查询,在match的基础上进一步查询词组,可以指定
slop
分词间隔。 - match_phrase_prefix:前缀查询,根据短语中最后一个词组做前缀匹配,可以应用于搜索提示,但注意和
max_expanions
搭配。其实默认是50....... - multi_match:多字段查询,使用相当的灵活,可以完成
match_phrase
和match_phrase_prefix
的工作。
term查询
默认情况下,elasticsearch在对文档分析期间(将文档分词后保存到倒排索引中),会对文档进行分词,比如默认的标准分析器会对文档进行:
- 删除大多数的标点符号。
- 将文档分解为单个词条,我们称为token。
- 将token转为小写。
完事再保存到倒排索引上,当然,原文件还是要保存一分的,而倒排索引使用来查询的。
例如Beautiful girl!
,在经过分析后是这样的了:
POST _analyze
{
"analyzer": "standard",
"text": "Beautiful girl!"
}
# 结果
["beautiful", "girl"]
而当在使用match查询时,elasticsearch同样会对查询关键字进行分析:
PUT w10
{
"mappings": {
"doc":{
"properties":{
"t1":{
"type": "text"
}
}
}
}
}
PUT w10/doc/1
{
"t1": "Beautiful girl!"
}
PUT w10/doc/2
{
"t1": "sexy girl!"
}
GET w10/doc/_search
{
"query": {
"match": {
"t1": "Beautiful girl!"
}
}
}
也就是对查询关键字Beautiful girl!
进行分析,得到["beautiful", "girl"]
,然后分别将这两个单独的token去索引w10
中进行查询,结果就是将两篇文档都返回。
这在有些情况下是非常好用的,但是,如果我们想查询确切的词怎么办?也就是精确查询,将Beautiful girl!
当成一个token而不是分词后的两个token。
这就要用到了term查询了,term查询的是没有经过分析的查询关键字。
但是,这同样需要限制,如果你要查询的字段类型(如上例中的字段t1
类型是text
)是text
(因为elasticsearch会对文档进行分析,上面说过),那么你得到的可能是不尽如人意的结果或者压根没有结果:
/doc/_search
{
"query": {
"term": {
"t1": "Beautiful girl!"
}
}
}
如上面的查询,将不会有结果返回,因为索引w10
中的两篇文档在经过elasticsearch分析后没有一个分词是Beautiful girl!
,那此次查询结果为空也就好理解了。
所以,我们这里得到一个论证结果:不要使用term对类型是text的字段进行查询,要查询text类型的字段,请改用match查询。
学会了吗?那再来一个示例,你说一下结果是什么:
/doc/_search
{
"query": {
"term": {
"t1": "Beautiful"
}
}
}
答案是,没有结果返回!因为elasticsearch在对文档进行分析时,会经过小写!人家倒排索引上存的是小写的beautiful
,而我们查询的是大写的Beautiful
。
所以,要想有结果你这样:
/doc/_search
{
"query": {
"term": {
"t1": "beautiful"
}
}
}
那,term查询可以查询哪些类型的字段呢,例如elasticsearch会将keyword类型的字段当成一个token保存到倒排索引上,你可以将term和keyword结合使用。
最后,要想使用term查询多个精确的值怎么办?我只能说:亲,这里推荐卸载es呢!低调又不失尴尬的玩笑!
这里推荐使用terms
查询:
/doc/_search
{
"query": {
"terms": {
"t1": ["beautiful", "sexy"]
}
}
}
欢迎斧正,that's all see also: [官网:如何在Elasticsearch中使用模糊搜索](https://www.elastic.co/blog/found-fuzzy-search) | [elasticsearch模糊匹配max_expansions&min_similarity](https://stackoverflow.com/questions/7148615/elasticsearch-fuzzy-matching-max-expansions-min-similarity) | [elasticsearch 全文搜索 match_phrase_prefix 查询中的 max_expansions 该怎么用?](https://segmentfault.com/q/1010000017179306/a-1020000017196690)| [term query](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html#query-dsl-term-query)
五、elasticsearch之排序查询
#
我们之前学过几种查询方式了,但是结果顺序都是elasticsearch决定的。我们来给查询结果搞上我们定制的顺序。
#
1
{
"name":"顾老二",
"age":30,
"from": "gu",
"desc": "皮肤黑、武器长、性格直",
"tags": ["黑", "长", "直"]
}
PUT zhifou/doc/2
{
"name":"大娘子",
"age":18,
"from":"sheng",
"desc":"肤白貌美,娇憨可爱",
"tags":["白", "富","美"]
}
PUT zhifou/doc/3
{
"name":"龙套偏房",
"age":22,
"from":"gu",
"desc":"mmp,没怎么看,不知道怎么形容",
"tags":["造数据", "真","难"]
}
PUT zhifou/doc/4
{
"name":"石头",
"age":29,
"from":"gu",
"desc":"粗中有细,狐假虎威",
"tags":["粗", "大","猛"]
}
PUT zhifou/doc/5
{
"name":"魏行首",
"age":25,
"from":"广云台",
"desc":"仿佛兮若轻云之蔽月,飘飘兮若流风之回雪,mmp,最后竟然没有嫁给顾老二!",
"tags":["闭月","羞花"]
}
#
#
想到排序,出现在脑海中的无非就是升(正)序和降(倒)序。比如我们查询顾府都有哪些人,并根据age字段按照降序,并且,我只想看nmae
和age
字段:
/doc/_search
{
"query": {
"match": {
"from": "gu"
}
},
"sort": [
{
"age": {
"order": "desc"
}
}
]
}
上例,在条件查询的基础上,我们又通过sort
来做排序,根据age
字段排序,是降序呢还是升序,由order
字段控制,desc
是降序。
结果如下:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : null,
"hits" : [
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "1",
"_score" : null,
"_source" : {
"name" : "顾老二",
"age" : 30,
"from" : "gu",
"desc" : "皮肤黑、武器长、性格直",
"tags" : [
"黑",
"长",
"直"
]
},
"sort" : [
30
]
},
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "4",
"_score" : null,
"_source" : {
"name" : "石头",
"age" : 29,
"from" : "gu",
"desc" : "粗中有细,狐假虎威",
"tags" : [
"粗",
"大",
"猛"
]
},
"sort" : [
29
]
},
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "3",
"_score" : null,
"_source" : {
"name" : "龙套偏房",
"age" : 22,
"from" : "gu",
"desc" : "mmp,没怎么看,不知道怎么形容",
"tags" : [
"造数据",
"真",
"难"
]
},
"sort" : [
22
]
}
]
}
}
上例中,结果是以降序排列方式返回的。
#
那么想要升序怎么搞呢?
/doc/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"age": {
"order": "asc"
}
}
]
}
上例,想要以升序的方式排列,只需要将order
值换为asc
就可以了。
结果如下:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 5,
"max_score" : null,
"hits" : [
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "2",
"_score" : null,
"_source" : {
"name" : "大娘子",
"age" : 18,
"from" : "sheng",
"desc" : "肤白貌美,娇憨可爱",
"tags" : [
"白",
"富",
"美"
]
},
"sort" : [
18
]
},
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "3",
"_score" : null,
"_source" : {
"name" : "龙套偏房",
"age" : 22,
"from" : "gu",
"desc" : "mmp,没怎么看,不知道怎么形容",
"tags" : [
"造数据",
"真",
"难"
]
},
"sort" : [
22
]
},
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "5",
"_score" : null,
"_source" : {
"name" : "魏行首",
"age" : 25,
"from" : "广云台",
"desc" : "仿佛兮若轻云之蔽月,飘飘兮若流风之回雪,mmp,最后竟然没有嫁给顾老二!",
"tags" : [
"闭月",
"羞花"
]
},
"sort" : [
25
]
},
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "4",
"_score" : null,
"_source" : {
"name" : "石头",
"age" : 29,
"from" : "gu",
"desc" : "粗中有细,狐假虎威",
"tags" : [
"粗",
"大",
"猛"
]
},
"sort" : [
29
]
},
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "1",
"_score" : null,
"_source" : {
"name" : "顾老二",
"age" : 30,
"from" : "gu",
"desc" : "皮肤黑、武器长、性格直",
"tags" : [
"黑",
"长",
"直"
]
},
"sort" : [
30
]
}
]
}
}
上例,可以看到结果是以age
从小到大的顺序返回结果。
#
那么,你可能会问,除了age
,能不能以别的属性作为排序条件啊?来试试:
/chengyuan/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"name": {
"order": "asc"
}
}
]
}
上例,我们以name
属性来排序,来看结果:
{
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": [
{
"shard": 0,
"index": "zhifou",
"node": "wrtr435jSgi7_naKq2Y_zQ",
"reason": {
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
}
}
],
"caused_by": {
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead.",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
}
}
},
"status": 400
}
结果跟我们想象的不一样,报错了!
注意:在排序的过程中,只能使用可排序的属性进行排序。那么可以排序的属性有哪些呢?
- 数字
- 日期
其他的都不行!
欢迎斧正,that's all
六、elasticsearch之分页查询
#
随着数据量的 不断增大,查询结果也展示的越来越长,很多时候,我们仅是查询几条数据,不用全部显示出来。那又该怎么做呢?这里就要用到分页了。
#
1
{
"name":"顾老二",
"age":30,
"from": "gu",
"desc": "皮肤黑、武器长、性格直",
"tags": ["黑", "长", "直"]
}
PUT zhifou/doc/2
{
"name":"大娘子",
"age":18,
"from":"sheng",
"desc":"肤白貌美,娇憨可爱",
"tags":["白", "富","美"]
}
PUT zhifou/doc/3
{
"name":"龙套偏房",
"age":22,
"from":"gu",
"desc":"mmp,没怎么看,不知道怎么形容",
"tags":["造数据", "真","难"]
}
PUT zhifou/doc/4
{
"name":"石头",
"age":29,
"from":"gu",
"desc":"粗中有细,狐假虎威",
"tags":["粗", "大","猛"]
}
PUT zhifou/doc/5
{
"name":"魏行首",
"age":25,
"from":"广云台",
"desc":"仿佛兮若轻云之蔽月,飘飘兮若流风之回雪,mmp,最后竟然没有嫁给顾老二!",
"tags":["闭月","羞花"]
}
#
我们来看看elasticsearch是怎么将结果分页的:
/doc/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"age": {
"order": "desc"
}
}
],
"from": 2,
"size": 1
}
上例,首先以age
降序排序,查询所有。并且在查询的时候,添加两个属性from
和size
来控制查询结果集的数据条数。
- from:从哪开始查
- size:返回几条结果
如上例的结果:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 5,
"max_score" : null,
"hits" : [
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "5",
"_score" : null,
"_source" : {
"name" : "魏行首",
"age" : 25,
"from" : "广云台",
"desc" : "仿佛兮若轻云之蔽月,飘飘兮若流风之回雪,mmp,最后竟然没有嫁给顾老二!",
"tags" : [
"闭月",
"羞花"
]
},
"sort" : [
25
]
}
]
}
}
上例中,在返回的结果集中,从第2条开始,返回1条数据。
那如果想要从第2条开始,返回2条结果怎么做呢?
/doc/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"age": {
"order": "desc"
}
}
],
"from": 2,
"size": 2
}
上例中,我们指定from
为2,意为从第2条开始返回,返回多少呢?size
意为2条。
还可以这样:
/doc/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"age": {
"order": "desc"
}
}
],
"from": 4,
"size": 2
}
上例中,从第4条开始返回2条数据。
结果如下:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 5,
"max_score" : null,
"hits" : [
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "2",
"_score" : null,
"_source" : {
"name" : "大娘子",
"age" : 18,
"from" : "sheng",
"desc" : "肤白貌美,娇憨可爱",
"tags" : [
"白",
"富",
"美"
]
},
"sort" : [
18
]
}
]
}
}
上例中仅有一条数据,那是为啥呢?因为我们现在只有5条数据,从第4条开始查询,就只有1条符合条件,所以,就返回了1条数据。
学到这里,我们也可以看到,我们的查询条件越来越多,开始仅是简单查询,慢慢增加条件查询,增加排序,对返回结果进行限制。所以,我们可以说:对于elasticsearch
来说,所有的条件都是可插拔的,彼此之间用,
分割。比如说,我们在查询中,仅对返回结果进行限制:
/doc/_search
{
"query": {
"match_all": {}
},
"from": 4,
"size": 2
}
上例中,在所有的返回结果中,结果从4开始返回2条数据。
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 5,
"max_score" : 1.0,
"hits" : [
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"name" : "龙套偏房",
"age" : 22,
"from" : "gu",
"desc" : "mmp,没怎么看,不知道怎么形容",
"tags" : [
"造数据",
"真",
"难"
]
}
}
]
}
}
但我们只有1条符合条件的数据。
欢迎斧正,that's all
七、elasticsearch之布尔查询
前言
布尔查询是最常用的组合查询,根据子查询的规则,只有当文档满足所有子查询条件时,elasticsearch引擎才将结果返回。布尔查询支持的子查询条件共4中:
- must(and)
- should(or)
- must_not(not)
- filter
下面我们来看看每个子查询条件都是怎么玩的。
准备数据
1
{
"name":"顾老二",
"age":30,
"from": "gu",
"desc": "皮肤黑、武器长、性格直",
"tags": ["黑", "长", "直"]
}
PUT zhifou/doc/2
{
"name":"大娘子",
"age":18,
"from":"sheng",
"desc":"肤白貌美,娇憨可爱",
"tags":["白", "富","美"]
}
PUT zhifou/doc/3
{
"name":"龙套偏房",
"age":22,
"from":"gu",
"desc":"mmp,没怎么看,不知道怎么形容",
"tags":["造数据", "真","难"]
}
PUT zhifou/doc/4
{
"name":"石头",
"age":29,
"from":"gu",
"desc":"粗中有细,狐假虎威",
"tags":["粗", "大","猛"]
}
PUT zhifou/doc/5
{
"name":"魏行首",
"age":25,
"from":"广云台",
"desc":"仿佛兮若轻云之蔽月,飘飘兮若流风之回雪,mmp,最后竟然没有嫁给顾老二!",
"tags":["闭月","羞花"]
}
must
现在,我们用布尔查询所有from
属性为gu
的数据:
/doc/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"from": "gu"
}
}
]
}
}
}
上例中,我们通过在bool
属性(字段)内使用must
来作为查询条件,那么条件是什么呢?条件同样被match
包围,就是from
为gu
的所有数据。
这里需要注意的是must
字段对应的是个列表,也就是说可以有多个并列的查询条件,一个文档满足各个子条件后才最终返回。
结果如下:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 0.6931472,
"hits" : [
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "4",
"_score" : 0.6931472,
"_source" : {
"name" : "石头",
"age" : 29,
"from" : "gu",
"desc" : "粗中有细,狐假虎威",
"tags" : [
"粗",
"大",
"猛"
]
}
},
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "1",
"_score" : 0.2876821,
"_source" : {
"name" : "顾老二",
"age" : 30,
"from" : "gu",
"desc" : "皮肤黑、武器长、性格直",
"tags" : [
"黑",
"长",
"直"
]
}
},
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "3",
"_score" : 0.2876821,
"_source" : {
"name" : "龙套偏房",
"age" : 22,
"from" : "gu",
"desc" : "mmp,没怎么看,不知道怎么形容",
"tags" : [
"造数据",
"真",
"难"
]
}
}
]
}
}
上例中,可以看到,所有from
属性为gu
的数据查询出来了。
那么,我们想要查询from
为gu
,并且age
为30
的数据怎么搞呢?
/doc/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"from": "gu"
}
},
{
"match": {
"age": 30
}
}
]
}
}
}
上例中,在must
列表中,在增加一个age
为30
的条件。
结果如下:
{
"took" : 8,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.287682,
"hits" : [
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "1",
"_score" : 1.287682,
"_source" : {
"name" : "顾老二",
"age" : 30,
"from" : "gu",
"desc" : "皮肤黑、武器长、性格直",
"tags" : [
"黑",
"长",
"直"
]
}
}
]
}
}
上例,符合条件的数据被成功查询出来了。
注意:现在你可能慢慢发现一个现象,所有属性值为列表的,都可以实现多个条件并列存在
should
那么,如果要查询只要是from
为gu
或者tags
为闭月
的数据怎么搞?
/doc/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"from": "gu"
}
},
{
"match": {
"tags": "闭月"
}
}
]
}
}
}
上例中,或关系的不能用must
的了,而是要用should
,只要符合其中一个条件就返回。
结果如下:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 4,
"max_score" : 0.6931472,
"hits" : [
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "4",
"_score" : 0.6931472,
"_source" : {
"name" : "石头",
"age" : 29,
"from" : "gu",
"desc" : "粗中有细,狐假虎威",
"tags" : [
"粗",
"大",
"猛"
]
}
},
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "5",
"_score" : 0.5753642,
"_source" : {
"name" : "魏行首",
"age" : 25,
"from" : "广云台",
"desc" : "仿佛兮若轻云之蔽月,飘飘兮若流风之回雪,mmp,最后竟然没有嫁给顾老二!",
"tags" : [
"闭月",
"羞花"
]
}
},
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "1",
"_score" : 0.2876821,
"_source" : {
"name" : "顾老二",
"age" : 30,
"from" : "gu",
"desc" : "皮肤黑、武器长、性格直",
"tags" : [
"黑",
"长",
"直"
]
}
},
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "3",
"_score" : 0.2876821,
"_source" : {
"name" : "龙套偏房",
"age" : 22,
"from" : "gu",
"desc" : "mmp,没怎么看,不知道怎么形容",
"tags" : [
"造数据",
"真",
"难"
]
}
}
]
}
}
返回了所有符合条件的结果。
must_not
那么,如果我想要查询from
既不是gu
并且tags
也不是可爱
,还有age
不是18
的数据怎么办?
/doc/_search
{
"query": {
"bool": {
"must_not": [
{
"match": {
"from": "gu"
}
},
{
"match": {
"tags": "可爱"
}
},
{
"match": {
"age": 18
}
}
]
}
}
}
上例中,must
和should
都不能使用,而是使用must_not
,又在内增加了一个age
为18
的条件。
结果如下:
{
"took" : 9,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "5",
"_score" : 1.0,
"_source" : {
"name" : "魏行首",
"age" : 25,
"from" : "广云台",
"desc" : "仿佛兮若轻云之蔽月,飘飘兮若流风之回雪,mmp,最后竟然没有嫁给顾老二!",
"tags" : [
"闭月",
"羞花"
]
}
}
]
}
}
上例中,只有魏行首这一条数据,因为只有魏行首既不是顾家的人,标签没有可爱那一项,年龄也不等于18!
这里有点需要补充,条件中age
对应的18
你写成整形还是字符串都没啥……
filter
那么,如果要查询from
为gu
,age
大于25
的数据怎么查?
/doc/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"from": "gu"
}
}
],
"filter": {
"range": {
"age": {
"gt": 25
}
}
}
}
}
}
这里就用到了filter
条件过滤查询,过滤条件的范围用range
表示,gt
表示大于,大于多少呢?是25。
结果如下:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 0.6931472,
"hits" : [
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "4",
"_score" : 0.6931472,
"_source" : {
"name" : "石头",
"age" : 29,
"from" : "gu",
"desc" : "粗中有细,狐假虎威",
"tags" : [
"粗",
"大",
"猛"
]
}
},
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "1",
"_score" : 0.2876821,
"_source" : {
"name" : "顾老二",
"age" : 30,
"from" : "gu",
"desc" : "皮肤黑、武器长、性格直",
"tags" : [
"黑",
"长",
"直"
]
}
}
]
}
}
上例中,age
大于25
的条件都已经筛选出来了。
那么要查询from
是gu
,age
大于等于30
的数据呢?
/doc/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"from": "gu"
}
}
],
"filter": {
"range": {
"age": {
"gte": 30
}
}
}
}
}
}
上例中,大于等于用gte
表示。
结果如下:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.2876821,
"hits" : [
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "1",
"_score" : 0.2876821,
"_source" : {
"name" : "顾老二",
"age" : 30,
"from" : "gu",
"desc" : "皮肤黑、武器长、性格直",
"tags" : [
"黑",
"长",
"直"
]
}
}
]
}
}
那么,要查询age
小于25
的呢?
/doc/_search
{
"query": {
"bool": {
"filter": {
"range": {
"age": {
"lt": 25
}
}
}
}
}
}
上例中,小于用lt
表示,结果如下:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 0.0,
"hits" : [
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "2",
"_score" : 0.0,
"_source" : {
"name" : "大娘子",
"age" : 18,
"from" : "sheng",
"desc" : "肤白貌美,娇憨可爱",
"tags" : [
"白",
"富",
"美"
]
}
},
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "3",
"_score" : 0.0,
"_source" : {
"name" : "龙套偏房",
"age" : 22,
"from" : "gu",
"desc" : "mmp,没怎么看,不知道怎么形容",
"tags" : [
"造数据",
"真",
"难"
]
}
}
]
}
}
在查询一个age
小于等于18
的怎么办呢?
/doc/_search
{
"query": {
"bool": {
"filter": {
"range": {
"age": {
"lte": 18
}
}
}
}
}
}
上例中,小于等于用lte
表示。结果如下:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.0,
"hits" : [
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "2",
"_score" : 0.0,
"_source" : {
"name" : "大娘子",
"age" : 18,
"from" : "sheng",
"desc" : "肤白貌美,娇憨可爱",
"tags" : [
"白",
"富",
"美"
]
}
}
]
}
}
要查询from
是gu
,age
在25~30
之间的怎么查?
/doc/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"from": "gu"
}
}
],
"filter": {
"range": {
"age": {
"gte": 25,
"lte": 30
}
}
}
}
}
}
上例中,使用lte
和gte
来限定范围。结果如下:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 0.6931472,
"hits" : [
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "4",
"_score" : 0.6931472,
"_source" : {
"name" : "石头",
"age" : 29,
"from" : "gu",
"desc" : "粗中有细,狐假虎威",
"tags" : [
"粗",
"大",
"猛"
]
}
},
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "1",
"_score" : 0.2876821,
"_source" : {
"name" : "顾老二",
"age" : 30,
"from" : "gu",
"desc" : "皮肤黑、武器长、性格直",
"tags" : [
"黑",
"长",
"直"
]
}
}
]
}
}
那么,要查询from
是sheng
,age
小于等于25
的怎么查呢?其实结果,我们可能已经想到了,只有一条,因为只有盛家小六符合结果。
/doc/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"from": "sheng"
}
}
],
"filter": {
"range": {
"age": {
"lte": 25
}
}
}
}
}
}
结果果然不出洒家所料!
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.6931472,
"hits" : [
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "2",
"_score" : 0.6931472,
"_source" : {
"name" : "大娘子",
"age" : 18,
"from" : "sheng",
"desc" : "肤白貌美,娇憨可爱",
"tags" : [
"白",
"富",
"美"
]
}
}
]
}
}
但是,洒家手一抖,将must
换为should
看看会发生什么?
/doc/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"from": "sheng"
}
}
],
"filter": {
"range": {
"age": {
"lte": 25
}
}
}
}
}
}
结果如下:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 0.6931472,
"hits" : [
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "2",
"_score" : 0.6931472,
"_source" : {
"name" : "大娘子",
"age" : 18,
"from" : "sheng",
"desc" : "肤白貌美,娇憨可爱",
"tags" : [
"白",
"富",
"美"
]
}
},
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "5",
"_score" : 0.0,
"_source" : {
"name" : "魏行首",
"age" : 25,
"from" : "广云台",
"desc" : "仿佛兮若轻云之蔽月,飘飘兮若流风之回雪,mmp,最后竟然没有嫁给顾老二!",
"tags" : [
"闭月",
"羞花"
]
}
},
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "3",
"_score" : 0.0,
"_source" : {
"name" : "龙套偏房",
"age" : 22,
"from" : "gu",
"desc" : "mmp,没怎么看,不知道怎么形容",
"tags" : [
"造数据",
"真",
"难"
]
}
}
]
}
}
结果有点出乎意料,因为龙套偏房和魏行首不属于盛家,但也被查询出来了。那你要问了,怎么肥四?小老弟!这是因为在查询过程中,优先经过filter
过滤,因为should
是或关系,龙套偏房和魏行首的年龄符合了filter
过滤条件,也就被放行了!所以,如果在filter
过滤条件中使用should
的话,结果可能不会尽如人意!建议使用must
代替。
注意:filter
工作于bool
查询内。比如我们将刚才的查询条件改一下,把filter
从bool
中挪出来。
/doc/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"from": "sheng"
}
}
]
},
"filter": {
"range": {
"age": {
"lte": 25
}
}
}
}
}
如上例所示,我们将filter
与bool
平级,看查询结果:
{
"error": {
"root_cause": [
{
"type": "parsing_exception",
"reason": "[bool] malformed query, expected [END_OBJECT] but found [FIELD_NAME]",
"line": 12,
"col": 5
}
],
"type": "parsing_exception",
"reason": "[bool] malformed query, expected [END_OBJECT] but found [FIELD_NAME]",
"line": 12,
"col": 5
},
"status": 400
}
结果报错了!所以,filter
工作位置很重要。
小结:
-
must
:与关系,相当于关系型数据库中的and
。 -
should
:或关系,相当于关系型数据库中的or
。 -
must_not
:非关系,相当于关系型数据库中的not
。 -
filter
:过滤条件。 -
range
:条件筛选范围。 -
gt
:大于,相当于关系型数据库中的>
。 -
gte
:大于等于,相当于关系型数据库中的>=
。 -
lt
:小于,相当于关系型数据库中的<
。 -
lte
:小于等于,相当于关系型数据库中的<=
。
八、elasticsearch之查询结果过滤
#
在未来,一篇文档可能有很多(是的,很多!不要被我们的示例这仨俩字段所迷惑)的字段,每次查询都默认给我们返回全部,在数据量很大的时候,是的,比如我只想查姑娘的手机号,你一并给我个喜好啊、三围什么的算什么?是要告诉洒家,hi,小老弟,要撩妹么?
所以,我们对结果做一些过滤,清清白白的告诉elasticsearch,小老弟,我只是查!水!表!
#
1
{
"name":"顾老二",
"age":30,
"from": "gu",
"desc": "皮肤黑、武器长、性格直",
"tags": ["黑", "长", "直"]
}
#
现在,在所有的结果中,我只需要查看name
和age
两个属性,其他的不要怎么办?
/doc/_search
{
"query": {
"match": {
"name": "顾老二"
}
},
"_source": ["name", "age"]
}
如上例所示,在查询中,通过_source
来控制仅返回name
和age
属性。
{
"took" : 8,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.8630463,
"hits" : [
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "1",
"_score" : 0.8630463,
"_source" : {
"name" : "顾老二",
"age" : 30
}
}
]
}
}
在数据量很大的时候,我们需要什么字段,就返回什么字段就好了,提高查询效率。整个三围啥的可low了,有本事上图!
欢迎斧正,that's all
九、elasticsearch之高亮查询
前言
如果返回的结果集中很多符合条件的结果,那怎么能一眼就能看到我们想要的那个结果呢?比如下面网站所示的那样,我们搜索elasticsearch
,在结果集中,将所有elasticsearch
高亮显示?
如上图我们搜索思否一样。
我们该怎么做呢?
准备数据
4
{
"name":"石头",
"age":29,
"from":"gu",
"desc":"粗中有细,狐假虎威",
"tags":["粗", "大","猛"]
}
默认高亮显示
我们来查询:
/doc/_search
{
"query": {
"match": {
"name": "石头"
}
},
"highlight": {
"fields": {
"name": {}
}
}
}
上例中,我们使用highlight
属性来实现结果高亮显示,需要的字段名称添加到fields
内即可,elasticsearch
会自动帮我们实现高亮。
结果如下:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.5098256,
"hits" : [
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "4",
"_score" : 1.5098256,
"_source" : {
"name" : "石头",
"age" : 29,
"from" : "gu",
"desc" : "粗中有细,狐假虎威",
"tags" : [
"粗",
"大",
"猛"
]
},
"highlight" : {
"name" : [
"<em>石</em><em>头</em>"
]
}
}
]
}
}
上例中,elasticsearch
会自动将检索结果用标签包裹起来,用于在页面中渲染。
自定义高亮显示
但是,你可能会问,我不想用em
标签, 我这么牛逼,应该用个b
标签啊!好的,elasticsearch
同样考虑到你很牛逼,所以,我们可以自定义标签。
/chengyuan/_search
{
"query": {
"match": {
"from": "gu"
}
},
"highlight": {
"pre_tags": "<b class='key' style='color:red'>",
"post_tags": "</b>",
"fields": {
"from": {}
}
}
}
上例中,在highlight
中,pre_tags
用来实现我们的自定义标签的前半部分,在这里,我们也可以为自定义的标签添加属性和样式。post_tags
实现标签的后半部分,组成一个完整的标签。至于标签中的内容,则还是交给fields
来完成。
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.5753642,
"hits" : [
{
"_index" : "zhifou",
"_type" : "chengyuan",
"_id" : "1",
"_score" : 0.5753642,
"_source" : {
"name" : "老二",
"age" : 30,
"sex" : "male",
"birth" : "1070-10-11",
"from" : "gu",
"desc" : "皮肤黑,武器长,性格直",
"tags" : [
"黑",
"长",
"直"
]
},
"highlight" : {
"name" : [
"<b class='key' style='color:red'>老</b><b class='key' style='color:red'>二</b>"
]
}
}
]
}
}
需要注意的是:自定义标签中属性或样式中的逗号一律用英文状态的单引号表示,应该与外部elasticsearch
语法的双引号区分开。
至此,基本的查询,我们已经能胜任绝大数的应用场景。接下来我们来看一下更多关于结果处理的函数。
欢迎斧正,that's all
十、elasticsearch之聚合函数
#
聚合函数大家都不陌生,elasticsearch中也没玩出新花样,所以,这一章相对简单,只需要记得:
- avg
- max
- min
- sum
以及各自的用法即可。先来看求平均。
#
1
{
"name":"顾老二",
"age":30,
"from": "gu",
"desc": "皮肤黑、武器长、性格直",
"tags": ["黑", "长", "直"]
}
PUT zhifou/doc/2
{
"name":"大娘子",
"age":18,
"from":"sheng",
"desc":"肤白貌美,娇憨可爱",
"tags":["白", "富","美"]
}
PUT zhifou/doc/3
{
"name":"龙套偏房",
"age":22,
"from":"gu",
"desc":"mmp,没怎么看,不知道怎么形容",
"tags":["造数据", "真","难"]
}
PUT zhifou/doc/4
{
"name":"石头",
"age":29,
"from":"gu",
"desc":"粗中有细,狐假虎威",
"tags":["粗", "大","猛"]
}
PUT zhifou/doc/5
{
"name":"魏行首",
"age":25,
"from":"广云台",
"desc":"仿佛兮若轻云之蔽月,飘飘兮若流风之回雪,mmp,最后竟然没有嫁给顾老二!",
"tags":["闭月","羞花"]
}
#
现在的需求是查询from
是gu
的人的平均年龄。
/doc/_search
{
"query": {
"match": {
"from": "gu"
}
},
"aggs": {
"my_avg": {
"avg": {
"field": "age"
}
}
},
"_source": ["name", "age"]
}
上例中,首先匹配查询from
是gu
的数据。在此基础上做查询平均值的操作,这里就用到了聚合函数,其语法被封装在aggs
中,而my_avg
则是为查询结果起个别名,封装了计算出的平均值。那么,要以什么属性作为条件呢?是age
年龄,查年龄的什么呢?是avg
,查平均年龄。
返回结果如下:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 0.6931472,
"hits" : [
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "4",
"_score" : 0.6931472,
"_source" : {
"name" : "石头",
"age" : 29
}
},
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "1",
"_score" : 0.2876821,
"_source" : {
"name" : "顾老二",
"age" : 30
}
},
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "3",
"_score" : 0.2876821,
"_source" : {
"name" : "龙套偏房",
"age" : 22
}
}
]
},
"aggregations" : {
"my_avg" : {
"value" : 27.0
}
}
}
上例中,在查询结果的最后是平均值信息,可以看到是27岁。
虽然我们已经使用_source
对字段做了过滤,但是还不够。我不想看都有哪些数据,只想看平均值怎么办?别忘了size
!
/doc/_search
{
"query": {
"match": {
"from": "gu"
}
},
"aggs": {
"my_avg": {
"avg": {
"field": "age"
}
}
},
"size": 0,
"_source": ["name", "age"]
}
上例中,只需要在原来的查询基础上,增加一个size
就可以了,输出几条结果,我们写上0,就是输出0条查询结果。
查询结果如下:
{
"took" : 8,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 0.0,
"hits" : [ ]
},
"aggregations" : {
"my_avg" : {
"value" : 27.0
}
}
}
查询结果中,我们看hits
下的total
值是3,说明有三条符合结果的数据。最后面返回平均值是27。
#
那怎么查最大值呢?
/doc/_search
{
"query": {
"match": {
"from": "gu"
}
},
"aggs": {
"my_max": {
"max": {
"field": "age"
}
}
},
"size": 0
}
上例中,只需要在查询条件中将avg
替换成max
即可。
返回结果如下:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 0.0,
"hits" : [ ]
},
"aggregations" : {
"my_max" : {
"value" : 30.0
}
}
}
在返回的结果中,可以看到年龄最大的是30岁。
#
那怎么查最小值呢?
/doc/_search
{
"query": {
"match": {
"from": "gu"
}
},
"aggs": {
"my_min": {
"min": {
"field": "age"
}
}
},
"size": 0
}
最小值则用min
表示。
返回结果如下:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 0.0,
"hits" : [ ]
},
"aggregations" : {
"my_min" : {
"value" : 22.0
}
}
}
返回结果中,年龄最小的是22岁。
#
那么,要是想知道它们的年龄总和是多少怎么办呢?
/doc/_search
{
"query": {
"match": {
"from": "gu"
}
},
"aggs": {
"my_sum": {
"sum": {
"field": "age"
}
}
},
"size": 0
}
上例中,求和用sum
表示。
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 0.0,
"hits" : [ ]
},
"aggregations" : {
"my_sum" : {
"value" : 81.0
}
}
}
从返回的结果可以发现,年龄总和是81岁。
#
现在我想要查询所有人的年龄段,并且按照15~20,20~25,25~30
分组,并且算出每组的平均年龄。
分析需求,首先我们应该先把分组做出来。
/doc/_search
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"age_group": {
"range": {
"field": "age",
"ranges": [
{
"from": 15,
"to": 20
},
{
"from": 20,
"to": 25
},
{
"from": 25,
"to": 30
}
]
}
}
}
}
上例中,在aggs
的自定义别名age_group
中,使用range
来做分组,field
是以age
为分组,分组使用ranges
来做,from
和to
是范围,我们根据需求做出三组。
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 5,
"max_score" : 0.0,
"hits" : [ ]
},
"aggregations" : {
"age_group" : {
"buckets" : [
{
"key" : "15.0-20.0",
"from" : 15.0,
"to" : 20.0,
"doc_count" : 1
},
{
"key" : "20.0-25.0",
"from" : 20.0,
"to" : 25.0,
"doc_count" : 1
},
{
"key" : "25.0-30.0",
"from" : 25.0,
"to" : 30.0,
"doc_count" : 2
}
]
}
}
}
返回的结果中可以看到,已经拿到了三个分组。doc_count
为该组内有几条数据,此次共分为三组,查询出4条内容。还有一条数据的age
属性值是30
,不在分组的范围内!
那么接下来,我们就要对每个小组内的数据做平均年龄处理。
/doc/_search
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"age_group": {
"range": {
"field": "age",
"ranges": [
{
"from": 15,
"to": 20
},
{
"from": 20,
"to": 25
},
{
"from": 25,
"to": 30
}
]
},
"aggs": {
"my_avg": {
"avg": {
"field": "age"
}
}
}
}
}
}
上例中,在分组下面,我们使用aggs
对age
做平均数处理,这样就可以了。
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 5,
"max_score" : 0.0,
"hits" : [ ]
},
"aggregations" : {
"age_group" : {
"buckets" : [
{
"key" : "15.0-20.0",
"from" : 15.0,
"to" : 20.0,
"doc_count" : 1,
"my_avg" : {
"value" : 18.0
}
},
{
"key" : "20.0-25.0",
"from" : 20.0,
"to" : 25.0,
"doc_count" : 1,
"my_avg" : {
"value" : 22.0
}
},
{
"key" : "25.0-30.0",
"from" : 25.0,
"to" : 30.0,
"doc_count" : 2,
"my_avg" : {
"value" : 27.0
}
}
]
}
}
}
在结果中,我们可以清晰的看到每组的平均年龄(my_avg
的value
中)。
注意:聚合函数的使用,一定是先查出结果,然后对结果使用聚合函数做处理
小结:
- avg:求平均
- max:最大值
- min:最小值
- sum:求和
欢迎斧正,that's all
前言
现在,让我们启动一个节点和kibana。
接下来的一切操作都在kibana
中Dev Tools
下的Console
里完成。
创建一篇文档
现在,我们试图将小黑的小姨妈的个人信息录入elasticsearch。我们只要输入:
1
{
"name": "小黑的小姨妈",
"age": 18
}
PUT表示创建命令。虽然命令可以小写,但是我们推荐大写。在以REST ful
风格返回的结果中:
{
"_index" : "t1",
"_type" : "type1",
"_id" : "1",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 1
}
结果中的result
则是操作类型,现在是created
,表示第一次创建。如果我们再次点击执行该命令,那么result
则会是updated
。我们细心则会发现_version
开始是1,现在你每点击一次就会增加一次。表示第几次更改。
查询所有索引
现在,我们再来学习一条命令:
GET _cat/indices?v
返回的结果如下图:
上图中,展示当前集群中索引情况,包括,索引的健康状况、UUID、主副分片个数、大小等信息。你发现我们创建的t1
索引了吗?
查询指定的索引信息
我们来单独看看t1
索引:
GET t1
返回的结果如下:
{
"t1" : {
"aliases" : { },
"mappings" : {
"doc" : {
"properties" : {
"age" : {
"type" : "long"
},
"name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
},
"settings" : {
"index" : {
"creation_date" : "1553163739688",
"number_of_shards" : "5",
"number_of_replicas" : "1",
"uuid" : "_7jNW5XATheeK84zKkPwlw",
"version" : {
"created" : "6050499"
},
"provided_name" : "t1"
}
}
}
}
返回了t1
索引的创建信息。
查询文档信息
那我们来查看我们刚才创建的那篇文档:
1
返回的结果如下:
{
"_index" : "t1",
"_type" : "doc",
"_id" : "1",
"_version" : 2,
"found" : true,
"_source" : {
"name" : "小黑的小姨妈",
"age" : 18
}
}
返回了我们刚才创建的文档信息。
我们再来为小黑添加两个姨妈:
2
{
"name": "小黑的二姨妈",
"age": 16
}
PUT t1/doc/3
{
"name": "小黑的三姨妈",
"age": 19
}
刚才,我们学会了查询小黑的一个姨妈,那么该如何查询所有姨妈呢?
GET t1/doc/_search
返回结果如下:
{
"took" : 7,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 1.0,
"hits" : [
{
"_index" : "t1",
"_type" : "doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"name" : "小黑的二姨妈",
"age" : 16
}
},
{
"_index" : "t1",
"_type" : "doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"name" : "小黑的小姨妈",
"age" : 18
}
},
{
"_index" : "t1",
"_type" : "doc",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"name" : "小黑的三姨妈",
"age" : 19
}
}
]
}
}
现在小黑跟他的姨妈们闹了别扭,就想删除这个姨妈,该怎么办呢?
删除指定索引
我们其实直接删除这个t1
索引就可以了:
DELETE /t1
DELETE 是删除命令,返回结果如下:
{
"acknowledged" : true
}
返回结果提示删除确认成功。
如果此时再查询索引情况,则会发现t1
已经不存在了,所有的文档也就不存在了。
#
我们之前已经学过了elasticsearch的简单操作了。
接下来,洒家要给大家讲述一个真实的故事..........
故事一定是要伴随着赵忠祥老师的声音作为开始,雨季就要来临了,又到了动物们发情的季节了......
#
《知否知否,应是绿肥红瘦之改编》,编剧:张开
让我们将镜头切换到北宋时期某位官人的府邸,府里男主人是:
1
{
"name":"顾老二",
"age":30,
"from": "gu",
"desc": "皮肤黑、武器长、性格直",
"tags": ["黑", "长", "直"]
}
他明处貌似还有俩老婆:
2
{
"name":"大娘子",
"age":18,
"from":"sheng",
"desc":"肤白貌美,娇憨可爱",
"tags":["白", "富","美"]
}
PUT zhifou/doc/3
{
"name":"龙套偏房",
"age":22,
"from":"gu",
"desc":"mmp,没怎么看,不知道怎么形容",
"tags":["造数据", "真","难"]
}
家里红旗不倒,家外彩旗飘摇:
4
{
"name":"石头",
"age":29,
"from":"gu",
"desc":"粗中有细,狐假虎威",
"tags":["粗", "大","猛"]
}
PUT zhifou/doc/5
{
"name":"魏行首",
"age":25,
"from":"广云台",
"desc":"仿佛兮若轻云之蔽月,飘飘兮若流风之回雪,mmp,最后竟然没有嫁给顾老二!",
"tags":["闭月","羞花"]
}
注意:当执行PUT
命令时,如果数据不存在,则新增该条数据,如果数据存在则修改该条数据。
咱们通过GET
命令查询一下:
1
结果如下:
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "1",
"_version" : 1,
"found" : true,
"_source" : {
"name" : "顾老二",
"age" : 30,
"from" : "gu",
"desc" : "皮肤黑、武器长、性格直",
"tags" : [
"黑",
"长",
"直"
]
}
}
查询也没啥问题,但是你可能说了,人家老二是黄种人,怎么是黑的呢?好吧咱改改desc
和tags
:
1
{
"desc":"皮肤很黄,武器很长,性格很直",
"tags":["很黄","很长", "很直"]
}
上例,我们仅修改了desc
和tags
两处,而name
、age
和from
三个属性没有变化,我们可以忽略不写吗?查查看:
1
结果如下:
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "1",
"_version" : 3,
"found" : true,
"_source" : {
"desc" : "皮肤很黄,武器很长,性格很直",
"tags" : [
"很黄",
"很长",
"很直"
]
}
}
哎呀,出事故了!修改是修改了,但结果不太理想啊,因为name
、age
和from
属性都没啦!
注意:PUT
命令,在做修改操作时,如果未指定其他的属性,则按照指定的属性进行修改操作。也就是如上例所示的那样,我们修改时只修改了desc
和tags
两个属性,其他的属性并没有一起添加进去。
很明显,这是病!dai治!怎么治?上车,咱们继续往下走!
#
让我们首先恢复一下事故现场:
1
{
"name":"顾老二",
"age":30,
"from": "gu",
"desc": "皮肤黑、武器长、性格直",
"tags": ["黑", "长", "直"]
}
我们要将黑修改成黄:
1/_update
{
"doc": {
"desc": "皮肤很黄,武器很长,性格很直",
"tags": ["很黄","很长", "很直"]
}
}
上例中,我们使用POST
命令,在id
后面跟_update
,要修改的内容放到doc
文档(属性)中即可。
我们再来查询一次:
1
结果如下:
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "1",
"_version" : 5,
"found" : true,
"_source" : {
"name" : "顾老二",
"age" : 30,
"from" : "gu",
"desc" : "皮肤很黄,武器很长,性格很直",
"tags" : [
"很黄",
"很长",
"很直"
]
}
}
结果如上例所示,现在其他的属性没有变化,只有desc
和tags
属性被修改。
注意:POST
命令,这里可用来执行修改操作(还有其他的功能),POST
命令配合_update
完成修改操作,指定修改的内容放到doc
中。
写了这么多,我也发现我上面有讲的不对
的地方——石头不是跟顾老二不清不楚,石头是跟小桃不清不楚!好吧,刚才那个数据是一个错误示范!我们这就把它干掉!
#
4
很简单,通过DELETE
命令,就可以删除掉那个错误示范了!
删除效果如下:
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "4",
"_version" : 4,
"result" : "deleted",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 4,
"_primary_term" : 1
}
我们再来查询一遍:
4
结果如下:
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "4",
"found" : false
}
上例中,found:false
表示查询数据不存在。
#
我们上面已经不知不觉的使用熟悉这种简单查询方式,通过 GET
命令查询指定文档:
1
结果如下:
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "1",
"_version" : 5,
"found" : true,
"_source" : {
"name" : "顾老二",
"age" : 30,
"from" : "gu",
"desc" : "皮肤很黄,武器很长,性格很直",
"tags" : [
"很黄",
"很长",
"很直"
]
}
}
查询没那么简单,预知后事如何,请听下回分解:复杂查询
that's all
#
简单的没挑战,来点复杂的,比如查看来自顾家的都有哪些人怎么查呢?elasticsearch提供两种查询方式:
- 查询字符串(query string),简单查询,就像是像传递URL参数一样去传递查询语句,被称为简单搜索或查询字符串(query string)搜索。
- 另外一种是通过DSL语句来进行查询,被称为DSL查询(Query DSL),DSL是Elasticsearch提供的一种丰富且灵活的查询语言,该语言以json请求体的形式出现,通过restful请求与Elasticsearch进行交互。
#
1
{
"name":"顾老二",
"age":30,
"from": "gu",
"desc": "皮肤黑、武器长、性格直",
"tags": ["黑", "长", "直"]
}
PUT zhifou/doc/2
{
"name":"大娘子",
"age":18,
"from":"sheng",
"desc":"肤白貌美,娇憨可爱",
"tags":["白", "富","美"]
}
PUT zhifou/doc/3
{
"name":"龙套偏房",
"age":22,
"from":"gu",
"desc":"mmp,没怎么看,不知道怎么形容",
"tags":["造数据", "真","难"]
}
PUT zhifou/doc/4
{
"name":"石头",
"age":29,
"from":"gu",
"desc":"粗中有细,狐假虎威",
"tags":["粗", "大","猛"]
}
PUT zhifou/doc/5
{
"name":"魏行首",
"age":25,
"from":"广云台",
"desc":"仿佛兮若轻云之蔽月,飘飘兮若流风之回雪,mmp,最后竟然没有嫁给顾老二!",
"tags":["闭月","羞花"]
}
#
/doc/_search?q=from:gu
还是使用GET
命令,通过_serarch
查询,查询条件是什么呢?条件是from
属性是gu
家的人都有哪些。最后,别忘了_search
和from
属性中间的英文分隔符?
。
结果如下:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 0.6931472,
"hits" : [
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "4",
"_score" : 0.6931472,
"_source" : {
"name" : "石头",
"age" : 29,
"from" : "gu",
"desc" : "粗中有细,狐假虎威",
"tags" : [
"粗",
"大",
"猛"
]
}
},
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "1",
"_score" : 0.2876821,
"_source" : {
"name" : "顾老二",
"age" : 30,
"from" : "gu",
"desc" : "皮肤黑、武器长、性格直",
"tags" : [
"黑",
"长",
"直"
]
}
},
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "3",
"_score" : 0.2876821,
"_source" : {
"name" : "龙套偏房",
"age" : 22,
"from" : "gu",
"desc" : "mmp,没怎么看,不知道怎么形容",
"tags" : [
"造数据",
"真",
"难"
]
}
}
]
}
}
我们来重点说下hits
,hits
是返回的结果集——所有from
属性为gu
的结果集。重点中的重点是_score
得分,得分是什么呢?根据算法算出跟查询条件的匹配度,匹配度高得分就高。后面再说这个算法是怎么回事。
#
我们现在使用DSL方式,来完成刚才的查询,查看来自顾家的都有哪些人。
/doc/_search
{
"query": {
"match": {
"from": "gu"
}
}
}
上例,查询条件是一步步构建出来的,将查询条件添加到match
中即可,而match
则是查询所有from
字段的值中含有gu
的结果就会返回。
当然结果没啥变化:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 0.6931472,
"hits" : [
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "4",
"_score" : 0.6931472,
"_source" : {
"name" : "石头",
"age" : 29,
"from" : "gu",
"desc" : "粗中有细,狐假虎威",
"tags" : [
"粗",
"大",
"猛"
]
}
},
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "1",
"_score" : 0.2876821,
"_source" : {
"name" : "顾老二",
"age" : 30,
"from" : "gu",
"desc" : "皮肤黑、武器长、性格直",
"tags" : [
"黑",
"长",
"直"
]
}
},
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "3",
"_score" : 0.2876821,
"_source" : {
"name" : "龙套偏房",
"age" : 22,
"from" : "gu",
"desc" : "mmp,没怎么看,不知道怎么形容",
"tags" : [
"造数据",
"真",
"难"
]
}
}
]
}
}
see also:[Elasticsearch查询规则(一)match和term](https://www.jianshu.com/p/eb30eee13923) that's all
前言
现在,是时候学习两种最常用的查询方法了,match和term了。
车速太快,系好安全带,睁大眼,不要在前进的道路上迷失了!
match查询
准备数据
1
{
"name":"顾老二",
"age":30,
"from": "gu",
"desc": "皮肤黑、武器长、性格直",
"tags": ["黑", "长", "直"]
}
PUT zhifou/doc/2
{
"name":"大娘子",
"age":18,
"from":"sheng",
"desc":"肤白貌美,娇憨可爱",
"tags":["白", "富","美"]
}
PUT zhifou/doc/3
{
"name":"龙套偏房",
"age":22,
"from":"gu",
"desc":"mmp,没怎么看,不知道怎么形容",
"tags":["造数据", "真","难"]
}
PUT zhifou/doc/4
{
"name":"石头",
"age":29,
"from":"gu",
"desc":"粗中有细,狐假虎威",
"tags":["粗", "大","猛"]
}
PUT zhifou/doc/5
{
"name":"魏行首",
"age":25,
"from":"广云台",
"desc":"仿佛兮若轻云之蔽月,飘飘兮若流风之回雪,mmp,最后竟然没有嫁给顾老二!",
"tags":["闭月","羞花"]
}
match系列之match(按条件查询)
我们查看来自顾家的都有哪些人。
/doc/_search
{
"query": {
"match": {
"from": "gu"
}
}
}
上例,查询条件是一步步构建出来的,将查询条件添加到match
中即可,而match
则是查询所有from
字段的值中含有gu
的结果就会返回。
结果如下:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 0.6931472,
"hits" : [
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "4",
"_score" : 0.6931472,
"_source" : {
"name" : "石头",
"age" : 29,
"from" : "gu",
"desc" : "粗中有细,狐假虎威",
"tags" : [
"粗",
"大",
"猛"
]
}
},
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "1",
"_score" : 0.2876821,
"_source" : {
"name" : "顾老二",
"age" : 30,
"from" : "gu",
"desc" : "皮肤黑、武器长、性格直",
"tags" : [
"黑",
"长",
"直"
]
}
},
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "3",
"_score" : 0.2876821,
"_source" : {
"name" : "龙套偏房",
"age" : 22,
"from" : "gu",
"desc" : "mmp,没怎么看,不知道怎么形容",
"tags" : [
"造数据",
"真",
"难"
]
}
}
]
}
}
match系列之match_all(查询全部)
除了按条件查询之外,我们还可以查询zhifou
索引下的doc
类型中的所有文档,那就是查询全部:
/doc/_search
{
"query": {
"match_all": {}
}
}
match_all
的值为空,表示没有查询条件,那就是查询全部。就像select * from table_name
一样。
查询结果如下:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 5,
"max_score" : 1.0,
"hits" : [
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "5",
"_score" : 1.0,
"_source" : {
"name" : "魏行首",
"age" : 25,
"from" : "广云台",
"desc" : "仿佛兮若轻云之蔽月,飘飘兮若流风之回雪,mmp,最后竟然没有嫁给顾老二!",
"tags" : [
"闭月",
"羞花"
]
}
},
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"name" : "大娘子",
"age" : 18,
"from" : "sheng",
"desc" : "肤白貌美,娇憨可爱",
"tags" : [
"白",
"富",
"美"
]
}
},
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "4",
"_score" : 1.0,
"_source" : {
"name" : "石头",
"age" : 29,
"from" : "gu",
"desc" : "粗中有细,狐假虎威",
"tags" : [
"粗",
"大",
"猛"
]
}
},
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"name" : "顾老二",
"age" : 30,
"from" : "gu",
"desc" : "皮肤黑、武器长、性格直",
"tags" : [
"黑",
"长",
"直"
]
}
},
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"name" : "龙套偏房",
"age" : 22,
"from" : "gu",
"desc" : "mmp,没怎么看,不知道怎么形容",
"tags" : [
"造数据",
"真",
"难"
]
}
}
]
}
}
返回的是zhifou
索引下doc
类型的所有文档!
match系列之match_phrase(短语查询)
我们现在已经对match有了基本的了解,match查询的是散列映射,包含了我们希望搜索的字段和字符串。也就说,只要文档中只要有我们希望的那个关键字,但也因此带来了一些问题。
首先来创建一些示例:
1
{
"title": "中国是世界上人口最多的国家"
}
PUT t1/doc/2
{
"title": "美国是世界上军事实力最强大的国家"
}
PUT t1/doc/3
{
"title": "北京是中国的首都"
}
现在,当我们以中国
作为搜索条件,我们希望只返回和中国
相关的文档。我们首先来使用match
查询:
/doc/_search
{
"query": {
"match": {
"title": "中国"
}
}
}
结果如下:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 0.68324494,
"hits" : [
{
"_index" : "t1",
"_type" : "doc",
"_id" : "1",
"_score" : 0.68324494,
"_source" : {
"title" : "中国是世界上人口最多的国家"
}
},
{
"_index" : "t1",
"_type" : "doc",
"_id" : "3",
"_score" : 0.5753642,
"_source" : {
"title" : "北京是中国的首都"
}
},
{
"_index" : "t1",
"_type" : "doc",
"_id" : "2",
"_score" : 0.39556286,
"_source" : {
"title" : "美国是世界上军事实力最强大的国家"
}
}
]
}
}
虽然如期的返回了中国
的文档。但是却把和美国
的文档也返回了,这并不是我们想要的。是怎么回事呢?因为这是elasticsearch在内部对文档做分词的时候,对于中文来说,就是一个字一个字分的,所以,我们搜中国
,中
和国
都符合条件,返回,而美国的国
也符合。
而我们认为中国
是个短语,是一个有具体含义的词。所以elasticsearch在处理中文分词方面比较弱势。后面会讲针对中文的插件。
但目前我们还有办法解决,那就是使用短语查询:
/doc/_search
{
"query": {
"match_phrase": {
"title": {
"query": "中国"
}
}
}
}
这里match_phrase
是在文档中搜索指定的词组,而中国
则正是一个词组,所以愉快的返回了。
那么,现在我们要想搜索中国
和世界
相关的文档,但又忘记其余部分了,怎么做呢?用match
也不行,那就继续用match_phrase
试试:
/doc/_search
{
"query": {
"match_phrase": {
"title": "中国世界"
}
}
}
返回结果也是空的,因为没有中国世界
这个短语。
我们搜索中国
和世界
这两个指定词组时,但又不清楚两个词组之间有多少别的词间隔。那么在搜的时候就要留有一些余地。这时就要用到了slop
了。相当于正则中的中国.*?世界
。这个间隔默认为0,导致我们刚才没有搜到,现在我们指定一个间隔。
/doc/_search
{
"query": {
"match_phrase": {
"title": {
"query": "中国世界",
"slop": 2
}
}
}
}
现在,两个词组之间有了2个词的间隔,这个时候,就可以查询到结果了:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.7445889,
"hits" : [
{
"_index" : "t1",
"_type" : "doc",
"_id" : "1",
"_score" : 0.7445889,
"_source" : {
"title" : "中国是世界上人口最多的国家"
}
}
]
}
}
slop
间隔你可以根据需要适当改动。
match系列之match_phrase_prefix(最左前缀查询)
现在凌晨2点半,单身狗小黑为了缓解寂寞,就准备搜索几个beautiful girl
来陪伴自己。但是由于英语没过2级,但单词beautiful
拼到bea
就不知道往下怎么拼了。这个时候,我们的智能搜索要帮他啊,elasticsearch就看自己的词库有啥事bea
开头的词,结果还真发现了两个:
1
{
"title": "maggie",
"desc": "beautiful girl you are beautiful so"
}
PUT t3/doc/2
{
"title": "sun and beach",
"desc": "I like basking on the beach"
}
但这里用match
和match_phrase
都不太合适,因为小黑输入的不是完整的词。那怎么办呢?我们用match_phrase_prefix
来搞:
/doc/_search
{
"query": {
"match_phrase_prefix": {
"desc": "bea"
}
}
}
结果如下:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 0.39556286,
"hits" : [
{
"_index" : "t3",
"_type" : "doc",
"_id" : "1",
"_score" : 0.39556286,
"_source" : {
"title" : "maggie",
"desc" : "beautiful girl,you are beautiful so"
}
},
{
"_index" : "t3",
"_type" : "doc",
"_id" : "2",
"_score" : 0.2876821,
"_source" : {
"title" : "sun and beach",
"desc" : "I like basking on the beach"
}
}
]
}
}
前缀查询是短语查询类似,但前缀查询可以更进一步的搜索词组,只不过它是和词组中最后一个词条进行前缀匹配(如搜这样的you are bea
)。应用也非常的广泛,比如搜索框的提示信息,当使用这种行为进行搜索时,最好通过max_expansions
来设置最大的前缀扩展数量,因为产生的结果会是一个很大的集合,不加限制的话,影响查询性能。
/doc/_search
{
"query": {
"match_phrase_prefix": {
"desc": {
"query": "bea",
"max_expansions": 1
}
}
}
}
但是,如果此时你去尝试加上max_expansions
测试后,你会发现并没有如你想想的一样,仅返回一条数据,而是返回了多条数据。max_expansions
执行的是搜索的编辑(Levenshtein)距离。那什么是编辑距离呢?编辑距离是一种计算两个字符串间的差异程度的字符串度量(string metric)。我们可以认为编辑距离就是从一个字符串修改到另一个字符串时,其中编辑单个字符(比如修改、插入、删除)所需要的最少次数。俄罗斯科学家Vladimir Levenshtein于1965年提出了这一概念。
我们再引用elasticsearch官网的一段话:该max_expansions设置定义了在停止搜索之前模糊查询将匹配的最大术语数,也可以对模糊查询的性能产生显着影响。但是,减少查询字词会产生负面影响,因为查询提前终止可能无法找到某些有效结果。重要的是要理解max_expansions查询限制在分片级别工作,这意味着即使设置为1,多个术语可能匹配,所有术语都来自不同的分片。此行为可能使其看起来好像max_expansions没有生效,因此请注意,计算返回的唯一术语不是确定是否有效的有效方法max_expansions。。
我想你也没看懂这句话是啥意思,但我们只需知道该参数工作于分片层,也就是Lucene部分,超出我们的研究范围了。
我们快刀斩乱麻的记住,使用前缀查询会非常的影响性能,要对结果集进行限制,就加上这个参数。
match系列之multi_match(多字段查询)
现在,我们有一个50个字段的索引,我们要在多个字段中查询同一个关键字,该怎么做呢?
1
{
"title": "maggie is beautiful girl",
"desc": "beautiful girl you are beautiful so"
}
PUT t3/doc/2
{
"title": "beautiful beach",
"desc": "I like basking on the beach,and you? beautiful girl"
}
我们先用原来的方法查询:
/doc/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"title": "beautiful"
}
},
{
"match": {
"desc": "beautiful"
}
}
]
}
}
}
使用must
来限制两个字段(值)中必须同时含有关键字。这样虽然能达到目的,但是当有很多的字段呢,我们可以用multi_match
来做:
/doc/_search
{
"query": {
"multi_match": {
"query": "beautiful",
"fields": ["title", "desc"]
}
}
}
我们将多个字段放到fields
列表中即可。以达到匹配多个字段的目的。
除此之外,multi_match
甚至可以当做match_phrase
和match_phrase_prefix
使用,只需要指定type
类型即可:
/doc/_search
{
"query": {
"multi_match": {
"query": "gi",
"fields": ["title"],
"type": "phrase_prefix"
}
}
}
GET t3/doc/_search
{
"query": {
"multi_match": {
"query": "girl",
"fields": ["title"],
"type": "phrase"
}
}
}
小结:
- match:返回所有匹配的分词。
- match_all:查询全部。
- match_phrase:短语查询,在match的基础上进一步查询词组,可以指定
slop
分词间隔。 - match_phrase_prefix:前缀查询,根据短语中最后一个词组做前缀匹配,可以应用于搜索提示,但注意和
max_expanions
搭配。其实默认是50....... - multi_match:多字段查询,使用相当的灵活,可以完成
match_phrase
和match_phrase_prefix
的工作。
term查询
默认情况下,elasticsearch在对文档分析期间(将文档分词后保存到倒排索引中),会对文档进行分词,比如默认的标准分析器会对文档进行:
- 删除大多数的标点符号。
- 将文档分解为单个词条,我们称为token。
- 将token转为小写。
完事再保存到倒排索引上,当然,原文件还是要保存一分的,而倒排索引使用来查询的。
例如Beautiful girl!
,在经过分析后是这样的了:
POST _analyze
{
"analyzer": "standard",
"text": "Beautiful girl!"
}
# 结果
["beautiful", "girl"]
而当在使用match查询时,elasticsearch同样会对查询关键字进行分析:
PUT w10
{
"mappings": {
"doc":{
"properties":{
"t1":{
"type": "text"
}
}
}
}
}
PUT w10/doc/1
{
"t1": "Beautiful girl!"
}
PUT w10/doc/2
{
"t1": "sexy girl!"
}
GET w10/doc/_search
{
"query": {
"match": {
"t1": "Beautiful girl!"
}
}
}
也就是对查询关键字Beautiful girl!
进行分析,得到["beautiful", "girl"]
,然后分别将这两个单独的token去索引w10
中进行查询,结果就是将两篇文档都返回。
这在有些情况下是非常好用的,但是,如果我们想查询确切的词怎么办?也就是精确查询,将Beautiful girl!
当成一个token而不是分词后的两个token。
这就要用到了term查询了,term查询的是没有经过分析的查询关键字。
但是,这同样需要限制,如果你要查询的字段类型(如上例中的字段t1
类型是text
)是text
(因为elasticsearch会对文档进行分析,上面说过),那么你得到的可能是不尽如人意的结果或者压根没有结果:
/doc/_search
{
"query": {
"term": {
"t1": "Beautiful girl!"
}
}
}
如上面的查询,将不会有结果返回,因为索引w10
中的两篇文档在经过elasticsearch分析后没有一个分词是Beautiful girl!
,那此次查询结果为空也就好理解了。
所以,我们这里得到一个论证结果:不要使用term对类型是text的字段进行查询,要查询text类型的字段,请改用match查询。
学会了吗?那再来一个示例,你说一下结果是什么:
/doc/_search
{
"query": {
"term": {
"t1": "Beautiful"
}
}
}
答案是,没有结果返回!因为elasticsearch在对文档进行分析时,会经过小写!人家倒排索引上存的是小写的beautiful
,而我们查询的是大写的Beautiful
。
所以,要想有结果你这样:
/doc/_search
{
"query": {
"term": {
"t1": "beautiful"
}
}
}
那,term查询可以查询哪些类型的字段呢,例如elasticsearch会将keyword类型的字段当成一个token保存到倒排索引上,你可以将term和keyword结合使用。
最后,要想使用term查询多个精确的值怎么办?我只能说:亲,这里推荐卸载es呢!低调又不失尴尬的玩笑!
这里推荐使用terms
查询:
/doc/_search
{
"query": {
"terms": {
"t1": ["beautiful", "sexy"]
}
}
}
欢迎斧正,that's all see also: [官网:如何在Elasticsearch中使用模糊搜索](https://www.elastic.co/blog/found-fuzzy-search) | [elasticsearch模糊匹配max_expansions&min_similarity](https://stackoverflow.com/questions/7148615/elasticsearch-fuzzy-matching-max-expansions-min-similarity) | [elasticsearch 全文搜索 match_phrase_prefix 查询中的 max_expansions 该怎么用?](https://segmentfault.com/q/1010000017179306/a-1020000017196690)| [term query](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html#query-dsl-term-query)
#
我们之前学过几种查询方式了,但是结果顺序都是elasticsearch决定的。我们来给查询结果搞上我们定制的顺序。
#
1
{
"name":"顾老二",
"age":30,
"from": "gu",
"desc": "皮肤黑、武器长、性格直",
"tags": ["黑", "长", "直"]
}
PUT zhifou/doc/2
{
"name":"大娘子",
"age":18,
"from":"sheng",
"desc":"肤白貌美,娇憨可爱",
"tags":["白", "富","美"]
}
PUT zhifou/doc/3
{
"name":"龙套偏房",
"age":22,
"from":"gu",
"desc":"mmp,没怎么看,不知道怎么形容",
"tags":["造数据", "真","难"]
}
PUT zhifou/doc/4
{
"name":"石头",
"age":29,
"from":"gu",
"desc":"粗中有细,狐假虎威",
"tags":["粗", "大","猛"]
}
PUT zhifou/doc/5
{
"name":"魏行首",
"age":25,
"from":"广云台",
"desc":"仿佛兮若轻云之蔽月,飘飘兮若流风之回雪,mmp,最后竟然没有嫁给顾老二!",
"tags":["闭月","羞花"]
}
#
#
想到排序,出现在脑海中的无非就是升(正)序和降(倒)序。比如我们查询顾府都有哪些人,并根据age字段按照降序,并且,我只想看nmae
和age
字段:
/doc/_search
{
"query": {
"match": {
"from": "gu"
}
},
"sort": [
{
"age": {
"order": "desc"
}
}
]
}
上例,在条件查询的基础上,我们又通过sort
来做排序,根据age
字段排序,是降序呢还是升序,由order
字段控制,desc
是降序。
结果如下:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : null,
"hits" : [
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "1",
"_score" : null,
"_source" : {
"name" : "顾老二",
"age" : 30,
"from" : "gu",
"desc" : "皮肤黑、武器长、性格直",
"tags" : [
"黑",
"长",
"直"
]
},
"sort" : [
30
]
},
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "4",
"_score" : null,
"_source" : {
"name" : "石头",
"age" : 29,
"from" : "gu",
"desc" : "粗中有细,狐假虎威",
"tags" : [
"粗",
"大",
"猛"
]
},
"sort" : [
29
]
},
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "3",
"_score" : null,
"_source" : {
"name" : "龙套偏房",
"age" : 22,
"from" : "gu",
"desc" : "mmp,没怎么看,不知道怎么形容",
"tags" : [
"造数据",
"真",
"难"
]
},
"sort" : [
22
]
}
]
}
}
上例中,结果是以降序排列方式返回的。
#
那么想要升序怎么搞呢?
/doc/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"age": {
"order": "asc"
}
}
]
}
上例,想要以升序的方式排列,只需要将order
值换为asc
就可以了。
结果如下:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 5,
"max_score" : null,
"hits" : [
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "2",
"_score" : null,
"_source" : {
"name" : "大娘子",
"age" : 18,
"from" : "sheng",
"desc" : "肤白貌美,娇憨可爱",
"tags" : [
"白",
"富",
"美"
]
},
"sort" : [
18
]
},
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "3",
"_score" : null,
"_source" : {
"name" : "龙套偏房",
"age" : 22,
"from" : "gu",
"desc" : "mmp,没怎么看,不知道怎么形容",
"tags" : [
"造数据",
"真",
"难"
]
},
"sort" : [
22
]
},
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "5",
"_score" : null,
"_source" : {
"name" : "魏行首",
"age" : 25,
"from" : "广云台",
"desc" : "仿佛兮若轻云之蔽月,飘飘兮若流风之回雪,mmp,最后竟然没有嫁给顾老二!",
"tags" : [
"闭月",
"羞花"
]
},
"sort" : [
25
]
},
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "4",
"_score" : null,
"_source" : {
"name" : "石头",
"age" : 29,
"from" : "gu",
"desc" : "粗中有细,狐假虎威",
"tags" : [
"粗",
"大",
"猛"
]
},
"sort" : [
29
]
},
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "1",
"_score" : null,
"_source" : {
"name" : "顾老二",
"age" : 30,
"from" : "gu",
"desc" : "皮肤黑、武器长、性格直",
"tags" : [
"黑",
"长",
"直"
]
},
"sort" : [
30
]
}
]
}
}
上例,可以看到结果是以age
从小到大的顺序返回结果。
#
那么,你可能会问,除了age
,能不能以别的属性作为排序条件啊?来试试:
/chengyuan/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"name": {
"order": "asc"
}
}
]
}
上例,我们以name
属性来排序,来看结果:
{
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": [
{
"shard": 0,
"index": "zhifou",
"node": "wrtr435jSgi7_naKq2Y_zQ",
"reason": {
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
}
}
],
"caused_by": {
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead.",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
}
}
},
"status": 400
}
结果跟我们想象的不一样,报错了!
注意:在排序的过程中,只能使用可排序的属性进行排序。那么可以排序的属性有哪些呢?
- 数字
- 日期
其他的都不行!
欢迎斧正,that's all
#
随着数据量的 不断增大,查询结果也展示的越来越长,很多时候,我们仅是查询几条数据,不用全部显示出来。那又该怎么做呢?这里就要用到分页了。
#
1
{
"name":"顾老二",
"age":30,
"from": "gu",
"desc": "皮肤黑、武器长、性格直",
"tags": ["黑", "长", "直"]
}
PUT zhifou/doc/2
{
"name":"大娘子",
"age":18,
"from":"sheng",
"desc":"肤白貌美,娇憨可爱",
"tags":["白", "富","美"]
}
PUT zhifou/doc/3
{
"name":"龙套偏房",
"age":22,
"from":"gu",
"desc":"mmp,没怎么看,不知道怎么形容",
"tags":["造数据", "真","难"]
}
PUT zhifou/doc/4
{
"name":"石头",
"age":29,
"from":"gu",
"desc":"粗中有细,狐假虎威",
"tags":["粗", "大","猛"]
}
PUT zhifou/doc/5
{
"name":"魏行首",
"age":25,
"from":"广云台",
"desc":"仿佛兮若轻云之蔽月,飘飘兮若流风之回雪,mmp,最后竟然没有嫁给顾老二!",
"tags":["闭月","羞花"]
}
#
我们来看看elasticsearch是怎么将结果分页的:
/doc/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"age": {
"order": "desc"
}
}
],
"from": 2,
"size": 1
}
上例,首先以age
降序排序,查询所有。并且在查询的时候,添加两个属性from
和size
来控制查询结果集的数据条数。
- from:从哪开始查
- size:返回几条结果
如上例的结果:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 5,
"max_score" : null,
"hits" : [
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "5",
"_score" : null,
"_source" : {
"name" : "魏行首",
"age" : 25,
"from" : "广云台",
"desc" : "仿佛兮若轻云之蔽月,飘飘兮若流风之回雪,mmp,最后竟然没有嫁给顾老二!",
"tags" : [
"闭月",
"羞花"
]
},
"sort" : [
25
]
}
]
}
}
上例中,在返回的结果集中,从第2条开始,返回1条数据。
那如果想要从第2条开始,返回2条结果怎么做呢?
/doc/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"age": {
"order": "desc"
}
}
],
"from": 2,
"size": 2
}
上例中,我们指定from
为2,意为从第2条开始返回,返回多少呢?size
意为2条。
还可以这样:
/doc/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"age": {
"order": "desc"
}
}
],
"from": 4,
"size": 2
}
上例中,从第4条开始返回2条数据。
结果如下:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 5,
"max_score" : null,
"hits" : [
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "2",
"_score" : null,
"_source" : {
"name" : "大娘子",
"age" : 18,
"from" : "sheng",
"desc" : "肤白貌美,娇憨可爱",
"tags" : [
"白",
"富",
"美"
]
},
"sort" : [
18
]
}
]
}
}
上例中仅有一条数据,那是为啥呢?因为我们现在只有5条数据,从第4条开始查询,就只有1条符合条件,所以,就返回了1条数据。
学到这里,我们也可以看到,我们的查询条件越来越多,开始仅是简单查询,慢慢增加条件查询,增加排序,对返回结果进行限制。所以,我们可以说:对于elasticsearch
来说,所有的条件都是可插拔的,彼此之间用,
分割。比如说,我们在查询中,仅对返回结果进行限制:
/doc/_search
{
"query": {
"match_all": {}
},
"from": 4,
"size": 2
}
上例中,在所有的返回结果中,结果从4开始返回2条数据。
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 5,
"max_score" : 1.0,
"hits" : [
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"name" : "龙套偏房",
"age" : 22,
"from" : "gu",
"desc" : "mmp,没怎么看,不知道怎么形容",
"tags" : [
"造数据",
"真",
"难"
]
}
}
]
}
}
但我们只有1条符合条件的数据。
欢迎斧正,that's all
前言
布尔查询是最常用的组合查询,根据子查询的规则,只有当文档满足所有子查询条件时,elasticsearch引擎才将结果返回。布尔查询支持的子查询条件共4中:
- must(and)
- should(or)
- must_not(not)
- filter
下面我们来看看每个子查询条件都是怎么玩的。
准备数据
1
{
"name":"顾老二",
"age":30,
"from": "gu",
"desc": "皮肤黑、武器长、性格直",
"tags": ["黑", "长", "直"]
}
PUT zhifou/doc/2
{
"name":"大娘子",
"age":18,
"from":"sheng",
"desc":"肤白貌美,娇憨可爱",
"tags":["白", "富","美"]
}
PUT zhifou/doc/3
{
"name":"龙套偏房",
"age":22,
"from":"gu",
"desc":"mmp,没怎么看,不知道怎么形容",
"tags":["造数据", "真","难"]
}
PUT zhifou/doc/4
{
"name":"石头",
"age":29,
"from":"gu",
"desc":"粗中有细,狐假虎威",
"tags":["粗", "大","猛"]
}
PUT zhifou/doc/5
{
"name":"魏行首",
"age":25,
"from":"广云台",
"desc":"仿佛兮若轻云之蔽月,飘飘兮若流风之回雪,mmp,最后竟然没有嫁给顾老二!",
"tags":["闭月","羞花"]
}
must
现在,我们用布尔查询所有from
属性为gu
的数据:
/doc/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"from": "gu"
}
}
]
}
}
}
上例中,我们通过在bool
属性(字段)内使用must
来作为查询条件,那么条件是什么呢?条件同样被match
包围,就是from
为gu
的所有数据。
这里需要注意的是must
字段对应的是个列表,也就是说可以有多个并列的查询条件,一个文档满足各个子条件后才最终返回。
结果如下:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 0.6931472,
"hits" : [
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "4",
"_score" : 0.6931472,
"_source" : {
"name" : "石头",
"age" : 29,
"from" : "gu",
"desc" : "粗中有细,狐假虎威",
"tags" : [
"粗",
"大",
"猛"
]
}
},
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "1",
"_score" : 0.2876821,
"_source" : {
"name" : "顾老二",
"age" : 30,
"from" : "gu",
"desc" : "皮肤黑、武器长、性格直",
"tags" : [
"黑",
"长",
"直"
]
}
},
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "3",
"_score" : 0.2876821,
"_source" : {
"name" : "龙套偏房",
"age" : 22,
"from" : "gu",
"desc" : "mmp,没怎么看,不知道怎么形容",
"tags" : [
"造数据",
"真",
"难"
]
}
}
]
}
}
上例中,可以看到,所有from
属性为gu
的数据查询出来了。
那么,我们想要查询from
为gu
,并且age
为30
的数据怎么搞呢?
/doc/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"from": "gu"
}
},
{
"match": {
"age": 30
}
}
]
}
}
}
上例中,在must
列表中,在增加一个age
为30
的条件。
结果如下:
{
"took" : 8,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.287682,
"hits" : [
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "1",
"_score" : 1.287682,
"_source" : {
"name" : "顾老二",
"age" : 30,
"from" : "gu",
"desc" : "皮肤黑、武器长、性格直",
"tags" : [
"黑",
"长",
"直"
]
}
}
]
}
}
上例,符合条件的数据被成功查询出来了。
注意:现在你可能慢慢发现一个现象,所有属性值为列表的,都可以实现多个条件并列存在
should
那么,如果要查询只要是from
为gu
或者tags
为闭月
的数据怎么搞?
/doc/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"from": "gu"
}
},
{
"match": {
"tags": "闭月"
}
}
]
}
}
}
上例中,或关系的不能用must
的了,而是要用should
,只要符合其中一个条件就返回。
结果如下:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 4,
"max_score" : 0.6931472,
"hits" : [
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "4",
"_score" : 0.6931472,
"_source" : {
"name" : "石头",
"age" : 29,
"from" : "gu",
"desc" : "粗中有细,狐假虎威",
"tags" : [
"粗",
"大",
"猛"
]
}
},
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "5",
"_score" : 0.5753642,
"_source" : {
"name" : "魏行首",
"age" : 25,
"from" : "广云台",
"desc" : "仿佛兮若轻云之蔽月,飘飘兮若流风之回雪,mmp,最后竟然没有嫁给顾老二!",
"tags" : [
"闭月",
"羞花"
]
}
},
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "1",
"_score" : 0.2876821,
"_source" : {
"name" : "顾老二",
"age" : 30,
"from" : "gu",
"desc" : "皮肤黑、武器长、性格直",
"tags" : [
"黑",
"长",
"直"
]
}
},
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "3",
"_score" : 0.2876821,
"_source" : {
"name" : "龙套偏房",
"age" : 22,
"from" : "gu",
"desc" : "mmp,没怎么看,不知道怎么形容",
"tags" : [
"造数据",
"真",
"难"
]
}
}
]
}
}
返回了所有符合条件的结果。
must_not
那么,如果我想要查询from
既不是gu
并且tags
也不是可爱
,还有age
不是18
的数据怎么办?
/doc/_search
{
"query": {
"bool": {
"must_not": [
{
"match": {
"from": "gu"
}
},
{
"match": {
"tags": "可爱"
}
},
{
"match": {
"age": 18
}
}
]
}
}
}
上例中,must
和should
都不能使用,而是使用must_not
,又在内增加了一个age
为18
的条件。
结果如下:
{
"took" : 9,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "5",
"_score" : 1.0,
"_source" : {
"name" : "魏行首",
"age" : 25,
"from" : "广云台",
"desc" : "仿佛兮若轻云之蔽月,飘飘兮若流风之回雪,mmp,最后竟然没有嫁给顾老二!",
"tags" : [
"闭月",
"羞花"
]
}
}
]
}
}
上例中,只有魏行首这一条数据,因为只有魏行首既不是顾家的人,标签没有可爱那一项,年龄也不等于18!
这里有点需要补充,条件中age
对应的18
你写成整形还是字符串都没啥……
filter
那么,如果要查询from
为gu
,age
大于25
的数据怎么查?
/doc/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"from": "gu"
}
}
],
"filter": {
"range": {
"age": {
"gt": 25
}
}
}
}
}
}
这里就用到了filter
条件过滤查询,过滤条件的范围用range
表示,gt
表示大于,大于多少呢?是25。
结果如下:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 0.6931472,
"hits" : [
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "4",
"_score" : 0.6931472,
"_source" : {
"name" : "石头",
"age" : 29,
"from" : "gu",
"desc" : "粗中有细,狐假虎威",
"tags" : [
"粗",
"大",
"猛"
]
}
},
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "1",
"_score" : 0.2876821,
"_source" : {
"name" : "顾老二",
"age" : 30,
"from" : "gu",
"desc" : "皮肤黑、武器长、性格直",
"tags" : [
"黑",
"长",
"直"
]
}
}
]
}
}
上例中,age
大于25
的条件都已经筛选出来了。
那么要查询from
是gu
,age
大于等于30
的数据呢?
/doc/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"from": "gu"
}
}
],
"filter": {
"range": {
"age": {
"gte": 30
}
}
}
}
}
}
上例中,大于等于用gte
表示。
结果如下:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.2876821,
"hits" : [
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "1",
"_score" : 0.2876821,
"_source" : {
"name" : "顾老二",
"age" : 30,
"from" : "gu",
"desc" : "皮肤黑、武器长、性格直",
"tags" : [
"黑",
"长",
"直"
]
}
}
]
}
}
那么,要查询age
小于25
的呢?
/doc/_search
{
"query": {
"bool": {
"filter": {
"range": {
"age": {
"lt": 25
}
}
}
}
}
}
上例中,小于用lt
表示,结果如下:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 0.0,
"hits" : [
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "2",
"_score" : 0.0,
"_source" : {
"name" : "大娘子",
"age" : 18,
"from" : "sheng",
"desc" : "肤白貌美,娇憨可爱",
"tags" : [
"白",
"富",
"美"
]
}
},
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "3",
"_score" : 0.0,
"_source" : {
"name" : "龙套偏房",
"age" : 22,
"from" : "gu",
"desc" : "mmp,没怎么看,不知道怎么形容",
"tags" : [
"造数据",
"真",
"难"
]
}
}
]
}
}
在查询一个age
小于等于18
的怎么办呢?
/doc/_search
{
"query": {
"bool": {
"filter": {
"range": {
"age": {
"lte": 18
}
}
}
}
}
}
上例中,小于等于用lte
表示。结果如下:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.0,
"hits" : [
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "2",
"_score" : 0.0,
"_source" : {
"name" : "大娘子",
"age" : 18,
"from" : "sheng",
"desc" : "肤白貌美,娇憨可爱",
"tags" : [
"白",
"富",
"美"
]
}
}
]
}
}
要查询from
是gu
,age
在25~30
之间的怎么查?
/doc/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"from": "gu"
}
}
],
"filter": {
"range": {
"age": {
"gte": 25,
"lte": 30
}
}
}
}
}
}
上例中,使用lte
和gte
来限定范围。结果如下:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 0.6931472,
"hits" : [
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "4",
"_score" : 0.6931472,
"_source" : {
"name" : "石头",
"age" : 29,
"from" : "gu",
"desc" : "粗中有细,狐假虎威",
"tags" : [
"粗",
"大",
"猛"
]
}
},
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "1",
"_score" : 0.2876821,
"_source" : {
"name" : "顾老二",
"age" : 30,
"from" : "gu",
"desc" : "皮肤黑、武器长、性格直",
"tags" : [
"黑",
"长",
"直"
]
}
}
]
}
}
那么,要查询from
是sheng
,age
小于等于25
的怎么查呢?其实结果,我们可能已经想到了,只有一条,因为只有盛家小六符合结果。
/doc/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"from": "sheng"
}
}
],
"filter": {
"range": {
"age": {
"lte": 25
}
}
}
}
}
}
结果果然不出洒家所料!
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.6931472,
"hits" : [
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "2",
"_score" : 0.6931472,
"_source" : {
"name" : "大娘子",
"age" : 18,
"from" : "sheng",
"desc" : "肤白貌美,娇憨可爱",
"tags" : [
"白",
"富",
"美"
]
}
}
]
}
}
但是,洒家手一抖,将must
换为should
看看会发生什么?
/doc/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"from": "sheng"
}
}
],
"filter": {
"range": {
"age": {
"lte": 25
}
}
}
}
}
}
结果如下:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 0.6931472,
"hits" : [
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "2",
"_score" : 0.6931472,
"_source" : {
"name" : "大娘子",
"age" : 18,
"from" : "sheng",
"desc" : "肤白貌美,娇憨可爱",
"tags" : [
"白",
"富",
"美"
]
}
},
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "5",
"_score" : 0.0,
"_source" : {
"name" : "魏行首",
"age" : 25,
"from" : "广云台",
"desc" : "仿佛兮若轻云之蔽月,飘飘兮若流风之回雪,mmp,最后竟然没有嫁给顾老二!",
"tags" : [
"闭月",
"羞花"
]
}
},
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "3",
"_score" : 0.0,
"_source" : {
"name" : "龙套偏房",
"age" : 22,
"from" : "gu",
"desc" : "mmp,没怎么看,不知道怎么形容",
"tags" : [
"造数据",
"真",
"难"
]
}
}
]
}
}
结果有点出乎意料,因为龙套偏房和魏行首不属于盛家,但也被查询出来了。那你要问了,怎么肥四?小老弟!这是因为在查询过程中,优先经过filter
过滤,因为should
是或关系,龙套偏房和魏行首的年龄符合了filter
过滤条件,也就被放行了!所以,如果在filter
过滤条件中使用should
的话,结果可能不会尽如人意!建议使用must
代替。
注意:filter
工作于bool
查询内。比如我们将刚才的查询条件改一下,把filter
从bool
中挪出来。
/doc/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"from": "sheng"
}
}
]
},
"filter": {
"range": {
"age": {
"lte": 25
}
}
}
}
}
如上例所示,我们将filter
与bool
平级,看查询结果:
{
"error": {
"root_cause": [
{
"type": "parsing_exception",
"reason": "[bool] malformed query, expected [END_OBJECT] but found [FIELD_NAME]",
"line": 12,
"col": 5
}
],
"type": "parsing_exception",
"reason": "[bool] malformed query, expected [END_OBJECT] but found [FIELD_NAME]",
"line": 12,
"col": 5
},
"status": 400
}
结果报错了!所以,filter
工作位置很重要。
小结:
-
must
:与关系,相当于关系型数据库中的and
。 -
should
:或关系,相当于关系型数据库中的or
。 -
must_not
:非关系,相当于关系型数据库中的not
。 -
filter
:过滤条件。 -
range
:条件筛选范围。 -
gt
:大于,相当于关系型数据库中的>
。 -
gte
:大于等于,相当于关系型数据库中的>=
。 -
lt
:小于,相当于关系型数据库中的<
。 -
lte
:小于等于,相当于关系型数据库中的<=
。
#
在未来,一篇文档可能有很多(是的,很多!不要被我们的示例这仨俩字段所迷惑)的字段,每次查询都默认给我们返回全部,在数据量很大的时候,是的,比如我只想查姑娘的手机号,你一并给我个喜好啊、三围什么的算什么?是要告诉洒家,hi,小老弟,要撩妹么?
所以,我们对结果做一些过滤,清清白白的告诉elasticsearch,小老弟,我只是查!水!表!
#
1
{
"name":"顾老二",
"age":30,
"from": "gu",
"desc": "皮肤黑、武器长、性格直",
"tags": ["黑", "长", "直"]
}
#
现在,在所有的结果中,我只需要查看name
和age
两个属性,其他的不要怎么办?
/doc/_search
{
"query": {
"match": {
"name": "顾老二"
}
},
"_source": ["name", "age"]
}
如上例所示,在查询中,通过_source
来控制仅返回name
和age
属性。
{
"took" : 8,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.8630463,
"hits" : [
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "1",
"_score" : 0.8630463,
"_source" : {
"name" : "顾老二",
"age" : 30
}
}
]
}
}
在数据量很大的时候,我们需要什么字段,就返回什么字段就好了,提高查询效率。整个三围啥的可low了,有本事上图!
欢迎斧正,that's all
前言
如果返回的结果集中很多符合条件的结果,那怎么能一眼就能看到我们想要的那个结果呢?比如下面网站所示的那样,我们搜索elasticsearch
,在结果集中,将所有elasticsearch
高亮显示?
如上图我们搜索思否一样。
我们该怎么做呢?
准备数据
4
{
"name":"石头",
"age":29,
"from":"gu",
"desc":"粗中有细,狐假虎威",
"tags":["粗", "大","猛"]
}
默认高亮显示
我们来查询:
/doc/_search
{
"query": {
"match": {
"name": "石头"
}
},
"highlight": {
"fields": {
"name": {}
}
}
}
上例中,我们使用highlight
属性来实现结果高亮显示,需要的字段名称添加到fields
内即可,elasticsearch
会自动帮我们实现高亮。
结果如下:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.5098256,
"hits" : [
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "4",
"_score" : 1.5098256,
"_source" : {
"name" : "石头",
"age" : 29,
"from" : "gu",
"desc" : "粗中有细,狐假虎威",
"tags" : [
"粗",
"大",
"猛"
]
},
"highlight" : {
"name" : [
"<em>石</em><em>头</em>"
]
}
}
]
}
}
上例中,elasticsearch
会自动将检索结果用标签包裹起来,用于在页面中渲染。
自定义高亮显示
但是,你可能会问,我不想用em
标签, 我这么牛逼,应该用个b
标签啊!好的,elasticsearch
同样考虑到你很牛逼,所以,我们可以自定义标签。
/chengyuan/_search
{
"query": {
"match": {
"from": "gu"
}
},
"highlight": {
"pre_tags": "<b class='key' style='color:red'>",
"post_tags": "</b>",
"fields": {
"from": {}
}
}
}
上例中,在highlight
中,pre_tags
用来实现我们的自定义标签的前半部分,在这里,我们也可以为自定义的标签添加属性和样式。post_tags
实现标签的后半部分,组成一个完整的标签。至于标签中的内容,则还是交给fields
来完成。
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.5753642,
"hits" : [
{
"_index" : "zhifou",
"_type" : "chengyuan",
"_id" : "1",
"_score" : 0.5753642,
"_source" : {
"name" : "老二",
"age" : 30,
"sex" : "male",
"birth" : "1070-10-11",
"from" : "gu",
"desc" : "皮肤黑,武器长,性格直",
"tags" : [
"黑",
"长",
"直"
]
},
"highlight" : {
"name" : [
"<b class='key' style='color:red'>老</b><b class='key' style='color:red'>二</b>"
]
}
}
]
}
}
需要注意的是:自定义标签中属性或样式中的逗号一律用英文状态的单引号表示,应该与外部elasticsearch
语法的双引号区分开。
至此,基本的查询,我们已经能胜任绝大数的应用场景。接下来我们来看一下更多关于结果处理的函数。
欢迎斧正,that's all
#
聚合函数大家都不陌生,elasticsearch中也没玩出新花样,所以,这一章相对简单,只需要记得:
- avg
- max
- min
- sum
以及各自的用法即可。先来看求平均。
#
1
{
"name":"顾老二",
"age":30,
"from": "gu",
"desc": "皮肤黑、武器长、性格直",
"tags": ["黑", "长", "直"]
}
PUT zhifou/doc/2
{
"name":"大娘子",
"age":18,
"from":"sheng",
"desc":"肤白貌美,娇憨可爱",
"tags":["白", "富","美"]
}
PUT zhifou/doc/3
{
"name":"龙套偏房",
"age":22,
"from":"gu",
"desc":"mmp,没怎么看,不知道怎么形容",
"tags":["造数据", "真","难"]
}
PUT zhifou/doc/4
{
"name":"石头",
"age":29,
"from":"gu",
"desc":"粗中有细,狐假虎威",
"tags":["粗", "大","猛"]
}
PUT zhifou/doc/5
{
"name":"魏行首",
"age":25,
"from":"广云台",
"desc":"仿佛兮若轻云之蔽月,飘飘兮若流风之回雪,mmp,最后竟然没有嫁给顾老二!",
"tags":["闭月","羞花"]
}
#
现在的需求是查询from
是gu
的人的平均年龄。
/doc/_search
{
"query": {
"match": {
"from": "gu"
}
},
"aggs": {
"my_avg": {
"avg": {
"field": "age"
}
}
},
"_source": ["name", "age"]
}
上例中,首先匹配查询from
是gu
的数据。在此基础上做查询平均值的操作,这里就用到了聚合函数,其语法被封装在aggs
中,而my_avg
则是为查询结果起个别名,封装了计算出的平均值。那么,要以什么属性作为条件呢?是age
年龄,查年龄的什么呢?是avg
,查平均年龄。
返回结果如下:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 0.6931472,
"hits" : [
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "4",
"_score" : 0.6931472,
"_source" : {
"name" : "石头",
"age" : 29
}
},
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "1",
"_score" : 0.2876821,
"_source" : {
"name" : "顾老二",
"age" : 30
}
},
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "3",
"_score" : 0.2876821,
"_source" : {
"name" : "龙套偏房",
"age" : 22
}
}
]
},
"aggregations" : {
"my_avg" : {
"value" : 27.0
}
}
}
上例中,在查询结果的最后是平均值信息,可以看到是27岁。
虽然我们已经使用_source
对字段做了过滤,但是还不够。我不想看都有哪些数据,只想看平均值怎么办?别忘了size
!
/doc/_search
{
"query": {
"match": {
"from": "gu"
}
},
"aggs": {
"my_avg": {
"avg": {
"field": "age"
}
}
},
"size": 0,
"_source": ["name", "age"]
}
上例中,只需要在原来的查询基础上,增加一个size
就可以了,输出几条结果,我们写上0,就是输出0条查询结果。
查询结果如下:
{
"took" : 8,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 0.0,
"hits" : [ ]
},
"aggregations" : {
"my_avg" : {
"value" : 27.0
}
}
}
查询结果中,我们看hits
下的total
值是3,说明有三条符合结果的数据。最后面返回平均值是27。
#
那怎么查最大值呢?
/doc/_search
{
"query": {
"match": {
"from": "gu"
}
},
"aggs": {
"my_max": {
"max": {
"field": "age"
}
}
},
"size": 0
}
上例中,只需要在查询条件中将avg
替换成max
即可。
返回结果如下:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 0.0,
"hits" : [ ]
},
"aggregations" : {
"my_max" : {
"value" : 30.0
}
}
}
在返回的结果中,可以看到年龄最大的是30岁。
#
那怎么查最小值呢?
/doc/_search
{
"query": {
"match": {
"from": "gu"
}
},
"aggs": {
"my_min": {
"min": {
"field": "age"
}
}
},
"size": 0
}
最小值则用min
表示。
返回结果如下:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 0.0,
"hits" : [ ]
},
"aggregations" : {
"my_min" : {
"value" : 22.0
}
}
}
返回结果中,年龄最小的是22岁。
#
那么,要是想知道它们的年龄总和是多少怎么办呢?
/doc/_search
{
"query": {
"match": {
"from": "gu"
}
},
"aggs": {
"my_sum": {
"sum": {
"field": "age"
}
}
},
"size": 0
}
上例中,求和用sum
表示。
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 0.0,
"hits" : [ ]
},
"aggregations" : {
"my_sum" : {
"value" : 81.0
}
}
}
从返回的结果可以发现,年龄总和是81岁。
#
现在我想要查询所有人的年龄段,并且按照15~20,20~25,25~30
分组,并且算出每组的平均年龄。
分析需求,首先我们应该先把分组做出来。
/doc/_search
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"age_group": {
"range": {
"field": "age",
"ranges": [
{
"from": 15,
"to": 20
},
{
"from": 20,
"to": 25
},
{
"from": 25,
"to": 30
}
]
}
}
}
}
上例中,在aggs
的自定义别名age_group
中,使用range
来做分组,field
是以age
为分组,分组使用ranges
来做,from
和to
是范围,我们根据需求做出三组。
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 5,
"max_score" : 0.0,
"hits" : [ ]
},
"aggregations" : {
"age_group" : {
"buckets" : [
{
"key" : "15.0-20.0",
"from" : 15.0,
"to" : 20.0,
"doc_count" : 1
},
{
"key" : "20.0-25.0",
"from" : 20.0,
"to" : 25.0,
"doc_count" : 1
},
{
"key" : "25.0-30.0",
"from" : 25.0,
"to" : 30.0,
"doc_count" : 2
}
]
}
}
}
返回的结果中可以看到,已经拿到了三个分组。doc_count
为该组内有几条数据,此次共分为三组,查询出4条内容。还有一条数据的age
属性值是30
,不在分组的范围内!
那么接下来,我们就要对每个小组内的数据做平均年龄处理。
/doc/_search
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"age_group": {
"range": {
"field": "age",
"ranges": [
{
"from": 15,
"to": 20
},
{
"from": 20,
"to": 25
},
{
"from": 25,
"to": 30
}
]
},
"aggs": {
"my_avg": {
"avg": {
"field": "age"
}
}
}
}
}
}
上例中,在分组下面,我们使用aggs
对age
做平均数处理,这样就可以了。
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 5,
"max_score" : 0.0,
"hits" : [ ]
},
"aggregations" : {
"age_group" : {
"buckets" : [
{
"key" : "15.0-20.0",
"from" : 15.0,
"to" : 20.0,
"doc_count" : 1,
"my_avg" : {
"value" : 18.0
}
},
{
"key" : "20.0-25.0",
"from" : 20.0,
"to" : 25.0,
"doc_count" : 1,
"my_avg" : {
"value" : 22.0
}
},
{
"key" : "25.0-30.0",
"from" : 25.0,
"to" : 30.0,
"doc_count" : 2,
"my_avg" : {
"value" : 27.0
}
}
]
}
}
}
在结果中,我们可以清晰的看到每组的平均年龄(my_avg
的value
中)。
注意:聚合函数的使用,一定是先查出结果,然后对结果使用聚合函数做处理
小结:
- avg:求平均
- max:最大值
- min:最小值
- sum:求和
欢迎斧正,that's all