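The match family of queries

1. match query

(1) Using a GET request

A URL may contain only ASCII letters, digits, and a limited set of punctuation characters; any other character must be percent-encoded. (For a detailed explanation see https://www.cnblogs.com/xiaojiulin/p/10598658.html)

So I first URL-encoded the string 张三 (as UTF-8) to get %e5%bc%a0%e4%b8%89, and then issued the GET request with curl: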

curl -XGET "http://192.168.236.131:9200/trade_info/csrcb/_search?q=cust_name_s.keyword:%E5%BC%A0%E4%B8%89"

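To fit more of the data into one screenshot, I did not add the pretty parameter to format the JSON result.

(2) A POST request with a match query in the body returns the same result. Note that match targets a single field only; it cannot combine several fields in one clause.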
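The two request styles above can be sketched in Python. This is only an illustration: the helper names get_search_url and match_body are hypothetical, and the host and index path are copied from the curl command in this section. It builds the percent-encoded GET URL and the JSON body that a POST request would carry, without sending anything.

```python
import json
from urllib.parse import quote

HOST = "http://192.168.236.131:9200"      # taken from the curl command above
SEARCH_PATH = "/trade_info/csrcb/_search"  # index/type path from the same command


def get_search_url(field, value):
    """Build the URI-search URL, percent-encoding the value as UTF-8."""
    return f"{HOST}{SEARCH_PATH}?q={field}:{quote(value)}"


def match_body(field, value):
    """Build the JSON body of a single-field match query for a POST request."""
    return json.dumps({"query": {"match": {field: value}}}, ensure_ascii=False)


print(get_search_url("cust_name_s.keyword", "张三"))
print(match_body("cust_name_s.keyword", "张三"))
```

The body string is what would follow -d in a curl -XPOST call with the Content-Type:application/json header.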

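2. match_phrase

When match is applied to Chinese text, tokenization gets in the way. Suppose no Chinese analyzer is installed and we search the txn_remark_s field for 红包 (red envelope).

With a match query, every document containing either of the characters "红" or "包" is returned, so records such as "红色包包购买" and "拼夕夕购物包邮" show up as well.

Without a Chinese analyzer, match_phrase gives a result that resembles proper word segmentation, although it is not real segmentation: it requires the tokens to appear next to each other and in order. match_phrase has a slop parameter, which defaults to 0 when not set, so "红色包包购买" and "拼夕夕购物包邮" are no longer returned.

Next, let's look at what slop does in match_phrase.

A non-zero slop allows other tokens between the query terms, roughly like the regular expression 红.{0,n}包 for slop n (each Chinese character being one token here). When slop is not set, the allowed gap is 0.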
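The effect of slop on the two example records can be sketched as follows. This is a simplified model, not how ES implements it: the function name in_order_phrase_match is hypothetical, each character is assumed to be one token, and only in-order matches are considered (real slop also permits reordered terms at extra cost).

```python
def in_order_phrase_match(tokens, terms, slop=0):
    """Return True if terms occur in order in tokens with at most
    `slop` extra positions (in total) between consecutive terms."""

    def search(term_idx, prev_pos, used):
        if term_idx == len(terms):
            return True
        for pos, tok in enumerate(tokens):
            if tok != terms[term_idx]:
                continue
            if prev_pos is None:
                gap = 0                    # first term may start anywhere
            elif pos > prev_pos:
                gap = pos - prev_pos - 1   # tokens skipped since the previous term
            else:
                continue                   # out of order: ignored in this sketch
            if used + gap <= slop and search(term_idx + 1, pos, used + gap):
                return True
        return False

    return search(0, None, 0)


query_terms = list("红包")
print(in_order_phrase_match(list("红色包包购买"), query_terms, slop=0))
print(in_order_phrase_match(list("红色包包购买"), query_terms, slop=1))
print(in_order_phrase_match(list("拼夕夕购物包邮"), query_terms, slop=5))
```

In this model "红色包包购买" is rejected with slop 0 and accepted with slop 1, while "拼夕夕购物包邮" never matches because it contains no 红.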

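3. match_phrase_prefix (leftmost-prefix query, i.e. "starts with")

Prefix queries are expensive, however. To bound the cost, use the max_expansions parameter, which caps how many terms the final prefix may expand to; if not specified it defaults to 50. For example, a query with this limit: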

{
    "query":{
        "match_phrase_prefix":{
            "txn_remark_s":{
                "query":"you are bea",
                "max_expansions":1
            }
        }
    }
}

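To explain: this does not match only text that begins with "you are bea". It works essentially like match_phrase, except that the last word of the query string is treated as a prefix, so it behaves like the phrase you are bea*. The default slop here is also 0.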
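To see why a small max_expansions can hide expected hits, here is a rough sketch of the prefix expansion step. It assumes, as a simplification, that candidate terms are taken from the index in sorted order and cut off at max_expansions; the function expand_prefix and the sample terms are made up for illustration.

```python
def expand_prefix(term_dictionary, prefix, max_expansions=50):
    """Return up to max_expansions indexed terms starting with prefix, in sorted order."""
    matches = [t for t in sorted(term_dictionary) if t.startswith(prefix)]
    return matches[:max_expansions]


# Hypothetical indexed terms for the txn_remark_s field.
terms = {"bean", "beautiful", "beauty", "bear", "you", "are"}

print(expand_prefix(terms, "bea", max_expansions=1))
print(expand_prefix(terms, "bea"))
```

Under this model, max_expansions set to 1 lets "bea" stand for a single term only, so a document containing "you are beautiful" could be missed even though it starts with the typed prefix.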

4. multi_match (multi-field query)

multi_match searches for the same keyword in several fields.

multi_match can also act as match_phrase or match_phrase_prefix; you only need to set the type parameter:

{
    "query": {
        "multi_match": {
            "query": "bean",
            ## "type": "phrase_prefix",  ## uncomment to behave like the leftmost-prefix query (match_phrase_prefix)
            ## "type": "phrase",         ## uncomment to behave like the phrase query (match_phrase)
            "fields": [
                "remark1_s",
                "remark2_s"
            ]
        }
    }
}

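type can also be cross_fields. This type splits the query string into individual terms and then looks for each term across all the listed fields. By default (operator or), a document is returned as soon as any term matches in any field.

For example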

{
    "multi_match": {
        "query": "beauty bean",
        "type": "cross_fields",
        "fields": ["remark1_s", "remark2_s"],
        "operator": "and"
    }
}

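The query string is split into the two terms beauty and bean. With operator set to and, remark1_s or remark2_s must contain beauty, and remark1_s or remark2_s must also contain bean.

With operator set to or, remark1_s or remark2_s must contain beauty, or remark1_s or remark2_s must contain bean. In other words, a document is returned as long as either field contains beauty or bean.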
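The and/or behaviour described above can be modelled in a few lines. This is a sketch only: cross_fields_match is a hypothetical helper, whitespace tokenization is an assumption, and scoring is ignored entirely.

```python
def cross_fields_match(doc, query, fields, operator="or"):
    """Simplified cross_fields: test each query term against all listed
    fields as if they were one combined field."""
    terms = query.lower().split()
    pooled = set()
    for field in fields:
        # Pool the tokens of every listed field (whitespace split is an assumption).
        pooled.update(str(doc.get(field, "")).lower().split())
    hits = [term in pooled for term in terms]
    return all(hits) if operator == "and" else any(hits)


fields = ["remark1_s", "remark2_s"]
doc_a = {"remark1_s": "beauty salon", "remark2_s": "coffee bean"}
doc_b = {"remark1_s": "beauty salon", "remark2_s": "tea"}

print(cross_fields_match(doc_a, "beauty bean", fields, operator="and"))
print(cross_fields_match(doc_b, "beauty bean", fields, operator="and"))
print(cross_fields_match(doc_b, "beauty bean", fields, operator="or"))
```

doc_a satisfies and because the two terms are found in different fields; doc_b only satisfies or.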

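Sorting

Sorting in ES is simple and resembles ORDER BY in SQL. Taking the relational database MySQL as an example, sorting by the transaction amount field txn_amt_d in ascending order is order by txn_amt_d asc, and in descending order it is order by txn_amt_d desc. ES uses the same asc and desc keywords.

For example, match documents whose cust_name_s field is 张三 and sort them by txn_amt_d in descending order: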

curl -XPOST "http://192.168.236.131:9200/trade_info/csrcb/_search" -H "Content-Type:application/json" -d '{
    "query": {
        "match": {
            "cust_name_s": "张三"
        }
    },
    "sort": [{
        "txn_amt_d": {
            "order": "desc"
        }
    }]
} '

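Note that the value of sort is a list, which means you can sort on several fields. For the order key, desc means descending and asc means ascending. However, ES cannot sort on an analyzed (text) field, so to sort by txn_date_s in descending order, for instance, sort must specify txn_date_s.keyword. Numeric and date fields can be sorted directly.

I ran into one problem in testing here: sorting on the id_s field did not give the result I wanted (the values were ordered by their first character). This is probably because string values are compared character by character, so "10" sorts before "9"; corrections or further explanation are welcome.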
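The ordering by first character observed with id_s is consistent with plain string comparison. The sketch below uses made-up id values (they are not taken from the index) to contrast string ordering with numeric ordering; whether this is the actual cause depends on how id_s is mapped.

```python
# Hypothetical id values stored as strings.
ids = ["9", "10", "2", "100"]

# String comparison goes character by character, so "10" sorts before "2".
as_strings = sorted(ids)

# Numeric comparison uses the integer value instead.
as_numbers = sorted(ids, key=int)

print(as_strings)
print(as_numbers)
```

If numeric order is what is wanted, one common approach is to index the id in a numeric field and sort on that field.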

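Pagination

Pagination in ES is also simple: add from and size to the POST body. from is the zero-based offset of the first hit to return (the first record is 0, as in MySQL), and size is the number of hits to return from that offset. This is similar to limit in MySQL.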

curl -XPOST "http://192.168.236.131:9200/trade_info/csrcb/_search" -H "Content-Type:application/json" -d '{
    "query": {
        "match": {
            "cust_name_s": "rebort"
        }
    },
    "sort": [{
        "txn_date_s.keyword": {
            "order": "desc"
        }
    }],
    "from": 3,
    "size": 3
} '
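Application code usually thinks in page numbers rather than offsets. A minimal sketch of the conversion, assuming 1-based page numbers (page_params is a hypothetical helper name):

```python
def page_params(page, page_size):
    """Translate a 1-based page number into the from/size pair used in the search body."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    return {"from": (page - 1) * page_size, "size": page_size}


# Page 2 with 3 hits per page skips the first 3 hits, as in the request above.
print(page_params(2, 3))
```

The returned dict can be merged into the query body before it is serialized to JSON.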

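bool queries (must and should)

1. must (the value of must is a list, so it can hold several parallel conditions; a document is returned only when it satisfies every one of them, like and in a SQL where clause)

For example, find documents whose cust_name_s is rebort and whose txn_type_s is 信用卡 (credit card), querying the non-analyzed fields. The equivalent SQL is

select * from trade_info.csrcb where cust_name_s = 'rebort' and txn_type_s = '信用卡';

The ES query is as follows. (Only match has been covered so far; for = it is better to use term for an exact match, since match is a full-text query. Once term has been introduced below, = conditions will use term.)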

curl -XPOST "http://192.168.236.131:9200/trade_info/csrcb/_search" -H "Content-Type:application/json" -d '{
    "query": {
        "bool": {
            "must": [
                {"match": {"cust_name_s.keyword": "rebort"}},
                {"match": {"txn_type_s.keyword": "信用卡"}}
            ]
        }
    }
} '

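2. should (a document only needs to satisfy one of the conditions, like or in a SQL where clause)

For example, find documents whose cust_name_s is rebort or whose txn_type_s is 信用卡 (querying the non-analyzed fields). Only the id_s, cust_name_s, txn_amt_d, and txn_date_s fields need to be shown, so the response is restricted to them with _source, which keeps the result smaller. The equivalent SQL is

select id_s, cust_name_s, txn_amt_d, txn_date_s from trade_info.csrcb where cust_name_s = 'rebort' or txn_type_s = '信用卡';

The ES query is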

curl -XPOST "http://192.168.236.131:9200/trade_info/csrcb/_search" -H "Content-Type:application/json" -d '{
    "query": {
        "bool": {
            "should": [
                {"match": {"cust_name_s.keyword": "rebort"}},
                {"match": {"txn_type_s.keyword": "信用卡"}}
            ]
        }
    },
    "_source": ["id_s", "cust_name_s", "txn_amt_d", "txn_date_s"]
} '

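3. must_not (like not ... and not ... in SQL)

For example, find documents whose cust_name_s is not rebort and whose txn_type_s is not 信用卡 (querying the non-analyzed fields). The equivalent SQL is

select * from trade_info.csrcb where not cust_name_s = 'rebort' and not txn_type_s = '信用卡';

The ES query is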

curl -XPOST "http://192.168.236.131:9200/trade_info/csrcb/_search" -H "Content-Type:application/json" -d '{
    "query": {
        "bool": {
            "must_not": [
                {"match": {"cust_name_s.keyword": "rebort"}},
                {"match": {"txn_type_s.keyword": "信用卡"}}
            ]
        }
    }
} '
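The way must, should, and must_not combine can be summarized with a small in-memory model. This is a sketch under simplifying assumptions: clauses are exact (field, value) comparisons, scoring is ignored, and bool_match is a hypothetical name rather than anything provided by ES.

```python
def bool_match(doc, must=(), should=(), must_not=()):
    """Combine exact-match (field, value) clauses the way a bool query does."""

    def hit(clause):
        field, value = clause
        return doc.get(field) == value

    if not all(hit(c) for c in must):   # every must clause has to match (and)
        return False
    if any(hit(c) for c in must_not):   # no must_not clause may match (not)
        return False
    # With no must clause, at least one should clause has to match (or).
    if should and not must:
        return any(hit(c) for c in should)
    return True


name = ("cust_name_s", "rebort")
card = ("txn_type_s", "信用卡")
doc = {"cust_name_s": "rebort", "txn_type_s": "借记卡"}

print(bool_match(doc, must=[name, card]))
print(bool_match(doc, should=[name, card]))
print(bool_match(doc, must_not=[name, card]))
```

For this sample document, only the should variant matches: the name fits but the transaction type does not.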

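4. filter (filters by condition; unlike must, filter does not compute a relevance score, so it is faster, and its results can be cached. Recommended.)

For example, find documents whose txn_amt_d lies in (100, 2000]: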

curl -XPOST "http://192.168.236.131:9200/trade_info/csrcb/_search" -H "Content-Type:application/json" -d '{
    "query": {
        "bool": {
            "filter": {
                "range": {
                    "txn_amt_d": {
                        "gt": 100,
                        "lte": 2000
                    }
                }
            }
        }
    }
} '

Ranges are expressed with range: gt means >, lt means <, gte means >=, and lte means <=.
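The four range keys map directly onto comparison operators. A minimal sketch of that mapping, where in_range is a hypothetical helper used only to check the interval boundaries:

```python
import operator

# Range keys and the comparison each one stands for.
RANGE_OPS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def in_range(value, **bounds):
    """Return True if value satisfies every bound, e.g. in_range(150, gt=100, lte=2000)."""
    return all(RANGE_OPS[name](value, bound) for name, bound in bounds.items())


# The interval (100, 2000] excludes 100 and includes 2000.
print(in_range(100, gt=100, lte=2000))
print(in_range(2000, gt=100, lte=2000))
```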

filter can be combined with bool to build complex multi-condition queries.

For example, given this SQL query:

select * from trade_info.csrcb where (txn_amt_d > 100 and txn_amt_d <= 1000) or (cust_name_s = 'rebort' and txn_date_s > '2021-02-21 00:00:00' and txn_date_s < '2021-02-23 00:00:00')

the corresponding ES query is:

curl -XPOST "http://192.168.236.131:9200/trade_info/csrcb/_search" -H "Content-Type:application/json" -d '{
    "query": {
        "constant_score": {
            "filter": {
                "bool": {
                    "should": [{
                        "range": {
                            "txn_amt_d": {
                                "gt": 100,
                                "lte": 1000
                            }
                        }
                    },
                    {
                        "bool": {
                            "must": [{
                                "match": {
                                    "cust_name_s.keyword": "rebort"
                                }
                            },
                            {
                                "range": {
                                    "txn_date_s.keyword": {
                                        "gt": "2021-02-21 00:00:00",
                                        "lt": "2021-02-23 00:00:00"
                                    }
                                }
                            }]
                        }
                    }]
                }
            }
        }
    }
} '

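Highlighting search results

By default ES wraps highlighted matches in em tags, but the highlight markup can be customized. In highlight, pre_tags supplies the opening part of the custom tag (attributes and styles can be added here), post_tags supplies the closing part, and together they form a complete tag. The fields to highlight are listed under fields.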

curl -XPOST "http://192.168.236.131:9200/trade_info/csrcb/_search" -H "Content-Type:application/json" -d '{
    "query": {
        "term": {
            "cust_name_s.keyword": "john"
        }
    },
    "highlight": {
        "pre_tags": ["<b class=\"key\" style=\"color:red\">"],
        "post_tags": ["</b>"],
        "fields": {
            "cust_name_s.keyword": {}
        }
    }
} '
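What pre_tags and post_tags do to a matched value can be imitated with a simple string substitution. This sketch only illustrates the resulting markup; the highlight function here is hypothetical and says nothing about how ES selects or fragments the highlighted text.

```python
import re


def highlight(text, term, pre_tag="<em>", post_tag="</em>"):
    """Wrap every occurrence of term in text with the given tags."""
    return re.sub(
        re.escape(term),
        lambda m: f"{pre_tag}{m.group(0)}{post_tag}",
        text,
    )


print(highlight("john", "john"))
print(highlight("john", "john", pre_tag="<b class='key' style='color:red'>", post_tag="</b>"))
```

The first call shows the default em wrapping, the second the custom tag pair used in the request above.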

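Exact-match queries

term: exact match on a single value

Use term for an exact match on a single value. The field queried with term must not be analyzed, whereas match performs full-text matching; choose according to what the business scenario requires.

Now compare the results of term against a non-analyzed field and against an analyzed field.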
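The difference between the two cases comes down to which tokens are stored in the index. The sketch below is a rough stand-in, not the real analyzer: analyze is a hypothetical function that lowercases ASCII words and emits one token per non-ASCII character, which approximates what happens to Chinese text when no Chinese analyzer is installed.

```python
def analyze(text):
    """Rough stand-in for analysis without Chinese segmentation."""
    tokens = []
    for word in text.lower().split():
        if word.isascii():
            tokens.append(word)   # keep ASCII words whole
        else:
            tokens.extend(word)   # one token per character
    return tokens


def term_hits(indexed_tokens, value):
    """A term query looks for the exact value among the indexed tokens."""
    return value in indexed_tokens


text_tokens = analyze("信用卡")   # analyzed field: three single-character tokens
keyword_tokens = ["信用卡"]       # keyword field: the whole value is one token

print(term_hits(text_tokens, "信用卡"))
print(term_hits(keyword_tokens, "信用卡"))
```

Under this model the term query for 信用卡 misses the analyzed field and hits the keyword field, which is why the examples in this section query the .keyword sub-field.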

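terms: exact match on multiple values

A terms query works like in in SQL; again, the field must not be analyzed.

For example, the SQL query

select * from trade_info.csrcb where txn_type_s in ('信用卡','借记卡','支付宝扫码支付');

corresponds to this ES query: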

curl -XPOST "http://192.168.236.131:9200/trade_info/csrcb/_search" -H "Content-Type:application/json" -d '{
    "query": {
        "terms": {
            "txn_type_s.keyword": ["信用卡", "借记卡", "支付宝扫码支付"]
        }
    }
} '