ElasticSearch

小德 2021-12-02 12:57:59


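ElasticSearch Search

1. Doug Cutting

An American engineer who became fascinated with search engines and created Lucene, a library written in Java that provides full-text search.

2. Two problems in big data: storage and computation

Lucene is an information retrieval toolkit shipped as a jar; it is not a complete search engine system.

3. ElasticSearch wraps and extends Lucene.

Abbreviated as es, it is an open-source, highly scalable, distributed full-text search engine.

Data analysis: elasticsearch + logstash + kibana, the ELK stack.

4. Uses: keyword highlighting, real-time search, search spelling correction, and so on

5. Differences between elasticsearch and solr

6. Installing ElasticSearch

Minimum requirement: JDK 1.8+

Official download site: https://www.elastic.co/cn/

Unzip it and it is ready to use.

Double-click the .bat file under bin to start it.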

Fix for: X-PACK IS NOT SUPPORTED AND MACHINE LEARNING IS NOT AVAILABLE FOR [WINDOWS-X86]

Taken literally, X-Pack is not supported on my current Windows system, so the setting xpack.ml.enabled: false has to be added to elasticsearch.yml.
elasticsearch.yml is in the \elasticsearch-6.3.0\config directory; open it and append this line at the end:

xpack.ml.enabled: false

Then visit http://127.0.0.1:9200

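7. Installing the head visual UI (requires Node.js)

Download: https://github.com/mobz/elasticsearch-head/

After downloading, open CMD in D:\ElasticSearch\elasticsearch-head-master

Run the command: npm run start

If this fails with:

'grunt' is not recognized as an internal or external command,
operable program or batch file.

install grunt first:

npm install -g grunt-cli
npm install grunt --save-dev

Cross-origin access:

elasticsearch.yml is in the \elasticsearch-6.3.0\config directory; open it and append these lines at the end:

http.cors.enabled: true
http.cors.allow-origin: "*"

An index is the equivalent of a database.

head is a tool for displaying data.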

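8. Installing kibana

Used to manage elasticsearch.

Official site: https://www.elastic.co/cn/kibana

The version must match the es version.

To switch the UI to Chinese: in kibana.yml under the config directory, add i18n.locale: "zh-CN" and restart kibana.

9. ES core concepts

es is document-oriented; the table below compares it with a relational database. Everything is JSON.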

Relational database	ElasticSearch
database	index (indices)
table	type (gradually being deprecated)
row	document
column (field)	field

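Physical design: even a single node is a cluster. Behind the scenes, es splits each index into multiple shards, and each shard can migrate between different servers in the cluster.

The default cluster name is: elasticsearch

Documents

A document is one record of our data.

es is document-oriented: the smallest unit for indexing and searching data is the document. Its properties:
1. Self-contained: a document holds both the fields and their values, that is, key:value pairs.
2. Hierarchical: a document can contain other documents.
3. Flexible structure: a document does not depend on a predefined schema, and fields can be added dynamically.

Types

If no field type is set, es guesses one automatically. Mapping types:

String types: text, keyword
Numeric types: long, integer, short, byte, double, float, half_float, scaled_float
Date type: date
Boolean type: boolean
Binary type: binary

Indices

An index is the equivalent of a database.

Inverted index

es uses Lucene's inverted index as its underlying structure, which is suited to fast full-text search. To build an inverted index, each document is split into individual terms, a sorted list of all distinct terms is created, and for each term the documents it appears in are listed. Matches are ranked by weight (score), and data unrelated to the query is skipped entirely, which improves efficiency.

One es index is made up of multiple Lucene indices (each shard is a Lucene index).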
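The construction described above can be sketched in a few lines of Java. This is a toy illustration, not Lucene's actual data structure: the class name InvertedIndexSketch, the whitespace tokenization, and the sample sentences are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy inverted index: only the idea, not Lucene's implementation.
class InvertedIndexSketch {
    // term -> sorted set of ids of the documents that contain it
    private final Map<String, Set<Integer>> postings = new TreeMap<>();

    // Split a document into lower-cased words and record where each word occurs.
    void add(int docId, String text) {
        for (String term : text.toLowerCase().split("\\s+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Look a term up directly; documents that lack it are never touched.
    List<Integer> search(String term) {
        return new ArrayList<>(postings.getOrDefault(term.toLowerCase(), Set.of()));
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "study every day");
        index.add(2, "good good study");
        index.add(3, "day day up");
        System.out.println(index.search("study")); // [1, 2]
        System.out.println(index.search("day"));   // [1, 3]
        System.out.println(index.search("es"));    // []
    }
}
```

The sorted map keeps the term list ordered, and a lookup goes straight from a term to its document ids without scanning any document text.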

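10. IK analyzer plugin

Tokenization: splitting a passage of Chinese (or other) text into individual keywords.

At search time, the query text is tokenized, the data in the database or index is tokenized, and the two are matched. By default, however, Chinese text is split into one token per character, which does not meet our needs; installing the Chinese analyzer ik solves this.

ik provides two tokenization algorithms, ik_smart and ik_max_word: ik_smart produces the coarsest split (fewest tokens) and ik_max_word the finest-grained split.

Installation

Download: https://github.com/medcl/elasticsearch-analysis-ik/releases

After downloading, unzip it into the plugins directory under the es directory and restart es.

List the loaded plugins:

D:\ElasticSearch\elasticsearch-7.15.2\bin>elasticsearch-plugin list

Usage

1. ik_smart: coarsest split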

GET _analyze
{
  "analyzer": "ik_smart",
  "text": [
    "中国共产党"
  ]
}

{
  "tokens" : [
    {
      "token" : "中国共产党",
      "start_offset" : 0,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 0
    }
  ]
}

2. ik_max_word: finest-grained split

GET _analyze
{
  "analyzer": "ik_max_word",
  "text": [
    "中国共产党"
  ]
}

{
  "tokens" : [
    {
      "token" : "中国共产党",
      "start_offset" : 0,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "中国",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "国共",
      "start_offset" : 1,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "共产党",
      "start_offset" : 2,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "共产",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 4
    },
    {
      "token" : "党",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "CN_CHAR",
      "position" : 5
    }
  ]
}
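To get a feel for what "finest-grained" means, here is a rough Java sketch that emits every substring found in a dictionary, ordered by start offset and then longest first. It is only a toy: the real IK plugin uses its own dictionaries and algorithm, and the class name MaxWordSketch and the six-word dictionary are assumptions picked so the output lines up with the response above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy dictionary segmenter; NOT the algorithm used by the IK plugin.
class MaxWordSketch {
    // Emit every dictionary word contained in the text, with [start,end) offsets.
    static List<String> maxWord(String text, Set<String> dict) {
        List<String> tokens = new ArrayList<>();
        for (int start = 0; start < text.length(); start++) {
            // Try the longest candidate first, then shorter ones.
            for (int end = text.length(); end > start; end--) {
                String candidate = text.substring(start, end);
                if (dict.contains(candidate)) {
                    tokens.add(candidate + "[" + start + "," + end + ")");
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Hypothetical mini-dictionary mirroring the tokens in the response above.
        Set<String> dict = Set.of("中国共产党", "中国", "国共", "共产党", "共产", "党");
        maxWord("中国共产党", dict).forEach(System.out::println);
    }
}
```

With this dictionary the sketch prints six tokens in the same order and with the same offsets as the ik_max_word response. A word missing from the dictionary never appears as a token, which is why custom words have to be added to the dictionary.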

Custom words have to be added to the analyzer's dictionary yourself.

Adding your own configuration to the ik analyzer

Edit IKAnalyzer.cfg.xml in the directory

D:\ElasticSearch\elasticsearch-7.15.2\plugins\ik\config

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
	<comment>IK Analyzer extension configuration</comment>
	<!-- configure your own extension dictionary here -->
	<entry key="ext_dict"></entry>
	<!-- configure your own extension stopword dictionary here -->
	<entry key="ext_stopwords"></entry>
	<!-- configure a remote extension dictionary here -->
	<!-- <entry key="remote_ext_dict">words_location</entry> -->
	<!-- configure a remote extension stopword dictionary here -->
	<!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>

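Create your own .dic file, enter the words to add, then register the dictionary file name in IKAnalyzer.cfg.xml:

<entry key="ext_dict">xxx.dic</entry>

Restart es.

11. REST style

Method	URL	Description
PUT	localhost:9200/index-name/type-name/document-id	Create a document with the given id
POST	localhost:9200/index-name/type-name	Create a document with a random id
POST	localhost:9200/index-name/type-name/document-id/_update	Update a document
DELETE	localhost:9200/index-name/type-name/document-id	Delete a document
GET	localhost:9200/index-name/type-name/document-id	Get a document by id
POST	localhost:9200/index-name/type-name/_search	Search all data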
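As a rough illustration of how the rows of this table map onto plain HTTP calls, the sketch below builds (but does not send) three of the requests with the JDK's java.net.http API. It assumes Java 11 or newer; the class name RestRouteSketch, the helper names, and the sample index, type, and id values are all hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Builds, without sending, requests for some rows of the table above.
class RestRouteSketch {
    // Assumed address of a local node, as used in the table.
    static final String BASE = "http://localhost:9200";

    private static URI documentUri(String index, String type, String id) {
        return URI.create(BASE + "/" + index + "/" + type + "/" + id);
    }

    // PUT /{index}/{type}/{id}: create the document with the given id.
    static HttpRequest create(String index, String type, String id, String json) {
        return HttpRequest.newBuilder(documentUri(index, type, id))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    // GET /{index}/{type}/{id}: fetch a document by id.
    static HttpRequest get(String index, String type, String id) {
        return HttpRequest.newBuilder(documentUri(index, type, id)).GET().build();
    }

    // DELETE /{index}/{type}/{id}: delete a document.
    static HttpRequest delete(String index, String type, String id) {
        return HttpRequest.newBuilder(documentUri(index, type, id)).DELETE().build();
    }

    public static void main(String[] args) {
        HttpRequest req = create("demo01", "type1", "1", "{\"name\":\"demo\",\"age\":20}");
        System.out.println(req.method() + " " + req.uri()); // PUT http://localhost:9200/demo01/type1/1
        System.out.println(get("demo01", "type1", "1").method());    // GET
        System.out.println(delete("demo01", "type1", "1").method()); // DELETE
    }
}
```

To actually send one of these requests against a running node, pass it to HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString()).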

12. Basic index operations

Operations

1. Create an index

PUT /index-name/type-name/document-id
{request body}

PUT /demo01/type1/1
{
  "name":"小德",
  "age": 20
}

2. Specify field types (create the mapping rules)

PUT /demo02
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text"
      },
      "age": {
        "type": "long"
      }
    }
  }
}

3. Get the rules (mapping) and detailed index information

GET demo01

4. If no type is specified, the default type is _doc. Usually it is left unset, or it can be set explicitly to _doc.

PUT /demo3/_doc/1
{
	"name":"haha",
	"age":23
}

5. View es defaults and detailed information

GET _cat/indices?v

6. Modify a document in the index; the version number increases

POST /demo01/type1/1/_update
{
  "doc":{
    	"name":"法外狂徒张三"
  }
}

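7. Delete an index; the request path determines whether the whole index or a single document record is deleted

DELETE demo01

13. Basic document operations (key topic)

Basic operations

1. Add data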

PUT /usersy/user/1
{
  "name":"张三",
  "age":23,
  "desc":"你好世界",
  "tags":["技术宅","直男","普信男"]
}

2. Query data

GET usersy/user/1

Query by condition with the q parameter

GET usersy/user/_search?q=name:张三

3. Update data; the version increases by one

PUT /usersy/user/1
{
  "name":"张三",
  "age":23,
  "desc":"你好世界",
  "tags":["技术宅","直男","普信男"]
}

Or use POST _update (recommended); the changed fields go inside "doc":

POST /usersy/user/1/_update
{
  "doc":{
    "name":"张三",
    "age":23,
    "desc":"你好世界",
    "tags":["技术宅","直男","普信男"]
  }
}

Complex operations

Complex queries (sorting, pagination, highlighting, fuzzy queries, exact queries)

GET usersy/user/_search
{
  "query":{
    "match": {
      "name":"张"
    }
  }
}

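hits: holds the index and document information, the total number of results, and the matched documents; "_score" : 0.9808291 is the relevance weight.

Filtering the returned fields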

GET usersy/user/_search
{
  "query":{
    "match": {
      "name":"张"
    }
  },
  "_source":["name","desc"]
}

Sorting

GET usersy/user/_search
{
  "query":{
    "match": {
      "name":"张"
    }
  },
  "_source":["name","desc"],
  "sort":{
    "age":{
      "order":"asc"
    }
  }
}

Pagination

GET usersy/user/_search
{
  "query":{
    "match": {
      "name":"张"
    }
  },
  "_source":["name","desc"],
  "sort":{
    "age":{
      "order":"asc"
    }
  },
  "from":0,
  "size":1
}

from is the starting offset and size is the number of hits per page. Offsets start at 0.
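Because from is an offset rather than a page number, a UI page number has to be converted first. A minimal sketch of that arithmetic, assuming pages are numbered from 1 (the class and method names are hypothetical):

```java
// Translates a 1-based page number into the from/size pair used in the query.
class PageSketch {
    // from = number of hits to skip before the requested page.
    static int from(int pageNumber, int pageSize) {
        if (pageNumber < 1 || pageSize < 1) {
            throw new IllegalArgumentException("pageNumber and pageSize must be >= 1");
        }
        return (pageNumber - 1) * pageSize;
    }

    // Renders the two paging entries as they would appear in the request body.
    static String pagingClause(int pageNumber, int pageSize) {
        return "\"from\":" + from(pageNumber, pageSize) + ",\"size\":" + pageSize;
    }

    public static void main(String[] args) {
        System.out.println(pagingClause(1, 1));  // "from":0,"size":1
        System.out.println(pagingClause(3, 10)); // "from":20,"size":10
    }
}
```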

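Boolean queries (multi-condition queries)

must (and): every condition has to match

should (or): matching any one condition is enough

must_not (not): the condition must not match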

GET usersy/user/_search
{
  "query":{
    "bool": {
      
      "must":[{
        "match": {
          "name": "张"
        }
      },
      {
        "match": {
          "age": "23"
        }
      }]
      
    }
  }
}
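The matching logic of the three clause types can be imitated in memory with Java predicates. This is only a sketch: it ignores scoring, the Doc class and all names are hypothetical, and it assumes the default behaviour in which should clauses are only required when the query has no must clause.

```java
import java.util.List;
import java.util.function.Predicate;

// In-memory imitation of bool-query matching; scoring is ignored.
class BoolQuerySketch {
    // Hypothetical stand-in for an indexed document.
    static final class Doc {
        final String name;
        final int age;
        Doc(String name, int age) { this.name = name; this.age = age; }
    }

    static Predicate<Doc> bool(List<Predicate<Doc>> must,
                               List<Predicate<Doc>> should,
                               List<Predicate<Doc>> mustNot) {
        return doc -> {
            boolean allMust = must.stream().allMatch(p -> p.test(doc));
            // Assumption: should is only required when there is no must clause.
            boolean shouldOk = should.isEmpty() || !must.isEmpty()
                    || should.stream().anyMatch(p -> p.test(doc));
            boolean noneForbidden = mustNot.stream().noneMatch(p -> p.test(doc));
            return allMust && shouldOk && noneForbidden;
        };
    }

    public static void main(String[] args) {
        Predicate<Doc> nameHasZhang = d -> d.name.contains("Zhang");
        Predicate<Doc> ageIs23 = d -> d.age == 23;
        Predicate<Doc> ageOver60 = d -> d.age > 60;
        Doc doc = new Doc("Zhang San", 23);

        // must: both conditions hold
        System.out.println(bool(List.of(nameHasZhang, ageIs23), List.of(), List.of()).test(doc)); // true
        // must_not: age 23 is excluded
        System.out.println(bool(List.of(nameHasZhang), List.of(), List.of(ageIs23)).test(doc));   // false
        // should: one of the two conditions is enough
        System.out.println(bool(List.of(), List.of(nameHasZhang, ageOver60), List.of()).test(doc)); // true
    }
}
```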

Filtering

GET usersy/user/_search
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "name": "张"
        }
      },
      "filter": {
        "range": {
          "age": {
            "gte": 5,
            "lte": 20
          }
        }
      }
    }
  }
}

gt: greater than, lt: less than, gte: greater than or equal to, lte: less than or equal to

Matching multiple terms

GET usersy/user/_search
{
  "query":{
    "match": {
      "tags":"男"
    }
  }
}
GET usersy/user/_search
{
  "query":{
    "match": {
      "tags":"男 技术"
    }
  }
}
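Separate multiple terms with spaces; a document that matches any one of them is returned.

Exact queries

A term query looks the given term up directly in the inverted index for an exact match.

term: exact lookup, the query text is not analyzed
match: the query text is parsed by the analyzer first

Two string field types matter here: text and keyword

text: analyzed (tokenized)
keyword: not analyzed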

GET usersy/user/_search
{
  "query":{
    "term": {
      "name":"张"
    }
  }
}
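The difference between term and match, and between text and keyword fields, can be imitated with a toy analyzer. The sketch below is an assumption-laden simplification: analyze only lower-cases and splits on whitespace, the class and method names are hypothetical, and an English value is used so the split is easy to see.

```java
import java.util.Arrays;
import java.util.List;

// Toy model of term vs match against text and keyword fields.
class TermVsMatchSketch {
    // Rough stand-in for an analyzer: lower-case, then split on whitespace.
    static List<String> analyze(String text) {
        return Arrays.asList(text.toLowerCase().trim().split("\\s+"));
    }

    // term: the query string is compared as-is with the indexed terms.
    static boolean term(List<String> indexedTerms, String query) {
        return indexedTerms.contains(query);
    }

    // match: the query is analyzed first; sharing any one term is enough.
    static boolean match(List<String> indexedTerms, String query) {
        return analyze(query).stream().anyMatch(indexedTerms::contains);
    }

    public static void main(String[] args) {
        String value = "Zhang San";
        List<String> asText = analyze(value);    // text field: [zhang, san]
        List<String> asKeyword = List.of(value); // keyword field: [Zhang San]

        System.out.println(term(asText, "Zhang San"));    // false
        System.out.println(term(asKeyword, "Zhang San")); // true
        System.out.println(term(asText, "zhang"));        // true
        System.out.println(match(asText, "Zhang San"));   // true
    }
}
```

The first line is the common surprise: a term query with the full original value does not match a text field, because only the individual analyzed terms were indexed.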

Exact queries matching multiple values

Use a bool query

GET usersy/user/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": "张三"
          }
        },
        {
          "match": {
            "age": 23
          }
        }
      ]
    }
  }
}

Highlight queries

GET usersy/user/_search
{
  "query": {
    "match": {
      "name": "张"
    }
  },
  "highlight":{
    "fields": {
      "name":{}
    }
  }
}

Custom highlighting

GET usersy/user/_search
{
  "query": {
    "match": {
      "name": "张"
    }
  },
  "highlight": {
    "pre_tags": "<b>",
    "post_tags": "</b>",
    "fields": {
      "name": {}
    }
  }
}

14. Integrating ElasticSearch 7.x with SpringBoot

1. Find the native dependency: Java REST Client -> Java High Level REST Client

https://www.elastic.co/guide/en/elasticsearch/client/index.html

<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
    <version>7.15.2</version>
</dependency>

2. Create the client object

RestHighLevelClient client = new RestHighLevelClient(
        RestClient.builder(
                new HttpHost("localhost", 9200, "http"),
                new HttpHost("localhost", 9201, "http")));

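client.close();

3. Analyze the methods in the class

Configure the basic project

In Maven, remember to override the es version:

<properties>
    <elasticsearch.version>7.15.2</elasticsearch.version>
</properties>

15. Basic index operations in SpringBoot

1. Create an index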

// build the create-index request
CreateIndexRequest req = new CreateIndexRequest("stu");
// execute the request and obtain the response
CreateIndexResponse resp = restHighLevelClient.indices().create(req, RequestOptions.DEFAULT);
System.out.println(resp.isAcknowledged());

2. Check whether an index exists

// build the get-index request
GetIndexRequest req = new GetIndexRequest("demo01");
// execute the request and obtain the response
boolean exists = restHighLevelClient.indices().exists(req, RequestOptions.DEFAULT);
System.out.println(exists);

3. Delete an index

// build the delete-index request
DeleteIndexRequest req = new DeleteIndexRequest("www");
// execute the request and obtain the response
AcknowledgedResponse resp = restHighLevelClient.indices().delete(req, RequestOptions.DEFAULT);
System.out.println(resp.isAcknowledged());

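16. Basic document operations in SpringBoot

1. Add a document

// create the object
User user=new User(1,"红火",23);
// create the request
IndexRequest req=new IndexRequest("usersy");
// request settings: document id and timeout
req.id("1");
req.timeout(TimeValue.timeValueSeconds(1));
req.timeout("1s"); // equivalent to the line above
// put the data into the request as JSON
req.source(JSON.toJSONString(user), XContentType.JSON);
// send the request and obtain the response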
IndexResponse resp = restHighLevelClient.index(req, RequestOptions.DEFAULT);
System.out.println(resp.toString());
System.out.println(resp.status());

2. Check whether a document exists

GetRequest req=new GetRequest("usersy","1");
// do not fetch _source
req.fetchSourceContext(new FetchSourceContext(false));
// do not fetch stored fields
req.storedFields("_none_");
boolean exists = restHighLevelClient.exists(req, RequestOptions.DEFAULT);
System.out.println(exists);

3. Get a document

GetRequest req=new GetRequest("usersy","1");
GetResponse resp = restHighLevelClient.get(req, RequestOptions.DEFAULT);
System.out.println(resp.getSourceAsString());
System.out.println(resp);

4. Update a document

UpdateRequest req = new UpdateRequest("usersy","1");
req.timeout("1s");
User user=new User(1,"雄安",55);
req.doc(JSON.toJSONString(user),XContentType.JSON);
UpdateResponse resp = restHighLevelClient.update(req, RequestOptions.DEFAULT);
System.out.println(resp.status());

5. Delete a document

DeleteRequest req = new DeleteRequest("usersy","1");
req.timeout("1s");
DeleteResponse resp = restHighLevelClient.delete(req, RequestOptions.DEFAULT);
System.out.println(resp.status());

6. Bulk add

BulkRequest breq = new BulkRequest();
breq.timeout("10s");
ArrayList<User> users = new ArrayList<>();
users.add(new User(5, "赵六", 20));
users.add(new User(6, "无物", 19));
users.add(new User(7, "钱七", 16));
for (int i = 0; i < users.size(); i++) {
    breq.add(new IndexRequest("usersy")
            .id(String.valueOf(i + 1)) // ids "1", "2", "3"; "" + i + 1 would concatenate to "01", "11", "21"
            .source(JSON.toJSONString(users.get(i)), XContentType.JSON)
    );
}
BulkResponse resp = restHighLevelClient.bulk(breq, RequestOptions.DEFAULT);
System.out.println(resp.hasFailures()); // true if any item in the bulk request failed
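One detail worth checking when ids are built inside such a loop: in Java, "" + i + 1 is evaluated left to right as string concatenation, so it does not add 1 to i. The small sketch below (class and method names are hypothetical) shows the difference next to String.valueOf(i + 1).

```java
// Compares two ways of turning a loop index into a document id.
class BulkIdSketch {
    // Left-to-right concatenation: ("" + i) + 1
    static String concatenated(int i) {
        return "" + i + 1;
    }

    // Integer addition first, then conversion to a string.
    static String added(int i) {
        return String.valueOf(i + 1);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            System.out.println(concatenated(i) + " vs " + added(i));
        }
        // prints:
        // 01 vs 1
        // 11 vs 2
        // 21 vs 3
    }
}
```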

7. Search documents

SearchRequest req = new SearchRequest("usersy");
// build the search conditions
SearchSourceBuilder ssb = new SearchSourceBuilder();
// term query: the value is looked up as-is, without analysis
TermQueryBuilder query = QueryBuilders.termQuery("name", "钱七");
ssb.query(query);
ssb.timeout(new TimeValue(60, TimeUnit.SECONDS));
req.source(ssb);
SearchResponse resp = restHighLevelClient.search(req, RequestOptions.DEFAULT);
System.out.println(JSON.toJSONString(resp.getHits()));
for (SearchHit searchHit:resp.getHits().getHits()){
    System.out.println(searchHit.getSourceAsMap());
}
