Prometheus (Part 1: Basic Usage)

[Prometheus tutorial] https://www.bilibili.com/video/BV11A411o7XD/?share_source=copy_web

## Architecture
Prometheus consists of several components, some of which are optional:

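**Prometheus Server**: the core component; it collects and stores time-series data.  
//Prometheus Server pulls metric data from the monitored targets. Short-lived jobs push their data to the Pushgateway instead, and Prometheus Server then pulls it from the Pushgateway.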

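**exporter**: exposes metrics for Prometheus scrape jobs to collect  
//Prometheus pulls the time-series data from the exporter's HTTP endpoint (`/metrics` by default).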

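**pushgateway**: a gateway that metrics are pushed to  
//It works as a relay. Prometheus Server only ever pulls data, so the Pushgateway accepts pushed data and exposes it for Prometheus Server to pull. In practice, hosts report the data of short-lived jobs to the Pushgateway, and Prometheus Server collects all of it from the Pushgateway in one place.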

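**alertmanager**: the alerting component  
//A standalone alerting module. After receiving alerts from Prometheus Server, it deduplicates and groups them, routes them to the matching receivers and sends the notifications. Common receivers are email, WeChat and DingTalk.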

**adhoc**: ad-hoc data queries

![](/media/202301/Prometheus_1675079693.png)
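A minimal sketch of the push path described above, assuming a Pushgateway listening on its default port 9091 on `localhost` (override with the hypothetical `PUSHGATEWAY_URL` variable). The `/metrics/job/<job>[/<label>/<value>...]` URL layout and the `curl --data-binary @-` upload follow the Pushgateway README; the job, label and metric names are made up for illustration.

```shell
#!/usr/bin/env bash
# Minimal sketch: push one sample from a short-lived job to a Pushgateway.
# Assumption: Pushgateway listens on localhost:9091; override with PUSHGATEWAY_URL.
PUSHGATEWAY_URL="${PUSHGATEWAY_URL:-http://localhost:9091}"

# Build the grouping-key URL: /metrics/job/<job>[/<label>/<value>...]
pushgateway_url() {
  local job="$1"
  shift
  local url="${PUSHGATEWAY_URL}/metrics/job/${job}"
  while [ "$#" -ge 2 ]; do
    url="${url}/$1/$2"
    shift 2
  done
  printf '%s\n' "$url"
}

# Send "<metric> <value>" in the text exposition format for the given job.
# Usage: push_sample <job> <metric> <value> [label value ...]
push_sample() {
  local job="$1" metric="$2" value="$3"
  shift 3
  printf '%s %s\n' "$metric" "$value" |
    curl -fsS --data-binary @- "$(pushgateway_url "$job" "$@")"
}
```

For example, `push_sample backup_job backup_last_success_timestamp_seconds "$(date +%s)" instance db01` would push one sample; Prometheus then picks it up by scraping the Pushgateway, which itself has to be listed in `scrape_configs`.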

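### Prometheus workflow
(1) Prometheus Server scrapes metrics and stores them as time series (in its local TSDB on HDD/SSD).
(2) Prometheus evaluates the configured alerting rules and sends the alerts that fire to Alertmanager.
(3) Alertmanager sends the alerts to the configured receivers, such as email, WeChat or DingTalk.
(4) The built-in Prometheus web UI lets you query the monitoring data with the PromQL query language.
(5) Grafana can use Prometheus as a data source and visualize the monitoring data as graphs.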
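Step (4) uses the web UI, but the same PromQL expressions can also be sent to the HTTP API endpoint `/api/v1/query`. A minimal sketch, assuming Prometheus on `http://localhost:9090` (hypothetical `PROM_URL` override); `urlencode` is a small hypothetical helper for ASCII input that makes the final URL easy to inspect, while `curl -G --data-urlencode "query=..."` would handle the encoding as well.

```shell
#!/usr/bin/env bash
# Minimal sketch: run an instant PromQL query through the HTTP API.
# Assumption: Prometheus listens on localhost:9090; override with PROM_URL.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Percent-encode a string for a URL query parameter (ASCII input only).
urlencode() {
  local s="$1" out="" c i
  for ((i = 0; i < ${#s}; i++)); do
    c="${s:i:1}"
    case "$c" in
      [a-zA-Z0-9.~_-]) out+="$c" ;;
      *) printf -v c '%%%02X' "'$c"; out+="$c" ;;
    esac
  done
  printf '%s\n' "$out"
}

# Build the instant-query URL for a PromQL expression.
query_url() {
  printf '%s/api/v1/query?query=%s\n' "$PROM_URL" "$(urlencode "$1")"
}

# Send the query; the JSON response is printed to stdout.
prom_query() {
  curl -fsS "$(query_url "$1")"
}
```

For example, `prom_query 'up'` prints the JSON result; a value of 1 for `up` means the last scrape of that target succeeded.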

### The prometheus.yml configuration file explained
```
# Default prometheus.yml shipped with the release
[root@localhost prometheus-2.41.0.linux-amd64]# cat prometheus.yml
# my global config
global:  # Global settings for Prometheus, e.g. scrape interval and scrape timeout.
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:  # Alerting settings: mainly the addresses of the Alertmanager instances Prometheus sends alerts to.
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:  # Alerting rule files; Prometheus evaluates these rules and sends the resulting alerts to Alertmanager.
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:  # Scrape settings; this section defines what Prometheus collects.
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'. Protocol used for scraping: http or https.

    static_configs:  # Statically defined targets for this job.
      - targets: ["localhost:9090"]

```
**Adding a new job (job_name)**  
Edit prometheus.yml and add the targets to scrape:
![](/media/202301/2023-01-30_210547.png)
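Note: the indentation and spacing must be exactly right, otherwise Prometheus reports an error and fails to start. For example, in the last line `group: 'local'` there must be a space after the colon.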
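The screenshot's exact content is not reproduced here, so the following is only a sketch of the same kind of edit done from the shell. It assumes a hypothetical static job (`prometheus_local` scraping `localhost:9100` with `group: 'local'`) and assumes `scrape_configs` is the last section of the file, as in the default prometheus.yml. The heredoc keeps the indentation exact, which is the usual cause of the startup errors mentioned in the note.

```shell
#!/usr/bin/env bash
# Minimal sketch: append a static scrape job to prometheus.yml.
# Assumption: scrape_configs is the LAST top-level section of the file (true for
# the default prometheus.yml), so the appended list item ends up inside it.
# Usage: add_static_job <prometheus.yml> <job_name> <host:port> <group>
add_static_job() {
  local cfg="$1" job="$2" target="$3" group="$4"
  # Prometheus rejects duplicate job names, so refuse to add one twice.
  if grep -qF "job_name: \"$job\"" "$cfg"; then
    echo "job $job already exists in $cfg" >&2
    return 1
  fi
  cp "$cfg" "$cfg.bak"  # keep a backup of the previous version
  cat >>"$cfg" <<EOF

  - job_name: "$job"
    static_configs:
      - targets: ["$target"]
        labels:
          group: '$group'
EOF
}
```

After editing, running `./promtool check config prometheus.yml` from the Prometheus directory (promtool ships in the same tarball) catches indentation mistakes before Prometheus is restarted or reloaded.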

Parameter reference:
```
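global              # Global settings, e.g. scrape interval and scrape timeout.

rule_files          # Alerting rule files; Prometheus evaluates these rules and sends the resulting alerts to Alertmanager.

scrape_configs      # Scrape settings; this section defines what Prometheus collects.

alerting            # Alerting settings: mainly the addresses of the Alertmanager instances Prometheus sends alerts to.

remote_write        # Write API endpoint of a remote storage backend.

remote_read         # Read API endpoint of a remote storage backend.

scrape_interval     # Scrape interval; a job-level value defaults to the global one.

scrape_timeout      # Scrape timeout; a job-level value defaults to the global one.

evaluation_interval # Rule evaluation interval.

external_labels     # Labels added to time series and alerts when talking to external systems (federation, remote storage, Alertmanager).

metrics_path        # HTTP path to scrape; defaults to /metrics.

scheme              # Protocol used for scraping: http or https.

params              # URL parameters added to the scrape request.

basic_auth          # Basic authentication credentials.

*_sd_configs        # Service discovery configuration.

static_configs      # Statically defined targets for a job.

relabel_configs     # Relabeling rules.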
```
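Per-job `scrape_interval` and `scrape_timeout` inherit the global values (1m and 10s by default), and Prometheus refuses to load a scrape config whose timeout is longer than its interval. A minimal sketch of that check, with hypothetical helper names; it only understands single-unit durations such as `10s`, `1m` or `500ms`, and deliberately rejects compound ones like `1h30m`.

```shell
#!/usr/bin/env bash
# Minimal sketch: check that a job's scrape_timeout does not exceed its scrape_interval.
# Only single-unit durations (ms, s, m, h, d) are handled; "1h30m" is rejected.

# Convert a duration such as "15s" or "500ms" to milliseconds.
duration_ms() {
  local d="$1" n mult
  case "$d" in
    *ms) n="${d%ms}"; mult=1 ;;
    *s) n="${d%s}"; mult=1000 ;;
    *m) n="${d%m}"; mult=60000 ;;
    *h) n="${d%h}"; mult=3600000 ;;
    *d) n="${d%d}"; mult=86400000 ;;
    *) n="" ;;
  esac
  case "$n" in
    '' | *[!0-9]*)
      echo "unsupported duration: $d" >&2
      return 1
      ;;
  esac
  echo $((n * mult))
}

# Succeed only if timeout <= interval. Defaults mirror the global defaults
# (scrape_interval 1m, scrape_timeout 10s).
check_scrape_timing() {
  local interval="${1:-1m}" timeout="${2:-10s}" i t
  i=$(duration_ms "$interval") || return 1
  t=$(duration_ms "$timeout") || return 1
  if [ "$t" -gt "$i" ]; then
    echo "scrape_timeout $timeout is greater than scrape_interval $interval" >&2
    return 1
  fi
}
```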

Prometheus monitoring deployment
Environment: CentOS 7
## Server side
References:
https://www.cnblogs.com/scajy/p/16952109.html
https://zhuanlan.zhihu.com/p/574729059
```
# Download
wget https://github.com/prometheus/prometheus/releases/download/v2.41.0/prometheus-2.41.0.linux-amd64.tar.gz

# Extract
tar -zxvf prometheus-2.41.0.linux-amd64.tar.gz

# Open the firewall port (9090: Prometheus web UI and API)
firewall-cmd --zone=public --add-port=9090/tcp --permanent
firewall-cmd --reload

# Test run: start Prometheus in the foreground (Ctrl+C to stop)
cd prometheus-2.41.0.linux-amd64
./prometheus
```
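Prometheus releases also publish a `sha256sums.txt` file next to the tarballs, so the download can be verified before extracting. A minimal sketch, assuming the GitHub release URL layout used in the `wget` command above and that `sha256sums.txt` lists each tarball by file name; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch: download a Prometheus release tarball and verify its SHA-256.
# Assumption: releases keep the URL layout used above and ship sha256sums.txt.

# Tarball name for a version and CPU architecture (default amd64).
prom_tarball() {
  printf 'prometheus-%s.linux-%s.tar.gz\n' "$1" "${2:-amd64}"
}

# Release download base URL for a version.
prom_release_base() {
  printf 'https://github.com/prometheus/prometheus/releases/download/v%s\n' "$1"
}

# Check one tarball against a checksum file; sha256sum -c fails on mismatch.
verify_tarball() {
  local tarball="$1" sums="${2:-sha256sums.txt}"
  grep " ${tarball}\$" "$sums" | sha256sum -c -
}

download_and_verify() {
  local version="$1" arch="${2:-amd64}" base tarball
  base=$(prom_release_base "$version")
  tarball=$(prom_tarball "$version" "$arch")
  curl -fLO "${base}/${tarball}" || return 1
  curl -fLO "${base}/sha256sums.txt" || return 1
  verify_tarball "$tarball" sha256sums.txt
}
```

For example, `download_and_verify 2.41.0 && tar -zxvf "$(prom_tarball 2.41.0)"` extracts only a tarball whose checksum matched.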

### Writing a systemd unit for Prometheus
```
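# Move the Prometheus directory to /opt/monitor/prometheus
# (run from the directory that contains the extracted folder)
mkdir -p /opt/monitor
mv prometheus-2.41.0.linux-amd64 prometheus
mv prometheus /opt/monitor/

# Create the systemd unit file with the following content
vim /usr/lib/systemd/system/prometheus.service
[Unit]
Description=prometheus
[Service]
# The default --storage.tsdb.path is data/, relative to the working directory
WorkingDirectory=/opt/monitor/prometheus
ExecStart=/opt/monitor/prometheus/prometheus --config.file=/opt/monitor/prometheus/prometheus.yml --web.enable-lifecycle
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
Restart=on-failure

[Install]
WantedBy=multi-user.target

# Start the service and enable it at boot
systemctl daemon-reload
systemctl start prometheus
systemctl enable prometheus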
```
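After `systemctl start`, Prometheus can take a moment (for example, while replaying its write-ahead log) before it answers queries; it exposes `/-/healthy` and `/-/ready` endpoints for such checks. A minimal sketch of a hypothetical retry helper `wait_until` that polls `/-/ready`, assuming the default address `http://localhost:9090` (hypothetical `PROM_URL` override).

```shell
#!/usr/bin/env bash
# Minimal sketch: retry a command until it succeeds or attempts run out.
# Usage: wait_until <attempts> <sleep_seconds> <command> [args...]
wait_until() {
  local attempts="$1" delay="$2" n=1
  shift 2
  while [ "$n" -le "$attempts" ]; do
    "$@" && return 0
    # Sleep between attempts, but not after the last one.
    if [ "$n" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    n=$((n + 1))
  done
  return 1
}

# Readiness probe; curl -f fails on non-2xx responses.
prometheus_ready() {
  curl -fsS -o /dev/null "${PROM_URL:-http://localhost:9090}/-/ready"
}
```

For example, `wait_until 30 2 prometheus_ready && echo "Prometheus is ready"` waits up to about a minute.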
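### Hot-reloading prometheus.yml
Prometheus can reload prometheus.yml without restarting the service. Sending the SIGHUP signal always works; the HTTP endpoint `/-/reload` is available only when Prometheus is started with `--web.enable-lifecycle`, as in the systemd unit above. There are two ways to reload: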
```
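# Option 1: send the HUP (hang up) signal to the Prometheus process
kill -HUP <pid>
# With the systemd unit above, ExecReload does the same:
systemctl reload prometheus

# Option 2: send an HTTP request to Prometheus
# /-/reload only accepts POST requests and requires Prometheus to be started with --web.enable-lifecycle.
# curl -X POST http://<IP>:9090/-/reload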
curl -X POST http://localhost:9090/-/reload

```
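If the new prometheus.yml is invalid, the reload fails and Prometheus keeps running with the previous configuration, so it is worth validating the file first with `promtool check config` (promtool ships in the same tarball). A minimal sketch; `PROMTOOL` and `PROM_URL` are hypothetical override variables, and the HTTP reload assumes `--web.enable-lifecycle` as above.

```shell
#!/usr/bin/env bash
# Minimal sketch: validate prometheus.yml, then trigger a hot reload over HTTP.
# Assumptions: promtool is on PATH (or set PROMTOOL), Prometheus runs with
# --web.enable-lifecycle and listens on PROM_URL (default localhost:9090).
reload_prometheus() {
  local cfg="${1:-/opt/monitor/prometheus/prometheus.yml}"
  if ! "${PROMTOOL:-promtool}" check config "$cfg"; then
    echo "config check failed; not reloading" >&2
    return 1
  fi
  curl -fsS -X POST "${PROM_URL:-http://localhost:9090}/-/reload"
}
```

For example, `PROMTOOL=/opt/monitor/prometheus/promtool reload_prometheus` checks the file at the default path and reloads only if the check passes.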

### Dynamic configuration

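Edit prometheus.yml to use file-based service discovery (`file_sd_configs`) for dynamic configuration.

After that, adding or removing targets no longer requires restarting Prometheus or reloading its configuration; just edit ./static_conf/file.yaml.

Prometheus watches these files for changes and, as a fallback, re-reads them every `refresh_interval` (5s here).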
```
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["localhost:9090"]

  - job_name: "prometheus_local"
    static_configs:
      - targets: ["localhost:9100"]
        labels:
          group: 'local'

  ### Key part, with comments:
  - job_name: 'node-exporter'
    scrape_interval: 15s
    file_sd_configs:
      - files:
          - "static_conf/*.yaml"  # files to read targets from (relative to the directory of prometheus.yml)
        refresh_interval: 5s      # re-read interval for those files
  ### End of comments

```

```
[root@localhost prometheus]# mkdir static_conf

[root@localhost prometheus]# cat static_conf/file.yaml
- targets: ['127.0.0.1:9100']
- targets: ['127.0.0.2:9100']
  labels:
    group: 'local'
- targets: ['127.0.0.3:9100']

# In the example above, only the second target gets a group label.

# Layout of the Prometheus directory (tree ./):
.
├── prometheus
├── prometheus.yml
└── static_conf     # newly created directory
    └── file.yaml   # dynamic target file

```
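Because Prometheus re-reads the file_sd files on its own, editing file.yaml in place could in principle let it read a half-written file. A minimal sketch that regenerates the target file from a list of `host:port` targets and swaps it in with an atomic `mv` (a rename within the same directory); the function name is hypothetical, and unlike the example above it applies the same `group` label to every target it writes.

```shell
#!/usr/bin/env bash
# Minimal sketch: (re)write a file_sd target file atomically.
# Usage: write_file_sd <output.yaml> <group> <host:port> [host:port ...]
write_file_sd() {
  local out="$1" group="$2" tmp t
  shift 2
  # Create the temp file in the same directory so the final mv is an atomic
  # rename. Its name does not end in .yaml, so "static_conf/*.yaml" ignores it.
  tmp=$(mktemp "$(dirname "$out")/.file_sd.XXXXXX") || return 1
  {
    printf '%s\n' "- targets:"
    for t in "$@"; do
      printf "    - '%s'\n" "$t"
    done
    printf '%s\n' "  labels:"
    printf "    group: '%s'\n" "$group"
  } >"$tmp"
  # mktemp creates the file with mode 0600; make it readable for the prometheus user.
  chmod 644 "$tmp"
  mv "$tmp" "$out"
}
```

For example, `write_file_sd static_conf/file.yaml local 127.0.0.1:9100 127.0.0.2:9100` replaces the target list in one step.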

## Client side
Download links: https://prometheus.io/download/

```
# Deploy node_exporter (deploying it with Docker is not recommended)
mkdir -p /opt/node_exporter
curl -LO https://github.com/prometheus/node_exporter/releases/download/v1.5.0/node_exporter-1.5.0.linux-amd64.tar.gz
tar -zxvf node_exporter-1.5.0.linux-amd64.tar.gz -C /opt/node_exporter

# Go into the extracted directory node_exporter-1.5.0.linux-amd64 and move its contents up to /opt/node_exporter
cd /opt/node_exporter/node_exporter-1.5.0.linux-amd64
mv * ../

# Create the service unit
vim /usr/lib/systemd/system/node_exporter.service
# Resulting content:
cat /usr/lib/systemd/system/node_exporter.service
[Unit]
Description=The node_exporter Server
Wants=network-online.target
After=network-online.target

[Service]
ExecStart=/opt/node_exporter/node_exporter
Restart=on-failure
RestartSec=15s
SyslogIdentifier=node_exporter

[Install]
WantedBy=multi-user.target

# Start the service and enable it at boot
systemctl daemon-reload
systemctl start node_exporter
systemctl status node_exporter
systemctl enable node_exporter

# Open the firewall port
firewall-cmd --zone=public --add-port=9100/tcp --permanent
firewall-cmd --reload

```
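To confirm that node_exporter is serving data, fetch `http://<host>:9100/metrics` and look for a known metric. A minimal sketch of a hypothetical `metric_value` filter that reads the text exposition format on stdin, skips `# HELP`/`# TYPE` comment lines, and prints the value of an unlabelled sample such as `node_load1`; it exits non-zero if the metric is missing.

```shell
#!/usr/bin/env bash
# Minimal sketch: print the value of an unlabelled metric from exposition text.
# Usage: curl -s http://localhost:9100/metrics | metric_value node_load1
metric_value() {
  awk -v name="$1" '
    /^#/ { next }                              # skip HELP/TYPE comment lines
    $1 == name { print $2; found = 1; exit }   # first matching sample wins
    END { exit !found }                        # non-zero status if not found
  '
}
```

For example, `curl -s http://localhost:9100/metrics | metric_value node_load1` prints the 1-minute load average of the host.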