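Pushgateway: Clients Actively Push Monitoring Data for Alerting

Introduction to Pushgateway
Pushgateway is one of the components of the Prometheus monitoring stack and runs as a standalone tool. It is meant for cases where Prometheus cannot scrape metrics directly, for example when the monitored source sits behind a firewall that Prometheus cannot get through, or when the target service exposes no endpoint to scrape. Deploying Pushgateway addresses these cases.

Once it is deployed, the monitored source pushes its metrics to Pushgateway, and Prometheus scrapes Pushgateway on a schedule to monitor the state of those resources.

Reference: https://blog.csdn.net/weixin_43539320/article/details/138213665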

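## Deploying Prometheus

Official documentation:
https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config

Download
The official site ships prebuilt binaries that run after extraction, with no compilation needed. Download locations:
https://prometheus.io/download/
https://github.com/prometheus/prometheus/releases/download/v2.55.0/prometheus-2.55.0.linux-amd64.tar.gz

### Installing from the binary release

Create the Prometheus working directory and add a prometheus user: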

```
useradd -s /usr/sbin/nologin prometheus
mkdir -p /data/prometheus/{bin,conf,data,log,rules}
chown -R prometheus:prometheus /data/prometheus
cd /data/prometheus/

# Alternatively, create the directories through a variable:
PROM_PATH='/data/prometheus'
mkdir -p ${PROM_PATH}
mkdir -p ${PROM_PATH}/{data,conf,logs,bin}

```

Reference: https://blog.csdn.net/u010751000/article/details/117915643

Download and extract Prometheus
Download the binary tarball:
```
wget https://mirrors.tuna.tsinghua.edu.cn/github-release/prometheus/prometheus/2.55.0%20_%202024-10-22/prometheus-2.55.0.linux-amd64.tar.gz
tar xvf prometheus-2.55.0.linux-amd64.tar.gz

```

A quick startup test:

```
cd prometheus-2.55.0.linux-amd64
./prometheus  --config.file="prometheus.yml"
```
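Prometheus listens on port 9090 by default. Once it is running, open the web UI in a browser:
http://ip:9090

Stop the test instance, then copy the binaries and the configuration file into the working directory and add the bin directory to PATH: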

```
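# Run from inside prometheus-2.55.0.linux-amd64
cp prometheus promtool /data/prometheus/bin/
cp prometheus.yml /data/prometheus/conf/
chown -R prometheus:prometheus /data/prometheus

# Set the environment variable. The quoted 'EOF' keeps $PATH and $HOME
# from being expanded while the line is written to /etc/profile.
cat >> /etc/profile <<'EOF'
export PATH=/data/prometheus/bin:$PATH:$HOME/bin
EOF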
```

Register Prometheus as a systemd service so it can be managed with systemctl:
```
cat > /etc/systemd/system/prometheus.service <<EOF
[Unit]
Description=Prometheus
Documentation=https://prometheus.io/
After=network.target
 
[Service]
Type=simple
User=prometheus
ExecStart=/data/prometheus/bin/prometheus --config.file=/data/prometheus/conf/prometheus.yml --storage.tsdb.path=/data/prometheus/data --storage.tsdb.retention.time=90d
Restart=on-failure
 
[Install]
WantedBy=multi-user.target
EOF
```

Reload systemd, enable and start the service, then check that it is running:
```
systemctl daemon-reload
systemctl enable prometheus
systemctl start prometheus
systemctl status prometheus
```


## Deploying Pushgateway
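Reference: https://blog.csdn.net/heian_99/article/details/133851104

Download and extract the release under /data/prometheus, then rename the extracted directory: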
```
cd /data/prometheus
wget  https://github.com/prometheus/pushgateway/releases/download/v1.10.0/pushgateway-1.10.0.linux-amd64.tar.gz

tar xvf pushgateway-1.10.0.linux-amd64.tar.gz

[root@localhost pushgateway-1.10.0.linux-amd64]# pwd
/data/prometheus/pushgateway-1.10.0.linux-amd64

[root@localhost pushgateway-1.10.0.linux-amd64]# ls
LICENSE  NOTICE  pushgateway

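cd ../
mv pushgateway-1.10.0.linux-amd64 pushgateway
# Directory for the persistence file used by the service below
mkdir -p /data/prometheus/pushgateway/data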
```

Managing the service with systemd
Pushgateway listens on port 9091 by default; the listen address can be changed with `--web.listen-address`.

```
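# The quoted 'EOF' keeps the shell from expanding $MAINPID while writing the file.
cat > /etc/systemd/system/pushgateway.service <<'EOF'
[Unit]
Description=Prometheus pushgateway
Requires=network.target remote-fs.target
After=network.target remote-fs.target

[Service]
Type=simple
User=root
Group=root
# Persist pushed metrics to a file, written every 5 minutes.
# --persistence.file expects a file path; the name pushgateway.data is only an example.
# systemd does not allow a trailing comment on the ExecStart line.
ExecStart=/data/prometheus/pushgateway/pushgateway --persistence.file=/data/prometheus/pushgateway/data/pushgateway.data --persistence.interval=5m
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
Restart=on-failure
RestartSec=5s

[Install]
WantedBy=multi-user.target
EOF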
```

Reload systemd, start the service, and check that it is running:

```
systemctl daemon-reload

systemctl stop pushgateway
systemctl start pushgateway.service 
systemctl status pushgateway
systemctl enable pushgateway
```

Open the port in the firewall:
```
sudo firewall-cmd --zone=public --add-port=9091/tcp --permanent
sudo firewall-cmd --reload
```

Add a pushgateway job to the Prometheus configuration

`vim prometheus.yml`
Add the following under `scrape_configs`:

```
  - job_name: 'pushgateway'
    scrape_interval: 30s
    honor_labels: true  # keep the job/instance labels carried by pushed metrics instead of overwriting them with the labels of the Pushgateway target
    static_configs:
      - targets: ['10.3.1.11:9091']
        labels:
          instance: pushgateway

```

`systemctl restart prometheus`

Open the Prometheus web UI; the pushgateway target now appears under Status > Targets:
http://10.3.1.11:9090/targets?search=

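## Pushing monitored data to Pushgateway
Pushgateway accepts data in two ways: through the Prometheus client SDKs and through its HTTP API.

References: https://github.com/prometheus/pushgateway/blob/master/README.md
Prometheus study notes: PushGateway  https://www.cnblogs.com/panwenbin-logs/p/18427183
Quick deployment and usage of Pushgateway for Prometheus  https://blog.csdn.net/heian_99/article/details/133851104

### 1. Pushing with a client SDK
Prometheus provides client libraries for many languages. Metrics can be generated through a library and pushed to Pushgateway, which is the officially recommended approach.
For details see: https://prometheus.io/docs/instrumenting/clientlibs/

### 2. Pushing through the API

### Pushing custom data

Pushing a single metric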

To push data, send it to the API endpoint that Pushgateway exposes. The default URL format is:
`http://<ip>:9091/metrics/job/<JOBNAME>{/<LABEL_NAME>/<LABEL_VALUE>}`
`<JOBNAME>` is required and becomes the value of the job label. Any number of label pairs may follow; an `instance/<INSTANCE_NAME>` pair is commonly added so that metrics from different hosts can be told apart.

```
# Push metric mytest_metric with value 2024 under job mytest_job
echo "mytest_metric 2024" | curl --data-binary @- http://192.168.100.133:9091/metrics/job/mytest_job
```
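
The URL layout can be sketched as a small helper that assembles the path from a job name and label pairs. `build_push_url` is a hypothetical name used only for illustration, and the sketch assumes label names and values contain no `/` characters.

```shell
#!/bin/bash
# Build a Pushgateway push URL from a base address, a job name,
# and any number of label name/value pairs.
# build_push_url is a hypothetical helper, not part of Pushgateway.
build_push_url() {
  local base="$1" job="$2"
  shift 2
  # Strip a trailing slash from the base, then append the required job segment
  local url="${base%/}/metrics/job/${job}"
  # Append each remaining name/value pair as /name/value
  while [ "$#" -ge 2 ]; do
    url="${url}/$1/$2"
    shift 2
  done
  printf '%s\n' "$url"
}

build_push_url "http://192.168.100.133:9091" mytest_job instance 172.31.0.100
```

The printed URL can then be passed to `curl --data-binary @-` exactly as in the command above.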

View the data Pushgateway has received:  http://192.168.100.133:9091/
View the data Pushgateway exposes to Prometheus:  http://192.168.100.133:9091/metrics

Pushing multiple metrics

```
cat <<EOF | curl --data-binary @- http://192.168.100.133:9091/metrics/job/test_job/instance/172.31.0.100
# TYPE node_memory_usage gauge
node_memory_usage 4311744512
# TYPE node_memory_total gauge
node_memory_total 103481868288
EOF
```
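
When several metrics are pushed together, each `# TYPE` line has to name the metric that follows it. A minimal sketch of generating such a payload with a helper, so the two names cannot drift apart; `emit_gauge` and `build_payload` are hypothetical names, and the commented-out push assumes a Pushgateway reachable at the address used above.

```shell
#!/bin/bash
# Print one gauge in the Prometheus text format:
# a "# TYPE" line followed by the sample line.
emit_gauge() {
  # $1 = metric name, $2 = value
  printf '# TYPE %s gauge\n%s %s\n' "$1" "$1" "$2"
}

# Assemble the same two metrics as in the example above.
build_payload() {
  emit_gauge node_memory_usage 4311744512
  emit_gauge node_memory_total 103481868288
}

build_payload

# To send it (assumes a reachable Pushgateway at this address):
# build_payload | curl --data-binary @- http://192.168.100.133:9091/metrics/job/test_job/instance/172.31.0.100
```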

A simple push script example

```
#!/bin/bash
# mem_monitor.sh: push total and used memory (in KiB, as reported by free)
total_memory=$(free | awk '/Mem/{print $2}')
used_memory=$(free | awk '/Mem/{print $3}')

job_name="custom_memory_monitor"
# IP address of eth0; adjust the interface name to match the host
instance_name=$(ifconfig eth0 | grep -w inet | awk '{print $2}')
pushgateway_server="http://172.30.7.111:9091/metrics/job"

# More labels can be appended as /key/value pairs, for example
# /instance/${instance_name}/zone/ShangHai
cat <<EOF | curl --data-binary @- ${pushgateway_server}/${job_name}/instance/${instance_name}
# TYPE custom_memory_total gauge
custom_memory_total $total_memory
# TYPE custom_memory_used gauge
custom_memory_used $used_memory
EOF

```
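
The parsing step of such a script can be checked without a Pushgateway by feeding it canned `free` output. The sketch below assumes the usual `free` layout with a `Mem:` row whose second and third columns are total and used; `free_to_metrics` is a hypothetical helper name and the sample numbers are made up.

```shell
#!/bin/bash
# Convert the "Mem:" row of free output (read from stdin) into two gauges.
# free_to_metrics is a hypothetical helper used only for illustration.
free_to_metrics() {
  awk '/^Mem:/ {
    printf "# TYPE custom_memory_total gauge\ncustom_memory_total %d\n", $2
    printf "# TYPE custom_memory_used gauge\ncustom_memory_used %d\n", $3
  }'
}

# Canned sample in the layout printed by free (values are invented).
sample='              total        used        free      shared  buff/cache   available
Mem:        8008508     1234567     3000000       12345     3773941     6400000
Swap:       2097148           0     2097148'

printf '%s\n' "$sample" | free_to_metrics

# On a real host the live output would be piped in instead:
# free | free_to_metrics
```

Reading `free` once and formatting both metrics in a single awk pass also keeps the two values from the same snapshot.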

## Downloading and installing node_exporter
Host monitoring for Prometheus
Resources:
https://prometheus.io/download/#node_exporter

https://prometheus.io/docs/instrumenting/exporters/#other-monitoring-systems

Reference: starting node_exporter at boot
https://blog.csdn.net/zz960226/article/details/134931397

**Download and deploy the client**
```
cd /usr/local/bin
curl -LO  https://github.com/prometheus/node_exporter/releases/download/v1.8.2/node_exporter-1.8.2.linux-amd64.tar.gz

# Extract in place so the path matches ExecStart in the unit file below
tar -zxvf node_exporter-1.8.2.linux-amd64.tar.gz

mv node_exporter-1.8.2.linux-amd64 node_exporter
```

`vim /etc/systemd/system/node_exporter.service`
```
[Unit]
Description=node_exporter
Documentation=https://github.com/prometheus/node_exporter
After=network.target

[Service]
Type=simple
User=root
ExecStart=/usr/local/bin/node_exporter/node_exporter --web.listen-address=:9100
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

```
systemctl daemon-reload
systemctl start node_exporter
systemctl status node_exporter
systemctl enable node_exporter
```

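Additional notes:
Starting node_exporter at boot and Prometheus service discovery explained
https://blog.51cto.com/u_16099172/10701662

### Pushing node_exporter data via POST
Install node_exporter as described above.
Send its metrics to the Pushgateway node.
The labels defined in the URL are attached to every pushed metric: job=test instance=10.2.1.11 hostname=ip-10-2-1-11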
```
curl 127.0.0.1:9100/metrics|curl --data-binary @- http://10.3.1.11:9091/metrics/job/test/instance/10.2.1.11/hostname/ip-10-2-1-11
```

The same command as a script, node_date.sh:

```
#!/bin/bash
job_name="Bj"
hostname=$(hostname)
HOST_IP=$(hostname --all-ip-addresses | awk '{print $1}')

/usr/bin/curl 127.0.0.1:9100/metrics|/usr/bin/curl --data-binary @- http://sanming.f3322.net:9091/metrics/job/$job_name/instance/$HOST_IP/hostname/$hostname
```

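Cron job:
`#Ansible: node_date`
`* * * * * /bin/bash /usr/local/node_exporter/node_date.sh`

Pushgateway does not collect anything by itself; it only waits for data to be pushed to it, so collection scripts have to be written by the user.
Example: collecting the instantaneous number of TCP connections in a wait state
`mkdir -p /app/scripts/pushgateway`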

```
# The quoted 'EOF' writes the script verbatim; without the quotes the shell would
# expand the variables and command substitutions while creating the file.
cat <<'EOF' >/app/scripts/pushgateway/tcp_waiting_connection.sh
#!/bin/bash

# Use the short hostname as the instance label; it must not be localhost
instance_name=$(hostname -f | cut -d '.' -f 1)
if [ "$instance_name" = "localhost" ]; then
  echo "Hostname must not be localhost"
  exit 1
fi

# Count connections in any *_WAIT state
label="count_netstat_wait_connections"
count_netstat_wait_connections=$(netstat -an | grep -i wait | wc -l)
echo "$label:$count_netstat_wait_connections"
echo "$label $count_netstat_wait_connections" | curl --data-binary @- http://localhost:9091/metrics/job/pushgateway/instance/$instance_name

EOF
```

Make the script executable:
`chmod +x /app/scripts/pushgateway/tcp_waiting_connection.sh`

To add execute permission for the file owner only:
`chmod u+x /app/scripts/pushgateway/tcp_waiting_connection.sh`

Script explanation:
```
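1) netstat -an | grep -i wait | wc -l is how this custom metric gets its value.

2) The script simply POSTs a key/value pair to Pushgateway, in this format:

http://localhost:9091/metrics/job/  receiving URL
job/pushgateway        required; the job label value. Without honor_labels it appears in Prometheus as exported_job="pushgateway" (similar to a job defined in prometheus.yml)
instance/$instance_name    first label attached to the pushed data. Without honor_labels it appears as exported_instance="deepin-PC"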
```
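
Because `grep -i wait` matches every state whose name contains WAIT, the single count mixes TIME_WAIT, CLOSE_WAIT, and similar states. A possible variation, sketched below, emits one sample per state using a `state` label. The metric name `netstat_wait_connections` and the helper `wait_states_to_metrics` are hypothetical, and the sketch assumes the usual Linux `netstat -an` layout in which the state is the sixth column of each tcp line.

```shell
#!/bin/bash
# Read netstat -an output from stdin and print one gauge sample per *_WAIT state.
# wait_states_to_metrics and the metric name are hypothetical, for illustration only.
wait_states_to_metrics() {
  printf '# TYPE netstat_wait_connections gauge\n'
  awk '$1 ~ /^tcp/ && $6 ~ /WAIT/ { n[$6]++ }
       END { for (s in n) printf "netstat_wait_connections{state=\"%s\"} %d\n", s, n[s] }' |
    LC_ALL=C sort
}

# Canned sample lines in netstat -an layout (addresses are invented).
sample='tcp        0      0 10.0.0.1:80    10.0.0.2:40001  TIME_WAIT
tcp        0      0 10.0.0.1:80    10.0.0.3:40002  TIME_WAIT
tcp        0      0 10.0.0.1:80    10.0.0.4:40003  CLOSE_WAIT
tcp        0      0 0.0.0.0:22     0.0.0.0:*       LISTEN'

printf '%s\n' "$sample" | wait_states_to_metrics

# On a real host: netstat -an | wait_states_to_metrics | curl --data-binary @- <push URL>
```

Note that a state with no connections produces no sample at all, so a dashboard built on this would need to treat a missing series as zero.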

Run the script on a schedule

`crontab -e`

```
* * * * * /app/scripts/pushgateway/tcp_waiting_connection.sh >/dev/null 2>&1
```
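The stock prometheus.yml sets a 15-second global scrape interval, while cron can run a job at most once per minute. To push every 15 seconds instead:
Method 1: define several cron entries offset with sleep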

```
* * * * * /app/scripts/pushgateway/tcp_waiting_connection.sh >/dev/null 2>&1
* * * * * sleep 15; /app/scripts/pushgateway/tcp_waiting_connection.sh >/dev/null 2>&1
* * * * * sleep 30; /app/scripts/pushgateway/tcp_waiting_connection.sh >/dev/null 2>&1
* * * * * sleep 45; /app/scripts/pushgateway/tcp_waiting_connection.sh >/dev/null 2>&1
```
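
The offset entries follow a fixed pattern, so they can be generated for any interval that divides 60. A minimal sketch, where `gen_cron_lines` is a hypothetical helper and the output is meant to be pasted into `crontab -e`:

```shell
#!/bin/bash
# Print crontab entries that run a command every <interval> seconds
# by combining a once-per-minute schedule with sleep offsets.
# gen_cron_lines is a hypothetical helper used only for illustration.
gen_cron_lines() {
  local interval="$1" cmd="$2" offset
  for (( offset = 0; offset < 60; offset += interval )); do
    if [ "$offset" -eq 0 ]; then
      # First run of the minute needs no delay
      printf '* * * * * %s\n' "$cmd"
    else
      printf '* * * * * sleep %d; %s\n' "$offset" "$cmd"
    fi
  done
}

gen_cron_lines 15 "/app/scripts/pushgateway/tcp_waiting_connection.sh >/dev/null 2>&1"
```

Each generated line keeps the five schedule fields that cron expects; only the command part changes.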

Method 2: add a for loop to the script

```
cat <<'EOF' >/app/scripts/pushgateway/tcp_waiting_connection.sh
#!/bin/bash
# Push four times per cron invocation, 15 seconds apart
time=15
for (( i=0; i<60; i=i+time )); do
  instance_name=$(hostname -f | cut -d '.' -f 1)
  if [ "$instance_name" = "localhost" ]; then
    echo "Hostname must not be localhost"
    exit 1
  fi
  label="count_netstat_wait_connections"
  count_netstat_wait_connections=$(netstat -an | grep -i wait | wc -l)
  echo "$label:$count_netstat_wait_connections"
  echo "$label $count_netstat_wait_connections" | curl --data-binary @- http://localhost:9091/metrics/job/pushgateway/instance/$instance_name

  sleep $time
done
exit 0

EOF
```

With this version, cron only needs to run the script once per minute.

Notes:
```
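1. The script uses bash's C-style for loop, so keep #!/bin/bash as the interpreter and run it by its path or with bash /app/scripts/pushgateway/tcp_waiting_connection.sh.
2. Running it with sh /app/scripts/pushgateway/tcp_waiting_connection.sh on a system where sh is not bash fails with:
Syntax error: Bad for loop variable

3. Query the metric count_netstat_wait_connections in Prometheus to see the values.

4. TCP wait connection count: count_netstat_wait_connections (collected here by a custom script; node_exporter can provide it as well).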
```

`vi count_netstat_wait_connections.sh`
```
#!/bin/bash
instance_name=$(hostname -f | cut -d'.' -f1)  # short hostname, used as the instance label below
label="count_netstat_wait_connections"  # metric name
count_netstat_wait_connections=$(netstat -an | grep -i wait | wc -l)  # command that produces the value
echo "$label: $count_netstat_wait_connections"
# pushgateway_test becomes the job label of the pushed metric; it does not have to match
# the scrape job name in prometheus.yml. The instance label is the hostname obtained above.
echo "$label $count_netstat_wait_connections" | curl --data-binary @- http://server.com:9091/metrics/job/pushgateway_test/instance/$instance_name
```

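## Deleting data
1. Web UI
On the Pushgateway web page, click Delete Group on the entry to remove.

2. Command line
Each DELETE removes only the group whose grouping key exactly matches the URL, so deleting job mytest_job leaves groups that carry additional labels untouched.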
```
curl -X DELETE http://192.168.100.133:9091/metrics/job/mytest_job
curl -X DELETE http://192.168.100.133:9091/metrics/job/test_job/instance/172.31.0.100
```