kube-promethues配置钉钉告警
前置:k8s部署kube-promethues
一.配置钉钉机器人
二.编写dingtalk的yaml部署文件
vi dingtalk.yaml
apiVersion: v1kind: Servicemetadata: name: dingtalk namespace: monitoringspec: selector: app: dingtalk ports: - name: http protocol: TCP port: 8060 targetPort: 8060---apiVersion: apps/v1kind: Deploymentmetadata: name: dingtalk namespace: monitoring labels: app: dingtalkspec: replicas: 1 strategy: rollingUpdate: maxSurge: 25% maxUnavailable: 25% type: RollingUpdate selector: matchLabels: app: dingtalk template: metadata: labels: app: dingtalk spec: restartPolicy: "Always" containers: - name: dingtalk image: timonwong/prometheus-webhook-dingtalk:v2.1.0 imagePullPolicy: "IfNotPresent" volumeMounts: - name: dingtalk-conf mountPath: /etc/prometheus-webhook-dingtalk/ resources: limits: cpu: "400m" memory: "500Mi" requests: cpu: "100m" memory: "100Mi" ports: - containerPort: 8060 name: http protocol: TCP readinessProbe: failureThreshold: 3 periodSeconds: 5 initialDelaySeconds: 30 successThreshold: 1 tcpSocket: port: 8060 livenessProbe: tcpSocket: port: 8060 initialDelaySeconds: 30 periodSeconds: 10 volumes: - name: dingtalk-conf configMap: name: dingtalk-cm
prometheus-webhook-dingtalk是一个开源的钉钉告警的插件,目前最新版停留于v2.1.0
三.编写钉钉告警模板dingtalk-configmap.yaml
vi dingtalk-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: dingtalk-cm
namespace: monitoring
data:
config.yml: |-
templates:
- /etc/prometheus-webhook-dingtalk/dingding.tmpl
targets:
webhook:
url: https://oapi.dingtalk.com/robot/send?access_token=<复制的webhook地址>
secret: "<加签的时候复制的secret>"
message:
text: '{{ template "dingtalk.to.message" . }}'
dingding.tmpl: |-
{{ define "dingtalk.to.message" }}
{{- if gt (len .Alerts.Firing) 0 -}}
{{- range $index, $alert := .Alerts -}}
========= **监控告警** =========
**告警集群:** k8s
**告警类型:** {{ $alert.Labels.alertname }}
**告警级别:** {{ $alert.Labels.severity }}
**告警状态:** {{ .Status }}
**故障主机:** {{ $alert.Labels.instance }} {{ $alert.Labels.device }}
**告警主题:** {{ .Annotations.summary }}
**告警详情:** {{ $alert.Annotations.message }}{{ $alert.Annotations.description}}
**主机标签:** {{ range .Labels.SortedPairs }} </br> [{{ .Name }}: {{ .Value | markdown | html }} ]
{{- end }} </br>
**故障时间:** {{ ($alert.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}
========= = **end** = =========
{{- end }}
{{- end }}
{{- if gt (len .Alerts.Resolved) 0 -}}
{{- range $index, $alert := .Alerts -}}
========= **故障恢复** =========
**告警集群:** k8s
**告警主题:** {{ $alert.Annotations.summary }}
**告警主机:** {{ .Labels.instance }}
**告警类型:** {{ .Labels.alertname }}
**告警级别:** {{ $alert.Labels.severity }}
**告警状态:** {{ .Status }}
**告警详情:** {{ $alert.Annotations.message }}{{ $alert.Annotations.description}}
**故障时间:** {{ ($alert.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}
**恢复时间:** {{ ($alert.EndsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}
========= = **end** = =========
{{- end }}
{{- end }}
{{- end }}
四.编写文件alertmanager-secret.yaml
该文件是 用来顶替原本kube-promethues部署时的,alertmanager的配置文件
vi alertmanager-secret.yaml
apiVersion: v1
data: { }
kind: Secret
metadata:
name: alertmanager-main
namespace: monitoring
stringData:
alertmanager.yaml: |-
global:
resolve_timeout: 5m
route:
group_by: ['alertname']
group_wait: 30s
group_interval: 5m
repeat_interval: 30m
receiver: 'webhook'
routes:
- match:
severity: 'info'
continue: true
receiver: 'null'
- match:
severity: 'none'
continue: true
receiver: 'null'
receivers:
- name: 'null'
- name: 'webhook'
webhook_configs:
- send_resolved: true
url: 'http://dingtalk:8060/dingtalk/webhook/send'
五.添加Prometheus告警规则
因为kube-promethues默认的告警规则大部分都和K8s的pod相关,所以需要新增一些关于node节点的告警规则
- name: 主机状态-监控告警
rules:
- alert: 节点内存
expr: (1 - (node_memory_MemAvailable_bytes / (node_memory_MemTotal_bytes)))* 100 > 85
for: 1m
labels:
severity: warning
annotations:
summary: "内存使用率过高!"
description: "节点{{$labels.instance}} 内存使用大于85%(目前使用:{{$value}}%)"
- alert: 节点TCP会话
expr: node_netstat_Tcp_CurrEstab > 1000
for: 1m
labels:
severity: warning
annotations:
summary: "TCP_ESTABLISHED过高!"
description: "{{$labels.instance }} TCP_ESTABLISHED大于1000%(目前使用:{{$value}}%)"
- alert: 节点磁盘容量
expr: max((node_filesystem_size_bytes{fstype=~"ext.?|xfs"}-node_filesystem_free_bytes{fstype=~"ext.?|xfs"}) *100/(node_filesystem_avail_bytes {fstype=~"ext.?|xfs"}+(node_filesystem_size_bytes{fstype=~"ext.?|xfs"}-node_filesystem_free_bytes{fstype=~"ext.?|xfs"})))by(instance) > 80
for: 1m
labels:
severity: warning
annotations:
summary: "节点磁盘分区使用率过高!"
description: "{{$labels.instance }} 磁盘分区使用大于80%(目前使用:{{$value}}%)"
- alert: 节点CPU
expr: (100 - (avg by (instance) (irate(node_cpu_seconds_total{job=~".*",mode="idle"}[5m])) * 100)) > 85
for: 1m
labels:
severity: warning
annotations:
summary: "节点CPU使用率过高!"
description: "{{$labels.instance }} CPU使用率大于80%(目前使用:{{$value}}%)"
将此段配置添加到kube-promethues解压目录manifests/prometheus中的prometheus-rules.yaml底部即可。
#然后执行更新prometheus-rules.yaml
kubectl apply -f prometheus-rules.yaml
#重启Promethus看见下图有新增的告警规则即配置成功
六.部署其他并检查是否运行成功
kubectl apply -f alertmanager-secret.yaml
kubectl apply -f dingtalk-configmap.yaml
kubectl apply -f dingtalk.yaml
#查看是否部署成功
kubectl get pods -n monitoring | grep dingtalk
dingtalk部署成功后,重新部署alertmanager就行了。
版权声明:本文内容由互联网用户自发贡献,该文观点仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请发送邮件至 举报,一经查实,本站将立刻删除。
文章由极客之家整理,本文链接:https://www.bmabk.com/index.php/post/199983.html