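Prometheus is an open-source monitoring and alerting system with a built-in time series database; Grafana is commonly used alongside it to visualize the data.
1. Monitoring system architecture basics
1.1 Core components
Prometheus Server: scrapes and stores time series data, and also provides querying and alert rule configuration management.
exporters: data collectors, for example node_exporter for machine metrics and the MongoDB exporter for MongoDB information.
alertmanager: manages alert notifications.
Grafana: charts and visualizes the monitoring data.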
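To make the role of an exporter more concrete: an exporter publishes its samples as plain text, one `name{label="value"} value` line per sample, and Prometheus Server scrapes that text over HTTP. The following is a minimal sketch of that line format only, not how node_exporter is implemented; the class name `TextFormatSketch` and the sample value are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Simplified sketch of the Prometheus text format (hypothetical helper, not a real exporter). */
final class TextFormatSketch {

    /** Formats one sample as: name{label="value",...} value */
    static String sampleLine(String name, Map<String, String> labels, double value) {
        StringBuilder sb = new StringBuilder(name);
        if (!labels.isEmpty()) {
            sb.append('{');
            boolean first = true;
            for (Map.Entry<String, String> e : labels.entrySet()) {
                if (!first) {
                    sb.append(',');
                }
                sb.append(e.getKey()).append("=\"").append(escape(e.getValue())).append('"');
                first = false;
            }
            sb.append('}');
        }
        return sb.append(' ').append(value).toString();
    }

    /** Escapes backslash, double quote and newline inside a label value. */
    static String escape(String labelValue) {
        return labelValue.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        Map<String, String> labels = new LinkedHashMap<>();
        labels.put("cpu", "0");
        labels.put("mode", "idle");
        // Illustrative value; a real exporter reads it from the operating system.
        System.out.println("# TYPE node_cpu_seconds_total counter");
        System.out.println(sampleLine("node_cpu_seconds_total", labels, 1234.5));
    }
}
```

Each label becomes a dimension that can later be filtered on in PromQL, which is why the Grafana queries further down select series by `instance` and `application`.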
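2. Installing the base components
Since this setup is for learning and experimentation, Docker is used to bring the environment up quickly.
2.1 Install Node Exporter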
docker-compose-node-export.yml
version: '3'
services:
  node-exporter:
    image: prom/node-exporter
    container_name: node-exporter
    hostname: node-exporter
    restart: always
    ports:
      - "9100:9100"
2.2 Install Alertmanager
docker-compose-alertmanager.yml
version: '3'
services:
  alertmanager:
    image: prom/alertmanager
    container_name: alertmanager
    hostname: alertmanager
    restart: always
    volumes:
      - /data/docker_file/monitor/conf/alertmanager.yml:/etc/alertmanager/alertmanager.yml
    ports:
      - "9093:9093"
alertmanager.yml
global:
  smtp_smarthost: 'smtp.qq.com:25'          # QQ Mail SMTP server
  smtp_from: '793272861@qq.com'             # sender address
  smtp_auth_username: '793272861@qq.com'    # SMTP username, i.e. your own mailbox
  smtp_auth_password: '****************'    # SMTP password of the sending mailbox
  smtp_require_tls: false                   # do not require TLS
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 10m
  receiver: live-monitoring
receivers:
  - name: 'live-monitoring'
    email_configs:
      - to: '793272861@qq.com'              # recipient address
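The `route` block above batches alerts before notifying: `group_by` decides which alerts share one notification, `group_wait` is how long a new group waits before its first notification, `group_interval` is the wait before sending updates for a group that has already been notified, and `repeat_interval` is the wait before re-sending a notification that is still firing. The grouping step alone can be pictured with the sketch below. It is a simplified model and not Alertmanager's implementation, it ignores all timing, and the alert names `HighCpu` and `InstanceDown` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simplified model of group_by: alerts with equal values for the group_by labels share a group. */
final class AlertGroupingSketch {

    /** Builds a label set from alternating name/value arguments. */
    static Map<String, String> labels(String... nameValuePairs) {
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i + 1 < nameValuePairs.length; i += 2) {
            result.put(nameValuePairs[i], nameValuePairs[i + 1]);
        }
        return result;
    }

    /** Derives the group key from the values of the group_by labels (missing labels count as empty). */
    static String groupKey(Map<String, String> alertLabels, List<String> groupBy) {
        StringBuilder key = new StringBuilder();
        for (String name : groupBy) {
            if (key.length() > 0) {
                key.append(',');
            }
            key.append(name).append('=').append(alertLabels.getOrDefault(name, ""));
        }
        return key.toString();
    }

    /** Partitions alerts into groups, preserving first-seen order. */
    static Map<String, List<Map<String, String>>> group(List<Map<String, String>> alerts,
                                                        List<String> groupBy) {
        Map<String, List<Map<String, String>>> groups = new LinkedHashMap<>();
        for (Map<String, String> alert : alerts) {
            groups.computeIfAbsent(groupKey(alert, groupBy), k -> new ArrayList<>()).add(alert);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Map<String, String>> alerts = new ArrayList<>();
        alerts.add(labels("alertname", "HighCpu", "instance", "node-a:9100"));
        alerts.add(labels("alertname", "HighCpu", "instance", "node-b:9100"));
        alerts.add(labels("alertname", "InstanceDown", "instance", "node-a:9100"));
        // Mirrors group_by: ['alertname'] from the configuration above.
        group(alerts, Collections.singletonList("alertname"))
                .forEach((key, members) -> System.out.println(key + " -> " + members.size() + " alert(s)"));
    }
}
```

With `group_by: ['alertname']`, two hosts firing the same alert end up in one e-mail rather than two, which is usually what you want for a small learning setup.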
2.3 Install Prometheus
docker-compose-prometheus.yml
version: '3'
services:
  prometheus:
    image: prom/prometheus
    container_name: prometheus
    hostname: prometheus
    restart: always
    volumes:
      - /data/docker_file/prometheus/data:/prometheus
      - /data/docker_file/prometheus/conf/prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"
prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
# A scrape configuration containing the endpoints to scrape:
# Prometheus itself and the node exporter.
# Each job is polled periodically to pull its monitoring data.
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['prometheus:9090']
  - job_name: 'node-exporter'
    scrape_interval: 5s
    static_configs:
      - targets: ['node-exporter:9100']
2.4 Install Grafana
docker-compose-grafana.yml
version: '3'
services:
  grafana:
    image: grafana/grafana
    container_name: grafana
    hostname: grafana
    restart: always
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - /data/docker_file/grafana/data:/var/lib/grafana
      - /data/docker_file/grafana/log:/var/log/grafana
    ports:
      - "3000:3000"
Add the data source (Prometheus)
Open http://localhost:3000/ and log in with the default username admin and password admin.
2.5 Combined Docker Compose script
version: '3'
services:
  prometheus:
    image: prom/prometheus
    container_name: prometheus
    hostname: prometheus
    restart: always
    volumes:
      - /data/docker_file/prometheus/data:/prometheus
      - /data/docker_file/prometheus/conf/prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"
    networks:
      - monitor
  alertmanager:
    image: prom/alertmanager
    container_name: alertmanager
    hostname: alertmanager
    restart: always
    volumes:
      - /data/docker_file/monitor/conf/alertmanager.yml:/etc/alertmanager/alertmanager.yml
    ports:
      - "9093:9093"
    networks:
      - monitor
  grafana:
    image: grafana/grafana
    container_name: grafana
    hostname: grafana
    restart: always
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - /data/docker_file/grafana/data:/var/lib/grafana
      - /data/docker_file/grafana/log:/var/log/grafana
    ports:
      - "3000:3000"
    networks:
      - monitor
  node-exporter:
    image: prom/node-exporter
    container_name: node-exporter
    hostname: node-exporter
    restart: always
    ports:
      - "9100:9100"
    networks:
      - monitor
networks:
  monitor:
    driver: bridge
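3. Configure the Grafana Dashboard
Grafana pulls data from Prometheus with PromQL queries and renders it in Panels; a set of Grafana Panels makes up a Grafana Dashboard.
3.1 Download Grafana Dashboard files
Ready-made Grafana Dashboard files can be downloaded from the official Grafana website and imported into our Grafana instance for immediate use.
Recommended Grafana Dashboards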
JVM (Micrometer)
Spring Boot 2.1 Statistics
Basic host monitoring (CPU, memory, disk, network)
Node Exporter for Prometheus Dashboard CN
Druid Connection Pool Dashboard
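Import the Grafana Dashboard
3.2 Add or modify a Grafana Panel (extension)
The stock Spring Boot 2.1 Statistics Dashboard has no charts for outbound requests to third-party services. Taking that as an example, we add a Client Request Count chart and a Client Response Time chart.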
Client Request Count
irate(http_client_requests_seconds_count{instance="$instance", application="$application", uri!~".*actuator.*"}[5m])
Note: the Meter in the application must be named http.client.requests.
Client Response Time
irate(http_client_requests_seconds_sum{instance="$instance", application="$application",uri!~".*actuator.*"}[5m]) / irate(http_client_requests_seconds_count{instance="$instance", application="$application",uri!~".*actuator.*"}[5m])
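To see what the two queries compute: `irate` looks only at the last two samples inside the range window and returns the per-second increase between them, so dividing the rate of `_sum` (total seconds spent) by the rate of `_count` (number of requests) gives the average seconds per request over that last interval. The sketch below mirrors that arithmetic under simplifying assumptions; it is not Prometheus code, and the sample values are made up.

```java
/** Simplified model of PromQL irate() over an ordered series of counter samples. */
final class IrateSketch {

    /**
     * Per-second rate between the last two samples.
     * timestampsSeconds and values must have the same length; returns NaN with fewer than two samples.
     */
    static double irate(double[] timestampsSeconds, double[] values) {
        int n = values.length;
        if (n < 2 || timestampsSeconds.length != n) {
            return Double.NaN;
        }
        double delta = values[n - 1] - values[n - 2];
        if (delta < 0) {
            // The counter went backwards, e.g. the application restarted: count from zero again.
            delta = values[n - 1];
        }
        return delta / (timestampsSeconds[n - 1] - timestampsSeconds[n - 2]);
    }

    public static void main(String[] args) {
        double[] t = {0, 5, 10};              // scrape times in seconds (5s interval)
        double[] count = {100, 110, 130};     // made-up ..._seconds_count samples
        double[] sum = {20.0, 22.0, 27.0};    // made-up ..._seconds_sum samples
        double requestsPerSecond = irate(t, count);
        double avgSecondsPerRequest = irate(t, sum) / requestsPerSecond;
        System.out.println("requests/s = " + requestsPerSecond);
        System.out.println("avg seconds per request = " + avgSecondsPerRequest);
    }
}
```

Because only the last two samples matter, `irate` reacts quickly to spikes but produces jumpy graphs; `rate` over the same window averages across all samples and may be the smoother choice for a latency panel.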
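4. Integrating Micrometer with Spring Boot
Metrics (indicators, measurements)
Micrometer provides vendor-neutral interfaces for timers, gauges, counters, distribution summaries, and long task timers. It has a dimensional data model that, when paired with a dimensional monitoring system, allows efficient access to a particular named metric with the ability to drill down across its dimensions.
4.1 Add dependencies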
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
    <version>${micrometer.version}</version>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
4.2 Enable the Prometheus endpoint
spring:
  application:
    name: spring-boot-node
management:
  metrics:
    # 1. Add global tags, which can later be used as variables when querying the data
    tags:
      application: ${spring.application.name}
  endpoints:
    web:
      exposure:
        # 2. Expose the prometheus endpoint
        include: 'health,prometheus'
4.3 Monitoring third-party requests
OkHttpMetricsEventListener makes it straightforward to monitor the requests made by an OkHttp Client.
Configure the OkHttp Client event listener
@Bean("okHttpClient")
public OkHttpClient okHttpClient(ConnectionPool connectionPool) {
    return new OkHttpClient.Builder()
            .connectionPool(connectionPool)
            .connectTimeout(5, TimeUnit.SECONDS)
            .readTimeout(10, TimeUnit.SECONDS)
            // Record a metric for every call made through this client
            .eventListener(eventListener())
            .build();
}

/**
 * Event listener: OkHttpMetricsEventListener.
 * metricsProperties.getWeb().getClient().getRequestsMetricName() returns 'http.client.requests',
 * which is used as the meter name.
 * meterRegistry and metricsProperties are assumed to be fields injected into this configuration class.
 *
 * @return the listener that records OkHttp request metrics
 */
private EventListener eventListener() {
    return OkHttpMetricsEventListener.builder(
                    meterRegistry, metricsProperties.getWeb().getClient().getRequestsMetricName())
            .build();
}
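The meter name passed to the builder is `http.client.requests`, yet the Grafana queries in section 3.2 use `http_client_requests_seconds_count` and `http_client_requests_seconds_sum`. The difference comes from the naming convention applied when a Timer is exported to Prometheus: dots become underscores, the base unit `seconds` is appended, and the timer is split into several series. The sketch below approximates that mapping for the simple case only; it is a hypothetical helper, not Micrometer's own naming code, and it skips the sanitizing of other characters.

```java
/** Approximates how a dotted Timer name shows up as Prometheus series names (simplified). */
final class PrometheusNamingSketch {

    /** Converts e.g. "http.client.requests" to "http_client_requests_seconds". */
    static String timerBaseName(String meterName) {
        String snakeCase = meterName.replace('.', '_');
        // Avoid doubling the unit when the name already ends with it.
        return snakeCase.endsWith("_seconds") ? snakeCase : snakeCase + "_seconds";
    }

    /** The series a Timer is typically exported as: request count, total time, and observed maximum. */
    static String[] timerSeries(String meterName) {
        String base = timerBaseName(meterName);
        return new String[] {base + "_count", base + "_sum", base + "_max"};
    }

    public static void main(String[] args) {
        for (String series : timerSeries("http.client.requests")) {
            System.out.println(series);
        }
    }
}
```

If the meter were registered under any other name, the series names would change accordingly and the two panels from section 3.2 would stay empty, which is why the name has to match.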
How it works: OkHttpMetricsEventListener.java
public class OkHttpMetricsEventListener extends EventListener {
    /**
     * Header name for URI patterns which will be used for tag values.
     */
    public static final String URI_PATTERN = "URI_PATTERN";

    // Fields and the remaining callbacks are omitted from this excerpt.

    @Override
    public void callFailed(Call call, IOException e) {
        CallState state = callState.remove(call);
        if (state != null) {
            state.exception = e;
            // The call failed: record the metric together with the exception
            time(state);
        }
    }

    @Override
    public void responseHeadersEnd(Call call, Response response) {
        CallState state = callState.remove(call);
        if (state != null) {
            state.response = response;
            // The response headers have arrived: record the metric
            time(state);
        }
    }

    private void time(CallState state) {
        String uri = state.response == null ? "UNKNOWN" :
                (state.response.code() == 404 || state.response.code() == 301 ? "NOT_FOUND" : urlMapper.apply(state.request));
        // Define the tags, which can be used as labels or variables in Prometheus and Grafana
        Iterable<Tag> tags = Tags.concat(extraTags, Tags.of(
                "method", state.request != null ? state.request.method() : "UNKNOWN",
                "uri", uri,
                "status", getStatusMessage(state.response, state.exception),
                "host", state.request != null ? state.request.url().host() : "UNKNOWN"
        ));
        // Record the elapsed time in a Timer; Prometheus can then pull the data
        // from the /actuator/prometheus endpoint provided by Spring Boot Actuator
        Timer.builder(this.requestsMetricName)
                .tags(tags)
                .description("Timer of OkHttp operation")
                .register(registry)
                .record(registry.config().clock().monotonicTime() - state.startTime, TimeUnit.NANOSECONDS);
    }
}
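The `uri` tag decision in `time(...)` is easy to miss but matters for the Grafana panels: a call without a response is tagged `UNKNOWN`, a 404 or 301 is tagged `NOT_FOUND`, and everything else goes through `urlMapper`. The sketch below isolates that branch so it can be tried without OkHttp. The status code is passed as a nullable `Integer` to stand in for a missing response, and the mapper that returns a fixed `/users/{id}` pattern is a made-up example, not the listener's default.

```java
import java.util.function.Function;

/** Isolated version of the uri-tag branch from the excerpt above (simplified types). */
final class UriTagSketch {

    /**
     * @param statusCode HTTP status of the response, or null when no response was received
     * @param requestUrl URL of the request, handed to the mapper
     * @param urlMapper  turns a request URL into the value used for the uri tag
     */
    static String uriTag(Integer statusCode, String requestUrl, Function<String, String> urlMapper) {
        if (statusCode == null) {
            return "UNKNOWN";
        }
        if (statusCode == 404 || statusCode == 301) {
            return "NOT_FOUND";
        }
        return urlMapper.apply(requestUrl);
    }

    public static void main(String[] args) {
        // Hypothetical mapper: collapse every user URL into one pattern.
        Function<String, String> mapper = url -> "/users/{id}";
        System.out.println(uriTag(200, "https://example.com/users/42", mapper));
        System.out.println(uriTag(404, "https://example.com/missing", mapper));
        System.out.println(uriTag(null, "https://example.com/users/42", mapper));
    }
}
```

Mapping concrete URLs to a pattern keeps the number of distinct `uri` values small; tagging with raw URLs that contain IDs would presumably create one time series per ID.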