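Prometheus is an open-source monitoring and alerting system with a built-in time-series database. Grafana is commonly used on top of it to visualize the data.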

1. Basic architecture of the monitoring system

Prometheus + Grafana + AlertManager usage summary - 冯金伟博客园

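1.1 Core components

Prometheus Server: scrapes and stores time-series data, and also provides querying and alert rule configuration management.
exporters: data collectors, such as node_exporter for host metrics and the MongoDB exporter for MongoDB metrics.
alertmanager: manages alert notifications.
Grafana: renders the monitoring data as charts.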

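2. Installing the basic components

Since this setup is for learning and experimentation, the environment is installed quickly with Docker.

2.1 Installing Node Exporter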

docker-compose-node-export.yml

version: '3'
services:
  node-exporter:
    image: prom/node-exporter
    container_name: node-exporter
    hostname: node-exporter
    restart: always
    ports:
      - "9100:9100"

2.2 Installing Alertmanager

docker-compose-alertmanager.yml

version: '3'
services:
  alertmanager:
    image: prom/alertmanager
    container_name: alertmanager
    hostname: alertmanager
    restart: always
    volumes:
      - /data/docker_file/monitor/conf/alertmanager.yml:/etc/alertmanager/alertmanager.yml
    ports:
      - "9093:9093"

alertmanager.yml

global:
  smtp_smarthost: 'smtp.qq.com:25'            # QQ Mail SMTP server
  smtp_from: '793272861@qq.com'               # sender address
  smtp_auth_username: '793272861@qq.com'      # SMTP username, i.e. the sender mailbox
  smtp_auth_password: '****************'      # SMTP password for the sender mailbox
  smtp_require_tls: false                     # do not require TLS
 
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 10m
  receiver: live-monitoring

receivers:
- name: 'live-monitoring'
  email_configs:
  - to: '793272861@qq.com'                    # recipient address

2.3 Installing Prometheus

docker-compose-prometheus.yml

version: '3'
services:
  prometheus:
    image: prom/prometheus
    container_name: prometheus
    hostname: prometheus
    restart: always
    volumes:
      - /data/docker_file/prometheus/data:/prometheus
      - /data/docker_file/prometheus/conf/prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"

prometheus.yml

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
 
# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets: ['alertmanager:9093']
 
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
 
# Scrape configuration: the jobs below tell Prometheus which endpoints to poll
# periodically for metrics (Prometheus itself and node-exporter).
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['prometheus:9090']
  - job_name: 'node-exporter'
    scrape_interval: 5s
    static_configs:
      - targets: ['node-exporter:9100']

Prometheus service discovery mechanisms

Automatic service discovery via Consul

Visit: http://localhost:9090/
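
Besides the web UI, Prometheus also answers queries over its HTTP API, which is a convenient way to check that the configured targets are actually being scraped. The following is a minimal Java sketch (JDK 11 or later, standard library only) that builds and sends an instant query to /api/v1/query. The class and method names (PrometheusQuery, instantQueryUri, runInstantQuery) are illustrative, and the base URL assumes the Prometheus container from this section is published on localhost:9090.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

final class PrometheusQuery {

    // Builds the URI for an instant query; the PromQL expression is URL-encoded
    // so that braces, quotes and equals signs survive the query string.
    static URI instantQueryUri(String baseUrl, String promQl) {
        String encoded = URLEncoder.encode(promQl, StandardCharsets.UTF_8);
        return URI.create(baseUrl + "/api/v1/query?query=" + encoded);
    }

    // Sends the query and returns the raw JSON body; no parsing is attempted here.
    static String runInstantQuery(String baseUrl, String promQl) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(instantQueryUri(baseUrl, promQl))
                .GET()
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

For example, calling runInstantQuery("http://localhost:9090", "up") should return a JSON document listing each scrape target, with the value 1 for targets whose last scrape succeeded. The exact payload depends on your setup.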

2.4 Installing Grafana

docker-compose-grafana.yml

version: '3'
services:
  grafana:
    image: grafana/grafana
    container_name: grafana
    hostname: grafana
    restart: always
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - /data/docker_file/grafana/data:/var/lib/grafana
      - /data/docker_file/grafana/log:/var/log/grafana
    ports:
      - "3000:3000"

Add the data source (Prometheus)

Visit: http://localhost:3000/ (default username: admin, password: admin)

2.5 Combined Docker Compose file

version: '3'
services:
  prometheus:
    image: prom/prometheus
    container_name: prometheus
    hostname: prometheus
    restart: always
    volumes:
      - /data/docker_file/prometheus/data:/prometheus
      - /data/docker_file/prometheus/conf/prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"
    networks:
      - monitor
  alertmanager:
    image: prom/alertmanager
    container_name: alertmanager
    hostname: alertmanager
    restart: always
    volumes:
      - /data/docker_file/monitor/conf/alertmanager.yml:/etc/alertmanager/alertmanager.yml
    ports:
      - "9093:9093"
    networks:
      - monitor
  grafana:
    image: grafana/grafana
    container_name: grafana
    hostname: grafana
    restart: always
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - /data/docker_file/grafana/data:/var/lib/grafana
      - /data/docker_file/grafana/log:/var/log/grafana
    ports:
      - "3000:3000"
    networks:
      - monitor
  node-exporter:
    image: prom/node-exporter
    container_name: node-exporter
    hostname: node-exporter
    restart: always
    ports:
      - "9100:9100"
    networks:
      - monitor
networks:
  monitor:
    driver: bridge
 

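3. Configuring Grafana dashboards

Grafana pulls data from Prometheus with PromQL queries and renders it in panels; a Grafana dashboard is simply a collection of such panels.

3.1 Downloading Grafana dashboard files

Ready-made dashboard files can be downloaded from the official Grafana website and imported into your own Grafana instance for immediate use.

Recommended Grafana dashboards

JVM (Micrometer)
Spring Boot 2.1 Statistics
Basic host monitoring (CPU, memory, disk, network)
Node Exporter for Prometheus Dashboard CN
Druid Connection Pool Dashboard

Importing a Grafana dashboard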

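3.2 Adding and modifying Grafana panels (extension)

The stock Spring Boot 2.1 Statistics dashboard does not chart outbound requests to third-party services. Taking it as an example, we add two panels for those requests: Client Request Count and Client Response Time.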

Client Request Count

irate(http_client_requests_seconds_count{instance="$instance", application="$application", uri!~".*actuator.*"}[5m])

Note: the meter in the application must be named http.client.requests.
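
The meter name matters because the series names used in the panel queries are derived from it. As a rough illustration (a simplified imitation written for this article, not Micrometer's actual naming code), the Java sketch below shows how a timer named http.client.requests ends up as the http_client_requests_seconds_count and http_client_requests_seconds_sum series. The class PrometheusNames and its methods are hypothetical.

```java
final class PrometheusNames {

    // Simplified rule for timers: dots become underscores and the base unit
    // "seconds" is appended when the name does not already end with it.
    static String timerBaseName(String meterName) {
        String snake = meterName.replace('.', '_');
        return snake.endsWith("_seconds") ? snake : snake + "_seconds";
    }

    // Series holding the cumulative number of recorded calls.
    static String countSeries(String meterName) {
        return timerBaseName(meterName) + "_count";
    }

    // Series holding the cumulative time spent, in seconds.
    static String sumSeries(String meterName) {
        return timerBaseName(meterName) + "_sum";
    }

    public static void main(String[] args) {
        System.out.println(countSeries("http.client.requests"));
        System.out.println(sumSeries("http.client.requests"));
    }
}
```

If the application registers the timer under a different name, the queries in these panels have to be adjusted to the corresponding series names.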

Client Response Time

irate(http_client_requests_seconds_sum{instance="$instance", application="$application",uri!~".*actuator.*"}[5m]) / irate(http_client_requests_seconds_count{instance="$instance", application="$application",uri!~".*actuator.*"}[5m])
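
This query divides the per-second rate of total time spent by the per-second rate of requests, which yields the average duration per request over the most recent scrape interval. A small Java sketch of that arithmetic, using made-up sample values, may make it concrete. The class AverageLatency and its parameters are illustrative only.

```java
final class AverageLatency {

    // Per-second increase between two consecutive samples of a counter,
    // which is how irate() treats the last two points in the range.
    static double perSecondRate(double earlier, double later, double elapsedSeconds) {
        return (later - earlier) / elapsedSeconds;
    }

    // Average seconds per request between two scrapes of the _sum and _count series.
    // The result is NaN when no request was recorded between the scrapes (0 / 0).
    static double averageSeconds(double sum1, double count1,
                                 double sum2, double count2,
                                 double elapsedSeconds) {
        double sumRate = perSecondRate(sum1, sum2, elapsedSeconds);
        double countRate = perSecondRate(count1, count2, elapsedSeconds);
        return sumRate / countRate;
    }

    public static void main(String[] args) {
        // Hypothetical scrapes 5 s apart: 10 new requests took 1.5 s in total.
        System.out.println(averageSeconds(12.0, 100, 13.5, 110, 5.0));
    }
}
```

Because both rates are taken over the same interval, the elapsed time cancels out and the result is simply the added time divided by the added request count.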

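4. Integrating Micrometer with Spring Boot

Metrics

Micrometer provides vendor-neutral interfaces for timers, gauges, counters, distribution summaries, and long task timers. It uses a dimensional data model which, when paired with a dimensional monitoring system, allows efficient access to a particular named metric and the ability to drill down across its dimensions.

4.1 Adding dependencies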

<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
    <version>${micrometer.version}</version>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>

4.2 Enabling the Prometheus endpoint

spring:
  application:
    name: spring-boot-node

management:
  metrics:
    # 1. Add global tags; they can be used later as variables when querying data
    tags:
      application: ${spring.application.name}
  endpoints:
    web:
      exposure:
        # 2. Expose the prometheus endpoint
        include: 'health,prometheus'

4.3 Monitoring requests to third-party services

OkHttpMetricsEventListener makes it easy to monitor requests made through an OkHttp client.

Configuring the OkHttp client event listener

// meterRegistry (MeterRegistry) and metricsProperties (MetricsProperties) are assumed
// to be fields injected into this configuration class.
@Bean("okHttpClient")
public OkHttpClient okHttpClient(ConnectionPool connectionPool) {
    return new OkHttpClient.Builder()
            .connectionPool(connectionPool)
            .connectTimeout(5, TimeUnit.SECONDS)
            .readTimeout(10, TimeUnit.SECONDS)
            .eventListener(eventListener())
            .build();
}

/**
 * Event listener: OkHttpMetricsEventListener.
 * metricsProperties.getWeb().getClient().getRequestsMetricName() returns
 * 'http.client.requests' by default; this is the meter name.
 */
private EventListener eventListener() {
    return OkHttpMetricsEventListener.builder(
            meterRegistry, metricsProperties.getWeb().getClient().getRequestsMetricName())
        .build();
}

How it works: OkHttpMetricsEventListener.java (excerpt)

public class OkHttpMetricsEventListener extends EventListener {

    /**
     * Header name for URI patterns which will be used for tag values.
     */
    public static final String URI_PATTERN = "URI_PATTERN";

    @Override
    public void callFailed(Call call, IOException e) {
        CallState state = callState.remove(call);
        if (state != null) {
            state.exception = e;
            // The call has finished (with a failure): record the metric
            time(state);
        }
    }

    @Override
    public void responseHeadersEnd(Call call, Response response) {
        CallState state = callState.remove(call);
        if (state != null) {
            state.response = response;
            // The call has finished (response headers received): record the metric
            time(state);
        }
    }

    private void time(CallState state) {
        String uri = state.response == null ? "UNKNOWN" :
            (state.response.code() == 404 || state.response.code() == 301 ? "NOT_FOUND" : urlMapper.apply(state.request));

        // Define the tags (dimensions); they are available as labels in Prometheus and as variables in Grafana
        Iterable<Tag> tags = Tags.concat(extraTags, Tags.of(
            "method", state.request != null ? state.request.method() : "UNKNOWN",
            "uri", uri,
            "status", getStatusMessage(state.response, state.exception),
            "host", state.request != null ? state.request.url().host() : "UNKNOWN"
        ));

        // Record the timer; Prometheus can then pull the data from the /actuator/prometheus endpoint exposed by Spring Boot Actuator
        Timer.builder(this.requestsMetricName)
            .tags(tags)
            .description("Timer of OkHttp operation")
            .register(registry)
            .record(registry.config().clock().monotonicTime() - state.startTime, TimeUnit.NANOSECONDS);
    }

}

4.4 Spring Boot integration example

Spring Boot Node
