Flink Monitoring

Sonal Kumar

This article lists the steps for exposing Flink metrics to Prometheus in two different setups: Flink running in a container, and Flink running directly on a server.

Flink provides built-in support for Prometheus. You need to configure Flink to expose its metrics in a format that Prometheus can scrape.

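1. Scenario where Flink runs in a container
Enable Prometheus metrics in Flink:

Step 1: Add the following lines to the FLINK_PROPERTIES of the jobmanager service in the docker-compose.yaml file of the Flink container. On newer Flink releases, use the factory.class form described in the note in scenario 2 instead, since Flink 1.19.1 rejected the class-based line.
Lines -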

metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.port: 9256

Publish port 9256 from the Docker container (the ports section of the service). You can use any free port.

The port line is optional. If you omit it, the Prometheus reporter listens on its default port, 9249.

metrics.reporter.prom.port: 9256 #we can skip this line

Step 2: Repeat the same configuration for the taskmanager, using a different port (9257 in this example).

Make sure the jobmanager and taskmanager containers are on the same network.

Step 3: Start the containers
# docker-compose up -d
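
Once the containers are up, each reporter port should serve its metrics as plain text over HTTP. The following is a minimal sketch for checking that from Python, assuming the reporter answers on the /metrics path and emits no timestamps. The helper names parse_metrics and fetch_metrics are hypothetical, not part of Flink or Prometheus.

```python
import urllib.request


def parse_metrics(text):
    """Parse Prometheus text exposition lines into {series: value}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # Assumption: no trailing timestamp, so the value is the last token.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


def fetch_metrics(host, port, timeout=5):
    """Download and parse the metrics page of one Flink reporter."""
    # Assumption: the reporter serves its metrics on /metrics.
    url = f"http://{host}:{port}/metrics"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

For example, fetch_metrics("localhost", 9256) on the Docker host would return the jobmanager series if the port mapping from the sample compose file is in place.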

Step 4: Add the reporter ports as scrape targets in the prometheus.yaml file

SSH into the Prometheus server.
Open the prometheus.yaml file and add the following job under scrape_configs:

  - job_name: 'flink-sit'
    scrape_interval: 5s
    scrape_timeout: 5s
    static_configs:
      - targets: ['13.114.144.179:9257','13.114.144.179:9256']

where,
13.114.144.179 is the public IP of the instance where the Flink container is running
9257 -> port of the taskmanager
9256 -> port of the jobmanager
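
If several Flink hosts need the same kind of job entry, the block can be generated instead of typed by hand. This is only a sketch: render_scrape_job is a hypothetical helper, and the two-space indentation assumes the entry is pasted directly under scrape_configs.

```python
def render_scrape_job(job_name, host, ports, interval="5s", timeout="5s"):
    """Return one scrape job entry as text, indented for scrape_configs."""
    targets = ",".join(f"'{host}:{port}'" for port in ports)
    lines = [
        f"  - job_name: '{job_name}'",
        f"    scrape_interval: {interval}",
        f"    scrape_timeout: {timeout}",
        "    static_configs:",
        f"      - targets: [{targets}]",
    ]
    return "\n".join(lines)
```

Calling render_scrape_job("flink-sit", "13.114.144.179", [9257, 9256]) yields the same entry as above.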

Then restart Prometheus and check its status
# sudo systemctl restart prometheus
# sudo systemctl status prometheus

Step 5: Add ports 9257 and 9256 to the inbound rules of the Flink server's security group so that the Prometheus server can scrape the metrics.

If the ports are not open in the security group, the target shows the following error in Prometheus -
Ex: context deadline exceeded
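
To tell a blocked port apart from a port with nothing listening, a plain TCP connect from the Prometheus server is often enough. The sketch below uses a hypothetical helper, probe_port. It assumes that a silently dropped connection (typical for a closed security group) shows up as a timeout, while a reachable host with no listener refuses the connection.

```python
import socket


def probe_port(host, port, timeout=3.0):
    """Return 'open', 'refused', 'timeout' or 'unreachable' for a TCP connect."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Host reachable, but no process is listening on this port.
        return "refused"
    except (socket.timeout, TimeoutError):
        # No answer at all, e.g. packets dropped by a firewall rule.
        return "timeout"
    except OSError:
        # Any other network failure (no route, DNS error, ...).
        return "unreachable"
```

Running probe_port("13.114.144.179", 9256) from the Prometheus server would be expected to return "open" once the inbound rule is in place.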

Step 6: Check the Prometheus URL
# http://<pub-ip-prometheus-server>:9090
Go to /targets

View the metrics by clicking on the endpoint.
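
The same target status is also available from the Prometheus HTTP API at /api/v1/targets, which is handy for scripted checks. A minimal sketch follows, with hypothetical helper names. It relies on the activeTargets, scrapeUrl, health and lastError fields of that API response.

```python
import json
import urllib.request


def summarize_targets(payload, job):
    """Map scrapeUrl -> (health, lastError) for one job in a targets reply."""
    result = {}
    for target in payload["data"]["activeTargets"]:
        if target["labels"].get("job") == job:
            result[target["scrapeUrl"]] = (
                target["health"],
                target.get("lastError", ""),
            )
    return result


def fetch_targets(prometheus_url, timeout=5):
    """Fetch the raw /api/v1/targets reply from a Prometheus server."""
    url = f"{prometheus_url}/api/v1/targets"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

For instance, summarize_targets(fetch_targets("http://<pub-ip-prometheus-server>:9090"), "flink-sit") lists both Flink endpoints with their health.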

Sample docker-compose.yaml file for Flink:

# Example docker-compose.yaml file of flink container
version: '3.4'
services:
  summarization-jobmanager:
    restart: always
    image: flink:latest
    container_name: summarization-jobmanager
    ports:
      - "8088:8081"
      - "9256:9256"
    command: jobmanager
    environment:
      - |
        FLINK_PROPERTIES=
        jobmanager.rpc.address: summarization-jobmanager
        metrics.reporters: prom
        metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
        metrics.reporter.prom.port: 9256

  summarization-taskmanager:
    restart: always
    image: flink:latest
    container_name: summarization-taskmanager
    depends_on:
      - summarization-jobmanager
    ports:
      - "9257:9257"   
    command: taskmanager
    environment:
      - |
        FLINK_PROPERTIES=
        jobmanager.rpc.address: summarization-jobmanager
        taskmanager.numberOfTaskSlots: 6
        taskmanager.memory.process.size: 2000m
        metrics.reporters: prom
        metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
        metrics.reporter.prom.port: 9257

networks:
  default:
    external:
      name: flink-network

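2. Scenario where Flink runs on a server (EC2 in the case of AWS)

To set up monitoring for this, we followed these steps -

Step 1: Make the PrometheusReporter jar available on the classpath of the Flink cluster (it comes with the Flink distribution). The version in the file name must match your Flink installation; 1.7.2 below is only an example. Recent Flink distributions may already ship this reporter under the plugins directory, in which case no copy is needed: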
# cp /opt/flink/opt/flink-metrics-prometheus-1.7.2.jar /opt/flink/lib

Step 2: Open the flink-conf.yaml file and add the following lines to export Prometheus metrics

Lines:
# Expose metrics on the configured port to Prometheus reporter
metrics.reporters: prom
metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
taskmanager.network.detailed-metrics: true
metrics.system-resource: true
metrics.system-resource-probing-interval: 5000
metrics.reporter.prom.port: 9250-9251

Note:

My Flink version is 1.19.1. Earlier I was using the following lines -

# Expose metrics on the configured port to Prometheus reporter
metrics.enabled: true
metrics.reporters: prom
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.port: 9259

But the line
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
is not supported in this version and causes an error in the task manager logs.

So I replaced this line with -
metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
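
A quick way to spot the old class-based setting before restarting the cluster is to scan the config text for it. The sketch below is a hypothetical helper, not a Flink tool. It only looks for keys of the form metrics.reporter.<name>.class and ignores commented lines.

```python
import re

# Matches "metrics.reporter.<name>.class:" but not "...<name>.factory.class:".
_CLASS_KEY = re.compile(r"^\s*metrics\.reporter\.([^.\s]+)\.class\s*:")


def find_class_based_reporters(conf_text):
    """Return reporter names that are still configured via the .class key."""
    names = []
    for line in conf_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # commented-out settings do not count
        match = _CLASS_KEY.match(line)
        if match:
            names.append(match.group(1))
    return names
```

With the earlier lines from this note it returns ["prom"]; after switching to factory.class it returns an empty list.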

Flink needs one metrics port per process on the host: one for the jobmanager and one for each taskmanager. Each process binds the first free port in the configured range.
So we can either give a wide range like -
metrics.reporter.prom.port: 9250-9260
from which Flink picks the ports it needs (9250 and 9251 for one jobmanager and one taskmanager, if both are free),
or
give a range with exactly as many ports as there are processes, like -
metrics.reporter.prom.port: 9250-9251
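
The way a port range resolves to concrete ports can be illustrated with a short sketch. This is not Flink's code, only an approximation of the behaviour under the assumption that each process takes the first port in the range it can bind. Both function names are hypothetical.

```python
import socket


def parse_port_range(spec):
    """Expand '9250-9251' or a single port like '9249' into a list of ints."""
    if "-" in spec:
        low, high = (int(part) for part in spec.split("-", 1))
        return list(range(low, high + 1))
    return [int(spec)]


def first_free_port(spec, host="127.0.0.1"):
    """Return the first port in the range that can be bound, else None."""
    for port in parse_port_range(spec):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port already taken, try the next one
            return port
    return None
```

Running first_free_port("9250-9251") on the Flink host before starting the cluster shows which port the first process would likely get.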

Step 3: Restart Flink (run from the bin directory of the Flink installation)
# ./stop-cluster.sh
# ./start-cluster.sh

Step 4: Configure the prometheus.yaml file by adding the following job under scrape_configs

  - job_name: 'Flink-DS'
    scrape_interval: 5s
    scrape_timeout: 5s
    static_configs:
      - targets: ['52.33.9.91:9250','52.33.9.91:9251']

Where,
52.33.9.91 -> IP of the server where Flink is running
9250, 9251 -> ports where the Flink metrics are exposed

Step 5 : Restart prometheus
# sudo systemctl restart prometheus
# sudo systemctl status prometheus

Step 6: Add ports 9250 and 9251 to the inbound rules of the Flink server's security group so that the Prometheus server can scrape the metrics.

Step 7: Check the Prometheus URL
# http://<pub-ip-prometheus-server>:9090
Go to /targets

View the metrics by clicking on the endpoint.

Note:
If nothing is listening on the port assigned in the flink-conf.yaml file,
i.e. metrics.reporter.prom.port: 9250-9251
(for example because Flink could not bind it), the Prometheus target shows a "connect: connection refused" error.

These metrics can then be used to build dashboards in Grafana.
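
Before building a panel, it can help to confirm that a metric is queryable through the Prometheus HTTP API at /api/v1/query. The sketch below uses hypothetical helper names. The metric in the usage note, flink_jobmanager_numRegisteredTaskManagers, is an assumption about what your setup exports, so substitute any series you see on the targets page.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(prometheus_url, promql):
    """Build the instant-query URL for a PromQL expression."""
    query = urllib.parse.urlencode({"query": promql})
    return prometheus_url.rstrip("/") + "/api/v1/query?" + query


def extract_values(payload):
    """Turn an instant-query reply into a list of (labels, value) pairs."""
    return [
        (sample["metric"], float(sample["value"][1]))
        for sample in payload["data"]["result"]
    ]


def instant_query(prometheus_url, promql, timeout=5):
    """Run an instant query against a Prometheus server."""
    url = build_query_url(prometheus_url, promql)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return extract_values(json.load(resp))
```

For example, instant_query("http://<pub-ip-prometheus-server>:9090", "flink_jobmanager_numRegisteredTaskManagers") would return one value per jobmanager, and the same expression can be pasted into a Grafana panel.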

Hope you find this useful.
