Flink
The Flink collector gathers metrics from Flink instances, such as server status and network status, and sends them to DataFlux so that you can monitor Flink and analyze abnormal behavior.
Configuration
Prerequisites
Note: The examples use Flink 1.14.2 on CentOS. Available metrics may differ between Flink versions.
Flink provides two Prometheus-based ways to report metrics: Prometheus and PrometheusPushGateway. The main differences are:
- PrometheusPushGateway: all nodes in the cluster report their metrics to a single PushGateway, so PushGateway must be installed separately.
- Prometheus: each Flink process exposes its metrics on its own port. No additional software is needed, but it requires as many free ports as there are Flink processes (JobManagers and TaskManagers), which makes configuration slightly more complicated.
PrometheusPushGateway Mode (recommended)
- Download and install: PushGateway can be downloaded from the official Prometheus download page.
- Start PushGateway: (This command is for reference only; the exact command may vary with your environment.)
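A minimal sketch of starting PushGateway, assuming the release archive has been extracted so that the `pushgateway` binary sits in the current directory. PushGateway's `--web.listen-address` flag sets the address it listens on, and `:9091` matches its default port and the `metrics.reporter.promgateway.port` value in the sample configuration. The `start_pushgateway` function and `PUSHGATEWAY_PORT` variable are hypothetical helpers, not part of PushGateway or DataKit.

```shell
# Hypothetical helper: start PushGateway in the background.
# Assumption: the binary was extracted from the Prometheus release archive to ./pushgateway.
PUSHGATEWAY_PORT="${PUSHGATEWAY_PORT:-9091}"  # keep in sync with metrics.reporter.promgateway.port

start_pushgateway() {
  local bin="${1:-./pushgateway}"
  if [ ! -x "$bin" ]; then
    echo "pushgateway binary not found or not executable: $bin" >&2
    return 1
  fi
  # --web.listen-address sets the listening address; output goes to pushgateway.log.
  nohup "$bin" --web.listen-address=":${PUSHGATEWAY_PORT}" > pushgateway.log 2>&1 &
  echo "PushGateway started (PID $!) on port ${PUSHGATEWAY_PORT}"
}
```

Run `start_pushgateway`, or pass another binary path as the first argument, and check `pushgateway.log` if the process exits immediately (for example because the port is already in use).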
- Configure flink-conf.yaml to report all metrics to PushGateway
Add the following to Flink's configuration file, conf/flink-conf.yaml. Example:
metrics.reporter.promgateway.class: org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter # Fixed value; do not change
metrics.reporter.promgateway.host: localhost # Host name or IP address of PushGateway
metrics.reporter.promgateway.port: 9091 # PushGateway listening port
metrics.reporter.promgateway.interval: 15 SECONDS # Reporting interval
metrics.reporter.promgateway.groupingKey: k1=v1;k2=v2 # Labels attached to all metrics, as name=value pairs separated by ;
# The following are optional parameters
# metrics.reporter.promgateway.jobName: myJob
# metrics.reporter.promgateway.randomJobNameSuffix: true
# metrics.reporter.promgateway.deleteOnShutdown: false
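The reporter settings can also be appended by a script instead of by hand. A minimal sketch, assuming the target is Flink's conf/flink-conf.yaml and that metrics should be pushed every 15 seconds as in the sample; `append_promgateway_config` is a hypothetical helper. It refuses to run if any `metrics.reporter.promgateway.` key is already present, so repeated runs do not add duplicate keys, and it leaves out the optional groupingKey and jobName settings.

```shell
# Hypothetical helper: append the PushGateway reporter settings from the sample
# configuration to a Flink configuration file.
# Usage: append_promgateway_config <flink-conf.yaml> <pushgateway-host> [port]
append_promgateway_config() {
  local conf="$1" host="$2" port="${3:-9091}"
  # Stop if the reporter is already configured, to avoid duplicate keys.
  if grep -q '^metrics\.reporter\.promgateway\.' "$conf" 2>/dev/null; then
    echo "promgateway reporter already configured in $conf" >&2
    return 1
  fi
  cat >> "$conf" <<EOF
metrics.reporter.promgateway.class: org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter
metrics.reporter.promgateway.host: ${host}
metrics.reporter.promgateway.port: ${port}
metrics.reporter.promgateway.interval: 15 SECONDS
EOF
}

# Example, assuming FLINK_HOME points at the Flink installation:
# append_promgateway_config "$FLINK_HOME/conf/flink-conf.yaml" localhost 9091
```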
- Start Flink: ./bin/start-cluster.sh (This command is for reference only; the specific command may vary depending on the actual environment.)
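After Flink starts, PushGateway re-exposes everything pushed to it on its own `/metrics` endpoint, so you can confirm that Flink metrics are arriving. The sketch counts series whose names begin with `flink_`, the prefix Flink's Prometheus reporters put on metric names; `count_flink_series` is a hypothetical helper, and `localhost:9091` assumes the sample configuration.

```shell
# Hypothetical helper: count Flink series in Prometheus text-format data read from stdin.
# "# HELP" and "# TYPE" lines start with "#", so only sample lines are counted.
count_flink_series() {
  # grep -c prints 0 but exits 1 when nothing matches; "|| true" keeps that from
  # being treated as an error in scripts that use "set -e".
  grep -c '^flink_' || true
}

# Example, assuming PushGateway listens on localhost:9091:
# curl -s http://localhost:9091/metrics | count_flink_series
```

A result of 0 usually means Flink has not pushed yet (wait one reporting interval) or the host and port in flink-conf.yaml do not point at this PushGateway.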
Prometheus Mode
- Configure flink-conf.yaml so that each node exposes its metrics
Add the following to Flink's configuration file, conf/flink-conf.yaml. Example:
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.port: 9250-9260 # Port range; each JobManager or TaskManager process on a host binds one port from it, so the range must cover all processes on that host
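Each PrometheusReporter instance (one per JobManager or TaskManager process) binds a free port from `metrics.reporter.prom.port`, so on any single host the range needs at least as many ports as Flink processes running there. A small sanity check, assuming the range is written as `START-END` like the sample; `check_port_range` is a hypothetical helper.

```shell
# Hypothetical helper: check that a port range such as "9250-9260" has enough
# ports for the number of Flink processes (JobManagers + TaskManagers) on one host.
check_port_range() {
  local range="$1" processes="$2"
  local start="${range%-*}" end="${range#*-}"
  local size=$((end - start + 1))
  if [ "$size" -ge "$processes" ]; then
    echo "ok: ${size} ports for ${processes} processes"
  else
    echo "too small: ${size} ports for ${processes} processes" >&2
    return 1
  fi
}

# Example: one JobManager and two TaskManagers on the same host.
# check_port_range 9250-9260 3
```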
- Start Flink: ./bin/start-cluster.sh (This command is for reference only; the specific command may vary depending on the actual environment.)
- Install DataKit on a host with access to external networks.
- Modify the Flink configuration by adding the following lines to enable the Prometheus reporter:
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.port: 9250-9260
Note: Set metrics.reporter.prom.port according to the number of JobManagers and TaskManagers in the cluster.
- Restart the Flink cluster to apply the configuration.
- Check each metrics endpoint with curl http://{Flink IP}:{port}, where {port} is each port in 9250-9260, then start collecting.
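In the curl step above, 9250-9260 stands for the whole reporter port range; each Flink process answers on one port within it. A sketch that requests the `/metrics` path on every port in the range and reports which ones respond, assuming curl is installed; `probe_prom_ports` is a hypothetical helper. A port reported as `down` may simply be unused when a host runs fewer processes than the range allows.

```shell
# Hypothetical helper: request /metrics on each port from <start> to <end> on one
# Flink host and print "<port> up" or "<port> down". Requires curl.
probe_prom_ports() {
  local host="$1" port="$2" end="$3"
  while [ "$port" -le "$end" ]; do
    # -f fails on HTTP errors, -s hides progress, --max-time avoids hanging on filtered ports.
    if curl -fs --max-time 2 -o /dev/null "http://${host}:${port}/metrics" 2>/dev/null; then
      echo "${port} up"
    else
      echo "${port} down"
    fi
    port=$((port + 1))
  done
}
```

For example, `probe_prom_ports 192.168.1.10 9250 9260` checks all eleven ports of the sample range on one host.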
Metrics
By default, the Flink collector gathers the following metrics, which provide insight into the current state of Flink.
flink_jobmanager
- Tags
Tag | Description |
---|---|
host | Host name. |
- Metrics
flink_taskmanager
- Tags
Tag | Description |
---|---|
host | Host name. |
tm_id | Task manager ID. |
- Metrics