PostgreSQL
PostgreSQL 采集器可以从 PostgreSQL 实例中采集实例运行状态指标,并将指标采集到观测云,帮助监控分析 PostgreSQL 各种异常情况。
配置¶
前置条件¶
- PostgreSQL 版本 >= 9.0
-
创建监控帐号
-
开启
pg_stat_statements扩展(可选)PostgreSQL 对象 采集时,部分指标如
qps/tps/avg_query_time等,需要开启pg_stat_statements扩展。具体步骤如下:-
修改配置文件
找到并编辑 PostgreSQL 配置文件(通常位于
/var/lib/pgsql/data/postgresql.conf或/etc/postgresql/<版本>/main/postgresql.conf): -
重启 PostgreSQL 服务
修改好配置文件后,需要重启 PostgreSQL 服务。
-
数据库中创建扩展
连接到目标数据库, 执行以下 SQL:
-
验证扩展是否成功开启
-
采集器配置¶
进入 DataKit 安装目录下的 conf.d/samples 目录,复制 postgresql.conf.sample 并命名为 postgresql.conf。示例如下:
[[inputs.postgresql]]
## Server address
# URI format
# postgres://[datakit[:PASSWORD]]@localhost[/dbname]?sslmode=[disable|verify-ca|verify-full]
# or simple string
# host=localhost user=pqgotest password=... sslmode=... dbname=app_production
address = "postgres://datakit:PASSWORD@localhost/postgres?sslmode=disable"
## Ignore databases which are gathered. Do not use with 'databases' option.
#
# ignored_databases = ["db1"]
## Specify the list of the databases to be gathered. Do not use with the 'ignored_databases' option.
#
# databases = ["db1"]
## Specify the name used as the "server" tag.
#
# outputaddress = "db01"
## Collect interval
# Time unit: "ns", "us", "ms", "s", "m", "h"
#
interval = "10s"
## @param connect_timeout - duration - optional - default: "10s"
## Timeout for connecting to PostgreSQL and executing collector queries.
# connect_timeout = "10s"
## Set true to enable election
#
election = true
## v2+ overrides metric-category measurement names to "postgresql_metric". Default: v2
## Set to "v1" to keep legacy names: postgresql, postgresql_function, postgresql_lock, postgresql_stat,
## postgresql_index, postgresql_size, postgresql_statio, postgresql_replication, postgresql_replication_slot,
## postgresql_slru, postgresql_io, postgresql_bgwriter, postgresql_connection, postgresql_conflict,
## postgresql_archiver, postgresql_dbm_metric, postgresql_dbm_session, postgresql_dbm_connection
measurement_version = "v2"
## Set true to enable collecting function metrics from pg_stat_user_functions.
#
collect_function_metrics = false
## Metric name in metric_exclude_list will not be collected.
#
metric_exclude_list = [""]
## Set dbm to true to collect database activity
dbm = false
## Config dbm metric
[inputs.postgresql.dbm_metric]
enabled = true
interval = "60s"
## Config dbm sample
[inputs.postgresql.dbm_sample]
enabled = true
## @param explain_cache_ttl - duration - optional - default: "10m"
## TTL for explain-rate cache.
explain_cache_ttl = "10m"
## @param plan_cache_ttl - duration - optional - default: "1h"
## Do not re-emit the same execution plan within this window.
plan_cache_ttl = "1h"
## Config dbm activity
[inputs.postgresql.dbm_activity]
enabled = true
interval = "10s"
## Config database discovery.
# Discovered databases will be used to for database specific metric collection,
# such as object collection and function metrics collection.
#
[inputs.postgresql.auto_discovery_database]
# Set true to enable database discovery
enabled = false
# Maximum number of databases to discover
max_databases = 100
# Regex pattern to include databases
include = [".*"]
# Regex pattern to exclude databases. The pattern in "exclude" takes precedence over "include".
exclude = ["model", "msdb", "cloudsqladmin", "rdsadmin"]
#Discovery interval
interval = "600s"
## collect object
[inputs.postgresql.object]
# Set true to enable collecting objects
enabled = true
# interval to collect postgresql object which will be greater than collection interval
interval = "600s"
[inputs.postgresql.object.collect_schemas]
# Set true to enable collecting schemas
enabled = true
# Maximum number of tables to collect
max_tables = 300
# Maximum number of databases to collect
max_database = 100
## Relations config
#
# The list of relations/tables can be specified to track per-relation metrics.
# To collect relation metrics, you need to specify the databases or database_regex field to indicate which databases to collect.
# relation_name refer to the name of a relation, either relation_name or relation_regex must be set.
# relation_regex is a regex rule, only takes effect when relation_name is not set.
# schemas used for filtering, ignore this field when it is empty
# relkind can be a list of the following options:
# r(ordinary table), i(index), S(sequence), t(TOAST table), p(partitioned table),
# m(materialized view), c(composite type), f(foreign table)
# Size metrics are collected only for ordinary tables. Index metrics are collected only for user indexes.
# Lock metrics are collected for all relation types. The rest of the metrics are collected only for user tables.
#
# [[inputs.postgresql.relations]]
# databases = ["postgres"]
# database_regex = "<DATABASE_PATTERN>"
# relation_name = "<TABLE_NAME>"
# relation_regex = "<TABLE_PATTERN>"
# schemas = ["public"]
# relkind = ["r", "p"]
## Run a custom SQL query and collect corresponding metrics.
#
# [[inputs.postgresql.custom_queries]]
# sql = '''
# select datname,numbackends,blks_read
# from pg_stat_database
# limit 10
# '''
# metric = "postgresql_custom_stat"
# tags = ["datname" ]
# fields = ["numbackends", "blks_read"]
# interval = "10s"
## Log collection
#
# [inputs.postgresql.log]
# files = []
# pipeline = "postgresql.p"
## Custom tags
#
[inputs.postgresql.tags]
# some_tag = "some_value"
# more_tag = "some_other_value"
# ...
配置好后,重启 DataKit 即可。
目前可以通过 ConfigMap 方式注入采集器配置来开启采集器。
数据库性能指标采集¶
数据库性能指标主要来源于 PostgreSQL 的内置系统视图和扩展插件,其中最核心的包括 pg_stat_activity 和 pg_stat_statements。这些工具提供了在运行时获取数据库内部执行情况的方法:pg_stat_activity 实时展示当前会话的活动状态、执行的查询及等待事件等信息;pg_stat_statements 则记录历史 SQL 语句的执行统计数据,包括执行次数、耗时、IO 情况等。
通过这些视图和插件,DataKit 能够采集 PostgreSQL DBM 所需的 statement metric、query sample、execution plan、实时会话活动,以及由 activity 快照派生出来的 session/connection 指标。
如需开启,需要执行以下步骤。
- 修改配置文件,开启监控采集
[[inputs.postgresql]]
## Set dbm to true to collect database activity
dbm = true
## Config dbm metric
[inputs.postgresql.dbm_metric]
enabled = true
interval = "60s"
## Config dbm sample
[inputs.postgresql.dbm_sample]
enabled = true
explain_cache_ttl = "10m"
plan_cache_ttl = "1h"
## Config dbm activity
[inputs.postgresql.dbm_activity]
enabled = true
interval = "10s"
...
开启 DBM 后,DataKit 还会自动采集 postgresql_dbm_session 和 postgresql_dbm_connection 两组指标。其中 postgresql_dbm_session 是基于 DBM activity 行聚合得到的,而 postgresql_dbm_connection 是通过单独查询 pg_stat_activity 并分组统计得到的,无需额外配置开关。
- PostgreSQL 配置
修改配置文件(如 postgresql.conf),配置相关参数:
shared_preload_libraries = 'pg_stat_statements'
track_activity_query_size = 4096 # Required for collection of larger queries.
- 权限配置
账号授权
在每个数据库中执行以下 SQL:
在每个数据库中执行以下 SQL:
在每个数据库中执行以下 SQL:
CREATE SCHEMA datakit;
GRANT USAGE ON SCHEMA datakit TO datakit;
GRANT USAGE ON SCHEMA public TO datakit;
GRANT SELECT ON pg_stat_database TO datakit;
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
CREATE OR REPLACE FUNCTION datakit.pg_stat_activity() RETURNS SETOF pg_stat_activity AS
$$ SELECT * FROM pg_catalog.pg_stat_activity; $$
LANGUAGE sql
SECURITY DEFINER;
CREATE OR REPLACE FUNCTION datakit.pg_stat_statements() RETURNS SETOF pg_stat_statements AS
$$ SELECT * FROM pg_stat_statements; $$
LANGUAGE sql
SECURITY DEFINER;
-
额外权限与执行计划 helper 函数
如果开启了
dbm_sample.enabled = true,则需要在每个需要采集执行计划的数据库中创建以下 helper 函数:CREATE OR REPLACE FUNCTION datakit.explain_statement( l_query TEXT, OUT explain JSON ) RETURNS SETOF JSON AS $$ DECLARE curs REFCURSOR; plan JSON; BEGIN OPEN curs FOR EXECUTE pg_catalog.concat('EXPLAIN (FORMAT JSON) ', l_query); FETCH curs INTO plan; CLOSE curs; RETURN QUERY SELECT plan; END; $$ LANGUAGE 'plpgsql' RETURNS NULL ON NULL INPUT SECURITY DEFINER;
指标¶
以下所有数据采集,默认会追加全局选举 tag,也可以在配置中通过 [inputs.postgresql.tags] 指定其它标签:
postgresql_metric¶
Metric set including PostgreSQL database, function, lock, statistics, index, size, relation I/O, replication, replication slot, simple LRU cache, I/O, background writer, connection, conflict, archiver, and DBM (metric/session/connection) statistics, unified in v2
| Tags & Fields | Description |
|---|---|
| application_name ( tag) |
Name of the application connected to this backend |
| backend_type ( tag) |
Type of backend (e.g. background worker, autovacuum worker) |
| client_addr ( tag) |
IP address of the client connected to this backend. |
| context ( tag) |
The context of an I/O operation |
| database_instance ( tag) |
PostgreSQL instance identifier from configured tag database_instance or system_identifier. Common tag. |
| db ( tag) |
The database name |
| function ( tag) |
The function name |
| host ( tag) |
The server host address. Common tag. |
| locktype ( tag) |
The lock type |
| mode ( tag) |
The lock mode |
| name ( tag) |
The name of the SLRU |
| object ( tag) |
Target object of an I/O operation |
| pg_index ( tag) |
The index name |
| query_signature ( tag) |
The hash signature computed from db, rolname, and normalized query text. |
| queryid ( tag) |
The query ID reported by pg_stat_statements. |
| rolname ( tag) |
The role name |
| schema ( tag) |
The schema name |
| server ( tag) |
The address of the server. The value is host:port. Common tag. |
| session_status ( tag) |
Derived session status: active / idle / blocked. |
| slot_name ( tag) |
The replication slot name |
| slot_type ( tag) |
The replication slot type |
| state ( tag) |
Current overall state of this backend |
| table ( tag) |
The table name |
| usename ( tag) |
Name of the user logged into this backend |
| wait_event ( tag) |
Wait event name if backend is currently waiting. |
| wait_event_type ( tag) |
Type of event for which the backend is waiting. |
| wait_group ( tag) |
Datakit unified wait group: Lock, I/O, Concurrency, Network, CPU, Commit/Log, Other. |
| active_time | Time spent executing SQL statements in this database, in milliseconds. Type: float | (count) Unit: count Tagged by: db |
| analyze_count | The number of times this table has been manually analyzed. Type: int | (gauge) Unit: count Tagged by: db, schema, table |
| archived_count | Number of WAL files that have been successfully archived. Type: int | (count) Unit: count |
| archived_failed_count | Number of failed attempts for archiving WAL files. Type: int | (count) Unit: count |
| autoanalyze_count | The number of times this table has been analyzed by the autovacuum daemon.Type: int | (gauge) Unit: count Tagged by: db, schema, table |
| autovacuum_count | The number of times this table has been vacuumed by the autovacuum daemon.Type: int | (gauge) Unit: count Tagged by: db, schema, table |
| avg_total_exec_time | Average execution time per call in current interval. Type: float | (gauge) Unit: timeStamp,msec Tagged by: db, query_signature, queryid, rolname |
| avg_total_plan_time | Average plan time per call in current interval. Type: float | (gauge) Unit: timeStamp,msec Tagged by: db, query_signature, queryid, rolname |
| blk_read_time | Total time the statement spent reading data file blocks, in milliseconds (if track_io_timing is enabled, otherwise zero). Postgres < 17 Type: float | (gauge) Unit: timeStamp,msec Tagged by: db, query_signature, queryid, rolname |
| blk_write_time | Total time the statement spent writing data file blocks, in milliseconds (if track_io_timing is enabled, otherwise zero). Postgres < 17 Type: float | (gauge) Unit: timeStamp,msec Tagged by: db, query_signature, queryid, rolname |
| blks_exists | Number of blocks checked for existence for this SLRU (simple least-recently-used) cache.Type: int | (gauge) Unit: count Tagged by: name |
| blks_hit | Number of block cache hits. Emitted either as database-wide statistics (tagged by db) or SLRU cache statistics (tagged by name).Type: int | (gauge) Unit: count Tagged by: name |
| blks_read | Number of disk blocks read. Emitted either as database-wide statistics (tagged by db) or SLRU cache statistics (tagged by name).Type: int | (gauge) Unit: count Tagged by: name |
| blks_written | Number of disk blocks written for this SLRU (simple least-recently-used) cache.Type: int | (gauge) Unit: count Tagged by: name |
| blks_zeroed | Number of blocks zeroed during initializations of SLRU (simple least-recently-used) cache.Type: int | (gauge) Unit: count Tagged by: name |
| buffers_alloc | The number of buffers allocated Type: int | (count) Unit: count |
| buffers_backend | The number of buffers written directly by a backend. Type: int | (count) Unit: count |
| buffers_backend_fsync | The of times a backend had to execute its own fsync call instead of the background writer. Type: int | (count) Unit: count |
| buffers_checkpoint | The number of buffers written during checkpoints. Type: int | (count) Unit: count |
| buffers_clean | The number of buffers written by the background writer. Type: int | (count) Unit: count |
| calls | Number of times the statement was executed. Type: int | (count) Unit: count Tagged by: db, query_signature, queryid, rolname |
| checkpoint_sync_time | The total amount of checkpoint processing time spent synchronizing files to disk. Type: float | (count) Unit: time,ms |
| checkpoint_write_time | The total amount of checkpoint processing time spent writing files to disk. Type: float | (count) Unit: time,ms |
| checkpoints_req | The number of requested checkpoints that were performed. Type: int | (count) Unit: count |
| checkpoints_timed | The number of scheduled checkpoints that were performed. Type: int | (count) Unit: count |
| confl_bufferpin | Number of queries in this database that have been canceled due to pinned buffers. Type: int | (count) Unit: count Tagged by: db |
| confl_deadlock | Number of queries in this database that have been canceled due to deadlocks. Type: int | (count) Unit: count Tagged by: db |
| confl_lock | Number of queries in this database that have been canceled due to dropped tablespaces. This will occur when a temp_tablespace is dropped while being used on a standby.Type: int | (count) Unit: count Tagged by: db |
| confl_snapshot | Number of queries in this database that have been canceled due to old snapshots. Type: int | (count) Unit: count Tagged by: db |
| confl_tablespace | Number of queries in this database that have been canceled due to dropped tablespaces. This will occur when a temp_tablespace is dropped while being used on a standby.Type: int | (count) Unit: count Tagged by: db |
| connection_count | Number of active user connections grouped by application, state, user and database. Type: int | (gauge) Unit: count Tagged by: application_name, db, state, usename |
| database_size | The disk space used by this database. Type: int | (gauge) Unit: count Tagged by: db |
| dbm_qps | DBM query throughput derived from delta_total_calls divided by the elapsed time between DBM metric collection points. Type: float | (gauge) Unit: count |
| deadlocks | The number of deadlocks detected in this database. Type: int | (gauge) Unit: count Tagged by: db |
| delta_blk_read_time | Delta value of blk_read_time between collection intervals. Type: float | (gauge) Unit: timeStamp,msec Tagged by: db, query_signature, queryid, rolname |
| delta_blk_write_time | Delta value of blk_write_time between collection intervals. Type: float | (gauge) Unit: timeStamp,msec Tagged by: db, query_signature, queryid, rolname |
| delta_calls | Delta value of calls between collection intervals. Type: float | (gauge) Unit: count Tagged by: db, query_signature, queryid, rolname |
| delta_local_blks_dirtied | Delta value of local_blks_dirtied between collection intervals. Type: float | (gauge) Unit: count Tagged by: db, query_signature, queryid, rolname |
| delta_local_blks_hit | Delta value of local_blks_hit between collection intervals. Type: float | (gauge) Unit: count Tagged by: db, query_signature, queryid, rolname |
| delta_local_blks_read | Delta value of local_blks_read between collection intervals. Type: float | (gauge) Unit: count Tagged by: db, query_signature, queryid, rolname |
| delta_local_blks_written | Delta value of local_blks_written between collection intervals. Type: float | (gauge) Unit: count Tagged by: db, query_signature, queryid, rolname |
| delta_rows | Delta value of rows between collection intervals. Type: float | (gauge) Unit: count Tagged by: db, query_signature, queryid, rolname |
| delta_shared_blk_read_time | Delta value of shared_blk_read_time between collection intervals. Type: float | (gauge) Unit: timeStamp,msec Tagged by: db, query_signature, queryid, rolname |
| delta_shared_blk_write_time | Delta value of shared_blk_write_time between collection intervals. Type: float | (gauge) Unit: timeStamp,msec Tagged by: db, query_signature, queryid, rolname |
| delta_shared_blks_dirtied | Delta value of shared_blks_dirtied between collection intervals. Type: float | (gauge) Unit: count Tagged by: db, query_signature, queryid, rolname |
| delta_shared_blks_hit | Delta value of shared_blks_hit between collection intervals. Type: float | (gauge) Unit: count Tagged by: db, query_signature, queryid, rolname |
| delta_shared_blks_read | Delta value of shared_blks_read between collection intervals. Type: float | (gauge) Unit: count Tagged by: db, query_signature, queryid, rolname |
| delta_shared_blks_written | Delta value of shared_blks_written between collection intervals. Type: float | (gauge) Unit: count Tagged by: db, query_signature, queryid, rolname |
| delta_temp_blks_read | Delta value of temp_blks_read between collection intervals. Type: float | (gauge) Unit: count Tagged by: db, query_signature, queryid, rolname |
| delta_temp_blks_written | Delta value of temp_blks_written between collection intervals. Type: float | (gauge) Unit: count Tagged by: db, query_signature, queryid, rolname |
| delta_total_calls | Delta value of total_calls between collection intervals. Type: int | (count) Unit: count |
| delta_total_exec_time | Delta value of execution time between collection intervals. For older PostgreSQL versions this is derived from total_time. Type: float | (gauge) Unit: timeStamp,msec Tagged by: db, query_signature, queryid, rolname |
| delta_total_plan_time | Delta value of total_plan_time between collection intervals. Type: float | (gauge) Unit: timeStamp,msec Tagged by: db, query_signature, queryid, rolname |
| delta_wal_bytes | Delta value of wal_bytes between collection intervals. Type: float | (gauge) Unit: digital,B Tagged by: db, query_signature, queryid, rolname |
| delta_wal_fpi | Delta value of wal_fpi between collection intervals. Type: float | (gauge) Unit: count Tagged by: db, query_signature, queryid, rolname |
| delta_wal_records | Delta value of wal_records between collection intervals. Type: float | (gauge) Unit: count Tagged by: db, query_signature, queryid, rolname |
| evictions | Number of times a block has been written out from a shared or local buffer in order to make it available for another use. Type: int | (gauge) Unit: count Tagged by: backend_type, context, object |
| extend_time | Time spent in extend operations in milliseconds (if track_io_timing is enabled, otherwise zero). Type: float | (gauge) Unit: timeStamp,msec Tagged by: backend_type, context, object |
| extends | Number of relation extend operations, each of the size specified in op_bytes. Type: int | (gauge) Unit: count Tagged by: backend_type, context, object |
| flushes | Number of flush of dirty data for this SLRU (simple least-recently-used) cache.Type: int | (gauge) Unit: count Tagged by: name |
| fsync_time | Time spent in fsync operations in milliseconds (if track_io_timing is enabled, otherwise zero). Type: float | (gauge) Unit: timeStamp,msec Tagged by: backend_type, context, object |
| fsyncs | Number of fsync calls. These are only tracked in context normal. Type: int | (gauge) Unit: count Tagged by: backend_type, context, object |
| function_calls | The number of times this function has been called. Type: int | (count) Unit: count Tagged by: db, function, schema |
| function_self_time | Time spent in this function, not including time spent in called functions, in milliseconds. Type: float | (timeStamp,msec) Unit: count Tagged by: db, function, schema |
| function_total_time | Total time spent in this function, in milliseconds. Type: float | (timeStamp,msec) Unit: count Tagged by: db, function, schema |
| heap_blks_hit | The number of buffer hits in this table. Type: int | (gauge) Unit: count Tagged by: db, schema, table |
| heap_blks_read | The number of disk blocks read from this table. Type: int | (gauge) Unit: count Tagged by: db, schema, table |
| hits | The number of times a desired block was found in a shared buffer. Type: int | (gauge) Unit: count Tagged by: backend_type, context, object |
| idle_in_transaction_time | Time spent idling while in a transaction in this database, in milliseconds. Type: float | (count) Unit: count Tagged by: db |
| idx_blks_hit | The number of buffer hits in all indexes on this table. Type: int | (gauge) Unit: count Tagged by: db, schema, table |
| idx_blks_read | The number of disk blocks read from all indexes on this table. Type: int | (gauge) Unit: count Tagged by: db, schema, table |
| idx_scan | Number of index scans. When pg_index is present the metric comes from per-index statistics; otherwise it comes from table-level statistics.Type: int | (gauge) Unit: count Tagged by: db, pg_index, schema, table |
| idx_tup_fetch | Number of live rows fetched by index scans. When pg_index is present the metric comes from per-index statistics; otherwise it comes from table-level statistics.Type: int | (gauge) Unit: count Tagged by: db, pg_index, schema, table |
| idx_tup_read | The number of index entries returned by scans on this index. Type: int | (gauge) Unit: count Tagged by: db, pg_index, schema, table |
| index_size | The total disk space used by indexes attached to the specified table. Type: int | (gauge) Unit: digital,B Tagged by: db, schema, table |
| local_blks_dirtied | Total number of local blocks dirtied by the statement. Type: int | (count) Unit: count Tagged by: db, query_signature, queryid, rolname |
| local_blks_hit | Total number of local block cache hits by the statement. Type: int | (count) Unit: count Tagged by: db, query_signature, queryid, rolname |
| local_blks_read | Total number of local blocks read by the statement. Type: int | (count) Unit: count Tagged by: db, query_signature, queryid, rolname |
| local_blks_written | Total number of local blocks written by the statement. Type: int | (count) Unit: count Tagged by: db, query_signature, queryid, rolname |
| lock_count | The number of locks active for this database. Type: int | (gauge) Unit: count Tagged by: db, locktype, mode, schema, table |
| max_connections | The maximum number of client connections allowed to this database. Type: float | (gauge) Unit: count |
| max_plan_time | Maximum time spent planning the statement, in milliseconds (if pg_stat_statements.track_planning is enabled, otherwise zero). Type: int | (gauge) Unit: timeStamp,msec Tagged by: db, query_signature, queryid, rolname |
| maxwritten_clean | The number of times the background writer stopped a cleaning scan due to writing too many buffers. Type: int | (count) Unit: count |
| min_plan_time | Minimum time spent planning the statement, in milliseconds (if pg_stat_statements.track_planning is enabled, otherwise zero). Type: float | (gauge) Unit: timeStamp,msec Tagged by: db, query_signature, queryid, rolname |
| n_dead_tup | The estimated number of dead rows. Type: int | (gauge) Unit: count Tagged by: db, schema, table |
| n_live_tup | The estimated number of live rows. Type: int | (gauge) Unit: count Tagged by: db, schema, table |
| n_tup_del | The number of rows deleted by queries in this database. Type: int | (gauge) Unit: count Tagged by: db, schema, table |
| n_tup_hot_upd | The number of rows HOT updated, meaning no separate index update was needed. Type: int | (gauge) Unit: count Tagged by: db, schema, table |
| n_tup_ins | The number of rows inserted by queries in this database. Type: int | (gauge) Unit: count Tagged by: db, schema, table |
| n_tup_upd | The number of rows updated by queries in this database. Type: int | (gauge) Unit: count Tagged by: db, schema, table |
| numbackends | The number of active connections to this database. Type: int | (gauge) Unit: count Tagged by: db |
| percent_usage_connections | The number of connections to this database as a fraction of the maximum number of allowed connections. Type: float | (gauge) Unit: count |
| read_time | Time spent in read operations in milliseconds (if track_io_timing is enabled, otherwise zero). Type: float | (gauge) Unit: timeStamp,msec Tagged by: backend_type, context, object |
| reads | Number of read operations, each of the size specified in op_bytes. Type: int | (gauge) Unit: count Tagged by: backend_type, context, object |
| replication_delay | The current replication delay in seconds. Only available with postgresql 9.1 and newer.Type: int | (gauge) Unit: time,s Tagged by: db |
| replication_delay_bytes | The current replication delay in bytes. Only available with postgresql 9.2 and newer.Type: int | (gauge) Unit: digital,B Tagged by: db |
| rows | Total number of rows retrieved or affected by the statement. Type: int | (count) Unit: count Tagged by: db, query_signature, queryid, rolname |
| seq_scan | The number of sequential scans initiated on this table. Type: int | (gauge) Unit: count Tagged by: db, schema, table |
| seq_tup_read | The number of live rows fetched by sequential scans. Type: int | (gauge) Unit: count Tagged by: db, schema, table |
| session_blocked_count | Number of blocked sessions in this dimension group. Type: int | (gauge) Unit: count Tagged by: application_name, client_addr, db, session_status, usename, wait_event, wait_event_type, wait_group |
| session_group_count | Number of sessions in this dimension group. Type: int | (gauge) Unit: count Tagged by: application_name, client_addr, db, session_status, usename, wait_event, wait_event_type, wait_group |
| session_time | Time spent by database sessions in this database, in milliseconds. Type: float | (count) Unit: count Tagged by: db |
| sessions | Total number of sessions established to this database. Type: int | (count) Unit: count Tagged by: db |
| sessions_abandoned | Number of database sessions to this database that were terminated because connection to the client was lost. Type: int | (count) Unit: count Tagged by: db |
| sessions_fatal | Number of database sessions to this database that were terminated by fatal errors. Type: int | (count) Unit: count Tagged by: db |
| sessions_killed | Number of database sessions to this database that were terminated by operator intervention. Type: int | (count) Unit: count Tagged by: db |
| shared_blk_read_time | Total time the statement spent reading shared blocks, in milliseconds (if track_io_timing is enabled, otherwise zero). Postgres >= 17 Type: float | (gauge) Unit: timeStamp,msec Tagged by: db, query_signature, queryid, rolname |
| shared_blk_write_time | Total time the statement spent writing shared blocks, in milliseconds (if track_io_timing is enabled, otherwise zero). Postgres >= 17 Type: float | (gauge) Unit: timeStamp,msec Tagged by: db, query_signature, queryid, rolname |
| shared_blks_dirtied | Total number of shared blocks dirtied by the statement. Type: int | (count) Unit: count Tagged by: db, query_signature, queryid, rolname |
| shared_blks_hit | Total number of shared block cache hits by the statement. Type: int | (count) Unit: count Tagged by: db, query_signature, queryid, rolname |
| shared_blks_read | Total number of shared blocks read by the statement. Type: int | (count) Unit: count Tagged by: db, query_signature, queryid, rolname |
| shared_blks_written | Total number of shared blocks written by the statement. Type: int | (count) Unit: count Tagged by: db, query_signature, queryid, rolname |
| spill_bytes | Amount of decoded transaction data spilled to disk while performing decoding of changes from WAL for this slot. This and other spill counters can be used to gauge the I/O which occurred during logical decoding and allow tuning logical_decoding_work_mem. Only available with PostgreSQL 14 and newer.Type: int | (gauge) Unit: digital,B Tagged by: db, slot_name, slot_type |
| spill_count | Number of times transactions were spilled to disk while decoding changes from WAL for this slot. This counter is incremented each time a transaction is spilled, and the same transaction may be spilled multiple times. Only available with PostgreSQL 14 and newer. Type: int | (gauge) Unit: count Tagged by: db, slot_name, slot_type |
| spill_txns | Number of transactions spilled to disk once the memory used by logical decoding to decode changes from WAL has exceeded logical_decoding_work_mem. The counter gets incremented for both top-level transactions and subtransactions. Only available with PostgreSQL 14 and newer.Type: int | (gauge) Unit: count Tagged by: db, slot_name, slot_type |
| stream_bytes | Amount of transaction data decoded for streaming in-progress transactions to the decoding output plugin while decoding changes from WAL for this slot. This and other streaming counters for this slot can be used to tune logical_decoding_work_mem. Only available with PostgreSQL 14 and newer.Type: int | (gauge) Unit: digital,B Tagged by: db, slot_name, slot_type |
| stream_count | Number of times in-progress transactions were streamed to the decoding output plugin while decoding changes from WAL for this slot. This counter is incremented each time a transaction is streamed, and the same transaction may be streamed multiple times. Only available with PostgreSQL 14 and newer. Type: int | (gauge) Unit: count Tagged by: db, slot_name, slot_type |
| stream_txns | Number of in-progress transactions streamed to the decoding output plugin after the memory used by logical decoding to decode changes from WAL for this slot has exceeded logical_decoding_work_mem. Streaming only works with top-level transactions (subtransactions can't be streamed independently), so the counter is not incremented for subtransactions. Only available with PostgreSQL 14 and newer.Type: int | (gauge) Unit: count Tagged by: db, slot_name, slot_type |
| table_size | The total disk space used by the specified table with TOAST data. Free space map and visibility map are not included. Type: int | (gauge) Unit: digital,B Tagged by: db, schema, table |
| temp_blks_read | Total number of temp blocks read by the statement. Type: int | (count) Unit: count Tagged by: db, query_signature, queryid, rolname |
| temp_blks_written | Total number of temp blocks written by the statement. Type: int | (count) Unit: count Tagged by: db, query_signature, queryid, rolname |
| temp_bytes | The amount of data written to temporary files by queries in this database. Type: int | (gauge) Unit: count Tagged by: db |
| temp_files | The number of temporary files created by queries in this database. Type: int | (gauge) Unit: count Tagged by: db |
| tidx_blks_hit | The number of buffer hits in this table's TOAST table index. Type: int | (gauge) Unit: count Tagged by: db, schema, table |
| tidx_blks_read | The number of disk blocks read from this table's TOAST table index. Type: int | (gauge) Unit: count Tagged by: db, schema, table |
| toast_blks_hit | The number of buffer hits in this table's TOAST table. Type: int | (gauge) Unit: count Tagged by: db, schema, table |
| toast_blks_read | The number of disk blocks read from this table's TOAST table. Type: int | (gauge) Unit: count Tagged by: db, schema, table |
| total_bytes | Amount of transaction data decoded for sending transactions to the decoding output plugin while decoding changes from WAL for this slot. Note that this includes data that is streamed and/or spilled. Only available with PostgreSQL 14 and newer. Type: int | (gauge) Unit: digital,B Tagged by: db, slot_name, slot_type |
| total_calls | Total number of statement calls aggregated across all pg_stat_statements rows. Type: int | (count) Unit: count |
| total_exec_time | Total time spent executing the statement, in milliseconds. Type: float | (gauge) Unit: timeStamp,msec Tagged by: db, query_signature, queryid, rolname |
| total_plan_time | Total time spent planning the statement, in milliseconds (if pg_stat_statements.track_planning is enabled, otherwise zero). Type: float | (gauge) Unit: timeStamp,msec Tagged by: db, query_signature, queryid, rolname |
| total_size | The total disk space used by the table, including indexes and TOAST data. Type: int | (gauge) Unit: digital,B Tagged by: db, schema, table |
| total_txns | Number of decoded transactions sent to the decoding output plugin for this slot. This counts top-level transactions only, and is not incremented for subtransactions. Note that this includes the transactions that are streamed and/or spilled. Only available with PostgreSQL 14 and newer. Type: int | (gauge) Unit: count Tagged by: db, slot_name, slot_type |
| truncates | Number of truncates for this SLRU (simple least-recently-used) cache.Type: int | (gauge) Unit: count Tagged by: name |
| tup_deleted | The number of rows deleted by queries in this database. Type: int | (gauge) Unit: count Tagged by: db |
| tup_fetched | The number of rows fetched by queries in this database. Type: int | (gauge) Unit: count Tagged by: db |
| tup_inserted | The number of rows inserted by queries in this database. Type: int | (gauge) Unit: count Tagged by: db |
| tup_returned | The number of rows returned by queries in this database. Type: int | (gauge) Unit: count Tagged by: db |
| tup_updated | The number of rows updated by queries in this database. Type: int | (gauge) Unit: count Tagged by: db |
| vacuum_count | The number of times this table has been manually vacuumed. Type: int | (gauge) Unit: count Tagged by: db, schema, table |
| wal_bytes | Total amount of WAL generated by the statement in bytes. Type: int | (gauge) Unit: digital,B Tagged by: db, query_signature, queryid, rolname |
| wal_fpi | Total number of WAL full page images generated by the statement. Type: int | (count) Unit: count Tagged by: db, query_signature, queryid, rolname |
| wal_records | Total number of WAL records generated by the statement. Type: int | (count) Unit: count Tagged by: db, query_signature, queryid, rolname |
| wraparound | The number of transactions that can occur until a transaction wraparound. Type: float | (gauge) Unit: count Tagged by: db |
| write_time | Time spent in write operations in milliseconds (if track_io_timing is enabled, otherwise zero). Type: float | (gauge) Unit: timeStamp,msec Tagged by: backend_type, context, object |
| writes | Number of write operations, each of the size specified in op_bytes. Type: int | (gauge) Unit: count Tagged by: backend_type, context, object |
| xact_commit | The number of transactions that have been committed in this database. Type: int | (gauge) Unit: count Tagged by: db |
| xact_rollback | The number of transactions that have been rolled back in this database. Type: int | (gauge) Unit: count Tagged by: db |
collector¶
| Tags & Fields | Description |
|---|---|
| instance ( tag) |
Server addr of the instance |
| job ( tag) |
Server name of the instance |
| up | Type: int | (gauge) Unit: - |
对象¶
db_query¶
PostgreSQL DBM query object. Each object represents a unique normalized SQL statement identified by query_signature, which is derived from db, rolname, and query text.
| Tags & Fields | Description |
|---|---|
| database_instance ( tag) |
PostgreSQL instance identifier from configured tag database_instance or system_identifier. |
| database_type ( tag) |
The type of database. The value is PostgreSQL |
| db ( tag) |
The database name |
| name ( tag) |
Object identity built from server, optional database_instance, and query_signature. |
| query_signature ( tag) |
The hash signature computed from db, rolname, and normalized query text. |
| queryid ( tag) |
The query ID reported by pg_stat_statements, if available. |
| rolname ( tag) |
The role name |
| server ( tag) |
The PostgreSQL server address |
| message | The normalized/obfuscated SQL text. Type: string Unit: - |
db_exec_plan¶
PostgreSQL database statement sample information, including statement text, execution plans, used for detailed analysis of SQL execution and performance issues.( Version-1.84.0)
| Tags & Fields | Description |
|---|---|
| application_name ( tag) |
Name of the application that is connected to this backend. |
| client_addr ( tag) |
IP address of the client that is connected to this backend. |
| client_hostname ( tag) |
Host name of the connected client, as reported by a reverse DNS lookup of client_addr. This field will only be non-null for IP connections, and only when log_hostname is enabled. |
| client_port ( tag) |
TCP port number that the client is using for communication, or -1 if a Unix socket is used. |
| database_instance ( tag) |
PostgreSQL instance identifier from configured tag database_instance or system_identifier. |
| database_type ( tag) |
The database type. The value is PostgreSQL |
| db ( tag) |
Name of the database this backend is connected to. |
| name ( tag) |
Object identity built from server, optional database_instance, and query_signature:plan_signature. |
| plan_signature ( tag) |
The hash signature computed from the normalized execution plan. |
| plan_type ( tag) |
The format of the plan content. The value is JSON. |
| query_signature ( tag) |
The hash value computed from query |
| server ( tag) |
The server address |
| service ( tag) |
The service name postgresql |
| usename ( tag) |
Name of the user logged into this backend. |
| message | The obfuscated/normalized JSON execution plan definition. Type: string | (gauge) Unit: - |
| statement | The obfuscated/normalized SQL text corresponding to this execution plan. Type: string | (gauge) Unit: - |
database¶
PostgreSQL object metrics( Version-1.76.0)
| Tags & Fields | Description |
|---|---|
| database_instance ( tag) |
PostgreSQL instance identifier from configured tag database_instance or system_identifier. |
| database_type ( tag) |
The type of the database. The value is PostgreSQL |
| host ( tag) |
The hostname of the PostgreSQL server |
| name ( tag) |
The object identifier. The value is <server>-<database_instance> when database_instance is configured, otherwise host:port. |
| port ( tag) |
The port of the PostgreSQL server |
| server ( tag) |
The server address of the PostgreSQL server. The value is host:port |
| version ( tag) |
The version of the PostgreSQL server |
| avg_query_time | The average time taken by a query to execute Type: float Unit: timeStamp,usec |
| message | Summary of database information Type: string Unit: - |
| qps | The number of queries executed by the database per second Type: float Unit: gauge |
| slow_queries | The number of queries that have taken more than long_query_time seconds. This counter increments regardless of whether the slow query log is enabled. Type: int Unit: count |
| slow_query_log | Whether the slow query log is enabled. The value can OFF to disable the log or ON to enable the log. Type: string Unit: - |
| tps | The number of transactions executed by the database per second Type: float Unit: gauge |
| uptime | The number of seconds that the server has been up Type: int Unit: time,s |
message 指标字段结构¶
message 字段基本结构如下:
{
"setting": {
"DateStyle":"ISO, MDY",
...
},
"databases": [ # databases information
{
"name": "db1",
"encoding": "utf8",
"owner": "datakit",
"schemas": [ # schemas information
{
"name": "schema1",
"owner": "datakit",
"tables": [ # tables information
{
"name": "table1",
"columns": [], # columns information
"indexes": [], # indexes information
"foreign_keys": [], # foreign keys information
}
...
]
},
...
]
}
...
]
}
setting¶
setting 字段中的数据来源于 pg_settings 系统视图,用于展示当前数据库系统的配置参数信息,详细信息可以参考 PostgreSQL 文档。
databases¶
databases 字段保存了 PostgreSQL 服务器上所有数据库的信息,每个数据库的信息如下:
| 字段名 | 描述 | 类型 |
|---|---|---|
name |
数据库名称 | string |
encoding |
数据库编码 | string |
owner |
角色名称 | string |
description |
描述文本 | string |
schemas |
包含 schema 信息的列表 |
list |
schemas 字段包含了数据库中的所有 schema 信息,每个 schema 的信息如下:
| 字段名 | 描述 | 类型 |
|---|---|---|
name |
schema 名称 |
string |
owner |
角色名称 | string |
tables |
包含 table 信息的列表 |
list |
tables 字段包含了数据库中所有表的信息,每个表的信息如下:
| 字段名 | 描述 | 类型 |
|---|---|---|
name |
表名称 | string |
owner |
角色名称 | string |
has_indexes |
是否有索引 | bool |
has_partitions |
是否有分区 | bool |
toast_table |
toast 表名称 |
string |
partition_key |
分区键 | string |
num_partitions |
分区数量 | int64 |
foreign_keys |
包含外键信息的列表 | list |
columns |
包含列信息的列表 | list |
indexes |
包含索引信息的列表 | list |
tables.columns
columns 字段包含了数据库中表的所有列的信息,每个列的信息如下:
| 字段名 | 描述 | 类型 |
|---|---|---|
name |
column 名称 |
string |
data_type |
数据类型 | string |
nullable |
是否可以为空 | bool |
default |
默认值 | string |
tables.indexes
indexes 字段包含了数据库中表的所有索引的信息,每个索引的信息如下:
| 字段名 | 描述 | 类型 |
|---|---|---|
name |
index 名称 |
string |
columns |
索引包含的列 | list |
index_type |
索引类型 | string |
definition |
索引定义 | string |
is_unique |
是否唯一 | bool |
is_primary |
是否主键 | bool |
is_exclusion |
是否为排除约束索引 | bool |
is_immediate |
每条语句执行结束后是否立即检查约束 | bool |
is_valid |
是否有效 | bool |
is_clustered |
是否为聚簇索引 | bool |
is_checkxmin |
是否检查 xmin |
bool |
is_ready |
是否就绪 | bool |
is_live |
是否活跃 | bool |
is_replident |
是否是行标识索引 | bool |
is_partial |
是否是部分索引 | bool |
索引列信息字段 indexes.columns 包含了索引中包含的列的信息,每个列的信息如下:
| 字段名 | 描述 | 类型 |
|---|---|---|
name |
列名称 | string |
tables.foreign_keys
foreign_keys 字段包含了数据库中表的所有外键的信息,每个外键的信息如下:
| 字段名 | 描述 | 类型 |
|---|---|---|
name |
foreign_key 名称 |
string |
definition |
外键定义 | string |
constraint_schema |
外键所属的数据库(通常与表所在数据库一致) | string |
column_names |
外键列名称(多个列用逗号分隔,如 user_id, order_id) | string |
referenced_table_schema |
引用表所在的数据库 | string |
referenced_table_name |
引用表名称 | string |
referenced_column_names |
引用列名称(多个列用逗号分隔) | string |
update_action |
级联更新规则(如 CASCADE, RESTRICT) | string |
delete_action |
级联删除规则(如 CASCADE, SET NULL) | string |
日志¶
postgresql_dbm_activity¶
PostgreSQL database activity information collected from the pg_stat_activity view. Provides detailed information about the current activity of each backend, including query status, execution time, and resource usage.( Version-1.84.0)
| Tags & Fields | Description |
|---|---|
| application_name ( tag) |
Name of the application that is connected to this backend. |
| backend_type ( tag) |
Type of current backend. |
| client_addr ( tag) |
IP address of the client that is connected to this backend. If this field is null, it indicates either that the client is connected via a Unix socket or that this is an internal process (such as autovacuum worker). |
| client_hostname ( tag) |
Host name of the connected client, as reported by a reverse DNS lookup of client_addr. This field will only be non-null for IP connections, and only when log_hostname is enabled. |
| client_port ( tag) |
TCP port number that the client is using for communication, or -1 if a Unix socket is used. |
| db ( tag) |
The database name. |
| message ( tag) |
Text of a normalized statement. |
| pid ( tag) |
Process ID of this backend. |
| query_signature ( tag) |
The hash value computed from query |
| server ( tag) |
The server address |
| service ( tag) |
The service name postgresql |
| state ( tag) |
Current overall state of this backend. |
| status ( tag) |
The status of the statement. The value is only info for now. |
| usename ( tag) |
Name of the user logged into this backend. |
| wait_event ( tag) |
Wait event name if backend is currently waiting. |
| wait_event_type ( tag) |
Type of event for which the backend is waiting. |
| wait_group ( tag) |
Datakit unified wait group: Lock, I/O, Concurrency, Network, CPU, Commit/Log, Other. |
| backend_start | Time when this process was started. For client backends, this is the time the client connected to the server. Type: int | (count) Unit: timeStamp,usec |
| blocking_pids | The raw pg_blocking_pids(pid) result for this backend. Type: string | (string) Unit: - |
| duration | The query duration derived from query_start and state/sample time. Type: int | (gauge) Unit: time,μs |
| query_start | Time when the currently active query was started, or if state is not active, when the last query was started. Type: int | (count) Unit: timeStamp,usec |
| state_change | Time when the state was last changed. Type: int | (count) Unit: timeStamp,usec |
| tx_duration | The transaction duration derived from sample time - xact_start. Type: int | (gauge) Unit: time,μs |
| wait_duration | The wait duration derived from sample time - state_change. Type: int | (gauge) Unit: time,μs |
| xact_start | Time when this process' current transaction was started, or null if no transaction is active. If the current query is the first of its transaction, this column is equal to the query_start column. Type: int | (count) Unit: timeStamp,usec |
文件日志¶
日志采集¶
- PostgreSQL 日志默认是输出至
stderr,如需开启文件日志,可在 PostgreSQL 的配置文件/etc/postgresql/<VERSION>/main/postgresql.conf, 进行如下配置:
logging_collector = on # 开启日志写入文件功能
log_directory = 'pg_log' # 设置文件存放目录,绝对路径或相对路径(相对 PGDATA)
log_filename = 'pg.log' # 日志文件名称
log_statement = 'all' # 记录所有查询
#log_duration = on
log_line_prefix= '%m [%p] %d [%a] %u [%h] %c ' # 日志行前缀
log_file_mode = 0644
# For Windows
#log_destination = 'eventlog'
更多配置,请参考官方文档。
- PostgreSQL 采集器默认是未开启日志采集功能,可在 conf.d/samples/postgresql.conf 中 将
files打开,并写入 PostgreSQL 日志文件的绝对路径。比如:
开启日志采集后,默认会产生日志来源(source)为 PostgreSQL 的日志。
注意:日志采集仅支持已安装 DataKit 主机上的日志。
日志 Pipeline 切割¶
原始日志为
2021-05-31 15:23:45.110 CST [74305] test [pgAdmin 4 - DB:postgres] postgres [127.0.0.1] 60b48f01.12241 LOG: statement:
SELECT psd.*, 2^31 - age(datfrozenxid) as wraparound, pg_database_size(psd.datname) as pg_database_size
FROM pg_stat_database psd
JOIN pg_database pd ON psd.datname = pd.datname
WHERE psd.datname not ilike 'template%' AND psd.datname not ilike 'rdsadmin'
AND psd.datname not ilike 'azure_maintenance' AND psd.datname not ilike 'postgres'
切割后的字段说明:
| 字段名 | 字段值 | 说明 |
|---|---|---|
application_name |
pgAdmin 4 - DB:postgres |
连接当前数据库的应用的名称 |
db_name |
test |
访问的数据库 |
process_id |
74305 |
当前连接的客户端进程 ID |
remote_host |
127.0.0.1 |
客户端的地址 |
session_id |
60b48f01.12241 |
当前会话的 ID |
user |
postgres |
当前访问用户名 |
status |
LOG |
当前日志的级别(LOG,ERROR,FATAL,PANIC,WARNING,NOTICE,INFO) |
time |
1622445825110000000 |
日志产生时间 |
FAQ¶
缺失指标¶
对于 postgresql_lock/postgresql_stat/postgresql_index/postgresql_size/postgresql_statio 这些指标,需要开启配置文件中的 relations 字段。如果这些指标存在部分缺失,可能是因为相关指标不存在数据导致的。