跳转至

Oracle


·


Oracle 监控指标采集,具有以下数据收集功能

  • Process 相关
  • Table Space 相关数据
  • System 数据采集
  • 自定义查询数据采集

已测试的版本:

  • Oracle 19c
  • Oracle 12c
  • Oracle 11g

配置

前置条件

  • 创建监控账号

如果是使用单 PDB 或者非 CDB 实例,一个本地用户(local user)就足够了:

-- Create the datakit user. Replace the password placeholder with a secure password.
CREATE USER datakit IDENTIFIED BY <PASSWORD>;

-- Grant access to the datakit user.
GRANT CONNECT, CREATE SESSION TO datakit;
GRANT SELECT_CATALOG_ROLE to datakit;
GRANT SELECT ON DBA_TABLESPACE_USAGE_METRICS TO datakit;
GRANT SELECT ON DBA_TABLESPACES TO datakit;
GRANT SELECT ON DBA_USERS TO datakit;
GRANT SELECT ON SYS.DBA_DATA_FILES TO datakit;
GRANT SELECT ON V_$ACTIVE_SESSION_HISTORY TO datakit;
GRANT SELECT ON V_$ARCHIVE_DEST TO datakit;
GRANT SELECT ON V_$ASM_DISKGROUP TO datakit;
GRANT SELECT ON V_$DATABASE TO datakit;
GRANT SELECT ON V_$DATAFILE TO datakit;
GRANT SELECT ON V_$INSTANCE TO datakit;
GRANT SELECT ON V_$LOG TO datakit;
GRANT SELECT ON V_$OSSTAT TO datakit;
GRANT SELECT ON V_$PGASTAT TO datakit;
GRANT SELECT ON V_$PROCESS TO datakit;
GRANT SELECT ON V_$RECOVERY_FILE_DEST TO datakit;
GRANT SELECT ON V_$RESTORE_POINT TO datakit;
GRANT SELECT ON V_$SESSION TO datakit;
GRANT SELECT ON V_$SGASTAT TO datakit;
GRANT SELECT ON V_$SYSMETRIC TO datakit;
GRANT SELECT ON V_$SYSTEM_PARAMETER TO datakit;

如果想监控来自 CDB 和所有 PDB 中的表空间(Table Spaces),需要一个有合适权限的公共用户(common user):

-- Create the datakit user. Replace the password placeholder with a secure password.
CREATE USER datakit IDENTIFIED BY <PASSWORD>;

-- Grant access to the datakit user.
ALTER USER datakit SET CONTAINER_DATA=ALL CONTAINER=CURRENT;
GRANT CONNECT, CREATE SESSION TO datakit;
GRANT SELECT_CATALOG_ROLE to datakit;
GRANT SELECT ON v_$instance TO datakit;
GRANT SELECT ON v_$database TO datakit;
GRANT SELECT ON v_$sysmetric TO datakit;
GRANT SELECT ON v_$system_parameter TO datakit;
GRANT SELECT ON v_$session TO datakit;
GRANT SELECT ON v_$recovery_file_dest TO datakit;
GRANT SELECT ON v_$active_session_history TO datakit;
GRANT SELECT ON v_$osstat TO datakit;
GRANT SELECT ON v_$restore_point TO datakit;
GRANT SELECT ON v_$process TO datakit;
GRANT SELECT ON v_$datafile TO datakit;
GRANT SELECT ON v_$pgastat TO datakit;
GRANT SELECT ON v_$sgastat TO datakit;
GRANT SELECT ON v_$log TO datakit;
GRANT SELECT ON v_$archive_dest TO datakit;
GRANT SELECT ON v_$asm_diskgroup TO datakit;
GRANT SELECT ON sys.dba_data_files TO datakit;
GRANT SELECT ON DBA_TABLESPACES TO datakit;
GRANT SELECT ON DBA_TABLESPACE_USAGE_METRICS TO datakit;
GRANT SELECT ON DBA_USERS TO datakit;

注意:上述的 SQL 语句由于 Oracle 版本的原因部分可能会出现 "表不存在" 等错误,忽略即可。

  • 安装依赖包

根据操作系统和 Oracle 版本选择安装对应的安装包,参考这里,如:

wget https://download.oracle.com/otn_software/linux/instantclient/2110000/instantclient-basiclite-linux.x64-21.10.0.0.0dbru.zip
unzip instantclient-basiclite-linux.x64-21.10.0.0.0dbru.zip

将解压后的目录文件路径添加到以下配置信息中的 LD_LIBRARY_PATH 环境变量路径中。

也可以直接下载我们预先准备好的依赖包:

wget https://static.guance.com/otn_software/instantclient/instantclient-basiclite-linux.x64-21.10.0.0.0dbru.zip \
    -O /usr/local/datakit/externals/instantclient-basiclite-linux.zip \
    && unzip /usr/local/datakit/externals/instantclient-basiclite-linux.zip -d /opt/oracle \
    && mv /opt/oracle/instantclient_21_10 /opt/oracle/instantclient;
wget https://download.oracle.com/otn_software/linux/instantclient/1919000/instantclient-basiclite-linux.arm64-19.19.0.0.0dbru.zip
unzip instantclient-basiclite-linux.arm64-19.19.0.0.0dbru.zip

将解压后的目录文件路径添加到以下配置信息中的 LD_LIBRARY_PATH 环境变量路径中。

也可以直接下载我们预先准备好的依赖包:

wget https://static.guance.com/otn_software/instantclient/instantclient-basiclite-linux.arm64-19.19.0.0.0dbru.zip \
    -O /usr/local/datakit/externals/instantclient-basiclite-linux.zip \
    && unzip /usr/local/datakit/externals/instantclient-basiclite-linux.zip -d /opt/oracle \
    && mv /opt/oracle/instantclient_19_19 /opt/oracle/instantclient;
  • 部分系统需要安装额外的依赖库:
apt-get install -y libaio-dev libaio1

采集器配置

进入 DataKit 安装目录下的 conf.d/db 目录,复制 oracle.conf.sample 并命名为 oracle.conf。示例如下:

[[inputs.external]]
  daemon = true
  name   = "oracle"
  cmd    = "/usr/local/datakit/externals/oracle"

  ## Set true to enable election
  election = true

  ## Modify below if necessary.
  ## The password use environment variable named "ENV_INPUT_ORACLE_PASSWORD".
  args = [
    "--interval"        , "1m"                           ,
    "--host"            , "<your-oracle-host>"           ,
    "--port"            , "1521"                         ,
    "--username"        , "<oracle-user-name>"           ,
    "--service-name"    , "<oracle-service-name>"        ,
    "--slow-query-time" , "0s"                           ,
    "--log"             , "/var/log/datakit/oracle.log"  ,
  ]
  envs = [
    "ENV_INPUT_ORACLE_PASSWORD=<oracle-password>",
    "LD_LIBRARY_PATH=/opt/oracle/instantclient:$LD_LIBRARY_PATH",
  ]

  [inputs.external.tags]
    # some_tag = "some_value"
    # more_tag = "some_other_value"

  ## Run a custom SQL query and collect corresponding metrics.
  # [[inputs.external.custom_queries]]
  #   sql = '''
  #     SELECT
  #       GROUP_ID, METRIC_NAME, VALUE
  #     FROM GV$SYSMETRIC
  #   '''
  #   metric = "oracle_custom"
  #   tags = ["GROUP_ID", "METRIC_NAME"]
  #   fields = ["VALUE"]

  #############################
  # Parameter Description (Marked with * is required field)
  #############################
  # *--interval                   : Collect interval (Default is 1m).
  # *--host                       : Oracle instance address (IP).
  # *--port                       : Oracle listen port (Default is 1521).
  # *--username                   : Oracle username.
  # *--service-name               : Oracle service name.
  # *--slow-query-time            : Oracle slow query time threshold defined. If larger than this, the executed sql will be reported.
  # *--log                        : Collector log path.
  # *ENV_INPUT_ORACLE_PASSWORD    : Oracle password.

配置好后,重启 DataKit 即可。

目前可以通过 ConfigMap 方式注入采集器配置来开启采集器。

Tip

上述配置会以命令行形式展示在进程列表中(包括密码),如果想隐藏密码,可以通过将密码写进环境变量 ENV_INPUT_ORACLE_PASSWORD 形式实现,示例:

envs = [
  "ENV_INPUT_ORACLE_PASSWORD=<YOUR-SAFE-PASSWORD>"
] 

该环境变量在读取密码时有最高优先级,即只要出现该环境变量,那密码就以该环境变量中的值为准。如果密码中有特殊字符,可以参见这里的做法来处理。

指标

以下所有数据采集,默认会追加名为 host 的全局 tag(tag 值为 DataKit 所在主机名),也可以在配置中通过 [inputs.oracle.tags] 指定其它标签:

 [inputs.oracle.tags]
  # some_tag = "some_value"
  # more_tag = "some_other_value"
  # ...

oracle_process

  • 标签
Tag Description
host Host name
oracle_server Server addr
oracle_service Server service
pdb_name PDB name
program Program in progress
  • 指标列表
Metric Description Type Unit
pga_alloc_mem PGA memory allocated by process float B
pga_freeable_mem PGA memory freeable by process float B
pga_max_mem PGA maximum memory ever allocated by process float B
pga_used_mem PGA memory used by process float B
pid Oracle process identifier int -

oracle_tablespace

  • 标签
Tag Description
host Host name
oracle_server Server addr
oracle_service Server service
pdb_name PDB name
tablespace_name Table space name
  • 指标列表
Metric Description Type Unit
in_use Table space in-use float count
off_use Table space offline float count
ts_size Table space size float B
used_space Used space float count

oracle_system

  • 标签
Tag Description
host Host name
oracle_server Server addr
oracle_service Server service
pdb_name PDB name
  • 指标列表
Metric Description Type Unit
active_sessions Number of active sessions float count
buffer_cachehit_ratio Ratio of buffer cache hits float percent
cache_blocks_corrupt Corrupt cache blocks float count
cache_blocks_lost Lost cache blocks float count
consistent_read_changes Consistent read changes per second float count
consistent_read_gets Consistent read gets per second float count
cursor_cachehit_ratio Ratio of cursor cache hits float percent
database_cpu_time_ratio Database CPU time ratio float percent
database_wait_time_ratio Memory sorts per second float percent
db_block_changes DB block changes per second float count
db_block_gets DB block gets per second float count
disk_sorts Disk sorts per second float count
enqueue_timeouts Enqueue timeouts per second float count
execute_without_parse Execute without parse ratio float count
gc_cr_block_received GC CR block received float count
host_cpu_utilization Host CPU utilization (%) float percent
library_cachehit_ratio Ratio of library cache hits float percent
logical_reads Logical reads per second float count
logons Number of logon attempts float count
memory_sorts_ratio Memory sorts ratio float percent
pga_over_allocation_count Over-allocating PGA memory count float count
physical_reads Physical reads per second float count
physical_reads_direct Physical reads direct per second float count
physical_writes Physical writes per second float count
redo_generated Redo generated per second float count
redo_writes Redo writes per second float count
rows_per_sort Rows per sort float count
service_response_time Service response time float sec
session_count Session count float count
session_limit_usage Session limit usage float percent
shared_pool_free Shared pool free memory % float percent
soft_parse_ratio Soft parse ratio float percent
sorts_per_user_call Sorts per user call float count
temp_space_used Temp space used float B
user_rollbacks Number of user rollbacks float count

慢查询支持

Datakit 可以将执行超过用户自定义时间的 SQL 语句报告给观测云,在日志中显示,来源名是 oracle_log

该功能默认情况下是关闭的,用户可以在 Oracle 的配置文件中将其打开,方法如下:

--slow-query-time 后面的值从 0s 改成用户心中的阈值,最小值 1 毫秒。一般推荐 10 秒。

  args = [
    ...
    '--slow-query-time' , '10s'                        ,
  ]
字段说明
  • avg_elapsed: 该 SQL 语句执行的平均耗时。
  • username:执行该语句的用户名。
  • failed_obfuscate:SQL 脱敏失败的原因。只有在 SQL 脱敏失败才会出现。SQL 脱敏失败后原 SQL 会被上报。 更多字段解释可以查看这里
重要信息
  • 如果值是 0s 或空或小于 1 毫秒,则不会开启 Oracle 采集器的慢查询功能,即默认状态。
  • 没有执行完成的 SQL 语句不会被查询到。

自定义查询支持

支持自定义查询数据采集。具体用法与例子见上面 采集器配置 里面的 custom_queries

FAQ

如何查看 Oracle 采集器的运行日志?

由于 Oracle 采集器是外部采集器,其日志是默认单独存放在 [Datakit 安装目录]/externals/oracle.log 中。

另外,可以在配置文件中通过 --log 参数来指定日志文件位置。

配置好 Oracle 采集之后,为何 monitor 中无数据显示?

大概原因有如下几种可能:

  • Oracle 动态库依赖有问题

即使你本机当前可能已经有对应的 Oracle 包,仍然建议使用上面文档中指定的依赖包且确保其安装路径跟 LD_LIBRARY_PATH 所指定的路径一致。

  • glibc 版本有问题

由于 Oracle 采集器是独立编译的,且开启了 CGO,故其运行时需要 glibc 的依赖在 Linux 上可通过如下命令检查当前机器的 glibc 依赖是否有问题:

$ ldd <DataKit 安装目录>/externals/oracle
    linux-vdso.so.1 (0x00007ffed33f9000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f70144e1000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f70144be000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f70142cc000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f70144fc000)

如果有报告如下信息,则基本是当前机器上的 glibc 版本较低导致:

externals/oracle: /lib64/libc.so.6: version  `GLIBC_2.14` not found (required by externals/oracle)
  • Oracle 采集器只能在 Linux x86_64/ARM64 架构的 DataKit 使用,其它平台均不支持

这意味着 Oracle 这个采集器只能在 x86_64/ARM64 的 Linux 上运行,其它平台一律无法运行当前的 Oracle 采集器。

为什么看不到 oracle_system 指标集?

需要数据库运行起来之后,过 1 分钟才能看到。

文档评价

文档内容是否对您有帮助? ×