Oracle
Oracle 监控指标采集,具有以下数据收集功能
- Process 相关
- Table Space 相关数据
- System 数据采集
- 自定义查询数据采集
已测试的版本:
- Oracle 19c
- Oracle 12c
- Oracle 11g
自 DataKit 1.32.0 版本 开始,支持通过 DataKit 直接采集和外部采集器两种方式采集 Oracle 指标。
配置¶
前置条件¶
- 创建监控账号
如果是使用单 PDB 或者非 CDB 实例,一个本地用户(local user)就足够了:
-- Create the datakit user. Replace the password placeholder with a secure password.
CREATE USER datakit IDENTIFIED BY <PASSWORD>;
-- Grant access to the datakit user.
GRANT CONNECT, CREATE SESSION TO datakit;
GRANT SELECT_CATALOG_ROLE to datakit;
GRANT SELECT ON DBA_TABLESPACE_USAGE_METRICS TO datakit;
GRANT SELECT ON DBA_TABLESPACES TO datakit;
GRANT SELECT ON DBA_USERS TO datakit;
GRANT SELECT ON SYS.DBA_DATA_FILES TO datakit;
GRANT SELECT ON V_$ACTIVE_SESSION_HISTORY TO datakit;
GRANT SELECT ON V_$ARCHIVE_DEST TO datakit;
GRANT SELECT ON V_$ASM_DISKGROUP TO datakit;
GRANT SELECT ON V_$DATABASE TO datakit;
GRANT SELECT ON V_$DATAFILE TO datakit;
GRANT SELECT ON V_$INSTANCE TO datakit;
GRANT SELECT ON V_$LOG TO datakit;
GRANT SELECT ON V_$OSSTAT TO datakit;
GRANT SELECT ON V_$PGASTAT TO datakit;
GRANT SELECT ON V_$PROCESS TO datakit;
GRANT SELECT ON V_$RECOVERY_FILE_DEST TO datakit;
GRANT SELECT ON V_$RESTORE_POINT TO datakit;
GRANT SELECT ON V_$SESSION TO datakit;
GRANT SELECT ON V_$SGASTAT TO datakit;
GRANT SELECT ON V_$SYSMETRIC TO datakit;
GRANT SELECT ON V_$SYSTEM_PARAMETER TO datakit;
如果想监控来自 CDB 和所有 PDB 中的表空间(Table Spaces),需要一个有合适权限的公共用户(common user):
-- Create the datakit user. Replace the password placeholder with a secure password.
CREATE USER datakit IDENTIFIED BY <PASSWORD>;
-- Grant access to the datakit user.
ALTER USER datakit SET CONTAINER_DATA=ALL CONTAINER=CURRENT;
GRANT CONNECT, CREATE SESSION TO datakit;
GRANT SELECT_CATALOG_ROLE to datakit;
GRANT SELECT ON v_$instance TO datakit;
GRANT SELECT ON v_$database TO datakit;
GRANT SELECT ON v_$sysmetric TO datakit;
GRANT SELECT ON v_$system_parameter TO datakit;
GRANT SELECT ON v_$session TO datakit;
GRANT SELECT ON v_$recovery_file_dest TO datakit;
GRANT SELECT ON v_$active_session_history TO datakit;
GRANT SELECT ON v_$osstat TO datakit;
GRANT SELECT ON v_$restore_point TO datakit;
GRANT SELECT ON v_$process TO datakit;
GRANT SELECT ON v_$datafile TO datakit;
GRANT SELECT ON v_$pgastat TO datakit;
GRANT SELECT ON v_$sgastat TO datakit;
GRANT SELECT ON v_$log TO datakit;
GRANT SELECT ON v_$archive_dest TO datakit;
GRANT SELECT ON v_$asm_diskgroup TO datakit;
GRANT SELECT ON sys.dba_data_files TO datakit;
GRANT SELECT ON DBA_TABLESPACES TO datakit;
GRANT SELECT ON DBA_TABLESPACE_USAGE_METRICS TO datakit;
GRANT SELECT ON DBA_USERS TO datakit;
注意:上述的 SQL 语句由于 Oracle 版本的原因部分可能会出现 "表不存在" 等错误,忽略即可。
- 安装依赖包
如果使用 DataKit 直接采集,可以跳过此步骤。
根据操作系统和 Oracle 版本选择安装对应的安装包,参考这里,如:
wget https://download.oracle.com/otn_software/linux/instantclient/2110000/instantclient-basiclite-linux.x64-21.10.0.0.0dbru.zip
unzip instantclient-basiclite-linux.x64-21.10.0.0.0dbru.zip
将解压后的目录文件路径添加到以下配置信息中的 LD_LIBRARY_PATH
环境变量路径中。
也可以直接下载我们预先准备好的依赖包:
wget https://static.guance.com/otn_software/instantclient/instantclient-basiclite-linux.x64-21.10.0.0.0dbru.zip \
-O /usr/local/datakit/externals/instantclient-basiclite-linux.zip \
&& unzip /usr/local/datakit/externals/instantclient-basiclite-linux.zip -d /opt/oracle \
&& mv /opt/oracle/instantclient_21_10 /opt/oracle/instantclient;
wget https://download.oracle.com/otn_software/linux/instantclient/1919000/instantclient-basiclite-linux.arm64-19.19.0.0.0dbru.zip
unzip instantclient-basiclite-linux.arm64-19.19.0.0.0dbru.zip
将解压后的目录文件路径添加到以下配置信息中的 LD_LIBRARY_PATH
环境变量路径中。
也可以直接下载我们预先准备好的依赖包:
wget https://static.guance.com/otn_software/instantclient/instantclient-basiclite-linux.arm64-19.19.0.0.0dbru.zip \
-O /usr/local/datakit/externals/instantclient-basiclite-linux.zip \
&& unzip /usr/local/datakit/externals/instantclient-basiclite-linux.zip -d /opt/oracle \
&& mv /opt/oracle/instantclient_19_19 /opt/oracle/instantclient;
部分系统需要安装额外的依赖库:
采集器配置¶
进入 DataKit 安装目录下的 conf.d/db
目录,复制 oracle.conf.sample
并命名为 oracle.conf
。示例如下:
[[inputs.oracle]]
# host name
host = "localhost"
## port
port = 1521
## user name
user = "datakit"
## password
password = "<PASS>"
## service
service = "XE"
## interval
interval = "10s"
## connection timeout
connect_timeout = "30s"
## slow query time threshold defined. If larger than this, the executed sql will be reported.
slow_query_time = "0s"
## Metric name in metric_exclude_list will not be collected.
metric_exclude_list = [""]
## Set true to enable election
election = true
## Run a custom SQL query and collect corresponding metrics.
# [[inputs.oracle.custom_queries]]
# sql = '''
# SELECT
# GROUP_ID, METRIC_NAME, VALUE
# FROM GV$SYSMETRIC
# '''
# metric = "oracle_custom"
# tags = ["GROUP_ID", "METRIC_NAME"]
# fields = ["VALUE"]
# interval = "10s"
[inputs.oracle.tags]
# some_tag = "some_value"
# more_tag = "some_other_value"
配置好后,重启 DataKit 即可。
目前可以通过 ConfigMap 方式注入采集器配置来开启采集器。
外部采集器的配置示例如下:
[[inputs.external]]
daemon = true
name = "oracle"
cmd = "/usr/local/datakit/externals/oracle"
## Set true to enable election
election = true
## Modify below if necessary.
## The password use environment variable named "ENV_INPUT_ORACLE_PASSWORD".
args = [
"--interval" , "1m" ,
"--host" , "<your-oracle-host>" ,
"--port" , "1521" ,
"--username" , "<oracle-user-name>" ,
"--service-name" , "<oracle-service-name>" ,
"--slow-query-time" , "0s" ,
"--log" , "/var/log/datakit/oracle.log" ,
]
envs = [
"ENV_INPUT_ORACLE_PASSWORD=<oracle-password>",
"LD_LIBRARY_PATH=/opt/oracle/instantclient:$LD_LIBRARY_PATH",
]
[inputs.external.tags]
# some_tag = "some_value"
# more_tag = "some_other_value"
## Run a custom SQL query and collect corresponding metrics.
# [[inputs.external.custom_queries]]
# sql = '''
# SELECT
# GROUP_ID, METRIC_NAME, VALUE
# FROM GV$SYSMETRIC
# '''
# metric = "oracle_custom"
# tags = ["GROUP_ID", "METRIC_NAME"]
# fields = ["VALUE"]
#############################
# Parameter Description (Marked with * is required field)
#############################
# *--interval : Collect interval (Default is 1m).
# *--host : Oracle instance address (IP).
# *--port : Oracle listen port (Default is 1521).
# *--username : Oracle username.
# *--service-name : Oracle service name.
# *--slow-query-time : Oracle slow query time threshold defined. If larger than this, the executed sql will be reported.
# *--log : Collector log path.
# *ENV_INPUT_ORACLE_PASSWORD : Oracle password.
Tip
上述配置会以命令行形式展示在进程列表中(包括密码),如果想隐藏密码,可以通过将密码写进环境变量 ENV_INPUT_ORACLE_PASSWORD
形式实现,示例:
该环境变量在读取密码时有最高优先级,即只要出现该环境变量,那密码就以该环境变量中的值为准。如果密码中有特殊字符,可以参见这里的做法来处理。
指标¶
以下所有数据采集,默认会追加全局选举 tag,也可以在配置中通过 [inputs.oracle.tags]
指定其它标签:
oracle_process
¶
- 标签
Tag | Description |
---|---|
host | Host name |
oracle_server | Server addr |
oracle_service | Server service |
pdb_name | PDB name |
program | Program in progress |
- 指标列表
Metric | Description |
---|---|
pga_alloc_mem | PGA memory allocated by process Type: float Unit: digital,B |
pga_freeable_mem | PGA memory freeable by process Type: float Unit: digital,B |
pga_max_mem | PGA maximum memory ever allocated by process Type: float Unit: digital,B |
pga_used_mem | PGA memory used by process Type: float Unit: digital,B |
pid | Oracle process identifier Type: int Unit: N/A |
oracle_tablespace
¶
- 标签
Tag | Description |
---|---|
host | Host name |
oracle_server | Server addr |
oracle_service | Server service |
pdb_name | PDB name |
tablespace_name | Table space name |
- 指标列表
Metric | Description |
---|---|
in_use | Percentage of used space,as a function of the maximum possible Tablespace size Type: float Unit: percent,percent |
off_use | Total space consumed by the Tablespace, in database blocks Type: float Unit: digital,B |
ts_size | Table space size Type: float Unit: digital,B |
used_space | Used space Type: float Unit: digital,B |
oracle_system
¶
You have to wait for a few minutes to see these metrics when your running Oracle database's version is earlier than 12c.
- 标签
Tag | Description |
---|---|
host | Host name |
oracle_server | Server addr |
oracle_service | Server service |
pdb_name | PDB name |
- 指标列表
Metric | Description |
---|---|
active_sessions | Number of active sessions Type: float Unit: count |
buffer_cachehit_ratio | Ratio of buffer cache hits Type: float Unit: percent,percent |
cache_blocks_corrupt | Corrupt cache blocks Type: float Unit: count |
cache_blocks_lost | Lost cache blocks Type: float Unit: count |
consistent_read_changes | Consistent read changes per second Type: float Unit: count |
consistent_read_gets | Consistent read gets per second Type: float Unit: count |
cursor_cachehit_ratio | Ratio of cursor cache hits Type: float Unit: percent,percent |
database_cpu_time_ratio | Database CPU time ratio Type: float Unit: percent,percent |
database_wait_time_ratio | Memory sorts per second Type: float Unit: percent,percent |
db_block_changes | DB block changes per second Type: float Unit: count |
db_block_gets | DB block gets per second Type: float Unit: count |
disk_sorts | Disk sorts per second Type: float Unit: count |
enqueue_timeouts | Enqueue timeouts per second Type: float Unit: count |
execute_without_parse | Execute without parse ratio Type: float Unit: count |
gc_cr_block_received | GC CR block received Type: float Unit: count |
host_cpu_utilization | Host CPU utilization (%) Type: float Unit: percent,percent |
library_cachehit_ratio | Ratio of library cache hits Type: float Unit: percent,percent |
logical_reads | Logical reads per second Type: float Unit: count |
logons | Number of logon attempts Type: float Unit: count |
memory_sorts_ratio | Memory sorts ratio Type: float Unit: percent,percent |
pga_over_allocation_count | Over-allocating PGA memory count Type: float Unit: count |
physical_reads | Physical reads per second Type: float Unit: count |
physical_reads_direct | Physical reads direct per second Type: float Unit: count |
physical_writes | Physical writes per second Type: float Unit: count |
redo_generated | Redo generated per second Type: float Unit: count |
redo_writes | Redo writes per second Type: float Unit: count |
rows_per_sort | Rows per sort Type: float Unit: count |
service_response_time | Service response time Type: float Unit: time,ms |
session_count | Session count Type: float Unit: count |
session_limit_usage | Session limit usage Type: float Unit: percent,percent |
shared_pool_free | Shared pool free memory % Type: float Unit: percent,percent |
soft_parse_ratio | Soft parse ratio Type: float Unit: percent,percent |
sorts_per_user_call | Sorts per user call Type: float Unit: count |
temp_space_used | Temp space used Type: float Unit: digital,B |
uptime | Instance uptime Type: int Unit: time,s |
user_rollbacks | Number of user rollbacks Type: float Unit: count |
oracle_waiting_event
¶
- 标签
Tag | Description |
---|---|
event | Event name |
event_type | Event type, such as USER/BACKGROUND |
program | Program(process) name that waiting the event |
username | Oracle username that waiting the event |
- 指标列表
Metric | Description |
---|---|
count | Waiting event count Type: int Unit: count |
oracle_locked_session
¶
- 标签
Tag | Description |
---|---|
event | Locked session that waiting the specified event name |
host | Host name |
oracle_server | Server addr |
oracle_service | Server service |
- 指标列表
Metric | Description |
---|---|
waiting_session_count | Locked session count Type: int Unit: count |
collector
¶
- 标签
Tag | Description |
---|---|
instance | Server addr of the instance |
job | Server name of the instance |
- 指标列表
Metric | Description |
---|---|
up | Type: int Unit: - |
自定义对象¶
database
¶
- 标签
Tag | Description |
---|---|
col_co_status | Current status of collector on instance(OK/NotOK ) |
host | The server host address |
ip | Connection IP of the instance |
name | Object uniq ID |
reason | If status not ok, we'll get some reasons about the status |
- 指标列表
Metric | Description |
---|---|
display_name | Displayed name in UI Type: string Unit: N/A |
uptime | Current instance uptime Type: int Unit: time,s |
version | Current version of the instance Type: string Unit: N/A |
慢查询采集¶
DataKit 可以将执行超过用户自定义时间的 SQL 语句报告给观测云,在日志中显示,来源名是 oracle_log
。
该功能默认情况下是关闭的,用户可以在 Oracle 的配置文件中将其打开,方法如下:
将 slow_query_time
的值从 0s
改成用户心中的阈值,最小值 1 毫秒。一般推荐 10 秒。
字段说明
avg_elapsed
: 该 SQL 语句执行的平均耗时username
:执行该语句的用户名failed_obfuscate
:SQL 脱敏失败的原因。只有在 SQL 脱敏失败才会出现。SQL 脱敏失败后原 SQL 会被上报- 如果
slow_query_time
为空或小于 1 毫秒,则不会开启 Oracle 采集器的慢查询功能,即默认状态 - 没有执行完成的 SQL 语句不会被查询到
更多字段解释可以查看这里。
oracle_log
¶
For full and detailed field into, see here
- 标签
Tag | Description |
---|---|
action | Contains the name of the action that was executing when the SQL statement was first parsed as set by calling DBMS_APPLICATION_INFO.SET_ACTION |
command_type | Oracle command type definition |
module | Contains the name of the module that was executing when the SQL statement was first parsed as set by calling DBMS_APPLICATION_INFO.SET_MODULE |
sql_id | SQL identifier of the parent cursor in the library cache |
status | Log level, always warning here |
- 指标列表
Metric | Description |
---|---|
application_wait_time | Application wait time (in microseconds) Type: int Unit: time,μs |
avg_elapsed | Average elapsed time of executions(elapsed_time/executions )Type: int Unit: time,μs |
buffer_gets | Sum of buffer gets over all child cursors Type: int Unit: count |
cluster_wait_time | Cluster wait time (in microseconds) Type: int Unit: time,μs |
concurrency_wait_time | Concurrency wait time (in microseconds) Type: int Unit: time,μs |
cpu_time | CPU time (in microseconds) used by this cursor for parsing, executing, and fetching Type: int Unit: time,μs |
disk_reads | Sum of the number of disk reads over all child cursors Type: int Unit: count |
elapsed_time | Elapsed time (in microseconds) used by this cursor for parsing, executing, and fetching. If the cursor uses parallel execution, then ELAPSED_TIME is the cumulative time...Type: int Unit: time,μs |
executions | Total number of executions, totalled over all the child cursors Type: int Unit: count |
last_active_time | Time at which the query plan was last active Type: string Unit: N/A |
message | JSON dump of all queried fields of table V$SQLAREA Type: string Unit: N/A |
parse_calls | Sum of all parse calls to all the child cursors under this parent Type: int Unit: count |
parsing_schema_name | Schema name that was used to parse this child cursor Type: string Unit: N/A |
plan_hash_value | Numeric representation of the current SQL plan for this cursor. Comparing one PLAN_HASH_VALUE to another easily identifies whether or not two plans are the same (rather than comparing the two plans line by line).Type: int Unit: N/A |
rows_processed | Total number of rows processed on behalf of this SQL statement Type: int Unit: count |
sorts | Sum of the number of sorts that were done for all the child cursors Type: int Unit: count |
sql_fulltext | All characters of the SQL text for the current cursor Type: string Unit: N/A |
user_io_wait_time | User I/O Wait Time (in microseconds) Type: int Unit: time,μs |
username | Name of the user Type: string Unit: N/A |
FAQ¶
如何查看 Oracle 采集器的运行日志?¶
由于 Oracle 采集器是外部采集器,其日志是默认单独存放在 [DataKit 安装目录]/externals/oracle.log 中。
另外,可以在配置文件中通过 --log
参数来指定日志文件位置。
自 DataKit 1.32.0 开始,Oracle 采集器不再是外部采集器,它的日志跟 DataKit 自身日志(Linux 下为 /var/log/datakit/log)在一起。
为何 monitor 中无数据显示?¶
该问题只有当 Oracle 是外部采集器(需要依赖当前环境的动态库)时才会发生,大概原因有如下几种可能:
- Oracle 动态库依赖有问题
即使你本机当前可能已经有对应的 Oracle 包,仍然建议使用上面文档中指定的依赖包且确保其安装路径跟 LD_LIBRARY_PATH
所指定的路径一致。
- glibc 版本有问题
由于 Oracle 采集器是独立编译的,且开启了 CGO,故其运行时需要 glibc 的依赖在 Linux 上可通过如下命令检查当前机器的 glibc 依赖是否有问题:
$ ldd <DataKit 安装目录>/externals/oracle
linux-vdso.so.1 (0x00007ffed33f9000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f70144e1000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f70144be000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f70142cc000)
/lib64/ld-linux-x86-64.so.2 (0x00007f70144fc000)
如果有报告如下信息,则基本是当前机器上的 glibc 版本较低导致:
- Oracle 采集器只能在 Linux x86_64/ARM64 架构的 DataKit 使用,其它平台均不支持
这意味着 Oracle 这个采集器只能在 x86_64/ARM64 的 Linux 上运行,其它平台一律无法运行当前的 Oracle 采集器。
为什么看不到 oracle_system
指标集?¶
需要数据库运行起来之后,过 1 分钟才能看到。