Nginx
NGINX 采集器可以从 NGINX 实例中采取很多指标,比如请求总数连接数、缓存等多种指标,并将指标采集到观测云,帮助监控分析 NGINX 各种异常情况。
配置¶
前置条件¶
-
NGINX 版本 >=
1.8.0
; 已测试的版本:- 1.23.2
- 1.22.1
- 1.21.6
- 1.18.0
- 1.14.2
- 1.8.0
-
NGINX 默认采集
http_stub_status_module
模块的数据,开启http_stub_status_module
模块参见这里,开启了以后会上报 NGINX 指标集的数据; -
如果您正在使用 VTS 或者想监控更多数据,建议开启 VTS 相关数据采集,可在
nginx.conf
中将选项use_vts
设置为true
。如何开启 VTS 参见这里; -
开启 VTS 功能后,能产生如下指标集:
nginx
nginx_server_zone
nginx_upstream_zone
(NGINX 需配置upstream
相关配置)nginx_cache_zone
(NGINX 需配置cache
相关配置)
-
以产生
nginx_upstream_zone
指标集为例,NGINX 相关配置示例如下:
...
http {
...
upstream your-upstreamname {
server upstream-ip:upstream-port;
}
server {
...
location / {
root html;
index index.html index.htm;
proxy_pass http://yourupstreamname;
}}}
-
已经开启了 VTS 功能以后,不必再去采集
http_stub_status_module
模块的数据,因为 VTS 模块的数据会包括http_stub_status_module
模块的数据 -
NGINX Plus 用户仍可以使用
http_stub_status_module
采集基础数据,同时需要在 NGINX 配置文件中开启http_api_module
模块 (参考),并在想要监控的server
中设置status_zone
,配置示例如下:
# 开启 http_api_module
server {
listen 8080;
location /api {
api write=on;
}
}
# 监控更多指标
server {
listen 80;
status_zone <ZONE_NAME>;
...
}
-
开启 NGINX Plus 采集需要在
nginx.conf
中将选项use_plus_api
设置为true
,并将plus_api_url
的注释去除。(注意, VTS 功能暂不支持 NGINX Plus) -
NGINX Plus 额外产生如下指标集:
nginx_location_zone
采集器配置¶
进入 DataKit 安装目录下的 conf.d/nginx
目录,复制 nginx.conf.sample
并命名为 nginx.conf
。示例如下:
[[inputs.nginx]]
## Nginx status URL.
## (Default) If not use with VTS, the formula is like this: "http://localhost:80/basic_status".
## If using with VTS, the formula is like this: "http://localhost:80/status/format/json".
url = "http://localhost:80/basic_status"
# If using Nginx Plus, this formula is like this: "http://localhost:8080/api/<api_version>".
# Note: Nginx Plus not support VTS and should be used with http_stub_status_module (Default)
# plus_api_url = "http://localhost:8080/api/9"
## Optional Can set ports as [<form>,<to>], Datakit will collect all ports.
# ports = [80,80]
## Optional collection interval, default is 10s
# interval = "30s"
use_vts = false
use_plus_api = false
## Optional TLS Config
# tls_ca = "/xxx/ca.pem"
# tls_cert = "/xxx/cert.cer"
# tls_key = "/xxx/key.key"
## Use TLS but skip chain & host verification
insecure_skip_verify = false
## HTTP response timeout (default: 5s)
response_timeout = "20s"
## Set true to enable election
election = true
# [inputs.nginx.log]
# files = ["/var/log/nginx/access.log","/var/log/nginx/error.log"]
## grok pipeline script path
# pipeline = "nginx.p"
# [inputs.nginx.tags]
# some_tag = "some_value"
# more_tag = "some_other_value"
配置好后,重启 DataKit 即可。
目前可以通过 ConfigMap 方式注入采集器配置来开启采集器。
Attention
url
地址以 nginx 具体配置为准,一般常见的用法就是用 /basic_status
这个路由。
指标¶
以下所有数据采集,默认会追加全局选举 tag,也可以在配置中通过 [inputs.nginx.tags]
指定其它标签:
nginx
¶
- 标签
Tag | Description |
---|---|
host |
Host name which installed nginx |
nginx_port |
Nginx server port |
nginx_server |
Nginx server host |
nginx_version |
Nginx version, exist when using vts |
- 指标列表
Metric | Description | Type | Unit |
---|---|---|---|
connection_accepts |
The total number of accepts client connections | int | count |
connection_active |
The current number of active client connections | int | count |
connection_dropped |
The total number of dropped client connections | int | count |
connection_handled |
The total number of handled client connections | int | count |
connection_reading |
The total number of reading client connections | int | count |
connection_requests |
The total number of requests client connections | int | count |
connection_waiting |
The total number of waiting client connections | int | count |
connection_writing |
The total number of writing client connections | int | count |
load_timestamp |
Nginx process load time in milliseconds, exist when using vts | int | msec |
pid |
The pid of nginx process (only for Nginx plus) | int | count |
ppid |
The ppid of nginx process (only for Nginx plus) | int | count |
nginx_server_zone
¶
- 标签
Tag | Description |
---|---|
host |
host name which installed nginx |
nginx_port |
nginx server port |
nginx_server |
nginx server host |
nginx_version |
nginx version |
server_zone |
server zone |
- 指标列表
Metric | Description | Type | Unit |
---|---|---|---|
code_200 |
The number of responses with status code 200 (only for Nginx plus) | int | count |
code_301 |
The number of responses with status code 301 (only for Nginx plus) | int | count |
code_404 |
The number of responses with status code 404 (only for Nginx plus) | int | count |
code_503 |
The number of responses with status code 503 (only for Nginx plus) | int | count |
discarded |
The number of requests being discarded (only for Nginx plus) | int | count |
processing |
The number of requests being processed (only for Nginx plus) | int | count |
received |
The total amount of data received from clients. | int | B |
requests |
The total number of client requests received from clients. | int | count |
response_1xx |
The number of responses with status codes 1xx | int | count |
response_2xx |
The number of responses with status codes 2xx | int | count |
response_3xx |
The number of responses with status codes 3xx | int | count |
response_4xx |
The number of responses with status codes 4xx | int | count |
response_5xx |
The number of responses with status codes 5xx | int | count |
responses |
The total number of responses (only for Nginx plus) | int | count |
send |
The total amount of data sent to clients. | int | B |
nginx_upstream_zone
¶
- 标签
Tag | Description |
---|---|
host |
host name which installed nginx |
nginx_port |
nginx server port |
nginx_server |
nginx server host |
nginx_version |
nginx version |
upstream_server |
upstream server |
upstream_zone |
upstream zone |
- 指标列表
Metric | Description | Type | Unit |
---|---|---|---|
active |
The number of active connections (only for Nginx plus) | int | count |
backup |
Whether it is configured as a backup server (only for Nginx plus) | int | count |
fails |
The number of failed requests (only for Nginx plus) | int | count |
received |
The total number of bytes received from this server. | int | B |
request_count |
The total number of client requests received from server. | int | count |
response_1xx |
The number of responses with status codes 1xx | int | count |
response_2xx |
The number of responses with status codes 2xx | int | count |
response_3xx |
The number of responses with status codes 3xx | int | count |
response_4xx |
The number of responses with status codes 4xx | int | count |
response_5xx |
The number of responses with status codes 5xx | int | count |
send |
The total number of bytes sent to clients. | int | B |
state |
The current state of the server (only for Nginx plus) | int | count |
unavail |
The number of unavailable server (only for Nginx plus) | int | count |
weight |
Weights used when load balancing (only for Nginx plus) | int | count |
nginx_cache_zone
¶
- 标签
Tag | Description |
---|---|
cache_zone |
cache zone |
host |
host name which installed nginx |
nginx_port |
nginx server port |
nginx_server |
nginx server host |
nginx_version |
nginx version |
- 指标列表
Metric | Description | Type | Unit |
---|---|---|---|
max_size |
The limit on the maximum size of the cache specified in the configuration | int | B |
received |
The total number of bytes received from the cache. | int | B |
responses_bypass |
The number of cache bypass | int | count |
responses_expired |
The number of cache expired | int | count |
responses_hit |
The number of cache hit | int | count |
responses_miss |
The number of cache miss | int | count |
responses_revalidated |
The number of cache revalidated | int | count |
responses_scarce |
The number of cache scarce | int | count |
responses_stale |
The number of cache stale | int | count |
responses_updating |
The number of cache updating | int | count |
send |
The total number of bytes sent from the cache. | int | B |
used_size |
The current size of the cache. | int | B |
nginx_location_zone
¶
- 标签
Tag | Description |
---|---|
host |
host name which installed nginx |
location_zone |
cache zone |
nginx_port |
nginx server port |
nginx_server |
nginx server host |
nginx_version |
nginx version |
- 指标列表
Metric | Description | Type | Unit |
---|---|---|---|
code_200 |
The number of 200 code (only for Nginx plus) | int | count |
code_301 |
The number of 301 code (only for Nginx plus) | int | count |
code_404 |
The number of 404 code (only for Nginx plus) | int | count |
code_503 |
The number of 503 code (only for Nginx plus) | int | count |
discarded |
The total number of discarded request (only for Nginx plus) | int | B |
received |
The total number of received bytes (only for Nginx plus) | int | B |
requests |
The number of requests (only for Nginx plus) | int | B |
response |
The number of response (only for Nginx plus) | int | B |
response_1xx |
The number of 1xx response (only for Nginx plus) | int | count |
response_2xx |
The number of 2xx response (only for Nginx plus) | int | count |
response_3xx |
The number of 3xx response (only for Nginx plus) | int | count |
response_4xx |
The number of 4xx response (only for Nginx plus) | int | count |
response_5xx |
The number of 5xx response (only for Nginx plus) | int | count |
sent |
The total number of send bytes (only for Nginx plus) | int | count |
自定义对象¶
web_server
¶
- 标签
Tag | Description |
---|---|
col_co_status |
Current status of collector on Nginx(OK/NotOK ) |
host |
The server host address |
ip |
Connection IP of the Nginx |
name |
Object uniq ID |
reason |
If status not ok, we'll get some reasons about the status |
- 指标列表
Metric | Description | Type | Unit |
---|---|---|---|
display_name |
Displayed name in UI | string | - |
uptime |
Current Nginx uptime | int | s |
version |
Current version of Nginx | string | - |
日志¶
如需采集 NGINX 的日志,可在 nginx.conf 中 将 files
打开,并写入 NGINX 日志文件的绝对路径。比如:
[[inputs.nginx]]
...
[inputs.nginx.log]
files = ["/var/log/nginx/access.log","/var/log/nginx/error.log"]
开启日志采集以后,默认会产生日志来源(source
)为 nginx
的日志。
注意:必须将 DataKit 安装在 NGINX 所在主机才能采集 NGINX 日志。
日志 Pipeline 功能切割字段说明¶
- NGINX 错误日志切割
错误日志文本示例:
2021/04/21 09:24:04 [alert] 7#7: *168 write() to "/var/log/nginx/access.log" failed (28: No space left on device) while logging request, client: 120.204.196.129, server: localhost, request: "GET / HTTP/1.1", host: "47.98.103.73"
切割后的字段列表如下:
字段名 | 字段值 | 说明 |
---|---|---|
status | error | 日志等级(alert 转成了 error) |
client_ip | 120.204.196.129 | client IP 地址 |
server | localhost | server 地址 |
http_method | GET | http 请求方式 |
http_url | / | http 请求 URL |
http_version | 1.1 | http version |
ip_or_host | 47.98.103.73 | 请求方 IP 或者 host |
msg | 7#7: *168 write()...host: \"47.98.103.73 | 日志内容 |
time | 1618968244000000000 | 纳秒时间戳(作为行协议时间) |
错误日志文本示例:
切割后的字段列表如下:
字段名 | 字段值 | 说明 |
---|---|---|
status |
error |
日志等级(emerg 转成了 error ) |
msg |
50102#0: unexpected \";\" in /usr/local/etc/nginx/nginx.conf:23 |
日志内容 |
time |
1619684678000000000 |
纳秒时间戳(作为行协议时间) |
- NGINX 访问日志切割
访问日志文本示例:
127.0.0.1 - - [24/Mar/2021:13:54:19 +0800] "GET /basic_status HTTP/1.1" 200 97 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.72 Safari/537.36"
切割后的字段列表如下:
字段名 | 字段值 | 说明 |
---|---|---|
client_ip |
127.0.0.1 |
日志等级(emerg 转成了 error ) |
status |
ok |
日志等级 |
status_code |
200 |
HTTP Code |
http_method |
GET |
HTTP 请求方式 |
http_url |
/basic_status |
HTTP 请求 URL |
http_version |
1.1 |
HTTP Version |
agent |
Mozilla/5.0... Safari/537.36 |
User-Agent |
browser |
Chrome |
浏览器 |
browserVer |
89.0.4389.72 |
浏览器版本 |
isMobile |
false |
是否手机 |
engine |
AppleWebKit |
引擎 |
os |
Intel Mac OS X 11_1_0 |
系统 |
time |
1619243659000000000 |
纳秒时间戳(作为行协议时间) |
链路¶
前提条件¶
- 安装 nginx (>=1.9.13)
该模块只支持 Linux 操作系统
安装 Nginx OpenTracing 插件¶
Nginx OpenTracing 插件是 OpenTracing 开源的链路追踪插件,基于 C++ 编写,可以工作于 Jaeger
、Zipkin
、LightStep
、Datadog
。
- 下载 与当前 Nginx 版本对应的插件,通过以下命令可以查看当前 Nginx 版本
- 解压
- 配置插件
在 nginx.conf
文件最上面新增以下信息
安装 DDAgent Nginx OpenTracing 插件¶
DDAgent Nginx OpenTracing 插件是基于 Nginx OpenTracing
的一套厂商的实现,不同的 APM 会有各自的编解码实现。
-
下载
dd-opentracing-cpp
,libdd_opentracing.so
或者linux-amd64-libdd_opentracing_plugin.so.gz
-
配置 Nginx
opentracing_load_tracer /etc/nginx/tracer/libdd_opentracing.so /etc/nginx/tracer/dd.json;
opentracing on; # Enable OpenTracing
opentracing_tag http_user_agent $http_user_agent;
opentracing_trace_locations off;
opentracing_propagate_context;
opentracing_operation_name nginx-$host;
opentracing_load_tracer
: 加载 opentracing
的 apm
插件路径
opentracing_propagate_context;
: 表示链路上下文需要进行传递
- 配置 DDTrace
dd.json
用于配置 ddtrace
信息,如:service
、agent_host
等,内容如下:
{
"environment": "test",
"service": "nginx",
"operation_name_override": "nginx.handle",
"agent_host": "localhost",
"agent_port": 9529
}
- nginx 日志配置
将 Trace 信息注入到 Nginx 日志中。可按如下示例编辑:
log_format with_trace_id '$remote_addr - $http_x_forwarded_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for" '
'"$opentracing_context_x_datadog_trace_id" "$opentracing_context_x_datadog_parent_id"';
access_log /var/log/nginx/access-with-trace.log with_trace_id;
说明:
log_format
关键字告诉 Nginx 这里定义了一套日志规则,with_trace_id
是规则名,可以自己修改,注意在下方指定日志路径时要用一样的名字来关联该日志的规则。access_log
中的路径和文件名可以更换。通常情况下原 Nginx 是配有日志规则的,我们可以配置多条规则,并将不同的日志格式输出到不同的文件,即保留原access_log
规则及路径不变,新增一个包含 trace 信息的日志规则,命名为不同的日志文件,供不同的日志工具读取。
- 验证插件是否正常使用
执行以下命令进行校验
$:/etc/nginx# nginx -t
info: DATADOG TRACER CONFIGURATION - {"agent_url":"http://localhost:9529","analytics_enabled":false,"analytics_sample_rate":null,"date":"2023-09-25T14:33:40+0800","enabled":true,"env":"prod","lang":"cpp","lang_version":"201402","operation_name_override":"nginx.handle","report_hostname":false,"sampling_rules":"[]","service":"nginx","version":"v1.3.7"}
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful
info: DATADOG TRACER CONFIGURATION
表示已经成功加载了 DDTrace 。
服务链路转发¶
Nginx 产生链路信息后,需要将相关请求头信息转发给后端,可以形成 Nginx 与后端的链路串联操作。
如果出现 Nginx 链路信息与 DDTrace 不匹配,则需要检查这一步是否规范操作。
需要在对应的 server
下的 location
添加以下配置
location ^~ / {
...
proxy_set_header X-datadog-trace-id $opentracing_context_x_datadog_trace_id;
proxy_set_header X-datadog-parent-id $opentracing_context_x_datadog_parent_id;
...
}
加载 Nginx 配置¶
执行以下命令使 Nginx 配置生效:
root@liurui:/etc/nginx/tracer# nginx -s reload
info: DATADOG TRACER CONFIGURATION - {"agent_url":"http://localhost:9529","analytics_enabled":false,"analytics_sample_rate":null,"date":"2023-09-25T11:30:10+0800","enabled":true,"env":"prod","lang":"cpp","lang_version":"201402","operation_name_override":"nginx.handle","report_hostname":false,"sampling_rules":"[]","service":"nginx","version":"v1.3.7"}
root@liurui:/etc/nginx/tracer#
如果出现以下错误:
root@liurui:/etc/nginx/conf.d# nginx -s reload
info: DATADOG TRACER CONFIGURATION - {"agent_url":"http://localhost:9529","analytics_enabled":false,"analytics_sample_rate":null,"date":"2023-09-25T12:28:53+0800","enabled":true,"env":"prod","lang":"cpp","lang_version":"201402","operation_name_override":"nginx.handle","report_hostname":false,"sampling_rules":"[]","service":"nginx","version":"v1.3.7"}
nginx: [warn] could not build optimal proxy_headers_hash, you should increase either proxy_headers_hash_max_size: 512 or proxy_headers_hash_bucket_size: 64; ignoring proxy_headers_hash_bucket_size
则需要在 nginx.conf
的 http
模块追加以下配置: