Best Practices for TAGs in Guance¶
This article is intended to serve as a starting point, encouraging readers to expand on these ideas and create their own unique uses for tags.
Preface¶
The OpenTelemetry protocol, a CNCF (Cloud Native Computing Foundation) project still in incubation, represents the latest generation of observability specifications. It defines the three pillars of observability: metrics, traces, and logs. However, merely collecting data for these three pillars without correlating them does not distinguish modern observability from a pile of traditional monitoring tools (APM, log platforms, Zabbix, and so on); it would be just another collection of monitoring tools. This is where an important concept comes in: the TAG. For example, the traceID that connects the front end and the back end can be considered a tag, as can the host that provides the initial correlation between metrics, traces, and logs. Project, environment, and version number are further examples, and every one of them is a tag!
In short, tags make data correlation possible and enable more customized observability, which makes them crucial. In Guance's current architecture, every observable item supports tags, and there is theoretically no upper limit on the number of tags.
Example: a common real-life analogy is job hunting or HR recruitment, where requirements such as programming skills, computer science knowledge, a bachelor's degree, and years of experience act like tags: only candidates who match these tags qualify for the position. Similarly, in an IT system, if a server runs a specific application, database, and NGINX in a certain environment and has a designated owner, then having enough tags lets you quickly identify the problematic server, the affected services, and the responsible person, improving problem-resolution efficiency.
This article will explore the extensibility and flexibility of tags through four examples using Guance.
Experiment One: Grouping Servers¶
Background¶
Companies often have multiple project teams or business units. Each team or unit may use its own infrastructure for business development. If observability is implemented using Guance from infrastructure to applications, how can resources be distinguished beyond workspace separation?
Of course there is: Guance considered this scenario during design. The default DataKit main configuration file includes a global_tags section, which sets tags at the infrastructure level. All components running on that infrastructure, such as applications and databases, inherit these tags by default.
1 Modify datakit.conf to Configure global_tags¶
$ vim /usr/local/datakit/conf.d/datakit.conf
# Add tags under [global_tags]; more tags can be added beyond the default three
[global_tags]
cluster = ""
project = "solution"
site = ""
In the same way, the DataKit on every related host can be configured with these tags.
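After editing datakit.conf, restart DataKit so that the new global tags take effect. A minimal sketch, assuming DataKit was installed as a system service on a Linux host (the restart command may differ by installation method and DataKit version):
$ sudo systemctl restart datakit
# Data reported by this host from now on carries project = "solution"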
2 Guance - View Server Groups¶
Experiment Two: Modify Hostname Recognized by DataKit¶
Background¶
By default, DataKit collects the hostname at the host level and uses it as a global tag to correlate all metrics, traces, logs, and objects. In many enterprise environments, however, hostnames are random strings with no practical meaning. Changing a hostname at the operating-system level might affect connections to applications or databases, so companies are reluctant to modify it. To avoid that risk, DataKit's built-in ENV_HOSTNAME setting can handle this situation.
Warning
Note: after applying this method, data will be reported under the new hostname as a new host, and data under the old hostname will no longer be updated.
Recommendation: If you need to change the hostname, it is best to do so during the initial installation of DataKit.
1 Modify datakit.conf to Configure [environments]¶
$ vim /usr/local/datakit/conf.d/datakit.conf
# Modify ENV_HOSTNAME in [environments] to a recognizable hostname
[environments]
ENV_HOSTNAME = "118.178.57.79"
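After changing ENV_HOSTNAME, restart DataKit and confirm which hostname it now reports. A minimal sketch, assuming a default host installation; the datakit monitor command is available in recent DataKit versions:
$ sudo systemctl restart datakit
$ datakit monitor
# The basic info panel shows the hostname that DataKit reports to the workspace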
2 Guance - Compare Data Before and After Changes¶
Experiment Three: Nginx Log Statistics Displayed Per Service¶
Background¶
Internal company NGINX servers typically handle domain forwarding or service forwarding. They may forward frontend requests to multiple backend subdomains or different ports, or directly serve multiple domains. Unified NGINX monitoring cannot meet these needs. How does Guance address this issue?
Scenario: NGINX exposes ports 18889 and 80, forwarding to internal server 118.178.57.79 on ports 8999 and 18999 respectively.
Requirement: Statistically analyze data for NGINX ports 18889 and 80, such as PV, UV, and error counts.
Prerequisite: Access logs for NGINX ports 80 and 18889 are configured in separate directories (or different log file names).
| Log source | Directory |
|---|---|
| Port 80 logs | /var/log/nginx/80/ |
| Port 18889 logs | /var/log/nginx/18889/ |
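For reference, a minimal sketch of how the two NGINX server blocks can write their logs into separate directories (paths follow the table above; the proxy/upstream configuration is omitted):
server {
    listen 80;
    access_log /var/log/nginx/80/access.log;
    error_log  /var/log/nginx/80/error.log;
    # ... location / proxy_pass configuration omitted ...
}
server {
    listen 18889;
    access_log /var/log/nginx/18889/access.log;
    error_log  /var/log/nginx/18889/error.log;
    # ... location / proxy_pass configuration omitted ...
}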
1 Configure Nginx Performance Metrics Monitoring¶
Refer to the integration documentation <Nginx> for detailed configuration.
- Enable the performance metrics module in nginx.conf: check whether the http_stub_status_module is enabled in NGINX (this example already has it enabled).
- Add an nginx_status location in nginx.conf
$ cd /etc/nginx
# Adjust the nginx path as needed
$ vim nginx.conf
server {
listen 80;
server_name localhost;
# The port can be customized
location /nginx_status {
stub_status on;
allow 127.0.0.1;
deny all;
}
}
- Execute nginx -s reload to reload NGINX.
- Enable the nginx input in DataKit and modify it as shown in the sketch after this list.
- Save the DataKit nginx.conf input file and restart DataKit.
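A minimal sketch of the DataKit nginx input configuration, assuming the default installation path and the nginx_status endpoint configured above; copy the sample file shipped with your DataKit version and check it for the full set of options:
$ cd /usr/local/datakit/conf.d/nginx/
$ cp nginx.conf.sample nginx.conf
$ vim nginx.conf
# Point the input at the stub_status endpoint exposed earlier
# (verify it first with: curl http://127.0.0.1/nginx_status)
[[inputs.nginx]]
  url = "http://127.0.0.1/nginx_status"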
2 Configure Log Monitoring for Services on Ports 80 and 18889¶
$ cd /usr/local/datakit/conf.d/log/
$ cp logging.conf.sample nginx80.conf
$ vim nginx80.conf
## Modify log paths to correct application log paths
## source, service, and pipeline are required fields; the application name can be used directly to distinguish different logs
## Add tag domainname
## Modify as follows:
[[inputs.logging]]
logfiles = ["/var/log/nginx/80/access.log","/var/log/nginx/80/error.log" ]
source = "nginx"
service = "nginx"
pipeline = "nginx.p"
[inputs.logging.tags]
domainname = "118.178.226.149:80"
$ cd /usr/local/datakit/conf.d/log/
$ cp logging.conf.sample nginx18889.conf
$ vim nginx18889.conf
## Modify log paths to correct application log paths
## source, service, and pipeline are required fields; the application name can be used directly to distinguish different logs
## Add tag domainname
## Modify as follows:
[[inputs.logging]]
logfiles = ["/var/log/nginx/18889/access.log","/var/log/nginx/18889/error.log" ]
source = "nginx"
service = "nginx"
pipeline = "nginx.p"
[inputs.logging.tags]
domainname = "118.178.226.149:18889"
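The pipeline = "nginx.p" field points to a Pipeline script that parses each raw log line into structured fields. DataKit ships a default nginx pipeline, so a custom script is only needed if you want to adjust the parsing; below is a minimal sketch assuming the standard combined access-log format (field names are illustrative):
# /usr/local/datakit/pipeline/nginx.p -- custom sketch, only if the shipped nginx.p is not sufficient
grok(_, "%{IPORHOST:client_ip} %{NOTSPACE:http_ident} %{NOTSPACE:http_auth} \\[%{HTTPDATE:time}\\] \"%{DATA:http_method} %{GREEDYDATA:http_url} HTTP/%{NUMBER:http_version}\" %{INT:status_code} %{INT:bytes}")
cast(status_code, "int")
default_time(time)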
3 Configure Custom Views (Using Tags to Distinguish Domains)¶
Steps: Log in to Guance - 「Scene」 - 「Create Scene」 - 「Create Blank Scene」 - 「System View」 (Create NGINX)
Key Point: Modify NGINX view-related configurations in the system template
- Enter view editing mode, click 「Modify View Variables」 - 「Add View Variable」
Explanation: the host variable is inherited from the NGINX metrics, while the domainname variable is queried from the NGINX logs in L (logs).
- Modify specific view parameters
4 Guance - Display Data Per Service¶
Similarly, different tags can be used to distinguish projects, owners, business modules, environments, and more; what you can do with tags is limited only by your imagination.
Experiment Four: Confirm Specific Service Owner via Tag for Alert Notifications¶
Background¶
As businesses grow, microservices and containers are widely adopted, which increases both the number of service components and the number of development and operations staff behind them. With this finer division of labor, the best alerting practice is to notify the responsible person directly when a business or IT system fails, improving alert-closure efficiency. This can be achieved by sending alerts only to the relevant people or by assigning tickets in Jira. How does Guance handle this? Simply add a tag in the relevant collector inputs (the number of tags is unlimited), for example a custom tag owner = "xxx" in the nginx inputs, and then use owner as a variable in anomaly detection. Anomaly detection automatically recognizes this field and sends notifications to the corresponding DingTalk or WeCom group, as shown below:
For example, add the tag to the custom NGINX log collection configuration, as in the sketch below:
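A minimal sketch that extends the nginx80.conf logging input from Experiment Three with a custom owner tag; the value "zhangsan" is a placeholder for the actual owner:
[[inputs.logging]]
  logfiles = ["/var/log/nginx/80/access.log","/var/log/nginx/80/error.log"]
  source = "nginx"
  service = "nginx"
  pipeline = "nginx.p"
  [inputs.logging.tags]
    domainname = "118.178.226.149:80"
    # Hypothetical owner value for illustration; use the responsible person's real name or ID
    owner = "zhangsan"
In the monitor (anomaly detection) configuration, group or filter by the owner tag so that the triggered event carries the responsible person and the notification can be routed to that person's DingTalk or WeCom group.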