Building Spring Cloud Service Observability from 0 to 1 with Guance


Overview of the Business System in This Project:

This case study uses a simulated enterprise internal office system, building its observability from scratch with Guance.

The observability setup in this project targets applications deployed as standalone JAR packages.

Project open-source address: https://gitee.com/y_project/RuoYi-Cloud

Project demo address: http://demo.ruoyi.vip/login

System Introduction:

This system is an open-source backend management system and also a rapid development platform for Java EE enterprise applications, built on classic technology combinations (Spring Boot, Apache Shiro, MyBatis, Thymeleaf, Bootstrap, etc.). It includes numerous built-in modules such as department management, role-user, menu and button authorization, data permissions, system parameters, log management, notifications, and announcements. The main goal is to allow developers to focus on business logic, reducing technical complexity and saving human resources, shortening project cycles, and improving software security and quality. This project can be used for all web applications, such as website management backends, website member centers, CMS, CRM, OA, etc., while supporting deep customization, allowing enterprises to build more powerful systems. All frontend and backend code is encapsulated and easy to use, with a low probability of errors. Mobile client access is also supported.

Project Feature Modules:

  • User Management: Configuration of system users.
  • Department Management: Configuration of organizational structures (companies, departments, teams) with tree structure support and data permission settings.
  • Position Management: Configuration of positions held by system users.
  • Menu Management: Configuration of system menus, operation permissions, and button permission identifiers.
  • Role Management: Allocation of menu permissions and setting role-based data scope permissions.
  • Dictionary Management: Maintenance of frequently used fixed data in the system.
  • Parameter Management: Dynamic configuration of common parameters.
  • Notifications and Announcements: Maintenance of system information releases.
  • Operation Logs: Recording and querying of normal operation logs; recording and querying of system exception logs.
  • Login Logs: Querying login logs including abnormal logins.
  • Online Users: Monitoring the status of active users in the system.
  • Scheduled Tasks: Online operations (addition, modification, deletion) of scheduled tasks including execution result logs.
  • Code Generation: Generation of front-end and back-end code (Java, HTML, XML, SQL) supporting CRUD downloads.
  • System APIs: Automatic generation of API documentation based on business code.
  • Service Monitoring: Monitoring CPU, memory, disk, stack, and other information of the current system.
  • Cache Monitoring: Operations like querying, viewing, and clearing cache.
  • Online Builder: Dragging form elements to generate corresponding HTML code.

Technologies Involved in the Office System:

| Technology | Version | Guance Observability Input Required |
| --- | --- | --- |
| Spring Boot | 2.3.7.RELEASE | ddtrace |
| Spring Cloud | Hoxton.SR9 | ddtrace |
| Spring Cloud Alibaba | 2.2.5.RELEASE | ddtrace |
| Nginx | 1.16.1 | nginx |
| MySQL | 5.7.17 | mysql |
| Redis | 3.2.12 | redis |
| Vue | 2.6.0 | rum |
| Java | OpenJDK 1.8.0_292 | statsd or jolokia (statsd used in this example) |

Office System Architecture:

  • Web Pages: Hosted in Nginx
  • Registration Center: Nacos
  • Gateway: Gateway
  • Service Modules: Auth, System
  • Database: MySQL
  • Cache: Redis

Note: This demo deploys all service modules on a single server, accessing services via different ports.
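With every module on one host behind Nginx, the web tier typically looks like the sketch below. This is a hypothetical fragment: the /prod-api/ prefix and the Gateway port 8080 are illustrative assumptions, not values taken from this deployment.

```nginx
# Hypothetical sketch: serve the Vue build and forward API traffic to the
# local Spring Cloud Gateway. Path prefix and port are assumptions.
server {
    listen 80;
    server_name localhost;

    # Static Vue pages built from the frontend project
    location / {
        root /usr/local/ruoyi/dist;
        index index.html;
    }

    # API requests proxied to the Gateway module (assumed port 8080)
    location /prod-api/ {
        proxy_pass http://127.0.0.1:8080/;
    }
}
```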

image

Guance Overview:

Overview: [Guance Official Overview]

Guance is a cloud service platform designed to provide full-stack observability for every complete application in the era of cloud computing and cloud-native systems, fundamentally different from traditional monitoring systems.

Traditional monitoring systems are often domain-specific, similar to many isolated silos within enterprises, such as APM, RUM, logs, NPM, Zabbix, etc., each being a separate and fragmented monitoring system. These silos lead to data isolation within enterprises, causing significant challenges in cross-departmental and cross-platform issue diagnosis, consuming substantial human and material resources.

Observability, in contrast, refers to one comprehensive system covering the IT infrastructure that supports the business, encompassing metrics, logs, and traces. It unifies data collection, storage, query, and presentation, and correlates all observability data (metrics, traces, logs), enabling complete observability of the IT system.

Guance is developed based on this philosophy, aiming to enhance the quality of internal IT services within enterprises and improve end-user experience.

Guance Data Flow:

image

Note: DQL is a specialized query language developed by DataFlux for correlated queries across ES and InfluxDB data.

Installing Datakit:

  1. Log in to console.guance.com
  2. Create a new workspace
  3. Select Integration — Datakit — Choose the installation command suitable for your environment and copy it
  4. Install Datakit on the server
  5. Execute service datakit status (or systemctl status datakit) to check the Datakit status

After installing Datakit, it defaults to collecting the following content, which can be viewed directly under DataFlux — Infrastructure — Hosts

Select different integration input names to view corresponding monitoring views. Below the monitoring view, you can also view other data such as logs, processes, containers, etc.

| Collector Name | Description |
| --- | --- |
| cpu | Collects host CPU usage |
| disk | Collects disk usage |
| diskio | Collects host disk I/O |
| mem | Collects host memory usage |
| swap | Collects swap memory usage |
| system | Collects host OS load |
| net | Collects host network traffic |
| host_process | Collects resident processes (alive for over 10 minutes) |
| hostobject | Collects basic host information (OS, hardware info, etc.) |
| docker | Collects container objects and container logs |

image

Enabling Specific Inputs:

| Component | Input to Enable | Input Config Directory | Relevant Metrics |
| --- | --- | --- | --- |
| Nginx | nginx | /usr/local/datakit/conf.d/nginx | Request information, logs, request duration, etc. |
| MySQL | mysql | /usr/local/datakit/conf.d/db | Connections, QPS, read/write stats, slow queries |
| Redis | redis | /usr/local/datakit/conf.d/db | Connections, CPU usage, memory usage, hit rate, miss rate |
| JVM | statsd | /usr/local/datakit/conf.d/statsd | Heap memory, GC count, GC time |
| APM | ddtrace | /usr/local/datakit/conf.d/ddtrace | Response time, error count, error rate |
| RUM | Enabled by default | —— | UV/PV, LCP, FID, CLS, JS errors |

Note:

| RUM Metric | Description | Target Value |
| --- | --- | --- |
| LCP (Largest Contentful Paint) | Time to render the largest content element in the visible area of the page | Less than 2.5s |
| FID (First Input Delay) | Delay when the user first interacts with the page | Less than 100ms |
| CLS (Cumulative Layout Shift) | Amount of layout shift caused by dynamic loading (0 means no shift) | Less than 0.1 |
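Every input in the table above is enabled the same way: copy the sample config in its directory, edit it, and restart Datakit. A minimal sketch of that pattern as a shell helper (enable_input is a hypothetical function, not a Datakit command):

```shell
#!/usr/bin/env bash
# Hedged sketch: each input is enabled by copying <name>.conf.sample to
# <name>.conf in its conf.d directory, editing it, and restarting Datakit.
# enable_input is a hypothetical helper, not part of Datakit.
enable_input() {
  local conf_dir="$1" name="$2"
  cp "${conf_dir}/${name}.conf.sample" "${conf_dir}/${name}.conf"
  echo "enabled ${name}: edit ${conf_dir}/${name}.conf, then 'service datakit restart'"
}

# Example (directory as used later in this document):
# enable_input /usr/local/datakit/conf.d/nginx nginx
```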

Nginx:

For detailed steps, refer to the document <Nginx Observability Best Practices>
Prerequisite: Check if the http_stub_status_module module is enabled in Nginx, skip step 1 if already installed.

image

  1. Install the http_stub_status_module (Linux): enabling this module requires recompiling Nginx with ./configure --with-http_stub_status_module. Locate the configure script first:
$ find / | grep configure | grep nginx
$ cd /usr/local/src/nginx-1.20.0/
$ ./configure --with-http_stub_status_module
# Then rebuild and reinstall Nginx (e.g., make && make install) as appropriate for your environment

image

  2. Add the nginx_status location in nginx.conf
$ cd /etc/nginx
   ## Adjust the nginx path as needed
$ vim nginx.conf

server {
    listen 80;              ## Port can be customized
    server_name localhost;

    location /nginx_status {
        stub_status on;
        allow 127.0.0.1;
        deny all;
    }
}
  3. Modify the nginx input in Datakit
$ cd /usr/local/datakit/conf.d/nginx/
$ cp nginx.conf.sample nginx.conf
$ vim nginx.conf
# Modify as follows
[[inputs.nginx]]
        url = "http://localhost/nginx_status"
[inputs.nginx.log]
        files = ["/var/log/nginx/access.log", "/var/log/nginx/error.log"]

# Save the file and restart Datakit    
$ service datakit restart

image

Verify data: curl 127.0.0.1/nginx_status
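If the module and location are wired up correctly, curl should return the standard stub_status payload, something like the following (the counters will differ):

```
Active connections: 2
server accepts handled requests
 10 10 25
Reading: 0 Writing: 1 Waiting: 1
```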

image

  4. Create an Nginx view in the Guance platform and view the data
    Refer to Creating Scenarios and Views. Steps: Scenario —> New Scenario —> New Blank Scenario —> System View (Create Nginx). View example (quickly check Nginx-related metrics and logs to determine Nginx health):

image

image

MySQL:

For detailed steps, refer to the document <MySQL DataKit Integration>

# Log in to MySQL
$ mysql -uroot -p
# Enter password: Solution****

# Create a monitoring account
mysql> CREATE USER 'datakit'@'localhost' IDENTIFIED BY 'Datakit_1234';

# Grant monitoring account privileges (in MySQL 5.7, this GRANT also creates 'datakit'@'%')
mysql> GRANT PROCESS, SELECT, REPLICATION CLIENT ON *.* TO 'datakit'@'%' IDENTIFIED BY 'Datakit_1234';

# Refresh privileges
mysql> FLUSH PRIVILEGES;
1. Modify MySQL inputs in Datakit
$ cd /usr/local/datakit/conf.d/db/
$ cp mysql.conf.sample mysql.conf
$ vim mysql.conf

# Modify as follows 
## It's recommended to create a read-only MySQL account
[[inputs.mysql]]
     user = "datakit"
     pass = "Datakit_1234"

# Save the file and restart Datakit    
$ service datakit restart

image

2. Create a MySQL view in the Guance platform and view data

Refer to Creating Scenarios and Views. Steps: Scenario —> New Scenario —> New Blank Scenario —> System View (Create MySQL). View example (quickly check MySQL-related metrics and logs to determine MySQL health):

image

image

Redis:

For detailed steps, refer to the document <Redis DataKit Integration>

1. Modify Redis inputs in Datakit
$ cd /usr/local/datakit/conf.d/db/
$ cp redis.conf.sample redis.conf
$ vim redis.conf

# Modify as follows
## It's recommended to create a read-only Redis account
[[inputs.redis]]
     pass = "Solution******"
# Note: Uncomment the pass line before modifying
[inputs.redis.log]
    files = ["/var/log/redis/redis.log"]

# Save the file and restart Datakit    
$ service datakit restart

image

2. Create a Redis view in the Guance platform and view data

Refer to Creating Scenarios and Views. Steps: Scenario —> New Scenario —> New Blank Scenario —> System View (Create Redis). View example (quickly check Redis-related metrics and logs to determine Redis health):

image

image

JVM:

For detailed steps, refer to the document <JVM DataKit Integration>

1. Modify JVM inputs in Datakit

By default, the statsd input needs no modification; only copying the sample conf file is required

$ cd /usr/local/datakit/conf.d/statsd/
$ cp statsd.conf.sample ddtrace-jvm-statsd.conf 
$ vim ddtrace-jvm-statsd.conf

# No modifications needed by default
2. Modify the Java application startup script

Since JVM and APM both rely on ddtrace-agent for data collection, see APM-related content [APM]

3. Create a JVM view in the Guance platform and view data

Refer to Creating Scenarios and Views. Steps: Scenario —> New Scenario —> New Blank Scenario —> System View (Create JVM). View example (quickly check JVM-related metrics and logs to determine JVM health):

image

APM (Application Performance Monitoring):

For detailed steps, refer to the document Distributed Tracing (APM) Best Practices
Guance supports multiple APM tools that adhere to the OpenTracing protocol, including ddtrace, SkyWalking, Zipkin, Jaeger, etc. This example uses ddtrace for APM observability.

1. Modify APM (ddtrace) inputs in Datakit

By default, the ddtrace input needs no modification; only copying the sample conf file is required

$ cd /usr/local/datakit/conf.d/ddtrace/
$ cp ddtrace.conf.sample ddtrace.conf
$ vim ddtrace.conf

# No modifications needed by default
2. Modify the Java application startup script

APM observability requires adding an agent to the Java application. This agent collects performance data during application startup through bytecode injection, capturing method calls, SQL calls, external system calls, etc., to monitor the quality of the application code.

# Original application startup scripts
$ cd /usr/local/ruoyi/
$ nohup java -Dfile.encoding=utf-8 -jar ruoyi-gateway.jar > logs/gateway.log 2>&1 &
$ nohup java -Dfile.encoding=utf-8 -jar ruoyi-auth.jar > logs/auth.log 2>&1 &
$ nohup java -Dfile.encoding=utf-8 -jar ruoyi-modules-system.jar > logs/system.log 2>&1 &

Kill existing application startup processes and add ddtrace parameters before restarting the application. See the following image for details:

image

# Application startup script with ddtrace-agent added

$ cd /usr/local/ruoyi/

$ nohup java -Dfile.encoding=utf-8 -javaagent:dd-java-agent-0.80.0.jar -XX:FlightRecorderOptions=stackdepth=256 -Ddd.logs.injection=true -Ddd.service.name=ruoyi-gateway -Ddd.service.mapping=redis:redis_ruoyi -Ddd.agent.port=9529 -Ddd.jmxfetch.enabled=true -Ddd.jmxfetch.check-period=1000 -Ddd.jmxfetch.statsd.port=8125 -Ddd.version=1.0 -jar ruoyi-gateway.jar > logs/gateway.log 2>&1 &

$ nohup java -Dfile.encoding=utf-8 -javaagent:dd-java-agent-0.80.0.jar -XX:FlightRecorderOptions=stackdepth=256 -Ddd.logs.injection=true -Ddd.service.name=ruoyi-auth -Ddd.service.mapping=redis:redis_ruoyi -Ddd.env=staging -Ddd.agent.port=9529 -Ddd.jmxfetch.enabled=true -Ddd.jmxfetch.check-period=1000 -Ddd.jmxfetch.statsd.port=8125 -Ddd.version=1.0 -jar ruoyi-auth.jar > logs/auth.log 2>&1 & 

$ nohup java -Dfile.encoding=utf-8 -javaagent:dd-java-agent-0.80.0.jar -XX:FlightRecorderOptions=stackdepth=256 -Ddd.logs.injection=true -Ddd.service.name=ruoyi-modules-system -Ddd.service.mapping=redis:redis_ruoyi,mysql:mysql_ruoyi -Ddd.env=dev -Ddd.agent.port=9529 -Ddd.jmxfetch.enabled=true -Ddd.jmxfetch.check-period=1000 -Ddd.jmxfetch.statsd.port=8125 -Ddd.version=1.0 -jar ruoyi-modules-system.jar > logs/system.log 2>&1 & 
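The three startup lines above differ only in service name, jar, and environment. A hedged sketch of a helper that assembles the shared ddtrace flags (build_ddtrace_cmd is hypothetical; the agent jar name and ports are the ones used above, and the service.mapping and FlightRecorder flags are omitted for brevity):

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the java command line with the common
# ddtrace flags used in this guide. Not part of ddtrace or Datakit.
build_ddtrace_cmd() {
  local service="$1" jar="$2" env="${3:-dev}"
  echo "java -Dfile.encoding=utf-8 \
-javaagent:dd-java-agent-0.80.0.jar \
-Ddd.logs.injection=true \
-Ddd.service.name=${service} \
-Ddd.env=${env} \
-Ddd.agent.port=9529 \
-Ddd.jmxfetch.enabled=true \
-Ddd.jmxfetch.statsd.port=8125 \
-Ddd.version=1.0 \
-jar ${jar}"
}
```

A service could then be started with e.g. nohup $(build_ddtrace_cmd ruoyi-auth ruoyi-auth.jar staging) > logs/auth.log 2>&1 &.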

If APM data is not visible in the Guance platform, check the Datakit log: cat /var/log/datakit/gin.log
Normal logs:

image

Error logs:

image

Modify /usr/local/datakit/conf.d/ddtrace/ddtrace.conf to match the following image. Ensure the path in ddtrace.conf matches the path in datakit/gin.log.

image

Explanation of ddtrace startup parameters (-D system properties):

  • -Ddd.env: Custom environment type, optional.
  • -Ddd.tags: Custom application tags, optional.
  • -Ddd.service.name: Custom application name, required.
  • -Ddd.agent.port: Data upload port (default 9529), required.
  • -Ddd.version: Application version, optional.
  • -Ddd.trace.sample.rate: Sampling rate (default is full sampling), optional; e.g., 0.6 means 60% sampling.
  • -Ddd.service.mapping: Adds aliases for Redis, MySQL, etc. called by the current application, to distinguish them from the same services called by other applications; optional. For example, projects A and B call MySQL-a and MySQL-b respectively. Without mapping, Guance shows both projects calling a database named MySQL; with mapping configured as MySQL-a and MySQL-b, Guance shows project A calling MySQL-a and project B calling MySQL-b.
  • -Ddd.agent.host: Data transmission target IP, default localhost, optional.
3. View APM Data in the Guance Platform

APM (Application Performance Monitoring) is a default built-in module in Guance, requiring no scenario or view creation for viewing.
Path: Guance platform — Application Performance Monitoring
View Example: (quickly check application calls, topology, anomaly data, and other APM-related data)

image

image

image

RUM (Real User Monitoring):

For detailed steps, refer to the document [User Access (RUM) Observability Best Practices]

1. Log in to the Dataflux platform
2. Choose User Access Monitoring —> New Application —> Select Web Type —> Synchronous Loading

image

3. Integrate Guance RUM observability JS file into the frontend index.html page
$ cd /usr/local/ruoyi/dist/

// Remember to back up
$ cp index.html index.html.bkd

// Add df-js to index.html before </head>, then save the file, example:

$ vim index.html
<script src="https://static.guance.com/browser-sdk/v2/dataflux-rum.js" type="text/javascript"></script>
<script>
  window.DATAFLUX_RUM &&
    window.DATAFLUX_RUM.init({
      applicationId: 'xxxxxxxxxxxxxxxxxxxxxxxxxx',
      datakitOrigin: 'xxx.xxx.xxx.xxx:9529',
      env: 'test',
      version: '1.0.0',
      trackInteractions: true,
      allowedTracingOrigins: ["xxx.xxx.xxx.xxx"]
    })
</script></head> 

# Replace xxx with actual values as needed, refer to the following for detailed changes:

Notes:

  • datakitOrigin: Data transmission address. In production, if configured as a domain name, requests can be forwarded to any server running datakit-9529. If frontend traffic is high, consider adding a load balancer between the domain name and datakit servers. The load balancer must have port 9529 open and forward requests to multiple datakit-9529 servers. Multiple datakit servers handle RUM data without session interruption.
  • allowedTracingOrigins: Connects frontend (RUM) and backend (APM). Fill in the domain names (production) or IPs (testing) of backend servers interacting with the frontend. Use Case: Slow frontend visits caused by backend code issues can be traced via RUM data to APM data for root cause analysis.
  • env: Required, environment of the application, e.g., test or product.
  • version: Required, application version number.
  • trackInteractions: Tracks user interactions, e.g., button clicks, form submissions.

image

4. Save, Verify, and Publish the Page

Open a browser to visit the target page and use F12 Developer Tools to check if there are related RUM requests with status code 200.

image

Note: If F12 Developer Tools shows data not being reported and the port refused, verify port connectivity with telnet <IP> 9529. If blocked, modify /usr/local/datakit/conf.d/datakit.conf to change http_listen from localhost to 0.0.0.0.
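Assuming the main config lives at the path mentioned above, the relevant change is a single line:

```toml
# /usr/local/datakit/conf.d/datakit.conf
# Listen on all interfaces so browsers can reach the RUM endpoint on 9529
http_listen = "0.0.0.0:9529"
```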

image

5. View RUM Data in User Access Monitoring

image

6. Demonstration of RUM and APM Data Correlation

Configuration method: [Java Example]

Use Case: Frontend and backend correlation, binding frontend request data with backend method execution performance data one-to-one, facilitating cross-team and cross-departmental issue localization. For instance, slow frontend visits due to backend service anomalies can be quickly diagnosed. Example:

image

image

image

image

Security Checker:

Security Checker Overview: [Guance Official Overview]

Note: Currently only Linux is supported. For detailed steps, refer to the document [Security Checker Installation and Configuration]

1. Install Security Checker
## Install
$ bash -c "$(curl https://static.guance.com/security-checker/install.sh)"
## Or execute   sudo datakit --install scheck
## Update
$ bash -c "$(curl https://static.guance.com/security-checker/install.sh) --upgrade"
## Start/Stop Commands
$ systemctl start/stop/restart/status scheck
## Or
$ service scheck start/stop/restart/status
## Installation directory  /usr/local/scheck
2. Connect Security Checker to Datakit

Send Security Checker data to Datakit and then forward it to the Dataflux platform.

$ cd /usr/local/scheck/
$ vim scheck.conf


    # ##(required) directory contains script
    rule_dir='/usr/local/scheck/rules.d'

    # ##(required) output of the check result, support local file or remote http server
    # ##localfile: file:///your/file/path
    # ##remote:  http(s)://your.url
    output='http://127.0.0.1:9529/v1/write/security'


    # ##(optional)global cron, default is every 10 seconds
    #cron='*/10 * * * *'

    log='/usr/local/scheck/log'
    log_level='info'
    #disable_log=false
3. View Security Checker Data

image

Logs:

For detailed steps, refer to the document Log Collection

1. Standard Log Collection (Nginx, MySQL, Redis, etc.)

Enable various built-in inputs in DataKit to collect logs directly, such as Nginx, Redis, Containers, ES, etc.

Example: Nginx

$ cd /usr/local/datakit/conf.d/nginx/
$ cp nginx.conf.sample nginx.conf
$ vim nginx.conf

## Modify the log paths to the correct Nginx paths
[inputs.nginx.log]
    files = ["/usr/local/nginx/logs/access.log","/usr/local/nginx/logs/error.log"]
    pipeline = "nginx.p"

## Pipeline refers to grok statements for text log parsing. DataKit has built-in pipelines for Nginx, MySQL, etc., located in /usr/local/datakit/pipeline/. There's no need to modify the pipeline path; DataKit automatically reads it.

image

View Display:

image

image

2. Custom Log Collection (Application Logs, Business Logs, etc.)

Example: Application Logs

Pipeline (log grok parsing) [Guance Official Documentation]

$ cd /usr/local/datakit/conf.d/log/
$ cp logging.conf.sample logging.conf
$ vim logging.conf

## Modify log paths to correct application log paths

## source and service are required fields and can be set to the application name to differentiate log names

[[inputs.logging]]
    logfiles = [
      "/usr/local/ruoyi/logs/ruoyi-system/error.log",
      "/usr/local/ruoyi/logs/ruoyi-system/info.log",
    ]
    source = "ruoyi-system"
    service = "ruoyi-system"
#   pipeline = "ruoyi-system.p"

## Pipeline refers to grok statements for text log parsing. If the pipeline line is left commented out, the Guance platform displays raw log content; if set, the log is parsed using the specified .p file, which must be written manually.

image

$ cd /usr/local/datakit/pipeline/
$ vim ruoyi-system.p

## Example:
# Log format
#2021-06-25 14:27:51.952 [http-nio-9201-exec-7] INFO  c.r.s.c.SysUserController - [list,70] ruoyi-08-system 5430221015886118174 6503455222153372731 - Query user

## Example grok; copy the following content into ruoyi-system.p

grok(_, "%{TIMESTAMP_ISO8601:time} %{NOTSPACE:thread_name} %{LOGLEVEL:level} \\s+%{NOTSPACE:class_name} - \\[%{NOTSPACE:method_name},%{NUMBER:line}\\] %{DATA:service} %{DATA:trace_id} %{DATA:span_id} - %{GREEDYDATA:msg}")

default_time(time)
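To sanity-check that the sample line actually matches the field layout the grok expects, here is a rough bash equivalent of the same pattern (an approximation for local testing only; the pipeline itself still uses the grok above):

```shell
#!/usr/bin/env bash
# Hedged sketch: mirror the grok's field layout (time, thread, level, class,
# method, line, service, trace_id, span_id, msg) with a bash ERE so a sample
# log line can be checked before wiring up the pipeline.
line='2021-06-25 14:27:51.952 [http-nio-9201-exec-7] INFO  c.r.s.c.SysUserController - [list,70] ruoyi-08-system 5430221015886118174 6503455222153372731 - Query user'
re='^([0-9-]+ [0-9:.]+) (\[[^]]+\]) ([A-Z]+)[[:space:]]+([^ ]+) - \[([^,]+),([0-9]+)\] ([^ ]+) ([0-9]+) ([0-9]+) - (.*)$'
if [[ $line =~ $re ]]; then
  echo "time=${BASH_REMATCH[1]} level=${BASH_REMATCH[3]} trace_id=${BASH_REMATCH[8]} msg=${BASH_REMATCH[10]}"
else
  echo "line does not match expected layout"
fi
```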

image

View Display:

image

image

Creating Nginx Log Anomaly Detection:

  1. Open Guance platform -> Anomaly Detection Library -> New Detection Library -> Custom Monitoring

  2. Click on the newly created detection library name -> New Detection Rule -> New Log Detection

  3. Fill in the detection rule content and save
    Rule Name: Nginx Log ERROR Count Exceeds Threshold Detection
    Detection Metric: As shown in the figure
    Trigger Condition: Result >= 5
    Event Name: Nginx Log ERROR Count Exceeds Threshold Alert
    Event Content:

    Level: {status}
    Host: {host}
    Content: Too many log errors; error count is {{ Result }}
    Recommendation: Too many log errors indicate potential application issues; check application health.
    Detection Frequency: 1 minute

image

Verifying Anomaly Detection Mechanism:

  1. Query ruoyi-gateway-related processes on the server and kill them
    $ ps -ef | grep ruoyi-gateway
    $ kill -9 xxxxx

image

  2. Visit the ruoyi website (refresh multiple times, at least 5 times)

image

  3. Check event-related content on the Guance platform

image

image

  4. Check Nginx log-related content and relevant views

image

image

Troubleshooting During Input Activation:

  1. Check input error messages
    Datakit uploads input status information to the Guance platform at regular intervals; it can be viewed under Infrastructure — Specific Host.
    Example: Apache service down, input shows an error

image image image

  2. Check data reporting information

Method 1:
Run curl 127.0.0.1:9529/monitor in a terminal (or open the URL in a browser) to view the output
Method 2:
Run curl 127.0.0.1:9529/stats in a terminal (or open the URL in a browser) to view the output

  3. Check Datakit logs

Datakit log directory: /var/log/datakit

Creating Scenarios and Views:

Using System View Templates (Using Nginx as an Example)

  1. Scenario —> New Scenario

image

  2. New Blank Scenario

image

  3. Input Scenario Name —> Confirm

image

  4. System View —> Nginx View (Create)

image

  5. View Nginx View

image image

  6. Other

Other view creation methods are similar. For custom view content and layout requirements, you can create blank views and build them yourself.

Summary:

Thus, we have achieved comprehensive observability of the demo office system's traces, metrics, logs, and infrastructure.

Guance is user-friendly and convenient to manage, providing a unified view for all metrics, traces, and logs, correlated through a shared tag (host). This makes it easy to pivot between data types on the platform and achieve overall observability of the IT system.

Finally, combining anomaly detection can achieve integrated system management, enhancing operational and development efficiency and IT decision-making capabilities!

The product is continuously improving, with more features, better usability, and a more polished UI.

Guance aims to be the champion of observability!
