Alibaba Cloud SAE¶
Collect metrics, logs, and trace information from Alibaba Cloud SAE (Serverless App Engine).
Configuration¶
Applications deployed on SAE can integrate trace, metric, and log data through the following process:
- Applications report trace data to DataKit via APM integration.
- Application log data is collected via KafkaMQ and then consumed by DataKit.
- Container metric data from Alibaba Cloud is collected using monitoring APIs through the Function platform (DataFlux.f(x)) and reported to Guance.
- DataKit processes the collected data uniformly before reporting it to Guance.
Note: Deploying DataKit on SAE helps save bandwidth.
Create a DataKit Application¶
Create a DataKit application on SAE:
- Enter SAE, click Applications > Create Application.
- Fill in the application details:
- Application name
- Select a namespace; create one if none exists
- Select a VPC; create one if none exists
- Select a security group; the selected vSwitch must match the one bound to the NAT gateway
- Adjust instance count as needed
- CPU 1 core, memory 1GB
- Click Next after completing the above
- Add image: pubrepo.guance.com/datakit/datakit:1.31.0
- Add environment variables with the following configuration:
```json
{
  "ENV_DATAWAY": "https://openway.guance.com?token=tkn_xxx",
  "KAFKAMQ": "# {\"version\": \"1.22.7-1510\", \"desc\": \"do NOT edit this line\"}\n\n[[inputs.kafkamq]]\n # addrs = [\"alikafka-serverless-cn-8ex3y7ciq02-1000.alikafka.aliyuncs.com:9093\",\"alikafka-serverless-cn-8ex3y7ciq02-2000.alikafka.aliyuncs.com:9093\",\"alikafka-serverless-cn-8ex3y7ciq02-3000.alikafka.aliyuncs.com:9093\"]\n addrs = [\"alikafka-serverless-cn-8ex3y7ciq02-1000-vpc.alikafka.aliyuncs.com:9092\",\"alikafka-serverless-cn-8ex3y7ciq02-2000-vpc.alikafka.aliyuncs.com:9092\",\"alikafka-serverless-cn-8ex3y7ciq02-3000-vpc.alikafka.aliyuncs.com:9092\"]\n # your kafka version:0.8.2 ~ 3.2.0\n kafka_version = \"3.3.1\"\n group_id = \"datakit-group\"\n # consumer group partition assignment strategy (range, roundrobin, sticky)\n assignor = \"roundrobin\"\n\n ## kafka tls config\n tls_enable = false\n\n ## -1:Offset Newest, -2:Offset Oldest\n offsets=-1\n\n\n ## user custom message with PL script.\n [inputs.kafkamq.custom]\n #spilt_json_body = true\n ## spilt_topic_map determines whether to enable log splitting for specific topic based on the values in the spilt_topic_map[topic].\n #[inputs.kafkamq.custom.spilt_topic_map]\n # \"log_topic\"=true\n # \"log01\"=false\n [inputs.kafkamq.custom.log_topic_map]\n \"springboot-server_log\"=\"springboot_log.p\"\n #[inputs.kafkamq.custom.metric_topic_map]\n # \"metric_topic\"=\"metric.p\"\n # \"metric01\"=\"rum_apm.p\"\n #[inputs.kafkamq.custom.rum_topic_map]\n # \"rum_topic\"=\"rum_01.p\"\n # \"rum_02\"=\"rum_02.p\"\n",
  "SPRINGBOOT_LOG_P": "abc = load_json(_)\n\nadd_key(file, abc[\"file\"])\n\nadd_key(message, abc[\"message\"])\nadd_key(host, abc[\"host\"])\nmsg = abc[\"message\"]\ngrok(msg, \"%{TIMESTAMP_ISO8601:time} %{NOTSPACE:thread_name} %{LOGLEVEL:status}%{SPACE}%{NOTSPACE:class_name} - \\\\[%{NOTSPACE:method_name},%{NUMBER:line}\\\\] %{DATA:service_name} %{DATA:trace_id} %{DATA:span_id} - %{GREEDYDATA:msg}\")\n\nadd_key(topic, abc[\"topic\"])\n\ndefault_time(time,\"Asia/Shanghai\")",
  "ENV_GLOBAL_HOST_TAGS": "host=__datakit_hostname,host_ip=__datakit_ip",
  "ENV_HTTP_LISTEN": "0.0.0.0:9529",
  "ENV_DEFAULT_ENABLED_INPUTS": "dk,cpu,disk,diskio,mem,swap,system,hostobject,net,host_processes,container,ddtrace,statsd,profile"
}
```
Configuration Description:
- ENV_DATAWAY: Required. The DataWay gateway address for reporting data to Guance.
- KAFKAMQ: Optional. The kafkamq collector configuration; refer to Kafka Collector Configuration File Introduction for details. An unescaped version is shown below this list.
- SPRINGBOOT_LOG_P: Optional, used together with KAFKAMQ. The Pipeline script that parses the log data consumed from Kafka (also shown below).
- ENV_GLOBAL_HOST_TAGS: Required. Global host tags attached by the collectors.
- ENV_HTTP_LISTEN: Required. The address and port DataKit listens on; the IP must be 0.0.0.0, otherwise other instances cannot reach DataKit.
- ENV_DEFAULT_ENABLED_INPUTS: Required. The list of collectors enabled by default.
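For readability, the escaped KAFKAMQ value above decodes to the following kafkamq collector configuration (the broker addresses are the example's placeholders; replace them with your own Kafka instance's VPC endpoints):

```toml
# {"version": "1.22.7-1510", "desc": "do NOT edit this line"}

[[inputs.kafkamq]]
  # addrs = ["alikafka-serverless-cn-8ex3y7ciq02-1000.alikafka.aliyuncs.com:9093","alikafka-serverless-cn-8ex3y7ciq02-2000.alikafka.aliyuncs.com:9093","alikafka-serverless-cn-8ex3y7ciq02-3000.alikafka.aliyuncs.com:9093"]
  addrs = ["alikafka-serverless-cn-8ex3y7ciq02-1000-vpc.alikafka.aliyuncs.com:9092","alikafka-serverless-cn-8ex3y7ciq02-2000-vpc.alikafka.aliyuncs.com:9092","alikafka-serverless-cn-8ex3y7ciq02-3000-vpc.alikafka.aliyuncs.com:9092"]
  # your kafka version: 0.8.2 ~ 3.2.0
  kafka_version = "3.3.1"
  group_id = "datakit-group"
  # consumer group partition assignment strategy (range, roundrobin, sticky)
  assignor = "roundrobin"

  ## kafka tls config
  tls_enable = false

  ## -1: Offset Newest, -2: Offset Oldest
  offsets = -1

  ## user custom message with PL script.
  [inputs.kafkamq.custom]
    #spilt_json_body = true
    ## spilt_topic_map determines whether to enable log splitting for a specific topic based on the values in spilt_topic_map[topic].
    #[inputs.kafkamq.custom.spilt_topic_map]
    #  "log_topic" = true
    #  "log01" = false
    # map each log topic to the Pipeline script that parses it
    [inputs.kafkamq.custom.log_topic_map]
      "springboot-server_log" = "springboot_log.p"
    #[inputs.kafkamq.custom.metric_topic_map]
    #  "metric_topic" = "metric.p"
    #  "metric01" = "rum_apm.p"
    #[inputs.kafkamq.custom.rum_topic_map]
    #  "rum_topic" = "rum_01.p"
    #  "rum_02" = "rum_02.p"
```

Likewise, SPRINGBOOT_LOG_P decodes to the springboot_log.p Pipeline script:

```
# Parse the JSON envelope produced by SAE's Kafka log delivery.
abc = load_json(_)

add_key(file, abc["file"])
add_key(message, abc["message"])
add_key(host, abc["host"])

# Extract structured fields from the raw log line.
msg = abc["message"]
grok(msg, "%{TIMESTAMP_ISO8601:time} %{NOTSPACE:thread_name} %{LOGLEVEL:status}%{SPACE}%{NOTSPACE:class_name} - \\[%{NOTSPACE:method_name},%{NUMBER:line}\\] %{DATA:service_name} %{DATA:trace_id} %{DATA:span_id} - %{GREEDYDATA:msg}")

add_key(topic, abc["topic"])

default_time(time, "Asia/Shanghai")
```

Once the application is running, you can check that DataKit is reachable from another instance with `curl http://<datakit-instance-ip>:9529/v1/ping` (assuming DataKit's default /v1/ping API).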
For more details, refer to Best Practices for Observability of Alibaba Cloud SAE Application Engine
Tracing¶
To deploy applications on Alibaba Cloud SAE, you need to integrate APM into the corresponding containers:
- You can upload the APM package file to OSS, or integrate the APM package into your application's Dockerfile for building.
- Load the agent at application startup; the steps are the same as integrating APM in a regular environment (a minimal Dockerfile sketch follows this list).
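For illustration, a minimal sketch of the Dockerfile approach for a Java application using ddtrace (one of the inputs enabled above). The base image, jar names, paths, and the service/host values are assumptions; point dd.agent.host at the address of the DataKit application created earlier:

```dockerfile
FROM eclipse-temurin:17-jre

# Bake the application and the APM agent into the image; alternatively,
# the agent jar can be downloaded from OSS during the build.
COPY app.jar /app/app.jar
COPY dd-java-agent.jar /app/dd-java-agent.jar

# DataKit's ddtrace input receives traces on the ENV_HTTP_LISTEN port (9529).
# "datakit-server" is a placeholder for the DataKit application's address.
ENTRYPOINT ["java", \
  "-javaagent:/app/dd-java-agent.jar", \
  "-Ddd.agent.host=datakit-server", \
  "-Ddd.trace.agent.port=9529", \
  "-Ddd.service.name=springboot-server", \
  "-jar", "/app/app.jar"]
```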
For more details, refer to Best Practices for Observability of Alibaba Cloud SAE Application Engine
Metrics¶
Install Func¶
It is recommended to activate Guance Integration > Extensions > Managed Func.
If you deploy Func yourself, refer to Deploying Func Manually.
Activation Script¶
Tip: Prepare an Alibaba Cloud AK that meets the requirements in advance (for simplicity, you can directly grant the full read-only permission ReadOnlyAccess).
Managed Activation Script¶
- Log in to the Guance console.
- Click the Integration menu, select Cloud Account Management.
- Click Add Cloud Account, choose Alibaba Cloud, fill in the required information shown on the interface. If cloud account information has been configured previously, skip this step.
- Click Test, click Save if the test succeeds. If the test fails, check whether the related configuration information is correct and retest.
- In the Cloud Account Management list, you can see the added cloud accounts. Click the relevant cloud account to enter its details page.
- On the cloud account details page, click the Integration button. Under the Not Installed list, find Alibaba Cloud SAE, click Install, and complete the installation in the pop-up dialog.
Manual Activation Script¶
- Log in to the Func console, click Script Market, enter the official script market, and search for guance_aliyun_sae_app and guance_aliyun_sae_instance.
- After clicking Install, enter the corresponding parameters: Alibaba Cloud AK ID, AK Secret, and account name.
- Click Deploy Startup Script; the system automatically creates a Startup script set and configures the corresponding startup scripts.
- After activation, you can view the corresponding auto-trigger configurations under Management / Auto Trigger Configuration. Click Execute to run once immediately without waiting for the scheduled time; after a moment, you can check the execution task records and corresponding logs.
Verification¶
- Confirm whether the corresponding tasks have auto-trigger configurations in Management / Auto Trigger Configuration, and check the task records and logs for any anomalies.
- In Guance, go to Infrastructure / Custom to check if asset information exists.
- In Guance, go to Metrics to check if monitoring data exists.
Metric Overview¶
After basic monitoring is configured for Alibaba Cloud SAE, the measurements collected by default are listed below; more metrics can be collected through configuration. For details, see SAE Basic Monitoring Metrics Details. A sample query follows the table.
| Metric | Unit | Dimensions | Description |
| --- | --- | --- | --- |
| cpu_Average | % | userId, appId | Application CPU |
| diskIopsRead_Average | Count/Second | userId, appId | Application Disk IOPS Read |
| diskIopsWrite_Average | Count/Second | userId, appId | Application Disk IOPS Write |
| diskRead_Average | Byte/Second | userId, appId | Application Disk IO Throughput Read |
| diskTotal_Average | Kilobyte | userId, appId | Application Total Disk |
| diskUsed_Average | Kilobyte | userId, appId | Application Disk Usage |
| diskWrite_Average | Byte/Second | userId, appId | Application Disk IO Throughput Write |
| instanceId_memoryUsed_Average | MB | userId, appId, instanceId | Instance Memory Used |
| instance_cpu_Average | % | userId, appId, instanceId | Instance CPU |
| instance_diskIopsRead_Average | Count/Second | userId, appId, instanceId | Instance Disk IOPS Read |
| instance_diskIopsWrite_Average | Count/Second | userId, appId, instanceId | Instance Disk IOPS Write |
| instance_diskRead_Average | Byte/Second | userId, appId, instanceId | Instance Disk IO Throughput Read |
| instance_diskTotal_Average | Kilobyte | userId, appId, instanceId | Instance Total Disk |
| instance_diskUsed_Average | Kilobyte | userId, appId, instanceId | Instance Disk Usage |
| instance_diskWrite_Average | Byte/Second | userId, appId, instanceId | Instance Disk IO Throughput Write |
| instance_load_Average | min | userId, appId, instanceId | Instance Load Average |
| instance_memoryTotal_Average | MB | userId, appId, instanceId | Instance Total Memory |
| instance_memoryUsed_Average | MB | userId, appId, instanceId | Instance Memory Used |
| instance_netRecv_Average | Byte/Second | userId, appId, instanceId | Instance Bytes Received |
| instance_netRecvBytes_Average | Byte | userId, appId, instanceId | Instance Total Bytes Received |
| instance_netRecvDrop_Average | Count/Second | userId, appId, instanceId | Instance Dropped Packets Received |
| instance_netRecvError_Average | Count/Second | userId, appId, instanceId | Instance Error Packets Received |
| instance_netRecvPacket_Average | Count/Second | userId, appId, instanceId | Instance Packets Received |
| instance_netTran_Average | Byte/Second | userId, appId, instanceId | Instance Bytes Sent |
| instance_netTranBytes_Average | Byte | userId, appId, instanceId | Instance Total Bytes Sent |
| instance_netTranDrop_Average | Count/Second | userId, appId, instanceId | Instance Dropped Packets Sent |
| instance_netTranError_Average | Count/Second | userId, appId, instanceId | Instance Error Packets Sent |
| instance_netTranPacket_Average | Count/Second | userId, appId, instanceId | Instance Packets Sent |
| instance_tcpActiveConn_Average | Count | userId, appId, instanceId | Instance Active TCP Connections |
| instance_tcpInactiveConn_Average | Count | userId, appId, instanceId | Instance Inactive TCP Connections |
| instance_tcpTotalConn_Average | Count | userId, appId, instanceId | Instance Total TCP Connections |
| load_Average | min | userId, appId | Application Load Average |
| memoryTotal_Average | MB | userId, appId | Application Total Memory |
| memoryUsed_Average | MB | userId, appId | Application Memory Used |
| netRecv_Average | Byte/Second | userId, appId | Application Bytes Received |
| netRecvBytes_Average | Byte | userId, appId | Application Total Bytes Received |
| netRecvDrop_Average | Count/Second | userId, appId | Application Dropped Packets Received |
| netRecvError_Average | Count/Second | userId, appId | Application Error Packets Received |
| netRecvPacket_Average | Count/Second | userId, appId | Application Packets Received |
| netTran_Average | Byte/Second | userId, appId | Application Bytes Sent |
| netTranBytes_Average | Byte | userId, appId | Application Total Bytes Sent |
| netTranDrop_Average | Count/Second | userId, appId | Application Dropped Packets Sent |
| netTranError_Average | Count/Second | userId, appId | Application Error Packets Sent |
| netTranPacket_Average | Count/Second | userId, appId | Application Packets Sent |
| tcpActiveConn_Average | Count | userId, appId | Application Active TCP Connections |
| tcpInactiveConn_Average | Count | userId, appId | Application Inactive TCP Connections |
| tcpTotalConn_Average | Count | userId, appId | Application Total TCP Connections |
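To confirm the data is queryable, a DQL sketch over these metrics; the measurement name aliyun_sae is an assumption here, so verify the actual measurement written by the script set in your workspace:

```
# Average application CPU over the last hour, grouped by application.
# "aliyun_sae" is an assumed measurement name; check your workspace.
M::`aliyun_sae`:(AVG(`cpu_Average`)) [1h] BY `appId`
```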
Logs¶
Alibaba Cloud SAE can deliver application logs to Guance via Kafka. The process is as follows:
- Enable Kafka log reporting for SAE applications.
- Enable KafkaMQ log collection in DataKit to collect application Kafka log topics.
For more details, refer to Best Practices for Observability of Alibaba Cloud SAE Application Engine
Example:
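A hypothetical Kafka message as consumed from the springboot-server_log topic (all field values are illustrative):

```json
{
  "file": "/home/admin/app/logs/app.log",
  "host": "springboot-server-1",
  "topic": "springboot-server_log",
  "message": "2024-05-20 10:15:30.123 [http-nio-8080-exec-1] INFO  com.example.DemoController - [getUser,42] springboot-server 4bf92f3577b34da6 00f067aa0ba902b7 - query ok"
}
```

The springboot_log.p Pipeline shown earlier loads this JSON, groks the message field into time, thread_name, status, class_name, method_name, line, service_name, trace_id, span_id, and msg, and finally parses time in the Asia/Shanghai timezone.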