Alibaba Cloud SAE¶
Collecting metrics, logs, and trace information from Alibaba Cloud SAE (Serverless App Engine).
Configuration¶
Applications deployed on SAE can report trace, metric, and log data through the following process:
- Applications report trace data to DataKit via an APM agent.
- Application logs are shipped to KafkaMQ and consumed from there by DataKit.
- Application container metrics are collected through Alibaba Cloud's monitoring API and reported to Guance via the Func platform (DataFlux Func).
- DataKit processes the collected data and reports it to Guance in a unified way.
Note: Deploying DataKit on SAE itself saves bandwidth.
Creating a DataKit Application¶
Create a DataKit application on SAE:
- Enter SAE and click Application List - Create Application.
- Fill in the application information:
  - Application name.
  - Namespace; if none exists, create one.
  - VPC; if none exists, create one.
  - Security group; the vSwitch must match the NAT gateway's vSwitch.
  - Instance count, adjusted as needed.
  - CPU: 1 core; memory: 1 GB.
- After completing the form, click Next.
- Add the image: pubrepo.guance.com/datakit/datakit:1.31.0
- Add environment variables with the following configuration:
```json
{
  "ENV_DATAWAY": "https://openway.guance.com?token=tkn_xxx",
  "KAFKAMQ": "# {\"version\": \"1.22.7-1510\", \"desc\": \"do NOT edit this line\"}\n\n[[inputs.kafkamq]]\n # addrs = [\"alikafka-serverless-cn-8ex3y7ciq02-1000.alikafka.aliyuncs.com:9093\",\"alikafka-serverless-cn-8ex3y7ciq02-2000.alikafka.aliyuncs.com:9093\",\"alikafka-serverless-cn-8ex3y7ciq02-3000.alikafka.aliyuncs.com:9093\"]\n addrs = [\"alikafka-serverless-cn-8ex3y7ciq02-1000-vpc.alikafka.aliyuncs.com:9092\",\"alikafka-serverless-cn-8ex3y7ciq02-2000-vpc.alikafka.aliyuncs.com:9092\",\"alikafka-serverless-cn-8ex3y7ciq02-3000-vpc.alikafka.aliyuncs.com:9092\"]\n # your kafka version:0.8.2 ~ 3.2.0\n kafka_version = \"3.3.1\"\n group_id = \"datakit-group\"\n # consumer group partition assignment strategy (range, roundrobin, sticky)\n assignor = \"roundrobin\"\n\n ## kafka tls config\n tls_enable = false\n\n ## -1:Offset Newest, -2:Offset Oldest\n offsets=-1\n\n\n ## user custom message with PL script.\n [inputs.kafkamq.custom]\n #spilt_json_body = true\n ## spilt_topic_map determines whether to enable log splitting for specific topic based on the values in the spilt_topic_map[topic].\n #[inputs.kafkamq.custom.spilt_topic_map]\n # \"log_topic\"=true\n # \"log01\"=false\n [inputs.kafkamq.custom.log_topic_map]\n \"springboot-server_log\"=\"springboot_log.p\"\n #[inputs.kafkamq.custom.metric_topic_map]\n # \"metric_topic\"=\"metric.p\"\n # \"metric01\"=\"rum_apm.p\"\n #[inputs.kafkamq.custom.rum_topic_map]\n # \"rum_topic\"=\"rum_01.p\"\n # \"rum_02\"=\"rum_02.p\"\n",
  "SPRINGBOOT_LOG_P": "abc = load_json(_)\n\nadd_key(file, abc[\"file\"])\n\nadd_key(message, abc[\"message\"])\nadd_key(host, abc[\"host\"])\nmsg = abc[\"message\"]\ngrok(msg, \"%{TIMESTAMP_ISO8601:time} %{NOTSPACE:thread_name} %{LOGLEVEL:status}%{SPACE}%{NOTSPACE:class_name} - \\\\[%{NOTSPACE:method_name},%{NUMBER:line}\\\\] %{DATA:service_name} %{DATA:trace_id} %{DATA:span_id} - %{GREEDYDATA:msg}\")\n\nadd_key(topic, abc[\"topic\"])\n\ndefault_time(time,\"Asia/Shanghai\")",
  "ENV_GLOBAL_HOST_TAGS": "host=__datakit_hostname,host_ip=__datakit_ip",
  "ENV_HTTP_LISTEN": "0.0.0.0:9529",
  "ENV_DEFAULT_ENABLED_INPUTS": "dk,cpu,disk,diskio,mem,swap,system,hostobject,net,host_processes,container,ddtrace,statsd,profile"
}
```
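For readability, the escaped `KAFKAMQ` value above decodes to the following kafkamq collector TOML (the broker addresses are the sample values from the configuration; replace them with your own):

```toml
# {"version": "1.22.7-1510", "desc": "do NOT edit this line"}

[[inputs.kafkamq]]
  addrs = ["alikafka-serverless-cn-8ex3y7ciq02-1000-vpc.alikafka.aliyuncs.com:9092","alikafka-serverless-cn-8ex3y7ciq02-2000-vpc.alikafka.aliyuncs.com:9092","alikafka-serverless-cn-8ex3y7ciq02-3000-vpc.alikafka.aliyuncs.com:9092"]
  # your kafka version: 0.8.2 ~ 3.2.0
  kafka_version = "3.3.1"
  group_id = "datakit-group"
  # consumer group partition assignment strategy (range, roundrobin, sticky)
  assignor = "roundrobin"

  ## kafka tls config
  tls_enable = false

  ## -1: Offset Newest, -2: Offset Oldest
  offsets = -1

  ## user custom message with PL script
  [inputs.kafkamq.custom]
    [inputs.kafkamq.custom.log_topic_map]
      "springboot-server_log" = "springboot_log.p"
```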
Configuration item description:
- ENV_DATAWAY: required; the DataWay address for reporting to Guance.
- KAFKAMQ: optional; the kafkamq collector configuration. Refer to the Kafka collector configuration file introduction.
- SPRINGBOOT_LOG_P: optional; used together with KAFKAMQ, a log pipeline script that splits the log data consumed from Kafka.
- ENV_GLOBAL_HOST_TAGS: required; global tags attached by the collectors.
- ENV_HTTP_LISTEN: required; the DataKit listening address. The IP must be 0.0.0.0, otherwise other pods cannot reach DataKit.
- ENV_DEFAULT_ENABLED_INPUTS: required; the collectors enabled by default.
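To make the `SPRINGBOOT_LOG_P` pipeline easier to follow, here is a rough Python equivalent of what it does to each Kafka message: the payload is a JSON envelope whose `message` field holds the raw log line, which the grok pattern then splits. The sample payload and field values below are illustrative assumptions, not real data.

```python
import json
import re

# Approximate regex translation of the grok pattern in SPRINGBOOT_LOG_P
LOG_RE = re.compile(
    r"(?P<time>\S+) (?P<thread_name>\S+) (?P<status>[A-Z]+)\s+"
    r"(?P<class_name>\S+) - \[(?P<method_name>[^,]+),(?P<line>\d+)\] "
    r"(?P<service_name>\S+) (?P<trace_id>\S+) (?P<span_id>\S+) - (?P<msg>.*)"
)

# Hypothetical Kafka payload, shaped like the envelope the pipeline expects
payload = json.dumps({
    "file": "/logs/app.log",
    "host": "sae-demo-host",
    "topic": "springboot-server_log",
    "message": ("2024-05-01T10:00:00.123 http-nio-8080-exec-1 INFO "
                "com.example.DemoController - [getUser,42] "
                "demo-service 1234567890abcdef fedcba0987654321 - user lookup ok"),
})

envelope = json.loads(payload)                          # load_json(_)
fields = LOG_RE.match(envelope["message"]).groupdict()  # grok(msg, ...)
fields.update(host=envelope["host"], topic=envelope["topic"])  # add_key(...)
```

If the grok match fails for your log format, adjust the pattern in `SPRINGBOOT_LOG_P` to match your application's actual log layout.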
For more details, refer to Alibaba Cloud SAE Application Engine Observability Best Practices.
Tracing¶
To trace an application deployed on Alibaba Cloud SAE, integrate an APM agent into the application's container:
- Upload the APM agent package to OSS, or build the APM package into the application image via its Dockerfile.
- Load the agent at startup, following the same steps as integrating APM in a regular environment.
For more details, refer to Alibaba Cloud SAE Application Engine Observability Best Practices.
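As a sketch of the Dockerfile option, the agent can be baked into the image and pointed at the DataKit application created above. The base image tag, file names, and DataKit address below are assumptions; `DD_AGENT_HOST` and `DD_TRACE_AGENT_PORT` are standard ddtrace agent settings, with DataKit's ddtrace input listening on port 9529.

```dockerfile
# Hypothetical sketch: bake the ddtrace Java agent into the application image.
FROM eclipse-temurin:17-jre
WORKDIR /app
COPY app.jar /app/app.jar
# dd-java-agent.jar fetched beforehand (e.g. from OSS) into the build context
COPY dd-java-agent.jar /app/dd-java-agent.jar
# Point the agent at the DataKit instance deployed on SAE
ENV DD_SERVICE=springboot-server \
    DD_AGENT_HOST=datakit-host \
    DD_TRACE_AGENT_PORT=9529
ENTRYPOINT ["java", "-javaagent:/app/dd-java-agent.jar", "-jar", "/app/app.jar"]
```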
Metrics¶
Install Func¶
It is recommended to use the hosted Func available under Guance Integration - Extensions - Hosted Func.
If you deploy Func yourself, refer to Self-hosted Func Deployment.
Enable Script¶
Note: Please prepare an Alibaba Cloud AK that meets the requirements (for simplicity, you can directly grant the global read-only permission ReadOnlyAccess).
Hosted Version Enable Script¶
- Log in to the Guance console.
- Click the [Integration] menu and select [Cloud Account Management].
- Click [Add Cloud Account], select [Alibaba Cloud], and fill in the required information; if a cloud account has already been configured, skip this step.
- Click [Test]; once the test succeeds, click [Save]. If the test fails, check the related configuration and test again.
- The added cloud account appears in the [Cloud Account Management] list; click it to enter its detail page.
- Click the [Integration] button on the detail page, find Alibaba Cloud SAE under the Not Installed list, click [Install], and complete the installation through the installation interface.
Manual Enable Script¶
- Log in to the Func console, click [Script Market] to enter the official script market, and search for guance_aliyun_sae_app and guance_aliyun_sae_instance.
- Click [Install] and enter the corresponding parameters: Alibaba Cloud AK ID, AK Secret, and an account name.
- Click [Deploy Startup Script]; the system automatically creates a Startup script set and configures the corresponding startup scripts.
- After enabling, the corresponding automatic trigger configurations appear under Management / Automatic Trigger Configuration. Click [Execute] to run the script immediately without waiting for the scheduled time. After a short wait, you can view the execution task records and the corresponding logs.
Some configurations are collected by default; see the metric introduction section for details.
Customize Cloud Object Metrics Configuration
Verification¶
- In Management / Automatic Trigger Configuration, confirm that the corresponding task has an automatic trigger configuration, and check the task records and logs for anomalies.
- In Guance, under Infrastructure / Custom, check whether the asset information exists.
- In Guance, under Metrics, check whether corresponding monitoring data exists.
Metric Introduction¶
After the basic monitoring metrics for Alibaba Cloud SAE are configured, the default metric sets are listed below; more metrics can be collected through configuration. See SAE Basic Monitoring Metrics Details.
| Metric | Unit | Dimensions | Description |
| --- | --- | --- | --- |
| cpu_Average | % | userId, appId | Application CPU |
| diskIopsRead_Average | Count/Second | userId, appId | Application Disk IOPS Read |
| diskIopsWrite_Average | Count/Second | userId, appId | Application Disk IOPS Write |
| diskRead_Average | Byte/Second | userId, appId | Application Disk IO Throughput Read |
| diskTotal_Average | Kilobyte | userId, appId | Application Disk Total |
| diskUsed_Average | Kilobyte | userId, appId | Application Disk Usage |
| diskWrite_Average | Byte/Second | userId, appId | Application Disk IO Throughput Write |
| instanceId_memoryUsed_Average | MB | userId, appId, instanceId | Instance Memory Used |
| instance_cpu_Average | % | userId, appId, instanceId | Instance CPU |
| instance_diskIopsRead_Average | Count/Second | userId, appId, instanceId | Instance Disk IOPS Read |
| instance_diskIopsWrite_Average | Count/Second | userId, appId, instanceId | Instance Disk IOPS Write |
| instance_diskRead_Average | Byte/Second | userId, appId, instanceId | Instance Disk IO Throughput Read |
| instance_diskTotal_Average | Kilobyte | userId, appId, instanceId | Instance Disk Total |
| instance_diskUsed_Average | Kilobyte | userId, appId, instanceId | Instance Disk Usage |
| instance_diskWrite_Average | Byte/Second | userId, appId, instanceId | Instance Disk IO Throughput Write |
| instance_load_Average | min | userId, appId, instanceId | Instance Average Load |
| instance_memoryTotal_Average | MB | userId, appId, instanceId | Instance Total Memory |
| instance_memoryUsed_Average | MB | userId, appId, instanceId | Instance Memory Used |
| instance_netRecv_Average | Byte/Second | userId, appId, instanceId | Instance Received Bytes |
| instance_netRecvBytes_Average | Byte | userId, appId, instanceId | Instance Total Received Bytes |
| instance_netRecvDrop_Average | Count/Second | userId, appId, instanceId | Instance Received Packet Drops |
| instance_netRecvError_Average | Count/Second | userId, appId, instanceId | Instance Received Error Packets |
| instance_netRecvPacket_Average | Count/Second | userId, appId, instanceId | Instance Received Packets |
| instance_netTran_Average | Byte/Second | userId, appId, instanceId | Instance Sent Bytes |
| instance_netTranBytes_Average | Byte | userId, appId, instanceId | Instance Total Sent Bytes |
| instance_netTranDrop_Average | Count/Second | userId, appId, instanceId | Instance Sent Packet Drops |
| instance_netTranError_Average | Count/Second | userId, appId, instanceId | Instance Sent Error Packets |
| instance_netTranPacket_Average | Count/Second | userId, appId, instanceId | Instance Sent Packets |
| instance_tcpActiveConn_Average | Count | userId, appId, instanceId | Instance Active TCP Connections |
| instance_tcpInactiveConn_Average | Count | userId, appId, instanceId | Instance Inactive TCP Connections |
| instance_tcpTotalConn_Average | Count | userId, appId, instanceId | Instance Total TCP Connections |
| load_Average | min | userId, appId | Application Average Load |
| memoryTotal_Average | MB | userId, appId | Application Total Memory |
| memoryUsed_Average | MB | userId, appId | Application Memory Used |
| netRecv_Average | Byte/Second | userId, appId | Application Received Bytes |
| netRecvBytes_Average | Byte | userId, appId | Application Total Received Bytes |
| netRecvDrop_Average | Count/Second | userId, appId | Application Received Packet Drops |
| netRecvError_Average | Count/Second | userId, appId | Application Received Error Packets |
| netRecvPacket_Average | Count/Second | userId, appId | Application Received Packets |
| netTran_Average | Byte/Second | userId, appId | Application Sent Bytes |
| netTranBytes_Average | Byte | userId, appId | Application Total Sent Bytes |
| netTranDrop_Average | Count/Second | userId, appId | Application Sent Packet Drops |
| netTranError_Average | Count/Second | userId, appId | Application Sent Error Packets |
| netTranPacket_Average | Count/Second | userId, appId | Application Sent Packets |
| tcpActiveConn_Average | Count | userId, appId | Application Active TCP Connections |
| tcpInactiveConn_Average | Count | userId, appId | Application Inactive TCP Connections |
| tcpTotalConn_Average | Count | userId, appId | Application Total TCP Connections |
Logs¶
Alibaba Cloud SAE can output logs to Guance via Kafka; the process is as follows:
- Enable Kafka log shipping for the SAE application.
- Enable KafkaMQ log collection in DataKit to consume the topics that the application's logs are shipped to.
For more detailed steps, refer to Alibaba Cloud SAE Application Engine Observability Best Practices.
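Before pointing DataKit at the topic, you can sanity-check that logs are actually arriving in Kafka with the stock console consumer shipped with Apache Kafka (the broker address and topic name below are the sample values used earlier; substitute your own):

```shell
# Consume a few messages from the SAE log topic to confirm delivery
kafka-console-consumer.sh \
  --bootstrap-server alikafka-serverless-cn-8ex3y7ciq02-1000-vpc.alikafka.aliyuncs.com:9092 \
  --topic springboot-server_log \
  --from-beginning --max-messages 5
```

If messages appear here but not in Guance, the problem is on the DataKit kafkamq side (addresses, group_id, or the pipeline script) rather than in SAE's log shipping.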