Cloud Idle Resources Intelligent Inspection¶
Background¶
Cloud computing provides enterprises and individuals with convenient, fast, and flexible IT infrastructure and application services, and it can be highly efficient and cost-effective. However, as cloud resources become a major part of enterprise data centers, resource waste in the cloud becomes more significant. Within an enterprise in particular, demand fluctuations and isolation between departments mean that some cloud resources are never fully utilized, which leaves a large number of resources idle. The result is rising cloud service costs, lower resource efficiency, and potentially weaker security and performance. To manage and optimize idle cloud resources and to improve the return on and utilization of cloud computing, it is essential to inspect the cloud for idle resources. Inspection makes it possible to find and handle unnecessary resources promptly, avoiding the extra cost, data leakage, and poor performance that come from keeping unneeded resources for a long time.
Preconditions¶
- Applications are already connected to Guance "Application Performance Monitoring".
- DataFlux Func GSE is deployed offline, or DataFlux Func Hosted Edition is activated.
Note: If you plan to use a cloud server for your offline DataFlux Func deployment, consider deploying it with the same cloud provider and in the same region as your current Guance SaaS.
Start Intelligent Inspection¶
In DataFlux Func, install "Guance Custom Inspection (Idle Cloud Resource Inspection)" from the "Script Market" and follow the prompts to configure the Guance API Key to complete activation.
In the DataFlux Func Script Market, select the inspection scene you want to enable and click Install. Configure the Guance API Key and GuanceNode, then choose to deploy and start the script.
Once the startup script is deployed successfully, the startup script and its trigger configuration are created automatically. You can click the link to go directly to the corresponding configuration.
Note: If you need to inspect idle resources from different cloud providers, you need to enable different collectors.
- AWS: Guance Integration (AWS-EC2 collection), Guance Integration (AWS-CloudWatch collection) - mem
- Huawei: Guance Integration (Huawei Cloud - Cloud Monitoring Collection)
- Tencent: Guance Integration (Tencent Cloud - Cloud Monitoring Collection)
- Alibaba: Guance Integration (Alibaba Cloud - Cloud Monitoring Collection)
Configure Intelligent Inspection¶
Configure Intelligent Inspection in Guance¶
Enable/Disable¶
APM Intelligent Inspection is "On" by default, and can be manually "Off". When it is on, it will inspect the configured APM.
Export¶
Intelligent Inspection supports "Export JSON configuration". In the operation menu on the right side of the Intelligent Inspection list, click the "Export" button to export the JSON code of the current inspection. The exported file is named in the format: intelligent inspection name.json.
Editor¶
The Intelligent Inspection "Idle Cloud Resource Inspection" lets users add filtering conditions manually. In the operation menu on the right side of the Intelligent Inspection list, click the Edit button to edit the inspection template.
- Filtering conditions: Configure the corresponding cloud provider information for the inspection (it is recommended to configure separately for different cloud providers)
- Alert notifications: Support selecting and editing alert policies, including the event levels to be notified, notification objects, and the alert silence period, etc.
After configuring the entry parameters, click edit, fill in the corresponding detection objects in the parameter configuration, and click save to start the inspection:
You can refer to the following configuration:
View Events¶
Based on the Guance inspection algorithm, Intelligent Inspection looks for abnormalities in APM metrics, such as sudden resource abnormalities. For each abnormal condition, Intelligent Inspection generates a corresponding event. To check these events, click the "View Related Events" button in the operation menu on the right side of the Intelligent Inspection list.
Event Details page¶
Click "Event" to open the detail page of an intelligent inspection event. The page shows the event status, the time the exception occurred, the exception name, basic attributes, event details, alarm notifications, history, and associated events.
- Click the "View monitor configuration" icon in the upper-right corner of the detail page to view and edit the configuration of the current intelligent inspection.
- Click the "Export Event JSON" icon in the upper-right corner of the detail page to export the event details.
Basic Properties¶
- Detection Dimensions: Filter criteria based on the intelligent inspection configuration. You can copy a detection dimension as key/value, add it to filters, and view the related logs, containers, processes, security checks, traces, user access monitoring, availability monitoring, and CI data.
- Extended Attributes: After selecting an extended attribute, you can copy it as key/value and apply forward or reverse filtering.
Event Details¶
- Event overview: Describes the object, content, etc., of abnormal inspection events
- Idle host details: You can view the idle host status and consumption situation under the current cloud account
Note: If host details cannot be obtained because the custom object data query fails for some hosts in the workspace, check whether the custom object data is being reported correctly.
History¶
You can view the detection object, the exception and recovery times, and the duration.
Related events¶
You can view related events by filtering on fields and on the selected time range.
FAQ¶
1. How to configure the detection frequency of idle cloud resource inspection
In the self-built DataFlux Func, add fixed_crontab='0 * * * *', timeout=900 to the decorator when writing the custom inspection processing function, and then configure it in "Management / Automatic Trigger Configuration".
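As a rough illustration of where these arguments go, the sketch below assumes the DFF.API decorator that the DataFlux Func runtime injects into scripts. The function name, its title, and its empty body are hypothetical placeholders, and the small stand-in class is only there so the snippet also runs outside DataFlux Func. The cron expression '0 * * * *' fires at minute 0 of every hour.

```python
# Inside DataFlux Func, `DFF` is provided by the runtime. Outside of it,
# fall back to a tiny stand-in (an assumption made for this sketch only)
# that simply records the decorator arguments on the function.
try:
    DFF
except NameError:
    class _LocalDFF:
        @staticmethod
        def API(title, **options):
            def decorator(func):
                func.dff_title = title
                func.dff_options = options
                return func
            return decorator

    DFF = _LocalDFF


# fixed_crontab='0 * * * *' -> run at minute 0 of every hour
# timeout=900               -> execution time limit passed to the decorator
@DFF.API('Idle cloud resource inspection',
         fixed_crontab='0 * * * *',
         timeout=900)
def run_idle_resource_inspection(configs=None):
    # Hypothetical placeholder: the real inspection logic comes from the
    # script set installed through the Script Market.
    return None
```

To inspect more or less often, change only the cron expression, for example '0 */6 * * *' for every six hours.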
2. Idle cloud resource inspection may produce no abnormal analysis when triggered
If the inspection report contains no abnormal analysis, check the data collection status of the current datakit.
3. Under what circumstances will idle cloud resource inspection events be generated
An event is generated when, over the past 48 hours, the hourly average CPU utilization is below 1%, the hourly average memory (MEM) usage is below 10%, and the total inbound and outbound TCP traffic per hour is below 10M.
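The rule above can be sketched in a few lines of Python. This is only an illustration, not the inspection's actual implementation: the names HourlySample and is_idle are hypothetical, "10M" is assumed to mean 10 MiB, and the rule is interpreted as requiring every one of the last 48 hourly samples to stay below all three thresholds.

```python
from dataclasses import dataclass
from typing import Sequence

WINDOW_HOURS = 48
CPU_THRESHOLD_PCT = 1.0                      # hourly average CPU utilization
MEM_THRESHOLD_PCT = 10.0                     # hourly average memory usage
TRAFFIC_THRESHOLD_BYTES = 10 * 1024 * 1024   # assumption: "10M" = 10 MiB


@dataclass
class HourlySample:
    cpu_pct: float     # average CPU utilization for the hour, in percent
    mem_pct: float     # average memory usage for the hour, in percent
    tcp_bytes: float   # inbound + outbound TCP traffic for the hour


def is_idle(samples: Sequence[HourlySample]) -> bool:
    """Return True if the last 48 hourly samples all stay below the thresholds."""
    if len(samples) < WINDOW_HOURS:
        # Not enough history to cover the 48-hour window.
        return False
    window = samples[-WINDOW_HOURS:]
    return all(
        s.cpu_pct < CPU_THRESHOLD_PCT
        and s.mem_pct < MEM_THRESHOLD_PCT
        and s.tcp_bytes < TRAFFIC_THRESHOLD_BYTES
        for s in window
    )
```

All three conditions are joined with "and", so a host that is quiet on CPU and memory but still moves traffic above the threshold is not reported as idle.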
4. During inspection, scripts that previously ran normally report errors
Update the referenced script set in the DataFlux Func Script Market. The Change Log shows the update history of the Script Market, which helps you keep scripts up to date.
5. During an inspection script upgrade, the corresponding script set in Startup does not change
Delete the corresponding script set first, then click the upgrade button and configure the corresponding Guance API key to complete the upgrade.
6. How to judge whether the inspection is effective after starting the inspection
Check the status of the corresponding inspection in "Management / Automatic Trigger Configuration". First, the status should be enabled. Second, click "Execute" to verify that the inspection script runs without problems. If the message "xxx minutes ago execution successful" appears, the inspection is running normally and is in effect.