Huawei Cloud Search Service CSS for Elasticsearch
Collect monitoring metrics for Huawei Cloud Search Service CSS for Elasticsearch
Configuration¶
Install Func¶
It is recommended to enable Guance Integration - Extensions - DataFlux Func (Automata). All prerequisites are installed automatically, so you can proceed directly to script installation.
If you need to deploy Func manually, refer to Manual Deployment of Func
Install Script¶
Note: Please prepare a Huawei Cloud AK with the required permissions in advance (for simplicity, you can grant the global read-only permission `Tenant Guest`).
Automata Version Installation Script¶
- Log in to the Guance console
- Click the 【Integration】 menu, select 【Cloud Account Management】
- Click 【Add Cloud Account】, select 【Huawei Cloud】, and fill in the required information on the interface. If you have already configured the cloud account information before, you can skip this step
- Click 【Test】. If the test succeeds, click 【Save】. If it fails, check the relevant configuration information and test again
- Click 【Cloud Account Management】, you can see the added cloud account in the list, click the corresponding cloud account to enter the details page
- Click the 【Integration】 button on the cloud account details page, find `Huawei Cloud Search Service CSS for Elasticsearch` under the `Not Installed` list, and click the 【Install】 button to install.
Manual Installation Script¶
- Log in to the Func console, click 【Script Market】, enter the Guance script market, and search for `integration_huaweicloud_css`
- Click 【Install】, then enter the corresponding parameters: Huawei Cloud AK, SK, and account name
- Click 【Deploy Startup Script】. The system automatically creates the `Startup` script set and configures the corresponding startup script
- After enabling, you can see the corresponding automatic trigger configuration in 「Management / Automatic Trigger Configuration」. Click 【Execute】 to run it immediately without waiting for the scheduled time. After a while, you can check the execution task records and corresponding logs
Verification¶
- In 「Management / Automatic Trigger Configuration」, confirm that the corresponding task has an automatic trigger configuration. You can also check the task records and logs for exceptions
- In Guance, go to 「Infrastructure - Resource Catalog」 to check if there is asset information
- In Guance, go to 「Metrics」 to check if there is corresponding monitoring data
Metrics¶
Configure Huawei Cloud CSS metrics. You can collect more metrics through configuration; see Huawei Cloud CSS Metrics Details
Instance Monitoring Metrics¶
Huawei Cloud Search Service CSS for Elasticsearch instance performance monitoring metrics, as shown in the table below. For more metrics, please refer to Table 1
| Metric ID | Metric Name | Metric Description | Range | Monitoring Period (Original Metric) |
|---|---|---|---|---|
| `status` | Cluster Health Status | This metric is used to measure the status of the monitored object. | 0, 1, 2, 3. 0: The cluster is 100% available. 1: The data is complete, but some replicas are missing. High availability is somewhat weakened and there is a risk, so pay attention to the cluster status. 2: Data is missing, and the cluster will be abnormal when used. 3: The cluster status is not obtained. | 1 minute |
| `indices_count` | Number of Indices | The number of indices in the CSS cluster. | ≥ 0 | 1 minute |
| `total_shards_count` | Number of Shards | The number of shards in the CSS cluster. | ≥ 0 | 1 minute |
| `primary_shards_count` | Number of Primary Shards | The number of primary shards in the CSS cluster. | ≥ 0 | 1 minute |
| `coordinating_nodes_count` | Number of Coordinating Nodes | The number of coordinating nodes in the CSS cluster. | ≥ 0 | 1 minute |
| `data_nodes_count` | Number of Data Nodes | The number of data nodes in the CSS cluster. | ≥ 0 | 1 minute |
| `SearchRate` | Average Search Rate | Search QPS, the average number of search operations per second in the cluster. | ≥ 0 | 1 minute |
| `IndexingRate` | Average Indexing Rate | Indexing TPS, the average number of indexing operations per second in the cluster. | ≥ 0 | 1 minute |
| `IndexingLatency` | Average Indexing Latency | The average time it takes for shards to complete indexing operations. | ≥ 0 ms | 1 minute |
| `SearchLatency` | Average Search Latency | The average time it takes for shards to complete search operations. | ≥ 0 ms | 1 minute |
| `avg_cpu_usage` | Average CPU Usage | The average CPU utilization of nodes in the CSS cluster. | 0-100% | 1 minute |
| `avg_mem_used_percent` | Average Memory Usage Ratio | The average memory usage ratio of nodes in the CSS cluster. | 0-100% | 1 minute |
| `disk_util` | Disk Utilization | This metric is used to measure the disk utilization of the monitored object. | 0-100% | 1 minute |
| `avg_load_average` | Average Node Load Value | The average number of queued tasks in the operating system of nodes in the CSS cluster for 1 minute. | ≥ 0 | 1 minute |
| `avg_jvm_heap_usage` | Average JVM Heap Usage | The average JVM heap memory usage of nodes in the CSS cluster. | 0-100% | 1 minute |
| `sum_current_opened_http_count` | Total Number of Currently Opened HTTP Connections | The sum of the number of HTTP connections opened and not yet closed in each node of the CSS cluster. | ≥ 0 | 1 minute |
| `avg_thread_pool_write_queue` | Average Number of Queued Tasks in the Write Queue | The average number of queued tasks in the write thread pool of nodes in the CSS cluster. | ≥ 0 | 1 minute |
| `avg_thread_pool_search_queue` | Average Number of Queued Tasks in the Search Queue | The average number of queued tasks in the search thread pool of nodes in the CSS cluster. | ≥ 0 | 1 minute |
| `avg_thread_pool_force_merge_queue` | Average Number of Queued Tasks in the ForceMerge Queue | The average number of queued tasks in the force merge thread pool of nodes in the CSS cluster. | ≥ 0 | 1 minute |
| `avg_thread_pool_write_rejected` | Average Number of Rejected Tasks in the Write Queue | The average number of rejected tasks in the write thread pool of nodes in the CSS cluster. | ≥ 0 | 1 minute |
| `avg_jvm_old_gc_count` | Average Number of Old Generation GCs | The average cumulative number of times the "old generation" garbage collection has run in each node of the CSS cluster. | ≥ 0 | 1 minute |
| `avg_jvm_old_gc_time` | Average Old Generation GC Time | The average cumulative time spent on "old generation" garbage collection in each node of the CSS cluster. | ≥ 0 ms | 1 minute |
| `avg_jvm_young_gc_count` | Average Number of Young Generation GCs | The average cumulative number of times the "young generation" garbage collection has run in each node of the CSS cluster. | ≥ 0 | 1 minute |
| `avg_jvm_young_gc_time` | Average Young Generation GC Time | The average cumulative time spent on "young generation" garbage collection in each node of the CSS cluster. | ≥ 0 ms | 1 minute |
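
The `status` metric is the only one in the table whose values are codes rather than quantities, so it may help to translate them before alerting on them. The following is a minimal Python sketch, not part of the integration script: the names `CLUSTER_HEALTH`, `describe_health`, and `needs_attention` are hypothetical, and the labels paraphrase the Range column above.

```python
# Hypothetical helper: labels paraphrase the "Range" column of the `status` row.
CLUSTER_HEALTH = {
    0: "available",         # the cluster is 100% available
    1: "replicas missing",  # data is complete, high availability is weakened
    2: "data missing",      # the cluster will be abnormal when used
    3: "unknown",           # the cluster status was not obtained
}


def describe_health(value):
    """Return a short label for one `status` metric sample."""
    code = int(value)  # samples may arrive as floats such as 1.0
    return CLUSTER_HEALTH.get(code, f"unrecognized status {code}")


def needs_attention(value):
    """Return True for every value other than 0 (fully available)."""
    return int(value) != 0


print(describe_health(1), needs_attention(1))
```

Treating every non-zero value as needing attention is an assumption made here for simplicity; a stricter rule could alert only on 2 and 3.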
Object¶
The data structure of the collected Huawei Cloud Search Service CSS for Elasticsearch object can be seen in 「Infrastructure - Resource Catalog」
{
  "measurement": "huaweicloud_css",
  "tags": {
    "RegionId" : "cn-north-4",
    "project_id" : "xxxxxxx",
    "enterpriseProjectId" : "",
    "instance_id" : "xxxxxxx-xxxxxxx-xxxxxxx-00001",
    "instance_name" : "css-3384",
    "publicIp" : "xxxxx",
    "status" : "100",
    "endpoint" : "192.168.0.100:9200"
  },
  "fields": {
    "vpc_id" : "3dda7d4b-aec0-4838-a91a-28xxxxxxxx",
    "subnetId" : "xxxxx",
    "securityGroupId" : "xxxxxxx",
    "datastore" : "{\"supportSecuritymode\": false, \"type\": \"elasticsearch\", \"version\": \"7.6.2\"}",
    "instances" : "[{\"azCode\": \"cn-east-3a\", \"id\": \"95f61e90-507b-48d4-8ac5-53dcefd155a3\", \"ip\": \"192.168.0.140\", \"name\": \"css-test-ess-esn-1-1\", \"specCode\": \"ess.spec-kc1.xlarge.2\", \"status\": \"200\", \"type\": \"ess\", \"volume\": {\"size\": 40, \"type\": \"HIGH\"}}]",
    "publicKibanaResp" : "xxxx",
    "elbWhiteList" : "xxxx",
    "updated" : "2023-06-27T07:35:29",
    "created" : "2023-06-27T07:35:29",
    "bandwidthSize" : "100",
    "actions" : "REBOOTING",
    "tags" : "xxxx",
    "period" : true
  }
}
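
In the sample object, `datastore` and `instances` are JSON documents stored as strings, so they need a second decoding step before their inner keys can be read. The sketch below shows one way to do that in Python. It assumes the object has already been loaded into a dict; `decode_css_object` is a hypothetical helper name, and the sample is abbreviated from the structure above.

```python
import json


def decode_css_object(obj):
    """Return a copy of `fields` with the JSON-encoded strings parsed."""
    fields = dict(obj.get("fields", {}))
    for key in ("datastore", "instances"):
        raw = fields.get(key)
        if isinstance(raw, str):
            try:
                fields[key] = json.loads(raw)
            except json.JSONDecodeError:
                pass  # keep placeholder or malformed values unchanged
    return fields


# Abbreviated copy of the sample object shown above.
sample = {
    "measurement": "huaweicloud_css",
    "tags": {"instance_name": "css-3384"},
    "fields": {
        "datastore": '{"supportSecuritymode": false, "type": "elasticsearch", "version": "7.6.2"}',
        "instances": '[{"name": "css-test-ess-esn-1-1", "status": "200", "volume": {"size": 40, "type": "HIGH"}}]',
    },
}

decoded = decode_css_object(sample)
print(decoded["datastore"]["version"])
print([node["name"] for node in decoded["instances"]])
```

Values that are not valid JSON, such as the `xxxx` placeholders, are left as plain strings rather than raising an error.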
Some parameter descriptions are as follows:
| Parameter Name | Description |
|---|---|
| `status` | Cluster Status Value |
| `updated` | Last Modified Time of the Cluster, in ISO8601 Format |
| `bandwidthSize` | Public Network Bandwidth, Unit: Mbit/s |
| `actions` | Current Actions of the Cluster |
| `period` | Whether it is a Subscription Cluster |
status (Cluster Status Value) Meaning:
| Value | Description |
|---|---|
| `100` | Creating |
| `200` | Available |
| `303` | Unavailable |
actions (Current Actions of the Cluster) Meaning:
| Value | Description |
|---|---|
| `REBOOTING` | Restarting |
| `GROWING` | Expanding |
| `RESTORING` | Restoring Cluster |
| `SNAPSHOTTING` | Creating Snapshot |
period Meaning:
| Value | Description |
|---|---|
| `true` | Subscription Billing Cluster |
| `false` | Pay-As-You-Go Billing Cluster |
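
The `status`, `actions`, and `period` tables above can be combined into a single lookup when rendering a cluster summary. This is a rough Python sketch under the assumption that `status` sits in `tags` and the other two in `fields`, as in the sample object. `summarize_cluster` and the dictionary names are hypothetical, and values not listed in the tables are passed through rather than guessed.

```python
# Lookup tables copied from the value tables above.
CLUSTER_STATUS = {"100": "Creating", "200": "Available", "303": "Unavailable"}
CLUSTER_ACTIONS = {
    "REBOOTING": "Restarting",
    "GROWING": "Expanding",
    "RESTORING": "Restoring Cluster",
    "SNAPSHOTTING": "Creating Snapshot",
}


def summarize_cluster(obj):
    """Translate status, actions, and period into readable text."""
    tags = obj.get("tags", {})
    fields = obj.get("fields", {})
    status = str(tags.get("status", ""))
    action = fields.get("actions", "")
    period = fields.get("period")
    if period is None:
        billing = "unknown"  # field absent, so do not assume a billing mode
    else:
        billing = "Subscription" if period else "Pay-As-You-Go"
    return {
        "status": CLUSTER_STATUS.get(status, f"unknown ({status})"),
        "action": CLUSTER_ACTIONS.get(action, action or "none"),
        "billing": billing,
    }


example = {
    "tags": {"status": "100"},
    "fields": {"actions": "REBOOTING", "period": True},
}
print(summarize_cluster(example))
```

Because the Note below says that fields may change in later updates, the function falls back to the raw value instead of failing on codes it does not know.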
Note: Fields in `tags` and `fields` may change with subsequent updates.

Tip: The value of `tags.instance_id` is the cluster ID, which is used as a unique identifier.