HUAWEI CLOUD CSS for Elasticsearch¶
Use the 「Guance Synchronization」 series of script packages in the Script Market to synchronize cloud monitoring data and cloud asset data to Guance.
Config¶
Install Func¶
We recommend enabling 「Integrations - Extension - DataFlux Func (Automata)」: all prerequisites are installed automatically, and you can then continue with the script installation.
If you deploy Func yourself, refer to Self-Deployment of Func.
Installation script¶
Tip: Prepare a HUAWEI CLOUD AK that meets the requirements in advance (for simplicity, you can directly grant the global read-only permission `ReadOnlyAccess`).
To synchronize the monitoring data of CSS for Elasticsearch cloud resources, install the corresponding collection script: open the [Script Market] in the Func web console, search for the keyword css, and install 「Guance Integration(HUAWEI CLOUD-CSS)」 (ID: `guance_huaweicloud_css`).
Click [Install] and enter the corresponding parameters: HUAWEI CLOUD AK, SK, and HUAWEI CLOUD account name.
Click [Deploy startup Script]. The system automatically creates the Startup script set and configures the corresponding startup script.
After the script is installed, find 「Guance Integration(HUAWEI CLOUD-CSS)」 under 「Development」 in Func and expand it to modify the script: locate collector_configs and monitor_configs, edit the content of region_projects in each, changing the region and Project ID to your actual region and Project ID, then click Save and Publish.
In addition, the corresponding automatic trigger configuration appears in 「Management / Crontab Config」. Click [Run] to execute it once immediately, without waiting for the scheduled time. After a while, you can view the task execution records and the corresponding logs.
Some configurations are collected by default, as described in the Metric section. To collect more, see Configure custom cloud object metrics.
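As a rough illustration of the region_projects edit described above, the sketch below assumes that `region_projects` maps each region ID to a list of Project IDs and that it appears as a key inside both `collector_configs` and `monitor_configs`. The exact shape is defined by the installed script, so treat this layout, the placeholder IDs, and the `check_region_projects` helper as assumptions rather than the script's actual contents.

```python
# Hypothetical layout: region ID -> list of Project IDs (placeholders, not real IDs).
region_projects = {
    "cn-north-4": ["<your-project-id>"],
}

# Assumed: both config dicts carry the same region_projects mapping.
collector_configs = {"region_projects": region_projects}
monitor_configs = {"region_projects": region_projects}


def check_region_projects(mapping):
    """Return a list of problems found in a region -> [Project IDs] mapping."""
    problems = []
    if not mapping:
        problems.append("region_projects is empty")
    for region, project_ids in mapping.items():
        if not isinstance(project_ids, (list, tuple)) or not project_ids:
            problems.append(f"{region}: expected a non-empty list of Project IDs")
            continue
        for project_id in project_ids:
            if not isinstance(project_id, str) or not project_id.strip():
                problems.append(f"{region}: blank or non-string Project ID")
    return problems


# Quick sanity check before clicking Save and Publish.
for name, cfg in (("collector_configs", collector_configs),
                  ("monitor_configs", monitor_configs)):
    issues = check_region_projects(cfg["region_projects"])
    print(name, "OK" if not issues else issues)
```

A check like this only catches structural slips such as an empty list or a blank ID. It cannot confirm that a region or Project ID actually exists in your HUAWEI CLOUD account.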
Verify¶
- In 「Management / Crontab Config」, check whether the corresponding task has an automatic trigger configuration. You can also view the task records and logs to check for exceptions.
- On the Guance platform, go to 「Infrastructure / Custom」 to check whether asset information exists.
- On the Guance platform, go to 「Metrics」 to check whether monitoring data exists.
Metric¶
After HUAWEI CLOUD cloud monitoring is configured, the default metric set is as follows. You can collect more metrics through configuration; see HUAWEI CLOUD Monitor Metrics Details.
Instance monitoring metric¶
The performance monitoring metrics for HUAWEI CLOUD CSS for Elasticsearch instances are shown in the table below.
| Metric ID | Metric name | Metric meaning | Value range | Monitoring cycle (original metric) |
|---|---|---|---|---|
| `status` | Cluster Health Status | This metric is used to measure the status of the monitoring object. | 0, 1, 2, 3. 0: The cluster is 100% available. 1: Data is complete, but some replicas are missing. High availability is weakened to some extent, and there is a risk. Please pay attention to the cluster situation in a timely manner. 2: Data is missing, and using the cluster will result in exceptions. 3: Failed to obtain the cluster status. | 1 minute |
| `indices_count` | Index Number | The number of indexes in the CSS cluster. | ≥ 0 | 1 minute |
| `total_shards_count` | Shard Number | The number of shards in the CSS cluster. | ≥ 0 | 1 minute |
| `primary_shards_count` | Primary Shard Number | The number of primary shards in the CSS cluster. | ≥ 0 | 1 minute |
| `coordinating_nodes_count` | Coordination Node Number | The number of coordination nodes in the CSS cluster. | ≥ 0 | 1 minute |
| `data_nodes_count` | Data Node Number | The number of data nodes in the CSS cluster. | ≥ 0 | 1 minute |
| `SearchRate` | Average Query Rate | The QPS of the queries, which is the average number of queries per second in the cluster. | ≥ 0 | 1 minute |
| `IndexingRate` | Average Index Rate | The TPS of the indexes, which is the average number of indexes per second in the cluster. | ≥ 0 | 1 minute |
| `IndexingLatency` | Average Index Latency | The average time required for a shard to complete an index operation. | ≥ 0 ms | 1 minute |
| `SearchLatency` | Average Query Latency | The average time required for a shard to complete a search operation. | ≥ 0 ms | 1 minute |
| `avg_cpu_usage` | Average CPU Usage | The average CPU utilization of each node in the CSS cluster. | 0-100% | 1 minute |
| `avg_mem_used_percent` | Average Memory Usage Ratio | The average ratio of memory usage of each node in the CSS cluster. | 0-100% | 1 minute |
| `disk_util` | Disk Usage Ratio | This metric is used to measure the disk usage ratio of the monitoring object. | 0-100% | 1 minute |
| `avg_load_average` | Average Node Load Value | The average value of the number of tasks queued in the operating system for each node in the CSS cluster per minute. | ≥ 0 | 1 minute |
| `avg_jvm_heap_usage` | Average JVM Heap Usage Ratio | The average ratio of JVM heap memory usage of each node in the CSS cluster. | 0-100% | 1 minute |
| `sum_current_opened_http_count` | Current Number of Open HTTP Connections | The total number of open and unclosed HTTP connections for each node in the CSS cluster. | ≥ 0 | 1 minute |
| `avg_thread_pool_write_queue` | Average Number of Queued Tasks in the Write Queue | The average number of queued tasks in the write thread pool for each node in the CSS cluster. | ≥ 0 | 1 minute |
| `avg_thread_pool_search_queue` | Average Number of Queued Tasks in the Search Queue | The average number of queued tasks in the search thread pool for each node in the CSS cluster. | ≥ 0 | 1 minute |
| `avg_thread_pool_force_merge_queue` | Average Number of Queued Tasks in the ForceMerge Queue | The average number of queued tasks in the force merge thread pool for each node in the CSS cluster. | ≥ 0 | 1 minute |
| `avg_thread_pool_write_rejected` | Average Number of Rejected Tasks in the Write Queue | The average number of rejected tasks in the write thread pool for each node in the CSS cluster. | ≥ 0 | 1 minute |
| `avg_jvm_old_gc_count` | Average Number of JVM Garbage Collection in the Old Generation | The average cumulative number of garbage collection runs in the old generation of each node in the CSS cluster. | ≥ 0 | 1 minute |
| `avg_jvm_old_gc_time` | Average JVM Garbage Collection Time in the Old Generation | The average cumulative time spent on garbage collection runs in the old generation of each node in the CSS cluster. | ≥ 0 ms | 1 minute |
| `avg_jvm_young_gc_count` | Average Number of JVM Garbage Collection in the Young Generation | The average cumulative number of garbage collection runs in the young generation of each node in the CSS cluster. | ≥ 0 | 1 minute |
| `avg_jvm_young_gc_time` | Average JVM Garbage Collection Time in the Young Generation | The average cumulative time spent on garbage collection runs in the young generation of each node in the CSS cluster. | ≥ 0 ms | 1 minute |
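The `status` metric is the only one in the table whose values are codes rather than measurements. A minimal sketch of turning those codes into readable messages follows. The function names `describe_health` and `needs_attention` are illustrative and not part of the collection script, and the meanings are taken from the `status` row above.

```python
# Meanings taken from the `status` row of the metric table.
CLUSTER_HEALTH = {
    0: "available: the cluster is 100% available",
    1: "at risk: data is complete, but some replicas are missing",
    2: "abnormal: data is missing, and using the cluster will result in exceptions",
    3: "unknown: failed to obtain the cluster status",
}


def describe_health(value):
    """Translate a `status` metric value (0-3) into a readable message."""
    return CLUSTER_HEALTH.get(int(value), f"unrecognized status value: {value}")


def needs_attention(value):
    """True for any value other than 0 (fully available)."""
    return int(value) != 0


print(describe_health(1))
print(needs_attention(0))
```

Treating every non-zero value as needing attention is a deliberately conservative choice for this sketch. A real alerting rule might distinguish value 1 (degraded redundancy) from value 2 (data loss).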
Object¶
The structure of the collected HUAWEI CLOUD CSS for Elasticsearch object data is as follows. You can view the object data under 「Infrastructure-Custom」.
{
  "measurement": "huaweicloud_css",
  "tags": {
    "name" : "xxxxx",
    "publicIp" : "xxxxx",
    "id" : "xxxxx",
    "status" : "100",
    "endpoint" : "192.168.0.100:9200",
    "vpc_id" : "3dda7d4b-aec0-4838-a91a-28xxxxxxxx",
    "instance_name" : "css-3384",
    "subnetId" : "xxxxx",
    "securityGroupId" : "xxxxxxx",
    "enterpriseProjectId" : "xxxxxxx",
    "project_id" : "xxxxxxx",
    "RegionId" : "cn-north-4"
  },
  "fields": {
    "datastore" : "{\"supportSecuritymode\": false, \"type\": \"elasticsearch\", \"version\": \"7.6.2\"}",
    "instances" : "[{\"azCode\": \"cn-east-3a\", \"id\": \"95f61e90-507b-48d4-8ac5-53dcefd155a3\", \"ip\": \"192.168.0.140\", \"name\": \"css-test-ess-esn-1-1\", \"specCode\": \"ess.spec-kc1.xlarge.2\", \"status\": \"200\", \"type\": \"ess\", \"volume\": {\"size\": 40, \"type\": \"HIGH\"}}]",
    "publicKibanaResp" : "xxxx",
    "elbWhiteList" : "xxxx",
    "updated" : "2023-06-27T07:35:29",
    "created" : "2023-06-27T07:35:29",
    "bandwidthSize" : "100",
    "actions" : "REBOOTING",
    "tags" : "xxxx",
    "period" : true
  }
}
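Note that `datastore` and `instances` in the sample above are JSON documents stored as strings, so they need a second decoding step before their contents can be read. The sketch below shows this with Python's standard `json` module on a trimmed copy of the sample values. The helper name `decode_css_fields` is illustrative only.

```python
import json

# A trimmed copy of the `fields` section from the sample object.
fields = {
    "datastore": '{"supportSecuritymode": false, "type": "elasticsearch", "version": "7.6.2"}',
    "instances": '[{"azCode": "cn-east-3a", "ip": "192.168.0.140", '
                 '"name": "css-test-ess-esn-1-1", "status": "200", '
                 '"volume": {"size": 40, "type": "HIGH"}}]',
}


def decode_css_fields(raw_fields):
    """Decode the JSON-encoded `datastore` and `instances` strings."""
    datastore = json.loads(raw_fields["datastore"])
    instances = json.loads(raw_fields["instances"])
    return datastore, instances


datastore, instances = decode_css_fields(fields)
print(datastore["type"], datastore["version"])
for node in instances:
    # Each entry describes one node of the cluster.
    print(node["name"], node["ip"], node["volume"]["size"])
```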
Some parameter descriptions are as follows:
| Parameter name | Description |
| ------------------ | ----------- |
| `status` | Cluster status value |
| `updated` | The last modification time of the cluster, in ISO 8601 format |
| `bandwidthSize` | Public network bandwidth, unit: Mbit/s |
| `actions` | The current behavior of the cluster |
| `period` | Whether the cluster is billed on a periodic (package) basis |
Meaning of `status` (cluster status value) values:
| Value | Description |
| --------- | ------------------ |
| 100 | Creating |
| 200 | Available |
| 303 | Unavailable |
Meaning of `actions` (current behavior of the cluster) values:
| Value | Description |
| --------- | ------------------ |
| REBOOTING | Restarting |
| GROWING | Expanding |
| RESTORING | Restoring the cluster |
| SNAPSHOTTING | Creating a snapshot |
Meaning of `period` values:
| Value | Description |
| --------- | ------------------ |
| true | Periodic billing cluster |
| false | On-demand billing cluster |
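The three value tables above can be combined into a single lookup when reading an object record. The following is a minimal sketch: `summarize_cluster` is a hypothetical helper, and treating an empty `actions` value as "no action in progress" is an assumption, since the tables do not say what an idle cluster reports.

```python
# Value meanings taken from the status, actions, and period tables.
STATUS = {"100": "Creating", "200": "Available", "303": "Unavailable"}
ACTIONS = {
    "REBOOTING": "Restarting",
    "GROWING": "Expanding",
    "RESTORING": "Restoring the cluster",
    "SNAPSHOTTING": "Creating a snapshot",
}


def summarize_cluster(status, actions, period):
    """Build a one-line summary from the status, actions, and period values."""
    status_text = STATUS.get(str(status), f"unknown status {status}")
    # Assumption: an empty or missing `actions` value means the cluster is idle.
    action_text = ACTIONS.get(actions, actions or "no action in progress")
    billing = "periodic billing" if period else "on-demand billing"
    return f"{status_text}; {action_text}; {billing}"


# Values from the sample object: status "100", actions "REBOOTING", period true.
print(summarize_cluster("100", "REBOOTING", True))
```

Unrecognized values are passed through rather than rejected, since the fields may change with later updates.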
Note: The fields in `tags` and `fields` may change with subsequent updates.
Tip: The value of `tags.name` is the cluster ID, which serves as the unique identifier.