Kube State Metrics
通过 Kube State Metrics 收集集群资源实时信息
配置¶
部署 Kube-state-metrics¶
- 安装 Helm(此为在线安装模式)
curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
chmod 700 get_helm.sh
./get_helm.sh
- 安装 kube-promethues
kube-promethues 安装包采用了 Bitnami 的 Helm chart 方案,Bitnami 官方地址
通过社区获取最新版本Chart 包:
获取已验证的离线版本:
执行以下命令解压文件:
Values 文件配置说明:
global:
  # 修改为私有仓库项目地址 
  imageRegistry: ""
  ## E.g.
  ## imagePullSecrets:
  ##   - myRegistryKeySecretName
  ##
  # 此处 修改为私有仓库密钥的secret 名称
  imagePullSecrets: []
  # 此处df-nfs-storage 修改为集群内可用的 storageClass
  defaultstorageClass: "df-nfs-storage"
  # 此处df-nfs-storage 修改为集群内可用的 storageClass
  storageClass: "df-nfs-storage"
  ## Compatibility adaptations for Kubernetes platforms
  ##
  compatibility:
    ## Compatibility adaptations for Openshift
    ##
    openshift:
      ## @param global.compatibility.openshift.adaptSecurityContext Adapt the securityContext sections of the deployment to make them compatible with Openshift restricted-v2 SCC: remove runAsUser, runAsGroup and fsGroup and let the platform use their allowed default IDs. Possible values: auto (apply if the detected running cluster is Openshift), force (perform the adaptation always), disabled (do not perform adaptation)
      ##
      adaptSecurityContext: auto
## @section Common parameters
....
注意: 部署的 namespace 需要与 datakit 采集器 配置的 namespace 保持一致
- 执行以下命令进行部署:
部署完成后,通过以下命令查看是否部署成功:
配置Datakit¶
- 在 datakit.yaml中的ConfigMap资源中添加kubernetesprometheus.conf
apiVersion: v1
kind: ConfigMap
metadata:
  name: datakit-conf
  namespace: datakit
data:
    kubernetesprometheus.conf: |-
      [inputs.kubernetesprometheus]
        [[inputs.kubernetesprometheus.instances]]
          role       = "service"
          namespaces = ["datakit"]
          selector   = "app.kubernetes.io/name=kube-state-metrics"
          scrape     = "true"
          scheme     = "http"
          port       = "__kubernetes_service_port_http_port"
          path       = "/metrics"
          params     = ""
          interval   = "15s"
          [inputs.kubernetesprometheus.instances.custom]
            measurement        = "kube-state-metrics"
            job_as_measurement = true
            [inputs.kubernetesprometheus.instances.custom.tags]
              cluster_name_k8s       = "promethues-cluster"
              job           = "kube-state-metrics"
              svc_name      = "__kubernetes_service_name"
              pod_name      = "__kubernetes_service_target_name"
              pod_namespace = "__kubernetes_service_target_namespace"
          [inputs.kubernetesprometheus.instances.auth]
            bearer_token_file      = "/var/run/secrets/kubernetes.io/serviceaccount/token"
            [inputs.kubernetesprometheus.instances.auth.tls_config]
              insecure_skip_verify = true
              cert     = "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt"
- 挂载kubernetesprometheus.conf在datakit.yaml文件中volumeMounts下添加
- mountPath: /usr/local/datakit/conf.d/kubernetesprometheus/kubernetesprometheus.conf
  name: datakit-conf
  subPath: kubernetesprometheus.conf
  readOnly: true
- 执行以下命令,重启 datakit
指标¶
kube-state-metrics 指标集¶
kube-state-metrics 采集的指标位于 kube-state-metrics 指标集下,这里介绍指标相关说明
| Metrics | 描述 | 单位 | 
|---|---|---|
| kube_configmap_created | ConfigMap资源的创建时间 | s | 
| kube_cronjob_created | CronJob资源的创建时间 | s | 
| kube_cronjob_next_schedule_time | CronJob下一次计划执行时间 | s | 
| kube_cronjob_spec_failed_job_history_limit | CronJob规范中失败作业历史记录的限制数量 | count | 
| kube_cronjob_spec_successful_job_history_limit | CronJob规范中成功作业历史记录的限制数量 | count | 
| kube_cronjob_spec_suspend | CronJob是否被挂起 | boolean | 
| kube_cronjob_status_active | CronJob当前活跃的作业数量 | count | 
| kube_cronjob_status_last_schedule_time | CronJob最后一次调度时间 | s | 
| kube_cronjob_status_last_successful_time | CronJob最后一次成功执行时间 | s | 
| kube_daemonset_created | DaemonSet资源的创建时间 | s | 
| kube_daemonset_metadata_generation | DaemonSet元数据的版本号 | count | 
| kube_daemonset_status_current_number_scheduled | 当前已调度的DaemonSet数量 | count | 
| kube_daemonset_status_desired_number_scheduled | 期望已调度的DaemonSet数量 | count | 
| kube_daemonset_status_number_available | 可用的DaemonSet数量 | count | 
| kube_daemonset_status_number_misscheduled | 调度错误的DaemonSet数量 | count | 
| kube_daemonset_status_number_ready | 就绪的DaemonSet数量 | count | 
| kube_daemonset_status_number_unavailable | 不可用的DaemonSet数量 | count | 
| kube_daemonset_status_observed_generation | 观察到的DaemonSet代数 | count | 
| kube_daemonset_status_updated_number_scheduled | 已更新的DaemonSet数量 | count | 
| kube_deployment_created | Deployment资源的创建时间 | s | 
| kube_deployment_metadata_generation | Deployment元数据的版本号 | count | 
| kube_deployment_spec_paused | Deployment是否被暂停 | boolean | 
| kube_deployment_spec_replicas | Deployment规范中期望的副本数量 | count | 
| kube_deployment_spec_strategy_rollingupdate_max_surge | Deployment滚动更新策略中的最大额外副本数 | count | 
| kube_deployment_spec_strategy_rollingupdate_max_unavailable | Deployment滚动更新策略中的最大不可用副本数 | count | 
| kube_deployment_status_observed_generation | 观察到的Deployment代数 | count | 
| kube_deployment_status_replicas | Deployment当前的副本数量 | count | 
| kube_deployment_status_replicas_available | Deployment当前可用的副本数量 | count | 
| kube_deployment_status_replicas_ready | Deployment当前就绪的副本数量 | count | 
| kube_deployment_status_replicas_unavailable | Deployment当前不可用的副本数量 | count | 
| kube_deployment_status_replicas_updated | Deployment当前已更新的副本数量 | count | 
| kube_endpoint_address_available | 可用的Endpoint地址数量 | count | 
| kube_endpoint_address_not_ready | 未就绪的Endpoint地址数量 | count | 
| kube_endpoint_created | Endpoint资源的创建时间 | s | 
| kube_endpoint_info | Endpoint资源的详细信息 | - | 
| kube_endpoint_ports | Endpoint的端口信息 | - | 
| kube_ingress_created | Ingress资源的创建时间 | s | 
| kube_ingress_info | Ingress资源的详细信息 | -s | 
| kube_ingress_metadata_resource_version | Ingress资源的版本号 | count | 
| kube_ingress_path | Ingress的路径信息 | - | 
| kube_job_complete | Job是否完成 | boolean | 
| kube_job_created | Job资源的创建时间 | s | 
| kube_job_info | Job资源的详细信息 | - | 
| kube_job_spec_completions | Job规范中期望完成的工作数量 | count | 
| kube_job_spec_parallelism | Job规范中期望的并行工作数量 | count | 
| kube_job_status_active | Job当前活跃的工作数量 | count | 
| kube_job_status_completion_time | Job完成的时间 | s | 
| kube_job_status_failed | Job失败的工作数量 | count | 
| kube_job_status_start_time | Job开始的时间 | s | 
| kube_job_status_succeeded | Job成功的工作数量 | count | 
| kube_lease_renew_time | Lease的续订时间 | s | 
| kube_namespace_created | Namespace资源的创建时间 | s | 
| kube_namespace_status_phase | Namespace的状态阶段 | count | 
| kube_networkpolicy_created | NetworkPolicy资源的创建时间 | s | 
| kube_networkpolicy_spec_egress_rules | NetworkPolicy规范中的出站规则数量 | count | 
| kube_networkpolicy_spec_ingress_rules | NetworkPolicy规范中的入站规则数量 | count | 
| kube_node_created | Node资源的创建时间 | s | 
| kube_node_spec_unschedulable | Node是否不可调度 | boolean | 
| kube_node_status_addresses | Node的状态地址信息 | count | 
| kube_node_status_capacity | Node的容量信息 | count | 
| kube_node_status_condition | Node的状态条件 | count | 
| kube_persistentvolume_capacity_bytes | PersistentVolume的容量 | byte | 
| kube_persistentvolume_created | PersistentVolume资源的创建时间 | s | 
| kube_persistentvolume_info | PersistentVolume资源的详细信息 | count | 
| kube_persistentvolumeclaim_created | PersistentVolumeClaim资源的创建时间 | s | 
| kube_persistentvolumeclaim_resource_requests_storage_bytes | PersistentVolumeClaim请求的存储资源 | byte | 
| kube_pod_completion_time | Pod完成的时间 | s | 
| kube_pod_container_state_started | Pod中容器是否已启动 | boolean | 
| kube_pod_container_status_last_terminated_exitcode | Pod中容器上次终止的退出码 | count | 
| kube_pod_container_status_last_terminated_timestamp | Pod中容器上次终止的时间戳 | s | 
| kube_pod_container_status_ready | Pod中容器是否就绪 | boolean | 
| kube_pod_container_status_restarts_total | Pod中容器的总重启次数 | count | 
| kube_pod_container_status_running | Pod中容器是否正在运行 | boolean | 
| kube_pod_container_status_terminated | Pod中容器是否已终止 | boolean | 
| kube_pod_container_status_waiting | Pod中容器是否正在等待 | boolean | 
| kube_pod_created | Pod资源的创建时间 | s | 
| kube_pod_deletion_timestamp | Pod资源的删除时间戳 | s | 
| kube_pod_init_container_status_ready | Pod初始化容器是否就绪 | boolean | 
| kube_pod_init_container_status_restarts_total | Pod初始化容器的总重启次数 | count | 
| kube_pod_init_container_status_running | Pod初始化容器是否正在运行 | boolean | 
| kube_pod_init_container_status_terminated | Pod初始化容器是否已终止 | boolean | 
| kube_pod_init_container_status_waiting | Pod初始化容器是否正在等 | boolean | 
| kube_pod_spec_volumes_persistentvolumeclaims_readonly | Pod规范中PersistentVolumeClaim是否为只读 | boolean | 
| kube_pod_start_time | Pod开始的时间 | s | 
| kube_pod_status_container_ready_time | Pod中容器就绪的时间 | s | 
| kube_pod_status_initialized_time | Pod初始化完成的时间 | s | 
| kube_pod_status_ready | Pod是否就绪 | boolean | 
| kube_pod_status_ready_time | Pod就绪的时间 | s | 
| kube_pod_status_scheduled | Pod是否已调度 | boolean | 
| kube_poddisruptionbudget_status_current_healthy | PodDisruptionBudget当前健康的Pod数量 | count | 
| kube_poddisruptionbudget_status_desired_healthy | PodDisruptionBudget期望健康的Pod数量 | count | 
| kube_poddisruptionbudget_status_expected_pods | PodDisruptionBudget期望的Pod数量 | count | 
| kube_poddisruptionbudget_status_observed_generation | PodDisruptionBudget观察到的代数 | count | 
| kube_poddisruptionbudget_status_pod_disruptions_allowed | PodDisruptionBudget允许的Pod中断数量 | count | 
| kube_replicaset_created | ReplicaSet资源的创建时间 | s | 
| kube_replicaset_spec_replicas | ReplicaSet规范中期望的副本数量 | count | 
| kube_replicaset_status_fully_labeled_replicas | ReplicaSet完全标签化的副本数量 | count | 
| kube_replicaset_status_observed_generation | ReplicaSet观察到的代数 | count | 
| kube_replicaset_status_ready_replicas | ReplicaSet就绪的副本数量 | count | 
| kube_replicaset_status_replicas | ReplicaSet当前的副本数量 | count | 
| kube_secret_created | Secret资源的创建时间 | s | 
| kube_service_created | Service资源的创建时间 | s | 
| kube_statefulset_created | StatefulSet资源的创建时间 | s | 
| kube_statefulset_replicas | ReplicaSet当前的副本数量 | count | 
| kube_statefulset_status_observed_generation | StatefulSet观察到的代数 | count | 
| kube_statefulset_status_replicas | StatefulSet当前的副本数量 | count | 
| kube_statefulset_status_replicas_available | StatefulSet可用的副本数量 | count | 
| kube_statefulset_status_replicas_current | ReplicaSet当前的副本数量 | count | 
| kube_statefulset_status_replicas_ready | StatefulSet就绪的副本数量 | count | 
| kube_statefulset_status_replicas_updated | StatefulSet已更新的副本数量 | count | 
| kube_storageclass_created | StorageClass资源的创建时间 | s |