Deployment Manual on the Cloud¶
1 Preface¶
1.1 Product Profile¶
Guance is a cloud service platform designed to deliver full-link observability for complete applications in the cloud-native era. Developed by Guance Weilai since 2018, it aims to serve the large number of cloud-based development teams in China. Compared with complex and fast-changing open-source products such as ELK, Prometheus, Grafana, and SkyWalking, Guance provides not only monitoring functions but complete observability services: in addition to integrating the underlying storage and system architecture, it fully analyzes and deconstructs all the technology stacks related to cloud computing and cloud-native systems. Any project team can adopt the product easily, without investing significant effort in studying or adapting immature open-source components. Meanwhile, Guance charges on a service-based, on-demand, pay-by-volume model based entirely on the amount of data a user generates, with no hardware investment required. For paying clients, we establish a professional service team to help build a data-centered core assurance system that is real-time, flexible, easy to extend, and easy to deploy, supporting both cloud SaaS and on-premises deployment modes.
1.2 Description¶
This document mainly introduces the complete steps from resource planning and configuration to deployment and operation of Guance.
Description:
- This document uses dataflux.cn as the example main domain name; replace it with your actual domain name during deployment.
1.3 Keywords¶
Entry | Description |
---|---|
Launcher | A web application used to deploy and install Guance; the installation and upgrade of Guance are completed by following the guided steps of the Launcher service |
O&M console | An operation and maintenance machine with kubectl installed, on the same network as the target Kubernetes cluster |
Installation console | A machine that accesses the Launcher service in a browser to complete the installation of Guance |
kubectl | The Kubernetes command-line client tool, installed on the O&M console |
1.4 Deployment Architecture¶
2 Resource Preparation¶
2.1 Resource List¶
Resources | Specification (Minimum Configuration) | Specification (Recommended Configuration) | Quantity | Notes |
---|---|---|---|---|
ACK | Standard managed cluster | Standard managed cluster | 1 | - |
NAS | 200GB (capacity type) | 200GB (capacity type) | 1 | ACK cluster data persistence |
NAT gateway | Small NAT gateway | Small NAT gateway | 1 | Outbound network access for the ACK cluster |
SLB | Performance-guaranteed type | Performance-guaranteed type | 2 | In front of Kubernetes Ingress |
ECS | 4C8G (80GB system disk) | 8C16G (120GB system disk) | 4 | Deploy the Alibaba Cloud ACK managed cluster |
ECS | 2C4G (80GB system disk) | 4C8G (120GB system disk) | 2 | Deploy DataWay |
RDS | 1C2G 50GB | 2C4G 100GB (three-node enterprise edition) | 1 | MySQL 5.7 |
Redis | 2G | 4G (standard master-slave, two replicas) | 1 | Version: 4.0 |
InfluxDB | 4C16G 500GB | 8C32G 1T (2 nodes) | 1 | Version: 1.7.x |
Elasticsearch | 4C16G 1T (2 nodes) | 16C64G 3T (3 nodes) | 1 | Version: 7.4+ (7.10 recommended) |
Cloud communication | - | - | 1 | Enable the mail service and SMS service |
Domain name | - | - | 1 | The main domain name must be ICP-filed; 8 subdomains under one main domain name are required. |
SSL certificate | Wildcard domain certificate | Wildcard domain certificate | 1 | - |
Note:
- The "minimum configuration" is only suitable for POC deployment and functional verification; it is not suitable for a production environment.
- The "recommended configuration" is suitable for data volumes where InfluxDB holds fewer than 150,000 time series and Elasticsearch holds fewer than 7 billion documents (the sum of logs, traces, user access monitoring, events, and so on).
- For a production deployment, evaluate against the actual volume of ingested data: the more data is ingested, the higher the storage and specification requirements for InfluxDB and Elasticsearch.
2.2 Create Resources¶
2.2.1 Basic Resources¶
Create the RDS, Redis, InfluxDB, Elasticsearch, and NAS resources according to the configuration requirements, all under the same VPC network in the same region. ECS, SLB, and the NAT gateway are created automatically by ACK and do not need to be created separately.
2.2.2 ACK Service Creation¶
2.2.2.1 Cluster Configuration¶
In the Container Service console, create a Kubernetes cluster and select the standard managed cluster version. Considerations for cluster configuration are as follows:
- The cluster must be in the same region and VPC as the RDS, InfluxDB, Elasticsearch, and other resources created earlier
- Check the "Configure SNAT" option (ACK automatically creates and configures a NAT gateway so that the cluster can access the external network)
- Check the "Public Network Access" option (so the cluster API can be accessed over the public network; if you operate and maintain this cluster from the intranet, you may leave this unchecked)
- When enabling the ACK service, select flexvolume as the storage driver for now; the CSI driver is not covered in this document
2.2.2.2 Worker Configuration¶
This step mainly selects the ECS specification and quantity. The specifications can follow the configuration list or be evaluated according to the actual situation, but they should not be lower than the minimum configuration requirements, and there should be at least 3 nodes.
2.2.2.3 Component Configuration¶
On the Component Configuration page, you must check the "Install Ingress Component" option and select the "Public Network" type. ACK automatically creates a public-network SLB; after installation, point the domain name to the public IP of this SLB.
2.3 Resource Allocation¶
2.3.1 RDS¶
- Create an administrator account (it must be an administrator account; it is needed to create and initialize each application DB during subsequent installation and initialization).
- In the console, modify the parameter configuration to set innodb_large_prefix to ON.
- Add the ECS intranet IPs automatically created by ACK to the RDS whitelist.
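Once the whitelist is in place, the parameter change can be confirmed from the O&M console. A minimal sketch, assuming the MySQL client is installed; the host and user placeholders must be replaced with your RDS values:

```shell
# Query the parameter on the RDS instance (replace the placeholders with your values)
mysql -h <rds-intranet-host> -u <admin-user> -p \
  -e "SHOW GLOBAL VARIABLES LIKE 'innodb_large_prefix';"
# The Value column should read ON once the change has been applied
```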
2.3.2 Redis¶
- Set the Redis password.
- Add the ECS intranet IPs automatically created by ACK to the Redis whitelist.
2.3.3 InfluxDB¶
- Create an administrator account (it must be an administrator account; it is needed to create and initialize the DB and RP information during subsequent installation and initialization).
- Add the ECS intranet IPs automatically created by ACK to the InfluxDB whitelist.
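For InfluxDB 1.x, the administrator account can be created with the standard `influx` CLI. A sketch under the assumption that you run it from a host on the whitelist; the host, user name, and password are placeholders:

```shell
# Create an InfluxDB 1.x admin user (placeholders: host, password)
influx -host <influxdb-intranet-host> -port 8086 \
  -execute "CREATE USER admin WITH PASSWORD '<strong-password>' WITH ALL PRIVILEGES"
```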
2.3.4 Elasticsearch¶
- Create an administrator account.
- Install the Chinese word segmentation (IK) plug-in:
  - Download the plug-in version matching your ES version: https://github.com/medcl/elasticsearch-analysis-ik/releases
  - After unzipping, place it in the plugins directory under the elasticsearch directory, for example:

```
[root@ft-elasticsearch-867fb8d9bb-xchnm plugins]# find .
.
./analysis-ik
./analysis-ik/commons-codec-1.9.jar
./analysis-ik/commons-logging-1.2.jar
./analysis-ik/config
./analysis-ik/config/IKAnalyzer.cfg.xml
./analysis-ik/config/extra_main.dic
./analysis-ik/config/extra_single_word.dic
./analysis-ik/config/extra_single_word_full.dic
./analysis-ik/config/extra_single_word_low_freq.dic
./analysis-ik/config/extra_stopword.dic
./analysis-ik/config/main.dic
./analysis-ik/config/preposition.dic
./analysis-ik/config/quantifier.dic
./analysis-ik/config/stopword.dic
./analysis-ik/config/suffix.dic
./analysis-ik/config/surname.dic
./analysis-ik/elasticsearch-analysis-ik-7.10.1.jar
./analysis-ik/elasticsearch-analysis-ik-7.10.1.zip
./analysis-ik/httpclient-4.5.2.jar
./analysis-ik/httpcore-4.4.4.jar
./analysis-ik/plugin-descriptor.properties
./analysis-ik/plugin-security.policy
```

- Add the ECS intranet IPs automatically created by ACK to the Elasticsearch whitelist.
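After restarting Elasticsearch, the plug-in can be verified with the standard `_cat/plugins` API. A sketch assuming basic-auth credentials; the endpoint and credentials are placeholders:

```shell
# List loaded plug-ins on every node (placeholders: host, credentials)
curl -u <admin-user>:<password> "http://<es-intranet-host>:9200/_cat/plugins?v"
# Each node should list analysis-ik with a version matching your ES version
```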
3 kubectl Installation and Configuration¶
3.1 Installing kubectl¶
kubectl is the Kubernetes command-line client tool, used to deploy applications and to inspect and manage cluster resources. Our Launcher relies on this command-line tool to deploy applications. For installation instructions, see the official documentation:
https://kubernetes.io/docs/tasks/tools/install-kubectl/
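For reference, a minimal sketch of the Linux installation from the official documentation (this fetches the latest stable client; in practice, pin a version compatible with your cluster):

```shell
# Download the latest stable kubectl binary for Linux amd64
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
# Install it into the PATH and confirm the client version
install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
kubectl version --client
```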
3.2 Configure kube config¶
For kubectl to gain the ability to manage the cluster, place the cluster's kubeconfig content in the $HOME/.kube/config file; the kubeconfig can be viewed in the cluster's basic information page.
Whether to use the kubeconfig for public network access or intranet access depends on whether your O&M console is connected to the cluster's intranet.
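A quick sanity check that the kubeconfig works, using standard kubectl commands:

```shell
# Confirm kubectl can reach the cluster API server
kubectl cluster-info
# List the worker nodes created by ACK
kubectl get nodes -o wide
```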
4 Start Installing Guance¶
4.1 Automatic Storage Configuration¶
4.1.1 NAS Controller¶
NAS Controller YAML file download address: https://static.guance.com/launcher/nas_controller.yaml
Save the above YAML content as a nas_controller.yaml file and put it on the O&M console.
4.1.2 Storage Class Configuration¶
Storage Class YAML download: https://static.guance.com/launcher/storage_class.yaml
Save the above YAML content as storage_class.yaml file, put it on O&M console, and then replace the variable part in the document:
- Replace {{ nas_server_id }} with the Server ID of the NAS store you created earlier.
4.1.3 Import Storage Configuration¶
- Import nas_controller.yaml by executing the command on the O&M console:
- Import storage_class.yaml by executing the command on the O&M console:
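The manual references the import commands without showing them; a sketch assuming both files sit in the current directory of the O&M console:

```shell
# Import the NAS controller and the storage class definitions
kubectl apply -f nas_controller.yaml
kubectl apply -f storage_class.yaml
# Confirm the storage class was created
kubectl get sc
```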
4.2 Launcher Service Installation Configuration¶
4.2.1 Launcher Installation¶
Launcher is installed in two ways:
- Helm installation
- Original YAML installation
!! You only need to choose one of the two installation methods.
4.2.1.1 Helm Installation¶
Prerequisites:
- Helm3 is installed.
- You have completed storage configuration.
4.2.1.1.1 Installation¶
```shell
# add repository
$ helm repo add launcher https://pubrepo.guance.com/chartrepo/launcher
# update repository
$ helm repo update
# helm install Launcher
$ helm install <RELEASE_NAME> launcher/launcher -n launcher --create-namespace \
  --set-file configyaml="<Kubeconfig Path>" \
  --set ingress.hostName="<Hostname>",storageClassName=<Storageclass>
```
Note: <RELEASE_NAME> is the release name, which can be set to launcher; <Kubeconfig Path> is the kube config file path from section 3.2, e.g. /root/.kube/config; <Hostname> is the Launcher ingress domain name; <Storageclass> is the storage class name from section 4.1.2, which can be obtained by executing kubectl get sc. For example:

```shell
helm install my-launcher launcher/launcher -n launcher --create-namespace \
  --set-file configyaml="/Users/buleleaf/.kube/config" \
  --set ingress.hostName="launcher.my.com",storageClassName=nfs-client
```
4.2.1.1.2 Community Version Installation¶
If you deploy the community version, first obtain the community version deployment image, then add the --set image.repository=<image address>,image.tag=<image tag> parameters for deployment.
```shell
# This command is a demo; adjust the values to your own environment
$ helm install my-launcher launcher/launcher -n launcher --create-namespace \
  --set-file configyaml="/Users/buleleaf/.kube/config" \
  --set ingress.hostName="launcher.my.com",storageClassName=nfs-client \
  --set image.repository=pubrepo.jiagouyun.com/dataflux/1.40.93,image.tag=launcher-aa97377-1652102035
```
4.2.1.1.3 How to Uninstall¶
Once Launcher has been installed successfully, do not uninstall it unless the installation is abnormal.
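If the installation is abnormal and Launcher does need to be removed, a standard Helm uninstall should work. This is a general Helm sketch, not a command from the original manual; the release name and namespace are those used at install time:

```shell
# Remove the Launcher release installed above
helm uninstall my-launcher -n launcher
```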
4.2.1.2 YAML Installation¶
Launcher YAML download: https://static.guance.com/launcher/launcher.yaml
Save the above YAML content as a launcher.yaml file, place it on the O&M console, and replace the variable sections in the file:
- Replace {{ launcher_image }} with the latest Launcher application image address, which can be obtained from the community deployment image documentation.
- Replace {{ domain }} with the main domain name, such as dataflux.cn.
- Replace {{ storageClassName }} with the storage class name configured in storage_class.yaml in the previous step, e.g. alicloud-nas.
Execute the following kubectl command on the O&M console to import the Launcher service:
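The manual references a command here without showing it; a sketch assuming launcher.yaml is in the current directory (the namespace name depends on what the YAML defines):

```shell
# Import the Launcher service and watch its pod come up
kubectl apply -f launcher.yaml
kubectl get pods -n launcher
```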
4.2.2 Resolving Launcher Domain Name to Launcher Service¶
Because the Launcher service is only used to deploy and upgrade Guance and does not need to be open to users, its domain name should not be resolved on the public network. You can bind the host on the installation console to simulate domain name resolution by adding a launcher.dataflux.cn entry to /etc/hosts:

```
192.168.0.1 launcher.dataflux.cn
```

Here 192.168.0.1 stands for the public network IP address of the SLB instance that was automatically created along with the ACK cluster in the previous step.
4.3 Application Installation Boot Steps¶
Visit launcher.dataflux.cn in the browser of the installation console, and complete the installation configuration step by step according to the boot steps.
4.3.1 Database Configuration¶
- The database connection address must use the intranet address.
- The account must be the administrator account, because it is needed to initialize the databases and database access accounts of multiple sub-applications.
4.3.2 Redis Configuration¶
- The Redis connection address must use the intranet address.
4.3.3 InfluxDB Configuration¶
- The InfluxDB connection address must use the intranet address.
- The account must be the administrator account, because it is needed to initialize the DB and RP information.
- Multiple InfluxDB instances can be added.
4.3.4 Additional Settings¶
- The initial account name and mailbox of the administrator account in the Guance Cloud Management Center (the default password is admin; it is recommended to change it immediately after logging in).
- The intranet IPs of the cluster nodes (obtained automatically; confirm they are correct).
- The configuration of the main domain name and the sub-domain name of each sub-application, the default sub-domain name is as follows, which can be modified as needed:
- dataflux 【User Front Desk】
- df-api 【User Front Desk API】
- df-management 【Management Background】
- df-management-api 【Management Background API】
- df-websocket 【Websocket Service】
- df-func 【Func Platform】
- df-openapi 【OpenAPI】
- df-static-res 【Static Resource Site】
- df-kodo 【kodo】
Note: the df-kodo service can optionally use an intranet SLB. If DataWay and kodo are on the same intranet, you can choose the intranet option during installation.
- TLS domain name certificate
4.3.5 Installation Information¶
Summarize and display the information just filled in. If there is any error in filling in the information, you can go back to the previous step to modify it.
4.3.6 Application Configuration File¶
The installer would automatically initialize the application configuration template according to the installation information provided in the previous steps, but it is still necessary to check all the application templates one by one and modify the personalized application configuration. See the installation interface for specific configuration instructions.
After confirmation, submit the creation configuration file.
4.3.7 Application Images¶
- Choose the correct shared storage, which is the storage class name you created in the previous step.
- The application image would be automatically filled in according to the Launcher version you selected without modification, and the application would be created after confirmation.
4.3.8 Application Status¶
The startup status of all application services is listed here. This process downloads all the images, which may take from several minutes to more than ten minutes. Once all services have started successfully, the installation is complete.
Note: during service startup, you must stay on this page and not close it. At the end, if you see the prompt "version information was written successfully" and no error window pops up, the installation has succeeded!
4.4 Domain Name Resolution¶
Resolve all subdomain names except df-kodo.dataflux.cn to the SLB public network IP address automatically created by ACK:
- dataflux.dataflux.cn
- df-api.dataflux.cn
- df-management.dataflux.cn
- df-management-api.dataflux.cn
- df-openapi.dataflux.cn
- df-func.dataflux.cn
- df-static-res.dataflux.cn
After the service installation is completed, the cluster automatically creates a public-network SLB for the kodo service. Use the kubectl get svc -n forethought-kodo command to view the EXTERNAL-IP of the kodo-nginx service, and resolve the df-kodo.dataflux.cn subdomain separately to the public IP of this SLB, as shown in the following figure:
The SLB needs to be configured with an HTTPS certificate: upload the required certificate to the SLB console yourself and change the SLB listener protocol to layer-7 HTTPS. DataWay reports data over HTTPS by default.

Tips: for the specific way to expose services through SLB, refer to https://www.alibabacloud.com/help/zh/doc-detail/86531.htm. Edit the YAML of the kodo-nginx service and add the following annotations:

```yaml
service.beta.kubernetes.io/alibaba-cloud-loadbalancer-cert-id: "1642778637586298_17076818419_1585666584_-1335499667"  # subject to the certificate id on the actual console
service.beta.kubernetes.io/alibaba-cloud-loadbalancer-force-override-listeners: "true"  # force override of listeners using the existing configuration
service.beta.kubernetes.io/alibaba-cloud-loadbalancer-id: "lb-k2j4h4nlg2vgiwi9jyga6"  # load balancer instance id (specify an existing SLB instance)
service.beta.kubernetes.io/alibaba-cloud-loadbalancer-protocol-port: "https:443"  # protocol type
```
4.5 After Installation¶
After a successful deployment, refer to the document How to Use.
If a problem occurs during installation and a reinstall is needed, refer to the Maintenance Manual.
4.6 Important Steps!!!¶
After the above steps, Guance has been installed and can be verified. Once verification passes, a very important step is to take the Launcher service offline, to prevent it from being accessed by mistake and damaging the application configuration. Execute the following command on the O&M console to set the replica count of the Launcher service pods to 0:
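The manual references a command here without showing it; a plausible sketch, assuming Launcher runs as a Deployment named launcher in the launcher namespace (adjust both names to your environment):

```shell
# Scale the Launcher service down to zero replicas
kubectl scale deployment launcher --replicas=0 -n launcher
# Confirm no Launcher pods remain
kubectl get pods -n launcher
```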