Cloud Integration¶
This document describes how to access and process data synchronized from cloud platforms such as Alibaba Cloud and AWS, using the "Cloud Sync" series of script packages from the script market.
Tip
Always use the latest version of DataFlux Func for operations.
Tip
New features will be continuously added to this script package. Please keep an eye on this document page.
1. Prerequisites¶
- Register a Guance account and log in.
1.1 If you have activated DataFlux Func (Automata)¶
All prerequisites are installed automatically; nothing additional is required. Please proceed directly to script installation.
1.2 If you deploy Func yourself¶
- Install DataFlux Func on a cloud host. For specific system requirements, refer to: Deployment and Maintenance / System Requirements
- Download and install DataFlux Func GSE Edition on the cloud host:
# Download DataFlux Func GSE
/bin/bash -c "$(curl -fsSL func.guance.com/download)" -- --for=GSE
# Install DataFlux Func
sudo /bin/bash {installation directory}/run-portable.sh
1.2.1 GSE Edition vs Original Edition¶
The following are the differences between the GSE Edition and the Original Edition:
Comparison Item | GSE Edition | Original Edition
---|---|---
Pre-installed Scripts | Guance Script Market scripts: 1. Integration Core 2. Self-built Inspection Core 3. Algorithm Library 4. Tool Kit. Automatically updated to the latest version upon each restart. | None
Pre-installed Python Packages | In addition to the packages required by DataFlux Func: 1. third-party packages required by the official script sets 2. mathematical packages such as `numpy`, `pandas` 3. other packages such as `jinja2`, `mailer`, `openpyxl` | Only the packages required by DataFlux Func
Pre-added Script Market | Guance Script Market | None
Access to Public Network | Required for the initialization of pre-installed script sets; DataFlux Func itself must be able to access the public network, otherwise it may fail to start normally | Not required
Tip
If the user has already deployed the Original Edition of Func, they can directly re-download and install the GSE Edition.
For more information, refer to: Quick Start
- After installation, create a new connector, select Guance as the type, and configure the workspace's `API Key ID` and `API Key` in the connector.
2. Script Installation¶
Here, assume that you need to collect Alibaba Cloud monitoring data and write it to Guance.
Tip
Please prepare an Alibaba Cloud AK with the required permissions in advance (for simplicity, you can directly grant the global read-only permission `ReadOnlyAccess`).
2.1 Install Specific Collectors¶
To synchronize cloud resource monitoring data, we generally need to install two scripts: one for collecting basic information of the corresponding cloud assets, and another for collecting cloud monitoring information.
If you need to collect corresponding logs, you also need to enable the corresponding log collection script. If you need to collect bills, you need to enable the cloud bill collection script.
Taking Alibaba Cloud ECS collection as an example, in "Management / Script Market", click and install the corresponding script packages in sequence:
- "Integration (Alibaba Cloud-Cloud Monitoring)" (ID:
integration_alibabacloud_monitor
) - "Integration (Alibaba Cloud-ECS)" (ID:
integration_alibabacloud_ecs
)
After clicking [Install], enter the corresponding parameters: Alibaba Cloud AK, Alibaba Cloud account name.
Click [Deploy Startup Script], and the system will automatically create the `Startup` script set and configure the corresponding startup scripts.
Additionally, in "Management / Scheduled Tasks (Old Version: Automatic Trigger Configuration)", you can see the corresponding scheduled tasks (Old Version: Automatic Trigger Configuration). Click [Execute] to immediately execute once without waiting for the scheduled time. After a while, you can view the execution task records and corresponding logs.
2.2 Verify Synchronization Status¶
- In "Management / Scheduled Tasks (Old Version: Automatic Trigger Configuration)", confirm whether the corresponding tasks have the corresponding scheduled tasks (Old Version: Automatic Trigger Configuration). At the same time, you can view the corresponding task records and logs to check for any exceptions.
- In the Guance platform, check whether asset information exists in "Infrastructure / Custom".
- In the Guance platform, check whether there is corresponding monitoring data in "Metrics".
3. Code Explanation¶
The following is a step-by-step explanation of the code in this example.
In fact, all "Integration" type scripts can be implemented using similar methods.
import Section¶
To use the scripts provided by the script market, you need to import the required components via `import` after installing the script package.
from integration_core__runner import Runner
import integration_alibabacloud_monitor__main as aliyun_monitor
`Runner` is the actual launcher of all collectors. In all cases, you need to import `Runner` to start the collector.
`aliyun_monitor` is the "Alibaba Cloud-Cloud Monitoring" collector required in this example.
Account Configuration Section¶
To call the cloud platform's API normally, users also need to provide the corresponding platform's AK for the collector to use.
account = {
    'ak_id'     : '<Alibaba Cloud AK ID with appropriate permissions>',
    'ak_secret' : '<Alibaba Cloud AK Secret with appropriate permissions>',
    'extra_tags': {
        'account_name': 'My Alibaba Cloud Account',
    }
}
For Alibaba Cloud AK/SK creation, refer to: Create AccessKey
In addition to the basic `ak_id` and `ak_secret`, some cloud platform accounts may need to provide additional fields. For example, AWS accounts using IAM roles require `assume_role_arn`, `role_session_name`, etc. For specific details, refer to the Amazon (AWS) code example.
Finally, each account also accepts an `extra_tags` field, which lets users attach the same tags to all collected data, making it easier to identify different accounts' data in Guance.
The keys and values of `extra_tags` are both strings, with no content restrictions, and multiple key-value pairs are supported.
In this example, we configure `{ 'account_name': 'My Alibaba Cloud Account' }` as `extra_tags`, which adds the tag `account_name="My Alibaba Cloud Account"` to all data from this account.
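Since multiple key-value pairs are supported, an account can carry several tags at once. A sketch (the `env` tag is a hypothetical addition for illustration):

'extra_tags': {
    'account_name': 'My Alibaba Cloud Account',
    'env'         : 'production',  # hypothetical extra tag
}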
Function Definition Section¶
In DataFlux Func, all code must be contained in a function decorated with `@DFF.API(...)`.
The first parameter of the `@DFF.API(...)` decorator is the title, whose content is arbitrary.
Integration scripts are ultimately run through "Scheduled Tasks (Old Version: Automatic Trigger Configuration)", and only functions decorated with `@DFF.API(...)` can be created as scheduled tasks.
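A minimal skeleton (the title and function name mirror the complete examples in section 4):

@DFF.API('Execute Cloud Asset Synchronization')
def run():
    # Account configuration, collector configuration, and startup
    # (explained in the following sections) all go inside this function
    pass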
Collector Configuration Section¶
In addition to the cloud platform account, the collector itself also needs to be configured.
The available configuration options can be found in each collector's own documentation; this article only provides usage hints.
Basic Configuration¶
collector_configs = {
    'targets': [
        {
            'namespace': 'acs_ecs_dashboard',  # Cloud monitoring namespace
            'metrics'  : ['*cpu*', '*mem*'],   # Cloud monitoring metrics whose names contain cpu or mem
        },
    ],
}

collectors = [
    aliyun_monitor.DataCollector(account, collector_configs),
]
Alibaba Cloud monitoring requires configuring collection targets. In this example, we only collect ECS metrics related to CPU and memory.
Advanced Configuration¶
# Metric filter
def filter_ecs_metric(instance, namespace='acs_ecs_dashboard'):
    '''
    Collect metric data only for instances whose instance_id is in ['xxxx']
    '''
    # return True  # returning True unconditionally would collect all data
    instance_id = instance['tags'].get('InstanceId')
    if instance_id in ['xxxx']:
        return True
    return False

def after_collect_metric(point):
    '''
    Add extra tags to the collected data
    '''
    if point['tags']['name'] == 'xxx':
        point['tags']['custom_tag'] = 'c1'
    return point
collector_configs = {
    'targets': [
        {
            'namespace': 'acs_ecs_dashboard',  # Cloud monitoring namespace
            'metrics'  : ['*cpu*', '*mem*'],   # Cloud monitoring metrics whose names contain cpu or mem
        },
    ],
}

collectors = [
    aliyun_monitor.DataCollector(account, collector_configs, filters=filter_ecs_metric, after_collect=after_collect_metric),
]
- `filters`: Filter function. Filters the collected data (not every collector supports filters; see "Configure Filters" in the specific collector's documentation). In the filter function, returning True means the data meets the condition and should be collected; returning False means it should not be collected. Configure this flexibly according to your business needs.
- `after_collect`: Custom post-collection function for secondary processing of the collected data. Typical use cases: splitting log data, adding extra fields to fields/tags, etc. Note that the return value of this function becomes the data to be reported; it is recommended to only modify the incoming point, or to add new points following the original point structure. If the function returns an empty value or False, none of the points collected by the collector will be reported.
Finally, combine the account configuration above with the collector configuration here to generate the specific "collector instances".
Startup Execution Section¶
Collectors are run through the unified `Runner` launcher.
The launcher must be initialized with the specific "collector instances" generated above, and calling its `run()` method starts the collection.
The launcher traverses all of the collectors passed in and reports the collected data to DataKit in sequence (the default DataKit connector ID is `datakit`).
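For example, using the collector instances created above:

Runner(collectors).run()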
After writing the code, if you are unsure whether the configuration is correct, you can add the `debug=True` parameter to the launcher to run it in debug mode.
In debug mode, the launcher performs data collection as usual but does not write anything to DataKit in the end, as follows:
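Runner(collectors, debug=True).run()  # collects data normally, but writes nothing to DataKit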
If the target DataKit connector ID is not the default `datakit`, you can add `datakit_id="<DataKit ID>"` to the launcher to specify the DataKit connector ID, as follows:
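Runner(collectors, datakit_id="<DataKit ID>").run()  # reports data to the specified DataKit connector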
4. Other Cloud Vendor Code References¶
The configuration methods of other cloud vendors are similar to Alibaba Cloud.
Amazon (AWS)¶
Taking the collection of "EC2 Instance Objects" and "EC2-related Monitoring Metrics" as an example:
from integration_core__runner import Runner
import integration_aws_ec2__main as aws_ec2
import integration_aws_cloudwatch__main as aws_cloudwatch
# Account configuration
# AWS allows collecting resources by assuming an IAM role
# To use a role, configure: assume_role_arn, role_session_name
# If multi-factor authentication (MFA) is enabled, also configure: serial_number, token_code
account = {
    'ak_id'            : '<AWS AK ID with appropriate permissions>',
    'ak_secret'        : '<AWS AK Secret with appropriate permissions>',
    'assume_role_arn'  : '<ARN of the role to assume>',
    'role_session_name': '<Role session name>',
    'serial_number'    : '<MFA device identifier>',
    'token_code'       : '<One-time code provided by the MFA device, optional>',
    'extra_tags': {
        'account_name': 'My AWS Account',
    }
}

@DFF.API('Execute Cloud Asset Synchronization')
def run():
    regions = ['cn-northwest-1']

    # Collector configuration
    ec2_configs = {
        'regions': regions,
    }
    cloudwatch_configs = {
        'regions': regions,
        'targets': [
            {
                'namespace': 'AWS/EC2',
                'metrics'  : ['*cpu*'],
            },
        ],
    }

    collectors = [
        aws_ec2.DataCollector(account, ec2_configs),
        aws_cloudwatch.DataCollector(account, cloudwatch_configs),
    ]

    # Startup execution
    Runner(collectors).run()
Tencent Cloud¶
Taking the collection of "CVM Instance Objects" and "CVM-related Monitoring Metrics" as an example:
from integration_core__runner import Runner
import integration_tencentcloud_cvm__main as tencentcloud_cvm
import integration_tencentcloud_monitor__main as tencentcloud_monitor
# Account configuration
account = {
    'ak_id'     : '<Tencent Cloud Secret ID with appropriate permissions>',
    'ak_secret' : '<Tencent Cloud Secret Key with appropriate permissions>',
    'extra_tags': {
        'account_name': 'My Tencent Cloud Account',
    }
}

@DFF.API('Execute Cloud Asset Synchronization')
def run():
    regions = ['ap-shanghai']

    # Collector configuration
    cvm_configs = {
        'regions': regions,
    }
    monitor_configs = {
        'regions': regions,
        'targets': [
            {
                'namespace': 'QCE/CVM',
                'metrics'  : ['*cpu*'],
            },
        ],
    }

    collectors = [
        tencentcloud_cvm.DataCollector(account, cvm_configs),
        tencentcloud_monitor.DataCollector(account, monitor_configs),
    ]

    # Startup execution
    Runner(collectors).run()
Microsoft Cloud¶
Taking the collection of "VM Instance Objects" and "VM-related Monitoring Metrics" as an example:
from integration_core__runner import Runner
import integration_azure_vm__main as vm_main
import integration_azure_monitor__main as monitor_main
# Account configuration
account = {
    "client_id"     : "<Azure Client Id>",
    "client_secret" : "<Azure Client Secret>",
    "tenant_id"     : "<Azure Tenant Id>",
    "authority_area": "<Azure Area, default: global>",
    "extra_tags": {
        "account_name": "<Your Account Name>",
    }
}

subscriptions = "<Azure Subscriptions (multiple values separated by ',')>"
subscriptions = subscriptions.split(',')

# Collector configuration
collector_configs = {
    'subscriptions': subscriptions,
}
monitor_configs = {
    'targets': [
        {
            'namespace': 'Microsoft.Compute/virtualMachines',
            'metrics'  : [
                'CPU*',
            ],
        },
    ],
}

@DFF.API('Execute Microsoft Cloud VM Resource Collection')
def run():
    collectors = [
        vm_main.DataCollector(account, collector_configs),
        monitor_main.DataCollector(account, monitor_configs),
    ]

    Runner(collectors).run()
Microsoft Cloud `account` parameter hints:
- `client_id`: Application registration Client ID
- `client_secret`: Client secret value, note it is the value, not the secret's ID
- `tenant_id`: Tenant ID
- `authority_area`: Area, including `global` (global area, overseas) and `china` (China area, 21Vianet), etc. Optional parameter; the default is `global`
For how to obtain the Client Id, Client Secret, and Tenant Id, refer to the Azure documentation: Authenticate Python apps hosted on-premises to Azure resources