Cloud Integration

This document describes how to use the "Cloud Sync" series of script packages from the script market to synchronize and process data from cloud platforms such as Alibaba Cloud and AWS.

Tip

Always use the latest version of DataFlux Func for operations.

Tip

New features will be continuously added to this script package. Please keep an eye on this document page.

1. Prerequisites

  1. Register a Guance account and log in.

1.1 If you have activated DataFlux Func (Automata)

All prerequisites are installed automatically and nothing further is required. Proceed directly to script installation.

1.2 If you deploy Func yourself

# Download DataFlux Func GSE
/bin/bash -c "$(curl -fsSL func.guance.com/download)" -- --for=GSE

# Install DataFlux Func
sudo /bin/bash {installation directory}/run-portable.sh

1.2.1 GSE Edition vs Original Edition

The following are the differences between the GSE Edition and the Original Edition:

| Comparison Item | GSE Edition | Original Edition |
| --- | --- | --- |
| Pre-installed Scripts | Guance Script Market scripts: 1. Integration Core; 2. Self-built Inspection Core; 3. Algorithm Library; 4. Tool Kit. Automatically updated to the latest version on each restart | None |
| Pre-installed Python Packages | In addition to the packages required by DataFlux Func: 1. third-party packages required by the official script sets; 2. mathematical packages such as numpy, pandas; 3. other packages such as jinja2, mailer, openpyxl | Only the packages required by DataFlux Func |
| Pre-added Script Market | Guance Script Market | None |
| Public Network Access | Required: initializing the pre-installed script sets needs DataFlux Func itself to access the public network, otherwise it may fail to start normally | Not required |
Tip

If you have already deployed the Original Edition of Func, you can simply re-download and install the GSE Edition.

For more information, refer to: Quick Start

  • After installation, create a new connector, select Guance as the type, and configure the workspace's API Key ID and API Key in the connector.

2. Script Installation

Here, assume that you need to collect Alibaba Cloud monitoring data and write it to Guance.

Tip

Please prepare an Alibaba Cloud AK with the required permissions in advance (for simplicity, you can grant the global read-only permission ReadOnlyAccess).

2.1 Install Specific Collectors

To synchronize cloud resource monitoring data, we generally need to install two scripts: one for collecting basic information of the corresponding cloud assets, and another for collecting cloud monitoring information.

If you need to collect corresponding logs, you also need to enable the corresponding log collection script. If you need to collect bills, you need to enable the cloud bill collection script.

Taking Alibaba Cloud ECS collection as an example, in "Management / Script Market", click and install the corresponding script packages in sequence:

  • "Integration (Alibaba Cloud-Cloud Monitoring)" (ID: integration_alibabacloud_monitor)
  • "Integration (Alibaba Cloud-ECS)" (ID: integration_alibabacloud_ecs)

After clicking [Install], enter the corresponding parameters: Alibaba Cloud AK, Alibaba Cloud account name.

Click [Deploy Startup Script], and the system will automatically create the Startup script set and configure the corresponding startup scripts.

Additionally, in "Management / Scheduled Tasks" (Old Version: "Automatic Trigger Configuration"), you can see the corresponding scheduled tasks. Click [Execute] to run one immediately without waiting for the scheduled time. After a short while, you can view the execution records and corresponding logs.

2.2 Verify Synchronization Status

  1. In "Management / Scheduled Tasks" (Old Version: "Automatic Trigger Configuration"), confirm that the corresponding scheduled tasks have been created. You can also check the task records and logs for any exceptions.
  2. In the Guance platform, check whether asset information exists in "Infrastructure / Custom".
  3. In the Guance platform, check whether there is corresponding monitoring data in "Metrics".

3. Code Explanation

The following is a step-by-step explanation of the code in this example.

In fact, all "Integration" type scripts work in a similar way.

import Section

To use the scripts provided by the script market, after installing the script package you need to import the relevant components with an import statement.

from integration_core__runner import Runner
import integration_alibabacloud_monitor__main as aliyun_monitor

Runner is the launcher for all collectors; you always need to import Runner to start a collector. aliyun_monitor is the "Alibaba Cloud-Cloud Monitoring" collector required in this example.
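If you also installed "Integration (Alibaba Cloud-ECS)" in step 2.1, its collector is imported the same way. The module name below is inferred from the script set ID following the pattern above; verify it against the script set's own documentation:

import integration_alibabacloud_ecs__main as aliyun_ecs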

Account Configuration Section

To call the cloud platform's API normally, users also need to provide the corresponding platform's AK for the collector to use.

account = {
    'ak_id'    : '<Alibaba Cloud AK ID with appropriate permissions>',
    'ak_secret': '<Alibaba Cloud AK Secret with appropriate permissions>',

    'extra_tags': {
        'account_name': 'My Alibaba Cloud Account',
    }
}

For Alibaba Cloud AK/SK creation, refer to: Create AccessKey

In addition to the basic ak_id and ak_secret, some cloud platform accounts may need additional fields. For example, AWS with IAM roles requires configuring assume_role_arn, role_session_name, etc. For details, refer to the Amazon (AWS) code example.

Finally, each account also accepts an extra_tags field, which lets you attach the same tags to all of that account's collected data, making it easier to distinguish data from different accounts in Guance.

Both the keys and values of extra_tags are strings with no content restrictions, and multiple key-value pairs are supported.

In this example, we configure { 'account_name': 'My Alibaba Cloud Account' } for extra_tags, adding the account_name="My Alibaba Cloud Account" tag to all data of this account.
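
Since multiple key-value pairs are supported, several tags can be attached at once. A minimal sketch follows; the environment and team keys are purely illustrative:

account = {
    'ak_id'    : '<Alibaba Cloud AK ID with appropriate permissions>',
    'ak_secret': '<Alibaba Cloud AK Secret with appropriate permissions>',

    'extra_tags': {
        'account_name': 'My Alibaba Cloud Account',
        'environment' : 'production', # illustrative extra tag
        'team'        : 'sre',        # illustrative extra tag
    }
}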

Function Definition Section

In DataFlux Func, all code must be included in a function decorated with @DFF.API(...).

@DFF.API('Execute Cloud Asset Synchronization')
def run():
    # Specific code omitted ...

The first parameter of the @DFF.API(...) decorator is the title; its content is arbitrary.

Integration scripts are ultimately run through "Scheduled Tasks" (Old Version: "Automatic Trigger Configuration"). Only functions decorated with @DFF.API(...) can be created as scheduled tasks.

Collector Configuration Section

In addition to configuring the corresponding cloud platform account, the collector also needs to be configured.

The configuration items for each collector can be found in that collector's documentation; this article only provides usage hints.

Basic Configuration

collector_configs = {
    'targets': [
        {
            'namespace': 'acs_ecs_dashboard', # Cloud monitoring namespace
            'metrics'  : ['*cpu*', '*mem*'],  # Cloud monitoring metrics containing cpu, mem data
        },
    ],
}
collectors = [
    aliyun_monitor.DataCollector(account, collector_configs),
]

Alibaba Cloud monitoring requires configuring collection targets. In this example, we only collect ECS metrics related to CPU and memory.
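
The targets list accepts multiple entries, so a single collector can cover several namespaces. Below is a minimal sketch; the RDS namespace acs_rds_dashboard is an assumption used for illustration, so confirm actual namespace names in the Alibaba Cloud Monitoring documentation:

collector_configs = {
    'targets': [
        {
            'namespace': 'acs_ecs_dashboard', # ECS cloud monitoring namespace
            'metrics'  : ['*cpu*', '*mem*'],  # CPU and memory related metrics
        },
        {
            'namespace': 'acs_rds_dashboard', # RDS namespace (illustrative)
            'metrics'  : ['*cpu*'],           # CPU related metrics only
        },
    ],
}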

Advanced Configuration

# Metric filter
def filter_ecs_metric(instance, namespace='acs_ecs_dashboard'):
    '''
    Only collect metric data whose instance_id is in ['xxxx']
    '''
    # return True  # returning True unconditionally would collect all instances
    instance_id = instance['tags'].get('InstanceId')
    if instance_id in ['xxxx']:
        return True
    return False

def after_collect_metric(point):
    '''
    Supplement tags for the collected data
    '''
    if point['tags']['name'] == 'xxx':
        point['tags']['custom_tag'] = 'c1'
    return point

collector_configs = {
    'targets': [
        {
            'namespace': 'acs_ecs_dashboard', # Cloud monitoring namespace
            'metrics'  : ['*cpu*', '*mem*'],  # Cloud monitoring metrics containing cpu, mem data
        },
    ],
}
collectors = [
    aliyun_monitor.DataCollector(account, collector_configs, filters=filter_ecs_metric, after_collect=after_collect_metric),
]
  • filters: Filter function that filters the collected data (not every collector supports filters; check the "Configure Filters" section of the specific collector's documentation). In the filter function, return True to indicate the data matches the condition and should be collected, and False to indicate it should not be collected. Configure it flexibly according to your business needs.
  • after_collect: Custom post-collection function for secondary processing of the collected data. Typical use cases: splitting log data, adding extra fields or tags, etc. Note: the return value of this function is the data that gets reported. It is recommended to only modify the incoming point, or to add points that follow the original point structure (see the sketch after this list). Returning an empty value or False means none of the points collected by the collector will be reported.
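
For illustration, here is a minimal sketch of an after_collect function that splits a multi-line log message into one point per line. It assumes the point carries a fields dict with a message field, and that after_collect may return a list of points, as the note above suggests; verify both assumptions against the specific collector's documentation:

def split_log_lines(point):
    '''
    Split a multi-line log message into one point per line,
    preserving the original point structure
    '''
    message = point['fields'].get('message') or ''

    points = []
    for line in message.splitlines():
        new_point = dict(point)                      # shallow copy of the original point
        new_point['fields'] = dict(point['fields']) # copy fields so each point can differ
        new_point['fields']['message'] = line
        points.append(new_point)
    return points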

Finally, use the account configuration and the collector configuration above to create the specific "collector instances".

Startup Execution Section

Collectors are run through the unified Runner launcher.

Initialize the launcher with the collector instances created above, then call its run() method to start collection.

The launcher will traverse all incoming collectors and sequentially report the collected data to DataKit (the default DataKit connector ID is datakit).

Runner(collectors).run()

After writing the code, if you are not sure whether the configuration is correct, you can add the debug=True parameter to the launcher to run it in debug mode.

The launcher running in debug mode will perform the data collection operation normally, but will not write to DataKit in the end, as follows:

Runner(collectors, debug=True).run()

If the DataKit connector ID to be written is not the default datakit, you can add datakit_id="<DataKit ID>" to the launcher to specify the DataKit connector ID, as follows:

Runner(collectors, datakit_id='<DataKit ID>').run()
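
Both parameters can also be combined, for example to debug against a non-default DataKit connector:

Runner(collectors, datakit_id='<DataKit ID>', debug=True).run()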

4. Other Cloud Vendor Code References

The configuration methods of other cloud vendors are similar to Alibaba Cloud.

Amazon (AWS)

Taking the collection of "EC2 Instance Objects" and "EC2-related Monitoring Metrics" as an example:

from integration_core__runner import Runner
import integration_aws_ec2__main as aws_ec2
import integration_aws_cloudwatch__main as aws_cloudwatch

# Account configuration
# AWS supports assuming IAM roles to collect resources
# To use a role, configure: assume_role_arn, role_session_name
# If multi-factor authentication (MFA) is enabled, also configure: serial_number, token_code
account = {
    'ak_id'            : '<AWS AK ID with appropriate permissions>',
    'ak_secret'        : '<AWS AK Secret with appropriate permissions>',
    'assume_role_arn'  : '<ARN of the role to assume>',
    'role_session_name': '<Role session name>',
    'serial_number'    : '<MFA device identifier>',
    'token_code'       : '<One-time code provided by the MFA device, optional>',
    'extra_tags': {
        'account_name': 'My AWS Account',
    }
}

@DFF.API('Execute Cloud Asset Synchronization')
def run():
    regions = ['cn-northwest-1']

    # Collector configuration
    ec2_configs = {
        'regions': regions,
    }
    cloudwatch_configs = {
        'regions': regions,
        'targets': [
            {
                'namespace': 'AWS/EC2',
                'metrics'  : ['*cpu*'],
            },
        ],
    }
    collectors = [
        aws_ec2.DataCollector(account, ec2_configs),
        aws_cloudwatch.DataCollector(account, cloudwatch_configs),
    ]

    # Startup execution
    Runner(collectors).run()

Tencent Cloud

Taking the collection of "CVM Instance Objects" and "CVM-related Monitoring Metrics" as an example:

from integration_core__runner import Runner
import integration_tencentcloud_cvm__main as tencentcloud_cvm
import integration_tencentcloud_monitor__main as tencentcloud_monitor

# Account configuration
account = {
    'ak_id'    : '<Tencent Cloud Secret ID with appropriate permissions>',
    'ak_secret': '<Tencent Cloud Secret Key with appropriate permissions>',

    'extra_tags': {
        'account_name': 'My Tencent Cloud Account',
    }
}

@DFF.API('Execute Cloud Asset Synchronization')
def run():
    regions = ['ap-shanghai']

    # Collector configuration
    cvm_configs = {
        'regions': regions,
    }
    monitor_configs = {
        'regions': regions,
        'targets': [
            {
                'namespace': 'QCE/CVM',
                'metrics'  : ['*cpu*'],
            },
        ],
    }
    collectors = [
        tencentcloud_cvm.DataCollector(account, cvm_configs),
        tencentcloud_monitor.DataCollector(account, monitor_configs),
    ]

    # Startup execution
    Runner(collectors).run()

Microsoft Cloud

Taking the collection of "VM Instance Objects" and "VM-related Monitoring Metrics" as an example:

from integration_core__runner import Runner
import integration_azure_vm__main as vm_main
import integration_azure_monitor__main as monitor_main

# Account configuration
account = {
    "client_id"     : "<Azure Client Id>",
    "client_secret" : "<Azure Client Secret>",
    "tenant_id"     : "<Azure Tenant Id>",
    "authority_area": "<Azure Area, Default global>",
    "extra_tags": {
        "account_name": "<Your Account Name>",
    }
}

subscriptions = "<Azure Subscriptions (separate multiple entries with ',')>"
subscriptions = subscriptions.split(',')

# Collector configuration
collector_configs = {
    'subscriptions': subscriptions,
}

monitor_configs = {
    'targets': [
        {
            'namespace': 'Microsoft.Compute/virtualMachines',
            'metrics'  : [
                'CPU*'
            ],
        },
    ],
}

@DFF.API('Execute Microsoft Cloud VM Resource Collection')
def run():
    collectors = [
        vm_main.DataCollector(account, collector_configs),
        monitor_main.DataCollector(account, monitor_configs),
    ]

    Runner(collectors).run()

Microsoft Cloud account parameter hints:

  • client_id: Application registration client ID
  • client_secret: Client secret value, note it is the value rather than the secret ID
  • tenant_id: Tenant ID
  • authority_area: Region, including global (global area, overseas area), china (China area, 21Vianet), etc. Optional parameter, default is global

For Client Id, Client Secret, Tenant Id acquisition, refer to Azure documentation: Authenticate Python apps hosted on-premises to Azure resources
