Data Security¶

In the era of cloud computing, data security is critical. Comprehensive data protection capabilities improve visibility and insight, warn automatically about security risks, and strengthen overall defenses, keeping data available, secure, and compliant.

When you use Guance, its built-in tools assess the data it receives for risk and process it accordingly.

How to Reduce Data Risks¶

Guance collects observability data from your infrastructure and services and manages it centrally, so you can analyze and act on it at any time. In normal operation, servers using Guance send many types of data. Most of the data collected by Guance products contains no personal information; where unnecessary personal data might be collected, we provide detailed explanations and recommendations to avoid confusion. Guance offers several ways to help you reduce data risks.

DataKit-Side Data Security Considerations¶

HTTPS Data Upload¶

DataKit uploads all data over HTTPS, so data is encrypted in transit.

Limited Issuance Mechanism¶

The center cannot send commands to DataKit for execution; all requests are initiated by DataKit. DataKit only periodically pulls certain configurations from the center, such as Pipeline and blacklist configurations.
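The pull-only model can be illustrated with a minimal JavaScript sketch. It assumes, purely for illustration, that a pulled response is a list of items with a `kind` field; the name `acceptPulledConfig` and the payload shape are hypothetical and do not describe DataKit's actual protocol. What it shows is the principle: the client accepts only the configuration kinds it knows about and never treats a response as a command to execute.

```javascript
// Hypothetical: the only configuration kinds this client agrees to apply.
const ACCEPTED_KINDS = new Set(['pipeline', 'blacklist'])

// Split a pulled payload into items to apply and items to ignore.
// Nothing in the payload is ever executed as a command.
function acceptPulledConfig(items) {
  const accepted = []
  const rejected = []
  for (const item of items) {
    if (item && ACCEPTED_KINDS.has(item.kind)) {
      accepted.push(item)
    } else {
      rejected.push(item)
    }
  }
  return { accepted, rejected }
}

// Example payload (hypothetical shape).
const pulled = [
  { kind: 'pipeline', name: 'nginx.p' },
  { kind: 'blacklist', name: 'drop-debug-logs' },
  { kind: 'command', run: 'restart' },
]
const { accepted, rejected } = acceptPulledConfig(pulled)
console.log(accepted.length, rejected.length) // 2 1
```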

Tracing Field Value Desensitization¶

Tracing may capture the SQL statements an application executes. The literal field values in these statements are desensitized, for example:

SELECT name from class where name = 'zhangsan'

will be desensitized into

SELECT name from class where name = ?
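The exact obfuscation rules used during tracing are not described here. As an illustration only, this minimal JavaScript sketch assumes the goal is to replace single-quoted string literals (including doubled '' escapes) and standalone numeric literals with `?`; the function name `obfuscateSql` is hypothetical.

```javascript
// Hypothetical helper: replace SQL literal values with `?`.
// Limitations: double-quoted strings, comments, and dialect-specific
// literals are not handled.
function obfuscateSql(sql) {
  return sql
    // Single-quoted string literals, allowing '' as an escaped quote
    .replace(/'(?:[^']|'')*'/g, '?')
    // Standalone numbers; digits inside identifiers such as col1 are kept
    .replace(/\b\d+(?:\.\d+)?\b/g, '?')
}

console.log(obfuscateSql("SELECT name from class where name = 'zhangsan'"))
// SELECT name from class where name = ?
```

Strings are replaced first so that digits inside a quoted value are never matched separately by the numeric rule.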

Pipeline and Blacklist Mechanism¶

If collected data contains sensitive values that cannot be removed during collection, such as phone numbers, you can desensitize them with Pipeline functions. For example, the cover() function replaces part of a string with *.
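The idea behind cover() can be sketched in JavaScript. This is not Pipeline syntax and not the function's actual implementation: `coverRange` is a hypothetical helper, and its 1-based, inclusive position range is an assumption chosen for illustration.

```javascript
// Hypothetical helper: replace characters at 1-based positions
// start..end (inclusive) with '*'. Out-of-range positions are clamped.
function coverRange(value, start, end) {
  const chars = Array.from(value) // split by code point, not UTF-16 unit
  const from = Math.max(1, start)
  const to = Math.min(chars.length, end)
  for (let i = from; i <= to; i++) {
    chars[i - 1] = '*'
  }
  return chars.join('')
}

console.log(coverRange('13523456789', 4, 7)) // 135****6789
```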

You can also configure blacklist rules to prevent certain sensitive data from being uploaded.
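Conceptually, a blacklist drops matching records before upload. The JavaScript sketch below uses a hypothetical rule format (a field name plus a regular expression) that does not reflect DataKit's actual blacklist syntax; it only shows the filtering principle.

```javascript
// Hypothetical rule format: a record is dropped when the named field
// is a string matching the pattern.
const BLACKLIST_RULES = [
  { field: 'path', pattern: /^\/internal\// },
  { field: 'user_email', pattern: /@/ },
]

function isBlacklisted(record, rules) {
  return rules.some(
    ({ field, pattern }) =>
      typeof record[field] === 'string' && pattern.test(record[field]),
  )
}

// Keep only records that no rule matches; these are the ones uploaded.
function filterForUpload(records, rules) {
  return records.filter((record) => !isBlacklisted(record, rules))
}

const sample = [
  { path: '/internal/health' },
  { path: '/api/orders' },
  { path: '/api/me', user_email: 'a@example.com' },
]
console.log(filterForUpload(sample, BLACKLIST_RULES).length) // 1
```

The patterns have no `g` flag, so `test()` carries no state between calls.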

Sensitive Data Scanning¶

Sensitive Data Scanning identifies, labels, and redacts data that contains personal information or other risky content. As an additional line of defense, it helps prevent sensitive data from leaking.
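The identify, label, and redact steps can be illustrated with a small JavaScript sketch. The rule list, labels, and the `[REDACTED:label]` placeholder are hypothetical and are not the product's built-in rules; the patterns are deliberately simple examples.

```javascript
// Hypothetical scanning rules: a label plus a global regular expression.
const SCAN_RULES = [
  { label: 'email', pattern: /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g },
  { label: 'mobile', pattern: /\b1\d{10}\b/g }, // 11-digit number starting with 1
]

// Return the labels that matched and a redacted copy of the text.
function scanText(text, rules) {
  const labels = []
  let redacted = text
  for (const { label, pattern } of rules) {
    const next = redacted.replace(pattern, `[REDACTED:${label}]`)
    if (next !== redacted) labels.push(label) // identify and label
    redacted = next // redact
  }
  return { labels, redacted }
}

console.log(scanText('contact a.b@example.com or 13523456789', SCAN_RULES).redacted)
// contact [REDACTED:email] or [REDACTED:mobile]
```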

For more details, refer to Sensitive Data Scanning.

Logs¶

Using Guance product services generates large numbers of log records. Because log data is highly interrelated, specific rules must be applied during collection and analysis to filter these large volumes of logs.

After you configure sensitive fields for log data, members with the corresponding permissions see only desensitized log data.
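The effect of sensitive-field configuration can be sketched as a per-role view of a log record. Everything here is hypothetical: the field names, the role names, and the masking style (keep first and last characters) are assumptions for illustration, not Guance's settings or algorithm.

```javascript
// Hypothetical settings: which log fields are sensitive, and which roles
// see them desensitized.
const SENSITIVE_FIELDS = ['phone', 'id_card']
const DESENSITIZED_ROLES = new Set(['read-only', 'standard'])

// Keep the first and last characters; mask everything in between.
function maskValue(value) {
  const s = String(value)
  if (s.length <= 2) return '*'.repeat(s.length)
  return s[0] + '*'.repeat(s.length - 2) + s[s.length - 1]
}

// Return a copy of the record as the given role would see it.
// The stored record itself is never modified.
function viewLogAs(record, role) {
  const view = { ...record }
  if (!DESENSITIZED_ROLES.has(role)) return view
  for (const field of SENSITIVE_FIELDS) {
    if (view[field] !== undefined) view[field] = maskValue(view[field])
  }
  return view
}

console.log(viewLogAs({ message: 'login ok', phone: '13523456789' }, 'read-only').phone)
// 1*********9
```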

Data access permission control is another key way to reduce the security risks of log data. By assigning each role its own log query scope, you isolate data between roles and can manage and filter sensitive data comprehensively.
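Role-based query scopes can be pictured as a filter applied before any results are returned. In this JavaScript sketch the role names and the scope format (exact field matches) are hypothetical; it denies by default, so a role without a configured scope sees nothing.

```javascript
// Hypothetical scopes: a role may only query logs whose fields match
// every condition in its scope. A Map avoids prototype keys such as
// 'toString' being mistaken for a configured role.
const ROLE_SCOPES = new Map([
  ['payments-team', { service: 'payment' }],
  ['web-team', { service: 'web', env: 'production' }],
])

function queryLogs(logs, role) {
  const scope = ROLE_SCOPES.get(role)
  if (!scope) return [] // deny by default
  return logs.filter((log) =>
    Object.entries(scope).every(([field, value]) => log[field] === value),
  )
}

const logs = [
  { service: 'payment', env: 'production', message: 'charge ok' },
  { service: 'web', env: 'production', message: 'GET /' },
  { service: 'web', env: 'staging', message: 'GET /debug' },
]
console.log(queryLogs(logs, 'web-team').map((l) => l.message))
```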

For more details, refer to Multi-role Data Access Permission Control.

Snapshots¶

A Guance snapshot is an instant copy of data that preserves the filter conditions for anomalous data together with the data records. When you need to share observability data, you can set data desensitization rules and choose a sharing method for the snapshot. Guance then generates an access link with the specified viewing permissions, so the shared data stays protected.

For more details, refer to Snapshots.

RUM¶

When collecting data about user visits, the RUM (Real User Monitoring) SDK can intercept and modify the data, preventing sensitive data from leaking.
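The SDK's interception hooks are described in SDK Data Interception and Data Modification; this sketch does not call the SDK. It only shows the kind of scrubbing logic such a hook could run, assuming a hypothetical event shape with a `view.url` field and a hypothetical list of sensitive query parameters.

```javascript
// Hypothetical: query-string keys treated as sensitive in this example.
const SENSITIVE_PARAMS = ['token', 'phone', 'email']

// Replace the values of sensitive query parameters with 'REDACTED'.
function redactUrl(rawUrl) {
  const url = new URL(rawUrl)
  for (const key of SENSITIVE_PARAMS) {
    if (url.searchParams.has(key)) url.searchParams.set(key, 'REDACTED')
  }
  return url.toString()
}

// Hypothetical event shape: only `view.url` is scrubbed here.
function scrubEvent(event) {
  if (event.view && typeof event.view.url === 'string') {
    event.view.url = redactUrl(event.view.url)
  }
  return event
}

console.log(redactUrl('https://example.com/pay?token=abc&x=1'))
// https://example.com/pay?token=REDACTED&x=1
```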

For more details, refer to SDK Data Interception and Data Modification.

Session Replay Privacy Settings¶

Session Replay provides privacy controls so that no organization exposes sensitive or personal data, and recorded data is stored encrypted. The default privacy options of Session Replay are designed to protect end-user privacy and prevent sensitive organizational information from being collected.

Global Configuration¶

When Session Replay is enabled, sensitive elements can be blocked automatically so that the RUM SDK does not record them.

To enable your privacy settings, set the defaultPrivacyLevel to mask-user-input, mask, or allow in your SDK configuration.

import { datafluxRum } from '@cloudcare/browser-rum'

datafluxRum.init({
  applicationId: '<DATAFLUX_APPLICATION_ID>',
  datakitOrigin: '<DATAKIT ORIGIN>',
  service: 'browser',
  env: 'production',
  version: '1.0.0',
  sessionSampleRate: 100,
  sessionReplaySampleRate: 100,
  trackInteractions: true,
  defaultPrivacyLevel: 'mask-user-input', // or 'mask' or 'allow'
})

datafluxRum.startSessionReplayRecording()

The privacy options below are set globally through defaultPrivacyLevel. After updating the configuration, you can also override them on individual HTML document elements:

Mask user input mode: Blocks most form fields such as inputs, text areas, and checkbox values, while recording all other text as-is. Inputs are replaced with three asterisks (***), and text areas are obfuscated with x characters preserving spaces.

Note

mask-user-input is the default privacy setting when Session Replay is enabled.

Mask mode: Blocks all HTML text, user input, images, and links. Text on the application is replaced with Xs, rendering the page as a wireframe.
Allow mode: Records all data.
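To make the masking modes concrete, this JavaScript sketch reproduces the documented behavior of mask-user-input and mask on plain strings. It is not the RUM SDK's implementation: the function names are hypothetical, and keeping whitespace in mask mode is an assumption based on the wireframe description.

```javascript
// Replace every non-whitespace character, keeping spaces and line breaks.
function obfuscateText(text, replacement) {
  return text.replace(/\S/g, replacement)
}

// mask-user-input: input values become '***'; textarea text becomes x's.
function maskFormValue(tagName, value) {
  if (tagName === 'INPUT') return '***'
  if (tagName === 'TEXTAREA') return obfuscateText(value, 'x')
  return value
}

// mask: all page text is replaced with X (assumed to keep whitespace).
function maskPageText(text) {
  return obfuscateText(text, 'X')
}

console.log(maskFormValue('TEXTAREA', 'hi there')) // xx xxxxx
```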

Some restrictions:

For data security considerations, regardless of the defaultPrivacyLevel you configure, the following elements will always be blocked:

Input elements of password, email, and tel types;
Elements with an autocomplete attribute, such as credit card numbers, expiration dates, and security codes.
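The always-blocked rules can be expressed as a predicate. Here an element is a plain object with hypothetical `tagName`, `type`, and `autocomplete` properties so the sketch runs outside a browser. The set of sensitive autocomplete values is an assumption: the standard HTML tokens for card number, expiry, and security code. The SDK's actual check may differ.

```javascript
// Input types that are always blocked, per the list above.
const ALWAYS_BLOCKED_INPUT_TYPES = new Set(['password', 'email', 'tel'])
// Assumption: standard HTML autocomplete tokens for payment card data.
const SENSITIVE_AUTOCOMPLETE = new Set([
  'cc-number', 'cc-exp', 'cc-exp-month', 'cc-exp-year', 'cc-csc',
])

// `el` is a plain object standing in for a DOM element (hypothetical shape).
function isAlwaysBlocked(el) {
  const tag = (el.tagName || '').toUpperCase()
  const type = (el.type || '').toLowerCase()
  if (tag === 'INPUT' && ALWAYS_BLOCKED_INPUT_TYPES.has(type)) return true
  // autocomplete may hold several space-separated tokens
  const tokens = (el.autocomplete || '').toLowerCase().split(/\s+/)
  return tokens.some((t) => SENSITIVE_AUTOCOMPLETE.has(t))
}

console.log(isAlwaysBlocked({ tagName: 'input', type: 'tel' })) // true
```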

Custom Configuration¶

Session Replay lets you block specific sensitive elements, such as phone numbers, so you can decide what to block based on your business needs. The following methods are available:

Blocking via Element Attributes¶

Add the data-gc-privacy attribute to the elements you want to block. The attribute supports the following four values:

• allow: Allows data collection without blocking.
• mask: Masks content, displaying it in masked form.
• mask-user-input: Masks user input to prevent recording sensitive input data.
• hidden: Completely hides the content.

Example code:

<!-- Allow data collection -->
<div class="mobile" data-gc-privacy="allow">13523xxxxx</div>

<!-- Mask content -->
<div class="mobile" data-gc-privacy="mask">13523xxxxx</div>

<!-- Mask user input -->
<input class="mobile" data-gc-privacy="mask-user-input" value="13523xxxxx" />

<!-- Hide content -->
<div class="mobile" data-gc-privacy="hidden">13523xxxxx</div>

Blocking via Element Class Names¶

You can also block elements by adding specific class names to them. The currently supported class names are:

• gc-privacy-allow: Allows data collection.
• gc-privacy-mask: Masks content.
• gc-privacy-mask-user-input: Masks user input.
• gc-privacy-hidden: Completely hides content.

Example code:

<!-- Allow data collection -->
<div class="mobile gc-privacy-allow">13523xxxxx</div>

<!-- Mask content -->
<div class="mobile gc-privacy-mask">13523xxxxx</div>

<!-- Mask user input -->
<input class="mobile gc-privacy-mask-user-input" value="13523xxxxx" />

<!-- Hide content -->
<div class="mobile gc-privacy-hidden">13523xxxxx</div>
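Both mechanisms express the same four levels. The sketch below resolves the level for an element represented as a plain object with hypothetical `attributes` and `classList` properties. The precedence it uses (attribute first, then class names, then defaultPrivacyLevel) is an assumption for illustration only; the actual priority rules are defined in the project documentation.

```javascript
const LEVELS = ['allow', 'mask', 'mask-user-input', 'hidden']
const CLASS_PREFIX = 'gc-privacy-'

// Assumed precedence: data-gc-privacy attribute, then class name, then default.
function resolvePrivacyLevel(el, defaultLevel) {
  const attr = el.attributes && el.attributes['data-gc-privacy']
  if (LEVELS.includes(attr)) return attr
  for (const cls of el.classList || []) {
    if (cls.startsWith(CLASS_PREFIX)) {
      const level = cls.slice(CLASS_PREFIX.length)
      if (LEVELS.includes(level)) return level
    }
  }
  return defaultLevel
}

console.log(resolvePrivacyLevel({ classList: ['mobile', 'gc-privacy-mask'] }, 'mask-user-input'))
// mask
```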

Using `shouldMaskNode` to Implement Custom Node Blocking Policies¶

In some scenarios, specific DOM nodes need customized blocking. For example, an application with strict security requirements may need to block all page text that contains numbers. You can meet this requirement by configuring the shouldMaskNode callback, which enables more flexible privacy control policies.

import { datafluxRum } from '@cloudcare/browser-rum'

datafluxRum.init({
  applicationId: '<DATAFLUX_APPLICATION_ID>',
  datakitOrigin: '<DATAKIT ORIGIN>',
  service: 'browser',
  env: 'production',
  version: '1.0.0',
  sessionSampleRate: 100,
  sessionReplaySampleRate: 100,
  trackInteractions: true,
  defaultPrivacyLevel: 'mask-user-input', // or 'mask' or 'allow'
  shouldMaskNode: (node, privacyLevel) => {
    if (node.nodeType === Node.TEXT_NODE) {
      // If it's a text node, check if the content contains numbers
      const textContent = node.textContent || ''
      return /\d+/.test(textContent)
    }
    return false
  },
})

datafluxRum.startSessionReplayRecording()

In this example, shouldMaskNode evaluates every node and returns true only for text nodes that contain digits (such as amounts or phone numbers), so that content is masked automatically, strengthening privacy protection for user data.
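Because the callback takes a node and returns a boolean, its logic can live in a plain function and be checked outside the browser. This sketch relies only on the `nodeType` and `textContent` properties used in the example; `TEXT_NODE = 3` is the value of the DOM constant `Node.TEXT_NODE`, and the stub objects stand in for real nodes. The function name is hypothetical.

```javascript
// Same value as the DOM constant Node.TEXT_NODE, so this runs outside a browser.
const TEXT_NODE = 3

// Mask any text node whose content contains at least one digit.
function maskNodesWithDigits(node) {
  if (node.nodeType !== TEXT_NODE) return false
  return /\d/.test(node.textContent || '')
}

// Stub nodes standing in for real DOM nodes.
console.log(maskNodesWithDigits({ nodeType: TEXT_NODE, textContent: 'Total: $128' })) // true
console.log(maskNodesWithDigits({ nodeType: TEXT_NODE, textContent: 'Welcome back' })) // false
console.log(maskNodesWithDigits({ nodeType: 1, textContent: '13523456789' })) // false
```

In the browser, the same function could be passed as `shouldMaskNode` in place of the inline arrow function, keeping the behavior of the example.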

Some Suggestions

Priority Rules:

• If an element has both a data-gc-privacy attribute and a gc-privacy-* class name, follow the priority rules outlined in the project documentation.

Applicable Scenarios:

• allow: Suitable for regular data that does not need to be blocked.
• mask: Suitable for sensitive data that should be displayed masked, such as phone numbers.
• mask-user-input: Suitable for protecting input content, such as form fields with personal details (password inputs are always blocked regardless of this setting).
• hidden: Suitable for content that should be neither displayed nor recorded.

Best Practices:

• Prefer simple, explicit methods (class names or attributes) so the configuration stays accurate.
• In highly sensitive scenarios, such as forms that collect personal information, consider using mask-user-input or hidden.

Through the above methods, you can flexibly configure blocking rules for sensitive elements, enhancing data security and meeting business compliance requirements.