Data Security¶

In the era of cloud computing, data security is paramount. Comprehensive data protection improves visibility and insight, warns of security risks automatically, and strengthens overall defenses, helping to ensure data availability, security, and compliance.

When you use Guance, its built-in tools assess and handle risks in the data it receives.

How to Reduce Data Risks?¶

Guance collects monitoring information from your infrastructure and services and manages it centrally, so that you can analyze and process it at any time. During this process, your servers send various types of data to Guance. Most data collected through normal use of Guance products does not contain personal privacy information. For personal data that may be included unnecessarily, we provide detailed explanations and recommendations to prevent confusion. Guance offers multiple ways to help you reduce data risks.

Data Security Considerations on the DataKit Side¶

HTTPS Data Upload¶

All data from DataKit is uploaded using the HTTPS protocol, ensuring the security of data communication.

Limited Push Mechanism¶

All requests are initiated by DataKit; the center cannot issue commands for DataKit to execute. DataKit can only periodically pull certain configurations (such as Pipeline and blacklist configurations) from the center.

Field Value Masking During Tracing Collection¶

During Tracing collection, the execution process of some SQL statements may be collected. The field values in these SQL statements will be masked, for example:

SELECT name from class where name = 'zhangsan'

will be masked as

SELECT name from class where name = ?
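
The effect of this masking can be pictured with a small sketch. The function below is a naive illustration written for this page, not DataKit's actual implementation; the name maskSqlLiterals and the two regular expressions are assumptions.

```javascript
// Illustrative only: a naive literal masker, not DataKit's actual implementation.
function maskSqlLiterals(sql) {
  return sql
    // Replace single-quoted string literals ('' is an escaped quote inside a literal).
    .replace(/'(?:[^']|'')*'/g, '?')
    // Replace standalone numeric literals such as 42 or 3.14.
    .replace(/\b\d+(?:\.\d+)?\b/g, '?');
}

console.log(maskSqlLiterals("SELECT name from class where name = 'zhangsan'"));
// SELECT name from class where name = ?
```

The structure of the statement stays visible for troubleshooting, while the values it carried do not.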

Pipeline and Blacklist Mechanisms¶

If data contains sensitive information that cannot be removed during collection, specific Pipeline functions can be used to mask it. For example, the cover() function replaces parts of a string with *, which suits values such as phone numbers.
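
To make the idea of replacing part of a string with * concrete, here is a minimal JavaScript sketch. It is not Pipeline syntax and not the cover() function itself; maskRange is a hypothetical helper, and the 1-based inclusive range is an assumption chosen for readability.

```javascript
// Hypothetical helper for illustration; this is not the Pipeline cover() function.
// Replaces the characters from position `start` to `end` (1-based, inclusive) with '*'.
function maskRange(value, start, end) {
  return Array.from(value)
    .map((ch, i) => (i + 1 >= start && i + 1 <= end ? '*' : ch))
    .join('');
}

console.log(maskRange('13812345678', 4, 7)); // 138****5678
```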

Additionally, by configuring blacklist rules, the upload of some sensitive data can also be prevented.

Sensitive Data Scanning¶

The sensitive data scanning feature can identify, mark, and edit data that contains personal privacy information and many other types of risky data. As a line of defense, it effectively prevents sensitive data from leaking.

For more details, refer to Sensitive Data Scanning.

Logs¶

Using Guance products and services generates a large volume of log records. Because log data is strongly correlated, specific rules are needed during collection and analysis to filter it.

By configuring sensitive fields for log data, you ensure that members with the corresponding permissions see only the masked log data.

Data access permission control is another key way to reduce log data security risks. By configuring log data query scopes for different roles, you isolate data between roles and can manage and filter sensitive data comprehensively.

For more details, refer to Multi-Role Data Access Permission Control.

Snapshots¶

A Guance snapshot is an instant copy of data that contains the filter conditions and data records for anomalous data. When you need to share monitoring data, you can set data masking rules and choose the sharing method when sharing the snapshot. This generates an access link with the specified viewing permissions, which protects the shared data.

For more details, refer to Snapshots.

RUM¶

When collecting user access data, the RUM (Real User Monitoring) SDK can modify and intercept the data through custom logic, preventing sensitive data from being sent.

For more details, refer to SDK Data Interception and Data Modification.

Web RUM SDK¶

The following explains how the RUM SDK uses Cookies and which alternative mechanism is available (compliance disclosure).

Under the default configuration, to implement session identification and user identification functions, the Guance RUM SDK writes two types of Cookies in the monitored application:

Cookies prefixed with _gc_s_: Store information about the current access session, in order to correlate and count user behavior within one access cycle.
Cookies prefixed with _gc_usr_: Store user identification information, in order to persistently identify the same end user across different access sessions.

The aforementioned Cookies do not contain the user's real identity information and are only used for session management of monitoring data and anonymous user identification.

If writing Cookies is not allowed in specific business scenarios (such as privacy compliance requirements, user refusal to use Cookies, browser restrictions, etc.), the RUM SDK supports the following initialization configuration:

sessionPersistence: "local-storage"

This stores the aforementioned session information and user identification in LocalStorage, thereby continuing to provide session and user correlation capabilities without writing any Cookies.
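
As a sketch of how this option might be applied conditionally, the helper below builds the SDK configuration from a consent flag. buildRumConfig and cookiesAllowed are hypothetical names introduced here; only the sessionPersistence option and its "local-storage" value come from the text above.

```javascript
// Sketch: choose where the RUM SDK keeps session and user identifiers.
// `cookiesAllowed` is a hypothetical flag supplied by your application,
// for example from a consent banner.
function buildRumConfig(baseConfig, cookiesAllowed) {
  if (cookiesAllowed) {
    // Default behavior: the SDK writes the _gc_s_ and _gc_usr_ Cookies.
    return { ...baseConfig };
  }
  // No Cookies: keep the same identifiers in LocalStorage instead.
  return { ...baseConfig, sessionPersistence: 'local-storage' };
}

const rumConfig = buildRumConfig(
  { applicationId: '<DATAFLUX_APPLICATION_ID>', datakitOrigin: '<DATAKIT ORIGIN>' },
  false
);
// In your application, pass the result to datafluxRum.init(rumConfig).
console.log(rumConfig.sessionPersistence); // local-storage
```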

Session Replay Privacy Settings¶

Session Replay provides privacy controls so that organizations do not expose sensitive or personal data, and the recorded data is stored encrypted. The default privacy options for Session Replay are designed to protect end-user privacy and prevent sensitive organizational information from being collected.

Global Configuration¶

When Session Replay is enabled, sensitive elements can be masked automatically so that the RUM SDK does not record them.

To configure your privacy settings, set defaultPrivacyLevel to mask-user-input, mask, or allow in your SDK configuration.

import { datafluxRum } from '@cloudcare/browser-rum'

datafluxRum.init({
  applicationId: '<DATAFLUX_APPLICATION_ID>',
  datakitOrigin: '<DATAKIT ORIGIN>',
  service: 'browser',
  env: 'production',
  version: '1.0.0',
  sessionSampleRate: 100,
  sessionReplaySampleRate: 100,
  trackInteractions: true,
  // Choose exactly one of: 'mask-user-input', 'mask', 'allow'
  defaultPrivacyLevel: 'mask-user-input',
})

datafluxRum.startSessionReplayRecording()

After updating the configuration, you can override elements in the HTML document using the following privacy options:

Mask user input mode: Masks most form fields, such as input, textarea, and checkbox values, while recording all other text as is. Inputs are replaced with three asterisks (***), and textareas are obfuscated with x characters that preserve spaces.

Note

By default, mask-user-input is the privacy setting when Session Replay is enabled.

Mask mode: Masks all HTML text, user input, images, and links. Text on the application is replaced with X, rendering the page as a wireframe.

Allow mode: Records all data.
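
The two obfuscation styles described for mask-user-input mode can be sketched as follows. These functions illustrate the visible result only and are not the SDK's own code; the names maskInputValue and maskTextareaValue are assumptions.

```javascript
// Illustration of the obfuscation styles described above; not the SDK's own code.

// Input values: always shown as three asterisks, whatever their length.
function maskInputValue(_value) {
  return '***';
}

// Textarea content: every non-whitespace character becomes 'x'; whitespace is kept.
function maskTextareaValue(value) {
  return value.replace(/\S/g, 'x');
}

console.log(maskInputValue('13812345678'));     // ***
console.log(maskTextareaValue('hello world'));  // xxxxx xxxxx
```

Because the input replacement has a fixed length, a replay reveals neither the content nor the length of what the user typed.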

Some limitations:

For data security reasons, the following elements are always masked, regardless of the mode you configure for defaultPrivacyLevel:

Input elements of type password, email, and tel;
Elements with the autocomplete attribute, such as credit card numbers, expiration dates, and security codes.
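
The rule above can be expressed as a small predicate. This sketch uses plain objects in place of DOM elements so that it runs anywhere. Treating any non-empty autocomplete value as sensitive is an assumption made here; the SDK's internal check may be narrower.

```javascript
// Sketch of the always-masked rule, using plain objects instead of DOM elements.
// Assumption: any non-empty autocomplete value counts as sensitive.
const ALWAYS_MASKED_INPUT_TYPES = new Set(['password', 'email', 'tel']);

function isAlwaysMasked(el) {
  const isSensitiveInput =
    el.tagName === 'INPUT' && ALWAYS_MASKED_INPUT_TYPES.has(el.type);
  const hasAutocomplete =
    typeof el.autocomplete === 'string' && el.autocomplete !== '';
  return isSensitiveInput || hasAutocomplete;
}

console.log(isAlwaysMasked({ tagName: 'INPUT', type: 'password' }));                     // true
console.log(isAlwaysMasked({ tagName: 'INPUT', type: 'text', autocomplete: 'cc-number' })); // true
console.log(isAlwaysMasked({ tagName: 'INPUT', type: 'text' }));                         // false
```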

Custom Configuration¶

Session Replay supports masking sensitive elements. You can choose what to mask according to business requirements, such as phone numbers and other sensitive information. The specific methods are as follows:

Configuring Masking via Element Attributes¶

You can add the data-gc-privacy attribute to elements that need masking. It supports the following four values:

• allow: Allows data collection, with no masking.
• mask: Masks the content, displaying it in a masked form.
• mask-user-input: Masks user input, preventing the recording of sensitive input data.
• hidden: Completely hides the content.

Example code:

<!-- Allow data collection -->
<div class="mobile" data-gc-privacy="allow">13523xxxxx</div>

<!-- Mask content -->
<div class="mobile" data-gc-privacy="mask">13523xxxxx</div>

<!-- Mask user input -->
<input class="mobile" data-gc-privacy="mask-user-input" value="13523xxxxx" />

<!-- Hide content -->
<div class="mobile" data-gc-privacy="hidden">13523xxxxx</div>

Configuring Masking via Element Class Names¶

You can also mask elements by adding specific class names to them. The following class names are currently supported:

• gc-privacy-allow: Allows data collection.
• gc-privacy-mask: Masks content.
• gc-privacy-mask-user-input: Masks user input.
• gc-privacy-hidden: Completely hides content.

Example code:

<!-- Allow data collection -->
<div class="mobile gc-privacy-allow">13523xxxxx</div>

<!-- Mask content -->
<div class="mobile gc-privacy-mask">13523xxxxx</div>

<!-- Mask user input -->
<input class="mobile gc-privacy-mask-user-input" value="13523xxxxx" />

<!-- Hide content -->
<div class="mobile gc-privacy-hidden">13523xxxxx</div>
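
The class names follow a single pattern: the gc-privacy- prefix followed by the same value the data-gc-privacy attribute accepts. The sketch below shows that correspondence. privacyLevelFromClass is a hypothetical helper, and returning the first matching class is an assumption, since the behavior with several privacy classes on one element is not specified here.

```javascript
// Sketch: map the class names listed above to their privacy levels.
const PRIVACY_CLASS_PREFIX = 'gc-privacy-';
const PRIVACY_LEVELS = ['allow', 'mask', 'mask-user-input', 'hidden'];

// Returns the privacy level named in a class attribute string, or null if none is present.
// Assumption: when several privacy classes appear, the first one wins.
function privacyLevelFromClass(classAttr) {
  for (const name of classAttr.split(/\s+/)) {
    if (name.startsWith(PRIVACY_CLASS_PREFIX)) {
      const level = name.slice(PRIVACY_CLASS_PREFIX.length);
      if (PRIVACY_LEVELS.includes(level)) return level;
    }
  }
  return null;
}

console.log(privacyLevelFromClass('mobile gc-privacy-mask')); // mask
console.log(privacyLevelFromClass('mobile'));                 // null
```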

Using `shouldMaskNode` to Implement Custom Node Masking Strategies¶

In some special scenarios, you may need customized masking for specific DOM nodes. For example, in a high-security application you may want to mask all text content on a page that contains numbers. You can do this by configuring the shouldMaskNode callback function, which allows more flexible privacy control strategies.

import { datafluxRum } from '@cloudcare/browser-rum'

datafluxRum.init({
  applicationId: '<DATAFLUX_APPLICATION_ID>',
  datakitOrigin: '<DATAKIT ORIGIN>',
  service: 'browser',
  env: 'production',
  version: '1.0.0',
  sessionSampleRate: 100,
  sessionReplaySampleRate: 100,
  trackInteractions: true,
  // Choose exactly one of: 'mask-user-input', 'mask', 'allow'
  defaultPrivacyLevel: 'mask-user-input',
  shouldMaskNode: (node, privacyLevel) => {
    if (node.nodeType === Node.TEXT_NODE) {
      // If it's a text node, check if the content contains numbers
      const textContent = node.textContent || ''
      return /\d+/.test(textContent)
    }
    return false
  },
})

datafluxRum.startSessionReplayRecording()

In the above example, shouldMaskNode evaluates every text node. If the content contains digits (such as amounts or phone numbers), the node is masked, which strengthens privacy protection for user data.
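
Because the callback is a plain function, it can be checked in isolation before it is wired into the SDK. The sketch below extracts the same logic into a named function and calls it with stub objects. maskTextWithDigits and the stubs are illustrative names; the constants 3 and 1 are the standard DOM values of Node.TEXT_NODE and Node.ELEMENT_NODE, written out so the sketch also runs outside a browser.

```javascript
// Standard DOM node type values, spelled out so this runs outside a browser.
const TEXT_NODE = 3;
const ELEMENT_NODE = 1;

// Same logic as the shouldMaskNode callback above, extracted into a named function.
function maskTextWithDigits(node, privacyLevel) {
  if (node.nodeType === TEXT_NODE) {
    return /\d/.test(node.textContent || '');
  }
  return false;
}

console.log(maskTextWithDigits({ nodeType: TEXT_NODE, textContent: 'Balance: 1,250.00' })); // true
console.log(maskTextWithDigits({ nodeType: TEXT_NODE, textContent: 'Welcome back' }));      // false
console.log(maskTextWithDigits({ nodeType: ELEMENT_NODE, textContent: 'Order 42' }));       // false
```

The last call returns false because the function only inspects text nodes; the digits inside an element are evaluated when its child text node is visited.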

Some Recommendations

Priority Rules:

• If both the data-gc-privacy attribute and class name are set, it is recommended to determine the priority according to the project documentation.

Applicable Scenarios:

• allow: Suitable for regular data that does not require masking.
• mask: Suitable for sensitive data that needs to be displayed in a masked form, such as phone numbers.
• mask-user-input: Suitable for scenarios where input content needs protection, such as password fields.
• hidden: Suitable for content that you do not wish to display or record.

Best Practices:

• Prioritize simple and clear methods (such as class names or attributes) to ensure accurate configuration.
• In high-sensitivity data scenarios, such as user privacy forms, it is recommended to use mask-user-input or hidden.

Through the above methods, you can flexibly configure masking rules for sensitive elements, improving data security and meeting business compliance requirements.