Data Security¶

In the era of cloud computing, data security is paramount. Comprehensive data protection capabilities improve visibility and insight and automatically warn of security risks, which strengthens overall defense and helps keep data available, secure, and compliant.

When using Guance, its built-in tools perform risk assessment and processing on the received data.

How to Reduce Data Risks?¶

Guance collects monitoring information from your infrastructure and services and manages it centrally so that you can analyze and process it conveniently. During this process, servers running Guance send various types of data. Data collected through normal use of Guance products mostly does not contain personal privacy information. For non-essential personal data that might still be included, we provide detailed explanations and recommendations to prevent confusion. Guance offers multiple ways to help you reduce data risks.

Data Security Considerations on the DataKit Side¶

HTTPS Data Upload¶

All data from DataKit is uploaded using the HTTPS protocol, ensuring the security of data communication.

Limited Push Mechanism¶

The center cannot issue commands for DataKit to execute; all requests are initiated by DataKit. DataKit can only periodically pull certain configurations (such as Pipeline and blacklist configurations) from the center.

Field Value Masking During Tracing Collection¶

During Tracing collection, some SQL statement execution processes might be collected. The field values in these SQL statements will be masked, for example:

SELECT name from class where name = 'zhangsan'

will be masked as

SELECT name from class where name = ?
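The effect of this masking can be sketched in a few lines of JavaScript. This is only an illustration of the idea, not DataKit's actual implementation; the function name `maskSqlLiterals` and the regular expressions are assumptions made for this example.

```javascript
// Illustrative only: replaces string and numeric literals in a SQL statement
// with "?" placeholders. This is NOT DataKit's actual implementation.
function maskSqlLiterals(sql) {
  return sql
    // Single-quoted strings, allowing '' as an escaped quote inside the literal.
    .replace(/'(?:[^']|'')*'/g, '?')
    // Standalone numeric literals such as 42 or 3.14.
    .replace(/\b\d+(?:\.\d+)?\b/g, '?')
}

console.log(maskSqlLiterals("SELECT name from class where name = 'zhangsan'"))
// SELECT name from class where name = ?
```

Because only literal values are replaced, the shape of the query stays visible for performance analysis while the values themselves are not uploaded.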

Pipeline and Blacklist Mechanism¶

If the data indeed contains some sensitive data that cannot be removed during the collection process, specific functions in Pipeline (such as the cover() function which can replace parts of a string with *) can be used to mask sensitive data (like phone numbers, etc.).

Additionally, by configuring blacklist rules, the upload of some sensitive data can also be blocked.
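To make the range-masking idea concrete, the following JavaScript sketch replaces a range of characters with `*`. It is not Pipeline syntax and not the cover() function itself; the helper name `coverRange` and its 1-based, inclusive positions are assumptions chosen for this illustration.

```javascript
// Hypothetical helper that mirrors the idea of range masking.
// It is not the Pipeline cover() function itself.
function coverRange(value, start, end) {
  // start and end are 1-based, inclusive positions (an assumption for this sketch).
  return Array.from(value)
    .map((ch, i) => (i + 1 >= start && i + 1 <= end ? '*' : ch))
    .join('')
}

console.log(coverRange('13812345678', 4, 7)) // 138****5678
```

Keeping the first and last digits visible usually preserves enough context for troubleshooting while hiding the part that identifies a person.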

Sensitive Data Scanning¶

The sensitive data scanning feature can identify, tag, and edit data that carries risks such as personal privacy information. It acts as a line of defense that helps prevent sensitive data from leaking.

For more details, refer to Sensitive Data Scanning.

Logs¶

Using Guance's product services generates numerous log records. Because log data is highly correlated, specific rules need to be applied during collection and analysis to filter the massive amounts of log data.

By configuring sensitive fields for log data, members with corresponding permissions can only see the masked log data.

Data access permission control is another key method to reduce log data security risks. By configuring corresponding log data access and query scopes for different roles, data is isolated, achieving the purpose of comprehensive management and filtering of sensitive data.

For more details, refer to Multi-Role Data Access Control.

Snapshots¶

Guance's snapshot service creates an instant copy of data that contains the filter conditions and data records for abnormal data. When you need to share monitoring data, you can set data masking rules or choose the sharing method for a snapshot, which generates an access link with specified viewing permissions and protects the shared data.

For more details, refer to Snapshots.

RUM¶

When collecting data on user access, the RUM (Real User Monitoring) SDK can modify and intercept data through custom logic to prevent sensitive data from being sent.

For more details, refer to SDK Data Interception and Modification.

Web RUM SDK¶

The following is an explanation of RUM SDK's use of Cookies and alternative mechanisms (compliance disclosure).

Under default configuration, the Guance RUM SDK writes two types of Cookies into the monitored application to achieve session identification and user identification functions:

Cookies prefixed with _gc_s_: Store information about the current access session, so that user behavior within one access cycle can be associated and aggregated.
Cookies prefixed with _gc_usr_: Store user identification information, which is used to continuously identify the same end user across different access sessions.

The aforementioned Cookies do not contain the user's real identity information and are only used for session management and anonymous user identification in monitoring data.

In specific business scenarios (such as privacy compliance requirements, user refusal to use Cookies, browser restrictions, etc.) where writing Cookies is not allowed, the RUM SDK supports initialization configuration:

sessionPersistence: "local-storage"

This stores the aforementioned session information and user identification in LocalStorage, thus continuing to provide session and user association capabilities without writing any Cookies.
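One way to apply this setting conditionally is to build the init options from the user's consent state. The sketch below is an assumption-laden example: `buildRumOptions` and the `cookiesAllowed` flag are hypothetical names that stand in for your own consent logic, and only the `sessionPersistence: "local-storage"` value comes from the SDK configuration described above.

```javascript
// Builds the options object that would be passed to datafluxRum.init().
// `cookiesAllowed` is a hypothetical flag supplied by your own consent logic.
function buildRumOptions(baseOptions, cookiesAllowed) {
  if (cookiesAllowed) {
    // Leave sessionPersistence unset so the SDK keeps its default Cookie behavior.
    return { ...baseOptions }
  }
  // Store session and user identification in LocalStorage instead of Cookies.
  return { ...baseOptions, sessionPersistence: 'local-storage' }
}

const options = buildRumOptions(
  { applicationId: '<DATAFLUX_APPLICATION_ID>', datakitOrigin: '<DATAKIT ORIGIN>' },
  false
)
console.log(options.sessionPersistence) // local-storage
```

The resulting object can then be passed to `datafluxRum.init(options)` in place of a hard-coded configuration.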

Session Replay Privacy Settings¶

Session Replay provides privacy controls to ensure that companies do not expose sensitive or personal data, and the recorded data is stored encrypted. The default privacy options for Session Replay are designed to protect end-user privacy and prevent sensitive organizational information from being collected.

Global Configuration¶

When Session Replay is enabled, sensitive elements can be masked automatically so that the RUM SDK does not record them.

To enable your privacy settings, set defaultPrivacyLevel to mask-user-input, mask, or allow in your SDK configuration.

import { datafluxRum } from '@cloudcare/browser-rum'

datafluxRum.init({
  applicationId: '<DATAFLUX_APPLICATION_ID>',
  datakitOrigin: '<DATAKIT ORIGIN>',
  service: 'browser',
  env: 'production',
  version: '1.0.0',
  sessionSampleRate: 100,
  sessionReplaySampleRate: 100,
  trackInteractions: true,
  // Choose exactly one of: 'mask-user-input', 'mask', 'allow'
  defaultPrivacyLevel: 'mask-user-input',
})

datafluxRum.startSessionReplayRecording()

After updating the configuration, you can override elements of the HTML document using the following privacy options:

Mask user input mode: Masks most form fields, such as input, textarea, and checkbox values, while recording all other text as is. Inputs are replaced with three asterisks (***), and text areas are obfuscated with x characters that preserve spacing.

Note

By default, mask-user-input is the privacy setting when session replay is enabled.

Mask mode: Masks all HTML text, user input, images, and links. Text in the application is replaced with X, rendering the page as a wireframe.

Allow mode: Records all data.
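The two replacement styles described for mask user input mode can be illustrated with a short JavaScript sketch. These functions are stand-ins written for this explanation; the names `maskInputValue` and `obfuscateText` are hypothetical, and the SDK's real logic may differ.

```javascript
// Illustrative stand-ins for the masking described above; the SDK's real logic may differ.
function maskInputValue(_value) {
  // Inputs are replaced with three asterisks regardless of their length.
  return '***'
}

function obfuscateText(text) {
  // Replace every non-whitespace character with "x", keeping spaces and line breaks.
  return text.replace(/\S/g, 'x')
}

console.log(maskInputValue('secret-token')) // ***
console.log(obfuscateText('call me at 10')) // xxxx xx xx xx
```

A fixed-length replacement for inputs also hides how long the original value was, while the x-character form keeps the layout of longer text recognizable in the replay.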

Some limitations:

For data security reasons, the following elements are always masked, regardless of the defaultPrivacyLevel mode you configure:

Input elements of type password, email, and tel;
Elements with the autocomplete attribute, such as credit card numbers, expiration dates, and security codes.
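When auditing a form, it can help to check which fields fall under these always-masked rules. The following sketch applies the two rules literally to plain objects that stand in for DOM elements; `isAlwaysMasked` and the object shape are assumptions for this example, not part of the SDK, and the SDK may apply stricter or narrower matching.

```javascript
const ALWAYS_MASKED_INPUT_TYPES = ['password', 'email', 'tel']

// `element` is a plain object standing in for a DOM element:
// { tagName, type, autocomplete }. This shape is an assumption for the sketch.
function isAlwaysMasked(element) {
  const isSensitiveInput =
    element.tagName === 'INPUT' && ALWAYS_MASKED_INPUT_TYPES.includes(element.type)
  // Treat any non-empty autocomplete attribute as a match, following the rule as stated.
  const hasAutocomplete = Boolean(element.autocomplete)
  return isSensitiveInput || hasAutocomplete
}

console.log(isAlwaysMasked({ tagName: 'INPUT', type: 'password' })) // true
console.log(isAlwaysMasked({ tagName: 'INPUT', type: 'text' }))     // false
```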

Custom Configuration¶

Session Replay supports the masking function for sensitive elements. You can flexibly set the content that needs to be masked according to business requirements, such as sensitive information like phone numbers. The following are the specific operation methods:

Configuring Masking via Element Attributes¶

You can add the data-gc-privacy attribute to elements that need masking, supporting the following four attribute values:

• allow: Allows data collection, no masking processing.
• mask: Masks the content, displaying the content in a masked form.
• mask-user-input: Masks user input, preventing the recording of sensitive input data.
• hidden: Completely hides the content.

Example code:

<!-- Allow data collection -->
<div class="mobile" data-gc-privacy="allow">13523xxxxx</div>

<!-- Mask content -->
<div class="mobile" data-gc-privacy="mask">13523xxxxx</div>

<!-- Mask user input -->
<input class="mobile" data-gc-privacy="mask-user-input" value="13523xxxxx" />

<!-- Hide content -->
<div class="mobile" data-gc-privacy="hidden">13523xxxxx</div>

Configuring Masking via Element Class Names¶

Masking can also be applied by adding specific class names to elements. The following class names are currently supported:

• gc-privacy-allow: Allows data collection.
• gc-privacy-mask: Masks content.
• gc-privacy-mask-user-input: Masks user input.
• gc-privacy-hidden: Completely hides content.

Example code:

<!-- Allow data collection -->
<div class="mobile gc-privacy-allow">13523xxxxx</div>

<!-- Mask content -->
<div class="mobile gc-privacy-mask">13523xxxxx</div>

<!-- Mask user input -->
<input class="mobile gc-privacy-mask-user-input" value="13523xxxxx" />

<!-- Hide content -->
<div class="mobile gc-privacy-hidden">13523xxxxx</div>
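Because the same element can carry both a data-gc-privacy attribute and a gc-privacy-* class name, a small check for disagreements between the two can be useful during code review. The sketch below is a hypothetical helper, not an SDK API: the function names are invented for this example, and it only reports conflicts without assuming which form takes priority.

```javascript
const PRIVACY_LEVELS = ['allow', 'mask', 'mask-user-input', 'hidden']

// Returns the privacy level encoded in a class string, or null if none is present.
function privacyLevelFromClass(className) {
  const tokens = className.split(/\s+/)
  for (const level of PRIVACY_LEVELS) {
    if (tokens.includes(`gc-privacy-${level}`)) return level
  }
  return null
}

// Flags elements whose data-gc-privacy attribute and class name disagree.
function hasConflictingPrivacy(attributeValue, className) {
  const fromClass = privacyLevelFromClass(className)
  return Boolean(attributeValue) && fromClass !== null && attributeValue !== fromClass
}

console.log(privacyLevelFromClass('mobile gc-privacy-mask'))           // mask
console.log(hasConflictingPrivacy('hidden', 'mobile gc-privacy-mask')) // true
```

In a browser, the two arguments could be read from `element.getAttribute('data-gc-privacy')` and `element.className`.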

Using `shouldMaskNode` to Implement Custom Node Masking Strategies¶

In certain special scenarios, it might be necessary to perform customized masking processing on specific DOM nodes. For example, in applications with high-security levels, it might be desirable to uniformly mask all text content containing numerical values on a page. This requirement can be achieved by configuring the shouldMaskNode callback function for more flexible privacy control strategies.

import { datafluxRum } from '@cloudcare/browser-rum'

datafluxRum.init({
  applicationId: '<DATAFLUX_APPLICATION_ID>',
  datakitOrigin: '<DATAKIT ORIGIN>',
  service: 'browser',
  env: 'production',
  version: '1.0.0',
  sessionSampleRate: 100,
  sessionReplaySampleRate: 100,
  trackInteractions: true,
  // Choose exactly one of: 'mask-user-input', 'mask', 'allow'
  defaultPrivacyLevel: 'mask-user-input',
  shouldMaskNode: (node, privacyLevel) => {
    if (node.nodeType === Node.TEXT_NODE) {
      // If it's a text node, check if the content contains numbers
      const textContent = node.textContent || ''
      return /\d+/.test(textContent)
    }
    return false
  },
})

datafluxRum.startSessionReplayRecording()

In the example above, the shouldMaskNode function checks every text node. If the content contains numbers (such as amounts or phone numbers), the node is masked, which strengthens privacy protection for user data.
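If you want to unit-test such a predicate outside a browser, where the global `Node` object is not available, one option is to extract it into a standalone function and compare against the numeric node type directly. The sketch below assumes plain objects as stand-ins for DOM nodes; `containsDigits` is a hypothetical name, and 3 is the value of `Node.TEXT_NODE` in the DOM specification.

```javascript
const TEXT_NODE = 3 // numeric value of Node.TEXT_NODE in the DOM specification

// Same rule as the callback above: mask text nodes whose content contains digits.
function containsDigits(node) {
  if (node.nodeType !== TEXT_NODE) return false
  return /\d/.test(node.textContent || '')
}

console.log(containsDigits({ nodeType: 3, textContent: 'Balance: 1200' })) // true
console.log(containsDigits({ nodeType: 3, textContent: 'Hello' }))         // false
```

The function can then be passed to the SDK as `shouldMaskNode: (node) => containsDigits(node)`.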

Some Recommendations

Priority Rules:
• If both the data-gc-privacy attribute and a class name are set on the same element, check the project documentation to determine which one takes priority.

Applicable Scenarios:
• allow: Suitable for regular data that does not require masking.
• mask: Suitable for sensitive data that needs to be displayed masked, such as phone numbers.
• mask-user-input: Suitable for scenarios where input content needs protection, such as password fields.
• hidden: Suitable for content that should not be displayed or recorded.

Best Practices:
• Prefer simple and clear methods (such as class names or attributes) and verify that the configuration is accurate.
• In high-sensitivity data scenarios, such as user privacy forms, it is recommended to use mask-user-input or hidden.

Through the above methods, you can flexibly configure masking rules for sensitive elements, improving data security and meeting business compliance requirements.