Data Forwarding
For data that needs long-term retention but is rarely updated (such as logs), the data forwarding feature can automatically save it to object storage or forward it in real time to external systems such as Kafka. Rules select the data you need, which enables low-cost long-term archiving while still supporting later secondary processing.
Once a rule takes effect, you can quickly search the stored data on the data forwarding page by selecting a query time range and the rule.
How It Works
When data is forwarded to object storage, the workflow is as follows: reported data that matches a rule is first written line by line to a temporary file on the server's local disk. When the temporary file reaches a preset size (e.g., 256 MB) or has been written to for longer than a set duration (e.g., 1 hour), the system closes it and opens a new temporary file to continue receiving data.
Meanwhile, a background service continuously scans the closed temporary files, compresses them with gzip to reduce their size, and uploads them to your specified object storage location according to a predefined path rule. When you search this stored data, the system uses the same path rule to locate the relevant files in object storage, downloads and decompresses them, and matches your search criteria line by line.
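The local write stage can be sketched as follows. This is a minimal illustration under stated assumptions, not the platform's implementation: the class name `TempFileRotator`, the `.tmp` suffix, and the helper `compress_closed_file` are hypothetical, and the upload to object storage is deliberately left out.

```python
import gzip
import os
import shutil
import tempfile
import time


class TempFileRotator:
    """Hypothetical sketch of the local write stage: records are appended line
    by line to a temporary file, which is closed and replaced once it reaches
    a size limit or has been open for longer than a time limit."""

    def __init__(self, directory, max_bytes=256 * 1024 * 1024, max_seconds=3600):
        self.directory = directory
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self.closed_files = []  # closed files waiting to be compressed/uploaded
        self._open_new()

    def _open_new(self):
        fd, self.path = tempfile.mkstemp(suffix=".tmp", dir=self.directory)
        self.file = os.fdopen(fd, "wb")
        self.size = 0
        self.opened_at = time.monotonic()

    def write_line(self, line):
        data = line.encode("utf-8") + b"\n"
        self.file.write(data)
        self.size += len(data)
        too_big = self.size >= self.max_bytes
        too_old = time.monotonic() - self.opened_at >= self.max_seconds
        if too_big or too_old:
            # Close the current file and start a new one.
            self.file.close()
            self.closed_files.append(self.path)
            self._open_new()

    def close(self):
        """Close the active file; drop it if nothing was written."""
        self.file.close()
        if self.size > 0:
            self.closed_files.append(self.path)
        else:
            os.remove(self.path)


def compress_closed_file(path):
    """Gzip a closed temporary file (as the background service would) and
    delete the uncompressed original. Uploading is not shown."""
    gz_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(path)
    return gz_path
```

Rotation is checked after each write, so a file can exceed the size limit by at most one line; a production service would also need to handle crashes between closing, compressing, and uploading.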
File Format Description
Files stored in object storage are gzip-compressed. After decompression, each file contains multiple lines of text: each line is one raw data record, preserved in full as JSON. Empty lines in the file are ignored.
In each record, date is a required field that identifies the time of the entry as a millisecond-level Unix timestamp, and message contains the log content. A typical data forwarding file for logs looks like this:
{"__docid":"L_1750649205520_d1cciupkac7k1683bhq0","__namespace":"backup_log","date":1750649205520,"date_ns":168000,"df_metering_size":-9223372036854775808,"filepath":"/var/log/datakit/gin.log","host":"X.local","log_read_lines":2,"message":"[GIN] 2025/06/23 - 11:26:43 | 200 | 1.012923708s | 127.0.0.1 | GET \"/metrics\"","message_length":87,"service":"default","source":"default","status":"unknown"}
{"__docid":"L_1750649205516_d1cciupkac7k1683bhqg","__namespace":"backup_log","date":1750649205516,"date_ns":897000,"df_metering_size":-9223372036854775808,"filepath":"/var/log/datakit/gin.log","host":"X.local","log_read_lines":1,"message":"[GIN] 2025/06/23 - 11:26:38 | 200 | 1.012696542s | 127.0.0.1 | GET \"/metrics\"","message_length":87,"service":"default","source":"default","status":"unknown"}
{"__docid":"L_1750649206520_d1cciupkac7k1683bhr0","__namespace":"backup_log","date":1750649206520,"date_ns":948000,"df_metering_size":-9223372036854775808,"filepath":"/var/log/datakit/log","host":"X.local","log_read_lines":150,"message":"2025-06-23T11:26:46.520+0800\tWARN\thost_processes\tprocess/input.go:332\tprocess: {\"pid\":411}, proc.PageFaults(): not implemented yet","message_length":130,"service":"default","source":"default","status":"unknown"}
{"__docid":"L_1750649205520_d1cciupkac7k1683bhrg","__namespace":"backup_log","date":1750649205520,"date_ns":419000,"df_metering_size":-9223372036854775808,"filepath":"/var/log/datakit/log","host":"X.local","log_read_lines":9,"message":"2025-06-23T11:26:43.876+0800\tWARN\tcontainer\tcontainer/impl.go:254\tendpoint unix:///var/run/crio/crio.sock does not exist, maybe it is not running, skip","message_length":151,"service":"default","source":"default","status":"unknown"}
{"__docid":"L_1750649205517_d1cciupkac7k1683bhs0","__namespace":"backup_log","date":1750649205517,"date_ns":79000,"df_metering_size":-9223372036854775808,"filepath":"/var/log/datakit/log","host":"X.local","log_read_lines":1,"message":"2025-06-23T11:26:38.365+0800\tWARN\thttp\thttpapi/http.go:494\tlistener.Close failed: close tcp [::]:9529: use of closed network connection","message_length":135,"service":"default","source":"default","status":"unknown"}
{"__docid":"L_1750649205517_d1cciupkac7k1683bhsg","__namespace":"backup_log","date":1750649205517,"date_ns":80000,"df_metering_size":-9223372036854775808,"filepath":"/var/log/datakit/log","host":"X.local","log_read_lines":2,"message":"2025-06-23T11:26:38.365+0800\tWARN\thttp\thttpapi/http.go:494\tlistener.Close failed: close tcp [::]:9529: use of closed network connection","message_length":135,"service":"default","source":"default","status":"unknown"}
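To inspect a downloaded file yourself, a minimal Python sketch could stream-decompress it, skip empty lines, parse each line as JSON, and convert the millisecond date field to a UTC datetime. The function names `iter_records` and `iter_records_from_bytes` and the substring match on message are illustrative assumptions, not the platform's search logic.

```python
import gzip
import io
import json
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def iter_records(fileobj, keyword=None):
    """Yield (utc_time, record) pairs from a gzip-compressed JSON Lines stream.

    Empty lines are skipped. When `keyword` is set, only records whose
    `message` contains it are yielded (an illustrative matching rule).
    """
    with gzip.open(fileobj, "rt", encoding="utf-8") as f:
        for line in f:  # stream line by line instead of loading the whole file
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            if keyword is not None and keyword not in record.get("message", ""):
                continue
            # `date` is a millisecond Unix timestamp; timedelta keeps it exact.
            yield EPOCH + timedelta(milliseconds=record["date"]), record


def iter_records_from_bytes(raw, keyword=None):
    """Convenience wrapper for data already downloaded into memory."""
    return iter_records(io.BytesIO(raw), keyword)
```

For a file on disk, pass the object returned by open(path, "rb"); with keyword="WARN", the sample above would yield only the four WARN lines from /var/log/datakit/log.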
File Naming and Storage Path
[{$path_prefix}/]{$workspace_uuid}/[{$data_type}/]{$rule_name}/{$year}/{$month}/{$day}/{$hour}/{$time}-{$hostname}.gz
Parts enclosed in [] are optional. See the table below for details:
| Variable | Description | Example | Remarks |
|---|---|---|---|
| $path_prefix | Path prefix | path/to/backup | Optional. Corresponds to the storage path option when creating a backup rule. Object storage does not support keys starting with /, so the prefix must not start with /. |
| $workspace_uuid | Workspace ID | wksp_d9a1851859e040469d290409bc17cceb | |
| $data_type | Backup data type. Possible values: logging (Logs), rum (RUM), tracing (Tracing), event (Events), audit_event (Audit Events) | tracing | Logs are the default data type, so for log data the {$data_type}/ part (i.e., logging/) must be omitted. |
| $rule_name | Rule name | backup_logging_for_test | Corresponds to the rule name option when creating a rule. English names are recommended. |
| $year | Year of the data occurrence time, 4 digits | 2025 | UTC |
| $month | Month of the data occurrence time, 2 digits | 03 | UTC |
| $day | Day of the data occurrence time, 2 digits | 01 | UTC |
| $hour | Hour of the data occurrence time, 2 digits | 22 | UTC |
| $time | Occurrence time of the last record in the file, in the format HHMMSSmmm (2-digit hour, minute, and second, followed by 3-digit milliseconds) | 220607889 | UTC |
| $hostname | First 16 characters of the MD5 hash of the hostname | c6a92aafa992599c | When constructing files yourself, you can use the file's crc64 or a random 64-bit number, converted to hexadecimal. |
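The naming rules in the table can be combined into a single function. This is a minimal sketch under stated assumptions: `build_object_key` and its parameter names are hypothetical, the hostname is hashed with MD5 as the table describes, and every time component is derived in UTC from the millisecond timestamp of the last record in the file.

```python
import hashlib
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def build_object_key(workspace_uuid, rule_name, last_record_ms, hostname,
                     data_type="logging", path_prefix=""):
    """Build an object key following the documented path template (sketch)."""
    t = EPOCH + timedelta(milliseconds=last_record_ms)  # UTC time of last record
    parts = []
    prefix = path_prefix.strip("/")  # object keys must not start with "/"
    if prefix:
        parts.append(prefix)
    parts.append(workspace_uuid)
    if data_type != "logging":  # log data omits the data_type segment
        parts.append(data_type)
    parts.append(rule_name)
    parts += [f"{t.year:04d}", f"{t.month:02d}", f"{t.day:02d}", f"{t.hour:02d}"]
    # $time is HHMMSSmmm: hour, minute, second, then 3-digit milliseconds.
    time_part = f"{t.hour:02d}{t.minute:02d}{t.second:02d}{t.microsecond // 1000:03d}"
    # $hostname is the first 16 hex characters of MD5(hostname).
    host_part = hashlib.md5(hostname.encode("utf-8")).hexdigest()[:16]
    parts.append(f"{time_part}-{host_part}.gz")
    return "/".join(parts)
```

For files you construct yourself, the table allows replacing the hostname hash with a crc64 of the file or a random 64-bit value; in Python, secrets.token_hex(8) produces 16 hexadecimal characters (64 bits).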
Path examples:
wksp_d9a1851859e040469d290409bc17cceb/backup_logging_for_test/2025/05/06/17/175950000-c6a92aafa992599c.gz
path/to/backup/wksp_d9a1851859e040469d290409bc17cceb/tracing/test-minio/2025/05/06/17/175950000-c6a92aafa992599c.gz
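Because every file sits under an hourly UTC directory, a search over a time range only needs to look at the hour prefixes that overlap it. The sketch below enumerates those prefixes; the function name `hour_prefixes` and its `base` argument (everything before the year segment) are hypothetical, and listing or downloading objects is left to your storage SDK.

```python
from datetime import datetime, timedelta, timezone


def hour_prefixes(base, start, end):
    """Return hourly key prefixes covering [start, end].

    `start` and `end` must be timezone-aware; they are converted to UTC
    because the year/month/day/hour path segments are in UTC.
    """
    t = start.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)
    end = end.astimezone(timezone.utc)
    prefixes = []
    while t <= end:
        prefixes.append(f"{base}/{t:%Y/%m/%d/%H}/")
        t += timedelta(hours=1)
    return prefixes
```

For example, a query from 11:26 to 11:40 in UTC+8 maps to the single UTC hour directory 03 of the same day.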
File Splitting Rules
- Time boundary: A single file only contains logs from the same hour; it never spans hours.
- Size boundary: The uncompressed raw file is kept between 256 MB and 512 MB; after gzip compression it is typically tens of MB to about 100 MB. Files that are much larger or smaller than this reduce search efficiency.
You can also upload externally generated files to object storage. As long as they follow the same file format and path rules as files produced by data forwarding rules, the console searches and displays them in the same way.
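The two splitting rules can be expressed as a batching function, for example when preparing your own files. This sketch is an assumption about one reasonable approach, not the platform's code: `split_into_files` is a hypothetical name, records are sorted by date, and a new batch starts whenever the UTC hour changes or the next line would push the batch past `max_bytes`. Only the upper size bound is enforced; reaching the lower bound depends on how much data you have.

```python
import json


def split_into_files(records, max_bytes=512 * 1024 * 1024):
    """Group records into per-file batches of JSON lines (sketch).

    A batch never spans two UTC hours, and its uncompressed size stays at or
    below max_bytes unless a single line is larger than that on its own.
    """
    batches, current, size, current_hour = [], [], 0, None
    for record in sorted(records, key=lambda r: r["date"]):
        line = json.dumps(record, ensure_ascii=False, separators=(",", ":")) + "\n"
        line_size = len(line.encode("utf-8"))
        hour = record["date"] // 3_600_000  # whole hours since the Unix epoch (UTC)
        if current and (hour != current_hour or size + line_size > max_bytes):
            batches.append(current)
            current, size = [], 0
        current.append(line)
        size += line_size
        current_hour = hour
    if current:
        batches.append(current)
    return batches
```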
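For externally generated files, the write step can be sketched as: serialize each record as one JSON line with a millisecond date, gzip the result, and upload it under a key that follows the path rules. The sketch below only builds the compressed payload; `encode_file` is a hypothetical name, the same-hour check reflects the time boundary described above, and the upload call depends on your object storage SDK.

```python
import gzip
import json


def encode_file(records):
    """Serialize records into gzip-compressed JSON Lines bytes (sketch).

    Each record must have an integer millisecond `date`, and all records must
    fall in the same UTC hour, because a file never spans hours.
    """
    if any(not isinstance(r.get("date"), int) for r in records):
        raise ValueError("every record needs an integer millisecond 'date' field")
    if len({r["date"] // 3_600_000 for r in records}) > 1:
        raise ValueError("records span more than one UTC hour")
    body = "".join(
        json.dumps(r, ensure_ascii=False, separators=(",", ":")) + "\n" for r in records
    )
    return gzip.compress(body.encode("utf-8"))
```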
Get Started
- Establish data forwarding rules that meet your business needs based on different archive types:
  - AWS S3
- Manage forwarding rules through a series of operations in the data forwarding rules list