Added a bounded Kodo dispatch queue that queues write/upload requests and limits their concurrency before they are sent to Kodo or the next-level Dataway:
Covers /v1/write/*, /v1/upload/*, and /v1/input/firehose requests; disk cache replay requests still follow the original synchronous sending path, so replayed requests do not occupy the queue a second time.
Enabled by default, with default settings: 256 workers, 1024 waiting slots, 1GB maximum total body bytes for the queue, and 100ms enqueue wait time.
Dataway returns 503 when queue slots or queue body bytes reach the limit and the wait times out.
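The admission behavior described above can be illustrated with a minimal sketch. It is not Dataway's implementation: the class and method names are hypothetical, workers are modeled as a semaphore instead of a worker pool, and the sketch assumes the slot and byte limits count a request from admission until its send completes.

```python
import threading


class BoundedDispatchQueue:
    """Hypothetical admission control in front of an upstream sender."""

    def __init__(self, workers=256, waiting=1024,
                 max_body_bytes=1 << 30, enqueue_wait=0.1):
        # Assumption: a request holds a slot while it waits and while it is sent.
        self._capacity = workers + waiting
        self._max_body_bytes = max_body_bytes
        self._enqueue_wait = enqueue_wait  # seconds (100ms default)
        self._count = 0                    # admitted requests
        self._bytes = 0                    # total body bytes of admitted requests
        self._cond = threading.Condition()
        self._workers = threading.Semaphore(workers)

    def _has_room(self, size):
        return (self._count < self._capacity
                and self._bytes + size <= self._max_body_bytes)

    def dispatch(self, body, send):
        """Run send(body) if admitted in time, otherwise return 503."""
        size = len(body)
        with self._cond:
            admitted = self._cond.wait_for(
                lambda: self._has_room(size), timeout=self._enqueue_wait)
            if not admitted:
                return 503  # slots or body bytes stayed at the limit
            self._count += 1
            self._bytes += size
        try:
            with self._workers:  # at most `workers` concurrent sends
                return send(body)
        finally:
            with self._cond:
                self._count -= 1
                self._bytes -= size
                self._cond.notify_all()
```

A caller would wrap its upstream send, for example `queue.dispatch(body, send_to_kodo)`, where `send_to_kodo` is a placeholder for whatever function performs the real request and returns its status code.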
Added a negative cache for invalid tokens:
Caches the kodo.tokenNotFound and kodo.invalidClientToken results returned by Kodo for a short time, reducing the request pressure that repeated invalid tokens put on Kodo.
Enabled by default, with default TTL of 5m, caching up to 1000 tokens; supports configuration via DW_TOKEN_NEGATIVE_CACHE_ENABLED, DW_TOKEN_NEGATIVE_CACHE_TTL, DW_TOKEN_NEGATIVE_CACHE_MAX_KEYS.
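As a rough illustration of a negative cache with a TTL and a key limit, consider the sketch below. The class name, the injectable clock, and the oldest-first eviction policy are assumptions made for the example; the source does not say how Dataway evicts entries once the key limit is reached.

```python
import time
from collections import OrderedDict


class TokenNegativeCache:
    """Hypothetical cache of tokens that Kodo has already rejected."""

    def __init__(self, ttl=300.0, max_keys=1000, clock=time.monotonic):
        self._ttl = ttl            # seconds; 5m by default
        self._max_keys = max_keys
        self._clock = clock        # injectable so expiry can be tested
        self._entries = OrderedDict()  # token -> (error_code, expires_at)

    def put(self, token, error_code):
        self._entries.pop(token, None)
        self._entries[token] = (error_code, self._clock() + self._ttl)
        while len(self._entries) > self._max_keys:
            self._entries.popitem(last=False)  # assumption: evict oldest entry

    def get(self, token):
        """Return the cached error code, or None if absent or expired."""
        entry = self._entries.get(token)
        if entry is None:
            return None
        error_code, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[token]
            return None
        return error_code
```

On a hit, a server could answer with the cached error code without contacting Kodo; on a miss it would forward the request and call `put` only when Kodo answers with one of the two invalid-token errors.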
Added token validation to aggregation interfaces, supporting writing the validated token back to the request query and X-Token.
Added validation for empty batches, empty points, empty aggregation algorithms, and illegal windows in aggregation payloads.
Moved aggregation data sending to a background worker queue with 3 retries; added Content-MD5 to aggregation/tail sampling payloads sent to Kodo.
Aggregation proxy mode now reuses reverse proxy instances and explicitly validates invalid configurations, such as the backend endpoint and pick key.
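The Content-MD5 and retry behavior mentioned for aggregation payloads might look roughly like the sketch below. It assumes the RFC 1864 encoding (Base64 of the raw MD5 digest) and reads "3 retries" as one initial attempt plus up to three more; the function names and the `send` callback are hypothetical, and the source does not confirm either assumption.

```python
import base64
import hashlib


def content_md5(body: bytes) -> str:
    """Base64 of the raw MD5 digest (assumed RFC 1864 encoding)."""
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")


def send_with_retries(body, send, retries=3):
    """Call send(body, headers); retry on I/O errors up to `retries` times."""
    headers = {"Content-MD5": content_md5(body)}
    last_error = None
    for _ in range(1 + retries):  # assumption: initial attempt + retries
        try:
            return send(body, headers)
        except OSError as err:    # includes ConnectionError and timeouts
            last_error = err
    raise last_error
```

The receiver can recompute the digest over the body it received and compare it with the header to detect a corrupted or truncated payload.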
Optimized disk cache replay:
During cache replay, only the path/query from the cache is reused; the scheme and host use the current remote_host to avoid historical hosts saved in the cache affecting replay after cascading or configuration changes.
Default disk cache cleanup interval adjusted from 30s to 1s.
Added test coverage for switching to the next data file after reading EOF from disk cache, and continuous advancement of the read pointer during concurrent writes.
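The replay rule above (path and query from the cache, scheme and host from the current remote_host) can be sketched with the standard URL helpers. The function name and the example hosts are hypothetical, and the sketch assumes remote_host is configured as a full URL with a scheme.

```python
from urllib.parse import urlsplit, urlunsplit


def replay_url(cached_url: str, remote_host: str) -> str:
    """Combine the cached path/query with the current scheme and host."""
    cached = urlsplit(cached_url)
    current = urlsplit(remote_host)
    # Scheme and host come from the current configuration; anything else
    # stored in the cached URL (old scheme, old host, old port) is dropped.
    return urlunsplit(
        (current.scheme, current.netloc, cached.path, cached.query, ""))
```

For example, a request cached as `http://old-dataway:9528/v1/write/logging?token=tkn_x` would be replayed against `https://kodo.example.com/v1/write/logging?token=tkn_x` if remote_host were later changed to `https://kodo.example.com`.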
Optimized Sinker documentation: Starting from DataKit 2.0.0, non-point write APIs no longer carry Sinker headers, so it's usually no longer necessary to configure special __dataway_api routing rules for these APIs; compatibility rules for older versions are still documented.
The new Sinker caching mechanism caches a 16-byte hash of each request's characteristics instead of the complete characteristics. It uses two-way hashing to minimize hash collisions, giving a theoretical collision probability of n/2^128, where n is the number of cached keys.
Added TTL and capacity mechanisms to Sinker cache, further limiting memory usage: The TTL mechanism cleans up inactive cache entries, and the capacity mechanism ensures the number of keys in the cache does not exceed the specified limit. Both TTL and capacity limits further reduce the aforementioned hash collision probability.
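To make the 16-byte key and the n/2^128 figure concrete, here is one possible reading of "two-way hashing": two independently parameterized 64-bit hashes concatenated into a single 128-bit key. The choice of BLAKE2b, the personalization strings, and the function names are assumptions for illustration only; the source does not name the hash functions Dataway uses.

```python
import hashlib


def request_fingerprint(characteristics: bytes) -> bytes:
    """Return a 16-byte cache key built from two independent 8-byte hashes."""
    # Assumption: two differently personalized BLAKE2b digests stand in for
    # the two hash "ways"; the real implementation may differ.
    h1 = hashlib.blake2b(characteristics, digest_size=8,
                         person=b"sinker-h1").digest()
    h2 = hashlib.blake2b(characteristics, digest_size=8,
                         person=b"sinker-h2").digest()
    return h1 + h2


def collision_bound(cached_keys: int) -> float:
    """Chance that a new key collides with one of n cached 128-bit keys."""
    return cached_keys / 2**128
```

Because the bound grows linearly with n, a capacity limit caps it directly, and a TTL keeps n small by dropping inactive keys.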
Added configuration for optimizing the HTTP headers returned by Dataway, reducing Dataway's public network traffic (#65).
Added a time synchronization interface that lets DataKit obtain more accurate Unix timestamps (#40).
Sinker:
Filter conditions support nil checks, i.e., testing whether a specific field exists (#41).
Added a default rule setting: requests that match none of the existing routing rules are directed to the workspace corresponding to the default rule (#30).
Removed direct support for Sinker configuration in host installation mode. This feature will be supported in a new way in the future.
Data writing no longer performs line protocol decoding, but still reads the Body for signing.
Disk cache:
Added a consumption pause strategy for cache cleanup: when sending to the center fails, the next cached request is not cleaned until the current one has been sent successfully.
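The pause-on-failure behavior can be pictured as a loop that only advances past the head of the cache after a successful send. This is a simplified in-memory sketch with hypothetical names; the real disk cache works on files, not on a deque.

```python
from collections import deque


def drain_cache(cached_requests: deque, send) -> int:
    """Send cached requests in order; stop at the first failure.

    Returns the number of requests sent and removed in this pass.
    """
    sent = 0
    while cached_requests:
        if not send(cached_requests[0]):
            break  # pause: keep the head so the next pass retries it
        cached_requests.popleft()
        sent += 1
    return sent
```

Calling `drain_cache` again on a later tick retries the same head request, so nothing behind it is consumed until it succeeds.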