# Dataway

## Introduction
Dataway is the data gateway of Guance. All data reported by collectors to Guance must pass through the Dataway gateway.
## Dataway Installation
- Create Dataway
In the Guance management backend, on the "Data Gateway" page, click "Create Dataway", enter a name and binding address, then click "Create".
After successful creation, the new Dataway appears in the list, along with its installation script.
Info
The binding address is the Dataway gateway address and must be a full HTTP address, for example `http(s)://1.2.3.4:9528`, including protocol, host address, and port. The host address is generally the IP of the machine where Dataway is deployed, but you can also specify a domain name; the domain name must be properly resolved.
Note: Ensure that collectors can access this address, otherwise data collection will fail.
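For example, a quick reachability check from the collector host, using the example address above:

```shell
# run from the machine where the collector will be installed;
# any HTTP response at all proves the address is reachable
curl -sv http://1.2.3.4:9528
```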
- Install Dataway
```shell
DW_KODO=http://kodo_ip:port \
DW_TOKEN=<tkn_XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX> \
DW_UUID=<YOUR_UUID> \
bash -c "$(curl https://static.guance.com/dataway/install.sh)"
```
After installation, a dataway.yaml file is generated in the installation directory. An example of its content follows; you can modify it manually, and changes take effect after restarting the service.

Example `dataway.yaml`:
```yaml
# ============= DATAWAY CONFIG =============

# Dataway UUID, we can get it during the creation of a new dataway
uuid:

# It's the workspace token, most of the time, it's
# the system workspace's token.
token:

# secret_token used under sinker mode, and to check if incoming datakit
# requests are valid.
secret_token:

# Is __internal__ token allowed? If ok, the data/request will direct to
# the workspace with the token above.
enable_internal_token: false

# Is empty token allowed? If ok, the data/request will direct to
# the workspace with the token above.
enable_empty_token: false

# Is dataway cascaded? For a cascaded Dataway, its remote_host is
# another Dataway and not Kodo.
cascaded: false

# kodo (next dataway) related configurations
remote_host:
http_timeout: 30s
http_max_idle_conn_perhost: 0 # default to CPU cores
http_max_conn_perhost: 0      # default no limit
insecure_skip_verify: false
http_client_trace: false
max_conns_per_host: 0
sni: ""

# dataway API configurations
bind: 0.0.0.0:9528

# disable 404 page
disable_404page: false

# dataway TLS file path
tls_crt:
tls_key:

# enable pprof
pprof_bind: localhost:6060

api_limit_rate: 100000        # 100K
max_http_body_bytes: 67108864 # 64MB
copy_buffer_drop_size: 262144 # 256KB, copy buffers larger than this are released
reserved_pool_size: 4096      # reserved pool size for better GC

within_docker: false
log_level: info
log: log
gin_log: gin.log

cache_cfg:
  # cache disk path
  dir: "disk_cache"
  # disable cache
  disabled: false
  clean_interval: "10s"
  # in MB, max single data package size in disk cache, such as HTTP body
  max_data_size: 100
  # in MB, single disk-batch (single file) size
  batch_size: 128
  # in MB, max disk size allowed to cache data
  max_disk_size: 65535
  # expire duration, default 7 days
  expire_duration: "168h"

prometheus:
  listen: "localhost:9090"
  url: "/metrics"
  enable: true

#sinker:
#  etcd:
#    urls:
#    - http://localhost:2379 # one or multiple etcd hosts
#    dial_timeout: 30s
#    key_space: "/dw_sinker" # subscribe to this etcd key
#    username: "dataway"
#    password: "<PASSWORD>"
#  #file:
#  #  path: /path/to/sinker.json
```
An example Dataway Pod YAML (`dataway-deploy.yaml`) is as follows:
```yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: deployment-utils-dataway
  name: dataway
  namespace: utils
spec:
  replicas: 2
  selector:
    matchLabels:
      app: deployment-utils-dataway
  template:
    metadata:
      labels:
        app: deployment-utils-dataway
      annotations:
        datakit/logs: |
          [
            {
              "disable": false,
              "source": "dataway",
              "service": "dataway",
              "multiline_match": "^\\d{4}|^\\[GIN\\]"
            }
          ]
        datakit/prom.instances: |
          [[inputs.prom]]
            url = "http://$IP:9090/metrics"
            source = "dataway"
            measurement_name = "dw"
            interval = "10s"
            disable_instance_tag = true
            [inputs.prom.tags]
              service = "dataway"
              instance = "$PODNAME" # we can set as "guangzhou-$PODNAME"
    spec:
      affinity:
        podAffinity: {}
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - deployment-utils-dataway
            topologyKey: kubernetes.io/hostname
      containers:
      - image: pubrepo.guance.com/dataflux/dataway:1.9.0
        imagePullPolicy: IfNotPresent
        name: dataway
        env:
        - name: DW_REMOTE_HOST
          value: "http://kodo.forethought-kodo:9527"
        - name: DW_BIND
          value: "0.0.0.0:9528"
        - name: DW_UUID
          value: "agnt_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" # Fill in the real Dataway UUID here
        - name: DW_TOKEN
          value: "tkn_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" # Fill in the real Dataway token here, usually the system workspace token
        - name: DW_PROM_LISTEN
          value: "0.0.0.0:9090"
        ports:
        - containerPort: 9528
          name: 9528tcp01
          protocol: TCP
        volumeMounts:
        - mountPath: /usr/local/cloudcare/dataflux/dataway/cache
          name: dataway-cache
        resources:
          limits:
            cpu: '4'
            memory: 4Gi
          requests:
            cpu: 100m
            memory: 512Mi
      # nodeSelector:
      #   key: string
      imagePullSecrets:
      - name: registry-key
      restartPolicy: Always
      volumes:
      - hostPath:
          path: /root/dataway_cache
          type: DirectoryOrCreate
        name: dataway-cache

---
apiVersion: v1
kind: Service
metadata:
  name: dataway
  namespace: utils
spec:
  ports:
  - name: 9528tcp02
    port: 9528
    protocol: TCP
    targetPort: 9528
    nodePort: 30928
  selector:
    app: deployment-utils-dataway
  type: NodePort
```
In dataway-deploy.yaml, you can modify Dataway configuration via environment variables; see the Image Environment Variables section below.
You can also supply a dataway.yaml via a ConfigMap, but it must be mounted as /usr/local/cloudcare/dataflux/dataway/dataway.yaml.
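A minimal sketch, assuming the usual ConfigMap-plus-`subPath` mount (the resource names here are illustrative):

```yaml
# ConfigMap holding the dataway.yaml content
apiVersion: v1
kind: ConfigMap
metadata:
  name: dataway-config
  namespace: utils
data:
  dataway.yaml: |
    bind: 0.0.0.0:9528
    # ...remaining dataway.yaml settings...
---
# Corresponding fragments in the Deployment's Pod spec:
#   volumeMounts:
#   - name: dataway-config
#     mountPath: /usr/local/cloudcare/dataflux/dataway/dataway.yaml
#     subPath: dataway.yaml
#   volumes:
#   - name: dataway-config
#     configMap:
#       name: dataway-config
```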
Notes
- Dataway can only run on Linux systems (currently only Linux arm64/amd64 binaries are released)
- During HOST installation, the Dataway installation path is /usr/local/cloudcare/dataflux/dataway
- Kubernetes sets a default resource limit of 4000m/4Gi, which can be adjusted according to actual needs. The minimum requirement is 100m/512Mi
- Verify Dataway Installation
After installation, wait a moment and refresh the "Data Gateway" page. If a version number appears in the "Version Information" column for the newly added gateway, this Dataway has successfully connected to the Guance center, and users can report data through it.
After Dataway successfully connects to the Guance center, log in to the Guance console, navigate to the "Integration" / "DataKit" page, view all Dataway addresses, select the needed Dataway gateway address, obtain the DataKit installation command, and execute it on the server to start collecting data.
## Manage Dataway

### Delete Dataway
In the Guance management backend, on the "Data Gateway" page, select the Dataway to delete and click "Configuration"; in the Edit Dataway dialog that pops up, click the "Delete" button in the bottom-left corner.
Warning
After deleting a Dataway, you also need to log in to the server where the Dataway gateway is deployed, stop the Dataway process, and then delete the installation directory to remove it completely.
### Upgrade Dataway
In the Guance management backend, on the "Data Gateway" page, if an upgradable version is available for a Dataway, an upgrade prompt appears in the version information column.
## Dataway Service Management
When installing Dataway on a host, you can manage the Dataway service with the following commands.
```shell
# Start
$ systemctl start dataway

# Restart
$ systemctl restart dataway

# Stop
$ systemctl stop dataway
```
For Kubernetes, simply restart the corresponding Pod.
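For example, with the Deployment from `dataway-deploy.yaml` above (namespace `utils`, name `dataway`):

```shell
# restart all Dataway Pods managed by the Deployment
kubectl -n utils rollout restart deployment dataway
```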
## Environment Variables

### Environment Variables Supported in Host Installation
Host installation is no longer recommended, and new configuration items cannot be set via command-line parameters. If you cannot change the deployment method, it is suggested to manually modify the corresponding configuration items after installation (or upgrade). The defaults are shown in the configuration example above.
When installing on a host, the following environment variables can be injected into the installation command:
| Env | Type | Required | Description | Example Value |
| --- | --- | --- | --- | --- |
| `DW_BIND` | string | N | Dataway HTTP API binding address, default `0.0.0.0:9528` | |
| `DW_CASCADED` | boolean | N | Whether Dataway is cascaded | `true` |
| `DW_HTTP_CLIENT_TRACE` | boolean | N | As an HTTP client, Dataway can collect some related metrics, which are eventually output in its Prometheus metrics | `true` |
| `DW_KODO` | string | Y | Kodo address, or the next Dataway address, in the form `http://host:port` | |
| `DW_TOKEN` | string | Y | Usually the system workspace data Token | |
| `DW_UPGRADE` | boolean | N | Set to `1` during upgrades | |
| `DW_UUID` | string | Y | Dataway UUID, generated when creating a new Dataway in the system workspace | |
| `DW_TLS_CRT` | file-path | N | Specify the HTTPS/TLS crt file path (Version-1.4.1) | |
| `DW_TLS_KEY` | file-path | N | Specify the HTTPS/TLS key file path (Version-1.4.1) | |
| `DW_PROM_EXPORTOR_BIND` | string | N | HTTP port exposing Dataway's own metrics (default `9090`) (Version-1.5.0) | |
| `DW_PPROF_BIND` | string | N | HTTP port for Dataway's own pprof (default `6060`) (Version-1.5.0) | |
| `DW_DISK_CACHE_CAP_MB` | int | N | Disk cache size (in MB), default 65535 MB (Version-1.5.0) | |
Warning
Sinker-related settings must be manually modified after installation. Currently, they are not supported during the installation process. Version-1.5.0
### Image Environment Variables
When Dataway runs in a Kubernetes environment, it supports the following environment variables.
**Compatibility with existing dataway.yaml**

Since some older Dataways inject configuration via a ConfigMap (the file in the container is generally dataway.yaml), if the Dataway image detects a ConfigMap-mounted file in the installation directory, the following `DW_*` environment variables will not take effect; removing the existing ConfigMap mount allows them to take effect.

If the environment variables take effect, a hidden `.dataway.yaml` file (viewable via `ls -a`) appears in the Dataway installation directory, and you can `cat` it to confirm which settings were applied.
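For example, from a shell inside the Dataway container:

```shell
ls -a /usr/local/cloudcare/dataflux/dataway/          # the hidden .dataway.yaml should be listed
cat /usr/local/cloudcare/dataflux/dataway/.dataway.yaml
```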
#### HTTP Server Settings
| Env | Type | Required | Description | Example Value |
| --- | --- | --- | --- | --- |
| `DW_REMOTE_HOST` | string | Y | Kodo address, or the next Dataway address, in the form `http://host:port` | |
| `DW_WHITE_LIST` | string | N | Dataway client IP whitelist, separated by `,` | |
| `DW_HTTP_TIMEOUT` | string | N | Timeout for Dataway requests to Kodo or the next Dataway, default 30s | |
| `DW_HTTP_MAX_IDLE_CONN_PERHOST` | int | N | Maximum idle connections for Dataway requests to Kodo, default is the CPU core count (Version-1.6.2) | |
| `DW_HTTP_MAX_CONN_PERHOST` | int | N | Maximum connections for Dataway requests to Kodo, default unlimited (Version-1.6.2) | |
| `DW_BIND` | string | N | Dataway HTTP API binding address, default `0.0.0.0:9528` | |
| `DW_API_LIMIT` | int | N | Dataway API rate limit; if set to 1000, each specific API allows 1000 requests per second, default 100K | |
| `DW_HEARTBEAT` | string | N | Heartbeat interval between Dataway and the center, default 60s | |
| `DW_MAX_HTTP_BODY_BYTES` | int | N | Maximum HTTP body size allowed by the Dataway API (in bytes), default 64MB | |
| `DW_TLS_INSECURE_SKIP_VERIFY` | boolean | N | Ignore HTTPS/TLS certificate errors | `true` |
| `DW_HTTP_CLIENT_TRACE` | boolean | N | As an HTTP client, Dataway can collect some related metrics, which are eventually output in its Prometheus metrics | `true` |
| `DW_ENABLE_TLS` | boolean | N | Enable HTTPS (Version-1.4.1) | |
| `DW_TLS_CRT` | file-path | N | Specify the HTTPS/TLS crt file path (Version-1.4.0) | |
| `DW_TLS_KEY` | file-path | N | Specify the HTTPS/TLS key file path (Version-1.4.0) | |
| `DW_SNI` | string | N | Specify the current Dataway's SNI information (Version-1.6.0) | |
| `DW_DISABLE_404PAGE` | boolean | N | Disable the 404 page (Version-1.6.1) | |
#### HTTP TLS Settings
To generate a TLS certificate valid for one year, you can use the following OpenSSL command:
```shell
# Generate a TLS certificate valid for one year
$ openssl req -new -newkey rsa:4096 -x509 -sha256 -days 365 -nodes -out tls.crt -keyout tls.key
...
```
After executing this command, you will be prompted for the necessary information, including your country, region, city, organization name, department name, and email address; this information is included in the certificate.
Once the input is complete, two files are generated: tls.crt (the certificate) and tls.key (the private key). Keep the private key file secure.
To have the application use these TLS certificates, set the absolute paths of the two files in the application's environment variables. Below is an example of setting the environment variables:
You must first enable `DW_ENABLE_TLS` for the other two ENVs (`DW_TLS_CRT`/`DW_TLS_KEY`) to take effect (Version-1.4.1).
```yaml
env:
- name: DW_ENABLE_TLS
  value: "true"
- name: DW_TLS_CRT
  value: "/path/to/your/tls.crt"
- name: DW_TLS_KEY
  value: "/path/to/your/tls.key"
```
Replace `/path/to/your/tls.crt` and `/path/to/your/tls.key` with the actual paths where your `tls.crt` and `tls.key` files are stored.
After setting this up, you can test whether TLS is effective.
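For example, a minimal check, assuming Dataway listens locally on the default port 9528:

```shell
# -k skips certificate verification (needed for a self-signed certificate)
curl -vvv -k https://localhost:9528
```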
If successful, it displays an ASCII-art message saying `It's working!`. If the certificate files do not exist, Dataway cannot start, its logs contain a corresponding error, and the curl command fails to connect:
```shell
$ curl -vvv -k http://localhost:9528
curl: (7) Failed to connect to localhost port 9528 after 6 ms: Couldn't connect to server
```
#### Logging Settings
| Env | Type | Required | Description | Example Value |
| --- | --- | --- | --- | --- |
| `DW_LOG` | string | N | Log path, default `log` | |
| `DW_LOG_LEVEL` | string | N | Default `info` | |
| `DW_GIN_LOG` | string | N | Default `gin.log` | |
#### Token/UUID Settings
| Env | Type | Required | Description | Example Value |
| --- | --- | --- | --- | --- |
| `DW_UUID` | string | Y | Dataway UUID, generated when creating a new Dataway in the system workspace | |
| `DW_TOKEN` | string | Y | Usually the system workspace data upload Token | |
| `DW_SECRET_TOKEN` | string | N | Token that can be set when enabling the Sinker functionality | |
| `DW_ENABLE_INTERNAL_TOKEN` | boolean | N | Allow `__internal__` as a client Token; such requests default to the system workspace Token | |
| `DW_ENABLE_EMPTY_TOKEN` | boolean | N | Allow uploading data without a Token; such requests default to the system workspace Token | |
#### Sinker Settings
| Env | Type | Required | Description | Example Value |
| --- | --- | --- | --- | --- |
| `DW_SECRET_TOKEN` | string | N | Token that can be set when enabling the Sinker functionality | |
| `DW_CASCADED` | string | N | Whether Dataway is cascaded | `true` |
| `DW_SINKER_ETCD_URLS` | string | N | List of etcd addresses, separated by `,`, like `http://1.2.3.4:2379,http://1.2.3.4:2380` | |
| `DW_SINKER_ETCD_DIAL_TIMEOUT` | string | N | etcd connection timeout, default 30s | |
| `DW_SINKER_ETCD_KEY_SPACE` | string | N | etcd key name for the Sinker configuration (default `/dw_sinker`) | |
| `DW_SINKER_ETCD_USERNAME` | string | N | etcd username | |
| `DW_SINKER_ETCD_PASSWORD` | string | N | etcd password | |
| `DW_SINKER_FILE_PATH` | file-path | N | Specify the Sinker rule configuration via a local file | |
Warning
If both local file and etcd methods are specified, the local file Sinker rules will take precedence.
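For example, a sketch of the etcd-based Sinker settings as container environment variables (addresses and credentials are placeholders):

```yaml
env:
- name: DW_SECRET_TOKEN
  value: "tkn_secret_xxxxxxxxxxxxxxxx"   # placeholder
- name: DW_SINKER_ETCD_URLS
  value: "http://1.2.3.4:2379,http://1.2.3.4:2380"
- name: DW_SINKER_ETCD_DIAL_TIMEOUT
  value: "30s"
- name: DW_SINKER_ETCD_KEY_SPACE
  value: "/dw_sinker"
- name: DW_SINKER_ETCD_USERNAME
  value: "dataway"
- name: DW_SINKER_ETCD_PASSWORD
  value: "<PASSWORD>"
```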
#### Prometheus Metrics Exposure
| Env | Type | Required | Description | Example Value |
| --- | --- | --- | --- | --- |
| `DW_PROM_URL` | string | N | URL path for Prometheus metrics (default `/metrics`) | |
| `DW_PROM_LISTEN` | string | N | Address exposing Prometheus metrics (default `localhost:9090`) | |
| `DW_PROM_DISABLED` | boolean | N | Disable Prometheus metrics exposure | `true` |
#### Disk Cache Settings
| Env | Type | Required | Description | Example Value |
| --- | --- | --- | --- | --- |
| `DW_DISKCACHE_DIR` | file-path | N | Cache directory, generally mounted storage | `path/to/your/cache` |
| `DW_DISKCACHE_DISABLE` | boolean | N | Disable the disk cache; to keep the cache enabled, remove this environment variable | `true` |
| `DW_DISKCACHE_CLEAN_INTERVAL` | string | N | Cache cleanup interval, default 30s | duration string |
| `DW_DISKCACHE_EXPIRE_DURATION` | string | N | Cache expiration time, default 168h (7d) | duration string, e.g. `72h` for three days |
| `DW_DISKCACHE_CAPACITY_MB` | int | N | Available disk space, in MB, default 20GB (Version-1.6.0) | `1024` for 1GB |
| `DW_DISKCACHE_BATCH_SIZE_MB` | int | N | Maximum size of a single disk cache file, in MB, default 64MB (Version-1.6.0) | `1024` for 1GB |
| `DW_DISKCACHE_MAX_DATA_SIZE_MB` | int | N | Maximum size of a single cached item (e.g. one HTTP body), in MB, default 64MB; larger packages are discarded (Version-1.6.0) | `1024` for 1GB |
Tips
Set `DW_DISKCACHE_DISABLE` to disable the disk cache.
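For example, a sketch that caps the cache at 10 GB and shortens expiry to three days (the values are illustrative):

```yaml
env:
- name: DW_DISKCACHE_CAPACITY_MB
  value: "10240"   # 10 GB
- name: DW_DISKCACHE_EXPIRE_DURATION
  value: "72h"     # three days
```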
#### Performance-Related Settings
| Env | Type | Required | Description | Example Value |
| --- | --- | --- | --- | --- |
| `DW_COPY_BUFFER_DROP_SIZE` | int | N | HTTP body copy buffers exceeding this size (in bytes) are released immediately to avoid excessive memory consumption; default 256KB | `524288` |
## Dataway API List
Details of the following APIs will be supplemented later.
### `GET /v1/ntp/`
- API Description: Get the current Unix timestamp (in seconds) of Dataway.
### `POST /v1/write/:category`
- API Description: Receive various collected data uploaded by Datakit.
### `GET /v1/datakit/pull`
- API Description: Handle Datakit pulling central configuration (blacklist/Pipeline) requests.
### `POST /v1/write/rum/replay`
- API Description: Receive Session Replay data uploaded by Datakit.
### `POST /v1/upload/profiling`
- API Description: Receive Profiling data uploaded by Datakit.
### `POST /v1/election`
- API Description: Handle Datakit election requests.
### `POST /v1/election/heartbeat`
- API Description: Handle Datakit election heartbeat requests.
### `POST /v1/query/raw`

- API Description: Handle DQL query requests. A simple example:
```http
POST /v1/query/raw?token=<workspace-token> HTTP/1.1
Content-Type: application/json

{
    "token": "workspace-token",
    "queries": [
        {
            "query": "M::cpu LIMIT 1"
        }
    ],
    "echo_explain": <true/false>
}
```
Example response:
```json
{
    "content": [
        {
            "series": [
                {
                    "name": "cpu",
                    "columns": [
                        "time",
                        "usage_iowait",
                        "usage_total",
                        "usage_user",
                        "usage_guest",
                        "usage_system",
                        "usage_steal",
                        "usage_guest_nice",
                        "usage_irq",
                        "load5s",
                        "usage_idle",
                        "usage_nice",
                        "usage_softirq",
                        "global_tag1",
                        "global_tag2",
                        "host",
                        "cpu"
                    ],
                    "values": [
                        [
                            1709782208662,
                            0,
                            7.421875,
                            3.359375,
                            0,
                            4.0625,
                            0,
                            0,
                            0,
                            1,
                            92.578125,
                            0,
                            0,
                            null,
                            null,
                            "WIN-JCHUL92N9IP",
                            "cpu-total"
                        ]
                    ]
                }
            ],
            "points": null,
            "cost": "24.558375ms",
            "is_running": false,
            "async_id": "",
            "query_parse": {
                "namespace": "metric",
                "sources": {
                    "cpu": "exact"
                },
                "fields": {},
                "funcs": {}
            },
            "index_name": "",
            "index_store_type": "",
            "query_type": "guancedb",
            "complete": false,
            "index_names": "",
            "scan_completed": false,
            "scan_index": "",
            "next_cursor_time": -1,
            "sample": 1,
            "interval": 0,
            "window": 0
        }
    ]
}
```
Explanation of the return results:

- The real data is located in the inner `series` field.
- `name` is the measurement name (here, CPU metrics; for log-type data this field does not exist).
- `columns` lists the names of the returned result columns.
- `values` contains the results for the corresponding `columns`.
Info

- The token in the URL request parameter can differ from the token in the JSON body. The former verifies the legality of the query request, while the latter determines the workspace where the target data resides.
- The `queries` field can carry multiple queries, and each query can carry additional fields; see here for the specific list of fields.
### `POST /v1/workspace`
- API Description: Handle workspace query requests initiated by Datakit.
### `POST /v1/object/labels`
- API Description: Handle object Label modification requests.
### `DELETE /v1/object/labels`
- API Description: Handle object Label deletion requests.
### `GET /v1/check/:token`
- API Description: Check if the token is valid.
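For example (the token is a placeholder and a local Dataway address is assumed):

```shell
curl -s http://localhost:9528/v1/check/tkn_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```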
## Dataway Metric Collection
**HTTP Client Metric Collection**

To collect metrics about Dataway's HTTP requests to Kodo (or the next-hop Dataway), you must manually enable the `http_client_trace` configuration, or set the environment variable `DW_HTTP_CLIENT_TRACE=true`.

Dataway exposes Prometheus metrics, which Datakit's built-in `prom` collector can scrape. If the cluster has Datakit deployed (Datakit 1.14.2 or higher required), you can enable Prometheus metric collection via Pod annotations; the default Dataway Pod YAML already includes this:
```yaml
annotations: # The following annotation is added by default
  datakit/prom.instances: |
    [[inputs.prom]]
      url = "http://$IP:9090/metrics" # This port (default 9090) depends on the situation
      source = "dataway"
      measurement_name = "dw" # Fixed as this measurement set
      interval = "10s"
      disable_instance_tag = true
      [inputs.prom.tags]
        service = "dataway"
        instance = "$PODNAME"

...

env:
- name: DW_PROM_LISTEN
  value: "0.0.0.0:9090" # This port should match the port in the URL above
```
If the collection is successful, search for `dataway` in the Guance "Scenarios" / "Built-in Views" to see the corresponding monitoring views.
## Dataway Metrics List
Below are the metrics exposed by Dataway. You can obtain them by requesting `http://localhost:9090/metrics`, and you can monitor a specific metric in real time (every 3s).
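For example, a sketch assuming the default metrics port (the metric name is just an example):

```shell
watch -n 3 'curl -s http://localhost:9090/metrics | grep dataway_http_api_elapsed_seconds'
```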
Some metrics might not be found because the relevant business modules have not yet run. Some new metrics are only present in the latest versions; version information for each metric is not listed here. Refer to the `/metrics` interface for the definitive list of metrics.
| TYPE | NAME | LABELS | HELP |
| --- | --- | --- | --- |
| SUMMARY | `dataway_http_api_elapsed_seconds` | `api,method,status` | API request latency |
| SUMMARY | `dataway_http_api_body_buffer_utilization` | `api` | API body buffer utilization (Len/Cap) |
| SUMMARY | `dataway_http_api_body_copy` | `api` | API body copy |
| SUMMARY | `dataway_http_api_resp_size_bytes` | `api,method,status` | API response size |
| SUMMARY | `dataway_http_api_req_size_bytes` | `api,method,status` | API request size |
| COUNTER | `dataway_http_api_total` | `api,status` | API request count |
| COUNTER | `dataway_http_api_body_too_large_dropped_total` | `api,method` | API requests dropped as too large |
| COUNTER | `dataway_http_api_with_inner_token` | `api,method` | API requests with inner token |
| COUNTER | `dataway_http_api_dropped_total` | `api,method` | API requests dropped when sinker rule match failed |
| COUNTER | `dataway_syncpool_stats` | `name,type` | sync.Pool usage stats |
| COUNTER | `dataway_http_api_copy_body_failed_total` | `api` | API copy body failed count |
| COUNTER | `dataway_http_api_signed_total` | `api,method` | API signature count |
| SUMMARY | `dataway_http_api_cached_bytes` | `api,cache_type,method,reason` | API cached body bytes |
| SUMMARY | `dataway_http_api_reusable_body_read_bytes` | `api,method` | API re-read body on forking request |
| SUMMARY | `dataway_http_api_recv_points` | `api` | API /v1/write/:category received points |
| SUMMARY | `dataway_http_api_send_points` | `api` | API /v1/write/:category sent points |
| SUMMARY | `dataway_http_api_cache_points` | `api,cache_type` | Disk cached /v1/write/:category points |
| SUMMARY | `dataway_http_api_cache_cleaned_points` | `api,cache_type,status` | Disk cache cleaned /v1/write/:category points |
| COUNTER | `dataway_http_api_forked_total` | `api,method,token` | API request forked total |
| GAUGE | `dataway_http_info` | `cascaded,docker,http_client_trace,listen,max_body,release_date,remote,version` | Dataway API basic info |
| GAUGE | `dataway_last_heartbeat_time` | N/A | Dataway last heartbeat with Kodo timestamp |
| GAUGE | `dataway_cpu_usage` | N/A | Dataway CPU usage (%) |
| GAUGE | `dataway_mem_stat` | `type` | Dataway memory usage stats |
| SUMMARY | `dataway_http_api_copy_buffer_drop_total` | `max` | API copy buffer dropped (too-large cached buffer) count |
| GAUGE | `dataway_open_files` | N/A | Dataway open files |
| GAUGE | `dataway_cpu_cores` | N/A | Dataway CPU cores |
| GAUGE | `dataway_uptime` | N/A | Dataway uptime |
| COUNTER | `dataway_process_ctx_switch_total` | `type` | Dataway process context switch count (Linux only) |
| COUNTER | `dataway_process_io_count_total` | `type` | Dataway process IO count |
| COUNTER | `dataway_process_io_bytes_total` | `type` | Dataway process IO bytes count |
| SUMMARY | `dataway_http_api_dropped_expired_cache` | `api,method` | Dropped expired cache data |
| SUMMARY | `dataway_httpcli_tls_handshake_seconds` | `server` | HTTP TLS handshake cost |
| SUMMARY | `dataway_httpcli_http_connect_cost_seconds` | `server` | HTTP connect cost |
| SUMMARY | `dataway_httpcli_got_first_resp_byte_cost_seconds` | `server` | Time to first response byte |
| SUMMARY | `http_latency` | `api,server` | HTTP latency |
| COUNTER | `dataway_httpcli_tcp_conn_total` | `server,remote,type` | HTTP TCP connection count |
| COUNTER | `dataway_httpcli_conn_reused_from_idle_total` | `server` | HTTP connections reused from idle count |
| SUMMARY | `dataway_httpcli_conn_idle_time_seconds` | `server` | HTTP connection idle time |
| SUMMARY | `dataway_httpcli_dns_cost_seconds` | `server` | HTTP DNS cost |
| SUMMARY | `dataway_sinker_rule_cost_seconds` | N/A | Rule cost time (seconds) |
| SUMMARY | `dataway_sinker_cache_key_len` | N/A | Cache key length (bytes) |
| SUMMARY | `dataway_sinker_cache_val_len` | N/A | Cache value length (bytes) |
| COUNTER | `dataway_sinker_pull_total` | `event,source` | Sinker pulled or pushed counter |
| GAUGE | `dataway_sinker_rule_cache_miss` | N/A | Sinker rule cache misses |
| GAUGE | `dataway_sinker_rule_cache_hit` | N/A | Sinker rule cache hits |
| GAUGE | `dataway_sinker_rule_cache_size` | N/A | Sinker rule cache size |
| GAUGE | `dataway_sinker_rule_error` | `error` | Rule errors |
| GAUGE | `dataway_sinker_default_rule_hit` | `info` | Default sinker rule hit count |
| GAUGE | `dataway_sinker_rule_last_applied_time` | `source` | Rule last applied time (Unix timestamp) |
| COUNTER | `diskcache_put_bytes_total` | `path` | Cache Put() bytes count |
| COUNTER | `diskcache_get_total` | `path` | Cache Get() count |
| COUNTER | `diskcache_wakeup_total` | `path` | Wakeup count on sleeping write file |
| COUNTER | `diskcache_seek_back_total` | `path` | Seek back when Get() got any error |
| COUNTER | `diskcache_get_bytes_total` | `path` | Cache Get() bytes count |
| GAUGE | `diskcache_capacity` | `path` | Current capacity (in bytes) |
| GAUGE | `diskcache_max_data` | `path` | Max data to Put (in bytes), default 0 |
| GAUGE | `diskcache_batch_size` | `path` | Data file size (in bytes) |
| GAUGE | `diskcache_size` | `path` | Current cache size (in bytes) |
| GAUGE | `diskcache_open_time` | `no_fallback_on_error,no_lock,no_pos,no_sync,path` | Current cache Open time (Unix timestamp, seconds) |
| GAUGE | `diskcache_last_close_time` | `path` | Current cache last Close time (Unix timestamp, seconds) |
| GAUGE | `diskcache_datafiles` | `path` | Current count of un-read data files |
| SUMMARY | `diskcache_get_latency` | `path` | Get() time cost (microseconds) |
| SUMMARY | `diskcache_put_latency` | `path` | Put() time cost (microseconds) |
| COUNTER | `diskcache_dropped_bytes_total` | `path` | Dropped bytes during Put() when capacity reached |
| COUNTER | `diskcache_dropped_total` | `path,reason` | Dropped files during Put() when capacity reached |
| COUNTER | `diskcache_rotate_total` | `path` | Cache rotate count, i.e. a file rotated from data to data.0000xxx |
| COUNTER | `diskcache_remove_total` | `path` | Removed file count; when a file is read to EOF it is removed from the un-read list |
| COUNTER | `diskcache_put_total` | `path` | Cache Put() count |
## Metric Collection in Docker Mode
Host installation comes in two modes: directly on the host machine, or via Docker. Here we explain how metric collection differs when Dataway is installed via Docker.
When installed via Docker, the HTTP port exposing metrics is mapped to port 19090 on the host (by default), so the metric collection address is `http://localhost:19090/metrics`.
If a different port is specified, the Docker installation adds 10000 to it, so the specified port should not exceed 55535 (keeping the mapped port within 65535).
In addition, the Docker installation also exposes the profiling port, which by default is mapped to port 16060 on the host; the same rule of adding 10000 to the specified port applies.
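For example, with the default mappings:

```shell
# metrics: container port 9090 is mapped to host port 19090
curl -s http://localhost:19090/metrics | head

# profiling: container port 6060 is mapped to host port 16060
curl -s http://localhost:16060/debug/pprof/
```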
## Dataway Self-Logging Collection and Processing
Dataway's own logs fall into two categories: Gin (access) logs and application logs. They can be separated with the following Pipeline:
```python
# Pipeline for dataway logging
# Testing sample logging
'''
2023-12-14T11:27:06.744+0800  DEBUG  apis  apis/api_upload_profile.go:272  save profile file to disk [ok] /v1/upload/profiling?token=****************a4e3db8481c345a94fe5a
[GIN] 2021/10/25 - 06:48:07 | 200 | 30.890624ms | 114.215.200.73 | POST "/v1/write/logging?token=tkn_5c862a11111111111111111111111111"
'''

add_pattern("TOKEN", "tkn_\\w+")
add_pattern("GINTIME", "%{YEAR}/%{MONTHNUM}/%{MONTHDAY}%{SPACE}-%{SPACE}%{HOUR}:%{MINUTE}:%{SECOND}")
grok(_, "\\[GIN\\]%{SPACE}%{GINTIME:timestamp}%{SPACE}\\|%{SPACE}%{NUMBER:dataway_code}%{SPACE}\\|%{SPACE}%{NOTSPACE:cost_time}%{SPACE}\\|%{SPACE}%{NOTSPACE:client_ip}%{SPACE}\\|%{SPACE}%{NOTSPACE:method}%{SPACE}%{GREEDYDATA:http_url}")

# Gin logging
if cost_time != nil {
    if http_url != nil {
        grok(http_url, "%{TOKEN:token}")
        cover(token, [5, 15])
        replace(message, "tkn_\\w{0,5}\\w{6}", "****************$4")
        replace(http_url, "tkn_\\w{0,5}\\w{6}", "****************$4")
    }

    group_between(dataway_code, [200, 299], "info", status)
    group_between(dataway_code, [300, 399], "notice", status)
    group_between(dataway_code, [400, 499], "warning", status)
    group_between(dataway_code, [500, 599], "error", status)

    if sample(0.1) { # drop 90% debug log
        drop()
        exit()
    } else {
        set_tag(sample_rate, "0.1")
    }

    parse_duration(cost_time)
    duration_precision(cost_time, "ns", "ms")

    set_measurement('gin', true)
    set_tag(service, "dataway")
    exit()
}

# App logging
if cost_time == nil {
    grok(_, "%{TIMESTAMP_ISO8601:timestamp}%{SPACE}%{NOTSPACE:status}%{SPACE}%{NOTSPACE:module}%{SPACE}%{NOTSPACE:code}%{SPACE}%{GREEDYDATA:msg}")

    if level == nil {
        grok(message, "Error%{SPACE}%{DATA:errormsg}")
        if errormsg != nil {
            add_key(status, "error")
            drop_key(errormsg)
        }
    }

    lowercase(level)

    # If debug level is enabled, drop most of them
    if status == 'debug' {
        if sample(0.1) { # drop 90% debug log
            drop()
            exit()
        } else {
            set_tag(sample_rate, "0.1")
        }
    }

    group_in(status, ["error", "panic", "dpanic", "fatal", "err", "fat"], "error", status) # mark them as 'error'

    if msg != nil {
        grok(msg, "%{TOKEN:token}")
        cover(token, [5, 15])
        replace(message, "tkn_\\w{0,5}\\w{6}", "****************$4")
        replace(msg, "tkn_\\w{0,5}\\w{6}", "****************$4")
    }

    set_measurement("dataway-log", true)
    set_tag(service, "dataway")
}
```
## Dataway Bug Report
Dataway exposes its own metrics and profiling collection entry points, and we can collect this information to assist with troubleshooting.
Substitute the actual configured ports and addresses when collecting this information; the commands below assume the default parameters.
```shell
br_dir="dw-br-$(date +%s)"
mkdir -p $br_dir

echo "save bug report to ${br_dir}"

# Modify these configurations based on actual conditions
dw_ip="localhost" # IP address where Dataway metrics/profile are exposed
metric_port=9090  # Port exposing metrics
profile_port=6060 # Port exposing profiles
dw_yaml_conf="/usr/local/cloudcare/dataflux/dataway/dataway.yaml"
dw_dot_yaml_conf="/usr/local/cloudcare/dataflux/dataway/.dataway.yaml" # This file exists when installed in a container

# Collect runtime metrics
curl -v "http://${dw_ip}:${metric_port}/metrics" -o $br_dir/metrics

# Collect profiling information
curl -v "http://${dw_ip}:${profile_port}/debug/pprof/allocs" -o $br_dir/allocs
curl -v "http://${dw_ip}:${profile_port}/debug/pprof/heap" -o $br_dir/heap
curl -v "http://${dw_ip}:${profile_port}/debug/pprof/profile" -o $br_dir/profile # This command runs for about 30s

cp $dw_yaml_conf $br_dir/dataway.yaml.copy
cp $dw_dot_yaml_conf $br_dir/.dataway.yaml.copy

tar czvf ${br_dir}.tar.gz ${br_dir}
rm -rf ${br_dir}
```
Run the script.
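For example, assuming the snippet above was saved as `dw-bug-report.sh` (the filename is illustrative):

```shell
bash dw-bug-report.sh
```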
After execution, a file similar to dw-br-1721188604.tar.gz will be generated, which can be extracted for analysis.
## FAQ
### Request Body Too Large
Dataway limits the HTTP request body size (default 64MB). When the request body is too large, the client receives an HTTP 413 error (`Request Entity Too Large`). If such request bodies are within a reasonable range, you can appropriately increase this value (unit: bytes), as shown in the sketch after this list:

- Set the environment variable `DW_MAX_HTTP_BODY_BYTES`
- In dataway.yaml, set `max_http_body_bytes`
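For example, a sketch in Kubernetes env style, raising the limit to 128 MB (the value is illustrative):

```yaml
env:
- name: DW_MAX_HTTP_BODY_BYTES
  value: "134217728" # 128 * 1024 * 1024 bytes
```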
If very large request packets occur during operation, they are reflected in both metrics and logs:

- The metric `dataway_http_api_body_too_large_dropped_total` exposes the number of dropped oversized requests
- Search the Dataway logs with `cat log | grep 'drop too large request'`; the log entries include the HTTP request headers, which helps to further identify the client
Warning
In the disk cache module, there is also a maximum data block write limit (default 64MB). If you increase the maximum request body size, adjust that configuration (`DW_DISKCACHE_MAX_DATA_SIZE_MB`) accordingly, so that large requests can still be written to the disk cache.
- This restriction prevents Dataway containers/Pods from being limited by the system to only about 20,000 usable connections at runtime. Raising the limit may affect the efficiency of Dataway data uploads; when Dataway traffic is high, consider increasing the CPUs of a single Dataway instance, or scaling Dataway horizontally.