Dataway¶
Introduction¶
Dataway is Guance's data gateway. All data collected by collectors must pass through a Dataway gateway before it is reported to Guance.
Installing Dataway¶
- Creating a Dataway
In the Guance admin console, navigate to the "Data Gateway" page and click on "New Dataway". Enter a name and bind an address, then click "Create".
Once created, the new Dataway appears in the list and an installation script for it is generated automatically.
Info
The bound address is the Dataway gateway address. It must be a complete HTTP address including protocol, host, and port, e.g., http(s)://1.2.3.4:9528. The host part is typically the IP of the machine where Dataway is deployed, or a domain name (which must resolve correctly).
Note: Ensure that collectors can reach this address, otherwise data collection will fail.
- Installing Dataway
```shell
DW_KODO=http://kodo_ip:port \
DW_TOKEN=<tkn_XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX> \
DW_UUID=<YOUR_UUID> \
bash -c "$(curl https://static.guance.com/dataway/install.sh)"
```
After installation, a dataway.yaml file is generated in the installation directory. An example of its content follows; you can edit it manually, and the changes take effect after restarting the service.
dataway.yaml (Click to expand)
```yaml
# ============= DATAWAY CONFIG =============

# Dataway UUID, obtained when creating a new Dataway
uuid:

# The workspace token; most of the time this is the
# system workspace's token.
token:

# secret_token is used under sinker mode, to check whether incoming Datakit
# requests are valid.
secret_token:

# Is the __internal__ token allowed? If so, the data/request is directed to
# the workspace with the token above.
enable_internal_token: false

# Is an empty token allowed? If so, the data/request is directed to
# the workspace with the token above.
enable_empty_token: false

# Is this Dataway cascaded? For a cascaded Dataway, its remote_host is
# another Dataway, not Kodo.
cascaded: false

# Kodo (or next dataway) related configures
remote_host:
http_timeout: 30s
http_max_idle_conn_perhost: 0 # default to CPU cores
http_max_conn_perhost: 0      # default no limit
insecure_skip_verify: false
http_client_trace: false
max_conns_per_host: 0
sni: ""

# dataway API configures
bind: 0.0.0.0:9528

# disable 404 page
disable_404page: false

# dataway TLS file path
tls_crt:
tls_key:

# enable pprof
pprof_bind: localhost:6060

api_limit_rate: 100000        # 100K
max_http_body_bytes: 67108864 # 64MB
copy_buffer_drop_size: 262144 # 256KB; if a copy buffer grows larger than this, its memory is released
reserved_pool_size: 4096      # reserved pool size for better GC
within_docker: false
log_level: info
log: log
gin_log: gin.log

cache_cfg:
  # cache disk path
  dir: "disk_cache"
  # disable cache
  disabled: false
  clean_interval: "10s"
  # in MB, max single data package size in disk cache, such as HTTP body
  max_data_size: 100
  # in MB, single disk-batch (single file) size
  batch_size: 128
  # in MB, max disk size allowed to cache data
  max_disk_size: 65535
  # expire duration, default 7 days
  expire_duration: "168h"

prometheus:
  listen: "localhost:9090"
  url: "/metrics"
  enable: true

#sinker:
#  etcd:
#    urls:
#      - http://localhost:2379 # one or multiple etcd hosts
#    dial_timeout: 30s
#    key_space: "/dw_sinker" # subscribe to this etcd key
#    username: "dataway"
#    password: "<PASSWORD>"
#  #file:
#  #  path: /path/to/sinker.json
```
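In practice most deployments only override a handful of these keys. A pared-down sketch (all values below are placeholders; keys omitted here are assumed to keep the defaults shown above):

```yaml
# Minimal dataway.yaml sketch -- placeholder values
uuid: agnt_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
token: tkn_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
remote_host: http://kodo_ip:port # Kodo, or the next Dataway when cascaded
bind: 0.0.0.0:9528
cache_cfg:
  dir: "disk_cache"
  max_disk_size: 65535 # MB
```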
The Pod YAML for deploying Dataway on Kubernetes is as follows:
??? info "*dataway-deploy.yaml* (Click to expand)"
```yaml linenums="1"
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        app: deployment-utils-dataway
      name: dataway
      namespace: utils
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: deployment-utils-dataway
      template:
        metadata:
          labels:
            app: deployment-utils-dataway
          annotations:
            datakit/logs: |
              [
                {
                  "disable": false,
                  "source": "dataway",
                  "service": "dataway",
                  "multiline_match": "^\\d{4}|^\\[GIN\\]"
                }
              ]
            datakit/prom.instances: |
              [[inputs.prom]]
                url = "http://$IP:9090/metrics"
                source = "dataway"
                measurement_name = "dw"
                interval = "10s"
                disable_instance_tag = true
                [inputs.prom.tags]
                  service = "dataway"
                  instance = "$PODNAME" # we can set as "guangzhou-$PODNAME"
        spec:
          affinity:
            podAffinity: {}
            podAntiAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                - labelSelector:
                    matchExpressions:
                      - key: app
                        operator: In
                        values:
                          - deployment-utils-dataway
                  topologyKey: kubernetes.io/hostname
          containers:
            - image: pubrepo.guance.com/dataflux/dataway:1.9.0
              imagePullPolicy: IfNotPresent
              name: dataway
              env:
                - name: DW_REMOTE_HOST
                  value: "http://kodo.forethought-kodo:9527"
                - name: DW_BIND
                  value: "0.0.0.0:9528"
                - name: DW_UUID
                  value: "agnt_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" # Fill in the actual Dataway UUID here
                - name: DW_TOKEN
                  value: "tkn_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" # Fill in the actual Dataway token here, usually the system workspace token
                - name: DW_PROM_LISTEN
                  value: "0.0.0.0:9090"
              ports:
                - containerPort: 9528
                  name: 9528tcp01
                  protocol: TCP
              volumeMounts:
                - mountPath: /usr/local/cloudcare/dataflux/dataway/cache
                  name: dataway-cache
              resources:
                limits:
                  cpu: '4'
                  memory: 4Gi
                requests:
                  cpu: 100m
                  memory: 512Mi
          # nodeSelector:
          #   key: string
          imagePullSecrets:
            - name: registry-key
          restartPolicy: Always
          volumes:
            - hostPath:
                path: /root/dataway_cache
                type: DirectoryOrCreate
              name: dataway-cache
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: dataway
      namespace: utils
    spec:
      ports:
        - name: 9528tcp02
          port: 9528
          protocol: TCP
          targetPort: 9528
          nodePort: 30928
      selector:
        app: deployment-utils-dataway
      type: NodePort
```
In *dataway-deploy.yaml*, you can modify Dataway configurations via environment variables. See [here](dataway.md#img-envs).
Alternatively, you can mount an external *dataway.yaml* via a ConfigMap, but it must be mounted at */usr/local/cloudcare/dataflux/dataway/dataway.yaml*:
```yaml
containers:
  - volumeMounts:
      - name: dataway-config
        mountPath: /usr/local/cloudcare/dataflux/dataway/dataway.yaml
        subPath: config.yaml
volumes:
  - configMap:
      defaultMode: 256
      name: dataway-config
      optional: false
    name: dataway-config
```
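For completeness, the ConfigMap referenced above could be defined roughly as follows. This is a sketch: the name, namespace, and key mirror the mount example, and the embedded configuration values are placeholders.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: dataway-config
  namespace: utils
data:
  config.yaml: |
    # same schema as dataway.yaml; values below are placeholders
    uuid: agnt_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    token: tkn_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    remote_host: http://kodo.forethought-kodo:9527
    bind: 0.0.0.0:9528
```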
Notes
- Dataway only runs on Linux systems (currently only Linux arm64/amd64 binaries are published).
- When installing on a host, the Dataway installation path is /usr/local/cloudcare/dataflux/dataway.
- Kubernetes sets resource limits to 4000m/4Gi by default, which can be adjusted according to actual needs. Minimum requirements are 100m/512Mi.
- Verifying Dataway Installation
After installation, wait briefly and refresh the "Data Gateway" page. If a version number appears in the "Version Information" column for the newly added gateway, this Dataway has successfully connected to Guance, and users can now select it when connecting data.
Once Dataway has successfully connected to Guance, log into the Guance console and go to the "Integrations"/"DataKit" page. You can view all Dataway addresses there, select the desired Dataway gateway address, and obtain the DataKit installation instructions to execute on your server to start collecting data.
Managing Dataway¶
Deleting Dataway¶
In the Guance admin console, navigate to the "Data Gateway" page, select the Dataway you want to delete, click "Configure", then click the "Delete" button at the bottom left of the edit dialog.
Warning
After deleting a Dataway, you must also log in to the server where the Dataway gateway is deployed, stop the Dataway process, and delete the installation directory to remove it completely.
Upgrading Dataway¶
On the "Data Gateway" page of the Guance admin console, if an upgrade is available for a Dataway, an upgrade prompt appears in the version information column.
Managing the Dataway Service¶
When installing Dataway on a host, you can manage the Dataway service using the following commands.
```shell
# Start
$ systemctl start dataway

# Restart
$ systemctl restart dataway

# Stop
$ systemctl stop dataway
```
For Kubernetes, simply restart the corresponding Pod.
Environment Variables¶
Host Installation Supported Environment Variables¶
We no longer recommend the host installation method, and new configuration items are no longer exposed as command-line parameters. If changing the deployment method is not an option, manually edit the corresponding configuration after installation (or upgrade); the defaults are shown in the example above.
When installing on a host, you can inject the following environment variables during the installation command:
Env | Type | Required | Description | Example Value |
---|---|---|---|---|
`DW_BIND` | string | N | Dataway HTTP API bind address, defaults to `0.0.0.0:9528` | |
`DW_CASCADED` | boolean | N | Whether Dataway is cascaded | `true` |
`DW_HTTP_CLIENT_TRACE` | boolean | N | Dataway itself acts as an HTTP client; enabling this collects related client metrics, which are exposed among its Prometheus metrics | `true` |
`DW_KODO` | string | Y | Kodo address, or the next Dataway address, in the form `http://host:port` | |
`DW_TOKEN` | string | Y | Usually the data token of the system workspace | |
`DW_UPGRADE` | boolean | N | Set to `1` when upgrading | |
`DW_UUID` | string | Y | Dataway UUID, generated by the system workspace when creating a new Dataway | |
`DW_TLS_CRT` | file-path | N | HTTPS/TLS crt file path Version-1.4.1 | |
`DW_TLS_KEY` | file-path | N | HTTPS/TLS key file path Version-1.4.1 | |
`DW_PROM_EXPORTOR_BIND` | string | N | HTTP address (default port 9090) where Dataway exposes its own metrics Version-1.5.0 | |
`DW_PPROF_BIND` | string | N | HTTP address (default port 6060) where Dataway exposes its pprof functionality Version-1.5.0 | |
`DW_DISK_CACHE_CAP_MB` | int | N | Disk cache size (in MB), default 65535 MB Version-1.5.0 | |
Warning
Settings related to Sinker require manual modification after installation. Currently, specifying Sinker configurations during installation is not supported. Version-1.5.0
Image Environment Variables¶
When running Dataway in a Kubernetes environment, the following environment variables are supported.
Compatibility with existing dataway.yaml
Since some older Dataway versions inject configuration via a ConfigMap (the file mounted into the container is generally dataway.yaml), the DW_* environment variables described below will not take effect if the Dataway image finds such a ConfigMap-injected file in the installation directory at startup. They only become effective after the existing ConfigMap mount is removed.
If the environment variables have taken effect, a hidden file named .dataway.yaml exists in the Dataway installation directory (visible with ls -a). You can cat this file to confirm that the environment variables were applied.
HTTP Server Settings¶
Env | Type | Required | Description | Example Value |
---|---|---|---|---|
`DW_REMOTE_HOST` | string | Y | Kodo address, or the next Dataway address, in the form `http://host:port` | |
`DW_WHITE_LIST` | string | N | Client IP whitelist for Dataway, comma-separated | |
`DW_HTTP_TIMEOUT` | string | N | Timeout for Dataway requests to Kodo or the next Dataway, default 30s | |
`DW_HTTP_MAX_IDLE_CONN_PERHOST` | int | N | Maximum idle connections for Dataway requests to Kodo, default is the CPU core count Version-1.6.2 | |
`DW_HTTP_MAX_CONN_PERHOST` | int | N | Maximum connection count for Dataway requests to Kodo, default unlimited Version-1.6.2 | |
`DW_BIND` | string | N | Dataway HTTP API bind address, defaults to `0.0.0.0:9528` | |
`DW_API_LIMIT` | int | N | Rate limit for the Dataway API. For example, `1000` allows each specific API to be requested up to 1000 times per second; default is 100K | |
`DW_HEARTBEAT` | string | N | Heartbeat interval between Dataway and the center, default 60s | |
`DW_MAX_HTTP_BODY_BYTES` | int | N | Maximum allowed HTTP body for the Dataway API (in bytes), default 64MB | |
`DW_TLS_INSECURE_SKIP_VERIFY` | boolean | N | Ignore HTTPS/TLS certificate errors | `true` |
`DW_HTTP_CLIENT_TRACE` | boolean | N | Dataway itself acts as an HTTP client; enabling this collects related client metrics, which are exposed among its Prometheus metrics | `true` |
`DW_ENABLE_TLS` | boolean | N | Enable HTTPS Version-1.4.1 | |
`DW_TLS_CRT` | file-path | N | HTTPS/TLS crt file path Version-1.4.0 | |
`DW_TLS_KEY` | file-path | N | HTTPS/TLS key file path Version-1.4.0 | |
`DW_SNI` | string | N | Current Dataway SNI information Version-1.6.0 | |
`DW_DISABLE_404PAGE` | boolean | N | Disable the 404 page Version-1.6.1 | |
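In *dataway-deploy.yaml*, these take the form of container environment variables. For instance, raising the upstream timeout and the per-API rate limit might look like this (the values here are illustrative only):

```yaml
env:
  - name: DW_HTTP_TIMEOUT
    value: "60s"      # default 30s
  - name: DW_API_LIMIT
    value: "200000"   # per-API requests/second, default 100K
```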
HTTP TLS Settings¶
To generate a TLS certificate valid for one year, you can use the following OpenSSL command:
```shell
# Generate a TLS certificate valid for one year
$ openssl req -new -newkey rsa:4096 -x509 -sha256 -days 365 -nodes -out tls.crt -keyout tls.key
...
```
After executing this command, you will be prompted to enter some necessary information, including your country, region, city, organization name, department name, and email address. This information will be included in your certificate.
After completing the input, two files will be generated: tls.crt (certificate file) and tls.key (private key file). Please keep your private key file secure.
To let Dataway use these TLS certificates, set the absolute paths of the two files in its environment variables. You must first enable DW_ENABLE_TLS; only then do the other two ENVs (DW_TLS_CRT/DW_TLS_KEY) take effect. Version-1.4.1
Below is an example of these environment variables:
```yaml
env:
  - name: DW_ENABLE_TLS
    value: "true"
  - name: DW_TLS_CRT
    value: "/path/to/your/tls.crt"
  - name: DW_TLS_KEY
    value: "/path/to/your/tls.key"
```
Replace /path/to/your/tls.crt and /path/to/your/tls.key with the actual paths of your tls.crt and tls.key files.
After setup, you can test whether TLS works by sending an HTTPS request to the Dataway API port (for example with curl -vvv -k against the bind address). If successful, an ASCII-art message saying It's working! is displayed. If the certificate file does not exist, Dataway cannot start, its logs show a corresponding error, and the curl command fails as well:
```shell
$ curl -vvv -k http://localhost:9528
curl: (7) Failed to connect to localhost port 9528 after 6 ms: Couldn't connect to server
```
Logging Settings¶
Env | Type | Required | Description | Example Value |
---|---|---|---|---|
`DW_LOG` | string | N | Log path, default is `log` | |
`DW_LOG_LEVEL` | string | N | Default is `info` | |
`DW_GIN_LOG` | string | N | Default is `gin.log` | |
Token/UUID Settings¶
Env | Type | Required | Description | Example Value |
---|---|---|---|---|
`DW_UUID` | string | Y | Dataway UUID, generated by the system workspace when creating a new Dataway | |
`DW_TOKEN` | string | Y | Usually the data upload token of the system workspace | |
`DW_SECRET_TOKEN` | string | N | Set this token when enabling the Sinker function | |
`DW_ENABLE_INTERNAL_TOKEN` | boolean | N | Allow `__internal__` as the client token; the system workspace token is then used by default | |
`DW_ENABLE_EMPTY_TOKEN` | boolean | N | Allow uploading data without a token; the system workspace token is then used by default | |
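As with the other settings, these are plain container environment variables. For example, a test deployment that accepts token-less uploads into the system workspace might set (an illustrative sketch, not recommended for production):

```yaml
env:
  - name: DW_ENABLE_EMPTY_TOKEN
    value: "true"   # token-less uploads are directed to the system workspace
```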
Sinker Settings¶
Env | Type | Required | Description | Example Value |
---|---|---|---|---|
`DW_SECRET_TOKEN` | string | N | Set this token when enabling the Sinker function | |
`DW_CASCADED` | string | N | Whether Dataway is cascaded | `true` |
`DW_SINKER_ETCD_URLS` | string | N | List of etcd addresses, comma-separated, e.g., `http://1.2.3.4:2379,http://1.2.3.4:2380` | |
`DW_SINKER_ETCD_DIAL_TIMEOUT` | string | N | etcd connection timeout, default 30s | |
`DW_SINKER_ETCD_KEY_SPACE` | string | N | Name of the etcd key holding the Sinker configuration (default `/dw_sinker`) | |
`DW_SINKER_ETCD_USERNAME` | string | N | etcd username | |
`DW_SINKER_ETCD_PASSWORD` | string | N | etcd password | |
`DW_SINKER_FILE_PATH` | file-path | N | Specify Sinker rule configuration via a local file | |
Warning
If both a local file and etcd methods are specified, the Sinker rules in the local file will take precedence.
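A sketch of the corresponding environment variables for an etcd-backed Sinker (the addresses and credentials below are placeholders):

```yaml
env:
  - name: DW_SECRET_TOKEN
    value: "your-secret-token"   # placeholder
  - name: DW_SINKER_ETCD_URLS
    value: "http://1.2.3.4:2379,http://1.2.3.4:2380"
  - name: DW_SINKER_ETCD_KEY_SPACE
    value: "/dw_sinker"
  - name: DW_SINKER_ETCD_USERNAME
    value: "dataway"
  - name: DW_SINKER_ETCD_PASSWORD
    value: "<PASSWORD>"
```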
Prometheus Metric Exposure¶
Env | Type | Required | Description | Example Value |
---|---|---|---|---|
`DW_PROM_URL` | string | N | URL path for Prometheus metrics (default `/metrics`) | |
`DW_PROM_LISTEN` | string | N | Address exposing Prometheus metrics (default `localhost:9090`) | |
`DW_PROM_DISABLED` | boolean | N | Disable Prometheus metric exposure | `true` |
Disk Cache Settings¶
Env | Type | Required | Description | Example Value |
---|---|---|---|---|
DW_DISKCACHE_DIR | file-path | N | Set the cache directory, this directory should generally be an external storage | path/to/your/cache |
DW_DISKCACHE_DISABLE | boolean | N | Disable disk caching; to keep caching enabled, remove this environment variable | true |
DW_DISKCACHE_CLEAN_INTERVAL | string | N | Cache cleanup interval, default 30s | Duration string |
DW_DISKCACHE_EXPIRE_DURATION | string | N | Cache expiration time, default 168h (7d) | Duration string, e.g., 72h for three days |
DW_DISKCACHE_CAPACITY_MB | int | N | Version-1.6.0 Set the available disk space size in MB, default 20GB | Specifying 1024 equals 1GB |
DW_DISKCACHE_BATCH_SIZE_MB | int | N | Version-1.6.0 Set the maximum size of a single disk cache file in MB, default 64MB | Specifying 1024 equals 1GB |
DW_DISKCACHE_MAX_DATA_SIZE_MB | int | N | Version-1.6.0 Set the maximum size (in MB) of a single cached item (e.g., a single HTTP body), default 64MB. Any packet larger than this size will be discarded | Specifying 1024 equals 1GB |
Tips
Setting DW_DISKCACHE_DISABLE disables disk caching.
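For example, pointing the cache at the mounted volume and capping it at 100 GB could look like this in the Deployment (a sketch; the mount path matches the volumeMounts example above, and the capacity value is illustrative):

```yaml
env:
  - name: DW_DISKCACHE_DIR
    value: "/usr/local/cloudcare/dataflux/dataway/cache"
  - name: DW_DISKCACHE_CAPACITY_MB
    value: "102400"   # 100 GB
```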
Performance-related Settings¶
Env | Type | Required | Description | Example Value |
---|---|---|---|---|
DW_COPY_BUFFER_DROP_SIZE | int | N | HTTP body buffers exceeding the specified size (in bytes) will be immediately cleared to avoid consuming too much memory. Default value is 256KB | 524288 |
Dataway API List¶
Details of the following APIs will be added later.
GET /v1/ntp/¶
- API description: Get the current Unix timestamp (in seconds) of Dataway
POST /v1/write/:category¶
- API description: Receive various types of collected data uploaded by Datakit
GET /v1/datakit/pull¶
- API description: Process Datakit's request to pull central configurations (blacklist/Pipeline)
POST /v1/write/rum/replay¶
- API description: Receive Session Replay data uploaded by Datakit
POST /v1/upload/profiling¶
- API description: Receive Profiling data uploaded by Datakit
POST /v1/election¶
- API description: Process Datakit election requests
POST /v1/election/heartbeat¶
- API description: Process heartbeat requests for Datakit elections
POST /v1/query/raw¶
- API description: Process DQL query requests. A simple example:
POST /v1/query/raw?token=<workspace-token> HTTP/1.1
Content-Type: application/json
{
"token": "workspace-token",
"queries": [
{
"query": "M::cpu LIMIT 1"
}
],
"echo_explain": <true/false>
}
Example response:
{
"content": [
{
"series": [
{
"name": "cpu",
"columns": [
"time",
"usage_iowait",
"usage_total",
"usage_user",
"usage_guest",
"usage_system",
"usage_steal",
"usage_guest_nice",
"usage_irq",
"load5s",
"usage_idle",
"usage_nice",
"usage_softirq",
"global_tag1",
"global_tag2",
"host",
"cpu"
],
"values": [
[
1709782208662,
0,
7.421875,
3.359375,
0,
4.0625,
0,
0,
0,
1,
92.578125,
0,
0,
null,
null,
"WIN-JCHUL92N9IP",
"cpu-total"
]
]
}
],
"points": null,
"cost": "24.558375ms",
"is_running": false,
"async_id": "",
"query_parse": {
"namespace": "metric",
"sources": {
"cpu": "exact"
},
"fields": {},
"funcs": {}
},
"index_name": "",
"index_store_type": "",
"query_type": "guancedb",
"complete": false,
"index_names": "",
"scan_completed": false,
"scan_index": "",
"next_cursor_time": -1,
"sample": 1,
"interval": 0,
"window": 0
}
]
}
Response explanation:
- The real data is located in the inner `series` field.
- `name` is the name of the metric set (here, the CPU metric; for log data this field is absent).
- `columns` lists the returned result column names.
- `values` holds the results corresponding to each column in `columns`.
Info
- The token in the URL request parameter can differ from the token in the JSON body. The former verifies that the query request is legitimate, while the latter determines the target workspace where the data resides.
- The `queries` field can carry multiple queries, each of which can carry additional fields. For the list of available fields, refer to here.
POST /v1/workspace¶
- API description: Process workspace query requests initiated by Datakit
POST /v1/object/labels¶
- API description: Process requests to modify object labels
DELETE /v1/object/labels¶
- API description: Process requests to delete object labels
GET /v1/check/:token¶
- API description: Check if the token is valid
Dataway Metric Collection¶
HTTP Client Metric Collection
To collect metrics about Dataway's HTTP requests to Kodo (or the next-hop Dataway), manually enable the http_client_trace configuration or set the environment variable DW_HTTP_CLIENT_TRACE=true.
Dataway exposes Prometheus metrics, which can be collected with Datakit's built-in prom collector. If a Datakit is deployed in the cluster (version 1.14.2 or higher required), you can enable Prometheus metric exposure in Dataway; the default Pod YAML already includes the following collector configuration:
```yaml
annotations: # The following annotation is already added by default
  datakit/prom.instances: |
    [[inputs.prom]]
      url = "http://$IP:9090/metrics" # The port here (default 9090) depends on your setup
      source = "dataway"
      measurement_name = "dw" # Fixed metric set
      interval = "10s"
      disable_instance_tag = true
      [inputs.prom.tags]
        service = "dataway"
        instance = "$PODNAME"
...
env:
  - name: DW_PROM_LISTEN
    value: "0.0.0.0:9090" # Keep this port consistent with the one in the url above
```
If collection succeeds, you can search for dataway under "Scenarios"/"Built-in Views" in Guance to see the corresponding monitoring views.
Dataway Metric List¶
Below are the metrics exposed by Dataway. You can retrieve them by requesting http://localhost:9090/metrics. You can also poll a specific metric in real time, for example every 3 seconds with watch plus curl.
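A sketch of such a polling command; dataway_cpu_usage is just an example metric name, and 9090 assumes the default listen port:

```shell
# Poll one metric every 3 seconds:
#   watch -n 3 "curl -s http://localhost:9090/metrics | grep dataway_cpu_usage"
# The grep filter, shown offline against a sample /metrics payload:
sample='# HELP dataway_cpu_usage Dataway CPU usage(%)
# TYPE dataway_cpu_usage gauge
dataway_cpu_usage 3.25'
printf '%s\n' "$sample" | grep '^dataway_cpu_usage'   # prints: dataway_cpu_usage 3.25
```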
Some metrics may be absent if their related business modules have not run yet. Some metrics only exist in the latest versions; per-metric version information is not listed here. For the authoritative list, refer to the metrics returned by the /metrics interface.
TYPE | NAME | LABELS | HELP |
---|---|---|---|
SUMMARY | `dataway_http_api_elapsed_seconds` | `api,method,status` | API request latency |
SUMMARY | `dataway_http_api_body_buffer_utilization` | `api` | API body buffer utilization (Len/Cap) |
SUMMARY | `dataway_http_api_body_copy` | `api` | API body copy |
SUMMARY | `dataway_http_api_resp_size_bytes` | `api,method,status` | API response size |
SUMMARY | `dataway_http_api_req_size_bytes` | `api,method,status` | API request size |
COUNTER | `dataway_http_api_total` | `api,status` | API request count |
COUNTER | `dataway_http_api_body_too_large_dropped_total` | `api,method` | API request too large dropped |
COUNTER | `dataway_http_api_with_inner_token` | `api,method` | API request with inner token |
COUNTER | `dataway_http_api_dropped_total` | `api,method` | API request dropped when sinker rule match failed |
COUNTER | `dataway_syncpool_stats` | `name,type` | sync.Pool usage stats |
COUNTER | `dataway_http_api_copy_body_failed_total` | `api` | API copy body failed count |
COUNTER | `dataway_http_api_signed_total` | `api,method` | API signature count |
SUMMARY | `dataway_http_api_cached_bytes` | `api,cache_type,method,reason` | API cached body bytes |
SUMMARY | `dataway_http_api_reusable_body_read_bytes` | `api,method` | API re-read body on forking request |
SUMMARY | `dataway_http_api_recv_points` | `api` | API /v1/write/:category received points |
SUMMARY | `dataway_http_api_send_points` | `api` | API /v1/write/:category send points |
SUMMARY | `dataway_http_api_cache_points` | `api,cache_type` | Disk cached /v1/write/:category points |
SUMMARY | `dataway_http_api_cache_cleaned_points` | `api,cache_type,status` | Disk cache cleaned /v1/write/:category points |
COUNTER | `dataway_http_api_forked_total` | `api,method,token` | API request forked total |
GAUGE | `dataway_http_info` | `cascaded,docker,http_client_trace,listen,max_body,release_date,remote,version` | Dataway API basic info |
GAUGE | `dataway_last_heartbeat_time` | N/A | Dataway last heartbeat with Kodo timestamp |
GAUGE | `dataway_cpu_usage` | N/A | Dataway CPU usage(%) |
GAUGE | `dataway_mem_stat` | `type` | Dataway memory usage stats |
SUMMARY | `dataway_http_api_copy_buffer_drop_total` | `max` | API copy buffer dropped (too large cached buffer) count |
GAUGE | `dataway_open_files` | N/A | Dataway open files |
GAUGE | `dataway_cpu_cores` | N/A | Dataway CPU cores |
GAUGE | `dataway_uptime` | N/A | Dataway uptime |
COUNTER | `dataway_process_ctx_switch_total` | `type` | Dataway process context switch count (Linux only) |
COUNTER | `dataway_process_io_count_total` | `type` | Dataway process IO count |
COUNTER | `dataway_process_io_bytes_total` | `type` | Dataway process IO bytes count |
SUMMARY | `dataway_http_api_dropped_expired_cache` | `api,method` | Dropped expired cache data |
SUMMARY | `dataway_httpcli_tls_handshake_seconds` | `server` | HTTP TLS handshake cost |
SUMMARY | `dataway_httpcli_http_connect_cost_seconds` | `server` | HTTP connect cost |
SUMMARY | `dataway_httpcli_got_first_resp_byte_cost_seconds` | `server` | Got first response byte cost |
SUMMARY | `http_latency` | `api,server` | HTTP latency |
COUNTER | `dataway_httpcli_tcp_conn_total` | `server,remote,type` | HTTP TCP connection count |
COUNTER | `dataway_httpcli_conn_reused_from_idle_total` | `server` | HTTP connection reused from idle count |
SUMMARY | `dataway_httpcli_conn_idle_time_seconds` | `server` | HTTP connection idle time |
SUMMARY | `dataway_httpcli_dns_cost_seconds` | `server` | HTTP DNS cost |
SUMMARY | `dataway_sinker_rule_cost_seconds` | N/A | Rule cost time seconds |
SUMMARY | `dataway_sinker_cache_key_len` | N/A | cache key length (bytes) |
SUMMARY | `dataway_sinker_cache_val_len` | N/A | cache value length (bytes) |
COUNTER | `dataway_sinker_pull_total` | `event,source` | Sinker pulled or pushed counter |
GAUGE | `dataway_sinker_rule_cache_miss` | N/A | Sinker rule cache miss |
GAUGE | `dataway_sinker_rule_cache_hit` | N/A | Sinker rule cache hit |
GAUGE | `dataway_sinker_rule_cache_size` | N/A | Sinker rule cache size |
GAUGE | `dataway_sinker_rule_error` | `error` | Rule errors |
GAUGE | `dataway_sinker_default_rule_hit` | `info` | Default sinker rule hit count |
GAUGE | `dataway_sinker_rule_last_applied_time` | `source` | Rule last applied time (Unix timestamp) |
COUNTER | `diskcache_put_bytes_total` | `path` | Cache Put() bytes count |
COUNTER | `diskcache_get_total` | `path` | Cache Get() count |
COUNTER | `diskcache_wakeup_total` | `path` | Wakeup count on sleeping write file |
COUNTER | `diskcache_seek_back_total` | `path` | Seek back when Get() got any error |
COUNTER | `diskcache_get_bytes_total` | `path` | Cache Get() bytes count |
GAUGE | `diskcache_capacity` | `path` | Current capacity (in bytes) |
GAUGE | `diskcache_max_data` | `path` | Max data to Put (in bytes), default 0 |
GAUGE | `diskcache_batch_size` | `path` | Data file size (in bytes) |
GAUGE | `diskcache_size` | `path` | Current cache size (in bytes) |
GAUGE | `diskcache_open_time` | `no_fallback_on_error,no_lock,no_pos,no_sync,path` | Current cache Open time in unix timestamp (second) |
GAUGE | `diskcache_last_close_time` | `path` | Current cache last Close time in unix timestamp (second) |
GAUGE | `diskcache_datafiles` | `path` | Current un-read data files |
SUMMARY | `diskcache_get_latency` | `path` | Get() time cost (micro-second) |
SUMMARY | `diskcache_put_latency` | `path` | Put() time cost (micro-second) |
COUNTER | `diskcache_dropped_bytes_total` | `path` | Dropped bytes during Put() when capacity reached |
COUNTER | `diskcache_dropped_total` | `path,reason` | Dropped files during Put() when capacity reached |
COUNTER | `diskcache_rotate_total` | `path` | Cache rotate count, meaning a file rotated from data to data.0000xxx |
COUNTER | `diskcache_remove_total` | `path` | Removed file count; when a file reads EOF, it is removed from the un-read list |
COUNTER | `diskcache_put_total` | `path` | Cache Put() count |
Metric Collection in Docker Mode¶
There are two host-installation modes: native host installation and Docker installation. Here we explain how metric collection differs when installing via Docker.
When installing via Docker, the HTTP port exposing metrics is mapped (by default) to port 19090 on the host machine, so the metric collection address becomes http://localhost:19090/metrics.
If a different port is specified, then during Docker installation, the port will be increased by 10000. Therefore, ensure that the specified port does not exceed 45535.
In addition, during Docker installation, the profile collection port will also be exposed. By default, it is mapped to port 16060 on the host machine. Its mechanism is similar, adding 10000 to the specified port.
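The mapping rule can be sketched as simple arithmetic (the default 9090/6060 ports map to 19090/16060):

```shell
# Docker installs map each exposed port to host port + 10000
metrics_port=9090
profile_port=6060
echo "metrics -> $((metrics_port + 10000))"   # metrics -> 19090
echo "profile -> $((profile_port + 10000))"   # profile -> 16060
```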
Self-logging Collection and Processing of Dataway¶
Dataway's own logging is divided into two categories: one is gin logging, and the other is the program's own logging. The following Pipeline can separate them:
```
# Pipeline for dataway logging
# Testing sample logging
'''
2023-12-14T11:27:06.744+0800 DEBUG apis apis/api_upload_profile.go:272 save profile file to disk [ok] /v1/upload/profiling?token=****************a4e3db8481c345a94fe5a
[GIN] 2021/10/25 - 06:48:07 | 200 | 30.890624ms | 114.215.200.73 | POST "/v1/write/logging?token=tkn_5c862a11111111111111111111111111"
'''

add_pattern("TOKEN", "tkn_\\w+")
add_pattern("GINTIME", "%{YEAR}/%{MONTHNUM}/%{MONTHDAY}%{SPACE}-%{SPACE}%{HOUR}:%{MINUTE}:%{SECOND}")
grok(_, "\\[GIN\\]%{SPACE}%{GINTIME:timestamp}%{SPACE}\\|%{SPACE}%{NUMBER:dataway_code}%{SPACE}\\|%{SPACE}%{NOTSPACE:cost_time}%{SPACE}\\|%{SPACE}%{NOTSPACE:client_ip}%{SPACE}\\|%{SPACE}%{NOTSPACE:method}%{SPACE}%{GREEDYDATA:http_url}")

# gin logging
if cost_time != nil {
  if http_url != nil {
    grok(http_url, "%{TOKEN:token}")
    cover(token, [5, 15])
    replace(message, "tkn_\\w{0,5}\\w{6}", "****************")
    replace(http_url, "tkn_\\w{0,5}\\w{6}", "****************")
  }

  group_between(dataway_code, [200, 299], "info", status)
  group_between(dataway_code, [300, 399], "notice", status)
  group_between(dataway_code, [400, 499], "warning", status)
  group_between(dataway_code, [500, 599], "error", status)

  if sample(0.1) { # drop 90% of the logs
    drop()
    exit()
  } else {
    set_tag(sample_rate, "0.1")
  }

  parse_duration(cost_time)
  duration_precision(cost_time, "ns", "ms")

  set_measurement('gin', true)
  set_tag(service, "dataway")
  exit()
}

# app logging
if cost_time == nil {
  grok(_, "%{TIMESTAMP_ISO8601:timestamp}%{SPACE}%{NOTSPACE:status}%{SPACE}%{NOTSPACE:module}%{SPACE}%{NOTSPACE:code}%{SPACE}%{GREEDYDATA:msg}")

  if status == nil {
    grok(message, "Error%{SPACE}%{DATA:errormsg}")
    if errormsg != nil {
      add_key(status, "error")
      drop_key(errormsg)
    }
  }

  lowercase(status)

  # if debug level enabled, drop most of them
  if status == 'debug' {
    if sample(0.1) { # drop 90% debug log
      drop()
      exit()
    } else {
      set_tag(sample_rate, "0.1")
    }
  }

  group_in(status, ["error", "panic", "dpanic", "fatal", "err", "fat"], "error", status) # mark them as 'error'

  if msg != nil {
    grok(msg, "%{TOKEN:token}")
    cover(token, [5, 15])
    replace(message, "tkn_\\w{0,5}\\w{6}", "****************")
    replace(msg, "tkn_\\w{0,5}\\w{6}", "****************")
  }

  set_measurement("dataway-log", true)
  set_tag(service, "dataway")
}
```
Dataway Bug Report¶
Dataway itself exposes metrics and profiling collection endpoints, which we can gather to aid in troubleshooting.
The following information gathering assumes default configured ports and addresses; adjust accordingly based on your actual setup.
```shell
br_dir="dw-br-$(date +%s)"
mkdir -p $br_dir
echo "save bug report to ${br_dir}"

# Modify the following configurations as needed
dw_ip="localhost" # IP address where Dataway metrics/profile are exposed
metric_port=9090  # Port for metrics exposure
profile_port=6060 # Port for profile exposure
dw_yaml_conf="/usr/local/cloudcare/dataflux/dataway/dataway.yaml"
dw_dot_yaml_conf="/usr/local/cloudcare/dataflux/dataway/.dataway.yaml" # Present in container installations

# Collect runtime metrics
curl -v "http://${dw_ip}:${metric_port}/metrics" -o $br_dir/metrics

# Collect profiling information
curl -v "http://${dw_ip}:${profile_port}/debug/pprof/allocs" -o $br_dir/allocs
curl -v "http://${dw_ip}:${profile_port}/debug/pprof/heap" -o $br_dir/heap
curl -v "http://${dw_ip}:${profile_port}/debug/pprof/profile" -o $br_dir/profile # This command runs for about 30 seconds

cp $dw_yaml_conf $br_dir/dataway.yaml.copy
cp $dw_dot_yaml_conf $br_dir/.dataway.yaml.copy

tar czvf ${br_dir}.tar.gz ${br_dir}
rm -rf ${br_dir}
```
Run the script; after it completes, a file like dw-br-1721188604.tar.gz is generated. Retrieve this file for analysis.
FAQ¶
Request Body Too Large Issue¶
Dataway has a default limit on request body size (64MB). When a request body is too large, the client receives an HTTP 413 error (Request Entity Too Large). If such request bodies are reasonably sized, you can increase the limit appropriately (in bytes):
- Set the environment variable DW_MAX_HTTP_BODY_BYTES, or
- Set max_http_body_bytes in dataway.yaml
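Since the value is in bytes, it helps to compute it explicitly; for example, raising the limit to 128 MB (an illustrative value):

```shell
# 128 MB in bytes, usable for DW_MAX_HTTP_BODY_BYTES or max_http_body_bytes
echo $((128 * 1024 * 1024))   # 134217728
```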
If oversized request packets occur during operation, they show up in both metrics and logs:
- The metric dataway_http_api_body_too_large_dropped_total counts dropped oversized requests.
- Search the Dataway logs with cat log | grep 'drop too large request'; the log entries include detailed HTTP request headers for further client-side analysis.
Warning
The disk-cache module also has a maximum data-block write limit (default 64MB). If you increase the maximum request body size, be sure to adjust this setting as well (DW_DISKCACHE_MAX_DATA_SIZE_MB) so that large requests can still be written to the disk cache.
- This limitation avoids the Dataway container/Pod running into system-imposed connection limits (only about 20,000 connections would be allowed). Increasing the limit may affect Dataway's data upload efficiency; when Dataway traffic is high, consider increasing the CPU allocation of a single Dataway or scaling Dataway horizontally. ↩