Dataway
Introduction
Dataway is the data gateway of Guance. Collectors report data to Guance through the Dataway gateway.
Dataway Installation
- New Dataway
On the Data Gateways page in the Guance management console, click Create Dataway. Enter a name and binding address, and then click Create.
Once creation succeeds, the new Dataway is added and its installation script is generated.
Info
The binding address is the Dataway gateway address. It must be a complete HTTP address that includes the protocol, host, and port, such as http(s)://1.2.3.4:9528. The host is usually the IP address of the machine where Dataway is deployed; it can also be a domain name, in which case the domain name must be resolvable.
Note: Make sure the collector can reach this address; otherwise data collection will fail.
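Before entering a binding address, it can help to check its shape. The sketch below is a hypothetical helper (is_valid_dw_bind_addr is not part of Dataway or its installer) that only checks the documented form http(s)://host:port with a port in the TCP range; it does not check DNS resolution or reachability.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check that a Dataway binding address has the documented
# form http(s)://host:port. It does not resolve the host or test connectivity.
is_valid_dw_bind_addr() {
  local addr="$1"
  # protocol, then a host (IP or domain) without '/' or ':', then a numeric port
  local re='^https?://([^/:]+):([0-9]+)$'
  [[ $addr =~ $re ]] || return 1
  local port="${BASH_REMATCH[2]}"
  # the port must lie in the valid TCP range
  (( 10#$port >= 1 && 10#$port <= 65535 ))
}
```

For example, `is_valid_dw_bind_addr "http://1.2.3.4:9528" && echo ok` prints ok, while `1.2.3.4:9528` (no protocol) or `http://1.2.3.4` (no port) are rejected.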
- Install Dataway
DW_KODO=http://kodo_ip:port \
DW_TOKEN=<tkn_XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX> \
DW_UUID=<YOUR_UUID> \
bash -c "$(curl https://static.guance.com/dataway/install.sh)"
Installing Dataway on a host is deprecated; install Dataway in Kubernetes as a StatefulSet instead.
After installation, dataway.yaml is generated in the installation directory. You can edit it manually; changes take effect after the service is restarted.
dataway.yaml
# ============= DATAWAY CONFIG =============
# Dataway UUID, obtained when creating a new Dataway
uuid:
# The workspace token; most of the time it is the
# system workspace's token.
token:
# secret_token is used in sinker mode, and to check whether incoming DataKit
# requests are valid.
secret_token:
# Is the __internal__ token allowed? If so, the data/request is directed to
# the workspace with the token above
enable_internal_token: false
# Is an empty token allowed? If so, the data/request is directed to
# the workspace with the token above
enable_empty_token: false
# Is Dataway cascaded? For a cascaded Dataway, its remote_host is
# another Dataway rather than Kodo.
cascaded: false
# Kodo (or next Dataway) related configuration
remote_host:
http_timeout: 3s
http_max_idle_conn_perhost: 0 # default to CPU cores
http_max_conn_perhost: 0      # default no limit
kodo_queue:
  enabled: true
  workers: 256
  queue_size: 1024
  queue_max_bytes: 1073741824 # 1GB
  enqueue_timeout: 100ms
insecure_skip_verify: false
http_client_trace: false
sni: ""
# Dataway API configuration
bind: 0.0.0.0:9528
# disable 404 page
disable_404page: false
# Dataway TLS file paths
tls_crt:
tls_key:
# enable pprof
pprof_bind: localhost:6060
api_limit_rate: 100000        # 100K
max_http_body_bytes: 67108864 # 64MB
copy_buffer_drop_size: 262144 # 256KB; copy buffer memory larger than this is released
reserved_pool_size: 4096      # reserved pool size for better GC
within_docker: false
log_level: info
log: log
gin_log: gin.log
ip_blacklist:
  ttl: "1m"
  clean_interval: "1h"
cache_cfg:
  # cache disk path
  dir: "disk_cache"
  # disable cache
  disabled: false
  clean_interval: "1s"
  # in MB, max single data package size in disk cache, such as HTTP body
  max_data_size: 100
  # in MB, single disk-batch (single file) size
  batch_size: 128
  # in MB, max disk size allowed to cache data
  max_disk_size: 65535
  # expire duration, default 7 days
  expire_duration: "168h"
prometheus:
  listen: "localhost:9090"
  url: "/metrics"
  enable: true
#sinker:
#  cache_options:
#    prealloc: true
#    reserved_capacity: 10000000 # max cached items
#    buckets: 64
#    ttl: 10m # clear inactive matches
#  etcd:
#    urls:
#    - http://localhost:2379 # one or multiple etcd hosts
#    dial_timeout: 30s
#    key_space: "/dw_sinker" # subscribe to the etcd key
#    username: "dataway"
#    password: "<PASSWORD>"
#  file:
#    path: /path/to/sinker.json
Dataway pod yaml:
dataway-statefulset.yaml
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app: sts-utils-dataway
  name: dataway
  namespace: utils
spec:
  replicas: 2
  selector:
    matchLabels:
      app: sts-utils-dataway
  serviceName: dataway
  template:
    metadata:
      annotations:
        datakit/logs: |
          [
            {
              "disable": false,
              "source": "dataway",
              "service": "dataway",
              "multiline_match": "^\\d{4}|^\\[GIN\\]"
            }
          ]
        datakit/prom.instances: |
          [[inputs.prom]]
            url = "http://$IP:9090/metrics"
            source = "dataway"
            measurement_name = "dw"
            interval = "10s"
            disable_instance_tag = true
          [inputs.prom.tags]
            service = "dataway"
            instance = "$PODNAME" # we can set as "xxx-$PODNAME"
      labels:
        app: sts-utils-dataway
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - sts-utils-dataway
            topologyKey: kubernetes.io/hostname
      containers:
      - env:
        - name: DW_REMOTE_HOST
          value: http://kodo.forethought-kodo:9527
        - name: DW_BIND
          value: 0.0.0.0:9528
        - name: DW_UUID
          value: agnt_xxxxxxxxxxxxxxxxxxxxxxxxxxxxx # Dataway UUID
        - name: DW_TOKEN
          value: tkn_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx # Dataway token
        - name: DW_PROM_LISTEN
          value: 0.0.0.0:9090
        - name: DW_LOG
          value: stdout
        - name: DW_LOG_LEVEL
          value: info
        - name: DW_GIN_LOG
          value: stdout
        - name: DW_DISKCACHE_DIR
          value: cache
        - name: DW_HTTP_TIMEOUT
          value: '3s'
        - name: DW_ENABLE_INTERNAL_TOKEN
          value: 'false'
        - name: DW_MAX_HTTP_BODY_BYTES
          value: '67108864'
        - name: DW_HTTP_CLIENT_TRACE
          value: 'true'
        - name: DW_RESERVED_POOL_SIZE
          value: '0'
        - name: DW_COPY_BUFFER_DROP_SIZE
          value: '262144'
        - name: DW_DISKCACHE_CAPACITY_MB
          value: '102400' # env values must be strings
        image: pubrepo.guance.com/dataflux/dataway:1.15.0
        imagePullPolicy: IfNotPresent
        name: dataway
        ports:
        - containerPort: 9528
          name: 9528tcp01
          protocol: TCP
        resources:
          limits:
            cpu: '4'
            memory: 4Gi
          requests:
            cpu: 100m
            memory: 512Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /usr/local/cloudcare/dataflux/dataway/cache
          name: dataway-cache
      dnsPolicy: ClusterFirst
      imagePullSecrets: []
      #nodeSelector:
      #  nodepool: dataway
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      #tolerations:
      #- effect: NoSchedule
      #  key: nodepool
      #  operator: Equal
      #  value: dataway
  updateStrategy:
    rollingUpdate:
      partition: 0
    type: RollingUpdate
  volumeClaimTemplates:
  - apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: dataway-cache
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 100Gi
      storageClassName: xxxxxx # High-Performance Storage StorageClass
      volumeMode: Filesystem
---
apiVersion: v1
kind: Service
metadata:
  name: dataway
  namespace: utils
spec:
  ports:
  - name: 9528tcp02
    nodePort: 30928
    port: 9528
    protocol: TCP
    targetPort: 9528
  selector:
    app: sts-utils-dataway
  type: NodePort
In dataway-statefulset.yaml, Dataway is configured through environment variables; see [here](dataway.md#img-envs) for the full list.
For a Docker installation, Dataway uses the same environment variables as in Kubernetes. A Dataway container can be started like this:
docker run -d \
--name <YOUR-DW-IN-DOCKER> \
-p 19528:9528 -p 19090:9090 \
--mount type=bind,source=<host/path/for/diskcache>,target=/usr/local/cloudcare/dataflux/dataway/cache \
--memory=2g --memory-reservation=256m \
--cpus="2" \
-e DW_UUID=<YOUR-AGNT_XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX> \
-e DW_TOKEN=<YOUR-TKN_XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX> \
-e DW_REMOTE_HOST=http://kodo.forethought-kodo:9527 \
-e DW_BIND=0.0.0.0:9528 \
-e DW_PROM_LISTEN=0.0.0.0:9090 \
-e DW_HTTP_CLIENT_TRACE=true \
-e DW_LOG_LEVEL=info \
-e DW_LOG=stdout \
-e DW_GIN_LOG=stdout \
-e DW_DISKCACHE_DIR=cache \
-e DW_DISKCACHE_CAPACITY_MB=65536 \
pubrepo.guance.com/dataflux/dataway:1.15.0
Notes
- Dataway runs on Linux only
- For host installations, Dataway is installed in /usr/local/cloudcare/dataflux/dataway
- Verify Dataway installation
After installation, wait a moment and refresh the "Data Gateway" page. If the "Version Information" column of the newly added gateway shows a version number, Dataway has connected to the Guance center, and users can start sending data through it.
Once Dataway is connected to the Guance center, log in to the Guance console, open the "Integration" / DataKit page to see all Dataway addresses, select the gateway address you need, and copy the DataKit installation command. Run it on your server to start collecting data.
Manage Dataway
Delete Dataway
On the "Data Gateway" page of the Guance management console, select the Dataway to delete, click "Configure", and then click the "Delete" button in the lower-left corner of the Edit Dataway dialog box.
Warning
After deleting a Dataway in the console, also log in to the server where the Dataway gateway is deployed, stop Dataway, and then delete its installation directory to remove it completely.
Upgrade Dataway
On the Data Gateways page of the Guance management console, if an upgrade is available for a Dataway, an upgrade prompt appears in its version information.
Dataway Service Management
For host installations, manage the Dataway service with the following commands:
# Start
$ systemctl start dataway
# Restart
$ systemctl restart dataway
# Stop
$ systemctl stop dataway
In Kubernetes, restart the corresponding Pod instead.
Environment Variables
Docker Image Environment Variables
When Dataway runs in Kubernetes (or in Docker), it supports the following environment variables.
Compatibility with legacy dataway.yaml
Some older Dataway deployments import their configuration through a ConfigMap mounted into the installation path as dataway.yaml.
When the Dataway image starts and detects a dataway.yaml file, all DW_* environment variables are ignored and only the legacy dataway.yaml is applied. Remove the dataway.yaml ConfigMap to make the environment variables take effect again.
When the environment variables are applied, a hidden file .dataway.yaml (list it with ls -a) is generated in the installation path; cat it to check that all environment variables were applied correctly.
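To check which of the two cases applies, the rules above can be expressed as a small check on the installation directory. This is a hypothetical helper (dw_config_source is not a Dataway command); it only looks for the two files described above and assumes the default installation path.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report which configuration a Dataway installation
# directory indicates, following the compatibility rules described above.
dw_config_source() {
  local dir="${1:-/usr/local/cloudcare/dataflux/dataway}"
  if [[ -f "$dir/dataway.yaml" ]]; then
    echo "legacy"    # dataway.yaml present: DW_* variables are ignored
  elif [[ -f "$dir/.dataway.yaml" ]]; then
    echo "env"       # hidden file generated from the DW_* variables
  else
    echo "unknown"   # neither file found (wrong path, or Dataway not started yet)
  fi
}
```

Run it wherever the installation directory is visible. In Kubernetes, the same information can be obtained with `kubectl -n utils exec dataway-0 -- ls -a /usr/local/cloudcare/dataflux/dataway`, assuming the StatefulSet name and namespace from the example manifest.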
HTTP Server Settings
| Env | Type | Required | Description |
|---|---|---|---|
| DW_REMOTE_HOST | string | Y | Kodo address, or the next Dataway address, in the form http://host:port |
| DW_WHITE_LIST | string | N | Dataway client IP whitelist, comma-separated |
| DW_HTTP_TIMEOUT | string | N | Timeout for Dataway requests to Kodo or the next Dataway, default 3s |
| DW_HTTP_MAX_IDLE_CONN_PERHOST | int | N | Maximum idle connections from Dataway to Kodo (Version-1.6.2); default 1000 (Version-1.11.2) |
| DW_HTTP_MAX_CONN_PERHOST | int | N | Maximum connections from Dataway to Kodo, default unlimited (Version-1.6.2) |
| DW_KODO_QUEUE_ENABLED | boolean | N | Enable the bounded dispatch queue for write/upload requests sent to Kodo or the next Dataway, default true |
| DW_KODO_QUEUE_WORKERS | int | N | Number of Kodo dispatch queue workers, default 256 |
| DW_KODO_QUEUE_SIZE | int | N | Maximum number of requests waiting in the Kodo dispatch queue, default 1024 |
| DW_KODO_QUEUE_MAX_BYTES | int/string | N | Maximum total request body bytes reserved by queued, in-flight, and waiting-to-enqueue Kodo queue requests. Accepts values such as 1073741824, 1GB, or 1024MB; default 1GB |
| DW_KODO_QUEUE_ENQUEUE_TIMEOUT | string | N | Maximum time a request waits for a Kodo queue slot before Dataway returns 503, default 100ms |
| DW_BIND | string | N | Dataway HTTP API binding address, default 0.0.0.0:9528 |
| DW_API_LIMIT | int | N | Dataway API rate limit; e.g., 1000 means each API accepts at most 1000 requests per second. Default 100K |
| DW_HEARTBEAT | string | N | Interval of Dataway heartbeats to the center, default 60s |
| DW_MAX_HTTP_BODY_BYTES | int | N | Maximum HTTP body size (in bytes) accepted by the Dataway API, default 64MB |
| DW_TLS_INSECURE_SKIP_VERIFY | boolean | N | Ignore HTTPS/TLS certificate errors |
| DW_HTTP_CLIENT_TRACE | boolean | N | Collect metrics on Dataway's own outgoing HTTP client requests; they appear in its Prometheus metrics |
| DW_ENABLE_TLS | boolean | N | Enable HTTPS (Version-1.4.1) |
| DW_TLS_CRT | file-path | N | Path of the HTTPS/TLS certificate (crt) file (Version-1.4.0) |
| DW_TLS_KEY | file-path | N | Path of the HTTPS/TLS key file (Version-1.4.0) |
| DW_SNI | string | N | SNI information of the current Dataway (Version-1.6.0) |
| DW_DISABLE_404PAGE | boolean | N | Disable the 404 page (Version-1.6.1) |
| DW_HTTP_IP_BLACKLIST_TTL | string | N | IP blacklist TTL, default 1m (Version-1.11.0) |
| DW_HTTP_IP_BLACKLIST_CLEAN_INTERVAL | string | N | IP blacklist cleanup interval, default 1h (Version-1.11.0) |
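DW_KODO_QUEUE_MAX_BYTES accepts either a plain byte count or a value with a unit. The hypothetical to_bytes helper below converts the documented forms (1073741824, 1GB, 1024MB) to bytes, for example to compare the limit against memory limits. It assumes binary units (1MB = 1048576 bytes), which is consistent with the documented default 1GB = 1073741824; it is not Dataway's own parser.

```bash
#!/usr/bin/env bash
# Hypothetical helper: convert a size such as 1073741824, 1GB or 1024MB to bytes.
# Assumes binary units (1KB = 1024 bytes); other units are rejected.
to_bytes() {
  local v="${1^^}"                      # upper-case the input (bash 4+)
  local re='^([0-9]+)(B|KB|MB|GB)?$'
  [[ $v =~ $re ]] || return 1
  local n="${BASH_REMATCH[1]}" unit="${BASH_REMATCH[2]}"
  case "$unit" in
    ""|B) echo $(( 10#$n )) ;;
    KB)   echo $(( 10#$n * 1024 )) ;;
    MB)   echo $(( 10#$n * 1024 * 1024 )) ;;
    GB)   echo $(( 10#$n * 1024 * 1024 * 1024 )) ;;
  esac
}
```

For example, `to_bytes 512MB` prints 536870912. Because the limit covers queued, in-flight, and waiting-to-enqueue request bodies together, it should fit comfortably within the container's memory limit.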
HTTP TLS Settings
To generate a TLS certificate with a one-year validity period, you can use the following OpenSSL command:
# Generate a TLS certificate with a one-year validity
$ openssl req -new -newkey rsa:4096 -x509 -sha256 -days 365 -nodes -out tls.crt -keyout tls.key
...
After you run this command, OpenSSL prompts you for some information, including your country, region, city, organization name, department name, and email address. This information is included in the certificate.
Once you have entered it, two files are generated: tls.crt (the certificate file) and tls.key (the private key file). Keep the private key file secure.
For Dataway to use these files, set their absolute paths in Dataway's environment variables, for example:
DW_ENABLE_TLS must be enabled first; only then do DW_TLS_CRT and DW_TLS_KEY take effect (Version-1.4.1).
env:
- name: DW_ENABLE_TLS
  value: "true"
- name: DW_TLS_CRT
  value: "/path/to/your/tls.crt"
- name: DW_TLS_KEY
  value: "/path/to/your/tls.key"
Replace /path/to/your/tls.crt and /path/to/your/tls.key with the actual paths to your tls.crt and tls.key files, respectively.
After setting up the environment variables, you can test whether TLS is functioning correctly with the following command:
If successful, you should see an ASCII art message indicating "It's working!" If the certificates are missing, you might encounter an error in the Dataway logs similar to:
In this case, Dataway would not start, and the curl command would also result in an error:
Ensure that the paths to the TLS certificate and key are correctly specified and that the files have the appropriate permissions for the application to read them.
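A quick pre-flight check can catch most of these problems before Dataway is restarted. The sketch below is a hypothetical helper (check_dw_tls_files is not part of Dataway): it verifies that both paths are absolute and readable and that the files contain PEM blocks of the expected type. It does not verify that the key matches the certificate.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for the files passed via DW_TLS_CRT / DW_TLS_KEY.
# Checks absolute paths, readability and PEM markers only.
check_dw_tls_files() {
  local crt="$1" key="$2" f
  for f in "$crt" "$key"; do
    [[ "$f" == /* ]] || { echo "not an absolute path: $f" >&2; return 1; }
    [[ -r "$f" ]]    || { echo "missing or unreadable: $f" >&2; return 1; }
  done
  # the certificate must contain a PEM certificate block
  grep -q -- "-----BEGIN CERTIFICATE-----" "$crt" || { echo "no certificate in $crt" >&2; return 1; }
  # matches both "BEGIN PRIVATE KEY" and "BEGIN RSA PRIVATE KEY"
  grep -q -- "PRIVATE KEY-----" "$key" || { echo "no private key in $key" >&2; return 1; }
}
```

For example, `check_dw_tls_files /path/to/your/tls.crt /path/to/your/tls.key`. For an RSA key such as the one generated above, you can additionally confirm that the key belongs to the certificate: the outputs of `openssl x509 -noout -modulus -in tls.crt | openssl md5` and `openssl rsa -noout -modulus -in tls.key | openssl md5` should be identical.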
Log Settings
| Env | Type | Required | Description |
|---|---|---|---|
| DW_LOG | string | N | Log path, default log. Set to stdout to write logs to standard output for easier log collection |
| DW_LOG_LEVEL | string | N | Log level, default info; debug is also available |
| DW_GIN_LOG | string | N | Gin log path, default gin.log; can also be set to stdout for easier collection |
| DW_LOG_PKG_ID | bool | N | Whether to include the package ID in logs, default true (Version-1.12.0) |
Token/UUID Settings
| Env | Type | Required | Description |
|---|---|---|---|
| DW_UUID | string | Y | Dataway UUID, generated by the system workspace when a new Dataway is created |
| DW_TOKEN | string | Y | Usually the data upload token of the system workspace |
| DW_SECRET_TOKEN | string | N | Can be set when the Sinker feature is enabled |
| DW_ENABLE_INTERNAL_TOKEN | boolean | N | Allow clients to use __internal__ as the token; such data goes to the workspace of DW_TOKEN (usually the system workspace) |
| DW_ENABLE_EMPTY_TOKEN | boolean | N | Allow data upload without a token; such data goes to the workspace of DW_TOKEN (usually the system workspace) |
Sinker Settings
| Env | Type | Required | Description |
|---|---|---|---|
| DW_SECRET_TOKEN | string | N | Can be set when the Sinker feature is enabled |
| DW_CASCADED | string | N | Whether this Dataway is cascaded (its remote host is another Dataway rather than Kodo) |
| DW_SINKER_ETCD_URLS | string | N | etcd address list, comma-separated, e.g., http://1.2.3.4:2379,http://1.2.3.4:2380 |
| DW_SINKER_ETCD_DIAL_TIMEOUT | string | N | etcd connection timeout, default 30s |
| DW_SINKER_ETCD_KEY_SPACE | string | N | etcd key holding the Sinker configuration, default /dw_sinker |
| DW_SINKER_ETCD_USERNAME | string | N | etcd username |
| DW_SINKER_ETCD_PASSWORD | string | N | etcd password |
| DW_SINKER_FILE_PATH | file-path | N | Load Sinker rules from a local file |
| DW_SINKER_CACHE_BUCKETS | int | N* | Number of Sinker cache buckets, default 64 (Version-1.11.0) |
| DW_SINKER_CACHE_RESERVED_CAPACITY | int | N* | Maximum Sinker cache capacity, default about 1 million (1<<20) (Version-1.11.0) |
| DW_SINKER_CACHE_TTL | duration | N* | TTL of Sinker cache entries, default 10m (10 minutes) (Version-1.11.0) |
| DW_SINKER_CACHE_PREALLOC | bool | N* | Preallocate cache memory, default false (Version-1.11.0) |
Warning
If both local file and etcd methods are specified, the Sinker rules in the local file take priority.
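The precedence rule above can be summarized as a small decision over the environment variables. The sketch below is a hypothetical helper (dw_sinker_rule_source is not part of Dataway) that reports which rule source would be used according to the warning, and counts the comma-separated etcd endpoints.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report the Sinker rule source implied by the DW_SINKER_*
# variables. Per the warning above, a local rule file takes priority over etcd.
dw_sinker_rule_source() {
  if [[ -n "${DW_SINKER_FILE_PATH:-}" ]]; then
    echo "file:${DW_SINKER_FILE_PATH}"
  elif [[ -n "${DW_SINKER_ETCD_URLS:-}" ]]; then
    local urls
    IFS=',' read -r -a urls <<< "${DW_SINKER_ETCD_URLS}"   # comma-separated list
    echo "etcd:${#urls[@]} endpoint(s), key ${DW_SINKER_ETCD_KEY_SPACE:-/dw_sinker}"
  else
    echo "none"
  fi
}
```

For example, with only `DW_SINKER_ETCD_URLS=http://1.2.3.4:2379,http://1.2.3.4:2380` set, it prints `etcd:2 endpoint(s), key /dw_sinker`; adding `DW_SINKER_FILE_PATH` switches the result to the local file.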
Prometheus Metrics Exposure
| Env | Type | Required | Description |
|---|---|---|---|
| DW_PROM_URL | string | N | URL path of the Prometheus metrics, default /metrics |
| DW_PROM_LISTEN | string | N | Address where Prometheus metrics are exposed, default localhost:9090 |
| DW_PROM_DISABLED | boolean | N | Disable Prometheus metrics exposure |
Disk Cache Settings
| Env | Type | Required | Description |
|---|---|---|---|
| DW_DISKCACHE_DIR | file-path | N | Cache directory; it should generally be on mounted storage |
| DW_DISKCACHE_DISABLE | boolean | N | Disable the disk cache. To keep the cache enabled, do not set this variable |
| DW_DISKCACHE_CLEAN_INTERVAL | string | N | Cache cleanup interval, default 1s |
| DW_DISKCACHE_EXPIRE_DURATION | string | N | Cache expiration time, default 168h (7d) |
| DW_DISKCACHE_CAPACITY_MB | int | N | Disk space available to the cache, in MB, default 20GB (Version-1.6.0) |
| DW_DISKCACHE_BATCH_SIZE_MB | int | N | Maximum size of a single disk cache file, in MB, default 64MB (Version-1.6.0) |
| DW_DISKCACHE_MAX_DATA_SIZE_MB | int | N | Maximum size of a single cached item (e.g., a single HTTP body), in MB, default 64MB; larger data packets are discarded (Version-1.6.0) |
Tips
Set DW_DISKCACHE_DISABLE to disable disk cache.
Performance Settings
| Env | Type | Required | Description |
|---|---|---|---|
| DW_COPY_BUFFER_DROP_SIZE | int | N | HTTP body buffers larger than this size (in bytes) are released immediately to avoid excessive memory use. Default 256KB |
Dataway API List
Details of each API below are to be added.
GET /v1/ping
- API description: Get the current Dataway version and release date, along with the public IP address of the request
If the 404 page is disabled (disable_404page), this API does not work.
GET /v1/ntp/
- API description: Get the current Dataway Unix timestamp (in seconds)
POST /v1/write/:category
- API description: Receive the various kinds of collected data uploaded by Datakit
GET /v1/datakit/pull
- API description: Handle Datakit requests to pull center configuration (blacklist/pipeline)
POST /v1/write/rum/replay
- API description: Receive Session Replay data uploaded by Datakit
POST /v1/upload/profiling
- API description: Receive profiling data uploaded by Datakit
POST /v1/election
- API description: Handle Datakit election requests
POST /v1/election/heartbeat
- API description: Handle Datakit election heartbeat requests
POST /v1/query/raw
- API description: Handle DQL query requests initiated by Datakit
POST /v1/workspace
- API description: Handle workspace query requests initiated by Datakit
POST /v1/object/labels
- API description: Handle requests to modify object labels
DELETE /v1/object/labels
- API description: Handle requests to delete object labels
GET /v1/check/:token
- API description: Check whether a token is valid
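As an illustration of using GET /v1/ntp/, the sketch below compares the local clock with Dataway's clock. dw_clock_skew is a hypothetical helper, and the extraction step is an assumption: it takes the first integer in the response body as the Unix timestamp, so check the actual response format of your Dataway version before relying on it.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: clock skew between this machine and Dataway (seconds).
# ASSUMPTION: the /v1/ntp/ response body carries the Unix timestamp as its
# first integer; verify this against your Dataway version.
dw_clock_skew() {
  local dw_body="$1" local_ts="${2:-$(date +%s)}" dw_ts
  dw_ts=$(grep -oE '[0-9]+' <<< "$dw_body" | head -n 1)
  [[ -n "$dw_ts" ]] || return 1
  echo $(( 10#$local_ts - 10#$dw_ts ))   # positive: local clock is ahead of Dataway
}
```

Usage against a reachable Dataway: `dw_clock_skew "$(curl -s http://localhost:9528/v1/ntp/)"`. Similarly, `curl -s http://localhost:9528/v1/ping` returns the version information, unless the 404 page is disabled.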
Dataway Metrics Collection
HTTP client metrics collection
To collect metrics for Dataway's HTTP requests to Kodo (or the next-hop Dataway), enable the http_client_trace configuration manually, or set DW_HTTP_CLIENT_TRACE=true during installation.
Dataway exposes its own Prometheus metrics, which can be collected with Datakit's built-in prom collector.
Add the following annotations to the Pod (requires [Datakit 1.14.2](../datakit/changelog.md#cl-1.14.2) or later):
annotations:
  datakit/prom.instances: |
    [[inputs.prom]]
      url = "http://$IP:9090/metrics" # adjust the port (default 9090) to match DW_PROM_LISTEN
      source = "dataway"
      measurement_name = "dw" # keep this measurement name fixed
      interval = "30s"
    [inputs.prom.tags]
      namespace = "$NAMESPACE"
      pod_name = "$PODNAME"
      node_name = "$NODENAME"
If collection succeeds, search for dataway under "Scene" / "Built-in View" in Guance to see the corresponding monitoring view.
Dataway Metric List
The following metrics are exposed by Dataway and can be fetched from http://localhost:9090/metrics. You can watch a specific metric in real time (refreshing every 3s) with the following command:
If some metrics cannot be queried, it may be caused by the relevant business module not running.
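When inspecting the raw output, it helps to filter a single metric family. The hypothetical dw_metric helper below reads Prometheus text-format input on stdin and prints the samples of one metric, including its _sum, _count, and _bucket series, while skipping the # HELP / # TYPE comment lines and metrics that merely share the prefix.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the samples of one metric (plus its _sum/_count/
# _bucket series) from Prometheus text-format input on stdin.
dw_metric() {
  local name="$1"
  # metric name, optional summary/histogram suffix, then '{' (labels) or a space
  grep -E "^${name}(_sum|_count|_bucket)?(\{| )"
}
```

For example: `while sleep 3; do curl -s http://localhost:9090/metrics | dw_metric dataway_kodo_queue_depth; done`.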
| TYPE | NAME | LABELS | HELP |
|---|---|---|---|
| SUMMARY | dataway_kodo_queue_wait_seconds | api,method | Kodo queue wait duration before worker dispatch |
| SUMMARY | dataway_http_api_elapsed_seconds | api,method,sinked,status | API request latency |
| SUMMARY | dataway_http_api_body_buffer_utilization | api | API body buffer utilization (Len/Cap) |
| SUMMARY | dataway_http_api_body_copy | api | API body copy |
| SUMMARY | dataway_http_api_body_copy_seconds | api | API body copy latency |
| SUMMARY | dataway_http_api_body_copy_enlarge | api | API body copy enlarged pooled buffer |
| SUMMARY | dataway_http_api_resp_size_bytes | api,method,status | API response size |
| SUMMARY | dataway_http_api_req_size_bytes | api,method,status | API request size |
| COUNTER | dataway_http_api_body_too_large_dropped_total | api,method | API requests dropped for being too large |
| COUNTER | dataway_http_api_with_inner_token | api,method | API requests with the inner token |
| COUNTER | dataway_http_api_dropped_total | api,method | API requests dropped when sinker rule matching failed |
| COUNTER | dataway_ip_blacklist_blocked_total | api,method | IP blacklist blocked requests total |
| COUNTER | dataway_ip_blacklist_missed_total | api,method | IP blacklist missed total |
| COUNTER | dataway_ip_blacklist_added_total | api,method,reason | IP blacklist added total |
| COUNTER | dataway_syncpool_stats | name,type | sync.Pool usage stats |
| COUNTER | dataway_http_api_copy_body_failed_total | api | API copy body failed count |
| COUNTER | dataway_http_api_signed_total | api,method | API signature count |
| SUMMARY | dataway_http_api_cached_bytes | api,cache_type,method,reason | API cached body bytes |
| SUMMARY | dataway_http_api_reusable_body_read_bytes | api,method | API re-read body on forking request |
| SUMMARY | dataway_http_api_recv_points | api | Points received by API /v1/write/:category |
| SUMMARY | dataway_http_api_send_points | api | Points sent by API /v1/write/:category |
| SUMMARY | dataway_http_api_cache_points | api,cache_type | Disk-cached /v1/write/:category points |
| SUMMARY | dataway_http_api_cache_cleaned_points | api,cache_type,status | Disk cache cleaned /v1/write/:category points |
| COUNTER | dataway_http_api_forked_total | api,method,token | API request forked total |
| GAUGE | dataway_http_cli_info | max_conn_per_host,max_idle_conn,max_idle_conn_per_host,timeout | Dataway as client settings |
| GAUGE | dataway_http_info | cascaded,docker,http_client_trace,listen,max_body,release_date,remote,version | Dataway API basic info |
| GAUGE | dataway_kodo_queue_depth | N/A | Current Kodo dispatch queue depth including in-flight tasks |
| GAUGE | dataway_kodo_queue_bytes | N/A | Current Kodo dispatch queue body bytes including in-flight tasks |
| COUNTER | dataway_kodo_queue_enqueued_total | api,method | Kodo queue enqueued tasks |
| COUNTER | dataway_kodo_queue_full_total | api,method,action | Kodo queue full events |
| COUNTER | dataway_kodo_queue_dispatch_total | api,method,status | Kodo queue dispatch results |
| GAUGE | dataway_last_heartbeat_time | N/A | Dataway last heartbeat with Kodo timestamp |
| SUMMARY | dataway_http_api_copy_buffer_drop_total | max | API copy buffer dropped (too large cached buffer) count |
| GAUGE | dataway_cpu_usage | N/A | Dataway CPU usage (%) |
| GAUGE | dataway_mem_stat | type | Dataway memory usage stats |
| GAUGE | dataway_open_files | N/A | Dataway open files |
| GAUGE | dataway_cpu_cores | N/A | Dataway CPU cores |
| GAUGE | dataway_uptime | N/A | Dataway uptime |
| COUNTER | dataway_process_ctx_switch_total | type | Dataway process context switch count (Linux only) |
| COUNTER | dataway_process_io_count_total | type | Dataway process IO count |
| COUNTER | dataway_process_io_bytes_total | type | Dataway process IO bytes count |
| SUMMARY | dataway_http_api_dropped_cache | api,method,reason | Cached data dropped for various reasons |
| COUNTER | dataway_http_api_body_size_bytes_total | api,token | Accumulated API body bytes for aggregate or tailSampling |
| COUNTER | dataway_http_aggr_point_total | api,token | Point count of aggregate or tailSampling |
| COUNTER | dataway_http_tail_sampling_trace_total | token | tailSampling trace count |
| COUNTER | dataway_http_tail_sampling_span_total | token | tailSampling span count |
| COUNTER | dataway_http_tail_sampling_packet_send_total | token,data_type,result | tailSampling packet send result count |
| GAUGE | dataway_httpcli_dns_resolved_address | api,coalesced,host,server | HTTP DNS resolved address |
| SUMMARY | dataway_httpcli_dns_cost_seconds | api,coalesced,host,server | HTTP DNS cost |
| SUMMARY | dataway_httpcli_tls_handshake_seconds | api,server | HTTP TLS handshake cost |
| SUMMARY | dataway_httpcli_http_connect_cost_seconds | api,server | HTTP connect cost |
| SUMMARY | dataway_httpcli_got_first_resp_byte_cost_seconds | api,server | Got first response byte cost |
| SUMMARY | http_latency | api,server | HTTP latency |
| COUNTER | dataway_httpcli_tcp_conn_total | api,server,remote,type | HTTP TCP connection count |
| COUNTER | dataway_httpcli_conn_reused_from_idle_total | api,server | HTTP connection reused from idle count |
| SUMMARY | dataway_httpcli_conn_idle_time_seconds | api,server | HTTP connection idle time |
| GAUGE | dataway_sinker_rule_cache_size | name | Sinker rule cache size |
| GAUGE | dataway_sinker_rule_error | error | Rule errors |
| GAUGE | dataway_sinker_default_rule_hit | info | Default sinker rule hit count |
| GAUGE | dataway_sinker_rule_last_applied_time | source,version | Rule last applied time (Unix timestamp) |
| SUMMARY | dataway_sinker_rule_cost_seconds | type | Rule cost time in seconds |
| SUMMARY | dataway_sinker_lru_cache_cleaned | name | Sinker LRU cache cleanup removed entries |
| SUMMARY | dataway_sinker_lru_cache_dropped_ttl_seconds | bucket,name,reason | Sinker LRU cache dropped TTL seconds |
| COUNTER | dataway_sinker_pull_total | event,source | Sinker pulled or pushed total |
| GAUGE | dataway_sinker_rule_count | type,with_default | Sinker rule count |
| GAUGE | dataway_sinker_rule_cache_get_total | name,type | Sinker rule cache get hit/miss count |
| COUNTER | diskcache_rotate_total | path | Cache rotate count, i.e. file rotation from data to data.0000xxx |
| COUNTER | diskcache_remove_total | path | Removed file count; a file that reaches EOF on read is removed from the unread list |
| COUNTER | diskcache_wakeup_total | path | Wakeup count on sleeping write file |
| COUNTER | diskcache_pos_updated_total | op,path | .pos file updated count |
| COUNTER | diskcache_seek_back_total | path | Seek back when Get() got any error |
| GAUGE | diskcache_capacity | path | Current capacity (in bytes) |
| GAUGE | diskcache_max_data | path | Max data to Put (in bytes), default 0 |
| GAUGE | diskcache_batch_size | path | Data file size (in bytes) |
| GAUGE | diskcache_size | path | Current cache size waiting to be consumed (Get); the size includes header bytes |
| GAUGE | diskcache_open_time | no_fallback_on_error,no_lock,no_pos,no_sync,path | Current cache Open time as a Unix timestamp (seconds) |
| GAUGE | diskcache_last_close_time | path | Current cache last Close time as a Unix timestamp (seconds) |
| GAUGE | diskcache_datafiles | path | Current unread data files |
| HISTOGRAM | diskcache_lock_wait_seconds | lock_type,path | Time spent waiting for locks by lock type |
| COUNTER | diskcache_lock_contention_total | lock_type,path | Number of lock contention events |
| SUMMARY | diskcache_get_latency | path | Get() cost seconds |
| SUMMARY | diskcache_put_latency | path | Put() cost seconds |
| SUMMARY | diskcache_put_bytes | path | Cache Put() bytes |
| SUMMARY | diskcache_get_bytes | path | Cache Get() bytes |
| SUMMARY | diskcache_dropped_data | path,reason | Dropped data during Put() when capacity is reached |
Metrics under Docker
Outside Kubernetes, Dataway can run in host mode or Docker mode. This section covers how metrics collection differs for a Docker installation.
When Dataway is installed in Docker, the HTTP port that exposes metrics is mapped to port 19090 on the host by default, so the metrics collection address is http://localhost:19090/metrics.
If a different port is specified, the installer adds 10000 to it, so the specified port must not exceed 45535.
In Docker mode, a profiling port is also exposed; by default it is mapped to port 16060 on the host, again by adding 10000 to the specified port.
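The port mapping rule can be written down directly. The sketch below is a hypothetical helper (dw_docker_host_port is not part of the installer) that computes the host port from the configured port and rejects ports above the documented limit of 45535.

```bash
#!/usr/bin/env bash
# Hypothetical helper reproducing the documented Docker port mapping:
# host port = configured port + 10000; ports above 45535 are rejected.
dw_docker_host_port() {
  local port="$1"
  [[ "$port" =~ ^[0-9]+$ ]] || return 1
  (( 10#$port >= 1 && 10#$port <= 45535 )) || return 1
  echo $(( 10#$port + 10000 ))
}
```

With the defaults, `dw_docker_host_port 9090` gives 19090 (metrics) and `dw_docker_host_port 6060` gives 16060 (profiling), matching the addresses above.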
Dataway's Own Log Collection and Processing
Dataway writes two kinds of logs: gin (HTTP request) logs and Dataway's own application logs. The following Pipeline separates them:
# Pipeline for dataway logging
# Sample log lines for testing
'''
2023-12-14T11:27:06.744+0800 DEBUG apis apis/api_upload_profile.go:272 save profile file to disk [ok] /v1/upload/profiling?token=****************a4e3db8481c345a94fe5a
[GIN] 2021/10/25 - 06:48:07 | 200 | 30.890624ms | 114.215.200.73 | POST "/v1/write/logging?token=tkn_5c862a11111111111111111111111"
'''
add_pattern("TOKEN", "tkn_\\w+")
add_pattern("GINTIME", "%{YEAR}/%{MONTHNUM}/%{MONTHDAY}%{SPACE}-%{SPACE}%{HOUR}:%{MINUTE}:%{SECOND}")
grok(_,"\\[GIN\\]%{SPACE}%{GINTIME:timestamp}%{SPACE}\\|%{SPACE}%{NUMBER:dataway_code}%{SPACE}\\|%{SPACE}%{NOTSPACE:cost_time}%{SPACE}\\|%{SPACE}%{NOTSPACE:client_ip}%{SPACE}\\|%{SPACE}%{NOTSPACE:method}%{SPACE}%{GREEDYDATA:http_url}")
# gin logging
if cost_time != nil {
if http_url != nil {
grok(http_url, "%{TOKEN:token}")
cover(token, [5, 15])
replace(message, "tkn_\\w{0,5}\\w{6}", "****************$4")
replace(http_url, "tkn_\\w{0,5}\\w{6}", "****************$4")
}
group_between(dataway_code, [200,299], "info", status)
group_between(dataway_code, [300,399], "notice", status)
group_between(dataway_code, [400,499], "warning", status)
group_between(dataway_code, [500,599], "error", status)
if !sample(0.1) { # keep 10% of gin logs, drop the other 90%
drop()
exit()
} else {
set_tag(sample_rate, "0.1")
}
parse_duration(cost_time)
duration_precision(cost_time, "ns", "ms")
set_measurement('gin', true)
set_tag(service,"dataway")
exit()
}
# app logging
if cost_time == nil {
grok(_,"%{TIMESTAMP_ISO8601:timestamp}%{SPACE}%{NOTSPACE:status}%{SPACE}%{NOTSPACE:module}%{SPACE}%{NOTSPACE:code}%{SPACE}%{GREEDYDATA:msg}")
if status == nil {
grok(message,"Error%{SPACE}%{DATA:errormsg}")
if errormsg != nil {
add_key(status,"error")
drop_key(errormsg)
}
}
lowercase(status)
# if debug level enabled, drop most of them
if status == 'debug' {
if !sample(0.1) { # keep 10% of debug logs, drop the other 90%
drop()
exit()
} else {
set_tag(sample_rate, "0.1")
}
}
group_in(status, ["error", "panic", "dpanic", "fatal","err","fat"], "error", status) # mark them as 'error'
if msg != nil {
grok(msg, "%{TOKEN:token}")
cover(token, [5, 15])
replace(message, "tkn_\\w{0,5}\\w{6}", "****************$4")
replace(msg, "tkn_\\w{0,5}\\w{6}", "****************$4")
}
set_measurement("dataway-log", true)
set_tag(service,"dataway")
}
Dataway Bug Report
Dataway exposes its own metrics and profiling endpoints, which can be collected for troubleshooting.
The commands below assume the default ports and addresses; adjust them to match your actual configuration.
#!/bin/bash
br_dir="dw-br-$(date +%s)"
mkdir -p "${br_dir}"
echo "Save bug report to ${br_dir}"
# Modify the following configurations according to your actual situation
dw_ip="localhost" # The IP address where Dataway's metrics/profile is exposed
metric_port=9090  # The port where metrics are exposed
profile_port=6060 # The port where profiling information is exposed
dw_yaml_conf="/usr/local/cloudcare/dataflux/dataway/dataway.yaml"
dw_dot_yaml_conf="/usr/local/cloudcare/dataflux/dataway/.dataway.yaml"
# Collect runtime metrics
curl -v "http://${dw_ip}:${metric_port}/metrics" -o "${br_dir}/metrics"
# Collect profiling information
curl -v "http://${dw_ip}:${profile_port}/debug/pprof/allocs" -o "${br_dir}/allocs"
curl -v "http://${dw_ip}:${profile_port}/debug/pprof/heap" -o "${br_dir}/heap"
curl -v "http://${dw_ip}:${profile_port}/debug/pprof/profile" -o "${br_dir}/profile" # This command takes about 30 seconds
# Copy whichever configuration files exist (legacy dataway.yaml and/or the env-generated .dataway.yaml)
[ -f "${dw_yaml_conf}" ] && cp "${dw_yaml_conf}" "${br_dir}/dataway.yaml.copy"
[ -f "${dw_dot_yaml_conf}" ] && cp "${dw_dot_yaml_conf}" "${br_dir}/.dataway.yaml.copy"
tar czvf "${br_dir}.tar.gz" "${br_dir}"
rm -rf "${br_dir}"
Run the script:
After execution, a file similar to dw-br-1721188604.tar.gz will be generated. You can then retrieve this file for further use.
FAQ
Request Entity Too Large Issue
Dataway limits the size of the request body (64MB by default). When a request body exceeds this limit, the client receives an HTTP 413 error (Request Entity Too Large). If such request bodies are within a reasonable range, you can increase the limit (in bytes):
- For Kubernetes Pod installations, set the environment variable DW_MAX_HTTP_BODY_BYTES
- For host installations, set max_http_body_bytes in dataway.yaml
Oversized requests at runtime are reflected in both metrics and logs:
- The metric dataway_http_api_body_too_large_dropped_total counts the discarded oversized requests
- Search the Dataway logs with cat log | grep 'drop too large request'. The matching entries include the HTTP request headers, which help identify the client
Warning
The disk cache module also limits the size of a single data block it writes (default 64MB). If you increase the maximum request body size, adjust this limit accordingly (DW_DISKCACHE_MAX_DATA_SIZE_MB) so that large requests can still be written to the disk cache.
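To keep the two limits aligned, derive both values from one number. The sketch below is a hypothetical helper (dw_body_limit_env is not part of Dataway) that prints matching settings for a desired maximum body size in MB, using 1MB = 1048576 bytes, which matches the documented default of 64MB = 67108864 bytes.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print aligned DW_MAX_HTTP_BODY_BYTES (bytes) and
# DW_DISKCACHE_MAX_DATA_SIZE_MB (MB) values for a body limit given in MB.
dw_body_limit_env() {
  local mb="$1"
  [[ "$mb" =~ ^[1-9][0-9]*$ ]] || return 1
  echo "DW_MAX_HTTP_BODY_BYTES=$(( mb * 1024 * 1024 ))"
  echo "DW_DISKCACHE_MAX_DATA_SIZE_MB=${mb}"
}
```

For example, `dw_body_limit_env 128` prints DW_MAX_HTTP_BODY_BYTES=134217728 and DW_DISKCACHE_MAX_DATA_SIZE_MB=128; with 64 it reproduces the defaults.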
This limit can degrade Dataway's performance. Under high load, increase the CPU limit or add more Dataway instances.