Dataway


Introduction

DataWay is Guance's data gateway. Collectors must report their data to Guance through a DataWay gateway.

Dataway Installation

  • New Dataway

On the Data Gateways page in the Guance management console, click Create Dataway. Enter a name and binding address, and then click Create.

Once creation succeeds, the new Dataway appears and its installation script is generated.

Info

The binding address is the Dataway gateway address. It must be a complete HTTP address, such as http(s)://1.2.3.4:9528, including the protocol, host address, and port. The host address is usually the IP address of the machine where Dataway is deployed; a domain name also works, provided it resolves correctly.

Note: Make sure the collector can reach this address; otherwise data collection will fail.
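
The address rules above (protocol, host, and port all present) can be checked before the address is entered in the console. The following is only a sketch; the function name check_binding_address is hypothetical and is not part of Dataway or Guance.

```python
from urllib.parse import urlsplit


def check_binding_address(address: str) -> list[str]:
    """Return a list of problems found in a Dataway binding address.

    Hypothetical helper: it only checks the shape described in the text
    (protocol + host + port), not whether the address is reachable.
    """
    problems = []
    parts = urlsplit(address)
    if parts.scheme not in ("http", "https"):
        problems.append("protocol must be http or https")
    if not parts.hostname:
        problems.append("host address is missing")
    try:
        port = parts.port  # raises ValueError for a malformed port
    except ValueError:
        port = None
    if port is None:
        problems.append("port is missing or invalid")
    return problems


if __name__ == "__main__":
    for candidate in ("http://1.2.3.4:9528", "1.2.3.4:9528"):
        print(candidate, "->", check_binding_address(candidate) or "ok")
```

An empty list means the address has the expected shape; reachability from the collector still has to be verified separately.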

  • Install Dataway
DW_KODO=http://kodo_ip:port \
   DW_TOKEN=<tkn_XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX> \
   DW_UUID=<YOUR_UUID> \
   bash -c "$(curl https://static.guance.com/dataway/install.sh)"

Installing Dataway on a host is deprecated; install Dataway in Kubernetes as a StatefulSet instead.

After installation completes, dataway.yaml is generated in the installation directory. You can edit it manually; changes take effect after the service is restarted.

dataway.yaml
# ============= DATAWAY CONFIG =============

# Dataway UUID, obtained when creating a new Dataway
uuid:

# The workspace token; in most cases this is
# the system workspace's token.
token:

# secret_token is used in sinker mode to check whether incoming Datakit
# requests are valid.
secret_token:

# Is the __internal__ token allowed? If so, the data/request is directed to
# the workspace with the token above
enable_internal_token: false

# Is an empty token allowed? If so, the data/request is directed to
# the workspace with the token above
enable_empty_token: false

# Is Dataway cascaded? For a cascaded Dataway, its remote_host is
# another Dataway and not Kodo.
cascaded: false

# kodo(next dataway) related configures
remote_host:
http_timeout: 3s

http_max_idle_conn_perhost: 0 # default to CPU cores
http_max_conn_perhost: 0      # default no limit
kodo_queue:
  enabled: true
  workers: 256
  queue_size: 1024
  queue_max_bytes: 1073741824 # 1GB
  enqueue_timeout: 100ms

insecure_skip_verify: false
http_client_trace: false
sni: ""

# dataway API configures
bind: 0.0.0.0:9528

# disable 404 page
disable_404page: false

# dataway TLS file path
tls_crt:
tls_key:

# enable pprof
pprof_bind: localhost:6060

api_limit_rate : 100000         # 100K
max_http_body_bytes : 67108864  # 64MB
copy_buffer_drop_size : 262144  # 256KB, if copy buffer memory larger than this, this memory released
reserved_pool_size: 4096        # reserved pool size for better GC

within_docker: false

log_level: info
log: log
gin_log: gin.log

ip_blacklist:
  ttl: "1m"
  clean_interval: "1h"

cache_cfg:
  # cache disk path
  dir: "disk_cache"

  # disable cache
  disabled: false

  clean_interval: "1s"

  # in MB, max single data package size in disk cache, such as HTTP body
  max_data_size: 100

  # in MB, single disk-batch(single file) size
  batch_size: 128

  # in MB, max disk size allowed to cache data
  max_disk_size: 65535

  # expire duration, default 7 days
  expire_duration: "168h"

prometheus:
  listen: "localhost:9090"
  url: "/metrics"
  enable: true

#sinker:
#  cache_options:
#    prealloc: true
#    reserved_capacity: 10000000 # max cached items
#    buckets: 64
#    ttl: 10m # clear unactive matches
#  etcd:
#    urls:
#    - http://localhost:2379 # one or multiple etcd host
#    dial_timeout: 30s
#    key_space: "/dw_sinker" # subscribe to the etcd key
#    username: "dataway"
#    password: "<PASSWORD>"
#  file:
#    path: /path/to/sinker.json

Dataway pod yaml:

dataway-statefulset.yaml
---

apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app: sts-utils-dataway
  name: dataway
  namespace: utils
spec:
  replicas: 2
  selector:
    matchLabels:
      app: sts-utils-dataway
  serviceName: dataway
  template:
    metadata:
      annotations:
        datakit/logs: |
          [
            {
              "disable": false,
              "source": "dataway",
              "service": "dataway",
              "multiline_match": "^\\d{4}|^\\[GIN\\]"
            }
          ]
        datakit/prom.instances: |
          [[inputs.prom]]
            url = "http://$IP:9090/metrics"

            source = "dataway"
            measurement_name = "dw"
            interval = "10s"
            disable_instance_tag = true
          [inputs.prom.tags]
            service = "dataway"
            instance = "$PODNAME" # we can set as "xxx-$PODNAME"
      labels:
        app: sts-utils-dataway
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - sts-utils-dataway
              topologyKey: kubernetes.io/hostname
      containers:
        - env:
            - name: DW_REMOTE_HOST
              value: http://kodo.forethought-kodo:9527
            - name: DW_BIND
              value: 0.0.0.0:9528
            - name: DW_UUID
              value: agnt_xxxxxxxxxxxxxxxxxxxxxxxxxxxxx   # Dataway UUID
            - name: DW_TOKEN
              value: tkn_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx  # Dataway token
            - name: DW_PROM_LISTEN
              value: 0.0.0.0:9090
            - name: DW_LOG
              value: stdout
            - name: DW_LOG_LEVEL
              value: info
            - name: DW_GIN_LOG
              value: stdout
            - name: DW_DISKCACHE_DIR
              value: cache
            - name: DW_HTTP_TIMEOUT
              value: '3s'
            - name: DW_ENABLE_INTERNAL_TOKEN
              value: 'false'
            - name: DW_MAX_HTTP_BODY_BYTES
              value: '67108864'
            - name: DW_HTTP_CLIENT_TRACE
              value: 'on'
            - name: DW_RESERVED_POOL_SIZE
              value: '0'
            - name: DW_COPY_BUFFER_DROP_SIZE
              value: '262144'
            - name: DW_DISKCACHE_CAPACITY_MB
              value: '102400'
          image: pubrepo.guance.com/dataflux/dataway:1.15.0
          imagePullPolicy: IfNotPresent
          name: dataway
          ports:
            - containerPort: 9528
              name: 9528tcp01
              protocol: TCP
          resources:
            limits:
              cpu: '4'
              memory: 4Gi
            requests:
              cpu: 100m
              memory: 512Mi
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
            - mountPath: /usr/local/cloudcare/dataflux/dataway/cache
              name: dataway-cache
      dnsPolicy: ClusterFirst
      imagePullSecrets: []
      #nodeSelector:
      #  nodepool: dataway
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      #tolerations:
      #  - effect: NoSchedule
      #    key: nodepool
      #    operator: Equal
      #    value: dataway
  updateStrategy:
    rollingUpdate:
      partition: 0
    type: RollingUpdate
  volumeClaimTemplates:
    - apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        name: dataway-cache
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi
        storageClassName: xxxxxx  # High-Performance Storage StorageClass
        volumeMode: Filesystem
      status:
        phase: Pending

---

apiVersion: v1
kind: Service
metadata:
  name: dataway
  namespace: utils
spec:
  ports:
    - name: 9528tcp02
      nodePort: 30928
      port: 9528
      protocol: TCP
      targetPort: 9528
  selector:
    app: sts-utils-dataway
  type: NodePort

In dataway-statefulset.yaml, you can modify the Dataway configuration through environment variables; see [here](dataway.md#img-envs).

For a Docker install, the environment variables are the same as in Kubernetes. A DataWay container can be started like this:

docker run -d \
    --name <YOUR-DW-IN-DOCKER> \
    -p 19528:9528 -p 19090:9090 \
    --mount type=bind,source=<host/path/for/diskcache>,target=/usr/local/cloudcare/dataflux/dataway/cache \
    --memory=2g --memory-reservation=256m \
    --cpus="2" \
    -e DW_UUID=<YOUR-AGNT_XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX> \
    -e DW_TOKEN=<YOUR-TKN_XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX> \
    -e DW_REMOTE_HOST=http://kodo.forethought-kodo:9527 \
    -e DW_BIND=0.0.0.0:9528 \
    -e DW_PROM_LISTEN=0.0.0.0:9090 \
    -e DW_HTTP_CLIENT_TRACE=true \
    -e DW_LOG_LEVEL=info \
    -e DW_LOG=stdout \
    -e DW_GIN_LOG=stdout \
    -e DW_DISKCACHE_CAPACITY_MB=65536 \
    pubrepo.guance.com/dataflux/dataway:1.15.0

Notes
  • Dataway can only run on Linux systems
  • For host installations, the Dataway installation path is /usr/local/cloudcare/dataflux/dataway

  • Verify Dataway installation

After installation, wait a moment and refresh the "Data Gateway" page. If a version number appears in the "Version Information" column of the data gateway you just added, the Dataway has connected to the Guance center successfully, and front-end users can access data through it.

Once Dataway is connected to the Guance center, log in to the Guance console and open the "Integration" / DataKit page to view all Dataway addresses. Select the required Dataway gateway address, obtain the DataKit installation command, and run it on the server to start collecting data.

Manage DataWay

Delete DataWay

On the "Data Gateway" page of the Guance management console, select the DataWay to delete, click "Configure", and then click the "Delete" button in the lower left corner of the Edit DataWay dialog box.

Warning

Deleting a DataWay in the console does not remove it from the server. To remove it completely, log in to the server where the DataWay gateway is deployed, stop DataWay, and then delete the installation directory.

Upgrade DataWay

On the Data Gateways page in the Guance management console, if a newer version of DataWay is available, an upgrade prompt appears in the version information. For host installations, run:

DW_UPGRADE=1 bash -c "$(curl https://static.guance.com/dataway/install.sh)"

In Kubernetes, replace the image version directly:

- image: pubrepo.guance.com/dataflux/dataway:1.15.0

Dataway Service Management

For host installations, use the following commands to manage the Dataway service:

# Start
$ systemctl start dataway

# Restart
$ systemctl restart dataway

# Stop
$ systemctl stop dataway

In Kubernetes, restart the corresponding pod instead.

Environment variable

Docker image environment variable

When Dataway runs in a Kubernetes environment, it supports the following environment variables.

Compatibility with legacy dataway.yaml

Some older Dataway deployments import their configuration through a ConfigMap that is mounted into the install path as dataway.yaml. If the Dataway image detects a dataway.yaml file at startup, all DW_* settings are ignored and only the legacy dataway.yaml is applied. To make the environment variables take effect again, remove the dataway.yaml ConfigMap mount.

When the environment variables are applied, a hidden file .dataway.yaml is generated under the install path (visible via ls -a). You can cat it to check whether all the settings were applied correctly.
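
The precedence described above can be summarized in a few lines. This is a sketch of the documented behavior, not Dataway's own code; the function name config_source and the returned labels are hypothetical, and the hidden file is matched by its ".dataway." prefix rather than by an exact name.

```python
import os


def config_source(install_dir: str) -> str:
    """Report which configuration the text above says Dataway applies.

    Hypothetical helper: a mounted dataway.yaml wins and DW_* variables are
    ignored; otherwise a hidden file rendered from DW_* should be present.
    """
    names = os.listdir(install_dir)
    if "dataway.yaml" in names:
        return "legacy dataway.yaml (DW_* variables ignored)"
    hidden = [n for n in names if n.startswith(".dataway.")]
    if hidden:
        return "DW_* environment variables (rendered to " + hidden[0] + ")"
    return "unknown (no configuration file found)"


if __name__ == "__main__":
    # The install path below is the one named in this document.
    path = "/usr/local/cloudcare/dataflux/dataway"
    if os.path.isdir(path):
        print(config_source(path))
```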

HTTP Server Settings

Env Description
DW_REMOTE_HOST
type: string
required: Y
Kodo address, or next Dataway address, format: http://host:port
DW_WHITE_LIST
type: string
required: N
Dataway client IP whitelist, comma-separated
DW_HTTP_TIMEOUT
type: string
required: N
Dataway request timeout to Kodo or next Dataway, default 3s
DW_HTTP_MAX_IDLE_CONN_PERHOST
type: int
required: N
Dataway maximum idle connections to Kodo Version-1.6.2
Default 1000 Version-1.11.2
DW_HTTP_MAX_CONN_PERHOST
type: int
required: N
Dataway maximum connections to Kodo, default unlimited Version-1.6.2
DW_KODO_QUEUE_ENABLED
type: boolean
required: N
Enable the bounded dispatch queue for write/upload requests sent to Kodo or the next Dataway, default true
DW_KODO_QUEUE_WORKERS
type: int
required: N
Number of Kodo dispatch queue workers, default 256
DW_KODO_QUEUE_SIZE
type: int
required: N
Maximum number of requests waiting in the Kodo dispatch queue, default 1024
DW_KODO_QUEUE_MAX_BYTES
type: int/string
required: N
Maximum total request body bytes reserved by queued, in-flight, and waiting-to-enqueue Kodo queue requests. Supports values such as 1073741824, 1GB, or 1024MB; default 1GB
DW_KODO_QUEUE_ENQUEUE_TIMEOUT
type: string
required: N
Maximum time a request waits for a Kodo queue slot before Dataway returns 503, default 100ms
DW_BIND
type: string
required: N
Dataway HTTP API binding address, default 0.0.0.0:9528
DW_API_LIMIT
type: int
required: N
Dataway API rate limiting, e.g., 1000 means each API can only be requested 1000 times per second, default 100K
DW_HEARTBEAT
type: string
required: N
Dataway heartbeat interval with center, default 60s
DW_MAX_HTTP_BODY_BYTES
type: int
required: N
Dataway API maximum allowed HTTP Body (in bytes), default 64MB
DW_TLS_INSECURE_SKIP_VERIFY
type: boolean
required: N
Ignore HTTPS/TLS certificate errors
DW_HTTP_CLIENT_TRACE
type: boolean
required: N
Dataway as HTTP client can enable related metrics collection, which will be output in its Prometheus metrics
DW_ENABLE_TLS
type: boolean
required: N
Enable HTTPS Version-1.4.1
DW_TLS_CRT
type: file-path
required: N
Specify HTTPS/TLS crt file directory Version-1.4.0
DW_TLS_KEY
type: file-path
required: N
Specify HTTPS/TLS key file directory Version-1.4.0
DW_SNI
type: string
required: N
Specify current Dataway SNI information Version-1.6.0
DW_DISABLE_404PAGE
type: boolean
required: N
Disable 404 page Version-1.6.1
DW_HTTP_IP_BLACKLIST_TTL
type: string
required: N
Set IP blacklist TTL, default 1m Version-1.11.0
DW_HTTP_IP_BLACKLIST_CLEAN_INTERVAL
type: string
required: N
Set IP blacklist cleanup interval, default 1h Version-1.11.0
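
DW_KODO_QUEUE_MAX_BYTES above accepts either a plain byte count or a value with a unit. The sketch below shows one way such values could be normalized; it assumes binary multiples, because the table treats 1073741824, 1GB, and 1024MB as the same size. The function parse_size is hypothetical and is not Dataway's own parser.

```python
import re

# Binary multiples, so that "1GB" == "1024MB" == 1073741824 (assumption
# derived from the equivalent examples given for DW_KODO_QUEUE_MAX_BYTES).
_UNITS = {"": 1, "B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}


def parse_size(value: str) -> int:
    """Convert '1073741824', '1024MB' or '1GB' into a byte count."""
    match = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]*)\s*", value)
    if not match:
        raise ValueError(f"not a size: {value!r}")
    number, unit = match.groups()
    try:
        return int(number) * _UNITS[unit.upper()]
    except KeyError:
        raise ValueError(f"unknown unit in {value!r}") from None


if __name__ == "__main__":
    for text in ("1073741824", "1GB", "1024MB"):
        print(text, "->", parse_size(text))
```

The same arithmetic explains the other defaults in this document: 64MB is 67108864 bytes and 256KB is 262144 bytes.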
HTTP TLS Settings

To generate a TLS certificate with a one-year validity period, you can use the following OpenSSL command:

# Generate a TLS certificate with a one-year validity
$ openssl req -new -newkey rsa:4096 -x509 -sha256 -days 365 -nodes -out tls.crt -keyout tls.key
...

After executing this command, the system will prompt you to enter some necessary information, including your country, region, city, organization name, department name, and your email address. This information will be included in your certificate.

After completing the information input, you will generate two files: tls.crt (certificate file) and tls.key (private key file). Please keep your private key file secure and safe.

In order for the application to use these TLS certificates, you need to set the absolute paths of these two files to the application's environment variables. Here is an example of setting environment variables:

DW_ENABLE_TLS must be turned on first, and then the other two ENVs (DW_TLS_CRT/DW_TLS_KEY) will take effect. Version-1.4.1

env:
- name: DW_ENABLE_TLS
  value: "true"
- name: DW_TLS_CRT
  value: "/path/to/your/tls.crt"
- name: DW_TLS_KEY
  value: "/path/to/your/tls.key"

Replace /path/to/your/tls.crt and /path/to/your/tls.key with the actual paths to your tls.crt and tls.key files, respectively.

After setting up the environment variables, you can test whether TLS is functioning correctly with the following command:

curl -k https://localhost:9528

If successful, you should see an ASCII art message indicating "It's working!" If the certificates are missing, you might encounter an error in the Dataway logs similar to:

server listen(TLS) failed: open /path/to/your/tls.{crt,key}: no such file or directory

In this case, Dataway would not start, and the curl command would also result in an error:

curl: (7) Failed to connect to localhost port 9528 after 6 ms: Couldn't connect to server

Ensure that the paths to the TLS certificate and key are correctly specified and that the files have the appropriate permissions for the application to read them.
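
The path and permission checks above can be run before Dataway starts. The following is a minimal sketch; check_tls_files is a hypothetical helper, and it checks only what the text mentions (absolute path, file exists, file readable), not whether the certificate and key match.

```python
import os


def check_tls_files(crt_path: str, key_path: str) -> list[str]:
    """Return problems with the files intended for DW_TLS_CRT / DW_TLS_KEY."""
    problems = []
    for label, path in (("DW_TLS_CRT", crt_path), ("DW_TLS_KEY", key_path)):
        if not os.path.isabs(path):
            problems.append(f"{label}: {path} is not an absolute path")
        elif not os.path.isfile(path):
            problems.append(f"{label}: {path} does not exist")
        elif not os.access(path, os.R_OK):
            problems.append(f"{label}: {path} is not readable")
    return problems


if __name__ == "__main__":
    # Placeholder paths, as in the env example above.
    for problem in check_tls_files("/path/to/your/tls.crt", "/path/to/your/tls.key"):
        print(problem)
```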

Log Settings

Env Description
DW_LOG
type: string
required: N
Log path, defaults to log. Set to stdout to output logs to standard output for easier log collection
DW_LOG_LEVEL
type: string
required: N
Default is info, options include debug
DW_GIN_LOG
type: string
required: N
Default is gin.log, can also be set to stdout for easier collection
DW_LOG_PKG_ID
type: bool
required: N
Version-1.12.0 Enable log package id or not, default true

Token/UUID Settings

Env Description
DW_UUID
type: string
required: Y
Dataway UUID, generated by the system workspace when creating a new Dataway
DW_TOKEN
type: string
required: Y
Usually the data upload token of the system workspace
DW_SECRET_TOKEN
type: string
required: N
Can be set when Sinker functionality is enabled
DW_ENABLE_INTERNAL_TOKEN
type: boolean
required: N
Allow using __internal__ as client token, defaults to system workspace token
DW_ENABLE_EMPTY_TOKEN
type: boolean
required: N
Allow data upload without token, defaults to system workspace token

Sinker Settings

Env Description
DW_SECRET_TOKEN
type: string
required: N
Can be set when Sinker functionality is enabled
DW_CASCADED
type: string
required: N
Whether Dataway is cascaded
DW_SINKER_ETCD_URLS
type: string
required: N
etcd address list, comma-separated, e.g., http://1.2.3.4:2379,http://1.2.3.4:2380
DW_SINKER_ETCD_DIAL_TIMEOUT
type: string
required: N
etcd connection timeout, default 30s
DW_SINKER_ETCD_KEY_SPACE
type: string
required: N
etcd key name for Sinker configuration (default /dw_sinker)
DW_SINKER_ETCD_USERNAME
type: string
required: N
etcd username
DW_SINKER_ETCD_PASSWORD
type: string
required: N
etcd password
DW_SINKER_FILE_PATH
type: file-path
required: N
Specify sinker rule configuration via local file
DW_SINKER_CACHE_BUCKETS
type: int
required: N*
Version-1.11.0 Specifies the number of Sinker cache buckets, default 64
DW_SINKER_CACHE_RESERVED_CAPACITY
type: int
required: N*
Version-1.11.0 Specifies the maximum capacity of Sinker cache, default 1 million (1<<20)
DW_SINKER_CACHE_TTL
type: duration
required: N*
Version-1.11.0 Specifies the TTL of Sinker cache elements, default 10m (10 minutes)
DW_SINKER_CACHE_PREALLOC
type: bool
required: N*
Version-1.11.0 Pre-allocates cache memory, default false

Warning

If both local file and etcd methods are specified, the Sinker rules in the local file take priority.

Prometheus Metrics Exposure

Env Description
DW_PROM_URL
type: string
required: N
Prometheus metrics URL path (default /metrics)
DW_PROM_LISTEN
type: string
required: N
Prometheus metrics exposure address (default localhost:9090)
DW_PROM_DISABLED
type: boolean
required: N
Disable Prometheus metrics exposure

Disk Cache Settings

Env Description
DW_DISKCACHE_DIR
type: file-path
required: N
Set cache directory, this directory should generally be mounted storage
DW_DISKCACHE_DISABLE
type: boolean
required: N
Disable the disk cache. To keep the cache enabled, remove this environment variable entirely
DW_DISKCACHE_CLEAN_INTERVAL
type: string
required: N
Cache cleanup interval, default 1s
DW_DISKCACHE_EXPIRE_DURATION
type: string
required: N
Cache expiration time, default 168h (7d)
DW_DISKCACHE_CAPACITY_MB
type: int
required: N
Version-1.6.0 Set available disk space size in MB, default 20GB
DW_DISKCACHE_BATCH_SIZE_MB
type: int
required: N
Version-1.6.0 Set maximum size of single disk cache file in MB, default 64MB
DW_DISKCACHE_MAX_DATA_SIZE_MB
type: int
required: N
Version-1.6.0 Set maximum size of single cache content (e.g., single HTTP body) in MB, default 64MB. Data packets exceeding this size will be discarded

Tips

Set DW_DISKCACHE_DISABLE to disable disk cache.

Performance Settings

Version-1.6.0

Env Description
DW_COPY_BUFFER_DROP_SIZE
type: int
required: N
HTTP body buffers exceeding specified size (in bytes) will be immediately cleared to avoid excessive memory consumption. Default 256KB

Dataway API List

Details of each API below are to be added.

GET /v1/ping

Version-1.11.0

  • API description: Get the current Dataway version and release date; the response also includes the public IP address the request came from

If the 404 page is disabled (disable_404page), this API does not work.

GET /v1/ntp/

Version-1.6.0

  • API description: Get current Dataway unix timestamp(unit: second)

POST /v1/write/:category

  • API description: Receive various collection data uploaded by Datakit

GET /v1/datakit/pull

  • API description: Handles Datakit pull center configuration (blacklist/pipeline) requests

POST /v1/write/rum/replay

  • API description: Receive Session Replay data uploaded by Datakit

POST /v1/upload/profiling

  • API description: Receive profiling data uploaded by Datakit

POST /v1/election

  • API description: Handles election requests for Datakit

POST /v1/election/heartbeat

  • API description: Handles election heartbeat requests for Datakit

POST /v1/query/raw

  • API description: Handles DQL query requests initiated by the Datakit side

POST /v1/workspace

  • API description: Handles workspace query requests initiated by Datakit

POST /v1/object/labels

  • API description: Handles requests to modify object labels

DELETE /v1/object/labels

  • API description: Handles delete object Label requests

GET /v1/check/:token

  • API description: Check whether the token is valid
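
Since the details of each API are still to be added, the sketch below only shows how the request URLs listed above could be assembled and how /v1/ping could be called. The helper names are hypothetical; passing the token as a ?token= query parameter is taken from the sample gin log line later in this document, and the response body is returned as raw text because its format is not specified here.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def dataway_url(base: str, path: str, token: str = "") -> str:
    """Join a Dataway base address and an API path, optionally adding ?token=."""
    url = base.rstrip("/") + "/" + path.lstrip("/")
    if token:
        url += "?" + urlencode({"token": token})
    return url


def write_url(base: str, category: str, token: str) -> str:
    """Build the URL for POST /v1/write/:category."""
    return dataway_url(base, "/v1/write/" + quote(category, safe=""), token)


def ping(base: str, timeout: float = 3.0) -> str:
    """Call GET /v1/ping and return the raw response body as text."""
    with urlopen(dataway_url(base, "/v1/ping"), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Only builds a URL; call ping("http://localhost:9528") against a live Dataway.
    print(write_url("http://localhost:9528", "logging", "tkn_xxx"))
```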

Dataway metrics collection

HTTP client metrics collection

If you want to collect metrics for Dataway HTTP requests to Kodo (or Dataway next hop), you need to manually enable the http_client_trace configuration. You can also specify DW_HTTP_CLIENT_TRACE=true during the installation phase.

Dataway exposes Prometheus metrics, which Datakit's built-in prom collector can scrape. Configure the collector as follows:

[[inputs.prom]]
  ## Exporter URLs.
  urls = [ "http://localhost:9090/metrics", ]

  source = "dataway"

  election = true

  ## Dataway metric set fixed to dw, do not change
  measurement_name = "dw"

You can also add annotations on the pods (requires [Datakit 1.14.2](../datakit/changelog.md#cl-1.14.2) or above):

annotations:
   datakit/prom.instances: |
     [[inputs.prom]]
       url = "http://$IP:9090/metrics" # Here the port (default 9090) is as appropriate
       source = "dataway"
       measurement_name = "dw" # pinned to this metric set
       interval = "30s"

       [inputs.prom.tags]
         namespace = "$NAMESPACE"
         pod_name = "$PODNAME"
         node_name = "$NODENAME"

If collection succeeds, search for dataway under "Scene" / "Built-in View" in Guance to see the corresponding monitoring view.

Dataway Metric List

The following are the metrics exposed by Dataway, available at http://localhost:9090/metrics. If some metrics cannot be queried, the relevant business module may not be running.

To watch a specific metric in real time (refreshed every 3 seconds), use the following command:

watch -n 3 'curl -s http://localhost:9090/metrics | grep -a <METRIC-NAME>'
TYPE NAME LABELS HELP
SUMMARY dataway_kodo_queue_wait_seconds api,method Kodo queue wait duration before worker dispatch
SUMMARY dataway_http_api_elapsed_seconds api,method,sinked,status API request latency
SUMMARY dataway_http_api_body_buffer_utilization api API body buffer utilization(Len/Cap)
SUMMARY dataway_http_api_body_copy api API body copy
SUMMARY dataway_http_api_body_copy_seconds api API body copy latency
SUMMARY dataway_http_api_body_copy_enlarge api API body copy enlarged pooled buffer
SUMMARY dataway_http_api_resp_size_bytes api,method,status API response size
SUMMARY dataway_http_api_req_size_bytes api,method,status API request size
COUNTER dataway_http_api_body_too_large_dropped_total api,method API request too large dropped
COUNTER dataway_http_api_with_inner_token api,method API request with inner token
COUNTER dataway_http_api_dropped_total api,method API request dropped when sinker rule match failed
COUNTER dataway_ip_blacklist_blocked_total api,method IP blacklist blocked requests total
COUNTER dataway_ip_blacklist_missed_total api,method IP blacklist missed total
COUNTER dataway_ip_blacklist_added_total api,method,reason IP blacklist added total
COUNTER dataway_syncpool_stats name,type sync.Pool usage stats
COUNTER dataway_http_api_copy_body_failed_total api API copy body failed count
COUNTER dataway_http_api_signed_total api,method API signature count
SUMMARY dataway_http_api_cached_bytes api,cache_type,method,reason API cached body bytes
SUMMARY dataway_http_api_reusable_body_read_bytes api,method API re-read body on forking request
SUMMARY dataway_http_api_recv_points api API /v1/write/:category received points
SUMMARY dataway_http_api_send_points api API /v1/write/:category send points
SUMMARY dataway_http_api_cache_points api,cache_type Disk cached /v1/write/:category points
SUMMARY dataway_http_api_cache_cleaned_points api,cache_type,status Disk cache cleaned /v1/write/:category points
COUNTER dataway_http_api_forked_total api,method,token API request forked total
GAUGE dataway_http_cli_info max_conn_per_host,max_idle_conn,max_idle_conn_per_host,timeout Dataway as client settings
GAUGE dataway_http_info cascaded,docker,http_client_trace,listen,max_body,release_date,remote,version Dataway API basic info
GAUGE dataway_kodo_queue_depth N/A Current Kodo dispatch queue depth including in-flight tasks
GAUGE dataway_kodo_queue_bytes N/A Current Kodo dispatch queue body bytes including in-flight tasks
COUNTER dataway_kodo_queue_enqueued_total api,method Kodo queue enqueued tasks
COUNTER dataway_kodo_queue_full_total api,method,action Kodo queue full events
COUNTER dataway_kodo_queue_dispatch_total api,method,status Kodo queue dispatch results
GAUGE dataway_last_heartbeat_time N/A Dataway last heartbeat with Kodo timestamp
SUMMARY dataway_http_api_copy_buffer_drop_total max API copy buffer dropped(too large cached buffer) count
GAUGE dataway_cpu_usage N/A Dataway CPU usage(%)
GAUGE dataway_mem_stat type Dataway memory usage stats
GAUGE dataway_open_files N/A Dataway open files
GAUGE dataway_cpu_cores N/A Dataway CPU cores
GAUGE dataway_uptime N/A Dataway uptime
COUNTER dataway_process_ctx_switch_total type Dataway process context switch count(Linux only)
COUNTER dataway_process_io_count_total type Dataway process IO count
COUNTER dataway_process_io_bytes_total type Dataway process IO bytes count
SUMMARY dataway_http_api_dropped_cache api,method,reason Dropped cache data due to various reasons
COUNTER dataway_http_api_body_size_bytes_total api,token Accumulated API body bytes for aggregate or tailSampling
COUNTER dataway_http_aggr_point_total api,token point count of aggregate or tailSampling
COUNTER dataway_http_tail_sampling_trace_total token tailSampling trace count
COUNTER dataway_http_tail_sampling_span_total token tailSampling span count
COUNTER dataway_http_tail_sampling_packet_send_total token,data_type,result tailSampling packet send result count
GAUGE dataway_httpcli_dns_resolved_address api,coalesced,host,server HTTP DNS resolved address
SUMMARY dataway_httpcli_dns_cost_seconds api,coalesced,host,server HTTP DNS cost
SUMMARY dataway_httpcli_tls_handshake_seconds api,server HTTP TLS handshake cost
SUMMARY dataway_httpcli_http_connect_cost_seconds api,server HTTP connect cost
SUMMARY dataway_httpcli_got_first_resp_byte_cost_seconds api,server Got first response byte cost
SUMMARY http_latency api,server HTTP latency
COUNTER dataway_httpcli_tcp_conn_total api,server,remote,type HTTP TCP connection count
COUNTER dataway_httpcli_conn_reused_from_idle_total api,server HTTP connection reused from idle count
SUMMARY dataway_httpcli_conn_idle_time_seconds api,server HTTP connection idle time
GAUGE dataway_sinker_rule_cache_size name Sinker rule cache size
GAUGE dataway_sinker_rule_error error Rule errors
GAUGE dataway_sinker_default_rule_hit info Default sinker rule hit count
GAUGE dataway_sinker_rule_last_applied_time source,version Rule last applied time(Unix timestamp)
SUMMARY dataway_sinker_rule_cost_seconds type Rule cost time seconds
SUMMARY dataway_sinker_lru_cache_cleaned name Sinker LRU cache cleanup removed entries
SUMMARY dataway_sinker_lru_cache_dropped_ttl_seconds bucket,name,reason Sinker LRU cache dropped TTL seconds
COUNTER dataway_sinker_pull_total event,source Sinker pulled or pushed total
GAUGE dataway_sinker_rule_count type,with_default Sinker rule count
GAUGE dataway_sinker_rule_cache_get_total name,type Sinker rule cache get hit/miss count
COUNTER diskcache_rotate_total path Cache rotate count, mean file rotate from data to data.0000xxx
COUNTER diskcache_remove_total path Removed file count, if some file read EOF, remove it from un-read list
COUNTER diskcache_wakeup_total path Wakeup count on sleeping write file
COUNTER diskcache_pos_updated_total op,path .pos file updated count
COUNTER diskcache_seek_back_total path Seek back when Get() got any error
GAUGE diskcache_capacity path Current capacity(in bytes)
GAUGE diskcache_max_data path Max data to Put(in bytes), default 0
GAUGE diskcache_batch_size path Data file size(in bytes)
GAUGE diskcache_size path Current cache size that waiting to be consumed(get). The size include header bytes
GAUGE diskcache_open_time no_fallback_on_error,no_lock,no_pos,no_sync,path Current cache Open time in unix timestamp(second)
GAUGE diskcache_last_close_time path Current cache last Close time in unix timestamp(second)
GAUGE diskcache_datafiles path Current un-read data files
HISTOGRAM diskcache_lock_wait_seconds lock_type,path Time spent waiting for locks by lock type
COUNTER diskcache_lock_contention_total lock_type,path Number of lock contention events
SUMMARY diskcache_get_latency path Get() cost seconds
SUMMARY diskcache_put_latency path Put() cost seconds
SUMMARY diskcache_put_bytes path Cache Put() bytes
SUMMARY diskcache_get_bytes path Cache Get() bytes
SUMMARY diskcache_dropped_data path,reason Dropped data during Put() when capacity reached.
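
Where watch and grep are not available, the same filtering can be done with a short script. This is a sketch that assumes the endpoint returns the standard Prometheus text exposition format; filter_metrics and fetch_metrics are hypothetical helper names.

```python
from urllib.request import urlopen


def filter_metrics(exposition: str, name: str) -> list[str]:
    """Return sample lines whose metric name contains `name`.

    Comment lines (# HELP / # TYPE) and blank lines are skipped.
    """
    matches = []
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first "{" (labels) or space (value).
        metric_name = line.split("{", 1)[0].split(" ", 1)[0]
        if name in metric_name:
            matches.append(line)
    return matches


def fetch_metrics(url: str = "http://localhost:9090/metrics", timeout: float = 3.0) -> str:
    """Download the metrics page as text."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Requires a running Dataway; the metric name is one from the list above.
    for sample in filter_metrics(fetch_metrics(), "dataway_kodo_queue_depth"):
        print(sample)
```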

Metrics under Docker

Outside Kubernetes, Dataway has two installation modes: host mode and Docker mode. This section covers how metrics collection differs for a Docker installation.

When installed in Docker, the HTTP port that exposes metrics is mapped to port 19090 on the host machine by default. In this case, the metrics collection address is http://localhost:19090/metrics.

If a different port is specified, the installer will add 10000 to the specified port during installation. Therefore, the specified port should not exceed 45535.

In addition, when installed in Docker mode, a profile collection port will also be exposed, which is mapped to port 16060 on the host machine by default. The mechanism is also to add 10000 to the specified port.
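
The port rule above amounts to simple arithmetic. A minimal sketch, using only the offset (10000) and the upper bound (45535) stated in the text; docker_host_port is a hypothetical helper name.

```python
PORT_OFFSET = 10000         # added by the installer, per the text above
MAX_SPECIFIED_PORT = 45535  # upper bound stated in the text above


def docker_host_port(specified_port: int) -> int:
    """Return the host port that a Docker-mode install maps a port to."""
    if not 1 <= specified_port <= MAX_SPECIFIED_PORT:
        raise ValueError(
            f"port {specified_port} is outside 1..{MAX_SPECIFIED_PORT}"
        )
    return specified_port + PORT_OFFSET


if __name__ == "__main__":
    for port in (9090, 6060):  # metrics and profile ports from this section
        print(port, "->", docker_host_port(port))
```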

Dataway's Own Log Collection and Processing

Dataway's own logs are divided into two categories: one is the gin log, and the other is the Dataway's own log. The following Pipeline can separate them:

# Pipeline for dataway logging

# Sample log lines for testing
'''
2023-12-14T11:27:06.744+0800    DEBUG   apis    apis/api_upload_profile.go:272  save profile file to disk [ok] /v1/upload/profiling?token=****************a4e3db8481c345a94fe5a
[GIN] 2021/10/25 - 06:48:07 | 200 |   30.890624ms |  114.215.200.73 | POST     "/v1/write/logging?token=tkn_5c862a11111111111111111111111"
'''

add_pattern("TOKEN", "tkn_\\w+")
add_pattern("GINTIME", "%{YEAR}/%{MONTHNUM}/%{MONTHDAY}%{SPACE}-%{SPACE}%{HOUR}:%{MINUTE}:%{SECOND}")
grok(_,"\\[GIN\\]%{SPACE}%{GINTIME:timestamp}%{SPACE}\\|%{SPACE}%{NUMBER:dataway_code}%{SPACE}\\|%{SPACE}%{NOTSPACE:cost_time}%{SPACE}\\|%{SPACE}%{NOTSPACE:client_ip}%{SPACE}\\|%{SPACE}%{NOTSPACE:method}%{SPACE}%{GREEDYDATA:http_url}")

# gin logging
if cost_time != nil {
  if http_url != nil  {
    grok(http_url, "%{TOKEN:token}")
    cover(token, [5, 15])
    replace(message, "tkn_\\w{0,5}\\w{6}", "****************$4")
    replace(http_url, "tkn_\\w{0,5}\\w{6}", "****************$4")
  }

  group_between(dataway_code, [200,299], "info", status)
  group_between(dataway_code, [300,399], "notice", status)
  group_between(dataway_code, [400,499], "warning", status)
  group_between(dataway_code, [500,599], "error", status)

  if !sample(0.1) { # sample(0.1) is true for ~10% of gin logs; drop the other 90%
    drop()
    exit()
  } else {
    set_tag(sample_rate, "0.1")
  }

  parse_duration(cost_time)
  duration_precision(cost_time, "ns", "ms")

  set_measurement('gin', true)
  set_tag(service,"dataway")
  exit()
}

# app logging
if cost_time == nil {
  grok(_,"%{TIMESTAMP_ISO8601:timestamp}%{SPACE}%{NOTSPACE:status}%{SPACE}%{NOTSPACE:module}%{SPACE}%{NOTSPACE:code}%{SPACE}%{GREEDYDATA:msg}")
  # the grok above extracts the log level into `status`
  if status == nil {
    grok(message,"Error%{SPACE}%{DATA:errormsg}")
    if errormsg != nil {
      add_key(status,"error")
      drop_key(errormsg)
    }
  }
  lowercase(status)

  # if debug level enabled, drop most of them
  if status == 'debug' {
    if !sample(0.1) { # sample(0.1) is true for ~10% of debug logs; drop the other 90%
      drop()
      exit()
    } else {
      set_tag(sample_rate, "0.1")
    }
  }

  group_in(status, ["error", "panic", "dpanic", "fatal","err","fat"], "error", status) # mark them as 'error'

  if msg != nil {
    grok(msg, "%{TOKEN:token}")
    cover(token, [5, 15])
    replace(message, "tkn_\\w{0,5}\\w{6}", "****************$4")
    replace(msg, "tkn_\\w{0,5}\\w{6}", "****************$4")
  }

  set_measurement("dataway-log", true)
  set_tag(service,"dataway")
}
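
To sanity-check sample lines before deploying the Pipeline above, the branching and status mapping can be approximated in plain shell. This is a rough sketch, not the Pipeline engine: log_kind and status_for_code are hypothetical helpers, and the sketch assumes gin lines always start with the literal prefix [GIN].

```shell
#!/bin/sh
# log_kind: hypothetical helper; prints the measurement a raw log line
# would end up in ("gin" or "dataway-log").
log_kind() {
  case "$1" in
    "[GIN]"*) echo gin ;;
    *)        echo dataway-log ;;
  esac
}

# status_for_code: hypothetical helper mirroring the four group_between()
# calls. Returns 1 (and prints nothing) when no range matches.
status_for_code() {
  code="$1"
  if   [ "$code" -ge 200 ] && [ "$code" -le 299 ]; then echo info
  elif [ "$code" -ge 300 ] && [ "$code" -le 399 ]; then echo notice
  elif [ "$code" -ge 400 ] && [ "$code" -le 499 ]; then echo warning
  elif [ "$code" -ge 500 ] && [ "$code" -le 599 ]; then echo error
  else return 1
  fi
}

log_kind '[GIN] 2021/10/25 - 06:48:07 | 200 |   30.890624ms |'   # prints gin
log_kind '2023-12-14T11:27:06.744+0800    DEBUG   apis'          # prints dataway-log
status_for_code 200                                              # prints info
```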

Dataway Bug Report

Dataway exposes its own metrics and profiling collection endpoints, allowing us to gather this information for troubleshooting purposes.

Adjust the commands below to match the ports and addresses you actually configured. The commands as listed assume the default configuration.

dw-bug-report.sh
br_dir="dw-br-$(date +%s)"
mkdir -p "$br_dir"

echo "Save bug report to ${br_dir}"

# Modify the following configurations according to your actual situation
dw_ip="localhost" # The IP address where Dataway's metrics/profile is exposed
metric_port=9090  # The port where metrics are exposed
profile_port=6060 # The port where profiling information is exposed
dw_yaml_conf="/usr/local/cloudcare/dataflux/dataway/dataway.yaml"
dw_dot_yaml_conf="/usr/local/cloudcare/dataflux/dataway/.dataway.yaml"

# Collect runtime metrics
curl -v "http://${dw_ip}:${metric_port}/metrics" -o "$br_dir/metrics"

# Collect profiling information
curl -v "http://${dw_ip}:${profile_port}/debug/pprof/allocs" -o "$br_dir/allocs"
curl -v "http://${dw_ip}:${profile_port}/debug/pprof/heap" -o "$br_dir/heap"
curl -v "http://${dw_ip}:${profile_port}/debug/pprof/profile" -o "$br_dir/profile" # This command will take about 30 seconds to run

# Copy the configuration files
cp "$dw_yaml_conf" "$br_dir/dataway.yaml.copy"
cp "$dw_dot_yaml_conf" "$br_dir/.dataway.yaml.copy"

# Pack everything and remove the working directory
tar czvf "${br_dir}.tar.gz" "${br_dir}"
rm -rf "${br_dir}"

Run the script:

$ sh dw-bug-report.sh
...

After execution, a file similar to dw-br-1721188604.tar.gz will be generated. You can then retrieve this file for further use.
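
Before passing the archive on, it can help to confirm that every expected file was actually collected, since a failed curl or cp leaves a gap. The following is a minimal sketch: check_bug_report is a hypothetical helper, and it assumes the archive was produced by the script above, so its entries are named dw-br-<timestamp>/<file>.

```shell
#!/bin/sh
# check_bug_report: hypothetical helper. Prints one "missing: <name>" line
# for each expected file absent from the archive; returns non-zero if any
# file is missing or the archive cannot be read.
check_bug_report() {
  archive="$1"
  dir=$(basename "$archive" .tar.gz)       # e.g. dw-br-1721188604
  listing=$(tar tzf "$archive") || return 1
  missing=0
  for name in metrics allocs heap profile dataway.yaml.copy .dataway.yaml.copy; do
    # -F: fixed string, -x: match the whole line
    if ! printf '%s\n' "$listing" | grep -qFx "${dir}/${name}"; then
      echo "missing: $name"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_bug_report dw-br-1721188604.tar.gz` prints nothing and returns 0 when the archive is complete. The /debug/pprof/ paths suggest the standard Go pprof endpoints, so the allocs, heap, and profile files can most likely be inspected with `go tool pprof`.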

FAQ

Request Entity Too Large Issue

Version-1.3.7

Dataway limits the size of the HTTP request body (64MB by default). When a request body exceeds this limit, the client receives an HTTP 413 error (Request Entity Too Large). If the request body is within a reasonable range, you can increase this value (the unit is bytes):

  • Set the environment variable DW_MAX_HTTP_BODY_BYTES for Kubernetes Pod install
  • In dataway.yaml, set max_http_body_bytes for host install
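
Because the value is given in bytes, it is easy to get the number of digits wrong. The sketch below converts megabytes to bytes, assuming 1 MB means 1024 * 1024 bytes; mb_to_bytes is a hypothetical helper and 128 is only an example value.

```shell
#!/bin/sh
# mb_to_bytes: hypothetical helper; converts megabytes to bytes,
# assuming 1 MB = 1024 * 1024 bytes.
mb_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

mb_to_bytes 64    # the documented default -> prints 67108864
mb_to_bytes 128   # example larger limit   -> prints 134217728
```

The resulting number is what you would assign to DW_MAX_HTTP_BODY_BYTES or max_http_body_bytes.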

Oversized requests that occur at runtime are reflected in both metrics and logs:

  • The metric dataway_http_too_large_dropped_total exposes the number of discarded large requests
  • Search the Dataway logs with cat log | grep 'drop too large request'. The matching log lines include the HTTP request headers, which helps identify the client that sent the request

Warning

In the disk cache module, there is also a maximum data block write limit (default 64MB). If you increase the maximum request body configuration, you should also adjust this configuration accordingly (ENV_DISKCACHE_MAX_DATA_SIZE), to ensure that large requests can be correctly written to the disk cache.
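
To check the dataway_http_too_large_dropped_total metric mentioned above from the command line, the metrics output can be filtered and summed. This is a sketch under a few assumptions: too_large_dropped is a hypothetical helper, the metrics endpoint is the default http://localhost:9090/metrics, and sample lines end with the value (no trailing timestamp).

```shell
#!/bin/sh
# too_large_dropped: hypothetical helper. Reads Prometheus text-format
# metrics on stdin and prints the sum of all samples of
# dataway_http_too_large_dropped_total (0 if the metric is absent).
too_large_dropped() {
  awk '
    # "# HELP" and "# TYPE" lines start with "#", so they do not match.
    /^dataway_http_too_large_dropped_total/ { sum += $NF }
    END { print sum + 0 }
  '
}

# Usage, assuming the default metrics address:
#   curl -s http://localhost:9090/metrics | too_large_dropped
```

A value that keeps growing between runs indicates that clients are still sending requests above the configured limit.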


  1. This limit reduces Dataway's performance. Under heavy load, increase the CPU limit or add more Dataway instances.
