Best Practices for Pod Log Collection¶
Preface¶
Containerized microservices run inside containers. A Pod consists of one or more tightly coupled containers and is the smallest schedulable unit in Kubernetes.
This article lists three methods for collecting logs from Pods using DataKit.
Solution One¶
Enable the logfwdserver input in DataKit, then run Logfwd as a Sidecar container that collects the business container's logs and forwards them to DataKit.
1 Enable the Logfwd Collector¶
If DataKit is not yet deployed in your Kubernetes cluster, log in to Guance, go to "Integration" - "Datakit" - "Kubernetes", and use the datakit.yaml file to deploy it.
Modify the datakit.yaml file to mount the logfwdserver.conf file into DataKit's /usr/local/datakit/conf.d/log/ directory.
Add the following configuration to datakit.yaml:
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: datakit-conf
  namespace: datakit
data:
  #### logfwdserver
  logfwdserver.conf: |-
    [inputs.logfwdserver]
      ## Logfwd receiver listening address and port
      address = "0.0.0.0:9531"

      [inputs.logfwdserver.tags]
      # some_tag = "some_value"
      # more_tag = "some_other_value"
Add the following to the volumeMounts of the DataKit container in the DaemonSet resource:
- mountPath: /usr/local/datakit/conf.d/log/logfwdserver.conf
  name: datakit-conf
  subPath: logfwdserver.conf
2 Mount Pipeline¶
Modify the datakit.yaml file to mount the pod-logging-demo.p file into DataKit's /usr/local/datakit/pipeline/ directory.
Add the following to the ConfigMap resource:
  pod-logging-demo.p: |-
    # Log format
    #2021-12-01 10:41:06.015 [http-nio-8090-exec-2] INFO c.s.d.c.HealthController - [getPing,19] - - Call ping interface
    grok(_, "%{TIMESTAMP_ISO8601:time} %{NOTSPACE:thread_name} %{LOGLEVEL:status}%{SPACE}%{NOTSPACE:class_name} - \\[%{NOTSPACE:method_name},%{NUMBER:line}\\] - - %{GREEDYDATA:msg}")
    default_time(time,"Asia/Shanghai")
Add the following to the volumeMounts of the DataKit container in the DaemonSet resource:
- mountPath: /usr/local/datakit/pipeline/pod-logging-demo.p
  name: datakit-conf
  subPath: pod-logging-demo.p
Note: If you do not need to use Pipeline for log parsing, this step can be skipped.
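To make it easier to see which fields the Pipeline script extracts, the following Python sketch approximates the grok expression with a plain regular expression and applies it to the sample line from the script's comment. The regex fragments are simplified stand-ins chosen for this illustration, not DataKit's actual grok pattern definitions, and the function name parse_log_line is hypothetical.

```python
import re

# Simplified stand-ins for the grok patterns in pod-logging-demo.p.
# Assumption: these only approximate DataKit's built-in pattern definitions.
LOG_LINE = re.compile(
    r"(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "  # TIMESTAMP_ISO8601
    r"(?P<thread_name>\S+) "                                   # NOTSPACE
    r"(?P<status>[A-Za-z]+)\s*"                                # LOGLEVEL, then SPACE
    r"(?P<class_name>\S+) - "                                  # NOTSPACE
    r"\[(?P<method_name>\S+),(?P<line>\d+)\] - - "             # [method,line]
    r"(?P<msg>.*)"                                             # GREEDYDATA
)

SAMPLE = (
    "2021-12-01 10:41:06.015 [http-nio-8090-exec-2] INFO "
    "c.s.d.c.HealthController - [getPing,19] - - Call ping interface"
)


def parse_log_line(line):
    """Return the extracted fields as a dict, or None if the line does not match."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None


fields = parse_log_line(SAMPLE)
for name, value in fields.items():
    print(f"{name}: {value}")
```

A line that does not follow the Logback layout returns None here; in the real Pipeline, such a log would simply not receive the extracted fields.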
3 Restart DataKit¶
Restart the DataKit Pods so that the newly mounted configuration and Pipeline files are loaded, for example by re-applying datakit.yaml.
4 Collect Logs with Logfwd Sidecar¶
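Deploy the Logfwd image and the business image in the same Pod. The example below uses log-demo-service as the business image, which writes its log file to /data/app/logs/log.log. The two containers share a volume, mounted at /data/app/logs in the business container and at /var/log in the Logfwd container, so Logfwd reads the file as /var/log/log.log and sends it to DataKit. The logs are parsed with pod-logging-demo.p, and multi-line entries are matched by their leading date.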
Sample Configuration Files
apiVersion: apps/v1
kind: Deployment
metadata:
  name: log-fwd-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: log-fwd-pod
  template:
    metadata:
      labels:
        app: log-fwd-pod
      annotations:
    spec:
      nodeName: k8s-node2
      containers:
      - name: log-fwd-container
        image: 172.16.0.238/df-demo/log-demo-service:v2
        ports:
        - containerPort: 8090
          protocol: TCP
        volumeMounts:
        - mountPath: /data/app/logs
          name: varlog
      - name: logfwd
        image: pubrepo.jiagouyun.com/datakit/logfwd:1.2.12
        env:
        - name: LOGFWD_DATAKIT_HOST
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.hostIP
        - name: LOGFWD_DATAKIT_PORT
          value: "9531"
        - name: LOGFWD_ANNOTATION_DATAKIT_LOGS
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.annotations['datakit/logs']
        - name: LOGFWD_POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: LOGFWD_POD_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        volumeMounts:
        - mountPath: /var/log
          name: varlog
        - mountPath: /opt/logfwd/config
          name: logfwd-config
          subPath: config
      restartPolicy: Always
      volumes:
      - name: varlog
        emptyDir: {}
      - configMap:
          name: logfwd-conf
        name: logfwd-config
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: logfwd-conf
data:
  config: |
    [
        {
            "loggings": [
                {
                    "logfiles": ["/var/log/log.log"],
                    "source": "log_fwd_demo",
                    "pipeline": "pod-logging-demo.p",
                    "multiline_match": "^\\d{4}-\\d{2}-\\d{2}",
                    "tags": {
                        "flag": "tag1"
                    }
                }
            ]
        }
    ]
Parameter explanations for logfwd-conf:
- logfiles: List of log files.
- ignore: File path filter using glob rules; files matching any condition will not be collected.
- source: Data source.
- service: Additional tag, defaults to $source if empty.
- pipeline: Script path when using Pipeline.
- character_encoding: Character encoding selection.
- multiline_match: Multi-line matching.
- remove_ansi_escape_codes: Whether to remove ANSI escape codes (e.g., text color in standard output); values are true or false.
- tags: Define tags in key-value pairs; optional.
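As a rough illustration of what removing ANSI escape codes means, the sketch below strips color sequences from a line of terminal output. It assumes only the common CSI-style sequences (such as the ones that set text color) matter; it is not DataKit's implementation, and the name strip_ansi is hypothetical.

```python
import re

# Matches CSI sequences such as "\x1b[31m" (red text) and "\x1b[0m" (reset).
ANSI_CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def strip_ansi(text):
    """Return text with CSI escape sequences removed."""
    return ANSI_CSI.sub("", text)


colored = "\x1b[31mERROR\x1b[0m disk full"
print(strip_ansi(colored))
```

Without this cleanup, the escape bytes would be stored as part of the log message and could interfere with Pipeline pattern matching.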
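Environment variable explanations:
- LOGFWD_DATAKIT_HOST: DataKit address. In this example it is the IP of the node the Pod runs on (status.hostIP).
- LOGFWD_DATAKIT_PORT: Port on which DataKit's logfwdserver input listens (9531 in this example).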
5 View Logs¶
Log in to Guance, open "Logs", and search for log_fwd_demo.
Solution Two¶
By default, DataKit collects the logs that containers write to stdout. To customize how those logs are handled (for example, their source, Pipeline, or multi-line matching), add Annotations to the Pod template in the Deployment's YAML file.
Below is an example of log collection for a Spring Boot microservice project. The jar package is log-springboot-demo-1.0-SNAPSHOT.jar, and logging uses Logback. The specific steps are as follows:
1 Write logback-spring.xml¶
logback-spring.xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration scan="true" scanPeriod="60 seconds" debug="false">
    <contextName>logback</contextName>
    <!-- Log root directory -->
    <property name="log.root.dir" value="./logs"/>
    <!-- Log output format -->
    <property name="log.pattern" value="%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger{20} - [%method,%line] - - %msg%n" />
    <!-- Print logs to console -->
    <appender name="Console" class="ch.qos.logback.core.ConsoleAppender">
        <encoder>
            <pattern>${log.pattern}</pattern>
        </encoder>
    </appender>
    <root level="INFO">
        <appender-ref ref="Console"/>
    </root>
</configuration>
2 Build Image¶
Dockerfile as follows:
FROM openjdk:8u292
RUN /bin/cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
RUN echo 'Asia/Shanghai' >/etc/timezone
ENV jar log-springboot-demo-1.0-SNAPSHOT.jar
ENV workdir /data/app/
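RUN mkdir -p ${workdir}
# Copy the application jar into the image (assumes the jar is in the build context);
# without this step the ENTRYPOINT has no jar to run.
COPY ${jar} ${workdir}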
WORKDIR ${workdir}
ENTRYPOINT ["sh", "-ec", "exec java ${JAVA_OPTS} -jar ${jar} "]
Build the image and push it to the Harbor repository. The Deployment in the next step references it as <your-harbor>/log-demo-service:v1.
3 Write the pod-log-service.yaml File¶
pod-log-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: log-demo-service
  labels:
    app: log-demo-service
spec:
  selector:
    app: log-demo-service
  ports:
  - protocol: TCP
    port: 8090
    nodePort: 30053
    targetPort: 8090
  type: NodePort
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: log-demo-service
  labels:
    app: log-demo-service
spec:
  replicas: 1
  selector:
    matchLabels:
      app: log-demo-service
  template:
    metadata:
      labels:
        app: log-demo-service
      annotations:
        datakit/logs: |
          [
            {
              "source": "pod-logging-testing-demo",
              "service": "pod-logging-testing-demo",
              "pipeline": "pod-logging-demo.p",
              "multiline_match": "^\\d{4}-\\d{2}-\\d{2}"
            }
          ]
    spec:
      containers:
      - env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        name: log-service
        image: <your-harbor>/log-demo-service:v1
        ports:
        - containerPort: 8090
          protocol: TCP
      restartPolicy: Always
      volumes:
      - name: ddagent
        emptyDir: {}
Annotation parameter explanations:
- source: Data source
- service: Additional tag; defaults to the value of source if empty
- pipeline: Pipeline script path
- ignore_status:
- multiline_match: Regular expression that identifies the first line of a log entry. For example, a line starting with a date (like 2021-11-26) begins a new entry, while subsequent lines that do not start with a date are treated as part of the previous entry.
- remove_ansi_escape_codes: Whether to remove ANSI escape codes (e.g., text color in standard output).
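The multi-line rule described above can be sketched in Python as follows. The sketch decodes the same JSON used in the datakit/logs annotation, which also shows why the backslashes are doubled there: JSON decoding turns "\\d" into the regex token \d. The grouping function is an illustration of the rule only, not DataKit's implementation, and the stack-trace lines (including com.example.Demo) are made-up sample data.

```python
import json
import re

# The annotation value from pod-log-service.yaml (JSON embedded in YAML).
ANNOTATION = r'''
[
  {
    "source": "pod-logging-testing-demo",
    "service": "pod-logging-testing-demo",
    "pipeline": "pod-logging-demo.p",
    "multiline_match": "^\\d{4}-\\d{2}-\\d{2}"
  }
]
'''


def group_entries(lines, first_line):
    """Start a new entry at each line matching first_line; append other lines to the current entry."""
    entries = []
    for line in lines:
        if first_line.search(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += "\n" + line
    return entries


config = json.loads(ANNOTATION)[0]
first_line = re.compile(config["multiline_match"])

# Hypothetical sample output: one error with a stack trace, then one normal line.
lines = [
    "2021-12-01 10:41:06.015 [http-nio-8090-exec-2] ERROR c.s.d.c.HealthController - [getPing,19] - - ping failed",
    "java.lang.IllegalStateException: boom",
    "\tat com.example.Demo.run(Demo.java:42)",
    "2021-12-01 10:41:07.120 [http-nio-8090-exec-3] INFO  c.s.d.c.HealthController - [getPing,19] - - Call ping interface",
]

entries = group_entries(lines, first_line)
for number, entry in enumerate(entries, start=1):
    print(f"--- entry {number} ---")
    print(entry)
```

Checking the annotation this way before deploying can catch malformed JSON or an invalid regular expression early, since either would raise an exception in json.loads or re.compile.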
4 Configure Pipeline¶
Add the pod-logging-demo.p section to the ConfigMap resource in the datakit-default.yaml file.
apiVersion: v1
kind: ConfigMap
metadata:
  name: datakit-conf
  namespace: datakit
data:
  pod-logging-demo.p: |-
    # Log format
    #2021-12-01 10:41:06.015 [http-nio-8090-exec-2] INFO c.s.d.c.HealthController - [getPing,19] - - Call ping interface
    grok(_, "%{TIMESTAMP_ISO8601:time} %{NOTSPACE:thread_name} %{LOGLEVEL:status}%{SPACE}%{NOTSPACE:class_name} - \\[%{NOTSPACE:method_name},%{NUMBER:line}\\] - - %{GREEDYDATA:msg}")
    default_time(time)
Mount pod-logging-demo.p into DataKit by adding the following to the volumeMounts of the DataKit container:
- mountPath: /usr/local/datakit/pipeline/pod-logging-demo.p
  name: datakit-conf
  subPath: pod-logging-demo.p
5 View Logs¶
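Deploy the Pod by applying pod-log-service.yaml to the cluster (for example, with kubectl apply -f pod-log-service.yaml).
Access the microservice through its NodePort (30053) so that it produces log output.
Log in to Guance, open the "Logs" module, and search for log-demo-service to view the logs.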
Solution Three¶
Mount a hostPath Volume into the Pod so that its log files are written to the host machine. Deploy DataKit as a DaemonSet and mount the same hostPath Volume, which allows DataKit to collect the log files produced inside the Pod.