Container Logs in Kubernetes
Note: This article only covers kubelet + CRI-based container runtimes. Pre-CRI implementations (e.g., Docker) are slightly different.
Logging & Kubernetes
Logs can provide helpful insights into an application. Especially when troubleshooting bugs, logs can help us understand the "why".
In Kubernetes, logs from containers are handled by the container runtime (e.g. containerd, cri-o) & the kubelet, as long as containers write their logs to stdout or stderr.
By default, logs can be fetched using kubectl -n <namespace> logs <pod> -c <container>. However, this is not always an option, nor is it especially convenient. There are several situations where Kubernetes moves pods across nodes, which causes the kubelet on the old node to delete the logs. To avoid losing logs and to make searching them easier, they can be shipped to a central location.
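For illustration, fetching the logs of a hypothetical counter pod in the default namespace looks like this (-f follows the log, similar to tail -f):

# One-shot fetch of the container's current log
kubectl -n default logs counter -c counter
# Stream new log lines as they are written
kubectl -n default logs counter -c counter -f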
Kubelet & container-runtime
The Kubelet communicates with the container runtime via the Container Runtime Interface. In short: It’s an API between the container runtime & Kubernetes to manage the lifecycle of a pod and its containers. Whenever you create a new pod in Kubernetes, and it gets scheduled, the kubelet will call the container runtime to start the container via the CRI.
When starting a new pod on a node, the kubelet creates a directory and a log file for every container of the pod (/var/log/pods/<namespace>_<pod>_<pod_uid>/<container>/<restart-count>.log). As the log file name includes the restart count, every time a container restarts, it gets a new log file, i.e.:
- 0 restart(s) -> /var/log/pods/<namespace>_<pod>_<pod_uid>/<container>/0.log
- 1 restart(s) -> /var/log/pods/<namespace>_<pod>_<pod_uid>/<container>/1.log
Only the current & previous log files are kept. For a container that had 2 restarts, only 1.log (previous) & 2.log (current) will exist.
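By default, kubectl logs reads the current log file; the --previous flag returns the logs of the previous container instance. Sticking with the hypothetical counter pod:

# Current instance (e.g. 2.log)
kubectl -n default logs counter -c counter
# Previous instance (e.g. 1.log)
kubectl -n default logs counter -c counter --previous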
The kubelet passes the log path to the container runtime in 2 API calls:
RunPodSandbox
- The absolute path to the directory for the pod's logs (/var/log/pods/<namespace>_<pod>_<pod_uid>/) gets passed via RunPodSandboxRequest.config.log_directory
CreateContainer
- The relative path (<container>/<restart-count>.log) for the container gets passed via CreateContainerRequest.config.log_path
The container runtime will then pipe all stdout & stderr logs of the container to the specified log file.
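To make these two calls more concrete, here is a minimal sketch built with the CRI Go types from k8s.io/cri-api (runtime/v1). The pod, container, and UID values are purely illustrative (they match the example output later in this article), and the remaining required fields, the actual gRPC calls, and error handling are omitted:

package main

import (
	"fmt"
	"path/filepath"

	runtimeapi "k8s.io/cri-api/pkg/apis/runtime/v1"
)

func main() {
	// RunPodSandbox: the absolute log directory of the pod is passed in the sandbox config.
	sandboxReq := &runtimeapi.RunPodSandboxRequest{
		Config: &runtimeapi.PodSandboxConfig{
			Metadata: &runtimeapi.PodSandboxMetadata{
				Name:      "counter",
				Namespace: "default",
				Uid:       "f9d6dec4-6ea1-446f-b29d-7de7f292f944",
			},
			LogDirectory: "/var/log/pods/default_counter_f9d6dec4-6ea1-446f-b29d-7de7f292f944/",
		},
	}

	// CreateContainer: the log path is passed relative to the sandbox's log directory.
	containerReq := &runtimeapi.CreateContainerRequest{
		Config: &runtimeapi.ContainerConfig{
			Metadata: &runtimeapi.ContainerMetadata{
				Name:    "counter",
				Attempt: 0, // restart count
			},
			LogPath: "counter/0.log",
		},
		SandboxConfig: sandboxReq.Config,
	}

	// The runtime joins both values to get the absolute path of the container's log file.
	fmt.Println(filepath.Join(sandboxReq.Config.LogDirectory, containerReq.Config.LogPath))
	// /var/log/pods/default_counter_f9d6dec4-6ea1-446f-b29d-7de7f292f944/counter/0.log
}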
Log rotation
With CRI implementations, unlike with Docker, the kubelet is responsible for rotating log files. The log rotation routine is invoked every 10s and can be configured using 2 kubelet flags:
--container-log-max-files Set the maximum number of container log files that can be present for a container. The number must be >= 2. This flag can only be used with --container-runtime=remote.
--container-log-max-size Set the maximum size (e.g. 10Mi) of the container log file before it is rotated. This flag can only be used with --container-runtime=remote.
The corresponding fields in the kubelet config are .containerLogMaxSize & .containerLogMaxFiles.
Details on the structure of the kubelet config can be found in the upstream documentation.
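For example, a minimal KubeletConfiguration snippet setting both fields might look like this (the values shown are illustrative):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Rotate a container's log file once it reaches 10Mi
containerLogMaxSize: 10Mi
# Keep at most 5 log files per container (including the current one)
containerLogMaxFiles: 5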
The <restart-count>.log file will be rotated once it exceeds --container-log-max-size. For the rotation, the current log file is renamed to <restart-count>.log.<timestamp>.
After renaming the log file, the kubelet calls ReopenContainerLog on the container runtime, which makes the runtime create a new log file (the filename is taken from the initial CreateContainer request) to which new logs are forwarded.
The previously rotated files with the <timestamp> suffix will be gzip-compressed on the next rotation.
On the filesystem, this will look something like this:
[root@node-0 counter]# du -sh /var/log/pods/default_counter_f9d6dec4-6ea1-446f-b29d-7de7f292f944/counter/*
1.5M /var/log/pods/default_counter_f9d6dec4-6ea1-446f-b29d-7de7f292f944/counter/0.log
108K /var/log/pods/default_counter_f9d6dec4-6ea1-446f-b29d-7de7f292f944/counter/0.log.20220130-200217.gz
112K /var/log/pods/default_counter_f9d6dec4-6ea1-446f-b29d-7de7f292f944/counter/0.log.20220130-200327.gz
112K /var/log/pods/default_counter_f9d6dec4-6ea1-446f-b29d-7de7f292f944/counter/0.log.20220130-200437.gz
11M /var/log/pods/default_counter_f9d6dec4-6ea1-446f-b29d-7de7f292f944/counter/0.log.20220130-200548
--container-log-max-files includes the current (not yet rotated) file, thus it must be >= 2.
Once the <restart-count>.log file has been rotated, those logs can no longer be fetched using kubectl logs.
Log format
Container logs that are forwarded by the container runtime to /var/log/pods/<namespace>_<pod>_<pod_uid>/<container>/0.log are written in a special CRI format:
# Format: <timestamp> <stream> <tag> <container log message>
#
# timestamp: RFC3339 with nanoseconds, e.g. 2022-01-30T20:30:58.395515030+01:00
# stream : Originating stream; stdout or stderr
# tag : "F" for a full log message, "P" for a partial log message. Messages that exceed the maximal length or don't end with a newline are treated as partial.
# Example
2022-01-30T20:36:58.439215654+01:00 stdout F foo
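When a message exceeds the maximal length or lacks a trailing newline, it is written as one or more partial (P) entries followed by a full (F) entry, e.g. (content is illustrative):

2022-01-30T20:36:58.439215654+01:00 stdout P this is the first chunk of a very long log line,
2022-01-30T20:36:58.439215655+01:00 stdout F and this is the final chunk of the same line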
Metrics
The kubelet provides a gauge for the filesystem usage of container logs: kubelet_container_log_filesystem_used_bytes{uid,namespace,pod,container}.
The metric is exposed via a collector, which avoids keeping metrics around for removed containers.
kubelet_container_log_filesystem_used_bytes{container="counter",namespace="default",pod="counter",uid="eeef7958-380a-40a7-9cfd-3d137a2fa755"} 5.742592e+06
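If the kubelet metrics are scraped by Prometheus, a simple query (a sketch, assuming the default metric name) surfaces the containers whose logs use the most space:

topk(10, kubelet_container_log_filesystem_used_bytes)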
Shipping logs
Shipping logs to a central location solves two issues:
- Log messages are not lost when a container's log file is rotated
- Logs can be accessed in a central location, potentially with a system that offers a DSL for querying logs.
A common way to ship logs is using a log shipper. The log shipper is an agent running on every node, which “ships” logs to a defined destination. An additional feature most log shippers provide is adding metadata to logs. In Kubernetes, this is often the metadata of the pod that was running a container. Common solutions are Fluentd, Fluent Bit & Filebeat.
Example using filebeat
filebeat.autodiscover:
  providers:
    # Watch kubernetes pods
    - type: kubernetes
      # Filter by node the agent is running on
      node: ${NODE_NAME}
      # Allow hints from pod annotations
      hints.enabled: true
      hints.default_config:
        # Create a container input for every container/pod
        type: container
        paths:
          - "/var/log/pods/${data.kubernetes.namespace}_${data.kubernetes.pod.name}_${data.kubernetes.pod.uid}/${data.kubernetes.container.name}/*.log"
This config sets up filebeat to monitor the container logs of pods running on the same node as the filebeat agent. Kubernetes metadata (pod, node, namespace) will be attached to every log event.
The hints settings enable controlling filebeat via pod annotations. For example, you could annotate a pod with co.elastic.logs/enabled: "false", which disables log shipping for the annotated pod.
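A sketch of a pod manifest carrying that annotation (the pod itself is illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: counter
  annotations:
    # Disable log shipping for this pod
    co.elastic.logs/enabled: "false"
spec:
  containers:
    - name: counter
      image: busybox
      command: ["sh", "-c", "while true; do date; sleep 1; done"]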
The type: container config instructs filebeat to use the container input, which can parse logs in the CRI format.