Kubernetes activeDeadlineSeconds behaves slightly differently for Jobs and Pods

Summary

Both Kubernetes Pods and Jobs let you set an upper limit on running time, activeDeadlineSeconds, but the two behave slightly differently, so I looked into it.

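| Kind | Behavior when `activeDeadlineSeconds` is exceeded |
| --- | --- |
| `Pod` | `kubelet` kills (stops) the containers. The `Pod` remains in the `Kubernetes API` with `phase: Failed`. The stopped containers remain on the `node`. The `Pod` and its containers are removed when garbage collection runs. |
| `Job` | The `Job Controller` explicitly deletes the `Pod`s that are still active (phase neither `Succeeded` nor `Failed`); `Pod`s that have already finished are kept. The `Job` remains in the `Kubernetes API`, but the active `Pod`s are deleted. Because those `Pod`s are deleted, their containers on the `node` are removed as well. |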

Environment

macOS Big Sur 11.2.3
Docker Desktop for Mac
- Version 3.4.0
- Docker Engine Version 20.10.7
- Docker Compose Version 1.29.2
kind
- v0.11.0 go1.16.4 darwin/amd64
Kubernetes
- v1.21.1

Pod activeDeadlineSeconds

$ kubectl explain pod.spec.activeDeadlineSeconds
KIND:     Pod
VERSION:  v1

FIELD:    activeDeadlineSeconds <integer>

DESCRIPTION:
     Optional duration in seconds the pod may be active on the node relative to
     StartTime before the system will actively try to mark it failed and kill
     associated containers. Value must be a positive integer.

(Rough translation) If a Pod stays active for longer than activeDeadlineSeconds after it started, the system marks the Pod as failed and kills its containers.
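The check in the field description can be sketched roughly as follows. This is a simplified model written for this article and is not kubelet's source. `podSpec`, `podStatus`, `pastActiveDeadline`, and `mustParse` are hypothetical names, and the 20 s deadline is an assumed value.

```go
package main

import (
	"fmt"
	"time"
)

// podSpec and podStatus are hypothetical, trimmed-down stand-ins for the
// Pod's spec and status; only the fields relevant to the deadline are kept.
type podSpec struct {
	ActiveDeadlineSeconds *int64 // nil means no deadline
}

type podStatus struct {
	StartTime time.Time // zero value means the Pod has not started yet
}

// pastActiveDeadline reports whether the Pod has been active for at least
// activeDeadlineSeconds since its StartTime.
func pastActiveDeadline(spec podSpec, status podStatus, now time.Time) bool {
	if spec.ActiveDeadlineSeconds == nil {
		return false // no deadline configured
	}
	if status.StartTime.IsZero() {
		return false // not started, so there is nothing to measure from
	}
	allowed := time.Duration(*spec.ActiveDeadlineSeconds) * time.Second
	return now.Sub(status.StartTime) >= allowed
}

// mustParse parses an RFC 3339 timestamp and panics on malformed input.
func mustParse(s string) time.Time {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		panic(err)
	}
	return t
}

func main() {
	deadline := int64(20) // assumed value
	spec := podSpec{ActiveDeadlineSeconds: &deadline}
	status := podStatus{StartTime: mustParse("2021-06-11T13:31:29Z")} // startTime of pod-01

	fmt.Println(pastActiveDeadline(spec, status, mustParse("2021-06-11T13:31:48Z"))) // false (19 s)
	fmt.Println(pastActiveDeadline(spec, status, mustParse("2021-06-11T13:31:49Z"))) // true (20 s)
}
```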

So let's create a simple Pod and check.

# Create the Pod
$ kubectl apply -f pod-01.yaml

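# The container terminated (containerStatuses[].state.terminated.finishedAt) 50 s after the Pod started (startTime).
# exitCode 137 = 128 + 9, i.e. the process was killed by SIGKILL (SIGTERM alone would give 143).
# Presumably kubelet sent SIGTERM through the container runtime (containerd) at the deadline and SIGKILL after the grace period.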
$ kubectl get pod -o yaml pod-01 | yq -C r - "status"
containerStatuses:
  - containerID: containerd://f32f5d2995838397faaf0d2c4f312d941c1047001cb85e968fb4685cc75b5bda
    image: docker.io/library/busybox:latest
    imageID: docker.io/library/busybox@sha256:930490f97e5b921535c153e0e7110d251134cc4b72bbb8133c6a5065cc68580d
    lastState: {}
    name: main-container
    ready: false
    restartCount: 0
    state:
      terminated:
        containerID: containerd://f32f5d2995838397faaf0d2c4f312d941c1047001cb85e968fb4685cc75b5bda
        exitCode: 137
        finishedAt: "2021-06-11T13:32:19Z"
        reason: Error
        startedAt: "2021-06-11T13:31:31Z"
message: Pod was active on the node longer than the specified deadline
phase: Failed
podIP: 10.244.0.36
podIPs:
  - ip: 10.244.0.36
qosClass: Guaranteed
reason: DeadlineExceeded
startTime: "2021-06-11T13:31:29Z"

# The logs (up to the point of termination) are still available
$ kubectl logs pod-01 --timestamps
2021-06-11T13:31:41.951165516Z 1
2021-06-11T13:31:51.933749153Z 2
2021-06-11T13:32:01.934770631Z 3
2021-06-11T13:32:11.936129723Z 4

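The Pod went to phase: Failed, and its container terminated (containerStatuses[].state.terminated.finishedAt) more than activeDeadlineSeconds after the Pod started (startTime).
(finishedAt is 50 s after startTime. Assuming a 20 s deadline, the remaining 30 s matches the default terminationGracePeriodSeconds. So kubelet probably sent SIGTERM at the deadline and, because the process did not exit, SIGKILL 30 s later.)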
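The gap can be checked directly from the timestamps in the status output. The sketch below only parses the RFC 3339 values copied from that output. The 20 s deadline is an assumption, because pod-01.yaml is not shown. The 30 s figure is Kubernetes' default terminationGracePeriodSeconds.

```go
package main

import (
	"fmt"
	"time"
)

// elapsedSeconds returns the number of seconds between two RFC 3339 timestamps.
func elapsedSeconds(from, to string) (float64, error) {
	start, err := time.Parse(time.RFC3339, from)
	if err != nil {
		return 0, err
	}
	end, err := time.Parse(time.RFC3339, to)
	if err != nil {
		return 0, err
	}
	return end.Sub(start).Seconds(), nil
}

func main() {
	const assumedDeadline = 20.0    // assumption: activeDeadlineSeconds in pod-01.yaml
	const defaultGracePeriod = 30.0 // default terminationGracePeriodSeconds

	// startTime and containerStatuses[].state.terminated.finishedAt of pod-01
	total, err := elapsedSeconds("2021-06-11T13:31:29Z", "2021-06-11T13:32:19Z")
	if err != nil {
		panic(err)
	}
	fmt.Printf("startTime -> finishedAt: %.0fs\n", total) // 50s
	fmt.Printf("beyond the deadline: %.0fs (default grace period: %.0fs)\n",
		total-assumedDeadline, defaultGracePeriod) // 30s (30s)
}
```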

Let's also check the state of the container on the node.

# With kind, the node itself runs as a Docker container
$ docker container ls
CONTAINER ID   IMAGE                  COMMAND                  CREATED        STATUS        PORTS                       NAMES
05f0d805c4dc   kindest/node:v1.21.1   "/usr/local/bin/entr…"   25 hours ago   Up 25 hours   127.0.0.1:53211->6443/tcp   kind-control-plane

# Searching the node's containers for containerStatuses[].containerID shows the container is still there (its process has exited)
$ docker container exec 05f0d805c4dc ctr --namespace k8s.io container ls | grep f32f5d2995838397faaf0d2c4f312d941c1047001cb85e968fb4685cc75b5bda
f32f5d2995838397faaf0d2c4f312d941c1047001cb85e968fb4685cc75b5bda    docker.io/library/busybox:latest                    io.containerd.runc.v2

So, when a Pod runs longer than activeDeadlineSeconds, the Pod object remains in the Kubernetes API,
and the container also remains on the node (its process is terminated by a signal and exits abnormally).
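Exit code 137 in the outputs can be decoded with the usual "128 + signal number" convention. It points to SIGKILL (9). SIGTERM (15) would appear as 143. The sketch below is a hypothetical helper, `describeExitCode`, that only knows the two signals relevant here (Linux numbering).

```go
package main

import "fmt"

// signalNames covers the two signals relevant here (Linux numbering).
var signalNames = map[int]string{
	9:  "SIGKILL",
	15: "SIGTERM",
}

// describeExitCode interprets a container exit code using the convention
// that codes above 128 mean "terminated by signal (code - 128)".
func describeExitCode(code int) string {
	if code > 128 {
		sig := code - 128
		if name, ok := signalNames[sig]; ok {
			return fmt.Sprintf("killed by signal %d (%s)", sig, name)
		}
		return fmt.Sprintf("killed by signal %d", sig)
	}
	return fmt.Sprintf("exited with status %d", code)
}

func main() {
	fmt.Println(describeExitCode(137)) // killed by signal 9 (SIGKILL)
	fmt.Println(describeExitCode(143)) // killed by signal 15 (SIGTERM)
}
```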

Job activeDeadlineSeconds

Jobs can also set activeDeadlineSeconds. Its description is as follows.

# Description of Job activeDeadlineSeconds
$ kubectl explain job.spec.activeDeadlineSeconds
KIND:     Job
VERSION:  batch/v1

FIELD:    activeDeadlineSeconds <integer>

DESCRIPTION:
     Specifies the duration in seconds relative to the startTime that the job
     may be continuously active before the system tries to terminate it; value
     must be positive integer. If a Job is suspended (at creation or through an
     update), this timer will effectively be stopped and reset when the Job is
     resumed again.

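(Rough translation) If a Job stays continuously active for longer than activeDeadlineSeconds after it started, the system tries to terminate it. If the Job is suspended (at creation, or later by updating .spec.suspend), the timer is effectively stopped and starts over when the Job is resumed.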
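The suspend/resume wording can be illustrated with a small model. This is a hypothetical sketch of the timer semantics as the field description states them, not the Job controller's code. `jobModel`, `resume`, `suspend`, `pastDeadline`, and `at` are illustrative names, and the times are made up.

```go
package main

import (
	"fmt"
	"time"
)

// epoch is an arbitrary reference time for the example.
var epoch = time.Date(2021, 6, 11, 14, 0, 0, 0, time.UTC)

// at returns the time sec seconds after epoch.
func at(sec int) time.Time { return epoch.Add(time.Duration(sec) * time.Second) }

// jobModel is a hypothetical model of the timer semantics described by
// `kubectl explain job.spec.activeDeadlineSeconds`.
type jobModel struct {
	activeDeadlineSeconds int64
	startTime             time.Time // zero while suspended or not started
}

// resume starts (or restarts) the timer from now.
func (j *jobModel) resume(now time.Time) { j.startTime = now }

// suspend stops the timer; elapsed time is not carried over.
func (j *jobModel) suspend() { j.startTime = time.Time{} }

// pastDeadline reports whether the Job has been continuously active for at
// least activeDeadlineSeconds.
func (j *jobModel) pastDeadline(now time.Time) bool {
	if j.startTime.IsZero() {
		return false
	}
	return now.Sub(j.startTime) >= time.Duration(j.activeDeadlineSeconds)*time.Second
}

func main() {
	j := &jobModel{activeDeadlineSeconds: 20}
	j.resume(at(0))
	fmt.Println(j.pastDeadline(at(15))) // false: active for 15 s
	j.suspend()                         // suspended after 15 s
	j.resume(at(60))                    // resumed: the timer starts over
	fmt.Println(j.pastDeadline(at(75))) // false: only 15 s since resuming
	fmt.Println(j.pastDeadline(at(80))) // true: 20 s since resuming
}
```

The 15 s spent before the suspension does not count toward the deadline after resuming, which is the "reset" in the description.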

As with the Pod, create a Job and check how it behaves.
The container does roughly the same thing.

# Create the Job
$ kubectl apply -f job-01.yaml

# The Job is Failed due to DeadlineExceeded
$ kubectl get job -o yaml job-01 | yq -C r - "status"
conditions:
  - lastProbeTime: "2021-06-11T14:02:17Z"
    lastTransitionTime: "2021-06-11T14:02:17Z"
    message: Job was active longer than specified deadline
    reason: DeadlineExceeded
    status: "True"
    type: Failed
failed: 1
startTime: "2021-06-11T14:01:57Z"

# No Pods found...
$ kubectl get pods -l job-name=job-01
No resources found in default namespace.
$ kubectl get pods
No resources found in default namespace.

# The Events show that the Pod (job-01-xntdw) was explicitly deleted
$ kubectl describe job job-01
...(omitted)
Events:
  Type     Reason            Age    From            Message
  ----     ------            ----   ----            -------
  Normal   SuccessfulCreate  7m44s  job-controller  Created pod: job-01-xntdw
  Normal   SuccessfulDelete  7m24s  job-controller  Deleted pod: job-01-xntdw
  Warning  DeadlineExceeded  7m24s  job-controller  Job was active longer than specified deadline

# Just to be sure, searching the node by image name finds nothing either...
# (with kind, the node itself runs as a Docker container)
$ docker container exec 05f0d805c4dc ctr --namespace k8s.io container ls | grep -c busybox
0

This time the Job remains, but the Pod is gone...
When a Job runs longer than activeDeadlineSeconds, its Pod is deleted from the Kubernetes API,
and apparently the container on the node is removed as well.

What happens if activeDeadlineSeconds is set on both the Pod and the Job?
First, try the case Pod activeDeadlineSeconds < Job activeDeadlineSeconds.

# Create the Job
$ kubectl apply -f job-02.yaml

# This time the Job is Failed due to BackoffLimitExceeded, not DeadlineExceeded
$ kubectl get job -o yaml job-02 | yq -C r - "status"
conditions:
  - lastProbeTime: "2021-06-11T14:30:25Z"
    lastTransitionTime: "2021-06-11T14:30:25Z"
    message: Job has reached the specified backoff limit
    reason: BackoffLimitExceeded
    status: "True"
    type: Failed
failed: 1
startTime: "2021-06-11T14:30:05Z"

# The Pod remains
$ kubectl get pods -l job-name=job-02
NAME           READY   STATUS   RESTARTS   AGE
job-02-d2w6v   0/1     Error    0          2m16s

# The Pod behaves as in the Pod activeDeadlineSeconds case
$ kubectl get pod -o yaml job-02-d2w6v | yq -C r - "status"
containerStatuses:
  - containerID: containerd://607cdbd019d8a1dc97473f6e62960ec68c0e70575b9c8dc7ac1fdfa18dab1dbd
    image: docker.io/library/busybox:latest
    imageID: docker.io/library/busybox@sha256:930490f97e5b921535c153e0e7110d251134cc4b72bbb8133c6a5065cc68580d
    lastState: {}
    name: main-container
    ready: false
    restartCount: 0
    state:
      terminated:
        containerID: containerd://607cdbd019d8a1dc97473f6e62960ec68c0e70575b9c8dc7ac1fdfa18dab1dbd
        exitCode: 137
        finishedAt: "2021-06-11T14:30:55Z"
        reason: Error
        startedAt: "2021-06-11T14:30:08Z"
message: Pod was active on the node longer than the specified deadline
phase: Failed
podIP: 10.244.0.39
podIPs:
  - ip: 10.244.0.39
qosClass: Guaranteed
reason: DeadlineExceeded
startTime: "2021-06-11T14:30:05Z"

# The logs remain
$ kubectl logs job-02-d2w6v --timestamps
2021-06-11T14:30:18.567995219Z 1
2021-06-11T14:30:28.568766279Z 2
2021-06-11T14:30:38.570408867Z 3
2021-06-11T14:30:48.548898562Z 4

# The container also remains on the node
# (with kind, the node itself runs as a Docker container)
$ docker container exec 05f0d805c4dc ctr --namespace k8s.io container ls | grep 607cdbd019d8a1dc97473f6e62960ec68c0e70575b9c8dc7ac1fdfa18dab1dbd
607cdbd019d8a1dc97473f6e62960ec68c0e70575b9c8dc7ac1fdfa18dab1dbd    docker.io/library/busybox:latest                    io.containerd.runc.v2

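Presumably, because the Pod's activeDeadlineSeconds is shorter, the Pod reached phase: Failed before the Job's activeDeadlineSeconds expired. The Job controller then failed the Job with BackoffLimitExceeded because no retries were left. Note that the Job's Failed condition is timestamped 14:30:25, 20 s after startTime, while the container's finishedAt is 14:30:55. The Pod was marked Failed at its deadline, before the container actually stopped.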

Next, try the case Pod activeDeadlineSeconds > Job activeDeadlineSeconds.

# Create the Job
$ kubectl apply -f job-03.yaml

# This time the Job is Failed due to DeadlineExceeded
$ kubectl get job -o yaml job-03 | yq -C r - "status"
conditions:
  - lastProbeTime: "2021-06-11T14:47:00Z"
    lastTransitionTime: "2021-06-11T14:47:00Z"
    message: Job was active longer than specified deadline
    reason: DeadlineExceeded
    status: "True"
    type: Failed
failed: 1
startTime: "2021-06-11T14:46:40Z"

# No Pods remain
$ kubectl get pods -l job-name=job-03
No resources found in default namespace.

# The Pod was explicitly deleted
$ kubectl describe job job-03
...(omitted)
Events:
  Type     Reason            Age   From            Message
  ----     ------            ----  ----            -------
  Normal   SuccessfulCreate  83s   job-controller  Created pod: job-03-h66hc
  Normal   SuccessfulDelete  63s   job-controller  Deleted pod: job-03-h66hc
  Warning  DeadlineExceeded  63s   job-controller  Job was active longer than specified deadline

# No container is found on the node either
# (with kind, the node itself runs as a Docker container)
$ docker container exec 05f0d805c4dc ctr --namespace k8s.io container ls | grep -c busybox
0

This time the Job's activeDeadlineSeconds was shorter, so the Pod that was still active at that point was deleted along with its container.
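Both experiments come down to a race between two independent timers. kubelet measures the Pod's deadline from the Pod's startTime, and the Job controller measures the Job's deadline from the Job's startTime. Below is a toy model of that race, not Kubernetes code:

- `firstDeadline` is a hypothetical name.
- Offsets are in seconds from the Job's startTime.
- The deadline values are made up, because the job-02/job-03 manifests are not shown.
- It assumes a failed Pod is not retried, as appears to be the case for job-02.

```go
package main

import "fmt"

// firstDeadline is a toy model of which activeDeadlineSeconds fires first.
// All values are seconds; the Job's timer starts at 0 (the Job's startTime)
// and the Pod's timer starts at podStartOffset (the Pod's startTime).
// Assumption: a failed Pod is not retried (backoffLimit exhausted).
func firstDeadline(podStartOffset, podDeadline, jobDeadline int64) string {
	podFiresAt := podStartOffset + podDeadline
	jobFiresAt := jobDeadline
	switch {
	case podFiresAt < jobFiresAt:
		return "pod" // Pod -> Failed and kept; Job -> BackoffLimitExceeded
	case jobFiresAt < podFiresAt:
		return "job" // active Pod deleted; Job -> DeadlineExceeded
	default:
		return "tie" // depends on which component acts first
	}
}

func main() {
	// Hypothetical values shaped like the two experiments above.
	fmt.Println(firstDeadline(0, 20, 60)) // pod (like job-02)
	fmt.Println(firstDeadline(0, 60, 20)) // job (like job-03)
}
```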

A closer look

These differences probably come from the different components that manage each object.

Pod

Pods are managed by kubelet.
When activeDeadlineSeconds is exceeded, activeDeadlineHandler.ShouldEvict()¹ and Kubelet.generateAPIPodStatus()² appear to set the Pod to phase: Failed in the Kubernetes API, and Kubelet.KillPod()³ kills the containers.
The official docs also say the following, which suggests that kubelet hardly ever deletes Pod objects from the Kubernetes API.

Failed Pods persist until a human or a controller explicitly deletes them.⁴

(Looking more closely, I found exactly one place where kubelet seems to delete Pod objects from the Kubernetes API⁵, but it is only called when handling static Pods, so it should not affect ordinary Pods.)
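The kubelet flow described above can be sketched as a pure function: the deadline check only changes the status that gets reported, and nothing here deletes the Pod object. This is a hypothetical model, not kubelet code. `apiPodStatus`, `evictResult`, and `nextStatus` are illustrative names. The reason and message strings are the ones seen in the status output earlier.

```go
package main

import "fmt"

// apiPodStatus is a hypothetical, trimmed-down Pod status.
type apiPodStatus struct {
	Phase   string
	Reason  string
	Message string
}

// evictResult is a hypothetical stand-in for the result of a deadline
// check such as ShouldEvict().
type evictResult struct {
	Evict   bool
	Reason  string
	Message string
}

// nextStatus: if the deadline check asks for eviction, the status reported
// to the API server becomes Failed with the check's reason and message, and
// the second return value tells the caller to kill the containers.
// The Pod object itself is only updated, never deleted.
func nextStatus(current apiPodStatus, r evictResult) (apiPodStatus, bool) {
	if !r.Evict {
		return current, false
	}
	return apiPodStatus{Phase: "Failed", Reason: r.Reason, Message: r.Message}, true
}

func main() {
	deadline := evictResult{
		Evict:   true,
		Reason:  "DeadlineExceeded",
		Message: "Pod was active on the node longer than the specified deadline",
	}
	status, kill := nextStatus(apiPodStatus{Phase: "Running"}, deadline)
	fmt.Println(status.Phase, status.Reason, kill) // Failed DeadlineExceeded true
}
```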

So a Pod that exceeds activeDeadlineSeconds behaves just like any other Pod that fails abnormally:
it stays in the Failed phase and is eventually removed by garbage collection.

I think this is why the Pod and its container remained after activeDeadlineSeconds was exceeded.
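On the garbage collection: my understanding is that the Pod garbage collector in kube-controller-manager deletes terminated (Succeeded or Failed) Pod objects, oldest first, only once their number exceeds `--terminated-pod-gc-threshold` (12500 by default). Exited containers on the node are cleaned up separately by kubelet. Below is a minimal sketch of that threshold rule, not the controller's code. `terminatedPod` and `podsToCollect` are hypothetical names, and creation times are given as Unix seconds.

```go
package main

import (
	"fmt"
	"sort"
)

// terminatedPod is a hypothetical record of a Succeeded or Failed Pod.
type terminatedPod struct {
	Name        string
	CreatedUnix int64 // creation time in Unix seconds
}

// podsToCollect returns the names of the terminated Pods that a
// threshold-based collector would delete: the oldest ones beyond the
// threshold. A threshold <= 0 disables collection in this sketch.
func podsToCollect(pods []terminatedPod, threshold int) []string {
	if threshold <= 0 || len(pods) <= threshold {
		return nil
	}
	sorted := append([]terminatedPod(nil), pods...) // don't reorder the caller's slice
	sort.Slice(sorted, func(i, j int) bool {
		return sorted[i].CreatedUnix < sorted[j].CreatedUnix
	})
	excess := len(sorted) - threshold
	names := make([]string, 0, excess)
	for _, p := range sorted[:excess] {
		names = append(names, p.Name)
	}
	return names
}

func main() {
	pods := []terminatedPod{
		{Name: "pod-b", CreatedUnix: 300},
		{Name: "pod-a", CreatedUnix: 100},
		{Name: "pod-c", CreatedUnix: 200},
	}
	fmt.Println(podsToCollect(pods, 12500)) // []: below the default threshold
	fmt.Println(podsToCollect(pods, 2))     // [pod-a]
}
```

With the default threshold, a small cluster can keep Failed Pods like pod-01 around for a long time.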

Job

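What about Jobs, then? They are managed not by kubelet but by the Job controller.
When a Job exceeds activeDeadlineSeconds, Controller.syncJob()⁶ and RealPodControl.DeletePod()⁷ appear to call the Kubernetes API to explicitly delete the Pods that are still active at that point (phase neither Succeeded nor Failed).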

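Therefore, when backoffLimit causes failed Pods to be recreated, only the Pod that is still active at the deadline (usually the last one) is deleted from the API. The earlier failed Pods remain.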

# Create the Job
$ kubectl apply -f job-04.yaml

# This time the Job is Failed due to DeadlineExceeded
$ kubectl get job -o yaml job-04 | yq -C r - "status"
conditions:
  - lastProbeTime: "2021-06-12T00:35:57Z"
    lastTransitionTime: "2021-06-12T00:35:57Z"
    message: Job was active longer than specified deadline
    reason: DeadlineExceeded
    status: "True"
    type: Failed
failed: 1
startTime: "2021-06-12T00:35:07Z"

# Only one Pod remains (the one that reached phase: Failed because of its own activeDeadlineSeconds)
$ kubectl get pods -l job-name=job-04
NAME           READY   STATUS   RESTARTS   AGE
job-04-vsw8x   0/1     Error    0          2m43s

# The Pod that was active when the Job's activeDeadlineSeconds expired (job-04-5bjnz) was explicitly deleted
$ kubectl describe job job-04
...(omitted)
Events:
  Type     Reason            Age                  From            Message
  ----     ------            ----                 ----            -------
  Normal   SuccessfulCreate  2m59s                job-controller  Created pod: job-04-vsw8x
  Normal   SuccessfulCreate  2m19s                job-controller  Created pod: job-04-5bjnz
  Normal   SuccessfulDelete  2m9s                 job-controller  Deleted pod: job-04-5bjnz
  Warning  DeadlineExceeded  2m9s (x2 over 2m9s)  job-controller  Job was active longer than specified deadline

# Only one container is found on the node
# (with kind, the node itself runs as a Docker container)
$ docker container exec 05f0d805c4dc ctr --namespace k8s.io container ls | grep -c busybox
1
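The job-04 result fits the notion of an "active" Pod that Kubernetes controllers use: a Pod whose phase is neither Succeeded nor Failed and that is not already being deleted. Below is a hypothetical sketch of the deletion rule. `pod`, `isActive`, and `podsToDeleteOnDeadline` are illustrative names. job-04's state is reconstructed from the events above, and the Running phase of job-04-5bjnz is an assumption (it might still have been Pending).

```go
package main

import "fmt"

// pod is a hypothetical, trimmed-down Pod record.
type pod struct {
	Name        string
	Phase       string // Pending, Running, Succeeded, Failed or Unknown
	Terminating bool   // stands in for a non-nil deletionTimestamp
}

// isActive: not Succeeded, not Failed, and not already being deleted.
func isActive(p pod) bool {
	return p.Phase != "Succeeded" && p.Phase != "Failed" && !p.Terminating
}

// podsToDeleteOnDeadline returns the names of the Pods that would be
// deleted when the Job's activeDeadlineSeconds is exceeded: the active ones.
func podsToDeleteOnDeadline(pods []pod) []string {
	var names []string
	for _, p := range pods {
		if isActive(p) {
			names = append(names, p.Name)
		}
	}
	return names
}

func main() {
	// job-04 just before the Job's deadline, reconstructed from the events above.
	pods := []pod{
		{Name: "job-04-vsw8x", Phase: "Failed"},  // failed by its own activeDeadlineSeconds
		{Name: "job-04-5bjnz", Phase: "Running"}, // recreated Pod; phase assumed
	}
	fmt.Println(podsToDeleteOnDeadline(pods)) // [job-04-5bjnz]
}
```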

Wrap-up

I checked how activeDeadlineSeconds differs between Pods and Jobs.

Personally, once the Pod is gone you can no longer find out afterward why the processing took so long,
so for simple Jobs I think setting activeDeadlineSeconds on the Pod is the better choice.

Reading the code is tiring...

Bonus

A grumpy cat