TECHSTEP

Articles on IT infrastructure.

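Trying out automatic rollback with Argo Rollouts' AnalysisTemplate

Argo Rollouts provides deployment and release strategies that are more advanced than the Rolling Update available in Kubernetes. One of them is "Progressive Delivery", in which a specific analysis runs after a deployment to evaluate its result.

Argo Rollouts includes analysis-related CRDs such as AnalysisTemplate and AnalysisRun, and an automatic rollback can be triggered based on their results.

In this post I tried out the automatic rollback available in Argo Rollouts. For an overview of Argo Rollouts, see my previous article.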

Test environment

The environment used for this post is as follows.

Deploying Argo Rollouts

First, deploy Argo Rollouts to the Kubernetes environment so that it can be used.

# Deploy Argo Rollouts
$ git clone https://github.com/argoproj/argo-rollouts.git
$ cd argo-rollouts/manifests/
$ kubectl create ns argo-rollouts
namespace/argo-rollouts created

$ kubectl get ns
NAME              STATUS   AGE
argo-rollouts     Active   5s
default           Active   63m
kube-node-lease   Active   63m
kube-public       Active   63m
kube-system       Active   63m

$ kubectl apply -n argo-rollouts -f install.yaml
customresourcedefinition.apiextensions.k8s.io/analysisruns.argoproj.io created
customresourcedefinition.apiextensions.k8s.io/analysistemplates.argoproj.io created
customresourcedefinition.apiextensions.k8s.io/clusteranalysistemplates.argoproj.io created
customresourcedefinition.apiextensions.k8s.io/experiments.argoproj.io created
customresourcedefinition.apiextensions.k8s.io/rollouts.argoproj.io created
serviceaccount/argo-rollouts created
role.rbac.authorization.k8s.io/argo-rollouts-role created
clusterrole.rbac.authorization.k8s.io/argo-rollouts-aggregate-to-admin created
clusterrole.rbac.authorization.k8s.io/argo-rollouts-aggregate-to-edit created
clusterrole.rbac.authorization.k8s.io/argo-rollouts-aggregate-to-view created
clusterrole.rbac.authorization.k8s.io/argo-rollouts-clusterrole created
rolebinding.rbac.authorization.k8s.io/argo-rollouts-role-binding created
clusterrolebinding.rbac.authorization.k8s.io/argo-rollouts-clusterrolebinding created
service/argo-rollouts-metrics created
deployment.apps/argo-rollouts created

# Check after deployment
$ kubectl get all -n argo-rollouts
NAME                                 READY   STATUS    RESTARTS   AGE
pod/argo-rollouts-8454b64759-rhf47   1/1     Running   0          9s

NAME                            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
service/argo-rollouts-metrics   ClusterIP   10.100.217.123   <none>        8090/TCP   10s

NAME                            READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/argo-rollouts   1/1     1            1           9s

NAME                                       DESIRED   CURRENT   READY   AGE
replicaset.apps/argo-rollouts-8454b64759   1         1         1       9s

Without AnalysisTemplate

From here on we actually use Argo Rollouts. This post looks at how it behaves with Blue/Green Deployment, first without an AnalysisTemplate and then with one, to see what using an AnalysisTemplate changes.

Let's start with the case without an AnalysisTemplate.

Deploying the Rollout

I used the following files for the Rollout and the Service. To use Blue/Green Deployment with Argo Rollouts, you must specify a Service as activeService. If the specified Service does not exist, no Pods are created even after the Rollout itself has been created.

rollout-bg-test.yml

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: rollout-bg-test
spec:
  replicas: 2
  revisionHistoryLimit: 2
  selector:
    matchLabels:
      app: rollout-bg
  template:
    metadata:
      labels:
        app: rollout-bg
    spec:
      containers:
        - name: nginx-container
          image: nginx:latest
          ports:
          - containerPort: 80
  strategy:
    blueGreen:
      autoPromotionEnabled: true
      activeService: rollout-active-service

rollout-service.yml

apiVersion: v1
kind: Service
metadata:
  name: rollout-active-service
spec:
  ports:
  - port: 8080
    targetPort: 80
    protocol: TCP
  selector:
    app: rollout-bg

Deploy the two resources above.

# Create the Rollout
$ kubectl apply -f rollout-service.yml
service/rollout-active-service created

$ kubectl apply -f rollout-bg-test.yml
rollout.argoproj.io/rollout-bg-test created


# Check after deployment
$ kubectl get svc
NAME                     TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
kubernetes               ClusterIP   10.100.0.1      <none>        443/TCP    21h
rollout-active-service   ClusterIP   10.100.121.64   <none>        8080/TCP   33s

$ kubectl get pods
NAME                               READY   STATUS    RESTARTS   AGE
rollout-bg-test-797d88cdd8-4ww9s   1/1     Running   0          14s
rollout-bg-test-797d88cdd8-stc74   1/1     Running   0          14s

$ kubectl get rollout
NAME              DESIRED   CURRENT   UP-TO-DATE   AVAILABLE
rollout-bg-test   2         2         2            2

$ kubectl argo rollouts get rollout rollout-bg-test
Name:            rollout-bg-test
Namespace:       default
Status:          ✔ Healthy
Strategy:        BlueGreen
Images:          nginx:latest (active)
Replicas:
  Desired:       2
  Current:       2
  Updated:       2
  Ready:         2
  Available:     2

NAME                                         KIND        STATUS     AGE  INFO
⟳ rollout-bg-test                            Rollout     ✔ Healthy  62s
└──# revision:1
   └──⧉ rollout-bg-test-797d88cdd8           ReplicaSet  ✔ Healthy  62s  active
      ├──□ rollout-bg-test-797d88cdd8-4ww9s  Pod         ✔ Running  62s  ready:1/1
      └──□ rollout-bg-test-797d88cdd8-stc74  Pod         ✔ Running  62s  ready:1/1

Updating the Rollout

Next, change the image tag of the deployed Rollout and watch how it gets updated.

# Open another window and run this beforehand
$ kubectl argo rollouts get rollout rollout-bg-test -w

# Change the image tag
$ kubectl argo rollouts set image rollout-bg-test nginx-container=nginx:stable
rollout "rollout-bg-test" image updated


# Check the progress of the update
## Right after changing the image tag
Name:            rollout-bg-test
Namespace:       default
Status:          ✔ Healthy
Strategy:        BlueGreen
Images:          nginx:latest (active)
Replicas:
  Desired:       2
  Current:       2
  Updated:       2
  Ready:         2
  Available:     2

NAME                                         KIND        STATUS     AGE    INFO
⟳ rollout-bg-test                            Rollout     ✔ Healthy  2m49s
└──# revision:1
   └──⧉ rollout-bg-test-797d88cdd8           ReplicaSet  ✔ Healthy  2m49s  active
      ├──□ rollout-bg-test-797d88cdd8-4ww9s  Pod         ✔ Running  2m49s  ready:1/1
      └──□ rollout-bg-test-797d88cdd8-stc74  Pod         ✔ Running  2m49s  ready:1/1


## Update starts
Name:            rollout-bg-test
Namespace:       default
Status:          ✔ Healthy
Strategy:        BlueGreen
Images:          nginx:latest (active)
Replicas:
  Desired:       2
  Current:       2
  Updated:       2
  Ready:         2
  Available:     2

NAME                                         KIND        STATUS         AGE    INFO
⟳ rollout-bg-test                            Rollout     ✔ Healthy      2m50s
├──# revision:2
│  └──⧉ rollout-bg-test-5fd48d44d            ReplicaSet  ◌ Progressing  0s
└──# revision:1
   └──⧉ rollout-bg-test-797d88cdd8           ReplicaSet  ✔ Healthy      2m50s  active
      ├──□ rollout-bg-test-797d88cdd8-4ww9s  Pod         ✔ Running      2m50s  ready:1/1
      └──□ rollout-bg-test-797d88cdd8-stc74  Pod         ✔ Running      2m50s  ready:1/1


Name:            rollout-bg-test
Namespace:       default
Status:          ◌ Progressing
Strategy:        BlueGreen
Images:          nginx:latest (active)
                 nginx:stable
Replicas:
  Desired:       2
  Current:       4
  Updated:       2
  Ready:         2
  Available:     0

NAME                                         KIND        STATUS               AGE    INFO
⟳ rollout-bg-test                            Rollout     ◌ Progressing        2m50s
├──# revision:2
│  └──⧉ rollout-bg-test-5fd48d44d            ReplicaSet  ◌ Progressing        0s
│     ├──□ rollout-bg-test-5fd48d44d-89nnw   Pod         ◌ ContainerCreating  0s     ready:0/1
│     └──□ rollout-bg-test-5fd48d44d-s8ntr   Pod         ◌ ContainerCreating  0s     ready:0/1
└──# revision:1
   └──⧉ rollout-bg-test-797d88cdd8           ReplicaSet  ✔ Healthy            2m50s  active
      ├──□ rollout-bg-test-797d88cdd8-4ww9s  Pod         ✔ Running            2m50s  ready:1/1
      └──□ rollout-bg-test-797d88cdd8-stc74  Pod         ✔ Running            2m50s  ready:1/1


## Switchover complete
## The revision 2 ReplicaSet becomes active, and a countdown runs until revision 1 is scaled down
Name:            rollout-bg-test
Namespace:       default
Status:          ◌ Progressing
Strategy:        BlueGreen
Images:          nginx:latest
                 nginx:stable (active)
Replicas:
  Desired:       2
  Current:       4
  Updated:       2
  Ready:         4
  Available:     2

NAME                                         KIND        STATUS         AGE    INFO
⟳ rollout-bg-test                            Rollout     ◌ Progressing  2m52s
├──# revision:2
│  └──⧉ rollout-bg-test-5fd48d44d            ReplicaSet  ✔ Healthy      1s     active
│     ├──□ rollout-bg-test-5fd48d44d-89nnw   Pod         ✔ Running      1s     ready:1/1
│     └──□ rollout-bg-test-5fd48d44d-s8ntr   Pod         ✔ Running      1s     ready:1/1
└──# revision:1
   └──⧉ rollout-bg-test-797d88cdd8           ReplicaSet  ✔ Healthy      2m52s  delay:30s
      ├──□ rollout-bg-test-797d88cdd8-4ww9s  Pod         ✔ Running      2m52s  ready:1/1
      └──□ rollout-bg-test-797d88cdd8-stc74  Pod         ✔ Running      2m52s  ready:1/1


## After completion
Name:            rollout-bg-test
Namespace:       default
Status:          ✔ Healthy
Strategy:        BlueGreen
Images:          nginx:stable (active)
Replicas:
  Desired:       2
  Current:       2
  Updated:       2
  Ready:         2
  Available:     2

NAME                                        KIND        STATUS        AGE    INFO
⟳ rollout-bg-test                           Rollout     ✔ Healthy     6m7s
├──# revision:2
│  └──⧉ rollout-bg-test-5fd48d44d           ReplicaSet  ✔ Healthy     3m16s  active
│     ├──□ rollout-bg-test-5fd48d44d-89nnw  Pod         ✔ Running     3m16s  ready:1/1
│     └──□ rollout-bg-test-5fd48d44d-s8ntr  Pod         ✔ Running     3m16s  ready:1/1
└──# revision:1
   └──⧉ rollout-bg-test-797d88cdd8          ReplicaSet  • ScaledDown  6m7s
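
The delay:30s countdown shown for the revision 1 ReplicaSet appears to correspond to the blueGreen scaleDownDelaySeconds field, which as far as I know defaults to 30 seconds. A minimal sketch of setting it explicitly in rollout-bg-test.yml follows; the value 60 is only an assumed example and was not tested in this post.

```yaml
# Sketch: strategy section of rollout-bg-test.yml with an explicit scale-down delay.
# scaleDownDelaySeconds: 60 is an illustrative value, not part of the original setup.
spec:
  strategy:
    blueGreen:
      autoPromotionEnabled: true
      activeService: rollout-active-service
      scaleDownDelaySeconds: 60   # keep the old ReplicaSet for 60s after the switch
```

A longer delay keeps the previous ReplicaSet running for longer after the switch, at the cost of running twice the Pods in the meantime.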

The resources after completion are as follows.

$ kubectl get rollout
NAME              DESIRED   CURRENT   UP-TO-DATE   AVAILABLE
rollout-bg-test   2         2         2            2

$ kubectl describe rollout
Name:         rollout-bg-test
Namespace:    default
Labels:       <none>
Annotations:  rollout.argoproj.io/revision: 2
API Version:  argoproj.io/v1alpha1
Kind:         Rollout
Metadata:
  Creation Timestamp:  2020-10-10T01:57:48Z
  Generation:          14
  Resource Version:    237409
  Self Link:           /apis/argoproj.io/v1alpha1/namespaces/default/rollouts/rollout-bg-test
  UID:                 d5ba96ef-2292-4f3c-91f1-cd0427c50f59
Spec:
  Replicas:                2
  Revision History Limit:  2
  Selector:
    Match Labels:
      App:  rollout-bg
  Strategy:
    Blue Green:
      Active Service:          rollout-active-service
      Auto Promotion Enabled:  true
  Template:
    Metadata:
      Creation Timestamp:  <nil>
      Labels:
        App:  rollout-bg
    Spec:
      Containers:
        Image:  nginx:stable
        Name:   nginx-container
        Ports:
          Container Port:  80
        Resources:
Status:
  HPA Replicas:        2
  Available Replicas:  2
  Blue Green:
    Active Selector:  5fd48d44d
  Canary:
  Conditions:
    Last Transition Time:  2020-10-10T01:57:52Z
    Last Update Time:      2020-10-10T01:57:52Z
    Message:               Rollout has minimum availability
    Reason:                AvailableReason
    Status:                True
    Type:                  Available
    Last Transition Time:  2020-10-10T01:57:48Z
    Last Update Time:      2020-10-10T02:00:41Z
    Message:               ReplicaSet "rollout-bg-test-5fd48d44d" has successfully progressed.
    Reason:                NewReplicaSetAvailable
    Status:                True
    Type:                  Progressing
  Current Pod Hash:        5fd48d44d
  Observed Generation:     576f58fbb8
  Ready Replicas:          2
  Replicas:                2
  Selector:                app=rollout-bg,rollouts-pod-template-hash=5fd48d44d
  Stable RS:               5fd48d44d
  Updated Replicas:        2
Events:
  Type    Reason             Age    From                 Message
  ----    ------             ----   ----                 -------
  Normal  ScalingReplicaSet  7m52s  rollouts-controller  Scaled up replica set rollout-bg-test-797d88cdd8 to 2
  Normal  SwitchService      7m48s  rollouts-controller  Switched selector for service 'rollout-active-service' to value '797d88cdd8'
  Normal  ScalingReplicaSet  5m1s   rollouts-controller  Scaled up replica set rollout-bg-test-5fd48d44d to 2
  Normal  SwitchService      4m59s  rollouts-controller  Switched selector for service 'rollout-active-service' to value '5fd48d44d'
  Normal  ScalingReplicaSet  4m29s  rollouts-controller  Scaled down replica set rollout-bg-test-797d88cdd8 to 0

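With AnalysisTemplate

From here on, in addition to the Rollout and the Service, we create an AnalysisTemplate resource so that an Analysis runs when a rollout is executed.

To see both the success and the failure case, I first prepare an AnalysisTemplate that returns exit 0 (that is, the Analysis always succeeds). After deploying it, I edit the AnalysisTemplate with the kubectl edit command so that it returns exit 1 (that is, the Analysis always fails) and look at the failure case.

When an AnalysisTemplate is declared in a Rollout resource, it is run when the Rollout is updated. I say "run", but what actually happens is that an AnalysisRun, a custom resource for executing an analysis, is created and performs the analysis according to what the AnalysisTemplate defines.

The analysis defined in an AnalysisTemplate can use various kinds of metrics. This time I used Job, one of the standard Kubernetes resources. When the AnalysisRun executes the analysis, a Job is created for it, and the analysis is considered successful if the Job completes normally.

Blue/Green Deployment in Argo Rollouts offers two points at which an Analysis can run, prePromotionAnalysis and postPromotionAnalysis, which run before and after the switchover respectively. This time I use postPromotionAnalysis: the Analysis runs after the switchover, and if it fails, traffic is switched back and the Rollout rolls back to the old version.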
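
For comparison, running the same analysis before the switchover would mean declaring it under prePromotionAnalysis instead of postPromotionAnalysis. The following is a minimal sketch that assumes the test-analysis AnalysisTemplate and the Service name used in this post; this variant was not tested here.

```yaml
# Sketch: strategy section of the Rollout, with the analysis run before promotion.
# Assumes the AnalysisTemplate "test-analysis" and the Service
# "rollout-active-service-analysis" from this post already exist.
spec:
  strategy:
    blueGreen:
      activeService: rollout-active-service-analysis
      prePromotionAnalysis:
        templates:
        - templateName: test-analysis
```

With this placement a failed Analysis should stop the rollout before the active Service is switched, so no switch-back is needed.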

Deploying the Rollout

The following three files are used here.

rollout-bg-test-analysis.yml

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: rollout-bg-test-analysis
spec:
  replicas: 2
  revisionHistoryLimit: 2
  selector:
    matchLabels:
      app: rollout-bg-analysis
  template:
    metadata:
      labels:
        app: rollout-bg-analysis
    spec:
      containers:
        - name: nginx-container
          image: nginx:latest
          ports:
          - containerPort: 80
  strategy:
    blueGreen:
      activeService: rollout-active-service-analysis
      postPromotionAnalysis:
        templates:
        - templateName: test-analysis

rollout-active-service-analysis.yml

apiVersion: v1
kind: Service
metadata:
  name: rollout-active-service-analysis
spec:
  ports:
  - port: 8080
    targetPort: 80
    protocol: TCP
  selector:
    app: rollout-bg-analysis

test-analysistemp.yml

kind: AnalysisTemplate
apiVersion: argoproj.io/v1alpha1
metadata:
  name: test-analysis
spec:
  metrics:
  - name: test-analysis
    provider:
      job:
        spec:
          template:
            spec:
              containers:
              - name: sleep
                image: alpine:3.8
                command: [sh, -c]
                args: [exit 0]
              restartPolicy: Never
          backoffLimit: 1

Deploy the three files above.

# Deploy
$ kubectl apply -f test-analysistemp.yml
analysistemplate.argoproj.io/test-analysis created

$ kubectl apply -f rollout-active-service-analysis.yml
service/rollout-active-service-analysis created

$ kubectl apply -f rollout-bg-test-analysis.yml
rollout.argoproj.io/rollout-bg-test-analysis created


# Check after deployment
$ kubectl get analysistemplate
NAME            AGE
test-analysis   23s

$ kubectl get svc
NAME                              TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
kubernetes                        ClusterIP   10.100.0.1      <none>        443/TCP    21h
rollout-active-service-analysis   ClusterIP   10.100.255.31   <none>        8080/TCP   20s

$ kubectl get pods
NAME                                        READY   STATUS    RESTARTS   AGE
rollout-bg-test-analysis-766b7567dc-qgpzx   1/1     Running   0          15s
rollout-bg-test-analysis-766b7567dc-tzlft   1/1     Running   0          15s

$ kubectl get rollout
NAME                       DESIRED   CURRENT   UP-TO-DATE   AVAILABLE
rollout-bg-test-analysis   2         2         2            2

$ kubectl argo rollouts get rollout rollout-bg-test-analysis
Name:            rollout-bg-test-analysis
Namespace:       default
Status:          ✔ Healthy
Strategy:        BlueGreen
Images:          nginx:latest (active)
Replicas:
  Desired:       2
  Current:       2
  Updated:       2
  Ready:         2
  Available:     2

NAME                                                  KIND        STATUS     AGE  INFO
⟳ rollout-bg-test-analysis                            Rollout     ✔ Healthy  26s
└──# revision:1
   └──⧉ rollout-bg-test-analysis-766b7567dc           ReplicaSet  ✔ Healthy  26s  active
      ├──□ rollout-bg-test-analysis-766b7567dc-qgpzx  Pod         ✔ Running  26s  ready:1/1
      └──□ rollout-bg-test-analysis-766b7567dc-tzlft  Pod         ✔ Running  26s  ready:1/1

Updating the Rollout (when the Analysis succeeds)

First, let's look at the case where the Analysis succeeds. As before, update the image tag.

# Run beforehand in a separate terminal
$ kubectl argo rollouts get rollout rollout-bg-test-analysis --watch


# Update the image tag
$ kubectl argo rollouts set image rollout-bg-test-analysis nginx-container=nginx:stable
rollout "rollout-bg-test-analysis" image updated


# Check the progress of the update
## Right after the start

Name:            rollout-bg-test-analysis
Namespace:       default
Status:          ✔ Healthy
Strategy:        BlueGreen
Images:          nginx:latest (active)
Replicas:
  Desired:       2
  Current:       2
  Updated:       2
  Ready:         2
  Available:     2

NAME                                                  KIND        STATUS         AGE   INFO
⟳ rollout-bg-test-analysis                            Rollout     ✔ Healthy      2m9s
├──# revision:2
│  └──⧉ rollout-bg-test-analysis-6bcfbc585f           ReplicaSet  ◌ Progressing  0s
└──# revision:1
   └──⧉ rollout-bg-test-analysis-766b7567dc           ReplicaSet  ✔ Healthy      2m9s  active
      ├──□ rollout-bg-test-analysis-766b7567dc-qgpzx  Pod         ✔ Running      2m9s  ready:1/1
      └──□ rollout-bg-test-analysis-766b7567dc-tzlft  Pod         ✔ Running      2m9s  ready:1/1


## New Pods are created
Name:            rollout-bg-test-analysis
Namespace:       default
Status:          ◌ Progressing
Strategy:        BlueGreen
Images:          nginx:latest (active)
                 nginx:stable
Replicas:
  Desired:       2
  Current:       4
  Updated:       2
  Ready:         3
  Available:     1

NAME                                                  KIND        STATUS         AGE    INFO
⟳ rollout-bg-test-analysis                            Rollout     ◌ Progressing  2m10s
├──# revision:2
│  └──⧉ rollout-bg-test-analysis-6bcfbc585f           ReplicaSet  ◌ Progressing  0s
│     ├──□ rollout-bg-test-analysis-6bcfbc585f-cq2q5  Pod         ✔ Running      0s     ready:1/1
│     └──□ rollout-bg-test-analysis-6bcfbc585f-dtxdc  Pod         ✔ Running      0s     ready:1/1
└──# revision:1
   └──⧉ rollout-bg-test-analysis-766b7567dc           ReplicaSet  ✔ Healthy      2m10s  active
      ├──□ rollout-bg-test-analysis-766b7567dc-qgpzx  Pod         ✔ Running      2m10s  ready:1/1
      └──□ rollout-bg-test-analysis-766b7567dc-tzlft  Pod         ✔ Running      2m10s  ready:1/1


## Traffic switchover complete and Analysis starts
Name:            rollout-bg-test-analysis
Namespace:       default
Status:          ◌ Progressing
Strategy:        BlueGreen
Images:          nginx:latest
                 nginx:stable (active)
Replicas:
  Desired:       2
  Current:       4
  Updated:       2
  Ready:         4
  Available:     2

NAME                                                             KIND         STATUS         AGE    INFO
⟳ rollout-bg-test-analysis                                       Rollout      ◌ Progressing  2m10s
├──# revision:2
│  ├──⧉ rollout-bg-test-analysis-6bcfbc585f                      ReplicaSet   ✔ Healthy      0s     active
│  │  ├──□ rollout-bg-test-analysis-6bcfbc585f-cq2q5             Pod          ✔ Running      0s     ready:1/1
│  │  └──□ rollout-bg-test-analysis-6bcfbc585f-dtxdc             Pod          ✔ Running      0s     ready:1/1
│  └──α rollout-bg-test-analysis-6bcfbc585f-2                    AnalysisRun  ◌ Running      0s
│     └──⊞ 471f5e5b-553b-4f94-bae3-cefca88afcd6.test-analysis.1  Job          ◌ Running      0s
└──# revision:1
   └──⧉ rollout-bg-test-analysis-766b7567dc                      ReplicaSet   ✔ Healthy      2m10s  delay:30s
      ├──□ rollout-bg-test-analysis-766b7567dc-qgpzx             Pod          ✔ Running      2m10s  ready:1/1
      └──□ rollout-bg-test-analysis-766b7567dc-tzlft             Pod          ✔ Running      2m10s  ready:1/1


## Analysis complete (success)
## The AnalysisRun result is shown as Successful
Name:            rollout-bg-test-analysis
Namespace:       default
Status:          ◌ Progressing
Strategy:        BlueGreen
Images:          nginx:latest
                 nginx:stable (active)
Replicas:
  Desired:       2
  Current:       4
  Updated:       2
  Ready:         4
  Available:     2

NAME                                                             KIND         STATUS         AGE    INFO
⟳ rollout-bg-test-analysis                                       Rollout      ◌ Progressing  2m11s
├──# revision:2
│  ├──⧉ rollout-bg-test-analysis-6bcfbc585f                      ReplicaSet   ✔ Healthy      1s     active
│  │  ├──□ rollout-bg-test-analysis-6bcfbc585f-cq2q5             Pod          ✔ Running      1s     ready:1/1
│  │  └──□ rollout-bg-test-analysis-6bcfbc585f-dtxdc             Pod          ✔ Running      1s     ready:1/1
│  └──α rollout-bg-test-analysis-6bcfbc585f-2                    AnalysisRun  ✔ Successful   0s     ✔ 1
│     └──⊞ 471f5e5b-553b-4f94-bae3-cefca88afcd6.test-analysis.1  Job          ✔ Successful   0s
└──# revision:1
   └──⧉ rollout-bg-test-analysis-766b7567dc                      ReplicaSet   ✔ Healthy      2m11s  delay:29s
      ├──□ rollout-bg-test-analysis-766b7567dc-qgpzx             Pod          ✔ Running      2m11s  ready:1/1
      └──□ rollout-bg-test-analysis-766b7567dc-tzlft             Pod          ✔ Running      2m11s  ready:1/1


## After completion
Name:            rollout-bg-test-analysis
Namespace:       default
Status:          ✔ Healthy
Strategy:        BlueGreen
Images:          nginx:stable (active)
Replicas:
  Desired:       2
  Current:       2
  Updated:       2
  Ready:         2
  Available:     2

NAME                                                             KIND         STATUS        AGE    INFO
⟳ rollout-bg-test-analysis                                       Rollout      ✔ Healthy     2m45s
├──# revision:2
│  ├──⧉ rollout-bg-test-analysis-6bcfbc585f                      ReplicaSet   ✔ Healthy     35s    active
│  │  ├──□ rollout-bg-test-analysis-6bcfbc585f-cq2q5             Pod          ✔ Running     35s    ready:1/1
│  │  └──□ rollout-bg-test-analysis-6bcfbc585f-dtxdc             Pod          ✔ Running     35s    ready:1/1
│  └──α rollout-bg-test-analysis-6bcfbc585f-2                    AnalysisRun  ✔ Successful  34s    ✔ 1
│     └──⊞ 471f5e5b-553b-4f94-bae3-cefca88afcd6.test-analysis.1  Job          ✔ Successful  34s
└──# revision:1
   └──⧉ rollout-bg-test-analysis-766b7567dc                      ReplicaSet   • ScaledDown  2m45s

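This confirms the sequence: the new version is deployed, traffic is switched over, and the Analysis runs after the switch. When the Analysis succeeds, traffic stays on the new version and the old version is scaled down.

The resources after completion are as follows. Executing the rollout creates and runs an AnalysisRun resource, whose result can be inspected.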

$ kubectl get pods
NAME                                                         READY   STATUS      RESTARTS   AGE
471f5e5b-553b-4f94-bae3-cefca88afcd6.test-analysis.1-8hz2s   0/1     Completed   0          4m26s
rollout-bg-test-analysis-6bcfbc585f-cq2q5                    1/1     Running     0          4m27s
rollout-bg-test-analysis-6bcfbc585f-dtxdc                    1/1     Running     0          4m27s

$ kubectl get analysisrun
NAME                                    STATUS
rollout-bg-test-analysis-6bcfbc585f-2   Successful

$ kubectl get rollout
NAME                       DESIRED   CURRENT   UP-TO-DATE   AVAILABLE
rollout-bg-test-analysis   2         2         2            2

$ kubectl describe analysisrun rollout-bg-test-analysis-6bcfbc585f-2
Name:         rollout-bg-test-analysis-6bcfbc585f-2
Namespace:    default
Labels:       rollout-type=PostPromotion
              rollouts-pod-template-hash=6bcfbc585f
Annotations:  rollout.argoproj.io/revision: 2
API Version:  argoproj.io/v1alpha1
Kind:         AnalysisRun
Metadata:
  Creation Timestamp:  2020-10-10T02:27:13Z
  Generation:          3
  Owner References:
    API Version:           argoproj.io/v1alpha1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  Rollout
    Name:                  rollout-bg-test-analysis
    UID:                   e9ba7057-29db-491a-8618-13180c406fef
  Resource Version:        242399
  Self Link:               /apis/argoproj.io/v1alpha1/namespaces/default/analysisruns/rollout-bg-test-analysis-6bcfbc585f-2
  UID:                     471f5e5b-553b-4f94-bae3-cefca88afcd6
Spec:
  Metrics:
    Name:  test-analysis
    Provider:
      Job:
        Metadata:
          Creation Timestamp:  <nil>
        Spec:
          Backoff Limit:  1
          Template:
            Metadata:
              Creation Timestamp:  <nil>
            Spec:
              Containers:
                Args:
                  exit 0
                Command:
                  sh
                  -c
                Image:  alpine:3.8
                Name:   sleep
                Resources:
              Restart Policy:  Never
Status:
  Metric Results:
    Count:  1
    Measurements:
      Finished At:  2020-10-10T02:27:14Z
      Metadata:
        Job - Name:  471f5e5b-553b-4f94-bae3-cefca88afcd6.test-analysis.1
      Phase:         Successful
      Started At:    2020-10-10T02:27:13Z
    Name:            test-analysis
    Phase:           Successful
    Successful:      1
  Phase:             Successful
  Started At:        2020-10-10T02:27:13Z
Events:
  Type    Reason    Age   From                 Message
  ----    ------    ----  ----                 -------
  Normal  Complete  20m   rollouts-controller  metric 'test-analysis' completed Successful
  Normal  Complete  20m   rollouts-controller  analysis completed Successful


$ kubectl describe rollout rollout-bg-test-analysis
Name:         rollout-bg-test-analysis
Namespace:    default
Labels:       <none>
Annotations:  rollout.argoproj.io/revision: 2
API Version:  argoproj.io/v1alpha1
Kind:         Rollout
Metadata:
  Creation Timestamp:  2020-10-10T02:25:02Z
  Generation:          16
  Resource Version:    242503
  Self Link:           /apis/argoproj.io/v1alpha1/namespaces/default/rollouts/rollout-bg-test-analysis
  UID:                 e9ba7057-29db-491a-8618-13180c406fef
Spec:
  Replicas:                2
  Revision History Limit:  2
  Selector:
    Match Labels:
      App:  rollout-bg-analysis
  Strategy:
    Blue Green:
      Active Service:  rollout-active-service-analysis
      Post Promotion Analysis:
        Templates:
          Template Name:  test-analysis
  Template:
    Metadata:
      Creation Timestamp:  <nil>
      Labels:
        App:  rollout-bg-analysis
    Spec:
      Containers:
        Image:  nginx:stable
        Name:   nginx-container
        Ports:
          Container Port:  80
        Resources:
Status:
  HPA Replicas:        2
  Available Replicas:  2
  Blue Green:
    Active Selector:  6bcfbc585f
  Canary:
  Conditions:
    Last Transition Time:  2020-10-10T02:25:07Z
    Last Update Time:      2020-10-10T02:25:07Z
    Message:               Rollout has minimum availability
    Reason:                AvailableReason
    Status:                True
    Type:                  Available
    Last Transition Time:  2020-10-10T02:25:02Z
    Last Update Time:      2020-10-10T02:27:13Z
    Message:               ReplicaSet "rollout-bg-test-analysis-6bcfbc585f" has successfully progressed.
    Reason:                NewReplicaSetAvailable
    Status:                True
    Type:                  Progressing
  Current Pod Hash:        6bcfbc585f
  Observed Generation:     786976f646
  Ready Replicas:          2
  Replicas:                2
  Selector:                app=rollout-bg-analysis,rollouts-pod-template-hash=6bcfbc585f
  Stable RS:               6bcfbc585f
  Updated Replicas:        2
Events:
  Type    Reason                   Age    From                 Message
  ----    ------                   ----   ----                 -------
  Normal  ScalingReplicaSet        7m6s   rollouts-controller  Scaled up replica set rollout-bg-test-analysis-766b7567dc to 2
  Normal  SwitchService            7m1s   rollouts-controller  Switched selector for service 'rollout-active-service-analysis' to value '766b7567dc'
  Normal  ScalingReplicaSet        4m56s  rollouts-controller  Scaled up replica set rollout-bg-test-analysis-6bcfbc585f to 2
  Normal  SwitchService            4m55s  rollouts-controller  Switched selector for service 'rollout-active-service-analysis' to value '6bcfbc585f'
  Normal  AnalysisRunStatusChange  4m55s  rollouts-controller  PostPromotion Analysis Run 'rollout-bg-test-analysis-6bcfbc585f-2' Status New: '' Previous: 'NoPreviousStatus'
  Normal  AnalysisRunStatusChange  4m55s  rollouts-controller  PostPromotion Analysis Run 'rollout-bg-test-analysis-6bcfbc585f-2' Status New: 'Running' Previous: ''
  Normal  AnalysisRunStatusChange  4m54s  rollouts-controller  PostPromotion Analysis Run 'rollout-bg-test-analysis-6bcfbc585f-2' Status New: 'Successful' Previous: 'Running'
  Normal  ScalingReplicaSet        4m25s  rollouts-controller  Scaled down replica set rollout-bg-test-analysis-766b7567dc to 0

Updating the Rollout (when the Analysis fails)

Next, let's look at the case where the Analysis fails.

First, edit part of the deployed AnalysisTemplate so that it fails when the Analysis runs.

$ kubectl edit analysistemplate test-analysis
analysistemplate.argoproj.io/test-analysis edited

# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"argoproj.io/v1alpha1","kind":"AnalysisTemplate","metadata":{"annotations":{},"name":"test-analysis","namespace":"default"},"spec":{"metrics":[{"name":"test-analysis","provider":{"job":{"spec":{"backoffLimit":1,"template":{"spec":{"containers":[{"args":["exit 0"],"command":["sh","-c"],"image":"alpine:3.8","name":"sleep"}],"restartPolicy":"Never"}}}}}}]}}
  creationTimestamp: "2020-10-10T02:24:45Z"
  generation: 1
  name: test-analysis
  namespace: default
  resourceVersion: "241845"
  selfLink: /apis/argoproj.io/v1alpha1/namespaces/default/analysistemplates/test-analysis
  uid: 2fc98c6c-f40e-4903-a134-a6430aaa0b0d
spec:
  metrics:
  - name: test-analysis
    provider:
      job:
        spec:
          backoffLimit: 1
          template:
            spec:
              containers:
              - args:
                - exit 1 # changed
                command:
                - sh
                - -c
                image: alpine:3.8
                name: sleep
              restartPolicy: Never
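
As an alternative to kubectl edit, the failing variant could be kept as its own manifest and applied with kubectl apply -f. The sketch below assumes a hypothetical file name, test-analysistemp-fail.yml. It reuses the name test-analysis so that the Rollout references it without any change.

```yaml
# test-analysistemp-fail.yml (hypothetical file name, not from the original post)
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: test-analysis        # same name as the deployed template, so it is overwritten
spec:
  metrics:
  - name: test-analysis
    provider:
      job:
        spec:
          backoffLimit: 1
          template:
            spec:
              containers:
              - name: sleep
                image: alpine:3.8
                command: [sh, -c]
                args: [exit 1]   # always exits non-zero, so the Job and the Analysis fail
              restartPolicy: Never
```

Keeping both variants as files makes it easy to switch between the success and failure cases repeatedly.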

Once the edit is done, update the image tag as before and watch how the update proceeds.

# Run beforehand in a separate terminal
$ kubectl argo rollouts get rollout rollout-bg-test-analysis --watch


# Update the image tag
$ kubectl argo rollouts set image rollout-bg-test-analysis nginx-container=nginx:1.19
rollout "rollout-bg-test-analysis" image updated


# Check how the update proceeds
## Right after the update
Name:            rollout-bg-test-analysis
Namespace:       default
Status:          ✔ Healthy
Strategy:        BlueGreen
Images:          nginx:stable (active)
Replicas:
  Desired:       2
  Current:       2
  Updated:       2
  Ready:         2
  Available:     2

NAME                                                             KIND         STATUS        AGE    INFO
⟳ rollout-bg-test-analysis                                       Rollout      ✔ Healthy     10m
├──# revision:2
│  ├──⧉ rollout-bg-test-analysis-6bcfbc585f                      ReplicaSet   ✔ Healthy     7m53s  active
│  │  ├──□ rollout-bg-test-analysis-6bcfbc585f-cq2q5             Pod          ✔ Running     7m53s  ready:1/1
│  │  └──□ rollout-bg-test-analysis-6bcfbc585f-dtxdc             Pod          ✔ Running     7m53s  ready:1/1
│  └──α rollout-bg-test-analysis-6bcfbc585f-2                    AnalysisRun  ✔ Successful  7m52s  ✔ 1
│     └──⊞ 471f5e5b-553b-4f94-bae3-cefca88afcd6.test-analysis.1  Job          ✔ Successful  7m52s
└──# revision:1
   └──⧉ rollout-bg-test-analysis-766b7567dc                      ReplicaSet   • ScaledDown  10m


## Update starts
Name:            rollout-bg-test-analysis
Namespace:       default
Status:          ◌ Progressing
Strategy:        BlueGreen
Images:          nginx:stable (active)
Replicas:
  Desired:       2
  Current:       2
  Updated:       0
  Ready:         2
  Available:     0

NAME                                                             KIND         STATUS               AGE    INFO
⟳ rollout-bg-test-analysis                                       Rollout      ◌ Progressing        10m
├──# revision:3
│  └──⧉ rollout-bg-test-analysis-6b7c8784cc                      ReplicaSet   ◌ Progressing        0s
│     ├──□ rollout-bg-test-analysis-6b7c8784cc-b29n5             Pod          ◌ Pending            0s     ready:0/1
│     └──□ rollout-bg-test-analysis-6b7c8784cc-ztnjn             Pod          ◌ ContainerCreating  0s     ready:0/1
├──# revision:2
│  ├──⧉ rollout-bg-test-analysis-6bcfbc585f                      ReplicaSet   ✔ Healthy            7m56s  active
│  │  ├──□ rollout-bg-test-analysis-6bcfbc585f-cq2q5             Pod          ✔ Running            7m56s  ready:1/1
│  │  └──□ rollout-bg-test-analysis-6bcfbc585f-dtxdc             Pod          ✔ Running            7m56s  ready:1/1
│  └──α rollout-bg-test-analysis-6bcfbc585f-2                    AnalysisRun  ✔ Successful         7m55s  ✔ 1
│     └──⊞ 471f5e5b-553b-4f94-bae3-cefca88afcd6.test-analysis.1  Job          ✔ Successful         7m55s
└──# revision:1
   └──⧉ rollout-bg-test-analysis-766b7567dc                      ReplicaSet   • ScaledDown         10m


## Switchover complete and Analysis started
## The revision 3 ReplicaSet is now active
Name:            rollout-bg-test-analysis
Namespace:       default
Status:          ◌ Progressing
Strategy:        BlueGreen
Images:          nginx:1.19 (active)
                 nginx:stable
Replicas:
  Desired:       2
  Current:       4
  Updated:       2
  Ready:         4
  Available:     2

NAME                                                             KIND         STATUS         AGE    INFO
⟳ rollout-bg-test-analysis                                       Rollout      ◌ Progressing  10m
├──# revision:3
│  ├──⧉ rollout-bg-test-analysis-6b7c8784cc                      ReplicaSet   ✔ Healthy      0s     active
│  │  ├──□ rollout-bg-test-analysis-6b7c8784cc-b29n5             Pod          ✔ Running      0s     ready:1/1
│  │  └──□ rollout-bg-test-analysis-6b7c8784cc-ztnjn             Pod          ✔ Running      0s     ready:1/1
│  └──α rollout-bg-test-analysis-6b7c8784cc-3                    AnalysisRun  ◌ Running      0s
│     └──⊞ 0208d4a6-ee21-4ec5-b969-4b75b6784a4b.test-analysis.1  Job          ◌ Running      0s
├──# revision:2
│  ├──⧉ rollout-bg-test-analysis-6bcfbc585f                      ReplicaSet   ✔ Healthy      7m57s  delay:30s
│  │  ├──□ rollout-bg-test-analysis-6bcfbc585f-cq2q5             Pod          ✔ Running      7m57s  ready:1/1
│  │  └──□ rollout-bg-test-analysis-6bcfbc585f-dtxdc             Pod          ✔ Running      7m57s  ready:1/1
│  └──α rollout-bg-test-analysis-6bcfbc585f-2                    AnalysisRun  ✔ Successful   7m56s  ✔ 1
│     └──⊞ 471f5e5b-553b-4f94-bae3-cefca88afcd6.test-analysis.1  Job          ✔ Successful   7m56s
└──# revision:1
   └──⧉ rollout-bg-test-analysis-766b7567dc                      ReplicaSet   • ScaledDown   10m


## Analysis complete (failed)
Name:            rollout-bg-test-analysis
Namespace:       default
Status:          ◌ Progressing
Strategy:        BlueGreen
Images:          nginx:1.19 (active)
                 nginx:stable
Replicas:
  Desired:       2
  Current:       4
  Updated:       2
  Ready:         4
  Available:     2

NAME                                                             KIND         STATUS         AGE   INFO
⟳ rollout-bg-test-analysis                                       Rollout      ◌ Progressing  10m
├──# revision:3
│  ├──⧉ rollout-bg-test-analysis-6b7c8784cc                      ReplicaSet   ✔ Healthy      12s   active
│  │  ├──□ rollout-bg-test-analysis-6b7c8784cc-b29n5             Pod          ✔ Running      12s   ready:1/1
│  │  └──□ rollout-bg-test-analysis-6b7c8784cc-ztnjn             Pod          ✔ Running      12s   ready:1/1
│  └──α rollout-bg-test-analysis-6b7c8784cc-3                    AnalysisRun  ✖ Failed       11s   ✖ 1
│     └──⊞ 0208d4a6-ee21-4ec5-b969-4b75b6784a4b.test-analysis.1  Job          ✖ Failed       11s
├──# revision:2
│  ├──⧉ rollout-bg-test-analysis-6bcfbc585f                      ReplicaSet   ✔ Healthy      8m9s  delay:18s
│  │  ├──□ rollout-bg-test-analysis-6bcfbc585f-cq2q5             Pod          ✔ Running      8m9s  ready:1/1
│  │  └──□ rollout-bg-test-analysis-6bcfbc585f-dtxdc             Pod          ✔ Running      8m9s  ready:1/1
│  └──α rollout-bg-test-analysis-6bcfbc585f-2                    AnalysisRun  ✔ Successful   8m8s  ✔ 1
│     └──⊞ 471f5e5b-553b-4f94-bae3-cefca88afcd6.test-analysis.1  Job          ✔ Successful   8m8s
└──# revision:1
   └──⧉ rollout-bg-test-analysis-766b7567dc                      ReplicaSet   • ScaledDown   10m


## After completion
## The revision 2 ReplicaSet is active again
Name:            rollout-bg-test-analysis
Namespace:       default
Status:          ✖ Degraded
Strategy:        BlueGreen
Images:          nginx:1.19
                 nginx:stable (active)
Replicas:
  Desired:       2
  Current:       4
  Updated:       2
  Ready:         4
  Available:     2

NAME                                                             KIND         STATUS        AGE   INFO
⟳ rollout-bg-test-analysis                                       Rollout      ✖ Degraded    10m
├──# revision:3
│  ├──⧉ rollout-bg-test-analysis-6b7c8784cc                      ReplicaSet   ✔ Healthy     12s
│  │  ├──□ rollout-bg-test-analysis-6b7c8784cc-b29n5             Pod          ✔ Running     12s   ready:1/1
│  │  └──□ rollout-bg-test-analysis-6b7c8784cc-ztnjn             Pod          ✔ Running     12s   ready:1/1
│  └──α rollout-bg-test-analysis-6b7c8784cc-3                    AnalysisRun  ✖ Failed      11s   ✖ 1
│     └──⊞ 0208d4a6-ee21-4ec5-b969-4b75b6784a4b.test-analysis.1  Job          ✖ Failed      11s
├──# revision:2
│  ├──⧉ rollout-bg-test-analysis-6bcfbc585f                      ReplicaSet   ✔ Healthy     8m9s  active
│  │  ├──□ rollout-bg-test-analysis-6bcfbc585f-cq2q5             Pod          ✔ Running     8m9s  ready:1/1
│  │  └──□ rollout-bg-test-analysis-6bcfbc585f-dtxdc             Pod          ✔ Running     8m9s  ready:1/1
│  └──α rollout-bg-test-analysis-6bcfbc585f-2                    AnalysisRun  ✔ Successful  8m8s  ✔ 1
│     └──⊞ 471f5e5b-553b-4f94-bae3-cefca88afcd6.test-analysis.1  Job          ✔ Successful  8m8s
└──# revision:1
   └──⧉ rollout-bg-test-analysis-766b7567dc                      ReplicaSet   • ScaledDown  10m

As shown above, the new version was deployed and its ReplicaSet briefly became active, but because the AnalysisRun failed, the old version became active again and the Rollout's Status changed to Degraded.
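The switch back can happen immediately because the previous ReplicaSet was still running when the analysis failed. The `delay:30s` shown next to revision 2 in the output above appears to correspond to the blue-green `scaleDownDelaySeconds` setting, which as far as I know defaults to 30 seconds. The fragment below is only a sketch and is not part of the original manifests; the value 60 is an arbitrary illustration, and the point at which the countdown starts may differ between Argo Rollouts versions.

```yaml
# Sketch only: the same blue-green strategy with an explicit scale-down delay.
# scaleDownDelaySeconds is not set in the article's manifest; 60 is an illustrative value.
strategy:
  blueGreen:
    activeService: rollout-active-service-analysis
    postPromotionAnalysis:
      templates:
      - templateName: test-analysis
    scaleDownDelaySeconds: 60   # how long the previous ReplicaSet is kept before scale-down
```

If the analysis Job were to run longer than this delay, the previous ReplicaSet might already be scaled down when the failure is detected, so it seems reasonable to keep the delay longer than the expected analysis time.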

To bring the Rollout's Status back from Degraded to Healthy, you need to redeploy so that the configuration returns to its original state (here, by fixing the AnalysisTemplate).
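For reference, the template as it was before the edit can be reconstructed from the `last-applied-configuration` annotation shown in the edit buffer above. Re-applying a manifest like the following (for example with `kubectl apply -f`) would put `exit 0` back. Whether a further image update is then needed to clear the Degraded status was not verified here.

```yaml
# AnalysisTemplate reconstructed from the last-applied-configuration annotation.
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: test-analysis
  namespace: default
spec:
  metrics:
  - name: test-analysis
    provider:
      job:
        spec:
          backoffLimit: 1
          template:
            spec:
              containers:
              - name: sleep
                image: alpine:3.8
                command: ["sh", "-c"]
                args: ["exit 0"]   # restored: the Job succeeds again
              restartPolicy: Never
```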

The resources after completion look like the following. You can see that the AnalysisRun is Failed and that the Pods of the failed Job remain.

$ kubectl get pods
NAME                                                         READY   STATUS      RESTARTS   AGE
0208d4a6-ee21-4ec5-b969-4b75b6784a4b.test-analysis.1-fk7hn   0/1     Error       0          7m23s
0208d4a6-ee21-4ec5-b969-4b75b6784a4b.test-analysis.1-mdfzs   0/1     Error       0          7m25s
471f5e5b-553b-4f94-bae3-cefca88afcd6.test-analysis.1-8hz2s   0/1     Completed   0          15m
rollout-bg-test-analysis-6b7c8784cc-b29n5                    1/1     Running     0          7m26s
rollout-bg-test-analysis-6b7c8784cc-ztnjn                    1/1     Running     0          7m26s
rollout-bg-test-analysis-6bcfbc585f-cq2q5                    1/1     Running     0          15m
rollout-bg-test-analysis-6bcfbc585f-dtxdc                    1/1     Running     0          15m

$ kubectl get analysisrun
NAME                                    STATUS
rollout-bg-test-analysis-6b7c8784cc-3   Failed
rollout-bg-test-analysis-6bcfbc585f-2   Successful

$ kubectl get rollout
NAME                       DESIRED   CURRENT   UP-TO-DATE   AVAILABLE
rollout-bg-test-analysis   2         4         2            2


$ kubectl describe analysisrun rollout-bg-test-analysis-6b7c8784cc-3
Name:         rollout-bg-test-analysis-6b7c8784cc-3
Namespace:    default
Labels:       rollout-type=PostPromotion
              rollouts-pod-template-hash=6b7c8784cc
Annotations:  rollout.argoproj.io/revision: 3
API Version:  argoproj.io/v1alpha1
Kind:         AnalysisRun
Metadata:
  Creation Timestamp:  2020-10-10T02:35:10Z
  Generation:          3
  Owner References:
    API Version:           argoproj.io/v1alpha1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  Rollout
    Name:                  rollout-bg-test-analysis
    UID:                   e9ba7057-29db-491a-8618-13180c406fef
  Resource Version:        243985
  Self Link:               /apis/argoproj.io/v1alpha1/namespaces/default/analysisruns/rollout-bg-test-analysis-6b7c8784cc-3
  UID:                     0208d4a6-ee21-4ec5-b969-4b75b6784a4b
Spec:
  Metrics:
    Name:  test-analysis
    Provider:
      Job:
        Metadata:
          Creation Timestamp:  <nil>
        Spec:
          Backoff Limit:  1
          Template:
            Metadata:
              Creation Timestamp:  <nil>
            Spec:
              Containers:
                Args:
                  exit 1
                Command:
                  sh
                  -c
                Image:  alpine:3.8
                Name:   sleep
                Resources:
              Restart Policy:  Never
Status:
  Message:  metric "test-analysis" assessed Failed due to failed (1) > failureLimit (0)
  Metric Results:
    Count:   1
    Failed:  1
    Measurements:
      Finished At:  2020-10-10T02:35:22Z
      Metadata:
        Job - Name:  0208d4a6-ee21-4ec5-b969-4b75b6784a4b.test-analysis.1
      Phase:         Failed
      Started At:    2020-10-10T02:35:10Z
    Name:            test-analysis
    Phase:           Failed
  Phase:             Failed
  Started At:        2020-10-10T02:35:10Z
Events:
  Type     Reason  Age    From                 Message
  ----     ------  ----   ----                 -------
  Warning  Failed  8m27s  rollouts-controller  metric 'test-analysis' completed Failed
  Warning  Failed  8m27s  rollouts-controller  analysis completed Failed


$ kubectl describe rollout rollout-bg-test-analysis
Name:         rollout-bg-test-analysis
Namespace:    default
Labels:       <none>
Annotations:  rollout.argoproj.io/revision: 3
API Version:  argoproj.io/v1alpha1
Kind:         Rollout
Metadata:
  Creation Timestamp:  2020-10-10T02:25:02Z
  Generation:          27
  Resource Version:    244302
  Self Link:           /apis/argoproj.io/v1alpha1/namespaces/default/rollouts/rollout-bg-test-analysis
  UID:                 e9ba7057-29db-491a-8618-13180c406fef
Spec:
  Replicas:                2
  Revision History Limit:  2
  Selector:
    Match Labels:
      App:  rollout-bg-analysis
  Strategy:
    Blue Green:
      Active Service:  rollout-active-service-analysis
      Post Promotion Analysis:
        Templates:
          Template Name:  test-analysis
  Template:
    Metadata:
      Creation Timestamp:  <nil>
      Labels:
        App:  rollout-bg-analysis
    Spec:
      Containers:
        Image:  nginx:1.19
        Name:   nginx-container
        Ports:
          Container Port:  80
        Resources:
Status:
  HPA Replicas:        2
  Abort:               true
  Aborted At:          2020-10-10T02:37:02Z
  Available Replicas:  2
  Blue Green:
    Active Selector:              6bcfbc585f
    Post Promotion Analysis Run:  rollout-bg-test-analysis-6b7c8784cc-3
    Post Promotion Analysis Run Status:
      Message:  metric "test-analysis" assessed Failed due to failed (1) > failureLimit (0)
      Name:     rollout-bg-test-analysis-6b7c8784cc-3
      Status:   Failed
  Canary:
  Conditions:
    Last Transition Time:  2020-10-10T02:25:07Z
    Last Update Time:      2020-10-10T02:25:07Z
    Message:               Rollout has minimum availability
    Reason:                AvailableReason
    Status:                True
    Type:                  Available
    Last Transition Time:  2020-10-10T02:35:22Z
    Last Update Time:      2020-10-10T02:35:22Z
    Message:               metric "test-analysis" assessed Failed due to failed (1) > failureLimit (0)
    Reason:                RolloutAborted
    Status:                False
    Type:                  Progressing
  Current Pod Hash:        6b7c8784cc
  Observed Generation:     57cf5bd85b
  Ready Replicas:          4
  Replicas:                4
  Selector:                app=rollout-bg-analysis,rollouts-pod-template-hash=6bcfbc585f
  Stable RS:               6bcfbc585f
  Updated Replicas:        2
Events:
  Type     Reason                   Age                  From                 Message
  ----     ------                   ----                 ----                 -------
  Normal   ScalingReplicaSet        19m                  rollouts-controller  Scaled up replica set rollout-bg-test-analysis-766b7567dc to 2
  Normal   SwitchService            19m                  rollouts-controller  Switched selector for service 'rollout-active-service-analysis' to value '766b7567dc'
  Normal   ScalingReplicaSet        17m                  rollouts-controller  Scaled up replica set rollout-bg-test-analysis-6bcfbc585f to 2
  Normal   AnalysisRunStatusChange  17m                  rollouts-controller  PostPromotion Analysis Run 'rollout-bg-test-analysis-6bcfbc585f-2' Status New: 'Running' Previous: ''
  Normal   AnalysisRunStatusChange  17m                  rollouts-controller  PostPromotion Analysis Run 'rollout-bg-test-analysis-6bcfbc585f-2' Status New: '' Previous: 'NoPreviousStatus'
  Normal   AnalysisRunStatusChange  17m                  rollouts-controller  PostPromotion Analysis Run 'rollout-bg-test-analysis-6bcfbc585f-2' Status New: 'Successful' Previous: 'Running'
  Normal   ScalingReplicaSet        17m                  rollouts-controller  Scaled down replica set rollout-bg-test-analysis-766b7567dc to 0
  Normal   ScalingReplicaSet        9m51s                rollouts-controller  Scaled up replica set rollout-bg-test-analysis-6b7c8784cc to 2
  Normal   SwitchService            9m50s                rollouts-controller  Switched selector for service 'rollout-active-service-analysis' to value '6b7c8784cc'
  Normal   AnalysisRunStatusChange  9m50s                rollouts-controller  PostPromotion Analysis Run 'rollout-bg-test-analysis-6b7c8784cc-3' Status New: '' Previous: 'NoPreviousStatus'
  Normal   AnalysisRunStatusChange  9m50s                rollouts-controller  PostPromotion Analysis Run 'rollout-bg-test-analysis-6b7c8784cc-3' Status New: 'Running' Previous: ''
  Normal   SwitchService            9m38s (x2 over 17m)  rollouts-controller  Switched selector for service 'rollout-active-service-analysis' to value '6bcfbc585f'
  Warning  AnalysisRunStatusChange  9m38s                rollouts-controller  PostPromotion Analysis Run 'rollout-bg-test-analysis-6b7c8784cc-3' Status New: 'Failed' Previous: 'Running'
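The message `failed (1) > failureLimit (0)` in the output above reflects that the metric has no `failureLimit` set, so a single failed measurement fails the whole AnalysisRun. As a rough sketch (not part of the original test), a metric could tolerate an occasional failure by taking several measurements. The values of `interval`, `count`, and `failureLimit` below are illustrative assumptions.

```yaml
# Sketch only: the same Job-based metric, measured three times, tolerating one failure.
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: test-analysis
spec:
  metrics:
  - name: test-analysis
    interval: 10s      # assumed: time between measurements
    count: 3           # assumed: take three measurements in total
    failureLimit: 1    # assumed: the metric fails only on a second failed measurement
    provider:
      job:
        spec:
          backoffLimit: 1
          template:
            spec:
              containers:
              - name: sleep
                image: alpine:3.8
                command: ["sh", "-c"]
                args: ["exit 0"]
              restartPolicy: Never
```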

Looking at cases likely to fail during a real update

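Above, we used an AnalysisTemplate and saw the Rollout roll back automatically when the condition was not met. From here, I ran follow-up tests on two cases that can cause an update to fail when updating an application actually running on Kubernetes.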

When the container fails to start

The first case is a container that fails to start. I deliberately prepared a container image that fails on startup, specified it when updating the image, and checked how the Rollout behaves.

To make the container fail to start, I used the following Dockerfile. It runs the head command against a file that does not exist.

Dockerfile

FROM nginx:latest

CMD head /foo/bar

After building from the Dockerfile, I pushed the image to Amazon ECR and used it from EKS for the test.

# Build and push the container image
$ docker build -t test/test03:fail .
$ docker tag test/test03:fail 111111111111.dkr.ecr.ap-northeast-1.amazonaws.com/test/test03:fail
$ docker push 111111111111.dkr.ecr.ap-northeast-1.amazonaws.com/test/test03:fail

I also prepared a container image that starts successfully.

Dockerfile

FROM nginx:latest

# Build and push the container image
$ docker build -t test/test03:success .
$ docker tag test/test03:success 111111111111.dkr.ecr.ap-northeast-1.amazonaws.com/test/test03:success
$ docker push 111111111111.dkr.ecr.ap-northeast-1.amazonaws.com/test/test03:success

The manifest files used in this test are as follows.

rollout-bg-test-start-fail.yml

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: rollout-bg-test-start-fail
spec:
  replicas: 2
  revisionHistoryLimit: 2
  selector:
    matchLabels:
      app: rollout-bg-analysis-start-fail
  template:
    metadata:
      labels:
        app: rollout-bg-analysis-start-fail
    spec:
      containers:
        - name: nginx-container
          image: 111111111111.dkr.ecr.ap-northeast-1.amazonaws.com/test/test03:success
          ports:
          - containerPort: 80
  strategy:
    blueGreen:
      activeService: rollout-active-service-start-fail
      postPromotionAnalysis:
        templates:
        - templateName: test-analysis

rollout-active-service-start-fail.yml

apiVersion: v1
kind: Service
metadata:
  name: rollout-active-service-start-fail
spec:
  ports:
  - port: 8080
    targetPort: 80
    protocol: TCP
  selector:
    app: rollout-bg-analysis-start-fail

First, create the resources from the YAML files above, along with the AnalysisTemplate that contains test-analysis.

$ kubectl apply -f test-analysistemp.yml
analysistemplate.argoproj.io/test-analysis created

$ kubectl apply -f rollout-active-service-start-fail.yml
service/rollout-active-service-start-fail created

$ kubectl apply -f rollout-bg-test-start-fail.yml
rollout.argoproj.io/rollout-bg-test-start-fail created


# Check after deployment
$ kubectl get svc
NAME                                TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
kubernetes                          ClusterIP   10.100.0.1      <none>        443/TCP    85m
rollout-active-service-start-fail   ClusterIP   10.100.32.113   <none>        8080/TCP   104s

$ kubectl get pods
NAME                                         READY   STATUS    RESTARTS   AGE
rollout-bg-test-start-fail-8cdb4dcc6-cnjlt   1/1     Running   0          13s
rollout-bg-test-start-fail-8cdb4dcc6-phsgq   1/1     Running   0          13s

$ kubectl get rollout
NAME                         DESIRED   CURRENT   UP-TO-DATE   AVAILABLE
rollout-bg-test-start-fail   2         2         2            2


$ kubectl argo rollouts get rollout rollout-bg-test-start-fail
Name:            rollout-bg-test-start-fail
Namespace:       default
Status:          ✔ Healthy
Strategy:        BlueGreen
Images:          111111111111.dkr.ecr.ap-northeast-1.amazonaws.com/test/test03:success (active)
Replicas:
  Desired:       2
  Current:       2
  Updated:       2
  Ready:         2
  Available:     2

NAME                                                   KIND        STATUS     AGE  INFO
⟳ rollout-bg-test-start-fail                           Rollout     ✔ Healthy  11m
└──# revision:1
   └──⧉ rollout-bg-test-start-fail-8cdb4dcc6           ReplicaSet  ✔ Healthy  11m  active
      ├──□ rollout-bg-test-start-fail-8cdb4dcc6-cnjlt  Pod         ✔ Running  11m  ready:1/1
      └──□ rollout-bg-test-start-fail-8cdb4dcc6-phsgq  Pod         ✔ Running  11m  ready:1/1

Next, update the image and watch how the update proceeds.

# Run in a separate terminal
$ kubectl argo rollouts get rollout rollout-bg-test-start-fail --watch


# Update the image
$ kubectl argo rollouts set image rollout-bg-test-start-fail nginx-container=111111111111.dkr.ecr.ap-northeast-1.amazonaws.com/test/test03:fail
rollout "rollout-bg-test-start-fail" image updated


# Watch the update progress
## Right after the update command
Name:            rollout-bg-test-start-fail
Namespace:       default
Status:          ✔ Healthy
Strategy:        BlueGreen
Images:          111111111111.dkr.ecr.ap-northeast-1.amazonaws.com/test/test03:success (active)
Replicas:
  Desired:       2
  Current:       2
  Updated:       2
  Ready:         2
  Available:     2

NAME                                                   KIND        STATUS     AGE  INFO
⟳ rollout-bg-test-start-fail                           Rollout     ✔ Healthy  12m
└──# revision:1
   └──⧉ rollout-bg-test-start-fail-8cdb4dcc6           ReplicaSet  ✔ Healthy  12m  active
      ├──□ rollout-bg-test-start-fail-8cdb4dcc6-cnjlt  Pod         ✔ Running  12m  ready:1/1
      └──□ rollout-bg-test-start-fail-8cdb4dcc6-phsgq  Pod         ✔ Running  12m  ready:1/1


## New containers start being created
Name:            rollout-bg-test-start-fail
Namespace:       default
Status:          ◌ Progressing
Strategy:        BlueGreen
Images:          111111111111.dkr.ecr.ap-northeast-1.amazonaws.com/test/test03:fail
                 111111111111.dkr.ecr.ap-northeast-1.amazonaws.com/test/test03:success (active)
Replicas:
  Desired:       2
  Current:       4
  Updated:       2
  Ready:         2
  Available:     2

NAME                                                    KIND        STATUS               AGE  INFO
⟳ rollout-bg-test-start-fail                            Rollout     ◌ Progressing        12m
├──# revision:2
│  └──⧉ rollout-bg-test-start-fail-7fb74fd5f5           ReplicaSet  ◌ Progressing        0s
│     ├──□ rollout-bg-test-start-fail-7fb74fd5f5-948tm  Pod         ◌ ContainerCreating  0s   ready:0/1
│     └──□ rollout-bg-test-start-fail-7fb74fd5f5-mr4n4  Pod         ◌ ContainerCreating  0s   ready:0/1
└──# revision:1
   └──⧉ rollout-bg-test-start-fail-8cdb4dcc6            ReplicaSet  ✔ Healthy            12m  active
      ├──□ rollout-bg-test-start-fail-8cdb4dcc6-cnjlt   Pod         ✔ Running            12m  ready:1/1
      └──□ rollout-bg-test-start-fail-8cdb4dcc6-phsgq   Pod         ✔ Running            12m  ready:1/1


## Creation fails
Name:            rollout-bg-test-start-fail
Namespace:       default
Status:          ◌ Progressing
Strategy:        BlueGreen
Images:          111111111111.dkr.ecr.ap-northeast-1.amazonaws.com/test/test03:fail
                 111111111111.dkr.ecr.ap-northeast-1.amazonaws.com/test/test03:success (active)
Replicas:
  Desired:       2
  Current:       4
  Updated:       2
  Ready:         2
  Available:     2

NAME                                                    KIND        STATUS               AGE  INFO
⟳ rollout-bg-test-start-fail                            Rollout     ◌ Progressing        12m
├──# revision:2
│  └──⧉ rollout-bg-test-start-fail-7fb74fd5f5           ReplicaSet  ◌ Progressing        1s
│     ├──□ rollout-bg-test-start-fail-7fb74fd5f5-948tm  Pod         ⚠ Error              1s   ready:0/1
│     └──□ rollout-bg-test-start-fail-7fb74fd5f5-mr4n4  Pod         ◌ ContainerCreating  1s   ready:0/1
└──# revision:1
   └──⧉ rollout-bg-test-start-fail-8cdb4dcc6            ReplicaSet  ✔ Healthy            12m  active
      ├──□ rollout-bg-test-start-fail-8cdb4dcc6-cnjlt   Pod         ✔ Running            12m  ready:1/1
      └──□ rollout-bg-test-start-fail-8cdb4dcc6-phsgq   Pod         ✔ Running            12m  ready:1/1

Name:            rollout-bg-test-start-fail
Namespace:       default
Status:          ◌ Progressing
Strategy:        BlueGreen
Images:          111111111111.dkr.ecr.ap-northeast-1.amazonaws.com/test/test03:fail
                 111111111111.dkr.ecr.ap-northeast-1.amazonaws.com/test/test03:success (active)
Replicas:
  Desired:       2
  Current:       4
  Updated:       2
  Ready:         2
  Available:     2

NAME                                                    KIND        STATUS         AGE  INFO
⟳ rollout-bg-test-start-fail                            Rollout     ◌ Progressing  12m
├──# revision:2
│  └──⧉ rollout-bg-test-start-fail-7fb74fd5f5           ReplicaSet  ◌ Progressing  1s
│     ├──□ rollout-bg-test-start-fail-7fb74fd5f5-948tm  Pod         ⚠ Error        1s   ready:0/1
│     └──□ rollout-bg-test-start-fail-7fb74fd5f5-mr4n4  Pod         ⚠ Error        1s   ready:0/1
└──# revision:1
   └──⧉ rollout-bg-test-start-fail-8cdb4dcc6            ReplicaSet  ✔ Healthy      12m  active
      ├──□ rollout-bg-test-start-fail-8cdb4dcc6-cnjlt   Pod         ✔ Running      12m  ready:1/1
      └──□ rollout-bg-test-start-fail-8cdb4dcc6-phsgq   Pod         ✔ Running      12m  ready:1/1

Name:            rollout-bg-test-start-fail
Namespace:       default
Status:          ◌ Progressing
Strategy:        BlueGreen
Images:          111111111111.dkr.ecr.ap-northeast-1.amazonaws.com/test/test03:fail
                 111111111111.dkr.ecr.ap-northeast-1.amazonaws.com/test/test03:success (active)
Replicas:
  Desired:       2
  Current:       4
  Updated:       2
  Ready:         2
  Available:     2

NAME                                                    KIND        STATUS              AGE  INFO
⟳ rollout-bg-test-start-fail                            Rollout     ◌ Progressing       12m
├──# revision:2
│  └──⧉ rollout-bg-test-start-fail-7fb74fd5f5           ReplicaSet  ◌ Progressing       3s
│     ├──□ rollout-bg-test-start-fail-7fb74fd5f5-948tm  Pod         ✖ CrashLoopBackOff  3s   ready:0/1,restarts:1
│     └──□ rollout-bg-test-start-fail-7fb74fd5f5-mr4n4  Pod         ⚠ Error             3s   ready:0/1,restarts:1
└──# revision:1
   └──⧉ rollout-bg-test-start-fail-8cdb4dcc6            ReplicaSet  ✔ Healthy           12m  active
      ├──□ rollout-bg-test-start-fail-8cdb4dcc6-cnjlt   Pod         ✔ Running           12m  ready:1/1
      └──□ rollout-bg-test-start-fail-8cdb4dcc6-phsgq   Pod         ✔ Running           12m  ready:1/1

Name:            rollout-bg-test-start-fail
Namespace:       default
Status:          ◌ Progressing
Strategy:        BlueGreen
Images:          111111111111.dkr.ecr.ap-northeast-1.amazonaws.com/test/test03:fail
                 111111111111.dkr.ecr.ap-northeast-1.amazonaws.com/test/test03:success (active)
Replicas:
  Desired:       2
  Current:       4
  Updated:       2
  Ready:         2
  Available:     2

NAME                                                    KIND        STATUS              AGE  INFO
⟳ rollout-bg-test-start-fail                            Rollout     ◌ Progressing       12m
├──# revision:2
│  └──⧉ rollout-bg-test-start-fail-7fb74fd5f5           ReplicaSet  ◌ Progressing       3s
│     ├──□ rollout-bg-test-start-fail-7fb74fd5f5-948tm  Pod         ✖ CrashLoopBackOff  3s   ready:0/1,restarts:1
│     └──□ rollout-bg-test-start-fail-7fb74fd5f5-mr4n4  Pod         ✖ CrashLoopBackOff  3s   ready:0/1,restarts:1
└──# revision:1
   └──⧉ rollout-bg-test-start-fail-8cdb4dcc6            ReplicaSet  ✔ Healthy           12m  active
      ├──□ rollout-bg-test-start-fail-8cdb4dcc6-cnjlt   Pod         ✔ Running           12m  ready:1/1
      └──□ rollout-bg-test-start-fail-8cdb4dcc6-phsgq   Pod         ✔ Running           12m  ready:1/1


## After a while, the Rollout becomes Degraded
Name:            rollout-bg-test-start-fail
Namespace:       default
Status:          ✖ Degraded
Strategy:        BlueGreen
Images:          111111111111.dkr.ecr.ap-northeast-1.amazonaws.com/test/test03:fail
                 111111111111.dkr.ecr.ap-northeast-1.amazonaws.com/test/test03:success (active)
Replicas:
  Desired:       2
  Current:       4
  Updated:       2
  Ready:         2
  Available:     2

NAME                                                    KIND        STATUS              AGE  INFO
⟳ rollout-bg-test-start-fail                            Rollout     ✖ Degraded          25m
├──# revision:2
│  └──⧉ rollout-bg-test-start-fail-7fb74fd5f5           ReplicaSet  ◌ Progressing       13m
│     ├──□ rollout-bg-test-start-fail-7fb74fd5f5-948tm  Pod         ✖ CrashLoopBackOff  13m  ready:0/1,restarts:7
│     └──□ rollout-bg-test-start-fail-7fb74fd5f5-mr4n4  Pod         ✖ CrashLoopBackOff  13m  ready:0/1,restarts:7
└──# revision:1
   └──⧉ rollout-bg-test-start-fail-8cdb4dcc6            ReplicaSet  ✔ Healthy           25m  active
      ├──□ rollout-bg-test-start-fail-8cdb4dcc6-cnjlt   Pod         ✔ Running           25m  ready:1/1
      └──□ rollout-bg-test-start-fail-8cdb4dcc6-phsgq   Pod         ✔ Running           25m  ready:1/1

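As shown above, when the container fails to start, the failure happens before the Analysis runs, so the switch to the new version never occurs. The Status does become Degraded, but a container startup failure does not appear to affect the application's uptime.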

Since the Rollout's Status becomes Degraded, you can bring it back to Healthy by redeploying with the original container image.
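The delay before the Rollout turns Degraded is most likely governed by `progressDeadlineSeconds`, which, as with Deployments, defaults to 600 seconds. That is consistent with the timing in the output above, although this setting was not set or verified in the test. A sketch of shortening it, with 120 as an arbitrary value:

```yaml
# Sketch only: fragment of rollout-bg-test-start-fail.yml with an explicit progress deadline.
# progressDeadlineSeconds is not set in the article's manifest; 120 is an illustrative value.
spec:
  replicas: 2
  revisionHistoryLimit: 2
  progressDeadlineSeconds: 120   # report Degraded if no progress is made within 120s
```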

When the Liveness Probe keeps failing

Next, let's look at the case where the Liveness Probe fails. This time I prepared the manifest files below and later changed the Liveness Probe conditions to create a situation where the probe fails.

Note that, because of the Liveness Probe settings (initialDelaySeconds and periodSeconds), the AnalysisRun starts before the first Liveness Probe runs.

rollout-bg-test-liveness-fail.yml

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: rollout-bg-test-liveness-fail
spec:
  replicas: 2
  revisionHistoryLimit: 2
  selector:
    matchLabels:
      app: rollout-bg-analysis-liveness-fail
  template:
    metadata:
      labels:
        app: rollout-bg-analysis-liveness-fail
    spec:
      containers:
        - name: nginx-container
          image: 111111111111.dkr.ecr.ap-northeast-1.amazonaws.com/test/test03:success
          ports:
          - containerPort: 80
          livenessProbe:
            tcpSocket:
              port: 80
            initialDelaySeconds: 5
            periodSeconds: 5
            timeoutSeconds: 1
            successThreshold: 1
            failureThreshold: 1
  strategy:
    blueGreen:
      activeService: rollout-active-service-liveness-fail
      postPromotionAnalysis:
        templates:
        - templateName: test-analysis

rollout-active-service-liveness-fail.yml

apiVersion: v1
kind: Service
metadata:
  name: rollout-active-service-liveness-fail
spec:
  ports:
  - port: 8080
    targetPort: 80
    protocol: TCP
  selector:
    app: rollout-bg-analysis-liveness-fail

Create the resources from the two YAML files above, along with the AnalysisTemplate that contains test-analysis.

$ kubectl apply -f rollout-active-service-liveness-fail.yml
service/rollout-active-service-liveness-fail created

$ kubectl apply -f rollout-bg-test-liveness-fail.yml
rollout.argoproj.io/rollout-bg-test-liveness-fail created

# Check after deployment
$ kubectl get svc
NAME                                   TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
kubernetes                             ClusterIP   10.100.0.1      <none>        443/TCP    150m
rollout-active-service-liveness-fail   ClusterIP   10.100.21.204   <none>        8080/TCP   9m1s

$ kubectl get pods
NAME                                             READY   STATUS    RESTARTS   AGE
rollout-bg-test-liveness-fail-54495f7df4-df2dk   1/1     Running   0          15s
rollout-bg-test-liveness-fail-54495f7df4-svb25   1/1     Running   0          15s

$ kubectl get rollout
NAME                            DESIRED   CURRENT   UP-TO-DATE   AVAILABLE
rollout-bg-test-liveness-fail   2         2         2            2

$ kubectl argo rollouts get rollout rollout-bg-test-liveness-fail
Name:            rollout-bg-test-liveness-fail
Namespace:       default
Status:          ✔ Healthy
Strategy:        BlueGreen
Images:          111111111111.dkr.ecr.ap-northeast-1.amazonaws.com/test/test03:success (active)
Replicas:
  Desired:       2
  Current:       2
  Updated:       2
  Ready:         2
  Available:     2

NAME                                                       KIND        STATUS     AGE    INFO
⟳ rollout-bg-test-liveness-fail                            Rollout     ✔ Healthy  2m37s
└──# revision:1
   └──⧉ rollout-bg-test-liveness-fail-54495f7df4           ReplicaSet  ✔ Healthy  2m36s  active
      ├──□ rollout-bg-test-liveness-fail-54495f7df4-df2dk  Pod         ✔ Running  2m36s  ready:1/1
      └──□ rollout-bg-test-liveness-fail-54495f7df4-svb25  Pod         ✔ Running  2m36s  ready:1/1

Next, change part of the Liveness Probe configuration and watch how the update proceeds.

# Run in a separate terminal
$ kubectl argo rollouts get rollout rollout-bg-test-liveness-fail --watch


# Edit the Rollout
$ kubectl edit rollout rollout-bg-test-liveness-fail
rollout.argoproj.io/rollout-bg-test-liveness-fail edited


# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"argoproj.io/v1alpha1","kind":"Rollout","metadata":{"annotations":{},"name":"rollout-bg-test-liveness-fail","namespace":"default"},"spec":{"replicas":2,"revisionHistoryLimit":2,"selector":{"matchLabels":{"app":"rollout-bg-analysis-liveness-fail"}},"strategy":{"blueGreen":{"activeService":"rollout-active-service-liveness-fail","postPromotionAnalysis":{"templates":[{"templateName":"test-analysis"}]}}},"template":{"metadata":{"labels":{"app":"rollout-bg-analysis-liveness-fail"}},"spec":{"containers":[{"image":"111111111111.dkr.ecr.ap-northeast-1.amazonaws.com/test/test03:success","livenessProbe":{"failureThreshold":1,"initialDelaySeconds":5,"periodSeconds":5,"successThreshold":1,"tcpSocket":{"port":80},"timeoutSeconds":1},"name":"nginx-container","ports":[{"containerPort":80}]}]}}}}
    rollout.argoproj.io/revision: "1"
  creationTimestamp: "2020-10-12T06:52:16Z"
  generation: 6
  name: rollout-bg-test-liveness-fail
  namespace: default
  resourceVersion: "28459"
  selfLink: /apis/argoproj.io/v1alpha1/namespaces/default/rollouts/rollout-bg-test-liveness-fail
  uid: 70fa4d0b-fb3a-46c6-8bfd-321e2d829694
spec:
  replicas: 2
  revisionHistoryLimit: 2
  selector:
    matchLabels:
      app: rollout-bg-analysis-liveness-fail
  strategy:
    blueGreen:
      activeService: rollout-active-service-liveness-fail
      postPromotionAnalysis:
        templates:
        - templateName: test-analysis
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: rollout-bg-analysis-liveness-fail
    spec:
      containers:
      - image: 111111111111.dkr.ecr.ap-northeast-1.amazonaws.com/test/test03:success
        livenessProbe:
          failureThreshold: 1
          initialDelaySeconds: 5
          periodSeconds: 5
          successThreshold: 1
          tcpSocket:
            port: 8080  # changed (was 80)
          timeoutSeconds: 1
        name: nginx-container
        ports:
        - containerPort: 80
        resources: {}
status:
  HPAReplicas: 2
  availableReplicas: 2
  blueGreen:
    activeSelector: 54495f7df4
  canary: {}
  conditions:
  - lastTransitionTime: "2020-10-12T06:52:17Z"
    lastUpdateTime: "2020-10-12T06:52:18Z"
    message: ReplicaSet "rollout-bg-test-liveness-fail-54495f7df4" has successfully
      progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  - lastTransitionTime: "2020-10-12T06:52:18Z"
    lastUpdateTime: "2020-10-12T06:52:18Z"
    message: Rollout has minimum availability
    reason: AvailableReason
    status: "True"
    type: Available
  currentPodHash: 54495f7df4
  observedGeneration: 75d6d6f664
  readyReplicas: 2
  replicas: 2
  selector: app=rollout-bg-analysis-liveness-fail,rollouts-pod-template-hash=54495f7df4
  stableRS: 54495f7df4
  updatedReplicas: 2


# Watch the update progress
## Container creation starts
Name:            rollout-bg-test-liveness-fail
Namespace:       default
Status:          ◌ Progressing
Strategy:        BlueGreen
Images:          111111111111.dkr.ecr.ap-northeast-1.amazonaws.com/test/test03:success (active)
Replicas:
  Desired:       2
  Current:       4
  Updated:       2
  Ready:         2
  Available:     2

NAME                                                       KIND        STATUS               AGE  INFO
⟳ rollout-bg-test-liveness-fail                            Rollout     ◌ Progressing        14m
├──# revision:2
│  └──⧉ rollout-bg-test-liveness-fail-7bb5898d6            ReplicaSet  ◌ Progressing        0s
│     ├──□ rollout-bg-test-liveness-fail-7bb5898d6-tw8lt   Pod         ◌ ContainerCreating  0s   ready:0/1
│     └──□ rollout-bg-test-liveness-fail-7bb5898d6-wb944   Pod         ◌ ContainerCreating  0s   ready:0/1
└──# revision:1
   └──⧉ rollout-bg-test-liveness-fail-54495f7df4           ReplicaSet  ✔ Healthy            14m  active
      ├──□ rollout-bg-test-liveness-fail-54495f7df4-df2dk  Pod         ✔ Running            14m  ready:1/1
      └──□ rollout-bg-test-liveness-fail-54495f7df4-svb25  Pod         ✔ Running            14m  ready:1/1


## Container creation completes and the Analysis starts
Name:            rollout-bg-test-liveness-fail
Namespace:       default
Status:          ◌ Progressing
Strategy:        BlueGreen
Images:          111111111111.dkr.ecr.ap-northeast-1.amazonaws.com/test/test03:success (active)
Replicas:
  Desired:       2
  Current:       4
  Updated:       2
  Ready:         2
  Available:     2

NAME                                                             KIND         STATUS         AGE  INFO
⟳ rollout-bg-test-liveness-fail                                  Rollout      ◌ Progressing  14m
├──# revision:2
│  ├──⧉ rollout-bg-test-liveness-fail-7bb5898d6                  ReplicaSet   ✔ Healthy      1s   active
│  │  ├──□ rollout-bg-test-liveness-fail-7bb5898d6-tw8lt         Pod          ✔ Running      1s   ready:1/1
│  │  └──□ rollout-bg-test-liveness-fail-7bb5898d6-wb944         Pod          ✔ Running      1s   ready:1/1
│  └──α rollout-bg-test-liveness-fail-7bb5898d6-2-post           AnalysisRun  ◌ Running      0s
│     └──⊞ 3bf49f08-e851-4185-8cbf-88886c0da2ec.test-analysis.1  Job          ◌ Running      0s
└──# revision:1
   └──⧉ rollout-bg-test-liveness-fail-54495f7df4                 ReplicaSet   ✔ Healthy      14m  delay:29s
      ├──□ rollout-bg-test-liveness-fail-54495f7df4-df2dk        Pod          ✔ Running      14m  ready:1/1
      └──□ rollout-bg-test-liveness-fail-54495f7df4-svb25        Pod          ✔ Running      14m  ready:1/1


## Analysis completes (successful)
Name:            rollout-bg-test-liveness-fail
Namespace:       default
Status:          ◌ Progressing
Strategy:        BlueGreen
Images:          111111111111.dkr.ecr.ap-northeast-1.amazonaws.com/test/test03:success (active)
Replicas:
  Desired:       2
  Current:       4
  Updated:       2
  Ready:         2
  Available:     2

NAME                                                             KIND         STATUS         AGE  INFO
⟳ rollout-bg-test-liveness-fail                                  Rollout      ◌ Progressing  14m
├──# revision:2
│  ├──⧉ rollout-bg-test-liveness-fail-7bb5898d6                  ReplicaSet   ✔ Healthy      5s   active
│  │  ├──□ rollout-bg-test-liveness-fail-7bb5898d6-tw8lt         Pod          ✔ Running      5s   ready:1/1
│  │  └──□ rollout-bg-test-liveness-fail-7bb5898d6-wb944         Pod          ✔ Running      5s   ready:1/1
│  └──α rollout-bg-test-liveness-fail-7bb5898d6-2-post           AnalysisRun  ✔ Successful   4s   ✔ 1
│     └──⊞ 3bf49f08-e851-4185-8cbf-88886c0da2ec.test-analysis.1  Job          ✔ Successful   4s
└──# revision:1
   └──⧉ rollout-bg-test-liveness-fail-54495f7df4                 ReplicaSet   ✔ Healthy      14m  delay:25s
      ├──□ rollout-bg-test-liveness-fail-54495f7df4-df2dk        Pod          ✔ Running      14m  ready:1/1
      └──□ rollout-bg-test-liveness-fail-54495f7df4-svb25        Pod          ✔ Running      14m  ready:1/1


## Containers restart
Name:            rollout-bg-test-liveness-fail
Namespace:       default
Status:          ◌ Progressing
Strategy:        BlueGreen
Images:          111111111111.dkr.ecr.ap-northeast-1.amazonaws.com/test/test03:success (active)
Replicas:
  Desired:       2
  Current:       4
  Updated:       2
  Ready:         2
  Available:     2

NAME                                                             KIND         STATUS         AGE  INFO
⟳ rollout-bg-test-liveness-fail                                  Rollout      ◌ Progressing  14m
├──# revision:2
│  ├──⧉ rollout-bg-test-liveness-fail-7bb5898d6                  ReplicaSet   ✔ Healthy      8s   active
│  │  ├──□ rollout-bg-test-liveness-fail-7bb5898d6-tw8lt         Pod          ✔ Running      8s   ready:1/1,restarts:1
│  │  └──□ rollout-bg-test-liveness-fail-7bb5898d6-wb944         Pod          ✔ Running      8s   ready:1/1,restarts:1
│  └──α rollout-bg-test-liveness-fail-7bb5898d6-2-post           AnalysisRun  ✔ Successful   7s   ✔ 1
│     └──⊞ 3bf49f08-e851-4185-8cbf-88886c0da2ec.test-analysis.1  Job          ✔ Successful   7s
└──# revision:1
   └──⧉ rollout-bg-test-liveness-fail-54495f7df4                 ReplicaSet   ✔ Healthy      14m  delay:22s
      ├──□ rollout-bg-test-liveness-fail-54495f7df4-df2dk        Pod          ✔ Running      14m  ready:1/1
      └──□ rollout-bg-test-liveness-fail-54495f7df4-svb25        Pod          ✔ Running      14m  ready:1/1


## From here on, the containers repeatedly restart and fail
Name:            rollout-bg-test-liveness-fail
Namespace:       default
Status:          ◌ Progressing
Strategy:        BlueGreen
Images:          111111111111.dkr.ecr.ap-northeast-1.amazonaws.com/test/test03:success (active)
Replicas:
  Desired:       2
  Current:       2
  Updated:       2
  Ready:         1
  Available:     1

NAME                                                             KIND         STATUS              AGE  INFO
⟳ rollout-bg-test-liveness-fail                                  Rollout      ◌ Progressing       14m
├──# revision:2
│  ├──⧉ rollout-bg-test-liveness-fail-7bb5898d6                  ReplicaSet   ◌ Progressing       31s  active
│  │  ├──□ rollout-bg-test-liveness-fail-7bb5898d6-tw8lt         Pod          ✔ Running           31s  ready:1/1,restarts:3
│  │  └──□ rollout-bg-test-liveness-fail-7bb5898d6-wb944         Pod          ✖ CrashLoopBackOff  31s  ready:0/1,restarts:2
│  └──α rollout-bg-test-liveness-fail-7bb5898d6-2-post           AnalysisRun  ✔ Successful        30s  ✔ 1
│     └──⊞ 3bf49f08-e851-4185-8cbf-88886c0da2ec.test-analysis.1  Job          ✔ Successful        30s
└──# revision:1
   └──⧉ rollout-bg-test-liveness-fail-54495f7df4                 ReplicaSet   • ScaledDown        14m
      ├──□ rollout-bg-test-liveness-fail-54495f7df4-df2dk        Pod          ◌ Terminating       14m  ready:0/1
      └──□ rollout-bg-test-liveness-fail-54495f7df4-svb25        Pod          ◌ Terminating       14m  ready:0/1




Name:            rollout-bg-test-liveness-fail
Namespace:       default
Status:          ◌ Progressing
Strategy:        BlueGreen
Images:          111111111111.dkr.ecr.ap-northeast-1.amazonaws.com/test/test03:success (active)
Replicas:
  Desired:       2
  Current:       2
  Updated:       2
  Ready:         1
  Available:     1

NAME                                                             KIND         STATUS              AGE  INFO
⟳ rollout-bg-test-liveness-fail                                  Rollout      ◌ Progressing       15m
├──# revision:2
│  ├──⧉ rollout-bg-test-liveness-fail-7bb5898d6                  ReplicaSet   ◌ Progressing       43s  active
│  │  ├──□ rollout-bg-test-liveness-fail-7bb5898d6-tw8lt         Pod          ✖ CrashLoopBackOff  43s  ready:0/1,restarts:3
│  │  └──□ rollout-bg-test-liveness-fail-7bb5898d6-wb944         Pod          ✖ CrashLoopBackOff  43s  ready:0/1,restarts:4
│  └──α rollout-bg-test-liveness-fail-7bb5898d6-2-post           AnalysisRun  ✔ Successful        42s  ✔ 1
│     └──⊞ 3bf49f08-e851-4185-8cbf-88886c0da2ec.test-analysis.1  Job          ✔ Successful        42s
└──# revision:1
   └──⧉ rollout-bg-test-liveness-fail-54495f7df4                 ReplicaSet   • ScaledDown        15m

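As shown above, when the Liveness Probe fails, the Pods are still created once and traffic is still switched over, but the containers keep restarting for as long as the probe fails. The Rollout status also stays at Progressing. To bring it back to Healthy, you again need to redeploy a Rollout that works correctly (here, one with the Liveness Probe setting fixed).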
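For reference, the fix in this case would presumably be to point the probe back at the port the container actually listens on. The following is a minimal sketch of the relevant fragment of the Rollout's Pod template, assuming the values from the manifest as it was before the edit (port 80); it is not a complete manifest.

```yaml
# Fragment of spec.template.spec.containers[0] in the Rollout (not a complete manifest).
# Assumption: all values are taken from the original manifest before the edit.
livenessProbe:
  tcpSocket:
    port: 80              # matches containerPort: 80, so the TCP check can succeed
  initialDelaySeconds: 5
  periodSeconds: 5
  timeoutSeconds: 1
  successThreshold: 1
  failureThreshold: 1
```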

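The AnalysisTemplate used here was set up to run before the Liveness Probe, and it always succeeds once it runs, so it did not do much. On the other hand, a better-designed Analysis (for example, checking connectivity to the Pods for a certain period after startup?) might solve this problem. Also, when using Blue/Green Deployment, you can set prePromotionAnalysis to run an analysis before the switchover. I think this could be used to run an Analysis that detects repeated container restarts caused by a Liveness Probe setting (misconfiguration?) before traffic is switched.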
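The idea above could be sketched roughly as follows. This was not verified in this article, so treat it as an assumption-laden sketch rather than a working setup: the template name connectivity-check, the preview Service rollout-preview-service-liveness-fail, and the curlimages/curl image are all hypothetical choices, and whether repeated curl requests actually catch the restart loop depends on timing.

```yaml
# Sketch only: a hypothetical AnalysisTemplate that probes the preview Service
# several times instead of running a single always-successful Job.
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: connectivity-check          # hypothetical name
spec:
  metrics:
  - name: connectivity-check
    interval: 10s                   # take one measurement every 10 seconds
    count: 6                        # six measurements in total
    failureLimit: 0                 # a single failed measurement fails the AnalysisRun
    provider:
      job:
        spec:
          backoffLimit: 0
          template:
            spec:
              restartPolicy: Never
              containers:
              - name: curl
                image: curlimages/curl   # assumption: any image that ships curl works
                command:
                - curl
                - --fail
                - --silent
                - --max-time
                - "3"
                # hypothetical preview Service, exposing port 8080 like the active Service
                - http://rollout-preview-service-liveness-fail:8080/
---
# Fragment of the Rollout's spec (not a complete manifest):
# run the check before promotion instead of after it.
strategy:
  blueGreen:
    activeService: rollout-active-service-liveness-fail
    previewService: rollout-preview-service-liveness-fail   # hypothetical Service
    prePromotionAnalysis:
      templates:
      - templateName: connectivity-check
```

If the AnalysisRun fails before promotion, the active Service should keep pointing at the old ReplicaSet, so the broken Pods would never receive production traffic.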

When a problem is detected through monitoring metrics such as Prometheus

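I do not verify it in this article, but in Argo Rollouts an AnalysisTemplate can use metrics from Prometheus and other sources. This makes it possible to roll back automatically when a problem is observed in the application before or after the deployment and release of a new version completes.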

The official documentation introduces the following example manifest. It defines the success criterion for the Analysis (successCondition) and the actual Analysis (provider.prometheus.query), and rolls back when the condition is not met.

apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  args:
  - name: service-name
  - name: prometheus-port
    value: 9090
  metrics:
  - name: success-rate
    successCondition: result[0] >= 0.95
    provider:
      prometheus:
        address: "http://prometheus.example.com:{{args.prometheus-port}}"
        query: |
          sum(irate(
            istio_requests_total{reporter="source",destination_service=~"{{args.service-name}}",response_code!~"5.*"}[5m]
          )) / 
          sum(irate(
            istio_requests_total{reporter="source",destination_service=~"{{args.service-name}}"}[5m]
          ))
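A Rollout would then reference this template and pass the argument that has no default value. The fragment below is a minimal sketch that was not verified in this article; the Service name and the service-name value are hypothetical, and prometheus-port is omitted because the template already supplies a default.

```yaml
# Fragment of a Rollout's spec (not a complete manifest).
strategy:
  blueGreen:
    activeService: rollout-active-service   # hypothetical Service name
    postPromotionAnalysis:
      templates:
      - templateName: success-rate
      args:
      - name: service-name
        # hypothetical value; must match the destination_service label in Prometheus
        value: rollout-active-service.default.svc.cluster.local
```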

References