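Argo Rollouts offers deployment and release strategies that are more advanced than the Rolling Update built into Kubernetes. Among them is Progressive Delivery, in which an analysis runs after a deployment and its result is used to evaluate the release.
Argo Rollouts ships CRDs for analysis, such as AnalysisTemplate and AnalysisRun, and it can roll back automatically based on their results.
In this post I try out the automatic rollback available in Argo Rollouts. For an overview of Argo Rollouts, see my previous post.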
Test environment
The environment used for this post is as follows.
- Service: Amazon EKS
- Provisioning: eksctl
- Local environment: WSL (Ubuntu 18.04.4)
Deploying Argo Rollouts
First, deploy Argo Rollouts to the Kubernetes cluster so that it can be used.
# Deploy Argo Rollouts
$ git clone https://github.com/argoproj/argo-rollouts.git
$ cd argo-rollouts/manifests/
$ kubectl create ns argo-rollouts
namespace/argo-rollouts created
$ kubectl get ns
NAME              STATUS   AGE
argo-rollouts     Active   5s
default           Active   63m
kube-node-lease   Active   63m
kube-public       Active   63m
kube-system       Active   63m
$ kubectl apply -n argo-rollouts -f install.yaml
customresourcedefinition.apiextensions.k8s.io/analysisruns.argoproj.io created
customresourcedefinition.apiextensions.k8s.io/analysistemplates.argoproj.io created
customresourcedefinition.apiextensions.k8s.io/clusteranalysistemplates.argoproj.io created
customresourcedefinition.apiextensions.k8s.io/experiments.argoproj.io created
customresourcedefinition.apiextensions.k8s.io/rollouts.argoproj.io created
serviceaccount/argo-rollouts created
role.rbac.authorization.k8s.io/argo-rollouts-role created
clusterrole.rbac.authorization.k8s.io/argo-rollouts-aggregate-to-admin created
clusterrole.rbac.authorization.k8s.io/argo-rollouts-aggregate-to-edit created
clusterrole.rbac.authorization.k8s.io/argo-rollouts-aggregate-to-view created
clusterrole.rbac.authorization.k8s.io/argo-rollouts-clusterrole created
rolebinding.rbac.authorization.k8s.io/argo-rollouts-role-binding created
clusterrolebinding.rbac.authorization.k8s.io/argo-rollouts-clusterrolebinding created
service/argo-rollouts-metrics created
deployment.apps/argo-rollouts created

# Check after deployment
$ kubectl get all -n argo-rollouts
NAME                                 READY   STATUS    RESTARTS   AGE
pod/argo-rollouts-8454b64759-rhf47   1/1     Running   0          9s

NAME                            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
service/argo-rollouts-metrics   ClusterIP   10.100.217.123   <none>        8090/TCP   10s

NAME                            READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/argo-rollouts   1/1     1            1           9s

NAME                                       DESIRED   CURRENT   READY   AGE
replicaset.apps/argo-rollouts-8454b64759   1         1         1       9s
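The manual check above could also be scripted. The following is a minimal sketch, not part of the original procedure: the function name `verify_argo_rollouts` and the 120-second timeout are my own assumptions, while the Deployment name and the CRD names are the ones printed by `kubectl apply` above.

```shell
# Sketch: confirm that the Argo Rollouts controller and its CRDs are in place.
# The function name and the 120s timeout are illustrative assumptions.
verify_argo_rollouts() {
  local ns="${1:-argo-rollouts}"
  local crd

  # Block until the controller Deployment reports its replicas as available.
  kubectl -n "$ns" rollout status deployment/argo-rollouts --timeout=120s || return 1

  # Each CRD created by install.yaml should be retrievable by name.
  for crd in rollouts analysistemplates analysisruns experiments; do
    if ! kubectl get crd "${crd}.argoproj.io" >/dev/null; then
      echo "missing CRD: ${crd}.argoproj.io" >&2
      return 1
    fi
  done

  echo "Argo Rollouts is ready in namespace ${ns}"
}

# Usage (requires access to the cluster):
#   verify_argo_rollouts argo-rollouts
```

The function returns a non-zero status as soon as one check fails, so it can be chained in front of the steps that follow.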
Without an AnalysisTemplate
From here on we actually use Argo Rollouts. This post looks at how it behaves with Blue/Green Deployment, trying it both without and with an AnalysisTemplate to see what the AnalysisTemplate changes.
Let's start with the case without an AnalysisTemplate.
Deploying the Rollout
I used the following manifests for the Rollout and the Service. To use Blue/Green Deployment with Argo Rollouts, you must specify a Service as activeService. If the specified Service does not exist, no Pods are created even after the Rollout itself has been created.
rollout-bg-test.yml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: rollout-bg-test
spec:
  replicas: 2
  revisionHistoryLimit: 2
  selector:
    matchLabels:
      app: rollout-bg
  template:
    metadata:
      labels:
        app: rollout-bg
    spec:
      containers:
      - name: nginx-container
        image: nginx:latest
        ports:
        - containerPort: 80
  strategy:
    blueGreen:
      autoPromotionEnabled: true
      activeService: rollout-active-service
rollout-service.yml
apiVersion: v1
kind: Service
metadata:
  name: rollout-active-service
spec:
  ports:
  - port: 8080
    targetPort: 80
    protocol: TCP
  selector:
    app: rollout-bg
Deploy the two resources above.
# Create the Rollout
$ kubectl apply -f rollout-service.yml
service/rollout-active-service created
$ kubectl apply -f rollout-bg-test.yml
rollout.argoproj.io/rollout-bg-test created

# Check after deployment
$ kubectl get svc
NAME                     TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
kubernetes               ClusterIP   10.100.0.1      <none>        443/TCP    21h
rollout-active-service   ClusterIP   10.100.121.64   <none>        8080/TCP   33s
$ kubectl get pods
NAME                               READY   STATUS    RESTARTS   AGE
rollout-bg-test-797d88cdd8-4ww9s   1/1     Running   0          14s
rollout-bg-test-797d88cdd8-stc74   1/1     Running   0          14s
$ kubectl get rollout
NAME              DESIRED   CURRENT   UP-TO-DATE   AVAILABLE
rollout-bg-test   2         2         2            2
$ kubectl argo rollouts get rollout rollout-bg-test
Name:            rollout-bg-test
Namespace:       default
Status:          ✔ Healthy
Strategy:        BlueGreen
Images:          nginx:latest (active)
Replicas:
  Desired:       2
  Current:       2
  Updated:       2
  Ready:         2
  Available:     2

NAME  KIND  STATUS  AGE  INFO
⟳ rollout-bg-test  Rollout  ✔ Healthy  62s
└──# revision:1
   └──⧉ rollout-bg-test-797d88cdd8  ReplicaSet  ✔ Healthy  62s  active
      ├──□ rollout-bg-test-797d88cdd8-4ww9s  Pod  ✔ Running  62s  ready:1/1
      └──□ rollout-bg-test-797d88cdd8-stc74  Pod  ✔ Running  62s  ready:1/1
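Because a Rollout whose activeService is missing creates no Pods, it may help to check for the Service before applying the Rollout. The following is a minimal sketch of such a guard; the function name `apply_rollout_with_guard` is hypothetical, and it only relies on `kubectl get service` returning a non-zero status when the Service is not found.

```shell
# Sketch: apply a Rollout manifest only when its activeService already exists.
# The function name is a hypothetical helper, not part of Argo Rollouts.
apply_rollout_with_guard() {
  local service="$1"
  local manifest="$2"

  # kubectl get exits non-zero when the named Service does not exist.
  if ! kubectl get service "$service" >/dev/null 2>&1; then
    echo "Service ${service} not found; apply it before the Rollout" >&2
    return 1
  fi

  kubectl apply -f "$manifest"
}

# Usage (requires access to the cluster):
#   apply_rollout_with_guard rollout-active-service rollout-bg-test.yml
```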
Updating the Rollout
Next, change the image tag of the deployed Rollout and watch how it is updated.
# Open another window and run this beforehand
$ kubectl argo rollouts get rollout rollout-bg-test -w

# Change the image tag
$ kubectl argo rollouts set image rollout-bg-test nginx-container=nginx:stable
rollout "rollout-bg-test" image updated

# Watch the update progress
## Right after changing the image tag
Name:            rollout-bg-test
Namespace:       default
Status:          ✔ Healthy
Strategy:        BlueGreen
Images:          nginx:latest (active)
Replicas:
  Desired:       2
  Current:       2
  Updated:       2
  Ready:         2
  Available:     2

NAME  KIND  STATUS  AGE  INFO
⟳ rollout-bg-test  Rollout  ✔ Healthy  2m49s
└──# revision:1
   └──⧉ rollout-bg-test-797d88cdd8  ReplicaSet  ✔ Healthy  2m49s  active
      ├──□ rollout-bg-test-797d88cdd8-4ww9s  Pod  ✔ Running  2m49s  ready:1/1
      └──□ rollout-bg-test-797d88cdd8-stc74  Pod  ✔ Running  2m49s  ready:1/1

## Update starts
Name:            rollout-bg-test
Namespace:       default
Status:          ✔ Healthy
Strategy:        BlueGreen
Images:          nginx:latest (active)
Replicas:
  Desired:       2
  Current:       2
  Updated:       2
  Ready:         2
  Available:     2

NAME  KIND  STATUS  AGE  INFO
⟳ rollout-bg-test  Rollout  ✔ Healthy  2m50s
├──# revision:2
│  └──⧉ rollout-bg-test-5fd48d44d  ReplicaSet  ◌ Progressing  0s
└──# revision:1
   └──⧉ rollout-bg-test-797d88cdd8  ReplicaSet  ✔ Healthy  2m50s  active
      ├──□ rollout-bg-test-797d88cdd8-4ww9s  Pod  ✔ Running  2m50s  ready:1/1
      └──□ rollout-bg-test-797d88cdd8-stc74  Pod  ✔ Running  2m50s  ready:1/1

Name:            rollout-bg-test
Namespace:       default
Status:          ◌ Progressing
Strategy:        BlueGreen
Images:          nginx:latest (active)
                 nginx:stable
Replicas:
  Desired:       2
  Current:       4
  Updated:       2
  Ready:         2
  Available:     0

NAME  KIND  STATUS  AGE  INFO
⟳ rollout-bg-test  Rollout  ◌ Progressing  2m50s
├──# revision:2
│  └──⧉ rollout-bg-test-5fd48d44d  ReplicaSet  ◌ Progressing  0s
│     ├──□ rollout-bg-test-5fd48d44d-89nnw  Pod  ◌ ContainerCreating  0s  ready:0/1
│     └──□ rollout-bg-test-5fd48d44d-s8ntr  Pod  ◌ ContainerCreating  0s  ready:0/1
└──# revision:1
   └──⧉ rollout-bg-test-797d88cdd8  ReplicaSet  ✔ Healthy  2m50s  active
      ├──□ rollout-bg-test-797d88cdd8-4ww9s  Pod  ✔ Running  2m50s  ready:1/1
      └──□ rollout-bg-test-797d88cdd8-stc74  Pod  ✔ Running  2m50s  ready:1/1

## Switch complete
## The revision 2 ReplicaSet becomes active, and the time until revision 1 is deleted counts down
Name:            rollout-bg-test
Namespace:       default
Status:          ◌ Progressing
Strategy:        BlueGreen
Images:          nginx:latest
                 nginx:stable (active)
Replicas:
  Desired:       2
  Current:       4
  Updated:       2
  Ready:         4
  Available:     2

NAME  KIND  STATUS  AGE  INFO
⟳ rollout-bg-test  Rollout  ◌ Progressing  2m52s
├──# revision:2
│  └──⧉ rollout-bg-test-5fd48d44d  ReplicaSet  ✔ Healthy  1s  active
│     ├──□ rollout-bg-test-5fd48d44d-89nnw  Pod  ✔ Running  1s  ready:1/1
│     └──□ rollout-bg-test-5fd48d44d-s8ntr  Pod  ✔ Running  1s  ready:1/1
└──# revision:1
   └──⧉ rollout-bg-test-797d88cdd8  ReplicaSet  ✔ Healthy  2m52s  delay:30s
      ├──□ rollout-bg-test-797d88cdd8-4ww9s  Pod  ✔ Running  2m52s  ready:1/1
      └──□ rollout-bg-test-797d88cdd8-stc74  Pod  ✔ Running  2m52s  ready:1/1

## After completion
Name:            rollout-bg-test
Namespace:       default
Status:          ✔ Healthy
Strategy:        BlueGreen
Images:          nginx:stable (active)
Replicas:
  Desired:       2
  Current:       2
  Updated:       2
  Ready:         2
  Available:     2

NAME  KIND  STATUS  AGE  INFO
⟳ rollout-bg-test  Rollout  ✔ Healthy  6m7s
├──# revision:2
│  └──⧉ rollout-bg-test-5fd48d44d  ReplicaSet  ✔ Healthy  3m16s  active
│     ├──□ rollout-bg-test-5fd48d44d-89nnw  Pod  ✔ Running  3m16s  ready:1/1
│     └──□ rollout-bg-test-5fd48d44d-s8ntr  Pod  ✔ Running  3m16s  ready:1/1
└──# revision:1
   └──⧉ rollout-bg-test-797d88cdd8  ReplicaSet  • ScaledDown  6m7s
The resources after completion are as follows.
$ kubectl get rollout NAME DESIRED CURRENT UP-TO-DATE AVAILABLE rollout-bg-test 2 2 2 2 $ kubectl describe rollout Name: rollout-bg-test Namespace: default Labels: <none> Annotations: rollout.argoproj.io/revision: 2 API Version: argoproj.io/v1alpha1 Kind: Rollout Metadata: Creation Timestamp: 2020-10-10T01:57:48Z Generation: 14 Resource Version: 237409 Self Link: /apis/argoproj.io/v1alpha1/namespaces/default/rollouts/rollout-bg-test UID: d5ba96ef-2292-4f3c-91f1-cd0427c50f59 Spec: Replicas: 2 Revision History Limit: 2 Selector: Match Labels: App: rollout-bg Strategy: Blue Green: Active Service: rollout-active-service Auto Promotion Enabled: true Template: Metadata: Creation Timestamp: <nil> Labels: App: rollout-bg Spec: Containers: Image: nginx:stable Name: nginx-container Ports: Container Port: 80 Resources: Status: HPA Replicas: 2 Available Replicas: 2 Blue Green: Active Selector: 5fd48d44d Canary: Conditions: Last Transition Time: 2020-10-10T01:57:52Z Last Update Time: 2020-10-10T01:57:52Z Message: Rollout has minimum availability Reason: AvailableReason Status: True Type: Available Last Transition Time: 2020-10-10T01:57:48Z Last Update Time: 2020-10-10T02:00:41Z Message: ReplicaSet "rollout-bg-test-5fd48d44d" has successfully progressed. 
Reason: NewReplicaSetAvailable Status: True Type: Progressing Current Pod Hash: 5fd48d44d Observed Generation: 576f58fbb8 Ready Replicas: 2 Replicas: 2 Selector: app=rollout-bg,rollouts-pod-template-hash=5fd48d44d Stable RS: 5fd48d44d Updated Replicas: 2 Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal ScalingReplicaSet 7m52s rollouts-controller Scaled up replica set rollout-bg-test-797d88cdd8 to 2 Normal SwitchService 7m48s rollouts-controller Switched selector for service 'rollout-active-service' to value '797d88cdd8' Normal ScalingReplicaSet 5m1s rollouts-controller Scaled up replica set rollout-bg-test-5fd48d44d to 2 Normal SwitchService 4m59s rollouts-controller Switched selector for service 'rollout-active-service' to value '5fd48d44d' Normal ScalingReplicaSet 4m29s rollouts-controller Scaled down replica set rollout-bg-test-797d88cdd8 to 0
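When the tree printed by `kubectl argo rollouts get rollout` gets long, it can be handy to pull out only the ReplicaSet that is currently active. The following is a small text-processing sketch; the function name `active_replicaset` is hypothetical, and it assumes the output layout shown in this post, where the active ReplicaSet line ends with the word `active`.

```shell
# Sketch: print the name of the ReplicaSet marked "active" in the output of
# `kubectl argo rollouts get rollout`. Assumes the layout shown in this post.
active_replicaset() {
  awk '
    $NF == "active" {
      # The name is the field immediately before the KIND column.
      for (i = 2; i <= NF; i++) {
        if ($i == "ReplicaSet") { print $(i - 1) }
      }
    }
  '
}

# Usage (requires access to the cluster):
#   kubectl argo rollouts get rollout rollout-bg-test | active_replicaset
```

Since the function only reads standard input, it can also be tried against output saved to a file.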
With an AnalysisTemplate
From here, in addition to the Rollout and the Service, we create an AnalysisTemplate resource so that an Analysis runs when the rollout is executed.
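To see both the success case and the failure case, I first prepare an AnalysisTemplate whose Job returns exit 0 (so the Analysis always succeeds). After deploying it, I edit the AnalysisTemplate with the kubectl edit command so that the Job returns exit 1 (so the Analysis always fails) and then look at the failure case.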
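An AnalysisTemplate that is declared in a Rollout resource is executed when the Rollout is updated. Strictly speaking, what happens is that an AnalysisRun, a custom resource that performs the analysis, is created, and it runs the analysis as defined in the AnalysisTemplate.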
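The analysis defined in an AnalysisTemplate can draw on several kinds of metric providers. Here I used a Job, one of the standard Kubernetes resources. When the AnalysisRun runs the analysis, a Job is created for it, and the analysis succeeds if the Job completes successfully.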
With Blue/Green Deployment, Argo Rollouts lets you choose when the Analysis runs: prePromotionAnalysis runs it before the switch, and postPromotionAnalysis runs it after the switch. Here I use postPromotionAnalysis so that the Analysis runs after the switch and, if it fails, traffic is rolled back to the previous version.
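For comparison, the prePromotionAnalysis variant, which I did not run in this post, could look like the sketch below. It writes a variant of the Rollout manifest used later in this post with the analysis moved in front of the switch. The helper name `write_pre_promotion_rollout` and the output file name are hypothetical; the field `prePromotionAnalysis` takes the same `templates` list as `postPromotionAnalysis`.

```shell
# Sketch: write a Rollout manifest that runs the analysis BEFORE promotion.
# The helper name and the default file name are hypothetical.
write_pre_promotion_rollout() {
  local path="${1:-rollout-bg-test-analysis-pre.yml}"
  cat > "$path" <<'EOF'
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: rollout-bg-test-analysis
spec:
  replicas: 2
  revisionHistoryLimit: 2
  selector:
    matchLabels:
      app: rollout-bg-analysis
  template:
    metadata:
      labels:
        app: rollout-bg-analysis
    spec:
      containers:
      - name: nginx-container
        image: nginx:latest
        ports:
        - containerPort: 80
  strategy:
    blueGreen:
      activeService: rollout-active-service-analysis
      # Runs before the active Service is switched to the new ReplicaSet.
      prePromotionAnalysis:
        templates:
        - templateName: test-analysis
EOF
  echo "$path"
}

# Usage:
#   kubectl apply -f "$(write_pre_promotion_rollout)"
```

With this variant a failing analysis should stop the rollout before traffic ever reaches the new version, whereas postPromotionAnalysis switches first and rolls back afterwards.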
Deploying the Rollout
Here we use the following three files.
rollout-bg-test-analysis.yml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: rollout-bg-test-analysis
spec:
  replicas: 2
  revisionHistoryLimit: 2
  selector:
    matchLabels:
      app: rollout-bg-analysis
  template:
    metadata:
      labels:
        app: rollout-bg-analysis
    spec:
      containers:
      - name: nginx-container
        image: nginx:latest
        ports:
        - containerPort: 80
  strategy:
    blueGreen:
      activeService: rollout-active-service-analysis
      postPromotionAnalysis:
        templates:
        - templateName: test-analysis
rollout-active-service-analysis.yml
apiVersion: v1
kind: Service
metadata:
  name: rollout-active-service-analysis
spec:
  ports:
  - port: 8080
    targetPort: 80
    protocol: TCP
  selector:
    app: rollout-bg-analysis
test-analysistemp.yml
kind: AnalysisTemplate
apiVersion: argoproj.io/v1alpha1
metadata:
  name: test-analysis
spec:
  metrics:
  - name: test-analysis
    provider:
      job:
        spec:
          template:
            spec:
              containers:
              - name: sleep
                image: alpine:3.8
                command: [sh, -c]
                args: [exit 0]
              restartPolicy: Never
          backoffLimit: 1
Deploy the three files above.
# Deploy
$ kubectl apply -f test-analysistemp.yml
analysistemplate.argoproj.io/test-analysis created
$ kubectl apply -f rollout-active-service-analysis.yml
service/rollout-active-service-analysis created
$ kubectl apply -f rollout-bg-test-analysis.yml
rollout.argoproj.io/rollout-bg-test-analysis created

# Check after deployment
$ kubectl get analysistemplate
NAME            AGE
test-analysis   23s
$ kubectl get svc
NAME                              TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
kubernetes                        ClusterIP   10.100.0.1      <none>        443/TCP    21h
rollout-active-service-analysis   ClusterIP   10.100.255.31   <none>        8080/TCP   20s
$ kubectl get pods
NAME                                        READY   STATUS    RESTARTS   AGE
rollout-bg-test-analysis-766b7567dc-qgpzx   1/1     Running   0          15s
rollout-bg-test-analysis-766b7567dc-tzlft   1/1     Running   0          15s
$ kubectl get rollout
NAME                       DESIRED   CURRENT   UP-TO-DATE   AVAILABLE
rollout-bg-test-analysis   2         2         2            2
$ kubectl argo rollouts get rollout rollout-bg-test-analysis
Name:            rollout-bg-test-analysis
Namespace:       default
Status:          ✔ Healthy
Strategy:        BlueGreen
Images:          nginx:latest (active)
Replicas:
  Desired:       2
  Current:       2
  Updated:       2
  Ready:         2
  Available:     2

NAME  KIND  STATUS  AGE  INFO
⟳ rollout-bg-test-analysis  Rollout  ✔ Healthy  26s
└──# revision:1
   └──⧉ rollout-bg-test-analysis-766b7567dc  ReplicaSet  ✔ Healthy  26s  active
      ├──□ rollout-bg-test-analysis-766b7567dc-qgpzx  Pod  ✔ Running  26s  ready:1/1
      └──□ rollout-bg-test-analysis-766b7567dc-tzlft  Pod  ✔ Running  26s  ready:1/1
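The `exit 0` Job above is only a placeholder that always succeeds. A Job that actually checks something could be sketched as follows: it requests the active Service over HTTP and fails when the request fails. This template was not used in this post; the template name `http-check-analysis`, the helper name, and the file name are hypothetical, and the sketch assumes the Job runs in the same namespace as `rollout-active-service-analysis` and that the `wget` bundled with the alpine image is sufficient.

```shell
# Sketch: write an AnalysisTemplate whose Job performs an HTTP check.
# Template name, helper name, and file name are hypothetical.
write_http_check_template() {
  local path="${1:-http-check-analysistemp.yml}"
  cat > "$path" <<'EOF'
kind: AnalysisTemplate
apiVersion: argoproj.io/v1alpha1
metadata:
  name: http-check-analysis
spec:
  metrics:
  - name: http-check
    provider:
      job:
        spec:
          template:
            spec:
              containers:
              - name: http-check
                image: alpine:3.8
                command: [sh, -c]
                # wget exits non-zero when the request fails, which fails the Job.
                args: ["wget -q -O /dev/null http://rollout-active-service-analysis:8080/"]
              restartPolicy: Never
          backoffLimit: 1
EOF
  echo "$path"
}

# Usage:
#   kubectl apply -f "$(write_http_check_template)"
```

To use it, the `templateName` in the Rollout would have to point at `http-check-analysis` instead of `test-analysis`.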
Updating the Rollout (when the Analysis succeeds)
First, let's look at the case where the Analysis succeeds. As before, update the image tag.
# Run beforehand in another terminal
$ kubectl argo rollouts get rollout rollout-bg-test-analysis --watch

# Update the image tag
$ kubectl argo rollouts set image rollout-bg-test-analysis nginx-container=nginx:stable
rollout "rollout-bg-test-analysis" image updated

# Watch the update progress
## Right after the start
Name:            rollout-bg-test-analysis
Namespace:       default
Status:          ✔ Healthy
Strategy:        BlueGreen
Images:          nginx:latest (active)
Replicas:
  Desired:       2
  Current:       2
  Updated:       2
  Ready:         2
  Available:     2

NAME  KIND  STATUS  AGE  INFO
⟳ rollout-bg-test-analysis  Rollout  ✔ Healthy  2m9s
├──# revision:2
│  └──⧉ rollout-bg-test-analysis-6bcfbc585f  ReplicaSet  ◌ Progressing  0s
└──# revision:1
   └──⧉ rollout-bg-test-analysis-766b7567dc  ReplicaSet  ✔ Healthy  2m9s  active
      ├──□ rollout-bg-test-analysis-766b7567dc-qgpzx  Pod  ✔ Running  2m9s  ready:1/1
      └──□ rollout-bg-test-analysis-766b7567dc-tzlft  Pod  ✔ Running  2m9s  ready:1/1

## New Pods are created
Name:            rollout-bg-test-analysis
Namespace:       default
Status:          ◌ Progressing
Strategy:        BlueGreen
Images:          nginx:latest (active)
                 nginx:stable
Replicas:
  Desired:       2
  Current:       4
  Updated:       2
  Ready:         3
  Available:     1

NAME  KIND  STATUS  AGE  INFO
⟳ rollout-bg-test-analysis  Rollout  ◌ Progressing  2m10s
├──# revision:2
│  └──⧉ rollout-bg-test-analysis-6bcfbc585f  ReplicaSet  ◌ Progressing  0s
│     ├──□ rollout-bg-test-analysis-6bcfbc585f-cq2q5  Pod  ✔ Running  0s  ready:1/1
│     └──□ rollout-bg-test-analysis-6bcfbc585f-dtxdc  Pod  ✔ Running  0s  ready:1/1
└──# revision:1
   └──⧉ rollout-bg-test-analysis-766b7567dc  ReplicaSet  ✔ Healthy  2m10s  active
      ├──□ rollout-bg-test-analysis-766b7567dc-qgpzx  Pod  ✔ Running  2m10s  ready:1/1
      └──□ rollout-bg-test-analysis-766b7567dc-tzlft  Pod  ✔ Running  2m10s  ready:1/1

## Traffic switch complete and Analysis starts
Name:            rollout-bg-test-analysis
Namespace:       default
Status:          ◌ Progressing
Strategy:        BlueGreen
Images:          nginx:latest
                 nginx:stable (active)
Replicas:
  Desired:       2
  Current:       4
  Updated:       2
  Ready:         4
  Available:     2

NAME  KIND  STATUS  AGE  INFO
⟳ rollout-bg-test-analysis  Rollout  ◌ Progressing  2m10s
├──# revision:2
│  ├──⧉ rollout-bg-test-analysis-6bcfbc585f  ReplicaSet  ✔ Healthy  0s  active
│  │  ├──□ rollout-bg-test-analysis-6bcfbc585f-cq2q5  Pod  ✔ Running  0s  ready:1/1
│  │  └──□ rollout-bg-test-analysis-6bcfbc585f-dtxdc  Pod  ✔ Running  0s  ready:1/1
│  └──α rollout-bg-test-analysis-6bcfbc585f-2  AnalysisRun  ◌ Running  0s
│     └──⊞ 471f5e5b-553b-4f94-bae3-cefca88afcd6.test-analysis.1  Job  ◌ Running  0s
└──# revision:1
   └──⧉ rollout-bg-test-analysis-766b7567dc  ReplicaSet  ✔ Healthy  2m10s  delay:30s
      ├──□ rollout-bg-test-analysis-766b7567dc-qgpzx  Pod  ✔ Running  2m10s  ready:1/1
      └──□ rollout-bg-test-analysis-766b7567dc-tzlft  Pod  ✔ Running  2m10s  ready:1/1

## Analysis complete (success)
## The AnalysisRun result is shown as Successful
Name:            rollout-bg-test-analysis
Namespace:       default
Status:          ◌ Progressing
Strategy:        BlueGreen
Images:          nginx:latest
                 nginx:stable (active)
Replicas:
  Desired:       2
  Current:       4
  Updated:       2
  Ready:         4
  Available:     2

NAME  KIND  STATUS  AGE  INFO
⟳ rollout-bg-test-analysis  Rollout  ◌ Progressing  2m11s
├──# revision:2
│  ├──⧉ rollout-bg-test-analysis-6bcfbc585f  ReplicaSet  ✔ Healthy  1s  active
│  │  ├──□ rollout-bg-test-analysis-6bcfbc585f-cq2q5  Pod  ✔ Running  1s  ready:1/1
│  │  └──□ rollout-bg-test-analysis-6bcfbc585f-dtxdc  Pod  ✔ Running  1s  ready:1/1
│  └──α rollout-bg-test-analysis-6bcfbc585f-2  AnalysisRun  ✔ Successful  0s  ✔ 1
│     └──⊞ 471f5e5b-553b-4f94-bae3-cefca88afcd6.test-analysis.1  Job  ✔ Successful  0s
└──# revision:1
   └──⧉ rollout-bg-test-analysis-766b7567dc  ReplicaSet  ✔ Healthy  2m11s  delay:29s
      ├──□ rollout-bg-test-analysis-766b7567dc-qgpzx  Pod  ✔ Running  2m11s  ready:1/1
      └──□ rollout-bg-test-analysis-766b7567dc-tzlft  Pod  ✔ Running  2m11s  ready:1/1

## After completion
Name:            rollout-bg-test-analysis
Namespace:       default
Status:          ✔ Healthy
Strategy:        BlueGreen
Images:          nginx:stable (active)
Replicas:
  Desired:       2
  Current:       2
  Updated:       2
  Ready:         2
  Available:     2

NAME  KIND  STATUS  AGE  INFO
⟳ rollout-bg-test-analysis  Rollout  ✔ Healthy  2m45s
├──# revision:2
│  ├──⧉ rollout-bg-test-analysis-6bcfbc585f  ReplicaSet  ✔ Healthy  35s  active
│  │  ├──□ rollout-bg-test-analysis-6bcfbc585f-cq2q5  Pod  ✔ Running  35s  ready:1/1
│  │  └──□ rollout-bg-test-analysis-6bcfbc585f-dtxdc  Pod  ✔ Running  35s  ready:1/1
│  └──α rollout-bg-test-analysis-6bcfbc585f-2  AnalysisRun  ✔ Successful  34s  ✔ 1
│     └──⊞ 471f5e5b-553b-4f94-bae3-cefca88afcd6.test-analysis.1  Job  ✔ Successful  34s
└──# revision:1
   └──⧉ rollout-bg-test-analysis-766b7567dc  ReplicaSet  • ScaledDown  2m45s
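This confirms that once the new version is deployed and traffic has been switched over, the Analysis runs, and that when the Analysis succeeds, traffic stays on the new version and the old version is removed.
The resources after completion are shown below. Running the rollout creates and executes an AnalysisRun resource, whose result you can inspect.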
$ kubectl get pods NAME READY STATUS RESTARTS AGE 471f5e5b-553b-4f94-bae3-cefca88afcd6.test-analysis.1-8hz2s 0/1 Completed 0 4m26s rollout-bg-test-analysis-6bcfbc585f-cq2q5 1/1 Running 0 4m27s rollout-bg-test-analysis-6bcfbc585f-dtxdc 1/1 Running 0 4m27s $ kubectl get analysisrun NAME STATUS rollout-bg-test-analysis-6bcfbc585f-2 Successful $ kubectl get rollout NAME DESIRED CURRENT UP-TO-DATE AVAILABLE rollout-bg-test-analysis 2 2 2 2 $ kubectl describe analysisrun rollout-bg-test-analysis-6bcfbc585f-2 Name: rollout-bg-test-analysis-6bcfbc585f-2 Namespace: default Labels: rollout-type=PostPromotion rollouts-pod-template-hash=6bcfbc585f Annotations: rollout.argoproj.io/revision: 2 API Version: argoproj.io/v1alpha1 Kind: AnalysisRun Metadata: Creation Timestamp: 2020-10-10T02:27:13Z Generation: 3 Owner References: API Version: argoproj.io/v1alpha1 Block Owner Deletion: true Controller: true Kind: Rollout Name: rollout-bg-test-analysis UID: e9ba7057-29db-491a-8618-13180c406fef Resource Version: 242399 Self Link: /apis/argoproj.io/v1alpha1/namespaces/default/analysisruns/rollout-bg-test-analysis-6bcfbc585f-2 UID: 471f5e5b-553b-4f94-bae3-cefca88afcd6 Spec: Metrics: Name: test-analysis Provider: Job: Metadata: Creation Timestamp: <nil> Spec: Backoff Limit: 1 Template: Metadata: Creation Timestamp: <nil> Spec: Containers: Args: exit 0 Command: sh -c Image: alpine:3.8 Name: sleep Resources: Restart Policy: Never Status: Metric Results: Count: 1 Measurements: Finished At: 2020-10-10T02:27:14Z Metadata: Job - Name: 471f5e5b-553b-4f94-bae3-cefca88afcd6.test-analysis.1 Phase: Successful Started At: 2020-10-10T02:27:13Z Name: test-analysis Phase: Successful Successful: 1 Phase: Successful Started At: 2020-10-10T02:27:13Z Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal Complete 20m rollouts-controller metric 'test-analysis' completed Successful Normal Complete 20m rollouts-controller analysis completed Successful $ kubectl describe rollout 
rollout-bg-test-analysis Name: rollout-bg-test-analysis Namespace: default Labels: <none> Annotations: rollout.argoproj.io/revision: 2 API Version: argoproj.io/v1alpha1 Kind: Rollout Metadata: Creation Timestamp: 2020-10-10T02:25:02Z Generation: 16 Resource Version: 242503 Self Link: /apis/argoproj.io/v1alpha1/namespaces/default/rollouts/rollout-bg-test-analysis UID: e9ba7057-29db-491a-8618-13180c406fef Spec: Replicas: 2 Revision History Limit: 2 Selector: Match Labels: App: rollout-bg-analysis Strategy: Blue Green: Active Service: rollout-active-service-analysis Post Promotion Analysis: Templates: Template Name: test-analysis Template: Metadata: Creation Timestamp: <nil> Labels: App: rollout-bg-analysis Spec: Containers: Image: nginx:stable Name: nginx-container Ports: Container Port: 80 Resources: Status: HPA Replicas: 2 Available Replicas: 2 Blue Green: Active Selector: 6bcfbc585f Canary: Conditions: Last Transition Time: 2020-10-10T02:25:07Z Last Update Time: 2020-10-10T02:25:07Z Message: Rollout has minimum availability Reason: AvailableReason Status: True Type: Available Last Transition Time: 2020-10-10T02:25:02Z Last Update Time: 2020-10-10T02:27:13Z Message: ReplicaSet "rollout-bg-test-analysis-6bcfbc585f" has successfully progressed. 
Reason: NewReplicaSetAvailable Status: True Type: Progressing Current Pod Hash: 6bcfbc585f Observed Generation: 786976f646 Ready Replicas: 2 Replicas: 2 Selector: app=rollout-bg-analysis,rollouts-pod-template-hash=6bcfbc585f Stable RS: 6bcfbc585f Updated Replicas: 2 Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal ScalingReplicaSet 7m6s rollouts-controller Scaled up replica set rollout-bg-test-analysis-766b7567dc to 2 Normal SwitchService 7m1s rollouts-controller Switched selector for service 'rollout-active-service-analysis' to value '766b7567dc' Normal ScalingReplicaSet 4m56s rollouts-controller Scaled up replica set rollout-bg-test-analysis-6bcfbc585f to 2 Normal SwitchService 4m55s rollouts-controller Switched selector for service 'rollout-active-service-analysis' to value '6bcfbc585f' Normal AnalysisRunStatusChange 4m55s rollouts-controller PostPromotion Analysis Run 'rollout-bg-test-analysis-6bcfbc585f-2' Status New: '' Previous: 'NoPreviousStatus' Normal AnalysisRunStatusChange 4m55s rollouts-controller PostPromotion Analysis Run 'rollout-bg-test-analysis-6bcfbc585f-2' Status New: 'Running' Previous: '' Normal AnalysisRunStatusChange 4m54s rollouts-controller PostPromotion Analysis Run 'rollout-bg-test-analysis-6bcfbc585f-2' Status New: 'Successful' Previous: 'Running' Normal ScalingReplicaSet 4m25s rollouts-controller Scaled down replica set rollout-bg-test-analysis-766b7567dc to 0
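Instead of watching the terminal, the result of an AnalysisRun could also be polled from a script. The following is a minimal sketch under a few assumptions: the function name `wait_for_analysisrun`, the retry count, and the interval are my own choices, and it reads the `.status.phase` field that appears as `Phase` in the `kubectl describe analysisrun` output above.

```shell
# Sketch: poll an AnalysisRun until it reaches a terminal phase.
# Function name, attempt count, and interval are illustrative assumptions.
wait_for_analysisrun() {
  local name="$1"
  local attempts="${2:-30}"
  local interval="${3:-2}"
  local i=0
  local phase

  while [ "$i" -lt "$attempts" ]; do
    # .status.phase is empty until the controller has processed the run.
    phase="$(kubectl get analysisrun "$name" -o jsonpath='{.status.phase}' 2>/dev/null)"
    case "$phase" in
      Successful)
        echo "$phase"
        return 0
        ;;
      Failed|Error|Inconclusive)
        echo "$phase"
        return 1
        ;;
    esac
    sleep "$interval"
    i=$((i + 1))
  done

  echo "Timeout"
  return 2
}

# Usage (requires access to the cluster):
#   wait_for_analysisrun rollout-bg-test-analysis-6bcfbc585f-2
```

The exit status distinguishes success (0), a failed or inconclusive analysis (1), and a timeout (2), which makes the function usable as a gate in a pipeline.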
Updating the Rollout (when the Analysis fails)
Next, let's look at the case where the Analysis fails.
First, edit part of the deployed AnalysisTemplate so that the Analysis fails when it runs.
$ kubectl edit analysistemplate test-analysis
analysistemplate.argoproj.io/test-analysis edited

# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"argoproj.io/v1alpha1","kind":"AnalysisTemplate","metadata":{"annotations":{},"name":"test-analysis","namespace":"default"},"spec":{"metrics":[{"name":"test-analysis","provider":{"job":{"spec":{"backoffLimit":1,"template":{"spec":{"containers":[{"args":["exit 0"],"command":["sh","-c"],"image":"alpine:3.8","name":"sleep"}],"restartPolicy":"Never"}}}}}}]}}
  creationTimestamp: "2020-10-10T02:24:45Z"
  generation: 1
  name: test-analysis
  namespace: default
  resourceVersion: "241845"
  selfLink: /apis/argoproj.io/v1alpha1/namespaces/default/analysistemplates/test-analysis
  uid: 2fc98c6c-f40e-4903-a134-a6430aaa0b0d
spec:
  metrics:
  - name: test-analysis
    provider:
      job:
        spec:
          backoffLimit: 1
          template:
            spec:
              containers:
              - args:
                - exit 1 # changed
                command:
                - sh
                - -c
                image: alpine:3.8
                name: sleep
              restartPolicy: Never
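The same change could be made without opening an editor. The following sketch uses `kubectl patch` with a JSON patch that replaces the first argument of the first container. The function name `set_analysis_exit_code` is hypothetical, and the path assumes the template has exactly the structure shown above (one metric, one container, one argument).

```shell
# Sketch: switch the Job in the AnalysisTemplate between "exit 0" and "exit 1"
# without an interactive edit. The function name is a hypothetical helper.
set_analysis_exit_code() {
  local code="$1"
  local template="${2:-test-analysis}"
  local patch

  # JSON patch that replaces args[0] of the first container of the first metric.
  patch="$(printf '[{"op":"replace","path":"/spec/metrics/0/provider/job/spec/template/spec/containers/0/args/0","value":"exit %s"}]' "$code")"

  kubectl patch analysistemplate "$template" --type=json -p "$patch"
}

# Usage (requires access to the cluster):
#   set_analysis_exit_code 1   # make the Analysis fail
#   set_analysis_exit_code 0   # make it succeed again
```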
Once the edit is done, update the image tag as before and watch the update.
# Run beforehand in another terminal
$ kubectl argo rollouts get rollout rollout-bg-test-analysis --watch

# Update the image tag
$ kubectl argo rollouts set image rollout-bg-test-analysis nginx-container=nginx:1.19
rollout "rollout-bg-test-analysis" image updated

# Watch the update
## Right after the update
Name:            rollout-bg-test-analysis
Namespace:       default
Status:          ✔ Healthy
Strategy:        BlueGreen
Images:          nginx:stable (active)
Replicas:
  Desired:       2
  Current:       2
  Updated:       2
  Ready:         2
  Available:     2

NAME  KIND  STATUS  AGE  INFO
⟳ rollout-bg-test-analysis  Rollout  ✔ Healthy  10m
├──# revision:2
│  ├──⧉ rollout-bg-test-analysis-6bcfbc585f  ReplicaSet  ✔ Healthy  7m53s  active
│  │  ├──□ rollout-bg-test-analysis-6bcfbc585f-cq2q5  Pod  ✔ Running  7m53s  ready:1/1
│  │  └──□ rollout-bg-test-analysis-6bcfbc585f-dtxdc  Pod  ✔ Running  7m53s  ready:1/1
│  └──α rollout-bg-test-analysis-6bcfbc585f-2  AnalysisRun  ✔ Successful  7m52s  ✔ 1
│     └──⊞ 471f5e5b-553b-4f94-bae3-cefca88afcd6.test-analysis.1  Job  ✔ Successful  7m52s
└──# revision:1
   └──⧉ rollout-bg-test-analysis-766b7567dc  ReplicaSet  • ScaledDown  10m

## Update starts
Name:            rollout-bg-test-analysis
Namespace:       default
Status:          ◌ Progressing
Strategy:        BlueGreen
Images:          nginx:stable (active)
Replicas:
  Desired:       2
  Current:       2
  Updated:       0
  Ready:         2
  Available:     0

NAME  KIND  STATUS  AGE  INFO
⟳ rollout-bg-test-analysis  Rollout  ◌ Progressing  10m
├──# revision:3
│  └──⧉ rollout-bg-test-analysis-6b7c8784cc  ReplicaSet  ◌ Progressing  0s
│     ├──□ rollout-bg-test-analysis-6b7c8784cc-b29n5  Pod  ◌ Pending  0s  ready:0/1
│     └──□ rollout-bg-test-analysis-6b7c8784cc-ztnjn  Pod  ◌ ContainerCreating  0s  ready:0/1
├──# revision:2
│  ├──⧉ rollout-bg-test-analysis-6bcfbc585f  ReplicaSet  ✔ Healthy  7m56s  active
│  │  ├──□ rollout-bg-test-analysis-6bcfbc585f-cq2q5  Pod  ✔ Running  7m56s  ready:1/1
│  │  └──□ rollout-bg-test-analysis-6bcfbc585f-dtxdc  Pod  ✔ Running  7m56s  ready:1/1
│  └──α rollout-bg-test-analysis-6bcfbc585f-2  AnalysisRun  ✔ Successful  7m55s  ✔ 1
│     └──⊞ 471f5e5b-553b-4f94-bae3-cefca88afcd6.test-analysis.1  Job  ✔ Successful  7m55s
└──# revision:1
   └──⧉ rollout-bg-test-analysis-766b7567dc  ReplicaSet  • ScaledDown  10m

## Switch complete and Analysis starts
## The revision 3 ReplicaSet is now active
Name:            rollout-bg-test-analysis
Namespace:       default
Status:          ◌ Progressing
Strategy:        BlueGreen
Images:          nginx:1.19 (active)
                 nginx:stable
Replicas:
  Desired:       2
  Current:       4
  Updated:       2
  Ready:         4
  Available:     2

NAME  KIND  STATUS  AGE  INFO
⟳ rollout-bg-test-analysis  Rollout  ◌ Progressing  10m
├──# revision:3
│  ├──⧉ rollout-bg-test-analysis-6b7c8784cc  ReplicaSet  ✔ Healthy  0s  active
│  │  ├──□ rollout-bg-test-analysis-6b7c8784cc-b29n5  Pod  ✔ Running  0s  ready:1/1
│  │  └──□ rollout-bg-test-analysis-6b7c8784cc-ztnjn  Pod  ✔ Running  0s  ready:1/1
│  └──α rollout-bg-test-analysis-6b7c8784cc-3  AnalysisRun  ◌ Running  0s
│     └──⊞ 0208d4a6-ee21-4ec5-b969-4b75b6784a4b.test-analysis.1  Job  ◌ Running  0s
├──# revision:2
│  ├──⧉ rollout-bg-test-analysis-6bcfbc585f  ReplicaSet  ✔ Healthy  7m57s  delay:30s
│  │  ├──□ rollout-bg-test-analysis-6bcfbc585f-cq2q5  Pod  ✔ Running  7m57s  ready:1/1
│  │  └──□ rollout-bg-test-analysis-6bcfbc585f-dtxdc  Pod  ✔ Running  7m57s  ready:1/1
│  └──α rollout-bg-test-analysis-6bcfbc585f-2  AnalysisRun  ✔ Successful  7m56s  ✔ 1
│     └──⊞ 471f5e5b-553b-4f94-bae3-cefca88afcd6.test-analysis.1  Job  ✔ Successful  7m56s
└──# revision:1
   └──⧉ rollout-bg-test-analysis-766b7567dc  ReplicaSet  • ScaledDown  10m

## Analysis complete (failure)
Name:            rollout-bg-test-analysis
Namespace:       default
Status:          ◌ Progressing
Strategy:        BlueGreen
Images:          nginx:1.19 (active)
                 nginx:stable
Replicas:
  Desired:       2
  Current:       4
  Updated:       2
  Ready:         4
  Available:     2

NAME  KIND  STATUS  AGE  INFO
⟳ rollout-bg-test-analysis  Rollout  ◌ Progressing  10m
├──# revision:3
│  ├──⧉ rollout-bg-test-analysis-6b7c8784cc  ReplicaSet  ✔ Healthy  12s  active
│  │  ├──□ rollout-bg-test-analysis-6b7c8784cc-b29n5  Pod  ✔ Running  12s  ready:1/1
│  │  └──□ rollout-bg-test-analysis-6b7c8784cc-ztnjn  Pod  ✔ Running  12s  ready:1/1
│  └──α rollout-bg-test-analysis-6b7c8784cc-3  AnalysisRun  ✖ Failed  11s  ✖ 1
│     └──⊞ 0208d4a6-ee21-4ec5-b969-4b75b6784a4b.test-analysis.1  Job  ✖ Failed  11s
├──# revision:2
│  ├──⧉ rollout-bg-test-analysis-6bcfbc585f  ReplicaSet  ✔ Healthy  8m9s  delay:18s
│  │  ├──□ rollout-bg-test-analysis-6bcfbc585f-cq2q5  Pod  ✔ Running  8m9s  ready:1/1
│  │  └──□ rollout-bg-test-analysis-6bcfbc585f-dtxdc  Pod  ✔ Running  8m9s  ready:1/1
│  └──α rollout-bg-test-analysis-6bcfbc585f-2  AnalysisRun  ✔ Successful  8m8s  ✔ 1
│     └──⊞ 471f5e5b-553b-4f94-bae3-cefca88afcd6.test-analysis.1  Job  ✔ Successful  8m8s
└──# revision:1
   └──⧉ rollout-bg-test-analysis-766b7567dc  ReplicaSet  • ScaledDown  10m

## After completion
## The revision 2 ReplicaSet is active again
Name:            rollout-bg-test-analysis
Namespace:       default
Status:          ✖ Degraded
Strategy:        BlueGreen
Images:          nginx:1.19
                 nginx:stable (active)
Replicas:
  Desired:       2
  Current:       4
  Updated:       2
  Ready:         4
  Available:     2

NAME  KIND  STATUS  AGE  INFO
⟳ rollout-bg-test-analysis  Rollout  ✖ Degraded  10m
├──# revision:3
│  ├──⧉ rollout-bg-test-analysis-6b7c8784cc  ReplicaSet  ✔ Healthy  12s
│  │  ├──□ rollout-bg-test-analysis-6b7c8784cc-b29n5  Pod  ✔ Running  12s  ready:1/1
│  │  └──□ rollout-bg-test-analysis-6b7c8784cc-ztnjn  Pod  ✔ Running  12s  ready:1/1
│  └──α rollout-bg-test-analysis-6b7c8784cc-3  AnalysisRun  ✖ Failed  11s  ✖ 1
│     └──⊞ 0208d4a6-ee21-4ec5-b969-4b75b6784a4b.test-analysis.1  Job  ✖ Failed  11s
├──# revision:2
│  ├──⧉ rollout-bg-test-analysis-6bcfbc585f  ReplicaSet  ✔ Healthy  8m9s  active
│  │  ├──□ rollout-bg-test-analysis-6bcfbc585f-cq2q5  Pod  ✔ Running  8m9s  ready:1/1
│  │  └──□ rollout-bg-test-analysis-6bcfbc585f-dtxdc  Pod  ✔ Running  8m9s  ready:1/1
│  └──α rollout-bg-test-analysis-6bcfbc585f-2  AnalysisRun  ✔ Successful  8m8s  ✔ 1
│     └──⊞ 471f5e5b-553b-4f94-bae3-cefca88afcd6.test-analysis.1  Job  ✔ Successful  8m8s
└──# revision:1
   └──⧉ rollout-bg-test-analysis-766b7567dc  ReplicaSet  • ScaledDown  10m
As shown above, the new version was deployed and its ReplicaSet became active for a moment, but because the AnalysisRun failed, the old version became active again and the Rollout's Status changed to Degraded.
To bring the Rollout's Status back from Degraded to Healthy, you need to redeploy with the original settings restored (here, by fixing the AnalysisTemplate).
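One possible recovery sequence is sketched below. I did not run this sequence as part of the test above, so treat it as an assumption: it re-applies the original `test-analysistemp.yml` (which returns `exit 0`) and then asks the controller to retry the aborted update with the `retry rollout` subcommand of the kubectl plugin. The function name `recover_degraded_rollout` is hypothetical.

```shell
# Sketch: restore the original AnalysisTemplate and retry the aborted update.
# Not verified in this post; the function name is a hypothetical helper.
recover_degraded_rollout() {
  local rollout="${1:-rollout-bg-test-analysis}"
  local template_file="${2:-test-analysistemp.yml}"

  # Re-apply the original template so that the Job returns exit 0 again.
  kubectl apply -f "$template_file" || return 1

  # Ask the Argo Rollouts controller to retry the aborted rollout.
  kubectl argo rollouts retry rollout "$rollout"
}

# Usage (requires access to the cluster and the kubectl plugin):
#   recover_degraded_rollout rollout-bg-test-analysis test-analysistemp.yml
```

Another option would be to point the image back at the tag that is still active (here `nginx:stable`) with `kubectl argo rollouts set image`, so that the desired state matches the stable ReplicaSet again.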
The resources after completion look like this. You can see that the AnalysisRun is Failed and that the failed Job remains.
$ kubectl get pods NAME READY STATUS RESTARTS AGE 0208d4a6-ee21-4ec5-b969-4b75b6784a4b.test-analysis.1-fk7hn 0/1 Error 0 7m23s 0208d4a6-ee21-4ec5-b969-4b75b6784a4b.test-analysis.1-mdfzs 0/1 Error 0 7m25s 471f5e5b-553b-4f94-bae3-cefca88afcd6.test-analysis.1-8hz2s 0/1 Completed 0 15m rollout-bg-test-analysis-6b7c8784cc-b29n5 1/1 Running 0 7m26s rollout-bg-test-analysis-6b7c8784cc-ztnjn 1/1 Running 0 7m26s rollout-bg-test-analysis-6bcfbc585f-cq2q5 1/1 Running 0 15m rollout-bg-test-analysis-6bcfbc585f-dtxdc 1/1 Running 0 15m $ kubectl get analysisrun NAME STATUS rollout-bg-test-analysis-6b7c8784cc-3 Failed rollout-bg-test-analysis-6bcfbc585f-2 Successful $ kubectl get rollout NAME DESIRED CURRENT UP-TO-DATE AVAILABLE rollout-bg-test-analysis 2 4 2 2 $ kubectl describe analysisrun rollout-bg-test-analysis-6b7c8784cc-3 Name: rollout-bg-test-analysis-6b7c8784cc-3 Namespace: default Labels: rollout-type=PostPromotion rollouts-pod-template-hash=6b7c8784cc Annotations: rollout.argoproj.io/revision: 3 API Version: argoproj.io/v1alpha1 Kind: AnalysisRun Metadata: Creation Timestamp: 2020-10-10T02:35:10Z Generation: 3 Owner References: API Version: argoproj.io/v1alpha1 Block Owner Deletion: true Controller: true Kind: Rollout Name: rollout-bg-test-analysis UID: e9ba7057-29db-491a-8618-13180c406fef Resource Version: 243985 Self Link: /apis/argoproj.io/v1alpha1/namespaces/default/analysisruns/rollout-bg-test-analysis-6b7c8784cc-3 UID: 0208d4a6-ee21-4ec5-b969-4b75b6784a4b Spec: Metrics: Name: test-analysis Provider: Job: Metadata: Creation Timestamp: <nil> Spec: Backoff Limit: 1 Template: Metadata: Creation Timestamp: <nil> Spec: Containers: Args: exit 1 Command: sh -c Image: alpine:3.8 Name: sleep Resources: Restart Policy: Never Status: Message: metric "test-analysis" assessed Failed due to failed (1) > failureLimit (0) Metric Results: Count: 1 Failed: 1 Measurements: Finished At: 2020-10-10T02:35:22Z Metadata: Job - Name: 0208d4a6-ee21-4ec5-b969-4b75b6784a4b.test-analysis.1 
Phase: Failed Started At: 2020-10-10T02:35:10Z Name: test-analysis Phase: Failed Phase: Failed Started At: 2020-10-10T02:35:10Z Events: Type Reason Age From Message ---- ------ ---- ---- ------- Warning Failed 8m27s rollouts-controller metric 'test-analysis' completed Failed Warning Failed 8m27s rollouts-controller analysis completed Failed $ kubectl describe rollout rollout-bg-test-analysis Name: rollout-bg-test-analysis Namespace: default Labels: <none> Annotations: rollout.argoproj.io/revision: 3 API Version: argoproj.io/v1alpha1 Kind: Rollout Metadata: Creation Timestamp: 2020-10-10T02:25:02Z Generation: 27 Resource Version: 244302 Self Link: /apis/argoproj.io/v1alpha1/namespaces/default/rollouts/rollout-bg-test-analysis UID: e9ba7057-29db-491a-8618-13180c406fef Spec: Replicas: 2 Revision History Limit: 2 Selector: Match Labels: App: rollout-bg-analysis Strategy: Blue Green: Active Service: rollout-active-service-analysis Post Promotion Analysis: Templates: Template Name: test-analysis Template: Metadata: Creation Timestamp: <nil> Labels: App: rollout-bg-analysis Spec: Containers: Image: nginx:1.19 Name: nginx-container Ports: Container Port: 80 Resources: Status: HPA Replicas: 2 Abort: true Aborted At: 2020-10-10T02:37:02Z Available Replicas: 2 Blue Green: Active Selector: 6bcfbc585f Post Promotion Analysis Run: rollout-bg-test-analysis-6b7c8784cc-3 Post Promotion Analysis Run Status: Message: metric "test-analysis" assessed Failed due to failed (1) > failureLimit (0) Name: rollout-bg-test-analysis-6b7c8784cc-3 Status: Failed Canary: Conditions: Last Transition Time: 2020-10-10T02:25:07Z Last Update Time: 2020-10-10T02:25:07Z Message: Rollout has minimum availability Reason: AvailableReason Status: True Type: Available Last Transition Time: 2020-10-10T02:35:22Z Last Update Time: 2020-10-10T02:35:22Z Message: metric "test-analysis" assessed Failed due to failed (1) > failureLimit (0) Reason: RolloutAborted Status: False Type: Progressing Current Pod Hash: 
6b7c8784cc Observed Generation: 57cf5bd85b Ready Replicas: 4 Replicas: 4 Selector: app=rollout-bg-analysis,rollouts-pod-template-hash=6bcfbc585f Stable RS: 6bcfbc585f Updated Replicas: 2 Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal ScalingReplicaSet 19m rollouts-controller Scaled up replica set rollout-bg-test-analysis-766b7567dc to 2 Normal SwitchService 19m rollouts-controller Switched selector for service 'rollout-active-service-analysis' to value '766b7567dc' Normal ScalingReplicaSet 17m rollouts-controller Scaled up replica set rollout-bg-test-analysis-6bcfbc585f to 2 Normal AnalysisRunStatusChange 17m rollouts-controller PostPromotion Analysis Run 'rollout-bg-test-analysis-6bcfbc585f-2' Status New: 'Running' Previous: '' Normal AnalysisRunStatusChange 17m rollouts-controller PostPromotion Analysis Run 'rollout-bg-test-analysis-6bcfbc585f-2' Status New: '' Previous: 'NoPreviousStatus' Normal AnalysisRunStatusChange 17m rollouts-controller PostPromotion Analysis Run 'rollout-bg-test-analysis-6bcfbc585f-2' Status New: 'Successful' Previous: 'Running' Normal ScalingReplicaSet 17m rollouts-controller Scaled down replica set rollout-bg-test-analysis-766b7567dc to 0 Normal ScalingReplicaSet 9m51s rollouts-controller Scaled up replica set rollout-bg-test-analysis-6b7c8784cc to 2 Normal SwitchService 9m50s rollouts-controller Switched selector for service 'rollout-active-service-analysis' to value '6b7c8784cc' Normal AnalysisRunStatusChange 9m50s rollouts-controller PostPromotion Analysis Run 'rollout-bg-test-analysis-6b7c8784cc-3' Status New: '' Previous: 'NoPreviousStatus' Normal AnalysisRunStatusChange 9m50s rollouts-controller PostPromotion Analysis Run 'rollout-bg-test-analysis-6b7c8784cc-3' Status New: 'Running' Previous: '' Normal SwitchService 9m38s (x2 over 17m) rollouts-controller Switched selector for service 'rollout-active-service-analysis' to value '6bcfbc585f' Warning AnalysisRunStatusChange 9m38s rollouts-controller 
PostPromotion Analysis Run 'rollout-bg-test-analysis-6b7c8784cc-3' Status New: 'Failed' Previous: 'Running'
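Looking at cases likely to fail during a real update
Above, we used an AnalysisTemplate and watched the Rollout roll back automatically when its condition was not met. From here, I ran follow-up tests on two cases that could cause an update to fail when updating an application actually running on Kubernetes.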
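When the container fails to start
The first case is a container that fails to start. I deliberately prepared a container image that fails on startup, specified it when updating the image, and checked how the Rollout behaves.
To make the container fail to start, I used the following Dockerfile. It runs the head command against a file that does not exist.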
Dockerfile
FROM nginx:latest
CMD head /foo/bar
After building the image from this Dockerfile, I pushed it to Amazon ECR and ran the test with EKS pulling that image.
# Build and push the container image
$ docker build -t test/test03:fail .
$ docker tag test/test03:fail 111111111111.dkr.ecr.ap-northeast-1.amazonaws.com/test/test03:fail
$ docker push 111111111111.dkr.ecr.ap-northeast-1.amazonaws.com/test/test03:fail
I also prepared a container image that starts successfully.
Dockerfile
FROM nginx:latest
# Build and push the container image
$ docker build -t test/test03:success .
$ docker tag test/test03:success 111111111111.dkr.ecr.ap-northeast-1.amazonaws.com/test/test03:success
$ docker push 111111111111.dkr.ecr.ap-northeast-1.amazonaws.com/test/test03:success
The manifest files used in this test are as follows.
rollout-bg-test-start-fail.yml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: rollout-bg-test-start-fail
spec:
  replicas: 2
  revisionHistoryLimit: 2
  selector:
    matchLabels:
      app: rollout-bg-analysis-start-fail
  template:
    metadata:
      labels:
        app: rollout-bg-analysis-start-fail
    spec:
      containers:
      - name: nginx-container
        image: 111111111111.dkr.ecr.ap-northeast-1.amazonaws.com/test/test03:success
        ports:
        - containerPort: 80
  strategy:
    blueGreen:
      activeService: rollout-active-service-start-fail
      postPromotionAnalysis:
        templates:
        - templateName: test-analysis
rollout-active-service-start-fail.yml
apiVersion: v1
kind: Service
metadata:
  name: rollout-active-service-start-fail
spec:
  ports:
  - port: 8080
    targetPort: 80
    protocol: TCP
  selector:
    app: rollout-bg-analysis-start-fail
First, create the resources from the YAML files above, along with the AnalysisTemplate that defines test-analysis.
$ kubectl apply -f test-analysistemp.yml
analysistemplate.argoproj.io/test-analysis created
$ kubectl apply -f rollout-active-service-start-fail.yml
service/rollout-active-service-start-fail created
$ kubectl apply -f rollout-bg-test-start-fail.yml
rollout.argoproj.io/rollout-bg-test-start-fail created

# Check after deployment
$ kubectl get svc
NAME                                TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
kubernetes                          ClusterIP   10.100.0.1      <none>        443/TCP    85m
rollout-active-service-start-fail   ClusterIP   10.100.32.113   <none>        8080/TCP   104s
$ kubectl get pods
NAME                                         READY   STATUS    RESTARTS   AGE
rollout-bg-test-start-fail-8cdb4dcc6-cnjlt   1/1     Running   0          13s
rollout-bg-test-start-fail-8cdb4dcc6-phsgq   1/1     Running   0          13s
$ kubectl get rollout
NAME                         DESIRED   CURRENT   UP-TO-DATE   AVAILABLE
rollout-bg-test-start-fail   2         2         2            2
$ kubectl argo rollouts get rollout rollout-bg-test-start-fail
Name:            rollout-bg-test-start-fail
Namespace:       default
Status:          ✔ Healthy
Strategy:        BlueGreen
Images:          111111111111.dkr.ecr.ap-northeast-1.amazonaws.com/test/test03:success (active)
Replicas:
  Desired:       2
  Current:       2
  Updated:       2
  Ready:         2
  Available:     2

NAME                                                   KIND        STATUS     AGE  INFO
⟳ rollout-bg-test-start-fail                           Rollout     ✔ Healthy  11m
└──# revision:1
   └──⧉ rollout-bg-test-start-fail-8cdb4dcc6           ReplicaSet  ✔ Healthy  11m  active
      ├──□ rollout-bg-test-start-fail-8cdb4dcc6-cnjlt  Pod         ✔ Running  11m  ready:1/1
      └──□ rollout-bg-test-start-fail-8cdb4dcc6-phsgq  Pod         ✔ Running  11m  ready:1/1
Next, update the image and watch how the update proceeds.
# 別のターミナルで実行 $ kubectl argo rollouts get rollout rollout-bg-test-start-fail --watch # イメージの更新 $ kubectl argo rollouts set image rollout-bg-test-start-fail nginx-container=111111111111.dkr.ecr.ap-northeast-1.amazonaws.com/test/test03:fail rollout "rollout-bg-test-start-fail" image updated # アップデートの様子を確認 ## アップデート直後 Name: rollout-bg-test-start-fail Namespace: default Status: ✔ Healthy Strategy: BlueGreen Images: 111111111111.dkr.ecr.ap-northeast-1.amazonaws.com/test/test03:success (active) Replicas: Desired: 2 Current: 2 Updated: 2 Ready: 2 Available: 2 NAME KIND STATUS AGE INFO ⟳ rollout-bg-test-start-fail Rollout ✔ Healthy 12m └──# revision:1 └──⧉ rollout-bg-test-start-fail-8cdb4dcc6 ReplicaSet ✔ Healthy 12m active ├──□ rollout-bg-test-start-fail-8cdb4dcc6-cnjlt Pod ✔ Running 12m ready:1/1 └──□ rollout-bg-test-start-fail-8cdb4dcc6-phsgq Pod ✔ Running 12m ready:1/1 ## 新規コンテナの作成開始 Name: rollout-bg-test-start-fail Namespace: default Status: ◌ Progressing Strategy: BlueGreen Images: 111111111111.dkr.ecr.ap-northeast-1.amazonaws.com/test/test03:fail 111111111111.dkr.ecr.ap-northeast-1.amazonaws.com/test/test03:success (active) Replicas: Desired: 2 Current: 4 Updated: 2 Ready: 2 Available: 2 NAME KIND STATUS AGE INFO ⟳ rollout-bg-test-start-fail Rollout ◌ Progressing 12m ├──# revision:2 │ └──⧉ rollout-bg-test-start-fail-7fb74fd5f5 ReplicaSet ◌ Progressing 0s │ ├──□ rollout-bg-test-start-fail-7fb74fd5f5-948tm Pod ◌ ContainerCreating 0s ready:0/1 │ └──□ rollout-bg-test-start-fail-7fb74fd5f5-mr4n4 Pod ◌ ContainerCreating 0s ready:0/1 └──# revision:1 └──⧉ rollout-bg-test-start-fail-8cdb4dcc6 ReplicaSet ✔ Healthy 12m active ├──□ rollout-bg-test-start-fail-8cdb4dcc6-cnjlt Pod ✔ Running 12m ready:1/1 └──□ rollout-bg-test-start-fail-8cdb4dcc6-phsgq Pod ✔ Running 12m ready:1/1 ## 作成に失敗 Name: rollout-bg-test-start-fail Namespace: default Status: ◌ Progressing Strategy: BlueGreen Images: 111111111111.dkr.ecr.ap-northeast-1.amazonaws.com/test/test03:fail 
111111111111.dkr.ecr.ap-northeast-1.amazonaws.com/test/test03:success (active) Replicas: Desired: 2 Current: 4 Updated: 2 Ready: 2 Available: 2 NAME KIND STATUS AGE INFO ⟳ rollout-bg-test-start-fail Rollout ◌ Progressing 12m ├──# revision:2 │ └──⧉ rollout-bg-test-start-fail-7fb74fd5f5 ReplicaSet ◌ Progressing 1s │ ├──□ rollout-bg-test-start-fail-7fb74fd5f5-948tm Pod ⚠ Error 1s ready:0/1 │ └──□ rollout-bg-test-start-fail-7fb74fd5f5-mr4n4 Pod ◌ ContainerCreating 1s ready:0/1 └──# revision:1 └──⧉ rollout-bg-test-start-fail-8cdb4dcc6 ReplicaSet ✔ Healthy 12m active ├──□ rollout-bg-test-start-fail-8cdb4dcc6-cnjlt Pod ✔ Running 12m ready:1/1 └──□ rollout-bg-test-start-fail-8cdb4dcc6-phsgq Pod ✔ Running 12m ready:1/1 Name: rollout-bg-test-start-fail Namespace: default Status: ◌ Progressing Strategy: BlueGreen Images: 111111111111.dkr.ecr.ap-northeast-1.amazonaws.com/test/test03:fail 111111111111.dkr.ecr.ap-northeast-1.amazonaws.com/test/test03:success (active) Replicas: Desired: 2 Current: 4 Updated: 2 Ready: 2 Available: 2 NAME KIND STATUS AGE INFO ⟳ rollout-bg-test-start-fail Rollout ◌ Progressing 12m ├──# revision:2 │ └──⧉ rollout-bg-test-start-fail-7fb74fd5f5 ReplicaSet ◌ Progressing 1s │ ├──□ rollout-bg-test-start-fail-7fb74fd5f5-948tm Pod ⚠ Error 1s ready:0/1 │ └──□ rollout-bg-test-start-fail-7fb74fd5f5-mr4n4 Pod ⚠ Error 1s ready:0/1 └──# revision:1 └──⧉ rollout-bg-test-start-fail-8cdb4dcc6 ReplicaSet ✔ Healthy 12m active ├──□ rollout-bg-test-start-fail-8cdb4dcc6-cnjlt Pod ✔ Running 12m ready:1/1 └──□ rollout-bg-test-start-fail-8cdb4dcc6-phsgq Pod ✔ Running 12m ready:1/1 Name: rollout-bg-test-start-fail Namespace: default Status: ◌ Progressing Strategy: BlueGreen Images: 111111111111.dkr.ecr.ap-northeast-1.amazonaws.com/test/test03:fail 111111111111.dkr.ecr.ap-northeast-1.amazonaws.com/test/test03:success (active) Replicas: Desired: 2 Current: 4 Updated: 2 Ready: 2 Available: 2 NAME KIND STATUS AGE INFO ⟳ rollout-bg-test-start-fail Rollout ◌ Progressing 12m ├──# 
revision:2 │ └──⧉ rollout-bg-test-start-fail-7fb74fd5f5 ReplicaSet ◌ Progressing 3s │ ├──□ rollout-bg-test-start-fail-7fb74fd5f5-948tm Pod ✖ CrashLoopBackOff 3s ready:0/1,restarts:1 │ └──□ rollout-bg-test-start-fail-7fb74fd5f5-mr4n4 Pod ⚠ Error 3s ready:0/1,restarts:1 └──# revision:1 └──⧉ rollout-bg-test-start-fail-8cdb4dcc6 ReplicaSet ✔ Healthy 12m active ├──□ rollout-bg-test-start-fail-8cdb4dcc6-cnjlt Pod ✔ Running 12m ready:1/1 └──□ rollout-bg-test-start-fail-8cdb4dcc6-phsgq Pod ✔ Running 12m ready:1/1 Name: rollout-bg-test-start-fail Namespace: default Status: ◌ Progressing Strategy: BlueGreen Images: 111111111111.dkr.ecr.ap-northeast-1.amazonaws.com/test/test03:fail 111111111111.dkr.ecr.ap-northeast-1.amazonaws.com/test/test03:success (active) Replicas: Desired: 2 Current: 4 Updated: 2 Ready: 2 Available: 2 NAME KIND STATUS AGE INFO ⟳ rollout-bg-test-start-fail Rollout ◌ Progressing 12m ├──# revision:2 │ └──⧉ rollout-bg-test-start-fail-7fb74fd5f5 ReplicaSet ◌ Progressing 3s │ ├──□ rollout-bg-test-start-fail-7fb74fd5f5-948tm Pod ✖ CrashLoopBackOff 3s ready:0/1,restarts:1 │ └──□ rollout-bg-test-start-fail-7fb74fd5f5-mr4n4 Pod ✖ CrashLoopBackOff 3s ready:0/1,restarts:1 └──# revision:1 └──⧉ rollout-bg-test-start-fail-8cdb4dcc6 ReplicaSet ✔ Healthy 12m active ├──□ rollout-bg-test-start-fail-8cdb4dcc6-cnjlt Pod ✔ Running 12m ready:1/1 └──□ rollout-bg-test-start-fail-8cdb4dcc6-phsgq Pod ✔ Running 12m ready:1/1 ## しばらくするとDegradedになる Name: rollout-bg-test-start-fail Namespace: default Status: ✖ Degraded Strategy: BlueGreen Images: 111111111111.dkr.ecr.ap-northeast-1.amazonaws.com/test/test03:fail 111111111111.dkr.ecr.ap-northeast-1.amazonaws.com/test/test03:success (active) Replicas: Desired: 2 Current: 4 Updated: 2 Ready: 2 Available: 2 NAME KIND STATUS AGE INFO ⟳ rollout-bg-test-start-fail Rollout ✖ Degraded 25m ├──# revision:2 │ └──⧉ rollout-bg-test-start-fail-7fb74fd5f5 ReplicaSet ◌ Progressing 13m │ ├──□ rollout-bg-test-start-fail-7fb74fd5f5-948tm Pod ✖ 
CrashLoopBackOff 13m ready:0/1,restarts:7 │ └──□ rollout-bg-test-start-fail-7fb74fd5f5-mr4n4 Pod ✖ CrashLoopBackOff 13m ready:0/1,restarts:7 └──# revision:1 └──⧉ rollout-bg-test-start-fail-8cdb4dcc6 ReplicaSet ✔ Healthy 25m active ├──□ rollout-bg-test-start-fail-8cdb4dcc6-cnjlt Pod ✔ Running 25m ready:1/1 └──□ rollout-bg-test-start-fail-8cdb4dcc6-phsgq Pod ✔ Running 25m ready:1/1
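As shown above, when the container fails to start, the failure occurs before the Analysis runs, so the switch to the new version never happened. The Status ends up as Degraded, but a container startup failure does not appear to affect the application's uptime.
Note that because the Rollout's Status becomes Degraded, you can bring it back to Healthy by redeploying with the original container image.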
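The watch output above shows the Rollout turning Degraded only after some time has passed. As far as I understand, this wait corresponds to the Rollout's spec.progressDeadlineSeconds field, which defaults to 600 seconds. The fragment below is a minimal sketch of shortening it for an experiment like this one; the value 120 is an assumption chosen for illustration and was not used in the original test.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: rollout-bg-test-start-fail
spec:
  replicas: 2
  revisionHistoryLimit: 2
  # Assumption: 120 is an illustrative value, shorter than the default of 600 seconds.
  # Once the new ReplicaSet makes no progress for this long, the Rollout is reported as Degraded.
  progressDeadlineSeconds: 120
  # selector, template and strategy are unchanged from rollout-bg-test-start-fail.yml
```

A shorter deadline only surfaces the failure sooner. The active Service keeps pointing at the old ReplicaSet either way, as seen in the output above.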
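When the Liveness Probe keeps failing
Next, let's look at the case where the Liveness Probe fails. This time I prepared the manifest files below and later changed the Liveness Probe condition to create a situation in which the probe fails.
Note that with the Liveness Probe settings used here (initialDelaySeconds, periodSeconds), the AnalysisRun starts before the first Liveness Probe runs.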
rollout-bg-test-liveness-fail.yml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: rollout-bg-test-liveness-fail
spec:
  replicas: 2
  revisionHistoryLimit: 2
  selector:
    matchLabels:
      app: rollout-bg-analysis-liveness-fail
  template:
    metadata:
      labels:
        app: rollout-bg-analysis-liveness-fail
    spec:
      containers:
      - name: nginx-container
        image: 111111111111.dkr.ecr.ap-northeast-1.amazonaws.com/test/test03:success
        ports:
        - containerPort: 80
        livenessProbe:
          tcpSocket:
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 5
          timeoutSeconds: 1
          successThreshold: 1
          failureThreshold: 1
  strategy:
    blueGreen:
      activeService: rollout-active-service-liveness-fail
      postPromotionAnalysis:
        templates:
        - templateName: test-analysis
rollout-active-service-liveness-fail.yml
apiVersion: v1
kind: Service
metadata:
  name: rollout-active-service-liveness-fail
spec:
  ports:
  - port: 8080
    targetPort: 80
    protocol: TCP
  selector:
    app: rollout-bg-analysis-liveness-fail
Create the resources from the two YAML files above, along with the AnalysisTemplate that defines test-analysis.
$ kubectl apply -f rollout-active-service-liveness-fail.yml
service/rollout-active-service-liveness-fail created
$ kubectl apply -f rollout-bg-test-liveness-fail.yml
rollout.argoproj.io/rollout-bg-test-liveness-fail created

# Check after deployment
$ kubectl get svc
NAME                                   TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
kubernetes                             ClusterIP   10.100.0.1      <none>        443/TCP    150m
rollout-active-service-liveness-fail   ClusterIP   10.100.21.204   <none>        8080/TCP   9m1s
$ kubectl get pods
NAME                                             READY   STATUS    RESTARTS   AGE
rollout-bg-test-liveness-fail-54495f7df4-df2dk   1/1     Running   0          15s
rollout-bg-test-liveness-fail-54495f7df4-svb25   1/1     Running   0          15s
$ kubectl get rollout
NAME                            DESIRED   CURRENT   UP-TO-DATE   AVAILABLE
rollout-bg-test-liveness-fail   2         2         2            2
$ kubectl argo rollouts get rollout rollout-bg-test-liveness-fail
Name:            rollout-bg-test-liveness-fail
Namespace:       default
Status:          ✔ Healthy
Strategy:        BlueGreen
Images:          111111111111.dkr.ecr.ap-northeast-1.amazonaws.com/test/test03:success (active)
Replicas:
  Desired:       2
  Current:       2
  Updated:       2
  Ready:         2
  Available:     2

NAME                                                       KIND        STATUS     AGE    INFO
⟳ rollout-bg-test-liveness-fail                            Rollout     ✔ Healthy  2m37s
└──# revision:1
   └──⧉ rollout-bg-test-liveness-fail-54495f7df4           ReplicaSet  ✔ Healthy  2m36s  active
      ├──□ rollout-bg-test-liveness-fail-54495f7df4-df2dk  Pod         ✔ Running  2m36s  ready:1/1
      └──□ rollout-bg-test-liveness-fail-54495f7df4-svb25  Pod         ✔ Running  2m36s  ready:1/1
Next, change part of the Liveness Probe definition and watch how the update progresses.
# 別のターミナルで実行 $ kubectl argo rollouts get rollout rollout-bg-test-liveness-fail --watch # Rolloutの編集 $ kubectl edit rollout rollout-bg-test-liveness-fail rollout.argoproj.io/rollout-bg-test-liveness-fail edited # Please edit the object below. Lines beginning with a '#' will be ignored, # and an empty file will abort the edit. If an error occurs while saving this file will be # reopened with the relevant failures. # apiVersion: argoproj.io/v1alpha1 kind: Rollout metadata: annotations: kubectl.kubernetes.io/last-applied-configuration: | {"apiVersion":"argoproj.io/v1alpha1","kind":"Rollout","metadata":{"annotations":{},"name":"rollout-bg-test-liveness-fail","namespace":"default"},"spec":{"replicas":2,"revisionHistoryLimit":2,"selector":{"matchLabels":{"app":"rollout-bg-analysis-liveness-fail"}},"strategy":{"blueGreen":{"activeService":"rollout-active-service-liveness-fail","postPromotionAnalysis":{"templates":[{"templateName":"test-analysis"}]}}},"template":{"metadata":{"labels":{"app":"rollout-bg-analysis-liveness-fail"}},"spec":{"containers":[{"image":"111111111111.dkr.ecr.ap-northeast-1.amazonaws.com/test/test03:success","livenessProbe":{"failureThreshold":1,"initialDelaySeconds":5,"periodSeconds":5,"successThreshold":1,"tcpSocket":{"port":80},"timeoutSeconds":1},"name":"nginx-container","ports":[{"containerPort":80}]}]}}}} rollout.argoproj.io/revision: "1" creationTimestamp: "2020-10-12T06:52:16Z" generation: 6 name: rollout-bg-test-liveness-fail namespace: default resourceVersion: "28459" selfLink: /apis/argoproj.io/v1alpha1/namespaces/default/rollouts/rollout-bg-test-liveness-fail uid: 70fa4d0b-fb3a-46c6-8bfd-321e2d829694 spec: replicas: 2 revisionHistoryLimit: 2 selector: matchLabels: app: rollout-bg-analysis-liveness-fail strategy: blueGreen: activeService: rollout-active-service-liveness-fail postPromotionAnalysis: templates: - templateName: test-analysis template: metadata: creationTimestamp: null labels: app: rollout-bg-analysis-liveness-fail spec: 
containers: - image: 111111111111.dkr.ecr.ap-northeast-1.amazonaws.com/test/test03:success livenessProbe: failureThreshold: 1 initialDelaySeconds: 5 periodSeconds: 5 successThreshold: 1 tcpSocket: port: 8080 # 変更 timeoutSeconds: 1 name: nginx-container ports: - containerPort: 80 resources: {} status: HPAReplicas: 2 availableReplicas: 2 blueGreen: activeSelector: 54495f7df4 canary: {} conditions: - lastTransitionTime: "2020-10-12T06:52:17Z" lastUpdateTime: "2020-10-12T06:52:18Z" message: ReplicaSet "rollout-bg-test-liveness-fail-54495f7df4" has successfully progressed. reason: NewReplicaSetAvailable status: "True" type: Progressing - lastTransitionTime: "2020-10-12T06:52:18Z" lastUpdateTime: "2020-10-12T06:52:18Z" message: Rollout has minimum availability reason: AvailableReason status: "True" type: Available currentPodHash: 54495f7df4 observedGeneration: 75d6d6f664 readyReplicas: 2 replicas: 2 selector: app=rollout-bg-analysis-liveness-fail,rollouts-pod-template-hash=54495f7df4 stableRS: 54495f7df4 updatedReplicas: 2 # アップデートの様子を確認 ## コンテナの作成開始 Name: rollout-bg-test-liveness-fail Namespace: default Status: ◌ Progressing Strategy: BlueGreen Images: 111111111111.dkr.ecr.ap-northeast-1.amazonaws.com/test/test03:success (active) Replicas: Desired: 2 Current: 4 Updated: 2 Ready: 2 Available: 2 NAME KIND STATUS AGE INFO ⟳ rollout-bg-test-liveness-fail Rollout ◌ Progressing 14m ├──# revision:2 │ └──⧉ rollout-bg-test-liveness-fail-7bb5898d6 ReplicaSet ◌ Progressing 0s │ ├──□ rollout-bg-test-liveness-fail-7bb5898d6-tw8lt Pod ◌ ContainerCreating 0s ready:0/1 │ └──□ rollout-bg-test-liveness-fail-7bb5898d6-wb944 Pod ◌ ContainerCreating 0s ready:0/1 └──# revision:1 └──⧉ rollout-bg-test-liveness-fail-54495f7df4 ReplicaSet ✔ Healthy 14m active ├──□ rollout-bg-test-liveness-fail-54495f7df4-df2dk Pod ✔ Running 14m ready:1/1 └──□ rollout-bg-test-liveness-fail-54495f7df4-svb25 Pod ✔ Running 14m ready:1/1 ## コンテナの作成完了とAnalysisの開始 Name: rollout-bg-test-liveness-fail Namespace: default 
Status: ◌ Progressing Strategy: BlueGreen Images: 111111111111.dkr.ecr.ap-northeast-1.amazonaws.com/test/test03:success (active) Replicas: Desired: 2 Current: 4 Updated: 2 Ready: 2 Available: 2 NAME KIND STATUS AGE INFO ⟳ rollout-bg-test-liveness-fail Rollout ◌ Progressing 14m ├──# revision:2 │ ├──⧉ rollout-bg-test-liveness-fail-7bb5898d6 ReplicaSet ✔ Healthy 1s active │ │ ├──□ rollout-bg-test-liveness-fail-7bb5898d6-tw8lt Pod ✔ Running 1s ready:1/1 │ │ └──□ rollout-bg-test-liveness-fail-7bb5898d6-wb944 Pod ✔ Running 1s ready:1/1 │ └──α rollout-bg-test-liveness-fail-7bb5898d6-2-post AnalysisRun ◌ Running 0s │ └──⊞ 3bf49f08-e851-4185-8cbf-88886c0da2ec.test-analysis.1 Job ◌ Running 0s └──# revision:1 └──⧉ rollout-bg-test-liveness-fail-54495f7df4 ReplicaSet ✔ Healthy 14m delay:29s ├──□ rollout-bg-test-liveness-fail-54495f7df4-df2dk Pod ✔ Running 14m ready:1/1 └──□ rollout-bg-test-liveness-fail-54495f7df4-svb25 Pod ✔ Running 14m ready:1/1 ## Analysisの完了(成功) Name: rollout-bg-test-liveness-fail Namespace: default Status: ◌ Progressing Strategy: BlueGreen Images: 111111111111.dkr.ecr.ap-northeast-1.amazonaws.com/test/test03:success (active) Replicas: Desired: 2 Current: 4 Updated: 2 Ready: 2 Available: 2 NAME KIND STATUS AGE INFO ⟳ rollout-bg-test-liveness-fail Rollout ◌ Progressing 14m ├──# revision:2 │ ├──⧉ rollout-bg-test-liveness-fail-7bb5898d6 ReplicaSet ✔ Healthy 5s active │ │ ├──□ rollout-bg-test-liveness-fail-7bb5898d6-tw8lt Pod ✔ Running 5s ready:1/1 │ │ └──□ rollout-bg-test-liveness-fail-7bb5898d6-wb944 Pod ✔ Running 5s ready:1/1 │ └──α rollout-bg-test-liveness-fail-7bb5898d6-2-post AnalysisRun ✔ Successful 4s ✔ 1 │ └──⊞ 3bf49f08-e851-4185-8cbf-88886c0da2ec.test-analysis.1 Job ✔ Successful 4s └──# revision:1 └──⧉ rollout-bg-test-liveness-fail-54495f7df4 ReplicaSet ✔ Healthy 14m delay:25s ├──□ rollout-bg-test-liveness-fail-54495f7df4-df2dk Pod ✔ Running 14m ready:1/1 └──□ rollout-bg-test-liveness-fail-54495f7df4-svb25 Pod ✔ Running 14m ready:1/1 ## コンテナのRestart 
Name: rollout-bg-test-liveness-fail Namespace: default Status: ◌ Progressing Strategy: BlueGreen Images: 111111111111.dkr.ecr.ap-northeast-1.amazonaws.com/test/test03:success (active) Replicas: Desired: 2 Current: 4 Updated: 2 Ready: 2 Available: 2 NAME KIND STATUS AGE INFO ⟳ rollout-bg-test-liveness-fail Rollout ◌ Progressing 14m ├──# revision:2 │ ├──⧉ rollout-bg-test-liveness-fail-7bb5898d6 ReplicaSet ✔ Healthy 8s active │ │ ├──□ rollout-bg-test-liveness-fail-7bb5898d6-tw8lt Pod ✔ Running 8s ready:1/1,restarts:1 │ │ └──□ rollout-bg-test-liveness-fail-7bb5898d6-wb944 Pod ✔ Running 8s ready:1/1,restarts:1 │ └──α rollout-bg-test-liveness-fail-7bb5898d6-2-post AnalysisRun ✔ Successful 7s ✔ 1 │ └──⊞ 3bf49f08-e851-4185-8cbf-88886c0da2ec.test-analysis.1 Job ✔ Successful 7s └──# revision:1 └──⧉ rollout-bg-test-liveness-fail-54495f7df4 ReplicaSet ✔ Healthy 14m delay:22s ├──□ rollout-bg-test-liveness-fail-54495f7df4-df2dk Pod ✔ Running 14m ready:1/1 └──□ rollout-bg-test-liveness-fail-54495f7df4-svb25 Pod ✔ Running 14m ready:1/1 ## 以降はコンテナの再起動と失敗を繰り返す Name: rollout-bg-test-liveness-fail Namespace: default Status: ◌ Progressing Strategy: BlueGreen Images: 111111111111.dkr.ecr.ap-northeast-1.amazonaws.com/test/test03:success (active) Replicas: Desired: 2 Current: 2 Updated: 2 Ready: 1 Available: 1 NAME KIND STATUS AGE INFO ⟳ rollout-bg-test-liveness-fail Rollout ◌ Progressing 14m ├──# revision:2 │ ├──⧉ rollout-bg-test-liveness-fail-7bb5898d6 ReplicaSet ◌ Progressing 31s active │ │ ├──□ rollout-bg-test-liveness-fail-7bb5898d6-tw8lt Pod ✔ Running 31s ready:1/1,restarts:3 │ │ └──□ rollout-bg-test-liveness-fail-7bb5898d6-wb944 Pod ✖ CrashLoopBackOff 31s ready:0/1,restarts:2 │ └──α rollout-bg-test-liveness-fail-7bb5898d6-2-post AnalysisRun ✔ Successful 30s ✔ 1 │ └──⊞ 3bf49f08-e851-4185-8cbf-88886c0da2ec.test-analysis.1 Job ✔ Successful 30s └──# revision:1 └──⧉ rollout-bg-test-liveness-fail-54495f7df4 ReplicaSet • ScaledDown 14m ├──□ rollout-bg-test-liveness-fail-54495f7df4-df2dk 
Pod ◌ Terminating 14m ready:0/1 └──□ rollout-bg-test-liveness-fail-54495f7df4-svb25 Pod ◌ Terminating 14m ready:0/1 Name: rollout-bg-test-liveness-fail Namespace: default Status: ◌ Progressing Strategy: BlueGreen Images: 111111111111.dkr.ecr.ap-northeast-1.amazonaws.com/test/test03:success (active) Replicas: Desired: 2 Current: 2 Updated: 2 Ready: 1 Available: 1 NAME KIND STATUS AGE INFO ⟳ rollout-bg-test-liveness-fail Rollout ◌ Progressing 15m ├──# revision:2 │ ├──⧉ rollout-bg-test-liveness-fail-7bb5898d6 ReplicaSet ◌ Progressing 43s active │ │ ├──□ rollout-bg-test-liveness-fail-7bb5898d6-tw8lt Pod ✖ CrashLoopBackOff 43s ready:0/1,restarts:3 │ │ └──□ rollout-bg-test-liveness-fail-7bb5898d6-wb944 Pod ✖ CrashLoopBackOff 43s ready:0/1,restarts:4 │ └──α rollout-bg-test-liveness-fail-7bb5898d6-2-post AnalysisRun ✔ Successful 42s ✔ 1 │ └──⊞ 3bf49f08-e851-4185-8cbf-88886c0da2ec.test-analysis.1 Job ✔ Successful 42s └──# revision:1 └──⧉ rollout-bg-test-liveness-fail-54495f7df4 ReplicaSet • ScaledDown 15m
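For reference, the probe after the edit can be read as follows. This is only an annotated excerpt of the livenessProbe block from the Rollout above; the comments are explanatory notes added here and are not part of the original manifest.

```yaml
livenessProbe:
  tcpSocket:
    port: 8080              # changed from 80; the container only exposes port 80, so this TCP check fails
  initialDelaySeconds: 5    # first probe runs about 5 seconds after the container starts
  periodSeconds: 5          # subsequent probes run every 5 seconds
  timeoutSeconds: 1
  successThreshold: 1
  failureThreshold: 1       # a single failed probe is enough for the kubelet to restart the container
```

With failureThreshold set to 1, the first failed probe already triggers a restart, which matches the restarts:1 seen a few seconds after the Pods start in the output above.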
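As shown above, when the Liveness Probe fails, the Pods are created once and the traffic switch does happen, but the containers keep restarting for as long as the probe fails. The Rollout's Status also stays at Progressing, and to bring it back to Healthy you again need to redeploy a Rollout that works correctly (here, one with the Liveness Probe setting fixed).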
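The AnalysisTemplate used here was set up to run before the Liveness Probe and always succeeded once it ran, so it added little value. On the other hand, a better-designed Analysis (for example, checking connectivity to the Pods for a certain period after startup) might solve this problem. Also, with Blue/Green Deployment you can set prePromotionAnalysis to run an analysis before the switch. I think this could be used to run an Analysis before the switch and detect repeated container restarts caused by the Liveness Probe settings (or a misconfiguration of them).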
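The two ideas in the paragraph above could be combined roughly as in the sketch below. It was not tested as part of this article, so whether it reliably catches the restart loop is an open question. The AnalysisTemplate name connectivity-check, the preview Service rollout-preview-service-liveness-fail (which would have to be created separately), and the interval and count values are all assumptions made up for illustration.

```yaml
# Hypothetical AnalysisTemplate: calls the preview Service six times, ten seconds apart.
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: connectivity-check                 # hypothetical name
spec:
  metrics:
  - name: connectivity-check
    interval: 10s                          # assumption: one measurement every 10 seconds
    count: 6                               # assumption: six measurements in total
    failureLimit: 0                        # a single failed measurement fails the analysis
    provider:
      job:
        spec:
          backoffLimit: 0
          template:
            spec:
              restartPolicy: Never
              containers:
              - name: check
                image: alpine:3.8
                command: [sh, -c]
                # The Job exits non-zero when the preview Service cannot be reached within 2 seconds.
                args: ["wget -q -T 2 -O /dev/null http://rollout-preview-service-liveness-fail:8080/"]
---
# Fragment of the Rollout: only the strategy section differs from rollout-bg-test-liveness-fail.yml.
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: rollout-bg-test-liveness-fail
spec:
  # replicas, selector and template as in rollout-bg-test-liveness-fail.yml
  strategy:
    blueGreen:
      activeService: rollout-active-service-liveness-fail
      previewService: rollout-preview-service-liveness-fail   # hypothetical Service pointing at the new ReplicaSet
      prePromotionAnalysis:                                    # runs before the active Service is switched
        templates:
        - templateName: connectivity-check
```

The idea behind spreading the measurements over about a minute is that the Pods look healthy for the first few seconds, as the output above shows, so a single check right after startup would probably pass just like test-analysis did.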
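When a problem is detected through monitoring metrics such as Prometheus
I did not test this case, but Argo Rollouts lets an AnalysisTemplate use metrics such as those from Prometheus. With this, if a problem shows up in the application before or after the new version's deployment and release completes, the rollback can be performed automatically.
The official documentation gives the following example manifest. It defines the criterion for the Analysis to succeed or fail (successCondition) and the actual Analysis (provider.prometheus.query), and rolls back when the condition is not met.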
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  args:
  - name: service-name
  - name: prometheus-port
    value: 9090
  metrics:
  - name: success-rate
    successCondition: result[0] >= 0.95
    provider:
      prometheus:
        address: "http://prometheus.example.com:{{args.prometheus-port}}"
        query: |
          sum(irate(
            istio_requests_total{reporter="source",destination_service=~"{{args.service-name}}",response_code!~"5.*"}[5m]
          )) /
          sum(irate(
            istio_requests_total{reporter="source",destination_service=~"{{args.service-name}}"}[5m]
          ))
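The success-rate template declares a service-name argument without a default value, so the Rollout that references it has to supply one. A minimal sketch of that wiring, reusing the Blue/Green Rollout from earlier in this article, might look like the following. The argument value is a hypothetical example and would need to match the destination_service label in your own Istio metrics.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: rollout-bg-test-analysis
spec:
  # replicas, selector and template omitted from this fragment
  strategy:
    blueGreen:
      activeService: rollout-active-service-analysis
      postPromotionAnalysis:
        templates:
        - templateName: success-rate
        args:
        - name: service-name
          # Hypothetical value for illustration only.
          value: rollout-active-service-analysis.default.svc.cluster.local
```

The prometheus-port argument already has a default in the template, so it is left out here.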