Introduction
One of the features added in Rook v1.3 is the Rook-Ceph Cleanup Policy. When a CephCluster is deleted, it deletes the data under the directory specified in dataDirHostPath.
Before Rook v1.3, deleting a cluster (after a round of testing, for example) meant following steps like the ones on this page, partly by hand or with home-grown scripts. dataDirHostPath in particular holds the Ceph cluster's configuration and log data, and if it is left behind when a new cluster is created, the new cluster inherits the old cluster's settings and fails to come up correctly.
The newly added Cleanup Policy addresses this problem and should make cluster deletion a little easier.
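For reference, the manual teardown essentially boiled down to removing that directory yourself on every node after deleting the CephCluster CR, roughly like this (a sketch; /var/lib/rook is the default path and the one used later in this article):

# Run on each node that hosted Ceph daemons, after the CephCluster has been deleted
rm -rf /var/lib/rook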
Overview of the Cleanup Policy
First, here is a summary of the Cleanup Policy as described on this page.
※ Reference link:
GitHub - rook/rook: Ceph cluster clean up policy
Use case
The intended use case is a user intentionally uninstalling a Rook-Ceph cluster.
Confirming that the user has allowed the cleanup
The user must explicitly enable this behavior before the dataDirHostPath directory is deleted, because deleting dataDirHostPath cannot be undone if the user removes the CR by mistake.
Concretely, adding the spec.cleanupPolicy setting to the CephCluster CRD stops the operator from starting any further orchestration of the cluster and deletes the data under the directory specified in dataDirHostPath.
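In YAML terms the trigger looks like the excerpt below (the full CR and the valid values for this field appear later in this article):

apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cleanupPolicy:
    # "" (the default) leaves cleanup disabled; "yes-really-destroy-data" enables it
    deleteDataDirOnHosts: "yes-really-destroy-data"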
How the operator cleans up the cluster
When the cluster is deleted, the operator removes the data under dataDirHostPath as follows:
- When a deletionTimeStamp is present on the ceph cluster, the operator starts the cleanup
- Before cleaning up, the operator checks the cleanup settings
- It identifies the nodes on which ceph daemons are running
- On each node it waits until the ceph daemons have been destroyed, because a daemon panics if dataDirHostPath is deleted while the daemon is still running
- It creates a batch job that runs on each of those nodes
- The job performs the actual deletion of the dataDirHostPath contents (see the Job spec below)
The Job launched for the cleanup is defined as follows.
Cleanup Job Spec
apiVersion: batch/v1
kind: Job
metadata:
  name: rook-ceph-cleanup-<node-name>
spec:
  template:
    spec:
      containers:
      - name: rook-ceph-cleanup-<node-name>
        securityContext:
          privileged: true
        image: <rook-image>
        env:
        # if ROOK_DATA_DIR_HOST_PATH is available, then delete the dataDirHostPath
        - name: ROOK_DATA_DIR_HOST_PATH
          value: <dataDirHostPath>
        args: []string{"ceph", "clean"}
        volumeMounts:
        - name: cleanup-volume
          # data dir host path that needs to be cleaned up.
          mountPath: <dataDirHostPath>
      volume:
      - name: cleanup-volume
        hostPath:
          #directory location on the host
          path: <dataDirHostPath>
      restartPolicy: Never
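As the spec shows, the cleanup container runs privileged and mounts the dataDirHostPath directory from the node via hostPath, which is what allows it to delete files directly on the host.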
Using the Cleanup Policy
Now let's actually use the Cleanup Policy, following this page in the official documentation.
※ Reference link:
Rook Docs v1.3 - Ceph Cluster CRD
Test environment
The test environment is as follows.
- Kubernetes:
  - version: v1.17.4
  - master: 1 node
  - worker: 1 node
- Rook:
  - version: v1.3
Testing the Cleanup Policy
First, build a Rook-Ceph cluster. This time I used a host-based cluster. The state after the build is as follows.
[root@rookmaster ceph]# kubectl get pods -n rook-ceph
NAME                                                   READY   STATUS        RESTARTS   AGE
csi-cephfsplugin-hc7nq                                 3/3     Running       0          3m14s
csi-cephfsplugin-provisioner-674847b584-scb8s          5/5     Running       0          3m14s
csi-cephfsplugin-provisioner-674847b584-xdhgd          5/5     Running       0          3m14s
csi-rbdplugin-9lsmt                                    3/3     Running       0          3m15s
csi-rbdplugin-provisioner-5777f9cf96-9ls9r             6/6     Running       0          3m15s
csi-rbdplugin-provisioner-5777f9cf96-pgswq             6/6     Running       0          3m15s
rook-ceph-crashcollector-rookworker-697d74cc96-xxvss   1/1     Terminating   0          89s
rook-ceph-crashcollector-rookworker-cb898d58-5kh9m     1/1     Running       0          29s
rook-ceph-mgr-a-6c9b758679-ts69c                       1/1     Running       0          89s
rook-ceph-mon-a-7977674f5f-f52hg                       1/1     Running       0          99s
rook-ceph-operator-599765ff49-fn858                    1/1     Running       0          8m54s
rook-ceph-osd-0-6d79874c88-2cn62                       1/1     Running       0          29s
rook-ceph-osd-prepare-rookworker-526t9                 0/1     Completed     0          68s
rook-discover-mvp9m                                    1/1     Running       0          8m37s

[root@rookmaster ceph]# kubectl get cephcluster.ceph.rook.io -n rook-ceph
NAME        DATADIRHOSTPATH   MONCOUNT   AGE     PHASE   MESSAGE                        HEALTH
rook-ceph   /var/lib/rook     1          5m14s   Ready   Cluster created successfully   HEALTH_WARN

# Create the Toolbox
[root@rookmaster ceph]# kubectl apply -f toolbox.yaml
deployment.apps/rook-ceph-tools created

[root@rookmaster ceph]# kubectl exec -it -n rook-ceph $(kubectl get pods -n rook-ceph -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- ceph -s
  cluster:
    id:     58fb05e3-de72-435f-af4d-74e774d25df6
    health: HEALTH_WARN
            OSD count 1 < osd_pool_default_size 3

  services:
    mon: 1 daemons, quorum a (age 5m)
    mgr: a(active, since 4m)
    osd: 1 osds: 1 up (since 4m), 1 in (since 4m)

  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage:   1.0 GiB used, 63 GiB / 64 GiB avail
    pgs:

[root@rookmaster ceph]#
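Note that HEALTH_WARN is expected in this setup: as the ceph -s output shows, there is only one OSD while osd_pool_default_size is 3.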
The cluster-test.yaml used to create the cluster is shown below; dataDirHostPath is set to /var/lib/rook.
cluster-test.yaml
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    # The container image used to launch the Ceph daemon pods (mon, mgr, osd, mds, rgw).
    # v13 is mimic, v14 is nautilus, and v15 is octopus.
    # RECOMMENDATION: In production, use a specific version tag instead of the general v14 flag, which pulls the latest release and could result in different
    # versions running within the cluster. See tags available at https://hub.docker.com/r/ceph/ceph/tags/.
    # If you want to be more precise, you can always use a timestamp tag such ceph/ceph:v14.2.5-20190917
    # This tag might not contain a new Ceph version, just security fixes from the underlying operating system, which will reduce vulnerabilities
    image: ceph/ceph:v14.2.9
    # Whether to allow unsupported versions of Ceph. Currently mimic and nautilus are supported, with the recommendation to upgrade to nautilus.
    # Octopus is the version allowed when this is set to true.
    # Do not set to true in production.
    allowUnsupported: false
  # The path on the host where configuration files will be persisted. Must be specified.
  # Important: if you reinstall the cluster, make sure you delete this directory from each host or else the mons will fail to start on the new cluster.
  # In Minikube, the '/data' directory is configured to persist across reboots. Use "/data/rook" in Minikube environment.
  dataDirHostPath: /var/lib/rook
  # Whether or not upgrade should continue even if a check fails
  # This means Ceph's status could be degraded and we don't recommend upgrading but you might decide otherwise
  # Use at your OWN risk
  # To understand Rook's upgrade process of Ceph, read https://rook.io/docs/rook/master/ceph-upgrade.html#ceph-version-upgrades
  skipUpgradeChecks: false
  # Whether or not continue if PGs are not clean during an upgrade
  continueUpgradeAfterChecksEvenIfNotHealthy: false
  # set the amount of mons to be started
  mon:
    count: 1
    allowMultiplePerNode: false
  # mgr:
  #   modules:
  #   # Several modules should not need to be included in this list. The "dashboard" and "monitoring" modules
  #   # are already enabled by other settings in the cluster CR and the "rook" module is always enabled.
  #   - name: pg_autoscaler
  #     enabled: true
  # enable the ceph dashboard for viewing cluster status
  dashboard:
    enabled: true
    # serve the dashboard under a subpath (useful when you are accessing the dashboard via a reverse proxy)
    # urlPrefix: /ceph-dashboard
    # serve the dashboard at the given port.
    # port: 8443
    # serve the dashboard using SSL
    ssl: true
  # enable prometheus alerting for cluster
  monitoring:
    # requires Prometheus to be pre-installed
    enabled: false
    # namespace to deploy prometheusRule in. If empty, namespace of the cluster will be used.
    # Recommended:
    # If you have a single rook-ceph cluster, set the rulesNamespace to the same namespace as the cluster or keep it empty.
    # If you have multiple rook-ceph clusters in the same k8s cluster, choose the same namespace (ideally, namespace with prometheus
    # deployed) to set rulesNamespace for all the clusters. Otherwise, you will get duplicate alerts with multiple alert definitions.
    rulesNamespace: rook-ceph
  network:
    # enable host networking
    #provider: host
    # EXPERIMENTAL: enable the Multus network provider
    #provider: multus
    #selectors:
      # The selector keys are required to be `public` and `cluster`.
      # Based on the configuration, the operator will do the following:
      #   1. if only the `public` selector key is specified both public_network and cluster_network Ceph settings will listen on that interface
      #   2. if both `public` and `cluster` selector keys are specified the first one will point to 'public_network' flag and the second one to 'cluster_network'
      #
      # In order to work, each selector value must match a NetworkAttachmentDefinition object in Multus
      #
      #public: public-conf --> NetworkAttachmentDefinition object name in Multus
      #cluster: cluster-conf --> NetworkAttachmentDefinition object name in Multus
  rbdMirroring:
    # The number of daemons that will perform the rbd mirroring.
    # rbd mirroring must be configured with "rbd mirror" from the rook toolbox.
    workers: 0
  # enable the crash collector for ceph daemon crash collection
  crashCollector:
    disable: false
  cleanupPolicy:
    # cleanupPolicy should only be added to the cluster when the cluster is about to be deleted.
    # After any field of the cleanup policy is set, Rook will stop configuring the cluster as if the cluster is about
    # to be destroyed in order to prevent these settings from being deployed unintentionally.
    # To signify that automatic deletion is desired, use the value "yes-really-destroy-data". Only this and an empty
    # string are valid values for this field.
    deleteDataDirOnHosts: ""
  # To control where various services will be scheduled by kubernetes, use the placement configuration sections below.
  # The example under 'all' would have all services scheduled on kubernetes nodes labeled with 'role=storage-node' and
  # tolerate taints with a key of 'storage-node'.
#  placement:
#    all:
#      nodeAffinity:
#        requiredDuringSchedulingIgnoredDuringExecution:
#          nodeSelectorTerms:
#          - matchExpressions:
#            - key: role
#              operator: In
#              values:
#              - storage-node
#      podAffinity:
#      podAntiAffinity:
#      topologySpreadConstraints:
#      tolerations:
#      - key: storage-node
#        operator: Exists
# The above placement information can also be specified for mon, osd, and mgr components
#    mon:
#    Monitor deployments may contain an anti-affinity rule for avoiding monitor
#    collocation on the same node. This is a required rule when host network is used
#    or when AllowMultiplePerNode is false. Otherwise this anti-affinity rule is a
#    preferred rule with weight: 50.
#    osd:
#    mgr:
  annotations:
#    all:
#    mon:
#    osd:
# If no mgr annotations are set, prometheus scrape annotations will be set by default.
#    mgr:
  resources:
# The requests and limits set here, allow the mgr pod to use half of one CPU core and 1 gigabyte of memory
#    mgr:
#      limits:
#        cpu: "500m"
#        memory: "1024Mi"
#      requests:
#        cpu: "500m"
#        memory: "1024Mi"
# The above example requests/limits can also be added to the mon and osd components
#    mon:
#    osd:
#    prepareosd:
#    crashcollector:
  # The option to automatically remove OSDs that are out and are safe to destroy.
  removeOSDsIfOutAndSafeToRemove: false
#  priorityClassNames:
#    all: rook-ceph-default-priority-class
#    mon: rook-ceph-mon-priority-class
#    osd: rook-ceph-osd-priority-class
#    mgr: rook-ceph-mgr-priority-class
  storage: # cluster level storage configuration and selection
    useAllNodes: true
    useAllDevices: false
    devices:
    - name: "sdc"
    #deviceFilter:
    config:
      # metadataDevice: "md0" # specify a non-rotational storage so ceph-volume will use it as block db device of bluestore.
      # databaseSizeMB: "1024" # uncomment if the disks are smaller than 100 GB
      # journalSizeMB: "1024"  # uncomment if the disks are 20 GB or smaller
      # osdsPerDevice: "1" # this value can be overridden at the node or device level
      # encryptedDevice: "true" # the default value for this option is "false"
# Individual nodes and their config can be specified as well, but 'useAllNodes' above must be set to false. Then, only the named
# nodes below will be used as storage resources. Each node's 'name' field should match their 'kubernetes.io/hostname' label.
#    nodes:
#    - name: "172.17.4.201"
#      devices: # specific devices to use for storage can be specified for each node
#      - name: "sdb"
#      - name: "nvme01" # multiple osds can be created on high performance devices
#        config:
#          osdsPerDevice: "5"
#      - name: "/dev/disk/by-id/ata-ST4000DM004-XXXX" # devices can be specified using full udev paths
#        config: # configuration can be specified at the node level which overrides the cluster level config
#          storeType: filestore
#    - name: "172.17.4.301"
#      deviceFilter: "^sd."
  # The section for configuring management of daemon disruptions during upgrade or fencing.
  disruptionManagement:
    # If true, the operator will create and manage PodDisruptionBudgets for OSD, Mon, RGW, and MDS daemons. OSD PDBs are managed dynamically
    # via the strategy outlined in the [design](https://github.com/rook/rook/blob/master/design/ceph/ceph-managed-disruptionbudgets.md). The operator will
    # block eviction of OSDs by default and unblock them safely when drains are detected.
    managePodBudgets: false
    # A duration in minutes that determines how long an entire failureDomain like `region/zone/host` will be held in `noout` (in addition to the
    # default DOWN/OUT interval) when it is draining. This is only relevant when `managePodBudgets` is `true`. The default value is `30` minutes.
    osdMaintenanceTimeout: 30
    # If true, the operator will create and manage MachineDisruptionBudgets to ensure OSDs are only fenced when the cluster is healthy.
    # Only available on OpenShift.
    manageMachineDisruptionBudgets: false
    # Namespace in which to watch for the MachineDisruptionBudgets.
    machineDisruptionBudgetNamespace: openshift-machine-api
Editing the CephCluster
Next, edit the CephCluster settings to enable the cleanupPolicy feature. The setting itself is very simple: adding yes-really-destroy-data to spec.cleanupPolicy.deleteDataDirOnHosts is all it takes.
# State before the change
[root@rookmaster ceph]# kubectl describe cephcluster.ceph.rook.io rook-ceph -n rook-ceph
Name:         rook-ceph
Namespace:    rook-ceph
Labels:       <none>
(snip)
Spec:
  Ceph Version:
    Image:  ceph/ceph:v14.2.9
  Cleanup Policy:
    Delete Data Dir On Hosts:
  Crash Collector:
    Disable:  false
(snip)
[root@rookmaster ceph]#

# Make the change
[root@rookmaster ceph]# kubectl edit cephcluster.ceph.rook.io -n rook-ceph
spec:
  cephVersion:
    image: ceph/ceph:v14.2.9
  cleanupPolicy:
    deleteDataDirOnHosts: "yes-really-destroy-data"   # add this line
  crashCollector:
    disable: false
cephcluster.ceph.rook.io/rook-ceph edited
[root@rookmaster ceph]#

# State after the change
[root@rookmaster ceph]# kubectl describe cephcluster.ceph.rook.io rook-ceph -n rook-ceph
Name:         rook-ceph
Namespace:    rook-ceph
Labels:       <none>
(snip)
Spec:
  Ceph Version:
    Image:  ceph/ceph:v14.2.9
  Cleanup Policy:
    Delete Data Dir On Hosts:  yes-really-destroy-data
  Crash Collector:
    Disable:  false
(snip)
[root@rookmaster ceph]#
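Incidentally, if you prefer a non-interactive change over kubectl edit, the same field can be set with kubectl patch, for example (a sketch using the same resource and field as above):

kubectl -n rook-ceph patch cephcluster.ceph.rook.io rook-ceph --type merge \
  -p '{"spec":{"cleanupPolicy":{"deleteDataDirOnHosts":"yes-really-destroy-data"}}}'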
Deleting the cluster
With the cleanupPolicy setting enabled, delete the cluster. As shown below, a cluster-cleanup-job starts and deletes the data under dataDirHostPath.
# Before deleting the cluster
[root@rookworker ~]# ll /var/lib/rook/
total 0
drwxr-xr-x 3 root root 18 Apr 19 08:55 mon-a
drwxr-xr-x 4 root root 82 Apr 19 08:56 rook-ceph
[root@rookworker ~]#

# Delete the cluster
[root@rookmaster ceph]# kubectl delete -f cluster-test.yaml
cephcluster.ceph.rook.io "rook-ceph" deleted

[root@rookmaster ceph]# kubectl get pods -n rook-ceph -w
NAME                                                 READY   STATUS        RESTARTS   AGE
csi-cephfsplugin-hc7nq                               3/3     Running       0          11m
csi-cephfsplugin-provisioner-674847b584-scb8s        5/5     Running       0          11m
csi-cephfsplugin-provisioner-674847b584-xdhgd        5/5     Running       0          11m
csi-rbdplugin-9lsmt                                  3/3     Running       0          11m
csi-rbdplugin-provisioner-5777f9cf96-9ls9r           6/6     Running       0          11m
csi-rbdplugin-provisioner-5777f9cf96-pgswq           6/6     Running       0          11m
rook-ceph-crashcollector-rookworker-cb898d58-5kh9m   1/1     Terminating   0          8m57s
rook-ceph-mgr-a-6c9b758679-ts69c                     0/1     Terminating   0          9m57s
rook-ceph-mon-a-7977674f5f-f52hg                     0/1     Terminating   0          10m
rook-ceph-operator-599765ff49-fn858                  1/1     Running       0          17m
rook-ceph-tools-877c4d966-7ptvf                      1/1     Running       0          6m19s
rook-discover-mvp9m                                  1/1     Running       0          17m
rook-ceph-mon-a-7977674f5f-f52hg                     0/1     Terminating   0          10m
rook-ceph-mon-a-7977674f5f-f52hg                     0/1     Terminating   0          10m
rook-ceph-mgr-a-6c9b758679-ts69c                     0/1     Terminating   0          9m59s
rook-ceph-mgr-a-6c9b758679-ts69c                     0/1     Terminating   0          9m59s
# the cleanup job starts
cluster-cleanup-job-rookworker-f2nm2                 0/1     Pending             0    0s
cluster-cleanup-job-rookworker-f2nm2                 0/1     Pending             0    0s
cluster-cleanup-job-rookworker-f2nm2                 0/1     ContainerCreating   0    0s
cluster-cleanup-job-rookworker-f2nm2                 0/1     Completed           0    1s
rook-ceph-crashcollector-rookworker-cb898d58-5kh9m   0/1     Terminating   0          9m23s
rook-ceph-crashcollector-rookworker-cb898d58-5kh9m   0/1     Terminating   0          9m29s
rook-ceph-crashcollector-rookworker-cb898d58-5kh9m   0/1     Terminating   0          9m29s
^C[root@rookmaster ceph]#
[root@rookmaster ceph]#

# After deleting the cluster
[root@rookworker ~]# ll /var/lib/rook
total 0
[root@rookworker ~]#

[root@rookmaster ceph]# kubectl get pods -n rook-ceph
NAME                                            READY   STATUS      RESTARTS   AGE
cluster-cleanup-job-rookworker-f2nm2            0/1     Completed   0          2m13s
csi-cephfsplugin-hc7nq                          3/3     Running     0          13m
csi-cephfsplugin-provisioner-674847b584-scb8s   5/5     Running     0          13m
csi-cephfsplugin-provisioner-674847b584-xdhgd   5/5     Running     0          13m
csi-rbdplugin-9lsmt                             3/3     Running     0          14m
csi-rbdplugin-provisioner-5777f9cf96-9ls9r      6/6     Running     0          14m
csi-rbdplugin-provisioner-5777f9cf96-pgswq      6/6     Running     0          14m
rook-ceph-operator-599765ff49-fn858             1/1     Running     0          19m
rook-ceph-tools-877c4d966-7ptvf                 1/1     Running     0          8m36s
rook-discover-mvp9m                             1/1     Running     0          19m
[root@rookmaster ceph]#
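If you want to confirm what the cleanup actually did, the job's logs can be checked as well. A minimal sketch, assuming the Job is named after the cluster-cleanup-job-<node-name> pattern seen above:

kubectl -n rook-ceph get jobs
kubectl -n rook-ceph logs job/cluster-cleanup-job-rookworker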