1. Introduction
Deploy the etcdctl command-line tool in advance. ETCD is one of the most critical services in a Kubernetes cluster: it stores all of the cluster's state, so any disaster or loss of etcd data puts the whole cluster at risk. That is why backing up and restoring etcd matters so much.
1.1 About ETCD
etcd is a distributed, consistent key-value store designed for shared configuration and service discovery. It is an open-source project started by CoreOS and released under the Apache license.
1.2 ETCD use cases
Configuration management
Service registration and discovery
Leader election
Application scheduling
Distributed queues
Distributed locks
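All of these scenarios build on the same small set of primitives: key-value reads and writes, watches, and lease-backed locks. A minimal sketch with etcdctl follows; the key names are made up for illustration, and against the cluster below you would also pass the same --cacert/--cert/--key/--endpoints flags shown in the next section:

# Configuration management: write a key, then read it back
etcdctl put /config/app/log_level debug
etcdctl get /config/app/log_level --print-value-only

# Service discovery: watch a prefix and print every change under it
etcdctl watch /services/ --prefix

# Distributed lock / leader election: hold a lock while a command runs
etcdctl lock /locks/my-job echo "lock acquired"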
1.3 Common etcd queries
Check cluster health:
[root@master1 ~]# ETCDCTL_API=3 etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --endpoints=https://192.168.93.101:2379,https://192.168.93.102:2379,https://192.168.93.103:2379 endpoint health
https://192.168.93.103:2379 is healthy: successfully committed proposal: took = 13.608542ms
https://192.168.93.101:2379 is healthy: successfully committed proposal: took = 17.198897ms
https://192.168.93.102:2379 is healthy: successfully committed proposal: took = 13.452481ms
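Alongside endpoint health, the endpoint status subcommand is worth knowing: it reports the leader, raft term, and database size per member. It takes the same TLS and endpoint flags as above (omitted here for brevity):

etcdctl endpoint status --write-out=table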
Get the etcd version:

[root@master1 ~]# ETCDCTL_API=3 etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --endpoints=https://192.168.93.101:2379,https://192.168.93.102:2379,https://192.168.93.103:2379 version
etcdctl version: 3.5.7
API version: 3.5

List all keys stored in etcd:
[root@master1 ~]# ETCDCTL_API=3 etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --endpoints=https://192.168.93.101:2379,https://192.168.93.102:2379,https://192.168.93.103:2379 get / --prefix --keys-only
/registry/events/kube-system/calico-kube-controllers-64cc74d646-qm6c6.184663ddf4613d2f
/registry/events/kube-system/calico-kube-controllers-64cc74d646-qm6c6.184663de1115a851
/registry/events/kube-system/calico-kube-controllers-64cc74d646-qm6c6.184663de1a5b1e63
/registry/events/kube-system/calico-kube-controllers-64cc74d646.184126d891114002
/registry/events/kube-system/calico-kube-controllers.184126d840475495
/registry/events/kube-system/calico-kube-controllers.184126d88c02d65a
/registry/events/kube-system/calico-node-cf4l5.184126d88e78821f
/registry/events/kube-system/calico-node-cf4l5.184126d8aae08fc2
/registry/events/kube-system/calico-node-cf4l5.184126da9cf04d86
......
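Two related queries that often come in handy (same TLS and endpoint flags as above, omitted for brevity). Note that the API server stores values as protobuf, so raw values are mostly not human-readable; the key /registry/namespaces/default used below is a standard key present in any Kubernetes cluster:

# Count all keys (--keys-only separates keys with blank lines, so drop those first)
etcdctl get / --prefix --keys-only | grep -v '^$' | wc -l

# Fetch the raw (protobuf-encoded) value of a single key
etcdctl get /registry/namespaces/default --print-value-only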
2. Backup

Note: the etcdctl (and etcdutl) commands differ a little between ETCD versions, but the overall workflow is much the same. Taking the backup from a single node is sufficient.
2.1 Before the backup
Create a test Deployment up front so that, after restoring etcd, we can check whether it comes back.
[root@master1 ~]# kubectl create -n default deployment data-test-pod --image=nginx:latest
[root@master1 ~]# kubectl get pod
NAME READY STATUS RESTARTS AGE
data-test-pod-644545fb68-ndbhs 1/1 Running 0 3m19s

2.2 Back up via the command line
Running the backup on the master1 node alone is sufficient.
# Create the backup directory
[root@master1 ~]# mkdir -p /data/etcd_backup_dir/
[root@master1 ~]# ETCDCTL_API=3 etcdctl \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
--endpoints=https://127.0.0.1:2379 \
snapshot save /data/etcd_backup_dir/etcd-snapshot-`date +%Y%m%d`.db
{"level":"info","ts":"2025-06-06T16:42:22.965+0800","caller":"snapshot/v3_snapshot.go:65","msg":"created temporary db file","path":"/data/etcd_backup_dir/etcd-snapshot-20250606.db.part"}
{"level":"info","ts":"2025-06-06T16:42:22.975+0800","logger":"client","caller":"v3@v3.5.7/maintenance.go:212","msg":"opened snapshot stream; downloading"}
{"level":"info","ts":"2025-06-06T16:42:22.975+0800","caller":"snapshot/v3_snapshot.go:73","msg":"fetching snapshot","endpoint":"https://127.0.0.1:2379"}
{"level":"info","ts":"2025-06-06T16:42:23.166+0800","logger":"client","caller":"v3@v3.5.7/maintenance.go:220","msg":"completed snapshot read; closing"}
{"level":"info","ts":"2025-06-06T16:42:23.173+0800","caller":"snapshot/v3_snapshot.go:88","msg":"fetched snapshot","endpoint":"https://127.0.0.1:2379","size":"5.4 MB","took":"now"}
{"level":"info","ts":"2025-06-06T16:42:23.173+0800","caller":"snapshot/v3_snapshot.go:97","msg":"saved","path":"/data/etcd_backup_dir/etcd-snapshot-20250606.db"}
Snapshot saved at /data/etcd_backup_dir/etcd-snapshot-20250606.db
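In practice you will want snapshots taken on a schedule, not by hand. Below is a minimal sketch of a backup script with simple retention; the script path, timestamp format, and 7-day retention window are assumptions for illustration, not part of the original setup:

#!/usr/bin/env bash
# /usr/local/bin/etcd-backup.sh -- hypothetical backup helper
set -euo pipefail

BACKUP_DIR=/data/etcd_backup_dir
mkdir -p "$BACKUP_DIR"

# Same certificates and local endpoint as the manual backup above
ETCDCTL_API=3 etcdctl \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  --endpoints=https://127.0.0.1:2379 \
  snapshot save "$BACKUP_DIR/etcd-snapshot-$(date +%Y%m%d-%H%M%S).db"

# Prune snapshots older than 7 days (retention window is an assumption)
find "$BACKUP_DIR" -name 'etcd-snapshot-*.db' -mtime +7 -delete

Scheduled via cron, for example a daily run at 02:00: 0 2 * * * /usr/local/bin/etcd-backup.sh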
2.3 Verify the snapshot

[root@master1 ~]# ETCDCTL_API=3 etcdctl --write-out=table snapshot status /data/etcd_backup_dir/etcd-snapshot-20250606.db
Deprecated: Use `etcdutl snapshot status` instead.
+----------+----------+------------+------------+
| HASH | REVISION | TOTAL KEYS | TOTAL SIZE |
+----------+----------+------------+------------+
| 687877d3 | 10435 | 2061 | 5.4 MB |
+----------+----------+------------+------------+
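As the deprecation notice above suggests, newer releases move this check to etcdutl. If etcdutl (shipped with etcd 3.5 and later) is available on the node, the equivalent command would be:

etcdutl snapshot status /data/etcd_backup_dir/etcd-snapshot-20250606.db --write-out=table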
2.4 Delete the test Deployment

[root@master1 ~]# kubectl delete deployments.apps data-test-pod

2.5 Count all Pods
The command prints 25 lines: the header line plus 24 running Pods. Counting the Pod we just deleted, that makes 25 Pods, which is the number we expect to see back after the restore.
[root@master1 ~]# kubectl get pod -A | wc -l
25

3. Restore
3.1 Stop the control-plane components
Stop the related components on all masters; kubelet shuts a static Pod down as soon as its manifest is moved out of /etc/kubernetes/manifests. Run this on every master node.
mkdir ~/backup_yaml
mv /etc/kubernetes/manifests/* ~/backup_yaml
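kubelet needs a few seconds to notice the manifests are gone. Before touching the data directory, it is worth confirming the static Pods have actually stopped; assuming a CRI runtime with crictl installed, a quick check looks like this:

# Should eventually return nothing once the control-plane containers are stopped
crictl ps | grep -E 'kube-apiserver|kube-controller-manager|kube-scheduler|etcd'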
3.2 Back up the current ETCD data directory

Run on every master node; this moves the old data out of the way so the restore can write a fresh member directory.
mv /var/lib/etcd/member ~/member_bak

3.3 Copy the ETCD snapshot to the other masters
Copy the snapshot from master1 to master2 and master3.
# Create the /data/etcd_backup_dir directory on the target node first
[root@master1 ~]# scp /data/etcd_backup_dir/etcd-snapshot-20250606.db root@192.168.93.102:/data/etcd_backup_dir
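And likewise to master3, which is 192.168.93.103 in this cluster (assuming /data/etcd_backup_dir has already been created there as well):

[root@master1 ~]# scp /data/etcd_backup_dir/etcd-snapshot-20250606.db root@192.168.93.103:/data/etcd_backup_dir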
3.4 Restore the backup

# Run on master1
ETCDCTL_API=3 etcdctl snapshot restore /data/etcd_backup_dir/etcd-snapshot-20250606.db \
--name etcd-0 \
--initial-cluster "etcd-0=https://192.168.93.101:2380,etcd-1=https://192.168.93.102:2380,etcd-2=https://192.168.93.103:2380" \
--initial-cluster-token etcd-cluster \
--initial-advertise-peer-urls https://192.168.93.101:2380 \
--data-dir=/var/lib/etcd/

# Run on master2
ETCDCTL_API=3 etcdctl snapshot restore /data/etcd_backup_dir/etcd-snapshot-20250606.db \
--name etcd-1 \
--initial-cluster "etcd-0=https://192.168.93.101:2380,etcd-1=https://192.168.93.102:2380,etcd-2=https://192.168.93.103:2380" \
--initial-cluster-token etcd-cluster \
--initial-advertise-peer-urls https://192.168.93.102:2380 \
--data-dir=/var/lib/etcd/

# Run on master3
ETCDCTL_API=3 etcdctl snapshot restore /data/etcd_backup_dir/etcd-snapshot-20250606.db \
--name etcd-2 \
--initial-cluster "etcd-0=https://192.168.93.101:2380,etcd-1=https://192.168.93.102:2380,etcd-2=https://192.168.93.103:2380" \
--initial-cluster-token etcd-cluster \
--initial-advertise-peer-urls https://192.168.93.103:2380 \
--data-dir=/var/lib/etcd/
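Each restore rebuilds a fresh member directory from the snapshot (on etcd 3.5+, etcdutl snapshot restore is the preferred replacement for this deprecated etcdctl subcommand). Before moving the manifests back, a quick sanity check, not part of the original write-up, confirms the data directory was created as expected:

ls /var/lib/etcd/member/
# Expect the snap and wal subdirectories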
3.5 Restore the YAML manifests

Run on every master node.
mv ~/backup_yaml/* /etc/kubernetes/manifests/

3.6 Restart kubelet
Run on every master node.
systemctl daemon-reload
systemctl restart kubelet.service
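The static Pods take a minute or two to come back. A small loop like the one below (same certificates as the earlier commands) waits until the local etcd endpoint reports healthy before moving on:

until ETCDCTL_API=3 etcdctl \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  --endpoints=https://127.0.0.1:2379 endpoint health; do
  sleep 5
done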
3.7 Check the ETCD status again

All three endpoints report healthy again.
[root@master1 ~]# ETCDCTL_API=3 etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --endpoints=https://192.168.93.101:2379,https://192.168.93.102:2379,https://192.168.93.103:2379 endpoint health
https://192.168.93.102:2379 is healthy: successfully committed proposal: took = 9.637032ms
https://192.168.93.103:2379 is healthy: successfully committed proposal: took = 10.035671ms
https://192.168.93.101:2379 is healthy: successfully committed proposal: took = 10.201263ms

3.8 Count the Pods again
Excluding the header line, this matches the Pod count we expected.
[root@master1 ~]# kubectl get pod -A | wc -l
26

Check whether the deleted Pod has been restored:
[root@master1 ~]# kubectl get pod
NAME READY STATUS RESTARTS AGE
data-test-pod-644545fb68-ndbhs 1/1 Running 0 26m