[Cloud Native] Kubernetes Multi-Master Etcd Backup and Restore

Author: Administrator  Published: 2025-11-09

Resource List

OS                Spec    Hostname    IP
openEuler 22.03   2C4G    master1     192.168.93.101
openEuler 22.03   2C4G    master2     192.168.93.102
openEuler 22.03   2C4G    master3     192.168.93.103
openEuler 22.03   2C4G    node1       192.168.93.104
openEuler 22.03   2C4G    nginx1      192.168.93.105
openEuler 22.03   2C4G    nginx2      192.168.93.106

1. Preface

  • Deploy the etcdctl command-line tool in advance.

  • ETCD is one of the most critical services in a k8s cluster: it stores all of the cluster's data. If disaster strikes or etcd's data is lost, the recovery of the whole cluster is at stake, which is why backing up and restoring etcd is so important.

1.1 ETCD Overview

  • ETCD is a distributed, consistent key-value store for shared configuration and service discovery. It is an open-source project initiated by CoreOS and licensed under Apache 2.0.

1.2 ETCD Use Cases

  • Configuration management

  • Service registration and discovery

  • Leader election

  • Application scheduling

  • Distributed queues

  • Distributed locks (this and leader election are sketched right after this list)
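
  • etcdctl itself can demonstrate two of these use cases, leader election and distributed locks. A minimal sketch; the lock and election names are illustrative, and against this cluster you would add the same --cacert/--cert/--key/--endpoints flags as in section 1.3:

# Distributed lock: run the given command while holding "backup-lock";
# a concurrent caller blocks until the lock is released.
etcdctl lock backup-lock echo "I hold the lock"

# Leader election: campaign in election "my-app" with proposal "master1";
# the command blocks for as long as it holds leadership.
etcdctl elect my-app master1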

1.3 Common ETCD Query Operations

  • Check cluster health

[root@master1 ~]# ETCDCTL_API=3 etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --endpoints=https://192.168.93.101:2379,https://192.168.93.102:2379,https://192.168.93.103:2379 endpoint health
https://192.168.93.103:2379 is healthy: successfully committed proposal: took = 13.608542ms
https://192.168.93.101:2379 is healthy: successfully committed proposal: took = 17.198897ms
https://192.168.93.102:2379 is healthy: successfully committed proposal: took = 13.452481ms
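
  • As a complementary check, endpoint status prints one row per member with the leader flag, DB size, and raft term/index (same flags as above):

[root@master1 ~]# ETCDCTL_API=3 etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --endpoints=https://192.168.93.101:2379,https://192.168.93.102:2379,https://192.168.93.103:2379 endpoint status --write-out=table
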
  • Get the etcd version

[root@master1 ~]# ETCDCTL_API=3 etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --endpoints=https://192.168.93.101:2379,https://192.168.93.102:2379,https://192.168.93.103:2379 version
etcdctl version: 3.5.7
API version: 3.5
  • List all keys stored in etcd

[root@master1 ~]# ETCDCTL_API=3 etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --endpoints=https://192.168.93.101:2379,https://192.168.93.102:2379,https://192.168.93.103:2379 get / --prefix --keys-only
/registry/events/kube-system/calico-kube-controllers-64cc74d646-qm6c6.184663ddf4613d2f

/registry/events/kube-system/calico-kube-controllers-64cc74d646-qm6c6.184663de1115a851

/registry/events/kube-system/calico-kube-controllers-64cc74d646-qm6c6.184663de1a5b1e63

/registry/events/kube-system/calico-kube-controllers-64cc74d646.184126d891114002

/registry/events/kube-system/calico-kube-controllers.184126d840475495

/registry/events/kube-system/calico-kube-controllers.184126d88c02d65a

/registry/events/kube-system/calico-node-cf4l5.184126d88e78821f

/registry/events/kube-system/calico-node-cf4l5.184126d8aae08fc2

/registry/events/kube-system/calico-node-cf4l5.184126da9cf04d86
......
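
  • The values under /registry/ are stored as binary protobuf, so fetching a single key prints mostly unreadable bytes. Counting keys, however, is a quick sanity check you can compare before and after a restore, for example:

[root@master1 ~]# ETCDCTL_API=3 etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --endpoints=https://192.168.93.101:2379,https://192.168.93.102:2379,https://192.168.93.103:2379 get / --prefix --keys-only | grep -c '^/registry'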

2. Backup

  • Note: the etcdctl (etcdutl) commands differ somewhat between ETCD versions, though they are broadly similar. Only one node needs to be backed up each time.

2.1 Pre-Backup Steps

  • Create a Deployment ahead of time so that, after restoring etcd, you can verify whether the Deployment comes back.

[root@master1 ~]# kubectl create -n default deployment data-test-pod --image=nginx:latest
[root@master1 ~]# kubectl get pod 
NAME                             READY   STATUS    RESTARTS   AGE
data-test-pod-644545fb68-ndbhs   1/1     Running   0          3m19s

2.2 Back Up from the Command Line

  • Running the backup on the master1 node is sufficient.

# Create the backup directory
[root@master1 ~]# mkdir -p /data/etcd_backup_dir/
[root@master1 ~]# ETCDCTL_API=3 etcdctl \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
--endpoints=https://127.0.0.1:2379 \
snapshot save /data/etcd_backup_dir/etcd-snapshot-`date +%Y%m%d`.db
{"level":"info","ts":"2025-06-06T16:42:22.965+0800","caller":"snapshot/v3_snapshot.go:65","msg":"created temporary db file","path":"/data/etcd_backup_dir/etcd-snapshot-20250606.db.part"}
{"level":"info","ts":"2025-06-06T16:42:22.975+0800","logger":"client","caller":"v3@v3.5.7/maintenance.go:212","msg":"opened snapshot stream; downloading"}
{"level":"info","ts":"2025-06-06T16:42:22.975+0800","caller":"snapshot/v3_snapshot.go:73","msg":"fetching snapshot","endpoint":"https://127.0.0.1:2379"}
{"level":"info","ts":"2025-06-06T16:42:23.166+0800","logger":"client","caller":"v3@v3.5.7/maintenance.go:220","msg":"completed snapshot read; closing"}
{"level":"info","ts":"2025-06-06T16:42:23.173+0800","caller":"snapshot/v3_snapshot.go:88","msg":"fetched snapshot","endpoint":"https://127.0.0.1:2379","size":"5.4 MB","took":"now"}
{"level":"info","ts":"2025-06-06T16:42:23.173+0800","caller":"snapshot/v3_snapshot.go:97","msg":"saved","path":"/data/etcd_backup_dir/etcd-snapshot-20250606.db"}
Snapshot saved at /data/etcd_backup_dir/etcd-snapshot-20250606.db
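
  • In practice you would wrap this in a script and rotate old snapshots. A minimal sketch; the script path and the 7-day retention are illustrative assumptions:

#!/bin/bash
# Hypothetical /usr/local/bin/etcd-backup.sh: take a snapshot, then
# prune old ones.
BACKUP_DIR=/data/etcd_backup_dir
mkdir -p "${BACKUP_DIR}"
ETCDCTL_API=3 etcdctl \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  --endpoints=https://127.0.0.1:2379 \
  snapshot save "${BACKUP_DIR}/etcd-snapshot-$(date +%Y%m%d).db"
# Drop snapshots older than 7 days; adjust to your retention policy.
find "${BACKUP_DIR}" -name 'etcd-snapshot-*.db' -mtime +7 -delete

  • Scheduled from cron it might look like: 0 2 * * * /usr/local/bin/etcd-backup.sh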

2.3 Verify the Snapshot

[root@master1 ~]# ETCDCTL_API=3 etcdctl --write-out=table snapshot status /data/etcd_backup_dir/etcd-snapshot-20250606.db 
Deprecated: Use `etcdutl snapshot status` instead.
+----------+----------+------------+------------+
|   HASH   | REVISION | TOTAL KEYS | TOTAL SIZE |
+----------+----------+------------+------------+
| 687877d3 |    10435 |       2061 |     5.4 MB |
+----------+----------+------------+------------+
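
  • As the deprecation notice suggests, on etcd 3.5+ the same check can be run with etcdutl, which reads the snapshot file directly and needs no TLS flags:

[root@master1 ~]# etcdutl snapshot status /data/etcd_backup_dir/etcd-snapshot-20250606.db --write-out=table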

2.4 Delete the Test Deployment

[root@master1 ~]# kubectl delete deployments.apps data-test-pod

2.5 Count All Pods

  • The command reports 25 lines; one of them is the header, so there are actually 24 Pods, and adding back the Pod we just deleted makes exactly 25.

[root@master1 ~]# kubectl get pod -A | wc -l
25

3. Restore

3.1 Stop the Control-Plane Components

  • Stop the relevant components on every master by moving the static-Pod manifests out of /etc/kubernetes/manifests; kubelet watches this directory and stops the corresponding Pods once the files disappear. Perform this on all master nodes.

mkdir ~/backup_yaml
mv /etc/kubernetes/manifests/* ~/backup_yaml
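
  • Before proceeding you can confirm the control-plane containers are actually gone, for example with crictl (assuming a containerd runtime); no output means they have stopped:

crictl ps | grep -E 'kube-apiserver|etcd'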

3.2 Back Up the ETCD Data Directory

  • Run on all master nodes.

mv /var/lib/etcd/member ~/member_bak

3.3 Copy the ETCD Snapshot

  • Copy the snapshot from master1 to master2 and master3.

# The /data/etcd_backup_dir directory just needs to exist on the targets

[root@master1 ~]# scp /data/etcd_backup_dir/etcd-snapshot-20250606.db root@192.168.93.102:/data/etcd_backup_dir
[root@master1 ~]# scp /data/etcd_backup_dir/etcd-snapshot-20250606.db root@192.168.93.103:/data/etcd_backup_dir

3.4 Restore the Snapshot

# Run on the master1 node
ETCDCTL_API=3 etcdctl snapshot restore /data/etcd_backup_dir/etcd-snapshot-20250606.db \
  --name etcd-0 \
  --initial-cluster "etcd-0=https://192.168.93.101:2380,etcd-1=https://192.168.93.102:2380,etcd-2=https://192.168.93.103:2380" \
  --initial-cluster-token etcd-cluster \
  --initial-advertise-peer-urls https://192.168.93.101:2380 \
  --data-dir=/var/lib/etcd/
# Run on the master2 node
ETCDCTL_API=3 etcdctl snapshot restore /data/etcd_backup_dir/etcd-snapshot-20250606.db \
  --name etcd-1 \
  --initial-cluster "etcd-0=https://192.168.93.101:2380,etcd-1=https://192.168.93.102:2380,etcd-2=https://192.168.93.103:2380" \
  --initial-cluster-token etcd-cluster \
  --initial-advertise-peer-urls https://192.168.93.102:2380 \
  --data-dir=/var/lib/etcd/
# Run on the master3 node
ETCDCTL_API=3 etcdctl snapshot restore /data/etcd_backup_dir/etcd-snapshot-20250606.db \
  --name etcd-2 \
  --initial-cluster "etcd-0=https://192.168.93.101:2380,etcd-1=https://192.168.93.102:2380,etcd-2=https://192.168.93.103:2380" \
  --initial-cluster-token etcd-cluster \
  --initial-advertise-peer-urls https://192.168.93.103:2380 \
  --data-dir=/var/lib/etcd/
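
  • The three invocations differ only in --name and the advertised peer URL, so they collapse naturally into one per-node script. A sketch, with NODE_NAME and NODE_IP as illustrative variables you set on each master:

#!/bin/bash
# Run once on each master, e.g.:
#   NODE_NAME=etcd-0 NODE_IP=192.168.93.101 ./restore.sh
SNAPSHOT=/data/etcd_backup_dir/etcd-snapshot-20250606.db
ETCDCTL_API=3 etcdctl snapshot restore "${SNAPSHOT}" \
  --name "${NODE_NAME}" \
  --initial-cluster "etcd-0=https://192.168.93.101:2380,etcd-1=https://192.168.93.102:2380,etcd-2=https://192.168.93.103:2380" \
  --initial-cluster-token etcd-cluster \
  --initial-advertise-peer-urls "https://${NODE_IP}:2380" \
  --data-dir=/var/lib/etcd/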

3.5 Restore the Manifest Files

  • Perform on all master nodes.

mv ~/backup_yaml/* /etc/kubernetes/manifests/

3.6 Restart Kubelet

  • Perform on all master nodes.

systemctl daemon-reload
systemctl restart kubelet.service
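
  • Kubelet recreates the static Pods once it sees the manifests again, but the API server may take a minute or two to come back. A simple wait loop (illustrative):

# Poll until kube-apiserver responds, then inspect the control-plane Pods.
until kubectl get nodes >/dev/null 2>&1; do
  echo "waiting for kube-apiserver..."
  sleep 5
done
kubectl get pod -n kube-system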

3.7 Check the ETCD Status Again

  • As shown below, all members are still healthy.

[root@master1 ~]# ETCDCTL_API=3 etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --endpoints=https://192.168.93.101:2379,https://192.168.93.102:2379,https://192.168.93.103:2379 endpoint health
https://192.168.93.102:2379 is healthy: successfully committed proposal: took = 9.637032ms
https://192.168.93.103:2379 is healthy: successfully committed proposal: took = 10.035671ms
https://192.168.93.101:2379 is healthy: successfully committed proposal: took = 10.201263ms

3.8 Check the Pod Count

  • Excluding the header line, the count matches the 25 Pods we expected.

[root@master1 ~]# kubectl get pod -A | wc -l
26
  • Check whether the deleted Pod has been restored

[root@master1 ~]# kubectl get pod 
NAME                             READY   STATUS    RESTARTS   AGE
data-test-pod-644545fb68-ndbhs   1/1     Running   0          26m

