Deploying Ceph with Rook on Kubernetes 1.28.2 (lab setup)

Rook supports Kubernetes v1.22 or later.

Rook version: 1.12.8

Kubernetes version: 1.28.2

The Ceph release deployed is Quincy.

Hostname | IP (NAT)       | OS         | New disk | System disk | Memory
master1  | 192.168.48.101 | CentOS 7.9 | 100G     | 100G        | 4G
master2  | 192.168.48.102 | CentOS 7.9 | -        | 100G        | 4G
master3  | 192.168.48.103 | CentOS 7.9 | -        | 100G        | 4G
node01   | 192.168.48.104 | CentOS 7.9 | 100G     | 100G        | 6G
node02   | 192.168.48.105 | CentOS 7.9 | 100G     | 100G        | 6G

I have five machines here. Properly, the three-node Ceph cluster should run on three worker nodes, but to keep this test simple it is deployed on master1, node01, and node02 only, so an extra physical disk has to be added to each of those three machines.

Note: before proceeding, decide whether the taint on the master nodes needs to be removed (it does here, since master1 will run Ceph components).

[How to remove the taint]
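A minimal sketch, assuming the default kubeadm control-plane taint; check the taint key your nodes actually carry first and adjust accordingly:

# show the taints on a node
kubectl describe node master1 | grep Taints

# remove the kubeadm control-plane taint from master1 (the trailing "-" removes it)
kubectl taint node master1 node-role.kubernetes.io/control-plane:NoSchedule-

# or remove it from all nodes at once
kubectl taint nodes --all node-role.kubernetes.io/control-plane-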

All of the following operations are performed on the master node.

Preparation

Clone the repository

git clone --single-branch --branch v1.12.8 https://github.com/rook/rook.git
cd rook/deploy/examples

Check the required images

[root@master1 examples]# cat operator.yaml | grep IMAGE:
# ROOK_CSI_CEPH_IMAGE: "quay.io/cephcsi/cephcsi:v3.9.0"
# ROOK_CSI_REGISTRAR_IMAGE: "registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.8.0"
# ROOK_CSI_RESIZER_IMAGE: "registry.k8s.io/sig-storage/csi-resizer:v1.8.0"
# ROOK_CSI_PROVISIONER_IMAGE: "registry.k8s.io/sig-storage/csi-provisioner:v3.5.0"
# ROOK_CSI_SNAPSHOTTER_IMAGE: "registry.k8s.io/sig-storage/csi-snapshotter:v6.2.2"
# ROOK_CSI_ATTACHER_IMAGE: "registry.k8s.io/sig-storage/csi-attacher:v4.3.0"
# ROOK_CSIADDONS_IMAGE: "quay.io/csiaddons/k8s-sidecar:v0.7.0"

[root@master1 examples]# cat operator.yaml | grep image:
image: rook/ceph:v1.12.8

These are almost all hosted on overseas registries. Here this is solved by building an image repository with Aliyun Container Registry plus GitHub (the images below were built and pushed to my own private registry).

sed -i 's/# ROOK_CSI_CEPH_IMAGE: "quay.io\/cephcsi\/cephcsi:v3.9.0"/ROOK_CSI_CEPH_IMAGE: "registry.cn-hangzhou.aliyuncs.com\/qianyios\/cephcsi:v3.9.0"/g' operator.yaml

sed -i 's/# ROOK_CSI_REGISTRAR_IMAGE: "registry.k8s.io\/sig-storage\/csi-node-driver-registrar:v2.8.0"/ROOK_CSI_REGISTRAR_IMAGE: "registry.cn-hangzhou.aliyuncs.com\/qianyios\/csi-node-driver-registrar:v2.8.0"/g' operator.yaml

sed -i 's/# ROOK_CSI_RESIZER_IMAGE: "registry.k8s.io\/sig-storage\/csi-resizer:v1.8.0"/ROOK_CSI_RESIZER_IMAGE: "registry.cn-hangzhou.aliyuncs.com\/qianyios\/csi-resizer:v1.8.0"/g' operator.yaml

sed -i 's/# ROOK_CSI_PROVISIONER_IMAGE: "registry.k8s.io\/sig-storage\/csi-provisioner:v3.5.0"/ROOK_CSI_PROVISIONER_IMAGE: "registry.cn-hangzhou.aliyuncs.com\/qianyios\/csi-provisioner:v3.5.0"/g' operator.yaml

sed -i 's/# ROOK_CSI_SNAPSHOTTER_IMAGE: "registry.k8s.io\/sig-storage\/csi-snapshotter:v6.2.2"/ROOK_CSI_SNAPSHOTTER_IMAGE: "registry.cn-hangzhou.aliyuncs.com\/qianyios\/csi-snapshotter:v6.2.2"/g' operator.yaml

sed -i 's/# ROOK_CSI_ATTACHER_IMAGE: "registry.k8s.io\/sig-storage\/csi-attacher:v4.3.0"/ROOK_CSI_ATTACHER_IMAGE: "registry.cn-hangzhou.aliyuncs.com\/qianyios\/csi-attacher:v4.3.0"/g' operator.yaml

sed -i 's/# ROOK_CSIADDONS_IMAGE: "quay.io\/csiaddons\/k8s-sidecar:v0.7.0"/ROOK_CSIADDONS_IMAGE: "registry.cn-hangzhou.aliyuncs.com\/qianyios\/k8s-sidecar:v0.7.0"/g' operator.yaml

sed -i 's/image: rook\/ceph:v1.12.8/image: registry.cn-hangzhou.aliyuncs.com\/qianyios\/ceph:v1.12.8/g' operator.yaml

Enable automatic disk discovery (useful for later expansion)

sed -i 's/ROOK_ENABLE_DISCOVERY_DAEMON: "false"/ROOK_ENABLE_DISCOVERY_DAEMON: "true"/' /root/rook/deploy/examples/operator.yaml

It is recommended to pull the images in advance

docker pull registry.cn-hangzhou.aliyuncs.com/qianyios/cephcsi:v3.9.0
docker pull registry.cn-hangzhou.aliyuncs.com/qianyios/csi-node-driver-registrar:v2.8.0
docker pull registry.cn-hangzhou.aliyuncs.com/qianyios/csi-resizer:v1.8.0
docker pull registry.cn-hangzhou.aliyuncs.com/qianyios/csi-provisioner:v3.5.0
docker pull registry.cn-hangzhou.aliyuncs.com/qianyios/csi-snapshotter:v6.2.2
docker pull registry.cn-hangzhou.aliyuncs.com/qianyios/csi-attacher:v4.3.0
docker pull registry.cn-hangzhou.aliyuncs.com/qianyios/k8s-sidecar:v0.7.0
docker pull registry.cn-hangzhou.aliyuncs.com/qianyios/ceph:v1.12.8
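If your nodes run containerd instead of Docker (the default on newer kubeadm installs), the same images can be pre-pulled with crictl or ctr; a sketch, to be run on every node and repeated for each image:

crictl pull registry.cn-hangzhou.aliyuncs.com/qianyios/ceph:v1.12.8
# or pull directly into containerd's k8s.io namespace:
ctr -n k8s.io images pull registry.cn-hangzhou.aliyuncs.com/qianyios/ceph:v1.12.8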

Install the Rook + Ceph cluster

Start the deployment

  1. Create the CRDs, common resources, and the operator
kubectl create -f crds.yaml -f common.yaml -f operator.yaml
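Before moving on, it is worth confirming that the operator and the disk-discovery daemonset are running; a quick check, assuming the label selectors Rook normally applies to these pods:

kubectl -n rook-ceph get pod -l app=rook-ceph-operator
kubectl -n rook-ceph get pod -l app=rook-discover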

  2. Create the Ceph cluster

Modify the configuration: wait for the operator and discover containers to start, then configure the OSD nodes.

First check your own disks (lsblk) and adjust the configuration file below to match your environment.
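For example, to confirm that the new disk on an OSD node is a clean raw device before handing it to Ceph; a minimal sketch, where /dev/sdb is a hypothetical device name, so substitute whatever lsblk actually shows on your node (sda in this lab):

lsblk -f                 # list disks and any existing filesystems/partitions
wipefs -a /dev/sdb       # hypothetical device: clear old signatures so Rook sees a raw disk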

# switch to a domestic image mirror
sed -i 's#image: quay.io/ceph/ceph:v17.2.6#image: registry.cn-hangzhou.aliyuncs.com/qianyios/ceph:v17.2.6#' cluster.yaml
vim cluster.yaml
-------------------------------------

- change the image
image: registry.cn-hangzhou.aliyuncs.com/qianyios/ceph:v17.2.6

- set to false so that not every disk on every node is used as an OSD
- enable deviceFilter
- configure config as needed
- non-raw disks are skipped automatically
storage: # cluster level storage configuration and selection
  useAllNodes: false
  useAllDevices: false
  deviceFilter:
  config:
  nodes:
    - name: "master1"
      deviceFilter: "sda"
    - name: "node01"
      deviceFilter: "sda"
    - name: "node02"
      deviceFilter: "^sd." # automatically match raw disks whose names start with sd

The three nodes here are the three machines mentioned at the beginning; adjust them to your own environment. Note that the names must be the Kubernetes node names, which you can check with kubectl get nodes.

Deploy the cluster

kubectl create -f cluster.yaml

Check the status

- watch pod creation progress in real time
kubectl get pod -n rook-ceph -w

- watch cluster creation progress in real time
kubectl get cephcluster -n rook-ceph rook-ceph -w

- detailed description
kubectl describe cephcluster -n rook-ceph rook-ceph

Install the Ceph client toolbox

- enter the working directory
cd rook/deploy/examples/

- check the required image
[root@master1 examples]# cat toolbox.yaml | grep image:
image: quay.io/ceph/ceph:v17.2.6
- switch to a domestic mirror

sed -i 's#image: quay.io/ceph/ceph:v17.2.6#image: registry.cn-hangzhou.aliyuncs.com/qianyios/ceph:v17.2.6#' toolbox.yaml

- create the toolbox
kubectl create -f toolbox.yaml -n rook-ceph

- check the pod
kubectl get pod -n rook-ceph -l app=rook-ceph-tools

- enter the pod
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash

- check the cluster status
ceph status

- check the OSD status
ceph osd status

- cluster space usage
ceph df

Expose the dashboard

cat > rook-dashboard.yaml << EOF

---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: rook-ceph-mgr
    ceph_daemon_id: a
    rook_cluster: rook-ceph
  name: rook-ceph-mgr-dashboard-np
  namespace: rook-ceph
spec:
  ports:
  - name: http-dashboard
    port: 8443
    protocol: TCP
    targetPort: 8443
    nodePort: 30700
  selector:
    app: rook-ceph-mgr
    ceph_daemon_id: a
  sessionAffinity: None
  type: NodePort
EOF

kubectl apply -f rook-dashboard.yaml

Retrieve the dashboard password

kubectl -n rook-ceph get secret rook-ceph-dashboard-password -o jsonpath="{['data']['password']}" | base64 --decode && echo

Qmu/!$ZvfQTAd-aCuHF+

Access the dashboard

https://192.168.48.200:30700

If the following warnings appear, fix them as described below; otherwise skip this part.

Clearing the HEALTH_WARN warnings

  • View the warning details
    • AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED: mons are allowing insecure global_id reclaim
    • MON_DISK_LOW: mons a,b,c are low on available space

Official solution: https://docs.ceph.com/en/latest/rados/operations/health-checks/

  • AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED
Method 1:
- enter the toolbox

kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash

ceph config set mon auth_allow_insecure_global_id_reclaim false

Method 2:
kubectl get configmap rook-config-override -n rook-ceph -o yaml
kubectl edit configmap rook-config-override -n rook-ceph -o yaml

# add the following under data:
config: |
  [global]
  mon clock drift allowed = 1

# delete the mon pods so they restart with the new configuration
kubectl -n rook-ceph delete pod $(kubectl -n rook-ceph get pods -o custom-columns=NAME:.metadata.name --no-headers| grep mon)
# output similar to:
pod "rook-ceph-mon-a-557d88c-6ksmg" deleted
pod "rook-ceph-mon-b-748dcc9b89-j8l24" deleted
pod "rook-ceph-mon-c-5d47c664-p855m" deleted
# finally check the health status
ceph -s

  • MON_DISK_LOW: the root partition is running low on space; freeing up disk space is enough (a cleanup sketch follows).
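A minimal cleanup sketch on each mon node; these are generic housekeeping commands, not Rook-specific, so adapt them to whatever is actually filling the disk:

df -h /                          # confirm root partition usage
journalctl --vacuum-size=200M    # trim old systemd journal logs
yum clean all                    # drop cached packages on CentOS 7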

Using Ceph storage

Three storage types

Storage type          | Characteristics                                                                                          | Use cases                 | Typical counterparts
Block storage (RBD)   | Fast; no shared access [ReadWriteOnce]                                                                   | VM disks                  | hard disks, RAID
File storage (CephFS) | Slower (goes through the OS filesystem layer before becoming block I/O); shared access [ReadWriteMany]   | file sharing              | FTP, NFS
Object storage        | Block-storage-like performance plus file-storage-like sharing; not directly accessible by the OS, only via application-level APIs | image and video storage   | OSS

Block storage (RBD)

Create the CephBlockPool and StorageClass

  • File path: /root/rook/deploy/examples/csi/rbd/storageclass.yaml
  • The CephBlockPool and the StorageClass are both defined in storageclass.yaml
  • Brief walkthrough of the configuration file:
cd /root/rook/deploy/examples/csi/rbd
[root@master1 rbd]# grep -vE '^\s*(#|$)' storageclass.yaml
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph # namespace:cluster
spec:
  failureDomain: host # host-level failure domain
  replicated:
    size: 3 # three replicas by default
    requireSafeReplicaSize: true
---
apiVersion: storage.k8s.io/v1
kind: StorageClass # a StorageClass is not namespaced
metadata:
  name: rook-ceph-block
provisioner: rook-ceph.rbd.csi.ceph.com # storage driver
parameters:
  clusterID: rook-ceph # namespace:cluster
  pool: replicapool # references the CephBlockPool above
  imageFormat: "2"
  imageFeatures: layering
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph # namespace:cluster
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph # namespace:cluster
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph # namespace:cluster
  csi.storage.k8s.io/fstype: ext4
allowVolumeExpansion: true # whether volume expansion is allowed
reclaimPolicy: Delete # PV reclaim policy
[root@master1 rbd]#

Create the CephBlockPool and StorageClass

kubectl create -f storageclass.yaml

Verify

- check the StorageClass
kubectl get sc

- check the CephBlockPool (it can also be viewed in the dashboard)
kubectl get cephblockpools -n rook-ceph

Block storage usage example

  • Single-replica Deployment + PersistentVolumeClaim
cat > nginx-deploy-rbd.yaml << "EOF"

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx-deploy-rbd
  name: nginx-deploy-rbd
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-deploy-rbd
  template:
    metadata:
      labels:
        app: nginx-deploy-rbd
    spec:
      containers:
      - image: registry.cn-hangzhou.aliyuncs.com/qianyios/nginx:latest
        name: nginx
        volumeMounts:
        - name: data
          mountPath: /usr/share/nginx/html
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: nginx-rbd-pvc

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nginx-rbd-pvc
spec:
  storageClassName: "rook-ceph-block" # references the StorageClass created earlier
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi

EOF
kubectl create -f nginx-deploy-rbd.yaml
kubectl exec -it nginx-deploy-rbd-7886bf6666-qhw74 -- bash
echo "hello,nginx-deploy-rbd" > /usr/share/nginx/html/index.html
exit

kubectl get pod -o wide | grep nginx

# delete after testing
kubectl delete -f nginx-deploy-rbd.yaml

  • Multi-replica StatefulSet + volumeClaimTemplates
cat > nginx-ss-rbd.yaml << "EOF"

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nginx-ss-rbd
spec:
  selector:
    matchLabels:
      app: nginx-ss-rbd
  serviceName: "nginx"
  replicas: 3
  template:
    metadata:
      labels:
        app: nginx-ss-rbd
    spec:
      containers:
      - name: nginx
        image: registry.cn-hangzhou.aliyuncs.com/qianyios/nginx:latest
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "rook-ceph-block" # references the StorageClass created earlier
      resources:
        requests:
          storage: 2Gi
EOF

Deploy

kubectl create -f nginx-ss-rbd.yaml

kubectl get pod -o wide | grep nginx-ss

kubectl exec -it nginx-ss-rbd-0 -- bash
echo "hello,nginx-ss-rbd-0" > /usr/share/nginx/html/index.html && exit

kubectl exec -it nginx-ss-rbd-1 -- bash
echo "hello,nginx-ss-rbd-1" > /usr/share/nginx/html/index.html && exit

kubectl exec -it nginx-ss-rbd-2 -- bash
echo "hello,nginx-ss-rbd-2" > /usr/share/nginx/html/index.html && exit

# delete after testing
kubectl delete -f nginx-ss-rbd.yaml

The PVCs created by volumeClaimTemplates may need to be deleted manually:
[root@master1 ~]# kubectl get pvc
NAME                 STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
www-nginx-ss-rbd-0   Bound    pvc-4a75f201-eec0-47fa-990c-353c52fe14f4   2Gi        RWO            rook-ceph-block   6m27s
www-nginx-ss-rbd-1   Bound    pvc-d5f7e29f-79e4-4d1e-bcbb-65ece15a8172   2Gi        RWO            rook-ceph-block   6m21s
www-nginx-ss-rbd-2   Bound    pvc-8cce06e9-dfe4-429d-ae44-878f8e4665e0   2Gi        RWO            rook-ceph-block   5m53s
[root@master1 ~]# kubectl delete pvc www-nginx-ss-rbd-0
persistentvolumeclaim "www-nginx-ss-rbd-0" deleted
[root@master1 ~]# kubectl delete pvc www-nginx-ss-rbd-1
persistentvolumeclaim "www-nginx-ss-rbd-1" deleted
[root@master1 ~]# kubectl delete pvc www-nginx-ss-rbd-2
persistentvolumeclaim "www-nginx-ss-rbd-2" deleted
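Alternatively, the three PVCs can be removed in one shot; a sketch, where the pattern simply matches the volumeClaimTemplate name plus the StatefulSet name used above:

kubectl get pvc --no-headers | awk '/^www-nginx-ss-rbd-/{print $1}' | xargs kubectl delete pvc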

Shared file storage (CephFS)

Deploy the MDS service

Creating a CephFS filesystem requires deploying the MDS service first; the MDS handles the filesystem's metadata.

  • File path: /root/rook/deploy/examples/filesystem.yaml

Configuration file walkthrough

cd /root/rook/deploy/examples
[root@master1 examples]# grep -vE '^\s*(#|$)' filesystem.yaml
apiVersion: ceph.rook.io/v1
kind: CephFilesystem
metadata:
  name: myfs
  namespace: rook-ceph # namespace:cluster
spec:
  metadataPool:
    replicated:
      size: 3 # number of metadata replicas
      requireSafeReplicaSize: true
    parameters:
      compression_mode:
        none
  dataPools:
    - name: replicated
      failureDomain: host
      replicated:
        size: 3 # number of data replicas
        requireSafeReplicaSize: true
      parameters:
        compression_mode:
          none
  preserveFilesystemOnDelete: true
  metadataServer:
    activeCount: 1 # number of MDS instances; defaults to 1, 3 recommended for production
    activeStandby: true
...... (omitted)

kubectl create -f filesystem.yaml
kubectl get pod -n rook-ceph | grep mds
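You can also confirm that the CephFilesystem resource itself has been created; a quick check:

kubectl -n rook-ceph get cephfilesystem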

- enter the toolbox pod
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash

- check the cluster status
ceph status

Configure the storage (StorageClass)

Configuration file: /root/rook/deploy/examples/csi/cephfs/storageclass.yaml

cd /root/rook/deploy/examples/csi/cephfs
kubectl apply -f storageclass.yaml
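For reference, the key fields in that file look roughly like the sketch below; verify against the actual file shipped with your Rook version. The StorageClass name rook-cephfs is what the later example relies on:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-cephfs
provisioner: rook-ceph.cephfs.csi.ceph.com   # CephFS CSI driver
parameters:
  clusterID: rook-ceph
  fsName: myfs                               # the CephFilesystem created above
  pool: myfs-replicated                      # its data pool
  # CSI secret parameters omitted for brevity
reclaimPolicy: Delete
allowVolumeExpansion: true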

Shared file storage usage example

cat > nginx-deploy-cephfs.yaml << "EOF"

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx-deploy-cephfs
  name: nginx-deploy-cephfs
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx-deploy-cephfs
  template:
    metadata:
      labels:
        app: nginx-deploy-cephfs
    spec:
      containers:
      - image: registry.cn-hangzhou.aliyuncs.com/qianyios/nginx:latest
        name: nginx
        volumeMounts:
        - name: data
          mountPath: /usr/share/nginx/html
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: nginx-cephfs-pvc
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nginx-cephfs-pvc
spec:
  storageClassName: "rook-cephfs"
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
EOF
kubectl apply -f nginx-deploy-cephfs.yaml
kubectl get pod -o wide | grep cephfs
kubectl exec -it nginx-deploy-cephfs-6dc8797866-4s564 -- bash
echo "hello cephfs" > /usr/share/nginx/html/index.html && exit

# delete after testing
kubectl delete -f nginx-deploy-cephfs.yaml

Running ceph commands directly from a K8s node

# install the EPEL repository
yum install epel-release -y

# add the Ceph repository
yum install https://mirrors.aliyun.com/ceph/rpm-octopus/el7/noarch/ceph-release-1-1.el7.noarch.rpm -y
yum list ceph-common --showduplicates | sort -r

# install the Ceph client
yum install ceph-common -y

Sync the Ceph authentication files

[root@master1 ~]# kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash

bash-4.4$ cat /etc/ceph/ceph.conf
[global]
mon_host = 10.97.121.57:6789,10.104.235.210:6789,10.96.136.90:6789

[client.admin]
keyring = /etc/ceph/keyring

bash-4.4$ cat /etc/ceph/keyring
[client.admin]
key = AQC241lltDbVKBAANrzwgqZd1A2eY+8h1A+BOg==
bash-4.4$


Note these two files: copy their contents, then leave the pod with exit.


Create these two files directly on master1 (master1 here being the node where I want the ceph client to be usable).

cat > /etc/ceph/ceph.conf << "EOF"
[global]
mon_host = 10.97.121.57:6789,10.104.235.210:6789,10.96.136.90:6789

[client.admin]
keyring = /etc/ceph/keyring
EOF

cat > /etc/ceph/keyring << "EOF"
[client.admin]
key = AQC241lltDbVKBAANrzwgqZd1A2eY+8h1A+BOg==
EOF

Once they are in place, you can run ceph commands directly on the node.
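For example, the same commands used earlier inside the toolbox now work directly on the host:

ceph -s
ceph osd status
ceph df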

Delete PVCs, StorageClasses, and the corresponding storage resources

- delete PVCs and PVs as needed
kubectl get pvc -n [namespace] --no-headers | awk '{print $1};' | xargs kubectl delete pvc -n [namespace]
kubectl get pv | grep Released | awk '{print $1};' | xargs kubectl delete pv

- delete the block pool and its StorageClass
kubectl delete -n rook-ceph cephblockpool replicapool
kubectl delete storageclass rook-ceph-block
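If the CephFS resources created earlier also need to be cleaned up, the counterpart commands would look roughly like this (a sketch; note that preserveFilesystemOnDelete: true in filesystem.yaml keeps the underlying pools and data when the CR is deleted):

- delete the CephFS StorageClass and filesystem
kubectl delete storageclass rook-cephfs
kubectl -n rook-ceph delete cephfilesystem myfs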
