k8s Cluster Deployment: Problems Encountered and Their Solutions

kubeadm init error (1)

[root@k8s-master ~]# kubeadm init --pod-network-cidr=10.220.180.0/16 --image-repository registry.aliyuncs.com/google_containers --kubernetes-version v1.28.2
[init] Using Kubernetes version: v1.28.2
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR FileContent--proc-sys-net-ipv4-ip_forward]: /proc/sys/net/ipv4/ip_forward contents are not set to 1
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher

Solution:

[root@k8s-master ~]# sysctl -w net.ipv4.ip_forward=1
net.ipv4.ip_forward = 1
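
Note that sysctl -w only changes the running kernel, so the setting is lost on reboot. To keep it across reboots it can also be written to a sysctl config file (the file name below is just an example):

echo "net.ipv4.ip_forward = 1" >> /etc/sysctl.d/k8s.conf
sysctl --system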

kubeadm init error (2)

[kubelet-check] Initial timeout of 40s passed.

Unfortunately, an error has occurred:
timed out waiting for the condition

This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'

Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all running Kubernetes containers by using crictl:
- 'crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock ps -a | grep kube | grep -v pause'
Once you have found the failing container, you can inspect its logs with:
- 'crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher

Solution: the control-plane containers never come up because the default sandbox (pause) image cannot be pulled, so point containerd's sandbox_image at the Aliyun mirror, restart containerd, then reset and re-run kubeadm init:

[root@k8s-master ~]# containerd config default > /etc/containerd/config.toml
[root@k8s-master ~]# vim /etc/containerd/config.toml 
.......
61 sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.6"
.......
[root@k8s-master ~]# systemctl daemon-reload
[root@k8s-master ~]# systemctl restart containerd.service
[root@k8s-master ~]# kubeadm reset
[reset] Reading configuration from the cluster...
[reset] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
W0202 16:21:47.717170 27518 reset.go:120] [reset] Unable to fetch the kubeadm-config ConfigMap from cluster: failed to get config map: Get "https://10.220.180.120:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config?timeout=10s": dial tcp 10.220.180.120:6443: connect: connection refused
W0202 16:21:47.717345 27518 preflight.go:56] [reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] Are you sure you want to proceed? [y/N]: y
[preflight] Running pre-flight checks
W0202 16:21:49.468038 27518 removeetcdmember.go:106] [reset] No kubeadm config, using etcd pod spec to get data directory
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Deleting contents of directories: [/etc/kubernetes/manifests /var/lib/kubelet /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]

The reset process does not clean CNI configuration. To do so, you must remove /etc/cni/net.d

The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually by using the "iptables" command.

If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.

The reset process does not clean your kubeconfig files and you must remove them manually.
Please, check the contents of the $HOME/.kube/config file.
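
If you prefer not to edit the file by hand (the sandbox_image setting may sit on a different line number depending on the containerd version), the same change can be scripted; a minimal sketch:

sed -i 's#sandbox_image = ".*"#sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.6"#' /etc/containerd/config.toml
grep sandbox_image /etc/containerd/config.toml
systemctl restart containerd.service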

Image pull error

[root@k8s-master ~]# kubeadm config images pull --image-repository registry.aliyuncs.com/google_containers
I0202 14:51:46.672477 24062 version.go:256] remote version is much newer: v1.29.1; falling back to: stable-1.28
failed to pull image "registry.aliyuncs.com/google_containers/kube-apiserver:v1.28.6": output: time="2024-02-02T14:51:48+08:00" level=fatal msg="validate service connection: CRI v1 image API is not implemented for endpoint \"unix:///var/run/containerd/containerd.sock\": rpc error: code = Unimplemented desc = unknown service runtime.v1.ImageService"
, error: exit status 1
To see the stack trace of this error execute with --v=5 or higher

Solution: the "unknown service runtime.v1.ImageService" error means containerd is not serving the CRI API on its socket; rewrite /etc/containerd/config.toml so the CRI plugin is enabled and restart containerd:

[root@k8s-master ~]# cat > /etc/containerd/config.toml <<EOF
[plugins."io.containerd.grpc.v1.cri"]
systemd_cgroup = true
EOF
[root@k8s-master ~]# systemctl restart containerd
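
A common root cause is that the config.toml shipped with some containerd packages contains disabled_plugins = ["cri"], which switches the CRI service off entirely. After the restart, the endpoint can be checked with crictl (assuming crictl is installed); a quick sketch:

crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock info | head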

kubectl get all command error

[root@k8s-node2 ~]# kubectl get all
E0131 11:16:59.362595 15243 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
E0131 11:16:59.363283 15243 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
E0131 11:16:59.367220 15243 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
E0131 11:16:59.369763 15243 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
E0131 11:16:59.370695 15243 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
The connection to the server localhost:8080 was refused - did you specify the right host or port?

Solution:
Copy the /etc/kubernetes/admin.conf file from the master node to the node:

[root@k8s-master ~]# scp /etc/kubernetes/admin.conf k8s-node2:/etc/kubernetes/
[root@k8s-node2 ~]# mkdir ~/.kube
[root@k8s-node2 kubernetes]# cp /etc/kubernetes/admin.conf ~/.kube/config
[root@k8s-node2 kubernetes]# kubectl get all
NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   27m
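
Alternatively, instead of keeping a copy under ~/.kube, kubectl can be pointed at the copied file via the KUBECONFIG variable (the path assumes admin.conf was copied to /etc/kubernetes on the node); a sketch:

export KUBECONFIG=/etc/kubernetes/admin.conf
echo 'export KUBECONFIG=/etc/kubernetes/admin.conf' >> ~/.bash_profile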

kubelet does not start after installing the k8s packages on a node

[root@k8s-node2 ~]# systemctl status kubelet.service 
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /usr/lib/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: activating (auto-restart) (Result: exit-code) since Wed 2024-01-31 09:23:40 CST; 2s ago
Docs: https://kubernetes.io/docs/
Process: 9155 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=1/FAILURE)
Main PID: 9155 (code=exited, status=1/FAILURE)

This error can be ignored. Once the control plane has been initialized and the kubeadm join 10.220.180.120:6443 --token ... command has been run on the node, the kubelet comes up on its own.
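
If you want to confirm that the kubelet has settled after the join, its state and recent logs can be checked; a minimal sketch:

systemctl is-active kubelet
journalctl -u kubelet -n 20 --no-pager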

Error when joining a node to the cluster

[root@k8s-node2 ~]# kubeadm join 10.220.180.120:6443 --token 7m1j99.44fjdyw4u7jadxo7 --discovery-token-ca-cert-hash sha256:ace63743c8f6da2784f4646f3aacd12c735c8cf1066042d48fa8a11ea470d35c
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR CRI]: container runtime is not running: output: time="2024-02-02T16:38:03+08:00" level=fatal msg="validate service connection: CRI v1 runtime API is not implemented for endpoint \"unix:///var/run/containerd/containerd.sock\": rpc error: code = Unimplemented desc = unknown service runtime.v1.RuntimeService"
, error: exit status 1
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher

Solution: the node is hitting the same CRI error, so regenerate a full default containerd config and then adjust the runtime type:

[root@k8s-node2 ~]# containerd config default | sudo tee /etc/containerd/config.toml
disabled_plugins = []
imports = []
oom_score = 0
plugin_dir = ""
required_plugins = []
root = "/var/lib/containerd"
state = "/run/containerd"
temp = ""
version = 2
........

[root@k8s-node2 ~]# sed -i '96s/runtime_type.*/runtime_type = "io.containerd.runtime.v1.linux"/' /etc/containerd/config.toml
[root@k8s-node2 ~]# cat -n /etc/containerd/config.toml
1 disabled_plugins = []
2 imports = []
3 oom_score = 0
4 plugin_dir = ""
5 required_plugins = []
6 root = "/var/lib/containerd"
7 state = "/run/containerd"
8 temp = ""
9 version = 2
.......
96 runtime_type = "io.containerd.runtime.v1.linux"
.......
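
containerd usually needs a restart before the regenerated config is picked up, and the same runtime_type edit can also be made by matching the value instead of relying on line 96 (this assumes the default value io.containerd.runc.v2); a sketch:

sed -i 's#runtime_type = "io.containerd.runc.v2"#runtime_type = "io.containerd.runtime.v1.linux"#' /etc/containerd/config.toml
systemctl restart containerd.service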

[root@k8s-node2 ~]# kubeadm join 10.220.180.120:6443 --token 7m1j99.44fjdyw4u7jadxo7 --discovery-token-ca-cert-hash sha256:ace63743c8f6da2784f4646f3aacd12c735c8cf1066042d48fa8a11ea470d35c
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR FileAvailable--etc-kubernetes-kubelet.conf]: /etc/kubernetes/kubelet.conf already exists
[ERROR Port-10250]: Port 10250 is in use
[ERROR FileAvailable--etc-kubernetes-pki-ca.crt]: /etc/kubernetes/pki/ca.crt already exists
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher

[root@k8s-node2 ~]# kubeadm reset
[reset] Reading configuration from the cluster...
[reset] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
W0202 17:12:58.600028 27133 reset.go:120] [reset] Unable to fetch the kubeadm-config ConfigMap from cluster: failed to getAPIEndpoint: could not retrieve API endpoints for node "k8s-node2" using pod annotations: timed out waiting for the condition
W0202 17:12:58.600674 27133 preflight.go:56] [reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] Are you sure you want to proceed? [y/N]: y
[preflight] Running pre-flight checks
W0202 17:13:04.108817 27133 removeetcdmember.go:106] [reset] No kubeadm config, using etcd pod spec to get data directory
[reset] Deleted contents of the etcd data directory: /var/lib/etcd
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Deleting contents of directories: [/etc/kubernetes/manifests /var/lib/kubelet /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]

The reset process does not clean CNI configuration. To do so, you must remove /etc/cni/net.d

The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually by using the "iptables" command.

If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.

The reset process does not clean your kubeconfig files and you must remove them manually.
Please, check the contents of the $HOME/.kube/config file.
[root@k8s-node2 ~]# kubeadm join 10.220.180.120:6443 --token 7m1j99.44fjdyw4u7jadxo7 --discovery-token-ca-cert-hash sha256:ace63743c8f6da2784f4646f3aacd12c735c8cf1066042d48fa8a11ea470d35c
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
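
If the original bootstrap token has expired by the time the node is re-joined (kubeadm tokens are only valid for 24 hours by default), a fresh join command can be printed on the master:

kubeadm token create --print-join-command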

coredns-* pods stuck in ContainerCreating:

[root@k8s-master ~]# kubectl get pod -n kube-system
NAME                                 READY   STATUS              RESTARTS   AGE
coredns-66f779496c-b4nqp             0/1     ContainerCreating   0          119m
coredns-66f779496c-bx6sk             0/1     ContainerCreating   0          119m
etcd-k8s-master                      1/1     Running             0          119m
kube-apiserver-k8s-master            1/1     Running             0          119m
kube-controller-manager-k8s-master   1/1     Running             1          119m
kube-proxy-btqzp                     0/1     ContainerCreating   0          54m
kube-proxy-hb6n8                     1/1     Running             0          119m
kube-proxy-nlxlh                     0/1     ContainerCreating   0          88m
kube-scheduler-k8s-master            1/1     Running             0          119m

Solution: create the following file on the nodes (every node needs it):

[root@k8s-master ~]# mkdir /run/flannel/
[root@k8s-master ~]# cat > /run/flannel/subnet.env <<EOF
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
EOF
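
This file is normally written by the flannel DaemonSet itself once it is running, and /run is tmpfs, so the manual copy is only a stopgap until flannel is healthy. To push the same file to the other nodes, something along these lines could be used (node names taken from this cluster):

for node in k8s-node1 k8s-node2; do
  ssh $node 'mkdir -p /run/flannel'
  scp /run/flannel/subnet.env $node:/run/flannel/subnet.env
done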

kube-proxy-* pods stuck in ContainerCreating:

[root@k8s-master ~]# kubectl get pod -n kube-system
NAME                                 READY   STATUS              RESTARTS   AGE
coredns-66f779496c-b4nqp             1/1     Running             0          40h
coredns-66f779496c-bx6sk             1/1     Running             0          40h
etcd-k8s-master                      1/1     Running             0          40h
kube-apiserver-k8s-master            1/1     Running             0          40h
kube-controller-manager-k8s-master   1/1     Running             1          40h
kube-proxy-btqzp                     0/1     ContainerCreating   0          39h
kube-proxy-hb6n8                     1/1     Running             0          40h
kube-proxy-nlxlh                     0/1     ContainerCreating   0          40h
kube-scheduler-k8s-master            1/1     Running             0          40h
Check the pod's events:
[root@k8s-node2 ~]# kubectl describe pods -n kube-system coredns-66f779496c-b4nqp
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 31m (x9 over 71m) default-scheduler 0/3 nodes are available: 3 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling..
Normal Scheduled 26m default-scheduler Successfully assigned kube-system/coredns-66f779496c-b4nqp to k8s-master
Warning FailedCreatePodSandBox 26m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "54708a65adf2300d4f1c558d0b1e2e8de45a7acbbabb74c9b6fda33f24152590": plugin type="flannel" failed (add): loadFlannelSubnetEnv failed: open /run/flannel/subnet.env: no such file or directory
Warning FailedCreatePodSandBox 26m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "ca3a27de8ba123d98d86c8423410e8bf24ce061ab95107aca26d2f682fc28c89": plugin type="flannel" failed (add): loadFlannelSubnetEnv failed: open /run/flannel/subnet.env: no such file or directory
Warning FailedCreatePodSandBox 26m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "21f816da0e1c6840692bbdd8d00099fe8eb716c93957e283db2af345754c2c3d": plugin type="flannel" failed (add): loadFlannelSubnetEnv failed: open /run/flannel/subnet.env: no such file or directory
Warning FailedCreatePodSandBox 25m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "5289937d16c50093045c10caf1956c1b1085f1a962ea26dd3886e25e5056e747": plugin type="flannel" failed (add): loadFlannelSubnetEnv failed: open /run/flannel/subnet.env: no such file or directory
Warning FailedCreatePodSandBox 25m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "5692369b4bf6b804cc02052b40b888209cad25d92e9e347e9178f7c5bff0334f": plugin type="flannel" failed (add): loadFlannelSubnetEnv failed: open /run/flannel/subnet.env: no such file or directory

Solution: the worker nodes are still running their original containerd configuration, so copy the corrected config.toml from the master to both nodes and restart containerd:

[root@k8s-master ~]# scp /etc/containerd/config.toml 10.220.180.130:/etc/containerd/
root@10.220.180.130's password: 
config.toml 100% 7065 4.1MB/s 00:00 
[root@k8s-master ~]# scp /etc/containerd/config.toml 10.220.180.131:/etc/containerd/
root@10.220.180.131's password: 
config.toml 100% 7065 4.7MB/s 00:00

[root@k8s-node1 ~]# systemctl restart containerd.service
[root@k8s-node2 ~]# systemctl restart containerd.service
[root@k8s-master ~]# kubectl get pod -n kube-system
NAME                                 READY   STATUS    RESTARTS   AGE
coredns-66f779496c-b4nqp             1/1     Running   0          41h
coredns-66f779496c-bx6sk             1/1     Running   0          41h
etcd-k8s-master                      1/1     Running   0          41h
kube-apiserver-k8s-master            1/1     Running   0          41h
kube-controller-manager-k8s-master   1/1     Running   1          41h
kube-proxy-btqzp                     1/1     Running   0          40h
kube-proxy-hb6n8                     1/1     Running   0          41h
kube-proxy-nlxlh                     1/1     Running   0          40h
kube-scheduler-k8s-master            1/1     Running   0          41h
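
Should a pod get stuck in ContainerCreating again, a quick first check is to confirm that the copied config actually landed on the nodes and that the sandbox image points at the mirror; a sketch:

ssh k8s-node1 'grep sandbox_image /etc/containerd/config.toml'
ssh k8s-node2 'grep sandbox_image /etc/containerd/config.toml'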

kube-flannel pods in CrashLoopBackOff:

[root@k8s-master ~]# kubectl get pods -o wide -A
NAMESPACE      NAME                                 READY   STATUS             RESTARTS       AGE   IP               NODE         NOMINATED NODE   READINESS GATES
kube-flannel   kube-flannel-ds-9vv2z                0/1     CrashLoopBackOff   6 (5m2s ago)   10m   10.220.180.120   k8s-master   <none>           <none>
kube-flannel   kube-flannel-ds-szbdj                0/1     CrashLoopBackOff   6 (5m ago)     10m   10.220.180.130   k8s-node1    <none>           <none>
kube-flannel   kube-flannel-ds-t7dxb                0/1     CrashLoopBackOff   6 (5m8s ago)   10m   10.220.180.131   k8s-node2    <none>           <none>
kube-system    coredns-66f779496c-b4nqp             1/1     Running            0              42h   10.244.0.3       k8s-master   <none>           <none>
kube-system    coredns-66f779496c-bx6sk             1/1     Running            0              42h   10.244.0.2       k8s-master   <none>           <none>
kube-system    etcd-k8s-master                      1/1     Running            0              42h   10.220.180.120   k8s-master   <none>           <none>
kube-system    kube-apiserver-k8s-master            1/1     Running            0              42h   10.220.180.120   k8s-master   <none>           <none>
kube-system    kube-controller-manager-k8s-master   1/1     Running            1              42h   10.220.180.120   k8s-master   <none>           <none>
kube-system    kube-proxy-btqzp                     1/1     Running            0              41h   10.220.180.130   k8s-node1    <none>           <none>
kube-system    kube-proxy-hb6n8                     1/1     Running            0              42h   10.220.180.120   k8s-master   <none>           <none>
kube-system    kube-proxy-nlxlh                     1/1     Running            0              42h   10.220.180.131   k8s-node2    <none>           <none>
kube-system    kube-scheduler-k8s-master            1/1     Running            0              42h   10.220.180.120   k8s-master   <none>           <none>

Solution: the Network value in flannel's net-conf.json defaults to 10.244.0.0/16, which does not match the --pod-network-cidr passed to kubeadm init, so edit kube-flannel.yml to make them agree and re-apply it:

[root@k8s-master ~]# vim kube-flannel.yml
......
98 net-conf.json: |
99   { 
100    "Network": "10.220.0.0/16",
101    "Backend": {
102      "Type": "vxlan"
103    }
104  }
......

[root@k8s-master ~]# kubectl apply -f kube-flannel.yml
[root@k8s-master ~]# kubectl get pods -o wide -A
NAMESPACE      NAME                                 READY   STATUS    RESTARTS   AGE   IP               NODE         NOMINATED NODE   READINESS GATES
kube-flannel   kube-flannel-ds-5r9dj                1/1     Running   0          7s    10.220.180.120   k8s-master   <none>           <none>
kube-flannel   kube-flannel-ds-cqt49                1/1     Running   0          7s    10.220.180.130   k8s-node1    <none>           <none>
kube-flannel   kube-flannel-ds-p7b4n                1/1     Running   0          7s    10.220.180.131   k8s-node2    <none>           <none>
kube-system    coredns-66f779496c-b4nqp             1/1     Running   0          43h   10.244.0.3       k8s-master   <none>           <none>
kube-system    coredns-66f779496c-bx6sk             1/1     Running   0          43h   10.244.0.2       k8s-master   <none>           <none>
kube-system    etcd-k8s-master                      1/1     Running   0          43h   10.220.180.120   k8s-master   <none>           <none>
kube-system    kube-apiserver-k8s-master            1/1     Running   0          43h   10.220.180.120   k8s-master   <none>           <none>
kube-system    kube-controller-manager-k8s-master   1/1     Running   1          43h   10.220.180.120   k8s-master   <none>           <none>
kube-system    kube-proxy-btqzp                     1/1     Running   0          42h   10.220.180.130   k8s-node1    <none>           <none>
kube-system    kube-proxy-hb6n8                     1/1     Running   0          43h   10.220.180.120   k8s-master   <none>           <none>
kube-system    kube-proxy-nlxlh                     1/1     Running   0          42h   10.220.180.131   k8s-node2    <none>           <none>
kube-system    kube-scheduler-k8s-master            1/1     Running   0          43h   10.220.180.120   k8s-master   <none>           <none>
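
To confirm that the flannel network now matches what kubeadm assigned to the nodes, the pod CIDRs and the flannel logs can be checked (the label selector assumes the standard kube-flannel.yml); a sketch:

kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}'
kubectl -n kube-flannel logs -l app=flannel --tail=20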

 

