k8s cluster deployment: problems encountered during the process and their solutions
kubeadm init initialization error (1)
[root@k8s-master ~]# kubeadm init --pod-network-cidr=10.220.180.0/16 --image-repository registry.aliyuncs.com/google_containers --kubernetes-version v1.28.2
[init] Using Kubernetes version: v1.28.2
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
	[ERROR FileContent--proc-sys-net-ipv4-ip_forward]: /proc/sys/net/ipv4/ip_forward contents are not set to 1
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
Solution:
[root@k8s-master ~]# sysctl -w net.ipv4.ip_forward=1
net.ipv4.ip_forward = 1
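Note that sysctl -w only changes the running kernel, so the setting is lost after a reboot. A minimal sketch for persisting it (the file name k8s.conf is an arbitrary choice):

# Persist IP forwarding across reboots (file name is arbitrary)
cat > /etc/sysctl.d/k8s.conf <<EOF
net.ipv4.ip_forward = 1
EOF
sysctl --system    # reload all sysctl configuration files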
kubeadm init initialization error (2)
[kubelet-check] Initial timeout of 40s passed.

Unfortunately, an error has occurred:
	timed out waiting for the condition

This error is likely caused by:
	- The kubelet is not running
	- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
	- 'systemctl status kubelet'
	- 'journalctl -xeu kubelet'

Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all running Kubernetes containers by using crictl:
	- 'crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock ps -a | grep kube | grep -v pause'
	Once you have found the failing container, you can inspect its logs with:
	- 'crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher
Solution:
[root@k8s-master ~]# containerd config default > /etc/containerd/config.toml
[root@k8s-master ~]# vim /etc/containerd/config.toml
.......
 61     sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.6"
.......
[root@k8s-master ~]# systemctl daemon-reload
[root@k8s-master ~]# systemctl restart containerd.service
[root@k8s-master ~]# kubeadm reset
[reset] Reading configuration from the cluster...
[reset] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
W0202 16:21:47.717170   27518 reset.go:120] [reset] Unable to fetch the kubeadm-config ConfigMap from cluster: failed to get config map: Get "https://10.220.180.120:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config?timeout=10s": dial tcp 10.220.180.120:6443: connect: connection refused
W0202 16:21:47.717345   27518 preflight.go:56] [reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] Are you sure you want to proceed? [y/N]: y
[preflight] Running pre-flight checks
W0202 16:21:49.468038   27518 removeetcdmember.go:106] [reset] No kubeadm config, using etcd pod spec to get data directory
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Deleting contents of directories: [/etc/kubernetes/manifests /var/lib/kubelet /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]

The reset process does not clean CNI configuration. To do so, you must remove /etc/cni/net.d

The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually by using the "iptables" command.

If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar) to reset your system's IPVS tables.

The reset process does not clean your kubeconfig files and you must remove them manually.
Please, check the contents of the $HOME/.kube/config file.
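Before re-running kubeadm init, it is worth confirming that containerd actually picked up the new sandbox image. A quick check, assuming the restart above succeeded:

# Dump the effective containerd configuration and check the sandbox image
containerd config dump | grep sandbox_image
systemctl is-active containerd kubelet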
Image pull error
[root@k8s-master ~]# kubeadm config images pull --image-repository registry.aliyuncs.com/google_containers
I0202 14:51:46.672477   24062 version.go:256] remote version is much newer: v1.29.1; falling back to: stable-1.28
failed to pull image "registry.aliyuncs.com/google_containers/kube-apiserver:v1.28.6": output: time="2024-02-02T14:51:48+08:00" level=fatal msg="validate service connection: CRI v1 image API is not implemented for endpoint \"unix:///var/run/containerd/containerd.sock\": rpc error: code = Unimplemented desc = unknown service runtime.v1.ImageService"
, error: exit status 1
To see the stack trace of this error execute with --v=5 or higher
Solution:
[root@k8s-master ~]# cat > /etc/containerd/config.toml <<EOF
[plugins."io.containerd.grpc.v1.cri"]
  systemd_cgroup = true
EOF
[root@k8s-master ~]# systemctl restart containerd
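The two-line config above is the minimum needed to get the CRI plugin serving again. An alternative, closer to what the containerd documentation recommends, is to regenerate the full default configuration and enable the systemd cgroup driver for runc. This is only a sketch, and it would overwrite the sandbox_image change made earlier, so that line would need to be re-applied afterwards:

# Regenerate the full default config and switch runc to the systemd cgroup driver
containerd config default > /etc/containerd/config.toml
sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
systemctl restart containerd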
kubectl get all command error
[root@k8s-node2 ~]# kubectl get all
E0131 11:16:59.362595   15243 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
E0131 11:16:59.363283   15243 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
E0131 11:16:59.367220   15243 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
E0131 11:16:59.369763   15243 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
E0131 11:16:59.370695   15243 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
The connection to the server localhost:8080 was refused - did you specify the right host or port?
Solution:
Copy the /etc/kubernetes/admin.conf file from the master node to the node, then use it as the kubeconfig:
[root@k8s-node2 ~]# scp /etc/containerd/config.toml k8s-node1:/etc/containerd/
[root@k8s-node2 ~]# mkdir ~/.kube
[root@k8s-node2 kubernetes]# cp /etc/kubernetes/admin.conf ~/.kube/config
[root@k8s-node2 kubernetes]# kubectl get all
NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   27m
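Keep in mind that admin.conf carries cluster-admin credentials, so copying it to worker nodes is a lab convenience rather than a production practice. As an alternative to creating ~/.kube/config, the copied file can be referenced directly; a sketch, assuming the file was placed under /etc/kubernetes on the node:

# Point kubectl at the copied kubeconfig for the current shell only
export KUBECONFIG=/etc/kubernetes/admin.conf
kubectl get nodes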
After installing k8s on a node, kubelet does not come up
[root@k8s-node2 ~]# systemctl status kubelet.service
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: activating (auto-restart) (Result: exit-code) since Wed 2024-01-31 09:23:40 CST; 2s ago
     Docs: https://kubernetes.io/docs/
  Process: 9155 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=1/FAILURE)
 Main PID: 9155 (code=exited, status=1/FAILURE)
This error can be ignored: before the node joins the cluster, kubelet has no configuration yet and keeps restarting. Once the master has been initialized and `kubeadm join 10.220.180.120:6443 --token ...` has been run on the node, kubelet comes up on its own.
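If the token printed by kubeadm init has already expired by the time the node is ready to join (bootstrap tokens are valid for 24 hours by default), a fresh join command can be generated on the master:

# Run on the master: prints a new 'kubeadm join ... --token ... --discovery-token-ca-cert-hash ...' line
kubeadm token create --print-join-command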
Node join (scale-out) error
[root@k8s-node2 ~]# kubeadm join 10.220.180.120:6443 --token 7m1j99.44fjdyw4u7jadxo7 --discovery-token-ca-cert-hash sha256:ace63743c8f6da2784f4646f3aacd12c735c8cf1066042d48fa8a11ea470d35c
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
	[ERROR CRI]: container runtime is not running: output: time="2024-02-02T16:38:03+08:00" level=fatal msg="validate service connection: CRI v1 runtime API is not implemented for endpoint \"unix:///var/run/containerd/containerd.sock\": rpc error: code = Unimplemented desc = unknown service runtime.v1.RuntimeService"
, error: exit status 1
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
Solution:
[root@k8s-node2 ~]# containerd config default | sudo tee /etc/containerd/config.toml
disabled_plugins = []
imports = []
oom_score = 0
plugin_dir = ""
required_plugins = []
root = "/var/lib/containerd"
state = "/run/containerd"
temp = ""
version = 2
........
[root@k8s-node2 ~]# sed -i '96s/runtime_type.*/runtime_type = "io.containerd.runtime.v1.linux"/' /etc/containerd/config.toml
[root@k8s-node2 ~]# cat -n /etc/containerd/config.toml
     1  disabled_plugins = []
     2  imports = []
     3  oom_score = 0
     4  plugin_dir = ""
     5  required_plugins = []
     6  root = "/var/lib/containerd"
     7  state = "/run/containerd"
     8  temp = ""
     9  version = 2
.......
    96  runtime_type = "io.containerd.runtime.v1.linux"
.......
[root@k8s-node2 ~]# kubeadm join 10.220.180.120:6443 --token 7m1j99.44fjdyw4u7jadxo7 --discovery-token-ca-cert-hash sha256:ace63743c8f6da2784f4646f3aacd12c735c8cf1066042d48fa8a11ea470d35c
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
	[ERROR FileAvailable--etc-kubernetes-kubelet.conf]: /etc/kubernetes/kubelet.conf already exists
	[ERROR Port-10250]: Port 10250 is in use
	[ERROR FileAvailable--etc-kubernetes-pki-ca.crt]: /etc/kubernetes/pki/ca.crt already exists
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
[root@k8s-node2 ~]# kubeadm reset
[reset] Reading configuration from the cluster...
[reset] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
W0202 17:12:58.600028   27133 reset.go:120] [reset] Unable to fetch the kubeadm-config ConfigMap from cluster: failed to getAPIEndpoint: could not retrieve API endpoints for node "k8s-node2" using pod annotations: timed out waiting for the condition
W0202 17:12:58.600674   27133 preflight.go:56] [reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] Are you sure you want to proceed? [y/N]: y
[preflight] Running pre-flight checks
W0202 17:13:04.108817   27133 removeetcdmember.go:106] [reset] No kubeadm config, using etcd pod spec to get data directory
[reset] Deleted contents of the etcd data directory: /var/lib/etcd
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Deleting contents of directories: [/etc/kubernetes/manifests /var/lib/kubelet /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]

The reset process does not clean CNI configuration. To do so, you must remove /etc/cni/net.d

The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually by using the "iptables" command.

If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar) to reset your system's IPVS tables.

The reset process does not clean your kubeconfig files and you must remove them manually.
Please, check the contents of the $HOME/.kube/config file.
[root@k8s-node2 ~]# kubeadm join 10.220.180.120:6443 --token 7m1j99.44fjdyw4u7jadxo7 --discovery-token-ca-cert-hash sha256:ace63743c8f6da2784f4646f3aacd12c735c8cf1066042d48fa8a11ea470d35c
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
coredns-* pods stuck in ContainerCreating:
[root@k8s-master ~]# kubectl get pod -n kube-system
NAME                                 READY   STATUS              RESTARTS   AGE
coredns-66f779496c-b4nqp             0/1     ContainerCreating   0          119m
coredns-66f779496c-bx6sk             0/1     ContainerCreating   0          119m
etcd-k8s-master                      1/1     Running             0          119m
kube-apiserver-k8s-master            1/1     Running             0          119m
kube-controller-manager-k8s-master   1/1     Running             1          119m
kube-proxy-btqzp                     0/1     ContainerCreating   0          54m
kube-proxy-hb6n8                     1/1     Running             0          119m
kube-proxy-nlxlh                     0/1     ContainerCreating   0          88m
kube-scheduler-k8s-master            1/1     Running             0          119m
Solution: create the file on the nodes (every node must have it)
[root@k8s-master ~]# mkdir /run/flannel/
[root@k8s-master ~]# cat /run/flannel/subnet.env
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
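A one-step sketch for creating the file on each node. The values shown are the flannel defaults; FLANNEL_SUBNET would normally differ per node, and /run is a tmpfs, so the file disappears on reboot until the flannel DaemonSet recreates it:

# Create /run/flannel/subnet.env in one step (default flannel values; adjust per node)
mkdir -p /run/flannel
cat > /run/flannel/subnet.env <<EOF
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
EOF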
kube-proxy-* pods stuck in ContainerCreating:
[root@k8s-master ~]# kubectl get pod -n kube-system
NAME                                 READY   STATUS              RESTARTS   AGE
coredns-66f779496c-b4nqp             1/1     Running             0          40h
coredns-66f779496c-bx6sk             1/1     Running             0          40h
etcd-k8s-master                      1/1     Running             0          40h
kube-apiserver-k8s-master            1/1     Running             0          40h
kube-controller-manager-k8s-master   1/1     Running             1          40h
kube-proxy-btqzp                     0/1     ContainerCreating   0          39h
kube-proxy-hb6n8                     1/1     Running             0          40h
kube-proxy-nlxlh                     0/1     ContainerCreating   0          40h
kube-scheduler-k8s-master            1/1     Running             0          40h

Check the events:
[root@k8s-node2 ~]# kubectl describe pods -n kube-system coredns-66f779496c-b4nqp
Events:
  Type     Reason                  Age                From               Message
  ----     ------                  ----               ----               -------
  Warning  FailedScheduling        31m (x9 over 71m)  default-scheduler  0/3 nodes are available: 3 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling..
  Normal   Scheduled               26m                default-scheduler  Successfully assigned kube-system/coredns-66f779496c-b4nqp to k8s-master
  Warning  FailedCreatePodSandBox  26m                kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "54708a65adf2300d4f1c558d0b1e2e8de45a7acbbabb74c9b6fda33f24152590": plugin type="flannel" failed (add): loadFlannelSubnetEnv failed: open /run/flannel/subnet.env: no such file or directory
  Warning  FailedCreatePodSandBox  26m                kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "ca3a27de8ba123d98d86c8423410e8bf24ce061ab95107aca26d2f682fc28c89": plugin type="flannel" failed (add): loadFlannelSubnetEnv failed: open /run/flannel/subnet.env: no such file or directory
  Warning  FailedCreatePodSandBox  26m                kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "21f816da0e1c6840692bbdd8d00099fe8eb716c93957e283db2af345754c2c3d": plugin type="flannel" failed (add): loadFlannelSubnetEnv failed: open /run/flannel/subnet.env: no such file or directory
  Warning  FailedCreatePodSandBox  25m                kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "5289937d16c50093045c10caf1956c1b1085f1a962ea26dd3886e25e5056e747": plugin type="flannel" failed (add): loadFlannelSubnetEnv failed: open /run/flannel/subnet.env: no such file or directory
  Warning  FailedCreatePodSandBox  25m                kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "5692369b4bf6b804cc02052b40b888209cad25d92e9e347e9178f7c5bff0334f": plugin type="flannel" failed (add): loadFlannelSubnetEnv failed: open /run/flannel/subnet.env: no such file or directory
Solution:
[root@k8s-master ~]# scp /etc/containerd/config.toml 10.220.180.130:/etc/containerd/
root@10.220.180.130's password:
config.toml                                   100% 7065     4.1MB/s   00:00
[root@k8s-master ~]# scp /etc/containerd/config.toml 10.220.180.131:/etc/containerd/
root@10.220.180.131's password:
config.toml                                   100% 7065     4.7MB/s   00:00
[root@k8s-node1 ~]# systemctl restart containerd.service
[root@k8s-node2 ~]# systemctl restart containerd.service
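After restarting containerd on each node, the CRI endpoint can be sanity-checked before waiting for the pods to recover; a quick check, assuming crictl is installed on the node:

# Verify that the CRI runtime on this node responds
crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock info | head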
[root@k8s-master ~]# kubectl get pod -n kube-system
NAME                                 READY   STATUS    RESTARTS   AGE
coredns-66f779496c-b4nqp             1/1     Running   0          41h
coredns-66f779496c-bx6sk             1/1     Running   0          41h
etcd-k8s-master                      1/1     Running   0          41h
kube-apiserver-k8s-master            1/1     Running   0          41h
kube-controller-manager-k8s-master   1/1     Running   1          41h
kube-proxy-btqzp                     1/1     Running   0          40h
kube-proxy-hb6n8                     1/1     Running   0          41h
kube-proxy-nlxlh                     1/1     Running   0          40h
kube-scheduler-k8s-master            1/1     Running   0          41h
kube-flannel pods in CrashLoopBackOff:
[root@k8s-master ~]# kubectl get pods -o wide -A
NAMESPACE      NAME                                 READY   STATUS             RESTARTS       AGE   IP               NODE         NOMINATED NODE   READINESS GATES
kube-flannel   kube-flannel-ds-9vv2z                0/1     CrashLoopBackOff   6 (5m2s ago)   10m   10.220.180.120   k8s-master   <none>           <none>
kube-flannel   kube-flannel-ds-szbdj                0/1     CrashLoopBackOff   6 (5m ago)     10m   10.220.180.130   k8s-node1    <none>           <none>
kube-flannel   kube-flannel-ds-t7dxb                0/1     CrashLoopBackOff   6 (5m8s ago)   10m   10.220.180.131   k8s-node2    <none>           <none>
kube-system    coredns-66f779496c-b4nqp             1/1     Running            0              42h   10.244.0.3       k8s-master   <none>           <none>
kube-system    coredns-66f779496c-bx6sk             1/1     Running            0              42h   10.244.0.2       k8s-master   <none>           <none>
kube-system    etcd-k8s-master                      1/1     Running            0              42h   10.220.180.120   k8s-master   <none>           <none>
kube-system    kube-apiserver-k8s-master            1/1     Running            0              42h   10.220.180.120   k8s-master   <none>           <none>
kube-system    kube-controller-manager-k8s-master   1/1     Running            1              42h   10.220.180.120   k8s-master   <none>           <none>
kube-system    kube-proxy-btqzp                     1/1     Running            0              41h   10.220.180.130   k8s-node1    <none>           <none>
kube-system    kube-proxy-hb6n8                     1/1     Running            0              42h   10.220.180.120   k8s-master   <none>           <none>
kube-system    kube-proxy-nlxlh                     1/1     Running            0              42h   10.220.180.131   k8s-node2    <none>           <none>
kube-system    kube-scheduler-k8s-master            1/1     Running            0              42h   10.220.180.120   k8s-master   <none>           <none>
Solution:
[root@k8s-master ~]# vim kube-flannel.yml
......
 98   net-conf.json: |
 99     {
100       "Network": "10.220.0.0/16",
101       "Backend": {
102         "Type": "vxlan"
103       }
104     }
......
[root@k8s-master ~]# kubectl apply -f kube-flannel.yml
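The Network value in net-conf.json has to line up with the pod network CIDR the cluster was initialized with; when they disagree, the flannel pods typically crash-loop complaining that the node's pod CIDR is not inside the configured network. A quick way to check what the control plane is actually using (run on the master):

# Show the pod subnet recorded by kubeadm and the podCIDR assigned to each node
kubectl -n kube-system get cm kubeadm-config -o yaml | grep -i podSubnet
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}'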
[root@k8s-master ~]# kubectl get pods -o wide -A
NAMESPACE      NAME                                 READY   STATUS    RESTARTS   AGE   IP               NODE         NOMINATED NODE   READINESS GATES
kube-flannel   kube-flannel-ds-5r9dj                1/1     Running   0          7s    10.220.180.120   k8s-master   <none>           <none>
kube-flannel   kube-flannel-ds-cqt49                1/1     Running   0          7s    10.220.180.130   k8s-node1    <none>           <none>
kube-flannel   kube-flannel-ds-p7b4n                1/1     Running   0          7s    10.220.180.131   k8s-node2    <none>           <none>
kube-system    coredns-66f779496c-b4nqp             1/1     Running   0          43h   10.244.0.3       k8s-master   <none>           <none>
kube-system    coredns-66f779496c-bx6sk             1/1     Running   0          43h   10.244.0.2       k8s-master   <none>           <none>
kube-system    etcd-k8s-master                      1/1     Running   0          43h   10.220.180.120   k8s-master   <none>           <none>
kube-system    kube-apiserver-k8s-master            1/1     Running   0          43h   10.220.180.120   k8s-master   <none>           <none>
kube-system    kube-controller-manager-k8s-master   1/1     Running   1          43h   10.220.180.120   k8s-master   <none>           <none>
kube-system    kube-proxy-btqzp                     1/1     Running   0          42h   10.220.180.130   k8s-node1    <none>           <none>
kube-system    kube-proxy-hb6n8                     1/1     Running   0          43h   10.220.180.120   k8s-master   <none>           <none>
kube-system    kube-proxy-nlxlh                     1/1     Running   0          42h   10.220.180.131   k8s-node2    <none>           <none>
kube-system    kube-scheduler-k8s-master            1/1     Running   0          43h   10.220.180.120   k8s-master   <none>           <none>