Mirror of https://github.com/ubicloud/ubicloud.git, synced 2025-11-28 00:20:26 +08:00.

Upgrades (including control plane VM replacements) disrupted pod-to-service
communication. Symptoms included CoreDNS failing to reach the API server
("no route to host"), which led to DNS resolution failures (e.g., connection
refused to kube-dns at 10.96.0.10), REJECT rules appearing in iptables, and
broader service access issues. Pod-to-pod and pod-to-host traffic was
unaffected, which pointed to a service endpoint problem rather than a general
networking fault.
Root cause: The kubeadm-config ConfigMap set
apiServer.extraArgs.advertise-address to a static IP (e.g., the initial control
plane IP). During upgrades, this IP went stale because replacement VMs received
new IPs while the config was left unchanged. This led to:
- kube-apiserver advertising the old IP
- The default/kubernetes service’s Endpoints/EndpointSlice being recreated with
the wrong backend IP
- kube-proxy DNAT rules routing traffic (e.g., to 10.96.0.1:443) to the
unreachable old IP
- Circular dependency: CoreDNS couldn’t sync with the API, preventing readiness
and worsening DNS issues.
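
The failure chain above comes down to the default/kubernetes Endpoints listing
an address that no control plane node owns anymore. A minimal sketch of that
check follows, assuming the two IP lists have already been collected elsewhere
(for example from `kubectl get endpoints kubernetes -n default` and from the
current VM inventory). The function name and the sample addresses are
hypothetical and are not part of this change.

```python
def stale_backends(endpoint_ips, control_plane_ips):
    """Return endpoint IPs that no current control plane node owns."""
    current = set(control_plane_ips)
    # Any backend left over here is what kube-proxy would still DNAT to.
    return sorted(ip for ip in endpoint_ips if ip not in current)


# Hypothetical addresses: the endpoint still points at the replaced VM.
leftover = stale_backends(["10.0.0.5"], ["10.0.0.9"])
print(leftover)
```

An empty result means every advertised backend matches a live control plane
node. A non-empty result matches the symptom described above.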
Solution: Remove the advertise-address arg entirely from kubeadm-config. This
lets kube-apiserver auto-detect and advertise the node’s primary interface IP
(default behavior per Kubernetes docs). On upgrade:
- New control plane VMs advertise their current IP
- Endpoints/EndpointSlice update automatically during manifest regeneration or
upgrade apply
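
The removal itself can be sketched as a small transformation on the parsed
ClusterConfiguration. This is an illustration only, not the code in this
change: it assumes the YAML has already been loaded into a dict, the function
name and sample values are hypothetical, and it handles both the map form of
extraArgs implied above and the name/value list form that newer kubeadm API
versions are assumed to use.

```python
import copy


def drop_advertise_address(cluster_config):
    """Return a copy of the config without the advertise-address arg."""
    updated = copy.deepcopy(cluster_config)
    api_server = updated.get("apiServer") or {}
    extra_args = api_server.get("extraArgs")
    if isinstance(extra_args, dict):
        # Map form: flag name mapped to its value.
        extra_args.pop("advertise-address", None)
    elif isinstance(extra_args, list):
        # List form: entries shaped like {"name": ..., "value": ...}.
        api_server["extraArgs"] = [
            arg for arg in extra_args if arg.get("name") != "advertise-address"
        ]
    return updated


# Hypothetical config that still pins the initial control plane IP.
config = {
    "kind": "ClusterConfiguration",
    "apiServer": {
        "extraArgs": {
            "advertise-address": "10.0.0.5",
            "authorization-mode": "Node,RBAC",
        }
    },
}
cleaned = drop_advertise_address(config)
print(cleaned["apiServer"]["extraArgs"])
```

Working on a deep copy leaves the original dict untouched, so the caller can
compare the two and skip the ConfigMap update when nothing changed.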
This fix applies universally:
- Single-node: Prevents total disruption from IP changes
- Multi-node (HA): Each control plane node advertises its own IP; Endpoints
include all nodes for failover