At v1.16, Kubernetes supports clusters with up to 5000 nodes. More specifically, we support configurations that meet all of the following criteria:
A cluster is a set of nodes (physical or virtual machines) running Kubernetes agents, managed by a “master” (the cluster-level control plane).
Normally the number of nodes in a cluster is controlled by the value NUM_NODES in the platform-specific config-default.sh file (for example, see GCE’s
Simply changing that value to something very large, however, may cause the setup script to fail on many cloud providers. A GCE deployment, for example, will run into quota issues and fail to bring the cluster up.
When setting up a large Kubernetes cluster, the following issues must be considered.
To avoid running into cloud provider quota issues, when creating a cluster with many nodes, consider:
To improve performance of large clusters, we store events in a separate dedicated etcd instance.
When creating a cluster, existing salt scripts:
On GCE/Google Kubernetes Engine and AWS, kube-up automatically configures the proper VM size for your master depending on the number of nodes in your cluster. On other providers, you will need to configure it manually. For reference, the sizes we use on GCE are
And the sizes we use on AWS are
On Google Kubernetes Engine, the size of the master node adjusts automatically based on the size of your cluster. For more information, see this blog post.
On AWS, master node sizes are currently set at cluster startup time and do not change, even if you later scale your cluster up or down by manually removing or adding nodes or using a cluster autoscaler.
To prevent memory leaks or other resource issues in cluster addons from consuming all the resources available on a node, Kubernetes sets resource limits on addon containers to cap the CPU and memory they can consume (see PRs #10653 and #10778).
  containers:
    - name: fluentd-cloud-logging
      image: k8s.gcr.io/fluentd-gcp:1.16
      resources:
        limits:
          cpu: 100m
          memory: 200Mi
Except for Heapster, these limits are static and are based on data we collected from addons running on 4-node clusters (see #10335). The addons consume a lot more resources when running on large deployment clusters (see #5880). So, if a large cluster is deployed without adjusting these values, the addons may continuously get killed because they keep hitting the limits.
To avoid running into cluster addon resource issues, when creating a cluster with many nodes, consider the following:
Heapster’s resource limits are set dynamically based on the initial size of your cluster (see #16185 and #22940). If you find that Heapster is running out of resources, you should adjust the formulas that compute Heapster’s memory request (see those PRs for details).
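To illustrate what sizing a limit from cluster size can look like, here is a minimal sketch of a linear rule (a fixed base plus a per-node increment). The function name addon_memory_limit_mi and the default coefficients are hypothetical and chosen only for illustration; the formulas Heapster actually uses are the ones in the PRs referenced above.

```shell
#!/bin/sh
# Hypothetical linear sizing rule: limit = base + per_node * num_nodes.
# The defaults (200 Mi base, 4 Mi per node) are made-up illustration values,
# NOT the coefficients used by Heapster or any other addon.
addon_memory_limit_mi() {
  local num_nodes="$1"
  local base_mi="${2:-200}"
  local per_node_mi="${3:-4}"
  echo "$(( base_mi + per_node_mi * num_nodes ))Mi"
}

addon_memory_limit_mi 4      # prints 216Mi
addon_memory_limit_mi 2000   # prints 8200Mi
```

The point of the sketch is only that a limit tuned on a 4-node cluster can be far too small at 2000 nodes once any per-node cost is involved.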
For directions on how to detect if addon containers are hitting resource limits, see the Troubleshooting section of Compute Resources.
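As a rough sketch of one such check: when a container is killed for exceeding its memory limit, the output of kubectl describe pod typically shows a terminated state with the reason OOMKilled. The helper below, was_oom_killed, is a hypothetical name; it only scans text on standard input, and the sample text is an assumed excerpt rather than output captured from a real cluster.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) if the text on stdin reports a
# container terminated with reason OOMKilled.
was_oom_killed() {
  grep -q 'Reason:[[:space:]]*OOMKilled'
}

# Assumed excerpt of `kubectl describe pod` output, used here for the demo.
sample_output='    Last State:     Terminated
      Reason:       OOMKilled'

if printf '%s\n' "$sample_output" | was_oom_killed; then
  echo "container hit its memory limit"
fi

# Against a live cluster (pod name is a placeholder):
#   kubectl describe pod <addon-pod-name> --namespace=kube-system | was_oom_killed
```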
In the future, we anticipate setting all cluster addon resource limits based on cluster size and adjusting them dynamically if you grow or shrink your cluster. We welcome PRs that implement those features.
For various reasons (see #18969 for more details), running kube-up.sh with a very large NUM_NODES may fail because a very small number of nodes do not come up properly. Currently you have two choices: restart the cluster (run kube-down.sh and then kube-up.sh again), or, before running kube-up.sh, set the environment variable ALLOWED_NOTREADY_NODES to whatever value you feel comfortable with. This allows kube-up.sh to succeed with fewer than NUM_NODES coming up. Depending on the reason for the failure, those additional nodes may join later, or the cluster may remain at a size of NUM_NODES - ALLOWED_NOTREADY_NODES.
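The second option can be sketched as follows. NUM_NODES and ALLOWED_NOTREADY_NODES are the variables named above; the values 2000 and 10 are arbitrary examples, and min_ready_nodes is a hypothetical helper that only does the subtraction so you can see the smallest cluster size you might end up with.

```shell
#!/bin/sh
# Hypothetical helper (not part of kube-up.sh): the smallest cluster size
# that can result, i.e. NUM_NODES - ALLOWED_NOTREADY_NODES.
min_ready_nodes() {
  local num_nodes="$1"
  local allowed_notready="$2"
  echo $(( num_nodes - allowed_notready ))
}

# Example values only; choose what you are comfortable with.
export NUM_NODES=2000
export ALLOWED_NOTREADY_NODES=10

echo "cluster may end up with as few as $(min_ready_nodes "$NUM_NODES" "$ALLOWED_NOTREADY_NODES") nodes"
# prints: cluster may end up with as few as 1990 nodes

# With both variables exported, run kube-up.sh as usual.
```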