Troubleshooting and solving pod scheduling issues

Scheduling is the process in Kubernetes that decides which node each pod runs on. Effective scheduling ensures proper allocation of resources (e.g., CPU, memory) and balances load across nodes to enhance overall system performance and stability.

By default, the kube-scheduler in the control plane is responsible for dynamically scheduling pods based on current resource availability and operational requirements.

Fig. 1: kube-scheduler dynamic pod scheduling

The illustration above demonstrates how the different components communicate when you apply a YAML file with the kubectl apply -f app-descriptor.yaml command. Once the control plane receives the YAML file:

  • kube-apiserver validates and processes the YAML file by converting it into Kubernetes objects.
  • kube-scheduler detects the new unscheduled pods and, according to the given resource requirements, selects the most suitable node for each pod.
  • kube-scheduler updates the pod's specification with the chosen node and communicates the assignment to the kubelet via the kube-apiserver.
  • kubelet on the designated node takes over to pull the necessary container images, create the containers, and manage their lifecycle.
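
You can observe this flow from the command line. As a quick sketch (reusing the manifest filename from the example above), apply the file and watch the pod move from Pending to Running on its assigned node:

# Submit the manifest to the kube-apiserver
kubectl apply -f app-descriptor.yaml

# Watch pods and the nodes they are assigned to
kubectl get pods -o wide --watch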

kube-scheduler considers several factors when scheduling, including resource requirements. For example, assume the circle-shaped pod in the illustration above requests a minimum of 1 CPU and 1 GB of memory. The scheduler then looks at the available resources on the worker nodes to identify the best node to place the pod on. In addition to resource requests, there can be other constraints, such as taints and tolerations, node selectors, affinity and anti-affinity rules, and custom schedulers for more customized scheduling operations.

Fig. 2: Unscheduled pods get scheduled based on the resource requirements.

Note: The kube-scheduler is responsible solely for scheduling pods onto nodes. It does not manage the running of pods; this task is handled by the kubelet on each node. The kubelet takes charge of running the containers in the pods once they are assigned to nodes by the kube-scheduler.

Common pod scheduling problems and solutions

Kubernetes offers great flexibility in running and managing containers; however, this flexibility also introduces significant complexity. It’s common to encounter errors during the scheduling phase, which can be time-consuming and sometimes frustrating to resolve.

To help reduce operational time and costs, developers should be alert to the following common problems and know how to mitigate them.

Failed scheduling due to insufficient resources

It is common for pods to fail to start due to an insufficient resources error. Kubernetes sets no default CPU or memory requests or limits, so any pod can consume as many resources as it needs, potentially starving other pods or processes running on the same node.

Fig. 3: Without set requests or limits, pod 1 may consume all the memory, resulting in the failure of pod 2.

To mitigate this, you need to understand the requests and limits fields in the pod configuration file. By adjusting these correctly, you can ensure sufficient resources are available.

apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        memory: "64Mi"
        cpu: "0.5"
      limits:
        memory: "128Mi"
        cpu: "1"

The general best practice is to set minimum resources with requests but not limits. This approach guarantees the allocation of the minimum required resources while allowing a pod to consume as much available memory as needed.

Fig. 4: Set minimum resources with requests but not limits.

Above, on worker node 1, pod 1 consumes 2 GB of memory, exceeding its own request, yet the 2 GB of memory requested by pod 2 remains guaranteed for it. On worker node 2, since no other processes are running, pod 1 can utilize all available memory.
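
To follow this practice, the pod specification from the earlier example would simply omit the limits block (the container name and image remain illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        memory: "64Mi"
        cpu: "0.5"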

There are exceptions to this scenario, where setting limits is advantageous; for example, when a container is publicly accessible, limits can prevent misuse of the infrastructure by capping resource usage.

Failed scheduling due to a label mismatch between the nodeSelector in the pod specification and the node label

By default, the kube-scheduler is free to assign a pod to any node with sufficient available resources. However, there are scenarios where specific pods need to be hosted on specific nodes.

For instance, if worker node 2 is equipped with a GPU, it would be optimal to schedule a pod that runs a machine-learning model on this node. In such cases, nodeSelector is used to match the labels on nodes and schedule pods accordingly.

Fig. 5: Use labels and node selectors for custom scheduling.

If the node label does not match the label specified in the nodeSelector field of a pod's configuration, the Kubernetes scheduler will not schedule the pod on that node. In such a case, the pod will remain in a pending state because it can't be scheduled on any node that doesn't have a matching label.
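
As a rough sketch of this setup (the node name worker-node-2, the gpu=true label, and the image name are illustrative), the label is applied to the node first and then referenced from the pod specification:

# Label the GPU node and verify the label
kubectl label nodes worker-node-2 gpu=true
kubectl get nodes --show-labels

apiVersion: v1
kind: Pod
metadata:
  name: ml-model-pod
spec:
  containers:
  - name: ml-model
    image: ml-model:latest
  nodeSelector:
    gpu: "true"

If the node label is missing or misspelled, adding the correct label to a node (or fixing the nodeSelector) lets the pending pod schedule without being recreated.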

Failed scheduling due to misconfigured affinity and anti-affinity rules

Affinity and anti-affinity rules provide more granular control over scheduling logic. Rules can be marked as required, meaning the pod is only scheduled on a node that satisfies them, or preferred, meaning the scheduler prioritizes a matching node but will still schedule the pod elsewhere if no such node is available.

There are two types of affinity rules:

  • Node affinity and node anti-affinity extend the functionality of NodeSelector by providing more expressive and flexible options, including the ability to specify rules as required and preferred.
  • Pod affinity and pod anti-affinity group pods together or separate them based on affinity and anti-affinity rules.

Affinity and anti-affinity are advanced expressions embedded in pod configuration files. Because of their flexibility, they expose a large number of options, as shown in the sketch below. Note: Writing these complex YAML configurations manually can lead to errors, so it is recommended to refer to the Kubernetes documentation whenever setting up affinity or anti-affinity rules.
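
For reference, a minimal node affinity sketch might look like the following; the label keys (disktype, zone) and their values are illustrative:

apiVersion: v1
kind: Pod
metadata:
  name: affinity-example
spec:
  containers:
  - name: app
    image: nginx
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: zone
            operator: In
            values:
            - us-east-1a

The required rule must match for the pod to be scheduled at all, while the preferred rule only biases node selection; a mistake in the required block keeps the pod Pending, whereas a mistake in the preferred block is silently ignored.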

Failed scheduling due to taints and tolerations misalignment

Kubernetes controls pod scheduling for each node via taints, which create restrictions for nodes, and tolerations, which allow pods to be scheduled on nodes with specific taints.

Fig. 6: Use taints and tolerations to set restrictions.

A common mistake in Kubernetes cluster management is the mismatch of taints and tolerations, which can lead to scheduling issues. When a pod's toleration does not correctly match the key, value, or effect of a node's taint, the pod may be unexpectedly rejected from scheduling on that node or evicted if already running.
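
As a minimal sketch (the node name and the dedicated=gpu key-value pair are illustrative), the taint goes on the node and the exactly matching toleration goes in the pod specification:

# Prevent scheduling on node01 for pods without a matching toleration
kubectl taint nodes node01 dedicated=gpu:NoSchedule

apiVersion: v1
kind: Pod
metadata:
  name: tolerating-pod
spec:
  containers:
  - name: app
    image: nginx
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"

If the key, value, or effect differs between the taint and the toleration, the pod is not scheduled on that node.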

Failed scheduling due to absent or misconfigured scheduler

Sometimes, pods fail to be scheduled because the scheduler is unavailable to process the scheduling task, either due to a misconfiguration or because it is not running. This problem is easily detected by checking the status and configuration of the Kubernetes scheduler:

# Check the status of the kube-scheduler in the kube-system namespace
kubectl get pods -n kube-system | grep kube-scheduler

# Check the configuration, if the scheduler exists
kubectl describe pod -n kube-system kube-scheduler

In self-hosted Kubernetes clusters where the kube-scheduler is absent, you have the option to manually schedule pods or configure a custom scheduler to meet specific requirements.
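
As a hedged sketch of manual scheduling (the pod and node names are illustrative), setting spec.nodeName binds the pod directly to the named node and bypasses the scheduler entirely:

apiVersion: v1
kind: Pod
metadata:
  name: manually-scheduled-pod
spec:
  nodeName: node01
  containers:
  - name: app
    image: nginx

Because the kubelet on that node starts the pod directly, this works even when no scheduler is running, but the scheduler's placement logic is skipped, so it is best reserved for debugging or special cases.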

Troubleshooting techniques

Kubernetes has a complex architecture of multiple interacting components. Hence, cluster operators need to be prepared for potential operational challenges. Mastering the art of diagnosing and resolving issues will yield smoother operations as well as better overall cluster resilience.

Cluster component diagnostics

Examining the cluster and kube-scheduler can be the starting point of your troubleshooting workflow. There are no predefined steps to follow, but generally, we can start by checking the availability of the kube-scheduler or any custom schedulers:

kubectl get pods --all-namespaces --field-selector spec.nodeName=controlplane | grep kube-scheduler

Note: If you are using a managed Kubernetes service, such as Amazon Elastic Kubernetes Service (EKS), Azure Kubernetes Service (AKS), or Google Kubernetes Engine (GKE), there is no need to worry about control plane components since they are managed by the cloud service provider (CSP).

If the scheduler is available, then conditions on the data plane nodes, such as CPU, memory, and disk space, may be causing the issue. We can troubleshoot node conditions by checking for the MemoryPressure, DiskPressure, and PIDPressure conditions on a node:

kubectl describe node node01 | grep -i pressure

If the cluster is not under CPU, memory, or disk stress, then additional node constraints might be preventing pods from matching. For example, when using taints and tolerations, the key, value, and effect must match exactly between the node and the pod:

kubectl describe node node01
kubectl describe pod image-square-pod

Once you run kubectl describe, it provides a more detailed explanation of why the scheduling failed.

Fig. 7: Failed scheduling event example
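
Scheduling failures are also recorded as events, so another quick check is to list them across the cluster; the command below assumes the default FailedScheduling event reason emitted by the kube-scheduler:

# List failed-scheduling events, newest last
kubectl get events --all-namespaces --field-selector reason=FailedScheduling --sort-by=.lastTimestamp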

Utilize Kubernetes logs and metrics

Kubernetes cluster logs are stored and managed within the cluster itself, typically within the kube-system namespace. By accessing scheduler logs, you can gain insights into the operational dynamics and troubleshooting aspects of pod scheduling:

kubectl logs $(kubectl get pods --namespace=kube-system | grep kube-scheduler | awk '{print $1}') --namespace=kube-system

This returns the logs generated by the kube-scheduler running in the kube-system namespace. Usually, the logs are a mixture of informational messages, warnings, and error messages related to the scheduler.

Due to the complexity of these logs, leveraging the ELK Stack (Elasticsearch, Logstash, and Kibana), Fluentd, or similar log management and analysis tools can be extremely beneficial. These tools collect, search, and visualize log data, uncovering trends and facilitating issue resolution.

Following log analysis, enhancing cluster monitoring with Kubernetes metrics provides a deeper understanding of scheduling dynamics. Prometheus and Grafana are integral for this purpose:

  • Site24x7 offers fully managed off-cluster monitoring, addressing the problem of monitoring silos. It also provides integration for over 200 plugins for seamless interaction with third parties such as Prometheus, Grafana, and Nagios.
  • Prometheus collects detailed metrics about the Kubernetes cluster, including those related to the scheduler's performance, such as the number of scheduling attempts, successes, and failures.
  • Grafana uses these metrics to create visualizations that offer actionable insights, helping administrators optimize scheduling and resource allocation.

Conclusion

Pod scheduling is crucial for resource efficiency and system stability in Kubernetes. This article detailed essential techniques for diagnosing and addressing scheduling issues to enhance performance and operational efficiency.

By implementing these strategies, developers and cluster operators can improve system performance. However, given the complexity of these practices, leveraging a monitoring solution like Site24x7 will make managing your Kubernetes environments significantly easier.

Site24x7 simplifies tracking and analyzing Kubernetes metrics, enabling proactive issue resolution.

Experience the benefits of enhanced monitoring by starting a 30-day free trial with Site24x7 today.
