Multiple Schedulers
Understand how to set up multiple schedulers in Kubernetes.
Concept and Usage of Multiple Schedulers
Multiple schedulers let you place different pods on nodes according to different requirements. By default, Kubernetes uses a single default scheduler, whose algorithm tries to distribute pods evenly across the nodes. In some cases, however, you may want to apply your own scheduling algorithm or custom conditions when placing pods on nodes.
For this reason, Kubernetes allows you to write and deploy your own scheduler, either as the default scheduler or as an additional scheduler. With an additional scheduler, you can schedule specific pods (applications) on specific nodes based on your requirements, while all other pods are still scheduled by the default scheduler.
The default scheduler is named default-scheduler. Every scheduler name must be unique in the cluster. You can find your default scheduler configuration on the master node at /etc/kubernetes/manifests/kube-scheduler.yaml.
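As a rough illustration, the name of the default scheduler corresponds to a scheduler configuration like the one below. This is a minimal sketch, not a copy of any particular cluster's settings; a cluster that starts kube-scheduler without a configuration file behaves as if this profile were present.

```yaml
# Sketch of the configuration implied by the default scheduler.
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: default-scheduler   # the name pods use when they set no schedulerName
```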
Steps to set up and use Multiple Schedulers
Step 1: Create a new scheduler configuration file
leaderElect - ensures that only one instance of the scheduler is active at a time. If multiple instances of the scheduler run on different master nodes in a high-availability setup, only one instance is elected as the leader and schedules the pods.
resourceName - the name of the resource object used for leader election. When you have multiple masters, you need to specify it so that the instances of this scheduler coordinate through the same lock object and do not conflict with each other.
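A configuration file using the two fields described above could look like the following sketch. The file name, the scheduler name my-scheduler, and the lock name lock-object-my-scheduler are hypothetical values chosen for illustration; adjust the apiVersion if your cluster version uses a different one.

```yaml
# my-scheduler-config.yaml (hypothetical file name)
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: my-scheduler             # hypothetical; must be unique in the cluster
leaderElection:
  leaderElect: true                         # only the elected instance schedules pods
  resourceNamespace: kube-system            # namespace that holds the lock object
  resourceName: lock-object-my-scheduler    # hypothetical lock name for this scheduler
```

Using a lock name that differs from the default scheduler's keeps the two schedulers from competing for the same leader-election object.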
Step 2: Deploy Additional Scheduler
You may choose to deploy the scheduler as a Pod or Deployment.
Deploy as a Pod
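A Pod that runs a second kube-scheduler might be defined as in the sketch below. It assumes the Pod runs on the master node (for example as a static Pod), that the kubeconfig /etc/kubernetes/scheduler.conf and a hypothetical configuration file /etc/kubernetes/my-scheduler-config.yaml exist on that node, and that the image tag matches your cluster version. All names and paths other than the kube-scheduler flags are assumptions.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-custom-scheduler                 # hypothetical Pod name
  namespace: kube-system
spec:
  containers:
    - name: kube-scheduler
      image: registry.k8s.io/kube-scheduler:v1.30.0   # placeholder tag; match your cluster
      command:
        - kube-scheduler
        - --kubeconfig=/etc/kubernetes/scheduler.conf
        - --config=/etc/kubernetes/my-scheduler-config.yaml
      volumeMounts:
        - name: kubeconfig
          mountPath: /etc/kubernetes/scheduler.conf
          readOnly: true
        - name: scheduler-config
          mountPath: /etc/kubernetes/my-scheduler-config.yaml
          readOnly: true
  volumes:
    - name: kubeconfig
      hostPath:
        path: /etc/kubernetes/scheduler.conf
        type: File
    - name: scheduler-config
      hostPath:
        path: /etc/kubernetes/my-scheduler-config.yaml   # hypothetical path on the node
        type: File
```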
Deploy as a Deployment
Package the Scheduler
Create a new container image containing the kube-scheduler binary
Build the image from the Dockerfile
Define a Kubernetes Deployment for the scheduler
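The Deployment for the scheduler could be sketched as follows. The image name, the ServiceAccount my-scheduler (which would need suitable RBAC permissions), and the ConfigMap my-scheduler-config that carries the scheduler configuration file are all hypothetical and assumed to exist already.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-scheduler                        # hypothetical name
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-scheduler
  template:
    metadata:
      labels:
        app: my-scheduler
    spec:
      serviceAccountName: my-scheduler      # assumed to exist with scheduler permissions
      containers:
        - name: kube-second-scheduler
          image: registry.example.com/my-kube-scheduler:1.0   # hypothetical image built in the previous steps
          command:
            - /usr/local/bin/kube-scheduler
            - --config=/etc/kubernetes/my-scheduler/my-scheduler-config.yaml
          volumeMounts:
            - name: config-volume
              mountPath: /etc/kubernetes/my-scheduler
      volumes:
        - name: config-volume
          configMap:
            name: my-scheduler-config       # hypothetical ConfigMap holding the config file
```

With more than one replica, leader election in the scheduler configuration decides which instance is active.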
When you create a pod with the schedulerName field set, the pod is scheduled by the specified scheduler. You can see the pod assignment events, including which scheduler placed each pod, by running a command such as kubectl get events -o wide.
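A pod that opts into a custom scheduler might look like this sketch. The pod name and the scheduler name my-scheduler are hypothetical; the scheduler name must match the one configured for the additional scheduler.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-custom                # hypothetical pod name
spec:
  schedulerName: my-scheduler       # hypothetical; omit this field to use default-scheduler
  containers:
    - name: nginx
      image: nginx
```

If no scheduler with that name is running, the pod stays in the Pending state because nothing picks it up.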
Scheduler Priority and Plugins
Pods waiting to be scheduled are sorted by the priority defined on them. You can read more on the kube-scheduler page.
To set a priority for a pod, you need to create a priority class first and apply it to the pod. Reference
priorityClassName - determines the scheduling priority of the Pod. Pods with higher priority are scheduled before Pods with lower priority.
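As an illustration of the two pieces involved, the sketch below defines a priority class and a pod that references it. The names high-priority and nginx-high-priority and the value 1000000 are arbitrary examples.

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority               # hypothetical class name
value: 1000000                      # higher value means higher priority
globalDefault: false                # do not apply to pods that set no priorityClassName
description: "Example class for pods that should be scheduled first."
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx-high-priority         # hypothetical pod name
spec:
  priorityClassName: high-priority  # must match the PriorityClass name above
  containers:
    - name: nginx
      image: nginx
```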
Steps of how a pod is scheduled:
- Pods are sorted in the scheduling queue by their priority value, so pods with higher priority sit at the beginning of the queue and are scheduled first.
- The pod enters the filtering phase, where the scheduler rules out nodes that do not match the pod's node selector and affinity rules or that lack the resources (CPU, memory) the pod requests.
- The pod then enters the scoring phase, where the scheduler scores the remaining nodes, for example based on their available resources. The node with the highest score is selected for the pod.
- Finally, the pod is bound to the node with the highest score.
Scheduler Plugins
Each of the steps above is implemented by its own plugins, for example:
- Scheduling Queue plugins
  - PrioritySort - Sorts the pods based on their priority.
- Filtering
  - NodeResourcesFit - Identifies the nodes that have enough resources to run the pod.
  - NodeName - Checks whether the pod spec names a specific node.
  - NodeUnschedulable - Checks whether the node is marked unschedulable. You can check a node's unschedulable status with kubectl describe node <node-name>
- Scoring
  - NodeResourcesFit - Scores the nodes based on their resources. Remember that a single plugin can be used in multiple phases.
  - ImageLocality - Scores the nodes based on the container images the pod runs, favoring nodes that already have those images cached. If no node has the image cached, the pod is still placed on a node without it.
- Binding
  - DefaultBinder - Binds the pod to the node.
You can also write your own plugins to extend the scheduler's functionality. The stages where plugins hook into the scheduler are called extension points. For example, you can write a plugin that checks node health during the filtering phase. Reference
Scheduling Profiles
Let's say we deploy separate schedulers, each with its own scheduler binary and configuration file. Managing these separate processes takes a lot of work, and because they run independently, one scheduler may place a pod on a node without taking the other scheduler's decisions into account (a race condition).
Instead, we can use scheduling profiles to configure multiple schedulers in a single process.
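A single configuration with two profiles could be sketched as follows. The second profile name, no-scoring-scheduler, is an illustrative choice; it disables every plugin at the preScore and score extension points, while the first profile keeps the defaults.

```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: default-scheduler        # default plugins, unchanged
  - schedulerName: no-scoring-scheduler     # illustrative second profile
    plugins:
      preScore:
        disabled:
          - name: '*'                       # disable all preScore plugins
      score:
        disabled:
          - name: '*'                       # disable all score plugins
```

Pods select a profile through the same schedulerName field they would use for a separately deployed scheduler.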