Scaling your Kubernetes PODS to ZERO and BACK with K8s & KEDA

4 min read · Dec 24, 2023

In this post, let’s see how we can dynamically scale data-processing or data-oriented applications down to zero resources, and back up to the desired capacity only when there is demand, using modern infrastructure tools like K8s (Kubernetes) and KEDA (Kubernetes Event-driven Autoscaling).

We will also cover the major pitfall of KEDA when scaling in or scaling to zero, and how to handle it.

Note: Installation of K8s, KEDA and Kafka is out of scope for this post, as you can find numerous posts and videos covering them.

Why scale to ZERO ?

Don’t we all want to spend less ?
Why keep the infrastructure up and running when it’s idle ?

Photo by Konstantin Evdokimov on Unsplash

A Simple Application

Let’s consider a simple demo app for this post. Don’t worry, you can easily apply this method to an application of any scale or complexity.

Let’s see how we can scale the data-processing application to ZERO when there is no data to process, and scale out to the desired number of replicas when messages start to flow and pile up.

Code : https://github.com/ashokrajar/k8s-keda-kafka-demo

Back to School

Before we go into the solution and implementation, let’s take a step back to understand how K8s & KEDA work together to scale in and scale out the pods (application containers).

What is the role of KEDA ?

KEDA hands over the actual pod scaling to the K8s native HPA, which acts on custom metrics.
KEDA publishes those custom metrics by watching the Kafka topics and their consumer lag.

How Does K8s HPA Work ?

Kubernetes HPA (Horizontal Pod Autoscaler) has a complex algorithm based on various metrics, which itself calls for a separate blog post.

To put it simply:

desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]

And the currentMetricValue is calculated from

currentMetricValue = currentKafkaTopicTotalLag / currentReplicas
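As a quick worked example of the two formulas above, assume (purely for illustration) a total lag of 100 messages, 2 running replicas, and a desiredMetricValue of 20, which is the value the lagThreshold of a KEDA Kafka trigger sets:

```latex
\begin{align*}
\text{currentMetricValue} &= \frac{100}{2} = 50 \\
\text{desiredReplicas} &= \left\lceil 2 \times \frac{50}{20} \right\rceil = \lceil 5 \rceil = 5
\end{align*}
```

So the HPA would ask for 5 replicas instead of 2, subject to the configured maximum replica count.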

You can read more about K8s HPA here. With this simple algorithm in mind, let’s jump into the solution and implementation.

Timelines are labeled like T1(currentKafkaTopicTotalLag/currentMetricValue/desiredMetricValue)

Enabling KEDA

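---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-autoscale-demo
  namespace: keda-kafka-demo

spec:
  scaleTargetRef:
    name: keda-kafka-demo-consumer   # Deployment to scale
  pollingInterval: 5                 # seconds between trigger checks
  cooldownPeriod: 120                # seconds to wait after the last active trigger before scaling to idle
  idleReplicaCount: 0                # replicas while the trigger is inactive
  minReplicaCount: 1                 # lower bound once the trigger is active
  maxReplicaCount: 5                 # upper bound for scale-out
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka.ashokraja.in:9092
        consumerGroup: k8-keda-kafka-demo-consumer
        topic: k8s_keda_autoscale_testing
        offsetResetPolicy: latest
        lagThreshold: '20'           # target average lag per replica
        activationLagThreshold: '0'  # lag above this activates scaling from idle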

Wow Magic happens and everything works as expected in a perfect world !!!

The message flow is 1:1 or close to it. That is, consumers are able to consume the messages at the same rate the producers are producing them.

Real World Challenge (All Hell Breaks Loose !!)

KEDA’s defaults work great when the consumers consume and process messages without delay and commit back promptly, so Kafka rebalancing is not blocked and there are no idle consumers.

But that is not true for an ETL job, a workflow or a pipeline that has to do a lot of scraping/processing/computing/transformation based on the consumed messages. Messages may be produced in a predictable manner, but the consumers either cannot consume at that rate, or they commit back early to unblock Kafka rebalancing so that no consumer sits idle.

In such a scenario, K8s/KEDA/HPA concludes that the current number of pods is far too high, without any knowledge of the consumers that are still processing those messages, and starts to scale down the pods. And since it doesn’t know which pods are still processing messages, it picks pods to kill without regard to their in-flight work.
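To make the pitfall concrete, here is the HPA formula, desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)], applied to a hypothetical moment when 5 pods are still busy but the reported lag has already dropped to 10 messages. The numbers are assumptions for illustration, with a desiredMetricValue (lagThreshold) of 20:

```latex
\begin{align*}
\text{currentMetricValue} &= \frac{10}{5} = 2 \\
\text{desiredReplicas} &= \left\lceil 5 \times \frac{2}{20} \right\rceil = \lceil 0.5 \rceil = 1
\end{align*}
```

Under these assumptions the HPA would eventually ask for 1 replica, so 4 of the 5 pods become candidates for termination even though they may still be working.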

Every problem has a Solution

Solution 1 : Delay K8s HPA Scale Down as long as you desire

Ref: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#example-disable-scale-down

advanced:
  horizontalPodAutoscalerConfig:
    behavior:
      scaleDown:
        selectPolicy: Disabled

Complete Manifest sample:

---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-autoscale-demo
  namespace: keda-kafka-demo

spec:
  scaleTargetRef:
    name: keda-kafka-demo-consumer
  pollingInterval: 5
  cooldownPeriod: 120
  idleReplicaCount: 0
  minReplicaCount: 1
  maxReplicaCount: 5
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          selectPolicy: Disabled
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka.ashokraja.in:9092
        consumerGroup: k8-keda-kafka-demo-consumer
        topic: k8s_keda_autoscale_testing
        offsetResetPolicy: latest
        lagThreshold: '20'
        activationLagThreshold: '0'

With this tweak in place, the HPA no longer scales the pods down on its own; they are removed only when the lag reaches 0 and KEDA scales the deployment to zero. Also pay attention to cooldownPeriod = 120 sec; tweak it to your desired value depending on how long a consumer takes to process its messages.
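If disabling scale-down entirely is too blunt for your workload, the same behavior block can delay it instead. This is a minimal sketch, not taken from the demo repo: the 10-minute window and the one-pod-per-minute policy are illustrative assumptions that you would tune to your processing time. The fragment goes under spec of the ScaledObject.

```yaml
advanced:
  horizontalPodAutoscalerConfig:
    behavior:
      scaleDown:
        # Assumption: act only on the highest recommendation seen
        # in the last 10 minutes before scaling in (allowed range 0-3600).
        stabilizationWindowSeconds: 600
        policies:
          # Assumption: then remove at most 1 pod every 60 seconds.
          - type: Pods
            value: 1
            periodSeconds: 60
```

Keep in mind that this only slows the scale-down. The HPA still has no idea which pods are busy, so the window should be longer than the longest processing time you expect.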

Solution 2 : Use ScaledJobs instead of ScaledObjects

This suits very long-running jobs, say when, for some weird reason, your consumer runs for hours or even days ;).
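A ScaledJob makes KEDA create Kubernetes Jobs in response to the trigger instead of resizing a Deployment, and each Job is left to run to completion. Below is a minimal sketch that reuses the Kafka trigger settings from this post. The resource name, the container image and the history/backoff limits are hypothetical placeholders, and it assumes the consumer exits on its own once it has finished its share of the work.

```yaml
---
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: kafka-scaledjob-demo          # hypothetical name
  namespace: keda-kafka-demo
spec:
  jobTargetRef:
    backoffLimit: 2                   # assumption: retry a failed Job twice
    template:
      spec:
        restartPolicy: Never
        containers:
          - name: consumer
            # hypothetical image; replace with your consumer image
            image: example.com/keda-kafka-demo-consumer:latest
  pollingInterval: 5                  # seconds between trigger checks
  maxReplicaCount: 5                  # upper bound on Jobs running at once
  successfulJobsHistoryLimit: 3       # assumption: keep the last 3 finished Jobs
  failedJobsHistoryLimit: 3
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka.ashokraja.in:9092
        consumerGroup: k8-keda-kafka-demo-consumer
        topic: k8s_keda_autoscale_testing
        offsetResetPolicy: latest
        lagThreshold: '20'
```

The trade-off is that the consumer has to be written as a run-to-completion process rather than a long-lived daemon; otherwise the Jobs never finish and new ones keep piling up to maxReplicaCount.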


Written by Ashok Raja
