This article delves into the practical application of Kubernetes' Horizontal Pod Autoscaler (HPA), exploring its functionality through hands-on experimentation.

What is HPA?

Horizontal Pod Autoscaling (HPA) in Kubernetes is the automatic adjustment of the number of pod replicas in response to changes in resource utilization or custom metrics. It allows your application to scale out (increase the number of pods) or scale in (decrease the number of pods) based on predefined rules or metrics. For a detailed definition, you can refer to the Kubernetes documentation on HPA.
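For a quick taste before we build a full manifest, kubectl also has an imperative one-liner that creates an HPA (the deployment name my-app here is just a placeholder):

$ kubectl autoscale deployment my-app --cpu-percent=70 --min=2 --max=4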

Why should you care?

HPA is crucial for ensuring the reliability, stability, and adaptability of your cloud deployments. By automatically adjusting the number of pod replicas based on current demand, HPA helps maintain optimal performance under varying loads, preventing both over-provisioning and under-provisioning of resources. This dynamic scaling capability reduces costs by efficiently utilizing resources, enhances user experience by maintaining responsiveness, and provides a foundation for high availability and fault tolerance in your applications.

So let’s start.

I ran this service:

import numpy as np
import time
import threading
from flask import Flask, jsonify

app = Flask(__name__)

# Global variables
task_thread = None
stop_event = threading.Event()

# Repeatedly multiply two random size x size matrices until signalled to stop.
def cpu_intensive_task(size):
    while not stop_event.is_set():
        a = np.random.rand(size, size)
        b = np.random.rand(size, size)
        result = np.dot(a, b)
        print(f"Matrix multiplication completed. Result shape: {result.shape}")
        time.sleep(0.1)

# Start the CPU burner in a background thread (one task at a time).
@app.route('/start/<int:size>')
def start_task(size):
    global task_thread, stop_event
    if task_thread and task_thread.is_alive():
        return jsonify({"message": "Task is already running"}), 400
    
    stop_event.clear()
    task_thread = threading.Thread(target=cpu_intensive_task, args=(size,))
    task_thread.start()
    return jsonify({"message": f"Started CPU-intensive task with matrix size {size}x{size}"}), 200

# Signal the running task to stop and wait for the thread to finish.
@app.route('/stop')
def stop_task():
    global task_thread, stop_event
    if not task_thread or not task_thread.is_alive():
        return jsonify({"message": "No task is currently running"}), 400
    
    stop_event.set()
    task_thread.join()
    return jsonify({"message": "CPU-intensive task stopped"}), 200

if __name__ == "__main__":
    app.run(host='0.0.0.0', port=80)

This service uses NumPy matrix multiplication to simulate a CPU-intensive job; see cpu_intensive_task(). A GET to /start/<size> launches a background thread that repeatedly multiplies two random size×size matrices, and a GET to /stop terminates it.
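If you run the service locally (note that binding port 80 typically requires elevated privileges), you can exercise the two endpoints like this; the matrix size 500 is just an example:

$ curl http://localhost/start/500
$ curl http://localhost/stop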

The service is deployed and exposed by this file, resource-hogger-service.yaml:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: scalable-hogger
spec:
  selector:
    matchLabels:
      app: scalable-hogger
  template:
    metadata:
      labels:
        app: scalable-hogger
    spec:
      containers:
      - name: scalable-hogger
        image: docker.io/antonbiz/resource-hog:2.0
        imagePullPolicy: Always
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 100m
          limits:
            cpu: 500m

---
apiVersion: v1
kind: Service
metadata:
  name: scalable-hogger-service
spec:
  selector:
    app: scalable-hogger
  ports:
  - port: 80
    targetPort: 80
  type: LoadBalancer
  
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: scalable-hogger-ingress
spec:
  ingressClassName: nginx
  rules:
  - http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: scalable-hogger-service
            port: 
              number: 80

I started the service:

$ kubectl apply -f resource-hogger-service.yaml 
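Before moving on, it is worth a quick sanity check that the deployment, service, and ingress all came up:

$ kubectl get deployment scalable-hogger
$ kubectl get svc scalable-hogger-service
$ kubectl get ingress scalable-hogger-ingress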

Then I applied the HPA definition from this file, resource-hogger-hpa.yaml:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: scalable-hogger-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: scalable-hogger
  minReplicas: 2
  maxReplicas: 4
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

This configuration lets the HPA scale the scalable-hogger deployment between 2 and 4 replicas based on average CPU utilization, which is measured as a percentage of each pod's CPU request (100m here). If average utilization rises above 70%, the HPA will generally add pods; if it falls below 70%, it will generally remove them.
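Under the hood, the HPA controller uses the formula from the Kubernetes documentation:

desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)

For example, with our 100m request and 70% target: if the 2 running pods each average around 350m of CPU (350% of the request), the controller computes ceil(2 * 350 / 70) = 10, which is then capped at maxReplicas, so the deployment scales to 4 pods.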

I applied the HPA to the cluster:

$ kubectl apply -f resource-hogger-hpa.yaml
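Note that the HPA relies on the metrics server (the metrics.k8s.io API) to read pod CPU usage, so one must be installed in the cluster. A quick way to check that the HPA registered and is reading metrics (the TARGETS column typically shows <unknown> until metrics arrive) is:

$ kubectl get hpa scalable-hogger-hpa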

Now I wanted to test the HPA. Will it work? Will it scale the deployment's pod count when CPU utilization goes up?
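A simple way to find out is to generate load through the service and watch the HPA react. A sketch of that test, assuming the LoadBalancer has been assigned an external IP (the matrix size 800 is an arbitrary choice):

$ EXTERNAL_IP=$(kubectl get svc scalable-hogger-service -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
$ curl http://$EXTERNAL_IP/start/800
$ kubectl get hpa scalable-hogger-hpa --watch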