This article delves into the practical application of Kubernetes' Horizontal Pod Autoscaler (HPA), exploring its functionality through hands-on experimentation.
Horizontal Pod Autoscaling (HPA) in Kubernetes is the automatic adjustment of the number of pod replicas in response to changes in resource utilization or custom metrics. It allows your application to scale out (increase the number of pods) or scale in (decrease the number of pods) based on predefined rules or metrics. For a detailed definition, you can refer to the Kubernetes documentation on HPA.
HPA is crucial for ensuring the reliability, stability, and adaptability of your cloud deployments. By automatically adjusting the number of pod replicas based on current demand, HPA helps maintain optimal performance under varying loads, preventing both over-provisioning and under-provisioning of resources. This dynamic scaling capability reduces costs by efficiently utilizing resources, enhances user experience by maintaining responsiveness, and provides a foundation for high availability and fault tolerance in your applications.
So let’s start.
import numpy as np
import time
import threading
from flask import Flask, jsonify

app = Flask(__name__)

# Global variables shared between the endpoints
task_thread = None
stop_event = threading.Event()

def cpu_intensive_task(size):
    # Multiply two random size x size matrices in a loop until stop_event is set
    while not stop_event.is_set():
        a = np.random.rand(size, size)
        b = np.random.rand(size, size)
        result = np.dot(a, b)
        print(f"Matrix multiplication completed. Result shape: {result.shape}")
        time.sleep(0.1)

@app.route('/start/<int:size>')
def start_task(size):
    global task_thread
    if task_thread and task_thread.is_alive():
        return jsonify({"message": "Task is already running"}), 400
    stop_event.clear()
    task_thread = threading.Thread(target=cpu_intensive_task, args=(size,))
    task_thread.start()
    return jsonify({"message": f"Started CPU-intensive task with matrix size {size}x{size}"}), 200

@app.route('/stop')
def stop_task():
    global task_thread
    if not task_thread or not task_thread.is_alive():
        return jsonify({"message": "No task is currently running"}), 400
    stop_event.set()
    task_thread.join()
    return jsonify({"message": "CPU-intensive task stopped"}), 200

if __name__ == "__main__":
    app.run(host='0.0.0.0', port=80)
This service uses NumPy matrix multiplication to simulate a CPU-intensive job; see cpu_intensive_task(). The /start/&lt;size&gt; endpoint launches the multiplication loop in a background thread, and /stop signals that thread to finish.
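Once the container is running, the two endpoints can be exercised with curl. A quick sketch, assuming the service is reachable on localhost port 80 (adjust the address for your setup):

```shell
# Start a load loop multiplying 500x500 matrices
curl http://localhost/start/500
# ...let it burn CPU for a while, then signal the worker thread to stop
curl http://localhost/stop
```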
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: scalable-hogger
spec:
  selector:
    matchLabels:
      app: scalable-hogger
  template:
    metadata:
      labels:
        app: scalable-hogger
    spec:
      containers:
        - name: scalable-hogger
          image: docker.io/antonbiz/resource-hog:2.0
          imagePullPolicy: Always
          ports:
            - containerPort: 80
          resources:
            requests:
              cpu: 100m
            limits:
              cpu: 500m
---
apiVersion: v1
kind: Service
metadata:
  name: scalable-hogger-service
spec:
  selector:
    app: scalable-hogger
  ports:
    - port: 80
      targetPort: 80
  type: LoadBalancer
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: scalable-hogger-ingress
  annotations:
    kubernetes.io/ingress.class: nginx
spec:
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: scalable-hogger-service
                port:
                  number: 80
$ kubectl apply -f resource-hogger-service.yaml
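After applying the manifest, it's worth confirming that everything came up before moving on. A sketch, assuming kubectl is pointed at the right cluster:

```shell
# Confirm the three objects from the manifest exist
kubectl get deployment scalable-hogger
kubectl get service scalable-hogger-service
kubectl get ingress scalable-hogger-ingress
# Check that the pods are running
kubectl get pods -l app=scalable-hogger
```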
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: scalable-hogger-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: scalable-hogger
  minReplicas: 2
  maxReplicas: 4
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
This configuration lets the HPA scale the scalable-hogger deployment between 2 and 4 pods based on average CPU utilization, measured relative to each pod's CPU request (100m here). If the average rises above 70%, the HPA adds pods; if it falls well below 70%, it removes pods, always staying within the min/max bounds.
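Concretely, the HPA computes desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization). A small sketch of that arithmetic with made-up numbers, using integer ceiling division in shell:

```shell
# 2 replicas running at 120% average utilization, target is 70%
current_replicas=2
current_util=120
target_util=70
# ceil(2 * 120 / 70) = ceil(3.43) = 4, via integer ceiling division
desired=$(( (current_replicas * current_util + target_util - 1) / target_util ))
echo "desiredReplicas=$desired"
```

With maxReplicas set to 4, the HPA would scale straight to the upper bound in this scenario.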
$ kubectl apply -f resource-hogger-hpa.yaml
Now I wanted to test the HPA. Would it work? Would it scale up the deployment's pod count when CPU utilization went up?
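To find out, the load can be generated through the service while watching the HPA react. A sketch, assuming the service's external address has been filled into $SERVICE_IP (a placeholder; take it from kubectl get svc):

```shell
# Kick off the CPU-intensive loop in the pod the request lands on
curl "http://$SERVICE_IP/start/800"
# Watch the HPA's observed utilization and replica count update live
kubectl get hpa scalable-hogger-hpa --watch
# When done, stop the load so the HPA can scale back in
curl "http://$SERVICE_IP/stop"
```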