# Kubernetes Cost Optimization 2026: Karpenter, Spot Instances & Right-Sizing to Cut 55% Cloud Bill

Posted on: 4/21/2026 10:12:34 AM

**Table of contents**

1. Why Is Kubernetes So Expensive?
2. Karpenter — Next-Gen Node Provisioning
   - 2.1. Karpenter vs Cluster Autoscaler
   - 2.2. Cost-Optimized NodePool Configuration
   - 2.3. EC2NodeClass — AMI & Storage Configuration
3. Spot Instances — 70-90% Compute Cost Reduction
   - 3.1. Instance Family Diversification Strategy
   - 3.2. PriorityClass — Protecting Critical Workloads
4. VPA & Right-Sizing — From 12% to 50%+ Utilization
   - 4.1. Two-Phase VPA Deployment
     - Caution: VPA + HPA Conflict
5. Consolidation — The Art of Bin-Packing
   - 5.1. Two Consolidation Policies
6. Case Study: From $48K to $21.5K/Month
7. Azure AKS: Node Auto-Provisioning (NAP)
   - GKE Has an Equivalent Too
8. FinOps & Governance — Sustaining Results Long-Term
   - 8.1. ResourceQuota & LimitRange
   - 8.2. Cost Labeling — Per-Team Visibility
   - 8.3. Orphaned Resources — Periodic Cleanup
9. 4-Week Implementation Roadmap
10. Conclusion

Over **68% of organizations** overspend on Kubernetes, with 20% to 40% of cloud budgets wasted on inflated pod requests, idle nodes, and a lack of intelligent autoscaling. This article dives into the most effective Kubernetes cost optimization toolkit in 2026: **Karpenter** for intelligent node provisioning, **Spot Instances** for 70-90% compute savings, **VPA** for precise right-sizing, and **FinOps** governance to sustain the results long-term. It also includes a real case study that cut a monthly bill from $48K to $21.5K.

## 1. Why Is Kubernetes So Expensive?

Kubernetes isn't expensive — the way we configure it is. Most production clusters fall into the "set-and-forget" pattern: developers request 500m CPU and 1Gi memory per pod "just to be safe," but actual usage is only 25m CPU and 262Mi memory. The result: clusters running at 10-15% utilization while paying for 100% capacity.

- **68%** of organizations overspend on K8s
- **10-15%** average cluster CPU utilization
- **80-90%** of pods have inflated requests
- **40-70%** potential cost reduction

Three primary sources of waste:

| Waste Source | Root Cause | % Excess Cost |
| --- | --- | --- |
| **Over-provisioned pods** | CPU/memory requests 5-20x higher than actual usage | 30-60% |
| **Idle nodes** | Cluster Autoscaler slow to react, no consolidation | 15-25% |
| **Default On-Demand** | Not leveraging Spot/Preemptible for stateless workloads | 20-40% |
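As a quick sanity check on the over-provisioning figures, the Python sketch below turns this section's example (500m CPU / 1Gi requested vs 25m / 262Mi actually used) into an over-provisioning ratio and the share of the request that sits idle. The inputs are the article's illustrative numbers, not measurements from a real cluster.

```python
def overprovision(requested: float, used: float) -> tuple:
    """Return (requested / used, fraction of the request that is idle)."""
    if requested <= 0 or used < 0:
        raise ValueError("requested must be > 0 and used must be >= 0")
    ratio = requested / used if used > 0 else float("inf")
    idle_fraction = 1 - used / requested
    return ratio, idle_fraction


# CPU in millicores, memory in MiB (1Gi = 1024Mi); example figures from the text
cpu_ratio, cpu_idle = overprovision(requested=500, used=25)
mem_ratio, mem_idle = overprovision(requested=1024, used=262)

print(f"CPU: {cpu_ratio:.0f}x over-requested, {cpu_idle:.0%} of the request idle")
print(f"Memory: {mem_ratio:.1f}x over-requested, {mem_idle:.0%} of the request idle")
```

Summed over every pod on a node, that idle share is what shows up as the 10-15% cluster utilization above.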

## 2. Karpenter — Next-Gen Node Provisioning

**Karpenter** (now developed under the Kubernetes SIGs organization, with providers for AWS and Azure) is a node autoscaler that can fully replace the traditional Cluster Autoscaler. Instead of scaling fixed node groups, Karpenter watches for **pending pods**, works out what capacity they need, and calls the cloud provider API directly to launch the most cost-effective instance type that fits: right size, best price, typically in under a minute.

```mermaid
graph TB
    PENDING["Pending Pods<br/>(unschedulable)"] --> KARPENTER["Karpenter Controller"]
    KARPENTER --> EVAL["Evaluate Pod<br/>Requirements"]
    EVAL --> SELECT["Select Optimal<br/>Instance Type"]
    SELECT --> SPOT{"Spot<br/>available?"}
    SPOT -->|"Yes"| LAUNCH_SPOT["Launch Spot<br/>Instance"]
    SPOT -->|"No"| LAUNCH_OD["Launch On-Demand<br/>Instance"]
    LAUNCH_SPOT --> SCHEDULE["Schedule Pods"]
    LAUNCH_OD --> SCHEDULE
    SCHEDULE --> MONITOR["Monitor<br/>Utilization"]
    MONITOR -->|"Underutilized"| CONSOLIDATE["Consolidate:<br/>Bin-pack & Terminate"]
    MONITOR -->|"Healthy"| MONITOR
    CONSOLIDATE --> KARPENTER
    style KARPENTER fill:#e94560,stroke:#fff,color:#fff
    style CONSOLIDATE fill:#e94560,stroke:#fff,color:#fff
    style LAUNCH_SPOT fill:#4CAF50,stroke:#fff,color:#fff
    style LAUNCH_OD fill:#2c3e50,stroke:#fff,color:#fff
    style PENDING fill:#f8f9fa,stroke:#e94560,color:#333
    style EVAL fill:#f8f9fa,stroke:#e94560,color:#333
    style SELECT fill:#f8f9fa,stroke:#e94560,color:#333
    style MONITOR fill:#f8f9fa,stroke:#e94560,color:#333
    style SCHEDULE fill:#f8f9fa,stroke:#e94560,color:#333
```

Figure 1: Karpenter lifecycle — from pending pod to automatic consolidation
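The "Select Optimal Instance Type" and "Spot available?" steps in Figure 1 can be approximated as "pick the cheapest instance type that fits the pending pods, trying Spot before On-Demand". The Python sketch below shows only that idea. The catalog entries and prices are made-up placeholders. Real Karpenter also accounts for system overhead, zones, scheduling constraints, and Spot capacity signals through EC2 Fleet allocation strategies, so treat this as a mental model rather than its actual algorithm.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Offering:
    name: str             # illustrative instance type name
    vcpu: float           # usable vCPU (system overhead ignored)
    mem_gib: float        # usable memory in GiB
    capacity_type: str    # "spot" or "on-demand"
    hourly_usd: float     # hypothetical price, not real cloud pricing
    available: bool = True


def pick_offering(cpu: float, mem_gib: float, catalog: list[Offering]) -> Offering | None:
    """Cheapest available offering that fits the pending pods' total request, Spot first."""
    for capacity_type in ("spot", "on-demand"):
        fits = [
            o for o in catalog
            if o.capacity_type == capacity_type and o.available
            and o.vcpu >= cpu and o.mem_gib >= mem_gib
        ]
        if fits:
            return min(fits, key=lambda o: o.hourly_usd)
    return None  # nothing fits: pods stay pending


catalog = [
    Offering("type-a.large", 2, 4, "spot", 0.03),
    Offering("type-b.xlarge", 4, 16, "spot", 0.06),
    Offering("type-b.xlarge", 4, 16, "on-demand", 0.19),
    Offering("type-c.2xlarge", 8, 32, "spot", 0.11, available=False),  # Spot pool exhausted
    Offering("type-c.2xlarge", 8, 32, "on-demand", 0.38),
]

print(pick_offering(3, 8, catalog))    # small request: fits on Spot
print(pick_offering(6, 24, catalog))   # larger request: Spot unavailable, On-Demand fallback
```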

### 2.1. Karpenter vs Cluster Autoscaler

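| Criteria | Cluster Autoscaler | Karpenter |
| --- | --- | --- |
| **Scale-up time** | 2-5 minutes | ~30-60 seconds |
| **Instance selection** | Fixed per node group | Dynamic: chooses from the entire instance catalog |
| **Consolidation** | Removes nodes that stay below a utilization threshold for ~10 min (defaults) | Continuous bin-packing; terminates or replaces excess nodes |
| **Spot handling** | Requires a Mixed Instances Policy on the node group | Native Spot with On-Demand fallback |
| **Multi-arch** | Separate node groups for ARM64 | Selects ARM64/AMD64 per pod requirements |
| **Pod awareness** | Simulates scheduling against existing node-group templates | Sizes new nodes from pending pods' requests, affinity, topology spread, and tolerations |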

### 2.2. Cost-Optimized NodePool Configuration

NodePool is Karpenter's core configuration unit, defining "which node types can be created" and "when to consolidate":

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: cost-optimized
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      # In karpenter.sh/v1, expireAfter lives here, not under disruption
      expireAfter: 168h  # Recycle nodes every 7 days
      requirements:
        # Only gen 6+ instances (better price/performance)
        - key: "karpenter.k8s.aws/instance-generation"
          operator: Gt
          values: ["5"]
        # Allow both architectures; Karpenter picks the cheapest fit, which is
        # often ARM64 (Graviton, ~20% cheaper). Images must be multi-arch.
        - key: "kubernetes.io/arch"
          operator: In
          values: ["arm64", "amd64"]
        # Both allowed: Karpenter prioritizes Spot and falls back to On-Demand
        - key: "karpenter.sh/capacity-type"
          operator: In
          values: ["spot", "on-demand"]
        # Compute-, general- and memory-optimized families
        - key: "karpenter.k8s.aws/instance-category"
          operator: In
          values: ["c", "m", "r"]
        # Avoid tiny instances (high per-node overhead)
        - key: "karpenter.k8s.aws/instance-size"
          operator: NotIn
          values: ["nano", "micro", "small"]
  # Consolidation: bin-pack pods and terminate excess nodes
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m
    budgets:
      # At most 20% of this NodePool's nodes disrupted at once
      - nodes: "20%"
      # No voluntary disruption 09:00-10:00 UTC, Mon-Fri (schedules use UTC)
      - nodes: "0"
        schedule: "0 9 * * 1-5"
        duration: 1h
  # Upper bound on the total capacity this NodePool may provision
  limits:
    cpu: "1000"
    memory: "4000Gi"
```
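Each `requirements` entry filters the instance-type catalog by label, and every entry must hold. The Python sketch below models only the three operators used in this NodePool (`In`, `NotIn`, `Gt`) over simplified label dictionaries. The candidate list is illustrative; Karpenter itself supports more operators and also matches offering labels such as zone and capacity type.

```python
from __future__ import annotations


def matches(labels: dict[str, str], requirements: list[dict]) -> bool:
    """True if an instance type's labels satisfy every requirement (requirements are ANDed)."""
    for req in requirements:
        value = labels.get(req["key"])
        op, values = req["operator"], req["values"]
        if op == "In":
            ok = value in values
        elif op == "NotIn":
            ok = value not in values  # in this sketch a missing label also satisfies NotIn
        elif op == "Gt":
            ok = value is not None and int(value) > int(values[0])
        else:
            raise ValueError(f"operator {op!r} is not covered by this sketch")
        if not ok:
            return False
    return True


# Subset of the NodePool requirements above
REQUIREMENTS = [
    {"key": "karpenter.k8s.aws/instance-generation", "operator": "Gt", "values": ["5"]},
    {"key": "karpenter.k8s.aws/instance-category", "operator": "In", "values": ["c", "m", "r"]},
    {"key": "karpenter.k8s.aws/instance-size", "operator": "NotIn",
     "values": ["nano", "micro", "small"]},
]


def labels_for(category: str, generation: int, size: str) -> dict[str, str]:
    return {
        "karpenter.k8s.aws/instance-category": category,
        "karpenter.k8s.aws/instance-generation": str(generation),
        "karpenter.k8s.aws/instance-size": size,
    }


candidates = {
    "m5.xlarge": labels_for("m", 5, "xlarge"),     # generation 5: rejected by Gt
    "m7g.xlarge": labels_for("m", 7, "xlarge"),
    "t4g.small": labels_for("t", 4, "small"),      # wrong category, too small, too old
    "r7i.2xlarge": labels_for("r", 7, "2xlarge"),
}
allowed = [name for name, labels in candidates.items() if matches(labels, REQUIREMENTS)]
print(allowed)  # ['m7g.xlarge', 'r7i.2xlarge']
```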

#### What Are Disruption Budgets?

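Disruption budgets limit how many nodes Karpenter may voluntarily disrupt (for consolidation, drift, and similar reasons) at the same time. For example, `nodes: "20%"` means that at any moment Karpenter may disrupt at most 20% of the NodePool's nodes. Budgets can also define maintenance windows: `nodes: "0"` with a cron `schedule` and a `duration` blocks disruption entirely, e.g. from 09:00 to 10:00 UTC on weekdays during peak traffic. When several budgets are active, the most restrictive one applies. Budgets do not prevent Spot interruptions, which are initiated by the cloud provider.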
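The arithmetic behind a percentage budget, plus the "most restrictive active budget wins" rule, fits in a few lines. This Python sketch makes two labeled assumptions: percentages are rounded up (verify against the Karpenter documentation for your version), and schedules are modeled as a simple weekday/hour window in UTC instead of a parsed cron expression. Subtracting nodes that are already being disrupted is also part of the model, not a quote of Karpenter's code.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Budget:
    nodes: str                               # "20%" or an absolute count such as "0"
    weekdays: frozenset[int] | None = None   # 0=Mon .. 6=Sun; None means always active
    start_hour: int = 0                      # UTC hour the window opens (inclusive)
    hours: int = 24                          # window length; windows crossing midnight not handled

    def active(self, now: datetime) -> bool:
        if self.weekdays is None:
            return True
        return (now.weekday() in self.weekdays
                and self.start_hour <= now.hour < self.start_hour + self.hours)

    def limit(self, total_nodes: int) -> int:
        if self.nodes.endswith("%"):
            pct = int(self.nodes[:-1])
            return (total_nodes * pct + 99) // 100   # assumption: percentages round up
        return int(self.nodes)


def allowed_disruptions(budgets: list[Budget], total_nodes: int,
                        disrupting: int, now: datetime) -> int:
    """Most restrictive active budget, minus nodes already being disrupted."""
    limits = [b.limit(total_nodes) for b in budgets if b.active(now)]
    cap = min(limits) if limits else total_nodes   # sketch: no active budget means no cap
    return max(0, cap - disrupting)


budgets = [
    Budget("20%"),
    Budget("0", weekdays=frozenset(range(5)), start_hour=9, hours=1),  # 09:00-10:00 UTC, Mon-Fri
]
tue_0930 = datetime(2026, 4, 21, 9, 30, tzinfo=timezone.utc)   # a Tuesday
tue_1400 = datetime(2026, 4, 21, 14, 0, tzinfo=timezone.utc)

print(allowed_disruptions(budgets, total_nodes=12, disrupting=0, now=tue_1400))  # 3
print(allowed_disruptions(budgets, total_nodes=12, disrupting=0, now=tue_0930))  # 0
```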

### 2.3. EC2NodeClass — AMI & Storage Configuration

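```yaml
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  # v1 requires amiSelectorTerms; the alias implies the AL2023 AMI family.
  # In production, pin a version (al2023@vYYYYMMDD) instead of @latest.
  amiSelectorTerms:
    - alias: al2023@latest
  role: eks-karpenter-node   # IAM role assumed by the nodes
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 50Gi
        volumeType: gp3
        iops: 3000
        encrypted: true
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
```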

## 3. Spot Instances — 70-90% Compute Cost Reduction

Spot Instances are spare cloud provider capacity, typically sold at 70-90% below On-Demand prices. In exchange, the provider can reclaim an instance at any time; on AWS you get a **2-minute warning**. Used correctly, Spot is the most powerful savings lever available.
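Whether a Spot workload survives reclamation depends on how it shuts down. When the interruption is handled (for example by Karpenter with its interruption queue configured), the node is cordoned and drained, and each container receives SIGTERM, then SIGKILL once `terminationGracePeriodSeconds` runs out. All of this has to fit inside the two-minute notice. The Python sketch below shows one common pattern for a simple item-processing worker: the handler only sets a flag, the main loop stops pulling new work, and a hypothetical `flush()` hook releases state before exit. The interrupt is simulated by the process signalling itself.

```python
import os
import signal
import threading
import time


class GracefulWorker:
    """Processes items until SIGTERM arrives, then finishes the current item and exits."""

    def __init__(self) -> None:
        self.stopping = threading.Event()
        self.processed = []
        # Install the handler (signal.signal must be called from the main thread).
        signal.signal(signal.SIGTERM, self._on_sigterm)

    def _on_sigterm(self, signum, frame) -> None:
        # Keep the handler tiny: just record that shutdown was requested.
        self.stopping.set()

    def run(self, items) -> None:
        for item in items:
            if self.stopping.is_set():
                break  # stop taking new work; the source must redeliver what is left
            self.handle(item)
        self.flush()

    def handle(self, item) -> None:
        time.sleep(0.01)  # placeholder for real work
        self.processed.append(item)

    def flush(self) -> None:
        """Hypothetical cleanup hook: commit offsets, release leases, close connections."""


def items_with_interrupt(signal_at: int, total: int):
    """Yield work items, sending SIGTERM to this process to simulate a Spot drain."""
    for i in range(total):
        if i == signal_at:
            os.kill(os.getpid(), signal.SIGTERM)
        yield i


worker = GracefulWorker()
worker.run(items_with_interrupt(signal_at=3, total=10))
print(f"processed {len(worker.processed)} of 10 items before shutting down")
```

The design choice is to keep all real cleanup out of the signal handler: Python runs handlers between bytecodes of the main thread, so doing heavy work there would interleave unpredictably with the item being processed.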

```mermaid
graph LR
    subgraph SPOT["Spot Instances — Stateless Workloads"]
        API["API Servers"]
        WORKER["Workers"]
        BATCH["Batch Jobs"]
        CI["CI/CD Runners"]
    end
    subgraph OD["On-Demand — Stateful / Critical"]
        DB["Databases"]
        KAFKA["Message Brokers"]
        CTRL["Control Plane"]
        MONITOR["Monitoring"]
    end
    style SPOT fill:#4CAF50,stroke:#fff,color:#fff
    style OD fill:#2c3e50,stroke:#fff,color:#fff
    style API fill:#f8f9fa,stroke:#4CAF50,color:#333
    style WORKER fill:#f8f9fa,stroke:#4CAF50,color:#333
    style BATCH fill:#f8f9fa,stroke:#4CAF50,color:#333
    style CI fill:#f8f9fa,stroke:#4CAF50,color:#333
    style DB fill:#f8f9fa,stroke:#e94560,color:#333
    style KAFKA fill:#f8f9fa,stroke:#e94560,color:#333
    style CTRL fill:#f8f9fa,stroke:#e94560,color:#333
    style MONITOR fill:#f8f9fa,stroke:#e94560,color:#333
```

Figure 2: Workload classification — Spot for stateless, On-Demand for stateful/critical

### 3.1. Instance Family Diversification Strategy

The golden rule of Spot: **don't put all your eggs in one basket**. Relying on a single instance type (e.g., m5.xlarge) sharply increases the chance that all of your Spot capacity is reclaimed at once. Diversify across many instance types, families, and Availability Zones:

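```yaml
# Karpenter NodePool for Spot-only workloads
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot-diversified
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: "karpenter.sh/capacity-type"
          operator: In
          values: ["spot"]
        # Diversify: more families = more Spot pools to draw from
        - key: "karpenter.k8s.aws/instance-category"
          operator: In
          values: ["c", "m", "r"]
        - key: "karpenter.k8s.aws/instance-generation"
          operator: Gt
          values: ["5"]
        # Allow both ARM64 and AMD64 (images must be multi-arch)
        - key: "kubernetes.io/arch"
          operator: In
          values: ["arm64", "amd64"]
---
# AZ spread is a pod-level setting (NodePools have no topologySpreadConstraints);
# Karpenter launches nodes in the zones needed to satisfy it.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
spec:
  selector:
    matchLabels: {app: api-server}
  template:
    metadata:
      labels: {app: api-server}
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels: {app: api-server}
      containers:
        - name: api-server
          image: example.com/api-server:1.0  # placeholder image
```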
### 3.2. PriorityClass — Protecting Critical Workloads

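Combine a `nodeSelector` on `karpenter.sh/capacity-type`, which decides *where* a pod lands, with a PriorityClass, which decides who wins when capacity is short. Critical workloads then run on On-Demand and can preempt lower-priority pods, while batch/CI jobs run on Spot and accept interruptions: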

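```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: critical-production
value: 1000000
globalDefault: false
description: "Workloads that cannot be interrupted"
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: batch-workload
value: 100
preemptionPolicy: Never   # these pods never evict others to make room
description: "Batch jobs that accept Spot"
---
# Critical deployment → On-Demand
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service
spec:
  selector:
    matchLabels:
      app: payment-service
  template:
    metadata:
      labels:
        app: payment-service
    spec:
      priorityClassName: critical-production
      nodeSelector:
        karpenter.sh/capacity-type: on-demand
      containers:
        - name: payment-service
          image: example.com/payment-service:1.0  # placeholder image
---
# Batch job → Spot
apiVersion: batch/v1
kind: Job
metadata:
  name: data-pipeline
spec:
  template:
    spec:
      priorityClassName: batch-workload
      # A toleration alone does not attract pods to Spot; select Spot nodes explicitly
      nodeSelector:
        karpenter.sh/capacity-type: spot
      restartPolicy: OnFailure
      containers:
        - name: data-pipeline
          image: example.com/data-pipeline:1.0  # placeholder image
```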

## 4. VPA & Right-Sizing — From 12% to 50%+ Utilization

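Vertical Pod Autoscaler (VPA) analyzes historical CPU/memory usage and **recommends or automatically adjusts** resource requests. It is installed as a separate add-on rather than being part of core Kubernetes. This is often the highest-ROI step: cutting a CPU request from 500m to 25m barely matters for one pod, but applied across hundreds of replicas it can free dozens of nodes.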
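To see how a per-pod correction turns into whole nodes, this Python sketch converts aggregate CPU requests into a node count. The fleet size (400 replicas) and the 7.5 allocatable vCPU per 8-vCPU node are hypothetical assumptions. Real allocatable capacity depends on the AMI, kubelet/system reservations, and DaemonSets, and once CPU requests shrink, memory may become the binding constraint instead.

```python
def nodes_needed(replicas: int, cpu_request_m: int, allocatable_m_per_node: int) -> int:
    """Nodes needed to fit CPU requests alone (ceiling division; ignores memory and spread)."""
    total_m = replicas * cpu_request_m
    return (total_m + allocatable_m_per_node - 1) // allocatable_m_per_node


FLEET_REPLICAS = 400   # hypothetical: e.g. 200 services x 2 replicas
ALLOCATABLE_M = 7500   # assumed allocatable millicores on an 8-vCPU node

before = nodes_needed(FLEET_REPLICAS, 500, ALLOCATABLE_M)
after = nodes_needed(FLEET_REPLICAS, 25, ALLOCATABLE_M)
print(f"CPU-driven node count: {before} -> {after}")   # 27 -> 2
```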

### 4.1. Two-Phase VPA Deployment

**Phase 1 — Observation only (7+ days):**

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-service-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: api-service
  updatePolicy:
    updateMode: "Off"  # Recommend only, no automatic changes
```

Check recommendations after 7 days:

```bash
kubectl describe vpa api-service-vpa
# Output:
# Recommendation:
#   Container Recommendations:
#     Container Name: api-service
#     Lower Bound:   Cpu: 15m,  Memory: 128Mi
#     Target:        Cpu: 25m,  Memory: 262Mi  ← Recommendation
#     Upper Bound:   Cpu: 100m, Memory: 512Mi
#
# Compare with current request: 500m CPU, 1Gi memory
# → 20x CPU reduction, 4x memory reduction!
```
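To compare current requests with VPA targets across many workloads (the targets are also available in `.status.recommendation.containerRecommendations[].target` on the VPA object), it helps to convert Kubernetes quantity strings to numbers. This Python sketch covers only the suffixes used in this article (`m` for CPU, `Ki`/`Mi`/`Gi` for memory); the rest of the quantity grammar (decimal `k`/`M`/`G`, exponents) is deliberately not handled.

```python
from __future__ import annotations

from collections.abc import Callable

_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def cpu_millicores(q: str) -> float:
    """'500m' -> 500.0, '2' -> 2000.0, '0.5' -> 500.0"""
    return float(q[:-1]) if q.endswith("m") else float(q) * 1000


def memory_bytes(q: str) -> int:
    """'262Mi' -> 274726912; a plain integer is already bytes."""
    for suffix, factor in _BINARY.items():
        if q.endswith(suffix):
            return int(float(q[:-len(suffix)]) * factor)
    return int(q)


def reduction(current: str, target: str, parse: Callable[[str], float]) -> float:
    """How many times larger the current request is than the VPA target."""
    return parse(current) / parse(target)


cpu_x = reduction("500m", "25m", cpu_millicores)
mem_x = reduction("1Gi", "262Mi", memory_bytes)
print(f"CPU {cpu_x:.0f}x, memory {mem_x:.1f}x")
```

The memory figure comes out at about 3.9x, which the output above rounds to "4x".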

**Phase 2 — Automatic adjustment (after validation):**

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-service-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: api-service
  updatePolicy:
    updateMode: "Auto"  # applies new requests by evicting and recreating pods
  resourcePolicy:
    containerPolicies:
      - containerName: api-service
        minAllowed:
          cpu: "50m"      # floor: a 25m recommendation is raised to 50m
          memory: "64Mi"
        maxAllowed:
          cpu: "2"
          memory: "2Gi"
```
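With a `resourcePolicy` in place, the value VPA applies is its recommendation clamped to `[minAllowed, maxAllowed]`, so the 25m CPU target from Phase 1 would be raised to the 50m floor configured here. A tiny Python sketch of that clamping (CPU in millicores, memory in MiB, bounds taken from the manifest; the function is illustrative, not VPA source code):

```python
def clamp(recommended: float, min_allowed: float, max_allowed: float) -> float:
    """Bound a recommendation the way minAllowed/maxAllowed do."""
    if min_allowed > max_allowed:
        raise ValueError("minAllowed must not exceed maxAllowed")
    return max(min_allowed, min(recommended, max_allowed))


cpu_m = clamp(25, min_allowed=50, max_allowed=2000)     # floor wins -> 50
mem_mi = clamp(262, min_allowed=64, max_allowed=2048)   # within bounds -> 262
print(cpu_m, mem_mi)
```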

#### Caution: VPA + HPA Conflict

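Do not let VPA in Auto mode and HPA act on the **same metric**. For example, if HPA scales on CPU utilization while VPA also adjusts CPU requests, every request change VPA makes shifts the utilization HPA sees, and the two controllers fight. One fix: let VPA manage only **memory** (set `controlledResources: ["memory"]` in its container policy), which is typically stable, and let HPA scale on **CPU**, which typically fluctuates with traffic. Alternatively, scale horizontally with **KEDA** on event-driven metrics such as queue length, which VPA does not touch.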
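The conflict is easiest to see through the HPA replica formula from the Kubernetes documentation, `desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization)`, where CPU utilization is usage divided by the pod's request. The Python sketch below uses hypothetical numbers and includes HPA's default 10% tolerance. It shows that lowering the CPU request (exactly what VPA does) raises utilization and makes HPA add replicas even though traffic has not changed.

```python
import math


def hpa_desired(replicas: int, usage_m_per_pod: float, request_m: float,
                target_utilization: float, tolerance: float = 0.1) -> int:
    """Desired replicas per the HPA algorithm for a CPU-utilization target."""
    utilization = usage_m_per_pod / request_m       # 0.75 == 75% of the request
    ratio = utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        return replicas                              # within tolerance: no change
    return math.ceil(replicas * ratio)


# Same traffic (300m used per pod), HPA target 75% CPU utilization.
print(hpa_desired(4, 300, request_m=400, target_utilization=0.75))  # 4: steady state
print(hpa_desired(4, 300, request_m=320, target_utilization=0.75))  # 5: VPA lowered the request
```

After the scale-out, per-pod usage drops, VPA lowers requests again, and the loop can repeat; that feedback is why the two should not share a metric.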

## 5. Consolidation — The Art of Bin-Packing

Consolidation is the process where Karpenter repacks pods from multiple nodes into fewer nodes, then terminates empty or inefficient ones. This keeps cluster utilization consistently high, not just during scale-down events.

```mermaid
graph LR
    subgraph BEFORE["Before Consolidation"]
        N1["Node 1<br/>20% used"]
        N2["Node 2<br/>15% used"]
        N3["Node 3<br/>25% used"]
    end
    subgraph AFTER["After Consolidation"]
        N4["Node 1<br/>60% used"]
        N5["Nodes 2 and 3<br/>Terminated ✓"]
    end
    BEFORE -->|"Karpenter<br/>bin-pack"| AFTER
    style BEFORE fill:#fff,stroke:#e0e0e0,color:#333
    style AFTER fill:#fff,stroke:#e0e0e0,color:#333
    style N1 fill:#f8f9fa,stroke:#ff9800,color:#333
    style N2 fill:#f8f9fa,stroke:#ff9800,color:#333
    style N3 fill:#f8f9fa,stroke:#ff9800,color:#333
    style N4 fill:#f8f9fa,stroke:#4CAF50,color:#333
    style N5 fill:#f8f9fa,stroke:#e94560,color:#333
```

Figure 3: Consolidation repacks pods from 3 underutilized nodes into 1 optimized node
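Consolidation is a bin-packing problem. Karpenter's actual logic simulates scheduling against candidate nodes and also weighs price, PodDisruptionBudgets, and scheduling constraints. The Python sketch below swaps in plain first-fit decreasing on CPU alone, just to show how the three nodes in Figure 3 collapse into one. Node capacity is normalized to 100 units, and the individual pod sizes are hypothetical (they only need to add up to the utilizations in the figure).

```python
from __future__ import annotations


def first_fit_decreasing(pod_sizes: list[int], capacity: int) -> list[list[int]]:
    """Pack pods (by CPU share) into equal-size nodes using first-fit decreasing."""
    nodes: list[list[int]] = []
    for size in sorted(pod_sizes, reverse=True):
        if size > capacity:
            raise ValueError(f"pod of size {size} cannot fit on any node")
        for node in nodes:
            if sum(node) + size <= capacity:
                node.append(size)
                break
        else:
            nodes.append([size])  # no existing node had room: "launch" one
    return nodes


# Figure 3: nodes at 20%, 15% and 25% utilization, expressed as pod sizes.
before = {"node-1": [10, 10], "node-2": [5, 10], "node-3": [15, 10]}
after = first_fit_decreasing([p for pods in before.values() for p in pods], capacity=100)
print(len(before), "->", len(after), "nodes;", [sum(n) for n in after], "% used")
```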

### 5.1. Two Consolidation Policies

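| Policy | Behavior | Best For |
| --- | --- | --- |
| **WhenEmpty** | Only terminates nodes with no workload pods left (DaemonSet pods are ignored) | Workloads sensitive to rescheduling |
| **WhenEmptyOrUnderutilized** | Terminates empty nodes, and also moves pods off underutilized nodes so they can be removed or replaced with cheaper ones | Most production clusters |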

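```yaml
# Excerpt of a NodePool's spec.disruption, with protection windows
disruption:
  consolidationPolicy: WhenEmptyOrUnderutilized
  consolidateAfter: 2m  # A node must see no pod churn for 2 minutes first
  budgets:
    # Normal: max 20% of nodes disrupted for consolidation. Scoped by reason,
    # because an unscoped budget would also cap drift replacement at 20%.
    - nodes: "20%"
      reasons:
        - Empty
        - Underutilized
    # Peak hours: block all disruption (no reasons = applies to every reason)
    - nodes: "0"
      schedule: "0 8 * * 1-5"   # 08:00-09:00 UTC, Mon-Fri
      duration: 1h
    # Drifted nodes: allow faster replacement (50%)
    - nodes: "50%"
      reasons:
        - Drifted
```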
## 6. Case Study: From $48K to $21.5K/Month

A production e-commerce system with ~200 microservices applied all strategies above and achieved:

### Before Optimization

| Metric | Value |
| --- | --- |
| Node fleet | 45× m5.2xlarge (8 vCPU, 32 GiB each) |
| CPU utilization | 12% |
| Memory utilization | 22% |
| Monthly cost | **$48,000** |

### After Optimization

| Metric | Value |
| --- | --- |
| Critical nodes (On-Demand) | 8× r7g.xlarge (Graviton, 4 vCPU, 32 GiB each) |
| Variable nodes (Spot) | 12-35× mixed instance types |
| CPU utilization | 48% |
| Memory utilization | 61% |
| Monthly cost | **$21,500** |

### Savings Breakdown

- **$8,000** VPA right-sizing
- **$5,000** Karpenter consolidation
- **$7,000** Spot Instances
- **$4,500** Graviton ARM64
- **$2,000** Night-time scaling

**55% total cost reduction** ($26,500/month saved)
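The breakdown is internally consistent. A quick Python check using only the figures from the tables above:

```python
before, after = 48_000, 21_500
breakdown = {
    "VPA right-sizing": 8_000,
    "Karpenter consolidation": 5_000,
    "Spot Instances": 7_000,
    "Graviton ARM64": 4_500,
    "Night-time scaling": 2_000,
}
saved = sum(breakdown.values())
assert saved == before - after   # the parts add up to the before/after difference
print(f"Saved ${saved:,}/month = {saved / before:.1%} of the original bill")
for item, usd in breakdown.items():
    print(f"  {item:<24} ${usd:>6,}  ({usd / saved:.0%} of savings)")
```

The result is 55.2%, matching the headline figure. How the savings are attributed to each lever is the case study's own accounting; the check only confirms that the parts sum to the total.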

## 7. Azure AKS: Node Auto-Provisioning (NAP)

Karpenter is not limited to AWS: **Azure AKS** has integrated it as **Node Auto-Provisioning (NAP)** since late 2025, bringing the same provisioning model to the Azure ecosystem:

| Feature | AWS EKS + Karpenter | Azure AKS + NAP |
| --- | --- | --- |
| **Config format** | NodePool + EC2NodeClass | NodePool + AKSNodeClass |
| **Spot equivalent** | EC2 Spot Instances | Azure Spot VMs |
| **ARM equivalent** | Graviton (arm64) | Ampere Altra (arm64) |
| **Consolidation** | WhenEmptyOrUnderutilized | Same (Karpenter core) |
| **Commitment discounts** | AWS Savings Plans / Reserved Instances | Azure Reservations / Azure savings plan for compute |

#### GKE Has an Equivalent Too

Google GKE uses GKE Autopilot — a step further where Google manages the entire node layer. You only deploy pods; GKE auto-selects instance types, scales, and consolidates. However, Autopilot is less flexible than Karpenter when fine-tuning instance selection or Spot strategies.

## 8. FinOps & Governance — Sustaining Results Long-Term

Cost optimization is not a one-time effort. Without FinOps processes, clusters will "drift" back to wasteful states within months. Essential governance tools:

### 8.1. ResourceQuota & LimitRange

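```yaml
# Aggregate caps per team namespace. Because requests.* are tracked, every
# pod must declare requests (the LimitRange below fills in missing ones).
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-backend-quota
  namespace: team-backend
spec:
  hard:
    requests.cpu: "20"
    requests.memory: "40Gi"
    limits.cpu: "40"
    limits.memory: "80Gi"
    pods: "100"
---
# Per-container defaults for missing requests/limits, plus a hard maximum
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: team-backend
spec:
  limits:
    - type: Container
      defaultRequest:   # used when a container sets no request
        cpu: "100m"
        memory: "128Mi"
      default:          # used when a container sets no limit
        cpu: "500m"
        memory: "512Mi"
      max:              # ceiling for any single container
        cpu: "4"
        memory: "8Gi"
```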
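The two objects work together. The LimitRange admission step fills in missing requests first, and the ResourceQuota check then rejects any pod that would push the namespace total over its hard limit. The Python sketch below is a simplified model of that order for CPU requests only, in millicores, using the values from the manifest; it ignores memory, limits, and the per-container `max` check.

```python
from __future__ import annotations

QUOTA_REQUESTS_CPU_M = 20_000   # ResourceQuota requests.cpu: "20"
DEFAULT_REQUEST_CPU_M = 100     # LimitRange defaultRequest.cpu: "100m"


def admit(pod_cpu_requests_m: list[int | None], used_m: int) -> tuple[bool, int]:
    """Apply LimitRange defaults, then check the namespace quota. Returns (admitted, new_used)."""
    # Step 1 (LimitRange): containers without a CPU request get the default.
    requests = [DEFAULT_REQUEST_CPU_M if r is None else r for r in pod_cpu_requests_m]
    # Step 2 (ResourceQuota): reject if the namespace total would exceed the hard cap.
    total = used_m + sum(requests)
    if total > QUOTA_REQUESTS_CPU_M:
        return False, used_m
    return True, total


ok, used = admit([None, 250], used_m=19_500)   # 100m default + 250m fits under 20 CPU
print(ok, used)
ok, used = admit([500], used_m=used)           # would push the namespace past 20 CPU
print(ok, used)
```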

### 8.2. Cost Labeling — Per-Team Visibility

Enforce standardized labeling on all resources for accurate cost allocation:

```yaml
# OPA/Gatekeeper Constraint requiring cost labels. It instantiates the
# K8sRequiredLabels ConstraintTemplate from the Gatekeeper policy library,
# which must be installed first.
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: require-cost-labels
spec:
  match:
    kinds:
      - apiGroups: ["apps"]
        kinds: ["Deployment", "StatefulSet"]
  parameters:
    labels:
      - key: "app.kubernetes.io/team"
      - key: "app.kubernetes.io/cost-center"
      - key: "app.kubernetes.io/environment"
```
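The Gatekeeper template itself is written in Rego. The Python sketch below mirrors only its core check (which required keys are missing from an object's labels) so the behavior is easy to reason about. The sample Deployment is hypothetical.

```python
from __future__ import annotations

REQUIRED = [
    "app.kubernetes.io/team",
    "app.kubernetes.io/cost-center",
    "app.kubernetes.io/environment",
]


def missing_labels(obj: dict, required: list[str] = REQUIRED) -> list[str]:
    """Required label keys absent from obj.metadata.labels (order preserved)."""
    labels = obj.get("metadata", {}).get("labels") or {}
    return [key for key in required if key not in labels]


deployment = {
    "kind": "Deployment",
    "metadata": {"name": "checkout", "labels": {"app.kubernetes.io/team": "backend"}},
}
print(missing_labels(deployment))  # ['app.kubernetes.io/cost-center', 'app.kubernetes.io/environment']
```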

### 8.3. Orphaned Resources — Periodic Cleanup

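The commands below flag two common kinds of orphaned resources. Run them weekly (for example from a CronJob whose image contains kubectl and jq) and alert on any output: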

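```bash
#!/usr/bin/env bash
# Requires kubectl, jq 1.6+, and bash (for process substitution)
set -euo pipefail

# 1. Bound PVCs that no pod references
jq -rn \
  --slurpfile pvcs <(kubectl get pvc --all-namespaces -o json) \
  --slurpfile pods <(kubectl get pods --all-namespaces -o json) '
  # "namespace/claimName" for every PVC mounted by some pod
  [$pods[0].items[] | .metadata.namespace as $ns
    | .spec.volumes[]? | select(.persistentVolumeClaim)
    | "\($ns)/\(.persistentVolumeClaim.claimName)"] as $used
  | $pvcs[0].items[]
  | select(.status.phase == "Bound")
  | "\(.metadata.namespace)/\(.metadata.name)" as $key
  | select(any($used[]; . == $key) | not)
  | "\($key) - \(.spec.resources.requests.storage)"'

# 2. LoadBalancer Services with no ready endpoints (the cloud LB is still billed)
jq -rn \
  --slurpfile svcs <(kubectl get svc --all-namespaces -o json) \
  --slurpfile slices <(kubectl get endpointslices --all-namespaces -o json) '
  # "namespace/service" for every Service with at least one ready endpoint
  [$slices[0].items[]
    | select(any(.endpoints[]?; .conditions.ready != false))
    | "\(.metadata.namespace)/\(.metadata.labels["kubernetes.io/service-name"])"] as $backed
  | $svcs[0].items[]
  | select(.spec.type == "LoadBalancer")
  | "\(.metadata.namespace)/\(.metadata.name)" as $key
  | select(any($backed[]; . == $key) | not)
  | $key'
```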

## 9. 4-Week Implementation Roadmap

### Week 1: Measure

Install metrics-server and Prometheus. Measure baseline utilization (CPU, memory) and cost. Install the VPA add-on and create VPA objects in "Off" mode for all Deployments.

### Week 2: Right-Size

Analyze VPA recommendations. Adjust resource requests for the pods with the largest gaps. Deploy LimitRange defaults in all namespaces.

### Week 3: Karpenter + Spot

Migrate from Cluster Autoscaler to Karpenter. Configure NodePools with Spot and instance diversification. Set up PriorityClasses and capacity-type node selectors for critical vs batch workloads.

### Week 4: Governance

Deploy a ResourceQuota per namespace. Enforce cost labels via OPA/Gatekeeper. Set up a weekly cost report dashboard. Configure disruption budgets.

## 10. Conclusion

Kubernetes cost optimization isn't a "press the discount button" exercise. It is an engineering process that requires measurement, right-sizing, intelligent autoscaling, and continuous governance. Three core pillars:

- **Karpenter replaces Cluster Autoscaler:** faster provisioning (roughly 30-60 seconds instead of 2-5 minutes), automatic consolidation, and intelligent instance selection.
- **Spot Instances + diversification:** 70-90% compute savings for stateless workloads, with critical workloads kept on On-Demand and protected by PriorityClass.
- **VPA right-sizing:** identifies and corrects inflated resource requests, which affect an estimated 80-90% of pods.

Reported results show that 40-70% cost reductions are achievable when these techniques are applied correctly. The key is to start with measurement (metrics-server + VPA in "Off" mode), not with cutting: you can't optimize what you haven't measured.


# Kubernetes Cost Optimization 2026: Karpenter, Spot Instances & Right-Sizing to Cut 55% Cloud Bill

Over **68% of organizations** overspend on Kubernetes — 20% to 40% of cloud budgets wasted due to inflated pod requests, idle nodes, and lack of intelligent autoscaling strategies. This article dives deep into the most powerful Kubernetes cost optimization toolkit in 2026: **Karpenter** for intelligent node provisioning, **Spot Instances** for 70-90% compute savings, **VPA** for precise right-sizing, and **FinOps** governance to sustain results long-term. Featuring a real case study cutting from $48K to $21.5K/month.

## 1. Why Is Kubernetes So Expensive?

Kubernetes isn't expensive — the way we **configure** it is. Most production clusters fall into the "set-and-forget" pattern: developers request 500m CPU and 1Gi memory per pod "just to be safe," but actual usage is only 25m CPU and 262Mi memory. The result: clusters running at **10-15% utilization** while paying for 100% capacity.

68% Organizations overspend on K8s

10-15% Average cluster CPU utilization

80-90% Pods with inflated requests

40-70% Potential cost reduction

Three primary sources of waste:

| Waste Source | Root Cause | % Excess Cost |
| --- | --- | --- |
| **Over-provisioned pods** | CPU/memory requests 5-20x higher than actual usage | 30-60% |
| **Idle nodes** | Cluster Autoscaler slow to react, no consolidation | 15-25% |
| **Default On-Demand** | Not leveraging Spot/Preemptible for stateless workloads | 20-40% |

## 2. Karpenter — Next-Gen Node Provisioning

**Karpenter** (now under Kubernetes SIGs, supporting both AWS and Azure) is a node autoscaler that completely replaces the traditional Cluster Autoscaler. Instead of scaling fixed node groups, Karpenter looks at **pending pods** and queries the Cloud Provider API directly to select the most cost-effective instance type — right size, best price, in seconds.

```
graph TB
    PENDING["Pending Pods  
(unschedulable)"] --> KARPENTER["Karpenter Controller"]
    KARPENTER --> EVAL["Evaluate Pod  
Requirements"]
    EVAL --> SELECT["Select Optimal  
Instance Type"]
    SELECT --> SPOT{"Spot  
available?"}
    SPOT -->|"Yes"| LAUNCH_SPOT["Launch Spot  
Instance"]
    SPOT -->|"No"| LAUNCH_OD["Launch On-Demand  
Instance"]
    LAUNCH_SPOT --> SCHEDULE["Schedule Pods"]
    LAUNCH_OD --> SCHEDULE
    SCHEDULE --> MONITOR["Monitor  
Utilization"]
    MONITOR -->|"Underutilized"| CONSOLIDATE["Consolidate:  
Bin-pack & Terminate"]
    MONITOR -->|"Healthy"| MONITOR
    CONSOLIDATE --> KARPENTER
    style KARPENTER fill:#e94560,stroke:#fff,color:#fff
    style CONSOLIDATE fill:#e94560,stroke:#fff,color:#fff
    style LAUNCH_SPOT fill:#4CAF50,stroke:#fff,color:#fff
    style LAUNCH_OD fill:#2c3e50,stroke:#fff,color:#fff
    style PENDING fill:#f8f9fa,stroke:#e94560,color:#333
    style EVAL fill:#f8f9fa,stroke:#e94560,color:#333
    style SELECT fill:#f8f9fa,stroke:#e94560,color:#333
    style MONITOR fill:#f8f9fa,stroke:#e94560,color:#333
    style SCHEDULE fill:#f8f9fa,stroke:#e94560,color:#333

```
Figure 1: Karpenter lifecycle — from pending pod to automatic consolidation

### 2.1. Karpenter vs Cluster Autoscaler

| Criteria | Cluster Autoscaler | Karpenter |
| --- | --- | --- |
| **Scale-up time** | 2-5 minutes | ~30-60 seconds |
| **Instance selection** | Fixed per Node Group | Dynamic — selects from entire instance catalog |
| **Consolidation** | Scale-down after 10 min idle | Continuous bin-packing, terminates excess nodes |
| **Spot handling** | Requires Mixed Instance Policy config | Native Spot + On-Demand fallback |
| **Multi-arch** | Separate node groups for ARM64 | Auto-selects ARM64/AMD64 per requirements |
| **Pod awareness** | Only counts pending pods | Analyzes topology, affinity, taints |

### 2.2. Cost-Optimized NodePool Configuration

NodePool is Karpenter's core configuration unit, defining "which node types can be created" and "when to consolidate":

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: cost-optimized
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        # Only gen 6+ instances (better price/performance)
        - key: "karpenter.k8s.aws/instance-generation"
          operator: Gt
          values: ["5"]
        # Prefer ARM64 (Graviton) — ~20% cheaper
        - key: "kubernetes.io/arch"
          operator: In
          values: ["arm64", "amd64"]
        # Spot first, On-Demand fallback
        - key: "karpenter.sh/capacity-type"
          operator: In
          values: ["spot", "on-demand"]
        # Suitable instance families
        - key: "karpenter.k8s.aws/instance-category"
          operator: In
          values: ["c", "m", "r"]
        # Avoid tiny instances (high overhead)
        - key: "karpenter.k8s.aws/instance-size"
          operator: NotIn
          values: ["nano", "micro", "small"]
  # Consolidation: bin-pack pods and terminate excess nodes
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m
    budgets:
      - nodes: "20%"
      - nodes: "0"
        schedule: "0 9 * * 1-5"
        duration: 1h
    expireAfter: 168h  # Recycle nodes every 7 days
  limits:
    cpu: "1000"
    memory: "4000Gi"
```

#### What Are Disruption Budgets?

Disruption Budgets control how fast Karpenter can terminate nodes. For example, `nodes: "20%"` means at any point, Karpenter can only disrupt up to 20% of total nodes. You can also create "maintenance windows" — e.g., block all disruptions at 9 AM Mon-Fri (peak traffic) using `nodes: "0"` with a cron schedule.

### 2.3. EC2NodeClass — AMI & Storage Configuration

```yaml
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2023
  role: eks-karpenter-node
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 50Gi
        volumeType: gp3
        iops: 3000
        encrypted: true
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
```

## 3. Spot Instances — 70-90% Compute Cost Reduction

Spot Instances are surplus cloud provider capacity sold at 70-90% discount compared to On-Demand. In exchange, the provider can reclaim instances at any time with a **2-minute warning**. This is the most powerful savings tool when used correctly.

```
graph LR
    subgraph SPOT["Spot Instances — Stateless Workloads"]
        API["API Servers"]
        WORKER["Workers"]
        BATCH["Batch Jobs"]
        CI["CI/CD Runners"]
    end
    subgraph OD["On-Demand — Stateful / Critical"]
        DB["Databases"]
        KAFKA["Message Brokers"]
        CTRL["Control Plane"]
        MONITOR["Monitoring"]
    end
    style SPOT fill:#4CAF50,stroke:#fff,color:#fff
    style OD fill:#2c3e50,stroke:#fff,color:#fff
    style API fill:#f8f9fa,stroke:#4CAF50,color:#333
    style WORKER fill:#f8f9fa,stroke:#4CAF50,color:#333
    style BATCH fill:#f8f9fa,stroke:#4CAF50,color:#333
    style CI fill:#f8f9fa,stroke:#4CAF50,color:#333
    style DB fill:#f8f9fa,stroke:#e94560,color:#333
    style KAFKA fill:#f8f9fa,stroke:#e94560,color:#333
    style CTRL fill:#f8f9fa,stroke:#e94560,color:#333
    style MONITOR fill:#f8f9fa,stroke:#e94560,color:#333

```
Figure 2: Workload classification — Spot for stateless, On-Demand for stateful/critical

### 3.1. Instance Family Diversification Strategy

The golden rule of Spot: **don't put all eggs in one basket**. Relying on a single instance type (e.g., m5.xlarge) dramatically increases interruption probability. Diversify across multiple instance families and Availability Zones:

```yaml
# Karpenter NodePool for Spot workloads
spec:
  template:
    spec:
      requirements:
        - key: "karpenter.sh/capacity-type"
          operator: In
          values: ["spot"]
        # Diversify: more families = fewer interruptions
        - key: "karpenter.k8s.aws/instance-category"
          operator: In
          values: ["c", "m", "r"]
        - key: "karpenter.k8s.aws/instance-generation"
          operator: Gt
          values: ["5"]
        # Allow both ARM64 and AMD64
        - key: "kubernetes.io/arch"
          operator: In
          values: ["arm64", "amd64"]
      # Topology spread across AZs
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
```

### 3.2. PriorityClass — Protecting Critical Workloads

Use PriorityClass to ensure critical workloads always run on On-Demand, while batch/CI jobs accept Spot interruptions:

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: critical-production
value: 1000000
globalDefault: false
description: "Workloads that cannot be interrupted"
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: batch-workload
value: 100
preemptionPolicy: Never
description: "Batch jobs that accept Spot"
---
# Critical deployment → On-Demand
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service
spec:
  template:
    spec:
      priorityClassName: critical-production
      nodeSelector:
        karpenter.sh/capacity-type: on-demand
---
# Batch job → Spot
apiVersion: batch/v1
kind: Job
metadata:
  name: data-pipeline
spec:
  template:
    spec:
      priorityClassName: batch-workload
      tolerations:
        - key: "karpenter.sh/capacity-type"
          operator: "Equal"
          value: "spot"
```

## 4. VPA & Right-Sizing — From 12% to 50%+ Utilization

Vertical Pod Autoscaler (VPA) analyzes historical CPU/memory usage and **recommends or automatically adjusts** resource requests. This is often the highest-ROI step — reducing requests from 500m to 25m CPU for a single pod can free up dozens of nodes.

### 4.1. Two-Phase VPA Deployment

**Phase 1 — Observation only (7+ days):**

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-service-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: api-service
  updatePolicy:
    updateMode: "Off"  # Recommend only, no automatic changes
```
Check recommendations after 7 days:

```bash
kubectl describe vpa api-service-vpa
# Output:
# Recommendation:
#   Container Recommendations:
#     Container Name: api-service
#     Lower Bound:   Cpu: 15m,  Memory: 128Mi
#     Target:        Cpu: 25m,  Memory: 262Mi  ← Recommendation
#     Upper Bound:   Cpu: 100m, Memory: 512Mi
#
# Compare with current request: 500m CPU, 1Gi memory
# → 20x CPU reduction, 4x memory reduction!
```
**Phase 2 — Automatic adjustment (after validation):**

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-service-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: api-service
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: api-service
        minAllowed:
          cpu: "50m"
          memory: "64Mi"
        maxAllowed:
          cpu: "2"
          memory: "2Gi"
```

#### Caution: VPA + HPA Conflict

Do not use VPA Auto mode for the **same metric** as HPA. For example, if HPA scales on CPU while VPA also adjusts CPU requests → conflict. Solution: use VPA for **memory** (typically stable) and HPA for **CPU** (typically fluctuates with traffic). Or switch to **KEDA** with event-driven scaling to avoid conflicts entirely.

## 5. Consolidation — The Art of Bin-Packing

Consolidation is the process where Karpenter **repacks pods from multiple nodes into fewer nodes**, then terminates empty or inefficient ones. This keeps cluster utilization consistently high, not just during scale-down events.

```
graph LR
    subgraph BEFORE["Before Consolidation"]
        N1["Node 1  
20% used"]
        N2["Node 2  
15% used"]
        N3["Node 3  
25% used"]
    end
    subgraph AFTER["After Consolidation"]
        N4["Node 1  
60% used"]
        N5["Node 2 — Terminated ✓"]
    end
    BEFORE -->|"Karpenter  
bin-pack"| AFTER
    style BEFORE fill:#fff,stroke:#e0e0e0,color:#333
    style AFTER fill:#fff,stroke:#e0e0e0,color:#333
    style N1 fill:#f8f9fa,stroke:#ff9800,color:#333
    style N2 fill:#f8f9fa,stroke:#ff9800,color:#333
    style N3 fill:#f8f9fa,stroke:#ff9800,color:#333
    style N4 fill:#f8f9fa,stroke:#4CAF50,color:#333
    style N5 fill:#f8f9fa,stroke:#e94560,color:#333

```
Figure 3: Consolidation repacks pods from 3 underutilized nodes into 1 optimized node

### 5.1. Two Consolidation Policies

| Policy | Behavior | Best For |
| --- | --- | --- |
| **WhenEmpty** | Only terminates nodes when zero pods remain | Workloads sensitive to rescheduling |
| **WhenEmptyOrUnderutilized** | Terminates empty nodes OR repacks when underutilized | Most production clusters |

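```yaml
# NodePool fragment (spec.disruption): consolidation with protection windows
disruption:
  consolidationPolicy: WhenEmptyOrUnderutilized
  consolidateAfter: 2m  # Wait 2 minutes after a pod was added to or removed from a node
  budgets:
    # Normal: at most 20% of nodes disrupted at once by consolidation
    - nodes: "20%"
      reasons:
        - Empty
        - Underutilized
    # Peak hours: block all disruption (Karpenter evaluates schedules in UTC)
    - nodes: "0"
      schedule: "0 8 * * 1-5"   # 8-9 AM UTC, Mon-Fri
      duration: 1h
    # Drifted nodes: allow faster replacement (50%). The 20% budget above is
    # scoped to other reasons because the most restrictive active budget wins.
    - nodes: "50%"
      reasons:
        - Drifted
```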
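When several budgets are active at once, Karpenter applies the most restrictive one, and a budget with no `reasons` applies to every disruption reason. The sketch below models that rule with hypothetical helper names. Simplifications: the peak window is hard-coded instead of parsed from cron and treated as UTC, percentages are rounded up (our reading of the docs, not a guarantee), and nodes already being deleted or not ready, which Karpenter subtracts from the allowance, are ignored. It shows that a 50% `Drifted` budget only takes effect when the 20% budget is scoped to other reasons:

```python
import math
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Budget:
    nodes: str                                # "20%" or an absolute count such as "0"
    reasons: Optional[FrozenSet[str]] = None  # None = applies to every reason
    peak_only: bool = False                   # hypothetical stand-in for "0 8 * * 1-5" + 1h


def is_active(budget: Budget, now: datetime) -> bool:
    if not budget.peak_only:
        return True
    # Mon-Fri (weekday 0-4), 08:00-08:59 UTC
    return now.weekday() < 5 and now.hour == 8


def node_limit(budget: Budget, total_nodes: int) -> int:
    if budget.nodes.endswith("%"):
        # Assumption: percentages round up
        return math.ceil(total_nodes * int(budget.nodes[:-1]) / 100)
    return int(budget.nodes)


def allowed_disruptions(budgets: list[Budget], reason: str, total_nodes: int,
                        now: datetime) -> int:
    """The most restrictive active budget that covers this reason wins."""
    limits = [
        node_limit(b, total_nodes)
        for b in budgets
        if is_active(b, now) and (b.reasons is None or reason in b.reasons)
    ]
    # Sketch assumption: if no budget matches, nothing limits this reason
    return min(limits) if limits else total_nodes


scoped = [
    Budget("20%", frozenset({"Empty", "Underutilized"})),
    Budget("0", peak_only=True),
    Budget("50%", frozenset({"Drifted"})),
]
unscoped = [Budget("20%"), *scoped[1:]]  # same, but the 20% budget has no reasons

off_peak = datetime(2026, 4, 21, 10, 0, tzinfo=timezone.utc)  # Tuesday 10:00 UTC
peak = datetime(2026, 4, 21, 8, 30, tzinfo=timezone.utc)      # Tuesday 08:30 UTC

print(allowed_disruptions(scoped, "Drifted", 20, off_peak))    # 10
print(allowed_disruptions(unscoped, "Drifted", 20, off_peak))  # 4: the 20% budget also caps drift
print(allowed_disruptions(scoped, "Underutilized", 20, peak))  # 0: peak-hour freeze
```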

## 6. Case Study: From $48K to $21.5K/Month

A production e-commerce system with ~200 microservices applied all strategies above and achieved:

### Before Optimization

| Metric | Value |
| --- | --- |
| Node fleet | 45× m5.2xlarge (8 vCPU, 32GB each) |
| CPU utilization | 12% |
| Memory utilization | 22% |
| Monthly cost | **$48,000** |

### After Optimization

| Metric | Value |
| --- | --- |
| Critical nodes (On-Demand) | 8× r7g.xlarge (Graviton, 4 vCPU, 32GB) |
| Variable nodes (Spot) | 12-35× mixed instances |
| CPU utilization | 48% |
| Memory utilization | 61% |
| Monthly cost | **$21,500** |

### Savings Breakdown

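| Optimization | Monthly savings |
| --- | --- |
| VPA right-sizing | $8,000 |
| Karpenter consolidation | $5,000 |
| Spot Instances | $7,000 |
| Graviton ARM64 | $4,500 |
| Night-time scaling | $2,000 |
| **Total** | **$26,500 (55% cost reduction)** |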
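A quick arithmetic check confirms that the five line items add up to the headline figure and shows each item's share of the total savings (all figures are from the case study):

```python
# Monthly savings per optimization, in USD (figures from the case study)
savings = {
    "VPA right-sizing": 8_000,
    "Karpenter consolidation": 5_000,
    "Spot Instances": 7_000,
    "Graviton ARM64": 4_500,
    "Night-time scaling": 2_000,
}
before, after = 48_000, 21_500

total = sum(savings.values())
assert total == before - after  # 26,500 saved per month

print(f"Total: ${total:,} ({total / before:.1%} reduction)")  # Total: $26,500 (55.2% reduction)
for item, amount in sorted(savings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{item:<25} ${amount:>6,}  {amount / total:.0%} of savings")
```

Because the techniques compound (Spot discounts apply to a fleet VPA has already shrunk), the per-item split depends on the order in which savings were measured; the numbers above are the case study's own attribution.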

## 7. Azure AKS: Node Auto-Provisioning (NAP)

Not just AWS — **Azure AKS** has also integrated Karpenter under the name **Node Auto-Provisioning (NAP)** since late 2025. NAP brings the same intelligent provisioning capabilities optimized for the Azure ecosystem:

| Feature | AWS EKS + Karpenter | Azure AKS + NAP |
| --- | --- | --- |
| **Config format** | NodePool + EC2NodeClass | NodePool + AKSNodeClass |
| **Spot equivalent** | EC2 Spot Instances | Azure Spot VMs |
| **ARM equivalent** | Graviton (arm64) | Ampere Altra (arm64) |
| **Consolidation** | WhenEmptyOrUnderutilized | Same (Karpenter core) |
| **Savings Plans** | AWS Savings Plans / RIs | Azure Reservations / Savings Plan for Compute |

#### GKE Has an Equivalent Too

GKE offers **node auto-provisioning** in Standard clusters, which creates and deletes node pools to fit pending pods. **GKE Autopilot** goes a step further: Google manages the entire node layer, so you only deploy pods while GKE selects machine types, scales, and consolidates. However, Autopilot is less flexible than Karpenter when you need to fine-tune instance selection or Spot strategies.

## 8. FinOps & Governance — Sustaining Results Long-Term

Cost optimization is not a one-time effort. Without **FinOps** processes, clusters will "drift" back to wasteful states within months. Essential governance tools:

### 8.1. ResourceQuota & LimitRange

```yaml
# Resource limits per team/namespace
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-backend-quota
  namespace: team-backend
spec:
  hard:
    requests.cpu: "20"
    requests.memory: "40Gi"
    limits.cpu: "40"
    limits.memory: "80Gi"
    pods: "100"
---
# Default requests/limits for containers that do not declare them
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: team-backend
spec:
  limits:
    - type: Container
      defaultRequest:
        cpu: "100m"
        memory: "128Mi"
      default:
        cpu: "500m"
        memory: "512Mi"
      max:
        cpu: "4"
        memory: "8Gi"
```
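Quotas and defaults interact: with this LimitRange, every container that omits its resources is charged 100m/128Mi of requests and 500m/512Mi of limits against the quota. The sketch below works out how many such pods fit in `team-backend` and which quota line runs out first. It uses a simplified quantity parser that only understands plain numbers, `m`, `Mi` and `Gi`, and assumes single-container pods that all rely on the defaults.

```python
from fractions import Fraction

# Simplified parser: only plain numbers, "m" (milli), "Mi" and "Gi"
_SUFFIXES = {"m": Fraction(1, 1000), "Mi": Fraction(2**20), "Gi": Fraction(2**30)}


def parse_quantity(q: str) -> Fraction:
    for suffix, factor in _SUFFIXES.items():
        if q.endswith(suffix):
            return Fraction(q[: -len(suffix)]) * factor
    return Fraction(q)


# ResourceQuota "team-backend-quota" and the LimitRange defaults, per pod
QUOTA = {"requests.cpu": "20", "requests.memory": "40Gi",
         "limits.cpu": "40", "limits.memory": "80Gi", "pods": "100"}
PER_POD = {"requests.cpu": "100m", "requests.memory": "128Mi",
           "limits.cpu": "500m", "limits.memory": "512Mi", "pods": "1"}


def max_default_pods(quota: dict[str, str], per_pod: dict[str, str]):
    """Return (pod count, binding quota key, pods allowed by each key)."""
    fits = {key: int(parse_quantity(quota[key]) // parse_quantity(per_pod[key]))
            for key in quota}
    binding = min(fits, key=fits.get)
    return fits[binding], binding, fits


count, binding, fits = max_default_pods(QUOTA, PER_POD)
print(fits)
print(f"{count} pods fit; binding constraint: {binding}")  # 80 pods fit; binding constraint: limits.cpu
```

Default-sized pods never reach the `pods: "100"` line: `limits.cpu` runs out at 80. It is worth re-running this kind of check whenever one of the two objects changes without the other.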

### 8.2. Cost Labeling — Per-Team Visibility

Enforce standardized labeling on all resources for accurate cost allocation:

```yaml
# OPA Gatekeeper constraint requiring cost labels (needs the K8sRequiredLabels ConstraintTemplate from the Gatekeeper policy library)
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: require-cost-labels
spec:
  match:
    kinds:
      - apiGroups: ["apps"]
        kinds: ["Deployment", "StatefulSet"]
  parameters:
    labels:
      - key: "app.kubernetes.io/team"
      - key: "app.kubernetes.io/cost-center"
      - key: "app.kubernetes.io/environment"
```
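Once every workload carries the team label, cost allocation becomes a group-by. The sketch below shows the idea with deliberately simple, hypothetical inputs: request-based allocation and made-up per-vCPU and per-GiB hourly prices (not real cloud pricing). Cost-allocation tools such as OpenCost apply the same idea with real pricing data.

```python
from collections import defaultdict

# Hypothetical blended prices, for illustration only (not real cloud pricing)
PRICE_PER_VCPU_HOUR = 0.04
PRICE_PER_GIB_HOUR = 0.005
HOURS_PER_MONTH = 730
TEAM_LABEL = "app.kubernetes.io/team"


def monthly_cost_by_team(pods: list[dict]) -> dict[str, float]:
    """Allocate each pod's request-based monthly cost to the team in its labels."""
    costs: dict[str, float] = defaultdict(float)
    for pod in pods:
        team = pod["labels"].get(TEAM_LABEL, "<unlabeled>")
        hourly = pod["cpu"] * PRICE_PER_VCPU_HOUR + pod["memory_gib"] * PRICE_PER_GIB_HOUR
        costs[team] += hourly * HOURS_PER_MONTH
    return dict(costs)


pods = [
    {"labels": {TEAM_LABEL: "backend"}, "cpu": 2, "memory_gib": 4},
    {"labels": {TEAM_LABEL: "data"}, "cpu": 4, "memory_gib": 16},
    {"labels": {}, "cpu": 1, "memory_gib": 2},  # missing label: cost nobody owns
]

for team, cost in sorted(monthly_cost_by_team(pods).items()):
    print(f"{team:<12} ${cost:,.2f}")
```

The `<unlabeled>` bucket is what the constraint is meant to drive to zero. Note that the constraint checks labels on the Deployment or StatefulSet object itself; pod-level allocation also needs the same labels in the pod template.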

### 8.3. Orphaned Resources — Periodic Cleanup

Run these queries weekly (for example from a CronJob that forwards the output to your alerting channel) to catch orphaned resources that keep costing money:

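```bash
#!/usr/bin/env bash
set -euo pipefail

# Find Bound PVCs that no pod references (requires bash and jq)
jq -rn \
  --slurpfile pvcs <(kubectl get pvc --all-namespaces -o json) \
  --slurpfile pods <(kubectl get pods --all-namespaces -o json) '
  # Every "namespace/claimName" mounted by at least one pod
  [$pods[0].items[] | .metadata.namespace as $ns
    | .spec.volumes[]? | select(.persistentVolumeClaim != null)
    | "\($ns)/\(.persistentVolumeClaim.claimName)"] as $used
  | $pvcs[0].items[]
  | select(.status.phase == "Bound")
  | "\(.metadata.namespace)/\(.metadata.name)" as $key
  | select(any($used[]; . == $key) | not)
  | "\($key) - \(.spec.resources.requests.storage)"
'

# Find LoadBalancer Services with no ready endpoints (the cloud LB is still billed)
jq -rn \
  --slurpfile svcs <(kubectl get svc --all-namespaces -o json) \
  --slurpfile slices <(kubectl get endpointslices --all-namespaces -o json) '
  # Every "namespace/service" with at least one endpoint not marked unready
  [$slices[0].items[]
    | select(any(.endpoints[]?; .conditions.ready != false))
    | "\(.metadata.namespace)/\(.metadata.labels["kubernetes.io/service-name"])"] as $ready
  | $svcs[0].items[]
  | select(.spec.type == "LoadBalancer")
  | "\(.metadata.namespace)/\(.metadata.name)" as $key
  | select(any($ready[]; . == $key) | not)
  | $key
'
```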

## 9. 4-Week Implementation Roadmap

1. **Week 1: Measure.** Install metrics-server + Prometheus. Measure baseline utilization (CPU, memory, cost). Deploy VPA in "Off" mode for all Deployments.
2. **Week 2: Right-Size.** Analyze VPA recommendations. Adjust resource requests for the pods with the largest gaps. Deploy LimitRange defaults for all namespaces.
3. **Week 3: Karpenter + Spot.** Migrate from Cluster Autoscaler to Karpenter. Configure NodePools with Spot + instance diversification. Set up PriorityClasses for critical vs. batch workloads.
4. **Week 4: Governance.** Deploy a ResourceQuota per namespace. Enforce cost labels via OPA/Gatekeeper. Set up a weekly cost report dashboard. Configure disruption budgets (Karpenter NodePool budgets plus PodDisruptionBudgets for critical services).
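For the Week 2 step, one way to decide where to start is to rank workloads by how much CPU they request beyond VPA's recommendation, multiplied by replica count. The target values would come from each VPA object's `status.recommendation.containerRecommendations[].target`; the workload names and numbers below are hypothetical, with the first mirroring the 500m-requested / 25m-used pattern described at the start of this article.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    replicas: int
    request_cpu_m: int     # current CPU request per pod (millicores)
    vpa_target_cpu_m: int  # VPA "target" recommendation per pod (millicores)

    @property
    def excess_cpu_m(self) -> int:
        """CPU requested beyond the VPA target across all replicas (never negative)."""
        return max(self.request_cpu_m - self.vpa_target_cpu_m, 0) * self.replicas


def largest_gaps(workloads: list[Workload], top_n: int = 3) -> list[Workload]:
    """Workloads with the most over-requested CPU first."""
    return sorted(workloads, key=lambda w: w.excess_cpu_m, reverse=True)[:top_n]


workloads = [
    Workload("checkout", replicas=6, request_cpu_m=500, vpa_target_cpu_m=25),
    Workload("search", replicas=10, request_cpu_m=250, vpa_target_cpu_m=200),
    Workload("cart", replicas=2, request_cpu_m=1000, vpa_target_cpu_m=900),
    Workload("reports", replicas=1, request_cpu_m=100, vpa_target_cpu_m=300),  # under-requested
]

for w in largest_gaps(workloads):
    print(f"{w.name:<10} {w.excess_cpu_m:>6}m excess CPU requested")
```

Under-requested workloads such as `reports` score zero here; they deserve attention too, but for reliability rather than cost. Memory can be ranked the same way.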

## 10. Conclusion

Kubernetes cost optimization isn't a "press the discount button" exercise — it's an **engineering process** requiring measurement, right-sizing, intelligent autoscaling, and continuous governance. Three core pillars:

1. **Karpenter** replaces Cluster Autoscaler: 5x faster provisioning, automatic consolidation, intelligent instance selection.
2. **Spot Instances + Diversification**: 70-90% compute savings for stateless workloads, with PriorityClass protecting critical pods.
3. **VPA Right-Sizing**: identifies and corrects the inflated resource requests that affect 80-90% of pods.

Real-world results show **40-70% cost reductions** are entirely achievable when applied correctly. The key: start with **measurement** (metrics-server + VPA Off mode), not with cutting. You can't optimize what you haven't measured.

### References

- [Karpenter Documentation — NodePools](https://karpenter.sh/docs/concepts/nodepools/)
- [Karpenter Disruption & Consolidation](https://karpenter.sh/docs/concepts/disruption/)
- [Kubernetes Cost Optimization: From $50K to $22K/Month — ZeonEdge](https://zeonedge.com/blog/kubernetes-cost-optimization-karpenter-spot-vpa-real-world)
- [Kubernetes Cost Optimization 2026: The Complete Guide — CloudMonitor](https://cloudmonitor.ai/2026/02/kubernetes-cost-optimization-2026/)
- [Configure Node Auto-Provisioning for AKS — Microsoft Learn](https://learn.microsoft.com/en-us/azure/aks/node-auto-provisioning-disruption)
- [Cut AWS Costs by 20% with EKS, Karpenter, and Spot — Tinybird](https://www.tinybird.co/blog/how-we-cut-aws-costs-while-scaling-faster-with-eks-karpenter-and-spot-instances)
- [Kubernetes Autoscaling Explained: HPA, VPA & Best Practices 2026 — Sedai](https://sedai.io/blog/kubernetes-autoscaling)


Disclaimer: The opinions expressed in this blog are solely my own and do not reflect the views or opinions of my employer or any affiliated organizations. The content provided is for informational and educational purposes only and should not be taken as professional advice. While I strive to provide accurate and up-to-date information, I make no warranties or guarantees about the completeness, reliability, or accuracy of the content. Readers are encouraged to verify the information and seek independent advice as needed. I disclaim any liability for decisions or actions taken based on the content of this blog.