Engineering · Nov 28, 2025 · 10 min read

Kubernetes Cost Optimization: Saving $200K/Year for a SaaS Client

A real-world case study on right-sizing clusters, autoscaling strategies, and spot instance usage to slash cloud bills without sacrificing performance.

Alex Rivera

Principal DevOps Engineer

The Problem

Our client, a B2B SaaS company with 500+ enterprise customers, was spending $45K/month on their Kubernetes infrastructure. Growth was strong, but cloud costs were growing faster than revenue. At the current trajectory, infrastructure would consume 40% of their revenue within 18 months.

They asked us to cut costs by 30% without impacting performance or reliability. We delivered 37%.

Cost Assessment

Before optimizing, we spent two weeks measuring everything:

  • Resource utilization: Average CPU utilization was 12%. Memory utilization was 23%. Massive waste.
  • Cost allocation: 60% of spend was compute, 25% storage, 15% networking
  • Over-provisioning: 73% of pods had resource requests 3x higher than actual usage
  • Idle resources: Development and staging environments ran 24/7 despite being used only during business hours
  • Storage waste: 2TB of unused PVCs from deleted deployments

The assessment alone identified $15K/month in easy wins.
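The over-provisioning check boils down to comparing each pod's resource requests against its observed usage. As a rough sketch (the pod names and numbers below are hypothetical, not the client's data):

```python
# Hypothetical per-pod samples: (name, cpu_request_millicores, cpu_usage_millicores).
# In practice these come from the metrics pipeline (e.g. kubectl top / Prometheus).
pods = [
    ("api-7f9", 1000, 110),
    ("worker-2c4", 2000, 240),
    ("cache-9a1", 500, 400),
]

def over_provisioned(pods, factor=3.0):
    """Return pods whose CPU request exceeds actual usage by `factor` or more."""
    return [name for name, req, used in pods if used > 0 and req / used >= factor]

print(over_provisioned(pods))  # → ['api-7f9', 'worker-2c4']
```

Running this kind of comparison across every namespace is what surfaced the 73% figure above.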

Right-Sizing Workloads

We implemented a data-driven right-sizing strategy:

  • VPA recommendations: Deployed Vertical Pod Autoscaler in recommendation mode for two weeks, then applied suggestions
  • Resource quotas: Set namespace-level quotas to prevent over-requesting
  • QoS classes: Classified workloads as Guaranteed (databases), Burstable (APIs), and BestEffort (batch jobs)
  • Node pools: Created specialized node pools — compute-optimized for APIs, memory-optimized for caches, spot for batch

Result: 40% reduction in requested resources with zero performance impact.
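Recommendation mode amounts to setting the VPA's `updateMode` to `Off`, so it computes suggestions without evicting anything. A minimal manifest might look like this (the Deployment name and namespace are illustrative):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server        # hypothetical target Deployment
  updatePolicy:
    updateMode: "Off"       # recommendation mode: observe and suggest, change nothing
```

After the observation window, the suggested requests show up in the object's status (e.g. via `kubectl describe vpa api-server-vpa`), ready to be applied by hand.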

Autoscaling Strategies

We implemented multi-layer autoscaling:

  • HPA with custom metrics: Scaled based on request latency and queue depth, not just CPU
  • KEDA for event-driven scaling: Scaled workers based on message queue length — zero pods when idle
  • Cluster autoscaler tuning: Reduced scale-down delay from 10 minutes to 2 minutes, enabled scale-to-zero for dev/staging
  • Scheduled scaling: Pre-scaled for known traffic patterns (Monday morning surge, end-of-month reporting)
  • CronJob for non-prod: Shut down dev/staging environments outside business hours (6PM-8AM, weekends)

Autoscaling alone saved $8K/month.
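The KEDA piece above can be sketched as a ScaledObject. This assumes a RabbitMQ-style queue trigger; the Deployment, queue, and auth names are placeholders:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker-scaler
  namespace: production
spec:
  scaleTargetRef:
    name: queue-worker          # hypothetical worker Deployment
  minReplicaCount: 0            # scale to zero when the queue is empty
  maxReplicaCount: 30
  triggers:
    - type: rabbitmq
      metadata:
        queueName: jobs
        mode: QueueLength
        value: "20"             # target messages per replica
      authenticationRef:
        name: rabbitmq-auth     # TriggerAuthentication holding the connection string
```

With `minReplicaCount: 0`, idle workers cost nothing, which is where most of the $8K/month came from.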

Spot Instance Strategy

Spot instances were the biggest cost lever:

  • Stateless workloads on spot: All API servers and workers run on spot instances (70% cost savings)
  • Multi-AZ, multi-instance-type: Spread across 8 instance types and 3 AZs for availability
  • Graceful shutdown: All services handle SIGTERM with a 30-second drain period
  • Spot fallback: Automatic fallback to on-demand if spot capacity is unavailable
  • PDB (Pod Disruption Budgets): Ensure minimum replica count during spot interruptions

We achieved 95% spot coverage for stateless workloads with zero customer-facing interruptions.
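The graceful-shutdown piece is worth spelling out, since it is what makes spot safe. Here is an illustrative Python shape (the in-flight counter is a stand-in for real request tracking, not the client's actual service code):

```python
import signal
import threading
import time

shutdown = threading.Event()     # set when SIGTERM arrives
in_flight = 0                    # stand-in for an active-request counter
in_flight_lock = threading.Lock()

def handle_sigterm(signum, frame):
    """Spot interruption / pod eviction: stop accepting new work."""
    shutdown.set()

signal.signal(signal.SIGTERM, handle_sigterm)

def drain(timeout=30.0):
    """Wait up to `timeout` seconds for in-flight requests to finish.

    Keep this at or below the pod's terminationGracePeriodSeconds so
    the kubelet does not SIGKILL the process mid-request.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with in_flight_lock:
            if in_flight == 0:
                return True      # fully drained
        time.sleep(0.1)
    return False                 # grace period exhausted
```

The readiness probe should start failing as soon as `shutdown` is set, so the Service stops routing new traffic while `drain()` runs.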

Final Results

After 8 weeks of optimization:

  • Monthly cloud spend: $45K → $28.3K (37% reduction, $200K annual savings)
  • CPU utilization: 12% → 45% average
  • Memory utilization: 23% → 58% average
  • P99 latency: No regression; it actually improved slightly thanks to right-sized containers
  • Availability: 99.98% maintained
  • Developer experience: Improved — dev environments spin up 3x faster with right-sized resources

The optimization paid for itself in the first month. The client now runs quarterly cost reviews using the dashboards and processes we put in place.

Tags
Kubernetes · DevOps · Cloud · Cost Optimization
Written by

Alex Rivera

Principal DevOps Engineer

Part of the Fixl engineering team, sharing insights from building production-grade software for startups and enterprises.

NDA-friendly · Confidential · Engineering-led