The Problem
Our client, a B2B SaaS company with 500+ enterprise customers, was spending $45K/month on their Kubernetes infrastructure. Growth was strong, but cloud costs were growing faster than revenue. At the current trajectory, infrastructure would consume 40% of their revenue within 18 months.
They asked us to cut costs by 30% without impacting performance or reliability. We delivered 37%.
Cost Assessment
Before optimizing, we spent two weeks measuring everything:
- Resource utilization: Average CPU utilization was 12%. Memory utilization was 23%. Massive waste.
- Cost allocation: 60% of spend was compute, 25% storage, 15% networking
- Over-provisioning: 73% of pods had resource requests 3x higher than actual usage
- Idle resources: Development and staging environments ran 24/7 despite being used only during business hours
- Storage waste: 2TB of orphaned PVCs left behind by deleted deployments
The assessment alone identified $15K/month in easy wins.
Right-Sizing Workloads
We implemented a data-driven right-sizing strategy:
- VPA recommendations: Deployed Vertical Pod Autoscaler in recommendation mode for two weeks, then applied suggestions
- Resource quotas: Set namespace-level quotas to prevent over-requesting
- QoS classes: Classified workloads as Guaranteed (databases), Burstable (APIs), and BestEffort (batch jobs)
- Node pools: Created specialized node pools — compute-optimized for APIs, memory-optimized for caches, spot for batch
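As a sketch of the QoS classification above (pod and image names are illustrative): Kubernetes assigns the Guaranteed class when requests equal limits for every container, and Burstable when requests are set below limits.

```yaml
# Guaranteed QoS: requests == limits for every container (pattern used for databases).
apiVersion: v1
kind: Pod
metadata:
  name: postgres-example        # illustrative name
spec:
  containers:
    - name: postgres
      image: postgres:16
      resources:
        requests:
          cpu: "2"
          memory: 4Gi
        limits:
          cpu: "2"
          memory: 4Gi
---
# Burstable QoS: requests below limits, so API pods can burst under load.
apiVersion: v1
kind: Pod
metadata:
  name: api-example             # illustrative name
spec:
  containers:
    - name: api
      image: example/api:latest
      resources:
        requests:
          cpu: 250m
          memory: 256Mi
        limits:
          cpu: "1"
          memory: 512Mi
```

Under memory pressure, BestEffort pods are evicted first and Guaranteed pods last, which is why the databases get the strictest class.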
Result: 40% reduction in requested resources with zero performance impact.
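The recommendation-mode VPA rollout described above looks roughly like this (the target Deployment name is an assumption):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa                 # illustrative name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api                   # assumed workload name
  updatePolicy:
    updateMode: "Off"           # recommendation-only: VPA records suggestions, never evicts pods
```

After the observation window, `kubectl describe vpa api-vpa` shows the recommended requests, which can then be applied to the Deployment manually rather than letting VPA restart pods itself.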
Autoscaling Strategies
We implemented multi-layer autoscaling:
- HPA with custom metrics: Scaled based on request latency and queue depth, not just CPU
- KEDA for event-driven scaling: Scaled workers based on message queue length — zero pods when idle
- Cluster autoscaler tuning: Reduced scale-down delay from 10 minutes to 2 minutes, enabled scale-to-zero for dev/staging
- Scheduled scaling: Pre-scaled for known traffic patterns (Monday morning surge, end-of-month reporting)
- CronJob for non-prod: Shut down dev/staging environments outside business hours (6PM-8AM, weekends)
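A minimal sketch of the queue-based scale-to-zero setup, using a KEDA ScaledObject; the Deployment name, queue name, and RabbitMQ trigger are assumptions (swap the trigger for whatever queue is actually in use):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker-scaler           # illustrative name
spec:
  scaleTargetRef:
    name: worker                # assumed Deployment name
  minReplicaCount: 0            # scale to zero when the queue is empty
  maxReplicaCount: 50
  triggers:
    - type: rabbitmq            # assumption; KEDA ships scalers for most queues
      metadata:
        queueName: jobs         # assumed queue name
        mode: QueueLength
        value: "20"             # target ~20 messages per replica
        hostFromEnv: RABBITMQ_URL   # connection string read from the worker's env (assumption)
```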
Autoscaling alone saved $8K/month.
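The scheduled non-prod shutdown can be as simple as a CronJob that scales every Deployment in a namespace to zero at close of business, with a mirror-image job scaling back up in the morning. Names and the service account are illustrative; the account needs RBAC permission to patch deployments:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: staging-shutdown        # illustrative name
  namespace: staging
spec:
  schedule: "0 18 * * 1-5"      # 6PM Mon-Fri; a matching 8AM job scales back up
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: env-scaler   # assumed SA with patch rights on deployments
          restartPolicy: OnFailure
          containers:
            - name: scale-down
              image: bitnami/kubectl:latest
              command: ["kubectl", "scale", "deployment", "--all", "--replicas=0", "-n", "staging"]
```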
Spot Instance Strategy
Spot instances were the biggest cost lever:
- Stateless workloads on spot: Moved all API servers and background workers to spot instances (roughly 70% cheaper than on-demand)
- Multi-AZ, multi-instance-type: Spread across 8 instance types and 3 AZs for availability
- Graceful shutdown: All services handle SIGTERM with a 30-second drain period
- Spot fallback: Automatic fallback to on-demand if spot capacity is unavailable
- Pod Disruption Budgets (PDBs): Ensure a minimum replica count stays available during spot interruptions
We achieved 95% spot coverage for stateless workloads with zero customer-facing interruptions.
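A sketch of the PDB plus spot-preferring scheduling described above. Labels and names are assumptions, and the capacity-type node label varies by provisioner (e.g. `karpenter.sh/capacity-type` with Karpenter, `eks.amazonaws.com/capacityType` on managed EKS node groups):

```yaml
# Keep at least 2 API replicas up while spot nodes are being reclaimed.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb                 # illustrative name
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: api                  # assumed pod label
---
# Deployment fragment: prefer spot nodes, but fall back to on-demand capacity.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 6
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      terminationGracePeriodSeconds: 30   # matches the 30-second SIGTERM drain window
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              preference:
                matchExpressions:
                  - key: karpenter.sh/capacity-type   # label name is provisioner-specific (assumption)
                    operator: In
                    values: ["spot"]
      containers:
        - name: api
          image: example/api:latest
```

Using a *preferred* (not required) node affinity is what provides the on-demand fallback: the scheduler places pods on on-demand nodes whenever spot capacity is unavailable.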
Final Results
After 8 weeks of optimization:
- Monthly cloud spend: $45K → $28.3K (37% reduction, $200K annual savings)
- CPU utilization: 12% → 45% average
- Memory utilization: 23% → 58% average
- P99 latency: Unchanged (actually improved slightly due to right-sized containers)
- Availability: 99.98% maintained
- Developer experience: Improved — dev environments spin up 3x faster with right-sized resources
The optimization paid for itself in the first month. The client now runs quarterly cost reviews using the dashboards and processes we put in place.
Written by
Alex Rivera
Principal DevOps Engineer
Part of the Fixl engineering team, sharing insights from building production-grade software for startups and enterprises.