Kubernetes in Production: Best Practices and Common Pitfalls to Avoid
Kubernetes has become the de facto standard for container orchestration, but running it successfully in production requires careful planning and adherence to best practices.
Resource Management
Right-Sizing Your Containers
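One of the most common mistakes is leaving resource requests and limits unset or poorly sized. Requests tell the scheduler how much to reserve for a container; limits cap what it may consume: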
resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "512Mi"
    cpu: "500m"
Horizontal Pod Autoscaling
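Use a HorizontalPodAutoscaler (HPA) to scale replicas automatically on CPU, memory, or custom metrics. Utilization targets are calculated against the pods' resource requests, so those must be set: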
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
Security Hardening
Network Policies
Implement network policies to control traffic between pods:
- Default deny all traffic
- Explicitly allow only necessary connections
- Separate namespaces for different security zones
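As a minimal sketch of the first two points, the manifests below deny all traffic in a namespace and then open a single ingress path. The namespace `prod`, the labels `app: myapp` and `app: frontend`, and port 8080 are assumptions chosen for illustration, not values from a real cluster.

```yaml
# Default deny: selects every pod in the namespace and allows nothing.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: prod            # hypothetical namespace
spec:
  podSelector: {}            # empty selector = all pods in the namespace
  policyTypes:
  - Ingress
  - Egress
---
# Explicit allow: only frontend pods may reach myapp on TCP 8080.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-myapp
  namespace: prod
spec:
  podSelector:
    matchLabels:
      app: myapp             # hypothetical label
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend      # hypothetical label
    ports:
    - protocol: TCP
      port: 8080
```

Because egress is denied as well, the frontend pods would still need a matching egress rule, and most workloads also need an egress rule for cluster DNS. Network policies only take effect when the cluster's CNI plugin enforces them.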
Pod Security Standards
Pod Security Standards (PSS) define three policy levels, which the built-in Pod Security Admission controller can enforce per namespace:
- **Privileged**: Unrestricted (avoid in production)
- **Baseline**: Minimally restrictive
- **Restricted**: Heavily restricted (recommended)
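One way to apply a level is through namespace labels read by Pod Security Admission. The sketch below assumes a namespace named `prod` (hypothetical) and applies the Restricted level in all three modes.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: prod                                        # hypothetical namespace
  labels:
    pod-security.kubernetes.io/enforce: restricted  # reject pods that violate the level
    pod-security.kubernetes.io/warn: restricted     # return a warning to the client
    pod-security.kubernetes.io/audit: restricted    # annotate the audit log entry
```

When migrating an existing namespace, starting with only `warn` and `audit` shows which workloads would be rejected before `enforce` is switched on.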
RBAC Best Practices
- Use least privilege access
- Create service accounts for each application
- Regularly audit RBAC permissions
- Avoid using the cluster-admin role
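The sketch below illustrates least privilege with a dedicated service account. All names (`myapp`, `prod`, `myapp-config-reader`) and the choice of read-only access to ConfigMaps are assumptions for illustration; the actual rules depend on what the application needs.

```yaml
# Dedicated identity for one application.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: myapp                # hypothetical name
  namespace: prod            # hypothetical namespace
---
# Namespaced Role granting read-only access to ConfigMaps and nothing else.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: myapp-config-reader
  namespace: prod
rules:
- apiGroups: [""]            # "" is the core API group
  resources: ["configmaps"]
  verbs: ["get", "list", "watch"]
---
# Bind the Role to the service account.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: myapp-config-reader
  namespace: prod
subjects:
- kind: ServiceAccount
  name: myapp
  namespace: prod
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: myapp-config-reader
```

For auditing, `kubectl auth can-i list configmaps -n prod --as=system:serviceaccount:prod:myapp` reports whether a given permission is granted to the account.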
High Availability
Multi-Zone Deployments
Distribute workloads across availability zones:
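affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        # Assumption: the pods carry the label app: myapp. Without a
        # labelSelector the term matches no pods and has no effect.
        labelSelector:
          matchLabels:
            app: myapp
        topologyKey: topology.kubernetes.io/zone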
Health Checks
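Always define both liveness and readiness probes. A failed liveness probe restarts the container, while a failed readiness probe removes the pod from Service endpoints until it recovers:
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5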
Monitoring and Observability
Essential monitoring stack:
- **Prometheus**: Metrics collection
- **Grafana**: Visualization
- **Jaeger/Tempo**: Distributed tracing
- **Loki**: Log aggregation
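As a rough sketch of the metrics piece, the `prometheus.yml` fragment below discovers pods through the Kubernetes API and scrapes only those that opt in. The `prometheus.io/scrape` annotation is a widely used convention rather than something Prometheus or Kubernetes defines, so treat it as an assumption; clusters running the Prometheus Operator typically configure scraping differently.

```yaml
scrape_configs:
- job_name: kubernetes-pods
  kubernetes_sd_configs:
  - role: pod                          # discover every pod as a potential target
  relabel_configs:
  # Keep only pods annotated with prometheus.io/scrape: "true" (assumed convention).
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: "true"
  # Copy the namespace and pod name onto each scraped series.
  - source_labels: [__meta_kubernetes_namespace]
    target_label: namespace
  - source_labels: [__meta_kubernetes_pod_name]
    target_label: pod
```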
Common Pitfalls to Avoid
- **Not Setting Resource Limits**: Can lead to resource exhaustion
- **Ignoring Security**: Running containers as root
- **Poor Logging**: Not collecting logs centrally
- **No Backup Strategy**: Always back up etcd and persistent volumes
- **Inadequate Testing**: Test upgrades in staging first
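For the root-container pitfall in particular, a pod spec along these lines is one possible starting point. The pod name, image reference, and UID are hypothetical, and the image must be built to run as a non-root user for this to work.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp                                  # hypothetical name
spec:
  securityContext:
    runAsNonRoot: true                         # kubelet refuses to start containers running as UID 0
    runAsUser: 10001                           # hypothetical non-root UID
    seccompProfile:
      type: RuntimeDefault                     # use the container runtime's default seccomp profile
  containers:
  - name: myapp
    image: registry.example.com/myapp:1.0.0    # hypothetical image
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop: ["ALL"]
```

These settings are in line with what the Restricted Pod Security Standard expects; `readOnlyRootFilesystem` goes slightly beyond it and may require writable `emptyDir` mounts for applications that write temporary files.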
Conclusion
Running Kubernetes in production is a journey, not a destination. Regular updates, monitoring, and continuous improvement are essential for success.