By Nick Walker | Nov 22, 2024
I’m thrilled to launch Optimize Live’s OOM Response feature, which automates away the toil of responding to out-of-memory (OOM) errors. Memory usage shifts as traffic patterns change and software is updated. Unpredictable spikes in memory usage can eventually cause OOM kills, which lead to service disruptions and performance issues. The OOM Response feature provides insurance for your memory settings.
With Optimize Live OOM Response, you gain a reactive response in addition to our proactive recommendations. This feature continuously monitors Kubernetes clusters for OOM events. When one is detected, Optimize Live produces a new recommendation that increases memory by a configurable percentage (we recommend 20% by default) for the next 4 days. This window gives our machine learning time to analyze memory usage data and refine memory recommendations going forward.
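To make the arithmetic concrete, here is a minimal Python sketch of a 20% bump held for 4 days. The function name `oom_bump`, the 512 MiB starting value, and the timestamp are illustrative assumptions, not part of Optimize Live.

```python
from datetime import datetime, timedelta, timezone


def oom_bump(current_mib, oom_time, percent=20.0, window_days=4):
    """Return (bumped memory in MiB, end of the bump window).

    Hypothetical helper: it only illustrates the percentage increase
    and the 4-day window described in the post.
    """
    bumped_mib = current_mib * (1 + percent / 100)
    window_end = oom_time + timedelta(days=window_days)
    return bumped_mib, window_end


# Example: a container at 512 MiB is OOM killed at 09:00 UTC.
oom_at = datetime(2024, 11, 22, 9, 0, tzinfo=timezone.utc)
new_mib, until = oom_bump(512, oom_at)
print(f"{new_mib:.1f} MiB until {until:%Y-%m-%d %H:%M} UTC")
```

In this sketch the bumped value simply holds until the window ends, at which point the regular recommendations take over again.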
Some of the key benefits include:
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-defaults
  namespace: stormforge-system
data:
  cluster-defaults.yaml: |
    live.stormforge.io/reliability.oom.memory-bump-up.percent: >
      20,resource:daemonsets=0
    live.stormforge.io/reliability.oom.memory-bump-up.min: >
      100Mi,resource:daemonsets=0Mi
    live.stormforge.io/reliability.oom.memory-bump-up.max: >
      2Gi
    live.stormforge.io/reliability.oom.memory-bump-up.apply-immediately: >
      IfAutoDeployEnabled,resource:daemonsets=Never,resource:statefulsets=Never
Then, apply the cluster-defaults.yaml file and restart the agent to pick up the changes.
kubectl apply -f cluster-defaults.yaml -n stormforge-system;
kubectl rollout restart deployment stormforge-agent-workload-controller -n stormforge-system
With this configuration, OOM Response recommendations are applied immediately wherever you’ve already enabled auto-deploy, taking your workload optimization to the next level. This is our recommended configuration for OOM Response, but depending on your needs, you can update these cluster defaults or override the behavior at the namespace or workload level.
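As a rough illustration of how the percent, min, and max settings could interact, the Python sketch below assumes that min and max bound the size of the increase itself (not the resulting memory value). The post does not spell out this interaction, so treat the clamping rule and the helper names `to_mib` and `bump_mib` as assumptions.

```python
UNITS_MIB = {"Mi": 1, "Gi": 1024}


def to_mib(quantity):
    """Convert a quantity such as '100Mi' or '2Gi' to MiB (only these suffixes)."""
    for suffix, factor in UNITS_MIB.items():
        if quantity.endswith(suffix):
            return float(quantity[: -len(suffix)]) * factor
    raise ValueError(f"unsupported quantity: {quantity}")


def bump_mib(current_mib, percent=20.0, min_bump="100Mi", max_bump="2Gi"):
    """Size of the increase in MiB, assuming min/max clamp the increase."""
    raw = current_mib * percent / 100
    return min(max(raw, to_mib(min_bump)), to_mib(max_bump))


# Small, medium, and large workloads under the default values shown above.
for current in (256, 4096, 16384):
    print(f"{current} MiB -> {current + bump_mib(current):.1f} MiB")
```

Under that assumption, a small workload gets at least a 100Mi increase even though 20% of its memory is less, and a very large workload is capped at a 2Gi increase.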
It’s important to note that this configuration does not enable OOM Response for DaemonSets. A DaemonSet runs a pod on every node, so even a slight memory increase adds up across the whole cluster. For now, we recommend leaving this feature off for DaemonSets.
Additionally, this configuration does not immediately apply memory increases for StatefulSets, because these workloads are often sensitive to restarts. Waiting until the next scheduled recommendation works well for StatefulSets.
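The setting values above appear to follow a pattern of a default followed by per-resource overrides, such as `20,resource:daemonsets=0`. The Python sketch below shows one way to read that pattern. The grammar is inferred from the example values only, and `parse_setting` and `resolve` are hypothetical helpers, not Optimize Live functions.

```python
def parse_setting(raw):
    """Split 'default,resource:kind=value,...' into (default, {kind: value}).

    The format is inferred from the example configuration; only the
    'resource:' selector seen there is handled.
    """
    default, *rest = [part.strip() for part in raw.strip().split(",")]
    overrides = {}
    for part in rest:
        selector, value = part.split("=", 1)
        scope, kind = selector.split(":", 1)
        if scope != "resource":
            raise ValueError(f"unsupported selector: {selector}")
        overrides[kind.lower()] = value
    return default, overrides


def resolve(raw, kind):
    """Return the value that would apply to a workload kind under this reading."""
    default, overrides = parse_setting(raw)
    return overrides.get(kind.lower(), default)


percent = "20,resource:daemonsets=0"
apply_now = "IfAutoDeployEnabled,resource:daemonsets=Never,resource:statefulsets=Never"

for kind in ("deployments", "daemonsets", "statefulsets"):
    print(kind, resolve(percent, kind), resolve(apply_now, kind))
```

Read this way, Deployments get the 20% bump applied immediately where auto-deploy is enabled, DaemonSets get no bump, and StatefulSets get the bump but wait for the next scheduled recommendation.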
Adding this reactive protection from OOM Response to our proactive rightsizing recommendations ensures that every platform team is empowered to drive automated optimization in their environment while improving the reliability of their platform.
Our team is committed to expanding your ability to automate workload optimization. We’re excited to see how the OOM Response feature will empower you to further take control of your resource management.
Test it out with a free trial, or see it in the sandbox environment, and let us know what you think!