Fixing FailedGetResourceMetric Error

Warnings related to "FailedGetResourceMetric" can include the following:

failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API

failed to get cpu utilization: did not receive metrics for targeted pods (pods might be unready)

failed to get memory utilization: unable to get metrics for resource memory: no metrics returned from resource metrics API

If you are seeing any of these errors, it means you have HPA (Horizontal Pod Autoscaler) enabled, and something is either missing or not working as expected.

There are a few different reasons why this error could happen, so let's go through them:

Pod is not ready or takes a bit longer to start running:

If you often see warnings when a pod is restarting or scaling up, this is likely the reason.

By default, the HPA controller queries the metrics API every 15 seconds. A pod that is not ready or not running reports no metrics, so HPA repeats these warnings on every 15-second cycle until metrics arrive.

You can read about this HPA behavior in AKS here: https://learn.microsoft.com/en-us/azure/aks/concepts-scale#horizontal-pod-autoscaler

Problem with Metrics-server pod

If your application pod is running and the problem still persists, check the status of the metrics-server pod and its logs. Look for errors in the metrics-server logs like this one:

"Failed to scrape node" err="Get \"https://<some IP and port>/metrics/resource\": http2: client connection lost" node="<name of your node>"

If you see this, check the status of the node. If the node is ready and there are no taints, try restarting the metrics-server pod.

Resource misconfiguration

HPA may also fail to obtain resource metrics if there is a misconfiguration or absence of resource requests.

When you create an HPA to scale based on CPU/Memory averageUtilization, it requires information on the current usage and the requested resources for all matching pods.
- Current Usage: This data is collected by the metrics-server (or another metrics service like Prometheus) and accessed via the Metrics API.
- Requested Resources: This is the sum of resource requests for each container in the matching pod.

It is a common misconception that HPA uses resource limits to calculate average utilization. It does not: utilization is measured against resource requests, so a pod with limits but no requests can still trigger this error.
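To make the usage-versus-requests relationship concrete, here is a minimal sketch of an HPA that scales on averageUtilization. The names (my-app, my-app-hpa), the replica bounds, and the 70/80 targets are assumptions chosen for illustration, not values from any particular cluster.

```yaml
# Hypothetical HPA; names, replica bounds, and targets are placeholders.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app            # the Deployment whose pods are measured
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          # Percentage of the pods' CPU *requests*, not their limits.
          # Example: two pods requesting 100m each and using 60m and 100m
          # give 160m / 200m = 80% utilization, which is above this target.
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          # Needs a memory request on every container in the matching pods.
          averageUtilization: 80
```

Because both targets are percentages of requests, a container in the target pods that has no CPU or memory request leaves HPA without a denominator for that metric.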

Ensure that resource requests include both CPU and memory, and also verify the requests for sidecar or init containers.

resources:
  requests:
    cpu: 100m
    memory: 500Mi
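For context, this is roughly where that block sits in a Deployment that also runs a sidecar. It is only a sketch: the container names, images, and request values are hypothetical, and your own manifest will differ.

```yaml
# Hypothetical Deployment; names, images, and values are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: app
          image: example.com/my-app:1.0
          resources:
            requests:
              cpu: 100m
              memory: 500Mi
        - name: log-shipper   # sidecar: it needs requests too
          image: example.com/log-shipper:1.0
          resources:
            requests:
              cpu: 50m
              memory: 64Mi
```

The requests are set per container, so a sidecar injected by another tool (a service mesh proxy, for example) is worth checking as well.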



Ashutosh Rathore
