Prometheus - Convert cpu_user_seconds to CPU Usage %?

Better Stack Team
Updated on November 18, 2024

To convert cpu_user_seconds (or a similar counter that accumulates CPU time) to a CPU usage percentage in Prometheus, calculate the counter's per-second rate over a defined window and then normalize it by the number of available CPU cores. Multiplying the result by 100 gives the share of total CPU capacity in use.

Step 1: Understanding the Metric

Assume you have a counter called container_cpu_user_seconds_total, which tracks the cumulative user-mode CPU time, in seconds, consumed by each container. You can calculate the CPU usage percentage in two steps:

  1. Rate Calculation: Use the rate() function to get the per-second rate of CPU usage.
  2. Normalization: Divide the CPU usage rate by the total number of available CPU cores, then multiply by 100 to get the percentage.
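The two steps above can be sketched outside Prometheus as plain arithmetic. This is a minimal Python illustration, not how Prometheus implements rate(): the function name cpu_usage_percent and the sample values are hypothetical, and the sketch ignores counter resets and the extrapolation that rate() performs.

```python
def cpu_usage_percent(first_sample, last_sample, cores):
    """Turn two (timestamp_seconds, cumulative_cpu_seconds) samples into a percentage.

    Hypothetical helper for illustration only; it ignores counter resets.
    """
    t0, v0 = first_sample
    t1, v1 = last_sample
    elapsed = t1 - t0
    if elapsed <= 0 or cores <= 0:
        raise ValueError("need a positive time window and a positive core count")
    # Step 1: per-second rate, i.e. CPU-seconds consumed per second of wall time.
    cpu_seconds_per_second = (v1 - v0) / elapsed
    # Step 2: normalize by the core count and scale to a percentage.
    return 100.0 * cpu_seconds_per_second / cores


# 60 CPU-seconds consumed during a 300-second window on a 4-core machine.
usage = cpu_usage_percent((0, 1000.0), (300, 1060.0), cores=4)
print(f"{usage:.1f}%")  # prints "5.0%"
```

A rate of 0.2 CPU-seconds per second means one fifth of a single core was busy; spread over four cores, that is 5% of the machine.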

Step 2: Sample Query

Here's a sample query that calculates the CPU usage percentage based on container_cpu_user_seconds_total:

 
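100 * sum(rate(container_cpu_user_seconds_total[5m])) by (pod, namespace) / scalar(count(node_cpu_seconds_total{mode="user"}))

Breakdown of the Query

  • rate(container_cpu_user_seconds_total[5m]): This computes the average per-second rate of user CPU time over the last 5 minutes, in CPU-seconds per second (a value of 1 means one fully used core). You can adjust the window as needed.
  • sum(...) by (pod, namespace): This adds up the rates of all containers in each pod, giving one series per pod and namespace.
  • count(node_cpu_seconds_total{mode="user"}): node_cpu_seconds_total has one series per core and per mode, so counting the series for a single mode gives the number of CPU cores across all scraped nodes. Keep count here; sum would add up accumulated CPU seconds rather than cores.
  • scalar(...): This turns the single-element count into a plain number. Without it, the per-pod series on the left share no labels with the unlabeled count on the right, so the division returns no results.
  • 100 * ...: This converts the ratio into a percentage.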
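To make the aggregation step concrete, here is a small Python sketch that mimics grouping per-container rates by pod and namespace and then dividing by a core count. The function name usage_percent_by_pod, the pod names, and the rates are all made up for illustration; in practice Prometheus does this work inside the query.

```python
from collections import defaultdict


def usage_percent_by_pod(container_rates, core_count):
    """Group per-container rates by (namespace, pod) and convert to percentages.

    container_rates: iterable of (namespace, pod, container, cpu_seconds_per_second).
    Hypothetical helper that mirrors sum(...) by (pod, namespace) followed by
    division by the core count.
    """
    totals = defaultdict(float)
    for namespace, pod, _container, rate in container_rates:
        totals[(namespace, pod)] += rate  # sum of container rates within one pod
    return {key: 100.0 * total / core_count for key, total in totals.items()}


# Hypothetical per-container rates, as rate(...[5m]) might report them.
rates = [
    ("shop", "web-1", "app", 0.50),
    ("shop", "web-1", "sidecar", 0.25),
    ("shop", "db-0", "postgres", 1.00),
]

for (namespace, pod), percent in usage_percent_by_pod(rates, core_count=8).items():
    print(f"{namespace}/{pod}: {percent}%")
# shop/web-1: 9.375%
# shop/db-0: 12.5%
```

The two containers of web-1 collapse into a single value, which is the effect of grouping by pod and namespace rather than by container.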

Example with Total CPU Cores

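If you want the denominator to show how the total is built up from each node, count the cores per instance and then sum across instances. Filter node_cpu_seconds_total to a single mode so that each core is counted once rather than once per mode, and wrap the result in scalar() so it can divide the per-pod series:

100 * sum(rate(container_cpu_user_seconds_total[5m])) by (pod, namespace) / scalar(sum(count(node_cpu_seconds_total{mode="idle"}) by (instance)))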
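Because node_cpu_seconds_total exposes one series per core and per mode, a core count is only accurate when it is taken from a single mode. The following Python sketch illustrates the difference using hypothetical label sets (two nodes, four modes listed for brevity); the helper count_cores is an assumption for illustration and not part of any Prometheus client.

```python
def count_cores(series, mode="idle"):
    """Count cores per instance, looking at one mode so each core is counted once."""
    cores = {}
    for labels in series:
        if labels["mode"] == mode:
            instance = labels["instance"]
            cores[instance] = cores.get(instance, 0) + 1
    return cores


# Hypothetical label sets: node-a has 4 cores, node-b has 2 cores.
series = [
    {"instance": instance, "cpu": str(cpu), "mode": mode}
    for instance, core_total in (("node-a:9100", 4), ("node-b:9100", 2))
    for cpu in range(core_total)
    for mode in ("idle", "user", "system", "iowait")
]

per_node = count_cores(series)
print(per_node)                # {'node-a:9100': 4, 'node-b:9100': 2}
print(sum(per_node.values()))  # 6 cores in total
print(len(series))             # 24 series if no mode filter is applied
```

Counting every series would inflate the denominator by the number of modes and make the reported percentage correspondingly too low.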

Step 3: Adjusting for Other Modes

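If you want to include system (kernel) CPU time as well as user time, add the two rates before aggregating. Idle is a node-level mode of node_cpu_seconds_total and has no per-container counter, so it cannot be added here. If your cAdvisor setup exposes container_cpu_usage_seconds_total, that counter reports the container's total CPU time and may be simpler to use. For example:

100 * sum(rate(container_cpu_user_seconds_total[5m]) + rate(container_cpu_system_seconds_total[5m])) by (pod, namespace) / scalar(sum(count(node_cpu_seconds_total{mode="idle"}) by (instance)))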

Conclusion

By using the rate function along with aggregation and normalization, you can effectively convert CPU usage in seconds to a percentage in Prometheus. This allows for better visibility into resource utilization within your Kubernetes or other environments.

Licensed under CC-BY-NC-SA

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.