Skip to content

Monitoring Functions

Introduction

All OpenFaaS metrics are exposed in Prometheus format, and collected by a built-in Prometheus server which is deployed via the OpenFaaS Helm chart.

There are two main uses for the built-in Prometheus server:

  1. To power scale to zero, and the horizontal Pod autoscaler.
  2. To provide basic metrics to end-users, and to power the Grafana dashboards offered to OpenFaaS Standard customers.

Long term retention of metrics

  • There is no persistence by default, so restarting the Prometheus Pod will reset all metrics. This is as designed, since the metrics are collected for autoscaling primarily.
  • The default retention period is 15 days, so anything older than that will no longer be visible. This is as designed, and will allow for SRE/DevOps work and active monitoring.

What if you would like to enable long-term retention?

Our recommendation is not to try to re-configure or alter the built-in Prometheus server, but to deploy your own, and to scrape the internal one via Prometheus Federation.

With Prometheus Federation, you can also specify which series to collect, rather than collecting everything, which is more efficient for storage and works out cheaper in the long run.

Long-term retention can be achieved via the upstream Prometheus project using its Helm chart/operator, a SaaS version hosted by a cloud provider, Grafana Cloud's agent, or a solution built on top of Prometheus like Thanos or Cortex.

Gateway

The Gateway component exposes several metrics to help you monitor the health and behavior of your functions.

The Community Edition exposes basic metrics, with OpenFaaS Pro extending on the data available.

Metric Type Description Labels Edition
gateway_functions_seconds histogram Function invocation time taken function_name Community Edition
gateway_function_invocation_total counter Function invocation count function_name, code Community Edition
gateway_service_count gauge Number of function replicas function_name Community Edition

Advanced metrics for OpenFaaS Pro users:

Metric Type Description Labels Edition
gateway_invocation_function_started counter Invocations started, including async function_name Pro Edition
gateway_invocation_function_invocation_inflight gauge Total connections inflight for function invocations function_name Pro Edition
gateway_service_ready_count gauge Number of function replicas which are in a ready state function_name Pro Edition
gateway_service_target gauge Target load for the function function_name Pro Edition
gateway_service_min gauge Min number of function replicas function_name Pro Edition
http_request_duration_seconds histogram Seconds spent serving HTTP requests method, path, status Pro Edition
http_requests_total counter The total number of HTTP requests method, path, status Pro Edition
http_requests_total counter The total number of HTTP requests method, path, status Pro Edition

The http_request* metrics record the latency and statistics of /system/* routes to monitor the OpenFaaS gateway and its provider. The /async-function route is also recorded in these metrics to observe asynchronous ingestion rate and latency.

CPU & RAM usage/consumption

CPU & RAM usage/consumption metrics are available for OpenFaaS Pro users via Prometheus and the OpenFaaS REST API, OpenFaaS Pro Dashboard and OpenFaaS CLI via faas-cli describe.

Metric Type Description Labels Edition
pod_cpu_usage_seconds_total counter CPU seconds consumed by all the replicas of a given function function_name, namespace Pro Edition
pod_memory_working_set_bytes gauge Bytes of RAM consumed by all the replicas of a given function function_name, namespace Pro Edition

JetStream for OpenFaaS

The queue-worker for NATS JetStream exposes metrics to help you get insight in the behavior of your OpenFaaS queues.

Metric Type Description Labels Edition
queue_worker_pending_messages gauge Amount of messages waiting to be processed on given queue_name, kubernetes_pod_name queue_name Pro Edition
queue_worker_messages_processed_total counter Total number of messages processed queue_name, kubernetes_pod_name Pro Edition
queue_worker_messages_submitted_total gauge Total number of messages submitted to the queue by the gateway queue_name, kubernetes_pod_name Pro Edition

Watchdog

The classic and of-watchdog both provide Prometheus instrumentation on TCP port 8081 on the path /metrics. This is to enable the use-case of HPAv2 from the Kubernetes ecosystem.

Metric Type Description Labels Edition
http_request_duration_seconds histogram Seconds spent serving HTTP requests method, path, status Community Edition
http_requests_total counter The total number of HTTP requests method, path, status Community Edition
http_requests_in_flight gauge The number of HTTP requests in flight method, path, status Pro Edition

Provider

The FaaS Provider is the back-end API used by other OpenFaaS components like the Gateway. It exposes several metrics.

Metric Type Description Labels Edition
provider_http_request_duration_seconds histogram Seconds spent serving HTTP requests method, path, code Pro Edition
provider_http_requests_total counter The total number of HTTP requests method, path, code Pro Edition

The http_request* metrics record the latency and statistics of /system/* routes. Part of this information is also recorded in the metrics for the Gateway component. The purpose of exposing separate metrics on the provider component is to show the count of calls, to show efficiency, and to show the duration for performance testing, along with errors to flag unseen issues.

Example queries for dashboarding

OpenFaaS Pro customers have access to 4 different dashboards which we've co-designed with our users, you can find out more in the comparison page of OpenFaaS CE vs Pro

These basic metrics can be used to track the health of your functions as well a general usage patterns. See the Prometheus documentation and examples for more details about the available options and query functions. Below are several queries you might want to include in a basic Grafana dashboard for observing your OpenFaaS functions

Function invocation rate

Return the per-second rate of invocation as measured over the previous 1 minute:

rate ( gateway_function_invocation_total [1m])

Function replica count / scaling

Return the total function replicas:

gateway_service_count

Total OK Function Invocation

Return the total number of successful function invocations:

sum( gateway_function_invocation_total {  code=\"200\"}

Function execution time

Return the average function execution time, as measure over the previous 20 seconds:

(rate(gateway_functions_seconds_sum[20s]) / rate(gateway_functions_seconds_count[20s]))

Metrics for a single function

Each of the metrics generated by the Gateway are labeled with and can be filtered by the function name, For example The invocation rate for just a single function (e.g. if the function name is echo) is given by

rate ( gateway_function_invocation_total{function_name='echo'} [20s])