Monitoring Functions¶

Introduction¶

All OpenFaaS metrics are exposed in Prometheus format, and collected by a built-in Prometheus server which is deployed via the OpenFaaS Helm chart.

There are two main uses for the built-in Prometheus server:

To power scale to zero, and the horizontal Pod autoscaler.
To provide basic metrics to end-users, and to power the Grafana dashboards offered to OpenFaaS Standard customers.

Viewing metrics¶

See the various Grafana dashboards curated by our team.

Long term retention of metrics¶

There is no persistence in the Prometheus Pod, so restarting Prometheus removes all historic metrics. This is by design, since the metrics are collected for autoscaling and short-term monitoring.
The default retention period is 15 days, so anything older than that is no longer visible. This is also by design; however, the Helm chart does offer a way to modify the retention period if disk space is becoming an issue, or if you need to retain metrics for slightly longer.

What if you would like to enable long-term retention of Prometheus metrics?

Our recommendation is not to try to re-configure or alter the built-in Prometheus server, but to deploy your own, and to scrape the internal one via Prometheus Federation.

With Prometheus Federation, you can also specify which series to collect, rather than collecting everything, which is more efficient for storage and works out cheaper in the long run.
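As a minimal sketch of series selection, the Python snippet below builds the URL that a federating Prometheus server would request from the built-in one. The `/federate` endpoint and its repeated `match[]` parameter belong to Prometheus Federation; the address `http://prometheus.openfaas:9090` and the selector are assumptions for illustration, so substitute the values for your own cluster.

```python
from urllib.parse import urlencode


def federate_url(base_url, selectors):
    """Build a /federate URL that requests only the given series selectors."""
    # Prometheus accepts one match[] parameter per series selector.
    query = urlencode([("match[]", selector) for selector in selectors])
    return f"{base_url.rstrip('/')}/federate?{query}"


# Assumed in-cluster address; only Gateway metrics are selected here.
url = federate_url(
    "http://prometheus.openfaas:9090",
    ['{__name__=~"gateway_.*"}'],
)
print(url)
```

In practice the same selectors would normally be set under `params` in the scrape configuration of your own Prometheus server, rather than being requested by hand.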

Long-term retention can be achieved via the upstream Prometheus project using its Helm chart/operator, a SaaS version hosted by a cloud provider, Grafana Cloud's agent, or a solution built on top of Prometheus like Thanos or Cortex.

Gateway¶

The Gateway component exposes several metrics to help you monitor the health and behavior of your functions.

The Community Edition exposes basic metrics, and OpenFaaS Pro extends the data available.

Metric	Type	Description	Labels	Edition
`gateway_functions_seconds`	histogram	Function invocation time taken	`function_name`	Community Edition
`gateway_function_invocation_total`	counter	Function invocation count	`function_name`, `code`	Community Edition
`gateway_service_count`	gauge	Number of function replicas	`function_name`	Community Edition

Advanced metrics for OpenFaaS Pro users:

Metric	Type	Description	Labels	Edition
`gateway_invocation_function_started`	counter	Invocations started, including async	`function_name`	Pro Edition
`gateway_invocation_function_invocation_inflight`	gauge	Total connections inflight for function invocations	`function_name`	Pro Edition
`gateway_service_ready_count`	gauge	Number of function replicas which are in a ready state	`function_name`	Pro Edition
`gateway_service_target`	gauge	Target load for the function	`function_name`	Pro Edition
`gateway_service_min`	gauge	Min number of function replicas	`function_name`	Pro Edition
`http_request_duration_seconds`	histogram	Seconds spent serving HTTP requests	`method`, `path`, `status`	Pro Edition
`http_requests_total`	counter	The total number of HTTP requests	`method`, `path`, `status`	Pro Edition

The http_request* metrics record the latency and statistics of /system/* routes to monitor the OpenFaaS gateway and its provider. The /async-function route is also recorded in these metrics to observe asynchronous ingestion rate and latency.

Additional metrics from the Operator:

Metric	Type	Description	Labels	Edition
`faasnetes_scale_total`	counter	Number of times a function has been scaled (ignoring requests where current and desired replicas are equal)	`function_name`, `status`	Pro Edition
`faasnetes_sync_handler_gauge`	gauge	Number of reconciliation functions running at a given time	`status`	Pro Edition
`faasnetes_sync_handler_histogram`	histogram	Time taken to reconcile function Custom Resources into Kubernetes objects	`status`	Pro Edition

The faasnetes_scale_total metric is useful for tracking the number of times a function has been scaled up or down. The faasnetes_sync_handler_gauge and faasnetes_sync_handler_histogram metrics are useful for tracking the amount of time spent reconciling function Custom Resources into Kubernetes objects in large deployments of OpenFaaS.

CPU & RAM usage/consumption¶

CPU & RAM usage/consumption metrics are available to OpenFaaS Pro users via Prometheus, the OpenFaaS REST API, the OpenFaaS Pro Dashboard, and the OpenFaaS CLI (faas-cli describe).

Metric	Type	Description	Labels	Edition
`pod_cpu_usage_seconds_total`	counter	CPU seconds consumed by all the replicas of a given function	`function_name`, `namespace`	Pro Edition
`pod_memory_working_set_bytes`	gauge	Bytes of RAM consumed by all the replicas of a given function	`function_name`, `namespace`	Pro Edition

JetStream for OpenFaaS¶

The queue-worker for NATS JetStream exposes metrics that give you insight into the behavior of your OpenFaaS queues.

Metric	Type	Description	Labels	Edition
`queue_worker_pending_messages`	gauge	Amount of messages waiting to be processed on given `queue_name`, `kubernetes_pod_name`	`queue_name`	Pro Edition
`queue_worker_messages_processed_total`	counter	Total number of messages processed	`queue_name`, `kubernetes_pod_name`	Pro Edition
`queue_worker_messages_submitted_total`	gauge	Total number of messages submitted to the queue by the gateway	`queue_name`, `kubernetes_pod_name`	Pro Edition

Watchdog¶

The classic and of-watchdog both provide Prometheus instrumentation on TCP port 8081 on the path /metrics. This is to enable the use-case of HPAv2 from the Kubernetes ecosystem.

Metric	Type	Description	Labels	Edition
`http_request_duration_seconds`	histogram	Seconds spent serving HTTP requests	`method`, `path`, `status`	Community Edition
`http_requests_total`	counter	The total number of HTTP requests	`method`, `path`, `status`	Community Edition
`http_requests_in_flight`	gauge	The number of HTTP requests in flight	`method`, `path`, `status`	Pro Edition
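To illustrate what a scrape of the watchdog's /metrics endpoint yields, here is a minimal Python sketch that parses Prometheus text exposition output into samples. The sample text is made up for illustration, and the parser is a simplification: it handles plain `name{labels} value` lines, not every edge case of the format.

```python
import re

# Matches: metric_name{label="value",...} sample_value
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return (name, labels, value) tuples from Prometheus text exposition output."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comments.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Made-up sample in the shape a /metrics endpoint returns.
example = """\
# TYPE http_requests_total counter
http_requests_total{method="POST",path="/",status="200"} 42
"""

for name, labels, value in parse_metrics(example):
    print(name, labels, value)
```

For anything beyond a quick check, a maintained parser such as the one in the official Prometheus client library is likely a better choice than a hand-written regular expression.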

Provider¶

The FaaS Provider is the back-end API used by other OpenFaaS components like the Gateway. It exposes several metrics.

Metric	Type	Description	Labels	Edition
`provider_http_request_duration_seconds`	histogram	Seconds spent serving HTTP requests	`method`, `path`, `code`	Pro Edition
`provider_http_requests_total`	counter	The total number of HTTP requests	`method`, `path`, `code`	Pro Edition

The provider_http_request* metrics record the latency and statistics of /system/* routes. Part of this information is also recorded in the metrics for the Gateway component. The purpose of exposing separate metrics on the provider component is to show the count of calls, to show efficiency, and to show the duration for performance testing, along with errors to flag unseen issues.

Kafka connector¶

Metric	Type	Description	Labels	Edition
`kafka_connector_consumer_group_connect`	counter	Total times the consumer group has attempted to connect to the broker	`group`	Pro Edition
`kafka_connector_messages_consumed`	counter	Total messages received from the broker	`group`, `topic`, `partition`, `member_id`	Pro Edition
`kafka_connector_messages_offset`	gauge	Offset committed	`group`, `topic`, `partition`, `member_id`	Pro Edition

RabbitMQ connector¶

Metric	Type	Description	Labels	Edition
`rabbitmq_connector_messages_processed_total`	counter	Total number of messages processed	`queue`	Pro Edition
`rabbitmq_connector_queue_depth`	gauge	Number of messages waiting in the queue	`queue`	Pro Edition

Example queries for dashboarding¶

OpenFaaS Pro customers have access to 4 different dashboards, which we've co-designed with our users. You can find out more in the comparison page of OpenFaaS CE vs Pro.

These basic metrics can be used to track the health of your functions as well as general usage patterns. See the Prometheus documentation and examples for more details about the available options and query functions. Below are several queries you might want to include in a basic Grafana dashboard for observing your OpenFaaS functions.

Function invocation rate¶

Return the per-second rate of invocation as measured over the previous 1 minute:

rate(gateway_function_invocation_total[1m])

Function replica count / scaling¶

Return the total function replicas:

gateway_service_count

Total OK Function Invocation¶

Return the total number of successful function invocations:

sum(gateway_function_invocation_total{code="200"})

Function execution time¶

Return the average function execution time, as measured over the previous 20 seconds:

(rate(gateway_functions_seconds_sum[20s]) / rate(gateway_functions_seconds_count[20s]))
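Because both rates in the query above use the same window, the ratio comes down to the growth of the `_sum` series divided by the growth of the `_count` series. The Python sketch below works through that arithmetic with illustrative numbers; it ignores the extrapolation and counter-reset handling that Prometheus applies inside `rate()`.

```python
def average_duration(sum_start, sum_end, count_start, count_end):
    """Average seconds per invocation between two scrapes of a histogram."""
    invocations = count_end - count_start
    if invocations <= 0:
        # No new invocations between the scrapes (or a counter reset),
        # so no average can be computed.
        return None
    return (sum_end - sum_start) / invocations


# Illustrative values: _sum grew by 3.0 seconds while _count grew by 10 invocations.
print(average_duration(12.0, 15.0, 100, 110))
```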

Metrics for a single function¶

Each of the metrics generated by the Gateway is labeled with the function name and can be filtered by it. For example, the invocation rate for a single function (e.g. a function named echo) is given by:

rate(gateway_function_invocation_total{function_name='echo'}[20s])
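If you generate dashboard panels or alerts from a script, a small helper can build the per-function query for you. The sketch below is one possible way to do that in Python; the helper name `invocation_rate_query` is hypothetical, and the 20-second default window simply mirrors the example above.

```python
def invocation_rate_query(function_name, window="20s"):
    """Return a PromQL string for the invocation rate of one function."""
    # Escape backslashes and double quotes so the name is a valid PromQL string literal.
    escaped = function_name.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "rate(gateway_function_invocation_total"
        f'{{function_name="{escaped}"}}[{window}])'
    )


print(invocation_rate_query("echo"))
```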