Monitoring Functions¶
Introduction¶
All OpenFaaS metrics are exposed in Prometheus format, and collected by a built-in Prometheus server which is deployed via the OpenFaaS Helm chart.
There are two main uses for the built-in Prometheus server:
- To power scale to zero, and the horizontal Pod autoscaler.
- To provide basic metrics to end-users, and to power the Grafana dashboards offered to OpenFaaS Standard customers.
Viewing metrics¶
See the various Grafana dashboards curated by our team.
Long term retention of metrics¶
- There is no persistence in the Prometheus Pod, so restarting the Prometheus will remove all historic metrics. This is as designed, since the metrics are collected for autoscaling and short-term monitoring.
- The default retention period is 15 days, so anything older than that will no longer be visible. This is as designed, however the Helm chart does offer a way to modify this if disk space is becoming an issue, or you need to retain metrics for slightly longer.
What if you would like to enable long-term retention of Prometheus metrics?
Our recommendation is not to try to re-configure or alter the built-in Prometheus server, but to deploy your own, and to scrape the internal one via Prometheus Federation.
With Prometheus Federation, you can also specify which series to collect, rather than collecting everything, which is more efficient for storage and works out cheaper in the long run.
Long-term retention can be achieved via the upstream Prometheus project using its Helm chart/operator, a SaaS version hosted by a cloud provider, Grafana Cloud's agent, or a solution built on top of Prometheus like Thanos or Cortex.
Gateway¶
The Gateway component exposes several metrics to help you monitor the health and behavior of your functions.
The Community Edition exposes basic metrics, with OpenFaaS Pro extending on the data available.
Metric | Type | Description | Labels | Edition |
---|---|---|---|---|
gateway_functions_seconds |
histogram | Function invocation time taken | function_name |
Community Edition |
gateway_function_invocation_total |
counter | Function invocation count | function_name , code |
Community Edition |
gateway_service_count |
gauge | Number of function replicas | function_name |
Community Edition |
Advanced metrics for OpenFaaS Pro users:
Metric | Type | Description | Labels | Edition |
---|---|---|---|---|
gateway_invocation_function_started |
counter | Invocations started, including async | function_name |
Pro Edition |
gateway_invocation_function_invocation_inflight |
gauge | Total connections inflight for function invocations | function_name |
Pro Edition |
gateway_service_ready_count |
gauge | Number of function replicas which are in a ready state | function_name |
Pro Edition |
gateway_service_target |
gauge | Target load for the function | function_name |
Pro Edition |
gateway_service_min |
gauge | Min number of function replicas | function_name |
Pro Edition |
http_request_duration_seconds |
histogram | Seconds spent serving HTTP requests | method , path , status |
Pro Edition |
http_requests_total |
counter | The total number of HTTP requests | method , path , status |
Pro Edition |
http_requests_total |
counter | The total number of HTTP requests | method , path , status |
Pro Edition |
The http_request*
metrics record the latency and statistics of /system/*
routes to monitor the OpenFaaS gateway and its provider. The /async-function
route is also recorded in these metrics to observe asynchronous ingestion rate and latency.
Additional metrics from the Operator:
Metric | Type | Description | Labels | Edition |
---|---|---|---|---|
faasnetes_scale_total |
counter | Number of times a function has been scaled (ignoring requests where current and desired replicas are equal) | function_name , status |
Pro Edition |
faasnetes_sync_handler_gauge |
gauge | Number of reconciliation functions running at given time | status |
Pro Edition |
faasnetes_sync_handler_histogram |
histogram | Time taken to reconcile function Custom Resources into Kubernetes objects | status |
Pro Edition |
The faasnetes_scale_total
metric is useful for tracking the number of times a function has been scaled up or down. The faasnetes_sync_handler_gauge
and faasnetes_sync_handler_histogram
metrics are useful for tracking the amount of time spent reconciling function Custom Resources into Kubernetes objects in large deployments of OpenFaaS.
CPU & RAM usage/consumption¶
CPU & RAM usage/consumption metrics are available for OpenFaaS Pro users via Prometheus and the OpenFaaS REST API, OpenFaaS Pro Dashboard and OpenFaaS CLI via faas-cli describe
.
Metric | Type | Description | Labels | Edition |
---|---|---|---|---|
pod_cpu_usage_seconds_total |
counter | CPU seconds consumed by all the replicas of a given function | function_name , namespace |
Pro Edition |
pod_memory_working_set_bytes |
gauge | Bytes of RAM consumed by all the replicas of a given function | function_name , namespace |
Pro Edition |
JetStream for OpenFaaS¶
The queue-worker for NATS JetStream exposes metrics to help you get insight in the behavior of your OpenFaaS queues.
Metric | Type | Description | Labels | Edition |
---|---|---|---|---|
queue_worker_pending_messages |
gauge | Amount of messages waiting to be processed on given queue_name , kubernetes_pod_name |
queue_name |
Pro Edition |
queue_worker_messages_processed_total |
counter | Total number of messages processed | queue_name , kubernetes_pod_name |
Pro Edition |
queue_worker_messages_submitted_total |
gauge | Total number of messages submitted to the queue by the gateway | queue_name , kubernetes_pod_name |
Pro Edition |
Watchdog¶
The classic and of-watchdog both provide Prometheus instrumentation on TCP port 8081 on the path /metrics. This is to enable the use-case of HPAv2 from the Kubernetes ecosystem.
Metric | Type | Description | Labels | Edition |
---|---|---|---|---|
http_request_duration_seconds |
histogram | Seconds spent serving HTTP requests | method , path , status |
Community Edition |
http_requests_total |
counter | The total number of HTTP requests | method , path , status |
Community Edition |
http_requests_in_flight |
gauge | The number of HTTP requests in flight | method , path , status |
Pro Edition |
Provider¶
The FaaS Provider is the back-end API used by other OpenFaaS components like the Gateway. It exposes several metrics.
Metric | Type | Description | Labels | Edition |
---|---|---|---|---|
provider_http_request_duration_seconds |
histogram | Seconds spent serving HTTP requests | method , path , code |
Pro Edition |
provider_http_requests_total |
counter | The total number of HTTP requests | method , path , code |
Pro Edition |
The http_request*
metrics record the latency and statistics of /system/*
routes. Part of this information is also recorded in the metrics for the Gateway component. The purpose of exposing separate metrics on the provider component is to show the count of calls, to show efficiency, and to show the duration for performance testing, along with errors to flag unseen issues.
Example queries for dashboarding¶
OpenFaaS Pro customers have access to 4 different dashboards which we've co-designed with our users, you can find out more in the comparison page of OpenFaaS CE vs Pro
These basic metrics can be used to track the health of your functions as well a general usage patterns. See the Prometheus documentation and examples for more details about the available options and query functions. Below are several queries you might want to include in a basic Grafana dashboard for observing your OpenFaaS functions
Function invocation rate¶
Return the per-second rate of invocation as measured over the previous 1 minute:
rate ( gateway_function_invocation_total [1m])
Function replica count / scaling¶
Return the total function replicas:
gateway_service_count
Total OK Function Invocation¶
Return the total number of successful function invocations:
sum( gateway_function_invocation_total { code=\"200\"}
Function execution time¶
Return the average function execution time, as measure over the previous 20 seconds:
(rate(gateway_functions_seconds_sum[20s]) / rate(gateway_functions_seconds_count[20s]))
Metrics for a single function¶
Each of the metrics generated by the Gateway are labeled with and can be filtered by the function name, For example The invocation rate for just a single function (e.g. if the function name is echo
) is given by
rate ( gateway_function_invocation_total{function_name='echo'} [20s])