Kubernetes is a complex distributed system, and there are many things that can cause friction for new OpenFaaS users. This guide is devoted to helping you help yourself.
If you want to ask for help, make sure that you have run all of the commands below before doing so.
We recommend that all users run our automated config-checker tool, which helps you identify common problems with timeouts, function configuration and the core components. The tool is designed for OpenFaaS Standard and OpenFaaS for Enterprises, but should still give some useful output for OpenFaaS CE users.
If we've asked you to run the config-checker via email or Slack, then please also collect the logs and output from kubectl by running our openfaas-diagnostics.sh bash script. Send over the resulting openfaas.tgz file to our team.
Have you forgotten to create the password required for the gateway?
The gateway must be able to talk to NATS and Prometheus. If these are crashing, you probably have networking issues preventing containers from talking to each other or looking each other up over DNS.
Have you got enough resources free in your cluster for all the services to start? kubectl describe nodes or kubectl top node should give you some hints here.
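The resource checks above can be collected into one small helper. This is only a sketch: the function name check_cluster_resources is hypothetical, kubectl is assumed to be configured for your cluster, and kubectl top only returns data when metrics-server is installed.

```shell
# Hypothetical helper: summarise node capacity and core Pods that are not Running.
check_cluster_resources() {
  # Allocated requests/limits and node conditions
  kubectl describe nodes
  # Live CPU and RAM usage per node (needs metrics-server)
  kubectl top node
  # Core components stuck outside the Running phase, e.g. Pending
  kubectl get pods -n openfaas --field-selector 'status.phase!=Running'
}

# Usage: check_cluster_resources
```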
Try using the faas-cli describe command to check whether the function has been updated.
You can usually view the YAML from Kubernetes for a function with the kubectl get -n openfaas-fn deploy/NAME -o yaml command, then check the logs of the two containers in the gateway Pod and the events in the openfaas-fn namespace.
If you are still encountering problems, try publishing to a different image tag for each version of your function. For instance, if you are working on version 0.1.0, try changing the tag to 0.1.1 or 0.1.0-a and so forth.
You haven't created a secret which is required for your function to start. Check your function request or stack.yml, and create any missing secrets.
Your function is crashing due to an error in your code. Check the logs.
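The checks above can be run in one pass. The following is a minimal sketch rather than an official tool: debug_function_start is a hypothetical name, and functions are assumed to live in the default openfaas-fn namespace.

```shell
# Hypothetical helper: gather the usual evidence for a function that won't start.
# Assumes functions are deployed to the default openfaas-fn namespace.
debug_function_start() {
  # Has the gateway picked up the latest version of the function?
  faas-cli describe "$1"
  # Recent events: image pull errors, missing secrets, failed scheduling
  kubectl get events -n openfaas-fn --sort-by=.metadata.creationTimestamp
  # Secrets that exist, to compare against your stack.yml
  kubectl get secrets -n openfaas-fn
  # Output from the function itself, for crashes in your code
  faas-cli logs "$1"
}

# Usage: debug_function_start NAME
```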
My Function Custom Resource had an error, now it's taking too long to recover
If you create a Function Custom Resource that is in an invalid state due to a missing Secret, invalid requests/limits, or some other parsing or validation error, then the .Status will go into a "stalled" condition.
You can view the conditions by running, for instance, kubectl describe -n openfaas-fn function/nodeinfo.
Every time the Operator tries to reconcile the Function, it will fail, and then wait for a back-off period before trying again. This is standard behaviour for Kubernetes controllers, in order to prevent a misconfigured resource from blocking or crashing the system.
If you have fixed the error condition by changing the Function's .Spec, then the operator will immediately try to reconcile the Function again, resetting the back-off period at the same time.
However, if the .Spec has not changed, and instead some other condition like a missing Secret is now satisfied, you can either wait until the next back-off period, or you can annotate the Function to force the Operator to try again:
The uid field can hold any value. If you want to force the Operator to try again and the field already has a value, just change it to a different one, e.g. 2 or a random string such as a UUID.
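The general shape of such an annotate command can be sketched as follows. The annotation key is not given in this section, so it is passed in as an argument and must be taken from the OpenFaaS documentation for your version. The function name force_reconcile and the RECONCILE_ANNOTATION variable are hypothetical.

```shell
# Sketch: force a reconcile by changing an annotation on the Function.
# The annotation key is a placeholder supplied by the caller, not a real key.
force_reconcile() {
  fn="$1"                     # function name, e.g. nodeinfo
  key="$2"                    # annotation key (placeholder, see the lead-in)
  value="${3:-$(date +%s)}"   # any new value works; defaults to a timestamp
  kubectl annotate -n openfaas-fn "function/$fn" "$key=$value" --overwrite
}

# Usage: force_reconcile nodeinfo "$RECONCILE_ANNOTATION"
```

Defaulting the value to a timestamp means that repeated calls always change the annotation, which is what triggers a new attempt.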
I'm not sure that any functions are getting reconciled
Reconciliation problems occur when you deploy a Function Custom Resource, but you do not see the operator creating a Deployment for it in the same namespace.
First check any Kubernetes Quotas or LimitRanges that you may have in place for the namespace.
Check the logs of all of the gateway Pods to see whether they show an error, such as an RBAC error, and whether the operator is detecting the event for the Function CR.
Check the following fields of the operator's leader-election Lease: acquireTime, renewTime, holderIdentity and leaseDurationSeconds.
Do they match the Pods that you have running for the gateway?
kubectl get pods -n openfaas -l app=gateway
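The fields listed above belong to the Kubernetes Lease object used for leader election. As a sketch, the helper below prints them next to the gateway Pods so the holder can be compared by eye. It assumes the Lease lives in the openfaas namespace. Its name is not given here, so every Lease in the namespace is listed. The function name show_operator_leases is hypothetical.

```shell
# Sketch: compare the leader-election Lease with the running gateway Pods.
# Assumes the Lease is in the openfaas namespace; all Leases there are printed.
show_operator_leases() {
  kubectl get lease -n openfaas \
    -o custom-columns='NAME:.metadata.name,HOLDER:.spec.holderIdentity,ACQUIRED:.spec.acquireTime,RENEWED:.spec.renewTime,DURATION:.spec.leaseDurationSeconds'
  # The HOLDER column should refer to one of these Pods
  kubectl get pods -n openfaas -l app=gateway
}

# Usage: show_operator_leases
```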
Check the .Status of the Function Custom Resource, i.e. for the nodeinfo function:
kubectl describe -n openfaas-fn function/nodeinfo
Look for warnings or errors in the .Status field.
If in doubt, and this is a critical issue, then you can try restarting the gateway Deployment, to restart the operator: kubectl rollout restart -n openfaas deploy/gateway
When using OpenFaaS Standard or OpenFaaS for Enterprises along with the Function CRD, a function's name can be no longer than 63 characters. This is due to the 63-character limit on label values within Kubernetes.
If you need longer names for the sake of organisation, then consider using namespaces to partition your functions.
For example:
project-skunkworks-long-function-name could be shortened to long-function-name and placed in the project-skunkworks namespace. The namespace then carries the organisational prefix, and the function name itself stays within the 63-character limit.
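A quick pre-deployment check of the name length can be sketched in plain shell. The helper name check_function_name is hypothetical. It only tests the 63-character limit described above, not any other naming rules.

```shell
# Hypothetical helper: succeed only if a function name fits in 63 characters.
check_function_name() {
  name="$1"
  if [ "${#name}" -le 63 ]; then
    echo "ok: ${#name} characters"
  else
    echo "too long: ${#name} characters (limit is 63)" >&2
    return 1
  fi
}

# Usage: check_function_name long-function-name
```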
OpenFaaS namespaces are available in faasd and OpenFaaS for Enterprises.
Check the logs of the gateway for signs of a time-out, or non-200 HTTP code:
kubectl logs -n openfaas deploy/gateway -c gateway
Do the same for the provider:
kubectl logs -n openfaas deploy/gateway -c faas-netes
# Or, if you are using the CRD and Operator:
kubectl logs -n openfaas deploy/gateway -c operator
If your function is timing out and you are calling it asynchronously, then check the queue-worker's logs:
kubectl logs -n openfaas deploy/queue-worker
Check the logs of the function:
faas-cli logs NAME
Common issues:
You are using a service mesh, and therefore must set direct_functions to true so that the gateway uses the name of the function to resolve it.
You have not configured a high enough timeout on all the required components. See Expanded timeouts.
You are trying to access the gateway from your function. You must use the address http://gateway.openfaas:8080, otherwise the gateway will be unreachable.
Your cloud LoadBalancer may have a timeout of 60 seconds, which could prevent your call from completing. Consider increasing the timeout if you can, or invoke the function asynchronously.
If the queue-worker keeps retrying your function, check that ack_wait is set high enough. It should be set to a value higher than your function's timeout.
Some legacy HTTP servers, such as WSGI servers, do not support the default "chunked" transfer encoding. In this case, if you're using the of-watchdog, set the environment variable http_buffer_req_body to true. This causes the HTTP request to be buffered in memory, then sent in one shot to the upstream function.
Then invoke your function via http://127.0.0.1:8081.
If you need to set environment variables or simulate mounted secrets, use --env/-e for the variables and -v to mount files at /var/openfaas/secrets.
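Put together, a local run might look like the sketch below. It assumes the flags above are those of docker run, that the watchdog listens on its default port 8080 inside the container, and that a local ./secrets directory holds one file per secret. The helper name run_function_locally is hypothetical.

```shell
# Sketch: run a function image locally with Docker.
# Assumes the watchdog listens on port 8080 in the container and that
# ./secrets contains one file per secret.
run_function_locally() {
  image="$1"; shift
  # Any remaining arguments (e.g. -e name=value) are passed through to docker run
  docker run --rm -ti \
    -p 8081:8080 \
    -v "$(pwd)/secrets:/var/openfaas/secrets" \
    "$@" \
    "$image"
}

# Usage: run_function_locally IMAGE -e write_timeout=30s
# Then, from another terminal: curl -i http://127.0.0.1:8081
```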
I am getting an incorrect password error or unauthorized access
If you're using ArgoCD to install OpenFaaS, then it may be changing the password continually whenever it synchronises the app that you created. Make sure you turn off the "generateBasicAuth" setting in values.yaml or the flags you pass.
Create a password for the admin user before you create the ArgoCD App.
If you're not an ArgoCD user, make sure that nobody has reinstalled OpenFaaS, and then see the section "I forgot my credentials".
In the worst case, restart all the components to force them to reload the password from the Kubernetes secret: kubectl rollout restart -n openfaas deploy
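To see which password the components will load, you can read it back from the Kubernetes secret. This sketch assumes the default secret name basic-auth and key basic-auth-password in the openfaas namespace. The helper name get_gateway_password is hypothetical.

```shell
# Sketch: print the gateway's admin password from the Kubernetes secret.
# Assumes the default secret name (basic-auth) and key (basic-auth-password).
get_gateway_password() {
  kubectl get secret -n openfaas basic-auth \
    -o jsonpath='{.data.basic-auth-password}' | base64 --decode
}

# Usage: get_gateway_password | faas-cli login --username admin --password-stdin
```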
Traffic is not being spread evenly between functions
You will need to ensure that you are doing one of the following:
Setting direct_functions to false, which allows the provider to balance calls randomly between replicas of your functions.
Using a service mesh like Linkerd or Istio, which can do advanced traffic management such as least-connections.
I'm not seeing CPU or RAM data for functions in Grafana with OpenFaaS Pro
You can find the dashboard JSON files in the Customer GitHub. Check that you have the latest version of the dashboard. Sometimes Grafana makes breaking changes in its schema between versions, so edit the panels and check the PromQL statements are present. If they are missing, edit the JSON file in a text editor to retrieve the queries.
You may also want to check that your data source is set to the internal OpenFaaS Prometheus instance.
Finally, check that you've installed the OpenFaaS Pro Helm chart with the ClusterRole setting. This is required to access each node in the cluster to retrieve Pod CPU and RAM usage metrics.
Setting the environment variable prefix_logs to false in your function sends only the <msg> part to the terminal. This allows you to use structured logs that output a JSON (or equivalent) payload.
How do I rotate or change a secret that my function is already using?
With either Kubernetes or faasd, you can delete the secret and create it with the new value. Kubernetes also allows you to edit the secret's value using kubectl, without deleting it.
Then, with Kubernetes, the secret will be updated on disk without needing to restart the Pod. It can take several minutes for the change to be reflected.
faas-cli secret create username \
  --from-literal alex
faas-cli store deploy alpine \
  --secret username \
  --env fprocess="cat /var/openfaas/secrets/username" \
  --name print-secret
echo | faas-cli invoke --name print-secret
alex
Then:
faas-cli secret remove username
faas-cli secret create username \
  --from-literal ellis
# Wait a few minutes
echo | faas-cli invoke --name print-secret
ellis
If your handler reads and caches the password in memory, then you'll also need to restart the function's Pod:
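One way to do that is a rollout restart of the function's Deployment. This is a sketch, assuming the default openfaas-fn namespace and that the Deployment is named after the function, as with print-secret above. The helper name restart_function is hypothetical.

```shell
# Sketch: restart a function's Pods so that cached secret values are re-read.
# Assumes the openfaas-fn namespace and a Deployment named after the function.
restart_function() {
  kubectl rollout restart -n openfaas-fn "deploy/$1"
}

# Usage: restart_function print-secret
```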