Using cert-manager with Azure AKS and AGIC
Motivation
There are many guides on how to use cert-manager, starting with cert-manager documentation. However, I haven't found a clear tutorial for my case, where I wanted to deploy cert-manager on AKS, with the following characteristics:
Service pods (the hosts) are deployed in a non-default namespace
Using the Azure Application Gateway Ingress Controller (AGIC)
Using the http01 solver (instead of dns01)
Using a ClusterIssuer instead of a regular (namespaced) Issuer
Getting a TLS certificate for multiple subdomains
There's a somewhat-outdated tutorial from Microsoft, a more up-to-date one here, and a third one here, as well as a few articles here and there, but they either don't tick all the requirements mentioned above or are outdated.
As a result, I decided to write down my experience with deploying it successfully.
Prerequisites
It is assumed that you are using Linux or Mac (not a big deal if not: the Bash scripts below can easily be converted to Batch scripts, or used as YAML files with kubectl), and that you have:
Deployed your services/web apps on AKS, in a non-default namespace, e.g. prod;
Bought a domain name, and configured it to point to the Application Gateway's public frontend IP with an A record, or to its DNS name using a CNAME record. In my case, I've pointed two subdomains to the Application Gateway's DNS name;
Configured AGIC as your ingress controller to enable access to your services over HTTP, and verified successful access to them with a K8s Ingress object with proper rules. In my case, the Ingress forwards each subdomain to a different container, installed with separate deployments on AKS;
Azure CLI installed;
Your kubeconfig available;
kubectl installed;
Helm installed, and used for deployment of your cluster.
Note: You might be a console guy who's familiar with executing kubectl commands, but otherwise I'd recommend using Lens to look around your K8s cluster easily.
Let's start...
Install cert-manager
There are a few ways to install cert-manager on AKS using Helm. I've used the following script, and named it cert-manager.sh (remember to grant it execute permissions):
#!/bin/bash
# Add the Jetstack Helm repository
helm repo add jetstack https://charts.jetstack.io
# Update your local Helm chart repository cache
helm repo update
# Install the cert-manager Helm chart (Helm v3+)
# https://cert-manager.io/docs/installation/helm/#installing-with-helm
helm install \
  cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --version v1.10.0 \
  --set installCRDs=true \
  --set clusterResourceNamespace=prod
Note the last --set: it sets the cluster resource namespace, so cert-manager keeps the secrets that belong to cluster-scoped resources (such as a ClusterIssuer) in the prod namespace. Otherwise, they'll default to the cert-manager namespace. See here. Creating the secret in your services' namespace solves issues where the secret can't be accessed by your resources.
To use the script, execute: ./cert-manager.sh
After the successful installation of cert-manager, it's time to create a ClusterIssuer resource.
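Before creating the issuer, it may be worth confirming that cert-manager actually came up. The following is a minimal sketch, assuming the chart was installed into the cert-manager namespace as in the script above; the function name wait_for_cert_manager is only an illustration.

```shell
#!/bin/bash
# Sketch: block until every Deployment in the namespace reports Available.
# The default namespace and timeout are assumptions; adjust them to your setup.
wait_for_cert_manager() {
  local namespace="${1:-cert-manager}"
  local timeout="${2:-120s}"
  kubectl wait deployment --all \
    --for=condition=Available \
    --namespace "$namespace" \
    --timeout="$timeout"
}

# Usage (uncomment to run against your cluster):
# wait_for_cert_manager && kubectl get pods --namespace cert-manager
```

If the wait times out, kubectl get pods --namespace cert-manager usually shows which pod is not starting.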
Create ClusterIssuer
ClusterIssuer and regular Issuer are resources that represent certificate authorities (CAs) able to sign certificates in response to certificate signing requests. The difference between a ClusterIssuer and a regular Issuer is explained here. Again, I created the ClusterIssuer using a Bash script, named clusterIssuer.sh:
#!/bin/bash
kubectl apply -f - <<EOF
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    # You must replace this email address with your own.
    # Let's Encrypt will use this to contact you about expiring
    # certificates, and issues related to your account.
    email: <set your email here>
    # ACME server URL for Let’s Encrypt’s staging environment.
    # The staging environment will not issue trusted certificates but is
    # used to ensure that the verification process is working properly
    # before moving to production
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    # After verifying with the staging environment the ability to properly
    # get certificates, you can use the production env. with the following URL:
    # server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      # Secret resource used to store the account's private key.
      name: letsencrypt-staging
    # Enable the HTTP-01 challenge provider
    # you prove ownership of a domain by ensuring that a particular
    # file is present at the domain
    solvers:
    - http01:
        ingress:
          class: azure/application-gateway
EOF
As mentioned in the file, this ClusterIssuer is set to work with Let's Encrypt's staging environment. The staging environment issues certificates that are not publicly trusted, so your browser will warn you about them. But since Let's Encrypt's production environment enforces rate limits on certificate requests, it's better to start with the staging environment, and switch to production after verifying that you can properly get a staging certificate for your hosts.
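Once the script has been applied, the issuer should register an ACME account and report itself as ready. A small sketch of how that could be checked from the console follows; the helper name issuer_is_ready is hypothetical, and it assumes the issuer name used in the script above.

```shell
#!/bin/bash
# Sketch: succeed only if the ClusterIssuer's Ready condition is "True".
issuer_is_ready() {
  local name="$1"
  local status
  # Read the status of the condition whose type is Ready.
  status="$(kubectl get clusterissuer "$name" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}')"
  [ "$status" = "True" ]
}

# Usage (uncomment to run against your cluster):
# issuer_is_ready letsencrypt-staging && echo "ClusterIssuer is ready"
```

When the check fails, kubectl describe clusterissuer letsencrypt-staging normally shows the reason in its conditions and events.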
Create a secret to be filled by cert-manager
Create an empty TLS secret with this secret.yaml file:
apiVersion: v1
kind: Secret
metadata:
  name: my-services-tls
type: kubernetes.io/tls
stringData:
  tls.key: ""
  tls.crt: ""
and apply it to your namespace: kubectl apply -f secret.yaml -n prod
Note: This secret can be part of your services' Helm chart, and be installed earlier. That's why it doesn't appear here as a Bash script. Eventually, cert-manager will fill the tls.key and tls.crt values of this secret.
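To tell later on whether cert-manager has filled the secret, one option is to check that tls.crt is no longer empty. This is only a sketch: the function name secret_has_certificate is made up for illustration, and the secret and namespace names are the ones used above.

```shell
#!/bin/bash
# Sketch: succeed only if the secret's tls.crt field holds some data.
secret_has_certificate() {
  local secret="$1"
  local namespace="$2"
  local crt
  # The dot inside the key name has to be escaped in kubectl's JSONPath.
  crt="$(kubectl get secret "$secret" --namespace "$namespace" \
    -o jsonpath='{.data.tls\.crt}')"
  [ -n "$crt" ]
}

# Usage (uncomment to run against your cluster):
# secret_has_certificate my-services-tls prod && echo "certificate stored"
```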
Update your Ingress
My initial Ingress looks something like this:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
  namespace: prod
  annotations:
    kubernetes.io/ingress.class: azure/application-gateway
    appgw.ingress.kubernetes.io/backend-path-prefix: "/"
spec:
  rules:
  - host: service-one.mydomain.xyz
    http:
      paths:
      - path: "/"
        backend:
          service:
            name: service-one-image
            port:
              number: 80
        pathType: Exact
  - host: service-two.mydomain.xyz
    http:
      paths:
      - path: "/"
        backend:
          service:
            name: service-two-image
            port:
              number: 80
        pathType: Exact
And now it should be updated to look like this:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
  namespace: prod
  annotations:
    kubernetes.io/ingress.class: azure/application-gateway
    appgw.ingress.kubernetes.io/backend-path-prefix: "/"
    appgw.ingress.kubernetes.io/ssl-redirect: "true"
    cert-manager.io/cluster-issuer: letsencrypt-staging
spec:
  tls:
  - hosts:
    - service-one.mydomain.xyz
    - service-two.mydomain.xyz
    secretName: my-services-tls
  rules:
  - host: service-one.mydomain.xyz
    http:
      paths:
      - path: /
        backend:
          service:
            name: service-one-image
            port:
              number: 80
        pathType: Exact
  - host: service-two.mydomain.xyz
    http:
      paths:
      - path: /
        backend:
          service:
            name: service-two-image
            port:
              number: 80
        pathType: Exact
The main additions are:
Referencing the ClusterIssuer in the annotations section
Adding a TLS block, with the hosts (my subdomains) and the secret.
Note that I've set only one secret for both hosts. The generated certificate will be issued in the name of the first host, and will include the second host in the certificate's Subject Alternative Name property, so that it covers both hosts. Setting multiple secrets has proven problematic for me.
Upgrade your Ingress using Helm: helm upgrade my-chart ./my-chart -n prod
or, if you don't use Helm: kubectl apply -f my-ingress.yaml -n prod
Verify results
This triggers a chain of reactions, which is explained in cert-manager's troubleshooting guide. The methods mentioned there for troubleshooting the resources with kubectl describe helped me a lot.
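The chain runs from the Certificate through the CertificateRequest and Order down to the Challenge, so describing them in that order tends to show where things stopped. A minimal sketch, with a hypothetical helper name describe_issuance_chain and the prod namespace assumed from this setup:

```shell
#!/bin/bash
# Sketch: describe each cert-manager resource kind in the order the
# issuance flows, so the first failing step is easy to spot.
describe_issuance_chain() {
  local namespace="$1"
  local kind
  for kind in certificate certificaterequest order challenge; do
    echo "=== $kind ==="
    kubectl describe "$kind" --namespace "$namespace"
  done
}

# Usage (uncomment to run against your cluster):
# describe_issuance_chain prod
```

The Events section at the bottom of each description is usually the most informative part.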
If you're using Lens, you'll be able to notice a new Ingress being created, as well as a new Certificate and CertificateRequest (CRDs of cert-manager.io), and a new Challenge and Order (CRDs of acme.cert-manager.io).
After the challenge is accepted, the new Ingress will disappear, and so will the Challenge.
It is expected that the Certificate will have a status of Ready: True, and so should the ClusterIssuer. The Order should be in the valid state.
And most importantly: the my-services-tls secret should get its tls.key and tls.crt values filled by cert-manager.
In Azure -> Application Gateway, you can expect to see listeners on ports 80 and 443. If all is set correctly, you should be able to navigate to your services using your host URLs and see them with a TLS certificate (untrusted, as it's staging only), after ignoring the browser's warnings.
Inspect the certificates. Verify that both your hosts appear in the Certificate Subject Alternative Name of the certificate.
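The Subject Alternative Name entries can also be read from the console instead of the browser. The sketch below assumes openssl is installed locally; the function names list_sans and served_sans are illustrative only.

```shell
#!/bin/bash
# Sketch: read a PEM certificate on stdin and print one DNS name per line.
list_sans() {
  openssl x509 -noout -text \
    | grep -A1 'Subject Alternative Name' \
    | tail -n 1 \
    | tr ',' '\n' \
    | sed -e 's/^[[:space:]]*//' -e 's/^DNS://'
}

# Sketch: fetch the certificate served for a host (with SNI set to the
# same host) and list its names.
served_sans() {
  local host="$1"
  openssl s_client -connect "$host:443" -servername "$host" </dev/null 2>/dev/null \
    | list_sans
}

# Usage (uncomment to run against your own host):
# served_sans service-one.mydomain.xyz
```

Both subdomains should show up in the output, whichever of the two hosts is queried.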
If everything is OK, it's time to move to the production environment of Let's Encrypt.
Move to Production Env. of Let's Encrypt
1. Update your Ingress
Change the line: cert-manager.io/cluster-issuer: letsencrypt-staging
to: cert-manager.io/cluster-issuer: letsencrypt-prod
2. Replace ClusterIssuer
Delete your ClusterIssuer, and create a new one, with:
metadata:
  name: letsencrypt-prod
...
    server: https://acme-v02.api.letsencrypt.org/directory
...
    privateKeySecretRef:
      # Secret resource used to store the account's private key.
      name: letsencrypt-prod
Full updated ClusterIssuer:
#!/bin/bash
kubectl apply -f - <<EOF
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    # You must replace this email address with your own.
    # Let's Encrypt will use this to contact you about expiring
    # certificates, and issues related to your account.
    email: <set your email here>
    # The staging environment does not issue trusted certificates; it is
    # only used to verify that the process works before moving to production.
    # Its URL is kept here, commented out, for reference:
    # server: https://acme-staging-v02.api.letsencrypt.org/directory
    # ACME server URL for Let’s Encrypt’s production environment.
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      # Secret resource used to store the account's private key.
      name: letsencrypt-prod
    # Enable the HTTP-01 challenge provider
    # you prove ownership of a domain by ensuring that a particular
    # file is present at the domain
    solvers:
    - http01:
        ingress:
          class: azure/application-gateway
EOF
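The move from staging to production comes down to two substitutions: the ACME server URL and the letsencrypt-staging name. If you keep the staging manifest in a file, the production one could be derived from it rather than edited by hand. This is a sketch only; the function name to_production and the file name in the usage line are hypothetical.

```shell
#!/bin/bash
# Sketch: rewrite a staging ClusterIssuer manifest read from stdin so that
# it targets the Let's Encrypt production endpoint and the -prod names.
to_production() {
  sed -e 's#acme-staging-v02\.api\.letsencrypt\.org#acme-v02.api.letsencrypt.org#g' \
      -e 's/letsencrypt-staging/letsencrypt-prod/g'
}

# Usage, assuming the staging manifest was saved as cluster-issuer-staging.yaml:
# to_production < cluster-issuer-staging.yaml | kubectl apply -f -
```

The same filter would apply to the cluster-issuer annotation in an Ingress manifest, since it replaces every occurrence of the staging name. Comments inside the manifest are rewritten as well, so they may need a manual touch-up afterwards.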
You can delete the letsencrypt-staging secret from your namespace.
A new secret named letsencrypt-prod will be created after successfully moving to the production environment.
Note: If you encounter issues with getting rid of the remnants of the staging environment, just remove cert-manager and all its resources, as explained here. Also remove and re-create the my-services-tls secret, and remove the letsencrypt-staging secret from your namespace. Then re-install cert-manager as mentioned above, just this time configure the ClusterIssuer and Ingress to work with the production environment from the get-go.
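The secret and issuer clean-up described in the note could look roughly like the sketch below. It only covers the staging ClusterIssuer and the two secrets; it does not uninstall cert-manager itself. The function name remove_staging_leftovers is hypothetical, and the resource names are the ones used throughout this post.

```shell
#!/bin/bash
# Sketch: delete the staging ClusterIssuer, its account-key secret and the
# TLS secret. Missing resources are skipped instead of raising an error.
remove_staging_leftovers() {
  local namespace="$1"
  kubectl delete clusterissuer letsencrypt-staging --ignore-not-found
  kubectl delete secret letsencrypt-staging my-services-tls \
    --namespace "$namespace" --ignore-not-found
}

# Usage (uncomment to run against your cluster), followed by re-creating
# the empty TLS secret from secret.yaml:
# remove_staging_leftovers prod
# kubectl apply -f secret.yaml -n prod
```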
Helpful resources
The following resources helped me during this setup:
This answer helped me get rid of a stuck challenge, which got stuck due to an incorrect setup;
As mentioned above, the cert-manager troubleshooting guide is a great source for understanding what's going on;
In general, the cert-manager documentation is worth reading.