Vault HA with TLS on K8S

Manual Unseal and Auto Unseal with GCP-CKMS

Yuwei Sung
3 min read · Dec 21, 2021

There are many tutorials and issue threads about this topic, but I still hit errors like “bad certificate” and “signed by unknown CA” spilling out of the container logs. Here is how I set up Vault (via its Helm chart) with HA and TLS enabled on K8S.

To turn on TLS on the Vault endpoint (127.0.0.1:8200), we create a wildcard certificate signed by the kubernetes.io/kubelet-serving signer.
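Here is a minimal sketch of the OpenSSL config and commands; the file names, the CN, and the *.vault-internal SANs are assumptions based on the Helm chart’s default service names, so adjust them to your cluster.

```bash
# vault-csr.conf -- a sketch; the CN and SANs are assumptions based on
# the Vault Helm chart's default service names.
cat > vault-csr.conf <<EOF
[req]
default_bits = 2048
prompt = no
default_md = sha256
distinguished_name = dn
req_extensions = v3_req

[dn]
O = system:nodes
CN = system:node:*.vault.svc.cluster.local

[v3_req]
basicConstraints = CA:FALSE
keyUsage = nonRepudiation, digitalSignature, keyEncipherment
extendedKeyUsage = serverAuth, clientAuth
subjectAltName = @alt_names

[alt_names]
DNS.1 = *.vault-internal
DNS.2 = *.vault-internal.vault.svc.cluster.local
DNS.3 = *.vault
IP.1 = 127.0.0.1
EOF

# Generate the private key and the CSR.
openssl genrsa -out vault.key 2048
openssl req -new -key vault.key -out vault.csr -config vault-csr.conf
```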

Note that we need “O = system:nodes” and “CN = system:node:…” in the distinguished_name section, “extendedKeyUsage = serverAuth, clientAuth” in the req_extensions section, and ALL DNS names/IPs in the subjectAltName section. The above commands should return two files: 1) vault.key and 2) vault.csr.

We wrap vault.csr in a YAML manifest and send the CSR to the K8S API.
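A sketch of the manifest and the approval step follows; the CSR object name “vault-csr” is an assumption:

```bash
# Wrap the base64-encoded CSR in a CertificateSigningRequest manifest.
# The object name "vault-csr" is an assumption.
cat > vault-csr.yaml <<EOF
apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
metadata:
  name: vault-csr
spec:
  signerName: kubernetes.io/kubelet-serving
  request: $(base64 vault.csr | tr -d '\n')
  usages:
  - digital signature
  - key encipherment
  - server auth
  - client auth
EOF

kubectl apply -f vault-csr.yaml
kubectl certificate approve vault-csr
```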

Note that we need “spec.signerName: kubernetes.io/kubelet-serving” for the new K8S CSR spec. We also use the same certificate for server and client authentication (mTLS), so we add usages = [“digital signature”, “key encipherment”, “server auth”, “client auth”] in the YAML. If you don’t want to share one certificate between server and client authentication, you can repeat the above process for another certificate.

Next, we get the K8S CA from the kubeconfig and create a TLS secret in the target namespace. The TLS secret contains the private key, the certificate, and the CA. If you have an intermediate CA, you should concatenate it with the certificate, leaf certificate first (cat vault.crt intermediate-ca.crt > vault-chain.crt).
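A sketch of those steps, assuming the namespace is vault and the secret is named vault-tls:

```bash
# Fetch the signed certificate and the cluster CA from the kubeconfig.
# Namespace "vault" and secret name "vault-tls" are assumptions.
kubectl get csr vault-csr -o jsonpath='{.status.certificate}' | base64 -d > vault.crt
kubectl config view --raw --minify --flatten \
  -o jsonpath='{.clusters[].cluster.certificate-authority-data}' | base64 -d > vault.ca

kubectl create namespace vault
kubectl create secret generic vault-tls -n vault \
  --from-file=vault.key --from-file=vault.crt --from-file=vault.ca
```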

Now we are ready to deploy the Vault Helm chart with the following overrides.
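A sketch of the overrides; the secret name and the /vault/userconfig mount path follow the vault-tls secret above, and the replica count is an assumption:

```yaml
# overrides.yaml -- a sketch; secret names and paths are assumptions.
global:
  enabled: true
  tlsDisable: false
server:
  extraEnvironmentVars:
    VAULT_CACERT: /vault/userconfig/vault-tls/vault.ca
    VAULT_TLSCERT: /vault/userconfig/vault-tls/vault.crt
    VAULT_TLSKEY: /vault/userconfig/vault-tls/vault.key
  extraVolumes:
    - type: secret
      name: vault-tls
  standalone:
    enabled: false
  ha:
    enabled: true
    replicas: 3
    raft:
      enabled: true
      config: |
        ui = true
        listener "tcp" {
          address            = "[::]:8200"
          cluster_address    = "[::]:8201"
          tls_cert_file      = "/vault/userconfig/vault-tls/vault.crt"
          tls_key_file       = "/vault/userconfig/vault-tls/vault.key"
          tls_client_ca_file = "/vault/userconfig/vault-tls/vault.ca"
        }
        storage "raft" {
          path = "/vault/data"
        }
        disable_mlock = true
```

Deploy it with, for example:

```bash
helm repo add hashicorp https://helm.releases.hashicorp.com
helm install vault hashicorp/vault -n vault -f overrides.yaml
```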

Note that we specify some TLS vars in the overrides: global.tlsDisable is set to false to enable TLS, and server.standalone.enabled is set to false (the Helm chart defaults to standalone). Make sure the VAULT_CACERT/VAULT_TLSCERT/VAULT_TLSKEY environment variables and the listener’s tls_* file paths map to your “extraVolumes” mount. Also, we set disable_mlock = true in the raft config.

After the deployment, you will find that the pods are not ready because we don’t define the health and readiness probes in our overrides. The reason is that we don’t have a vault-init sidecar to auto-init Vault, and the health and readiness probes would kill the pods after a few tries. This means we need to “manually” init, unseal, and raft-join the cluster.
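A sketch of that sequence; the pod names and mount paths follow the overrides above and are assumptions:

```bash
# Initialize the first pod and capture the unseal key and root token.
kubectl exec -n vault vault-0 -- vault operator init \
  -key-shares=1 -key-threshold=1 -format=json > init.json

# Unseal the first pod with the key from init.json.
kubectl exec -n vault vault-0 -- vault operator unseal $VAULT_UNSEAL_KEY

# Join each standby pod to the raft cluster over TLS, then unseal it.
kubectl exec -n vault vault-1 -- sh -c '
  vault operator raft join \
    -address=https://vault-1.vault-internal:8200 \
    -leader-ca-cert="$(cat /vault/userconfig/vault-tls/vault.ca)" \
    -leader-client-cert="$(cat /vault/userconfig/vault-tls/vault.crt)" \
    -leader-client-key="$(cat /vault/userconfig/vault-tls/vault.key)" \
    https://vault-0.vault-internal:8200'
kubectl exec -n vault vault-1 -- vault operator unseal $VAULT_UNSEAL_KEY

# Repeat the join/unseal pair for vault-2.
```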

As you can see, the manual raft join involves many arguments. Be aware of the order of the vault command, subcommand, arguments, and options.

After the manual init, unseal, and raft join, we turn on the health and readiness probes in the overrides and upgrade the Helm release.
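Something like the following, using the chart’s probe settings; the paths and the delay value are assumptions:

```yaml
# Added to overrides.yaml -- a sketch; probe paths and the initial
# delay are assumptions.
server:
  readinessProbe:
    enabled: true
    path: "/v1/sys/health?standbyok=true"
  livenessProbe:
    enabled: true
    path: "/v1/sys/health?standbyok=true"
    initialDelaySeconds: 60
```

followed by helm upgrade vault hashicorp/vault -n vault -f overrides.yaml.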

So far, we have a working K8S Vault HA cluster with TLS on. This is how I troubleshot the errors: turning on TLS, auto-join, and autopilot one at a time. However, this is pretty tedious. If one pod goes down, you need to unseal that Vault instance with the keys again. HashiCorp has moved auto-unseal support into the open source version, so next we explore auto-unseal with GCP CKMS. Let’s reset the cluster for learning purposes.

What does Vault need from GCP CKMS? First, a keyring and a key in CKMS. Second, a service account with get/encrypt/decrypt privileges bound to the key (cloudkms.cryptoKeyVersions.{useToEncrypt,useToDecrypt} and cloudkms.cryptoKeys.get). We use the gcloud command line to get those items ready for Vault.
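A sketch with gcloud; the project, keyring, key, and service account names are assumptions. The predefined roles/cloudkms.cryptoKeyEncrypterDecrypter role covers the encrypt/decrypt permissions; cryptoKeys.get may require roles/cloudkms.viewer or a custom role:

```bash
# Keyring and key; names and location are assumptions.
gcloud kms keyrings create vault-keyring --location=global
gcloud kms keys create vault-key \
  --keyring=vault-keyring --location=global --purpose=encryption

# Service account bound to the key, plus a JSON credentials file.
# Replace "my-project" with your GCP project ID.
gcloud iam service-accounts create vault-unseal
gcloud kms keys add-iam-policy-binding vault-key \
  --keyring=vault-keyring --location=global \
  --member="serviceAccount:vault-unseal@my-project.iam.gserviceaccount.com" \
  --role="roles/cloudkms.cryptoKeyEncrypterDecrypter"
gcloud iam service-accounts keys create credentials.json \
  --iam-account=vault-unseal@my-project.iam.gserviceaccount.com
```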

Next, we create a secret in the vault namespace just like the TLS secret.
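For example (the secret name kms-creds is an assumption):

```bash
# Store the service-account credentials next to the TLS secret.
kubectl create secret generic kms-creds -n vault --from-file=credentials.json
```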

Once we have the key ready in GCP CKMS, we can configure the overrides to add the key.
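A sketch of the additions; the secret name, mount path, and GCP identifiers are assumptions and must match the gcloud setup above:

```yaml
# Additions to overrides.yaml -- a sketch; names are assumptions.
server:
  extraEnvironmentVars:
    GOOGLE_APPLICATION_CREDENTIALS: /vault/userconfig/kms-creds/credentials.json
  extraVolumes:
    - type: secret
      name: vault-tls
    - type: secret
      name: kms-creds
  ha:
    raft:
      config: |
        # ...the listener and storage stanzas from earlier stay here...
        seal "gcpckms" {
          project    = "my-project"
          region     = "global"
          key_ring   = "vault-keyring"
          crypto_key = "vault-key"
        }
```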

Note that we add another secret volume for the CKMS credentials and reference it in the seal “gcpckms” stanza. This tells Vault to use the gcpckms key to seal/unseal. Let’s redeploy the Helm chart with the overrides and run a manual init (we will discuss how to “auto-init” later).
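For example:

```bash
# Redeploy and initialize. With gcpckms, init returns recovery keys
# instead of unseal keys, and the pods unseal themselves.
helm upgrade vault hashicorp/vault -n vault -f overrides.yaml
kubectl exec -n vault vault-0 -- vault operator init -format=json > init.json
```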

We can see the pods are running fine.

The vault status output shows a recovery seal type, which means the cluster is using CKMS for auto-unseal.

Stay tuned for the “auto-init” part.
