Automating Internal Certificate Issuance With an ACME-Based Certificate Authority

This post is about the certificate authority (CA) that we brought to life at Deutsche Telekom Pan-Net and ran for a few years as a virtual machine (VM) deployment. We recently moved to Kubernetes, and I cleaned up the old deployment of around 40 VMs that still lived in OpenStack. I took this cleanup as an opportunity to reflect on what we actually did there.
To lower the subsequent operational burden, we decided to use a certificate authority that leverages the Automated Certificate Management Environment (ACME) protocol for automated certificate issuance and renewal. The best-known CA that uses the ACME protocol is Let’s Encrypt Boulder, and it was the only one with accessible source code when we started doing proofs of concept in 2018. That is no surprise, as Let’s Encrypt is behind the ACME standard itself.
I am by no means an expert on the Boulder code base, so take the description of the software in this post with a grain of salt. Also note that the Boulder version I am familiar with is an old one, from 2019.

Architecture

The Boulder certificate authority has a microservice architecture consisting of multiple services, each handling a critical part of the job. The services communicate with each other over mutually authenticated, TLS-secured gRPC connections. That means the gRPC layer needs its own certificate authority that can sign certificates for each Boulder component. For testing, Let’s Encrypt engineers use minica, an easy-to-use private certificate authority for scenarios where all the services that need a TLS certificate are under your own operation.
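
To illustrate what "mutually authenticated" means in practice, here is a minimal sketch using Python's standard `ssl` module. It is not Boulder code (Boulder is written in Go and uses gRPC), and the function name and file parameters are hypothetical. The key point is that the server side also demands a certificate from the client and verifies it against the internal CA. The file paths are optional only so that the sketch can be exercised without real key material.

```python
import ssl


def mutual_tls_server_context(ca_file=None, cert_file=None, key_file=None):
    """Build a server-side TLS context that requires client certificates.

    Hypothetical helper for illustration only; the file arguments would
    point at material signed by the internal CA (for example one created
    with minica).
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # The "mutual" part: reject any client that presents no valid certificate.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if ca_file:
        # Trust only the internal CA when verifying client certificates.
        ctx.load_verify_locations(cafile=ca_file)
    if cert_file:
        # Certificate and key that this service presents to its peers.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx
```

Each service would need its own certificate and key from that internal CA, which is why the gRPC layer ends up needing a CA of its own.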

Only two Boulder components need inbound connections: the Web Front End API and the Online Certificate Status Protocol (OCSP) endpoint. The Publisher component needs outgoing connections to publish data to certificate transparency (CT) logs, but this depends on your actual needs. To run a CA you must also have an SQL database, some kind of hardware security module (HSM), and an external CT log API available, so you can publish signed certificates for external audits. Of course, if you are running an internal CA you do not need a real CT log API, but the software stack expects one to be available; otherwise, issuance is not possible without source code changes.

Boulder Certificate Authority Architecture

Components:

Deployment and Operation

The main operations are operating system (OS) upgrades and Boulder binary upgrades. Another important operation is key management, where you need to rotate the certificate of the certificate authority itself.
We used the Terraform OpenStack Provider to deploy our VMs. We used a blue/green deployment strategy, so OS and Boulder upgrades became a routine procedure. Provisioning would create a new blue or green VM deployment with a completely new OS image and software version, and then the pipeline would test component availability and actual certificate issuance. If everything worked, we would switch the active color on the load balancers and subsequently destroy the old infrastructure. The Boulder code base is developed in a professional way, where each new software version supports the config options of the previous one, so each upgrade can easily be rolled back if a problem occurs. Database migrations are released in the same spirit: the current migration only extends tables, so the previous Boulder version can still use the resulting SQL schema. How the Let’s Encrypt (LE) engineers handle code changes can be learned by reading the contribution guide.
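
The blue/green flow described above can be summarized as a small piece of control logic. The sketch below is not our actual pipeline. The four callables (`provision`, `smoke_test`, `switch`, `destroy`) are hypothetical stand-ins for the Terraform runs, the availability and issuance tests, and the load balancer change.

```python
from typing import Callable


def other_color(color: str) -> str:
    """Return the inactive color for a blue/green pair."""
    return "green" if color == "blue" else "blue"


def blue_green_upgrade(
    active: str,
    provision: Callable[[str], None],
    smoke_test: Callable[[str], bool],
    switch: Callable[[str], None],
    destroy: Callable[[str], None],
) -> str:
    """Run one upgrade cycle and return the color that is active afterwards."""
    candidate = other_color(active)
    provision(candidate)           # fresh VMs with new OS image and software
    if not smoke_test(candidate):  # component availability + test issuance
        destroy(candidate)         # failed: the old deployment stays untouched
        return active
    switch(candidate)              # point the load balancers at the new color
    destroy(active)                # tear down the old infrastructure
    return candidate
```

The point of this ordering is that the active deployment is never modified until the candidate has passed its tests, so a failed upgrade costs nothing but the provisioning time.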

GitLab pipeline used for Boulder VM deployment

The CA key lifecycle requires more planning. During the change, you need both the old key and the new key available, as the OCSP endpoints still need to issue valid statements about the leaf certificates issued under the previous CA key. The easiest method is to have a separate OCSP service, with its own DNS address, for each CA key. You also need to maintain the required components, like the CA, for each key.
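
One way to picture the "separate OCSP service per CA key" idea is as a small inventory of issuer keys. The sketch below is illustrative only: the key names and hostnames are made up, and it does not reflect Boulder's configuration format. Only one key signs new certificates, but every key keeps its own OCSP address for as long as certificates issued under it may still be checked.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class IssuerKey:
    name: str       # hypothetical label for a CA key
    ocsp_url: str   # dedicated OCSP responder address for this key
    issuing: bool   # True only for the key that signs new certificates


# Hypothetical inventory during a key rotation: old key retired from
# issuance, but its OCSP responder is still served.
ISSUERS = [
    IssuerKey("ca-old", "http://ocsp-old.ca.example.internal", issuing=False),
    IssuerKey("ca-new", "http://ocsp-new.ca.example.internal", issuing=True),
]


def ocsp_url_for(issuer_name: str, issuers: List[IssuerKey]) -> str:
    """Return the OCSP address embedded in certificates signed by this key."""
    for issuer in issuers:
        if issuer.name == issuer_name:
            return issuer.ocsp_url
    raise KeyError(f"unknown issuer key: {issuer_name}")


def active_issuer(issuers: List[IssuerKey]) -> IssuerKey:
    """Return the single key used for new issuance, or fail loudly."""
    active = [issuer for issuer in issuers if issuer.issuing]
    if len(active) != 1:
        raise ValueError("exactly one issuer key must be marked as issuing")
    return active[0]
```

Because the OCSP address is fixed in each leaf certificate at issuance time, giving each key its own address means the old responder can be kept running, and later retired, independently of the new one.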

Closing Thoughts

Although Let’s Encrypt Boulder is production software, I do not recommend using it for an internal deployment unless you seriously consider allocating full-time engineers to deploy and maintain it. The reason is that Boulder is primarily the code base for the Let’s Encrypt public certificate authority and, as such, its development moves towards targets defined by the Internet Security Research Group. You can’t expect fast support except in cases where you find a critical bug.
These days, there are alternatives that are better documented and commercially supported, like the Smallstep ACME Registration Authority, PrimeKey EJBCA, and maybe others, where you do not need to watch for code changes and read commit messages.