Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added option to use custom TLS certificates instead of ACME #17

Merged
merged 11 commits into from
Oct 2, 2024

Conversation

kpumuk
Copy link
Contributor

@kpumuk kpumuk commented Sep 20, 2024

In some cases it is needed to use a custom TLS certificate:

  • For Cloudflare origin TLS
  • In a corporate environment with custom certificate authority (CA)
  • In a development environment with self-signed certificate
  • In cloud development environment and end to end testing pipelines
  • Some people still purchase certificates from "trusted vendor™"

With Traefic we had an option to specify a custom certificate, and so this change brings this feature into kamal-proxy.

PS. I am still looking into adding tests, wanted to gauge opinions whether this is a feature you are interested in.
covered by tests now

Related kamal changes

@kpumuk kpumuk force-pushed the custom-tls branch 3 times, most recently from bed2fb7 to a3d4a76 Compare September 23, 2024 10:36
@furkansahin
Copy link

This looks amazing, I was also wondering if it would be possible to define an endpoint to pull certificate and key files from a local endpoint? At @ubicloud, we provide certificates in a local endpoint and customer is requested to pull these files and make use of them. I can also try to create a PR if you prefer.

@kpumuk
Copy link
Contributor Author

kpumuk commented Sep 24, 2024

At @ubicloud, we provide certificates in a local endpoint and customer is requested to pull these files and make use of them.

Do you mean like a local CA, or simply a secrets manager that returns the same certificate on every call?

Edit: Found, https://www.ubicloud.com/docs/quick-start/using-kamal-with-ubicloud

I think with the proposed CertManager interface it will be trivial to implement a caching cert fetcher. Probably out of scope for this MR, unless authors disagree :)

Copy link
Collaborator

@kevinmcconnell kevinmcconnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking at this, @kpumuk! I agree it's something we need. I'd actually been planning to add it myself soon :)

I think the general approach here looks good. Agree that adding tests is a good idea. I've also left a few comments about some specifics. If you want to address those, great. I'd also be happy to take it from here and finish it up if you prefer, as I have some other (unrelated) changes to make around the TLS handling that will probably touch on this too.

Comment on lines 18 to 19
tlsCertificatePath string
tlsPrivateKeyPath string
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use the members of the ServiceOptions directly, so don't need to add these here. Similar to how we're handling the ErrorPagePath.

(The other TLS variables here are because they don't exactly match the format that's in the ServiceOptions struct, so they get handled a bit differently).

@@ -33,6 +35,8 @@ func newDeployCommand() *deployCommand {

deployCommand.cmd.Flags().BoolVar(&deployCommand.tls, "tls", false, "Configure TLS for this target (requires a non-empty host)")
deployCommand.cmd.Flags().BoolVar(&deployCommand.tlsStaging, "tls-staging", false, "Use Let's Encrypt staging environment for certificate provisioning")
deployCommand.cmd.Flags().StringVar(&deployCommand.tlsCertificatePath, "tls-certificate-path", "", "")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So as above, these can refer to the struct fields like:

&deployCommand.args.ServiceOptions.TLSCertificatePath

Comment on lines 99 to 105
if cmd.Flags().Changed("tls-certificate-path") && !cmd.Flags().Changed("tls-private-key-path") {
return fmt.Errorf("tls-private-key-path must be set when specified tls-certificate-path")
}

if cmd.Flags().Changed("tls-private-key-path") && !cmd.Flags().Changed("tls-certificate-path") {
return fmt.Errorf("tls-certificate-path must be set when specified tls-private-key-path")
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could use MarkFlagsRequiredTogether here instead of checking them manually, as it's a case that Cobra can handle. (Similar to how we're using MarkFlagsOneRequired here).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I initially tried MarkFlagsRequiredTogether, but the error was looking pretty weird:

Error: if any flags in the group [tls-certificate-path tls-private-key-path] are set they must all be set; missing [tls-private-key-path]

Question of usability?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the branch

"tls-private-key-path", m.tlsPrivateKeyFilePath,
)

cert, err := tls.LoadX509KeyPair(m.tlsCertificateFilePath, m.tlsPrivateKeyFilePath)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be called for every TLS handshake, which means the certificate files will be loaded multiple times. Better to load the certificate when the StaticCertManager is established, and reuse it for the lifetime of the deployment.

That would also let us catch a loading error early on, and fail the deployment at that point, which would be safer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would require more changes, as we right now do not expect createCertManager() to ever return an error.

Another option would be to cache the certificate on first use:

diff --git a/internal/server/cert.go b/internal/server/cert.go
index bb1580d..afaf740 100644
--- a/internal/server/cert.go
+++ b/internal/server/cert.go
@@ -9,9 +9,11 @@ type CertManager interface {
 	GetCertificate(hello *tls.ClientHelloInfo) (*tls.Certificate, error)
 }
 
+// StaticCertManager is a certificate manager that loads certificates from disk.
 type StaticCertManager struct {
 	tlsCertificateFilePath string
 	tlsPrivateKeyFilePath  string
+	cert                   *tls.Certificate
 }
 
 func NewStaticCertManager(tlsCertificateFilePath, tlsPrivateKeyFilePath string) *StaticCertManager {
@@ -22,6 +24,10 @@ func NewStaticCertManager(tlsCertificateFilePath, tlsPrivateKeyFilePath string)
 }
 
 func (m *StaticCertManager) GetCertificate(*tls.ClientHelloInfo) (*tls.Certificate, error) {
+	if m.cert != nil {
+		return m.cert, nil
+	}
+
 	slog.Info(
 		"Loading custom TLS certificate",
 		"tls-certificate-path", m.tlsCertificateFilePath,
@@ -32,6 +38,7 @@ func (m *StaticCertManager) GetCertificate(*tls.ClientHelloInfo) (*tls.Certifica
 	if err != nil {
 		return nil, err
 	}
+	m.cert = &cert
 
-	return &cert, nil
+	return m.cert, nil
 }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied the patch and added a test. Let me know what you think

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about this, might need a mutex since there might be multiple procs loading certs at the same time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed, will work on a fix:

	manager := NewStaticCertManager(certPath, keyPath)
	go func() {
		manager.GetCertificate(&tls.ClientHelloInfo{})
	}()
	cert, err := manager.GetCertificate(&tls.ClientHelloInfo{})

Output of go test ./... -race:

WARNING: DATA RACE
Write at 0x00c0001e8e00 by goroutine 23:
  github.com/basecamp/kamal-proxy/internal/server.(*StaticCertManager).GetCertificate()
      /Users/dmytro/work/github/kamal-proxy/internal/server/cert.go:41 +0x254
  github.com/basecamp/kamal-proxy/internal/server.TestCertificateLoadingRaceCondition.func1()
      /Users/dmytro/work/github/kamal-proxy/internal/server/cert_test.go:49 +0x7c

Previous write at 0x00c0001e8e00 by goroutine 22:
  github.com/basecamp/kamal-proxy/internal/server.(*StaticCertManager).GetCertificate()
      /Users/dmytro/work/github/kamal-proxy/internal/server/cert.go:41 +0x254
  github.com/basecamp/kamal-proxy/internal/server.TestCertificateLoadingRaceCondition()
      /Users/dmytro/work/github/kamal-proxy/internal/server/cert_test.go:51 +0x270
  testing.tRunner()
      /Users/dmytro/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.23.1.darwin-arm64/src/testing/testing.go:1690 +0x184
  testing.(*T).Run.gowrap1()
      /Users/dmytro/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.23.1.darwin-arm64/src/testing/testing.go:1743 +0x40

Copy link
Contributor Author

@kpumuk kpumuk Sep 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Introduced an RWMutex in f1df35f to address it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option to avoid the mutex would be to attempt the file load in the NewStaticCertManager, and then save both the error and the response in the struct. GetCertificate would just return both, avoiding a delayed load, the need for mutex, etc.

This design might limit the options for (potential) future changes, like reloading the certificate on file change.

@furkansahin
Copy link

@kpumuk yes, I was specifically asking to eliminate the step https://www.ubicloud.com/docs/quick-start/using-kamal-with-ubicloud#step-9 :) It would make life much easier.

It makes sense that this is out of scope for this PR. I'll work on it and hopefully it's something useful to the community.

@kevinmcconnell
Copy link
Collaborator

I was also wondering if it would be possible to define an endpoint to pull certificate and key files from a local endpoint?

I don't think the proxy should be responsible for this. Using a Kamal hook seems like a better option to me. Or, if it turned out to be a common enough requirement, then perhaps it could even be a feature on the Kamal side. But I'd rather the proxy not be involved with gathering artefacts like this; they should be supplied to it wherever possible.

@kpumuk
Copy link
Contributor Author

kpumuk commented Sep 28, 2024

@kevinmcconnell seems like the documentation change was merged ahead of the functional change :-)

Do you want to change the initialization interface in this MR to return the error, or keep the mutex and refactor later?

Edit: might have been merged after this thread.

@kpumuk
Copy link
Contributor Author

kpumuk commented Sep 28, 2024

Benchmark WITHOUT mutex:

Running 30s test @ https://kamal.test:8091/
  5 threads and 5 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   451.35us  162.60us   7.68ms   93.69%
    Req/Sec     2.24k   101.26     2.46k    81.07%
  334834 requests in 30.00s, 264.72MB read
Requests/sec:  11159.76
Transfer/sec:      8.82MB

Benchmark WITH mutex:

Running 30s test @ https://kamal.test:8091/
  5 threads and 5 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   446.87us  144.43us   7.29ms   91.86%
    Req/Sec     2.26k    99.38     2.54k    83.11%
  338088 requests in 30.10s, 267.29MB read
Requests/sec:  11232.35
Transfer/sec:      8.88MB

So the only concern is about ergonomics — when the error is shown for inaccessible cert files - during deploy or during the first request.

@kpumuk
Copy link
Contributor Author

kpumuk commented Sep 30, 2024

@kevinmcconnell I have rebased the branch on top of recent changes, which required to add HTTPHandler to the certificate manager interface. Steps to hook it all up:

basecamp/kamal#969 (comment)

This requires two changes on kamal side:

  • Add new options (done)
  • Enable volumes directive for kamal-proxy to mount certificates

What do you think? Any ideas on how to simplify the setup?

Edit: Realized that proxy is shared and so might not be the brightest idea if every service mounted a volume .into the container...

Edit 2: Another option is to copy certs directly to the kamal-proxy config volume:

#!/bin/sh

set -euo pipefail

KAMAL_PROXY_TLS_CERT=$(op read "op://Private/Kamal Demo/cert.pem")
KAMAL_PROXY_TLS_PRIVATE_KEY=$(op read "op://Private/Kamal Demo/key.pem")

for ip in ${KAMAL_HOSTS//,/ }; do
  ssh -q -T -o BatchMode=yes ubuntu@"${ip}" bash --noprofile <<-EOF
    docker run --rm --volume kamal-proxy-config:/home/kamal-proxy/.config/kamal-proxy ubuntu:noble-20240801 sh -c "
      mkdir -p /home/kamal-proxy/.config/kamal-proxy/certs
      echo '${KAMAL_PROXY_TLS_CERT}' > /home/kamal-proxy/.config/kamal-proxy/certs/cert.pem
      echo '${KAMAL_PROXY_TLS_PRIVATE_KEY}' > /home/kamal-proxy/.config/kamal-proxy/certs/key.pem
    "
EOF
done

This can be used to improve ergonomics of the command, and in addition to low-level API to pass keys directly, Kamal can do something like:

proxy:
  ssl: true
  ssl_certificate: <%= KAMAL_PROXY_TLS_CERT %>
  ssl_private_key: <%= KAMAL_PROXY_TLS_PRIVATE_KEY %>

This way the values can come from a secret manager, and Kamal can store them in /home/kamal-proxy/.config/kamal-proxy/certs/<%= KAMAL_SERVICE %>/{cert,key}.pem automatically. This would eliminate the need of a custom hook and docker manipulation.

@kevinmcconnell kevinmcconnell mentioned this pull request Oct 1, 2024
@kevinmcconnell
Copy link
Collaborator

This all looks great, @kpumuk! I have an idea in mind for how to handle the errors earlier in the deploy process as we discussed, but that also has a few knock-on effects I want to consider. So I'll get this PR merged first as-is, and then I'll explore that separately. It doesn't need to hold up this feature.

This can be used to improve ergonomics of the command, and in addition to low-level API to pass keys directly, Kamal can do something like:
...

Yes, agreed. Using the existing mounted volume makes sense, and we can have Kamal scope it to the service name to avoid conflicts. (We'll need to do something similar for the --error-pages flag anyway). Treating the cert/key content as secrets rather than requiring file paths is a nice idea too, that seems much more convenient. /cc @djmb

@kevinmcconnell kevinmcconnell merged commit f6086a7 into basecamp:main Oct 2, 2024
1 check passed
@kevinmcconnell
Copy link
Collaborator

Thanks for your work on this, @kpumuk! 🙏

@djmb
Copy link
Contributor

djmb commented Oct 2, 2024

I've reverted that documentation change for now. We'd need to make it differently anyway as configuration docs are generated from proxy.yml by the bin/docs script in Kamal.

@kpumuk kpumuk deleted the custom-tls branch October 2, 2024 11:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants