Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve direct connections to Antrea API in antctl #5135

Conversation

antoninbas
Copy link
Contributor

@antoninbas antoninbas commented Jun 16, 2023

For some commands (get featuregates, supportbundle, proxy), antctl
connects directly to the Agent / Controller API when it is run from
outside of the cluster.

We try to address some shortcomings in the implementation:

  1. Antctl was giving priority to the Node's InternalIP to determine how
    to connect to the API. This doesn't work when the machine on which
    antctl runs doesn't have connectivity to the InternalIP (e.g., if I
    am running antctl on my laptop and Antrea is installed in an EKS
    cluster). To fix this issue, we instead give priority to the Node's
    ExternalIP.
  2. The connections were always "insecure" (no TLS verification). To fix
    this we need to retrieve the correct CA certificate and use it in the
    client TLS config. For the Controller, the CA certificate is
    available in the kube-ssytem/antrea-ca ConfigMap, which is easy to
    retrieve. For the Agent, the self-signed certificate is now published
    as part of the AntreaAgentInfo CRD (field name APICertData), and
    hence is available to antctl. We use []byte as the field type as it
    feels more common, but string would also have been acceptable for
    that type of data.

An --insecure flag is available for these commands, if users want to
fallback to the previous behavior.

Signed-off-by: Antonin Bas abas@vmware.com

@antoninbas antoninbas marked this pull request as ready for review June 16, 2023 01:47
@antoninbas
Copy link
Contributor Author

@tnqn please let me know if you think this is an acceptable approach
I haven't updated build/charts/antrea/templates/antctl/clusterrole.yaml as it is already very out-of-date. For example it seems to me that getClusterInfo used for supportbundle really needs a lot of permissions that are currently missing:

func getClusterInfo(w io.Writer, k8sClient kubernetes.Interface) error {

I am not sure what we should do there (should we remove build/charts/antrea/templates/antctl/clusterrole.yaml altogether?).

@antoninbas
Copy link
Contributor Author

/test-windows-all
/test-all

Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM.

Comment on lines 130 to 132
certPath := path.Join(agentapiserver.CertDirectory, agentapiserver.CertPairName+".crt")
cmd := []string{"cat", certPath}
stdout, _, err := runCommandFromPodWrapper(ctx, client, kubeconfig, podNamespace, podName, "antrea-agent", cmd)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it makes sense to publish the CA certificate to a field of AntreaAgentInfo? Then antctl could get APIPort and CACertificate from the same place, and may be helpful when other clients want to connect agent API securely without requiring the Pod exec permission.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's not a bad idea. Let me think about it a bit more, and I'll make the change tomorrow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went ahead with that change, thanks for the suggestion.
It is better as it enables support for non-Pod use cases (Windows, Baremetal).
My one concern is that it increases the size of the CRs. I am not sure if it can eventually become an issue for large clusters with many Nodes. As a reminder, we call Update every minute to update the CR. But there is a solution we could consider: using Patch instead of Update so that data that is static / has not changed does not need to be sent to the APIServer. I am not sure whether this is something that has been discussed before?

Copy link
Member

@tnqn tnqn Jun 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be fine. A CA bundle's size is about 1,460 bytes (if we publish CA bundle only). Even in a cluster of 1000 Nodes, it just increases the storage size 1,460 KB * N (a couple of history versions).
Using Patch to update AntreaAgentInfo makes sense to me. I don't remember if we discussed it before, but it was not so important previously when it just contained a few small fields.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's twice that (about 3KB) with the current bundle (cert + CA) but your observation still holds.

@tnqn
Copy link
Member

tnqn commented Jun 16, 2023

I haven't updated build/charts/antrea/templates/antctl/clusterrole.yaml as it is already very out-of-date. For example it seems to me that getClusterInfo used for supportbundle really needs a lot of permissions that are currently missing:

func getClusterInfo(w io.Writer, k8sClient kubernetes.Interface) error {

I am not sure what we should do there (should we remove build/charts/antrea/templates/antctl/clusterrole.yaml altogether?).

I think we should either add a e2e test to actually validate each command can be executed correctly with an account bound to the clusterrole, or we should remove the clusterrole. But the latter means user can only use admin account to operate antctl, which may be a not good practice. I incline to the former.

@antoninbas
Copy link
Contributor Author

I think we should either add a e2e test to actually validate each command can be executed correctly with an account bound to the clusterrole, or we should remove the clusterrole. But the latter means user can only use admin account to operate antctl, which may be a not good practice. I incline to the former.

I'll open an issue to track this

Copy link
Contributor

@luolanzone luolanzone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall, two nits.

pkg/antctl/raw/helper.go Outdated Show resolved Hide resolved
pkg/antctl/raw/proxy/command.go Show resolved Hide resolved
@antoninbas antoninbas force-pushed the improve-direct-connections-to-antrea-api-in-antctl branch from ff10769 to 0e04b49 Compare June 16, 2023 19:51
APIPort int `json:"apiPort,omitempty"`
// The self-signed certificate used to serve the Antrea Agent API
APICertData []byte `json:"apiCertData,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind renaming it, since we use this as a CABundle. The fact that the actual certificate is included can be seen as irrelevant.

if secureServingInfo == nil {
return nil
}
cert, _ := secureServingInfo.Cert.CurrentCertKeyContent()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember this will include both the generate cert and the CA cert. Should we extract the CA cert and publish it only to reduce the overall size?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I think something like this would work:

block, _ := pem.Decode(cert)
certs := x509.ParseCertificates(block)
intermediates := certs[1:] // distribute this only

I am not sure this is the best approach. It feels like it's more common to distribute the cert "file" as is, in the self-signed case. That's also what we do for the Antrea Controller (when we call dynamiccertificates.NewDynamicCAContentFromFile after generating the certificate). That may be best practice for a standalone self-signed certificate (as opposed to the case where the same self-signed CA is used to generate multiple certificates). I feel more comfortable keeping it this way for now. What do you think?

By renaming the field to APICABundle as you suggested, we are free to change the contents of the published data later and remove the leaf certificate.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

works for me

For some commands (get featuregates, supportbundle, proxy), antctl
connects directly to the Agent / Controller API when it is run from
outside of the cluster.

We try to address some shortcomings in the implementation:

1) Antctl was giving priority to the Node's InternalIP to determine how
   to connect to the API. This doesn't work when the machine on which
   antctl runs doesn't have connectivity to the InternalIP (e.g., if I
   am running antctl on my laptop and Antrea is installed in an EKS
   cluster). To fix this issue, we instead give priority to the Node's
   ExternalIP.
2) The connections were always "insecure" (no TLS verification). To fix
   this we need to retrieve the correct CA certificate and use it in the
   client TLS config. For the Controller, the CA certificate is
   available in the kube-ssytem/antrea-ca ConfigMap, which is easy to
   retrieve. For the Agent, the self-signed certificate is now published
   as part of the AntreaAgentInfo CRD (field name APICertData), and
   hence is available to antctl. We use `[]byte` as the field type as it
   feels more common, but `string` would also have been acceptable for
   that type of data.

An `--insecure` flag is available for these commands, if users want to
fallback to the previous behavior.

Signed-off-by: Antonin Bas <abas@vmware.com>
Signed-off-by: Antonin Bas <abas@vmware.com>
@antoninbas antoninbas force-pushed the improve-direct-connections-to-antrea-api-in-antctl branch from 0e04b49 to 94eba8b Compare June 20, 2023 18:26
@antoninbas antoninbas requested a review from tnqn June 20, 2023 18:26
@antoninbas antoninbas added api-review Categorizes an issue or PR as actively needing an API review. area/api Issues or PRs related to an API. area/component/antctl Issues or PRs releated to the command line interface component action/release-note Indicates a PR that should be included in release notes. labels Jun 20, 2023
@antoninbas
Copy link
Contributor Author

/test-all

Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@antoninbas antoninbas merged commit 191a6ac into antrea-io:main Jun 21, 2023
@antoninbas antoninbas deleted the improve-direct-connections-to-antrea-api-in-antctl branch June 21, 2023 03:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
action/release-note Indicates a PR that should be included in release notes. api-review Categorizes an issue or PR as actively needing an API review. area/api Issues or PRs related to an API. area/component/antctl Issues or PRs releated to the command line interface component
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants