
feat(deployment) Replace minio with seaweedfs as object store #10998

Draft: wants to merge 2 commits into master
Conversation

pschoen-itsc

Description of your changes:

Replaces the old MinIO deployment with SeaweedFS as the object store. SeaweedFS is licensed under Apache 2.0 and supports ACLs inside buckets, so read / write permissions for specific paths can be granted to specific users.

Checklist:


Hi @pschoen-itsc. Thanks for your PR.

I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.


[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign zijianjoy for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@pschoen-itsc
Author

For now I have just replaced the minio component in one of the overlays. I don't know what the best way to integrate this would be. New overlays for seaweedfs?
Also, how should we design the service? For now I keep using minio-service, because it is referenced statically in many places across the project. But we could also replace every occurrence.

@HumairAK
Collaborator

HumairAK commented Jul 11, 2024

@pschoen-itsc I think this definitely looks like something we should bring up in the KFP meeting for a wider discussion.

@juliusvonkohout
Member

juliusvonkohout commented Jul 11, 2024

  • let's use the same service and secret names as before, e.g. minio-secret
  • performance numbers
  • GUI with upload and download functionality (UI around their webdav interface) in the centraldashboard menu
  • Architectural proposal and diagrams

@juliusvonkohout
Member

Ah and I forgot

  • simple horizontal scalability by increasing replicas
  • can we scale with ReadWriteOnce PVCs as well?
  • should it be a statefulset or a deployment?
  • does it support something similar to MinIO's deprecated gateway mode?

@gregsheremeta
Contributor

gregsheremeta commented Jul 12, 2024

Back in March, I did some exploration into a minio replacement for the KFP community. I came up with a rough list of requirements, and then charted out various alternatives against those requirements.

(Note: these requirements are not authoritative. I mostly came up with them myself as a way to compare alternatives)

Requirements, from a Kubeflow Pipelines perspective:

R0 Can store and retrieve objects using S3 API
R1 Permissive license (Apache 2.0 is best, GPLv3 might be ok but would prefer to avoid, AGPL is not ok)
R2 Works in disconnected / on-prem environments (backed by local storage, e.g. PVCs)
R3 In cloud environments, can pass through to GCS or S3
R4 Can be automatically installed with a Kubeflow installer / no extra steps required
?? R5 Supports a developer mode where it is lightweight, low overhead (single pod is best).
Does this matter for production use cases? E.g. if I'm a large corporation storing petabytes worth of models, I want clustered object storage or possibly not even on prem.
R6 Doesn't require an operator / can be installed via simple manifests
?? Rn7 Shared server / zero-overhead namespaces
?? Rn8 Private namespace-scoped storage (e.g. global shared bucket with namespace-scoped subfolders)

This is the chart I came up with:
[chart comparing the alternatives against requirements R0-R8]

@pschoen-itsc would you be so kind as to give a quick look at these requirements and tell us how you think SeaweedFS does at fulfilling them? If we were to add SeaweedFS as a row 8 in that table, what would be in the columns? Thanks.

@pschoen-itsc
Author

@pschoen-itsc would you be so kind as to give a quick look at these requirements and tell us how you think SeaweedFS does at fulfilling them? If we were to add SeaweedFS as a row 8 in that table, what would be in the columns? Thanks.

R0: Yes (tested)
R1: Yes (Apache 2.0)
R2: Yes
R3: Yes (documented)
R4: Yes
R5: Yes (Deployment with single container tested, scaling horizontally is documented)
R6: Yes (tested)
R7: Yes
R8: Yes (tested)

@pschoen-itsc
Author

pschoen-itsc commented Jul 17, 2024

There is also a CLI to configure S3 access. SeaweedFS uses it in their own Helm charts to apply settings after deployment: they create a Job which connects to the running master / cluster and uses the CLI. So I think we could do something similar with the existing sync.py script.

I would say we generate the credentials in sync.py, then create the secret in the new namespace and create the identity in SeaweedFS with the permissions:

  • List:kubeflow
  • Read:kubeflow/project1/*
  • Write:kubeflow/project1/*

where "kubeflow" is the bucket name and "project1" the project name. Listing of objects is only possible to set globally, but as far as I know, this is just a limitation of S3.

  • let's use the same service and secret names as before, e.g. minio-secret

Keeping the service name is no problem, of course. The secret name should also work, because when we use the CLI for dynamic user creation, SeaweedFS does not have to read the secret itself. Changing credentials would then require manual intervention.

  • performance numbers

There are some in the official docs (which could be biased) and also some independent testing. Performance depends on how you set it up: a single service with all components, or scaled horizontally with multiple filer / volume nodes. They provide Helm charts for such setups, so I could run my own tests if needed.

  • GUI with upload and download functionality (UI around their webdav interface) in the centraldashboard menu
  • Architectural proposal and diagrams

There are multiple options, depending on the desired performance / fault tolerance / resource usage. You have the three main services master, volume and filer, which can be scaled independently, or you can run everything in one container. You could even deploy separate gateways for S3 / WebDAV / etc. instead of letting these run on the filer.
Here they tested a single instance writing 100 million files, with solid performance and resource usage. The downside of this setup is that after a restart, inspection of the volumes and indexes takes minutes. With a more distributed setup I assume you could reduce this time or even keep the service available the whole time.

@pschoen-itsc
Author

pschoen-itsc commented Jul 17, 2024

Ah and I forgot

  • simple horizontal scalability by increasing replicas

Their Helm chart suggests this is supported, but of course some testing should be done.

  • can we scale with ReadWriteOnce PVCs as well?

As far as I understand, yes.

  • should it be a statefulset or a deployment?

According to their Helm charts, all services (except a dedicated S3 service) should be statefulsets.

  • does it support something similar to MinIO's deprecated gateway mode?

Yes, see my comment to R3 above.

@juliusvonkohout
Member

Listing of objects can only be granted for the whole bucket, but as far as I know this is a limitation of S3 itself.

Can you elaborate a bit more on the global Listing?
I am fine with dropping the global listing, if they can still list their namespace (aws:username) folder.
Can we achieve

                            "arn:aws:s3:::%s/artifacts/*"  % shared_bucket_name, # old shared artifacts for backwards compatibility
                            "arn:aws:s3:::%s/private-artifacts/${aws:username}/*"  % shared_bucket_name, # private artifacts
                            "arn:aws:s3:::%s/private/${aws:username}/*"  % shared_bucket_name, # private storage
                            "arn:aws:s3:::%s/shared/*"  % shared_bucket_name # shared storage for collaboration

with read/write/list access?

  • let's use the same service and secret names as before, e.g. minio-secret

Keeping the service name is no problem, of course. The secret name should also work, because when we use the CLI for dynamic user creation, SeaweedFS does not have to read the secret itself. Changing credentials would then require manual intervention.

Who is then able to modify ACLs?
Can we keep an admin secret in the Kubeflow folder that is used in the sync.py script to change permissions?

  • GUI with upload and download functionality (UI around their webdav interface) in the centraldashboard menu
  • Architectural proposal and diagrams

You have the three main services master, volume and filer, which can be scaled independently, or you can run everything in one container.

Can we use a single statefulset that starts a pod with all three services and scale this statefulset or is scaling only possible with three statefulsets (master, volume, filer)?

Is 32 GB the file size limit for the time being?

@pschoen-itsc
Author

Can you elaborate a bit more on the global Listing? I am fine with dropping the global listing, if they can still list their namespace (aws:username) folder.

Because ListObjects is a bucket-level operation, its permissions cannot be scoped by the resource path the way PUT / GET permissions can. In AWS S3 there seems to be a way to approximate this using IAM policies with conditions, but even then the user must first be granted listing permission on the whole bucket.
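For reference, the AWS-side workaround would look roughly like this, written in the style of the sync.py snippet quoted above (shared_bucket_name as defined there); note that the Resource is still the whole bucket, and the condition only filters which prefixes a ListBucket call may enumerate:

# Sketch: scope ListBucket to per-user prefixes via an IAM condition.
list_statement = {
    "Effect": "Allow",
    "Action": ["s3:ListBucket"],
    "Resource": ["arn:aws:s3:::%s" % shared_bucket_name],
    "Condition": {
        "StringLike": {
            "s3:prefix": [
                "private/${aws:username}/*",
                "private-artifacts/${aws:username}/*",
                "shared/*",
            ]
        }
    },
}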

@juliusvonkohout
Member

Because ListObjects is a bucket-level operation, its permissions cannot be scoped by the resource path the way PUT / GET permissions can. In AWS S3 there seems to be a way to approximate this using IAM policies with conditions, but even then the user must first be granted listing permission on the whole bucket.

Can we live without the list operation? Is it enough to have read/write?

@juliusvonkohout
Member

@pschoen-itsc can you create a PR with kustomize components for kubeflow/manifests/contrib/seaweedfs? Then we can start there with integration testing.

@juliusvonkohout
Member

/ok-to-test

@pschoen-itsc
Author

@pschoen-itsc can you create a PR with kustomize components for kubeflow/manifests/contrib/seaweedfs? Then we can start there with integration testing.

Yes, will do it tomorrow.


@pschoen-itsc: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Required | Rerun command
kubeflow-pipeline-upgrade-test | af854ab | false | /test kubeflow-pipeline-upgrade-test
kubeflow-pipelines-manifests | af854ab | true | /test kubeflow-pipelines-manifests


@thesuperzapper
Member

@pschoen-itsc I also want to confirm: would SeaweedFS support IAM policies that restrict object access based on key prefixes?

For example, a profile might use an IAM policy that looks like this, which only allows them to read/write objects under a specific path:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetBucketLocation",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::<BUCKET_NAME>"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::<BUCKET_NAME>/artifacts/<PROFILE_NAME>/*",
        "arn:aws:s3:::<BUCKET_NAME>/v2/artifacts/<PROFILE_NAME>/*"
      ]
    }
  ]
}

@pschoen-itsc
Author

@pschoen-itsc I also want to confirm: would SeaweedFS support IAM policies that restrict object access based on key prefixes?

Object access based on key prefixes is supported, so you can set equivalent ACLs for the provided example. There is just no separation between PUT and DELETE on objects; there is only Write.
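As an illustration, the SeaweedFS identity for the policy above could look roughly like this, sketched as a Python dict in the shape of SeaweedFS's S3 identity configuration (profile_name, bucket_name and the credential values are hypothetical placeholders; the exact field names should be verified against the deployed version):

# Rough SeaweedFS equivalent of the IAM policy above. Write covers both
# PutObject and DeleteObject; List cannot be scoped to a key prefix.
seaweedfs_identity = {
    "name": profile_name,
    "credentials": [{"accessKey": access_key, "secretKey": secret_key}],
    "actions": [
        "List:%s" % bucket_name,
        "Read:%s/artifacts/%s/*" % (bucket_name, profile_name),
        "Write:%s/artifacts/%s/*" % (bucket_name, profile_name),
        "Read:%s/v2/artifacts/%s/*" % (bucket_name, profile_name),
        "Write:%s/v2/artifacts/%s/*" % (bucket_name, profile_name),
    ],
}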

@pschoen-itsc
Author


Can we live without the list operation? Is it enough to have read/write?

Currently the list operation is used by KFP, at least for retrieving input artifacts; I tested this for kubeflow/manifests#2826. If you really just want to PUT and GET objects you don't need it, but that would then have to be handled in the implementation of argoexec.

@juliusvonkohout
Member

Currently the list operation is used by KFP, at least for retrieving input artifacts; I tested this for kubeflow/manifests#2826. If you really just want to PUT and GET objects you don't need it, but that would then have to be handled in the implementation of argoexec.

It's fine for cluster.local/ns/kubeflow/sa/ml-pipeline and other admin-level service accounts such as the Kubeflow UI. We just need to prevent user workloads from accessing other users' artifacts.
