feat(deployment) Replace minio with seaweedfs as object store #10998

pschoen-itsc · 2024-07-11T13:33:08Z

Description of your changes:

Replace the old minio deployment with seaweedfs as object store. Seaweedfs is licensed under Apache 2.0 and supports ACLs inside the buckets, so you can give read / write permissions for paths only to specific users.

Checklist:

The title for your pull request (PR) should follow our title convention. Learn more about the pull request title convention used in this repository.

google-oss-prow · 2024-07-11T13:33:19Z

Hi @pschoen-itsc. Thanks for your PR.

I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

google-oss-prow · 2024-07-11T13:33:49Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign zijianjoy for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

manifests/kustomize/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

pschoen-itsc · 2024-07-11T13:49:52Z

For now I just replaced to minio component in one of the overlays. Don't know what would be the best way to integrate this. New overlays for seaweedfs?
Also how should we design the service? For now I continue using minio-service, because it is used statically a lot in the project. But we can also replace every occurrence.

HumairAK · 2024-07-11T15:03:10Z

@pschoen-itsc I think this definitely looks like something we should bring up in the KFP meeting for a wider discussion.

juliusvonkohout · 2024-07-11T15:14:10Z

Lets try a staging are in https://github.com/kubeflow/manifests/tree/master/contrib
Can you use Kustomize components there https://github.com/kubernetes-sigs/kustomize/blob/master/examples/components.md
we need to be able add users and ACLs at runtime in https://github.com/kubeflow/pipelines/blob/master/manifests/kustomize/base/installs/multi-user/pipelines-profile-controller/sync.py
How can we implement something like feat(backend): isolate artifacts per namespace/profile/user using only one bucket #7725 with seaweedfs (on-demand ACLs and new folders and different credentials per namespace per Seaweedfs rest API calls from sync.py)
lets use the same service and secret names as before. e.g. minio-secret
performance numbers
GUI with upload and download functionality (UI around their webdav interface) in the centraldashboard menu
Architectural proposal and diagrams
How active and reliable is the seaweedfs community? is there a license change going on, is there a commercial backer?
You need to sign the DCO
You should also present this in the KFP meeting

juliusvonkohout · 2024-07-11T17:31:28Z

Ah and I forgot

simple horizontal scalability by increasing replicas
can we scale with read-write-once PVCs as well?
should it be a statefulset or a deployment?
does it support something similar to minios deprecated gateway mode?

gregsheremeta · 2024-07-12T15:17:09Z

Back in March, I did some exploration into a minio replacement for the KFP community. I came up with a rough list of requirements, and then charted out various alternatives against those requirements.

(Note: these requirements are not authoritative. I mostly came up with them myself as a way to compare alternatives)

Requirements, from a Kubeflow Pipelines perspective:

R0 Can store and retrieve objects using S3 API
R1 Permissive license (Apache 2.0 is best, GPLv3 might be ok but would prefer to avoid, AGPL is not ok)
R2 Works in disconnected / on-prem environments (backed by local storage, e.g. PVCs)
R3 In cloud environments, can pass through to GCS or S3
R4 Can be automatically installed with a Kubeflow installer / no extra steps required
?? R5 Supports a developer mode where it is lightweight, low overhead (single pod is best).
Does this matter for production use cases? E.g. if I'm a large corporation storing petabytes worth of models, I want clustered object storage or possibly not even on prem.
R6 Doesn't require an operator / can be installed via simple manifests
?? Rn7 Shared server / zero-overhead namespaces
?? Rn8 Private namespace-scoped storage (e.g. global shared bucket with namespace-scoped subfolders)

This is the chart I came up with:

@pschoen-itsc would you be so kind as to give a quick look at these requirements and tell us how you think SeaweedFS does at fulfilling them? If we were to add SeaweedFS as a row 8 in that table, what would be in the columns? Thanks.

pschoen-itsc · 2024-07-17T07:25:10Z

Back in March, I did some exploration into a minio replacement for the KFP community. I came up with a rough list of requirements, and then charted out various alternatives against those requirements.

(Note: these requirements are not authoritative. I mostly came up with them myself as a way to compare alternatives)

Requirements, from a Kubeflow Pipelines perspective:

R0 Can store and retrieve objects using S3 API R1 Permissive license (Apache 2.0 is best, GPLv3 might be ok but would prefer to avoid, AGPL is not ok) R2 Works in disconnected / on-prem environments (backed by local storage, e.g. PVCs) R3 In cloud environments, can pass through to GCS or S3 R4 Can be automatically installed with a Kubeflow installer / no extra steps required ?? R5 Supports a developer mode where it is lightweight, low overhead (single pod is best). Does this matter for production use cases? E.g. if I'm a large corporation storing petabytes worth of models, I want clustered object storage or possibly not even on prem. R6 Doesn't require an operator / can be installed via simple manifests ?? Rn7 Shared server / zero-overhead namespaces ?? Rn8 Private namespace-scoped storage (e.g. global shared bucket with namespace-scoped subfolders)

This is the chart I came up with:

@pschoen-itsc would you be so kind as to give a quick look at these requirements and tell us how you think SeaweedFS does at fulfilling them? If we were to add SeaweedFS as a row 8 in that table, what would be in the columns? Thanks.

R0: Yes (tested)
R1: Yes (Apache 2.0)
R2: Yes
R3: Yes (Documented 1 2 )
R4: Yes
R5: Yes (Deployment with single container tested, scaling horizontally is documented)
R6: Yes (tested)
R7: Yes
R8: Yes (tested)

pschoen-itsc · 2024-07-17T16:03:08Z

we need to be able add users and ACLs at runtime in https://github.com/kubeflow/pipelines/blob/master/manifests/kustomize/base/installs/multi-user/pipelines-profile-controller/sync.py

There is also a CLI to configure S3 access. SeaweedFs uses this in their own helm charts to set some settings after deployment. They create a Job which then connect to the running master / cluster and use the CLI 1 . So I think we could do something similar with the existing sync.py script.

How can we implement something like feat(backend): isolate artifacts per namespace/profile/user using only one bucket #7725 with seaweedfs (on-demand ACLs and new folders and different credentials per namespace per Seaweedfs rest API calls from sync.py)

I would say we generate the credentials in sync.py , then create the secret in the new namespace and create the identity in seaweedfs with the permissions:

List:kubeflow
Read:kubeflow/project1/*
Write:kubeflow/project1/*

where "kubeflow" is the bucket name and "project1" the project name. Listing of objects is only possible to set globally, but as far as I know, this is just a limitation of S3.

lets use the same service and secret names as before. e.g. minio-secret

Service is no problem of course. And also secret should work, because when we use the CLI for dynamic user creation, seaweedfs does not have to read the secret. Changing credentials would then require manual intervention.

performance numbers

There are some in the official docs (could be biased) and also some more independent testing. It depends on how you set it up, just a single service with all components or you scale it horizontally with multiple filer / volume nodes. They provide helm charts for setting this up, so I could do my own tests if needed.

GUI with upload and download functionality (UI around their webdav interface) in the centraldashboard menu

Architectural proposal and diagrams

There are multiple options. Depends own desired performance / fault tolerance / resource usage. You have the 3 main services master, volume and filer which can be scaled independently or you can have everything in one container. You could even deploy separate gateways for S3 / WebDAV / etc. instead of let this run on the filer.
Here they tested with just a single instance writing 100 millions of files with solid performance and resource usage. Downside in this setup would be that after a restart the inspection of the volumes and indexes takes minutes. With a more distributed setup I assume you could reduce this time or even keep the service available the whole time.

pschoen-itsc · 2024-07-17T16:14:08Z

Ah and I forgot

simple horizontal scalability by increasing replicas

Their helm chart looks like this is given. But of course some testing should be done.

can we scale with read-write-once PVCs as well?

As far as I understand, yes.

should it be a statefulset or a deployment?

According to their helm charts all services (except a dedicated S3 service) should be statefulsets.

does it support something similar to minios deprecated gateway mode?

Yes, see my comment to R3 above.

juliusvonkohout · 2024-07-23T07:52:09Z

I would say we generate the credentials in sync.py , then create the secret in the new namespace and create the identity in seaweedfs with the permissions:
* List:kubeflow

* Read:kubeflow/project1/*

* Write:kubeflow/project1/*
where "kubeflow" is the bucket name and "project1" the project name. Listing of objects is only possible to set globally, but as far as I know, this is just a limitation of S3.

Can you elaborate a bit more on the global Listing?
I am fine with dropping the global listing, if they can still list their namespace (aws:username) folder.
Can we achieve

                            "arn:aws:s3:::%s/artifacts/*"  % shared_bucket_name, # old shared artifacts for backwards compatibility
                            "arn:aws:s3:::%s/private-artifacts/${aws:username}/*"  % shared_bucket_name, # private artifacts
                            "arn:aws:s3:::%s/private/${aws:username}/*"  % shared_bucket_name, # private storage
                            "arn:aws:s3:::%s/shared/*"  % shared_bucket_name # shared storage for collaboration

with read/write/list access ?

lets use the same service and secret names as before. e.g. minio-secret

Service is no problem of course. And also secret should work, because when we use the CLI for dynamic user creation, seaweedfs does not have to read the secret. Changing credentials would then require manual intervention.

Who is then able to modify ACLs?
Can we keep an admin secret in the Kubeflow folder that is used in the sync.py script to change permissions?

GUI with upload and download functionality (UI around their webdav interface) in the centraldashboard menu

Architectural proposal and diagrams

There are multiple options. Depends own desired performance / fault tolerance / resource usage. You have the 3 main services master, volume and filer which can be scaled independently or you can have everything in one container. You could even deploy separate gateways for S3 / WebDAV / etc. instead of let this run on the filer. Here they tested with just a single instance writing 100 millions of files with solid performance and resource usage. Downside in this setup would be that after a restart the inspection of the volumes and indexes takes minutes. With a more distributed setup I assume you could reduce this time or even keep the service available the whole time.

Can we use a single statefulset that starts a pod with all three services and scale this statefulset or is scaling only possible with three statefulsets (master, volume, filer)?

Is 32 GB the file size limit for the time being?

pschoen-itsc · 2024-07-23T08:18:08Z

I would say we generate the credentials in sync.py , then create the secret in the new namespace and create the identity in seaweedfs with the permissions:
* List:kubeflow

* Read:kubeflow/project1/*

* Write:kubeflow/project1/*
where "kubeflow" is the bucket name and "project1" the project name. Listing of objects is only possible to set globally, but as far as I know, this is just a limitation of S3.
Can you elaborate a bit more on the global Listing? I am fine with dropping the global listing, if they can still list their namespace (aws:username) folder. Can we achieve
                            "arn:aws:s3:::%s/artifacts/*"  % shared_bucket_name, # old shared artifacts for backwards compatibility
                            "arn:aws:s3:::%s/private-artifacts/${aws:username}/*"  % shared_bucket_name, # private artifacts
                            "arn:aws:s3:::%s/private/${aws:username}/*"  % shared_bucket_name, # private storage
                            "arn:aws:s3:::%s/shared/*"  % shared_bucket_name # shared storage for collaboration
with read/write/list access ?

Because ListObjects is a bucket level operation it is not possible to control the permissions via the resource, like you can do it with put / read. In AWS S3 there seems to be a way to implement this using IAM policies with conditions. But they also have to first give the user listing permissions for the whole bucket.

juliusvonkohout · 2024-07-23T14:32:15Z

I would say we generate the credentials in sync.py , then create the secret in the new namespace and create the identity in seaweedfs with the permissions:
* List:kubeflow

* Read:kubeflow/project1/*

* Write:kubeflow/project1/*
where "kubeflow" is the bucket name and "project1" the project name. Listing of objects is only possible to set globally, but as far as I know, this is just a limitation of S3.
Can you elaborate a bit more on the global Listing? I am fine with dropping the global listing, if they can still list their namespace (aws:username) folder. Can we achieve
                            "arn:aws:s3:::%s/artifacts/*"  % shared_bucket_name, # old shared artifacts for backwards compatibility
                            "arn:aws:s3:::%s/private-artifacts/${aws:username}/*"  % shared_bucket_name, # private artifacts
                            "arn:aws:s3:::%s/private/${aws:username}/*"  % shared_bucket_name, # private storage
                            "arn:aws:s3:::%s/shared/*"  % shared_bucket_name # shared storage for collaboration
with read/write/list access ?
Because ListObjects is a bucket level operation it is not possible to control the permissions via the resource, like you can do it with put / read. In AWS S3 there seems to be a way to implement this using IAM policies with conditions. But they also have to first give the user listing permissions for the whole bucket.

Can we live without the list operation? is it enough to have read/write?

juliusvonkohout · 2024-07-30T13:39:38Z

@pschoen-itsc can you create a PR with customize components for kubeflow/manifests/contrib/seaweedfs ? Then we can start there with integration testing.

juliusvonkohout · 2024-07-30T13:40:10Z

/ok-to-test

pschoen-itsc · 2024-07-30T13:41:17Z

@pschoen-itsc can you create a PR with customize components for kubeflow/manifests/contrib/seaweedfs ? Then we can start there with integration testing.

Yes, will do it tomorrow.

google-oss-prow · 2024-07-30T13:43:37Z

@pschoen-itsc: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
kubeflow-pipeline-upgrade-test	`af854ab`	link	false	`/test kubeflow-pipeline-upgrade-test`
kubeflow-pipelines-manifests	`af854ab`	link	true	`/test kubeflow-pipelines-manifests`

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

thesuperzapper · 2024-07-31T00:05:12Z

@pschoen-itsc I also want to confirm that SeaweedFS would support IAM policies that restrict object access based on key prefixes?

For example, a profile might use an IAM policy that looks like this, which only allows them to read/write objects under a specific path:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetBucketLocation",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::<BUCKET_NAME>"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::<BUCKET_NAME>/artifacts/<PROFILE_NAME>/*",
        "arn:aws:s3:::<BUCKET_NAME>/v2/artifacts/<PROFILE_NAME>/*"
      ]
    }
  ]
}

pschoen-itsc · 2024-07-31T07:25:40Z

@pschoen-itsc I also want to confirm that SeaweedFS would support IAM policies that restrict object access based on key prefixes?

For example, a profile might use an IAM policy that looks like this, which only allows them to read/write objects under a specific path:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetBucketLocation",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::<BUCKET_NAME>"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::<BUCKET_NAME>/artifacts/<PROFILE_NAME>/*",
        "arn:aws:s3:::<BUCKET_NAME>/v2/artifacts/<PROFILE_NAME>/*"
      ]
    }
  ]
}

Object access based on key prefixes is supported. So you can set equivalent ACLs for the provided example. There is just no separation of PUT and DELETE object, just Write.

pschoen-itsc · 2024-09-17T14:21:37Z

I would say we generate the credentials in sync.py , then create the secret in the new namespace and create the identity in seaweedfs with the permissions:
* List:kubeflow

* Read:kubeflow/project1/*

* Write:kubeflow/project1/*
where "kubeflow" is the bucket name and "project1" the project name. Listing of objects is only possible to set globally, but as far as I know, this is just a limitation of S3.
Can you elaborate a bit more on the global Listing? I am fine with dropping the global listing, if they can still list their namespace (aws:username) folder. Can we achieve
                            "arn:aws:s3:::%s/artifacts/*"  % shared_bucket_name, # old shared artifacts for backwards compatibility
                            "arn:aws:s3:::%s/private-artifacts/${aws:username}/*"  % shared_bucket_name, # private artifacts
                            "arn:aws:s3:::%s/private/${aws:username}/*"  % shared_bucket_name, # private storage
                            "arn:aws:s3:::%s/shared/*"  % shared_bucket_name # shared storage for collaboration
with read/write/list access ?
Because ListObjects is a bucket level operation it is not possible to control the permissions via the resource, like you can do it with put / read. In AWS S3 there seems to be a way to implement this using IAM policies with conditions. But they also have to first give the user listing permissions for the whole bucket.
Can we live without the list operation? is it enough to have read/write?

Currently the list operation is used by KFP at least for retrieving input artifacts. I tested this for kubeflow/manifests#2826. If you really just want to PUT and GET objects you don't need it. But that should then be in the implementation of argoexec.

juliusvonkohout · 2024-09-27T15:29:32Z

Currently the list operation is used by KFP at least for retrieving input artifacts. I tested this for kubeflow/manifests#2826. If you really just want to PUT and GET objects you don't need it. But that should then be in the implementation of argoexec.

Its fine for cluster.local/ns/kubeflow/sa/ml-pipeline and other admin level service account such as the Kubeflow UI. We just need to prevent that user workloads can access other users artifacts.

Keep minio-service for seaweedfs store

c20baef

google-oss-prow bot added do-not-merge/work-in-progress size/L labels Jul 11, 2024

google-oss-prow bot added the needs-ok-to-test label Jul 11, 2024

google-oss-prow bot requested review from gkcalat and zijianjoy July 11, 2024 13:33

Add newlines to end of files

af854ab

google-oss-prow bot added ok-to-test and removed needs-ok-to-test labels Jul 30, 2024

This was referenced Aug 28, 2024

WG Data proposal kubeflow/community#673

Open

Add seaweedfs to contrib kubeflow/manifests#2826

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(deployment) Replace minio with seaweedfs as object store #10998

feat(deployment) Replace minio with seaweedfs as object store #10998

pschoen-itsc commented Jul 11, 2024

google-oss-prow bot commented Jul 11, 2024

google-oss-prow bot commented Jul 11, 2024

pschoen-itsc commented Jul 11, 2024

HumairAK commented Jul 11, 2024 •

edited

Loading

juliusvonkohout commented Jul 11, 2024 •

edited

Loading

juliusvonkohout commented Jul 11, 2024

gregsheremeta commented Jul 12, 2024 •

edited

Loading

pschoen-itsc commented Jul 17, 2024

pschoen-itsc commented Jul 17, 2024 •

edited

Loading

pschoen-itsc commented Jul 17, 2024 •

edited

Loading

juliusvonkohout commented Jul 23, 2024

pschoen-itsc commented Jul 23, 2024

juliusvonkohout commented Jul 23, 2024

juliusvonkohout commented Jul 30, 2024

juliusvonkohout commented Jul 30, 2024

pschoen-itsc commented Jul 30, 2024

google-oss-prow bot commented Jul 30, 2024

thesuperzapper commented Jul 31, 2024

pschoen-itsc commented Jul 31, 2024

pschoen-itsc commented Sep 17, 2024

juliusvonkohout commented Sep 27, 2024

feat(deployment) Replace minio with seaweedfs as object store #10998

Are you sure you want to change the base?

feat(deployment) Replace minio with seaweedfs as object store #10998

Conversation

pschoen-itsc commented Jul 11, 2024

google-oss-prow bot commented Jul 11, 2024

google-oss-prow bot commented Jul 11, 2024

pschoen-itsc commented Jul 11, 2024

HumairAK commented Jul 11, 2024 • edited Loading

juliusvonkohout commented Jul 11, 2024 • edited Loading

juliusvonkohout commented Jul 11, 2024

gregsheremeta commented Jul 12, 2024 • edited Loading

pschoen-itsc commented Jul 17, 2024

pschoen-itsc commented Jul 17, 2024 • edited Loading

pschoen-itsc commented Jul 17, 2024 • edited Loading

juliusvonkohout commented Jul 23, 2024

pschoen-itsc commented Jul 23, 2024

juliusvonkohout commented Jul 23, 2024

juliusvonkohout commented Jul 30, 2024

juliusvonkohout commented Jul 30, 2024

pschoen-itsc commented Jul 30, 2024

google-oss-prow bot commented Jul 30, 2024

thesuperzapper commented Jul 31, 2024

pschoen-itsc commented Jul 31, 2024

pschoen-itsc commented Sep 17, 2024

juliusvonkohout commented Sep 27, 2024

HumairAK commented Jul 11, 2024 •

edited

Loading

juliusvonkohout commented Jul 11, 2024 •

edited

Loading

gregsheremeta commented Jul 12, 2024 •

edited

Loading

pschoen-itsc commented Jul 17, 2024 •

edited

Loading

pschoen-itsc commented Jul 17, 2024 •

edited

Loading