[WIP] Empower Users to Bring Their Own Storage #15875
Closed
jmchilton force-pushed the object_store_templates branch from 0e6a3ef to cbaaa05 (April 1, 2023 21:42)
jmchilton force-pushed the object_store_templates branch 3 times, most recently from e5a1c95 to 86981a8 (April 14, 2023 17:28)
jmchilton force-pushed the object_store_templates branch 11 times, most recently from 8578363 to 3832826 (April 24, 2023 20:41)
jmchilton force-pushed the object_store_templates branch 4 times, most recently from bf79507 to 3f600aa (April 26, 2023 19:59)
jdavcs reviewed Apr 27, 2023
lib/galaxy/model/migrations/alembic/versions_gxy/c14a3c93d66a_add_user_defined_object_stores.py
Would you mind moving the database migration into a separate commit? Same reasoning as here: #15663 (comment)
jmchilton force-pushed the object_store_templates branch 2 times, most recently from 74b556c to 0b4f81e (April 27, 2023 19:11)
davelopez reviewed Apr 28, 2023
jmchilton force-pushed the object_store_templates branch 5 times, most recently from ec8e9b2 to 93c9499 (February 21, 2024 19:48)
Hey! Can you please share the timeline for this feature and the Galaxy version in which it will become available?
jmchilton force-pushed the object_store_templates branch 7 times, most recently from fd97092 to 8f1218d (April 8, 2024 18:48)
jmchilton force-pushed the object_store_templates branch 7 times, most recently from cbdc0d8 to 77de715 (April 19, 2024 18:59)
jmchilton force-pushed the object_store_templates branch from 77de715 to 1bee224 (April 25, 2024 18:26)
Pretty close to an MVP I think?
Background
This is very heavily based on #14073 and #12940.
#14073 is...
#12940 is used to store secrets in a structured way, with multiple potential backends.
Implementation
Object Store Templates
Admins (and potentially in the future the core project) can define a set of object store "templates". The configured set of these templates is the "catalog" of object store templates. These are currently defined in object_store_templates.yml in the config directory. Ultimately I think admin-defined things are important and have some really interesting applications, but I doubt the uptake will be nearly as high if we don't also ship a default set of templates at some point in the future, after we're confident about the work. I don't think this set of templates should be included in the initial PR though.
Very strict Pydantic models are included for the templates and for the resulting object store configurations that they would yield when bound to user-supplied "variables" and "secrets". As this PR's documentation becomes outdated over time, the models will remain the source of truth. We are going to store configurations of object stores in the database, so the JSON blobs we define should be extremely well defined and well tested so that old blobs continue to work as the interface to object stores evolves over time. Proposed configurations for disk, s3, and azure storage have been included. These expose the relevant knobs currently available in our object store configurations and should be adapted as we migrate the object store code.
The templates in the catalog can be hidden, and new versions can be appended, in which case the old ones will be automatically hidden. Admins should be warned, however, that older definitions should remain in place for existing defined storage.
The templates are parameterized with variables and secrets - and can include admin supplied fields (and presumably app value information will be trivial to implement).
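As a rough illustration of the shape such a catalog entry might take, here is a minimal sketch. It deliberately uses stdlib dataclasses as a stand-in for the Pydantic models in the PR, and every name in it (ObjectStoreTemplate, TemplateVariable, visible_templates, the field names) is hypothetical rather than taken from the actual code. The version handling is an assumption about how "old versions are automatically hidden" could behave.

```python
from dataclasses import dataclass, field
from typing import List

# The three configuration types proposed in this PR.
ALLOWED_TYPES = {"disk", "s3", "azure"}


@dataclass(frozen=True)
class TemplateVariable:
    """Hypothetical description of one user-supplied variable."""

    name: str
    help: str = ""


@dataclass(frozen=True)
class ObjectStoreTemplate:
    """Hypothetical catalog entry; the real models are Pydantic and stricter."""

    id: str
    version: int
    type: str
    variables: List[TemplateVariable] = field(default_factory=list)
    secrets: List[str] = field(default_factory=list)
    hidden: bool = False

    def __post_init__(self):
        # Reject unknown store types and nonsensical versions up front.
        if self.type not in ALLOWED_TYPES:
            raise ValueError(f"unknown object store type: {self.type!r}")
        if self.version < 0:
            raise ValueError("version must be non-negative")


def visible_templates(catalog):
    """Return the newest version of each template id, skipping hidden ones.

    Assumption: older versions stay in the catalog (existing storage may
    still reference them) but are not offered for new object stores.
    """
    latest = {}
    for template in catalog:
        current = latest.get(template.id)
        if current is None or template.version > current.version:
            latest[template.id] = template
    return [t for t in latest.values() if not t.hidden]
```

Keeping older versions in the list while filtering them at display time is one way to satisfy the requirement that existing defined storage keeps resolving against the definition it was created from.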
I'm using Jinja templating as opposed to Python string templating or Mako templating. Various plugins to Galaxy have used all three approaches, and I've gone back and forth on this, but Jinja seems the best fit because it preserves type information (in this implementation). That seems to be where Galaxy is heading, and it dovetails well with the level of structure and typing we're using throughout this implementation (from the database to the TypeScript schema consumed by the frontend).
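To make the point about type preservation concrete, the following sketch shows what it means for an expanded configuration to keep an integer as an integer instead of turning it into a string. This is not the PR's code and it does not use Jinja at all: it is a hand-rolled stdlib stand-in, and the expand function and the restricted placeholder syntax it accepts are assumptions made purely for illustration.

```python
import re
from typing import Any, Dict

# Matches placeholders such as "{{ variables.port }}" or "{{ secrets.key }}".
PLACEHOLDER = re.compile(r"\{\{\s*(variables|secrets)\.(\w+)\s*\}\}")


def expand(value: Any, variables: Dict[str, Any], secrets: Dict[str, str]) -> Any:
    """Recursively expand placeholders in a nested configuration."""
    context = {"variables": variables, "secrets": secrets}
    if isinstance(value, dict):
        return {k: expand(v, variables, secrets) for k, v in value.items()}
    if isinstance(value, list):
        return [expand(v, variables, secrets) for v in value]
    if not isinstance(value, str):
        return value
    whole = PLACEHOLDER.fullmatch(value)
    if whole:
        # A value that is exactly one placeholder keeps the bound value's type.
        return context[whole.group(1)][whole.group(2)]
    # Placeholders embedded in longer text are interpolated as strings.
    return PLACEHOLDER.sub(lambda m: str(context[m.group(1)][m.group(2)]), value)


template = {
    "type": "s3",
    "port": "{{ variables.port }}",
    "secure": "{{ variables.secure }}",
    "auth": {"secret_key": "{{ secrets.secret_key }}"},
    "cache_path": "/data/{{ variables.bucket }}/cache",
}
config = expand(
    template,
    variables={"port": 9000, "secure": False, "bucket": "b1"},
    secrets={"secret_key": "s3cr3t"},
)
```

With plain string templating, port would come back as the string "9000" and secure as "False", which a strictly typed configuration model would then have to coerce or reject.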
Database Models
The templates can be used by users to create UserObjectStore model instances. I've used a prototype to separate the implementation from the object store code, so an object store library consumer could store these on disk or in some other persistence store. For the purposes of the Galaxy application, though, instantiations of templates created by users are stored in the database in UserObjectStore instances and are called "object store instances" in the API. (I think not calling them User Object Stores in the API makes sense because one can easily imagine group or role implementations of these things in the database, and one would expect the API to work with all of those.)
The UserObjectStore model is:
Backend Plumbing
The catalog and related models are all so far decoupled from the rest of Galaxy outside the object store. The layer above that is in lib/galaxy/managers/object_store_instances.py, which ties together the database objects, the vault, templates, and object store factory methods to implement most of the target functionality. Code for creating, updating, and upgrading user object stores from one template version to the next is all defined in this file, as well as some relevant CRUD code.
I've already included an API endpoint for fetching the template catalog. The next step will be an API endpoint for creating an object store from a template ID, the user-specified variables, and the user-specified secrets. The initial form that targets that endpoint is already there and handles basic variables and secrets using the Galaxy Form framework. The backend will have to create database objects and vault secrets for each object store created by a user. For instance, suppose a user creates an object store named "my-cool-objects" which defines a variable "foo" and a secret "bar". A new object will be added to the database for this object store (maybe UserObjectStoreInstance with a name field of "my-cool-objects") and the variable foo will be attached to that object (maybe UserObjectStoreInstanceVariable, or maybe just in a JSON attribute on UserObjectStoreInstance with all the template variables). user_vault.write_secret("/object_stores/my-cool-objects/bar", "<supplied value>") will also be called to securely store the supplied secret.
The guts of taking the templated object store configuration and the variables and secrets are already there in the initial PR. The next step, once we have defined the objects above and attached them to a job's outputs, would be to serialize the object store for the job. We already serialize an object store configuration and we already serialize a file source config that realizes abstract variables into concrete ones for a job. I think those two patterns can be combined to take care of the guts of the interaction between Galaxy and jobs.
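The "my-cool-objects" example can be sketched roughly as follows. Only the write_secret call and its path layout come from the description above; InMemoryVault, create_instance, and the dataclass fields are hypothetical stand-ins, and the sketch assumes the JSON-attribute option for variables rather than a separate variable table.

```python
from dataclasses import dataclass, field
from typing import Dict, List


class InMemoryVault:
    """Hypothetical stand-in for a per-user vault; keeps secrets in a dict."""

    def __init__(self):
        self._secrets = {}

    def write_secret(self, key, value):
        self._secrets[key] = value

    def read_secret(self, key):
        return self._secrets.get(key)


@dataclass
class UserObjectStoreInstance:
    """Hypothetical record: variables are stored, secrets only by name."""

    name: str
    template_id: str
    variables: Dict[str, object] = field(default_factory=dict)
    secret_names: List[str] = field(default_factory=list)


def create_instance(user_vault, name, template_id, variables, secrets):
    """Create the record and push each secret value into the vault."""
    instance = UserObjectStoreInstance(
        name=name,
        template_id=template_id,
        variables=dict(variables),
        secret_names=sorted(secrets),
    )
    for secret_name, secret_value in secrets.items():
        # Path layout follows the example in the PR description.
        user_vault.write_secret(f"/object_stores/{name}/{secret_name}", secret_value)
    return instance


vault = InMemoryVault()
instance = create_instance(
    vault,
    name="my-cool-objects",
    template_id="minio",  # hypothetical template id
    variables={"foo": "some value"},
    secrets={"bar": "<supplied value>"},
)
```

The record only remembers which secrets exist, so serializing or logging it never exposes a secret value; the values have to be read back from the vault when the object store is actually built.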
Another implementation detail will be to generalize how Dataset objects are mapped to ObjectStores. I think the backward-compatible thing is just to keep using object_store_id: if it is a simple string, do what we do now, but if it is a URI, say "user://<user_id>/my-cool-objects", then resolve the object store as needed. This will require a bunch of fiddly tracking and exceptions when dealing with maintenance scripts and such, but I've got some experience doing this as part of #14073.
The MVP I think will benefit from UI elements for managing existing object stores defined this way and providing information about them. I haven't worked through the details of this.
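A minimal sketch of that dispatch on object_store_id, assuming the "user://<user_id>/<name>" form mentioned above. The ObjectStoreRef type and parse_object_store_id function are hypothetical names for illustration, and the real resolution logic would go on to look up the instance and build the object store.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlparse


class ObjectStoreRef(NamedTuple):
    """Hypothetical parsed form of a dataset's object_store_id."""

    kind: str  # "static" for admin-configured stores, "user" for user-defined
    name: str
    user_id: Optional[str] = None


def parse_object_store_id(object_store_id: str) -> ObjectStoreRef:
    """Classify an object_store_id as a plain id or a user:// URI."""
    parsed = urlparse(object_store_id)
    if parsed.scheme != "user":
        # Simple strings keep their current meaning: an admin-defined store id.
        return ObjectStoreRef("static", object_store_id)
    user_id = parsed.netloc
    name = parsed.path.lstrip("/")
    if not user_id or not name:
        raise ValueError(f"malformed user object store URI: {object_store_id!r}")
    return ObjectStoreRef("user", name, user_id)
```

For example, parse_object_store_id("user://42/my-cool-objects") yields a "user" reference for user 42, while an existing id such as "files1" passes through unchanged as a "static" reference, so existing dataset rows need no migration.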
New APIs
Alternatives
The PR write-up of #14073 describes in detail how it provided several abstractions that would be needed to address limitations of the work proposed in #14073. In addition to the limitations described there, this work will be implemented with a keen eye toward implementation efficiency and will be usable with essentially any concrete object store implementation, as opposed to being tightly coupled to the cloud object store. I am confident the result will allow admins to address a greater number of potential scenarios.
Detailed Example
My notes on setting up MinIO for this example - need a bucket to attach:
Next, set up a sophisticated distributed object store. I'm going to build on the MSI example I used for #14073.
Next, add an object_store_templates.yml file to config/:
This sets up three templates users can create object stores from in the UI.
The first two just allow the user to set up folders under shared project directories. This example makes sense when you really trust your users and you've got a variety of disk options mounted on Galaxy servers with different properties.
The third template allows the user to attach buckets from the MinIO server we set up, using access keys, bucket names, and secret keys we've communicated to the user in some way.
The User Preferences menu now has a "Manage Your Object Stores" option:
Clicking "Create" will show the templates the user can create object stores from:
Let's build one of each of these:
As they are built we see them in the index:
Object store badges communicate information about the object store, its properties, and free Markdown populated by the admin for different object stores:
Some information about the type of object store is displayed also:
When you edit the object stores, regular settings (metadata and admin defined variables) are presented in a different way than secrets stored in Galaxy's vault:
Workflows, tools, histories will all now allow these object stores to be selected as the "preferred" object store. The user can also select this as their preference for all analyses:
These two new user-bound object stores are now available right alongside the admin defined ones in object_store_conf.xml.
Here I've set the history default and run a job, and we can see the result is in MinIO because the path is the path to the object store cache:
Looking in the object store management window:
We can see the file that was created.