Support for Dataproc serverless components #178

Open
abhi8893 opened this issue Oct 18, 2024 · 2 comments

Comments

@abhi8893

As I understand it, the plugin uses a tag-based system to assign different compute resources to each node, as follows:

# excerpt from vertexai.yml
# see https://kedro-vertexai.readthedocs.io/en/0.9.1/source/02_installation/02_configuration.html

  # Optional section allowing adjustment of the resources
  # reservations and limits for the nodes
  resources:

    # For nodes that require more RAM you can increase the "memory"
    data-import-node:
      memory: 2Gi

    # Training nodes can utilize more than one CPU if the algorithm
    # supports it
    model-training-node:
      cpu: 8
      memory: 60Gi

    # GPU-capable nodes can request 1 GPU slot
    tensorflow-node:
      gpu: 1

    # Resources can be also configured via nodes tag
    # (if there is node name and tag configuration for the same
    # resource, tag configuration is overwritten with node one)
    gpu_node_tag:
      cpu: 1
      gpu: 2

    # Default settings for the nodes
    __default__:
      cpu: 200m
      memory: 64Mi

  # Optional section allowing to configure node selectors constraints
  # like gpu accelerator for nodes with gpu resources.
  # (Note that not all accelerators are available in all
  # regions - https://cloud.google.com/compute/docs/gpus/gpu-regions-zones)
  # and not for all machines and resources configurations - 
  # https://cloud.google.com/vertex-ai/docs/training/configure-compute#specifying_gpus
  node_selectors:
    gpu_node_tag:
      cloud.google.com/gke-accelerator: NVIDIA_TESLA_T4
    tensorflow-step:
      cloud.google.com/gke-accelerator: NVIDIA_TESLA_K80
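
For illustration, the precedence described in the config comments above (node name wins over tag, with `__default__` as the fallback) could be sketched roughly like this. This is not the plugin's actual code: `resolve_resources` is a hypothetical helper, and the per-key merging and the tag ordering are assumptions on my part.

```python
from typing import Any, Dict, Iterable

# Mirrors part of the `resources` section from the vertexai.yml excerpt.
resources_cfg: Dict[str, Dict[str, Any]] = {
    "data-import-node": {"memory": "2Gi"},
    "model-training-node": {"cpu": 8, "memory": "60Gi"},
    "gpu_node_tag": {"cpu": 1, "gpu": 2},
    "__default__": {"cpu": "200m", "memory": "64Mi"},
}


def resolve_resources(
    node_name: str,
    tags: Iterable[str],
    cfg: Dict[str, Dict[str, Any]],
) -> Dict[str, Any]:
    """Hypothetical helper: merge default, tag, and node-name settings.

    Assumption: settings are merged per resource key, so a node entry only
    overrides the keys it defines. Tags are applied in sorted order purely
    to keep the result deterministic.
    """
    resolved = dict(cfg.get("__default__", {}))
    for tag in sorted(tags):
        resolved.update(cfg.get(tag, {}))
    # Node-name configuration is applied last, so it overrides tag values.
    resolved.update(cfg.get(node_name, {}))
    return resolved


# A node that has its own entry and also carries the GPU tag.
training = resolve_resources("model-training-node", ["gpu_node_tag"], resources_cfg)

# A node with no entry and no matching tag falls back to the defaults.
plain = resolve_resources("some-other-node", [], resources_cfg)
```

Under these assumptions, `training` keeps `cpu: 8` and `memory: 60Gi` from the node entry and picks up `gpu: 2` from the tag, while `plain` only gets the `__default__` values.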

Does this plugin also support using Dataproc Serverless components for a pipeline or an individual node?

@em-pe
Member

em-pe commented Oct 18, 2024

As far as I recall, no. This would require using a different component type, not just node selectors, and currently every node is translated to a ContainerOp.
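
To make the gap concrete, here is a rough sketch of what tag-based component selection might look like. Everything in it is hypothetical: the `dataproc_serverless` tag, `NodeInfo`, `StepSpec`, and `plan_step` are illustrative names, not part of the plugin. In a real pipeline the Dataproc branch would presumably map to a batch component such as `DataprocPySparkBatchOp` from Google Cloud Pipeline Components, but that is an assumption and has not been checked against the plugin's code.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Hypothetical tag a user would put on Kedro nodes meant for Dataproc Serverless.
DATAPROC_TAG = "dataproc_serverless"


@dataclass(frozen=True)
class NodeInfo:
    """Minimal stand-in for a Kedro node: just a name and its tags."""
    name: str
    tags: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class StepSpec:
    """Describes which kind of pipeline component a node would become."""
    kind: str
    node_name: str


def plan_step(node: NodeInfo) -> StepSpec:
    """Pick a component kind per node instead of always using a container."""
    if DATAPROC_TAG in node.tags:
        return StepSpec(kind="dataproc_batch", node_name=node.name)
    # Current behaviour as described above: every node becomes a container step.
    return StepSpec(kind="container", node_name=node.name)


spark_step = plan_step(NodeInfo("spark-etl", frozenset({DATAPROC_TAG})))
train_step = plan_step(NodeInfo("model-training-node"))
```

The point of the sketch is only that the choice happens at the component level: node selectors can change where a container runs, but they cannot turn a container step into a Dataproc batch.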

@abhi8893
Author

Thanks @em-pe. Yep, I dug into the code and that makes sense. I would love to contribute to this package, as I love the concept of translating Kedro nodes into Vertex AI nodes, the grouping logic, etc. It seems this is EXACTLY what I want, just with a bit more flexibility. I am embarking on a Vertex AI + Cloud Composer + Kedro MLOps journey, and will share my learnings and hopefully contribute back :)
