Release v1.7.0-aws-b1.0.0 · awslabs/kubeflow-manifests

What’s New

This release offers the following features:

Added support for Kubeflow v1.7.0. Upstream Kubeflow component versions are listed in the components versions table
Support for IAM Roles for Service Accounts (IRSA) when using Amazon S3 as the artifact store for Kubeflow Pipelines
- IRSA can be used to configure Amazon S3 as an artifact store for pipelines. IRSA lets pods use temporary credentials to make API requests and scopes permissions at the pod level through Kubernetes service accounts. Compared with creating static IAM user credentials to access S3, IRSA follows the security best practices of least privilege and credential isolation. (#571, #601, #613, #680, #685)
- Starting with this release, we are deprecating the use of IAM user (static) credentials in favor of IRSA for configuring S3 with Kubeflow Pipelines. We highly recommend migrating to IRSA. For more details about this change, refer to GitHub issue #704
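As a rough illustration of how IRSA ties a Kubernetes service account to an IAM role, the sketch below builds a ServiceAccount manifest carrying the `eks.amazonaws.com/role-arn` annotation that EKS uses for this association. The helper name, service account name, namespace, and role ARN are hypothetical placeholders, not values from this release.

```python
import json

# Annotation key that Amazon EKS reads to associate an IAM role with a service account.
IRSA_ANNOTATION = "eks.amazonaws.com/role-arn"


def irsa_service_account(name: str, namespace: str, role_arn: str) -> dict:
    """Build a ServiceAccount manifest annotated with an IAM role ARN."""
    # Light sanity check only; it does not verify that the role exists.
    if ":iam::" not in role_arn or ":role/" not in role_arn:
        raise ValueError(f"does not look like an IAM role ARN: {role_arn!r}")
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "annotations": {IRSA_ANNOTATION: role_arn},
        },
    }


# Hypothetical values, for illustration only.
manifest = irsa_service_account(
    name="example-pipeline-sa",
    namespace="kubeflow",
    role_arn="arn:aws:iam::111122223333:role/example-kfp-s3-role",
)
print(json.dumps(manifest, indent=2))
```

Pods that run under such a service account receive temporary credentials for the annotated role, so no static access keys need to be stored in the cluster.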
Server-side encryption and block public access are now configured by default on the S3 bucket used by Kubeflow Pipelines, as a security best practice (#517, #518)
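For readers who manage the bucket themselves, the two settings correspond to the request payloads sketched below, in the shape accepted by the boto3 S3 client calls `put_bucket_encryption` and `put_public_access_block`. This is only an assumption about equivalent settings, not the code used by the automated scripts; the `AES256` algorithm and the helper names are illustrative choices.

```python
def bucket_encryption_config(sse_algorithm: str = "AES256") -> dict:
    """Default server-side encryption rule for a bucket.

    "AES256" (SSE-S3) is an assumed default; "aws:kms" is another accepted value.
    """
    return {
        "Rules": [
            {"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": sse_algorithm}}
        ]
    }


def public_access_block_config() -> dict:
    """Turn on all four S3 block-public-access settings."""
    return {
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    }


encryption = bucket_encryption_config()
public_access = public_access_block_config()

# With boto3 installed and AWS credentials configured, the payloads would be
# applied roughly like this (bucket name is a placeholder):
#   import boto3
#   s3 = boto3.client("s3")
#   s3.put_bucket_encryption(
#       Bucket="example-bucket",
#       ServerSideEncryptionConfiguration=encryption,
#   )
#   s3.put_public_access_block(
#       Bucket="example-bucket",
#       PublicAccessBlockConfiguration=public_access,
#   )
```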
Support for using IRSA with KServe inference services. Use this feature to pull images from a private ECR repository or load models directly from an S3 bucket.
Support for using Amazon S3 as an object store backend for TensorBoard. Users can now visualize TensorBoard-compatible logs stored in S3, published by model servers and training jobs (including TrainingJobs run on SageMaker), to track experiment metrics such as loss and accuracy, visualize the model graph, and so on.
Added the ability to annotate the service account using the AWSIAMforServiceAccount plugin. Users can use this feature if their organizational policies restrict them from using the profile controller to update IAM policies.
- Setting annotateOnly to true in the AWSIAMforServiceAccount plugin only annotates the service account in the user profile and skips mutating the IAM policy.
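A minimal sketch of what a Profile using this option might look like is shown below as a Python dictionary. The plugin kind spelling (`AwsIamForServiceAccount`) and the `awsIamRole` field are assumptions based on the upstream Kubeflow Profile resource and should be checked against the linked documentation; the profile name, owner, and role ARN are hypothetical.

```python
import json


def profile_with_annotate_only(profile_name: str, owner: str, role_arn: str) -> dict:
    """Build a Profile manifest whose IAM plugin only annotates the service account."""
    return {
        "apiVersion": "kubeflow.org/v1",
        "kind": "Profile",
        "metadata": {"name": profile_name},
        "spec": {
            "owner": {"kind": "User", "name": owner},
            "plugins": [
                {
                    # Assumed spelling of the plugin kind; verify before use.
                    "kind": "AwsIamForServiceAccount",
                    "spec": {
                        "awsIamRole": role_arn,  # assumed field name
                        # Annotate the service account but leave the IAM policy untouched.
                        "annotateOnly": True,
                    },
                }
            ],
        },
    }


# Hypothetical values, for illustration only.
profile = profile_with_annotate_only(
    profile_name="example-profile",
    owner="user@example.com",
    role_arn="arn:aws:iam::111122223333:role/example-profile-role",
)
print(json.dumps(profile, indent=2))
```

With annotateOnly enabled, permissions on the role itself would have to be managed outside the profile controller, for example by a central platform team.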
Support configuring Amazon S3 as a remote backend for storing Terraform state (#674)
Support for automatically stopping idle Jupyter notebook servers
- Enabled support for notebook culling. Users can save infrastructure costs by configuring a notebook instance to stop if it stays idle for a certain period of time. (#470)
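The idea behind culling can be sketched as a simple idle-time check. This is an illustration of the concept only, not the notebook controller's implementation; the function name, the 30-minute limit, and the timestamps are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def should_cull(last_activity: datetime, now: datetime, idle_limit: timedelta) -> bool:
    """Return True when a notebook has been idle for at least idle_limit."""
    return (now - last_activity) >= idle_limit


# Hypothetical timestamps and threshold, for illustration only.
now = datetime(2023, 5, 1, 12, 0, tzinfo=timezone.utc)
idle_limit = timedelta(minutes=30)

recently_used = now - timedelta(minutes=5)
long_idle = now - timedelta(hours=2)

print(should_cull(recently_used, now, idle_limit))  # False
print(should_cull(long_idle, now, idle_limit))      # True
```

A shorter idle limit saves more compute cost but risks stopping notebooks during short breaks, so the threshold is worth tuning per team.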
Updated notebook containers with the latest AWS-optimized Deep Learning Containers (DLC) based on TensorFlow 2.12.0 and PyTorch 2.0.0 (#676)
Updated training and inference containers with the latest AWS-optimized Deep Learning Containers (DLC) based on TensorFlow 2.12 and PyTorch 2.0. These support CPU/GPU-based single-node training, distributed training, and inference. For the latest DLC images, refer to the list of DLC images
Updated the following drivers to newer versions:
- FSx CSI Driver to v0.9.0
- EFS CSI Driver to v1.5.4
- AWS Load Balancer Controller to v2.4.7
Updated SageMaker Operator for k8s (ACK) to v1.2.1
- Training Job resource now supports managed warm pools, heterogeneous clusters through Instance Groups, and Retry Strategy
- Added support for SageMaker Pipeline and Pipeline Execution
- Training Job resource now supports Update Operations.
- Support for deployment guardrails for Endpoint Resource.
- Support for Serverless Endpoint for Endpoint Config Resource.
- Support for retaining AWS resources after CR deletion.
Supports the latest versions of Amazon EKS; see eks-compatibility
Support for Kustomize v5.0.1
Bugfixes and improvements to the automated scripts

Updated documentation available at: https://awslabs.github.io/kubeflow-manifests/release-v1.7.0-aws-b1.0.0/docs/

Known Issues:

#709

Full Changelog: release-v1.6.1-aws-b1.0.2...release-v1.7.0-aws-b1.0.0
