v1.7.0-aws-b1.0.0
What’s New
This release offers the following features:
- Added support for Kubeflow v1.7.0. Upstream Kubeflow component versions are listed in the components versions table.
- Support IAM Roles for Service Accounts (IRSA) for using Amazon S3 as the artifact store for Kubeflow Pipelines
  - IRSA can be used to configure Amazon S3 as an artifact store for pipelines. IRSA lets pods use temporary credentials to make API requests and scopes permissions at the pod level via Kubernetes service accounts. Compared with creating static IAM user credentials to access S3, IRSA follows the security best practices of least privilege and credential isolation. (#571, #601, #613, #680, #685)
  - Starting with this release, we are deprecating the use of IAM user/static credentials in favor of IRSA to configure S3 with Kubeflow Pipelines. We highly recommend migrating to IRSA. For more details about this change, refer to GitHub issue #704.
- Configure server-side encryption and block public access for the S3 bucket used by Kubeflow Pipelines by default, as a security best practice (#517, #518)
- Support using IRSA with KServe Inference Services. Use this feature to pull images from a private ECR repository or load models directly from an S3 bucket.
- Support for using Amazon S3 as an object store backend for TensorBoard. Users can now visualize TensorBoard-compatible logs stored in S3, published by model servers and training jobs (including TrainingJobs run on SageMaker), to track experiment metrics like loss and accuracy, visualize the model graph, etc.
- Added ability to annotate the service account using AWSIAMforServiceAccount Plugin. Users can use this feature if their organizational policies restrict them from using profile controller for updating IAM policies.
  - Setting annotateOnly to true in AWSIAMforServiceAccount Plugin will only annotate the service account in user profile and skip mutating the IAM Policy.
- Support configuring Amazon S3 as a remote backend for storing Terraform state (#674)
- Support configuring auto stopping of idle Jupyter Notebook Servers
  - Enabled support for Notebook Culling. Users can save infrastructure costs by configuring notebook instances to stop after they stay idle for a specified period of time. (#470)
- Updated notebook containers with the latest AWS optimized Deep Learning Containers (DLC) based on TensorFlow 2.12.0 and PyTorch 2.0.0 (#676)
- Updated Training and Inference containers with the latest AWS optimized Deep Learning Containers (DLC) based on TensorFlow 2.12 and PyTorch 2.0. Support for CPU/GPU based single node training, distributed training, and inference. For the latest DLC images, refer to the list of DLC images.
- Updated the following drivers to newer versions:
  - FSx CSI Driver to v0.9.0
  - EFS CSI Driver to v1.5.4
  - AWS Load Balancer Controller to v2.4.7
- Updated SageMaker Operator for k8s (ACK) to v1.2.1
  - Training Job resource now supports Managed warm pool, heterogeneous clusters through Instance Groups and Retry Strategy
  - Added support for SageMaker Pipeline and Pipeline Execution
  - Training Job resource now supports Update Operations.
  - Support for Deployment guard rails for Endpoint Resource.
  - Support for Serverless Endpoint for Endpoint Config Resource.
  - Support for retaining AWS resources after CR deletion.
- Supports the latest versions of Amazon EKS (see eks-compatibility)
- Support for Kustomize v5.0.1
- Bugfixes and improvements to the automated scripts
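
To make the IRSA items in the list above more concrete, here is a minimal sketch of the two pieces that IRSA generally involves on Amazon EKS: an IAM role trust policy that allows a single Kubernetes service account to assume the role through the cluster's OIDC provider, and the annotation that links the service account to that role. The function names, account ID, OIDC provider URL, role name, namespace, and service account name are all placeholders chosen for illustration. The installation scripts in this release may generate something different, so treat this as an approximation of the general pattern rather than the exact output of the tooling.

```python
import json


def build_irsa_trust_policy(account_id: str, oidc_provider: str,
                            namespace: str, service_account: str) -> dict:
    """Build a trust policy that lets one service account assume an IAM role.

    oidc_provider is the cluster's OIDC issuer without the https:// prefix.
    All argument values used in this sketch are placeholders.
    """
    subject = f"system:serviceaccount:{namespace}:{service_account}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": (
                        f"arn:aws:iam::{account_id}:oidc-provider/{oidc_provider}"
                    )
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # Only tokens issued for STS are accepted.
                        f"{oidc_provider}:aud": "sts.amazonaws.com",
                        # Only this namespace/service account pair may assume the role.
                        f"{oidc_provider}:sub": subject,
                    }
                },
            }
        ],
    }


def irsa_annotation(account_id: str, role_name: str) -> dict:
    """Return the service account annotation that points at the IAM role."""
    return {
        "eks.amazonaws.com/role-arn": f"arn:aws:iam::{account_id}:role/{role_name}"
    }


if __name__ == "__main__":
    # Placeholder values; substitute the ones for your own cluster.
    policy = build_irsa_trust_policy(
        account_id="111122223333",
        oidc_provider="oidc.eks.us-west-2.amazonaws.com/id/EXAMPLE",
        namespace="kubeflow",
        service_account="ml-pipeline",
    )
    print(json.dumps(policy, indent=2))
    print(irsa_annotation("111122223333", "example-kfp-s3-role"))
```

Because the `sub` condition names one namespace and one service account, only pods running under that service account receive the temporary credentials, which is how permissions end up scoped at the pod level.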
Updated documentation available at: https://awslabs.github.io/kubeflow-manifests/release-v1.7.0-aws-b1.0.0/docs/
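
As an illustration of the notebook culling feature listed above, the sketch below shows the kind of idle-time check that such a feature relies on: a notebook becomes a candidate for stopping once its last recorded activity is older than a configured threshold. The function names, the threshold parameter, and the notebook names are hypothetical and are not taken from the Kubeflow notebook controller. The actual settings are described in the linked documentation.

```python
from datetime import datetime, timedelta, timezone


def should_cull(last_activity: datetime, now: datetime, idle_minutes: int) -> bool:
    """Return True when the notebook has been idle for at least idle_minutes."""
    return now - last_activity >= timedelta(minutes=idle_minutes)


def notebooks_to_stop(last_activity_by_name: dict, now: datetime,
                      idle_minutes: int) -> list:
    """Return the sorted names of notebooks whose idle time meets the threshold."""
    return sorted(
        name
        for name, last_activity in last_activity_by_name.items()
        if should_cull(last_activity, now, idle_minutes)
    )


if __name__ == "__main__":
    now = datetime(2023, 5, 1, 12, 0, tzinfo=timezone.utc)
    # Hypothetical notebooks with their last observed activity.
    activity = {
        "nb-active": now - timedelta(minutes=5),
        "nb-idle": now - timedelta(minutes=45),
    }
    print(notebooks_to_stop(activity, now, idle_minutes=30))
```

A longer threshold reduces the chance of stopping a notebook that is only briefly unattended, while a shorter one saves more on infrastructure costs.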
Known Issues:
Full Changelog: release-v1.6.1-aws-b1.0.2...release-v1.7.0-aws-b1.0.0