-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding CPU support in XPK #56
Conversation
4ed211a
to
f0426ca
Compare
9c9b6e9
to
d5073b0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we also document how to emulate multiple slices using a single CPU nodepool?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Slices are emulated using Maxtext parallelism types, hence I think it would belong in Maxtext emulator for strorage documentation.
6751dbc
to
759f0a6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Very clean set of changes. Excited to see multiple node pool support for CPUs
--workload xpk-test-workload \ | ||
--command="echo hello world" | ||
``` | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Description looks great!
Will you have a maxtext with xpk + cpu guide in the maxtext repo, if so can you link that here when ready?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I plan to have the documentation in Maxtext emulator for storage at the moment, but can definitely build a guide in Maxtext soon.
xpk.py
Outdated
# cluster_cpu_machine_type is being deprecated! | ||
if args.cluster_cpu_machine_type != '': | ||
xpk_print('Note that cluster-cpu-machine-type is deprecated,', | ||
'please use default-pool-cpu-machine-type instead!', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the message you used in the argument description:
Can we rephrase to:
'Note that `--cluster-cpu-machine-type` is deprecated,',
' Please use `--default-pool-cpu-machine-type={args.cluster_cpu_machine_type}` instead,'
' to denote the machine type of the default cpu node pool. Set'
' the machine type of other cpu nodepools using `--device-type`.'
Features
Major changes:
Testing / Documentation
Tested cluster creation, CPU nodepool creation with args and env properly set.
Ran Maxtext workloads on CPUs using these changes.
Documented changes in README as well.
Verified that TPU training (with Maxtext workloads) run as expected.
[ y ] Tests pass
[ y ] Appropriate changes to documentation are included in the PR