Add e2e tests for tpu-provisioner #235
Branch updated from 51723ae to 00ad9b9.
```go
tpu_v4_2x2x2 = tpuConfig{
	accelerator:  "tpu-v4-podslice",
	topoX:        2,
	topoY:        2,
	topoZ:        2,
	chipsPerNode: 4,
	sliceCount:   1,
}
tpu_v4_2x2x4 = tpuConfig{
	accelerator:  "tpu-v4-podslice",
	topoX:        2,
	topoY:        2,
	topoZ:        4,
	chipsPerNode: 4,
	sliceCount:   1,
}

tpu_v5e_2x4 = tpuConfig{
	accelerator:  "tpu-v5-lite-podslice",
	topoX:        2,
	topoY:        4,
	chipsPerNode: 4,
	sliceCount:   2,
}
*/
```
Are these commented out intentionally, or were they only commented out temporarily for your testing?
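For reference while reviewing these configs, here is a minimal sketch of how the topology fields relate to node counts. It assumes the chips in one slice are the product of the topology dimensions (an unset `topoZ` is treated as 1 for 2D topologies such as v5e `2x4`), that each node hosts `chipsPerNode` chips, and that the fields are `int`s. The helpers `chipsPerSlice`, `nodesPerSlice`, and `totalNodes` are hypothetical and not part of this PR.

```go
package main

import "fmt"

// tpuConfig mirrors the fields used in the test configs (types assumed).
type tpuConfig struct {
	accelerator  string
	topoX        int
	topoY        int
	topoZ        int
	chipsPerNode int
	sliceCount   int
}

// chipsPerSlice multiplies the topology dimensions; a zero topoZ is
// treated as 1 so that 2D topologies work (assumption).
func chipsPerSlice(c tpuConfig) int {
	z := c.topoZ
	if z == 0 {
		z = 1
	}
	return c.topoX * c.topoY * z
}

// nodesPerSlice divides the chips in one slice by the chips per node and
// returns an error if they do not divide evenly (hypothetical helper).
func nodesPerSlice(c tpuConfig) (int, error) {
	chips := chipsPerSlice(c)
	if c.chipsPerNode <= 0 || chips%c.chipsPerNode != 0 {
		return 0, fmt.Errorf("%d chips cannot be split evenly into nodes of %d chips", chips, c.chipsPerNode)
	}
	return chips / c.chipsPerNode, nil
}

// totalNodes is the node count across all slices (hypothetical helper).
func totalNodes(c tpuConfig) (int, error) {
	n, err := nodesPerSlice(c)
	if err != nil {
		return 0, err
	}
	return n * c.sliceCount, nil
}

func main() {
	configs := []struct {
		name string
		cfg  tpuConfig
	}{
		{"tpu_v4_2x2x2", tpuConfig{accelerator: "tpu-v4-podslice", topoX: 2, topoY: 2, topoZ: 2, chipsPerNode: 4, sliceCount: 1}},
		{"tpu_v4_2x2x4", tpuConfig{accelerator: "tpu-v4-podslice", topoX: 2, topoY: 2, topoZ: 4, chipsPerNode: 4, sliceCount: 1}},
		{"tpu_v5e_2x4", tpuConfig{accelerator: "tpu-v5-lite-podslice", topoX: 2, topoY: 4, chipsPerNode: 4, sliceCount: 2}},
	}
	for _, tc := range configs {
		n, err := totalNodes(tc.cfg)
		if err != nil {
			fmt.Println(tc.name, "error:", err)
			continue
		}
		fmt.Printf("%s: %d node(s)\n", tc.name, n)
	}
}
```

Under these assumptions the three configs need 2, 4, and 4 nodes respectively; the v5e config reaches 4 because it requests two 2-node slices.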
```go
func newJobset(name string, c tpuConfig, uniqueNodeSelector bool) *jobset.JobSet {
	nodeSelectors := map[string]string{
		"cloud.google.com/gke-tpu-accelerator": c.accelerator,
```
Can we define the various labels in this file as constants somewhere?
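One way to address this, sketched under assumptions: collect the node-label keys in a single `const` block and build the selector map from it. `cloud.google.com/gke-tpu-accelerator` appears in the PR; `cloud.google.com/gke-tpu-topology` is GKE's TPU topology node label and is only assumed here to be among the other labels the file uses. The constant names, the `topology` and `nodeSelectorFor` helpers, and the trimmed-down `tpuConfig` are hypothetical.

```go
package main

import "fmt"

// Node-label keys collected in one place. The constant names are
// hypothetical; the topology key is an assumption about which other
// labels the file uses.
const (
	labelTPUAccelerator = "cloud.google.com/gke-tpu-accelerator"
	labelTPUTopology    = "cloud.google.com/gke-tpu-topology"
)

// tpuConfig keeps only the fields this sketch needs.
type tpuConfig struct {
	accelerator         string
	topoX, topoY, topoZ int
}

// topology renders "XxY" or "XxYxZ"; topoZ == 0 means a 2D topology.
func topology(c tpuConfig) string {
	if c.topoZ == 0 {
		return fmt.Sprintf("%dx%d", c.topoX, c.topoY)
	}
	return fmt.Sprintf("%dx%dx%d", c.topoX, c.topoY, c.topoZ)
}

// nodeSelectorFor builds the selector from the constants instead of
// repeating string literals at each call site.
func nodeSelectorFor(c tpuConfig) map[string]string {
	return map[string]string{
		labelTPUAccelerator: c.accelerator,
		labelTPUTopology:    topology(c),
	}
}

func main() {
	sel := nodeSelectorFor(tpuConfig{accelerator: "tpu-v4-podslice", topoX: 2, topoY: 2, topoZ: 2})
	fmt.Println(sel[labelTPUAccelerator], sel[labelTPUTopology]) // tpu-v4-podslice 2x2x2
}
```

Keeping the keys as typed constants means a typo in a label key becomes a compile error rather than a selector that silently matches no nodes.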
```go
var cf struct {
	o sync.Once
	m sync.RWMutex
	f []func()
}
```
The names of this struct variable and its fields are unclear to me; later in the code, where they are used, it's hard to tell what each one refers to. Can we make them more descriptive?
Closing old issues with merge conflicts. Please rebase and re-open if still relevant.
Creates a cluster, installs tpu-provisioner, and then runs test JobSets against the cluster.