0.18.32
TPU
Trillium (v6e)
dstack adds support for the latest Trillium TPU (v6e), which became generally available in GCP on December 12th. The new TPU generation doubles the TPU memory and boosts performance, supporting larger workloads.
Resources
dstack now includes CPU, RAM, and TPU memory in Google Cloud TPU offers:
$ dstack apply --gpu tpu
#  BACKEND  REGION        INSTANCE     RESOURCES                                           SPOT  PRICE
1  gcp      europe-west4  v5litepod-1  24xCPU, 48GB, 1xv5litepod-1 (16GB), 100.0GB (disk)  no    $1.56
2  gcp      europe-west4  v6e-1        44xCPU, 176GB, 1xv6e-1 (32GB), 100.0GB (disk)       no    $2.97
3  gcp      europe-west4  v2-8         96xCPU, 334GB, 1xv2-8 (64GB), 100.0GB (disk)        no    $4.95
Volumes
By default, TPU VMs contain a 100GB boot disk, and its size cannot be changed. Now, you can add more storage using Volumes.
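As a rough sketch of what extra TPU storage could look like, the two configuration files below define a volume and then mount it in a dev environment running on a v6e-1. The file names, the volume name `tpu-data`, the size, and the mount path `/data` are illustrative assumptions, not values from this release. The field names follow dstack's volume and run configuration format as we understand it, so check the documentation for your version before relying on them.

```yaml
# volume.dstack.yml (hypothetical file name)
type: volume
name: tpu-data          # hypothetical volume name
backend: gcp
region: europe-west4    # assumed: same region as the TPU offers shown above
size: 200GB             # illustrative size, added on top of the fixed 100GB boot disk
```

```yaml
# .dstack.yml (hypothetical dev environment that mounts the volume)
type: dev-environment
name: tpu-dev           # hypothetical run name
ide: vscode
resources:
  gpu: v6e-1            # TPU type taken from the offers listed above
volumes:
  - name: tpu-data      # must match the volume name defined above
    path: /data         # assumed mount path inside the run
```

Each file would be applied separately with `dstack apply -f <file>`, the volume first and the dev environment second.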
Gateways
In this update, we've greatly refactored Gateways, improving their reliability and fixing several bugs.
Note
If you are running multiple replicas of the dstack server, ensure all replicas are upgraded promptly. Leaving some replicas on an older version may prevent them from creating or deleting services and could result in minor errors in their logs.
Warning
Ensure you update to 0.18.33, which includes critical hot-fixes for important issues.
What's changed
- [dstack-shim] Rework resource management by @un-def in #2093
- [Gateways] Restore dstack-proxy state on gateway restarts by @jvstme in #2119
- [TPU] Support TPU v6e by @r4victor in #2124
- [UI] Updated Backend config Info section by @peterschmidt85 in #2125
- [UI] It's not possible to manage fleets by @olgenn in #2126
- [UI] Improvements by @olgenn in #2127
- [Gateways] Add migration from state.json on gateways by @jvstme in #2128
- [Volumes] Forbid deleting backends with active instances or volumes by @r4victor in #2131
- [TPU] Fix backward compatibility with new TPUs by @r4victor in #2138
- Update gpuhunt to 0.0.17 by @r4victor in #2139
- [Docs] Improve docs by @r4victor in #2135
- [Gateways] Fix certbot process getting stuck in dstack-proxy by @jvstme in #2143
- [Gateways] Run dstack-proxy on gateways by @jvstme in #2136
- [Volumes] Support volumes for TPUs by @r4victor in #2144
- [Gateways] Optimize dstack-gateway installation time by @jvstme in #2146
- [Gateways] Fix OpenAI endpoint on Kubernetes gateways by @jvstme in #2147
Full changelog: 0.18.31...0.18.32