Commit

extra cpu cores for the head node
thayeral committed Nov 4, 2024
1 parent aad572f commit 7fe176b
Showing 2 changed files with 3 additions and 2 deletions.
3 changes: 2 additions & 1 deletion src/ray_lsf_cluster.sh
@@ -66,7 +66,8 @@ export head_node
 export head_node_ip
 export cluster_address

-apptainer exec --userns --nv --bind $bind --bind $outdir:$tmpdir $env ./ray_start_cluster.sh -i $head_node_ip -p $port -d $dashboard_port -c $cpus -g $gpus -t $tmpdir &
+head_cpus=$(( cpus + 4 ))
+apptainer exec --userns --nv --bind $bind --bind $outdir:$tmpdir $env ./ray_start_cluster.sh -i $head_node_ip -p $port -d $dashboard_port -c $head_cpus -g $gpus -t $tmpdir &
 sleep 10

 ############################## ADD WORKER NODES
2 changes: 1 addition & 1 deletion src/train.py
@@ -524,7 +524,7 @@ def train_model(
     scaling_config = ScalingConfig(
         num_workers=workers * gpu_workers,
         resources_per_worker={"CPU": 2, "GPU": 1},
-        trainer_resources={"CPU": 4 if workers > 1 else 0},
+        trainer_resources={"CPU": 4},
         use_gpu=True
     )
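
The two changes read as a pair: train.py now always reserves 4 CPUs for the trainer, and the head node advertises 4 more CPUs to cover that reservation. The sketch below works through the arithmetic. It is not part of the commit. The helper names `train_demand` and `head_node_cpus` are hypothetical, and it assumes that `$cpus` is sized for the per-node training workers only and that the trainer runs on the head node, neither of which the diff confirms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainDemand:
    """Total CPUs and GPUs requested by a ScalingConfig-like setup."""
    cpus: int
    gpus: int


def train_demand(num_workers: int, cpus_per_worker: int = 2,
                 gpus_per_worker: int = 1, trainer_cpus: int = 4) -> TrainDemand:
    # Defaults mirror resources_per_worker={"CPU": 2, "GPU": 1}
    # and trainer_resources={"CPU": 4} from the diff.
    return TrainDemand(
        cpus=num_workers * cpus_per_worker + trainer_cpus,
        gpus=num_workers * gpus_per_worker,
    )


def head_node_cpus(cpus: int, extra: int = 4) -> int:
    # Mirrors head_cpus=$(( cpus + 4 )) in ray_lsf_cluster.sh.
    return cpus + extra


# Hypothetical single-node job: 4 GPU workers, with cpus=8 assumed for the workers.
demand = train_demand(num_workers=4)
advertised = head_node_cpus(cpus=8)
print(demand, advertised)
```

Under these assumed values the request comes to 12 CPUs and the head node advertises 12. With the head node advertising only the 8 CPUs of the original `-c $cpus`, the unconditional 4-CPU trainer reservation would not fit on a single node.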
