Generic HPC Install Script #329

Merged: 71 commits, merged Oct 23, 2024
Changes from 10 commits
Commits
aa0c077
Copy slurm_init.sh to slurm_init_longleaf.sh
TimothyWillard Sep 13, 2024
22cb130
Restore slurm_init.sh
TimothyWillard Sep 13, 2024
90e29b2
Merge branch 'copy-file' into GH-191/longleaf-batch-submission
TimothyWillard Sep 13, 2024
c1d54ef
Added UNC Longleaf Specific Init/Prerun Scripts
TimothyWillard Sep 13, 2024
e238d66
Draft implementation of HPC install script
TimothyWillard Sep 16, 2024
2436640
Minor changes to `hpc_install.sh`
TimothyWillard Sep 16, 2024
bba583a
Added slurm `--partition` flag to inference script
TimothyWillard Sep 16, 2024
2c8c952
Initial pass at HPC install on rockfish
TimothyWillard Oct 2, 2024
194d2e1
Remove longleaf specific slurm scripts
TimothyWillard Oct 2, 2024
f76d68a
Remove `--partion` flag
TimothyWillard Oct 2, 2024
8af3698
Minor updates to `hpc_install.sh`
TimothyWillard Oct 3, 2024
23d9de1
initial tweaks to make flepimop-inference-* runnable
pearsonca Sep 26, 2024
9ac9cf0
further install scripts fixes
pearsonca Sep 26, 2024
d32a58c
fix reinvocation of inference-slot
pearsonca Sep 26, 2024
9f2c085
initial installation for ubuntu re-org
pearsonca Sep 25, 2024
e2bc41f
updates addressing use of installed r scripts
pearsonca Sep 26, 2024
3095ae4
README revs
pearsonca Sep 26, 2024
1b33dc3
add arrow installation
pearsonca Sep 26, 2024
ec0c479
Switch R pkg install to use `build/setup.R`
TimothyWillard Oct 3, 2024
e754364
Add `$WORKDIR` to `hpc_install.R`
TimothyWillard Oct 4, 2024
5e4f398
Add missing flepi path arg to setup.R
TimothyWillard Oct 4, 2024
0d9f813
Force pin arrow version between python and R
TimothyWillard Oct 4, 2024
9ca12ed
Change rockfish default directories
TimothyWillard Oct 4, 2024
f78ee75
Split `hpc_install.sh` into init and install
TimothyWillard Oct 4, 2024
6d69186
Use `devtools::install` in `setup.R`
TimothyWillard Oct 4, 2024
7adbdfa
Unset error exit around R pkg install
TimothyWillard Oct 7, 2024
ae6666f
Add `set +e` an exit to `flepi_init.sh`
TimothyWillard Oct 7, 2024
f3fb1a7
`install_cli` installs to conda bin
TimothyWillard Oct 7, 2024
e443626
Remove init call from install script
TimothyWillard Oct 7, 2024
480bfd7
Remove old version restrictions, add `optparse`
TimothyWillard Oct 8, 2024
385eeb5
Move R deps install into conda environment
TimothyWillard Oct 8, 2024
f0571ae
Readd inference CLI install
TimothyWillard Oct 8, 2024
53f2f49
Update example command to use installed CLI
TimothyWillard Oct 8, 2024
ec7978d
Manually install `covidcast` package
TimothyWillard Oct 8, 2024
2e6dfca
Downgrade r-base dependency to 4.3
TimothyWillard Oct 9, 2024
42259a2
Remove symlinks if exists on reinstall
TimothyWillard Oct 9, 2024
aed8442
Script to generate `environment.yml`
TimothyWillard Oct 10, 2024
a4631db
Add dnachun to channels, add r-sf dependency
TimothyWillard Oct 10, 2024
b0685c5
GitHub action to generate `environment.yml`
TimothyWillard Oct 10, 2024
a9e9f47
Remove unneeded comment
TimothyWillard Oct 10, 2024
9acda2c
Merge main into GH-191/auto-generate-environment.yml
TimothyWillard Oct 10, 2024
ca10838
GitHub action checkout with attached head
TimothyWillard Oct 10, 2024
6ad753d
Update environment.yml
TimothyWillard Oct 10, 2024
27a80f7
Make clear source of `environment.yml` commit
TimothyWillard Oct 10, 2024
bacf6f4
Use premade `environment.yml` in `hpc_install.sh`
TimothyWillard Oct 10, 2024
c613f09
Remove `setup.R` and `install_ubuntu.sh`
TimothyWillard Oct 10, 2024
8ca2f58
Restore `README.md`
TimothyWillard Oct 10, 2024
cbcdfbd
Bug fix to only remove files present
TimothyWillard Oct 10, 2024
3b33ab1
Add spaces for style in `build/hpc_install.sh`
TimothyWillard Oct 10, 2024
a9c3415
Merge main into GH-191/longleaf-batch-submission
TimothyWillard Oct 14, 2024
04c533b
Merge main into GH-191/longleaf-batch-submission
TimothyWillard Oct 18, 2024
039d311
Move everything on longleaf into `/work`
TimothyWillard Oct 18, 2024
37e1405
Rename install script to clarify uses
TimothyWillard Oct 18, 2024
b1c6e46
Change exec of hpc install script
TimothyWillard Oct 18, 2024
384c219
Change longleaf dirs in init script
TimothyWillard Oct 18, 2024
96df05a
Move rockfish userdir
TimothyWillard Oct 18, 2024
cbcfff2
Change default python to 3.11
TimothyWillard Oct 18, 2024
8fa9c82
Merge main into GH-191/longleaf-batch-submission
TimothyWillard Oct 18, 2024
e1811b6
Make flepi path/conda configurable
TimothyWillard Oct 18, 2024
52edb16
Update `environment.yml` via GitHub action
TimothyWillard Oct 18, 2024
22484f6
Formatting of flepi path/conda inputs
TimothyWillard Oct 18, 2024
1f3987b
Use `realpath` to make format file paths
TimothyWillard Oct 18, 2024
8ba4474
Remove python/R name check
TimothyWillard Oct 18, 2024
24988cd
Remove whitespace from `README.md`
TimothyWillard Oct 18, 2024
216ee1f
Add missing slurm module for rockfish
TimothyWillard Oct 21, 2024
139a8b2
Add `--editable` to `gempyor` install
TimothyWillard Oct 21, 2024
b73b5ab
Cleanup error handling
TimothyWillard Oct 21, 2024
70ced08
Update conda env to use `~/.conda`
TimothyWillard Oct 21, 2024
b34f9ab
Remove `--force-reinstall` from pip install
TimothyWillard Oct 22, 2024
8d8042a
Minor typo
TimothyWillard Oct 22, 2024
86b5cd6
Merge main into GH-191/longleaf-batch-submission
TimothyWillard Oct 22, 2024
157 changes: 157 additions & 0 deletions build/hpc_install.sh
@@ -0,0 +1,157 @@
# Cluster specific setup
if [[ $1 == "longleaf" ]]; then
    # Setup general purpose user variables needed for Longleaf
    USERO=$( echo $USER | awk '{ print substr($0, 1, 1) }' )
    USERN=$( echo $USER | awk '{ print substr($0, 2, 1) }' )
    USERDIR="/users/$USERO/$USERN/$USER"
    cd $USERDIR

    # Load required modules
    module purge
    module load gcc/9.1.0
    module load anaconda/2023.03
    module load git
elif [[ $1 == "rockfish" ]]; then
    # Setup general purpose user variables needed for RockFish
    USERDIR="/scratch4/struelo1/flepimop-code/$USER/"

Contributor

need to cd to USERDIR as well here?

Contributor

and if we do, several of the $USERDIRs below can/must be eliminated
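A minimal sketch of the suggestion above, assuming the script should create and enter `$USERDIR` on every cluster (as the longleaf branch already does with `cd`). `enter_userdir` is a hypothetical helper, not part of this PR:

```bash
#!/usr/bin/env bash
# Hypothetical helper: create the user directory if needed and cd into it,
# so later paths in the script could be written relative to it.
enter_userdir() {
    local dir="$1"
    if [ -z "$dir" ]; then
        echo "enter_userdir: no directory given." >&2
        return 1
    fi
    mkdir -p "$dir" || return 1
    cd "$dir" || return 1
}

# Example: the rockfish branch could call it right after setting USERDIR.
# USERDIR="/scratch4/struelo1/flepimop-code/$USER/"
# enter_userdir "$USERDIR"
```

Once both branches end up inside `$USERDIR`, the later `$USERDIR/...` prefixes could become relative paths, which is the cleanup the comment refers to.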

Contributor

Could we add some HPC-wide environment variables to the longleaf-setup repo? Does it make sense to pair that with this?

Lastly, it's a bit weird that we're installing here in scratch. Why not in $HOME? I get doing projects on scratch.

Contributor

per in-person conversation:

  • need to check the preferred location for libraries on longleaf & rockfish
  • maybe refer to that as $LIBDIR (or ACCLIBDIR or some such)
  • might want to move that as a generic variable to be set on the HPC, and if so - move that to the longleaf-setup directions (which could itself stand to be scriptified) and make that setup a prerequisite to this? (one downside to that would be other people on other HPCs wanting to use / modify this script - future problem?)
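A minimal sketch of the `$LIBDIR` idea, assuming an HPC-wide setup (for example, the longleaf-setup directions) exports `LIBDIR`, and the install script falls back to a cluster-specific default only when it is unset. The variable name comes from the list above; `resolve_libdir` is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical: prefer an HPC-wide $LIBDIR if the environment sets one,
# otherwise fall back to the cluster-specific default passed as $1.
resolve_libdir() {
    local default_dir="$1"
    echo "${LIBDIR:-$default_dir}"
}

# Example usage inside the cluster-specific branch:
# libdir=$( resolve_libdir "$USERDIR" )
```

The `${VAR:-default}` fallback keeps the script usable on HPCs that never adopt the shared setup, which softens the downside noted in the last bullet.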

Contributor Author

At least on the longleaf side, it looks like /users is similar to $HOME; the documentation states "Think of it as a capacity expansion to your home directory." However, I think the project directory should maybe be moved to /work, since that's high throughput and designed for active jobs. So my take is:

  • flepiMoP and flepimop-env stay in /users, especially for the conda env since that directory can get large and $HOME has some low and strict storage caps.
  • Move the project directory to /work since that'll actually need throughput for the job.

I still need to dig up the rockfish documentation. Longleaf docs: https://help.rc.unc.edu/getting-started-on-longleaf/#main-directory-spaces.
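For reference, a sketch of the two Longleaf locations under discussion. The `/users/<first letter>/<second letter>/<user>` layout is the one the script already builds with `awk`; here it is computed with bash substring expansion instead. The `/work/users/...` layout is an assumption based on the linked Longleaf docs and should be checked before relying on it. `longleaf_dirs` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the /users and /work directories for a Longleaf user.
# ${user:0:1} and ${user:1:1} are the first and second characters of the username,
# matching what the script computes with awk for USERO and USERN.
longleaf_dirs() {
    local user="$1"
    local first="${user:0:1}"
    local second="${user:1:1}"
    echo "/users/$first/$second/$user"
    # Assumption: /work mirrors the /users layout under /work/users.
    echo "/work/users/$first/$second/$user"
}

# Example: longleaf_dirs "$USER"
```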

Contributor

Thanks for these scripts - the install all worked great for me on longleaf. 🎉

I'm also open to having these things anywhere, but as @jcblemai said, I think having everything (including the flepimop libraries) in /work or /scratch makes the most sense, including the flepiMoP folder itself. I understand how installing these in /users or /home would be ideal if flepiMoP were stable, but from a practical perspective I am operationally often changing things within flepimop and reinstalling things run-to-run, playing with my own different environments, jumping between branches, or jumping between different FLEPI_PATHs (not ideal, but practically this is just what we've had to do with concurrent runs and changes). So for convenience it would be good to just have everything in the same place, imo.

Separately, based on my experience running things in the past, I was confused about having to link the specific location of the flepimop-env. I'm fine either way; I just don't follow the reason for the change.

Contributor

@saraloo is there a general class of the things you're changing?

Contributor Author

@TimothyWillard, Oct 18, 2024

> I would put everything in /scratch/ on rockfish (as per the current doc)

This has been the case since 9ca12ed. I see now that this was not the case for the $USERDIR variable; that's done now as well.

> and in /work/ on longleaf, for both convenience and speed.

This is done now. I think I am misinterpreting the docs (see https://help.rc.unc.edu/getting-started-on-longleaf/#main-directory-spaces) on the differences between /users and /work. @jcblemai what are the practical differences between the two? My interpretation was that /work was meant for high-IO, short-term storage for active work, whereas /users is designed for longer-term, lower-IO (read okay?) storage for libraries/codebases.

> I am operationally often changing things within flepimop and reinstalling things run-to-run, playing with my own different environments, jumping between branches, or jumping between different FLEPI_PATH

@saraloo is this normal operational behavior? If so, the installation script needs to be much more flexible. For the different environments, do you mean switching between multiple conda envs? What makes each of these envs distinct? As for jumping between branches, this script won't do anything to your flepiMoP clone, although you can switch the branch yourself and then run this script again to update the conda env with the code from that branch ("install" is a misnomer; it really should be "install or update". I'll change the script name and make sure this is clear when writing the documentation). Does that accommodate this use case? As for different $FLEPI_PATHs, this script checks whether this env var is set before doing anything, and if it is, it just uses the set value, so there should be no issue setting custom $FLEPI_PATHs. Have you tested this yet, and does it accommodate your use case?
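The `$FLEPI_PATH` behavior described above is the usual "keep the caller's value if set, otherwise use a default" pattern. A minimal sketch, equivalent to the script's `if [ -z "${FLEPI_PATH}" ]` block but written with parameter expansion; `default_flepi_path` is hypothetical:

```bash
#!/usr/bin/env bash
# Keep a caller-provided FLEPI_PATH; otherwise default to <userdir>/flepiMoP.
# ${FLEPI_PATH:-...} also treats an empty string as unset, like `-z` in the script.
default_flepi_path() {
    local userdir="$1"
    echo "${FLEPI_PATH:-$userdir/flepiMoP}"
}
```

For example, exporting `FLEPI_PATH` to point at a second clone before running the script makes the install use that clone instead of `$USERDIR/flepiMoP`.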

Contributor

Responding to both simultaneously. No, I don't think this is normal behavior, so feel free to make a judgement call on your end. It's just that in the past, during larger periods of development (which inevitably coincide with operational demands), I was running two or three different diseases on significantly different gempyor and/or R inference setups from different conda environments (again, I don't think this will necessarily be standard, especially now that more people can run stuff). I'm just flagging that there will be circumstances where flexibility is preferable, and I want to reduce the chance of someone setting the wrong flepiMoP version they're working on, and to reduce having to jump around to switch branches.
And sorry, haven't tested the FLEPI_PATH bit yet but that makes sense and I don't anticipate any issues there with setting that.

Collaborator

As we move to the new workflow Carl described today, we will have custom branches for runs, so you can envision someone running Flu and RSV from the same account but using two different flepiMoP branches. However, I think this flexibility can be added later with the pre-run scripts that are mentioned below.

Also, when running too many parallel runs we sometimes hit filesystem locks on the packages, which is always annoying, but I would not worry about it too much.

Re: Sara's questions: do we need to specify the location of the conda environment?

> @jcblemai what are the practical differences between the two? My interpretation was that /work was meant for high-IO, short-term storage for active work, whereas /users is designed for longer-term, lower-IO (read okay?) storage for libraries/codebases.

This is correct, but flepiMoP does not support writing to any folder other than the project one, so we work from /work.
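On the point about running Flu and RSV from the same account on different flepiMoP branches: one way that flexibility could later be layered on is a conda environment prefix per branch, next to the default `flepimop-env`. This is a sketch under that assumption only; the naming scheme and `env_prefix_for_branch` are hypothetical and not something this PR implements:

```bash
#!/usr/bin/env bash
# Hypothetical: derive a per-branch conda env prefix so that runs on different
# flepiMoP branches do not share (or overwrite) one environment.
env_prefix_for_branch() {
    local base_dir="$1"
    local branch="$2"
    # Branch names often contain '/', which would create nested directories.
    local safe_branch="${branch//\//_}"
    echo "$base_dir/flepimop-env-$safe_branch"
}

# Example: use the branch currently checked out in $FLEPI_PATH.
# branch=$( git -C "$FLEPI_PATH" rev-parse --abbrev-ref HEAD )
# conda env create --prefix "$( env_prefix_for_branch "$USERDIR" "$branch" )" --file "$USERDIR/environment.yml"
```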

    # Load required modules
    module purge
    module load gcc/9.3.0
    module load anaconda/2020.07
    module load git/2.42.0
else
    echo "The cluster name '$1' is not recognized, must be one of: 'longleaf', 'rockfish'."
    exit 1
fi

# Make sure the credentials file is where we expect and has the right permissions
if [ ! -f "$USERDIR/slack_credentials.sh" ]; then
    echo "You need to place sensitive credentials in '$USERDIR/slack_credentials.sh'."
    exit 1
fi
chmod 600 "$USERDIR/slack_credentials.sh"
source "$USERDIR/slack_credentials.sh"

# Ensure we have a $FLEPI_PATH
if [ -z "${FLEPI_PATH}" ]; then
    echo "An explicit \$FLEPI_PATH was not provided, setting to '$USERDIR/flepiMoP'."
    export FLEPI_PATH="$USERDIR/flepiMoP"
fi

# Test that flepiMoP is located there
if [ ! -d "$FLEPI_PATH" ]; then
    echo "Did not find flepiMoP at '$FLEPI_PATH', cloning on your behalf."
    git clone https://github.com/HopkinsIDD/flepiMoP.git $FLEPI_PATH
elif [ ! -d "$FLEPI_PATH/.git" ]; then
    echo "The flepiMoP found at '$FLEPI_PATH' is not a git clone, unsure of how to proceed."
    exit 1
fi

# Setup the conda environment
if [ ! -d "$USERDIR/flepimop-env" ]; then
    cat << EOF > $USERDIR/environment.yml
channels:
- conda-forge
- defaults
dependencies:
- python=3.10
- pip
- r-base>=4.4
- r-essentials
- r-devtools
- pyarrow=17.0.0
- r-arrow=17.0.0
# Manually specify this one because of the paths for libudunits2 on longleaf
- r-sf
# These packages are probably missing from the DESCRIPTION of the R packages
- r-optparse
- pip:
    - git+https://github.com/HopkinsIDD/flepiMoP.git#subdirectory=flepimop/gempyor_pkg
EOF
    conda env create --prefix $USERDIR/flepimop-env --file $USERDIR/environment.yml
fi
conda activate $USERDIR/flepimop-env
conda update --all

# Check the conda environment is valid
WHICH_PYTHON=$( which python )
WHICH_R=$( which R )
WHICH_PYTHON_OKAY=$( echo "$WHICH_PYTHON" | grep "flepimop-env" | wc -l )
WHICH_R_OKAY=$( echo "$WHICH_R" | grep "flepimop-env" | wc -l )
if [[ "$WHICH_PYTHON_OKAY" -ne 1 ]]; then
    echo "The python found is '$WHICH_PYTHON', which does not contain the expected 'flepimop-env'."
    exit 1
fi
if [[ "$WHICH_R_OKAY" -ne 1 ]]; then
    echo "The R found is '$WHICH_R', which does not contain the expected 'flepimop-env'."
    exit 1
fi
PYTHON_ARROW_VERSION=$( python -c "import pyarrow; print(pyarrow.__version__)" )
R_ARROW_VERSION=$( Rscript -e "cat(as.character(packageVersion('arrow')))" )
COMPATIBLE_ARROW_VERSION=$( echo "$R_ARROW_VERSION" | grep "$PYTHON_ARROW_VERSION" | wc -l )
if [[ "$COMPATIBLE_ARROW_VERSION" -ne 1 ]]; then
    echo "The R version of arrow is '$R_ARROW_VERSION' and the python version is '$PYTHON_ARROW_VERSION'. These may not be compatible versions."
fi

# Install the local R packages
INSTALL_R=$( mktemp )
cat << EOF > $INSTALL_R
library(devtools)
devtools::install_github('HopkinsIDD/flepiMoP', subdir='flepimop/R_packages/flepicommon')
devtools::install_github('HopkinsIDD/flepiMoP', subdir='flepimop/R_packages/flepiconfig')
devtools::install_github('HopkinsIDD/flepiMoP', subdir='flepimop/R_packages/inference')
EOF
Rscript $INSTALL_R
rm $INSTALL_R

# Set correct env vars
export FLEPI_STOCHASTIC_RUN=false
export FLEPI_RESET_CHIMERICS=TRUE
export TODAY=`date --rfc-3339='date'`

echo -n "Please set a project path (relative to '$USERDIR'): "
read PROJECT_PATH
export PROJECT_PATH="$USERDIR/$PROJECT_PATH"

echo -n "Please set a config path (relative to '$PROJECT_PATH'): "
read CONFIG_PATH
export CONFIG_PATH="$PROJECT_PATH/$CONFIG_PATH"

echo -n "Please set a validation date (today is $TODAY): "
read VALIDATION_DATE

echo -n "Please set a resume location: "
read RESUME_LOCATION

echo -n "Please set a flepi run index: "
read FLEPI_RUN_INDEX

# Done
cat << EOM
> The HPC install script has successfully finished.

If you are testing whether this worked (say, when installing for the first time), you can use the inference example from the \`flepimop_sample\` repository:
\`\`\`bash
cd \$PROJECT_PATH
Rscript \$FLEPI_PATH/flepimop/main_scripts/inference_main.R -c \$CONFIG_PATH -j 1 -n 1 -k 1
\`\`\`
Just make sure to \`rm -r model_output\` after running.

Otherwise, make sure this diagnostic info looks correct before continuing:
* Cluster: $1
* User directory: $USERDIR
* Flepi path: $FLEPI_PATH
* Project path: $PROJECT_PATH
* Python: $WHICH_PYTHON
* R: $WHICH_R
* Python arrow: $PYTHON_ARROW_VERSION
* R arrow: $R_ARROW_VERSION
* Stochastic run: $FLEPI_STOCHASTIC_RUN
* Reset chimerics: $FLEPI_RESET_CHIMERICS
* Today: $TODAY
* Config path: $CONFIG_PATH
* Validation date: $VALIDATION_DATE
* Resume location: $RESUME_LOCATION
* Flepi run index: $FLEPI_RUN_INDEX
EOM
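The arrow compatibility check in `build/hpc_install.sh` uses `grep`, so it passes whenever the Python version string appears anywhere inside the R one, and `.` is treated as a regex wildcard. For example, `echo 17.0.0.1 | grep 7.0.0` matches. A stricter sketch, assuming "compatible" means the first three version components agree (the R arrow package can carry a fourth component, such as `17.0.0.1`); `same_arrow_release` is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if two version strings share the same
# major.minor.patch, ignoring any extra trailing components.
same_arrow_release() {
    local py_version="$1"
    local r_version="$2"
    local py_release r_release
    # cut keeps the first three dot-separated fields.
    py_release=$( echo "$py_version" | cut -d. -f1-3 )
    r_release=$( echo "$r_version" | cut -d. -f1-3 )
    [ "$py_release" = "$r_release" ]
}

# Example, reusing the variables the script already sets:
# if ! same_arrow_release "$PYTHON_ARROW_VERSION" "$R_ARROW_VERSION"; then
#     echo "Python arrow $PYTHON_ARROW_VERSION and R arrow $R_ARROW_VERSION differ."
# fi
```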