Generic HPC Install Script #329
Merged
71 commits
aa0c077 Copy slurm_init.sh to slurm_init_longleaf.sh (TimothyWillard)
22cb130 Restore slurm_init.sh (TimothyWillard)
90e29b2 Merge branch 'copy-file' into GH-191/longleaf-batch-submission (TimothyWillard)
c1d54ef Added UNC Longleaf Specific Init/Prerun Scripts (TimothyWillard)
e238d66 Draft implementation of HPC install script (TimothyWillard)
2436640 Minor changes to `hpc_install.sh` (TimothyWillard)
bba583a Added slurm `--partition` flag to inference script (TimothyWillard)
2c8c952 Initial pass at HPC install on rockfish (TimothyWillard)
194d2e1 Remove longleaf specific slurm scripts (TimothyWillard)
f76d68a Remove `--partion` flag (TimothyWillard)
8af3698 Minor updates to `hpc_install.sh` (TimothyWillard)
23d9de1 initial tweaks to make flepimop-inference-* runnable (pearsonca)
9ac9cf0 further install scripts fixes (pearsonca)
d32a58c fix reinvocation of inference-slot (pearsonca)
9f2c085 initial installation for ubuntu re-org (pearsonca)
e2bc41f updates addressing use of installed r scripts (pearsonca)
3095ae4 README revs (pearsonca)
1b33dc3 add arrow installation (pearsonca)
ec0c479 Switch R pkg install to use `build/setup.R` (TimothyWillard)
e754364 Add `$WORKDIR` to `hpc_install.R` (TimothyWillard)
5e4f398 Add missing flepi path arg to setup.R (TimothyWillard)
0d9f813 Force pin arrow version between python and R (TimothyWillard)
9ca12ed Change rockfish default directories (TimothyWillard)
f78ee75 Split `hpc_install.sh` into init and install (TimothyWillard)
6d69186 Use `devtools::install` in `setup.R` (TimothyWillard)
7adbdfa Unset error exit around R pkg install (TimothyWillard)
ae6666f Add `set +e` an exit to `flepi_init.sh` (TimothyWillard)
f3fb1a7 `install_cli` installs to conda bin (TimothyWillard)
e443626 Remove init call from install script (TimothyWillard)
480bfd7 Remove old version restrictions, add `optparse` (TimothyWillard)
385eeb5 Move R deps install into conda environment (TimothyWillard)
f0571ae Readd inference CLI install (TimothyWillard)
53f2f49 Update example command to use installed CLI (TimothyWillard)
ec7978d Manually install `covidcast` package (TimothyWillard)
2e6dfca Downgrade r-base dependency to 4.3 (TimothyWillard)
42259a2 Remove symlinks if exists on reinstall (TimothyWillard)
aed8442 Script to generate `environment.yml` (TimothyWillard)
a4631db Add dnachun to channels, add r-sf dependency (TimothyWillard)
b0685c5 GitHub action to generate `environment.yml` (TimothyWillard)
a9e9f47 Remove unneeded comment (TimothyWillard)
9acda2c Merge main into GH-191/auto-generate-environment.yml (TimothyWillard)
ca10838 GitHub action checkout with attached head (TimothyWillard)
6ad753d Update environment.yml (TimothyWillard)
27a80f7 Make clear source of `environment.yml` commit (TimothyWillard)
bacf6f4 Use premade `environment.yml` in `hpc_install.sh` (TimothyWillard)
c613f09 Remove `setup.R` and `install_ubuntu.sh` (TimothyWillard)
8ca2f58 Restore `README.md` (TimothyWillard)
cbcdfbd Bug fix to only remove files present (TimothyWillard)
3b33ab1 Add spaces for style in `build/hpc_install.sh` (TimothyWillard)
a9c3415 Merge main into GH-191/longleaf-batch-submission (TimothyWillard)
04c533b Merge main into GH-191/longleaf-batch-submission (TimothyWillard)
039d311 Move everything on longleaf into `/work` (TimothyWillard)
37e1405 Rename install script to clarify uses (TimothyWillard)
b1c6e46 Change exec of hpc install script (TimothyWillard)
384c219 Change longleaf dirs in init script (TimothyWillard)
96df05a Move rockfish userdir (TimothyWillard)
cbcfff2 Change default python to 3.11 (TimothyWillard)
8fa9c82 Merge main into GH-191/longleaf-batch-submission (TimothyWillard)
e1811b6 Make flepi path/conda configurable (TimothyWillard)
52edb16 Update `environment.yml` via GitHub action (TimothyWillard)
22484f6 Formatting of flepi path/conda inputs (TimothyWillard)
1f3987b Use `realpath` to make format file paths (TimothyWillard)
8ba4474 Remove python/R name check (TimothyWillard)
24988cd Remove whitespace from `README.md` (TimothyWillard)
216ee1f Add missing slurm module for rockfish (TimothyWillard)
139a8b2 Add `--editable` to `gempyor` install (TimothyWillard)
b73b5ab Cleanup error handling (TimothyWillard)
70ced08 Update conda env to use `~/.conda` (TimothyWillard)
b34f9ab Remove `--force-reinstall` from pip install (TimothyWillard)
8d8042a Minor typo (TimothyWillard)
86b5cd6 Merge main into GH-191/longleaf-batch-submission (TimothyWillard)
@@ -0,0 +1,157 @@
# Cluster specific setup
if [[ $1 == "longleaf" ]]; then
    # Setup general purpose user variables needed for Longleaf
    USERO=$( echo $USER | awk '{ print substr($0, 1, 1) }' )
    USERN=$( echo $USER | awk '{ print substr($0, 2, 1) }' )
    USERDIR="/users/$USERO/$USERN/$USER"
    cd $USERDIR

    # Load required modules
    module purge
    module load gcc/9.1.0
    module load anaconda/2023.03
    module load git
elif [[ $1 == "rockfish" ]]; then
    # Setup general purpose user variables needed for RockFish
    USERDIR="/scratch4/struelo1/flepimop-code/$USER/"

    # Load required modules
    module purge
    module load gcc/9.3.0
    module load anaconda/2020.07
    module load git/2.42.0
else
    echo "The cluster name '$1' is not recognized, must be one of: 'longleaf', 'rockfish'."
    exit 1
fi

# Make sure the credentials file is where we expect and has the right perms
if [ ! -f "$USERDIR/slack_credentials.sh" ]; then
    echo "You need to place sensitive credentials in '$USERDIR/slack_credentials.sh'."
    exit 1
fi
chmod 600 $USERDIR/slack_credentials.sh
source $USERDIR/slack_credentials.sh

# Ensure we have a $FLEPI_PATH
if [ -z "${FLEPI_PATH}" ]; then
    echo "An explicit \$FLEPI_PATH was not provided, setting to '$USERDIR/flepiMoP'."
    export FLEPI_PATH="$USERDIR/flepiMoP"
fi

# Test that flepiMoP is located there
if [ ! -d "$FLEPI_PATH" ]; then
    echo "Did not find flepiMoP at '$FLEPI_PATH', cloning on your behalf."
    git clone https://github.com/HopkinsIDD/flepiMoP.git $FLEPI_PATH
elif [ ! -d "$FLEPI_PATH/.git" ]; then
    echo "The flepiMoP found at '$FLEPI_PATH' is not a git clone, unsure of how to proceed."
    exit 1
fi

# Setup the conda environment
if [ ! -d "$USERDIR/flepimop-env" ]; then
    cat << EOF > $USERDIR/environment.yml
channels:
- conda-forge
- defaults
dependencies:
- python=3.10
- pip
- r-base>=4.4
- r-essentials
- r-devtools
- pyarrow=17.0.0
- r-arrow=17.0.0
# Manually specify this one because of the paths for libudunits2 on longleaf
- r-sf
# These packages are probably missing from the DESCRIPTION of the R packages
- r-optparse
- pip:
  - git+https://github.com/HopkinsIDD/flepiMoP.git#subdirectory=flepimop/gempyor_pkg
EOF
    conda env create --prefix $USERDIR/flepimop-env --file $USERDIR/environment.yml
fi
conda activate $USERDIR/flepimop-env
conda update --all

# Check the conda environment is valid
WHICH_PYTHON=$( which python )
WHICH_R=$( which R )
WHICH_PYTHON_OKAY=$( echo "$WHICH_PYTHON" | grep "flepimop-env" | wc -l )
WHICH_R_OKAY=$( echo "$WHICH_R" | grep "flepimop-env" | wc -l )
if [[ "$WHICH_PYTHON_OKAY" -ne 1 ]]; then
    echo "The python found is '$WHICH_PYTHON', which does not contain the expected 'flepimop-env'."
    exit 1
fi
if [[ "$WHICH_R_OKAY" -ne 1 ]]; then
    echo "The R found is '$WHICH_R', which does not contain the expected 'flepimop-env'."
    exit 1
fi
PYTHON_ARROW_VERSION=$( python -c "import pyarrow; print(pyarrow.__version__)" )
R_ARROW_VERSION=$( Rscript -e "cat(as.character(packageVersion('arrow')))" )
COMPATIBLE_ARROW_VERSION=$( echo "$R_ARROW_VERSION" | grep "$PYTHON_ARROW_VERSION" | wc -l )
if [[ "$COMPATIBLE_ARROW_VERSION" -ne 1 ]]; then
    echo "The R version of arrow is '$R_ARROW_VERSION' and the python version is '$PYTHON_ARROW_VERSION'. These may not be compatible versions."
fi

# Install the local R packages
INSTALL_R=$( mktemp )
cat << EOF > $INSTALL_R
library(devtools)
devtools::install_github('HopkinsIDD/flepiMoP', subdir='flepimop/R_packages/flepicommon')
devtools::install_github('HopkinsIDD/flepiMoP', subdir='flepimop/R_packages/flepiconfig')
devtools::install_github('HopkinsIDD/flepiMoP', subdir='flepimop/R_packages/inference')
EOF
Rscript $INSTALL_R
rm $INSTALL_R

# Set correct env vars
export FLEPI_STOCHASTIC_RUN=false
export FLEPI_RESET_CHIMERICS=TRUE
export TODAY=`date --rfc-3339='date'`

echo -n "Please set a project path (relative to '$USERDIR'): "
read PROJECT_PATH
export PROJECT_PATH="$USERDIR/$PROJECT_PATH"

echo -n "Please set a config path (relative to '$PROJECT_PATH'): "
read CONFIG_PATH
export CONFIG_PATH="$PROJECT_PATH/$CONFIG_PATH"

echo -n "Please set a validation date (today is $TODAY): "
read VALIDATION_DATE

echo -n "Please set a resume location: "
read RESUME_LOCATION

echo -n "Please set a flepi run index: "
read FLEPI_RUN_INDEX

# Done
cat << EOM
> The HPC install script has successfully finished.

If you are testing if this worked, say installing for the first time, you can use the inference example from the \`flepimop_sample\` repository:
\`\`\`bash
cd \$PROJECT_PATH
Rscript \$FLEPI_PATH/flepimop/main_scripts/inference_main.R -c \$CONFIG_PATH -j 1 -n 1 -k 1
\`\`\`
Just make sure to \`rm -r model_output\` after running.

Otherwise make sure this diagnostic info looks correct before continuing:

* Cluster: $1
* User directory: $USERDIR
* Flepi path: $FLEPI_PATH
* Project path: $PROJECT_PATH
* Python: $WHICH_PYTHON
* R: $WHICH_R
* Python arrow: $PYTHON_ARROW_VERSION
* R arrow: $R_ARROW_VERSION
* Stochastic run: $FLEPI_STOCHASTIC_RUN
* Reset chimerics: $FLEPI_RESET_CHIMERICS
* Today: $TODAY
* Config path: $CONFIG_PATH
* Validation date: $VALIDATION_DATE
* Resume location: $RESUME_LOCATION
* Flepi run index: $FLEPI_RUN_INDEX
EOM
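The arrow check in the script pipes the R version through `grep "$PYTHON_ARROW_VERSION"`, so each dot in the Python version acts as a regex wildcard. A minimal sketch of a stricter comparison follows, assuming the intent is that the R version either equals the Python version or extends it with one more dotted component (for example `17.0.0.1`). `arrow_versions_match` is a hypothetical helper name and is not part of this PR.

```bash
#!/usr/bin/env bash

# Hypothetical helper (not in the PR): succeed when the R arrow version is
# the same as the Python arrow version, or is that version followed by a
# further dotted component.
arrow_versions_match() {
    local py_version="$1" r_version="$2"
    # The quoted part is matched literally; only the trailing .* is a glob.
    [[ "$r_version" == "$py_version" || "$r_version" == "$py_version".* ]]
}

# Example use with the variable names from the script; only a warning is
# printed on a mismatch, as in the original check.
PYTHON_ARROW_VERSION="17.0.0"
R_ARROW_VERSION="17.0.0.1"
if ! arrow_versions_match "$PYTHON_ARROW_VERSION" "$R_ARROW_VERSION"; then
    echo "The R version of arrow is '$R_ARROW_VERSION' and the python version is '$PYTHON_ARROW_VERSION'. These may not be compatible versions."
fi
```

Whether an exact-prefix match is the right notion of compatibility is a judgment call; comparing only the major component would be a looser alternative.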
need to cd to USERDIR as well here?
and if we do, several of the $USERDIRs below can/must be eliminated
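One way to address the two comments above would be to compute `$USERDIR` in a single place and `cd` once after the cluster branch. A minimal sketch, assuming the directory layout shown in the diff; `cluster_userdir` is a hypothetical helper name, and the trailing slash on the Rockfish path is dropped here for consistency:

```bash
#!/usr/bin/env bash

# Hypothetical helper (not in the PR): print the per-user directory for a
# given cluster name and user name, using the paths from the diff.
cluster_userdir() {
    local cluster="$1" user="$2"
    case "$cluster" in
        longleaf)
            # Longleaf nests users under their first and second letters.
            printf '/users/%s/%s/%s\n' "${user:0:1}" "${user:1:1}" "$user"
            ;;
        rockfish)
            printf '/scratch4/struelo1/flepimop-code/%s\n' "$user"
            ;;
        *)
            echo "The cluster name '$cluster' is not recognized, must be one of: 'longleaf', 'rockfish'." >&2
            return 1
            ;;
    esac
}
```

In the script this could be called as `USERDIR=$( cluster_userdir "$1" "$USER" ) || exit 1`, followed by a single `cd "$USERDIR"` that applies to both clusters.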
Could add creating some HPC-wide environment variables to the longleaf-setup repo. Does that make sense to pair with this?
Lastly ... bit weird that we're doing the install here in scratch. Why not in $HOME? I get doing projects on scratch.
per in-person conversation:
At least on the longleaf side it looks like `/users` is similar to `$HOME`; the documentation states "Think of it as a capacity expansion to your home directory." However, I think maybe the project directory should be moved to `/work` since that's high throughput and designed for active jobs. So my take is: `flepiMoP` and `flepimop-env` stay in `/users`, especially for the conda env since that directory can get large and `$HOME` has some low and strict storage caps. The project directory moves to `/work` since that'll actually need throughput for the job.
I still need to dig up the rockfish documentation. Longleaf docs: https://help.rc.unc.edu/getting-started-on-longleaf/#main-directory-spaces.
Thanks for these scripts - the install all worked great for me on longleaf. 🎉
I'm also open to having these things anywhere, but as @jcblemai said I think having everything (including the flepimop libraries) in /work or /scratch makes the most sense, including the flepiMoP folder itself. I understand how installing these in /users or /home would be ideal if flepiMoP was stable, but from a practical perspective I am operationally often changing things within flepimop and reinstalling things run-to-run, playing with my own different environments, jumping between branches, or jumping between different `FLEPI_PATH`s (not ideal, but practically this is just what we've had to do with concurrent runs and changes). So for convenience it would be good to just have everything in the same place, imo.
Separately, given my experience running stuff in the past, I was confused by having to link the specific location of the flepimop-env. I'm fine either way, I just don't think I follow why the change was made.
@saraloo is there a general class of the things you're changing?
This has been the case since 9ca12ed.
I see, this is not the case for the `$USERDIR` variable, done now as well.
This is done now. I think I am misinterpreting the docs (see https://help.rc.unc.edu/getting-started-on-longleaf/#main-directory-spaces) on the differences between `/users` and `/work`. @jcblemai what are the practical differences between the two? My interpretation was that `/work` was meant for high IO short term storage for active work whereas `/users` is designed for longer term lower IO (read okay?) storage for libraries/codebases.
@saraloo is this normal operational behavior? This sounds like the installation script needs to be much more accommodating to flexibility if this is the case. For the different environments, do you mean switching between multiple conda envs? What makes each of these envs distinct? As far as jumping branches, this script won't do anything to your `flepiMoP` clone, although you can switch the branch yourself and then run this script again to update the conda env with the code from that branch ("install" is a misnomer, it really should be "install or update", I'll change the script name and make sure this is clear when writing the documentation). Does that accommodate this use case? As for different `$FLEPI_PATH`s, this script checks if this env var is set before doing anything, and if it is just uses the set value, so there should be no issue setting custom `$FLEPI_PATH`s. Have you tested this yet and does it accommodate your use case?
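The `$FLEPI_PATH` behavior described in this reply (use the caller's value if set, otherwise fall back to a default under `$USERDIR`) can be sketched with a shell default expansion. This is a minimal illustration under that assumption, not the script's actual code; `resolve_flepi_path` is a hypothetical helper name.

```bash
#!/usr/bin/env bash

# Hypothetical helper (not in the PR): keep a caller-provided FLEPI_PATH,
# otherwise default to a flepiMoP clone under the given user directory.
resolve_flepi_path() {
    local userdir="$1"
    if [ -z "${FLEPI_PATH}" ]; then
        echo "An explicit \$FLEPI_PATH was not provided, setting to '$userdir/flepiMoP'."
    fi
    # ${VAR:-default} expands to the default only when VAR is unset or empty.
    export FLEPI_PATH="${FLEPI_PATH:-$userdir/flepiMoP}"
}
```

Under this sketch, a user working from a custom clone would run `export FLEPI_PATH=/path/to/their/clone` before invoking the script, and the default would never be applied.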
Responding to both simultaneously. No, I don't think this is normal behavior so feel free to make a judgement call on your end. Just in the past, during larger periods of development which inevitably coincide with operational demands, I was running two or three different diseases on significantly different gempyor and/or R inference setups from different conda environments (again, don't think this will necessarily be standard, especially now that more people can run stuff). Just flagging that there will be circumstances where flexibility is preferable, and I want to reduce the possibility of someone setting the wrong flepimop version they're working on or something, or reduce having to jump around etc. to switch branches.
And sorry, haven't tested the FLEPI_PATH bit yet but that makes sense and I don't anticipate any issues there with setting that.
As we move to the new workflow Carl described today, we will have custom branches for runs, so you can envision someone running Flu and RSV from the same account but using two different flepiMoP branches. However, I think this flexibility can be added later with the pre-run scripts that are mentioned below.
Also, sometimes when running too many parallel runs we can hit filesystem locks on the packages, which is always annoying, but I would not worry about it too much.
Re: Sara's question: do we need to specify the location of the conda environment?
This is correct, but flepiMoP does not support writing to any folder other than the project one, so we work from `/work`.