Skip to content

Commit

Permalink
Merge branch 'main' into pstjohn/changes-for-azure-test-suite
Browse files Browse the repository at this point in the history
  • Loading branch information
pstjohn authored Jan 7, 2025
2 parents d6b0cec + 5eddee1 commit b1e2e64
Show file tree
Hide file tree
Showing 18 changed files with 522 additions and 101 deletions.
59 changes: 24 additions & 35 deletions .github/pull_request_template.md
Original file line number Diff line number Diff line change
@@ -1,45 +1,34 @@
(**NOTE:** _**delete** these instructional lines as you fill-out this PR template_)
### Description
<!-- Provide a detailed description of the changes in this PR -->

(**NOTE:** _template is designed to be filled-in and used as the **squashed commit message for the entire PR**. _Italicized text_ is intended to be deleted as you fill in this template. Use the text between the `---`)
### Type of changes
<!-- Mark the relevant option with an [x] -->

---
- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Refactor
- [ ] Documentation update
- [ ] Other (please describe):

_High level summary of changes. Try to keep this as short and informative as possible: less is more._
### CI Pipeline Configuration
Configure CI behavior by checking relevant boxes below. This will automatically apply labels.

_Describe your changes. You can be more detailed and descriptive here. If it is a code change, Be sure to answer:_
- _What is changing?_
- _What is the new or fixed functionality?_
- _Why or when would someone want to use these changes?_
- _How can someone use these changes?_
---
- [ ] [SKIP_CI](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#skip_ci) - Skip all continuous integration tests
- [ ] [INCLUDE_NOTEBOOKS_TESTS](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#include_notebooks_tests) - Execute notebook validation tests in pytest

## Summary
_High level summary of changes. Try to keep this as short and informative as possible: less is more._
> [!NOTE]
> By default, the notebooks validation tests are skipped unless explicitly enabled.
## Details
_Describe your changes. You can be more detailed and descriptive here._

## Usage
_How does a user interact with the changed code?_
### Usage
<!--- How does a user interact with the changed code -->
```python
python -m your.new.module -and -all -options
```

## Testing
_How do you prove that your code behaves the way you claim?_

Tests for these changes can be run via:
```shell
pytest -v tests/your/new/or/existing/test_functions.py::test_function
TODO: Add code snippet
```

### Pre-submit Checklist
<!--- Ensure all items are completed before submitting -->

(**NOTE:** _also **delete** this checklist as you fill-out this PR template_)

**Most of the changes** to files with extensions `*.py`, `*.yaml`, `*.yml`, `Dockerfile*` or `requirements.txt` **DO REQUIRE both `pytest-` and `jet-` CI stages**.

- [ ] Did you review the [Before your PR is "Ready for review" section](https://github.com/NVIDIA/bionemo-framework/-/blob/dev/CONTRIBUTING.md?ref_type=heads#before-pr-ready) before asking for review?
- [ ] Did you make sure your changes have tests? Did you test your changes locally?
- [ ] Can you add [the `SKIP_CI` label](https://github.com/NVIDIA/bionemo-framework/-/blob/dev/CONTRIBUTING.md?ref_type=heads#skip-ci) to your PR?
- [ ] Can you add [the `PYTEST_NOT_REQUIRED` label](https://github.com/NVIDIA/bionemo-framework/-/blob/dev/CONTRIBUTING.md?ref_type=heads#skip-pytest) to your PR?
- [ ] Can you add [the `JET_NOT_REQUIRED` label](https://github.com/NVIDIA/bionemo-framework/-/blob/dev/CONTRIBUTING.md?ref_type=heads#skip-jet) to your PR?
- [ ] I have tested these changes locally
- [ ] I have updated the documentation accordingly
- [ ] I have added/updated tests as needed
- [ ] All existing tests pass successfully
48 changes: 48 additions & 0 deletions .github/workflows/pr-labels.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,48 @@
name: PR Label Management

on:
pull_request:
types: [opened, edited, synchronize]

jobs:
manage-labels:
runs-on: ubuntu-latest
permissions:
pull-requests: write

steps:
- name: Check PR body and manage labels
uses: actions/github-script@v6
with:
script: |
const prBody = context.payload.pull_request.body;
const labelChecks = {
'SKIP_CI': /\[x\]\s*\[SKIP_CI\]/i,
'INCLUDE_NOTEBOOKS_TESTS': /\[x\]\s*\[INCLUDE_NOTEBOOKS_TESTS\]/i
};
const currentLabels = new Set(
context.payload.pull_request.labels.map(label => label.name)
);
for (const [label, pattern] of Object.entries(labelChecks)) {
const shouldHaveLabel = pattern.test(prBody);
const hasLabel = currentLabels.has(label);
if (shouldHaveLabel && !hasLabel) {
await github.rest.issues.addLabels({
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: context.payload.pull_request.number,
labels: [label]
});
} else if (!shouldHaveLabel && hasLabel) {
await github.rest.issues.removeLabel({
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: context.payload.pull_request.number,
name: label
});
}
}
54 changes: 54 additions & 0 deletions ci/benchmarks/partial-conv/esm2_pretrain.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,54 @@
scope: partial-conv
time_limit: 14400
script_args:
# All arguments referenced in the script string must be specified here.
# Arguments not referenced in the script string must have the 'arg' field specified.
# See jet/core/configs.py for the specification of the configuration class
workspace:
value: /workspace/bionemo2
key_segment: False
data_path:
value: /data/20240809_uniref_2024_03/data
key_segment: False
model:
value: esm2
variant:
value: train
config_name:
value: 650M
precision:
value: [bf16-mixed]
nodes:
value: [4]
gpus:
value: 8
batch_size:
value: 16
max_steps:
value: 26500
script: |-
WANDB_API_KEY=$BIONEMO_WANDB_API_KEY ${variant}_${model} \
--train-cluster-path=${data_path}/train_clusters.parquet \
--train-database-path=${data_path}/train.db \
--valid-cluster-path=${data_path}/valid_clusters.parquet \
--valid-database-path=${data_path}/validation.db \
--micro-batch-size=${batch_size} \
--num-nodes=${nodes} \
--num-gpus=${gpus} \
--val-check-interval=1000 \
--limit-val-batches=1 \
--num-steps=${max_steps} \
--min-seq-length=1024 \
--max-seq-length=1024 \
--num-layers=33 \
--hidden-size=1280 \
--num-attention-heads=20 \
--ffn-hidden-size=5120 \
--create-tensorboard-logger \
--experiment-name=${batch_size}bs_${nodes}node_${gpus}gpu_${max_steps}s_${precision}prec \
--result-dir=${tensorboard_dir} \
--wandb-project=${wandb_project_name} \
--wandb-group=${model}_${variant}_${config_name} \
--wandb-job-type=${pipeline_label} \
--log-every-n-steps=50 \
--disable-checkpointing;
65 changes: 65 additions & 0 deletions ci/benchmarks/perf/esm2_pretrain.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,65 @@
scope: perf
time_limit: 1800
script_args:
# All arguments referenced in the script string must be specified here.
# Arguments not referenced in the script string must have the 'arg' field specified.
# See jet/core/configs.py for the specification of the configuration class
workspace:
value: /workspace/bionemo2
key_segment: False
data_path:
value: /data/20240809_uniref_2024_03/data
key_segment: False
model: esm2
variant: train
config_name: 650M
precision: bf16-mixed
max_steps: 200
gpus: 8
acc_grad: 1
products:
- nodes: 1
batch_size: 16
pp: 1
tp: 1
- nodes: 2
batch_size: 16
pp: 2
tp: 1
- nodes: 2
batch_size: 16
pp: 1
tp: 2
- nodes: 2
batch_size: 16
pp: 1
tp: 1
script: |-
WANDB_API_KEY=$BIONEMO_WANDB_API_KEY ${variant}_${model} \
--train-cluster-path=${data_path}/train_clusters.parquet \
--train-database-path=${data_path}/train.db \
--valid-cluster-path=${data_path}/valid_clusters.parquet \
--valid-database-path=${data_path}/validation.db \
--micro-batch-size=${batch_size} \
--num-nodes=${nodes} \
--num-gpus=${gpus} \
--val-check-interval=50 \
--limit-val-batches=1 \
--num-steps=${max_steps} \
--min-seq-length=1024 \
--max-seq-length=1024 \
--num-layers=33 \
--hidden-size=1280 \
--num-attention-heads=20 \
--ffn-hidden-size=5120 \
--create-tensorboard-logger \
--experiment-name=${batch_size}bs_${nodes}node_${gpus}gpu_${max_steps}s_${precision}prec \
--result-dir=${tensorboard_dir} \
--wandb-project=${wandb_project_name} \
--wandb-group=${model}_${variant}_${config_name} \
--wandb-job-type=${pipeline_label} \
--log-every-n-steps=10 \
--accumulate-grad-batches=${acc_grad} \
--pipeline-model-parallel-size=${pp} \
--tensor-model-parallel-size={tp} \
--disable-checkpointing;
113 changes: 95 additions & 18 deletions ci/scripts/run_pytest.sh
Original file line number Diff line number Diff line change
Expand Up @@ -15,30 +15,107 @@
# See the License for the specific language governing permissions and
# limitations under the License.

set -xueo pipefail
export PYTHONDONTWRITEBYTECODE=1
# NOTE: if a non-nvidia user wants to run the test suite, just run `export BIONEMO_DATA_SOURCE=ngc` prior to this call.
export BIONEMO_DATA_SOURCE="${BIONEMO_DATA_SOURCE:-pbss}"
# flexible GPU memory management, reducing the risk of fragmentation-related CUDA OOM
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
source "$(dirname "$0")/utils.sh"

if ! set_bionemo_home; then
exit 1
fi

python -m coverage erase
# Enable strict mode with better error handling
set -ueo pipefail

# Function to display usage information
usage() {
cat << EOF
Usage: $(basename "$0") [OPTIONS]
Options:
--skip-docs Skip running tests in the docs directory
--no-nbval Skip jupyter notebook validation tests
Note: Documentation tests (docs/) are only run when notebook validation
is enabled (--no-nbval not set) and docs are not skipped
(--skip-docs not set)
-h, --help Display this help message
EOF
exit "${1:-0}"
}

# Function to handle cleanup on script exit
cleanup() {
local exit_code=$?
[ -n "${coverage_files[*]:-}" ] && rm -f "${coverage_files[@]:-}"
exit "$exit_code"
}

# Set default environment variables
: "${BIONEMO_DATA_SOURCE:=pbss}"
: "${PYTHONDONTWRITEBYTECODE:=1}"
: "${PYTORCH_CUDA_ALLOC_CONF:=expandable_segments:True}"

# Export necessary environment variables
export BIONEMO_DATA_SOURCE PYTHONDONTWRITEBYTECODE PYTORCH_CUDA_ALLOC_CONF

# Initialize variables
declare -a coverage_files
SKIP_DOCS=false
NO_NBVAL=false
error=false
for dir in docs/ ./sub-packages/bionemo-*/; do

# Parse command line arguments
while (( $# > 0 )); do
case "$1" in
--skip-docs) SKIP_DOCS=true ;;
--no-nbval) NO_NBVAL=true ;;
-h|--help) usage ;;
*) echo "Unknown option: $1" >&2; usage 1 ;;
esac
shift
done

# Set up trap for cleanup
trap cleanup EXIT

# Source utility functions
SCRIPT_DIR="$(dirname "$(readlink -f "$0")")"
source "$SCRIPT_DIR/utils.sh" || { echo "Failed to source utils.sh" >&2; exit 1; }

# Set up BioNeMo home directory
set_bionemo_home || exit 1

# Clear previous coverage data
python -m coverage erase

# Set up pytest options
PYTEST_OPTIONS=(
-v
--durations=0
--durations-min=60.0
)
[[ "$NO_NBVAL" != true ]] && PYTEST_OPTIONS+=(--nbval-lax)

# Define test directories
TEST_DIRS=(./sub-packages/bionemo-*/)
if [[ "$NO_NBVAL" != true && "$SKIP_DOCS" != true ]]; then
TEST_DIRS+=(docs/)
fi

echo "Test directories: ${TEST_DIRS[*]}"

# Run tests with coverage
for dir in "${TEST_DIRS[@]}"; do
echo "Running pytest in $dir"
python -m coverage run --parallel-mode --source=bionemo \
-m pytest -v --nbval-lax --durations=0 --durations-min=60.0 "$dir" || error=true
coverage_file=".coverage.${dir//\//_}"
coverage_files+=("$coverage_file")

if ! python -m coverage run \
--parallel-mode \
--source=bionemo \
--data-file="$coverage_file" \
-m pytest "${PYTEST_OPTIONS[@]}" "$dir"; then
error=true
fi
done

# Combine and report coverage
python -m coverage combine
python -m coverage report --show-missing

if [ "$error" = true ]; then
exit 1
fi
# Exit with appropriate status
$error && exit 1
exit 0
Empty file removed docs/docs/docs.todo
Empty file.
20 changes: 19 additions & 1 deletion docs/docs/user-guide/contributing/contributing.md
Original file line number Diff line number Diff line change
Expand Up @@ -64,7 +64,7 @@ repository (unless external constraints prevent it).

## Pull Request (PR) Guidelines

### Labeling Your PR
### Labeling Your PR as External Contributor

If you are an external contributor (not an NVIDIA employee), please add the `contribution` label to your PR before submitting. Labels can be accessed in the right sidebar of the GitHub user interface when creating or editing a PR.

Expand Down Expand Up @@ -135,6 +135,24 @@ If you are an external contributor (not an NVIDIA employee), please add the `con
redistributed consistent with this project or the open source license(s) involved.
```
### CI Pipeline Configuration Controls
CI pipeline behavior can be controlled via checkboxes in PR descriptions to optimize test execution:
Key behaviors:
- Controls processed automatically on PR submit/update
- Labels applied based on checkbox status
- Invalid combinations default to most restrictive option
#### **SKIP_CI**
- Skips entire CI pipeline
- Use for documentation typos, README updates
#### **INCLUDE_NOTEBOOKS_TESTS**
- Enables notebook validation tests
- Use when modifying notebooks or notebook-related code
- Disabled by default
### Developer workflows:
You should always carefully test your changes. Run `pytest ...` in your container locally. All tests are done via `pytest`.
Expand Down
Loading

0 comments on commit b1e2e64

Please sign in to comment.