Welcome to this repository! This repo contains Python scripts and WDLs designed to be used in conjunction with Terra and the Terra Data Repository (TDR).
- Python scripts: Automate tasks such as ingesting metadata, managing files in TDR, and interacting with Terra workspaces.
- WDLs: Wrap these Python scripts for execution in Terra workflows.
- This repo does not include the direct setup of Terra or TDR environments.
- It does not manage external dependencies for running workflows unless done through the provided Docker image or WDL.
For more detailed information, please refer to CONTRIBUTING.md.
If you're new to this repository or any of these features, follow the guide here to get started writing and testing your first workflow.
- Clone the repository:

  ```shell
  git clone <repository-url>
  cd <repository-directory>
  ```
- Next, write your Python code. To get tokens and interact with Terra and/or TDR, see our template Python script. It has a lot of the set-up required for writing a script to interact with these resources. Additionally, a good deal of functionality to interact with Terra and TDR can be found in the utilities directory. See Terra utilities here and different types of TDR utilities located in the TDR utilities directory.
- Now it's time for your WDL file. In this repository, most WDL files are simply "wrappers" for the Python code: they don't contain much logic on their own and are mostly one-task workflows that call out to Python scripts. You can see multiple examples here. As a general rule, most of these WDLs can be copied to create a new workflow, changing just the workflow name, the inputs, and the Python script that's called; the inputs to your workflow are the same as the inputs required by your Python script. Additionally, the same Docker image can generally be used for all workflows (`us-central1-docker.pkg.dev/operations-portal-427515/ops-toolbox/ops_terra_utils_slim:latest`). If your Python script contains optional inputs or flags, there are examples here for optional inputs and examples here for flags (the flags are passed in as Booleans to the WDL).
- Once you have your Python script and WDL code ready, you can either test locally, or publish your workflow to Dockstore and run it via Terra. To run just your Python script locally, see the running locally directions. To run your entire workflow locally (including the WDL), there are some options outlined here. To publish in Dockstore and test via Terra, go through the following steps:
- Update the .dockstore.yml to include your new workflow. The syntax should remain the same; just swap in the path to your new WDL file and the name of your new workflow.
- Push all your local changes to your remote feature branch in GitHub. Once all your changes are pushed, navigate to the GitHub action that builds the Docker image and, in the "Run workflow" dropdown, select your feature branch and click the green "Run workflow" button. This will create the Docker image with your new Python script so it's available to use.
- Next, navigate to Dockstore (make sure you're logged in) and, under the "More" dropdown with the gear icon, select "Discover Existing Dockstore Workflows". This may take a minute, but it will re-sync Dockstore to make your new WDL available. Once it finishes, click on the "Unpublished" tab and find your new workflow. Select the "Versions" tab, make sure your feature branch is selected, then select "Publish". Note that Dockstore won't allow you to publish your workflow if your WDL has syntax errors; in that case, you'll see them reported in the "Files" tab.
- Once your workflow is registered, navigate to Terra and either open an existing workspace or create a new one. Import your workflow and launch it! It will run via Terra and you'll see its status reported. You can debug using the links to the execution buckets (all logs, stdout, and stderr files will be located in the execution directories).
- To run a Python script locally, you will need all required dependencies installed. This will be some subset, if not all, of requirements.txt.

  ```shell
  python python/script_name.py --arg1 value --arg2 value
  ```
- Alternatively, if you're using Docker:

  ```shell
  docker run -v $(pwd):/app us-central1-docker.pkg.dev/operations-portal-427515/ops-toolbox/ops_terra_utils_slim:latest python /app/script_name.py --arg1 value --arg2 value
  ```
`pre-commit` is enabled on this repository.
- Install the `pre-commit` package using one of the following two options:
  - via pip:

    ```shell
    pip install pre-commit
    ```
  - via homebrew:

    ```shell
    brew install pre-commit
    ```
- Set up `pre-commit` locally so that it runs on each commit automatically by running:

  ```shell
  pre-commit install
  ```
  - Note, if you run into the following error when running the installation command: `Cowardly refusing to install hooks with core.hooksPath set.`, then try running the following two commands and re-attempt `pre-commit install`:

    ```shell
    git config --unset-all core.hooksPath
    git config --global --unset-all core.hooksPath
    ```
- The hooks that are automatically set to run on each commit are configured in the `.pre-commit.yml`
- To add more hooks, browse available hooks here
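For orientation only, here is what a pre-commit configuration file generally looks like. The hook ids below are real, commonly used pre-commit hooks, but this fragment is a generic sketch, not this repository's actual configuration:

```yaml
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.6.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.10.0
    hooks:
      - id: mypy
```

Each `repo` entry points at a repository of hooks pinned to a `rev`, and each `id` names a hook from that repository to run on every commit.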
- After you've installed `pre-commit`, all hooks will run automatically on each file you've changed in your feature branch.
  - To override the checks and commit directly, run:

    ```shell
    git commit -m "YOUR COMMIT MESSAGE" --no-verify
    ```
  - The only time you may want to use the `--no-verify` flag is if your commit is WIP and you just need to commit something for testing.
- There are linting checks currently configured for this repository, so committing without fully formatting or type annotating a file may result in failed GitHub actions.
`mypy` is one of the checks run as part of the current `pre-commit` configuration. Sometimes it's super helpful, but occasionally it's just wrong. If you have line(s) failing on `mypy` errors, try the following:
- If `mypy` isn't already installed, install it using pip:

  ```shell
  pip install mypy
  ```
- If you're not able to troubleshoot with the information automatically output by `pre-commit`, you can run:

  ```shell
  mypy --ignore-missing-imports {PATH TO YOUR FILE}
  ```
  There are lots of additional options for running `mypy` if this still isn't helpful. You can see them all by running `mypy -h`, or by looking through their documentation.
- The part in the brackets (such as `[assignment]`, `[misc]`, etc.) is the error code. If you think `mypy` is throwing an error incorrectly on a given line, you can ignore that error using the following syntax in your code on the line where the check is failing. In the following example, the error code is an `assignment` error; you can swap this out for whatever error code `mypy` is reporting for that line.

  ```python
  def my_function(x):  # type: ignore[assignment]
      return x
  ```
- Because `pre-commit` will automatically reformat and make other adjustments (due to some of the configurations currently set up in the configuration yml), you may run `git commit` and end up with 0 files tracked for commit. You may have to re-run your exact commit command to add the tracked files. If all automatic re-formatting has run successfully, you'll see a message like `2 files changed, 52 insertions(+), 1 deletion(-)`.
  - At this point, you can run `git push` to push your local files to your remote branch.
- For hooks that do not automatically reformat your file (for example the hook that checks for type annotations), you will have to modify your file and address the missing annotations before you'll be able to add those files as tracked files ready for commit. Once the annotations have been fixed, you can re-run the exact same commit command and you'll see the same message as mentioned above.
  - At this point, you can run `git push` to push your local files to your remote branch.
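As a quick illustration of the kind of change the annotation hook asks for (the function below is a made-up example, not code from this repo), adding type annotations looks like:

```python
# Before: no annotations, so the type-annotation hook (and mypy) will flag this.
def add_prefix(name, prefix="tdr"):
    return f"{prefix}_{name}"


# After: parameters and the return value are annotated, which satisfies the hook.
def add_prefix_annotated(name: str, prefix: str = "tdr") -> str:
    return f"{prefix}_{name}"
```

The two functions behave identically; the annotations only document and enforce the expected types.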
For more setup details, see CONTRIBUTING.md.