From 8b660919dc16e37e358d69a3774e60a38d5f8aa9 Mon Sep 17 00:00:00 2001
From: Ralph Castain
Date: Thu, 15 Jun 2023 22:32:12 -0600
Subject: [PATCH] Revamp prterun show_help files to integrate with Sphinx

This commit represents a major reorganization and revamp of prterun's
show_help files. The overriding purpose is to allow downstream
projects -- such as MPI implementations -- to include PRRTE's Sphinx
(.rst) documentation in their own.

This entailed the following:

* The existing "show_help" .txt files are used by the existing
  "--help" CLI system. It was important to preserve this CLI
  capability.
* However, we wanted this same content to also appear in various man
  pages (and therefore also in the HTML documentation). Hence, we
  moved the existing "show_help" .txt files to a new location:
  src/docs/ (vs. docs/, where the main PRRTE documentation lives).
* After various experiments, we determined that we just could not
  have both the main PRRTE Sphinx docs and this new class of
  show_help/man-page-content combination docs in the same Sphinx docs
  tree: the structure and layout with which these two sets of docs
  must be rendered are quite different. Instead, as described below,
  we took a different approach: "building blocks" of content that can
  be assembled in different ways and ultimately rendered with the
  structure and layout required for each environment.
* src/docs/ contains 2 directories:
  * prrte-rst-content: this directory contains a README.txt file that
    both PRRTE docs authors and downstream consumers of the RST
    content should read. It explains how to author/maintain the docs
    in that directory, how to integrate the contents of this
    directory into downstream project RST docroots, and how to
    include the content. Essentially, this directory is the main
    source of prterun CLI documentation content. Each topic is split
    into its own file (see the README for more info) so that it can
    be RST-included in other .rst files as relevant.
    Think of these files as individual building blocks of content
    that can be assembled in multiple different ways. The text
    content in this directory is mostly content from previous
    show_help-style text files: it was moved into this directory,
    split into individual .rst files, and then converted to RST
    syntax. As a side benefit, some of the tables are now nicer and
    easier to maintain.
  * show-help-files: the contents of this directory are rendered into
    what ultimately become the "show_help" text files. The .rst files
    in this directory effectively RST-include the files from the
    prrte-rst-content directory, and Sphinx renders them as INI-style
    "show_help" text files (i.e., suitable for reading by the
    prte_show_help() subsystem).
* src/docs/prrte-rst-content is also RST-included in the top-level
  docs/ .rst files (via a symlink). That allows the same "building
  blocks" of documentation content to be included in both the
  show_help files and the HTML/man pages.
* Note that distribution tarballs will include pre-built HTML docs,
  nroff man pages, and INI-style "show_help" text files -- end users
  do not need to have Sphinx available. In git clones, none of the
  Sphinx-generated content is available (i.e., HTML docs, nroff man
  pages, show_help text files), but Sphinx is still not required
  (although it *is* required to "make dist"). If Sphinx is not
  available when "configure" is invoked, dummy INI-style "show_help"
  text files are generated that basically say "You don't have
  Sphinx, so you don't get help here".

That is the overall scheme.

There were many other minor points included in this work, including
(but not limited to):

* Moving existing explanations of hosts and hostfiles -- and
  expanding on them -- to the main docs.
* Moving explanations of relative indexing to the main docs.
* Moving explanations of notifications to the main docs.
* Moving the listing of deprecated CLI options to the main docs.
* Moving explanations of diagnostics to the main docs.
* Moving most process placement explanations and examples to the main
  docs.
* Moving explanations of the session directory to the main docs.
* Adding psched.1 and pterm.1 man pages. More work needs to be done
  here.
* Slightly improving the prte.1, prte_info.1, prted.1, prterun.1, and
  prun.1 man pages. More work needs to be done here.

It is expected that the text content of all the docs will continue to
be improved over time. The focus of this commit was not to make the
content perfect, but rather to create a first-generation
infrastructure for having a single source of documentation content
that can be rendered by Sphinx in 3 ways (HTML docs, nroff man pages,
and show_help files), and also included in downstream RST
documentation.

Signed-off-by: Ralph Castain
Signed-off-by: Jeff Squyres
---
 .gitignore | 6 +-
 Makefile.prte-rules | 8 +
 autogen.pl | 10 +-
 configure.ac | 14 +-
 docs/Makefile.am | 19 +-
 docs/conf.py | 22 +-
 docs/hosts/cli.rst | 36 +
 docs/hosts/hostfiles.rst | 41 +
 docs/hosts/index.rst | 19 +
 docs/hosts/relative-indexing.rst | 113 ++
 docs/hosts/rm.rst | 9 +
 docs/index.rst | 22 +-
 docs/man/man1/index.rst | 2 +
 docs/man/man1/prte.1.rst | 29 +-
 docs/man/man1/prte_info.1.rst | 30 +-
 docs/man/man1/prted.1.rst | 32 +-
 docs/man/man1/prterun.1.rst | 23 +-
 docs/man/man1/prun.1.rst | 33 +-
 docs/man/man1/psched.1.rst | 29 +
 docs/man/man1/pterm.1.rst | 29 +
 docs/notifications.rst | 96 ++
 docs/placement/deprecated.rst | 141 +++
 docs/placement/diagnostics.rst | 77 ++
 docs/placement/examples.rst | 420 +++++++
 docs/placement/fundamentals.rst | 142 +++
 docs/placement/index.rst | 22 +
 docs/placement/limits.rst | 198 ++++
 docs/placement/overview.rst | 190 ++++
 docs/placement/rankfiles.rst | 71 ++
 docs/prrte-rst-content | 1 +
 docs/session-directory.rst | 54 +
 src/Makefile.am | 2 +
 src/docs/Makefile.am | 15 +
 src/docs/prrte-rst-content/Makefile.am | 63 +
 src/docs/prrte-rst-content/README.txt | 132 +++
 src/docs/prrte-rst-content/cli-add-host.rst | 68 ++
.../prrte-rst-content/cli-add-hostfile.rst | 49 + .../cli-allow-run-as-root.rst | 31 + src/docs/prrte-rst-content/cli-bind-to.rst | 80 ++ src/docs/prrte-rst-content/cli-dash-host.rst | 27 + .../cli-debug-daemons-file.rst | 20 + .../prrte-rst-content/cli-debug-daemons.rst | 14 + src/docs/prrte-rst-content/cli-display.rst | 50 + .../prrte-rst-content/cli-dvm-hostfile.rst | 85 ++ src/docs/prrte-rst-content/cli-dvm.rst | 56 + .../prrte-rst-content/cli-forward-signals.rst | 15 + .../cli-launcher-hostfile.rst | 49 + .../cli-leave-session-attached.rst | 16 + src/docs/prrte-rst-content/cli-map-by.rst | 136 +++ src/docs/prrte-rst-content/cli-noprefix.rst | 16 + src/docs/prrte-rst-content/cli-output.rst | 52 + .../prrte-rst-content/cli-personality.rst | 18 + src/docs/prrte-rst-content/cli-pmixmca.rst | 15 + src/docs/prrte-rst-content/cli-prefix.rst | 17 + src/docs/prrte-rst-content/cli-prtemca.rst | 15 + src/docs/prrte-rst-content/cli-rank-by.rst | 63 + .../prrte-rst-content/cli-runtime-options.rst | 164 +++ .../cli-stream-buffering.rst | 16 + src/docs/prrte-rst-content/cli-tune.rst | 25 + src/docs/prrte-rst-content/cli-x.rst | 20 + .../deprecated-bind-to-core.rst | 17 + .../deprecated-display-allocation.rst | 17 + .../deprecated-display-devel-allocation.rst | 18 + .../deprecated-display-devel-map.rst | 18 + .../deprecated-display-map.rst | 6 + .../deprecated-display-topo.rst | 18 + .../prrte-rst-content/deprecated-gmca.rst | 28 + src/docs/prrte-rst-content/deprecated-mca.rst | 26 + .../deprecated-merge-stderr-to-stdout.rst | 17 + .../deprecated-output-directory.rst | 23 + .../deprecated-output-filename.rst | 22 + .../deprecated-report-bindings.rst | 17 + .../deprecated-tag-output.rst | 17 + .../deprecated-timestamp-output.rst | 17 + src/docs/prrte-rst-content/deprecated-xml.rst | 17 + .../prrte-rst-content/prterun-all-cli.rst | 135 +++ .../prterun-all-deprecated.rst | 85 ++ src/docs/show-help-files/.gitignore | 3 + src/docs/show-help-files/Makefile.am | 108 ++ 
.../show-help-files/build-dummy-ini-files.py | 76 ++ src/docs/show-help-files/conf.py | 1 + .../show-help-files/help-schizo-pinfo.rst | 126 ++ src/docs/show-help-files/help-schizo-prte.rst | 336 ++++++ .../show-help-files/help-schizo-prted.rst | 180 +++ .../show-help-files/help-schizo-prterun.rst | 675 +++++++++++ src/docs/show-help-files/help-schizo-prun.rst | 532 +++++++++ .../show-help-files/help-schizo-pterm.rst | 126 ++ src/docs/show-help-files/index.rst | 31 + src/docs/show-help-files/prrte-rst-content | 1 + src/docs/show-help-files/requirements.txt | 1 + src/hwloc/help-prte-hwloc-base.txt | 69 +- src/mca/rmaps/base/help-prte-rmaps-base.txt | 1009 +---------------- src/mca/schizo/base/Makefile.am | 6 +- src/mca/schizo/base/help-schizo-base.txt | 10 + src/mca/schizo/base/help-schizo-cli.txt | 813 ------------- .../schizo/base/help-schizo-deprecated.txt | 157 --- src/mca/schizo/ompi/Makefile.am | 18 +- src/mca/schizo/ompi/help-schizo-ompi.txt | 640 ----------- src/mca/schizo/ompi/schizo-ompi-cli.rst | 39 + src/mca/schizo/prte/Makefile.am | 9 +- src/mca/schizo/prte/help-schizo-pinfo.txt | 76 -- src/mca/schizo/prte/help-schizo-prte.txt | 192 ---- src/mca/schizo/prte/help-schizo-prted.txt | 99 -- src/mca/schizo/prte/help-schizo-prterun.txt | 418 ------- src/mca/schizo/prte/help-schizo-prun.txt | 327 ------ src/mca/schizo/prte/help-schizo-pterm.txt | 75 -- 106 files changed, 5919 insertions(+), 3999 deletions(-) create mode 100644 docs/hosts/cli.rst create mode 100644 docs/hosts/hostfiles.rst create mode 100644 docs/hosts/index.rst create mode 100644 docs/hosts/relative-indexing.rst create mode 100644 docs/hosts/rm.rst create mode 100644 docs/man/man1/psched.1.rst create mode 100644 docs/man/man1/pterm.1.rst create mode 100644 docs/notifications.rst create mode 100644 docs/placement/deprecated.rst create mode 100644 docs/placement/diagnostics.rst create mode 100644 docs/placement/examples.rst create mode 100644 docs/placement/fundamentals.rst create mode 100644 
docs/placement/index.rst create mode 100644 docs/placement/limits.rst create mode 100644 docs/placement/overview.rst create mode 100644 docs/placement/rankfiles.rst create mode 120000 docs/prrte-rst-content create mode 100644 docs/session-directory.rst create mode 100644 src/docs/Makefile.am create mode 100644 src/docs/prrte-rst-content/Makefile.am create mode 100644 src/docs/prrte-rst-content/README.txt create mode 100644 src/docs/prrte-rst-content/cli-add-host.rst create mode 100644 src/docs/prrte-rst-content/cli-add-hostfile.rst create mode 100644 src/docs/prrte-rst-content/cli-allow-run-as-root.rst create mode 100644 src/docs/prrte-rst-content/cli-bind-to.rst create mode 100644 src/docs/prrte-rst-content/cli-dash-host.rst create mode 100644 src/docs/prrte-rst-content/cli-debug-daemons-file.rst create mode 100644 src/docs/prrte-rst-content/cli-debug-daemons.rst create mode 100644 src/docs/prrte-rst-content/cli-display.rst create mode 100644 src/docs/prrte-rst-content/cli-dvm-hostfile.rst create mode 100644 src/docs/prrte-rst-content/cli-dvm.rst create mode 100644 src/docs/prrte-rst-content/cli-forward-signals.rst create mode 100644 src/docs/prrte-rst-content/cli-launcher-hostfile.rst create mode 100644 src/docs/prrte-rst-content/cli-leave-session-attached.rst create mode 100644 src/docs/prrte-rst-content/cli-map-by.rst create mode 100644 src/docs/prrte-rst-content/cli-noprefix.rst create mode 100644 src/docs/prrte-rst-content/cli-output.rst create mode 100644 src/docs/prrte-rst-content/cli-personality.rst create mode 100644 src/docs/prrte-rst-content/cli-pmixmca.rst create mode 100644 src/docs/prrte-rst-content/cli-prefix.rst create mode 100644 src/docs/prrte-rst-content/cli-prtemca.rst create mode 100644 src/docs/prrte-rst-content/cli-rank-by.rst create mode 100644 src/docs/prrte-rst-content/cli-runtime-options.rst create mode 100644 src/docs/prrte-rst-content/cli-stream-buffering.rst create mode 100644 src/docs/prrte-rst-content/cli-tune.rst create mode 100644 
src/docs/prrte-rst-content/cli-x.rst create mode 100644 src/docs/prrte-rst-content/deprecated-bind-to-core.rst create mode 100644 src/docs/prrte-rst-content/deprecated-display-allocation.rst create mode 100644 src/docs/prrte-rst-content/deprecated-display-devel-allocation.rst create mode 100644 src/docs/prrte-rst-content/deprecated-display-devel-map.rst create mode 100644 src/docs/prrte-rst-content/deprecated-display-map.rst create mode 100644 src/docs/prrte-rst-content/deprecated-display-topo.rst create mode 100644 src/docs/prrte-rst-content/deprecated-gmca.rst create mode 100644 src/docs/prrte-rst-content/deprecated-mca.rst create mode 100644 src/docs/prrte-rst-content/deprecated-merge-stderr-to-stdout.rst create mode 100644 src/docs/prrte-rst-content/deprecated-output-directory.rst create mode 100644 src/docs/prrte-rst-content/deprecated-output-filename.rst create mode 100644 src/docs/prrte-rst-content/deprecated-report-bindings.rst create mode 100644 src/docs/prrte-rst-content/deprecated-tag-output.rst create mode 100644 src/docs/prrte-rst-content/deprecated-timestamp-output.rst create mode 100644 src/docs/prrte-rst-content/deprecated-xml.rst create mode 100644 src/docs/prrte-rst-content/prterun-all-cli.rst create mode 100644 src/docs/prrte-rst-content/prterun-all-deprecated.rst create mode 100644 src/docs/show-help-files/.gitignore create mode 100644 src/docs/show-help-files/Makefile.am create mode 100755 src/docs/show-help-files/build-dummy-ini-files.py create mode 120000 src/docs/show-help-files/conf.py create mode 100644 src/docs/show-help-files/help-schizo-pinfo.rst create mode 100644 src/docs/show-help-files/help-schizo-prte.rst create mode 100644 src/docs/show-help-files/help-schizo-prted.rst create mode 100644 src/docs/show-help-files/help-schizo-prterun.rst create mode 100644 src/docs/show-help-files/help-schizo-prun.rst create mode 100644 src/docs/show-help-files/help-schizo-pterm.rst create mode 100644 src/docs/show-help-files/index.rst create mode 
120000 src/docs/show-help-files/prrte-rst-content create mode 120000 src/docs/show-help-files/requirements.txt delete mode 100644 src/mca/schizo/base/help-schizo-cli.txt delete mode 100644 src/mca/schizo/base/help-schizo-deprecated.txt delete mode 100644 src/mca/schizo/ompi/help-schizo-ompi.txt create mode 100644 src/mca/schizo/ompi/schizo-ompi-cli.rst delete mode 100644 src/mca/schizo/prte/help-schizo-pinfo.txt delete mode 100644 src/mca/schizo/prte/help-schizo-prte.txt delete mode 100644 src/mca/schizo/prte/help-schizo-prted.txt delete mode 100644 src/mca/schizo/prte/help-schizo-prterun.txt delete mode 100644 src/mca/schizo/prte/help-schizo-prun.txt delete mode 100644 src/mca/schizo/prte/help-schizo-pterm.txt diff --git a/.gitignore b/.gitignore index ba2c363103..f53bf4e4b6 100644 --- a/.gitignore +++ b/.gitignore @@ -71,7 +71,6 @@ vc70.pdb .hgignore_local stamp-h? AUTHORS -docs-venv/ ar-lib ylwrap @@ -190,7 +189,12 @@ docs/_build docs/_static docs/_static/css/custom.css docs/_templates +src/docs/_build +src/docs/mca/mca.rst +src/docs/mca/help*rst +__pycache__ # Common Python virtual environment directory names venv py?? 
+docs-venv diff --git a/Makefile.prte-rules b/Makefile.prte-rules index 1a39b5956e..eb50a1855b 100644 --- a/Makefile.prte-rules +++ b/Makefile.prte-rules @@ -21,3 +21,11 @@ prte__v_SPHINX_HTML_0 = @echo " GENERATE HTML docs"; PRTE_V_SPHINX_MAN = $(prte__v_SPHINX_MAN_$V) prte__v_SPHINX_MAN_ = $(prte__v_SPHINX_MAN_$AM_DEFAULT_VERBOSITY) prte__v_SPHINX_MAN_0 = @echo " GENERATE man pages"; + +PRTE_V_TXT = $(prte__v_SPHINX_TXT_$V) +prte__v_TXT_ = $(prte__v_SPHINX_TXT_$AM_DEFAULT_VERBOSITY) +prte__v_TXT_0 = @echo " GENERATE text files"; + +PRTE_V_LN_S = $(prte__v_LN_S_$V) +prte__v_LN_S_ = $(prte__v_LN_S_$AM_DEFAULT_VERBOSITY) +prte__v_LN_S_0 = @echo " LN_S " `basename $@`; diff --git a/autogen.pl b/autogen.pl index d601b4f308..8667914dfe 100755 --- a/autogen.pl +++ b/autogen.pl @@ -8,8 +8,9 @@ # Copyright (c) 2015 Research Organization for Information Science # and Technology (RIST). All rights reserved. # Copyright (c) 2015 IBM Corporation. All rights reserved. -# # Copyright (c) 2021-2023 Nanook Consulting. All rights reserved. +# Copyright (c) 2023 Jeffrey M. Squyres. All rights reserved. +# # $COPYRIGHT$ # # Additional copyrights may follow @@ -146,6 +147,13 @@ sub mca_process_component { verbose " Found configure.m4 file\n"; } + # Does this directory have a + # help-{framework}-{component}.rst file? + if (-f "$cdir/help-$framework-$component.rst") { + $found_component->{"rst"} = 1; + verbose " Found help-$framework-$component.rst file\n"; + } + # Push the results onto the $mca_found hash array push(@{$mca_found->{$framework}->{"components"}}, $found_component); diff --git a/configure.ac b/configure.ac index 2fa39ca6d0..f513641428 100644 --- a/configure.ac +++ b/configure.ac @@ -27,6 +27,7 @@ # All Rights reserved. # Copyright (c) 2021-2023 Nanook Consulting. All rights reserved. # Copyright (c) 2021 FUJITSU LIMITED. All rights reserved. +# Copyright (c) 2023 Jeffrey M. Squyres. All rights reserved. 
# $COPYRIGHT$ # # Additional copyrights may follow @@ -57,6 +58,14 @@ AC_CONFIG_MACRO_DIR(config) OAC_PUSH_PREFIX([PRTE]) +# Get the absolute version of the srcdir. We don't use "readlink -f", +# because that unfortunately isn't portable (cough cough macOS cough +# cough). +save=$(pwd) +cd $srcdir +abs_srcdir=$(pwd) +cd $save + # autotools expects to perform tests without interference # from user-provided CFLAGS, particularly -Werror flags. # Search for them here and cache any we find @@ -705,7 +714,7 @@ AC_INCLUDES_DEFAULT #endif]) # -# Setup HTML and man page processing +# Setup Sphinx processing # OAC_SETUP_SPHINX([$srcdir/docs/_build/html/index.html], []) @@ -991,6 +1000,9 @@ AC_CONFIG_FILES([ include/Makefile include/prte_version.h docs/Makefile + src/docs/Makefile + src/docs/show-help-files/Makefile + src/docs/prrte-rst-content/Makefile ]) PRTE_CONFIG_FILES diff --git a/docs/Makefile.am b/docs/Makefile.am index 99de2d570c..61bfb0c2b2 100644 --- a/docs/Makefile.am +++ b/docs/Makefile.am @@ -1,5 +1,6 @@ # # Copyright (c) 2022-2023 Cisco Systems, Inc. All rights reserved. +# Copyright (c) 2023 Jeffrey M. Squyres. All rights reserved. # # Copyright (c) 2023 Nanook Consulting. All rights reserved. # $COPYRIGHT$ @@ -33,20 +34,20 @@ SPHINX_OPTS ?= -W --keep-going -j auto # Note: it is significantly more convenient to list all the source # files here using wildcards (vs. listing every single .rst file). # However, it is necessary to list $(srcdir) when using wildcards. 
-TEXT_SOURCE_FILES = # if you have any .txt files, list them here -IMAGE_SOURCE_FILES = # if you have any, list them here -RST_SOURCE_FILES = \ +RST_SOURCE_FILES = \ $(srcdir)/*.rst \ + $(srcdir)/prrte-rst-content/*.rst \ + $(srcdir)/placement/*.rst \ + $(srcdir)/hosts/*.rst \ $(srcdir)/developers/*.rst \ $(srcdir)/man/*.rst \ $(srcdir)/man/man1/*.rst \ $(srcdir)/man/man5/*.rst -EXTRA_DIST = \ +EXTRA_DIST = \ requirements.txt \ + _templates/configurator.html \ $(SPHINX_CONFIG) \ - $(TEXT_SOURCE_FILES) \ - $(IMAGE_SOURCE_FILES) \ $(RST_SOURCE_FILES) ########################################################################### @@ -63,7 +64,9 @@ PRTE_MAN1 = \ prte_info.1 \ prted.1 \ prterun.1 \ - prun.1 + prun.1 \ + psched.1 \ + pterm.1 PRTE_MAN5 = \ prte.5 @@ -76,6 +79,8 @@ PRTE_MAN1_BUILT = $(PRTE_MAN1:%.1=$(MAN_OUTDIR)/%.1) PRTE_MAN5_RST = $(PRTE_MAN5:%.5=man/man5/%.5.rst) PRTE_MAN5_BUILT = $(PRTE_MAN5:%.5=$(MAN_OUTDIR)/%.5) +#------------------- + EXTRA_DIST += \ $(PRTE_MAN1_BUILT) \ $(PRTE_MAN5_BUILT) diff --git a/docs/conf.py b/docs/conf.py index bbfe495220..1a76db467f 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -10,12 +10,13 @@ # add these directories to sys.path here. If the directory is relative to the # documentation root, use os.path.abspath to make it absolute, like shown here. # -# import os -# import sys -# sys.path.insert(0, os.path.abspath('.')) +import os +import sys # -- Project information ----------------------------------------------------- +needs_sphinx = '4.2' + import datetime year = datetime.datetime.now().year @@ -25,8 +26,15 @@ # The full version, including alpha/beta/rc tags # Read the PRRTE version from the VERSION file -with open("../VERSION") as fp: - prte_lines = fp.readlines() +# Note: this conf file lives in 2 different directories, so find the +# VERSION file relative to where we're running right now. 
+prte_lines = None +for dir in ['..', '../../..']: + file = f'{dir}/VERSION' + if os.path.exists(file): + with open(file) as fp: + prte_lines = fp.readlines() + break prte_data = dict() for prte_line in prte_lines: @@ -171,6 +179,10 @@ tuple = (full_filename_without_rst, base_name, '', '', section) man_pages.append(tuple) +# -- Options for TEXT output ------------------------------------------------ + +text_sectionchars = '=-$#@!`' + # -- PRRTE-specific options ----------------------------------------------- # This prolog is included in every file. Put common stuff here. diff --git a/docs/hosts/cli.rst b/docs/hosts/cli.rst new file mode 100644 index 0000000000..e7473440e8 --- /dev/null +++ b/docs/hosts/cli.rst @@ -0,0 +1,36 @@ +.. _hosts-cli-label: + +Listing Hosts on the Command Line +================================= + +Many PRRTE commands accept the ``--host`` CLI parameter. +``--host`` accepts a comma-delimited list of tokens of the form: + +.. code:: + + host[:slots] + +The ``host`` token can be either: + +* A name that resolves to an IP address, or +* An IP address + +.. note:: The names and/or IP addresses of hosts are *only* used for + identifying the target host on which to launch. They are + *not* used for determining which network interfaces are used + by applications (e.g., MPI or other network-based + applications). + + For network-based applications, consult their documentation + for how to specify which network interfaces are used. + +The optional integer ``:slots`` parameter tells PRRTE the maximum +number of slots to use on that host (:ref:`see this section +` for a description of what a +"slot" is). + +For example: + +.. code:: + + prterun --host node1:10,node2,node3:5 ... 
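A minimal Python sketch of the ``host[:slots]`` token grammar described in docs/hosts/cli.rst above may help readers of that section. It is not PRRTE's own parser. The function name ``parse_host_list`` is hypothetical, the sketch assumes host names or IPv4 addresses (an IPv6 address contains colons of its own and would need extra handling), and it returns ``None`` for the slot count when a token has no ``:slots`` suffix.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical helper, for illustration only (not PRRTE's parser).
_SLOTS_RE = re.compile(r"[0-9]+")


def parse_host_list(value: str) -> List[Tuple[str, Optional[int]]]:
    """Split a --host value such as "node1:10,node2" into (host, slots)
    pairs; slots is None when a token has no ":slots" suffix."""
    result: List[Tuple[str, Optional[int]]] = []
    for token in value.split(","):
        token = token.strip()
        if not token:
            continue  # ignore empty tokens from stray commas
        host, sep, slots = token.rpartition(":")
        if not sep:
            # No colon: the whole token is a host name or IPv4 address.
            result.append((token, None))
            continue
        if not host or not _SLOTS_RE.fullmatch(slots):
            raise ValueError(f"malformed host token: {token!r}")
        result.append((host, int(slots)))
    return result


if __name__ == "__main__":
    # Same value as the prterun example in cli.rst.
    print(parse_host_list("node1:10,node2,node3:5"))
```

Splitting on the last colon (``rpartition``) keeps the host portion intact, and leaving the slot count as ``None`` avoids guessing how PRRTE assigns slots to a host listed without one.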
diff --git a/docs/hosts/hostfiles.rst b/docs/hosts/hostfiles.rst new file mode 100644 index 0000000000..309c73fc0d --- /dev/null +++ b/docs/hosts/hostfiles.rst @@ -0,0 +1,41 @@ +Hostfiles +========= + +Hostfiles (sometimes called "machine files") are a combination of two +things: + +#. A listing of hosts on which to launch processes. +#. Optionally, limit the number of processes which can be launched on + each host. + +Syntax +------ + +Hostfile syntax consists of one node name on each line, optionally +including a designated number of "slots": + +.. code:: sh + + # This is a comment line, and will be ignored + node01 slots=10 + node13 slots=5 + + node15 + node16 + node17 slots=3 + ... + +Blank lines and lines beginning with a ``#`` are ignored. + +A "slot" is the PRRTE term for an allocatable unit where we can launch +a process. :ref:`See this section +` for a longer description of +slots. + +In the absence of the ``slot`` parameter, PRRTE will assign either the +number of slots to be the number of CPUs detected on the node or the +resource manager-assigned value if operating in the presence of an +RM. + +.. important:: If using a resource manager, the user-specified number + of slots is capped by the RM-assigned value. diff --git a/docs/hosts/index.rst b/docs/hosts/index.rst new file mode 100644 index 0000000000..aec4543b8c --- /dev/null +++ b/docs/hosts/index.rst @@ -0,0 +1,19 @@ +Host specification +================== + +PRRTE identifies hosts on which to launch processes by: + +#. A Resource Manager (RM) telling PRRTE which hosts to use +#. The user providing a hostfile (also sometimes called a "machine + file") +#. The user providing a list of hosts on the command line + +The sections below describe each of these in more detail. + +.. 
toctree:: + :maxdepth: 1 + + rm + hostfiles + cli + relative-indexing diff --git a/docs/hosts/relative-indexing.rst b/docs/hosts/relative-indexing.rst new file mode 100644 index 0000000000..3be6ccf27c --- /dev/null +++ b/docs/hosts/relative-indexing.rst @@ -0,0 +1,113 @@ +Relative host indexing +====================== + +Hostfile and ``--host`` specifications can also be made using relative +indexing. This allows a user to stipulate which hosts are to be used +for a given app context without specifying the particular host name, +but rather its relative position in the allocation. + +This can probably best be understood through consideration of a few +examples. Consider the case where a DVM is comprised of a set of nodes +named ``foo1``, ``foo2``, ``foo3``, ``foo4``. The user wants the first +app context to have exclusive use of the first two nodes, and a second +app context to use the last two nodes. Of course, the user could +printout the allocation to find the names of the nodes allocated to +them and then use ``--host`` to specify this layout, but this is +cumbersome and would require hand-manipulation for every invocation. + +A simpler method is to utilize PRRTE's relative indexing capability to +specify the desired layout. In this case, a command line containing: + +.. code:: + + --host +n1,+n2 ./app1 : --host +n3,+n4 ./app2 + +would provide the desired pattern. The ``+`` syntax indicates that the +information is being provided as a relative index into the existing +allocation. Two methods of relative indexing are supported: + +* ``+n#``: A relative index into the allocation referencing the ``#`` + node. PRRTE will substitute the ``#`` node in the allocation + +* ``+e[:#]``: A request for ``#`` empty nodes |mdash| i.e., PRRTE is + to substitute this reference with nodes that have not yet been used + by any other app_context. If the ``:#`` is not provided, PRRTE will + substitute the reference with all empty nodes. 
Note that PRRTE does + track the empty nodes that have been assigned in this manner, so + multiple uses of this option will result in assignment of unique + nodes up to the limit of the available empty nodes. Requests for + more empty nodes than are available will generate an error. + +Relative indexing can be combined with absolute naming of hosts in any +arbitrary manner, and can be used in hostfiles as well as with the +``--host`` command line option. In addition, any slot specification +provided in hostfiles will be respected |mdash| thus, a user can +specify that only a certain number of slots from a relative indexed +host are to be used for a given app context. + +Another example may help illustrate this point. Consider the case +where the user has a hostfile containing: + +.. code:: + + dummy1 slots=4 + dummy2 slots=4 + dummy3 slots=4 + dummy4 slots=4 + dummy5 slots=4 + +This may, for example, be a hostfile that describes a set of +commonly-used resources that the user wishes to execute applications +against. For this particular application, the user plans to map +byslot, and wants the first two ranks to be on the second node of any +allocation, the next ranks to land on an empty node, have one rank +specifically on ``dummy4``, the next rank to be on the second node of the +allocation again, and finally any remaining ranks to be on whatever +empty nodes are left. To accomplish this, the user provides a hostfile +of: + +.. code:: + + +n2 slots=2 + +e:1 + dummy4 slots=1 + +n2 + +e + +The user can now use this information in combination with PRRTE's +sequential mapper to obtain their specific layout: + +.. code:: + + --hostfile dummyhosts --hostfile mylayout --prtemca rmaps seq ./my_app + +which will result in: + +.. 
code:: + + rank0 being mapped to dummy3 + rank1 to dummy1 as the first empty node + rank2 to dummy4 + rank3 to dummy3 + rank4 to dummy2 and rank5 to dummy5 as the last remaining unused nodes + +Note that the sequential mapper ignores the number of slots arguments +as it only maps one rank at a time to each node in the list. + +If the default round-robin mapper had been used, then the mapping +would have resulted in: + +* ranks 0 and 1 being mapped to dummy3 since two slots were specified +* ranks 2-5 on dummy1 as the first empty node, which has four slots +* rank6 on dummy4 since the hostfile specifies only a single slot from + that node is to be used +* ranks 7 and 8 on dummy3 since only two slots remain available +* ranks 9-12 on dummy2 since it is the next available empty node and + has four slots +* ranks 13-16 on dummy5 since it is the last remaining unused node and + has four slots + +Thus, the use of relative indexing can allow for complex mappings to +be ported across allocations, including those obtained from automated +resource managers, without the need for manual manipulation of scripts +and/or command lines. diff --git a/docs/hosts/rm.rst b/docs/hosts/rm.rst new file mode 100644 index 0000000000..28c97b0d77 --- /dev/null +++ b/docs/hosts/rm.rst @@ -0,0 +1,9 @@ +Resource Manager-Provided Hosts +=============================== + +When launching under a Resource Manager (RM), the RM usually +picks which hosts |mdash| and how many processes can be launched on +each host |mdash| on a per-job basis. + +The RM will communicate this information to PRRTE directly; users can +simply omit specifying hosts or numbers of processes. diff --git a/docs/index.rst b/docs/index.rst index 15046e11d2..7217a6f04a 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -4,10 +4,20 @@ PMIx Reference Runtime Environment (PRRTE) |prte_ver| The project is formally referred to in documentation by "PRRTE", and the GitHub repository is ``prrte``. 
-However, we have found that most users do not like typing the two -consecutive ``r`` letters in the name. Hence, all of the internal API symbols, -environment variables, MCA frameworks, and CLI executables all use the -abbreviated ``prte`` (one ``r``, not two) for convenience. +.. note:: We have found that most users do not like typing the two + consecutive ``r`` letters in the name. Hence, all of the + internal API symbols, environment variables, MCA frameworks, + and CLI executables all use the abbreviated ``prte`` (one + ``r``, not two) for convenience. + +Documentation locations +======================= + +This documentation can be found in the following locations: + +* On the web: https://docs.prrte.org/ +* In the tarball: ``docs/_build/html/index.html`` +* Installed: ``$prefix/share/doc/prrte/html/index.html`` Table of contents ================= @@ -21,6 +31,10 @@ Table of contents getting-help install configuration + hosts/index + placement/index + notifications + session-directory resilience developers/index contributing diff --git a/docs/man/man1/index.rst b/docs/man/man1/index.rst index fecc6807cc..da27174695 100644 --- a/docs/man/man1/index.rst +++ b/docs/man/man1/index.rst @@ -9,3 +9,5 @@ Commands (section 1) prted.1.rst prterun.1.rst prun.1.rst + psched.1.rst + pterm.1.rst diff --git a/docs/man/man1/prte.1.rst b/docs/man/man1/prte.1.rst index c1e7d92bd1..a00461c735 100644 --- a/docs/man/man1/prte.1.rst +++ b/docs/man/man1/prte.1.rst @@ -3,38 +3,27 @@ prte ==== -.. one line summary of this command - -prte |mdash| do great things +prte |mdash| instantiate an instance of PRRTE DVM SYNOPSIS -------- -.. brief listing of all the CLI options - .. code:: sh - prte ...options... + shell$ prte ...options... DESCRIPTION ----------- -Full description of this command. - -.. admonition:: PRRTE Docs TODO - :class: error - - Need to write this man page. - -EXIT STATUS ------------ - -Description of the various exit statuses of this command. 
+``prte`` instantiates an instance of the PMIx Reference Run Time +Environment (PRRTE) distributed virtual machine (DVM). -EXAMPLES --------- +Extensive help documentation for this command is provided through +``prte --help [topic]``. -Examples of using this command. +At least for now, that content is not available in man page form. +Pull requests to add all the content (via repeatable infrastructure) +would be greatly appreciated. .. seealso:: :ref:`prterun(1) ` diff --git a/docs/man/man1/prte_info.1.rst b/docs/man/man1/prte_info.1.rst index acce4938d2..9eb027723e 100644 --- a/docs/man/man1/prte_info.1.rst +++ b/docs/man/man1/prte_info.1.rst @@ -3,15 +3,11 @@ prte_info ========= -.. one line summary of this command - -prte_info |mdash| do great things +prte_info |mdash| Provide detailed information on your PRRTE installation SYNOPSIS -------- -.. brief listing of all the CLI options - .. code:: sh prte_info ...options... @@ -19,23 +15,15 @@ SYNOPSIS DESCRIPTION ----------- -Full description of this command. - -.. admonition:: PRRTE Docs TODO - :class: error - - Need to write this man page. - -EXIT STATUS ------------ - -Description of the various exit statuses of this command. +``prte_info`` provide detailed information on your PMIx Reference Run +Time (PRRTE) installation. -EXAMPLES --------- +Extensive help documentation for this command is provided through +``prte_info --help [topic]``. -Examples of using this command. +At least for now, that content is not available in man page form. +Pull requests to add all the content (via repeatable infrastructure) +would be greatly appreciated. .. seealso:: - :ref:`prte(1) `, - :ref:`prterun(1) ` + :ref:`prte(1) ` diff --git a/docs/man/man1/prted.1.rst b/docs/man/man1/prted.1.rst index 92715e66b2..1fe14f6116 100644 --- a/docs/man/man1/prted.1.rst +++ b/docs/man/man1/prted.1.rst @@ -3,39 +3,27 @@ prted ===== -.. 
one line summary of this command - -prted |mdash| do great things +prted |mdash| helper daemon for PRRTE SYNOPSIS -------- -.. brief listing of all the CLI options - .. code:: sh - prted ...options... + shell$ prted ...options... DESCRIPTION ----------- -Full description of this command. - -.. admonition:: PRRTE Docs TODO - :class: error - - Need to write this man page. - -EXIT STATUS ------------ - -Description of the various exit statuses of this command. +``prted`` is the back end helper daemon for PMIx Reference Runtime +Environment (PRRTE). -EXAMPLES --------- +Extensive help documentation for this command is provided through +``prted --help [topic]``. -Examples of using this command. +At least for now, that content is not available in man page form. +Pull requests to add all the content (via repeatable infrastructure) +would be greatly appreciated. .. seealso:: - :ref:`prte(1) `, - :ref:`prterun(1) ` + :ref:`prte(1) ` diff --git a/docs/man/man1/prterun.1.rst b/docs/man/man1/prterun.1.rst index 29da34ee9a..2b00769f00 100644 --- a/docs/man/man1/prterun.1.rst +++ b/docs/man/man1/prterun.1.rst @@ -3,38 +3,41 @@ prterun ======== -.. one line summary of this command - prterun |mdash| do great things SYNOPSIS -------- -.. brief listing of all the CLI options - .. code:: sh - prterun ...options... + shell$ prterun ...options... DESCRIPTION ----------- Full description of this command. +This is all the common stuff in ``prterun(1)``. + .. admonition:: PRRTE Docs TODO :class: error Need to write this man page. +COMMAND LINE OPTIONS +-------------------- + +.. include:: /prrte-rst-content/prterun-all-cli.rst + +DEPRECATED COMMAND LINE OPTIONS +------------------------------- + +.. include:: /prrte-rst-content/prterun-all-deprecated.rst + EXIT STATUS ----------- Description of the various exit statuses of this command. -EXAMPLES --------- - -Examples of using this command. - .. 
seealso:: :ref:`prte(1) ` diff --git a/docs/man/man1/prun.1.rst b/docs/man/man1/prun.1.rst index a29c24fa00..7b700b1500 100644 --- a/docs/man/man1/prun.1.rst +++ b/docs/man/man1/prun.1.rst @@ -1,40 +1,29 @@ .. _man1-prun: prun -==== +===== -.. one line summary of this command - -prun |mdash| do great things +prun |mdash| submit job to PRRTE SYNOPSIS -------- -.. brief listing of all the CLI options - .. code:: sh - prun ...options... + shell$ prun ...options... DESCRIPTION ----------- -Full description of this command. - -.. admonition:: PRRTE Docs TODO - :class: error - - Need to write this man page. - -EXIT STATUS ------------ - -Description of the various exit statuses of this command. +``prun`` submits a job to the PMIx Reference Runtime Environment +(PRRTE). -EXAMPLES --------- +Extensive help documentation for this command is provided through +``prun --help [topic]``. -Examples of using this command. +At least for now, that content is not available in man page form. +Pull requests to add all the content (via repeatable infrastructure) +would be greatly appreciated. .. seealso:: - :ref:`prterun(1) ` + :ref:`prte(1) ` diff --git a/docs/man/man1/psched.1.rst b/docs/man/man1/psched.1.rst new file mode 100644 index 0000000000..81e4ee9ed6 --- /dev/null +++ b/docs/man/man1/psched.1.rst @@ -0,0 +1,29 @@ +.. _man1-psched: + +psched +====== + +psched |mdash| submit job to PRRTE + +SYNOPSIS +-------- + +.. code:: sh + + shell$ psched ...options... + +DESCRIPTION +----------- + +``psched`` submits a job to the PMIx Reference Runtime Environment +(PRRTE). + +Extensive help documentation for this command is provided through +``psched --help [topic]``. + +At least for now, that content is not available in man page form. +Pull requests to add all the content (via repeatable infrastructure) +would be greatly appreciated. + +.. 
seealso:: + :ref:`prte(1) ` diff --git a/docs/man/man1/pterm.1.rst b/docs/man/man1/pterm.1.rst new file mode 100644 index 0000000000..0cb3560c4b --- /dev/null +++ b/docs/man/man1/pterm.1.rst @@ -0,0 +1,29 @@ +.. _man1-pterm: + +pterm +===== + +pterm |mdash| terminate an instance of the PRRTE DVM + +SYNOPSIS +-------- + +.. code:: sh + + shell$ pterm ...options... + +DESCRIPTION +----------- + +``pterm`` terminates an instance of the PMIx Reference Runtime +Environment (PRRTE) distributed virtual machine (DVM). + +Extensive help documentation for this command is provided through +``pterm --help [topic]``. + +At least for now, that content is not available in man page form. +Pull requests to add all the content (via repeatable infrastructure) +would be greatly appreciated. + +.. seealso:: + :ref:`prte(1) ` diff --git a/docs/notifications.rst b/docs/notifications.rst new file mode 100644 index 0000000000..7382b2d986 --- /dev/null +++ b/docs/notifications.rst @@ -0,0 +1,96 @@ +.. -*- rst -*- + + Copyright (c) 2022-2023 Nanook Consulting. All rights reserved. + Copyright (c) 2023 Jeffrey M. Squyres. All rights reserved. + + $COPYRIGHT$ + + Additional copyrights may follow + + $HEADER$ + +Notifications +============= + +PRRTE provides notifications on a variety of process and job +states. Each notification includes not only the PMIx event code that +generated it, but also information on the cause of the event to the +extent to which this is known. + +Supported job events include: + +* ``PMIX_READY_FOR_DEBUG``: indicates that all processes in the + reported nspace have reached the specified debug stopping point. + +* ``PMIX_LAUNCH_COMPLETE``: indicates that the reported nspace has + been launched |mdash| i.e., the involved PRRTE daemons all report + that their respective child processes have completed fork/exec. + +* ``PMIX_ERR_JOB_CANCELED``: indicates that the job was cancelled by + user command, usually issued via an appropriate PMIx-enabled tool. 
+ +* ``PMIX_ERR_JOB_FAILED_TO_LAUNCH``: indicates that the specified job + failed to launch. This can be due to a variety of factors that + include inability to find the executable on at least one involved + node. + +Supported process events include: + +* ``PMIX_ERR_PROC_TERM_WO_SYNC``: indicates that at least one process + in the job called ``PMIx_Init``, thus indicating some notion of a + global existence, and at least one process in the job subsequently + exited without calling ``PMIx_Finalize``. This usually indicates a + failure somewhere in the application itself that precluded an + orderly shutdown of the process. Notification will include the + process ID that exited in this manner. + +* ``PMIX_EVENT_PROC_TERMINATED``: indicates that the reported process + terminated normally. Notification will include the process ID that + exited and its exit status. + +* ``PMIX_ERR_PROC_KILLED_BY_CMD``: indicates that the reported process + was killed by PRRTE command. This typically occurs in response to a + Ctrl-C (or equivalent) being applied to the PRRTE launcher, thereby + instructing PRRTE to forcibly terminate its processes. The event + currently will only be issued in the case where forcible termination + is commanded via a tool that can pass the process IDs that are + specifically to be terminated |mdash| otherwise, in the case of the + Ctrl-C event previously described, all processes in the job will be + terminated, leaving none to be notified. Notification will include + the process ID that was terminated. + +* ``PMIX_ERR_PROC_SENSOR_BOUND_EXCEEDED``: indicates that the + specified process exceeded a previously-set sensor boundary |mdash| + e.g., it may have grown beyond a defined memory limit. Such events + may or may not automatically trigger termination by command, + depending upon the behavior of the sensor. 
Notification will include + the process ID that exceeded the sensor boundary plus whatever + information the sensor provides regarding measurements and bounds. + +* ``PMIX_ERR_PROC_ABORTED_BY_SIG``: indicates that the specified + process was killed by a signal |mdash| e.g., a segmentation + fault/violation or an externally applied signal. Notifications will + include the process ID that was killed and the corresponding + reported signal. + +* ``PMIX_ERR_PROC_REQUESTED_ABORT``: indicates that the specified + process has aborted by calling the ``PMIx_Abort`` + function. Notification will include the process ID that called abort + and its exit status. + +* ``PMIX_ERR_EXIT_NONZERO_TERM``: indicates that the specified process + terminated with a non-zero exit status. This notification is only + generated in the case where the runtime option + ``ERROR-NONZERO-STATUS`` is set to true, thereby indicating that a + process exiting with non-zero status is to be considered an + error. As PRRTE can be overwhelmed by a large job where every + process exits with a non-zero status, only the *first* process in a + given job that exits with a non-zero status will generate a + notification unless the ``RECOVERABLE`` runtime option is also + provided, because without it the job is immediately + terminated. Notifications will include the process ID that exited + and the status it returned. + +* ``PMIX_ERR_PROC_RESTART``: indicates that the specified process has + been restarted. Additional information may include the hostname + where the process is now executing. diff --git a/docs/placement/deprecated.rst b/docs/placement/deprecated.rst new file mode 100644 index 0000000000..f295641c6e --- /dev/null +++ b/docs/placement/deprecated.rst @@ -0,0 +1,141 @@ +Deprecated options +================== + +These deprecated options will be removed in a future release. + +..
list-table:: + :header-rows: 1 + :widths: 20 20 30 + + * - Deprecated Option + - Replacement + - Description + + * - ``--bind-to-core`` + - ``--bind-to core`` + - Bind processes to cores + + + * - ``--bind-to-socket`` + - ``--bind-to package`` + - Bind processes to processor sockets + + * - ``--bycore`` + - ``--map-by core`` + - Map processes by core + + * - ``--bynode`` + - ``--map-by node`` + - Launch processes one per node, cycling by node in a round-robin + fashion. This spreads processes evenly among nodes and assigns + ranks in a round-robin, "by node" manner. + + * - ``--byslot`` + - ``--map-by slot`` + - Map and rank processes round-robin by slot + + * - ``--cpus-per-proc <#perproc>`` + - ``--map-by :PE=<#perproc>`` + - Bind each process to the specified number of CPUs + + * - ``--cpus-per-rank <#perrank>`` + - ``--map-by :PE=<#perrank>`` + - Alias for ``--cpus-per-proc`` + + * - ``--display-allocation`` + - ``--display ALLOC`` + - Display the detected resource allocation + + * - ``--display-devel-map`` + - ``--display MAP-DEVEL`` + - Display a detailed process map (mostly intended for developers) + just before launch. + + * - ``--display-map`` + - ``--display MAP`` + - Display a table showing the mapped location of each process + prior to launch. + + * - ``--display-topo`` + - ``--display TOPO`` + - Display the topology as part of the process map (mostly + intended for developers) just before launch. + + * - ``--do-not-launch`` + - ``--map-by :DONOTLAUNCH`` + - Perform all necessary operations to prepare to launch the + application, but do not actually launch it (usually used to + test mapping patterns). + + * - ``--do-not-resolve`` + - ``--map-by :DONOTRESOLVE`` + - Do not attempt to resolve interfaces |mdash| usually used to + determine proposed process placement/binding prior to obtaining + an allocation.
+ + * - ``-N `` + - ``--map-by ppr::node`` + - Launch ``num`` processes per node on all allocated nodes + + * - ``--nolocal`` + - ``--map-by :NOLOCAL`` + - Do not run any copies of the launched application on the same + node as ``prun`` is running. This option will override listing + the ``localhost`` with ``--host`` or any other host-specifying + mechanism. + + * - ``--nooversubscribe`` + - ``--map-by :NOOVERSUBSCRIBE`` + - Do not oversubscribe any nodes; error (without starting any + processes) if the requested number of processes would cause + oversubscription. This option implicitly sets "max_slots" equal + to the "slots" value for each node. (Enabled by default.) + + * - ``--npernode <#pernode>`` + - ``--map-by ppr:<#pernode>:node`` + - On each node, launch this many processes + + * - ``--npersocket <#persocket>`` + - ``--map-by ppr:<#perpackage>:package`` + - On each node, launch this many processes times the number of + processor sockets on the node. The ``--npersocket`` option also + turns on the ``--bind-to socket`` option. The term ``socket`` + has been globally replaced with ``package``. + + * - ``--oversubscribe`` + - ``--map-by :OVERSUBSCRIBE`` + - Allow nodes to be oversubscribed, even on a managed + system, and allow overloading of processing elements. + + * - ``--pernode`` + - ``--map-by ppr:1:node`` + - On each node, launch one process + + * - ``--ppr`` + - ``--map-by ppr:`` + - Comma-separated list of number of processes on a given resource type + [default: ``none``].
+ + * - ``--rankfile `` + - ``--map-by rankfile:FILE=`` + - Use a rankfile for mapping/ranking/binding + + * - ``--report-bindings`` + - ``--display BINDINGS`` + - Report any bindings for launched processes + + * - ``--tag-output`` + - ``--output TAG`` + - Tag all output with ``[job,rank]`` + + * - ``--timestamp-output`` + - ``--output TIMESTAMP`` + - Timestamp all application process output + + * - ``--use-hwthread-cpus`` + - ``--map-by :HWTCPUS`` + - Use hardware threads as independent CPUs + + * - ``--xml`` + - ``--output XML`` + - Provide all output in XML format diff --git a/docs/placement/diagnostics.rst b/docs/placement/diagnostics.rst new file mode 100644 index 0000000000..215fd85cc8 --- /dev/null +++ b/docs/placement/diagnostics.rst @@ -0,0 +1,77 @@ +Diagnostics +=========== + +PRRTE provides various diagnostic reports that aid the user in +verifying and tuning the mapping/ranking/binding for a specific job. + +The ``:REPORT`` qualifier to the ``--bind-to`` command line option can +be used to report process bindings. + +As an example, consider a node with: + +* 2 processor packages, +* 10 cores per package, and +* 8 hardware threads per core. + +In each of the examples below, the binding is reported in a human-readable +format. + +.. code:: + + $ prun --np 4 --map-by core --bind-to core:REPORT ./a.out + [node01:103137] MCW rank 0 bound to package[0][core:0] + [node01:103137] MCW rank 1 bound to package[0][core:1] + [node01:103137] MCW rank 2 bound to package[0][core:2] + [node01:103137] MCW rank 3 bound to package[0][core:3] + +In the example above, processes are bound to successive cores on the +first package. + +..
code:: + + $ prun --np 4 --map-by package --bind-to package:REPORT ./a.out + [node01:103115] MCW rank 0 bound to package[0][core:0-9] + [node01:103115] MCW rank 1 bound to package[1][core:10-19] + [node01:103115] MCW rank 2 bound to package[0][core:0-9] + [node01:103115] MCW rank 3 bound to package[1][core:10-19] + +In the example above, processes are bound to all cores on successive +packages in a round-robin fashion. + +.. code:: + + $ prun --np 4 --map-by package:PE=2 --bind-to core:REPORT ./a.out + [node01:103328] MCW rank 0 bound to package[0][core:0-1] + [node01:103328] MCW rank 1 bound to package[1][core:10-11] + [node01:103328] MCW rank 2 bound to package[0][core:2-3] + [node01:103328] MCW rank 3 bound to package[1][core:12-13] + +The example above shows us that 2 cores have been bound per process. +The ``:PE=2`` qualifier states that 2 CPUs underneath the package +(which would be cores in this case) are mapped to each process. + +.. code:: + + $ prun --np 4 --map-by core:PE=2:HWTCPUS --bind-to :REPORT hostname + [node01:103506] MCW rank 0 bound to package[0][hwt:0-1] + [node01:103506] MCW rank 1 bound to package[0][hwt:8-9] + [node01:103506] MCW rank 2 bound to package[0][hwt:16-17] + [node01:103506] MCW rank 3 bound to package[0][hwt:24-25] + +The example above shows us that 2 hardware threads have been bound per +process. In this case ``prun`` is directing the DVM to map by +hardware threads since we used the ``:HWTCPUS`` qualifier. Without +that qualifier this command would return an error since by default the +DVM will not map to resources smaller than a core. The ``:PE=2`` +qualifier states that 2 processing elements underneath the core (which +would be hardware threads in this case) are mapped to each process. + +.. 
code:: + + $ prun --np 4 --bind-to none:REPORT hostname + [node01:107126] MCW rank 0 is not bound (or bound to all available processors) + [node01:107126] MCW rank 1 is not bound (or bound to all available processors) + [node01:107126] MCW rank 2 is not bound (or bound to all available processors) + [node01:107126] MCW rank 3 is not bound (or bound to all available processors) + +Binding is turned off in the above example, as reported. diff --git a/docs/placement/examples.rst b/docs/placement/examples.rst new file mode 100644 index 0000000000..4a3a293f43 --- /dev/null +++ b/docs/placement/examples.rst @@ -0,0 +1,420 @@ +Examples +======== + +Listed here are the subset of command line options that will be used +in the process mapping/ranking/binding examples below. + +Specifying Host Nodes +--------------------- + +Use one of the following options to specify which hosts (nodes) within +the PRRTE DVM environment to run on. + +.. code:: + + --host + + # or + + --host + +* List of hosts on which to invoke processes. After each hostname a + colon (``:``) followed by a positive integer can be used to specify + the number of slots on that host (``:X``, ``:Y``, and ``:Z``). The + default is ``1``. + +.. code:: + + --hostfile + +* Provide a hostfile to use. + +Process Mapping / Ranking / Binding Options +------------------------------------------- + +* ``-c #``, ``-n #``, ``--n #``, ``--np <#>``: Run this many copies of + the program on the given nodes. This option indicates that the + specified file is an executable program and not an application + context. If no value is provided for the number of copies to execute + (i.e., neither the ``-np`` nor its synonyms are provided on the + command line), ``prun`` will automatically execute a copy of the + program on each process slot (see below for description of a + "process slot"). This feature, however, can only be used in the SPMD + model and will return an error (without beginning execution of the + application) otherwise. 
+ + .. note:: These options specify the number of processes to launch. + None of the options imply a particular binding policy + |mdash| e.g., requesting ``N`` processes for each package + does not imply that the processes will be bound to the + package. + +* ``--map-by ``: Map to the specified object. Supported + objects include: + + * ``slot`` + * ``hwthread`` + * ``core`` (default) + * ``l1cache`` + * ``l2cache`` + * ``l3cache`` + * ``numa`` + * ``package`` + * ``node`` + * ``seq`` + * ``ppr`` + * ``rankfile`` + * ``pe-list`` + + Any object can include qualifiers by adding a colon (``:``) and any + colon-delimited combination of one or more of the following to the + ``--map-by`` option: + + * ``PE=n`` bind ``n`` processing elements to each process (cannot + be used in combination with ``rankfile`` or ``pe-list`` directives) + + .. error:: JMS Several of the options below refer to ``pe-list``. + Is this option supposed to be ``PE-LIST=n``, not + ``PE=n``? + + * ``SPAN`` load balance the processes across the allocation (cannot + be used in combination with ``slot``, ``node``, ``seq``, ``ppr``, + ``rankfile``, or ``pe-list`` directives) + + * ``OVERSUBSCRIBE`` allow more processes on a node than processing + elements + + * ``NOOVERSUBSCRIBE`` means ``!OVERSUBSCRIBE`` + + * ``NOLOCAL`` do not launch processes on the same node as ``prun`` + + * ``HWTCPUS`` use hardware threads as CPU slots + + * ``CORECPUS`` use cores as CPU slots (default) + + * ``INHERIT`` indicates that a child job (i.e., one spawned from + within an application) shall inherit the placement policies of the + parent job that spawned it. + + * ``NOINHERIT`` means ``!INHERIT`` + + * ``FILE=`` (path to file containing sequential or rankfile + entries).
+ + * ``ORDERED`` only applies to the PE-LIST option to indicate that + procs are to be bound to each of the specified CPUs in the order + in which they are assigned (i.e., the first proc on a node shall + be bound to the first CPU in the list, the second proc shall be + bound to the second CPU, etc.) + + ``ppr`` policy example: ``--map-by ppr:N:`` will launch + ``N`` times the number of objects of the specified type on each + node. + + .. note:: Directives and qualifiers are case-insensitive and can be + shortened to the minimum number of characters to uniquely + identify them. Thus, ``L1CACHE`` can be given as + ``l1cache`` or simply as ``L1``. + +* ``--rank-by ``: This assigns ranks in round-robin fashion + according to the specified object. The default follows the mapping + pattern. Supported rank-by objects include: + + * ``slot`` + * ``node`` + * ``fill`` + * ``span`` + + There are no qualifiers for the ``--rank-by`` directive. + +* ``--bind-to ``: This binds processes to the specified + object. See defaults in Quick Summary. Supported bind-to objects + include: + + * ``none`` + * ``hwthread`` + * ``core`` + * ``l1cache`` + * ``l2cache`` + * ``l3cache`` + * ``numa`` + * ``package`` + + Any object can include qualifiers by adding a colon (``:``) and any + colon-delimited combination of one or more of the following to the + ``--bind-to`` options: + + * ``overload-allowed`` allows for binding more than one process in + relation to a CPU + + * ``if-supported`` if binding to that object is supported on this + system. + +Specifying Host Nodes +--------------------- + +Host nodes can be identified on the command line with the ``--host`` +option or in a hostfile. + +For example, assuming no other resource manager or scheduler is +involved: + +.. code:: + + prun --host aa,aa,bb ./a.out + +This launches two processes on node ``aa`` and one on ``bb``. + +.. code:: + + prun --host aa ./a.out + +This launches one process on node ``aa``. + +.. 
code:: + + prun --host aa:5 ./a.out + +This launches five processes on node ``aa``. + +Or, consider the hostfile: + +.. code:: + + $ cat myhostfile + aa slots=2 + bb slots=2 + cc slots=2 + +Here, we list not only the host names (``aa``, ``bb``, and ``cc``) but +also how many "slots" there are for each. Slots indicate how many +processes can potentially execute on a node. For best performance, the +number of slots may be chosen to be the number of cores on the node or +the number of processor sockets. + +If the hostfile does not provide slots information, the PRRTE DVM will +attempt to discover the number of cores (or hwthreads, if the +``:HWTCPUS`` qualifier to the ``--map-by`` option is set) and set the +number of slots to that value. + +Examples using the hostfile above with and without the ``--host`` +option: + +.. code:: + + prun --hostfile myhostfile ./a.out + +This will launch two processes on each of the three nodes. + +.. code:: + + prun --hostfile myhostfile --host aa ./a.out + +This will launch two processes, both on node ``aa``. + +.. code:: + + prun --hostfile myhostfile --host dd ./a.out + +This will find no hosts to run on and abort with an error, because the +specified host ``dd`` is not in the specified hostfile. + +When running under resource managers (e.g., SLURM, Torque, etc.), PRRTE +will obtain both the hostnames and the number of slots directly from +the resource manager. In that environment, ``--host`` behaves +the same as if a hostfile were provided (since the host list is +provided by the resource manager). + + +Specifying Number of Processes +------------------------------ + +As we have just seen, the number of processes to run can be set using +the hostfile. Other mechanisms exist. + +The number of processes launched can be specified as a multiple of the +number of nodes or processor sockets available. Consider the hostfile +below for the examples that follow. + +.. code:: + + $ cat myhostfile + aa + bb + +For example: + +..
code:: + + prun --hostfile myhostfile --map-by ppr:2:package ./a.out + +This launches processes 0-3 on node ``aa`` and processes 4-7 on node +``bb``, where ``aa`` and ``bb`` are both dual-package nodes. The +``--map-by ppr:2:package`` option also turns on the ``--bind-to +package`` option, which is discussed in a later section. + +.. code:: + + prun --hostfile myhostfile --map-by ppr:2:node ./a.out + +This launches processes 0-1 on node ``aa`` and processes 2-3 on node +``bb``. + +.. code:: + + prun --hostfile myhostfile --map-by ppr:1:node ./a.out + +This launches one process per host node. + +Another alternative is to specify the number of processes with the +``--np`` option. Consider now the hostfile: + +.. code:: + + $ cat myhostfile + aa slots=4 + bb slots=4 + cc slots=4 + +With this hostfile: + +.. code:: + + prun --hostfile myhostfile --np 6 ./a.out + +This will launch processes 0-3 on node ``aa`` and processes 4-5 on +node ``bb``. The remaining slots in the hostfile will not be used +since the ``--np`` option indicated that only 6 processes should be +launched. + + +Mapping Processes to Nodes Using Policies +----------------------------------------- + +The examples above illustrate the default mapping of processes +to nodes. This mapping can also be controlled with various +``prun`` / ``prterun`` options that describe mapping policies. + +.. code:: + + $ cat myhostfile + aa slots=4 + bb slots=4 + cc slots=4 + +Consider the hostfile above, with ``--np 6``: + +.. list-table:: + :header-rows: 1 + + * - Command + - Ranks on ``aa`` + - Ranks on ``bb`` + - Ranks on ``cc`` + + * - ``prun`` + - 0 1 2 3 + - 4 5 + - + + * - ``prun --map-by node`` + - 0 3 + - 1 4 + - 2 5 + + * - ``prun --map-by node:NOLOCAL`` + - + - 0 2 4 + - 1 3 5 + +The ``--map-by node`` option will load balance the processes across +the available nodes, numbering each process by node in a round-robin +fashion.
+ +The ``:NOLOCAL`` qualifier to ``--map-by`` prevents any processes from +being mapped onto the local host (in this case node ``aa``). While +``prun`` typically consumes few system resources, the ``:NOLOCAL`` +qualifier can be helpful for launching very large jobs where ``prun`` +may actually need to use noticeable amounts of memory and/or +processing time. + +Just as ``--np`` can specify fewer processes than there are slots, it +can also oversubscribe the slots. For example, with the same hostfile: + +.. code:: + + prun --hostfile myhostfile --np 14 ./a.out + +This will produce an error since the default ``:NOOVERSUBSCRIBE`` +qualifier to ``--map-by`` prevents oversubscription. + +To oversubscribe the nodes, use the ``:OVERSUBSCRIBE`` +qualifier to ``--map-by``: + +.. code:: + + prun --hostfile myhostfile --np 14 --map-by :OVERSUBSCRIBE ./a.out + +This will launch processes 0-5 on node ``aa``, 6-9 on ``bb``, and +10-13 on ``cc``. + +Limits to oversubscription can also be specified in the hostfile +itself with the ``max_slots`` field: + +.. code:: + + $ cat myhostfile + aa slots=4 max_slots=4 + bb max_slots=4 + cc slots=4 + +The ``max_slots`` field specifies such a limit. When it does, the +``slots`` value defaults to the limit. Now: + +.. code:: + + prun --hostfile myhostfile --np 14 --map-by :OVERSUBSCRIBE ./a.out + +This causes the first 12 processes to be launched as before, but the +remaining two processes will be forced onto node ``cc``. The other two +nodes are protected by the hostfile against oversubscription by this +job. + +Using the ``:NOOVERSUBSCRIBE`` qualifier to the ``--map-by`` option can be +helpful since the PRRTE DVM currently does not get ``max_slots`` values +from the resource manager. + +Of course, ``--np`` can also be used with the ``--host`` option. For +example, + +.. code:: + + prun --host aa,bb --np 8 ./a.out + +This will produce an error since the default ``:NOOVERSUBSCRIBE`` +qualifier to ``--map-by`` prevents oversubscription. + +..
code:: + + prun --host aa,bb --np 8 --map-by :OVERSUBSCRIBE ./a.out + +This launches 8 processes. Since only two hosts are specified, after +the first two processes are mapped, one to ``aa`` and one to ``bb``, +the remaining processes oversubscribe the specified hosts evenly. + +.. code:: + + prun --host aa:2,bb:6 --np 8 ./a.out + +This launches 8 processes: processes 0-1 on node ``aa`` since it has 2 +slots, and processes 2-7 on node ``bb`` since it has 6 slots. + +And here is a MIMD example: + +.. code:: + + prun --host aa --np 1 hostname : --host bb,cc --np 2 uptime + +This will launch process 0 running ``hostname`` on node ``aa`` and +processes 1 and 2 each running ``uptime`` on nodes ``bb`` and ``cc``, +respectively. diff --git a/docs/placement/fundamentals.rst b/docs/placement/fundamentals.rst new file mode 100644 index 0000000000..b49aaf3bc8 --- /dev/null +++ b/docs/placement/fundamentals.rst @@ -0,0 +1,142 @@ +Fundamentals +============ + +The mapping of processes to nodes can be defined not just with general +policies but also, if necessary, using arbitrary mappings that cannot +be described by a simple policy. Supported directives, given on the +command line via the ``--map-by`` option, include: + +* ``SEQ``: (often accompanied by the ``file=`` qualifier) + assigns one process to each node specified in the file. The + sequential file is to contain an entry for each desired process, one + per line of the file. + +* ``RANKFILE``: (often accompanied by the ``file=`` qualifier) + assigns one process to the node/resource specified in each entry of + the file, one per line of the file. + +For example, using the hostfile below: + +.. code:: + + $ cat myhostfile + aa slots=4 + bb slots=4 + cc slots=4 + +The command below will launch three processes, one on each of nodes +``aa``, ``bb``, and ``cc``, respectively. The slot counts don't +matter; one process is launched per line on whatever node is listed on +the line. + +..
code:: + + $ prun --hostfile myhostfile --map-by seq ./a.out + +The impact of the ranking option is best illustrated by considering the +following hostfile and test cases where each node contains two +packages (each package with two cores). Using the ``--map-by +ppr:2:package`` option, we map two processes onto each package and +utilize the ``--rank-by`` option as shown below: + +.. code:: + + $ cat myhostfile + aa + bb + +.. list-table:: + :header-rows: 1 + + * - Command + - Ranks on ``aa`` + - Ranks on ``bb`` + + * - ``--rank-by core`` + - 0 1 ! 2 3 + - 4 5 ! 6 7 + + * - ``--rank-by package`` + - 0 2 ! 1 3 + - 4 6 ! 5 7 + + * - ``--rank-by package:SPAN`` + - 0 4 ! 1 5 + - 2 6 ! 3 7 + +Ranking by slot provides the same result as ranking by core in +this case |mdash| a simple progression of ranks across each +node. Ranking by package does a round-robin ranking across packages +within each node until all processes have been assigned a rank, and +then progresses to the next node. Adding the ``:SPAN`` qualifier to +the ranking directive causes the ranking algorithm to treat the entire +allocation as a single entity |mdash| thus, the process ranks are +assigned across all packages before circling back around to the +beginning. + +The binding operation restricts the process to a subset of the CPU +resources on the node. + +The processors to be used for binding can be identified in terms of +topological groupings |mdash| e.g., binding to an l3cache will bind +each process to all processors within the scope of a single L3 cache +within their assigned location. Thus, if a process is assigned by the +mapper to a certain package, then a ``--bind-to l3cache`` directive +will cause the process to be bound to the processors that share a +single L3 cache within that package. + +To help balance loads, the binding directive uses a round-robin method, +binding a process to the first available specified object type within +the object where the process was mapped.
For example, consider the case +where a job is mapped to the package level, and then bound to core. Each +package will have multiple cores, so if multiple processes are mapped to +a given package, the binding algorithm will assign each process located +to a package to a unique core in a round-robin manner. + +Binding can only be done to the mapped object or to a resource located +within that object. + +An object is considered completely consumed when the number of +processes bound to it equals the number of CPUs within it. Unbound +processes are not considered in this computation. Additional +processes cannot be mapped to consumed objects unless the +``overload-allowed`` qualifier is provided via the ``--bind-to`` command +line option. + +Default process mapping/ranking/binding policies can also be set with MCA +parameters, overridden by the command line options when provided. MCA +parameters can be set not only on the ``prte`` command line when starting the +DVM (or in the ``prterun`` command line for a single-execution job), but +also in a system or user ``mca-params.conf`` file or as environment +variables, as described in the MCA section below. Some examples include: + +..
list-table:: + :header-rows: 1 + + * - ``prun`` option + - MCA parameter key + - Value + + * - ``--map-by core`` + - ``rmaps_default_mapping_policy`` + - ``core`` + + * - ``--map-by package`` + - ``rmaps_default_mapping_policy`` + - ``package`` + + * - ``--rank-by core`` + - ``rmaps_default_ranking_policy`` + - ``core`` + + * - ``--bind-to core`` + - ``hwloc_default_binding_policy`` + - ``core`` + + * - ``--bind-to package`` + - ``hwloc_default_binding_policy`` + - ``package`` + + * - ``--bind-to none`` + - ``hwloc_default_binding_policy`` + - ``none`` diff --git a/docs/placement/index.rst b/docs/placement/index.rst new file mode 100644 index 0000000000..babe9042de --- /dev/null +++ b/docs/placement/index.rst @@ -0,0 +1,22 @@ +Process placement +================= + +PRRTE provides a set of three controls for assigning process +locations and ranks: + +#. Mapping: Assigns a default location to each process +#. Ranking: Assigns a unique integer rank value to each process +#. Binding: Constrains each process to run on specific processors + +The sections below describe these controls in more detail. + +.. toctree:: + :maxdepth: 1 + + overview + examples + fundamentals + limits + diagnostics + rankfiles + deprecated diff --git a/docs/placement/limits.rst b/docs/placement/limits.rst new file mode 100644 index 0000000000..d56d979aa9 --- /dev/null +++ b/docs/placement/limits.rst @@ -0,0 +1,198 @@ +Overloading and Oversubscribing +=============================== + +This section explores the difference between the terms "overloading" +and "oversubscribing". Users are often confused by the difference +between these two scenarios. As such, this section provides a number +of scenarios to help illustrate the differences.
+ +* ``--map-by :OVERSUBSCRIBE`` allows more processes on a node than + allocated :ref:`slots ` + +* ``--bind-to :overload-allowed`` allows for binding more than + one process in relation to a CPU + +The important thing to remember with *oversubscribing* is that it can +be defined separately from the actual number of CPUs on a node. This +allows the mapper to place more or fewer processes per node than +CPUs. By default, PRRTE uses cores to determine slots in the absence +of such information provided in the hostfile or by the resource +manager (except in the case of the ``--host`` option, as described :ref:`in +this section `). + +The important thing to remember with *overloading* is that it is +defined as binding more processes than CPUs. By default, PRRTE uses +cores as a means of counting the number of CPUs. However, the user can +adjust this. For example, when using the ``:HWTCPUS`` qualifier to the +``--map-by`` option, PRRTE will use hardware threads as a means of +counting the number of CPUs. + +For the following examples, consider a node with: + +* 2 processor packages, +* 10 cores per package, and +* 8 hardware threads per core. + +Consider the node from above with the hostfile below: + +.. code:: + + $ cat myhostfile + node01 slots=32 + node02 slots=32 + +The ``slots`` token tells PRRTE that it can place up to 32 processes +before *oversubscribing* the node. + +If we run the following: + +.. code:: + + prun --np 34 --hostfile myhostfile --map-by core --bind-to core hostname + +It will return an error at binding time indicating an +*overloading* scenario. + +The mapping mechanism assigns 32 processes to ``node01`` matching the +``slots`` specification in the hostfile. The binding mechanism will bind +the first 20 processes to unique cores, leaving it with 12 processes +that it cannot bind without overloading one of the cores (putting more +than one process on the core).
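The counting argument in this example can be sketched in a few lines of Python. This is a minimal model, not PRRTE's mapper or binder: it assumes the example node (2 packages, 10 cores per package, 8 hardware threads per core), a mapper that simply fills each node up to its slot count in hostfile order, and a round-robin binder. The names `map_by_slots` and `overloaded_cpus` are hypothetical.

```python
# Hypothetical model of the overloading/oversubscription examples above.
# It is NOT PRRTE's actual mapping or binding code.
from typing import Dict, List

CORES_PER_NODE = 2 * 10                   # 2 packages x 10 cores per package
HWTHREADS_PER_NODE = CORES_PER_NODE * 8   # 8 hardware threads per core


def map_by_slots(nprocs: int, slots: int, nodes: List[str]) -> Dict[str, int]:
    """Fill each node up to its slot count, in hostfile order.

    Raises RuntimeError if processes remain once every slot is used,
    i.e. the job could only run by oversubscribing.
    """
    placement: Dict[str, int] = {}
    remaining = nprocs
    for node in nodes:
        take = min(remaining, slots)
        placement[node] = take
        remaining -= take
    if remaining > 0:
        raise RuntimeError(f"oversubscribed: {remaining} processes left to map")
    return placement


def overloaded_cpus(procs: int, cpus: int) -> int:
    """Count CPUs that end up with more than one process (round-robin binding)."""
    if procs <= cpus:
        return 0
    # The first `cpus` processes get a CPU each; the excess wraps around
    # starting again at CPU 0, so at most every CPU can become overloaded.
    return min(procs - cpus, cpus)


placement = map_by_slots(34, 32, ["node01", "node02"])
print(placement)                                                 # {'node01': 32, 'node02': 2}
print(overloaded_cpus(placement["node01"], CORES_PER_NODE))      # 12 (cores 0-11)
print(overloaded_cpus(placement["node01"], HWTHREADS_PER_NODE))  # 0
```

Calling `map_by_slots(66, 32, ["node01", "node02"])` raises, which mirrors the mapping-time oversubscription error; the sketch deliberately does not model the ``:OVERSUBSCRIBE`` qualifier.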
+ +Using the ``overload-allowed`` qualifier to the ``--bind-to core`` +option tells PRRTE that it may assign more than one process to a core. + +If we run the following: + +.. code:: + + prun --np 34 --hostfile myhostfile --map-by core --bind-to core:overload-allowed hostname + +This will run correctly, placing 32 processes on ``node01`` and 2 +processes on ``node02``. On ``node01``, two processes are bound to each of +cores 0-11, accounting for the overloading of those cores. + +Alternatively, we could use hardware threads to give binding a lower +level CPU to bind to without overloading. + +If we run the following: + +.. code:: + + prun --np 34 --hostfile myhostfile --map-by core:HWTCPUS --bind-to hwthread hostname + +This will run correctly, placing 32 processes on ``node01`` and 2 +processes on ``node02``. On ``node01``, two processes are mapped to each of +cores 0-11 but bound to different hardware threads on those cores (the +logical first and second hardware thread). Thus, no hardware threads +are overloaded at binding time. + +In both of the examples above, the node is not oversubscribed at +mapping time because the hostfile set the oversubscription limit to +``slots=32`` for each node. It is only after we exceed that limit that +PRRTE will throw an oversubscription error. + +Next, consider running the following: + +.. code:: + + prun --np 66 --hostfile myhostfile --map-by core:HWTCPUS --bind-to hwthread hostname + +This will return an error at mapping time indicating an +oversubscription scenario. The mapping mechanism will assign all of +the available slots (64 across 2 nodes) and be left with two processes to +map. The only way to map those processes is to exceed the number of +available slots, putting the job into an oversubscription scenario. + +You can force PRRTE to oversubscribe the nodes by using the +``:OVERSUBSCRIBE`` qualifier to the ``--map-by`` option as seen in the +example below: + +..
code:: + + prun --np 66 --hostfile myhostfile \ + --map-by core:HWTCPUS:OVERSUBSCRIBE --bind-to hwthread hostname + +This will run correctly, placing 34 processes on ``node01`` and 32 on +``node02``. Each process is bound to a unique hardware thread. + +Overloading vs. Oversubscription: Package Example +------------------------------------------------- + +Let's extend these examples by considering the package level. +Consider the same node as before, but with the hostfile below: + +.. code:: + + $ cat myhostfile + node01 slots=22 + node02 slots=22 + +The lowest-level CPUs are "cores", and we have 20 total (10 per +package). + +If we run: + +.. code:: + + prun --np 20 --hostfile myhostfile --map-by package \ + --bind-to package:REPORT hostname + +Then 10 processes are mapped to each package, and bound at the package +level. This is not overloading since we have 10 CPUs (cores) +available in the package at the hardware level. + +However, if we run: + +.. code:: + + prun --np 21 --hostfile myhostfile --map-by package \ + --bind-to package:REPORT hostname + +Then 11 processes are mapped to the first package and 10 to the second +package. At binding time we have an overloading scenario because +there are only 10 CPUs (cores) available in the package at the +hardware level. So the first package is overloaded. + +Overloading vs. Oversubscription: Hardware Threads Example +---------------------------------------------------------- + +The same analysis applies to hardware threads. + +Consider the same node as before, but with the hostfile below: + +.. code:: + + $ cat myhostfile + node01 slots=165 + node02 slots=165 + +The lowest-level CPUs are "hwthreads" (because we are going to use the +``:HWTCPUS`` qualifier), and we have 160 total (80 per package). + +If we re-run (from the package example) and add the ``:HWTCPUS`` +qualifier: + +..
code:: + + prun --np 21 --hostfile myhostfile --map-by package:HWTCPUS \ + --bind-to package:REPORT hostname + +Without the ``:HWTCPUS`` qualifier this would be overloading (as we +saw previously). The mapper places 11 processes on the first package +and 10 on the second package. The processes are still bound to the +package level. However, with the ``:HWTCPUS`` qualifier, it is not +overloading since we have 80 CPUs (hwthreads) available in the package +at the hardware level. + +Alternatively, if we run: + +.. code:: + + prun --np 161 --hostfile myhostfile --map-by package:HWTCPUS \ + --bind-to package:REPORT hostname + +Then 81 processes are mapped to the first package and 80 to the second +package. At binding time we have an overloading scenario because +there are only 80 CPUs (hwthreads) available in the package at the +hardware level. So the first package is overloaded. diff --git a/docs/placement/overview.rst b/docs/placement/overview.rst new file mode 100644 index 0000000000..ebc1521131 --- /dev/null +++ b/docs/placement/overview.rst @@ -0,0 +1,190 @@ +Overview +======== + +PRRTE provides a set of three controls for assigning process +locations and ranks: + +#. Mapping: Assigns a default location to each process +#. Ranking: Assigns a unique integer rank value to each process +#. Binding: Constrains each process to run on specific processors + +This section provides an overview of these three controls. Unless +otherwise noted, this behavior is shared by ``prun(1)`` (working with a PRRTE +DVM) and ``prterun(1)``. More detail about PRRTE process placement is +available in the following sections (using ``--help +placement-
``): + +* ``examples``: some examples of the interactions between mapping, + ranking, and binding options. + +* ``fundamentals``: provides deeper insight into PRRTE's mapping, + ranking, and binding options. + +* ``limits``: explains the difference between *overloading* and + *oversubscribing* resources. + +* ``diagnostics``: describes options for obtaining various diagnostic + reports that aid the user in verifying and tuning the placement for + a specific job. + +* ``rankfiles``: explains the format and use of the rankfile mapper + for specifying arbitrary process placements. + +* ``deprecated``: a list of deprecated options and their new + equivalents. + +* ``all``: outputs all the placement help except for the + ``deprecated`` section. + + +Quick Summary +------------- + +The two binaries that most influence process layout are ``prte(1)`` +and ``prun(1)``. The ``prte(1)`` process discovers the allocation, +establishes a Distributed Virtual Machine by starting a ``prted(1)`` +daemon on each node of the allocation, and defines the default +mapping/ranking/binding policies for all jobs. The ``prun(1)`` process +defines the specific mapping/ranking/binding for a specific job. Most +of the command line controls are targeted to ``prun(1)`` since each job +has its own unique requirements. + +``prterun(1)`` is just a wrapper around ``prte(1)`` for a single-job +PRRTE DVM. It is doing the job of both ``prte(1)`` and ``prun(1)``, +and, as such, accepts the union of their command line arguments. Any +example that uses ``prun(1)`` can substitute the use of ``prterun(1)`` +except where otherwise noted. + +The ``prte(1)`` process attempts to automatically discover the nodes +in the allocation by querying supported resource managers. If a +supported resource manager is not present, then ``prte(1)`` relies on a +hostfile provided by the user. In the absence of such a hostfile, it +will run all processes on the localhost.
+ +If running under a supported resource manager, the ``prte(1)`` process +will start the daemon processes (``prted(1)``) on the remote nodes +using the corresponding resource manager process starter. If no such +starter is available, then ``ssh`` (or ``rsh``) is used. + +Absent user direction, PRRTE will automatically map processes in a +round-robin fashion by CPU, binding each process to its own CPU. The +type of CPU used (core vs hwthread) is determined by (in priority +order): + +* user directive on the command line via the ``HWTCPUS`` qualifier to + the ``--map-by`` directive + +* setting the ``rmaps_default_mapping_policy`` MCA parameter to + include the ``HWTCPUS`` qualifier. This parameter sets the default + value for a PRRTE DVM |mdash| qualifiers are carried across to DVM + jobs started via ``prun`` unless overridden by the user's command + line + +* defaulting to ``CORE`` in topologies where core CPUs are defined, + and to ``hwthreads`` otherwise. + +By default, the ranks are assigned in accordance with the mapping +directive |mdash| e.g., jobs that are mapped by-node will have the +process ranks assigned round-robin on a per-node basis. + +PRRTE automatically binds processes unless directed not to do so by +the user. Absent direction, PRRTE will bind individual processes to +their own CPU within the object to which they were mapped. Should a +node become oversubscribed during the mapping process, and if +oversubscription is allowed, all subsequent processes assigned to that +node will *not* be bound. + +.. _placement-definition-of-slot-label: + +Definition of 'slot' +-------------------- + +The term "slot" is used extensively in the rest of this documentation. +A slot is an allocation unit for a process. The number of slots on a +node indicates how many processes can potentially execute on that node. +By default, PRRTE will allow one process per slot.
+ +If PRRTE is not explicitly told how many slots are available on a node +(e.g., if a hostfile is used and the number of slots is not specified +for a given node), it will determine a maximum number of slots for +that node in one of two ways: + +#. Default behavior: By default, PRRTE will attempt to discover the + number of processor cores on the node, and use that as the number + of slots available. + +#. When ``--use-hwthread-cpus`` is used: If ``--use-hwthread-cpus`` is + specified on the command line, then PRRTE will attempt to discover + the number of hardware threads on the node, and use that as the + number of slots available. + +This default behavior also occurs when specifying the ``--host`` +option with a single host. Thus, the command: + +.. code:: sh + + shell$ prun --host node1 ./a.out + +launches a number of processes equal to the number of cores on node +``node1``, whereas: + +.. code:: sh + + shell$ prun --host node1 --use-hwthread-cpus ./a.out + +launches a number of processes equal to the number of hardware +threads on ``node1``. + +When PRRTE applications are invoked in an environment managed by a +resource manager (e.g., inside of a Slurm job), and PRRTE was built +with appropriate support for that resource manager, then PRRTE will +be informed of the number of slots for each node by the resource +manager. For example: + +.. code:: sh + + shell$ prun ./a.out + +launches one process for every slot (on every node) as dictated by +the resource manager job specification. + +Also note that the one-process-per-slot restriction can be overridden +in unmanaged environments (e.g., when using hostfiles without a +resource manager) if oversubscription is enabled (by default, it is +disabled). Most parallel applications and HPC environments do not +oversubscribe; for simplicity, the majority of this documentation +assumes that oversubscription is not enabled. 
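The slot rules described above can be summarized as a small decision function. This is a hedged sketch rather than PRRTE code: `slots_for_node` and its arguments are hypothetical names, the example node's core and hardware-thread counts are assumed figures, and the ordering between the resource-manager case and the hostfile case is an assumption (the text above treats them as separate environments).

```python
# Hypothetical helper illustrating the slot rules above; not a PRRTE API.
from typing import Optional


def slots_for_node(num_cores: int,
                   num_hwthreads: int,
                   rm_slots: Optional[int] = None,
                   hostfile_slots: Optional[int] = None,
                   use_hwthread_cpus: bool = False) -> int:
    # Managed environment: the resource manager reports the slot count.
    # (Checking this before the hostfile is an assumption of this sketch.)
    if rm_slots is not None:
        return rm_slots
    # Unmanaged environment with an explicit "slots=N" hostfile entry.
    if hostfile_slots is not None:
        return hostfile_slots
    # No slot information: count hardware threads if --use-hwthread-cpus
    # was given, otherwise count processor cores.
    return num_hwthreads if use_hwthread_cpus else num_cores


# Assumed example node: 20 cores with 2 hardware threads per core.
print(slots_for_node(20, 40))                          # 20
print(slots_for_node(20, 40, use_hwthread_cpus=True))  # 40
print(slots_for_node(20, 40, rm_slots=2))              # 2
print(slots_for_node(20, 40, hostfile_slots=50))       # 50
```

The last two calls show that the slot count can be smaller or larger than the core count, because it comes from the allocation rather than from the hardware.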
+ +Slots are not hardware resources +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Slots are frequently, and incorrectly, conflated with hardware resources. +It is important to realize that slots are an entirely different metric +than the number (and type) of hardware resources available. + +Here are some examples that may help illustrate the difference: + +#. More processor cores than slots: Consider a resource manager job + environment that tells PRRTE that there is a single node with 20 + processor cores and 2 slots available. By default, PRRTE will + only let you run up to 2 processes. + + Meaning: you run out of slots long before you run out of processor + cores. + +#. More slots than processor cores: Consider a hostfile with a single + node listed with a ``slots=50`` qualification. The node has 20 + processor cores. By default, PRRTE will let you run up to 50 + processes. + + Meaning: you can run many more processes than you have processor + cores. + +.. _placement-definition-of-processor-element-label: + +Definition of "processing element" +---------------------------------- + +By default, PRRTE defines that a "processing element" is a processor +core. However, if ``--use-hwthread-cpus`` is specified on the command +line, then a "processing element" is a hardware thread. diff --git a/docs/placement/rankfiles.rst b/docs/placement/rankfiles.rst new file mode 100644 index 0000000000..16f2eb772d --- /dev/null +++ b/docs/placement/rankfiles.rst @@ -0,0 +1,71 @@ +Rankfiles +========= + +Another way to specify arbitrary mappings is with a rankfile, which +gives you detailed control over process binding as well. + +Rankfiles are text files that specify detailed information about how +individual processes should be mapped to nodes, and to which +processor(s) they should be bound. Each line of a rankfile specifies +the location of one process. The general form of each line in the +rankfile is: + +.. code:: + + rank = slot= + +For example: + +..
code:: + + $ cat myrankfile + rank 0=aa slot=10-12 + rank 1=bb slot=0,1,4 + rank 2=cc slot=1-2 + $ prun --host aa,bb,cc,dd --map-by rankfile:FILE=myrankfile ./a.out + +This means that: + +* Rank 0 runs on node aa, bound to logical cores 10-12. +* Rank 1 runs on node bb, bound to logical cores 0, 1, and 4. +* Rank 2 runs on node cc, bound to logical cores 1 and 2. + +Similarly: + +.. code:: + + $ cat myrankfile + rank 0=aa slot=1:0-2 + rank 1=bb slot=0:0,1,4 + rank 2=cc slot=1-2 + $ prun --host aa,bb,cc,dd --map-by rankfile:FILE=myrankfile ./a.out + +This means that: + +* Rank 0 runs on node aa, bound to logical package 1, cores 10-12 (the + 0th through 2nd cores on that package, assuming 10 cores per package). +* Rank 1 runs on node bb, bound to logical package 0, cores 0, 1, + and 4. +* Rank 2 runs on node cc, bound to logical cores 1 and 2. + +The hostnames listed above are "absolute," meaning that actual +resolvable hostnames are specified. However, hostnames can also be +specified as "relative," meaning that they are specified in relation +to an externally-specified list of hostnames (e.g., by ``prun``'s +``--host`` argument, a hostfile, or a job scheduler). + +The "relative" specification is of the form "``+n``", where ``X`` +is an integer specifying the Xth hostname in the set of all available +hostnames, indexed from 0. For example: + +.. code:: + + $ cat myrankfile + rank 0=+n0 slot=10-12 + rank 1=+n1 slot=0,1,4 + rank 2=+n2 slot=1-2 + $ prun --host aa,bb,cc,dd --map-by rankfile:FILE=myrankfile ./a.out + +All package/core slot locations are specified as *logical* +indexes. You can use tools such as HWLOC's ``lstopo`` to find the +logical indexes of packages and cores.
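To make the rankfile examples concrete, here is a hedged Python sketch that parses lines shaped like the ones above and resolves relative ``+n<index>`` hostnames against an externally supplied host list. It is illustrative only: `parse_rankfile_line` and `expand_ranges` are hypothetical helpers, the regular expression covers just the example syntax shown here (not the full rankfile grammar), and the slot numbers it returns are the logical indexes exactly as written.

```python
# Hypothetical rankfile-line parser covering only the example syntax above;
# it is not PRRTE's rankfile mapper.
import re
from typing import List, Optional, Tuple

LINE_RE = re.compile(r"^rank\s+(\d+)\s*=\s*(\S+)\s+slot=(\S+)$")


def expand_ranges(spec: str) -> List[int]:
    """Expand a list such as '10-12' or '0,1,4' into integers."""
    result: List[int] = []
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            result.extend(range(int(lo), int(hi) + 1))
        else:
            result.append(int(part))
    return result


def parse_rankfile_line(line: str,
                        hosts: List[str]) -> Tuple[int, str, Optional[int], List[int]]:
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized rankfile line: {line!r}")
    rank, host, slot = int(m.group(1)), m.group(2), m.group(3)
    if host.startswith("+n"):
        # Relative hostname: 0-based index into the externally given host list.
        host = hosts[int(host[2:])]
    package: Optional[int] = None
    if ":" in slot:
        # "package:cores" form, e.g. "1:0-2" = logical package 1, cores 0-2.
        pkg, slot = slot.split(":", 1)
        package = int(pkg)
    return rank, host, package, expand_ranges(slot)


hosts = ["aa", "bb", "cc", "dd"]   # as if given via --host aa,bb,cc,dd
for line in ["rank 0=+n0 slot=10-12", "rank 1=bb slot=0:0,1,4", "rank 2=+n2 slot=1-2"]:
    print(parse_rankfile_line(line, hosts))
# (0, 'aa', None, [10, 11, 12])
# (1, 'bb', 0, [0, 1, 4])
# (2, 'cc', None, [1, 2])
```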
diff --git a/docs/prrte-rst-content b/docs/prrte-rst-content new file mode 120000 index 0000000000..e2e90e10c8 --- /dev/null +++ b/docs/prrte-rst-content @@ -0,0 +1 @@ +../src/docs/prrte-rst-content \ No newline at end of file diff --git a/docs/session-directory.rst b/docs/session-directory.rst new file mode 100644 index 0000000000..34dbbb5c7d --- /dev/null +++ b/docs/session-directory.rst @@ -0,0 +1,54 @@ +Session directory +================= + +PRRTE establishes a "session directory" on the filesystem to serve as +a top-level location for temporary files used by both the local PRRTE +daemon and its child processes. + +This is done to enable quick and easy cleanup in the event that PRRTE +is unable to fully clean up after itself. + +Directory location +------------------ + +PRRTE decides where to locate the root of the session directory by +examining the following (in precedence order): + +#. If the value of the ``prte_top_session_dir`` MCA parameter is not + empty, use that (it defaults to empty). + + .. note:: MCA parameters can be set via environment variables, on + the command line, or in a parameter file. + +#. If the environment variable ``TMPDIR`` is not empty, use that. +#. If the environment variable ``TEMP`` is not empty, use that. +#. If the environment variable ``TMP`` is not empty, use that. +#. Use ``/tmp``. + +Directory name +-------------- + +By default, the session directory name is set to + +.. code:: + + prte.. + +The session directory name can further be altered to include the PID +of the daemon process, if desired: + +.. code:: + + prte... + +by setting the ``prte_add_pid_to_session_dirname`` MCA parameter to a +"true" value (e.g., 1). + +Tools +----- + +In the case of tools, the rendezvous files containing connection +information for a target server are located in the session directory +tree. Thus, it may be necessary to point the tool at the location +where those files can be found if that location is other than the +expected default.
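The precedence list for the session directory root maps directly onto a short lookup. This is a minimal sketch, not PRRTE's implementation: `session_dir_root` is a hypothetical name, its `top_session_dir` argument simply stands in for the value of the ``prte_top_session_dir`` MCA parameter (how that value is obtained is not modeled), and the environment is passed in explicitly so the function is easy to test.

```python
# Sketch of the documented precedence for the session directory root.
# `top_session_dir` stands in for the prte_top_session_dir MCA parameter.
import os
from typing import Mapping, Optional


def session_dir_root(top_session_dir: Optional[str] = None,
                     environ: Mapping[str, str] = os.environ) -> str:
    # 1. A non-empty MCA parameter value wins.
    if top_session_dir:
        return top_session_dir
    # 2-4. Otherwise, the first non-empty variable among TMPDIR, TEMP, TMP.
    for var in ("TMPDIR", "TEMP", "TMP"):
        value = environ.get(var, "")
        if value:
            return value
    # 5. Fall back to /tmp.
    return "/tmp"


print(session_dir_root(None, {}))                                       # /tmp
print(session_dir_root(None, {"TEMP": "/scratch", "TMP": "/var/tmp"}))  # /scratch
print(session_dir_root("/my/prte", {"TMPDIR": "/tmp/job"}))             # /my/prte
```

Empty strings are treated the same as unset values, matching the "is not empty" wording of each step.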
diff --git a/src/Makefile.am b/src/Makefile.am index f98c89e204..a6231009bc 100644 --- a/src/Makefile.am +++ b/src/Makefile.am @@ -27,6 +27,7 @@ # SUBDIRS = \ + docs \ etc \ util \ $(MCA_prte_FRAMEWORKS_SUBDIRS) \ @@ -35,6 +36,7 @@ SUBDIRS = \ $(MCA_prte_FRAMEWORK_COMPONENT_DSO_SUBDIRS) DIST_SUBDIRS = \ + docs \ etc \ util \ $(MCA_prte_FRAMEWORKS_SUBDIRS) \ diff --git a/src/docs/Makefile.am b/src/docs/Makefile.am new file mode 100644 index 0000000000..1826b6e290 --- /dev/null +++ b/src/docs/Makefile.am @@ -0,0 +1,15 @@ +# +# Copyright (c) 2022-2023 Cisco Systems, Inc. All rights reserved. +# Copyright (c) 2023 Jeffrey M. Squyres. All rights reserved. +# Copyright (c) 2023 Nanook Consulting. All rights reserved. +# +# $COPYRIGHT$ +# +# Additional copyrights may follow +# +# $HEADER$ +# + +SUBDIRS = \ + prrte-rst-content \ + show-help-files diff --git a/src/docs/prrte-rst-content/Makefile.am b/src/docs/prrte-rst-content/Makefile.am new file mode 100644 index 0000000000..5bd4304001 --- /dev/null +++ b/src/docs/prrte-rst-content/Makefile.am @@ -0,0 +1,63 @@ +# +# Copyright (c) 2023 Jeffrey M. Squyres. All rights reserved. +# Copyright (c) 2023 Nanook Consulting. All rights reserved. +# +# $COPYRIGHT$ +# +# Additional copyrights may follow +# +# $HEADER$ +# + +# Install the prterun RST "source" files so that other projects can +# include them in their documentation, even if they do not have access +# to the PRTE source tree. +# +# Note that we install these files into the "data" directory (vs. the +# "doc" directory) so that packagers won't split these files into a +# "docs" sub-package that may not be installed when we build other +# packages that want to use these RST files. 
+rstdir = $(prtedatadir)/rst/prrte-rst-content +dist_rst_DATA = \ + cli-add-host.rst \ + cli-add-hostfile.rst \ + cli-allow-run-as-root.rst \ + cli-bind-to.rst \ + cli-dash-host.rst \ + cli-debug-daemons-file.rst \ + cli-debug-daemons.rst \ + cli-display.rst \ + cli-dvm-hostfile.rst \ + cli-dvm.rst \ + cli-forward-signals.rst \ + cli-launcher-hostfile.rst \ + cli-leave-session-attached.rst \ + cli-map-by.rst \ + cli-noprefix.rst \ + cli-output.rst \ + cli-personality.rst \ + cli-pmixmca.rst \ + cli-prefix.rst \ + cli-prtemca.rst \ + cli-rank-by.rst \ + cli-runtime-options.rst \ + cli-stream-buffering.rst \ + cli-tune.rst \ + cli-x.rst \ + deprecated-bind-to-core.rst \ + deprecated-display-allocation.rst \ + deprecated-display-devel-allocation.rst \ + deprecated-display-devel-map.rst \ + deprecated-display-map.rst \ + deprecated-display-topo.rst \ + deprecated-gmca.rst \ + deprecated-mca.rst \ + deprecated-merge-stderr-to-stdout.rst \ + deprecated-output-directory.rst \ + deprecated-output-filename.rst \ + deprecated-report-bindings.rst \ + deprecated-tag-output.rst \ + deprecated-timestamp-output.rst \ + deprecated-xml.rst \ + prterun-all-cli.rst \ + prterun-all-deprecated.rst diff --git a/src/docs/prrte-rst-content/README.txt b/src/docs/prrte-rst-content/README.txt new file mode 100644 index 0000000000..e4ab269d01 --- /dev/null +++ b/src/docs/prrte-rst-content/README.txt @@ -0,0 +1,132 @@ +This file covers two main topics: + +1. How downstream projects should use PRRTE's RST files +2. How PRRTE developers should maintain these RST files + + +How downstream projects should use PRRTE's RST files +==================================================== + +The intent is that PRRTE will install some of its RST files in the +install tree so that downstream projects can use the RST +".. include::" directive to include them directly in their own RST +documentation. + +Overall scheme +-------------- + +The overall scheme is relatively straightforward: + +1.
Downstream projects should symlink or copy the PRRTE install + directory $datadir/rst/prrte-rst-content (typically + $prefix/share/prte/rst/prrte-rst-content) to the top directory of + their RST doc tree. + +2. Where relevant, use the RST include directive with the absolute + path form: + + .. include:: /prrte-rst-content/prterun-all-cli.rst + .. include:: /prrte-rst-content/prterun-all-deprecated.rst + + The absolute path represents the root of the RST tree (not the root + of the overall filesystem). + + The RST files shown above are the public interface for the RST + docs: they will include multiple other RST files. The specific + list of files that are included by the above files may (will) + change over time; downstream projects are encouraged to limit + themselves to including only the above-listed files to protect + themselves from changes like this. + + The RST files use the following RST block indicators in this order: + + Chapter: === + Section: --- + Subsection: ^^^ + Subsubsection: +++ + + It is advisable to start a new RST block immediately after + including the public PRRTE RST files in order to precisely control + the document section level after the PRRTE content. + +3. Specific components may also install RST files in the same + $datadir/rst tree (e.g., schizo components). These RST files are + suitable for inclusion in downstream project documentation; consult + their guidance for usage. + + Note that PRRTE component RST filenames are not standardized by + PRRTE, but they follow the same "use the absolute path form" + convention as the public PRRTE RST files. + + +Sphinx pickiness +---------------- + +Sphinx is notably picky about two things: + +1. How include paths propagate through files (and included files, and + files that are included from included files, etc.). + + This is why it is recommended that downstream projects have + "prrte-rst-content" at the top of their doc tree and then include + with the absolute path form.
Attempting to use relative paths + *might* be able to work, but is not advisable. + +2. That every *.rst file in the doc tree is used. + + The "prrte-rst-content" directory contains a bunch of individual + *.rst files. If downstream project documentation ends up only + using *some* of those files, Sphinx will complain about the *.rst + files that are not used. + + In such cases, downstream projects can either remove unneeded files + (which may not be entirely safe across multiple different versions + of PRRTE), or use the RST ".. toctree::" directive with the + ":hidden:" parameter to include those RST files so that Sphinx will + not complain, but not have them linked anywhere in the rendered + output tree. + + +How PRRTE developers should maintain these RST files +==================================================== + +There are two types of files in this directory: + +1. Files that are intended to be directly included by downstream + project documentation. + + As described in the "How downstream projects should use PRRTE's RST + files" section in this file, there are a small number of "public" + RST files that are intended to be included by downstream projects. + + a. PRRTE developers should attempt to a) make this list as short as + possible, and b) keep the list the same between PRRTE versions + as much as possible. This helps downstream projects include the + PRRTE docs in their own docs over time. + + b. These files should include other PRRTE RST files using the + absolute path form, and assume that /prrte-rst-content is a + top-level directory in the RST doc tree. + + The absolute path represents the root of the RST tree (not the + root of the overall filesystem). + +2. Files that are included (directly or indirectly) by the files from + #1. + + They are meant to be self-contained chunks of information that can + be included like building blocks in higher-level RST documentation + files. + + a. 
Be wary of defining sections / subsections, especially for the + "lowest" level files (that may be included multiple times in + multiple places). When creating new RST blocks, follow the RST + block indicator sequence listed earlier in this file. + + b. Beware of creating RST labels, especially for the "lowest" + level files (that may be included multiple times in multiple + places). + + c. When using the ".. include::" RST directive, use the absolute + path form, and assume that /prrte-rst-content is a top-level + directory in the RST doc tree. diff --git a/src/docs/prrte-rst-content/cli-add-host.rst b/src/docs/prrte-rst-content/cli-add-host.rst new file mode 100644 index 0000000000..404318fb08 --- /dev/null +++ b/src/docs/prrte-rst-content/cli-add-host.rst @@ -0,0 +1,68 @@ +.. -*- rst -*- + + Copyright (c) 2022-2023 Nanook Consulting. All rights reserved. + Copyright (c) 2023 Jeffrey M. Squyres. All rights reserved. + + $COPYRIGHT$ + + Additional copyrights may follow + + $HEADER$ + +PRRTE allows a user to expand an existing DVM prior to launching an +application. Users can specify a comma-delimited list of node +names, each entry optionally containing a ``:N`` extension indicating +the number of slots to assign to that entry: + +.. code:: + + --add-host node01:5,node02 + +In the absence of the slot extension, one slot will be assigned to the +node. Duplicate entries are aggregated, and the slot counts +assigned to that node are summed together. + +.. note:: A "slot" is the PRRTE term for an allocatable unit where we + can launch a process. Thus, the number of slots equates to + the maximum number of processes PRRTE may start on that node + without oversubscribing it. + +The list can include nodes that are already part of the DVM |mdash| in +this case, the number of slots available on those nodes will be set to +the new specification, or adjusted as directed: + +..
code:: + + --add-host node01:5,node02 + +would direct that node01 be set to 5 slots and node02 be given 1 +slot, while + +.. code:: + + --add-host node01:+5,node02 + +would add 5 slots to the current value for node01, and + +.. code:: + + --add-host node01:-5,node02 + +would subtract 5 slots from the current value. + +Slot adjustments for existing nodes will have no impact on currently executing +jobs, but will be applied to any new spawn requests. Nodes contained in the +add-host specification are available for immediate use by the accompanying +application. + +Users desiring to constrain the accompanying application to the newly added +nodes should also include the ``--host`` command line directive, giving +the same hosts in its argument: + +.. code:: + + --add-host node01:+5,node02 --host node01:5,node02 + +Note that the ``--host`` argument indicates the number of slots to assign +node01 for this spawn request, and not the number of slots being added to +the node01 allocation. diff --git a/src/docs/prrte-rst-content/cli-add-hostfile.rst b/src/docs/prrte-rst-content/cli-add-hostfile.rst new file mode 100644 index 0000000000..f93ad777dd --- /dev/null +++ b/src/docs/prrte-rst-content/cli-add-hostfile.rst @@ -0,0 +1,49 @@ +.. -*- rst -*- + + Copyright (c) 2022-2023 Nanook Consulting. All rights reserved. + Copyright (c) 2023 Jeffrey M. Squyres. All rights reserved. + + $COPYRIGHT$ + + Additional copyrights may follow + + $HEADER$ + +PRRTE allows a user to expand an existing DVM prior to launching an +application. Users can specify a hostfile that contains a list of +nodes to be added to the DVM using normal hostfile syntax. + +The list can include nodes that are already part of the DVM |mdash| in +this case, the number of slots available on those nodes will be set to +the new specification, or adjusted as directed: + +.. code:: + + node01 slots=5 + +would direct that node01 be set to 5 slots, while + +..
code:: + + node01 slots+=5 + +would add 5 slots to the current value for node01, and + +.. code:: + + node01 slots-=5 + +would subtract 5 slots from the current value. + +Slot adjustments for existing nodes will have no impact on currently executing +jobs, but will be applied to any new spawn requests. Nodes contained in the +add-hostfile specification are available for immediate use by the accompanying +application. + +Users desiring to constrain the accompanying application to the newly added +nodes should also include the ``--hostfile`` command line directive, giving +the same hostfile as its argument: + +.. code:: + + --add-hostfile --hostfile diff --git a/src/docs/prrte-rst-content/cli-allow-run-as-root.rst b/src/docs/prrte-rst-content/cli-allow-run-as-root.rst new file mode 100644 index 0000000000..6669174e57 --- /dev/null +++ b/src/docs/prrte-rst-content/cli-allow-run-as-root.rst @@ -0,0 +1,31 @@ +.. -*- rst -*- + + Copyright (c) 2022-2023 Nanook Consulting. All rights reserved. + Copyright (c) 2023 Jeffrey M. Squyres. All rights reserved. + + $COPYRIGHT$ + + Additional copyrights may follow + + $HEADER$ + +Allow execution as root **(STRONGLY DISCOURAGED)**. + +Running as root exposes the user to potentially catastrophic file +system corruption and damage |mdash| e.g., if the user accidentally +points the root of the session directory to a system required point, +this directory and all underlying elements will be deleted upon job +completion, thereby rendering the system inoperable. + +It is recognized that some environments (e.g., containers) may require +operation as root, and that the user accepts the risks in those +scenarios. 
Accordingly, one can override PRRTE's run-as-root +protection by providing one of the following: + +* The ``--allow-run-as-root`` command line directive +* Setting **BOTH** of the following environment variables: + + * ``PRTE_ALLOW_RUN_AS_ROOT=1`` + * ``PRTE_ALLOW_RUN_AS_ROOT_CONFIRM=1`` + +Again, we recommend this only be done if absolutely necessary. diff --git a/src/docs/prrte-rst-content/cli-bind-to.rst b/src/docs/prrte-rst-content/cli-bind-to.rst new file mode 100644 index 0000000000..3707f1a212 --- /dev/null +++ b/src/docs/prrte-rst-content/cli-bind-to.rst @@ -0,0 +1,80 @@ +.. -*- rst -*- + + Copyright (c) 2022-2023 Nanook Consulting. All rights reserved. + Copyright (c) 2023 Jeffrey M. Squyres. All rights reserved. + + $COPYRIGHT$ + + Additional copyrights may follow + + $HEADER$ + +By default, processes are bound to individual CPUs (either COREs or +HWTHREADs, as defined by default or by user specification for the +job). On nodes that are OVERSUBSCRIBEd (i.e., where the number of +procs exceeds the number of assigned slots), the default is to not +bind the processes. + +.. note:: Processes from prior jobs that are already executing on a + node are not "unbound" when a new job mapping results in the + node becoming oversubscribed. + +Binding is performed to the first available specified object type +within the object where the process was mapped. In other words, +binding can only be done to the mapped object or to a resource +located beneath that object. + +An object is considered completely consumed when the number of +processes bound to it equals the number of CPUs within it. Unbound +processes are not considered in this computation. Additional +processes cannot be mapped to consumed objects unless the +``OVERLOAD`` qualifier is provided via the ``--bind-to`` command +line option. + +Note that directives and qualifiers are case-insensitive +and can be shortened to the minimum number of characters +to uniquely identify them.
Thus, ``L1CACHE`` can be given +as ``l1cache`` or simply as ``L1``. + +Supported binding directives include: + +* ``NONE`` does not bind the processes. + +* ``HWTHREAD`` binds each process to a single hardware + thread. This requires that hwthreads be treated + as independent CPUs (i.e., that either the ``HWTCPUS`` + qualifier be provided to the ``--map-by`` option or + that ``hwthreads`` be designated as CPUs by default). + +* ``CORE`` binds each process to a single core. This + can be done whether ``hwthreads`` or ``cores`` are being + treated as independent CPUs provided that mapping + is performed at the core or higher level. + +* ``L1CACHE`` binds each process to all the CPUs in + an ``L1`` cache. + +* ``L2CACHE`` binds each process to all the CPUs in + an ``L2`` cache. + +* ``L3CACHE`` binds each process to all the CPUs in + an ``L3`` cache. + +* ``NUMA`` binds each process to all the CPUs in a ``NUMA`` + region. + +* ``PACKAGE`` binds each process to all the CPUs in a ``PACKAGE``. + +Any directive can include qualifiers by adding a colon (``:``) and any +combination of one or more of the following to the ``--bind-to`` +option: + +* ``OVERLOAD`` indicates that objects can have more + processes bound to them than CPUs within them. + +* ``IF-SUPPORTED`` indicates that the job should continue to + be launched and executed even if binding cannot be + performed as requested. + +.. note:: Directives and qualifiers are case-insensitive. + ``OVERLOAD`` is the same as ``overload``. diff --git a/src/docs/prrte-rst-content/cli-dash-host.rst b/src/docs/prrte-rst-content/cli-dash-host.rst new file mode 100644 index 0000000000..25e8372830 --- /dev/null +++ b/src/docs/prrte-rst-content/cli-dash-host.rst @@ -0,0 +1,27 @@ +.. -*- rst -*- + + Copyright (c) 2022-2023 Nanook Consulting. All rights reserved. + Copyright (c) 2023 Jeffrey M. Squyres. All rights reserved.
+ + $COPYRIGHT$ + + Additional copyrights may follow + + $HEADER$ + +Host syntax consists of a comma-delimited list of node names, each +entry optionally containing a ``:N`` extension indicating the number +of slots to assign to that entry: + +.. code:: + + --host node01:5,node02 + +In the absence of the slot extension, one slot will be assigned to the +node. Duplicate entries are aggregated, and the slot counts +assigned to that node are summed together. + +.. note:: A "slot" is the PRRTE term for an allocatable unit where we + can launch a process. Thus, the number of slots equates to the + maximum number of processes PRRTE may start on that node without + oversubscribing it. diff --git a/src/docs/prrte-rst-content/cli-debug-daemons-file.rst b/src/docs/prrte-rst-content/cli-debug-daemons-file.rst new file mode 100644 index 0000000000..8588ef8af4 --- /dev/null +++ b/src/docs/prrte-rst-content/cli-debug-daemons-file.rst @@ -0,0 +1,20 @@ +.. -*- rst -*- + + Copyright (c) 2022-2023 Nanook Consulting. All rights reserved. + Copyright (c) 2023 Jeffrey M. Squyres. All rights reserved. + + $COPYRIGHT$ + + Additional copyrights may follow + + $HEADER$ + +Debug daemon output is enabled and all output from the daemons is +redirected into files with names of the form: + +.. code:: + + output-prted--.log + +These names avoid conflict on shared file systems. The files are +located in the top-level session directory assigned to the DVM. diff --git a/src/docs/prrte-rst-content/cli-debug-daemons.rst b/src/docs/prrte-rst-content/cli-debug-daemons.rst new file mode 100644 index 0000000000..cf574e8bcd --- /dev/null +++ b/src/docs/prrte-rst-content/cli-debug-daemons.rst @@ -0,0 +1,14 @@ +.. -*- rst -*- + + Copyright (c) 2022-2023 Nanook Consulting. All rights reserved. + Copyright (c) 2023 Jeffrey M. Squyres. All rights reserved. + + $COPYRIGHT$ + + Additional copyrights may follow + + $HEADER$ + +Debug daemon output enabled.
This is a somewhat limited stream of +information normally used to simply confirm that the daemons +started. It also leaves the output streams open. diff --git a/src/docs/prrte-rst-content/cli-display.rst b/src/docs/prrte-rst-content/cli-display.rst new file mode 100644 index 0000000000..c07ee0a000 --- /dev/null +++ b/src/docs/prrte-rst-content/cli-display.rst @@ -0,0 +1,50 @@ +.. -*- rst -*- + + Copyright (c) 2022-2023 Nanook Consulting. All rights reserved. + Copyright (c) 2023 Jeffrey M. Squyres. All rights reserved. + + $COPYRIGHT$ + + Additional copyrights may follow + + $HEADER$ + +The ``display`` command line directive must be accompanied by a +comma-delimited list of case-insensitive options indicating what +information about the job and/or allocation is to be displayed. The +full directive need not be provided |mdash| only enough characters are +required to uniquely identify the directive. For example, ``ALL`` is +sufficient to represent the ``ALLOCATION`` directive |mdash| while ``MAP`` +cannot be used to represent ``MAP-DEVEL`` (though ``MAP-D`` would +suffice).
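The abbreviation rule above can be illustrated with a small resolver. This is a minimal sketch, not PRRTE's parser: it assumes that an exact (case-insensitive) match takes precedence, which is how ``MAP`` keeps its own meaning, and that any other token must be a prefix of exactly one directive. The directive list is taken from the supported values documented here; ``=VALUE`` and ``:QUALIFIER`` suffixes are not handled.

```python
# Hypothetical sketch, not PRRTE's parser: resolve abbreviated,
# case-insensitive --display directives by unique prefix.
DISPLAY_DIRECTIVES = ["ALLOCATION", "BINDINGS", "MAP", "MAP-DEVEL", "TOPO", "CPUS"]


def resolve_directive(token: str, known: list[str]) -> str:
    """Return the directive that `token` abbreviates.

    An exact match wins outright, so "MAP" selects MAP even though it is
    also a prefix of MAP-DEVEL. Otherwise the token must be a prefix of
    exactly one known directive.
    """
    word = token.strip().upper()
    if word in known:
        return word
    matches = [d for d in known if d.startswith(word)]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError(f"unknown directive: {token!r}")
    raise ValueError(f"ambiguous directive {token!r}: {', '.join(matches)}")


def parse_display(arg: str) -> list[str]:
    """Resolve each entry of a comma-delimited list (no =VALUE or :QUALIFIER support)."""
    return [resolve_directive(t, DISPLAY_DIRECTIVES) for t in arg.split(",")]


print(parse_display("all,map,map-d"))  # ['ALLOCATION', 'MAP', 'MAP-DEVEL']
```

In this sketch ``ma`` is rejected as ambiguous because it prefixes both ``MAP`` and ``MAP-DEVEL`` without matching either exactly.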
+ +Supported values include: + +* ``ALLOCATION`` displays the detected hosts and slot assignments for + this job + +* ``BINDINGS`` displays the resulting bindings applied to processes in + this job + +* ``MAP`` displays the resulting locations assigned to processes in + this job + +* ``MAP-DEVEL`` displays a more detailed report on the locations + assigned to processes in this job that includes local and node + ranks, assigned bindings, and other data + +* ``TOPO=LIST`` displays the topology of each node in the + semicolon-delimited list that is allocated to the job + +* ``CPUS[=LIST]`` displays the available CPUs on the provided + semicolon-delimited list of nodes (defaults to all nodes) + +The display command line directive can include qualifiers by adding a +colon (``:``) and any combination of one or more of the following +(delimited by colons): + +* ``PARSEABLE`` directs that the output be provided in a format that + is easily parsed by machines. Note that ``PARSABLE`` is also accepted as + a typical spelling for the qualifier. + +Provided qualifiers will apply to *all* of the display directives. diff --git a/src/docs/prrte-rst-content/cli-dvm-hostfile.rst b/src/docs/prrte-rst-content/cli-dvm-hostfile.rst new file mode 100644 index 0000000000..a4ee088196 --- /dev/null +++ b/src/docs/prrte-rst-content/cli-dvm-hostfile.rst @@ -0,0 +1,85 @@ +.. -*- rst -*- + + Copyright (c) 2022-2023 Nanook Consulting. All rights reserved. + Copyright (c) 2023 Jeffrey M. Squyres. All rights reserved. + + $COPYRIGHT$ + + Additional copyrights may follow + + $HEADER$ + +PRRTE supports several levels of user-specified host lists based on an +established precedence order. Users can specify a default hostfile +that contains a list of nodes to be used by the DVM. Only one default +hostfile can be provided for a given DVM. 
In addition, users can +specify a hostfile that contains a list of nodes to be used for a DVM, +or can provide a comma-delimited list of nodes to be used for that DVM +via the ``--host`` command line option. + +The precedence order applied to these various options depends to some +extent on the local environment. The following table illustrates how +host and hostfile directives work together to define the set of hosts +upon which a DVM will execute in the absence of a resource manager +(RM): + +.. list-table:: + :header-rows: 1 + :widths: 10 7 10 32 + + * - Default hostfile + - host + - hostfile + - Result + + * - unset + - unset + - unset + - | The DVM will consist solely of the + | local host where the DVM + | was started. + + * - unset + - set + - unset + - | Host option defines resource list for the DVM. + + * - unset + - unset + - set + - | Hostfile option defines resource list for the DVM. + + * - unset + - set + - set + - | Hostfile option defines resource list for the DVM, + | then host filters the list to define the final + | set of nodes to be used by the DVM + + * - set + - unset + - unset + - | Default hostfile defines resource list for the DVM + + * - set + - set + - unset + - | Default hostfile defines resource list for the DVM, + | then host filters the list to define the final + | set of nodes to be used by the DVM + + * - set + - set + - set + - | Default hostfile defines resource list for the DVM, + | then hostfile filters the list, and then host filters + | the list to define the final set of nodes to be + | used by the DVM + +This changes somewhat in the presence of an RM, as that entity +specifies the initial allocation of nodes. In this case, the default +hostfile, hostfile and host directives are all used to filter the RM's +specification so that a user can utilize different portions of the +allocation for different DVMs.
This is done according to the same +precedence order as in the prior table, with the RM providing the +initial pool of nodes. diff --git a/src/docs/prrte-rst-content/cli-dvm.rst b/src/docs/prrte-rst-content/cli-dvm.rst new file mode 100644 index 0000000000..394706890d --- /dev/null +++ b/src/docs/prrte-rst-content/cli-dvm.rst @@ -0,0 +1,56 @@ +.. -*- rst -*- + + Copyright (c) 2022-2023 Nanook Consulting. All rights reserved. + Copyright (c) 2023 Jeffrey M. Squyres. All rights reserved. + + $COPYRIGHT$ + + Additional copyrights may follow + + $HEADER$ + +The ``--dvm`` directive requires an argument that either specifies +the location of the DVM controller (e.g., ``--dvm pid:12345``) or is +the string ``search``, which directs the tool to search for an +existing controller. + +Supported options include: + +* ``search``: directs the tool to search for available DVM controllers + it is authorized to use, connecting to the first such candidate it + finds. + +* ``pid:``: provides the PID of the target DVM controller. This + can be given as either the PID itself (arg = int) or the path to a + file that contains the PID (arg = ``file:``) + +* ``file:``: provides the path to a PMIx rendezvous file that is + output by PMIx servers |mdash| the file contains all the required + information for completing the connection + +* ``uri:``: specifies the URI of the DVM controller, or the name of + the file (specified as ``file:filename``) that contains that info + +* ``ns:``: specifies the namespace of the DVM controller + +* ``system``: exclusively find and use the system-level DVM controller + +* ``system-first``: look for a system-level DVM controller, fall back + to searching for an available DVM controller the command is + authorized to use if a system-level controller is not found + +Examples: + +..
code:: + + prterun --dvm file:dvm_uri.txt --np 4 ./a.out + + prterun --dvm pid:12345 --np 4 ./a.out + + prterun --dvm uri:file:dvm_uri.txt --np 4 ./a.out + + prterun --dvm ns:prte-node1-2095 --np 4 ./a.out + + prterun --dvm pid:file:prte_pid.txt --np 4 ./a.out + + prterun --dvm search --np 4 ./a.out diff --git a/src/docs/prrte-rst-content/cli-forward-signals.rst b/src/docs/prrte-rst-content/cli-forward-signals.rst new file mode 100644 index 0000000000..e7dc3a7ea3 --- /dev/null +++ b/src/docs/prrte-rst-content/cli-forward-signals.rst @@ -0,0 +1,15 @@ +.. -*- rst -*- + + Copyright (c) 2022-2023 Nanook Consulting. All rights reserved. + Copyright (c) 2023 Jeffrey M. Squyres. All rights reserved. + + $COPYRIGHT$ + + Additional copyrights may follow + + $HEADER$ + +Comma-delimited list of additional signals (names or integers) to +forward to application processes (``none`` = forward +nothing). Signals provided by default include SIGTSTP, SIGUSR1, +SIGUSR2, SIGABRT, SIGALRM, and SIGCONT. diff --git a/src/docs/prrte-rst-content/cli-launcher-hostfile.rst b/src/docs/prrte-rst-content/cli-launcher-hostfile.rst new file mode 100644 index 0000000000..421e37d524 --- /dev/null +++ b/src/docs/prrte-rst-content/cli-launcher-hostfile.rst @@ -0,0 +1,49 @@ +.. -*- rst -*- + + Copyright (c) 2022-2023 Nanook Consulting. All rights reserved. + Copyright (c) 2023 Jeffrey M. Squyres. All rights reserved. + + $COPYRIGHT$ + + Additional copyrights may follow + + $HEADER$ + +PRRTE supports several levels of user-specified hostfiles based on an +established precedence order. Users can specify a hostfile that +contains a list of nodes to be used for the job, or can provide a +comma-delimited list of nodes to be used for that job via the +``--host`` command line option. + +The precedence order applied to these various options depends to some +extent on the local environment. 
The following table illustrates how +host and hostfile directives work together to define the set of hosts +upon which a DVM will execute the job in the absence of a resource +manager (RM): + +.. list-table:: + :header-rows: 1 + :widths: 7 10 45 + + * - host + - hostfile + - Result + + * - unset + - unset + - | The DVM will utilize all its available resources + | when mapping the job. + + * - set + - unset + - | Host option defines resource list for the job + + * - unset + - set + - | Hostfile defines resource list for the job + + * - set + - set + - | Hostfile defines resource list for the job, + | then host filters the list to define the final + | set of nodes to be used for the job diff --git a/src/docs/prrte-rst-content/cli-leave-session-attached.rst b/src/docs/prrte-rst-content/cli-leave-session-attached.rst new file mode 100644 index 0000000000..1e6e984d58 --- /dev/null +++ b/src/docs/prrte-rst-content/cli-leave-session-attached.rst @@ -0,0 +1,16 @@ +.. -*- rst -*- + + Copyright (c) 2022-2023 Nanook Consulting. All rights reserved. + Copyright (c) 2023 Jeffrey M. Squyres. All rights reserved. + + $COPYRIGHT$ + + Additional copyrights may follow + + $HEADER$ + +Do not discard stdout/stderr of remote PRRTE daemons. The primary use +for this option is to ensure that the daemon output streams (i.e., +stdout and stderr) remain open after launch, thus allowing the user to +see any daemon-generated error messages. Otherwise, the daemon will +"daemonize" itself upon launch, thereby closing its output streams. diff --git a/src/docs/prrte-rst-content/cli-map-by.rst b/src/docs/prrte-rst-content/cli-map-by.rst new file mode 100644 index 0000000000..eef475dd66 --- /dev/null +++ b/src/docs/prrte-rst-content/cli-map-by.rst @@ -0,0 +1,136 @@ +.. -*- rst -*- + + Copyright (c) 2022-2023 Nanook Consulting. All rights reserved. + Copyright (c) 2023 Jeffrey M. Squyres. All rights reserved. 
+ + $COPYRIGHT$ + + Additional copyrights may follow + + $HEADER$ + +Processes are mapped based on one of the following directives as +applied at the job level: + +* ``SLOT`` assigns procs to each node up to the number of available + slots on that node before moving to the next node in the + allocation + +* ``HWTHREAD`` assigns a proc to each hardware thread on a node in a + round-robin manner up to the number of available slots on that + node before moving to the next node in the allocation + +* ``CORE`` (default) assigns a proc to each core on a node in a + round-robin manner up to the number of available slots on that + node before moving to the next node in the allocation + +* ``L1CACHE`` assigns a proc to each L1 cache on a node in a + round-robin manner up to the number of available slots on that + node before moving to the next node in the allocation + +* ``L2CACHE`` assigns a proc to each L2 cache on a node in a + round-robin manner up to the number of available slots on that + node before moving to the next node in the allocation + +* ``L3CACHE`` assigns a proc to each L3 cache on a node in a + round-robin manner up to the number of available slots on that + node before moving to the next node in the allocation + +* ``NUMA`` assigns a proc to each NUMA region on a node in a + round-robin manner up to the number of available slots on that + node before moving to the next node in the allocation + +* ``PACKAGE`` assigns a proc to each package on a node in a + round-robin manner up to the number of available slots on that + node before moving to the next node in the allocation + +* ``NODE`` assigns processes in a round-robin fashion to all nodes + in the allocation, with the number assigned to each node capped + by the number of available slots on that node + +* ``SEQ`` (often accompanied by the file= qualifier) assigns + one process to each node specified in the file. 
The sequential + file is to contain an entry for each desired process, one per + line of the file. + +* ``PPR:N:resource`` maps N procs to each instance of the specified + resource type in the allocation + +* ``RANKFILE`` (often accompanied by the ``FILE=`` qualifier) assigns + one process to the node/resource specified in each entry of the + file, one per line of the file. + +* ``PE-LIST=a,b`` assigns procs to each node in the allocation based on + the ``ORDERED`` qualifier. The list consists of comma-delimited + ranges of CPUs to use for this job. If the ``ORDERED`` qualifier is not + provided, then each node will be assigned procs up to the number of + available slots, capped by the availability of the specified CPUs. + If ``ORDERED`` is given, then one proc will be assigned to each of the + specified CPUs, if available, capped by the number of slots on each + node and the total number of specified processes. Providing the + ``OVERLOAD`` qualifier to the ``--bind-to`` option removes the check on + availability of the CPU in both cases.
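The difference between the ``SLOT`` and ``NODE`` directives can be seen on a toy allocation. This is a minimal sketch under stated assumptions: nodes are given as a hypothetical name-to-slot-count dictionary, hardware topology is ignored, and running out of slots simply raises an error instead of oversubscribing.

```python
# Hypothetical sketch of the SLOT and NODE mapping policies on a toy
# allocation; oversubscription and hardware topology are ignored.


def map_by_slot(slots: dict[str, int], nprocs: int) -> list[str]:
    """Fill each node up to its slot count before moving to the next."""
    placement: list[str] = []
    for node, count in slots.items():
        take = min(count, nprocs - len(placement))
        placement.extend([node] * take)
    if len(placement) < nprocs:
        raise RuntimeError("not enough slots for the requested processes")
    return placement


def map_by_node(slots: dict[str, int], nprocs: int) -> list[str]:
    """Round-robin across nodes, capping each node at its slot count."""
    used = {node: 0 for node in slots}
    placement: list[str] = []
    while len(placement) < nprocs:
        progressed = False
        for node, count in slots.items():
            if len(placement) < nprocs and used[node] < count:
                used[node] += 1
                placement.append(node)
                progressed = True
        if not progressed:
            raise RuntimeError("not enough slots for the requested processes")
    return placement


allocation = {"node01": 3, "node02": 2}
print(map_by_slot(allocation, 4))  # ['node01', 'node01', 'node01', 'node02']
print(map_by_node(allocation, 4))  # ['node01', 'node02', 'node01', 'node02']
```

Both policies place the same number of processes; only the order in which nodes are visited differs, which in turn affects the default rank assignment.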
+ +Any directive can include qualifiers by adding a colon (``:``) and any +combination of one or more of the following (delimited by colons) to +the ``--map-by`` option (except where noted): + +* ``PE=n`` bind n CPUs to each process (cannot be used in combination + with rankfile or pe-list directives) + +* ``SPAN`` load balance the processes across the allocation by treating + the allocation as a single "super-node" (cannot be used in + combination with ``slot``, ``node``, ``seq``, ``ppr``, ``rankfile``, or + ``pe-list`` directives) + +* ``OVERSUBSCRIBE`` allow more processes on a node than processing elements + +* ``NOOVERSUBSCRIBE`` means ``!OVERSUBSCRIBE`` + +* ``NOLOCAL`` do not launch processes on the same node as ``prun`` + +* ``HWTCPUS`` use hardware threads as CPU slots + +* ``CORECPUS`` use cores as CPU slots (default) + +* ``INHERIT`` indicates that a child job (i.e., one spawned from within + an application) shall inherit the placement policies of the parent job + that spawned it. + +* ``NOINHERIT`` means ``!INHERIT`` + +* ``FILE=`` (path to file containing sequential or rankfile entries). + +* ``ORDERED`` only applies to the ``PE-LIST`` option to indicate that + procs are to be bound to each of the specified CPUs in the order in + which they are assigned (i.e., the first proc on a node shall be + bound to the first CPU in the list, the second proc shall be bound + to the second CPU, etc.) + +.. note:: Directives and qualifiers are case-insensitive and can be + shortened to the minimum number of characters to uniquely + identify them. Thus, ``L1CACHE`` can be given as ``l1cache`` or + simply as ``L1``. + +The type of CPU (core vs hwthread) used in the mapping algorithm +is determined as follows: + +* by user directive on the command line via the ``HWTCPUS`` qualifier to + the ``--map-by`` directive + +* by setting the ``rmaps_default_mapping_policy`` MCA parameter to + include the ``HWTCPUS`` qualifier.
This parameter sets the default + value for a PRRTE DVM |mdash| qualifiers are carried across to DVM jobs + started via ``prun`` unless overridden by the user's command line + +* defaults to CORE in topologies where core CPUs are defined, and to + hwthreads otherwise. + +If your application uses threads, then you probably want to ensure that +you are either not bound at all (by specifying ``--bind-to none``), or +bound to multiple cores using an appropriate binding level or specific +number of processing elements per application process via the ``PE=#`` +qualifier to the ``--map-by`` command line directive. + +A more detailed description of the mapping, ranking, and binding +procedure can be obtained via the ``--help placement`` option. diff --git a/src/docs/prrte-rst-content/cli-noprefix.rst b/src/docs/prrte-rst-content/cli-noprefix.rst new file mode 100644 index 0000000000..96bf282184 --- /dev/null +++ b/src/docs/prrte-rst-content/cli-noprefix.rst @@ -0,0 +1,16 @@ +.. -*- rst -*- + + Copyright (c) 2022-2023 Nanook Consulting. All rights reserved. + Copyright (c) 2023 Jeffrey M. Squyres. All rights reserved. + + $COPYRIGHT$ + + Additional copyrights may follow + + $HEADER$ + +Disable automatic ``--prefix`` behavior. PRRTE automatically sets the +prefix for remote daemons if it was either configured with the +``--enable-prte-prefix-by-default`` option OR prte itself was executed +with an absolute path to the ``prte`` command. This option disables +that behavior. diff --git a/src/docs/prrte-rst-content/cli-output.rst b/src/docs/prrte-rst-content/cli-output.rst new file mode 100644 index 0000000000..6dff3b1085 --- /dev/null +++ b/src/docs/prrte-rst-content/cli-output.rst @@ -0,0 +1,52 @@ +.. -*- rst -*- + + Copyright (c) 2022-2023 Nanook Consulting. All rights reserved. + Copyright (c) 2023 Jeffrey M. Squyres. All rights reserved. 
+ + $COPYRIGHT$ + + Additional copyrights may follow + + $HEADER$ + +The ``output`` command line directive must be accompanied by a +comma-delimited list of case-insensitive options that control how +output is generated. The full directive need not be provided |mdash| only +enough characters are required to uniquely identify the directive. For +example, ``MERGE`` is sufficient to represent the +``MERGE-STDERR-TO-STDOUT`` directive |mdash| while ``TAG`` cannot be +used to represent ``TAG-DETAILED`` (though ``TAG-D`` would suffice). + +Supported values include: + +* ``TAG`` marks each output line with the ``[job,rank]:`` of + the process that generated it + +* ``TAG-DETAILED`` marks each output line with a detailed annotation + containing ``[namespace,rank][hostname:pid]:`` of the + process that generated it + +* ``TAG-FULLNAME`` marks each output line with the + ``[namespace,rank]:`` of the process that generated it + +* ``TIMESTAMP`` prefixes each output line with a ``[datetime]:`` + stamp. Note that the timestamp will be the time when the line is + output by the DVM and not the time when the source output it + +* ``XML`` provides all output in a pseudo-XML format + +* ``MERGE-STDERR-TO-STDOUT`` merges stderr into stdout + +* ``DIR=DIRNAME`` redirects output from application processes into + ``DIRNAME/job/rank/std[out,err,diag]``. The provided name will be + converted to an absolute path + +* ``FILE=FILENAME`` redirects output from application processes into + ``filename.rank.`` The provided name will be converted to an absolute + path + +Supported qualifiers include ``NOCOPY`` (do not copy the output to the +stdout/err streams), and ``RAW`` (do not buffer the output into complete +lines, but instead output it as it is received).
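The three tag styles differ only in which fields make up the line prefix. The sketch below is illustrative, not PRRTE's output code: the ``ProcInfo`` record and its field names are hypothetical, the job number is assumed to be an integer, and the exact spacing PRRTE places after the colon is an assumption (none is added here).

```python
# Hypothetical sketch of the TAG, TAG-DETAILED and TAG-FULLNAME prefixes.
# The ProcInfo fields and the absence of a space after the colon are assumptions.
from dataclasses import dataclass


@dataclass
class ProcInfo:
    namespace: str  # PMIx namespace of the job
    job: int        # job number shown by TAG (assumed to be an integer)
    rank: int
    hostname: str
    pid: int


def tag_line(line: str, proc: ProcInfo, style: str) -> str:
    """Prefix one line of output according to the selected tag style."""
    style = style.upper()
    if style == "TAG":
        prefix = f"[{proc.job},{proc.rank}]:"
    elif style == "TAG-DETAILED":
        prefix = f"[{proc.namespace},{proc.rank}][{proc.hostname}:{proc.pid}]:"
    elif style == "TAG-FULLNAME":
        prefix = f"[{proc.namespace},{proc.rank}]:"
    else:
        raise ValueError(f"unsupported tag style: {style}")
    return prefix + line


p = ProcInfo("example-ns", 1, 3, "node01", 4242)
print(tag_line("hello", p, "tag-detailed"))  # [example-ns,3][node01:4242]:hello
```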
diff --git a/src/docs/prrte-rst-content/cli-personality.rst b/src/docs/prrte-rst-content/cli-personality.rst new file mode 100644 index 0000000000..cb667248da --- /dev/null +++ b/src/docs/prrte-rst-content/cli-personality.rst @@ -0,0 +1,18 @@ +.. -*- rst -*- + + Copyright (c) 2022-2023 Nanook Consulting. All rights reserved. + Copyright (c) 2023 Jeffrey M. Squyres. All rights reserved. + + $COPYRIGHT$ + + Additional copyrights may follow + + $HEADER$ + +Specify the personality to be used. This governs selection of the +plugin responsible for defining and parsing the command line, +harvesting and forwarding environment variables, and providing +library-dependent support to the launched processes. Examples include +``ompi`` for an application compiled with Open MPI, ``mpich`` for one +built against the MPICH library, or ``oshmem`` for an OpenSHMEM +application compiled against SUNY's reference library. diff --git a/src/docs/prrte-rst-content/cli-pmixmca.rst b/src/docs/prrte-rst-content/cli-pmixmca.rst new file mode 100644 index 0000000000..03697f0e2a --- /dev/null +++ b/src/docs/prrte-rst-content/cli-pmixmca.rst @@ -0,0 +1,15 @@ +.. -*- rst -*- + + Copyright (c) 2022-2023 Nanook Consulting. All rights reserved. + Copyright (c) 2023 Jeffrey M. Squyres. All rights reserved. + + $COPYRIGHT$ + + Additional copyrights may follow + + $HEADER$ + +Pass a PMIx MCA parameter. + +Syntax: ``--pmixmca ``, where ``key`` is the parameter +name and ``value`` is the parameter value. diff --git a/src/docs/prrte-rst-content/cli-prefix.rst b/src/docs/prrte-rst-content/cli-prefix.rst new file mode 100644 index 0000000000..67f68fd75c --- /dev/null +++ b/src/docs/prrte-rst-content/cli-prefix.rst @@ -0,0 +1,17 @@ +.. -*- rst -*- + + Copyright (c) 2022-2023 Nanook Consulting. All rights reserved. + Copyright (c) 2023 Jeffrey M. Squyres. All rights reserved. + + $COPYRIGHT$ + + Additional copyrights may follow + + $HEADER$ + +Prefix to be used to look for PRRTE executables.
PRRTE automatically +sets the prefix for remote daemons if it was either configured with +the ``--enable-prte-prefix-by-default`` option OR prte itself was +executed with an absolute path to the prte command. This option +overrides those settings, if present, and forces use of the provided +path. diff --git a/src/docs/prrte-rst-content/cli-prtemca.rst b/src/docs/prrte-rst-content/cli-prtemca.rst new file mode 100644 index 0000000000..4d2c4d25da --- /dev/null +++ b/src/docs/prrte-rst-content/cli-prtemca.rst @@ -0,0 +1,15 @@ +.. -*- rst -*- + + Copyright (c) 2022-2023 Nanook Consulting. All rights reserved. + Copyright (c) 2023 Jeffrey M. Squyres. All rights reserved. + + $COPYRIGHT$ + + Additional copyrights may follow + + $HEADER$ + +Pass a PRRTE MCA parameter. + +Syntax: ``--prtemca ``, where ``key`` is the parameter +name and ``value`` is the parameter value. diff --git a/src/docs/prrte-rst-content/cli-rank-by.rst b/src/docs/prrte-rst-content/cli-rank-by.rst new file mode 100644 index 0000000000..dfc997fd94 --- /dev/null +++ b/src/docs/prrte-rst-content/cli-rank-by.rst @@ -0,0 +1,63 @@ +.. -*- rst -*- + + Copyright (c) 2022-2023 Nanook Consulting. All rights reserved. + Copyright (c) 2023 Jeffrey M. Squyres. All rights reserved. + + $COPYRIGHT$ + + Additional copyrights may follow + + $HEADER$ + +PRRTE automatically ranks processes for each job starting from zero. +Regardless of the algorithm used, rank assignments span applications +in the same job |mdash| i.e., a command line of + +.. code:: + + -n 3 app1 : -n 2 app2 + +will result in ``app1`` having three processes ranked 0-2 and ``app2`` +having two processes ranked 3-4. + +By default, process ranks are assigned in accordance with the mapping +directive |mdash| e.g., jobs that are mapped by-node will have the process +ranks assigned round-robin on a per-node basis. 
However, users can override +the default by specifying any of the following directives using the +``--rank-by`` command line option: + +* ``SLOT`` assigns ranks to each process on a node in the order in + which the mapper assigned them. This is the default behavior, + but is provided as an explicit option to allow users to override + any alternative default specified in the environment. When mapping + to a specific resource type, procs assigned to a given instance + of that resource on a node will be ranked on a per-resource basis + on that node before moving to the next node. + +* ``NODE`` assigns ranks round-robin on a per-node basis + +* ``FILL`` assigns ranks to procs mapped to a particular resource type + on each node, filling all ranks on that resource before moving to + the next resource on that node. For example, procs mapped by + ``L1cache`` would have all procs on the first ``L1cache`` ranked + sequentially before moving to the second ``L1cache`` on the + node. Once all procs on the node have been ranked, ranking would + continue on the next node. + +* ``SPAN`` assigns ranks round-robin to procs mapped to a particular + resource type, treating the collection of resource instances + spanning the entire allocation as a single "super node" before + looping around for the next pass. Thus, ranking would begin with the + first proc on the first ``L1cache`` on the first node, then the next + rank would be assigned to the first proc on the second ``L1cache`` + on that node, proceeding across until the first proc had been ranked + on all ``L1cache`` used by the job before circling around to rank + the second proc on each object. + +The ``rank-by`` command line option has no qualifiers. + +.. note:: Directives are case-insensitive. ``SPAN`` is the same as + ``span``. + +A more detailed description of the mapping, ranking, and binding +procedure can be obtained via the ``--help placement`` option. 
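The ``FILL``, ``SPAN`` and ``NODE`` orderings described above can be compared on a small mapping result. This is a minimal sketch under stated assumptions: the hypothetical ``layout`` dictionary gives, for each node, the number of procs placed on each resource instance (for example, each ``L1cache``) by a prior mapping step, and each proc is identified by a ``(node, resource index, proc index)`` tuple. Within a node, ``NODE`` ranking is assumed to take procs in fill order.

```python
# Hypothetical sketch of rank ordering under FILL, SPAN and NODE.
# `layout` maps each node to the number of procs on each resource instance
# (e.g., per L1 cache) as produced by a prior mapping step.
from itertools import zip_longest


def _node_procs(node: str, counts: list[int]) -> list[tuple]:
    # Procs on one node, resource instance by resource instance.
    return [(node, r, i) for r, n in enumerate(counts) for i in range(n)]


def rank_by_fill(layout: dict[str, list[int]]) -> list[tuple]:
    """Fill every resource on a node before moving to the next node."""
    order = []
    for node, counts in layout.items():
        order.extend(_node_procs(node, counts))
    return order


def rank_by_span(layout: dict[str, list[int]]) -> list[tuple]:
    """Each pass ranks one proc on every resource instance in the allocation."""
    passes = max((n for counts in layout.values() for n in counts), default=0)
    order = []
    for i in range(passes):
        for node, counts in layout.items():
            for r, n in enumerate(counts):
                if i < n:
                    order.append((node, r, i))
    return order


def rank_by_node(layout: dict[str, list[int]]) -> list[tuple]:
    """Round-robin across nodes, taking procs within a node in fill order."""
    per_node = [_node_procs(node, counts) for node, counts in layout.items()]
    order = []
    for group in zip_longest(*per_node):
        order.extend(p for p in group if p is not None)
    return order


layout = {"node01": [2, 2], "node02": [2, 2]}  # two L1 caches, two procs each
for policy in (rank_by_fill, rank_by_span, rank_by_node):
    print(policy.__name__, [f"{n}/c{r}/p{i}" for n, r, i in policy(layout)[:3]])
```

With this layout, rank 1 lands on the second proc of the first cache on ``node01`` under ``FILL``, on the first proc of the second cache on ``node01`` under ``SPAN``, and on the first proc of ``node02`` under ``NODE``.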
diff --git a/src/docs/prrte-rst-content/cli-runtime-options.rst b/src/docs/prrte-rst-content/cli-runtime-options.rst new file mode 100644 index 0000000000..2a2c0a4f93 --- /dev/null +++ b/src/docs/prrte-rst-content/cli-runtime-options.rst @@ -0,0 +1,164 @@ +.. -*- rst -*- + + Copyright (c) 2022-2023 Nanook Consulting. All rights reserved. + Copyright (c) 2023 Jeffrey M. Squyres. All rights reserved. + + $COPYRIGHT$ + + Additional copyrights may follow + + $HEADER$ + +The ``--runtime-options`` command line directive must be accompanied +by a comma-delimited list of case-insensitive options that control the +runtime behavior of the job. The full directive need not be provided +|mdash| only enough characters are required to uniquely identify the +directive. + +Runtime options are typically ``true`` or ``false``, though this is +not a requirement on developers. Since the value of each option may +need to be set (e.g., to override a default set by an MCA parameter), the +syntax of the command line directive includes the use of an ``=`` +character to allow inclusion of a value for the option. For example, +one can set the ``ERROR-NONZERO-STATUS`` option to ``true`` by +specifying it as ``ERROR-NONZERO-STATUS=1``. Note that boolean options +can be set to ``true`` using a non-zero integer or a case-insensitive +string of the word ``true``. For the latter representation, the user +need only provide the ``T`` character. The same policy +applies to setting a boolean option to ``false``. + +Note that a boolean option will default to ``true`` if provided +without a value. Thus, ``--runtime-options error-nonzero`` is +sufficient to set the ``ERROR-NONZERO-STATUS`` option to ``true``. + +Supported values include: + +* ``ERROR-NONZERO-STATUS[=(bool)]``: if set to false, this directs the + runtime to treat a process that exits with non-zero status as a + normal termination.
If set to true, the runtime will consider such + an occurrence as an error termination and take appropriate action + |mdash| i.e., the job will be terminated unless a runtime option + directs otherwise. This option defaults to a true value if the + option is given without a value. + +* ``DONOTLAUNCH``: directs the runtime to map but not launch the + specified job. This is provided to help explore possible process + placement patterns before actually starting execution. No value need + be passed as this is not an option that can be set by default in + PRRTE. + +* ``SHOW-PROGRESS[=(bool)]``: requests that the runtime provide + progress reports on its startup procedure |mdash| i.e., the launch + of its daemons in support of a job. This is typically used to debug + DVM startup on large systems. This option defaults to a true value + if the option is given without a value. + +* ``NOTIFYERRORS[=(bool)]``: if set to true, requests that the runtime + provide a PMIx event whenever a job encounters an error |mdash| + e.g., a process fails. The event is to be delivered to each + remaining process in the job. This option defaults to a true value + if the option is given without a value. See ``--help + notifications`` for more detail as to the PMIx event codes available + for capturing failure events. + +* ``RECOVERABLE[=(bool)]``: if set to true, this indicates that the + application wishes to consider the job as recoverable |mdash| i.e., + the application is assuming responsibility for recovering from any + process failure. This could include application-driven spawn of a + substitute process or internal compensation for the missing + process. This option defaults to a true value if the option is given + without a value. + +* ``AUTORESTART[=(bool)]``: if set to true, this requests that the + runtime automatically restart failed processes up to "max restarts" + number of times. This option defaults to a true value if the option + is given without a value. 
+ +* ``CONTINUOUS[=(bool)]``: if set to true, this informs the runtime + that the processes in this job are to run until explicitly + terminated. Processes that fail are to be automatically restarted up + to "max restarts" number of times. Notification of process failure + is to be delivered to all processes in the application. This is the + equivalent of specifying ``RECOVERABLE``, ``NOTIFYERRORS``, and + ``AUTORESTART`` options except that the runtime, not the + application, assumes responsibility for process recovery. This + option defaults to a true value if the option is given without a + value. + +* ``MAX-RESTARTS=``: indicates the maximum number of times a + given process is to be restarted. This can be set at the application + or job level (which will then apply to all applications in that + job). + +* ``EXEC-AGENT=``: indicates the executable that shall be used to + start an application process. The resulting command for starting an + application process will be `` app ``. The path may + contain its own command line arguments. + +* ``DEFAULT-EXEC-AGENT``: directs the runtime to use the system + default exec agent to start an application process. No value need be + passed as this is not an option that can be set by default in PRRTE. + +* ``OUTPUT-PROCTABLE[=(channel)]``: directs the runtime to report the + conventional debugger process table (includes PID and host location of + each process in the application). Output is directed to stdout if + the channel is ``-``, stderr if ``+``, or into the specified file + otherwise. If no channel is specified, output will be directed to + stdout. + +* ``STOP-ON-EXEC``: directs the runtime to stop the application + process(es) immediately upon exec'ing them. The directive will apply + to all processes in the job. + +* ``STOP-IN-INIT``: indicates that the runtime should direct the + application process(es) to stop in ``PMIx_Init()``. The directive + will apply to all processes in the job.
+ +* ``STOP-IN-APP``: indicates that the runtime should direct + application processes to stop at some application-defined place and + notify that they are ready to debug. The directive will apply to all + processes in the job. + +* ``TIMEOUT=``: directs the runtime to terminate the job after + it has executed for the specified time. Time is specified in + colon-delimited format |mdash| e.g., ``01:20:13:05`` to indicate 1 + day, 20 hours, 13 minutes and 5 seconds. Time specified without + colons will be assumed to have been given in seconds. + +* ``SPAWN-TIMEOUT=``: directs the runtime to terminate the job + if job launch is not completed within the specified time. Time is + specified in colon-delimited format |mdash| e.g., ``01:20:13:05`` to + indicate 1 day, 20 hours, 13 minutes and 5 seconds. Time specified + without colons will be assumed to have been given in seconds. + +* ``REPORT-STATE-ON-TIMEOUT[=(bool)]``: directs the runtime to provide + a detailed report on job and application process state upon job + timeout. This option defaults to a true value if the option is given + without a value. + +* ``GET-STACK-TRACES[=(bool)]``: requests that the runtime provide + stack traces on all application processes still executing upon + timeout. This option defaults to a true value if the option is given + without a value. + +* ``REPORT-CHILD-JOBS-SEPARATELY[=(bool)]``: directs the runtime to + report the exit status of any child jobs spawned by the primary job + separately. If false, then the final exit status reported will be + zero if the primary job and all spawned jobs exit normally, or the + first non-zero status returned by either primary or child jobs. + This option defaults to a true value if the option is given without + a value. + +* ``AGGREGATE-HELP-MESSAGES[=(bool)]``: directs the runtime to + aggregate help messages, reporting each unique help message once + accompanied by the number of processes that reported it.
This option + defaults to a true value if the option is given without a value. + +* ``FWD-ENVIRONMENT[=(bool)]``: directs the runtime to forward the + entire local environment in support of the application. This option + defaults to a true value if the option is given without a value. + +The ``--runtime-options`` command line option has no qualifiers. + +.. note:: Directives are case-insensitive. ``FWD-ENVIRONMENT`` is the + same as ``fwd-environment``. diff --git a/src/docs/prrte-rst-content/cli-stream-buffering.rst b/src/docs/prrte-rst-content/cli-stream-buffering.rst new file mode 100644 index 0000000000..05ad554dd5 --- /dev/null +++ b/src/docs/prrte-rst-content/cli-stream-buffering.rst @@ -0,0 +1,16 @@ +.. -*- rst -*- + + Copyright (c) 2022-2023 Nanook Consulting. All rights reserved. + Copyright (c) 2023 Jeffrey M. Squyres. All rights reserved. + + $COPYRIGHT$ + + Additional copyrights may follow + + $HEADER$ + +Adjust buffering for stdout/stderr. Allowable values: + +* 0: unbuffered +* 1: line buffered +* 2: fully buffered diff --git a/src/docs/prrte-rst-content/cli-tune.rst b/src/docs/prrte-rst-content/cli-tune.rst new file mode 100644 index 0000000000..c37deeeed4 --- /dev/null +++ b/src/docs/prrte-rst-content/cli-tune.rst @@ -0,0 +1,25 @@ +.. -*- rst -*- + + Copyright (c) 2022-2023 Nanook Consulting. All rights reserved. + Copyright (c) 2023 Jeffrey M. Squyres. All rights reserved. + + $COPYRIGHT$ + + Additional copyrights may follow + + $HEADER$ + +Comma-delimited list of one or more files containing PRRTE and PMIx +MCA parameters for tuning DVM and/or application operations. Parameters in +the file will be treated as *generic* parameters and subject to the +translation rules/uncertainties. See ``--help mca`` for more +information. + +Syntax in the file is: + +.. code:: + + param = value + +with one parameter and its associated value per line. Empty lines and +lines beginning with the ``#`` character are ignored.
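The ``--tune`` file format above is simple enough to sketch. The Python below is an illustration only, not PRRTE's parser: it assumes that leading and trailing whitespace is insignificant (so an indented ``#`` line is still treated as a comment), that the first ``=`` separates the name from the value, and that when several files are listed a later file overrides an earlier one. The function names and the parameter names in the sample are hypothetical.

```python
"""Sketch of reading --tune files ("param = value", one per line)."""
from typing import Dict, Iterable


def parse_tune_lines(lines: Iterable[str]) -> Dict[str, str]:
    """Parse one tune file's lines into a {param: value} dict."""
    params: Dict[str, str] = {}
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        # Empty lines and comment lines are ignored.
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if not sep or not name.strip():
            raise ValueError(f"line {lineno}: expected 'param = value', got {raw!r}")
        params[name.strip()] = value.strip()
    return params


def parse_tune_option(option: str) -> Dict[str, str]:
    """Read each file in a comma-delimited --tune list (assumed: later files win)."""
    params: Dict[str, str] = {}
    for path in option.split(","):
        with open(path.strip()) as fp:
            params.update(parse_tune_lines(fp))
    return params


sample = """
# hypothetical parameter names, for illustration only
example_param_a = 1

example_param_b = some value
"""
print(parse_tune_lines(sample.splitlines()))
```

Splitting on the first ``=`` only (via ``str.partition``) keeps any later ``=`` characters inside the value, which seems the safer choice for values that are themselves expressions.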
diff --git a/src/docs/prrte-rst-content/cli-x.rst b/src/docs/prrte-rst-content/cli-x.rst new file mode 100644 index 0000000000..bf3f2d721f --- /dev/null +++ b/src/docs/prrte-rst-content/cli-x.rst @@ -0,0 +1,20 @@ +.. -*- rst -*- + + Copyright (c) 2022-2023 Nanook Consulting. All rights reserved. + Copyright (c) 2023 Jeffrey M. Squyres. All rights reserved. + + $COPYRIGHT$ + + Additional copyrights may follow + + $HEADER$ + +Export an environment variable, optionally specifying a value. For +example: + +* ``-x foo`` exports the environment variable ``foo`` and takes its + value from the current environment. +* ``-x foo=bar`` exports the environment variable ``foo`` and + sets its value to ``bar`` in the started processes. +* ``-x foo*`` exports all current environment variables starting + with ``foo``. diff --git a/src/docs/prrte-rst-content/deprecated-bind-to-core.rst b/src/docs/prrte-rst-content/deprecated-bind-to-core.rst new file mode 100644 index 0000000000..8a76f231ca --- /dev/null +++ b/src/docs/prrte-rst-content/deprecated-bind-to-core.rst @@ -0,0 +1,17 @@ +.. -*- rst -*- + + Copyright (c) 2022-2023 Nanook Consulting. All rights reserved. + Copyright (c) 2023 Jeffrey M. Squyres. All rights reserved. + + $COPYRIGHT$ + + Additional copyrights may follow + + $HEADER$ + +Bind each process to its own core. + +.. admonition:: Deprecated + :class: warning + + This option is deprecated. Please use ``--bind-to core``. diff --git a/src/docs/prrte-rst-content/deprecated-display-allocation.rst b/src/docs/prrte-rst-content/deprecated-display-allocation.rst new file mode 100644 index 0000000000..ba4b41928c --- /dev/null +++ b/src/docs/prrte-rst-content/deprecated-display-allocation.rst @@ -0,0 +1,17 @@ +.. -*- rst -*- + + Copyright (c) 2022-2023 Nanook Consulting. All rights reserved. + Copyright (c) 2023 Jeffrey M. Squyres. All rights reserved. + + $COPYRIGHT$ + + Additional copyrights may follow + + $HEADER$ + +Display the allocation being used by this job. + +..
admonition:: Deprecated + :class: warning + + This option is deprecated. Please use ``--display alloc``. diff --git a/src/docs/prrte-rst-content/deprecated-display-devel-allocation.rst b/src/docs/prrte-rst-content/deprecated-display-devel-allocation.rst new file mode 100644 index 0000000000..a4a506593e --- /dev/null +++ b/src/docs/prrte-rst-content/deprecated-display-devel-allocation.rst @@ -0,0 +1,18 @@ +.. -*- rst -*- + + Copyright (c) 2022-2023 Nanook Consulting. All rights reserved. + Copyright (c) 2023 Jeffrey M. Squyres. All rights reserved. + + $COPYRIGHT$ + + Additional copyrights may follow + + $HEADER$ + +Display a detailed list (mostly intended for developers) of the +allocation being used by this job. + +.. admonition:: Deprecated + :class: warning + + This option is deprecated. Please use ``--display alloc-devel``. diff --git a/src/docs/prrte-rst-content/deprecated-display-devel-map.rst b/src/docs/prrte-rst-content/deprecated-display-devel-map.rst new file mode 100644 index 0000000000..5942fb4983 --- /dev/null +++ b/src/docs/prrte-rst-content/deprecated-display-devel-map.rst @@ -0,0 +1,18 @@ +.. -*- rst -*- + + Copyright (c) 2022-2023 Nanook Consulting. All rights reserved. + Copyright (c) 2023 Jeffrey M. Squyres. All rights reserved. + + $COPYRIGHT$ + + Additional copyrights may follow + + $HEADER$ + +Display a detailed process map (mostly intended for developers) +just before launch. + +.. admonition:: Deprecated + :class: warning + + This option is deprecated. Please use ``--display map-devel``. diff --git a/src/docs/prrte-rst-content/deprecated-display-map.rst b/src/docs/prrte-rst-content/deprecated-display-map.rst new file mode 100644 index 0000000000..69adbd8f58 --- /dev/null +++ b/src/docs/prrte-rst-content/deprecated-display-map.rst @@ -0,0 +1,6 @@ +Display the process map just before launch. + +.. admonition:: Deprecated + :class: warning + + This option is deprecated. Please use ``--display map``. 
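Each deprecated option documented so far has a one-for-one replacement named in its "Please use ..." note. As an illustration only, the hypothetical Python helper below (``translate_deprecated`` and ``REPLACEMENTS`` are invented names) rewrites those flags in an argument list. It is not PRRTE's translation logic: it does not merge repeated ``--display`` directives, and a real implementation would also need to stop translating at the application's own arguments.

```python
"""Hypothetical rewrite of deprecated prterun flags to their replacements."""
from typing import Dict, List, Tuple

# Taken from the "Please use ..." notes in the deprecated-*.rst files above.
REPLACEMENTS: Dict[str, Tuple[str, ...]] = {
    "--bind-to-core": ("--bind-to", "core"),
    "--display-allocation": ("--display", "alloc"),
    "--display-devel-allocation": ("--display", "alloc-devel"),
    "--display-devel-map": ("--display", "map-devel"),
    "--display-map": ("--display", "map"),
}


def translate_deprecated(argv: List[str]) -> List[str]:
    """Return a copy of argv with each deprecated flag replaced.

    Caveat: every token is examined, including tokens meant for the
    application itself; a real translator would stop at the executable.
    """
    out: List[str] = []
    for arg in argv:
        out.extend(REPLACEMENTS.get(arg, (arg,)))
    return out


print(translate_deprecated(["--display-map", "--bind-to-core", "./a.out"]))
```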
diff --git a/src/docs/prrte-rst-content/deprecated-display-topo.rst b/src/docs/prrte-rst-content/deprecated-display-topo.rst new file mode 100644 index 0000000000..0ba505d7f6 --- /dev/null +++ b/src/docs/prrte-rst-content/deprecated-display-topo.rst @@ -0,0 +1,18 @@ +.. -*- rst -*- + + Copyright (c) 2022-2023 Nanook Consulting. All rights reserved. + Copyright (c) 2023 Jeffrey M. Squyres. All rights reserved. + + $COPYRIGHT$ + + Additional copyrights may follow + + $HEADER$ + +Display the topology as part of the process map (mostly intended +for developers) just before launch. + +.. admonition:: Deprecated + :class: warning + + This option is deprecated. Please use ``--display topo``. diff --git a/src/docs/prrte-rst-content/deprecated-gmca.rst b/src/docs/prrte-rst-content/deprecated-gmca.rst new file mode 100644 index 0000000000..9e5dbfc3b9 --- /dev/null +++ b/src/docs/prrte-rst-content/deprecated-gmca.rst @@ -0,0 +1,28 @@ +.. -*- rst -*- + + Copyright (c) 2022-2023 Nanook Consulting. All rights reserved. + Copyright (c) 2023 Jeffrey M. Squyres. All rights reserved. + + $COPYRIGHT$ + + Additional copyrights may follow + + $HEADER$ + +Syntax: ``--gmca ``, where ``key`` is the parameter name +and ``value`` is the parameter value. The ``g`` prefix indicates that +this parameter is "global", and is to be applied to *all* application +contexts |mdash| not just the one in which the directive appears. + +Pass generic MCA parameters |mdash| i.e., parameters whose project +affiliation must be determined by PRRTE based on matching the name of +the parameter with defined values from various projects that PRRTE +knows about. + +.. admonition:: Deprecated + :class: warning + + This translation can be incomplete (e.g., if a known project adds or + changes parameters) |mdash| thus, it is strongly recommended that + users use project-specific parameters such as ``--gprtemca`` or + ``--gpmixmca``.
diff --git a/src/docs/prrte-rst-content/deprecated-mca.rst b/src/docs/prrte-rst-content/deprecated-mca.rst new file mode 100644 index 0000000000..dd76dafa4b --- /dev/null +++ b/src/docs/prrte-rst-content/deprecated-mca.rst @@ -0,0 +1,26 @@ +.. -*- rst -*- + + Copyright (c) 2022-2023 Nanook Consulting. All rights reserved. + Copyright (c) 2023 Jeffrey M. Squyres. All rights reserved. + + $COPYRIGHT$ + + Additional copyrights may follow + + $HEADER$ + +Syntax: ``--mca ``, where ``key`` is the parameter name +and ``value`` is the parameter value. + +Pass generic MCA parameters |mdash| i.e., parameters whose project +affiliation must be determined by PRRTE based on matching the name of +the parameter with defined values from various projects that PRRTE +knows about. + +.. admonition:: Deprecated + :class: warning + + This translation can be incomplete (e.g., if a project adds or + changes parameters) |mdash| thus, it is strongly recommended that + users use project-specific parameters such as ``--prtemca`` or + ``--pmixmca``. diff --git a/src/docs/prrte-rst-content/deprecated-merge-stderr-to-stdout.rst b/src/docs/prrte-rst-content/deprecated-merge-stderr-to-stdout.rst new file mode 100644 index 0000000000..67708ea49a --- /dev/null +++ b/src/docs/prrte-rst-content/deprecated-merge-stderr-to-stdout.rst @@ -0,0 +1,17 @@ +.. -*- rst -*- + + Copyright (c) 2022-2023 Nanook Consulting. All rights reserved. + Copyright (c) 2023 Jeffrey M. Squyres. All rights reserved. + + $COPYRIGHT$ + + Additional copyrights may follow + + $HEADER$ + +Merge stderr to stdout for each process. + +.. admonition:: Deprecated + :class: warning + + This option is deprecated. Please use ``--output merge``. diff --git a/src/docs/prrte-rst-content/deprecated-output-directory.rst b/src/docs/prrte-rst-content/deprecated-output-directory.rst new file mode 100644 index 0000000000..4e009067c5 --- /dev/null +++ b/src/docs/prrte-rst-content/deprecated-output-directory.rst @@ -0,0 +1,23 @@ +..
-*- rst -*- + + Copyright (c) 2022-2023 Nanook Consulting. All rights reserved. + Copyright (c) 2023 Jeffrey M. Squyres. All rights reserved. + + $COPYRIGHT$ + + Additional copyrights may follow + + $HEADER$ + +Redirect output from application processes into +``filename/job/rank/std[out,err,diag]``. A relative path value will be +converted to an absolute path. The directory name may include a colon +followed by a comma-delimited list of optional case-insensitive +directives. Supported directives currently include ``NOJOBID`` (do not +include a job-id directory level) and ``NOCOPY`` (do not copy the +output to the stdout/err streams). + +.. admonition:: Deprecated + :class: warning + + This option is deprecated. Please use ``--output dir=``. diff --git a/src/docs/prrte-rst-content/deprecated-output-filename.rst b/src/docs/prrte-rst-content/deprecated-output-filename.rst new file mode 100644 index 0000000000..19c3d79673 --- /dev/null +++ b/src/docs/prrte-rst-content/deprecated-output-filename.rst @@ -0,0 +1,22 @@ +.. -*- rst -*- + + Copyright (c) 2022-2023 Nanook Consulting. All rights reserved. + Copyright (c) 2023 Jeffrey M. Squyres. All rights reserved. + + $COPYRIGHT$ + + Additional copyrights may follow + + $HEADER$ + +Redirect output from application processes into ``filename.rank``. A +relative path value will be converted to an absolute path. The +filename may include a colon followed by a comma-delimited list +of optional case-insensitive directives. Supported directives +currently include ``NOCOPY`` (do not copy the output to the stdout/err +streams). + +.. admonition:: Deprecated + :class: warning + + This option is deprecated. Please use ``--output file=``. diff --git a/src/docs/prrte-rst-content/deprecated-report-bindings.rst b/src/docs/prrte-rst-content/deprecated-report-bindings.rst new file mode 100644 index 0000000000..9fcc4bbd1e --- /dev/null +++ b/src/docs/prrte-rst-content/deprecated-report-bindings.rst @@ -0,0 +1,17 @@ +..
-*- rst -*- + + Copyright (c) 2022-2023 Nanook Consulting. All rights reserved. + Copyright (c) 2023 Jeffrey M. Squyres. All rights reserved. + + $COPYRIGHT$ + + Additional copyrights may follow + + $HEADER$ + +Display process bindings to stderr. + +.. admonition:: Deprecated + :class: warning + + This option is deprecated. Please use ``--display bindings``. diff --git a/src/docs/prrte-rst-content/deprecated-tag-output.rst b/src/docs/prrte-rst-content/deprecated-tag-output.rst new file mode 100644 index 0000000000..fe7f4791cc --- /dev/null +++ b/src/docs/prrte-rst-content/deprecated-tag-output.rst @@ -0,0 +1,17 @@ +.. -*- rst -*- + + Copyright (c) 2022-2023 Nanook Consulting. All rights reserved. + Copyright (c) 2023 Jeffrey M. Squyres. All rights reserved. + + $COPYRIGHT$ + + Additional copyrights may follow + + $HEADER$ + +Tag all output with ``[job,rank]``. + +.. admonition:: Deprecated + :class: warning + + This option is deprecated. Please use ``--output tag``. diff --git a/src/docs/prrte-rst-content/deprecated-timestamp-output.rst b/src/docs/prrte-rst-content/deprecated-timestamp-output.rst new file mode 100644 index 0000000000..204eec71e7 --- /dev/null +++ b/src/docs/prrte-rst-content/deprecated-timestamp-output.rst @@ -0,0 +1,17 @@ +.. -*- rst -*- + + Copyright (c) 2022-2023 Nanook Consulting. All rights reserved. + Copyright (c) 2023 Jeffrey M. Squyres. All rights reserved. + + $COPYRIGHT$ + + Additional copyrights may follow + + $HEADER$ + +Timestamp all application process output. + +.. admonition:: Deprecated + :class: warning + + This option is deprecated. Please use ``--output timestamp``. diff --git a/src/docs/prrte-rst-content/deprecated-xml.rst b/src/docs/prrte-rst-content/deprecated-xml.rst new file mode 100644 index 0000000000..8478155b89 --- /dev/null +++ b/src/docs/prrte-rst-content/deprecated-xml.rst @@ -0,0 +1,17 @@ +.. -*- rst -*- + + Copyright (c) 2022-2023 Nanook Consulting. All rights reserved. + Copyright (c) 2023 Jeffrey M. Squyres.
All rights reserved. + + $COPYRIGHT$ + + Additional copyrights may follow + + $HEADER$ + +Provide all output in XML format. + +.. admonition:: Deprecated + :class: warning + + This option is deprecated. Please use ``--output xml``. diff --git a/src/docs/prrte-rst-content/prterun-all-cli.rst b/src/docs/prrte-rst-content/prterun-all-cli.rst new file mode 100644 index 0000000000..cb1fa6cee8 --- /dev/null +++ b/src/docs/prrte-rst-content/prterun-all-cli.rst @@ -0,0 +1,135 @@ +.. -*- rst -*- + + Copyright (c) 2022-2023 Nanook Consulting. All rights reserved. + Copyright (c) 2023 Jeffrey M. Squyres. All rights reserved. + + $COPYRIGHT$ + + Additional copyrights may follow + + $HEADER$ + +The ``--add-host`` option +^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. include:: /prrte-rst-content/cli-add-host.rst + +The ``--add-hostfile`` option +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. include:: /prrte-rst-content/cli-add-hostfile.rst + +The ``--allow-run-as-root`` option +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. include:: /prrte-rst-content/cli-allow-run-as-root.rst + +The ``--bind-to`` option +^^^^^^^^^^^^^^^^^^^^^^^^ + +.. include:: /prrte-rst-content/cli-bind-to.rst + +The ``--debug-daemons`` option +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. include:: /prrte-rst-content/cli-debug-daemons.rst + +The ``--debug-daemons-file`` option +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. include:: /prrte-rst-content/cli-debug-daemons-file.rst + +The ``--display`` option +^^^^^^^^^^^^^^^^^^^^^^^^ + +.. include:: /prrte-rst-content/cli-display.rst + +The ``--dvm`` option +^^^^^^^^^^^^^^^^^^^^ + +.. include:: /prrte-rst-content/cli-dvm.rst + +The ``--dvm-hostfile`` option +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. include:: /prrte-rst-content/cli-dvm-hostfile.rst + +The ``--forward-signals`` option +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. include:: /prrte-rst-content/cli-forward-signals.rst + +The ``--host`` option +^^^^^^^^^^^^^^^^^^^^^ + +..
include:: /prrte-rst-content/cli-dash-host.rst + +The ``--launcher-hostfile`` option +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. include:: /prrte-rst-content/cli-launcher-hostfile.rst + +The ``--leave-session-attached`` option +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. include:: /prrte-rst-content/cli-leave-session-attached.rst + +The ``--map-by`` option +^^^^^^^^^^^^^^^^^^^^^^^ + +.. include:: /prrte-rst-content/cli-map-by.rst + +The ``--output`` option +^^^^^^^^^^^^^^^^^^^^^^^ + +.. include:: /prrte-rst-content/cli-output.rst + +The ``--personality`` option +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. include:: /prrte-rst-content/cli-personality.rst + +The ``--pmixmca`` option +^^^^^^^^^^^^^^^^^^^^^^^^ + +.. include:: /prrte-rst-content/cli-pmixmca.rst + +The ``--prefix`` option +^^^^^^^^^^^^^^^^^^^^^^^ + +.. include:: /prrte-rst-content/cli-prefix.rst + +The ``--prtemca`` option +^^^^^^^^^^^^^^^^^^^^^^^^ + +.. include:: /prrte-rst-content/cli-prtemca.rst + +The ``--noprefix`` option +^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. include:: /prrte-rst-content/cli-noprefix.rst + +The ``--rank-by`` option +^^^^^^^^^^^^^^^^^^^^^^^^ + +.. include:: /prrte-rst-content/cli-rank-by.rst + +The ``--runtime-options`` option +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. include:: /prrte-rst-content/cli-runtime-options.rst + +The ``--stream-buffering`` option +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. include:: /prrte-rst-content/cli-stream-buffering.rst + +The ``--tune`` option +^^^^^^^^^^^^^^^^^^^^^ + +.. include:: /prrte-rst-content/cli-tune.rst + +The ``-x`` option +^^^^^^^^^^^^^^^^^ + +.. include:: /prrte-rst-content/cli-x.rst diff --git a/src/docs/prrte-rst-content/prterun-all-deprecated.rst b/src/docs/prrte-rst-content/prterun-all-deprecated.rst new file mode 100644 index 0000000000..cc04ffcdf6 --- /dev/null +++ b/src/docs/prrte-rst-content/prterun-all-deprecated.rst @@ -0,0 +1,85 @@ +.. -*- rst -*- + + Copyright (c) 2022-2023 Nanook Consulting. All rights reserved. + Copyright (c) 2023 Jeffrey M. 
Squyres. All rights reserved. + + $COPYRIGHT$ + + Additional copyrights may follow + + $HEADER$ + +The ``--bind-to-core`` option +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. include:: /prrte-rst-content/deprecated-bind-to-core.rst + +The ``--display-allocation`` option +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. include:: /prrte-rst-content/deprecated-display-allocation.rst + +The ``--display-devel-allocation`` option +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. include:: /prrte-rst-content/deprecated-display-devel-allocation.rst + +The ``--display-devel-map`` option +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. include:: /prrte-rst-content/deprecated-display-devel-map.rst + +The ``--display-map`` option +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. include:: /prrte-rst-content/deprecated-display-map.rst + +The ``--display-topo`` option +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. include:: /prrte-rst-content/deprecated-display-topo.rst + +The ``--gmca`` option +^^^^^^^^^^^^^^^^^^^^^ + +.. include:: /prrte-rst-content/deprecated-gmca.rst + +The ``--mca`` option +^^^^^^^^^^^^^^^^^^^^ + +.. include:: /prrte-rst-content/deprecated-mca.rst + +The ``--merge-stderr-to-stdout`` option +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. include:: /prrte-rst-content/deprecated-merge-stderr-to-stdout.rst + +The ``--output-directory`` option +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. include:: /prrte-rst-content/deprecated-output-directory.rst + +The ``--output-filename`` option +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. include:: /prrte-rst-content/deprecated-output-filename.rst + +The ``--report-bindings`` option +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. include:: /prrte-rst-content/deprecated-report-bindings.rst + +The ``--tag-output`` option +^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. include:: /prrte-rst-content/deprecated-tag-output.rst + +The ``--timestamp-output`` option +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. 
include:: /prrte-rst-content/deprecated-timestamp-output.rst + +The ``--xml`` option +^^^^^^^^^^^^^^^^^^^^ + +.. include:: /prrte-rst-content/deprecated-xml.rst diff --git a/src/docs/show-help-files/.gitignore b/src/docs/show-help-files/.gitignore new file mode 100644 index 0000000000..a786516d7e --- /dev/null +++ b/src/docs/show-help-files/.gitignore @@ -0,0 +1,3 @@ +_build +cli-*.rst +deprecated-*.rst diff --git a/src/docs/show-help-files/Makefile.am b/src/docs/show-help-files/Makefile.am new file mode 100644 index 0000000000..e9bb40a523 --- /dev/null +++ b/src/docs/show-help-files/Makefile.am @@ -0,0 +1,108 @@ +# +# Copyright (c) 2022-2023 Cisco Systems, Inc. All rights reserved. +# Copyright (c) 2023 Jeffrey M. Squyres. All rights reserved. +# Copyright (c) 2023 Nanook Consulting. All rights reserved. +# +# $COPYRIGHT$ +# +# Additional copyrights may follow +# +# $HEADER$ +# + +# We need this Makefile to be executed serially. Below, we list all +# the generated help text files as the targets of the rule that invokes Sphinx for +# dependency/generation reasons. But a *single* execution of the make +# target will generate *all* of the help text files. Hence, +# when "make" determines that none of the text files exist, it +# should execute the Sphinx-invocation rule once, and then it will +# realize that all the text files exist. More specifically: if +# someone invokes "make -j N", we need make to not execute the +# Sphinx-invocation rule multiple times simultaneously. Both GNU Make +# and BSD Make will honor the .NOTPARALLEL target to disable all +# parallel invocation in this Makefile[.am]. +# +# Note that even though we explicitly disable make's parallelism, +# we'll use Sphinx's internal parallelism via "-j auto" -- see +# SPHINX_OPTS.
+.NOTPARALLEL: + +OUTDIR = _build +SPHINX_CONFIG = conf.py +#SPHINX_OPTS ?= -W --keep-going -j auto +SPHINX_OPTS ?= --keep-going -j auto + +# All RST source files, including those that are not installed +RST_SOURCE_FILES = \ + $(srcdir)/prrte-rst-content/*.rst \ + $(srcdir)/index.rst \ + $(srcdir)/help-schizo-pinfo.rst \ + $(srcdir)/help-schizo-prte.rst \ + $(srcdir)/help-schizo-prted.rst \ + $(srcdir)/help-schizo-prterun.rst \ + $(srcdir)/help-schizo-prun.rst \ + $(srcdir)/help-schizo-pterm.rst + +TXT_OUTDIR = $(OUTDIR)/text + +# These files are the ones that are built that we care about (Sphinx +# will emit lots of files in the TXT_OUTDIR; these are the only ones +# that we care about). +ALL_TXT_BUILT = \ + $(TXT_OUTDIR)/help-schizo-pinfo.txt \ + $(TXT_OUTDIR)/help-schizo-prte.txt \ + $(TXT_OUTDIR)/help-schizo-prted.txt \ + $(TXT_OUTDIR)/help-schizo-prterun.txt \ + $(TXT_OUTDIR)/help-schizo-prun.txt \ + $(TXT_OUTDIR)/help-schizo-pterm.txt + +#------------------- + +EXTRA_DIST = \ + requirements.txt \ + $(SPHINX_CONFIG) \ + $(RST_SOURCE_FILES) \ + $(TXT_OUTDIR) + +########################################################################### + +include $(top_srcdir)/Makefile.prte-rules + +# We do not specifically compute or list the RST file dependencies. +# Instead, because Sphinx will do this computation by itself, we just +# say "all Sphinx output is dependent upon all RST input files" here +# in the Makefile. If any of the input RST files are edited, this +# will cause "make" to invoke Sphinx and let it re-generate whatever +# output files need to be re-generated. +# +# Hence, we use a single sentinel file here in the Makefile to +# represent all Sphinx output files. +TXT_SENTINEL_OUTPUT = $(TXT_OUTDIR)/index.txt +$(ALL_TXT_BUILT): $(TXT_SENTINEL_OUTPUT) + +$(TXT_SENTINEL_OUTPUT): $(RST_SOURCE_FILES) +$(TXT_SENTINEL_OUTPUT): $(SPHINX_CONFIG) + +# Technically, we always build the docs files if they're missing +# (e.g., in a git clone). 
The "PRTE_BUILD_DOCS" conditional really +# refers to whether we have Sphinx or not. +# +# If we have Sphinx, use it to build. +# +# If we do not have Sphinx, then build dummy help files that say, in +# effect, "you don't have Sphinx, so you didn't get proper help files", +# so that it is obvious why the proper help files are missing. +$(TXT_SENTINEL_OUTPUT): +if PRTE_BUILD_DOCS + $(PRTE_V_TXT) $(SPHINX_BUILD) -M text "$(srcdir)" "$(OUTDIR)" $(SPHINX_OPTS) +else !PRTE_BUILD_DOCS + $(PRTE_V_TXT) $(srcdir)/build-dummy-ini-files.py $(TXT_OUTDIR) $(ALL_TXT_BUILT) +endif !PRTE_BUILD_DOCS + +maintainer-clean-local: + $(SPHINX_BUILD) -M clean "$(srcdir)" "$(OUTDIR)" $(SPHINX_OPTS) + +########################################################################### + +# Install the generated text files +dist_prtedata_DATA = $(ALL_TXT_BUILT) diff --git a/src/docs/show-help-files/build-dummy-ini-files.py b/src/docs/show-help-files/build-dummy-ini-files.py new file mode 100755 index 0000000000..a295047ab9 --- /dev/null +++ b/src/docs/show-help-files/build-dummy-ini-files.py @@ -0,0 +1,76 @@ +#!/usr/bin/env python3 +# +# Copyright 2023 Jeffrey M. Squyres. All rights reserved. +# +# $COPYRIGHT$ +# +# Additional copyrights may follow +# +# $HEADER$ + +# Trivial script to scrape section names from source .rst files and +# generate dummy show_help-style text files. We only use this script +# a) if the show_help files are not available (i.e., in a git clone), +# and b) if Sphinx is not available. +# +# We generate these show_help files (as opposed to generating a run +# time error when the show_help file is unavailable) just so that it's +# 100% obvious that you're not getting proper help files because you +# did not have Sphinx available. + +# Intentionally use a minimal set of Python modules (to decrease any +# needed dependencies).
+ +import re +import os +import sys + +# Pop the executable name +exe = sys.argv.pop(0) +abs_srcdir = os.path.abspath(os.path.dirname(exe)) + +# First argument is the text outdir (in the build tree). +outdir = sys.argv.pop(0) +outdir_len = len(outdir) + 1 + +# The rest of the arguments are the text filenames to build. +for outfile in sys.argv: + + # The filenames all have the outdir prefix. We find the + # corresponding source .rst file by stripping that prefix off and + # adding the srcdir prefix to it. + # + # We do this instead of using os.path.basename() because some + # files have subdirectory names in them (e.g., + # "mca/help-something.txt"). + txt_filename = outfile[outdir_len:] + + # Replace the .txt with .rst, and add the srcdir prefix. + rst_filename = txt_filename.replace(".txt", ".rst") + srcfile = os.path.join(abs_srcdir, rst_filename) + + # Read in the source file + with open(srcfile) as fp: + src_rst = fp.readlines() + + # Find all the "[section]" lines. + sections = list() + for line in src_rst: + match = re.search(r'\s*\[(.+)\]\s*$', line) + if match: + sections.append(match.group(1)) + + # Ensure the out directory exists + full_outdir = os.path.abspath(os.path.dirname(outfile)) + # Use older form of os.makedirs (without the exist_ok param) to make
+ try: + os.makedirs(full_outdir) + except FileExistsError: + pass + + # Write the output file + with open(outfile, 'w') as fp: + for section in sections: + fp.write(f"""[{section}] +This help section is empty because PRRTE was built without Sphinx.\n""") diff --git a/src/docs/show-help-files/conf.py b/src/docs/show-help-files/conf.py new file mode 120000 index 0000000000..cb9acdea07 --- /dev/null +++ b/src/docs/show-help-files/conf.py @@ -0,0 +1 @@ +../../../docs/conf.py \ No newline at end of file diff --git a/src/docs/show-help-files/help-schizo-pinfo.rst b/src/docs/show-help-files/help-schizo-pinfo.rst new file mode 100644 index 0000000000..c528203ad2 --- /dev/null +++ b/src/docs/show-help-files/help-schizo-pinfo.rst @@ -0,0 +1,126 @@ +.. -*- rst -*- + + Copyright (c) 2021-2022 Nanook Consulting. All rights reserved. + Copyright (c) 2023 Jeffrey M. Squyres. All rights reserved. + + $COPYRIGHT$ + + Additional copyrights may follow + + $HEADER$ + +[bogus section] + +This section is not used by PRTE code. But we have to put an RST +section title in this file somewhere, or Sphinx gets unhappy. So we +put it in a section that is ignored by PRTE code. + +Hello, world +------------ + +[usage] + +%s (%s) %s + +Usage: ``%s [OPTION]...`` + +Provide detailed information on your PRRTE installation. + +The following command line options are available. Note that +more detailed help for any option can be obtained by adding that +option to the help request as ``--help