From 8b95fe274b2171df4ac0543aba76f57c36cbcb85 Mon Sep 17 00:00:00 2001
From: Ralph Castain
Date: Fri, 19 Apr 2024 11:30:42 -0600
Subject: [PATCH 1/3] Remove stale resilience document

PRRTE itself no longer requires specific resilience settings.

Signed-off-by: Ralph Castain
---
 docs/index.rst      |   1 -
 docs/resilience.rst | 206 --------------------------------------------
 2 files changed, 207 deletions(-)
 delete mode 100644 docs/resilience.rst

diff --git a/docs/index.rst b/docs/index.rst
index dbd58d58dd..b9af57fad3 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -37,7 +37,6 @@ Table of contents
 placement/index
 notifications
 session-directory
- resilience
 developers/index
 contributing
 license
diff --git a/docs/resilience.rst b/docs/resilience.rst
deleted file mode 100644
index bb3d5e7274..0000000000
--- a/docs/resilience.rst
+++ /dev/null
@@ -1,206 +0,0 @@
-Resilience
-==========
-
-This section documents the features and options specific to the **PRTE
-Level Fault Tolerant** PMIx reference RunTime Environment (PRTE)
-
-Features
---------
-
-This implementation provides a runtime level failure detection and
-propagation mechanism for both process and node failure.
-
-What's added to support fault-tolerance?
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-#. New module under src/mca/errmgr: detector:
- Daemons monitor one another along a ring topology to detect node failures.
- src/mca/odls is in charge of detecting the failure of locally hosted processes
- (using SIGCHLD signals from the operating system).
-
-#. New component: propagate with module prperror:
- Prepares the content of the reliable broadcast messages (i.e., the list of failed processes).
- In order to populate the list of failed processes in node failure cases, the list of processes
- hosted by a particular daemon is collected by prperror module.
-
-#. New module under src/mca/grpcomm: bmg
- The BMG component implements a broadcast algorithm in a reliable way;
- to be noted, this component abides by the normal interface for a daemon
- broadcast and can reliably broadcast any type of information
-
-#. Test case for process failure under example/error_notify.c
- This test uses kill(pid) to kill a process to simulate process failure.
-
-#. Test case for daemon/node failure under example/daemon_error_notify.c
- This test uses kill(ppid) to kill a process's parent to simulate node failure.
-
-Building
-^^^^^^^^
-
-.. code-block:: bash
-
- ./autogen.pl
-
- # If you want to run mpi applications, you mpi and PRTE
- # should have the same version of PMIx and libevent
-
- ./configure --enable-prte-ft --prefix=... --with-pmix=/external-pmix-path --with-libevent=/external-libevent-path
-
- make [-j N] all install
- # use an integer value of N for parallel builds
-
-Running
-^^^^^^^
-
-Building your application
-+++++++++++++++++++++++++
-
-Compile your application as usual
-
-#. using the provided ``pcc`` for pmix-based application;
-#. using your ``mpicc`` for mpi-based application with a prte-based MPI (e.g., Open MPI).
-
-Running your application
-++++++++++++++++++++++++
-
-If running standalone:
-
-#. you need to launch first the DVM daemons with ``prte --mca prte_enable_ft true``.
-#. You can then launch your application with by simply using the provided ``prun --enable-recovery``.
-
-Make sure to set your ``PATH`` and ``LD_LIBRARY_PATH`` properly.
-
-If running with a PRRTE-based MPI (e.g., Open MPI):
-
-#. use ``mpiexec --enable-recovery --mca prte_enable_ft true``.
-
-Running under a batch scheduler
-+++++++++++++++++++++++++++++++
-
-This code can operate under a job/batch scheduler, and is tested routinely with Slurm.
-One difficulty comes from the fact that many job schedulers will "cleanup" the
-application as soon as a process fails. In order to avoid this problem, it is preferred
-that you use ``-k, --no-kill [=off]: Do not automatically terminate a job if one of the nodes
-it has been allocated fails.`` within an allocation (e.g., ``salloc``, ``sbatch``) rather than
-a direct launch (e.g. ``srun``).
-
-Run-time tuning knobs
-^^^^^^^^^^^^^^^^^^^^^
-
-This code comes with a variety of knobs for controlling how it runs. The default
-parameters are sane and should result in very good performance in most
-cases. You can change those default by ``--prtemca parameter value``:
-
-* ``prte_enable_recovery (default: false)`` controls automatic
- cleanup of apps with failed processes.
-
-* ``prte_abort_non_zero_exit (default: true)`` controls the job
- termination after a error occurred.
-
-* ``errmgr_detector_enable (default: false)`` enable or disable error
- detection and propagation.
-
-* ``errmgr_detector_heartbeat_period (default:5s)`` heartbeat
- period. Recommended value is 1/2 of the timeout.
-
-* ``errmgr_detector_heartbeat_timeout (default:10s)`` heartbeat
- timeout (i.e. failure detection speed). Recommended value is 2 times
- the heartbeat period
-
-To be noted: if you want to use prte failure detection and propagation features.
- You MUST set prte_enable_recovery to true,
- prte_abort_non_zero_exit to false.
-
-Testing
--------
-
-.. code-block:: bash
-
- # Step 1
- salloc -k -N num_of_nodes -w host1,host2...
- -k, --no-kill do not kill job on node failure
-
- # Step 2
- prte --prtemca prte_enable_ft true \
- --prtemca errmgr_detector_heartbeat_period 0.5 \
- --prtemca errmgr_detector_heartbeat_timeout 1 \
- --prtemca errmgr_detector_enable 1 \
- --prtemca prte_abort_on_non_zero_status 0 \
- --debug-daemons
-
- # using 'errmgr_detector_enable 1' choose enable the error detector.
-
-Config with ``--enable-debug``, ``--debug-daemons`` will give you lots of information.
-
-Also, the ring detector heartbeat sending frequency is not hard coded,
-you can change heartbeat_peroid and heartbeat_timeout by using MCA
-params. For example:
-
-* using ``--prtemca errmgr_detector_heartbeat_period 10`` set the sending frequency to every 10 seconds(default is 5s)
-
-* using ``--prtemca errmgr_detector_heartbeat_timeout 20`` set timeout to 20 seconds(default is 10s)
-
-Step 3: under example we have 2 test codes ``error_notify.c``,
-``daemon_error_notify.c``:
-
-.. code-block:: bash
-
- # Compile the codes
- pcc -g error_notify.c -o error_notify
-
- # Run
- prun --oversubscribe --merge-stderr-to-stdout \
- --map-by node:DISPLAY:DISPLAYALLOC \
- --report-bindings --enable-recovery \
- --max-restarts 4 \
- --continuous -np num_of_procs error_notify -v
-
-If use external pmix:
-
-.. code-block:: bash
-
- # Compile
- pcc error_notify.c -o error_notify_1 \
- -I/external_pmix_install_path/include \
- -L/external_pmix_install_path/lib \
- -lpmix
-
- # Run
- prun --oversubscribe -x LD_LIBRARY_PATH \
- --merge-stderr-to-stdout \
- --map-by node:DISPLAY:DISPLAYALLOC \
- --report-bindings --enable-recovery \
- --max-restarts 4 \
- --continuous -np num_of_procs error_notify_1 -v
-
-Iif use external pmix:
-
-.. code-block:: bash
-
- # Compile
- pcc daemon_error_notify.c -o daemon_error_notify_1 \
- -I/external_pmix_install_path/include \
- -L/external_pmix_install_path/lib \
- -lpmix
-
- # Run
- prun --oversubscribe -x LD_LIBRARY_PATH \
- --merge-stderr-to-stdout \
- --map-by node:DISPLAY:DISPLAYALLOC \
- --report-bindings --enable-recovery \
- --max-restarts 4 \
- --continuous -np num_of_procs \
- daemon_error_notify_1 -v
-
-Copyright
----------
-
-Copyright (c) 2018-2020 The University of Tennessee and The University
-of Tennessee Research Foundation. All rights reserved.
-
-Copyright (c) 2022 Cisco Systems, Inc. All rights reserved.
-$COPYRIGHT$
-
-Additional copyrights may follow
-
-$HEADER$

From 39577895f6c2beb7531e4a0707ce451b5c78de91 Mon Sep 17 00:00:00 2001
From: Ralph Castain
Date: Thu, 2 May 2024 14:13:53 -0600
Subject: [PATCH 2/3] Add support for PMIX_MEM_ALLOC_KIND

Add a new cmd line option that corresponds to this attribute. Add the attribute to the prun payload. When received, it will default to including in the job info for the spawned job. Add query support for it.

Signed-off-by: Ralph Castain
---
 src/mca/schizo/prte/schizo_prte.c    |  2 ++
 src/prted/pmix/pmix_server_queries.c | 24 +++++++++++++++++++++++-
 src/prted/prun_common.c              |  7 +++++++
 src/util/prte_cmd_line.h             |  3 ++-
 4 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/src/mca/schizo/prte/schizo_prte.c b/src/mca/schizo/prte/schizo_prte.c
index db18d7013e..16507a6ae5 100644
--- a/src/mca/schizo/prte/schizo_prte.c
+++ b/src/mca/schizo/prte/schizo_prte.c
@@ -196,6 +196,7 @@ static struct option prterunoptions[] = {
 PMIX_OPTION_SHORT_DEFINE(PRTE_CLI_PRELOAD_BIN, PMIX_ARG_NONE, 's'),
 PMIX_OPTION_DEFINE(PRTE_CLI_DO_NOT_AGG_HELP, PMIX_ARG_NONE),
 PMIX_OPTION_DEFINE(PRTE_CLI_FWD_ENVIRON, PMIX_ARG_OPTIONAL),
+ PMIX_OPTION_DEFINE(PRTE_CLI_MEM_ALLOC_KIND, PMIX_ARG_REQD),

 // output options
 PMIX_OPTION_DEFINE(PRTE_CLI_OUTPUT, PMIX_ARG_REQD),
@@ -310,6 +311,7 @@ static struct option prunoptions[] = {
 PMIX_OPTION_SHORT_DEFINE(PRTE_CLI_PRELOAD_BIN, PMIX_ARG_NONE, 's'),
 PMIX_OPTION_DEFINE(PRTE_CLI_DO_NOT_AGG_HELP, PMIX_ARG_NONE),
 PMIX_OPTION_DEFINE(PRTE_CLI_FWD_ENVIRON, PMIX_ARG_OPTIONAL),
+ PMIX_OPTION_DEFINE(PRTE_CLI_MEM_ALLOC_KIND, PMIX_ARG_REQD),

 // output options
 PMIX_OPTION_DEFINE(PRTE_CLI_OUTPUT, PMIX_ARG_REQD),
diff --git a/src/prted/pmix/pmix_server_queries.c b/src/prted/pmix/pmix_server_queries.c
index d3df3743c8..b1b7d3002e 100644
--- a/src/prted/pmix/pmix_server_queries.c
+++ b/src/prted/pmix/pmix_server_queries.c
@@ -90,7 +90,8 @@ static void _query(int sd, short args, void *cbdata)
 pmix_proc_info_t *procinfo;
 pmix_data_array_t dry;
 prte_proc_t *proct;
- pmix_proc_t *proc;
+ pmix_proc_t *proc, pproc;
+ pmix_info_t info;
 size_t sz;

 PRTE_HIDE_UNUSED_PARAMS(sd, args);
@@ -823,6 +824,27 @@ static void _query(int sd, short args, void *cbdata)
 }
 #endif

+#ifdef PMIX_MEM_ALLOC_KIND
+ } else if (0 == strcmp(q->keys[n], PMIX_MEM_ALLOC_KIND)) {
+ pmix_value_t *value;
+ jdata = prte_get_job_data_object(jobid);
+ if (NULL == jdata) {
+ ret = PMIX_ERR_NOT_FOUND;
+ goto done;
+ }
+ PMIX_LOAD_PROCID(&pproc, jobid, PMIX_RANK_WILDCARD);
+ PMIX_INFO_LOAD(&info, PMIX_IMMEDIATE, NULL, PMIX_BOOL);
+ ret = PMIx_Get(&pproc, PMIX_MEM_ALLOC_KIND, &info, 1, (void**)&value);
+ if (PMIX_SUCCESS != ret) {
+ goto done;
+ }
+ PMIX_INFO_LIST_ADD(rc, results, PMIX_MEM_ALLOC_KIND, value->data.string, PMIX_STRING);
+ PMIX_VALUE_RELEASE(value);
+ if (PMIX_SUCCESS != rc) {
+ PMIX_ERROR_LOG(rc);
+ goto done;
+ }
+#endif
 } else {
 fprintf(stderr, "Query for unrecognized attribute: %s\n", q->keys[n]);
 }
diff --git a/src/prted/prun_common.c b/src/prted/prun_common.c
index 695aec2f21..62281d30c5 100644
--- a/src/prted/prun_common.c
+++ b/src/prted/prun_common.c
@@ -661,6 +661,13 @@ int prun_common(pmix_cli_result_t *results,
 PMIX_INFO_LIST_ADD(ret, jinfo, PMIX_LOG_AGG, &flag, PMIX_BOOL);
 }

+#ifdef PMIX_MEM_ALLOC_KIND
+ opt = pmix_cmd_line_get_param(results, PRTE_CLI_MEM_ALLOC_KIND);
+ if (NULL != opt) {
+ PMIX_INFO_LIST_ADD(ret, jinfo, PMIX_MEM_ALLOC_KIND, opt->values[0], PMIX_STRING);
+ }
+#endif
+
 /* give the schizo components a chance to add to the job info */
 schizo->job_info(results, jinfo);

diff --git a/src/util/prte_cmd_line.h b/src/util/prte_cmd_line.h
index 3f657f6ad2..15f5727898 100644
--- a/src/util/prte_cmd_line.h
+++ b/src/util/prte_cmd_line.h
@@ -15,7 +15,7 @@
 * Copyright (c) 2016-2017 Los Alamos National Security, LLC. All rights
 * reserved.
 * Copyright (c) 2017-2022 IBM Corporation. All rights reserved.
- * Copyright (c) 2021-2023 Nanook Consulting. All rights reserved.
+ * Copyright (c) 2021-2024 Nanook Consulting All rights reserved.
 * $COPYRIGHT$
 *
 * Additional copyrights may follow
@@ -108,6 +108,7 @@ BEGIN_C_DECLS
 #define PRTE_CLI_SET_CWD_SESSION "set-cwd-to-session-dir" // none
 #define PRTE_CLI_ENABLE_RECOVERY "enable-recovery" // none
 #define PRTE_CLI_DISABLE_RECOVERY "disable-recovery" // none
+#define PRTE_CLI_MEM_ALLOC_KIND "memory-alloc-kinds" // required

 // Placement options
 #define PRTE_CLI_MAPBY "map-by" // required

From 2ac45f362c6992257a58e7c34b3a48ce906658c4 Mon Sep 17 00:00:00 2001
From: Ralph Castain
Date: Thu, 2 May 2024 14:41:00 -0600
Subject: [PATCH 3/3] Remove MacOS CI builds

Homebrew has broken something and I cannot figure out how to fix it.

Signed-off-by: Ralph Castain
---
 .github/workflows/builds.yaml | 50 -----------------------------------
 1 file changed, 50 deletions(-)

diff --git a/.github/workflows/builds.yaml b/.github/workflows/builds.yaml
index d02be34cef..099a95e37b 100644
--- a/.github/workflows/builds.yaml
+++ b/.github/workflows/builds.yaml
@@ -3,56 +3,6 @@ name: Build tests
 on: [pull_request]

 jobs:
- macos:
- runs-on: macos-latest
- strategy:
- matrix:
- path: ['non-vpath', 'vpath']
- sphinx: ['no-sphinx', 'sphinx']
- steps:
- - name: Install dependencies
- run: brew install libevent hwloc autoconf automake libtool
- - name: Git clone OpenPMIx
- uses: actions/checkout@v3
- with:
- submodules: recursive
- repository: openpmix/openpmix
- path: openpmix/master
- ref: master
- - name: Build OpenPMIx
- run: |
- cd openpmix/master
- ./autogen.pl
- ./configure --prefix=$RUNNER_TEMP/pmixinstall
- make -j
- make install
- - name: Git clone PRRTE
- uses: actions/checkout@v3
- with:
- submodules: recursive
- clean: false
- - name: Build PRRTE
- run: |
- ./autogen.pl
-
- sphinx=
- if test "${{ matrix.sphinx }}" = sphinx; then
- pip3 install -r docs/requirements.txt
- sphinx=--enable-sphinx
- fi
-
- c=./configure
- if test "${{ matrix.path }}" = vpath; then
- mkdir build
- cd build
- c=../configure
- fi
-
- $c --prefix=$RUNNER_TEMP/prteinstall --with-pmix=$RUNNER_TEMP/pmixinstall $sphinx
- make -j
- make install
- make uninstall
-
 ubuntu:
 runs-on: ubuntu-latest
 strategy: