diff --git a/docs/how-to/hip_runtime_api/error_handling.rst b/docs/how-to/hip_runtime_api/error_handling.rst
new file mode 100644
index 0000000000..db1ff71a51
--- /dev/null
+++ b/docs/how-to/hip_runtime_api/error_handling.rst
@@ -0,0 +1,144 @@
+.. meta::
+  :description: Error handling with the HIP runtime API
+  :keywords: AMD, ROCm, HIP, error handling, error
+
+*************************************************************************
+Error handling
+*************************************************************************
+
+Error handling is crucial for several reasons. It improves the stability of applications by preventing crashes and keeping them in a consistent state. It improves security by protecting against vulnerabilities that malicious actors could exploit. It improves the user experience by providing meaningful feedback and allowing applications to recover gracefully from errors. Finally, it aids maintainability by making it easier for developers to diagnose and fix issues.
+
+Strategies
+==========
+
+A fundamental best practice is to adopt a consistent error handling strategy across the entire application, one that defines how errors are reported, logged, and managed. A centralized error handling mechanism enforces this consistency and reduces redundancy. A common approach is to wrap HIP API calls in a macro such as ``HIP_CHECK``, which checks the return value of each call and handles any error in one place instead of duplicating the checking code at every call site.
+
+Granular error reporting
+------------------------
+Granular error reporting means reporting errors at the appropriate level of detail. Too much detail can overwhelm users, while too little makes debugging difficult. It is therefore important to distinguish between user-facing errors, which need a short, actionable message, and internal errors, which should be logged with enough context (such as the failing call, the source location, and the HIP error name) to diagnose the problem.
+
+Fail-fast principle
+-------------------
+The fail-fast principle means detecting and handling errors as early as possible, before they propagate and cause more significant issues. For example, validate inputs and preconditions before performing an operation, and check the result of each HIP call before relying on it.
+
+Resource management
+-------------------
+Make sure that resources such as device memory, streams, events, file handles, and network connections are properly released when an error occurs, so that a failure does not also leak resources.
+
+HIP error handling API
+======================
+The ``hipGetLastError`` and ``hipPeekAtLastError`` functions detect errors after HIP runtime calls, so that issues are caught early in the execution flow. They are especially important after kernel launches with the ``<<<...>>>`` syntax, which do not return an error code. Because kernels run asynchronously, an error that occurs during kernel execution might only be reported by a later call, such as ``hipDeviceSynchronize``.
+
+For reporting, ``hipGetErrorName`` and ``hipGetErrorString`` convert an error code into a readable name and description that can be logged or displayed to users. This helps in understanding the nature of the error and simplifies debugging.
+
+By detecting errors and providing detailed information about them, these functions let developers implement appropriate error handling strategies, such as retry mechanisms, resource cleanup, or graceful degradation.
+
+Examples
+--------
+
+``hipGetLastError`` returns the last error that occurred during a HIP runtime API call and resets the error code to ``hipSuccess``:
+
+.. code-block:: cpp
+
+    hipError_t err = hipGetLastError();
+    if (err != hipSuccess)
+    {
+        printf("HIP Error: %s\n", hipGetErrorString(err));
+    }
+
+``hipPeekAtLastError`` returns the last error that occurred during a HIP runtime API call **without** resetting the error code:
+
+.. code-block:: cpp
+
+    hipError_t err = hipPeekAtLastError();
+    if (err != hipSuccess)
+    {
+        printf("HIP Error: %s\n", hipGetErrorString(err));
+    }
+
+``hipGetErrorName`` converts a HIP error code to a string containing the name of the error, such as ``hipErrorInvalidValue``:
+
+.. code-block:: cpp
+
+    const char* errName = hipGetErrorName(err);
+    printf("Error name: %s\n", errName);
+
+``hipGetErrorString`` converts a HIP error code to a string describing the error:
+
+.. code-block:: cpp
+
+    const char* errString = hipGetErrorString(err);
+    printf("Error description: %s\n", errString);
+
+Best practices
+==============
+
+1. Check errors after each API call
+
+   Always check the return value of HIP API calls to catch errors early. For example:
+
+   .. code-block:: cpp
+
+       hipError_t err = hipMalloc(&d_A, size);
+       if (err != hipSuccess) {
+           printf("hipMalloc failed: %s\n", hipGetErrorString(err));
+           return -1;
+       }
+
+2. Use macros for error checking
+
+   Define macros to simplify error checking and reduce code duplication. Wrapping the macro body in ``do { ... } while (0)`` makes it behave like a single statement, so it can be used safely in unbraced ``if``/``else`` branches. For example:
+
+   .. code-block:: cpp
+
+       #define HIP_CHECK(call) \
+           do { \
+               hipError_t err = (call); \
+               if (err != hipSuccess) { \
+                   fprintf(stderr, "HIP Error: %s:%d, %s\n", __FILE__, __LINE__, hipGetErrorString(err)); \
+                   exit(err); \
+               } \
+           } while (0)
+
+       HIP_CHECK(hipMalloc(&d_A, size)); // Usage
+
+3. Handle errors gracefully
+
+   Make sure the application handles errors gracefully, for example by releasing resources it has already acquired and giving the user a meaningful error message instead of crashing.
+
+Complete example
+----------------
+
+A complete example demonstrating error handling:
+
+.. code-block:: cpp
+
+    #include <hip/hip_runtime.h>
+    #include <cstdio>
+    #include <cstdlib>
+
+    #define HIP_CHECK(call) \
+        do { \
+            hipError_t err = (call); \
+            if (err != hipSuccess) { \
+                fprintf(stderr, "HIP Error: %s:%d, %s\n", __FILE__, __LINE__, hipGetErrorString(err)); \
+                exit(err); \
+            } \
+        } while (0)
+
+    int main()
+    {
+        constexpr int N = 100;
+        size_t size = N * sizeof(float);
+        float *d_A = nullptr;
+
+        // Allocate memory on the device
+        HIP_CHECK(hipMalloc(&d_A, size));
+
+        // Perform other operations...
+
+        // Free device memory
+        HIP_CHECK(hipFree(d_A));
+
+        return 0;
+    }
diff --git a/docs/how-to/hip_runtime_api/initialization.rst b/docs/how-to/hip_runtime_api/initialization.rst
new file mode 100644
index 0000000000..e49385ffef
--- /dev/null
+++ b/docs/how-to/hip_runtime_api/initialization.rst
@@ -0,0 +1,90 @@
+.. meta::
+  :description: Initializing the HIP runtime and selecting a GPU device
+  :keywords: AMD, ROCm, HIP, initialization
+
+*************************************************************************
+Initialization
+*************************************************************************
+
+Initialization involves setting up the environment and resources needed for GPU computation.
+
+Include HIP headers
+===================
+
+To use HIP functions, include the HIP runtime header in your source file:
+
+.. code-block:: cpp
+
+    #include <hip/hip_runtime.h>
+
+Initialize the HIP runtime
+==========================
+
+The HIP runtime is initialized automatically when the first HIP API call is made. You can also initialize it explicitly by calling ``hipInit``, whose ``flags`` argument should be ``0``:
+
+.. code-block:: cpp
+
+    hipError_t err = hipInit(0);
+    if (err != hipSuccess)
+    {
+        // Handle error
+    }
+
+Initialization includes the following steps:
+
+- Loading the HIP runtime
+
+  This includes loading the necessary libraries and setting up internal data structures.
+
+- Querying GPU devices
+
+  The runtime identifies and queries the GPU devices available on the system.
+
+- Setting up contexts
+
+  The runtime creates a context for each GPU device. Contexts are used to manage resources and execute kernels on that device.
+
+Get device properties
+=====================
+
+Before using a GPU device, you might want to query its properties with ``hipGetDeviceProperties``:
+
+.. code-block:: cpp
+
+    int deviceCount = 0;
+    if (hipGetDeviceCount(&deviceCount) != hipSuccess)
+    {
+        // Handle error, for example when no GPU device is found
+    }
+    for (int i = 0; i < deviceCount; ++i)
+    {
+        hipDeviceProp_t prop;
+        if (hipGetDeviceProperties(&prop, i) == hipSuccess)
+        {
+            printf("Device %d: %s\n", i, prop.name);
+        }
+    }
+
+Set device
+==========
+
+Use ``hipSetDevice`` to select the GPU device for subsequent HIP operations:
+
+.. code-block:: cpp
+
+    int deviceId = 0; // Example: select the first device
+    hipSetDevice(deviceId);
+
+``hipSetDevice`` performs the following tasks:
+
+- Context binding
+
+  Makes the specified device the current device for the calling host thread. Subsequent HIP operations issued from that thread, such as memory allocations and kernel launches, then run on the selected device.
+
+- Resource allocation
+
+  Prepares the device for resource allocation, such as memory allocation and stream creation.
+
+- Error handling
+
+  Validates the requested device and returns an error, such as ``hipErrorInvalidDevice``, if ``deviceId`` does not refer to a valid device.
diff --git a/docs/index.md b/docs/index.md
index 9c154588f1..e32c4e057f 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -39,7 +39,9 @@ On non-AMD platforms, like NVIDIA, HIP provides header files required to support
 :::{grid-item-card} How to
 
 * {doc}`./how-to/hip_runtime_api`
+  * {doc}`./how-to/hip_runtime_api/initialization`
   * {doc}`./how-to/hip_runtime_api/memory_management`
+  * {doc}`./how-to/hip_runtime_api/error_handling`
   * {doc}`./how-to/hip_runtime_api/cooperative_groups`
 * [HIP porting guide](./how-to/hip_porting_guide)
 * [HIP porting: driver API guide](./how-to/hip_porting_driver_api)
diff --git a/docs/sphinx/_toc.yml.in b/docs/sphinx/_toc.yml.in
index f26db1adc8..8e3eb05c48 100644
--- a/docs/sphinx/_toc.yml.in
+++ b/docs/sphinx/_toc.yml.in
@@ -25,6 +25,7 @@ subtrees:
     - file: how-to/hip_runtime_api
       subtrees:
       - entries:
+        - file: how-to/hip_runtime_api/initialization
         - file: how-to/hip_runtime_api/memory_management
           subtrees:
           - entries:
@@ -37,6 +38,7 @@ subtrees:
             - file: how-to/hip_runtime_api/memory_management/unified_memory
             - file: how-to/hip_runtime_api/memory_management/virtual_memory
             - file: how-to/hip_runtime_api/memory_management/stream_ordered_allocator
+        - file: how-to/hip_runtime_api/error_handling
         - file: how-to/hip_runtime_api/cooperative_groups
     - file: how-to/hip_porting_guide
     - file: how-to/hip_porting_driver_api