Add initialization and error handling
matyas-streamhpc authored and neon60 committed Oct 4, 2024
1 parent 3ab331c commit 7230caa
Showing 4 changed files with 233 additions and 0 deletions.
144 changes: 144 additions & 0 deletions docs/how-to/hip_runtime_api/error_handling.rst
@@ -0,0 +1,144 @@
.. meta::
:description: Error Handling
:keywords: AMD, ROCm, HIP, error handling, error

*************************************************************************
Error handling
*************************************************************************

Error handling is crucial for several reasons. It enhances the stability of applications by preventing crashes and maintaining a consistent state. It also improves security by protecting against vulnerabilities that could be exploited by malicious actors. Additionally, effective error handling enhances the user experience by providing meaningful feedback and ensuring that applications can recover gracefully from errors. Finally, it aids maintainability by making it easier for developers to diagnose and fix issues.

Strategies
==========

One of the fundamental best practices in error handling is to develop a consistent strategy across the entire application. This involves defining how errors are reported, logged, and managed. This can be achieved by using a centralized error handling mechanism that ensures consistency and reduces redundancy. For instance, using macros to simplify error checking and reduce code duplication is a common practice. A macro like ``HIP_CHECK`` can be defined to check the return value of HIP API calls and handle errors appropriately.

Granular error reporting
------------------------
Granular error reporting means reporting errors at an appropriate level of detail. Too much detail can overwhelm users, while too little makes debugging difficult. It is therefore important to distinguish user-facing errors from internal errors.

Fail-fast principle
-------------------
The fail-fast principle means detecting and handling errors as early as possible, so that they do not propagate and cause larger problems later. A typical example is validating inputs and preconditions before performing an operation.
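As a minimal sketch of the fail-fast idea, the helper below checks the arguments of a copy before any HIP call would be issued. The names ``CopyRequest`` and ``validate_copy`` are hypothetical and not part of HIP; the code is plain C++ so that it compiles without a GPU toolchain.

```cpp
#include <cstddef>
#include <string>

// Hypothetical description of a copy the application is about to issue.
struct CopyRequest {
    const void* src;
    void* dst;
    std::size_t bytes;
    std::size_t dst_capacity; // bytes available at dst
};

// Returns an empty string if the request is valid, otherwise a message
// describing the first violated precondition.
std::string validate_copy(const CopyRequest& r) {
    if (r.src == nullptr) return "source pointer is null";
    if (r.dst == nullptr) return "destination pointer is null";
    if (r.bytes == 0) return "nothing to copy";
    if (r.bytes > r.dst_capacity) return "destination buffer too small";
    return {};
}
```

A caller would run the validation first and issue the actual copy only when the returned string is empty, so that invalid arguments are reported at the point where they were introduced.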

Resource management
-------------------
Ensuring that resources such as memory, file handles, and network connections are properly managed and released in the event of an error is essential.
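One possible way to release resources on every exit path is a small scope guard, sketched below in plain C++17. ``ScopeGuard`` and ``process`` are hypothetical names used only for illustration; in HIP code the cleanup callable would typically call ``hipFree`` on a device pointer instead of ``delete[]``.

```cpp
#include <utility>

// Runs a cleanup callable when the guard goes out of scope, including on
// early returns taken because of an error.
template <typename F>
class ScopeGuard {
public:
    explicit ScopeGuard(F cleanup) : cleanup_(std::move(cleanup)) {}
    ~ScopeGuard() { cleanup_(); }
    ScopeGuard(const ScopeGuard&) = delete;
    ScopeGuard& operator=(const ScopeGuard&) = delete;

private:
    F cleanup_;
};

// Illustrative function: 'fail' simulates an error halfway through.
// 'released' counts how often the cleanup ran, so the effect is observable.
int process(bool fail, int& released) {
    int* buffer = new int[16];
    ScopeGuard guard([&] { delete[] buffer; ++released; });

    if (fail) {
        return -1; // the guard still releases the buffer
    }

    buffer[0] = 42;
    return 0;
}
```

Because the cleanup lives in the destructor, the error path and the success path release the buffer in the same place, and no explicit cleanup is needed before each ``return``.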

Integration in Error Handling
=============================
Functions like ``hipGetLastError`` and ``hipPeekAtLastError`` are used to detect errors after HIP API calls. This ensures that any issues are caught early in the execution flow.

For reporting, ``hipGetErrorName`` and ``hipGetErrorString`` provide meaningful error messages that can be logged or displayed to users. This helps in understanding the nature of the error and facilitates debugging.

By checking for errors and providing detailed information, these functions enable developers to implement appropriate error handling strategies, such as retry mechanisms, resource cleanup, or graceful degradation.
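The retry mechanism mentioned above could be sketched as a small generic helper. ``retry`` is a hypothetical name, not a HIP function, and the helper is written in plain C++ so it does not depend on the HIP headers. With HIP, ``Status`` would be ``hipError_t`` and ``success`` would be ``hipSuccess``. Whether retrying makes sense depends on the error: many errors are not transient, so this is only an illustration of the pattern.

```cpp
// Calls 'op' until it returns 'success' or 'max_attempts' calls were made,
// and returns the last status observed. 'op' is always called at least once.
template <typename Status, typename Op>
Status retry(Op&& op, Status success, int max_attempts) {
    Status status = op();
    for (int attempt = 1; attempt < max_attempts && status != success; ++attempt) {
        status = op();
    }
    return status;
}
```

A HIP call site might look like ``retry([&] { return hipMalloc(&ptr, bytes); }, hipSuccess, 3)``, with the returned status checked as usual afterwards.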

Examples
--------

``hipGetLastError`` returns the last error that occurred during a HIP runtime API call and resets the error code to ``hipSuccess``:

.. code-block:: cpp

    hipError_t err = hipGetLastError();
    if (err != hipSuccess)
    {
        printf("HIP Error: %s\n", hipGetErrorString(err));
    }

``hipPeekAtLastError`` returns the last error that occurred during a HIP runtime API call **without** resetting the error code:

.. code-block:: cpp

    hipError_t err = hipPeekAtLastError();
    if (err != hipSuccess)
    {
        printf("HIP Error: %s\n", hipGetErrorString(err));
    }

``hipGetErrorName`` converts a HIP error code to a string representing the error name:

.. code-block:: cpp

    const char* errName = hipGetErrorName(err);
    printf("Error Name: %s\n", errName);

``hipGetErrorString`` converts a HIP error code to a string describing the error:

.. code-block:: cpp

    const char* errString = hipGetErrorString(err);
    printf("Error Description: %s\n", errString);

Best Practices
==============

1. Check Errors After Each API Call

   Always check the return value of HIP API calls to catch errors early. For example:

   .. code-block:: cpp

       hipError_t err = hipMalloc(&d_A, size);
       if (err != hipSuccess) {
           printf("hipMalloc failed: %s\n", hipGetErrorString(err));
           return -1;
       }

2. Use Macros for Error Checking

   Define macros to simplify error checking and reduce code duplication. For example:

   .. code-block:: cpp

       // do { ... } while (0) makes the macro behave like a single statement,
       // so HIP_CHECK(...); is safe inside if/else without braces.
       #define HIP_CHECK(call)                                        \
       do {                                                           \
           hipError_t err = (call);                                   \
           if (err != hipSuccess) {                                   \
               printf("HIP Error: %s:%d, %s\n", __FILE__, __LINE__,   \
                      hipGetErrorString(err));                        \
               exit(err);                                             \
           }                                                          \
       } while (0)

       // Usage
       HIP_CHECK(hipMalloc(&d_A, size));

3. Handle Errors Gracefully

   Ensure the application can handle errors gracefully, such as by freeing resources or providing meaningful error messages to the user.

Example
-------

A complete example demonstrating error handling:

.. code-block:: cpp

    #include <stdio.h>
    #include <stdlib.h> // exit
    #include <hip/hip_runtime.h>

    // Print file/line information and exit if a HIP API call fails.
    #define HIP_CHECK(call)                                        \
    do {                                                           \
        hipError_t err = (call);                                   \
        if (err != hipSuccess) {                                   \
            printf("HIP Error: %s:%d, %s\n", __FILE__, __LINE__,   \
                   hipGetErrorString(err));                        \
            exit(err);                                             \
        }                                                          \
    } while (0)

    int main()
    {
        constexpr int N = 100;
        size_t size = N * sizeof(float);
        float *d_A;

        // Allocate memory on the device
        HIP_CHECK(hipMalloc(&d_A, size));

        // Perform other operations...

        // Free device memory
        HIP_CHECK(hipFree(d_A));

        return 0;
    }
85 changes: 85 additions & 0 deletions docs/how-to/hip_runtime_api/initialization.rst
@@ -0,0 +1,85 @@
.. meta::
:description: Initialization.
:keywords: AMD, ROCm, HIP, initialization

*************************************************************************
Initialization
*************************************************************************

Initialization involves setting up the environment and resources needed for GPU computation.

Include HIP headers
===================

To use HIP functions, include the HIP runtime header in your source file:

.. code-block:: cpp

    #include <hip/hip_runtime.h>

Initialize the HIP runtime
==========================

The HIP runtime is initialized automatically when the first HIP API call is made. However, you can explicitly initialize it using ``hipInit``:

.. code-block:: cpp

    hipError_t err = hipInit(0);
    if (err != hipSuccess)
    {
        // Handle error
    }

The initialization includes the following steps:

- Loading the HIP Runtime

  This includes loading necessary libraries and setting up internal data structures.

- Querying GPU Devices

  Identifying and querying the available GPU devices on the system.

- Setting Up Contexts

  Creating contexts for each GPU device, which are essential for managing resources and executing kernels.

Get device properties
=====================

Before using a GPU device, you might want to query its properties:

.. code-block:: cpp

    int deviceCount;
    hipGetDeviceCount(&deviceCount);
    for (int i = 0; i < deviceCount; ++i)
    {
        hipDeviceProp_t prop;
        hipGetDeviceProperties(&prop, i);
        printf("Device %d: %s\n", i, prop.name);
    }

Set device
==========

Select the GPU device to be used for subsequent HIP operations:

.. code-block:: cpp

    int deviceId = 0; // Example: selecting the first device
    hipSetDevice(deviceId);

This function performs several key tasks:

- Context Binding

  Binds the current thread to the context of the specified GPU device. This ensures that all subsequent operations are executed on the selected device.

- Resource Allocation

  Prepares the device for resource allocation, such as memory allocation and stream creation.

- Error Handling

  Checks for errors in device selection and ensures that the specified device is available and capable of executing HIP operations.
2 changes: 2 additions & 0 deletions docs/index.md
@@ -39,7 +39,9 @@ On non-AMD platforms, like NVIDIA, HIP provides header files required to support
:::{grid-item-card} How to

* {doc}`./how-to/hip_runtime_api`
* {doc}`./how-to/hip_runtime_api/initialization`
* {doc}`./how-to/hip_runtime_api/memory_management`
* {doc}`./how-to/hip_runtime_api/error_handling`
* {doc}`./how-to/hip_runtime_api/cooperative_groups`
* [HIP porting guide](./how-to/hip_porting_guide)
* [HIP porting: driver API guide](./how-to/hip_porting_driver_api)
2 changes: 2 additions & 0 deletions docs/sphinx/_toc.yml.in
@@ -25,6 +25,7 @@ subtrees:
- file: how-to/hip_runtime_api
subtrees:
- entries:
- file: how-to/hip_runtime_api/initialization
- file: how-to/hip_runtime_api/memory_management
subtrees:
- entries:
@@ -37,6 +38,7 @@ subtrees:
- file: how-to/hip_runtime_api/memory_management/unified_memory
- file: how-to/hip_runtime_api/memory_management/virtual_memory
- file: how-to/hip_runtime_api/memory_management/stream_ordered_allocator
- file: how-to/hip_runtime_api/error_handling
- file: how-to/hip_runtime_api/cooperative_groups
- file: how-to/hip_porting_guide
- file: how-to/hip_porting_driver_api