Skip to content

Commit

Permalink
WIP
Browse files Browse the repository at this point in the history
  • Loading branch information
neon60 committed May 25, 2024
1 parent abf3acd commit 0228789
Showing 1 changed file with 11 additions and 11 deletions.
22 changes: 11 additions & 11 deletions docs/how-to/hip_rtc.md
Original file line number Diff line number Diff line change
Expand Up @@ -89,7 +89,7 @@ hipModuleLoadData(&module, kernel_binary.data());
hipModuleGetFunction(&kernel, module, "vector_add");
```

And now this kernel can be launched via hipModule APIs.
And now this kernel can be launched via `hipModule` APIs.

The full example is below:

Expand Down Expand Up @@ -270,11 +270,11 @@ HIPRTC assumes **WGP mode by default** for gfx10+. This can be overridden by pas

## Linker APIs

The bitcode generated using the HIPRTC Bitcode APIs can be loaded using hipModule APIs and also can be linked with other generated bitcodes with appropriate linker flags using the HIPRTC linker APIs. This also provides more flexibility and optimizations to the applications who want to generate the binary dynamically according to their needs. The input bitcodes can be generated only for a specific architecture or it can be a bundled bitcode which is generated for multiple architectures.
The bitcode generated using the HIPRTC Bitcode APIs can be loaded using `hipModule` APIs and also can be linked with other generated bitcodes with appropriate linker flags using the HIPRTC linker APIs. This also provides more flexibility and optimizations to the applications who want to generate the binary dynamically according to their needs. The input bitcodes can be generated only for a specific architecture or it can be a bundled bitcode which is generated for multiple architectures.

### Example

Firstly, HIPRTC link instance or a pending linker invocation must be created using hiprtcLinkCreate, with the appropriate linker options provided.
Firstly, HIPRTC link instance or a pending linker invocation must be created using `hiprtcLinkCreate`, with the appropriate linker options provided.

```cpp
hiprtcLinkCreate( num_options, // number of options
Expand All @@ -283,7 +283,7 @@ hiprtcLinkCreate( num_options, // number of options
&rtc_link_state ); // HIPRTC link state created upon success
```
Following which, the bitcode data can be added to this link instance via hiprtcLinkAddData (if the data is present as a string) or hiprtcLinkAddFile (if the data is present as a file) with the appropriate input type according to the data or the bitcode used.
Following which, the bitcode data can be added to this link instance via hiprtcLinkAddData (if the data is present as a string) or `hiprtcLinkAddFile` (if the data is present as a file) with the appropriate input type according to the data or the bitcode used.
```cpp
hiprtcLinkAddData(rtc_link_state, // HIPRTC link state
Expand All @@ -305,15 +305,15 @@ hiprtcLinkAddFile(rtc_link_state, // HIPRTC link state
0); // Array of option values cast to void*
```
Once the bitcodes for multiple architectures are added to the link instance, the linking of the device code must be completed using hiprtcLinkComplete which generates the final binary.
Once the bitcodes for multiple architectures are added to the link instance, the linking of the device code must be completed using `hiprtcLinkComplete` which generates the final binary.
```cpp
hiprtcLinkComplete(rtc_link_state, // HIPRTC link state
&binary, // upon success, points to the output binary
&binarySize); // size of the binary is stored (optional)
```

If the hiprtcLinkComplete returns successfully, the generated binary can be loaded and run using the hipModule* APIs.
If the `hiprtcLinkComplete` returns successfully, the generated binary can be loaded and run using the `hipModule*` APIs.

```cpp
hipModuleLoadData(&module, binary);
Expand Down Expand Up @@ -416,13 +416,13 @@ Currently, it returns hardcoded value. This should be implemented to return HIP

HIPRTC mangles the `__global__` function names and names of `__device__` and `__constant__` variables. If the generated binary is being loaded using the HIP Runtime API, the kernel function or `__device__/__constant__` variable must be looked up by name, but this is very hard when the name has been mangled. To overcome this, HIPRTC provides API functions that map `__global__` function or `__device__/__constant__` variable names in the source to the mangled names present in the generated binary.

The two APIs hiprtcAddNameExpression and hiprtcGetLoweredName provide this functionality. First, a 'name expression' string denoting the address for the `__global__` function or `__device__/__constant__` variable is provided to hiprtcAddNameExpression. Then, the program is compiled with hiprtcCompileProgram. During compilation, HIPRTC will parse the name expression string as a C++ constant expression at the end of the user program. Finally, the function hiprtcGetLoweredName is called with the original name expression and it returns a pointer to the lowered name. The lowered name can be used to refer to the kernel or variable in the HIP Runtime API.
The two APIs `hiprtcAddNameExpression` and hiprtcGetLoweredName provide this functionality. First, a 'name expression' string denoting the address for the `__global__` function or `__device__/__constant__` variable is provided to `hiprtcAddNameExpression`. Then, the program is compiled with hiprtcCompileProgram. During compilation, HIPRTC will parse the name expression string as a C++ constant expression at the end of the user program. Finally, the function hiprtcGetLoweredName is called with the original name expression and it returns a pointer to the lowered name. The lowered name can be used to refer to the kernel or variable in the HIP Runtime API.

### Note

* The identical name expression string must be provided on a subsequent call to hiprtcGetLoweredName to extract the lowered name.
* The correct sequence of calls is : hiprtcAddNameExpression, hiprtcCompileProgram, hiprtcGetLoweredName, hiprtcDestroyProgram.
* The lowered names must be fetched using hiprtcGetLoweredName only after the HIPRTC program has been compiled, and before it has been destroyed.
* The identical name expression string must be provided on a subsequent call to `hiprtcGetLoweredName` to extract the lowered name.
* The correct sequence of calls is : `hiprtcAddNameExpression`, `hiprtcCompileProgram`, `hiprtcGetLoweredName`, `hiprtcDestroyProgram`.
* The lowered names must be fetched using `hiprtcGetLoweredName` only after the HIPRTC program has been compiled, and before it has been destroyed.

### Example

Expand Down Expand Up @@ -481,7 +481,7 @@ hipModuleGetGlobal(&variable_addr, &bytes, module, name);
hipMemcpyHtoD(variable_addr, &initial_value, sizeof(initial_value));
```
Finally, the mangled name of the kernel is used to launch it using the hipModule APIs.
Finally, the mangled name of the kernel is used to launch it using the `hipModule` APIs.
```cpp
hipFunction_t kernel;
Expand Down

0 comments on commit 0228789

Please sign in to comment.