Nvidia GPU Support #230

cvonelm · 2022-11-02T11:28:52Z

This issue documents how NVML can be used to get events from the GPU:

All calls to NVML have to be enclosed in a pair of nvmlInit_v2() and nvmlShutdown() calls.

To interface with a specific card, you need an nvmlDevice_t handle. A specific device can be retrieved in multiple ways, for example by its PCI bus ID (nvmlDeviceGetHandleByPciBusId_v2()).
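A minimal sketch of the init, handle lookup and shutdown sequence. The PCI bus ID below is a made-up placeholder, and the program assumes the NVML header and library are installed (link with -lnvidia-ml).

```c
#include <stdio.h>
#include <nvml.h>

int main(void)
{
    /* Every other NVML call must happen between nvmlInit_v2() and nvmlShutdown(). */
    nvmlReturn_t rc = nvmlInit_v2();
    if (rc != NVML_SUCCESS) {
        fprintf(stderr, "nvmlInit_v2 failed: %s\n", nvmlErrorString(rc));
        return 1;
    }

    /* Hypothetical bus ID; real ones can be listed with
     * `nvidia-smi --query-gpu=pci.bus_id --format=csv`. */
    const char *bus_id = "00000000:01:00.0";
    nvmlDevice_t device;

    rc = nvmlDeviceGetHandleByPciBusId_v2(bus_id, &device);
    if (rc != NVML_SUCCESS) {
        fprintf(stderr, "no device at %s: %s\n", bus_id, nvmlErrorString(rc));
    } else {
        char name[NVML_DEVICE_NAME_BUFFER_SIZE];
        if (nvmlDeviceGetName(device, name, sizeof(name)) == NVML_SUCCESS)
            printf("found %s at %s\n", name, bus_id);
    }

    nvmlShutdown();
    return rc == NVML_SUCCESS ? 0 : 1;
}
```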

Samples are read with nvmlDeviceGetProcessUtilization:

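    unsigned int samples_count = 0;
    /* First call with a NULL buffer: only updates samples_count. */
    nvmlDeviceGetProcessUtilization(device, NULL, &samples_count, last_readout);
    nvmlProcessUtilizationSample_t *samples = malloc(samples_count * sizeof(*samples));
    /* Second call fills the buffer with samples newer than last_readout. */
    nvmlDeviceGetProcessUtilization(device, samples, &samples_count, last_readout);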

The last parameter to nvmlDeviceGetProcessUtilization ensures that only events with a timestamp larger than its value, in this case last_readout, are returned.
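To avoid reading the same events twice, last_readout can be moved forward to the newest timestamp seen after each readout. The sketch below shows one way to do that. The sample_t struct and the advance_last_readout helper are hypothetical names, not part of NVML. sample_t is a stand-in so the snippet builds without nvml.h; with NVML you would iterate over nvmlProcessUtilizationSample_t, which has a timeStamp field in microseconds.

```c
#include <stddef.h>

/* Stand-in with only the fields used here (assumption: mirrors the pid and
 * timeStamp fields of nvmlProcessUtilizationSample_t). */
typedef struct {
    unsigned int pid;
    unsigned long long timeStamp; /* microseconds */
} sample_t;

/* Return the newest timestamp in the batch, or last_readout unchanged if the
 * batch is empty or contains nothing newer. */
unsigned long long advance_last_readout(const sample_t *samples, size_t count,
                                        unsigned long long last_readout)
{
    for (size_t i = 0; i < count; i++) {
        if (samples[i].timeStamp > last_readout)
            last_readout = samples[i].timeStamp;
    }
    return last_readout;
}
```

The returned value would then be passed as the last argument of the next nvmlDeviceGetProcessUtilization call.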

First, nvmlDeviceGetProcessUtilization is called with the event buffer set to NULL. This updates samples_count with the number of samples that can be read. After that, a buffer is allocated and the events are actually read.
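The two-call pattern with return codes checked could look roughly like this. read_process_samples is a hypothetical helper name. The sketch assumes the sizing call reports NVML_ERROR_INSUFFICIENT_SIZE when samples are pending and tolerates NVML_SUCCESS as well, so please verify this against your driver version.

```c
#include <stdlib.h>
#include <nvml.h>

/* Returns a malloc'd array of samples newer than last_readout and stores its
 * length in *count. Returns NULL with *count == 0 on error or when nothing
 * is available. The caller frees the result. */
nvmlProcessUtilizationSample_t *read_process_samples(nvmlDevice_t device,
                                                     unsigned long long last_readout,
                                                     unsigned int *count)
{
    *count = 0;

    /* Sizing call: NULL buffer, only *count is updated. */
    nvmlReturn_t rc = nvmlDeviceGetProcessUtilization(device, NULL, count, last_readout);
    if ((rc != NVML_ERROR_INSUFFICIENT_SIZE && rc != NVML_SUCCESS) || *count == 0) {
        *count = 0;
        return NULL;
    }

    nvmlProcessUtilizationSample_t *samples = malloc(*count * sizeof(*samples));
    if (samples == NULL) {
        *count = 0;
        return NULL;
    }

    /* Read call: *count becomes the number of samples actually written. */
    rc = nvmlDeviceGetProcessUtilization(device, samples, count, last_readout);
    if (rc != NVML_SUCCESS) {
        free(samples);
        *count = 0;
        return NULL;
    }
    return samples;
}
```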

There is some wonkiness to the readout, as documented in this comment in the nvtop code.

Sample data includes the PID of the executing process and utilization values for the decoder, encoder, framebuffer memory and SM (compute) units (as percentages between 1 and 100).
This feature is not supported on Kepler or Ampere cards (so no luck with Taurus' A100s).

The sample rate cannot be controlled by software: "Each sample period may be between 1 second and 1/6 second, depending on the product being queried."

nvmlDeviceGetSamples can be used in the same vein to get device-wide samples, for example for the memory and compute clock speeds.
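A rough sketch of reading device-wide samples, here for the compute clock. print_sm_clock_samples is a hypothetical helper name. The sketch assumes the sizing call with a NULL buffer returns either NVML_SUCCESS or NVML_ERROR_INSUFFICIENT_SIZE, and it only prints values reported as unsigned int. The unit of the values is not checked here.

```c
#include <stdio.h>
#include <stdlib.h>
#include <nvml.h>

/* Print buffered compute clock samples newer than last_seen. */
void print_sm_clock_samples(nvmlDevice_t device, unsigned long long last_seen)
{
    nvmlValueType_t value_type;
    unsigned int count = 0;

    /* Sizing call: a NULL buffer asks for the number of available samples. */
    nvmlReturn_t rc = nvmlDeviceGetSamples(device, NVML_PROCESSOR_CLK_SAMPLES,
                                           last_seen, &value_type, &count, NULL);
    if ((rc != NVML_SUCCESS && rc != NVML_ERROR_INSUFFICIENT_SIZE) || count == 0)
        return;

    nvmlSample_t *samples = malloc(count * sizeof(*samples));
    if (samples == NULL)
        return;

    /* Read call: count becomes the number of samples actually written. */
    rc = nvmlDeviceGetSamples(device, NVML_PROCESSOR_CLK_SAMPLES, last_seen,
                              &value_type, &count, samples);
    if (rc == NVML_SUCCESS && value_type == NVML_VALUE_TYPE_UNSIGNED_INT) {
        for (unsigned int i = 0; i < count; i++)
            printf("t=%llu value=%u\n", samples[i].timeStamp,
                   samples[i].sampleValue.uiVal);
    }
    free(samples);
}
```

Passing NVML_MEMORY_CLK_SAMPLES instead should give the memory clock in the same way.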

There is also a big zoo of getters for a wide array of information. It might be worthwhile to investigate whether higher, or at least user-controlled, sampling rates can be achieved by polling them manually.
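As a starting point for that investigation, a manual polling loop could be sketched as follows. poll_fn, count_polls and poll_at_fixed_rate are hypothetical names. The callback here only counts invocations; a real one would call an NVML getter such as nvmlDeviceGetUtilizationRates(). Whether polling faster than the driver's own sample period yields fresh values is exactly the open question, so treat this as an experiment harness rather than a solution.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <time.h>

/* Callback invoked once per period; return nonzero to stop polling. */
typedef int (*poll_fn)(void *ctx);

/* Example callback: counts how often it was called. A real callback would
 * query NVML here and store the result together with a timestamp. */
int count_polls(void *ctx)
{
    *(int *)ctx += 1;
    return 0;
}

/* Call poll() `iterations` times, sleeping period_ns nanoseconds after each
 * call. Returns 0 on completion, -1 if the callback asked to stop. */
int poll_at_fixed_rate(poll_fn poll, void *ctx, long period_ns,
                       unsigned int iterations)
{
    struct timespec period = {
        .tv_sec = period_ns / 1000000000L,
        .tv_nsec = period_ns % 1000000000L,
    };

    for (unsigned int i = 0; i < iterations; i++) {
        if (poll(ctx) != 0)
            return -1;
        nanosleep(&period, NULL);
    }
    return 0;
}
```

Because nanosleep sleeps for at least the requested time and the callback itself takes time, the effective rate is slightly lower than 1/period. A deadline-based sleep would avoid that drift.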

cvonelm self-assigned this May 30, 2024
