Nvidia GPU Support #230

Open
cvonelm opened this issue Nov 2, 2022 · 0 comments

cvonelm commented Nov 2, 2022

This issue documents how NVML can be used to get events from the GPU:

All calls to NVML have to be enclosed in a pair of nvmlInit_v2() and nvmlShutdown() calls.

To interface with a specific card, you need an nvmlDevice_t handle. A specific device can be retrieved in multiple ways, for example by its PCI bus ID with nvmlDeviceGetHandleByPciBusId_v2().

Samples are read with nvmlDeviceGetProcessUtilization:

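    unsigned samples_count = 0;
    /* First call with a NULL buffer: only fills in samples_count. */
    nvmlDeviceGetProcessUtilization(device, NULL, &samples_count, last_readout);
    nvmlProcessUtilizationSample_t *samples = malloc(samples_count * sizeof(*samples));
    /* Second call: copies the samples newer than last_readout into the buffer. */
    nvmlDeviceGetProcessUtilization(device, samples, &samples_count, last_readout);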

The last parameter to nvmlDeviceGetProcessUtilization restricts the result to samples with a timestamp greater than the given value, in this case last_readout.

First, nvmlDeviceGetProcessUtilization is called with the sample buffer set to NULL. This updates samples_count with the number of samples that can be read. A buffer of that size is then allocated, and a second call actually reads the samples.
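The two-step readout, plus advancing last_readout to the newest timestamp seen, could be wrapped roughly as follows. This is only a sketch: so that it builds without nvml.h, the query is passed in as a callback (the hypothetical `query_fn`), `sample_t` is a local mirror of the fields of nvmlProcessUtilizationSample_t, and `fake_query` is a stand-in data source, not part of NVML. Ignoring the return value of the size query is an assumption that should be checked against the NVML documentation.

```c
#include <stdlib.h>

/* Local mirror of nvmlProcessUtilizationSample_t (assumption: same fields),
 * so that this sketch compiles without the NVML headers. */
typedef struct {
    unsigned int pid;
    unsigned long long timeStamp; /* CPU timestamp in microseconds */
    unsigned int smUtil;
    unsigned int memUtil;
    unsigned int encUtil;
    unsigned int decUtil;
} sample_t;

/* Hypothetical callback with the shape of nvmlDeviceGetProcessUtilization
 * minus the device handle. buf == NULL asks only for the count.
 * Returns 0 on success. */
typedef int (*query_fn)(sample_t *buf, unsigned int *count,
                        unsigned long long since);

/* Two-step read: query the count, allocate, then fetch the samples.
 * Returns a malloc'd array (caller frees) and advances *last_readout to the
 * newest timestamp seen. Returns NULL with *count == 0 if nothing was read. */
sample_t *read_new_samples(query_fn query, unsigned int *count,
                           unsigned long long *last_readout)
{
    *count = 0;
    /* Size query. The return value is ignored on purpose: the real call may
     * report a non-success status for a NULL buffer while still setting the
     * count (assumption, verify against the NVML docs). */
    query(NULL, count, *last_readout);
    if (*count == 0)
        return NULL;

    sample_t *buf = malloc(*count * sizeof(*buf));
    if (buf == NULL) {
        *count = 0;
        return NULL;
    }

    if (query(buf, count, *last_readout) != 0 || *count == 0) {
        free(buf);
        *count = 0;
        return NULL;
    }

    /* Remember the newest timestamp so the next call skips these samples. */
    for (unsigned int i = 0; i < *count; i++) {
        if (buf[i].timeStamp > *last_readout)
            *last_readout = buf[i].timeStamp;
    }
    return buf;
}

/* Stand-in data source for demonstration only: two fixed samples. */
int fake_query(sample_t *buf, unsigned int *count, unsigned long long since)
{
    static const sample_t data[] = {
        { 1234, 100, 50, 20, 0, 0 },
        { 5678, 200, 75, 40, 0, 0 },
    };
    unsigned int n = 0;
    for (unsigned int i = 0; i < 2; i++) {
        if (data[i].timeStamp > since) {
            if (buf != NULL)
                buf[n] = data[i];
            n++;
        }
    }
    *count = n;
    return 0;
}
```

In real use, the callback would be a thin wrapper that forwards to nvmlDeviceGetProcessUtilization with the device handle and maps NVML_SUCCESS to 0.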

There is some wonkiness to the readout, as documented in this comment in the nvtop code.

Sample data includes the PID of the executing process and utilization values for the decoder, encoder, framebuffer memory, and SM (compute) units, as percentages (0-100).
This feature is not supported on Kepler or Ampere cards (so no luck with Taurus' A100s).

The sample rate cannot be controlled by software. "Each sample period may be between 1 second and 1/6 second, depending on the product being queried."

nvmlDeviceGetSamples can be used in the same vein to get device-wide samples, for example for the memory and compute clock speeds.

There is also a big zoo of different getters for a wide array of information. It might be worthwhile to investigate whether higher sampling rates, or at least user-controlled sampling periods, can be achieved by polling them manually.
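As a starting point for that investigation, a manual polling loop with a user-chosen period might look like the sketch below. The names `getter_fn`, `poll_getter` and `counter_getter` are hypothetical. The getter is passed as a callback so the sketch builds without nvml.h; in real use it would wrap one of the NVML getters, for example nvmlDeviceGetClockInfo. nanosleep assumes a POSIX system.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for nanosleep */
#endif
#include <time.h>

/* Hypothetical getter shape: writes one reading to *value and returns 0 on
 * success. A real implementation would wrap an NVML getter. */
typedef int (*getter_fn)(void *ctx, unsigned int *value);

/* Calls the getter every period_us microseconds until `capacity` readings
 * are stored or the getter fails. Returns the number of readings stored. */
unsigned int poll_getter(getter_fn getter, void *ctx, unsigned int period_us,
                         unsigned int *out, unsigned int capacity)
{
    struct timespec period = {
        .tv_sec = period_us / 1000000u,
        .tv_nsec = (long)(period_us % 1000000u) * 1000L,
    };
    unsigned int n = 0;

    while (n < capacity) {
        if (getter(ctx, &out[n]) != 0)
            break;
        n++;
        /* Sleep between readings, but not after the last one. */
        if (n < capacity)
            nanosleep(&period, NULL);
    }
    return n;
}

/* Stand-in getter for demonstration only: returns an incrementing counter. */
int counter_getter(void *ctx, unsigned int *value)
{
    unsigned int *counter = ctx;
    *value = (*counter)++;
    return 0;
}
```

Sleeping for a fixed period ignores the time spent inside the getter, so the effective rate drifts slightly. Whether the driver refreshes the underlying values faster than its own sample period is exactly the open question, so polling faster may just return repeated readings.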

@cvonelm cvonelm self-assigned this May 30, 2024