GPUPluginKernels

GPU kernels implementation overview

As mentioned in GPU plugin structure, kernels for the GPU plugin are located in the inference-engine/thirdparty/clDNN/kernel_selector folder.

For each operation there are usually multiple kernels that support different parameters and/or are optimized for different scenarios.

Each operation has three major entities in the kernel selector:

An operation-specific kernel_selector instance
An operation parameters descriptor
The kernels themselves, each with a set of heuristics used for optimal selection

Kernel selector instance

For each operation, we create a kernel_selector class derived from kernel_selector_base. This class specifies the kernels available for the given operation. Each kernel selector is used as a singleton. For example:

class mvn_kernel_selector : public kernel_selector_base {
public:
    static mvn_kernel_selector& Instance() {
        static mvn_kernel_selector instance_;
        return instance_;
    }

    mvn_kernel_selector();

    KernelsData GetBestKernels(const Params& params, const optional_params& options) const override;
};

// The list of available kernels is usually specified in the kernel_selector c-tor using the `Attach` method, which creates an instance of each type
// and appends it to the implementations list.
// In this case there are 3 available kernels for the MVN operation. Kernels might have different priorities and support only a subset of operation parameters.
// E.g. MVNKernel_b_fs_yx_fsv16_imad supports only `fsv16` blocked layouts and INT8/UINT8 input data types.
mvn_kernel_selector::mvn_kernel_selector() {
    Attach<MVNKernelRef>();
    Attach<MVNKernelBfyxOpt>();
    Attach<MVNKernel_b_fs_yx_fsv16_imad>();
}

// This method is used to get the optimal kernel for the given parameters.
// There are 2 base methods to pick optimal kernels: `GetNaiveBestKernel` and `GetAutoTuneBestKernel`.
// If the kernel supports auto-tuning, `GetAutoTuneBestKernel` is used; otherwise, `GetNaiveBestKernel` is used,
// parameterized with `KernelType`, which specifies the operation type implemented by this kernel selector.
KernelsData mvn_kernel_selector::GetBestKernels(const Params& params, const optional_params& options) const {
    return GetNaiveBestKernel(params, options, KernelType::MVN);
}

The caller code looks as follows:

// Get static instance of the kernel_selector
auto& kernel_selector = kernel_selector::mvn_kernel_selector::Instance();
// Run some heuristics to pick the best mvn kernel for given `mvn_params`
auto best_kernels = kernel_selector.GetBestKernels(mvn_params, mvn_optional_params);
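To illustrate the pattern above (a singleton selector that registers its kernels with `Attach` and then picks one naively), here is a self-contained toy version. All types and names in it (`ToyKernelSelector`, `Params`, `KernelBase`, `RefKernel`, `Fsv16Int8Kernel`, the kernel names and numeric priorities) are hypothetical simplifications, not the real clDNN API; in particular, the sketch assumes that a lower priority value means a more preferred kernel.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-in for operation parameters.
struct Params {
    int feature_block = 1;    // e.g. 16 for an fsv16 blocked layout
    bool int8_input = false;  // INT8/UINT8 input data
};

// Hypothetical kernel interface used only by this sketch.
struct KernelBase {
    virtual ~KernelBase() = default;
    virtual std::string Name() const = 0;
    virtual int Priority(const Params& p) const = 0;  // assumption: lower = preferred
    virtual bool Supports(const Params& p) const = 0;
};

// Generic fallback: accepts everything, lowest preference.
struct RefKernel : KernelBase {
    std::string Name() const override { return "toy_ref"; }
    int Priority(const Params&) const override { return 100; }
    bool Supports(const Params&) const override { return true; }
};

// Specialized kernel: only fsv16 blocked layout with INT8 input.
struct Fsv16Int8Kernel : KernelBase {
    std::string Name() const override { return "toy_fsv16_int8"; }
    int Priority(const Params&) const override { return 1; }
    bool Supports(const Params& p) const override {
        return p.feature_block == 16 && p.int8_input;
    }
};

class ToyKernelSelector {
public:
    // Function-local static: constructed once on first use (thread-safe since C++11).
    static ToyKernelSelector& Instance() {
        static ToyKernelSelector instance;
        return instance;
    }
    ToyKernelSelector(const ToyKernelSelector&) = delete;
    ToyKernelSelector& operator=(const ToyKernelSelector&) = delete;

    // Naive selection: among the kernels that support the params, return the one
    // with the lowest priority value; nullptr if no kernel applies.
    const KernelBase* GetBestKernel(const Params& p) const {
        const KernelBase* best = nullptr;
        for (const auto& k : impls_) {
            if (!k->Supports(p)) continue;
            if (!best || k->Priority(p) < best->Priority(p)) best = k.get();
        }
        return best;
    }

private:
    ToyKernelSelector() {
        Attach<RefKernel>();
        Attach<Fsv16Int8Kernel>();
    }

    // Mirrors the `Attach` idea: create an instance and append it to the list.
    template <typename KernelT>
    void Attach() { impls_.push_back(std::make_unique<KernelT>()); }

    std::vector<std::unique_ptr<KernelBase>> impls_;
};
```

With `feature_block = 16` and `int8_input = true`, `ToyKernelSelector::Instance().GetBestKernel(p)` returns the specialized kernel; any other params fall back to the reference kernel, which accepts everything.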

Operation parameters

The operation parameters for kernel_selector are defined in the corresponding ${op_name}_params class, which is derived from base_params. For example:

struct mvn_params : public base_params {
    mvn_params() : base_params(KernelType::MVN) {}

    MVNMode mvnMode = MVNMode::WITHIN_CHANNELS;
    bool mvnNormalizeVariance = true;
    float epsilon = 1e-10f;

    virtual ParamsKey GetParamsKey() const {
        ParamsKey k = base_params::GetParamsKey();

        k.EnableMVNMode(mvnMode);

        if (mvnNormalizeVariance)
            k.EnableMVNNormalizeVariance();

        return k;
    }
};

The derived class should parameterize the base class with a specific KernelType and add operation-specific parameters. The only method that must be implemented is GetParamsKey(), which is used as a quick check of kernel applicability for the current parameters: the ParamsKey object calculated for the input operation parameters is compared with the ParamsKey object of each kernel, and the kernels that don't support the current parameters are discarded. ParamsKey is implemented as a set of bit masks, so the applicability check is quite simple:

const ParamsKey implKey = some_implementation->GetSupportedKey();
if (!implKey.Support(paramsKey)) {
    // The implementation can't handle these params, e.g. skip it
}

// Support() does something like the following for each internal bit mask:
if ((implKey.mask & paramsKey.mask) != paramsKey.mask)
    return false;
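A minimal sketch of the bit-mask idea behind ParamsKey, assuming just two masks (input data types and input layouts); the real class tracks many more. `ToyParamsKey`, the `DataType`/`Layout` enumerators and the `Enable*` methods here are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flags: each enumerator occupies its own bit.
enum class DataType : std::uint32_t { F16 = 1u << 0, F32 = 1u << 1, INT8 = 1u << 2, UINT8 = 1u << 3 };
enum class Layout : std::uint32_t { BFYX = 1u << 0, B_FS_YX_FSV16 = 1u << 1 };

class ToyParamsKey {
public:
    void EnableInputDataType(DataType t) { input_types_ |= static_cast<std::uint32_t>(t); }
    void EnableInputLayout(Layout l) { input_layouts_ |= static_cast<std::uint32_t>(l); }

    // The kernel key supports the params key if, for every mask, all bits
    // required by the params are also set in the kernel's mask.
    bool Support(const ToyParamsKey& required) const {
        if ((input_types_ & required.input_types_) != required.input_types_) return false;
        if ((input_layouts_ & required.input_layouts_) != required.input_layouts_) return false;
        return true;
    }

private:
    std::uint32_t input_types_ = 0;
    std::uint32_t input_layouts_ = 0;
};
```

A kernel key is typically a superset (everything the kernel can handle), while a params key has exactly the bits describing one concrete call, which is why a single AND-and-compare per mask is enough.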

Kernel implementation

Each kernel must specify the following things:

Input parameter checks
- GetSupportedKey() method implementation, which returns the ParamsKey object for the current implementation
- Validate() method, which performs more complex checks (optional)
Dispatch data (global/local workgroup sizes, scheduling algorithm, etc.)
Kernel name - must be passed to the base class c-tor
Kernel arguments specification - description of each argument in the corresponding OpenCL™ kernel
Additional JIT constants required for the kernel - a set of macro definitions that must be added to the kernel template to make a full specialization for the given params
Supported fused operations (if any) - a list of supported operations that can be fused into the current kernel
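Two items from the list above, JIT constants and dispatch data, can be sketched in isolation. This assumes JIT constants are plain name/value pairs emitted as `#define` lines ahead of the kernel template, and a 1D dispatch whose global size is rounded up to a multiple of the local size, since OpenCL 1.2 requires each global size to be divisible by the corresponding local size. `BuildJitHeader`, `ToyDispatchData`, `MakeDispatch` and the macro names are hypothetical, not clDNN functions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical: a JIT constant is a macro name plus its value.
using JitConstant = std::pair<std::string, std::string>;

// Turn the constants into "#define NAME VALUE" lines to prepend to the template.
std::string BuildJitHeader(const std::vector<JitConstant>& constants) {
    std::string header;
    for (const auto& c : constants) {
        header += "#define " + c.first + " " + c.second + "\n";
    }
    return header;
}

// Hypothetical 1D dispatch data.
struct ToyDispatchData {
    std::size_t global;
    std::size_t local;
};

// Round the number of work items up to a multiple of the workgroup size.
ToyDispatchData MakeDispatch(std::size_t work_items, std::size_t local) {
    assert(local > 0);
    std::size_t global = (work_items + local - 1) / local * local;
    return {global, local};
}
```

For example, `MakeDispatch(1000, 16)` gives a global size of 1008, so a kernel built this way has to ignore the 8 padding work items.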

Let's have a look at the key methods of each kernel implementation:

class MVNKernelRef : public MVNKernelBase {
public:
    MVNKernelRef() : MVNKernelBase("mvn_gpu_ref") {} // mvn_gpu_ref is the name of the kernel template file in the cl_kernels/ folder, without the .cl extension
    // Returns the kernel specialized for the input parameters if the implementation can process them
    KernelsData GetKernelsData(const Params& params, const optional_params& options) const override;
    // Returns `ParamsKey` for current implementation for quick applicability check
    ParamsKey GetSupportedKey() const override;

protected:
    // Specifies additional JIT constants for kernel template specialization
    JitConstants GetJitConstants(const mvn_params& params, DispatchData dispatchData) const override;
    // The list of supported fused operations
    std::vector<FusedOpType> GetSupportedFusedOps() const override {
        return {
            FusedOpType::ACTIVATION,
            FusedOpType::QUANTIZE,
            FusedOpType::ELTWISE,
            FusedOpType::SCALE
        };
    }
};
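How a list like the one returned by GetSupportedFusedOps() might be consulted can be sketched as follows, assuming a fusion chain is accepted only when every requested op appears in the list. `CanFuseAll` is a hypothetical helper, and `FusedOpType::OTHER` is a made-up placeholder for an op that MVNKernelRef does not list.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Values from the MVNKernelRef example, plus a hypothetical placeholder.
enum class FusedOpType { ACTIVATION, QUANTIZE, ELTWISE, SCALE, OTHER };

// True if every requested fused op is in the kernel's supported list.
bool CanFuseAll(const std::vector<FusedOpType>& supported,
                const std::vector<FusedOpType>& requested) {
    return std::all_of(requested.begin(), requested.end(), [&](FusedOpType op) {
        return std::find(supported.begin(), supported.end(), op) != supported.end();
    });
}
```

A linear search is fine here because these lists hold only a handful of entries.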
