AMDGPU API Reference

Kernel launching

AMDGPU.@roc (Macro)
@roc [kwargs...] func(args...)

High-level interface for executing code on a GPU. The @roc macro should prefix a call, with func a callable function or object that must return nothing. It will be compiled to a GCN function upon first use, and to a certain extent, arguments will be converted and managed automatically using rocconvert. Finally, a call to roccall is performed, scheduling a kernel launch on the specified (or default) HSA queue.

Several keyword arguments are supported that influence the behavior of @roc; they are forwarded to the underlying compilation (rocfunction) and launch (kernel call) steps documented below.
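
For illustration, a minimal end-to-end launch might look as follows. The vadd! kernel, array sizes, and launch dimensions are hypothetical, and gridsize is assumed to count workitems, matching the gridDim semantics described later in this section:

using AMDGPU

function vadd!(c, a, b)
    i = workitemIdx().x + (workgroupIdx().x - 1) * workgroupDim().x
    if i <= length(c)
        @inbounds c[i] = a[i] + b[i]
    end
    return nothing
end

a = ROCArray(rand(Float32, 1024))
b = ROCArray(rand(Float32, 1024))
c = similar(a)

# 16 workgroups of 64 workitems each; synchronization and readback omitted
@roc groupsize=64 gridsize=1024 vadd!(c, a, b)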

The underlying operations (argument conversion, kernel compilation, kernel call) can be performed explicitly when more control is needed, e.g. to reflect on the resource usage of a kernel to determine the launch configuration. A host-side kernel launch is done as follows:

args = ...
GC.@preserve args begin
    kernel_args = rocconvert.(args)
    kernel_tt = Tuple{Core.Typeof.(kernel_args)...}
    kernel = rocfunction(f, kernel_tt; compilation_kwargs)
    kernel(kernel_args...; launch_kwargs)
end
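
Continuing the hypothetical vadd! example from above, the explicit path could look like:

args = (c, a, b)
GC.@preserve args begin
    kernel_args = rocconvert.(args)
    kernel_tt = Tuple{Core.Typeof.(kernel_args)...}
    kernel = rocfunction(vadd!, kernel_tt)
    kernel(kernel_args...; groupsize=64, gridsize=1024)
end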

A device-side launch, also known as dynamic parallelism, is similar but more restricted:

args = ...
# GC.@preserve is not supported
# we're on the device already, so no need to rocconvert
kernel_tt = Tuple{Core.Typeof(args[1]), ...}    # this needs to be fully inferred!
kernel = dynamic_rocfunction(f, kernel_tt)       # no compiler kwargs supported
kernel(args...; launch_kwargs)

AMDGPU.AbstractKernel (Type)
(::HostKernel)(args...; kwargs...)
(::DeviceKernel)(args...; kwargs...)

Low-level interface to call a compiled kernel, passing GPU-compatible arguments in args. For a higher-level interface, use AMDGPU.@roc.

The following keyword arguments are supported:

  • groupsize or threads (defaults to 1)
  • gridsize or blocks (defaults to 1)
  • config: callback function to dynamically compute the launch configuration. It should accept a HostKernel and return a named tuple with any of the above as fields (see the sketch after this list).
  • queue (defaults to the default queue)
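
As a sketch, reusing the hypothetical vadd!, kernel_args, and kernel_tt from above, the config callback can defer the launch configuration until the compiled kernel is available; the fixed return value here stands in for a real heuristic:

kernel = rocfunction(vadd!, kernel_tt)
kernel(kernel_args...; config = kern -> begin
    # kern is the compiled HostKernel; its resource usage could be
    # inspected here to choose a configuration. A fixed one is returned.
    (groupsize = 64, gridsize = 1024)
end)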

AMDGPU.HostKernel (Type)
(::HostKernel)(args...; kwargs...)
(::DeviceKernel)(args...; kwargs...)

Low-level interface to call a compiled kernel, passing GPU-compatible arguments in args. For a higher-level interface, use AMDGPU.@roc.

The following keyword arguments are supported:

  • groupsize or threads (defaults to 1)
  • gridsize or blocks (defaults to 1)
  • config: callback function to dynamically compute the launch configuration. It should accept a HostKernel and return a named tuple with any of the above as fields.
  • queue (defaults to the default queue)

AMDGPU.rocfunction (Function)
rocfunction(f, tt=Tuple{}; kwargs...)

Low-level interface to compile a function invocation for the currently-active GPU, returning a callable kernel object. For a higher-level interface, use @roc.

The following keyword arguments are supported:

  • name: overrides the name that the kernel will have in the generated code
  • device: chooses which device to compile the kernel for
  • global_hooks: specifies a mapping from global variable names to initializer hooks

The output of this function is automatically cached, i.e. you can simply call rocfunction in a hot path without degrading performance. New code will be generated automatically when function definitions change or when different types or keyword arguments are provided.
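
In particular, it is safe to call rocfunction inside a loop (a sketch, reusing the hypothetical vadd! and kernel_tt from above):

for _ in 1:100
    kernel = rocfunction(vadd!, kernel_tt)  # cache hit after the first iteration
    kernel(kernel_args...; groupsize=64, gridsize=1024)
end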


Device code API

Thread indexing

HSA nomenclature

AMDGPU.gridDim (Function)
gridDim()::ROCDim3

Returns the size of the grid in workitems. This behaviour differs from CUDA, where gridDim gives the size of the grid in blocks.


AMDGPU.gridDimWG (Function)
gridDimWG()::ROCDim3

Returns the size of the grid in workgroups. This is equivalent to CUDA's gridDim.
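
Assuming the grid size is a whole multiple of the workgroup size, the two are related through workgroupDim; a device-side sketch:

# inside a kernel: total workitems along x ...
n = gridDim().x
# ... is assumed to equal workgroups times workitems per workgroup
n == gridDimWG().x * workgroupDim().x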


CUDA nomenclature

Use these functions for compatibility with CUDAnative.jl.
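
The CUDA-style names are assumed to correspond to the HSA-style indexing functions as follows, based on CUDAnative's conventions:

  • threadIdx() corresponds to workitemIdx()
  • blockIdx() corresponds to workgroupIdx()
  • blockDim() corresponds to workgroupDim()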

Synchronization
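
A typical use of a workgroup-wide barrier is sketched below, assuming AMDGPU's sync_workgroup (analogous to CUDA's __syncthreads) and a single workgroup covering the whole array:

function reverse_kernel!(a, tmp)
    i = workitemIdx().x
    @inbounds tmp[i] = a[i]
    # wait until every workitem in the workgroup has written its element
    sync_workgroup()
    @inbounds a[i] = tmp[length(a) - i + 1]
    return nothing
end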

Global Variables

Missing docstring for AMDGPU.get_global_pointer.
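
Based on the global_hooks keyword documented for rocfunction above, the following is a hedged sketch of reading a module-level global from device code; the exact signature of get_global_pointer is an assumption, not something this reference confirms:

function read_global!(out)
    # assumed signature: get_global_pointer(::Val{name}, T) returns a
    # device pointer to the global named `name` with element type T
    ptr = AMDGPU.get_global_pointer(Val(:myglobal), Int64)
    @inbounds out[1] = Base.unsafe_load(ptr)
    return nothing
end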