AMDGPU API Reference
Kernel launching
AMDGPU.@roc
— Macro
@roc [kwargs...] func(args...)
High-level interface for executing code on a GPU. The @roc
macro should prefix a call, with func
a callable function or object that should return nothing. It will be compiled to a GCN function upon first use, and to a certain extent arguments will be converted and managed automatically using rocconvert
. Finally, a call to roccall
is performed, scheduling a kernel launch on the specified (or default) HSA queue.
Several keyword arguments are supported that influence the behavior of @roc
.
- dynamic: use dynamic parallelism to launch device-side kernels
- launch: whether to launch the kernel
- arguments that influence kernel compilation: see rocfunction and dynamic_rocfunction
- arguments that influence kernel launch: see AMDGPU.HostKernel and AMDGPU.DeviceKernel
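As an illustration, a minimal element-wise addition launch might look as follows. This is a hedged sketch: the kernel name vadd! and the array sizes are hypothetical, and gridsize is assumed to be given in workitems (consistent with gridDim below, which differs from CUDA):

```julia
using AMDGPU

# Each workitem adds one pair of elements.
function vadd!(c, a, b)
    # AMDGPU.jl indices are 1-based.
    i = workitemIdx().x + (workgroupIdx().x - 1) * workgroupDim().x
    if i <= length(c)
        @inbounds c[i] = a[i] + b[i]
    end
    return nothing  # kernels must return nothing
end

a = ROCArray(rand(Float32, 1024))
b = ROCArray(rand(Float32, 1024))
c = similar(a)

# 256 workitems per workgroup, 1024 workitems in total.
@roc groupsize=256 gridsize=1024 vadd!(c, a, b)
```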
The underlying operations (argument conversion, kernel compilation, kernel call) can be performed explicitly when more control is needed, e.g. to reflect on the resource usage of a kernel to determine the launch configuration. A host-side kernel launch is done as follows:
args = ...
GC.@preserve args begin
kernel_args = rocconvert.(args)
kernel_tt = Tuple{Core.Typeof.(kernel_args)...}
kernel = rocfunction(f, kernel_tt; compilation_kwargs)
kernel(kernel_args...; launch_kwargs)
end
A device-side launch, also known as dynamic parallelism, is similar but more restricted:
args = ...
# GC.@preserve is not supported
# we're on the device already, so no need to rocconvert
kernel_tt = Tuple{Core.Typeof(args[1]), ...} # this needs to be fully inferred!
kernel = dynamic_rocfunction(f, kernel_tt) # no compiler kwargs supported
kernel(args...; launch_kwargs)
AMDGPU.AbstractKernel
— Type
(::HostKernel)(args...; kwargs...)
(::DeviceKernel)(args...; kwargs...)
Low-level interface to call a compiled kernel, passing GPU-compatible arguments in args
. For a higher-level interface, use AMDGPU.@roc
.
The following keyword arguments are supported:
- groupsize or threads (defaults to 1)
- gridsize or blocks (defaults to 1)
- config: callback function to dynamically compute the launch configuration. Should accept a HostKernel and return a named tuple with any of the above as fields.
- queue (defaults to the default queue)
AMDGPU.HostKernel
— Type
(::HostKernel)(args...; kwargs...)
(::DeviceKernel)(args...; kwargs...)
Low-level interface to call a compiled kernel, passing GPU-compatible arguments in args
. For a higher-level interface, use AMDGPU.@roc
.
The following keyword arguments are supported:
- groupsize or threads (defaults to 1)
- gridsize or blocks (defaults to 1)
- config: callback function to dynamically compute the launch configuration. Should accept a HostKernel and return a named tuple with any of the above as fields.
- queue (defaults to the default queue)
AMDGPU.rocfunction
— Function
rocfunction(f, tt=Tuple{}; kwargs...)
Low-level interface to compile a function invocation for the currently-active GPU, returning a callable kernel object. For a higher-level interface, use @roc
.
The following keyword arguments are supported:
- name: overrides the name that the kernel will have in the generated code
- device: chooses which device to compile the kernel for
- global_hooks: specifies maps from global variable name to initializer hook
The output of this function is automatically cached, i.e. you can simply call rocfunction
in a hot path without degrading performance. New code will be generated automatically when function definitions change, or when different types or keyword arguments are provided.
Device code API
Thread indexing
HSA nomenclature
AMDGPU.workitemIdx
— Function
workitemIdx()::ROCDim3
Returns the work item index within the work group. See also: threadIdx
AMDGPU.workgroupIdx
— Function
workgroupIdx()::ROCDim3
Returns the work group index. See also: blockIdx
AMDGPU.workgroupDim
— Function
workgroupDim()::ROCDim3
Returns the size of each workgroup in workitems. See also: blockDim
AMDGPU.gridDim
— Function
gridDim()::ROCDim3
Returns the size of the grid in workitems. This behaviour is different from CUDA where gridDim
gives the size of the grid in blocks.
AMDGPU.gridDimWG
— Function
gridDimWG()::ROCDim3
Returns the size of the grid in workgroups. This is equivalent to CUDA's gridDim
.
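To make the nomenclature concrete, here is a hedged sketch of computing a global 1-D workitem index inside a kernel; the helper name global_index is hypothetical and not part of the API:

```julia
# Global linear index of the current workitem, assuming a 1-D launch.
# AMDGPU.jl indices are 1-based, so subtract 1 from the workgroup index
# before scaling by the workgroup size.
function global_index()
    return workitemIdx().x + (workgroupIdx().x - 1) * workgroupDim().x
end
```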
CUDA nomenclature
Use these functions for compatibility with CUDAnative.jl.
AMDGPU.threadIdx
— Function
threadIdx()::ROCDim3
Returns the thread index within the block. See also: workitemIdx
AMDGPU.blockIdx
— Function
blockIdx()::ROCDim3
Returns the block index within the grid. See also: workgroupIdx
AMDGPU.blockDim
— Function
blockDim()::ROCDim3
Returns the dimensions of the block. See also: workgroupDim
Synchronization
AMDGPU.sync_workgroup
— Function
sync_workgroup()
Waits until all wavefronts in a workgroup have reached this call.
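For example, sync_workgroup can be used to make writes by one workitem visible to the rest of the workgroup before they are read. A hedged sketch, with hypothetical kernel and buffer names, assuming a single-workgroup launch where tmp has one slot per workitem:

```julia
# Each workitem writes its index into a shared buffer, waits for the
# whole workgroup to finish writing, then reads its neighbour's value.
function neighbour_kernel!(out, tmp)
    i = workitemIdx().x
    tmp[i] = i
    sync_workgroup()  # writes to tmp are now visible workgroup-wide
    j = i == length(tmp) ? 1 : i + 1  # wrap around at the end
    out[i] = tmp[j]
    return nothing
end
```

Without the barrier, a workitem could read tmp[j] before its neighbour has written it.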
Global Variables