AMDGPU API Reference
Kernel launching
AMDGPU.@roc — Macro

    @roc [kwargs...] func(args...)

High-level interface for executing code on a GPU. The @roc macro should prefix a call, with func a callable function or object that should return nothing. It will be compiled to a GCN function upon first use, and to a certain extent arguments will be converted and managed automatically using rocconvert. Finally, a call to roccall is performed, scheduling a kernel launch on the specified (or default) HSA queue.
Several keyword arguments are supported that influence the behavior of @roc.
- dynamic: use dynamic parallelism to launch device-side kernels
- arguments that influence kernel compilation: see rocfunction and dynamic_rocfunction
- arguments that influence kernel launch: see AMDGPU.HostKernel and AMDGPU.DeviceKernel
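As an illustration, a minimal elementwise-add kernel launched with @roc might look like the following sketch. The kernel body and array setup are illustrative, not part of the documented API; note that, consistent with gridDim below, gridsize is assumed here to be given in workitems rather than workgroups (verify against your AMDGPU.jl version):

```julia
using AMDGPU

# kernels launched with @roc must return nothing
function vadd!(c, a, b)
    i = workitemIdx().x + (workgroupIdx().x - 1) * workgroupDim().x
    if i <= length(c)
        @inbounds c[i] = a[i] + b[i]
    end
    return nothing
end

a = ROCArray(ones(Float32, 1024))
b = ROCArray(ones(Float32, 1024))
c = similar(a)

# 256 workitems per workgroup, 1024 workitems in total
@roc groupsize=256 gridsize=1024 vadd!(c, a, b)
```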
The underlying operations (argument conversion, kernel compilation, kernel call) can be performed explicitly when more control is needed, e.g. to reflect on the resource usage of a kernel to determine the launch configuration. A host-side kernel launch is done as follows:
args = ...
GC.@preserve args begin
    kernel_args = rocconvert.(args)
    kernel_tt = Tuple{Core.Typeof.(kernel_args)...}
    kernel = rocfunction(f, kernel_tt; compilation_kwargs)
    kernel(kernel_args...; launch_kwargs)
end

A device-side launch, a.k.a. dynamic parallelism, is similar but more restricted:
args = ...
# GC.@preserve is not supported
# we're on the device already, so no need to rocconvert
kernel_tt = Tuple{Core.Typeof(args[1]), ...} # this needs to be fully inferred!
kernel = dynamic_rocfunction(f, kernel_tt) # no compiler kwargs supported
kernel(args...; launch_kwargs)

AMDGPU.AbstractKernel — Type

    (::HostKernel)(args...; kwargs...)
    (::DeviceKernel)(args...; kwargs...)

Low-level interface to call a compiled kernel, passing GPU-compatible arguments in args. For a higher-level interface, use AMDGPU.@roc.
The following keyword arguments are supported:
- groupsize or threads (defaults to 1)
- gridsize or blocks (defaults to 1)
- config: callback function to dynamically compute the launch configuration. It should accept a HostKernel and return a named tuple with any of the above as fields.
- queue (defaults to the default queue)
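Putting the keywords together, a hedged sketch of calling a compiled kernel directly, including the config callback (the function f and its arguments are placeholders):

```julia
kernel_args = rocconvert.(args)
kernel_tt = Tuple{Core.Typeof.(kernel_args)...}
kernel = rocfunction(f, kernel_tt)

# static launch configuration
kernel(kernel_args...; groupsize=256, gridsize=1024)

# dynamic configuration: `config` receives the HostKernel and returns
# a named tuple overriding any of the launch keywords above
kernel(kernel_args...; config=(k -> (groupsize=256, gridsize=1024)))
```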
AMDGPU.HostKernel — Type

    (::HostKernel)(args...; kwargs...)
    (::DeviceKernel)(args...; kwargs...)

Low-level interface to call a compiled kernel, passing GPU-compatible arguments in args. For a higher-level interface, use AMDGPU.@roc.
The following keyword arguments are supported:
- groupsize or threads (defaults to 1)
- gridsize or blocks (defaults to 1)
- config: callback function to dynamically compute the launch configuration. It should accept a HostKernel and return a named tuple with any of the above as fields.
- queue (defaults to the default queue)
AMDGPU.rocfunction — Function

    rocfunction(f, tt=Tuple{}; kwargs...)

Low-level interface to compile a function invocation for the currently-active GPU, returning a callable kernel object. For a higher-level interface, use @roc.
The following keyword arguments are supported:
name: override the name that the kernel will have in the generated code
The output of this function is automatically cached, i.e. you can simply call rocfunction in a hot path without degrading performance. New code will be generated automatically when the function changes, or when different types or keyword arguments are provided.
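Because of this caching, calling rocfunction inside a loop only pays the compilation cost once. A sketch, with update! and state as placeholder names:

```julia
for step in 1:nsteps
    # compiles on the first iteration; subsequent iterations with the
    # same function and argument types hit the cache
    kernel = rocfunction(update!, Tuple{Core.Typeof(rocconvert(state))})
    kernel(rocconvert(state); groupsize=128)
end
```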
Device code API
Thread indexing
HSA nomenclature
AMDGPU.workitemIdx — Function

    workitemIdx()::ROCDim3

Returns the work item index within the work group. See also: threadIdx

AMDGPU.workgroupIdx — Function

    workgroupIdx()::ROCDim3

Returns the work group index. See also: blockIdx

AMDGPU.workgroupDim — Function

    workgroupDim()::ROCDim3

Returns the size of each workgroup in workitems. See also: blockDim

AMDGPU.gridDim — Function

    gridDim()::ROCDim3

Returns the size of the grid in workitems. This behaviour is different from CUDA, where gridDim gives the size of the grid in blocks.

AMDGPU.gridDimWG — Function

    gridDimWG()::ROCDim3

Returns the size of the grid in workgroups. This is equivalent to CUDA's gridDim.
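A common pattern combining these functions is computing a global (grid-wide) linear index for a 1D launch. Since gridDim is reported in workitems here, it can bound the index directly; a sketch:

```julia
function global_index_kernel!(out)
    # 1-based global linear index across the whole grid
    i = workitemIdx().x + (workgroupIdx().x - 1) * workgroupDim().x
    if i <= gridDim().x && i <= length(out)
        @inbounds out[i] = i
    end
    return nothing
end
```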
CUDA nomenclature
Use these functions for compatibility with CUDAnative.jl.
AMDGPU.threadIdx — Function

    threadIdx()::ROCDim3

Returns the thread index within the block. See also: workitemIdx

AMDGPU.blockIdx — Function

    blockIdx()::ROCDim3

Returns the block index within the grid. See also: workgroupIdx

AMDGPU.blockDim — Function

    blockDim()::ROCDim3

Returns the dimensions of the block. See also: workgroupDim
Synchronization
AMDGPU.sync_workgroup — Function

    sync_workgroup()

Waits until all wavefronts in a workgroup have reached this call.
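sync_workgroup is typically paired with local (LDS) memory, so that values written by one workitem become visible to the rest of the workgroup. A sketch that reverses a 256-element block within a workgroup, assuming the @ROCStaticLocalArray macro for static LDS allocation (check your AMDGPU.jl version for the exact LDS API):

```julia
function reverse_block!(out, inp)
    tmp = @ROCStaticLocalArray(Float32, 256)
    i = workitemIdx().x
    @inbounds tmp[i] = inp[i]
    # ensure every workitem's write to LDS is visible workgroup-wide
    sync_workgroup()
    @inbounds out[i] = tmp[257 - i]
    return nothing
end
```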
Global Variables