Graphs
Graphs allow capturing GPU kernels and executing them as one unit, reducing host overhead.
Simple operations can be captured as is:
using AMDGPU
f!(o) = o .+= one(eltype(o))
z = AMDGPU.zeros(Int, 4, 4)
graph = AMDGPU.@captured f!(z)
@assert sum(z) == 16
AMDGPU.launch(graph)
@assert sum(z) == 16 * 2

However, if your code contains more complex control flow, more preparation is required:
The code must not result in a hostcall invocation.
If the code contains malloc calls and the corresponding frees, it can be captured and relaunched as is.
If the code contains only allocations (without freeing), the allocations must be cached with GPUArrays.@cached beforehand (see example below).
Other unsupported operations (e.g. RNG init) must be done beforehand as well.
Updating a graph does not update allocated pointers; only instantiation is supported in such cases.

using AMDGPU, GPUArrays
function f(o)
    x = AMDGPU.rand(Float32, size(o))
    y = AMDGPU.rand(Float32, size(o))
    o .+= sin.(x) * cos.(y) .+ 1f0
    return
end
cache = GPUArrays.AllocCache()
z = AMDGPU.zeros(Float32, 256, 256)
N = 10
# Execute function normally and cache all allocations.
GPUArrays.@cached cache f(z)
# Capture graph using AllocCache to avoid capturing malloc/free calls.
graph = GPUArrays.@cached cache AMDGPU.@captured f(z)
# Allocations cache must be kept alive while executing graph.
for i in 1:N
    AMDGPU.launch(graph)
end
AMDGPU.synchronize()

AMDGPU.HIP.capture Function
capture(f::Function; flags = hipStreamCaptureModeGlobal, throw_error::Bool = true)::Union{Nothing, HIPGraph}
Capture the given function f to a graph. If successful, returns a captured graph that must be passed to instantiate to obtain an executable graph.
AMDGPU.HIP.@captured Macro
graph = AMDGPU.@captured begin
    # code to capture in a graph.
end
Macro to capture a given expression in a graph and execute it. Returns the captured graph, which can be relaunched with launch or updated with update.
If capture fails (e.g. due to JIT compilation), it attempts recovery, compilation and re-capture.
AMDGPU.HIP.instantiate Function
instantiate(graph::HIPGraph)::HIPGraphExec
Instantiate a captured graph, making it executable with launch.
AMDGPU.HIP.update Function
update(exec::HIPGraphExec, graph::HIPGraph; throw_error::Bool = true)::Bool
Given an executable graph, update it with graph. Returns true if successful, false otherwise.
If throw_error=false, no exception is thrown when the update is unsuccessful.
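A minimal sketch of how update might be combined with a fallback to instantiate, using only the functions documented on this page. The helper functions add_one! and add_two! are hypothetical names introduced for illustration, and whether this particular update is accepted by the runtime is not something the sketch assumes: it handles both outcomes.

```julia
using AMDGPU

# Hypothetical in-place kernels, used only for illustration.
add_one!(o) = o .+= one(eltype(o))
add_two!(o) = o .+= 2 * one(eltype(o))

z = AMDGPU.zeros(Int, 4, 4)

# Run both functions once outside of capture so that they are already compiled.
add_one!(z)
add_two!(z)
AMDGPU.synchronize()

# Capture the first graph, make it executable and launch it.
graph = AMDGPU.HIP.capture(() -> add_one!(z))
exec = AMDGPU.HIP.instantiate(graph)
AMDGPU.HIP.launch(exec)

# Capture a second graph and try to update the existing executable with it.
new_graph = AMDGPU.HIP.capture(() -> add_two!(z))
updated = AMDGPU.HIP.update(exec, new_graph; throw_error=false)

# If the update was not successful, instantiate the new graph instead.
if !updated
    exec = AMDGPU.HIP.instantiate(new_graph)
end
AMDGPU.HIP.launch(exec)
AMDGPU.synchronize()
```

Passing throw_error=false keeps the decision in the caller's hands: the Bool result selects between reusing the existing executable and creating a new one.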
AMDGPU.HIP.is_capturing Function
is_capturing(stream::HIPStream = AMDGPU.stream())::Bool
Check whether the given stream is currently capturing work for a graph.
AMDGPU.HIP.launch Function
launch(exec::HIPGraphExec, stream::HIPStream = AMDGPU.stream())
Launch an executable graph on a given stream.
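The functions above can also be used without the @captured macro. The following is a sketch of the explicit capture, instantiate, launch sequence, assuming a device is available. Reading the Union{Nothing, HIPGraph} return type as "nothing on a failed capture when throw_error=false" is an assumption, not something the docstring states outright.

```julia
using AMDGPU

f!(o) = o .+= one(eltype(o))

z = AMDGPU.zeros(Int, 4, 4)

# Run once outside of capture so the kernel is already compiled;
# JIT compilation during capture can cause the capture to fail.
f!(z)
AMDGPU.synchronize()

# Record the work into a graph.
# Assumption: with throw_error=false a failed capture yields `nothing`.
graph = AMDGPU.HIP.capture(; throw_error=false) do
    f!(z)
end

if graph !== nothing
    # A captured graph must be instantiated before it can be launched.
    exec = AMDGPU.HIP.instantiate(graph)
    for _ in 1:3
        AMDGPU.HIP.launch(exec)
    end
    AMDGPU.synchronize()
end
```

Unlike @captured, this sequence performs no automatic recovery or re-capture, so the warm-up call before capture matters.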