Streams
Similar to CUDA streams, ROCm has HIP streams, which are buffers used to instruct the GPU hardware which kernels to launch. Like CUDA streams, a HIP stream executes the operations submitted to it in order.
Each device has an associated default stream, which is accessible with AMDGPU.stream().
There are several ways to specify which stream to launch a kernel on:
- Using AMDGPU.stream! to change the default stream used within the same Julia task:

    stream = AMDGPU.HIPStream()
    AMDGPU.stream!(stream) # Change default stream to be used for subsequent operations.
    AMDGPU.ones(Float32, 16) # Will be executed on `stream`.

- Using AMDGPU.stream! to execute a given function and reset to the original stream after completion:

    stream = AMDGPU.HIPStream()
    x = AMDGPU.stream!(() -> AMDGPU.ones(Float32, 16), stream)

- Using the stream argument to the @roc macro:

    stream = AMDGPU.HIPStream()
    @roc stream=stream kernel(...)

Streams also have an inherent priority, which allows control of kernel submission latency and on-device scheduling preference with respect to kernels submitted on other streams. There are three priorities: normal (the default), low, and high.
Priority of the default stream can be set with AMDGPU.priority!. Alternatively, it can be set at stream creation time:

    low_prio = HIPStream(:low)
    high_prio = HIPStream(:high)
    normal_prio = HIPStream(:normal) # or just omit "priority"

AMDGPU.stream — Function

    stream()::HIPStream

Get the HIP stream that should be used as the default one for the currently executing task.

AMDGPU.stream! — Function

    stream!(s::HIPStream)

Change the default stream to be used within the same Julia task.

    stream!(f::Base.Callable, stream::HIPStream)

Change the default stream to be used within the same Julia task, execute f and revert to the original stream.

Returns:

Return value of the function f.
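Because f is the first argument of stream!(f, stream), Julia's do-block syntax can be used to scope several operations to a stream. The following is a minimal sketch, assuming a functional ROCm installation with a supported AMD GPU; the variable names and the broadcast operation are illustrative only.

```julia
using AMDGPU

stream = AMDGPU.HIPStream()

# The `do` block is passed as the first argument `f` of `stream!(f, stream)`.
x = AMDGPU.stream!(stream) do
    a = AMDGPU.ones(Float32, 16) # Filled on `stream`.
    a .+ 1f0                     # Broadcast kernel is also launched on `stream`.
end

# After the block, the task's original default stream is active again.
AMDGPU.synchronize(stream) # Wait for the work submitted to `stream`.
```

The value of the block is returned by stream!, so x holds the resulting GPU array.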

AMDGPU.priority! — Function

    priority!(p::Symbol)

Change the priority of the default stream. Accepted values are :normal (the default), :low and :high.

    priority!(f::Base.Callable, priority::Symbol)

Change the priority of the default stream, execute f and revert to the original priority. Accepted values are :normal (the default), :low and :high.

Returns:

Return value of the function f.

AMDGPU.HIP.HIPStream — Type

    HIPStream(priority::Symbol = :normal)

Arguments:

- priority::Symbol: Priority of the stream: :normal, :high or :low.

Create HIPStream with given priority. Device is the default device that's currently in use.

    HIPStream(stream::hipStream_t)

Create HIPStream from hipStream_t handle. Device is the default device that's currently in use.
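The stream and priority functions above can be combined. The sketch below assumes a functional ROCm installation with a supported AMD GPU; the array sizes and the names urgent, background and y are hypothetical and chosen only for illustration.

```julia
using AMDGPU

# Streams created with an explicit priority.
high_prio = AMDGPU.HIPStream(:high)
low_prio = AMDGPU.HIPStream(:low)

# Submit work to each stream; the original default stream is restored afterwards.
urgent = AMDGPU.stream!(() -> AMDGPU.ones(Float32, 16), high_prio)
background = AMDGPU.stream!(() -> AMDGPU.zeros(Float32, 1024), low_prio)

# Temporarily change the priority of the task's default stream.
y = AMDGPU.priority!(:high) do
    AMDGPU.ones(Float32, 16) .* 2f0
end

# Wait for the work on each stream before using the results.
AMDGPU.synchronize(high_prio)
AMDGPU.synchronize(low_prio)
AMDGPU.synchronize() # Default stream of the current task.
```

Priorities express a scheduling preference between streams; this sketch does not measure whether the high-priority work actually completes first.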
Synchronization
AMDGPU.jl by default uses non-blocking stream synchronization with AMDGPU.synchronize to work correctly with TLS and Hostcall.
Users can, however, switch to blocking synchronization globally with the nonblocking_synchronization preference, or for an individual call with AMDGPU.synchronize(; blocking=true). Blocking synchronization might offer slightly lower latency.
You can also synchronize after an expression with the AMDGPU.@sync macro, which executes the given expression and synchronizes afterwards (using AMDGPU.synchronize under the hood).

    AMDGPU.@sync begin
        @roc ...
    end

Finally, you can perform full device synchronization with AMDGPU.device_synchronize.
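The synchronization options described above can be sketched side by side as follows. This assumes a functional ROCm installation with a supported AMD GPU and no running hostcalls; the arrays and arithmetic are illustrative only.

```julia
using AMDGPU

x = AMDGPU.ones(Float32, 1024)

# Default (non-blocking) synchronization of the current stream.
y = x .+ 1f0
AMDGPU.synchronize()

# Blocking synchronization for this call only.
z = y .* 2f0
AMDGPU.synchronize(; blocking=true)

# Execute an expression and synchronize the current stream afterwards.
w = similar(z)
AMDGPU.@sync w .= z .- 1f0

# Wait for all kernels on all streams of the current device.
AMDGPU.device_synchronize()
```

The per-call blocking=true keyword leaves the global nonblocking_synchronization preference unchanged.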

AMDGPU.synchronize — Function

    synchronize(stream::HIPStream = stream(); blocking::Bool = false)

Wait until all kernels executing on stream have completed.
If there are running HostCalls, then blocking must be false. Additionally, if you want to stop host calls afterwards, then provide stop_hostcalls=true keyword argument.

AMDGPU.@sync — Macro

    @sync ex

Run expression ex on currently active stream and synchronize the GPU on that stream afterwards.
See also: synchronize.

AMDGPU.device_synchronize — Function

Blocks until all kernels on all streams have completed. Uses currently active device.