Streams
Similar to CUDA streams, ROCm has HIP streams, which are buffers used to instruct the GPU hardware which kernels to launch. Like CUDA streams, HIP streams are synchronous: operations submitted to the same stream execute in order.
Each device has an associated default stream, which is accessible with AMDGPU.stream().
There are several ways to specify which stream to launch a kernel on:
- Using AMDGPU.stream! to change the default stream used within the same Julia task:
stream = AMDGPU.HIPStream()
AMDGPU.stream!(stream) # Change default stream to be used for subsequent operations.
AMDGPU.ones(Float32, 16) # Will be executed on `stream`.
- Using AMDGPU.stream! to execute a given function and reset to the original stream after completion:
stream = AMDGPU.HIPStream()
x = AMDGPU.stream!(() -> AMDGPU.ones(Float32, 16), stream)
- Using the stream argument to the @roc macro:
stream = AMDGPU.HIPStream()
@roc stream=stream kernel(...)
Streams also have an inherent priority, which allows control of kernel submission latency and on-device scheduling preference with respect to kernels submitted on other streams. There are three priorities: normal (the default), low, and high priority.
The priority of the default stream can be set with AMDGPU.priority!. Alternatively, it can be set at stream creation time:
low_prio = AMDGPU.HIPStream(:low)
high_prio = AMDGPU.HIPStream(:high)
normal_prio = AMDGPU.HIPStream(:normal) # or just omit the priority argument
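As a minimal sketch of the other option, the function form of AMDGPU.priority! documented below can raise the priority of the default stream only for the duration of a block. This assumes a working ROCm setup with AMDGPU.jl installed; the variable name x is purely illustrative.

```julia
using AMDGPU

# Temporarily raise the priority of the task-local default stream.
# The do-block is passed as `f` to `priority!(f, priority)`, which
# restores the original priority once `f` returns.
x = AMDGPU.priority!(:high) do
    AMDGPU.ones(Float32, 16)  # submitted while the default stream is high priority
end

# Wait for the submitted work before inspecting the result on the host.
AMDGPU.synchronize()
println(Array(x))
```

The do-block form keeps the priority change scoped, so later operations in the same task are submitted at the original priority.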
AMDGPU.stream — Function
stream()::HIPStream
Get the HIP stream that should be used as the default one for the currently executing task.
AMDGPU.stream! — Function
stream!(s::HIPStream)
Change the default stream to be used within the same Julia task.
stream!(f::Base.Callable, stream::HIPStream)
Change the default stream to be used within the same Julia task, execute f and revert to the original stream.
Returns:
Return value of the function f.
AMDGPU.priority! — Function
priority!(p::Symbol)
Change the priority of the default stream. Accepted values are :normal (the default), :low and :high.
priority!(f::Base.Callable, priority::Symbol)
Change the priority of the default stream, execute f and revert to the original priority. Accepted values are :normal (the default), :low and :high.
Returns:
Return value of the function f.
AMDGPU.HIP.HIPStream — Type
HIPStream(priority::Symbol = :normal)
Arguments:
- priority::Symbol: Priority of the stream: :normal, :high or :low.
Create HIPStream with given priority. Device is the default device that's currently in use.
HIPStream(stream::hipStream_t)
Create HIPStream from a hipStream_t handle. Device is the default device that's currently in use.
Synchronization
By default, AMDGPU.jl uses non-blocking stream synchronization with AMDGPU.synchronize so that it works correctly with TLS and Hostcall.
Users can, however, switch to blocking synchronization globally with the nonblocking_synchronization preference, or per call with AMDGPU.synchronize(; blocking=true). Blocking synchronization might offer slightly lower latency.
You can also synchronize after an expression with the AMDGPU.@sync macro, which executes the given expression and synchronizes afterwards (using AMDGPU.synchronize under the hood).
AMDGPU.@sync begin
@roc ...
end
Finally, you can perform full device synchronization with AMDGPU.device_synchronize.
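The synchronization options described above can be sketched together as follows. This is an illustrative example rather than a recommended pattern; it assumes a working ROCm setup and uses only the calls named in this section, with stream and x as illustrative names.

```julia
using AMDGPU

stream = AMDGPU.HIPStream()

# Run work on a dedicated stream; `stream!` returns the value of the function.
x = AMDGPU.stream!(() -> AMDGPU.ones(Float32, 16), stream)

# Non-blocking wait (the default) on that specific stream.
AMDGPU.synchronize(stream)

# Blocking wait on the current task's default stream.
AMDGPU.synchronize(; blocking=true)

# Wait for all kernels on all streams of the active device.
AMDGPU.device_synchronize()
```

Synchronizing a single stream is usually enough when the result depends on only one stream; device-wide synchronization also waits on unrelated work.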
AMDGPU.synchronize — Function
synchronize(stream::HIPStream = stream(); blocking::Bool = false)
Wait until all kernels executing on stream have completed.
If there are running HostCalls, then blocking must be false. Additionally, if you want to stop host calls afterwards, then provide the stop_hostcalls=true keyword argument.
AMDGPU.@sync — Macro
@sync ex
Run expression ex on the currently active stream and synchronize the GPU on that stream afterwards.
See also: synchronize.
AMDGPU.HIP.device_synchronize — Function
Blocks until all kernels on all streams have completed. Uses the currently active device.