Streams

Similar to CUDA streams, ROCm has HIP streams, which are buffers used to instruct the GPU hardware which kernels to launch. Like CUDA streams, HIP streams are asynchronous: submitting work to a stream returns control to the host before that work has finished.

Each device has an associated default stream, which is accessible with AMDGPU.stream().

There are several ways to specify which stream to launch a kernel on:

Using AMDGPU.stream! to change the default stream used within the current Julia task:

stream = AMDGPU.HIPStream()
AMDGPU.stream!(stream) # Change default stream to be used for subsequent operations.
AMDGPU.ones(Float32, 16) # Will be executed on `stream`.

Using AMDGPU.stream! to execute a given function and reset to the original stream after completion:

stream = AMDGPU.HIPStream()
x = AMDGPU.stream!(() -> AMDGPU.ones(Float32, 16), stream)

Using the stream argument to the @roc macro:

stream = AMDGPU.HIPStream()
@roc stream=stream kernel(...)
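
As a fuller illustration of the last option, the sketch below launches a small kernel on a non-default stream. The kernel vadd! and the array names are hypothetical. The groupsize/gridsize launch parameters and the workitemIdx/workgroupIdx/workgroupDim index helpers are assumed to behave as in recent AMDGPU.jl releases, where gridsize counts workgroups.

```julia
using AMDGPU

# Hypothetical element-wise addition kernel, used only for illustration.
function vadd!(c, a, b)
    i = workitemIdx().x + (workgroupIdx().x - 1) * workgroupDim().x
    if i <= length(c)
        @inbounds c[i] = a[i] + b[i]
    end
    return
end

n = 1024
a = AMDGPU.ones(Float32, n)  # Filled on the task's current default stream.
b = AMDGPU.ones(Float32, n)
c = similar(a)

# The inputs were produced on the default stream, so wait for them
# before consuming them from a different stream.
AMDGPU.synchronize()

stream = AMDGPU.HIPStream()
groupsize = 256
gridsize = cld(n, groupsize)  # Assumes `gridsize` is the number of workgroups.
@roc groupsize=groupsize gridsize=gridsize stream=stream vadd!(c, a, b)

# Wait for the work submitted on `stream` before reading `c` on the host.
AMDGPU.synchronize(stream)
c_host = Array(c)
```

The explicit AMDGPU.synchronize(stream) ensures the kernel has finished before c is copied back to the host.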

Streams also have an inherent priority, which allows control of kernel submission latency and on-device scheduling preference with respect to kernels submitted on other streams. There are three priorities: normal (the default), low, and high priority.

The priority of the default stream can be set with AMDGPU.priority!. Alternatively, a priority can be set at stream creation time:

low_prio = AMDGPU.HIPStream(:low)
high_prio = AMDGPU.HIPStream(:high)
normal_prio = AMDGPU.HIPStream(:normal) # Equivalent to AMDGPU.HIPStream(), since :normal is the default.
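
For the default stream, a minimal sketch of both forms of AMDGPU.priority! might look as follows. It relies only on the signatures documented on this page, and the variable names are illustrative.

```julia
using AMDGPU

# Persistently raise the priority of the current task's default stream.
AMDGPU.priority!(:high)
x = AMDGPU.ones(Float32, 16)

# Temporarily lower the priority; the previous one is restored
# once the block has finished.
y = AMDGPU.priority!(:low) do
    AMDGPU.ones(Float32, 16)
end

# Return to the default priority and wait for outstanding work.
AMDGPU.priority!(:normal)
AMDGPU.synchronize()
```

The scoped form is convenient when only a short region of code should run at a different priority.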

AMDGPU.stream — Function

stream()::HIPStream

Get the HIP stream that should be used as the default one for the currently executing task.

AMDGPU.stream! — Function

stream!(s::HIPStream)

Change the default stream to be used within the same Julia task.

stream!(f::Base.Callable, stream::HIPStream)

Change the default stream to be used within the same Julia task, execute f and revert to the original stream.

Returns:

Return value of the function f.
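
Because f is the first argument, this method can also be called with Julia's do-block syntax. A minimal sketch, with illustrative variable names:

```julia
using AMDGPU

s = AMDGPU.HIPStream()

# The `do` block is passed as `f`, the first argument of `stream!`.
x = AMDGPU.stream!(s) do
    AMDGPU.ones(Float32, 16)  # Executed on `s`.
end

# Wait for the work submitted on `s` before using `x` from another stream.
AMDGPU.synchronize(s)
```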

AMDGPU.priority! — Function

priority!(p::Symbol)

Change the priority of the default stream. Accepted values are :normal (the default), :low and :high.

priority!(f::Base.Callable, priority::Symbol)

Change the priority of the default stream, execute f and revert to the original priority. Accepted values are :normal (the default), :low and :high.

Returns:

Return value of the function f.

AMDGPU.HIP.HIPStream — Type

HIPStream(priority::Symbol = :normal)

Arguments:

priority::Symbol: Priority of the stream: :normal, :high or :low.

Create a HIPStream with the given priority. Its device is the default device that is currently in use.

HIPStream(stream::hipStream_t)

Create a HIPStream from a hipStream_t handle. Its device is the default device that is currently in use.

Synchronization

By default, AMDGPU.synchronize uses non-blocking stream synchronization so that it works correctly with TLS and Hostcall.

Users can, however, switch to blocking synchronization globally with the nonblocking_synchronization preference, or per call with AMDGPU.synchronize(; blocking=true). Blocking synchronization might offer slightly lower latency.

You can also synchronize after an expression with the AMDGPU.@sync macro, which executes the given expression and synchronizes afterwards (using AMDGPU.synchronize under the hood).

AMDGPU.@sync begin
    @roc ...
end

Finally, you can perform full device synchronization with AMDGPU.device_synchronize.
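
The synchronization options described above can be compared side by side in a short sketch. The array names are illustrative, and broadcasting is used as a stand-in for any GPU work queued on the current stream.

```julia
using AMDGPU

x = AMDGPU.ones(Float32, 1024)
y = x .+ 1f0  # Queued on the current stream.

# Default: non-blocking wait on the current task's stream.
AMDGPU.synchronize()

# Blocking wait on the same stream (only valid when no HostCalls are running).
AMDGPU.synchronize(; blocking=true)

# Run an expression, then synchronize the current stream.
AMDGPU.@sync begin
    y .= y .* 2f0
end

# Wait for all outstanding work on the whole device.
AMDGPU.device_synchronize()
```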

AMDGPU.synchronize — Function

synchronize(stream::HIPStream = stream(); blocking::Bool = false)

Wait until all kernels executing on stream have completed.

If there are running HostCalls, then blocking must be false. Additionally, if you want to stop HostCalls afterwards, provide the stop_hostcalls=true keyword argument.

AMDGPU.@sync — Macro

@sync ex

Run expression ex on the currently active stream and synchronize the GPU on that stream afterwards.