Wavefront Operations

These intrinsics provide efficient operations across the lanes of a wavefront.

AMDGPU.wfred - Function
wfred(op::Function, val::T) where T -> T

Performs a wavefront-wide reduction with op over the value of val in each lane, and returns the result. Only a limited set of functions may be passed as op. When op is one of (+, max, min, &, |, ⊻), T may be <:Union{Cint, Clong, Cuint, Culong}. When op is one of (+, max, min), T may also be <:Union{Float32, Float64}.
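
The semantics described above can be illustrated with a small host-side model. This is only a sketch that runs on the CPU and does not call the GPU intrinsic. `emulate_wfred`, `INT_OPS`, and `FLOAT_OPS` are hypothetical names introduced for illustration, the vector `vals` stands in for the value of val held by each lane, and the model assumes that every lane receives the same reduced value.

```julia
# Host-side model of the documented wfred semantics; NOT the GPU intrinsic.
# All names below are hypothetical and exist only for this illustration.
const INT_OPS   = (+, max, min, &, |, ⊻)  # ops documented for integer T
const FLOAT_OPS = (+, max, min)           # ops documented for floating-point T

# vals[i] plays the role of `val` in lane i of a single wavefront.
function emulate_wfred(op::Function, vals::Vector{T}) where T
    if T <: Union{Cint, Clong, Cuint, Culong}
        op in INT_OPS || throw(ArgumentError("unsupported op for integer lanes"))
    elseif T <: Union{Float32, Float64}
        op in FLOAT_OPS || throw(ArgumentError("unsupported op for floating-point lanes"))
    else
        throw(ArgumentError("unsupported element type $T"))
    end
    result = convert(T, reduce(op, vals))  # combine the values of all lanes
    return fill(result, length(vals))      # assumption: every lane sees the result
end

lanes = Cint.(1:64)               # assume a 64-lane wavefront holding 1, 2, ..., 64
summed = emulate_wfred(+, lanes)  # every entry equals 2080 (= 64 * 65 / 2)
```

Passing a combination outside the documented sets, such as `&` with `Float32` values, raises an `ArgumentError` in this model. How the real intrinsic reports an unsupported combination is not specified here.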
