Parallel computing and GPU programming with Julia¶

Introduction¶

Alexis Montoison


There are many types of parallelism:

  • Instruction level parallelism (e.g. SIMD)
  • Multi-threading (shared memory)
  • Multi-processing (shared system memory)
  • Distributed processing (typically no shared memory)

And then there are highly-parallel hardware accelerators like GPUs.

Important: At the center of any efficient parallel code is a fast serial code!!!

When to go parallel?¶

  • If parts of your (optimized!) serial code aren't fast enough.
    • note that parallelization typically increases the code complexity.
  • If your system has multiple execution units (CPU cores, GPU streaming multiprocessors, ...).
    • particularly important on large supercomputers but also already on modern desktop computers and laptops.

How many CPU threads / cores do I have?¶

In [ ]:
using Hwloc
Hwloc.num_physical_cores()

Note that there may be more than one CPU thread per physical CPU core (e.g. hyperthreading).

In [ ]:
Sys.CPU_THREADS
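
Note that the number of threads Julia actually uses is fixed at startup (e.g. julia --threads=4 or the JULIA_NUM_THREADS environment variable) and can be queried with Threads.nthreads(); it may well be smaller than Sys.CPU_THREADS. As a quick check:

In [ ]:
using Base.Threads
nthreads()  # number of threads this Julia session was started with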

Amdahl's law¶

Naive strong scaling expectation: I have 4 cores, give me my 4x speedup!

If $p$ is the fraction of a code that can be parallelized, then the maximal theoretical speedup achievable with $n$ cores is given by $$ F(n) = \frac{1}{1 - p + p / n} $$

In [ ]:
using Plots

# Amdahl's law: maximal speedup when a fraction p of the code is parallelized over n cores
F(p, n) = 1 / (1 - p + p/n)

pl = plot()
for p in (0.5, 0.7, 0.9, 0.95, 0.99)
    plot!(pl, n -> F(p,n), 1:128, lab="p=$p", lw=2,
        legend=:topleft, xlab="number of cores", ylab="parallel speedup", frame=:box)
end
pl
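
For a concrete feel of these curves: with p = 0.95 the speedup saturates at 1/(1 - p) = 20, no matter how many cores are used. A quick numeric check, reusing F from the cell above:

In [ ]:
F(0.95, 16), F(0.95, 10^6)  # ≈ 9.1 on 16 cores, ≈ 20 in the many-core limit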

Parallel computing in Julia¶

Julia provides support for all types of parallelism mentioned above:

  • Instruction level parallelism (e.g. SIMD) → @simd, SIMD.jl, ...
  • Multi-threading (shared memory) → Base.Threads, ThreadsX.jl, FLoops.jl, ...
  • Multi-processing (shared system memory) → Distributed.jl, MPI.jl, ...
  • Distributed processing (typically no shared memory) → Distributed.jl, MPI.jl, ...
  • GPU programming → CUDA.jl, AMDGPU.jl, oneAPI.jl, KernelAbstractions.jl, ...
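
To give a flavor of the shared-memory model, here is a minimal multi-threading sketch with Base.Threads; it assumes Julia was started with several threads (e.g. julia --threads=4) to actually run in parallel, and simply runs serially with a single thread.

In [ ]:
using Base.Threads

# @threads splits the loop iterations across the available threads.
# Each iteration writes to its own slot of results, so there is no data race.
results = zeros(100)
@threads for i in 1:100
    results[i] = sum(abs2, rand(1_000))  # independent chunk of work
end
(nthreads(), sum(results))

Keeping iterations independent, each writing to a distinct slot, is the simplest way to avoid data races; higher-level patterns such as parallel reductions are what ThreadsX.jl and FLoops.jl provide on top of Base.Threads.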

Reference: JuliaUCL24