Data Parallelism, 10000ft View

The most central idea of the SIMD paradigm is given away in its name: the idea that a single instruction works on multiple items of data in parallel. This idea was not new when the first such machines were built. Mathematics had introduced uniform fixed-size groups of numbers long before under the name "vector"; the rules that govern computation with vectors are well understood and are called "Linear Algebra". The term "vectorization" applied to data parallel programming originated in those early days.

The very first SIMD program ever written was not meant to be run, and probably looked very similar to this:

C = A + B,

where A, B, and C are vectors. That means that each individual variable is really an abbreviation for a group of numbers:

A = (a0, a1, a2, a3, ..., aN),
B = (b0, b1, b2, b3, ..., bN),
C = (c0, c1, c2, c3, ..., cN).

Consequently, the single plus sign is really an abbreviation for a group of parallel additions:

c0 = a0 + b0,
c1 = a1 + b1,
c2 = a2 + b2,
cN = aN + bN.

So we have a single instruction, +, working on all the as, bs, and cs in parallel. This is the most basic design pattern of SIMD programming:

- there is a number of identical computations carried out in parallel,

- the individual data items are grouped to have the same meaning (illustrated by the vector A containing all as, and so on),

- the individual data items flow straight through (i.e. they do not change their position to another index),

- there are no other dependencies between the data items.

Whenever a program follows this simplest possible pattern of data parallelism, then the compiler is probably capable of vectorizing it for you automatically. For this reason, I will call this example boring, and move on to more interesting cases. I will revisit it later, when I cover some of the practical concerns of real hardware.