This guide is intended to familiarize the reader with the concepts of data parallel programming, namely the SIMD paradigm ("Single Instruction, Multiple Data"); another common name for this type of programming is vectorization. The presentation of the material aims to lead to a deeper theoretical understanding of the topic, rather than offering a collection of idioms or a bag of tricks. If I have done my job well, even programmers new to data parallelism should be able to quickly fill their own bag of tricks as they practice this art, regardless of the specific machine they work with. Seasoned veterans should find the underlying theory useful for larger-scale architecting of SIMD algorithms (or possibly even for architecting SIMD processors). The terminology presented here will enable sound reasoning about the capabilities of any vector processor with respect to the qualities and requirements of any algorithm.
The reader is expected to be a programmer who knows the basics of modern processor architecture: pipelining, superscalar execution, branch prediction, caching, and memory management. Ideally, you already have some experience with thorough performance optimization of compute-intensive workloads.