Electrical Engineering Seminar and Special Problems – VECTOR COMPUTER ARCHITECTURE AND PROGRAMMING
Vector processing is a highly successful architectural approach for accelerating certain kinds of software, particularly algorithms dominated by dense matrix operations. Despite being around since the late 1960s, vector architectures have never been widely available in mainstream instruction sets. That may soon change with the recent addition of vector instructions to the open RISC-V instruction set architecture. Formally ratified in late 2021, the RISC-V Vector Specification has been highly anticipated, and many implementations are expected to reach the market.
To exploit vector instruction sets, users must become accustomed to thinking about operations on sets of data rather than on individual elements. Very often, data and code need to be organized differently than they would be for a traditional scalar instruction set.
The purpose of this study course is to gain an intimate knowledge of the RISC-V vector architecture and learn how to best exploit it by writing vector-parallel code.
DETAILS
Vector processors obtain their high performance through the execution of parallel, pipelined functional units at high clock speeds. The key to performance, then, is keeping these functional units busy (highly utilized) with real operations from software. At the most basic level, this means supplying the functional units with data on a timely basis. The methods used for fetching and delivering this data to the parallel functional units are what make vector architectures unique, and this uniqueness necessarily shows up in the instruction set and the programming model.
In the last five years, there has been a resurgence of demand for a major vector approach that is compatible across vendors. The RISC-V Foundation has developed an open vector instruction set that permits many different architectural implementations. Compiler work is proceeding in parallel to simplify programming these forthcoming CPU designs. The best way to exploit the instruction set, however, is to understand the architecture and program it directly at a low level using vector intrinsics, which expose individual vector instructions as functions callable from C code.
RVV adds new challenges with its support for element sizes other than words (bytes, halfwords, and doublewords), its exploitation of subword-SIMD parallelism, and its mixed-width operations for data conversion. RVV also provides its own unique instructions for executing data-conditional operations.
Emphasis will be placed on techniques needed to exploit vector processing, such as loop interchange, loop fusion, struct-of-arrays data types, prefetching, and overlapping computation with communication. In addition, proper benchmarking techniques and the limits on achievable speedup, such as Amdahl's Law and Gustafson's Law, will be studied. A variety of other vector architectures will also be investigated to add breadth.
COURSE STRUCTURE AND ASSESSMENT
The course will meet once per week and consist of assigned readings, paper summaries/presentations, and programming assignments. The focus is to understand the underlying vector architecture in sufficient detail to enable programming of common numerical algorithms such as correlations, FIR or IIR filters, data conversions, matrix multiplication, and image processing. As much as possible, a complete programming environment will be used to measure actual runtimes. In cases where that level of detail is not possible, a more analytical approach will be used.
Grading will be divided across six modules of roughly two weeks each, with each module focused on a single vector approach. In the first week of each module, an architectural understanding will be developed and the necessary programming information and tools will be collected. In the second week, a programming assignment will be completed.
3 credits