Matrix-Vector Multiplication

In this design, one or more AI Engine compute cores (spread across hardware columns, configurable via n_cores) perform a matrix-vector multiplication. We use the bfloat16 data type, and the M×K dimensions of the A matrix are set to 288×288 by default (N, the number of columns of B, is always 1, since B is a vector). The kernel itself consumes 32×32 (M×K) tiles of A, so it is invoked multiple times to produce the full result.
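The tiling arithmetic can be sketched in plain NumPy (float32 stands in for bfloat16 here, and all names are illustrative, not taken from the design's source):

```python
import numpy as np

M, K, TILE = 288, 288, 32  # default problem size and kernel tile size

rng = np.random.default_rng(0)
A = rng.standard_normal((M, K)).astype(np.float32)  # float32 as a stand-in for bfloat16
b = rng.standard_normal(K).astype(np.float32)

# Each kernel invocation consumes one 32x32 tile of A and accumulates into
# the corresponding 32-element chunk of the output vector C.
c = np.zeros(M, dtype=np.float32)
for m in range(0, M, TILE):
    for k in range(0, K, TILE):
        c[m:m+TILE] += A[m:m+TILE, k:k+TILE] @ b[k:k+TILE]

# (288/32) * (288/32) = 81 tile-level invocations cover the full problem.
assert np.allclose(c, A @ b, rtol=1e-4, atol=1e-3)
```

With the default 288×288 problem and 32×32 tiles, the kernel runs 81 times in total, 9 of which contribute to each 32-element output chunk.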

This design relies on the same basic concepts as the whole-array matrix-matrix multiplication design and is structured very similarly. Please refer to the in-depth explanation of that design, along with the differences outlined below, for a better understanding of this design.

  • A specialized matrix-vector microkernel, named matvec_vectorized, is used in this design, as opposed to the more general matrix-matrix microkernel (matmul_vectorized) used in the matrix-matrix multiplication designs.
  • The data movement in this design differs as follows: an identical 32-element chunk of the vector B is broadcast to the cores in all columns, whereas distinct successive 32×32 tiles of the A matrix are distributed to the cores. As such, each core is responsible for a distinct 32-element chunk of the output vector C. These chunks are assembled (joined) at the shim tile level (in the aiex.runtime_sequence()).
  • This design does not use all available compute cores. Instead, it uses at most one core in each hardware column; the variable n_cores defines the number of columns used. It would, however, be possible to extend this design to use all cores.
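The data movement described above can be modeled in NumPy as follows. The round-robin assignment of row tiles to cores is an assumption for illustration, as is every name in the sketch; the point is the broadcast of B, the distribution of A tiles, and the final join of the per-core output chunks:

```python
import numpy as np

M, K, TILE, n_cores = 288, 288, 32, 4  # n_cores = 4 is illustrative

rng = np.random.default_rng(1)
A = rng.standard_normal((M, K)).astype(np.float32)  # float32 stands in for bfloat16
b = rng.standard_normal(K).astype(np.float32)

n_row_tiles = M // TILE  # 9 row tiles of A, hence 9 output chunks of C
core_out = {core: [] for core in range(n_cores)}

for t in range(n_row_tiles):
    core = t % n_cores      # assumed round-robin distribution of A's row tiles
    m = t * TILE
    chunk = np.zeros(TILE, dtype=np.float32)
    for k in range(0, K, TILE):
        # the same 32-element chunk of b is "broadcast" to every core
        chunk += A[m:m+TILE, k:k+TILE] @ b[k:k+TILE]
    core_out[core].append((t, chunk))

# Join: reassemble C from the per-core chunks, in tile order, as the shim
# tile level does in the real design.
c = np.zeros(M, dtype=np.float32)
for chunks in core_out.values():
    for t, chunk in chunks:
        c[t*TILE:(t+1)*TILE] = chunk

assert np.allclose(c, A @ b, rtol=1e-4, atol=1e-3)
```

Because each core owns disjoint row tiles of A, no reduction across cores is needed; the join is a simple concatenation of output chunks.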

Building and Running the Design

This design requires C++23 for bfloat16_t support, which is available starting with g++-13: https://lindevs.com/install-g-on-ubuntu

To compile the design:

make
make matrixVectorMultiplication.exe

To run the design:

make run