In fact, most of the core algorithms used in deep learning and scientific/HPC applications are based on matrix multiplication.
Matrix multiplication is highly parallelizable, and each computation is independent of the others, which allows thousands o...