If you start naively without any library that avoids the problem then memory access is the problem. Have a look at how much effort is needed to avoid the problem, for example with blocking algorithms.
Multiplying the content of two x-y matrices together for screen rendering and AI processing. Matrix multiplication provides a series of fast multiply and add operations in parallel, and it is built ...
Researchers at MIT's Computer Science & Artificial Intelligence Lab (CSAIL) have open-sourced Multiply-ADDitioN-lESS (MADDNESS), an algorithm that speeds up machine learning using approximate matrix ...
Matrix multiplication is at the heart of many machine learning breakthroughs, and it just got faster—twice. Last week, DeepMind announced it discovered a more efficient way to perform matrix ...