17

Jul 1, 2024

Aman Salykov writes:

This blog post is the result of my attempt to implement high-performance matrix multiplication on CPU while keeping the code simple, portable and scalable. The implementation follows the BLIS design, works for arbitrary matrix sizes, and, when fine-tuned for an AMD Ryzen 7700 (8 cores), outperforms NumPy (=OpenBLAS), achieving over 1 TFLOPS of peak performance across a wide range of matrix sizes.

By efficiently parallelizing the code with just 3 lines of OpenMP directives, it’s both scalable and easy to understand. The implementation hasn’t been tested on other CPUs, so I would appreciate feedback on its performance on your hardware. Although the code is portable and targets Intel Core and AMD Zen CPUs with FMA3 and AVX instructions (i.e., all modern Intel Core and AMD Zen CPUs), please don’t expect peak performance without fine-tuning the hyperparameters, such as the number of threads, kernel, and block sizes, unless you are running it on a Ryzen 7700(X). Additionally, on some Intel CPUs, the OpenBLAS implementation might be notably faster due to AVX-512 instructions, which were intentionally omitted here to support a broader range of processors. Throughout this tutorial, we’ll implement matrix multiplication from scratch, learning how to optimize and parallelize C code using matrix multiplication as an example.

Read Beating NumPy's matrix multiplication in 150 lines of C code

no comments (yet)
sorted by: hot top controversial new old
there doesn't seem to be anything here
this post was submitted on 04 Jul 2024
17 points (90.5% liked)

Programming

16304 readers
75 users here now

Welcome to the main community in programming.dev! Feel free to post anything relating to programming here!

Cross posting is strongly encouraged in the instance. If you feel your post or another person's post makes sense in another community cross post into it.

Hope you enjoy the instance!

Rules

Rules

  • Follow the programming.dev instance rules
  • Keep content related to programming in some way
  • If you're posting long videos try to add in some form of tldr for those who don't want to watch videos

Wormhole

Follow the wormhole through a path of communities !webdev@programming.dev



founded 1 year ago
MODERATORS