SIMD in Mufi-Lang

Mustafif Khan

While working on the v0.6.0 Mars Release for MufiZ, I decided to try something way beyond what I'm used to doing: figuring out how to implement SIMD in Mufi.

The Problems

So, let's start with the problems. Firstly, our Value struct, while awesome and all, was a massive pain in the arse to work with when performing SIMD operations on an array.

Secondly, AVX2, which provides the SIMD instructions we use, is not available on every processor. It's mainly found on AMD and Intel x86_64 CPUs, meaning we can only guarantee this feature for Linux x86_64.

The Solution

So, we fixed our first problem by introducing a new type, FloatVector, which contains a double* pointer and behaves the same as a static array in Mufi. You can create a new vector with fvec(), which accepts either an array to convert into a vector or an integer that declares the size.
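To make the idea concrete, here is a minimal C sketch of what a type like this could look like. The field names and the helpers fvec_new, fvec_push and fvec_free are hypothetical and chosen for illustration; the actual MufiZ struct may be laid out differently. The point is only that the doubles sit in one contiguous buffer rather than being wrapped in Value structs.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical layout, for illustration only: the real MufiZ type may differ. */
typedef struct {
    size_t  size;   /* fixed capacity, like a static array */
    size_t  count;  /* number of elements pushed so far */
    double *data;   /* contiguous doubles, no per-element wrapper */
} FloatVector;

/* Allocate a zero-filled vector with room for `size` doubles. */
FloatVector *fvec_new(size_t size) {
    FloatVector *v = malloc(sizeof *v);
    if (v == NULL) return NULL;
    v->data = calloc(size, sizeof *v->data);
    if (v->data == NULL && size > 0) { free(v); return NULL; }
    v->size = size;
    v->count = 0;
    return v;
}

/* Append a value; returns 0 when the vector is already full. */
int fvec_push(FloatVector *v, double value) {
    assert(v != NULL);
    if (v->count >= v->size) return 0;
    v->data[v->count++] = value;
    return 1;
}

/* Release the buffer and the vector itself. */
void fvec_free(FloatVector *v) {
    if (v != NULL) { free(v->data); free(v); }
}
```

Because the buffer holds plain doubles, a SIMD routine can load several elements at once without unpacking each one first.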

Inside our build.zig, we have changed our C flags to check whether the target we are compiling for is x86_64. If so, we compile with -mavx2, which lets the compiler emit the SIMD instructions; these are then used if your processor supports AVX2.
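As a rough illustration of what the flag enables on the C side, the sketch below adds two double buffers four elements at a time when the compiler defines __AVX2__ (which -mavx2 does), and falls back to a plain loop otherwise. The function name add_f64 is hypothetical and this is not the MufiZ source. The intrinsics shown are standard ones from immintrin.h; strictly speaking they are AVX-level, which -mavx2 also turns on.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

/* Element-wise out[i] = a[i] + b[i] for n doubles (hypothetical helper). */
void add_f64(const double *a, const double *b, double *out, size_t n) {
    assert(a != NULL && b != NULL && out != NULL);
    size_t i = 0;
#if defined(__AVX2__)
    /* A 256-bit register holds four doubles, so step by 4. */
    for (; i + 4 <= n; i += 4) {
        __m256d va = _mm256_loadu_pd(a + i);  /* unaligned load */
        __m256d vb = _mm256_loadu_pd(b + i);
        _mm256_storeu_pd(out + i, _mm256_add_pd(va, vb));
    }
#endif
    /* Scalar tail, or the whole loop when AVX2 is not enabled. */
    for (; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}
```

The compile-time guard keeps the same source building on targets that are not x86_64, which is in the spirit of the target check in build.zig.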

To see the performance it brings, consider the following program in Mufi:

var size = 5000;
var f_a = fvec(size);
var f_b = fvec(size);
var a = array(size, true);
var b = array(size, true);

for(var i = 0; i < size; i++){
    push(f_a, double(i*100));
    push(f_b, double(i/10));
    push(a, double(i*10));
    push(b, double(i/10));
}

print "Size = " + str(size);

var start = now_ns();
var result = f_a + f_b;
var end = now_ns();
var vec_time = end - start;
print "FloatVector = " + str(vec_time);

start = now_ns();
var arr_res = a + b;
end = now_ns();
var arr_time = end - start;

print "Array = " + str(arr_time);

print "Array is " + str(arr_time/vec_time) + " slower!";

We create two float vectors and two static arrays of size 5000, then fill them with values computed from the loop index. We then time how long it takes to add each pair, which gives the following results:

$ ./zig-out/bin/mufiz -r tests/vec_arr_add.mufi
Size = 5000
FloatVector = 1.8944e+04
Array = 6.1952e+04
Array is 3.27027027027027e+00 slower!

This was run in ReleaseSafe mode, the default we package for users. The optimization here is measured in nanoseconds, since addition is already quick, but imagine you need to process a large set of data; then having this alternative could be great. More generally, it's cool to show how I'd like to explore different ideas and implement them.

You can try this out today by installing mufiz from the next-experimental release. Do keep in mind, however, that not all of the array functions have been implemented yet.

https://github.com/Mustafif/MufiZ/releases/tag/next-experimental
