Strassen’s Algorithm and Google AlphaEvolve


Introduction
Strassen’s algorithm, an often overlooked yet omnipresent section of technical curricula worldwide, is frequently reduced to an exercise of memorizing formulas and matrix positions. Only a few students take the time to appreciate why this half-a-century-old algorithm fundamentally transformed the paradigm of computational mathematics. Today, in the wake of Google’s announcement of its Gemini-powered coding agent AlphaEvolve, this algorithm is in the headlines of the technical communities once again.
New AI agent evolves algorithms for math and practical applications in computing by combining the creativity of large language models with automated evaluators
This blog is intended to serve as a concise guide for readers unfamiliar with the significance of matrix multiplication and Strassen’s algorithm. It does not aim to dive deep into the inner workings of AlphaEvolve or the complex math behind it, but rather to help you build a clear and simple intuition about why this breakthrough actually matters in the world of computation.
We will cover this journey using the WHW principle: the What, the How and the Why.
An Overview of Strassen’s Algorithm
Strassen’s algorithm, or Strassen’s method, is the consequence of a keen observation made by the German mathematician Volker Strassen regarding sub-block multiplication of matrices.
Volker Strassen giving the Knuth Prize lecture at SODA 2009
Before we move any further with this discussion, an obvious but interesting thing to note is that the computational overhead of multiplication is much higher than that of addition. While this is quite an intuitive observation, it forms the core motivation behind Strassen’s method.
In practical terms, this means that when computers perform large-scale matrix operations, the time and energy consumed by multiplications can significantly outweigh that of additions. Multiplication involves more complex circuitry and longer processing time than addition, which is relatively straightforward and lightweight.
Now, in the standard multiplication of two matrices partitioned into 2×2 blocks, written as
$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}, \quad B = \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix},$$
we would require a total of 8 block multiplications:
$$\begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix} = \begin{bmatrix} A_{11}B_{11} + A_{12}B_{21} & A_{11}B_{12} + A_{12}B_{22} \\ A_{21}B_{11} + A_{22}B_{21} & A_{21}B_{12} + A_{22}B_{22} \end{bmatrix}$$
However, what Strassen did was reframe the above result by defining these 7 intermediate values:
$$\begin{aligned} M_1 &= (A_{11} + A_{22})(B_{11} + B_{22}) \\ M_2 &= (A_{21} + A_{22})B_{11} \\ M_3 &= A_{11}(B_{12} - B_{22}) \\ M_4 &= A_{22}(B_{21} - B_{11}) \\ M_5 &= (A_{11} + A_{12})B_{22} \\ M_6 &= (A_{21} - A_{11})(B_{11} + B_{12}) \\ M_7 &= (A_{12} - A_{22})(B_{21} + B_{22}) \end{aligned}$$
Now, based on these intermediate values, we can recover the blocks of the product:
$$\begin{aligned} C_{11} &= M_1 + M_4 - M_5 + M_7 \\ C_{12} &= M_3 + M_5 \\ C_{21} &= M_2 + M_4 \\ C_{22} &= M_1 - M_2 + M_3 + M_6 \end{aligned}$$
Now, this is the point where you should start noticing things, beginning with the number of multiplication operations performed in the second case versus the first.
Have you made that observation already?
Voila! By mere rearrangement of terms, the number of block multiplications is brought down from 8 to 7, a change that, despite appearing minor, has significant implications for our computational capabilities.
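If you want to convince yourself that the seven intermediate M-terms really do reconstruct the product, here is a small numerical check in Python (assuming NumPy is available). It treats each block as a single scalar entry; the same identities hold when the entries are matrix blocks.

```python
import numpy as np

# Numerical sanity check of Strassen's identities on a 2x2 example,
# where each "block" is just a scalar entry.
rng = np.random.default_rng(0)
A = rng.random((2, 2))
B = rng.random((2, 2))

M1 = (A[0, 0] + A[1, 1]) * (B[0, 0] + B[1, 1])
M2 = (A[1, 0] + A[1, 1]) * B[0, 0]
M3 = A[0, 0] * (B[0, 1] - B[1, 1])
M4 = A[1, 1] * (B[1, 0] - B[0, 0])
M5 = (A[0, 0] + A[0, 1]) * B[1, 1]
M6 = (A[1, 0] - A[0, 0]) * (B[0, 0] + B[0, 1])
M7 = (A[0, 1] - A[1, 1]) * (B[1, 0] + B[1, 1])

C = np.array([
    [M1 + M4 - M5 + M7, M3 + M5],
    [M2 + M4,           M1 - M2 + M3 + M6],
])

assert np.allclose(C, A @ B)  # the product is exact, using only 7 multiplications
```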
Impact on Time Complexity
To fathom what impact the above intuition from Strassen had on the computational world, we need to take a look at the time complexities.
Traditional Matrix Multiplication — A Closer Look
When multiplying two square matrices of size n×n, the process follows a well-known and straightforward approach.
Each element in the resulting matrix is calculated by taking the dot product of the i-th row of matrix A and the j-th column of matrix B. Mathematically, this is represented as:
$$C_{ij} = \sum_{k=1}^{n} A_{ik} \cdot B_{kj}$$
Since there are n² elements in the output matrix C, we repeat this process n² times. For each individual element, we perform:
n multiplications
(n−1) additions
So, the total number of operations across the full matrix becomes:
$$T(n) = n^3\ \text{multiplications} + (n^3 - n^2)\ \text{additions}$$
As we have previously discussed, the time complexity of the operation is asymptotically dominated by the number of multiplications rather than the additions, which gives:
$$\boxed{O(n^3)}$$
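For reference, here is a minimal sketch of that textbook algorithm in Python: n² output entries, each requiring n scalar multiplications.

```python
import numpy as np

def naive_matmul(A, B):
    """Textbook matrix multiplication: three nested loops, O(n^3) multiplications."""
    n = A.shape[0]
    C = np.zeros((n, n))
    for i in range(n):          # each row of A
        for j in range(n):      # each column of B
            for k in range(n):  # dot product: n multiplications per output entry
                C[i, j] += A[i, k] * B[k, j]
    return C

A = np.random.rand(4, 4)
B = np.random.rand(4, 4)
assert np.allclose(naive_matmul(A, B), A @ B)
```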
Strassen’s Time Complexity
Now, Strassen’s method breaks each n×n matrix down into four n/2 × n/2 sub-matrices and performs only 7 sub-multiplications at each level of the recursion, which gives the recurrence:
$$T(n) = 7 \cdot T\left(\frac{n}{2}\right) + O(n^2)$$
Breaking this down
Now, using the Master Theorem, we can solve the above recurrence relation as follows:
$$T(n) = 7 \cdot T\left(\frac{n}{2}\right) + O(n^2)$$

The general form handled by the Master Theorem is:

$$T(n) = a \cdot T\left(\frac{n}{b}\right) + f(n)$$

Here, a = 7, b = 2 and f(n) = O(n²), so d = 2.

Compute and approximate the critical exponent:

$$\log_b a = \log_2 7 \approx 2.807$$

Since d < log₂ 7 (that is, 2 < 2.807), Case 1 of the Master Theorem applies, giving the final time complexity:

$$T(n) = \Theta(n^{\log_2 7}) \approx \Theta(n^{2.81})$$

In Big-O notation:

$$T(n) = O(n^{\log_2 7}) \boxed{\approx O(n^{2.81})}$$
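To make the recursion concrete, here is a minimal Strassen sketch in Python, assuming square matrices whose size is a power of 2 (real implementations pad the inputs and fall back to the standard algorithm below a cutoff size).

```python
import numpy as np

def strassen(A, B):
    """Minimal recursive Strassen multiplication for n x n matrices, n a power of 2."""
    n = A.shape[0]
    if n == 1:
        return A * B                        # base case: one scalar multiplication
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]

    M1 = strassen(A11 + A22, B11 + B22)     # 7 recursive multiplications instead of 8
    M2 = strassen(A21 + A22, B11)
    M3 = strassen(A11, B12 - B22)
    M4 = strassen(A22, B21 - B11)
    M5 = strassen(A11 + A12, B22)
    M6 = strassen(A21 - A11, B11 + B12)
    M7 = strassen(A12 - A22, B21 + B22)

    C = np.empty((n, n))
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C

A, B = np.random.rand(4, 4), np.random.rand(4, 4)
assert np.allclose(strassen(A, B), A @ B)
```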
The Case of 4×4 Matrix Multiplication: And What AlphaEvolve Brings to the Table
Now, from the derivation above, it is clear that multiplying two 4×4 matrices with Strassen’s method requires a total of 7×7 = 49 multiplications at the scalar level: 7 block multiplications, each of which is itself a 2×2 product costing 7 scalar multiplications.
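The count follows directly from the recursion, as this tiny sketch shows:

```python
def strassen_scalar_mults(n):
    """Scalar multiplications used by Strassen's recursion for an n x n product
    (n a power of 2): M(n) = 7 * M(n/2), with M(1) = 1."""
    return 1 if n == 1 else 7 * strassen_scalar_mults(n // 2)

print(strassen_scalar_mults(2))   # 7
print(strassen_scalar_mults(4))   # 49 -- the figure AlphaEvolve just improved upon
```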
Now, this had been the best known result for over 5 decades, but it just witnessed a transformation this week with Google AlphaEvolve, which has found a way to accomplish this task in 48 scalar multiplications.
The How?
Now that the motivation (the why) and the case in consideration (the what) have been discussed, it’s time to discuss the how at a very high level.
Before moving any further, let me link the original release of AlphaEvolve from Google DeepMind, because we will be referring to it quite often from here on.
AlphaEvolve: A coding agent for scientific and algorithmic discovery
Since the pioneering work of Strassen [94], it has been known that a rich space of algorithms for multiplying two matrices can be represented as decompositions of a given 3D tensor into rank-one tensors. The rank (number of terms) of the decomposition exactly specifies the number of scalar multiplications needed to compute the matrix product. Hence, to develop faster matrix multiplication algorithms one needs to find low-rank decompositions of particular tensors.
The above excerpt is from the official release of AlphaEvolve. Let’s take a deeper look into it with a little exercise to understand what exactly a tensor is here (a small code sketch follows the list below):
Think of the 4×4 matrix product as being encoded by a 3D block of numbers: the matrix multiplication tensor, commonly written ⟨4, 4, 4⟩.
Any way you multiply two matrices corresponds to slicing that tensor into simple pieces (rank‑one tensors).
The number of slices you use equals the number of scalar multiplications you perform.
Our Goal: Find the smallest number of slices (rank) that still exactly reconstructs the full tensor.
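Here is a small sketch of this idea in Python, built for the smaller 2×2 case for readability (that tensor has shape 4×4×4, and its rank is exactly 7, Strassen’s multiplication count). The index convention below is just one illustrative choice.

```python
import numpy as np

# Build the matrix multiplication tensor for 2x2 matrices: T has shape (4, 4, 4),
# and T[i, j, k] = 1 whenever A_flat[i] * B_flat[j] contributes to C_flat[k].
n = 2
T = np.zeros((n * n, n * n, n * n))
for r in range(n):
    for c in range(n):
        for k in range(n):
            T[r * n + k, k * n + c, r * n + c] = 1   # C[r, c] += A[r, k] * B[k, c]

# Contracting A and B against the tensor reproduces the ordinary matrix product.
A, B = np.random.rand(n, n), np.random.rand(n, n)
C = np.einsum('i,j,ijk->k', A.ravel(), B.ravel(), T).reshape(n, n)
assert np.allclose(C, A @ B)

# The minimum number of rank-one pieces that sum to T (its rank) equals the
# minimum number of scalar multiplications needed -- 7 for this tensor.
```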
How does AlphaEvolve approach this exercise?
AlphaEvolve approaches this low-rank decomposition hunt as a genuine task of evolutionary discovery. Again, in Google’s own words:
Starting from the problem description and a standard gradient-based algorithm (including an initializer, a reconstruction loss function, and an Adam optimizer [49]), AlphaEvolve is able to develop sophisticated tensor decomposition algorithms that outperform existing approaches.
Let us take a step back and unpack this: each candidate solution is embodied in a little program whose job is to break the 4×4×4 multiplication tensor into simpler pieces. An Adam optimizer lies at the heart of this process, tuning the tensor factors to reduce reconstruction error.
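To build intuition for what such a candidate program does, here is a minimal PyTorch sketch (not AlphaEvolve’s actual code) that searches for a rank-R decomposition of the 2×2 multiplication tensor by minimizing a reconstruction loss with Adam. A plain run like this may only approximately reach rank 7; AlphaEvolve’s evolved programs layer far more sophistication on top of this basic idea.

```python
import torch

# Target: the <2,2,2> matrix multiplication tensor (shape 4x4x4).
n, R = 2, 7                                    # rank 7 = Strassen's multiplication count
T = torch.zeros(n * n, n * n, n * n)
for r in range(n):
    for c in range(n):
        for k in range(n):
            T[r * n + k, k * n + c, r * n + c] = 1.0

# One factor matrix per tensor mode; row r of (U, V, W) defines one rank-one piece.
U = torch.randn(R, n * n, requires_grad=True)
V = torch.randn(R, n * n, requires_grad=True)
W = torch.randn(R, n * n, requires_grad=True)
opt = torch.optim.Adam([U, V, W], lr=0.05)

for step in range(5000):
    opt.zero_grad()
    approx = torch.einsum('ri,rj,rk->ijk', U, V, W)   # sum of R rank-one tensors
    loss = ((approx - T) ** 2).sum()                  # reconstruction error
    loss.backward()
    opt.step()

print(f"final reconstruction error: {loss.item():.2e}")
```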
Once a batch of candidate programs is generated, AlphaEvolve must determine which ones are worth keeping. For each evolved program, it:
chooses a set of matrix multiplication targets and run[s] the algorithm, initialized with multiple random seeds using the evaluation cascade described in Section 2.4. The performance is then measured as the best (lowest) rank achieved on each target as well as the fraction of seeds that achieved this rank, providing a signal for AlphaEvolve to hill‑climb.
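In other words, a candidate’s score combines the best rank it reaches and how reliably it reaches it. Here is a hypothetical sketch of that scoring idea (the function names are illustrative only, not AlphaEvolve’s API):

```python
def score_candidate(run_decomposition, target_tensor, seeds):
    """Run a candidate decomposition program over several seeds and score it."""
    ranks = [run_decomposition(target_tensor, seed=s) for s in seeds]
    best = min(ranks)                              # lowest rank achieved on the target
    reliability = ranks.count(best) / len(ranks)   # fraction of seeds reaching that rank
    return best, reliability                       # lower rank (then higher reliability) wins
```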
The creative spark comes from asking the Large Language Model to experiment with its own code.
…introducing several original ideas to design increasingly better algorithms. … seeding the initial program with our own ideas (such as adding stochasticity to the evaluation function or using evolutionary approaches) could further boost performance, highlighting the possibility of scientific collaboration between researchers and AlphaEvolve.
Over time, this loop of generate → evaluate → mutate → select drives the population toward ever‑lower tensor ranks. And after enough iterations, one variant emerges that encodes the multiplication tensor with just 48 rank‑one pieces. As the documentation proudly states:
For 56 years, designing an algorithm with rank less than 49 over any field with characteristic 0 was an open problem. AlphaEvolve is the first method to find a rank‑48 algorithm to multiply two 4 × 4 complex‑valued matrices.
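Putting the pieces together, the overall mechanism can be caricatured as a short evolutionary loop. The sketch below is deliberately simplified and hypothetical: llm_mutate and evaluate stand in for AlphaEvolve’s LLM-driven code edits and its evaluation cascade, and neither is a real API.

```python
import random

def evolve(initial_program, llm_mutate, evaluate, generations=100, population_size=20):
    """Caricature of the generate -> evaluate -> mutate -> select loop."""
    population = [initial_program]
    for _ in range(generations):
        parents = random.choices(population, k=population_size)   # pick programs to edit
        children = [llm_mutate(p) for p in parents]               # LLM proposes code changes
        scored = sorted(population + children, key=evaluate)      # lower score (rank) is better
        population = scored[:population_size]                     # keep the best programs
    return population[0]
```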
Is that all?
Definitely not. The original Google DeepMind blog presents far more than this isolated result of improved matrix multiplication, and I strongly encourage all readers to explore it. As large language models continue to advance in our rapidly evolving landscape, it is vital to appreciate the foundational principles of computer science, mathematics and their interdisciplinary fields, and to recognize how each discipline can be significantly enhanced by state-of-the-art AI advances.
Written by Sayan Ganguly

Final Year Undergrad in Computer Science and Engineering, working on understanding a world that is getting ever more data-driven. I like to write and discuss machine learning, deep learning, large language models, system design and computer science fundamentals.