NN Architectures as Generalized Algorithms

1) Core thesis

  • We can treat neural architectures as a generalization of algorithms.

  • There is a spectrum of algorithmic content:

    • On one end, simple feedforward MLPs carry almost no structure — they define little more than a circuit of fixed depth and width, so the trained parameters must encode nearly everything.

    • In the middle, architectures like CNNs, RNNs, or Transformers embed progressively stronger algorithmic priors (locality, recurrence, routing, invariance).

    • On the far end, an architecture without any learnable weights is simply an algorithm: its behavior is entirely fixed, with no parametric degrees of freedom.

  • This perspective is useful: it lets us carry over reasoning from algorithms and complexity (depth, width, dataflow, adaptivity) to architectural design.


2) Scalars

  • Although implemented as floats, individual scalars carry O(1) task-relevant bits. Treat them as learnable bits, not reals with meaningful infinite precision.

  • Differentiability uses reals as an optimization device, not as a semantic claim about per-scalar information content.


3) Vectors

  • Vectors are the atomic representational units. A single scalar, like a single bit or letter, carries no meaning on its own; a vector can carry a basic value or meaning, much as a word does.

  • Coordinate form is a convenience; other vector encodings are possible. The key point is that composition starts at the vector level.


4) Fully connected layers

  • Fully connected layers operate at the same atomic level as vectors: they map one vector to another (per-atom nonlinear mixing). They are like hard-coded primitive circuits.

  • They do not expose or exploit structure across multiple atoms and do not factorize problems on their own.

  • Misuse: concatenating multiple vectors and feeding the result through a single fully connected layer collapses their structure; the model must relearn the relationships between them from scratch.

  • Proper use: apply fully connected layers per vector/token, embedded inside architectures that handle inter-vector structure (see the sketch below).
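
As a rough illustration, here is a minimal PyTorch sketch (the dimensions and module names are arbitrary choices for this example): the same fully connected layer applied per token reuses its weights at every position and leaves inter-vector structure to the surrounding architecture, whereas flattening the tokens into one long vector forces a single monolithic layer to relearn every positional relationship from its weights alone.

```python
import torch
import torch.nn as nn

# A batch of 8 sequences, each with 16 token vectors of dimension 64.
x = torch.randn(8, 16, 64)

# Proper use: nn.Linear acts on the last dimension, so the same weights are
# applied to every token; inter-token structure is left to the surrounding
# architecture (attention, convolution, recurrence, ...).
per_token = nn.Linear(64, 64)
y = per_token(x)                       # shape (8, 16, 64), weights shared across positions

# Misuse: flatten the tokens into one long vector and mix everything with a
# single fully connected layer; token identity and position are collapsed
# into one large weight matrix that must be learned from scratch.
flat = x.reshape(8, 16 * 64)
monolithic = nn.Linear(16 * 64, 16 * 64)
z = monolithic(flat)                   # shape (8, 1024)

print(sum(p.numel() for p in per_token.parameters()))    # 4,160 shared parameters
print(sum(p.numel() for p in monolithic.parameters()))   # 1,049,600 unshared parameters
```

Under these arbitrary sizes, the per-token layer has about four thousand parameters reused at every position, while the monolithic layer needs roughly a million, none of which are shared across positions.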


5) Composition and architectural bias

  • To compose across vectors, the architecture must provide structure: attention (routing), convolution (locality), recurrence (sequential dependence), message passing (relations), etc.

  • These mechanisms encode inductive bias that enables reuse, factorization, and generalization (a minimal attention sketch follows below).
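
For instance, a bare-bones scaled dot-product attention (a PyTorch sketch, not a production implementation) makes the division of labor explicit: per-vector fully connected layers produce queries, keys, and values, while the attention pattern itself is the architectural mechanism that routes information between vectors.

```python
import torch
import torch.nn as nn

def scaled_dot_product_attention(q, k, v):
    """Routing between vectors: each output is a data-dependent mixture of the
    value vectors, weighted by query-key similarity."""
    d = q.shape[-1]
    scores = q @ k.transpose(-2, -1) / d ** 0.5    # (batch, tokens, tokens)
    weights = torch.softmax(scores, dim=-1)        # who attends to whom
    return weights @ v                             # (batch, tokens, d)

d_model = 64
to_q, to_k, to_v = (nn.Linear(d_model, d_model) for _ in range(3))  # per-vector operators

x = torch.randn(8, 16, d_model)                    # 8 sequences of 16 vectors
out = scaled_dot_product_attention(to_q(x), to_k(x), to_v(x))
print(out.shape)                                   # torch.Size([8, 16, 64])
```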


6) Complexity and dataflow

  • An architecture determines a class of algorithms characterized by:

    • Depth (sequential steps),

    • Width/parallelism,

    • Connectivity/routing,

    • Adaptivity (whether the amount of computation depends on the input; see the sketch after this list).

  • Lower bounds and separations from circuit and communication complexity transfer: some functions require a minimum depth or amount of communication, and architectures lacking the requisite dataflow cannot realize them, regardless of parameter count.
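
To make the adaptivity point concrete, here is a hedged sketch of a weight-tied block iterated a data-dependent number of times (PyTorch; the halting rule, threshold, and sizes are invented for illustration, loosely in the spirit of adaptive-computation schemes). A fixed feedforward stack has no counterpart to the data-dependent loop.

```python
import torch
import torch.nn as nn

class AdaptiveBlock(nn.Module):
    """One weight-tied step, applied repeatedly until a learned halting signal
    fires or a step budget runs out: computation length depends on the input."""
    def __init__(self, d):
        super().__init__()
        self.step = nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, d))
        self.halt = nn.Linear(d, 1)                # scalar halting score per state

    def forward(self, h, max_steps=10, threshold=0.5):
        n_steps = 0
        for _ in range(max_steps):
            h = h + self.step(h)                   # the same weights, reused each step
            n_steps += 1
            # A real scheme would halt per example; this sketch halts on the batch mean.
            if torch.sigmoid(self.halt(h)).mean() > threshold:
                break
        return h, n_steps

block = AdaptiveBlock(d=32)
h, steps = block(torch.randn(4, 32))
print(h.shape, steps)                              # number of steps varies with the input
```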


7) Recursion, iteration, divide-and-conquer

  • Divide-and-conquer requires recursive/iterative dataflow (module reuse, hierarchical composition, input-dependent computation length).

  • Fixed-depth, feedforward architectures (plain MLPs, standard CNNs) cannot implement such strategies in their general form (a weight-shared recursive sketch follows).
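
As a sketch of the missing pattern (PyTorch again; the shared "merge" module and the power-of-two length are assumptions made for brevity), a divide-and-conquer network applies one shared module recursively, so its effective depth grows with the input length (roughly log n) instead of being fixed by the architecture definition.

```python
import torch
import torch.nn as nn

class DivideAndConquer(nn.Module):
    """Recursively summarize a sequence with a single shared 'merge' module.
    The same weights are reused at every level, and the number of levels
    depends on the input length (roughly log2(n))."""
    def __init__(self, d):
        super().__init__()
        self.merge = nn.Sequential(nn.Linear(2 * d, d), nn.ReLU(), nn.Linear(d, d))

    def forward(self, x):                          # x: (n, d), with n a power of two
        while x.shape[0] > 1:
            left, right = x[0::2], x[1::2]         # divide: pair up neighbors
            x = self.merge(torch.cat([left, right], dim=-1))  # conquer: merge each pair
        return x[0]                                # a single summary vector

net = DivideAndConquer(d=16)
print(net(torch.randn(8, 16)).shape)               # 3 recursive levels for length 8
```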


8) Example: CNNs and fractals

  • CNNs provide translation equivariance but no recursion or scale equivariance.

  • They can fit shallow statistics of fractal images but cannot represent or generalize the self-similar generative process (sketched below).

  • This illustrates the general rule: if the architecture’s dataflow omits a pattern (e.g., recursion), algorithms needing that pattern are outside its class.
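
To see what the self-similar generative process looks like, here is a tiny recursive generator in plain Python with NumPy (the particular fractal, a Sierpinski-carpet-style pattern, is only an example): the image at every scale is produced by the same rule calling itself, which is exactly the kind of dataflow a fixed-depth convolutional stack does not contain.

```python
import numpy as np

def sierpinski_carpet(depth):
    """Self-similar image: each level tiles 8 copies of the previous level
    around an empty center, so the same rule is applied at every scale."""
    if depth == 0:
        return np.ones((1, 1))
    tile = sierpinski_carpet(depth - 1)            # recursion: reuse the same rule
    empty = np.zeros_like(tile)
    return np.block([[tile, tile,  tile],
                     [tile, empty, tile],
                     [tile, tile,  tile]])

img = sierpinski_carpet(4)                         # an 81x81 binary image
print(img.shape, img.mean())                       # density (8/9)**4 ≈ 0.62
```

A CNN can match the local statistics of such an image at the scales it was trained on, but nothing in its fixed dataflow corresponds to "apply the same rule again at the next scale".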


Summary

Overarching insight: neural architectures are generalized algorithms whose dataflow bakes in their algorithmic expressivity.

  • Scalars are the bits: minimal, low-information.

  • Vectors are the atomic representations, analogous to bitstrings.

  • Fully connected layers are the atomic operators (per-vector mixers).

  • Architectures are the algorithms: their structure fixes the class of computations they can express and generalize to. Ordinary algorithms appear as the zero-parameter, fixed-control-flow limit of this view.

Crucially: learning requires the right algorithmic expressivity. If the architecture lacks the necessary computational patterns (recursion, iteration, locality, invariance), then the corresponding algorithms lie outside its class and are not learnable in principle. This perspective grounds architectural design in the same principles as algorithm design and complexity theory.
