Enhance Model Accuracy Using Universal Differential Equations


This post was written by Steven Whitaker.
In this Julia for Devs post, we will explore the fascinating world of using universal differential equations (UDEs) for model discovery. A UDE is a differential equation where part (or all) of it is defined by a universal approximator (typically a neural network). These UDEs play a pivotal role in uncovering missing aspects of established models.
UDEs enable seamless integration of known mechanistic models with data-driven modeling approaches. Rather than omit or guess at unknown aspects of a model, a neural network can be embedded into the model, which can then be trained using observed data.
Once trained, the embedded neural network can be approximated by a mathematical expression. This allows us to replace the neural network with an interpretable formula, improving transparency while filling in the missing aspects of the original model with new structure learned from observed data.
This post will illustrate key ideas and design decisions made while helping one of our clients uncover missing dynamics in their model, a complex system representing the human body under treatment of a rare disease.
By replacing just one aspect of our client's model with a neural network, we achieved a significant milestone: a 30% reduction in the error of the model predictions. This success story is a testament to the power of UDEs in enhancing model performance.
We will not share client-specific code, but will provide similar examples to illustrate our approach.
Here are some of the key ideas we will focus on:
- Ensure nonnegativity of state variables (as needed).
- Appropriately initialize neural network weights.
- Normalize neural network inputs.
- Make dynamics modular.
- Fit symbolic expression.
Note that we will discuss fully connected neural networks, so some comments below may not apply if other architectures are used.
And with that, let's dive in!
Ensure Nonnegativity of States
When working with a model representing a real phenomenon, chances are that at least some of the model's state variables are naturally nonnegative. For example, the energy of a system or the concentration of a chemical cannot be negative quantities.
However, a differential equation solver sees only math, not the physical phenomenon the math represents. So, we have to tell the solver to respect the natural constraints that exist.
One way to enforce nonnegativity is to pass the isoutofdomain option to solve, where isoutofdomain is a function that returns true when at least one of the provided state variables violates the nonnegativity constraint. For example, if all state variables should be nonnegative, the function can be defined as:
isoutofdomain(u, p, t) = any(<(0), u)
As another example, if only the first and third states should be nonnegative:
isoutofdomain(u, p, t) = u[1] < 0 || u[3] < 0
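Either version is then passed to solve via the isoutofdomain keyword. Here is a minimal sketch, assuming the dynamics function f!, initial condition u0, time span tspan, and parameters p are defined elsewhere:
using OrdinaryDiffEq

prob = ODEProblem(f!, u0, tspan, p)
# Any step that produces a negative state is rejected and retried with a smaller dt.
sol = solve(prob, Tsit5(); isoutofdomain = isoutofdomain)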
For more information and other tools for tackling nonnegativity, check out the SciML docs.
The idea of telling the solver about the nonnegativity constraints may seem obvious. Still, it has important implications for the initialization of neural networks embedded into the model. Care should be taken to ensure the initial network weights (before training) do not cause states to become negative. See the next section below for more details.
Appropriately Initialize Neural Network Weights
To learn missing dynamics, the neural network in a UDE needs to be trained.
Training the neural network can occur only if the UDE can be solved using the initial network weights. Otherwise, the gradient of the loss, which is used by optimization algorithms to train the neural network, cannot be computed.
As a result, it is necessary to choose a good initialization for the neural network weights. Here are some approaches for initialization and how well they work:
- Random initialization: Typically, neural network weights are initialized randomly. However, this results in random UDE dynamics, so whether a state variable stays positive (or satisfies any other constraints) is left to chance. And if constraints are violated but we try to enforce them via, e.g., isoutofdomain, the solver will be unable to solve the UDE, preventing computation of the loss gradient. So, random initialization might work, but it might not.
- Zero initialization: Don't do this. Network weights won't ever update during training because the gradients with respect to those weights will always be zero.
- Constant initialization: Don't do this, either. Initializing all the weights to the same non-zero value will result in some amount of learning, but it severely limits the expressivity of the network: all neurons within a layer start out identical, receive identical updates, and so never learn distinct features.
- Pre-training: If the neural network in the UDE is used to replace an expression that is believed to be incorrect (but otherwise gives a usable model), one way to initialize the network weights for UDE training is to initialize the weights randomly and then train the network directly (i.e., not the UDE as a whole) to fit the expression the network is to replace. This works because no constraints are involved during pre-training (so random initialization is fine there), and by the time UDE training starts, the network already approximates a reasonable expression, making constraint violations unlikely.
- Smoothed Collocation: Smoothed collocation is another approach for pre-training the neural network without involving constraints. See the SciML docs for more info.
For our client, randomly initialized weights failed to give a usable gradient for training, so we decided on pre-training because they had a reasonable guess that we could fit the initial network weights to.
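To make the pre-training idea concrete, here is a minimal, self-contained sketch (not our client's code): g_guess is a hypothetical stand-in for the expression the network will replace, the input ranges are made up for illustration, and the optimizer settings are arbitrary.
using Lux, Optimisers, Zygote, ComponentArrays, Random

# Hypothetical stand-in for the believed-incorrect expression the network replaces.
g_guess(a, b) = a / (1 + b)

nn = Chain(Dense(2, 5, tanh), Dense(5, 5, tanh), Dense(5, 1))
(ps, st) = Lux.setup(Xoshiro(0), nn)
ps = ComponentArray(ps)

# Representative inputs (hypothetical ranges) and the targets given by `g_guess`.
X = rand(2, 1000)
y = g_guess.(X[1, :], X[2, :])'

# Fit the network directly to `g_guess` before any UDE training.
function pretrain(p, st, X, y; iters = 2000)
    opt_state = Optimisers.setup(Adam(0.01), p)
    for _ in 1:iters
        grad = Zygote.gradient(q -> sum(abs2, nn(X, q, st)[1] .- y), p)[1]
        opt_state, p = Optimisers.update(opt_state, p, grad)
    end
    return p
end

ps_pretrained = pretrain(ps, st, X, y)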
Normalize Neural Network Inputs
One key to making UDE training proceed nicely is to normalize the inputs to the neural network so that all inputs are roughly the same magnitude.
Normalizing is essential because the gradient of the loss with respect to a weight in the network's first layer is proportional to the input that weight multiplies. If one input tends to be 1000 times larger than another, the corresponding gradients will also tend to be that much larger, and that input will dominate the learning.
To illustrate how to normalize, suppose we have the following neural network (created using Lux.jl):
using Lux

nn = Chain(
    Dense(2, 5, tanh),
    Dense(5, 5, tanh),
    Dense(5, 1),
)
Then we can add a function to the Chain that performs the normalization:
nn_normalized = Chain(
    x -> x ./ normalization,
    # Same layers as before:
    Dense(2, 5, tanh),
    Dense(5, 5, tanh),
    Dense(5, 1),
)
In this example, normalization is a Vector containing the values used to scale each input variable. And of course, we're not limited to scaling the inputs; we can implement whatever function we want to process the inputs (such as z-score standardization).
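For instance, a z-score version might look like the following sketch, where input_mean and input_std are assumed to have been computed beforehand from representative input data:
nn_standardized = Chain(
    x -> (x .- input_mean) ./ input_std,
    Dense(2, 5, tanh),
    Dense(5, 5, tanh),
    Dense(5, 1),
)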
The neural network in our client's UDE did not learn without normalization. After normalizing the inputs, we got the neural network to learn meaningful dynamics.
Make Dynamics Modular
While not strictly necessary for UDEs, making the dynamics modular vastly simplifies comparing different approaches.
For our client, we wanted to compare three versions of the dynamics: the original dynamics, the UDE (where part of the original dynamics was replaced with a neural network), and the updated dynamics (where the trained neural network was replaced with a symbolic expression fit to the neural network).
Rather than duplicate the dynamics function three times with each slight variation, we allowed the variable part of the dynamics to be passed in as an option to the dynamics function. To illustrate:
function f!(du, u, p, t; g = original_g)
    # ... some code ...

    # `a` and `b` are values computed above
    # or pulled from `u`, etc.
    val = g(a, b, p)

    # `val` is used to update `du` in some way.
    # ... some code ...
end
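Here, original_g stands for whatever expression the original model uses for this piece of the dynamics; as a purely hypothetical illustration:
original_g(a, b, p) = p.k * a / (1 + b)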
However, ODEProblems don't allow passing options to the dynamics function. But that's not a problem; we can create a closure to set the option before handing the function to an ODEProblem. For example, if we want g to use a Lux.jl neural network:
nn = Chain(...)
(p_nn, st) = Lux.setup(...)
# Be sure to include `p_nn` in the `ODEProblem` parameters `p`.
g_nn = (a, b, p) -> nn([a, b], p.p_nn, st)[1][1]
f_nn! = (du, u, p, t) -> f!(du, u, p, t; g = g_nn)
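As a sketch of how this might be wired together (u0, tspan, and a hypothetical set of other model parameters p_model are assumed to be defined), ComponentArrays.jl is one convenient way to bundle p_nn with the rest of the parameters so that p.p_nn works inside g_nn:
using OrdinaryDiffEq, ComponentArrays

# Bundle the neural network parameters with the other model parameters.
p = ComponentArray(p_model = p_model, p_nn = p_nn)

prob_nn = ODEProblem(f_nn!, u0, tspan, p)
sol_nn = solve(prob_nn, Tsit5())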
Setting up our client's code like this reduced code duplication while allowing easy comparisons across different versions of the dynamics.
Fit Symbolic Expression
One of the key steps to learning interpretable dynamics from a UDE is to fit a symbolic expression to the trained neural network. Doing so allows us to replace the black box neural network with a mathematical expression we can understand.
The first step is to collect the data to use for the fitting. If the neural network takes two inputs, we will need a Matrix of values:
X = [
a1 a2 ... aN;
b1 b2 ... bN;
]
The (ai, bi) pairs should cover the expected range of input values. One way to determine this range is to solve the UDE and save off the values that end up going into the neural network.
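For example, assuming a solved UDE sol is available and (purely for illustration) that the network inputs happen to be the first two states, the saved inputs could be collected like this:
# Collect the network inputs at every saved time point of the solution.
X = reduce(hcat, [u[1:2] for u in sol.u])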
To get the target values, evaluate the neural network with the X we just created:
y = nn(X, p_nn, st)[1] |> vec
(Here we assume the neural network outputs a single number per input pair.)
Now we can figure out a mathematical expression relating X to y. One way to do this is to use SymbolicRegression.jl. After obtaining an Expression object eq (called tree in the README for SymbolicRegression.jl), we can use it in our dynamics function as described in the previous section:
# `p` needs to be an input even though it's not used.
g_eq = (a, b, p) -> eq([a; b;;])[1]
f_eq! = (du, u, p, t) -> f!(du, u, p, t; g = g_eq)
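For reference, obtaining eq from the (X, y) data might look roughly like the following sketch; the operator set, iteration count, and choice of the last Pareto-frontier member are illustrative assumptions, not the settings we used for our client:
using SymbolicRegression

options = Options(binary_operators = [+, -, *, /], unary_operators = [exp])
hall_of_fame = equation_search(X, y; options = options, niterations = 50)

# The Pareto frontier trades off accuracy against expression complexity;
# the last member is the most complex (and typically the most accurate).
dominating = calculate_pareto_frontier(hall_of_fame)
eq = dominating[end].tree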
Using SymbolicRegression.jl to fit the trained UDE neural network, we could provide our client with a precise mathematical expression representing the dynamics that were missing from their original differential equation.
Summary
In this post, we discussed vital aspects of implementing UDEs to learn missing dynamics. We learned some tips for helping UDE training, including appropriately initializing network weights and normalizing network inputs. We also saw the benefits of code modularity and how that can easily allow us to insert a learned mathematical expression into the dynamics. For our client, we were able to use these ideas to train a UDE to achieve 30% less error in the model predictions.
Do you want to leverage UDEs to improve your models' predictive capabilities? Contact us, and we can help you out!
Additional Links
- SciML UDE Tutorial
- UDE tutorial from the official Julia SciML docs.
- SciML Docs about Nonnegativity
- More information about debugging problems with negativity, including a discussion about isoutofdomain.
- Lux.jl
- Package for creating and training neural networks.
- SymbolicRegression.jl
- Package for fitting expressions to data.
- GLCS Modeling & Simulation
- Connect with us for Julia Modeling & Simulation consulting.