Optimize Flight Simulations with Improved Type-Stability


This post was written by Steven Whitaker.
In this Julia for Devs post, we will discuss improving the performance of simulation code written in Julia by eliminating sources of type-instability.
We wrote another post detailing what type-stability is and how type-instabilities can degrade performance. We also showed how SnoopCompile.jl and Cthulhu.jl can be used to pinpoint causes of type-instability.
This post will cover some of the type-instabilities we helped one of our clients overcome.
Our client is a technology innovator. They are building a first-of-its-kind logistics system focused on autonomous electric delivery to reduce traffic and air pollution. Their aim is to provide efficient delivery services for life-saving products to people in both urban and rural areas. Julia is helping them power critical Guidance, Navigation, and Control (GNC) systems.
With this client, we:
- Eliminated slowdown-related failures on the most important simulation scenario.
- Decreased the compilation time of the scenario by 30%.
- Improved the slowest time steps from 300 ms to 10 ms (30x speedup), enabling 2x real-time performance.
We will not share client-specific code, but will provide similar examples to illustrate root-cause issues and suggested resolutions.
Here are the root-cause issues and resolutions we will focus on:
- Help type-inference by unrolling recursion.
- Standardize the output of different branches.
- Avoid loops over `Tuple`s.
- Use SnoopCompile.jl to reveal dynamic dispatches.
- Investigate functions with Cthulhu.jl.
- Avoid dictionaries that map to functions.
Let's dive in!
Help Type-Inference by Unrolling Recursion
One of the more interesting problems we saw: SnoopCompile.jl reported that a part of the client's code was triggering calls to inference, but when we inspected that code with Cthulhu.jl, it looked perfectly type-stable.
This code consisted of a set of functions that recursively called each other, traversing the model tree to grab data from all the submodels.
As it turns out, recursion can pose difficulties for Julia's type-inference. Basically, if type-inference detects recursion but cannot prove it terminates (based only on the types of inputs---remember that type-inference occurs before runtime), inference gives up, resulting in code that runs like it is type-unstable. (See this Discourse post and comments and links therein for more information.)
The solution we implemented was to use a `@generated` function to unroll the recursion at compile time, resulting in a flat implementation that could be correctly inferred.
Here's an example that illustrates the essence of the recursive code:
```julia
# Grab all the data in the entire model tree beginning at `model`.
get_data(model::NamedTuple) = (; data = model.data, submodels = get_submodel_data(model.submodels))

# This generated function is necessary for type-stability.
# It calls `get_data` on each of the fields of `submodels`
# and returns a `NamedTuple` of the results.
# (This is not the generated function implemented in the solution.)
@generated function get_submodel_data(submodels::NamedTuple)
    assignments = map(fieldnames(submodels)) do field
        Expr(:kw, field, :(get_data(submodels.$field)))
    end
    return Expr(:tuple, Expr(:parameters, assignments...))
end
get_submodel_data(::@NamedTuple{}) = (;)
```
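To make this concrete, here is a small, hypothetical model tree you could call `get_data` on (the values are made up for illustration):

```julia
# A two-level model tree; leaf models have no submodels.
model = (;
    data = 1.0,
    submodels = (;
        a = (; data = 2.0, submodels = (;)),
        b = (; data = 3.0, submodels = (;)),
    ),
)

get_data(model)
# (data = 1.0, submodels = (a = (data = 2.0, submodels = NamedTuple()),
#                           b = (data = 3.0, submodels = NamedTuple())))
```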
Note that in this example `get_data` calls `get_submodel_data`, which in turn calls `get_data` on the submodels.
Here's the code for unrolling the recursion:
```julia
function _gen_get_data(T, path)
    subT = _typeof_field(T, :submodels)
    subpath = :($path.submodels)
    return quote
        (;
            data = $path.data,
            submodels = $(_gen_get_submodel_data(subT, subpath)),
        )
    end
end

# This function determines the type of `model.field`
# given just the type of `model` (so we can't just call `typeof(model.field)`).
# This function is necessary because we need to unroll the recursion
# using a generated function, which means we have to work in the type domain
# (because the generated function is generated before runtime).
function _typeof_field(::Type{NamedTuple{names, T}}, field::Symbol) where {names, T}
    i = findfirst(n -> n === field, names)
    return T.parameters[i]
end

function _gen_get_submodel_data(::Type{@NamedTuple{}}, subpath)
    return quote
        (;)
    end
end

function _gen_get_submodel_data(subT, subpath)
    assignments = map(fieldnames(subT)) do field
        T = _typeof_field(subT, field)
        path = :($subpath.$field)
        Expr(:kw, field, _gen_get_data(T, path))
    end
    return Expr(:tuple, Expr(:parameters, assignments...))
end

@generated get_data_generated(model::NamedTuple) = _gen_get_data(model, :model)
```
Unfortunately, this example doesn't reproduce the issue our client had, but it does show how to use a `@generated` function to unroll recursion. Note that there is still recursion: `_gen_get_data` and `_gen_get_submodel_data` call each other. The key, though, is that this recursion happens before inference, which means that when `get_data_generated` is inferred, the recursion has already taken place, resulting in unrolled code without any recursion that might cause inference issues.
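As a sanity check, we can call the expression-building function directly on the type of the hypothetical model from earlier and confirm the result is flat, with no remaining calls to `get_data` or `get_submodel_data`:

```julia
ex = _gen_get_data(typeof(model), :model)  # `model` from the earlier example
# `ex` is roughly equivalent to:
#   (; data = model.data,
#      submodels = (; a = (; data = model.submodels.a.data, submodels = (;)),
#                     b = (; data = model.submodels.b.data, submodels = (;))))

# The generated function agrees with the recursive version:
get_data_generated(model) == get_data(model)  # true
```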
When we implemented this solution for our client, we saw the total memory utilization of their simulation decrease by ~35%. This was enough to allow them to disable garbage collection during the simulation, which sped it up to faster than real-time, a first for this simulation!
Standardize Output of Different Branches
The client had different parts of their model update at different frequencies. As a result, at any particular time step only a subset of all the submodels actually needed to update. Here's an example of what this might look like:
```julia
function get_output(model, t)
    if should_update(model, t)
        out = update_model(model, t)
    else
        out = get_previous_output(model)
    end
    return out
end
```
Unfortunately, `update_model` and `get_previous_output` returned values of different types, resulting in type-instability: the output type of `get_output` depended on the runtime result of `should_update`.
Furthermore, this function was called at every time point on every submodel (and every sub-submodel, etc.), so the type-instability in this function affected the whole simulation.
The issue was that `update_model` typically returned the minimal subset of information actually needed for the specific model, whereas `get_previous_output` was generic and returned a wider set of information. For example, maybe `update_model` would return a `NamedTuple` like `(x = [1, 2], xdot = [0, 0])`, while `get_previous_output` would return something like `(x = [1, 2], xdot = [0, 0], p = nothing, stop_sim = false)`.
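To make the instability visible, here is a minimal reproduction with hypothetical stand-ins for the client's methods:

```julia
struct Model
    period::Int
end

# Stand-ins whose return types differ, mirroring the example above:
should_update(m::Model, t) = t % m.period == 0
update_model(m::Model, t) = (x = [1, 2], xdot = [0, 0])
get_previous_output(m::Model) = (x = [1, 2], xdot = [0, 0], p = nothing, stop_sim = false)

# `@code_warntype get_output(Model(2), 1)` reports the return type as a
# `Union` of the two branches' `NamedTuple` types.
```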
To fix this issue, rather than manually updating the return values of all the methods of `update_model` for all the submodels in the system, we created a function `standardize_output` that took whatever `NamedTuple` was returned by `update_model` and added the missing fields that `get_previous_output` included. Then, the only change needed in `get_output` was to call `standardize_output`:

```julia
out = update_model(model, t) |> standardize_output
```
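Here is a minimal sketch of what such a function could look like, assuming the missing fields and their defaults are known up front (the field names come from the hypothetical example above):

```julia
# Default values for the fields that `update_model` methods may omit.
const DEFAULT_FIELDS = (; p = nothing, stop_sim = false)

# `merge` keeps the values from `out` and adds any missing defaults.
standardize_output(out::NamedTuple) = merge(DEFAULT_FIELDS, out)

standardize_output((x = [1, 2], xdot = [0, 0]))
# (p = nothing, stop_sim = false, x = [1, 2], xdot = [0, 0])
```

As long as `get_previous_output` returns a previously standardized output, both branches of `get_output` then produce a `NamedTuple` of the same type.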
The result of making this change was a 30% decrease in compilation time for their simulation!
Avoid Loops over Tuples
The client stored submodels of a parent model as a `Tuple` or `NamedTuple`. This makes sense for type-stability: each submodel was of a unique type, so storing them in this way preserved the type information when accessing the submodels. In contrast, storing the submodels as a `Vector{Any}` would lose the type information of the submodels.
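You can see the difference in what the container's type records:

```julia
typeof((1, 2.0, "hi"))     # Tuple{Int64, Float64, String}: element types preserved
typeof(Any[1, 2.0, "hi"])  # Vector{Any}: element types erased
```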
However, type-stability problems arise when looping over `Tuple`s of different types of objects. The problem is that the compiler needs to compile code for the body of the loop, but the body of the loop needs to be able to handle all types included in the `Tuple`. As a result, the compiler must resort to dynamic dispatch in the loop body (but see the note on union-splitting further below).
Here's an example of the issue:
```julia
module TupleLoop

function tupleloop(t::Tuple)
    for val in t
        do_something(val)
    end
end

do_something(val::Number) = val + 1
do_something(val::String) = val * "!"
do_something(val::Vector{T}) where {T} = isempty(val) ? zero(T) : val[1]
do_something(val::Dict{String,Int}) = get(val, "hello", 0)

end
```
Using SnoopCompile.jl reveals dynamic dispatches to `do_something`:
```julia
julia> using SnoopCompileCore

julia> tinf = @snoop_inference TupleLoop.tupleloop((1, 2.0, "hi", [10.0], Dict{String,Int}()));

julia> using SnoopCompile

julia> tinf
InferenceTimingNode: 0.019444/0.020361 on Core.Compiler.Timings.ROOT() with 4 direct children

julia> itrigs = inference_triggers(tinf); mtrigs = accumulate_by_source(Method, itrigs)
2-element Vector{SnoopCompile.TaggedTriggers{Method}}:
 eval_user_input(ast, backend::REPL.REPLBackend, mod::Module) @ REPL ~/.julia/juliaup/julia-1.11.5+0.x64.linux.gnu/share/julia/stdlib/v1.11/REPL/src/REPL.jl:247 (1 callees from 1 callers)
 tupleloop(t::Tuple) @ Main.TupleLoop /path/to/TupleLoop.jl:3 (3 callees from 1 callers)
```
Looking at `tupleloop` with Cthulhu.jl:
```julia
julia> using Cthulhu

julia> ascend(mtrigs[2].itrigs[1])
Choose a call for analysis (q to quit):
   do_something(::String)
 >   tupleloop(::Tuple{Int64, Float64, String, Vector{Float64}, Dict{String, Int64}}) at /path/to/TupleLoop.jl

tupleloop(t::Tuple) @ Main.TupleLoop /path/to/TupleLoop.jl:3
3 function tupleloop(t::Tuple{Int64, Float64, String, Vector{Float64}, Dict{String, Int64}}::Tuple)::Core.Const(nothing)
5     for val::Any in t::Tuple{Int64, Float64, String, Vector{Float64}, Dict{String, Int64}}
6         do_something(val::Any)
7     end
9 end
```
And we see the problem! Even though the `Tuple` is inferred, the loop variable `val` is inferred as `Any`, which means that calling `do_something(val)` must be a dynamic dispatch.
Note that in some cases Julia can perform union-splitting automatically to remove the dynamic dispatch caused by this type-instability. In this example, union-splitting occurs when the `Tuple` contains 4 (by default) or fewer unique types. However, it's not a general solution.
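For instance, with `tupleloop` from above:

```julia
# 4 unique element types: within the default union-splitting limit,
# so the loop body compiles to static dispatches.
TupleLoop.tupleloop((1, 2.0, "hi", [10.0]))

# 5 unique element types: beyond the limit, so `do_something(val)`
# becomes a dynamic dispatch.
TupleLoop.tupleloop((1, 2.0, "hi", [10.0], Dict{String,Int}()))
```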
One way to remove the dynamic dispatch without relying on union-splitting is to eliminate the loop:
```julia
do_something(t[1])
do_something(t[2])
⋮
```
But we can quickly see that writing this code is not at all generic; we have to hard-code the number of calls to `do_something`, which means the code will only work with `Tuple`s of a particular length.
Fortunately, there's a way around this issue. We can write a `@generated` function to have the compiler unroll the loop for us in a generic way:
```julia
@generated function tupleloop_generated(t::Tuple)
    # Integer indexing works for `Tuple`s and `NamedTuple`s alike.
    body = [:(do_something(t[$i])) for i in 1:fieldcount(t)]
    return quote
        $(body...)
        return nothing
    end
end
```
(Note that this code would also work if we specified `t::NamedTuple` in the method signature, since `NamedTuple`s also support integer indexing.)
Due to the way `@generated` functions work, SnoopCompile.jl still detects dynamic dispatches, but note that `tupleloop_generated` does not have any dynamic dispatches reported:
```julia
julia> using SnoopCompileCore

julia> tinf = @snoop_inference TupleLoop.tupleloop_generated((1, 2.0, "hi", [10.0], Dict{String,Int}()));

julia> using SnoopCompile

julia> tinf
InferenceTimingNode: 0.022208/0.050369 on Core.Compiler.Timings.ROOT() with 5 direct children

julia> itrigs = inference_triggers(tinf); mtrigs = accumulate_by_source(Method, itrigs)
3-element Vector{SnoopCompile.TaggedTriggers{Method}}:
 (g::Core.GeneratedFunctionStub)(world::UInt64, source::LineNumberNode, args...) @ Core boot.jl:705 (1 callees from 1 callers)
 eval_user_input(ast, backend::REPL.REPLBackend, mod::Module) @ REPL ~/.julia/juliaup/julia-1.11.5+0.x64.linux.gnu/share/julia/stdlib/v1.11/REPL/src/REPL.jl:247 (1 callees from 1 callers)
 var"#s1#1"(::Any, t) @ Main.TupleLoop none:0 (3 callees from 1 callers)
```
And we can verify with Cthulhu.jl that there are no more dynamic dispatches in `tupleloop_generated`:
```julia
julia> using Cthulhu

julia> ascend(mtrigs[2].itrigs[1])
Choose a call for analysis (q to quit):
 > tupleloop_generated(::Tuple{Int64, Float64, String, Vector{Float64}, Dict{String, Int64}})
     eval at ./boot.jl:430 => eval_user_input(::Any, ::REPL.REPLBackend, ::Module) at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-1

tupleloop_generated(t::Tuple) @ Main.TupleLoop /path/to/TupleLoop.jl:11
Variables
  #self#::Core.Const(Main.TupleLoop.tupleloop_generated)
  t::Tuple{Int64, Float64, String, Vector{Float64}, Dict{String, Int64}}

Body::Core.Const(nothing)
    @ /path/to/TupleLoop.jl:11 within `tupleloop_generated`
   ┌ @ /path/to/TupleLoop.jl:15 within `macro expansion`
1 ─│ %1  = Main.TupleLoop.do_something::Core.Const(Main.TupleLoop.do_something)
│  │ %2  = Base.getindex(t, 1)::Int64
│  │       (%1)(%2)
│  │ %4  = Main.TupleLoop.do_something::Core.Const(Main.TupleLoop.do_something)
│  │ %5  = Base.getindex(t, 2)::Float64
│  │       (%4)(%5)
│  │ %7  = Main.TupleLoop.do_something::Core.Const(Main.TupleLoop.do_something)
│  │ %8  = Base.getindex(t, 3)::String
│  │       (%7)(%8)
│  │ %10 = Main.TupleLoop.do_something::Core.Const(Main.TupleLoop.do_something)
│  │ %11 = Base.getindex(t, 4)::Vector{Float64}
│  │       (%10)(%11)
│  │ %13 = Main.TupleLoop.do_something::Core.Const(Main.TupleLoop.do_something)
│  │ %14 = Base.getindex(t, 5)::Dict{String, Int64}
│  │       (%13)(%14)
│  │ @ /path/to/TupleLoop.jl:16 within `macro expansion`
└──│       return Main.TupleLoop.nothing
   └
```
Here we have to examine the so-called "Typed" code (since the source code was generated via metaprogramming), but we see that there is no loop in this code. As a result, each call to `do_something` is a static dispatch with a concretely inferred input. Hooray!
Avoid Dictionaries that Map to Functions
The client registered functions for updating their simulation visualization via a dictionary that mapped from a `String` key to the appropriate update function.
Sometimes it can be convenient to have a dictionary of functions, for example:
```julia
using LinearAlgebra: norm

d = Dict{String, Function}(
    "sum" => sum,
    "norm" => norm,
    # etc.
)

x = [1.0, 2.0, 3.0]
d["sum"](x)  # Compute the sum of the elements of `x`
d["norm"](x) # Compute the norm of `x`
```
This allows you to write generic code that can call the appropriate intermediate function based on a key supplied by the caller.
You could use multiple dispatch to achieve similar results, but it requires a bit more thought to organize the code in such a way that ensures the caller has access to the types to dispatch on.
As another alternative, you could also have the caller just pass in the function to call. But again, it takes a bit more effort to organize the code to make it work.
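For illustration, here is roughly what those two alternatives could look like (a sketch; the names `apply`, `SumKey`, and `NormKey` are hypothetical):

```julia
using LinearAlgebra: norm

# Alternative 1: dispatch on singleton key types instead of `String` keys.
struct SumKey end
struct NormKey end
apply(::SumKey, x) = sum(x)
apply(::NormKey, x) = norm(x)

apply(SumKey(), [1.0, 2.0, 3.0])  # 6.0

# Alternative 2: have the caller pass the function itself.
apply(f, x) = f(x)

apply(norm, [1.0, 2.0, 3.0])  # ≈ 3.742
```

Both versions are type-stable when the key type or function is known concretely at the call site.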
Unfortunately, using a dictionary in this way is type-unstable: Julia can't figure out what function will be called until runtime, when the precise dictionary key is known. And since the function is unknown, the type of the result of the function is also unknown.
One partial solution is to use type annotations:
```julia
d[func_key](x)::Float64
```
Then at least the output of the function can be used in a type-stable way. However, this only works if all the functions in the dictionary return values of the same type given the same input.
A slightly less stringent alternative is to explicitly convert the result to a common type, but this requires conversion to be possible.
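For example, reusing the dictionary `d` and names from above (a sketch):

```julia
result = convert(Float64, d[func_key](x))  # type-stable, as long as the result is convertible
```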
Our client updated a dictionary using the output of the registered function, so the full solution we implemented for our client was to remove the dictionary and instead have explicit branches in the code. That is, instead of
```julia
updates[key] = d[key](updates[key])
```
we had
```julia
if key == "k1"
    updates[key] = f1(updates[key]::OUTPUT_TYPE_F1)
elseif key == "k2"
    updates[key] = f2(updates[key]::OUTPUT_TYPE_F2)
# Additional branches as needed
end
```
Note that we needed the type annotations `OUTPUT_TYPE_F1` and `OUTPUT_TYPE_F2` because `updates` had an abstractly typed value type. The key that makes this solution work is recognizing that in the first branch `updates[key]` is the output of `f1` from the previous time step in the simulation (and similarly for the other branches). Therefore, in each branch we know what the type of `updates[key]` is, so we can give the compiler that type information.
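For reference, `updates` here was something like a dictionary with an abstract value type (hypothetical values):

```julia
# The value type is `Any`, so `updates[key]` alone is inferred as `Any`;
# the per-branch annotations restore concrete type information.
updates = Dict{String, Any}("k1" => [0.0, 0.0], "k2" => 0.0)
```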
Also note that the previously mentioned ideas of using multiple dispatch or just passing in the functions to use don't work in this situation without removing the `updates` dictionary (and refactoring the affected code).
Making the above change completely removed type-instabilities in that part of the client's code.
Summary
In this post, we explored a few problems related to type-stability that we helped our client resolve. We were able to diagnose issues using SnoopCompile.jl and Cthulhu.jl and make code improvements that enabled our client's most important simulation scenario to pass tests for the first time. This was possible because our solutions enabled the scenario to run faster than real-time and reduced compilation time by 30%.
Do you have type-instabilities that plague your Julia code? Contact us, and we can help you out!
Additional Links
- SnoopCompile.jl Docs: Documentation for SnoopCompile.jl.
- Cthulhu.jl Docs: Documentation (the package's README) for Cthulhu.jl.
- Julia Performance Tips: Very good tips for improving the performance of Julia code.
- GLCS Software Development: Connect with us for Julia development help.
- Upcoming JuliaCon Talk Announcement: Check out our JuliaCon 2025 talk announcement!
The cover image background of a person at the start of a race was found at freepik.com.
The cartoon about looping over `Tuple`s was generated with AI.
Written by Great Lakes Consulting
Modeling & Simulation, HPC, and Enterprise Software all under one roof. Great Lakes Consulting Services, Inc., a premier Information Technology Consulting company, serving others in IT staffing, analytic consulting, business intelligence and application development since 2009. We now specialize in custom Julia software services as the trusted partner to JuliaHub for their Consulting Services. Since 2015, we’ve partnered together to develop high-performance Julia code for low-latency data visualization and analytic solutions, high performance financial modeling, Modeling and Simulation for multiple sciences, personal Julia training, and legacy code migration & evolution.