Optimize Flight Simulations with Improved Type-Stability


This post was written by Steven Whitaker.
In this Julia for Devs post, we will discuss improving the performance of simulation code written in Julia by eliminating sources of type-instability.
We wrote another post detailing what type-stability is and how type-instabilities can degrade performance. We also showed how SnoopCompile.jl and Cthulhu.jl can be used to pinpoint causes of type-instability.
This post will cover some of the type-instabilities we helped one of our clients overcome.
Our client is a technology innovator. They are building a first-of-its-kind logistics system focused on autonomous electric delivery to reduce traffic and air pollution. Their aim is to provide efficient delivery services for life-saving products to people in both urban and rural areas. Julia is helping them power critical Guidance, Navigation, and Control (GNC) systems.
With this client, we:
- Eliminated slowdown-related failures on the most important simulation scenario.
- Decreased the compilation time of the scenario by 30%.
- Improved the slowest time steps from 300 ms to 10 ms (30x speedup), enabling 2x real-time performance.
We will not share client-specific code, but will provide similar examples to illustrate root-cause issues and suggested resolutions.
Here are the root-cause issues and resolutions we will focus on:
- Help type-inference by unrolling recursion.
- Standardize the output of different branches.
- Avoid loops over `Tuple`s.
- Use SnoopCompile.jl to reveal dynamic dispatches.
- Investigate functions with Cthulhu.jl.
- Avoid dictionaries that map to functions.
Let's dive in!
Help Type-Inference by Unrolling Recursion
One of the more interesting problems we saw: SnoopCompile.jl reported that a part of the client's code was triggering calls to inference, but when we inspected that code with Cthulhu.jl, it looked perfectly type-stable.
This code consisted of a set of functions that recursively called each other, traversing the model tree to grab data from all the submodels.
As it turns out, recursion can pose difficulties for Julia's type-inference. Basically, if type-inference detects recursion but cannot prove it terminates (based only on the types of inputs---remember that type-inference occurs before runtime), inference gives up, resulting in code that runs like it is type-unstable. (See this Discourse post and comments and links therein for more information.)
The solution we implemented was to use a `@generated` function to unroll the recursion at compile time, resulting in a flat implementation that could be correctly inferred.
Here's an example that illustrates the essence of the recursive code:
```julia
# Grab all the data in the entire model tree beginning at `model`.
get_data(model::NamedTuple) = (; data = model.data, submodels = get_submodel_data(model.submodels))

# This generated function is necessary for type-stability.
# It calls `get_data` on each of the fields of `submodels`
# and returns a `NamedTuple` of the results.
# (This is not the generated function implemented in the solution.)
@generated function get_submodel_data(submodels::NamedTuple)
    assignments = map(fieldnames(submodels)) do field
        Expr(:kw, field, :(get_data(submodels.$field)))
    end
    return Expr(:tuple, Expr(:parameters, assignments...))
end
get_submodel_data(::@NamedTuple{}) = (;)
```
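To make this concrete, here is a small, hypothetical model tree you could call `get_data` on (the values are made up for illustration):

```julia
# A two-level model tree; leaf models have no submodels.
model = (;
    data = 1.0,
    submodels = (;
        a = (; data = 2.0, submodels = (;)),
        b = (; data = 3.0, submodels = (;)),
    ),
)

get_data(model)
# (data = 1.0, submodels = (a = (data = 2.0, submodels = NamedTuple()),
#                           b = (data = 3.0, submodels = NamedTuple())))
```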
Note that in this example `get_data` calls `get_submodel_data`, which in turn calls `get_data` on the submodels.
Here's the code for unrolling the recursion:
```julia
function _gen_get_data(T, path)
    subT = _typeof_field(T, :submodels)
    subpath = :($path.submodels)
    return quote
        (;
            data = $path.data,
            submodels = $(_gen_get_submodel_data(subT, subpath)),
        )
    end
end

# This function determines the type of `model.field`
# given just the type of `model` (so we can't just call `typeof(model.field)`).
# This function is necessary because we need to unroll the recursion
# using a generated function, which means we have to work in the type domain
# (because the generated function is generated before runtime).
function _typeof_field(::Type{NamedTuple{names, T}}, field::Symbol) where {names, T}
    i = findfirst(n -> n === field, names)
    return T.parameters[i]
end

function _gen_get_submodel_data(::Type{@NamedTuple{}}, subpath)
    return quote
        (;)
    end
end

function _gen_get_submodel_data(subT, subpath)
    assignments = map(fieldnames(subT)) do field
        T = _typeof_field(subT, field)
        path = :($subpath.$field)
        Expr(:kw, field, _gen_get_data(T, path))
    end
    return Expr(:tuple, Expr(:parameters, assignments...))
end

@generated get_data_generated(model::NamedTuple) = _gen_get_data(model, :model)
```
Unfortunately, this example doesn't reproduce the issue our client had, but it does show how to use a `@generated` function to unroll recursion. Note that there is still recursion: `_gen_get_data` and `_gen_get_submodel_data` call each other. The key, though, is that this recursion happens before inference, which means that when `get_data_generated` is inferred, the recursion has already taken place, resulting in unrolled code without any recursion that might cause inference issues.
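As a sanity check, we can call the expression-building function directly on the type of the hypothetical model from earlier and confirm the result is flat, with no remaining calls to `get_data` or `get_submodel_data`:

```julia
ex = _gen_get_data(typeof(model), :model)  # `model` from the earlier example
# `ex` is roughly equivalent to:
#   (; data = model.data,
#      submodels = (; a = (; data = model.submodels.a.data, submodels = (;)),
#                     b = (; data = model.submodels.b.data, submodels = (;))))

# The generated function agrees with the recursive version:
get_data_generated(model) == get_data(model)  # true
```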
When we implemented this solution for our client, we saw the total memory utilization of their simulation decrease by ~35%. This was enough to allow them to disable garbage collection during the simulation, which sped it up to faster than real-time, a first for this simulation!
Standardize Output of Different Branches
The client had different parts of their model update at different frequencies. As a result, at any particular time step only a subset of all the submodels actually needed to update. Here's an example of what this might look like:
```julia
function get_output(model, t)
    if should_update(model, t)
        out = update_model(model, t)
    else
        out = get_previous_output(model)
    end
    return out
end
```
Unfortunately, `update_model` and `get_previous_output` returned values of different types, resulting in type-instability: the output type of `get_output` depended on the runtime result of `should_update`.
Furthermore, this function was called at every time point on every submodel (and every sub-submodel, etc.), so the type-instability in this function affected the whole simulation.
The issue was that `update_model` typically returned the minimal subset of information actually needed for the specific model, whereas `get_previous_output` was generic and returned a wider set of information. For example, maybe `update_model` would return a `NamedTuple` like `(x = [1, 2], xdot = [0, 0])`, while `get_previous_output` would return something like `(x = [1, 2], xdot = [0, 0], p = nothing, stop_sim = false)`.
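To make the instability visible, here is a minimal reproduction with hypothetical stand-ins for the client's methods:

```julia
struct Model
    period::Int
end

# Stand-ins whose return types differ, mirroring the example above:
should_update(m::Model, t) = t % m.period == 0
update_model(m::Model, t) = (x = [1, 2], xdot = [0, 0])
get_previous_output(m::Model) = (x = [1, 2], xdot = [0, 0], p = nothing, stop_sim = false)

# `@code_warntype get_output(Model(2), 1)` reports the return type as a
# `Union` of the two branches' `NamedTuple` types.
```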
To fix this issue, rather than manually updating the return values of all the methods of `update_model` for all the submodels in the system, we created a function `standardize_output` that took whatever `NamedTuple` was returned by `update_model` and added the missing fields that `get_previous_output` included. Then, the only change needed in `get_output` was to call `standardize_output`:

```julia
out = update_model(model, t) |> standardize_output
```
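Here is a minimal sketch of what such a function could look like, assuming the missing fields and their defaults are known up front (the field names come from the hypothetical example above):

```julia
# Default values for the fields that `update_model` methods may omit.
const DEFAULT_FIELDS = (; p = nothing, stop_sim = false)

# `merge` keeps the values from `out` and adds any missing defaults.
standardize_output(out::NamedTuple) = merge(DEFAULT_FIELDS, out)

standardize_output((x = [1, 2], xdot = [0, 0]))
# (p = nothing, stop_sim = false, x = [1, 2], xdot = [0, 0])
```

As long as `get_previous_output` returns a previously standardized output, both branches of `get_output` then produce a `NamedTuple` of the same type.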
The result of making this change was a 30% decrease in compilation time for their simulation!
Avoid Loops over Tuples
The client stored submodels of a parent model as a `Tuple` or `NamedTuple`. This makes sense for type-stability: each submodel was of a unique type, so storing them in this way preserved the type information when accessing the submodels. In contrast, storing the submodels as a `Vector{Any}` would lose the type information of the submodels.
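You can see the difference in what the container's type records:

```julia
typeof((1, 2.0, "hi"))     # Tuple{Int64, Float64, String}: element types preserved
typeof(Any[1, 2.0, "hi"])  # Vector{Any}: element types erased
```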
However, type-stability problems arise when looping over `Tuple`s of different types of objects. The problem is that the compiler needs to compile code for the body of the loop, but the body of the loop needs to be able to handle all types included in the `Tuple`. As a result, the compiler must resort to dynamic dispatch in the loop body (but see the note on union-splitting further below).
Here's an example of the issue:
```julia
module TupleLoop

function tupleloop(t::Tuple)
    for val in t
        do_something(val)
    end
end

do_something(val::Number) = val + 1
do_something(val::String) = val * "!"
do_something(val::Vector{T}) where {T} = isempty(val) ? zero(T) : val[1]
do_something(val::Dict{String,Int}) = get(val, "hello", 0)

end
```
Using SnoopCompile.jl reveals dynamic dispatches to `do_something`:
```julia
julia> using SnoopCompileCore

julia> tinf = @snoop_inference TupleLoop.tupleloop((1, 2.0, "hi", [10.0], Dict{String,Int}()));

julia> using SnoopCompile

julia> tinf
InferenceTimingNode: 0.019444/0.020361 on Core.Compiler.Timings.ROOT() with 4 direct children

julia> itrigs = inference_triggers(tinf); mtrigs = accumulate_by_source(Method, itrigs)
2-element Vector{SnoopCompile.TaggedTriggers{Method}}:
 eval_user_input(ast, backend::REPL.REPLBackend, mod::Module) @ REPL ~/.julia/juliaup/julia-1.11.5+0.x64.linux.gnu/share/julia/stdlib/v1.11/REPL/src/REPL.jl:247 (1 callees from 1 callers)
 tupleloop(t::Tuple) @ Main.TupleLoop /path/to/TupleLoop.jl:3 (3 callees from 1 callers)
```
Looking at `tupleloop` with Cthulhu.jl:
```julia
julia> using Cthulhu

julia> ascend(mtrigs[2].itrigs[1])
Choose a call for analysis (q to quit):
   do_something(::String)
 >   tupleloop(::Tuple{Int64, Float64, String, Vector{Float64}, Dict{String, Int64}}) at /path/to/TupleLoop.jl

tupleloop(t::Tuple) @ Main.TupleLoop /path/to/TupleLoop.jl:3
3 function tupleloop(t::Tuple{Int64, Float64, String, Vector{Float64}, Dict{String, Int64}}::Tuple)::Core.Const(nothing)
5     for val::Any in t::Tuple{Int64, Float64, String, Vector{Float64}, Dict{String, Int64}}
6         do_something(val::Any)
7     end
9 end
```
And we see the problem! Even though the `Tuple` is inferred, the loop variable `val` is inferred as `Any`, which means that calling `do_something(val)` must be a dynamic dispatch.
Note that in some cases Julia can perform union-splitting automatically to remove the dynamic dispatch caused by this type-instability. In this example, union-splitting occurs when the `Tuple` contains 4 (by default) or fewer unique types. However, it's not a general solution.
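For instance, with `tupleloop` from above:

```julia
# 4 unique element types: within the default union-splitting limit,
# so the loop body compiles to static dispatches.
TupleLoop.tupleloop((1, 2.0, "hi", [10.0]))

# 5 unique element types: beyond the limit, so `do_something(val)`
# becomes a dynamic dispatch.
TupleLoop.tupleloop((1, 2.0, "hi", [10.0], Dict{String,Int}()))
```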
One way to remove the dynamic dispatch without relying on union-splitting is to eliminate the loop:
```julia
do_something(t[1])
do_something(t[2])
⋮
```
But we can quickly see that writing this code is not at all generic; we have to hard-code the number of calls to `do_something`, which means the code will only work with `Tuple`s of a particular length.
Fortunately, there's a way around this issue. We can write a `@generated` function to have the compiler unroll the loop for us in a generic way:
```julia
@generated function tupleloop_generated(t::Tuple)
    # Integer indexing works for `Tuple`s and `NamedTuple`s alike.
    body = [:(do_something(t[$i])) for i in 1:fieldcount(t)]
    return quote
        $(body...)
        return nothing
    end
end
```
(Note that this code would also work if we specified `t::NamedTuple` in the method signature, since `NamedTuple`s also support integer indexing.)
Due to the way `@generated` functions work, SnoopCompile.jl still detects dynamic dispatches, but note that `tupleloop_generated` does not have any dynamic dispatches reported:
```julia
julia> using SnoopCompileCore

julia> tinf = @snoop_inference TupleLoop.tupleloop_generated((1, 2.0, "hi", [10.0], Dict{String,Int}()));

julia> using SnoopCompile

julia> tinf
InferenceTimingNode: 0.022208/0.050369 on Core.Compiler.Timings.ROOT() with 5 direct children

julia> itrigs = inference_triggers(tinf); mtrigs = accumulate_by_source(Method, itrigs)
3-element Vector{SnoopCompile.TaggedTriggers{Method}}:
 (g::Core.GeneratedFunctionStub)(world::UInt64, source::LineNumberNode, args...) @ Core boot.jl:705 (1 callees from 1 callers)
 eval_user_input(ast, backend::REPL.REPLBackend, mod::Module) @ REPL ~/.julia/juliaup/julia-1.11.5+0.x64.linux.gnu/share/julia/stdlib/v1.11/REPL/src/REPL.jl:247 (1 callees from 1 callers)
 var"#s1#1"(::Any, t) @ Main.TupleLoop none:0 (3 callees from 1 callers)
```
And we can verify with Cthulhu.jl that there are no more dynamic dispatches in `tupleloop_generated`:
```julia
julia> using Cthulhu

julia> ascend(mtrigs[2].itrigs[1])
Choose a call for analysis (q to quit):
 > tupleloop_generated(::Tuple{Int64, Float64, String, Vector{Float64}, Dict{String, Int64}})
     eval at ./boot.jl:430 => eval_user_input(::Any, ::REPL.REPLBackend, ::Module) at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-1

tupleloop_generated(t::Tuple) @ Main.TupleLoop /path/to/TupleLoop.jl:11
Variables
  #self#::Core.Const(Main.TupleLoop.tupleloop_generated)
  t::Tuple{Int64, Float64, String, Vector{Float64}, Dict{String, Int64}}

Body::Core.Const(nothing)
    @ /path/to/TupleLoop.jl:11 within `tupleloop_generated`
   ┌ @ /path/to/TupleLoop.jl:15 within `macro expansion`
1 ─│ %1  = Main.TupleLoop.do_something::Core.Const(Main.TupleLoop.do_something)
│  │ %2  = Base.getindex(t, 1)::Int64
│  │       (%1)(%2)
│  │ %4  = Main.TupleLoop.do_something::Core.Const(Main.TupleLoop.do_something)
│  │ %5  = Base.getindex(t, 2)::Float64
│  │       (%4)(%5)
│  │ %7  = Main.TupleLoop.do_something::Core.Const(Main.TupleLoop.do_something)
│  │ %8  = Base.getindex(t, 3)::String
│  │       (%7)(%8)
│  │ %10 = Main.TupleLoop.do_something::Core.Const(Main.TupleLoop.do_something)
│  │ %11 = Base.getindex(t, 4)::Vector{Float64}
│  │       (%10)(%11)
│  │ %13 = Main.TupleLoop.do_something::Core.Const(Main.TupleLoop.do_something)
│  │ %14 = Base.getindex(t, 5)::Dict{String, Int64}
│  │       (%13)(%14)
│  │ @ /path/to/TupleLoop.jl:16 within `macro expansion`
└──│       return Main.TupleLoop.nothing
   └
```
Here we have to examine the so-called "Typed" code (since the source code was generated via metaprogramming), but we see that there is no loop in this code. As a result, each call to `do_something` is a static dispatch with a concretely inferred input. Hooray!
Avoid Dictionaries that Map to Functions
The client registered functions for updating their simulation visualization via a dictionary that mapped from a `String` key to the appropriate update function.
Sometimes it can be convenient to have a dictionary of functions, for example:
```julia
using LinearAlgebra: norm

d = Dict{String, Function}(
    "sum" => sum,
    "norm" => norm,
    # etc.
)

x = [1.0, 2.0, 3.0]
d["sum"](x)  # Compute the sum of the elements of `x`
d["norm"](x) # Compute the norm of `x`
```
This allows you to write generic code that can call the appropriate intermediate function based on a key supplied by the caller.
You could use multiple dispatch to achieve similar results, but it requires a bit more thought to organize the code in such a way that ensures the caller has access to the types to dispatch on.
As another alternative, you could also have the caller just pass in the function to call. But again, it takes a bit more effort to organize the code to make it work.
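For illustration, here is roughly what those two alternatives could look like (a sketch; the names `apply`, `SumKey`, and `NormKey` are hypothetical):

```julia
using LinearAlgebra: norm

# Alternative 1: dispatch on singleton key types instead of `String` keys.
struct SumKey end
struct NormKey end
apply(::SumKey, x) = sum(x)
apply(::NormKey, x) = norm(x)

apply(SumKey(), [1.0, 2.0, 3.0])  # 6.0

# Alternative 2: have the caller pass the function itself.
apply(f, x) = f(x)

apply(norm, [1.0, 2.0, 3.0])  # ≈ 3.742
```

Both versions are type-stable when the key type or function is known concretely at the call site.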
Unfortunately, using a dictionary in this way is type-unstable: Julia can't figure out what function will be called until runtime, when the precise dictionary key is known. And since the function is unknown, the type of the result of the function is also unknown.
One partial solution is to use type annotations:
```julia
d[func_key](x)::Float64
```
Then at least the output of the function can be used in a type-stable way. However, this only works if all the functions in the dictionary return values of the same type given the same input.
A slightly less stringent alternative is to explicitly convert the result to a common type, but this requires conversion to be possible.
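For example, reusing the dictionary `d` and names from above (a sketch):

```julia
result = convert(Float64, d[func_key](x))  # type-stable, as long as the result is convertible
```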
Our client updated a dictionary using the output of the registered function, so the full solution we implemented for our client was to remove the dictionary and instead have explicit branches in the code. That is, instead of
```julia
updates[key] = d[key](updates[key])
```
we had
```julia
if key == "k1"
    updates[key] = f1(updates[key]::OUTPUT_TYPE_F1)
elseif key == "k2"
    updates[key] = f2(updates[key]::OUTPUT_TYPE_F2)
# Additional branches as needed
end
```
Note that we needed the type annotations `OUTPUT_TYPE_F1` and `OUTPUT_TYPE_F2` because `updates` had an abstractly typed value type. The key that makes this solution work is recognizing that in the first branch `updates[key]` is the output of `f1` from the previous time step in the simulation (and similarly for the other branches). Therefore, in each branch we know what the type of `updates[key]` is, so we can give the compiler that type information.
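For reference, `updates` here was something like a dictionary with an abstract value type (hypothetical values):

```julia
# The value type is `Any`, so `updates[key]` alone is inferred as `Any`;
# the per-branch annotations restore concrete type information.
updates = Dict{String, Any}("k1" => [0.0, 0.0], "k2" => 0.0)
```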
Also note that the previously mentioned ideas of using multiple dispatch or just passing in the functions to use don't work in this situation without removing the `updates` dictionary (and refactoring the affected code).
Making the above change completely removed type-instabilities in that part of the client's code.
Summary
In this post, we explored a few problems related to type-stability that we helped our client resolve. We were able to diagnose issues using SnoopCompile.jl and Cthulhu.jl and make code improvements that enabled our client's most important simulation scenario to pass tests for the first time. This was possible because our solutions enabled the scenario to run faster than real-time and reduced compilation time by 30%.
Do you have type-instabilities that plague your Julia code? Contact us, and we can help you out!
Additional Links
- SnoopCompile.jl Docs: Documentation for SnoopCompile.jl.
- Cthulhu.jl Docs: Documentation (the package's README) for Cthulhu.jl.
- Julia Performance Tips: Very good tips for improving the performance of Julia code.
- GLCS Software Development: Connect with us for Julia development help.
- Upcoming JuliaCon Talk Announcement: Check out our JuliaCon 2025 talk announcement!
The cover image background of a person at the start of a race was found at freepik.com.
The cartoon about looping over `Tuple`s was generated with AI.
Written by Great Lakes Consulting
Modeling & Simulation, HPC, and Enterprise Software all under one roof. Great Lakes Consulting Services, Inc., a premier Information Technology Consulting company, serving others in IT staffing, analytic consulting, business intelligence and application development since 2009. We now specialize in custom Julia software services as the trusted partner to JuliaHub for their Consulting Services. Since 2015, we’ve partnered together to develop high-performance Julia code for low-latency data visualization and analytic solutions, high performance financial modeling, Modeling and Simulation for multiple sciences, personal Julia training, and legacy code migration & evolution.