Assembly vs Modern Languages

TJ Gokken
16 min read

I recently watched a short video by Carly Taylor on gamer mindset. In that one-minute video, she talks about Chris Sawyer and how he wrote the awesome game RollerCoaster Tycoon entirely in assembly (all by himself) because the computing power at the time was not sufficient for what he wanted to build.

💡
For those of you who do not know what assembly language is: it is the language closest to what computers actually understand (well, there is one closer, more on that later). You see, when you write code in your favorite language, the computer does not understand it directly. It all happens in the background, but you still need a translator, so to speak, for the computer to understand what you are talking about. It is kind of like the LLMs - we see words and chats but all they see is mathematics and vector databases. Kind of.

This video got me thinking about assembly language, and I wondered how it stacks up against modern languages. With all our modern conveniences (garbage collection, memory safety, zero-cost abstractions), how much performance are we really giving up?

Now, it’s been a very long time since I did anything in assembly and when I did what I did, it was mainly for learning purposes. I had no idea where to begin or what to use but I was excited to go on this journey.

My goal was to develop the same type of program in Assembly, Rust, C# and Python. I hoped that I could use Visual Studio Code for all of them (and it turned out that I could).

I was curious to find out how this granddaddy of all programming languages compared.

The Benchmark App

As I mentioned before, I was looking at 4 very different languages:

  • Assembly - The bare metal baseline

  • Rust - Modern systems programming with safety

  • C# - Managed runtime with JIT optimization

  • Python - High-level interpreted convenience

With that in mind, I created seven CPU-intensive tests that would really show the differences:

  1. Counting Loop - Pure iteration (100M cycles)

  2. Array Sum - Memory access patterns (1M integers)

  3. String Building - Memory allocation stress test

  4. Prime Calculation - Sieve of Eratosthenes algorithm

  5. Matrix Multiplication - Heavy arithmetic (500x500 matrices)

  6. Bit Operations - Population counting and bit manipulation

  7. Fibonacci Sequence - Iterative mathematical computation

Each test was implemented as faithfully as I could across all languages, using each language's strengths while maintaining algorithmic equivalence.
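To give a feel for what one of these tests looks like, here is a minimal sketch of the Sieve of Eratosthenes with a simple timer around it. This is not the code from the benchmark repository; the names `sieve_of_eratosthenes` and `time_ms` and the limit of 1,000,000 are assumptions made for illustration.

```python
import time

def sieve_of_eratosthenes(limit):
    """Return the list of primes strictly below `limit`."""
    if limit < 2:
        return []
    is_prime = bytearray([1]) * limit   # one flag per number, 1 = "maybe prime"
    is_prime[0] = is_prime[1] = 0
    p = 2
    while p * p < limit:
        if is_prime[p]:
            # Cross out every multiple of p, starting at p*p.
            for multiple in range(p * p, limit, p):
                is_prime[multiple] = 0
        p += 1
    return [n for n in range(limit) if is_prime[n]]

def time_ms(func, *args):
    """Run func once and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = func(*args)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms

primes, elapsed = time_ms(sieve_of_eratosthenes, 1_000_000)
print(f"{len(primes)} primes below 1,000,000 in {elapsed:.2f} ms")
```

The same shape (one function per test, one timer around it) carries over to the other languages, which is what makes the comparison reasonably fair.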

I initially thought of putting it all in a web app with nice graphics and all, but then decided that a PowerShell environment would be better suited for the results.

The Setup

Here's where it got interesting. Setting up the modern languages was straightforward and a breeze. Compiling them was even easier:

  • Rust: cargo build --release

  • C#: dotnet build -c Release

  • Python: Just works™

But assembly? Oh boy. Let’s just say, the toolchain has... evolved. NASM, Visual Studio Build Tools, linker paths, 64-bit calling conventions—it took longer to get "Hello World" compiling than to write the actual benchmarks!

💡
If you would like to read about the setup woes, I have a section at the end of this article going through the whole experience.

On top of that, my initial attempts all crashed and burned because I was trying to be fancy and use Windows libraries but they had evolved too and I did not have much knowledge of them.

I can tell you one thing - we just do not appreciate enough how good we have it these days.

The Results

Here’s the moment of truth. Behold, the fastest counting competition you’ll ever see…:

Edit: A friend asked why Rust shows as 0 in some tests and Assembly as <1. Here’s the deal: for Assembly, I’m grabbing the execution time using the program’s exit code, which doesn’t have any meaningful resolution — it’s just a hacky way to get something, but it's effectively useless for sub-ms timing. So I just show it as <1 since all of them finish basically instantly.

For Rust, the timer is real, but some of the benchmarks run so fast that they land in the sub-millisecond range, and my script’s granularity can’t really catch that. So yeah — Rust and Assembly are just too fast for my code to measure properly.
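To illustrate the granularity problem, here is a small sketch of how a fast run disappears once the reading is reduced to whole milliseconds. It is an assumption about the general mechanism, not a copy of my measuring script; `to_whole_milliseconds` is a made-up helper name.

```python
import time

def to_whole_milliseconds(elapsed_ns):
    """Truncate a nanosecond reading to whole milliseconds, as a coarse timer would."""
    return elapsed_ns // 1_000_000

start_ns = time.perf_counter_ns()
total = sum(range(10_000))          # a deliberately tiny workload
elapsed_ns = time.perf_counter_ns() - start_ns

print(f"raw reading: {elapsed_ns} ns")
print(f"reported in whole ms: {to_whole_milliseconds(elapsed_ns)}")
```

Anything that finishes in under a million nanoseconds reports as 0 here, which is the same kind of "0" you see for the fastest runs.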

What This Actually Means

Assembly Still Rules

Assembly wasn't just faster—it was embarrassingly faster. So fast that my timing methods couldn't even measure it accurately. When you eliminate all abstraction layers, function call overhead, and runtime checks, the CPU just... flies.

However, it was considerably harder to set up and program. One needs to keep that in mind, but when it comes together, boy oh boy - it is not even a competition.

Rust is Remarkable

Rust consistently came in second place as expected, often within striking distance of assembly. Those "zero-cost abstractions" aren't marketing hype; they're real. You get memory safety, fearless concurrency, and a fantastic type system while sacrificing almost nothing in performance. However, we need to be aware that Rust is a very different language from C# or Python, and it is not on the same level as them when it comes to readability.

C# Surprised Me

Microsoft's JIT compiler has gotten scary good. C# was competitive with Rust on many tests, sometimes even faster. The .NET runtime's optimization wizardry in 2025 is genuinely impressive.

Python is... Python

Look, we all know Python isn't about speed. It's about developer productivity, readability, and the incredible ecosystem - especially when it comes to Machine Learning. But seeing those 200x differences really puts things in perspective.

The Real-World Reality Check

Now, before you throw away your high-level languages and embrace the assembly life, let's get real - Assembly is incredibly hard to program in. Debugging is next to impossible (or maybe I just didn’t know how to do it properly). All those results showing how much faster Assembly was in total execution time? Well, reverse that for development time, and perhaps even square it. You may want to square it again.

But that’s not all - Python and C# were around 150 and 180 lines of readable, human-friendly code. Rust was around 200 lines of code, and while it looks familiar to those who know the language, it is still pretty alien to those who do not.

Assembly was 300 lines of… let’s say you have a better chance of understanding a lost and forgotten ancient language than of figuring out what’s going on with assembly. And yes, I did get help from AI at certain points, but even two different LLMs failed miserably at producing something that would compile. It is that hard. In that time, I could have made significant strides in cancer research with them.

In terms of bugs, Python and C# had 2 bugs, Rust had one that was caught by the compiler and Assembly had an infinite number of bugs, that made no sense at all. I am sure there are still bugs there, maybe even bugs have bugs.

I am exaggerating and joking, but I think you get the idea.

What I Learned

There is no denying that Assembly is the king of performance. It is not even close and there is no competition. But the development cost is enormous (unless you are probably Chris Sawyer).

Rust has performance comparable to C, but it is still not the friendliest of languages. Yet, it is probably the closest to the metal in the modern world of development.

C# is really a very nice language and the modern optimizations in the language are impressive.

Python is a nice language, but I do not think it will ever come close to C# in terms of software development efficiency and performance. It is the undisputed king of other areas, such as Machine Learning. Good luck training a neural network in C#.

The Bottom Line

It is amazing how much progress we’ve made over the years in terms of developer efficiency and language performance. The modern languages bring a lot to the table and yes, they are slower than Assembly, but in a world of plentiful RAM, GPUs and super-efficient CPUs, I think it is worth trading speed for developer efficiency, maintainability and extensibility.

So, is Assembly still relevant today? It may be in some very edge cases, but when you have Rust (or Go), the question is why would you do that to yourself? You’d be giving up an enormous amount of readability and efficiency and sanity for the sake of a very small percentage gain. I am not familiar with Go, but if you want more performance, I think your time would be better spent learning some of the unsafe code features of Rust for pure performance (and I can tell you, having gone through that exercise in my Polycubes article, that is no picnic).

So, all these languages exist for a reason and each fills a certain gap. If you want performance, you would go Rust or Go; if you want enterprise development, I would go C# (and you can insert any number of other high-level languages here); and finally, if you want to do Machine Learning, you cannot beat Python.

I think Assembly is kind of like the vector databases of the AI world. It is there, it works but you do not want to mess with it. Instead you use the tools that mess with them.

Last but not least: Hats off to Chris Sawyer. I know the assembly he worked with was different from this one, as the computers were different back then, but huge respect for the man who programmed an entire game in assembly. Simple counting and tests are as far as I go in Assembly. Please don’t ask me to program a sequel to RollerCoaster Tycoon - we now have Unity for that.

💡
Oh by the way, if you really dislike a developer, ask their help for some Assembly code.
💡
You can find the source code here: https://github.com/tjgokken/BenchmarkProject

Bonus: Setting Up Assembly Today

I knew this was going to hurt - I just didn’t know how much. Well, turns out it was a breeze - like having your 4 wisdom teeth being pulled out at the same time without any anesthetic… while someone is constantly holding your nose and letting go making breathing harder kind of breeze.

Fun.

Anyway, back to the beginning. I was not even sure what to use for assembly in 2025. I mean back in the day, we had debug.exe and that was it.

After some research, I settled on NASM (Netwide Assembler) because it is still actively maintained, has decent documentation, works with modern Windows and uses Intel syntax.

💡
Intel syntax is a dialect of the assembly language. Developers like it because of its familiarity and readability. I know what you are thinking but compare the Intel syntax of mov eax, ebx ; eax = ebx to the other dialect AT&T syntax of movl %ebx, %eax ; %eax = %ebx. Yeah, I think you see why this is favored.

Anyway, just double click the installer and that’s it, right? Right?

If only it were that easy. I mean, it feels like whoever designed this ecosystem had recently watched the movie Airplane! and loved the scene with the famous quote: “That's just what they'll be expecting us to do.”

You see, NASM creates object files (.obj), but they are not executables. For that, you need something called a linker, which turns these .obj files into .exe files. Microsoft's linker is link.exe, which comes with Visual Studio and the Visual Studio Build Tools.

So, I can hear you say - well, let’s use that. If it were only that easy. Again: “That's just what they'll be expecting us to do.”

There are approximately 47 different versions of link.exe scattered around your system in Windows. For example,

C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.44.35207\bin\Hostx64\x64\link.exe

or

C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.44.35207\bin\Hostx64\x64\link.exe

Plus about 45 others in various SDK folders - and none of them are in your PATH by default. Lovely.

Once you figure which one to use (and I have to admit Claude was very helpful here), you are faced (at least I was) with another reality: Windows is 64-bit now. Why would that matter? Well, it means the assembly programming has changed. Dramatically.

In the old days, if you wanted a 16-bit register, you would write (sorry for using bash as the code block language, my blog does not have support for assembly):

mov ax, 1000    ; AX is a 16-bit register, simple and direct

Registers were straightforward: AX, BX, CX, DX and you could directly manipulate them without worrying about complex rules. I am not going to go fully into Assembly here, but int 21h was the magic DOS interrupt that did everything. Want to print? int 21h. Read a file? int 21h. Exit program? int 21h. Simple, direct, predictable.

Well, first of all we are not in DOS world anymore - it’s gone. You need to call the appropriate and correct Windows library.

Then it turns out that Windows x64 requires you to allocate 32 bytes of stack space before calling ANY function, even if the function takes no parameters. It's like a "parking spot" for the function to use if it wants to spill register contents to memory.

Why? Optimization. The function might want to save register values temporarily, and this gives it a guaranteed place to do so.

You also need to pass parameters in registers now.

; OLD WAY (parameters on stack):
push param4
push param3  
push param2
push param1
call function

; NEW WAY (parameters in registers):
mov rcx, param1    ; First parameter goes in RCX
mov rdx, param2    ; Second parameter goes in RDX  
mov r8, param3     ; Third parameter goes in R8
mov r9, param4     ; Fourth parameter goes in R9
; 5th+ parameters go on stack
call function

Why? Speed. Registers are much faster than memory access. Putting parameters in registers instead of on the stack makes function calls significantly faster.
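As a memory aid, the rule can be written down as a tiny lookup. This sketch covers integer and pointer arguments only (floating-point arguments use different registers), and the function name `argument_locations` is made up for illustration. The stack offsets assume the 32 bytes of reserved space mentioned earlier sit at the bottom of the stack right before the call.

```python
INTEGER_ARG_REGISTERS = ["RCX", "RDX", "R8", "R9"]
SHADOW_SPACE_BYTES = 32   # the 32 bytes reserved for the callee

def argument_locations(arg_count):
    """Where each integer/pointer argument lives just before the CALL instruction."""
    locations = []
    for index in range(arg_count):
        if index < len(INTEGER_ARG_REGISTERS):
            # The first four arguments travel in registers.
            locations.append(INTEGER_ARG_REGISTERS[index])
        else:
            # The rest go on the stack, 8 bytes each, above the reserved 32 bytes.
            offset = SHADOW_SPACE_BYTES + 8 * (index - len(INTEGER_ARG_REGISTERS))
            locations.append(f"[RSP+{offset}]")
    return locations

print(argument_locations(6))
# ['RCX', 'RDX', 'R8', 'R9', '[RSP+32]', '[RSP+40]']
```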

Oh, and the last one: Your stack pointer (RSP) must be aligned to a 16-byte boundary before calling functions, or your program is likely to crash. So you need to allocate space and maintain alignment before calling a function. I know it is very technical but let’s just say that when it comes to this, Windows is more sensitive than Quantum Computers.
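The arithmetic behind that is easier to see in code. Here is a sketch that works out how many bytes to subtract from RSP at the top of a function so that the stack is aligned at every call it makes. The helper name and its parameters are hypothetical; the one fact it relies on is that the return address (8 bytes) is already on the stack when your function starts.

```python
def stack_reservation(max_callee_args=0, pushed_registers=0, local_bytes=0):
    """Bytes for `sub rsp, N` so that RSP is 16-byte aligned at every CALL."""
    shadow = 32                                    # reserved space for the callee
    stack_args = 8 * max(0, max_callee_args - 4)   # arguments beyond the first four
    needed = shadow + stack_args + local_bytes
    # On entry the return address has already been pushed, so RSP is 8 bytes
    # past a 16-byte boundary; every register we push adds another 8.
    already_on_stack = 8 + 8 * pushed_registers
    padding = -(already_on_stack + needed) % 16
    return needed + padding

print(stack_reservation())                     # 40
print(stack_reservation(pushed_registers=1))   # 32
print(stack_reservation(max_callee_args=6))    # 56
```

That first result is why so many Windows x64 assembly listings start with sub rsp, 40: 32 bytes of reserved space plus 8 bytes of padding.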

So, all in all you needed to remember 10 things in the old days. Now it is like 50.

This complexity is also a great way to demonstrate why modern languages became so popular: Assembly got harder as computers got more sophisticated. It is just not feasible anymore, and all I am trying to do is count.

So, you got through all of that (by the way, how lovely and human friendly is the code above, huh?), and you want to call GetTickCount64() for timing. This is a Windows API function.

Well, you better hope you can find kernel32.lib. And it better be the right version for your architecture. And it better be in a path your linker can find. It is in this lovely and very intuitive place - I mean I found it on the first go:

C:\Program Files (x86)\Windows Kits\10\Lib\10.0.26100.0\um\x64\kernel32.lib

My final working compile command looked something like this:

nasm -f win64 counting.asm -o counting.obj
"C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.44.35207\bin\Hostx64\x64\link.exe" counting.obj "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.44.35207\lib\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\Lib\10.0.26100.0\um\x64" kernel32.lib /subsystem:console /entry:main /out:counting.exe

Now, compare that to:

  • Rust: cargo build --release

  • C#: dotnet build -c Release

  • Python: Just run the file

Yeah, no contest.


Appendix A: Inside The Assembly Code

This is just in case you want to make your life miserable by seeing at least a sample of Assembly code and understanding what it means. Let’s look at our simple counting code:

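section .text
global mainCRTStartup          ; export the entry point so the linker can find it

mainCRTStartup:
    mov rax, 0                 ; counter = 0
    mov rbx, 100000000         ; target = 100,000,000

count_loop:
    inc rax                    ; counter++
    cmp rax, rbx               ; compare counter with target (sets CPU flags)
    jl count_loop              ; loop again while counter < target

    mov rax, 0                 ; exit code 0 (success)
    ret                        ; return to the caller, ending the program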

We’ll go through this code line by line, but before we do that, remember:

  • Assembly is very literal - each line does exactly one simple thing

  • Registers are like variables - but much faster than RAM, because they sit on the CPU itself while RAM is off-chip, across a memory bus. All that travel makes it slower.

  • Labels are like bookmarks - they mark places to jump to

  • The CPU sets flags - which other instructions can check

  • It's much more verbose than high-level languages

The beauty (and curse) of assembly is that there's no hidden complexity - what you see is exactly what the CPU does, one instruction at a time. No function calls, no memory allocation, no garbage collection - just pure, direct CPU instructions (but the language makes you want to question every choice you made in your life so far).

That's why it's so fast, and also why it takes 10 lines to do what Python does in 3!
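For comparison, this is roughly what the Python side of that claim looks like. It is a sketch rather than the exact benchmark code, and `count_to` is just an illustrative name. The demo call uses a smaller number so it finishes quickly; the benchmark itself counts to 100,000,000.

```python
def count_to(target):
    counter = 0
    while counter < target:   # the loop body is the whole "program"
        counter += 1
    return counter

print(count_to(1_000_000))    # 1000000
```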

So, let’s go through this simple code that counts up to 100,000,000.

Program Structure

section .text: Tells the assembler this is the code section.

💡
Assembly programs are divided into sections: text → executable code; data → initialized data, such as variables with values; bss → uninitialized data, such as empty arrays.

global mainCRTStartup - Makes the mainCRTStartup label visible to the linker, because the Windows linker looks for this specific name as the default entry point of a console program. This is like main in modern languages.

mainCRTStartup: This is a label (like a bookmark), and it is actually where Windows starts running our code. The colon (:) marks it as a label, not an instruction.

mov rax, 0 - Moves the value 0 into the RAX register which is a 64-bit general-purpose register (think of it as a variable). It has a value of 0.

mov rbx, 100000000 - Same as RAX, just another 64-bit register. We’ll count up to this number. This is our target.

💡
Why use two different registers here? RAX is traditionally the "accumulator" register used for counters and arithmetic, while RBX is the "base" register, good for holding constants. For our small code it really does not matter, but it is like using i as the counter in for loops.

count_loop: → basically, we are saying we are starting a loop. Like for, or while.

inc rax → Ok, this is more straightforward. It is the equivalent of counter++, just increase it by one.

cmp rax, rbx → compares RAX with RBX to determine if we reached our target value. Kind of like saying if (counter < target) (the comparison part)

jl count_loop → "Jump if Less" - jumps back to count_loop if RAX < RBX. Kind of like while (counter < target)

mov rax, 0 → Sets RAX to 0 once the loop is exited meaning we reached our target. 0 means success, such as return 0;

ret → Returns control to whoever called this function (Windows) and exits the program cleanly.
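If it helps, here is the same walkthrough written as a toy Python simulation, with a dictionary standing in for the registers. It is only a mental model and not how the CPU actually works; `run_counting_loop` is a made-up name. One detail it makes visible: the increment happens before the comparison, so the loop body always runs at least once.

```python
def run_counting_loop(target):
    """Step through the same instructions with a dict standing in for the registers."""
    regs = {"rax": 0, "rbx": 0}
    regs["rax"] = 0                          # mov rax, 0
    regs["rbx"] = target                     # mov rbx, <target>
    iterations = 0
    while True:                              # count_loop:
        regs["rax"] += 1                     #   inc rax
        less = regs["rax"] < regs["rbx"]     #   cmp rax, rbx (result kept in "flags")
        iterations += 1
        if not less:                         #   jl count_loop (jump back only if less)
            break
    regs["rax"] = 0                          # mov rax, 0 (exit code)
    return iterations, regs["rax"]           # ret

print(run_counting_loop(5))   # (5, 0)
```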

Well, there it is - Assembly, meet everyone. Everyone, this is Assembly.


Appendix B: The Machine Language

Just to complete the picture and dig a bit deeper.

When we write,

count_loop:
    inc rax        
    cmp rax, rbx   
    jl count_loop

the assembler translates it into machine code, such as:

48 FF C0        ; inc rax
48 3B C3        ; cmp rax, rbx  
7C F8           ; jl count_loop

I am not going to go into what all that means, but the CPU sees:

Memory Address: Machine Code
0x1000:         48 FF C0
0x1003:         48 3B C3  
0x1006:         7C F8

It then executes those 8 bytes of 48 FF C0 48 3B C3 7C F8 100 million times.

This is why Assembly is so fast. There is no interpreter - the code is directly converted into machine language - the language that your CPU speaks.

We could of course type…

48 FF C0 48 3B C3 7C F8

… instead but that would be insane. It is unreadable and error prone - one wrong byte and you’re gone. Also good luck debugging hex codes. This is a new kind of hell that would make Dante squirm.
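To show just how fiddly those bytes are, here is a sketch of how the two bytes of the jump instruction come about. A short jl is the opcode 7C followed by a one-byte signed distance, measured from the end of the jump instruction to the target. The helper name `encode_short_jl` is made up, and the addresses are the illustrative ones from the listing above.

```python
def encode_short_jl(jump_address, target_address):
    """Encode `jl target` as a 2-byte short jump located at jump_address."""
    JL_SHORT_OPCODE = 0x7C
    next_instruction = jump_address + 2       # the distance is counted from the NEXT instruction
    displacement = target_address - next_instruction
    if not -128 <= displacement <= 127:
        raise ValueError("target is out of range for a short jump")
    # Negative distances are stored as a single two's-complement byte.
    return bytes([JL_SHORT_OPCODE, displacement & 0xFF])

encoded = encode_short_jl(jump_address=0x1006, target_address=0x1000)
print(encoded.hex(" ").upper())
```

Get that one byte wrong by one and the CPU jumps into the middle of an instruction. The assembler does this bookkeeping for you every single time a label moves.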

Yet once you see the machine language, you realize that even though Assembly is a very hard language, it is the closest thing to a human-readable way of writing machine language.

