Diving into the Machine: Lab 5

Introduction
In Lab 5, we explored the fundamentals of 64-bit assembly programming by implementing simple loop-based programs on both x86_64 and AArch64 platforms. The task was to write assembly code that prints a message repeatedly, converts the loop index to its ASCII representation (both decimal and hexadecimal), and formats the output with or without leading zeros.
1. Lab Setup
Extracting the Examples
The examples are provided in /public/spo600-assembler-lab-examples.tgz, which we unpacked using:
cd ~
tar xvf /public/spo600-assembler-lab-examples.tgz
This produces a directory structure with both C and assembly versions of “Hello World” programs for x86_64 and AArch64.
2. Writing Assembly Programs
First, we created programs that print a looped message with numeric output. This meant getting hands-on with manually selecting registers, setting up initial values, and carefully crafting conditional branches.
A. Looping Structures and Register Management
In our programs, we used a dedicated register as the loop counter:
x86_64: We used, for example, %r15 as the loop counter. The loop is implemented by incrementing %r15 and using cmp followed by a conditional jump (jl, “jump if less”) to repeat the loop.
AArch64: We used a general-purpose register (e.g., x19) as the loop counter. ARM’s compare and conditional-branch instructions (CMP and B.LT / B.NE) form the core of the loop. The uniformity of AArch64’s registers simplifies management: every register is named the same way, and the 32-bit W view (W0 vs. X0) only matters when operand size matters.
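As a rough illustration of the loop shape described above, here is a minimal C sketch that uses goto to mimic the increment, compare, and branch-if-less sequence. The function name count_loop_iterations and its parameters are hypothetical and are not taken from the lab code.

```c
#include <stddef.h>

/* Hypothetical helper: counts how many times the loop body runs for
 * indices from start up to (but not including) limit. The control flow
 * has the same shape as the assembly loop: run the body, increment the
 * counter, compare it with the limit, and branch back if it is less. */
size_t count_loop_iterations(long start, long limit)
{
    long index = start;        /* plays the role of %r15 / x19 */
    size_t iterations = 0;

loop:
    iterations++;              /* the real loop body would go here */
    index++;                   /* inc %r15        /  add x19, x19, #1 */
    if (index < limit)         /* cmp + jl        /  cmp + b.lt */
        goto loop;

    return iterations;
}
```

Because the test sits at the bottom, the body always runs at least once, just as it would in an assembly loop that compares after incrementing.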
B. System Calls and Data Output
To avoid the C library, we handled output with direct Linux syscalls. The calling convention differs between the two architectures:
x86_64:
Syscall Number: Placed in RAX (e.g., 1 for write, 60 for exit).
Arguments: Passed in registers %rdi, %rsi, %rdx, etc.
Invocation: The syscall instruction triggers the kernel.
AArch64:
Syscall Number: Placed in X8 (e.g., 64 for write, 93 for exit).
Arguments: Passed in X0–X5 in order.
Invocation: SVC #0 (Supervisor Call) invokes the kernel.
For example, to print a string in AArch64, we set up the registers as follows:
mov x0, #1          // stdout file descriptor
adr x1, <buffer>    // address of our message (adr loads a label's address)
mov x2, #<length>   // message length
mov x8, #64         // syscall number for write
svc #0              // perform syscall
This is a stark contrast to higher-level I/O functions, which manage these details for us.
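For comparison, the same register setup can be approximated from C on Linux. This is only a sketch: the wrapper name raw_write_stdout is hypothetical, and it still goes through libc's generic syscall() function rather than a hand-written syscall or svc instruction.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical wrapper: issue the write syscall directly.
 * SYS_write expands to the right number for the target
 * (1 on x86_64, 64 on AArch64), and the three arguments land in
 * the same registers the assembly sets up by hand:
 * file descriptor, buffer pointer, length. */
long raw_write_stdout(const char *buffer, size_t length)
{
    return syscall(SYS_write, 1, buffer, length);
}
```

On success the return value is the number of bytes written, which is what the kernel leaves in RAX or X0 after the call.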
C. Converting Numbers to ASCII
A major challenge was converting a loop index (an integer) into its ASCII representation. The process involves:
Division and Remainder Calculation:
x86_64: The div instruction divides the 128-bit dividend (RDX:RAX) by a divisor (e.g., 10), leaving the quotient in RAX and the remainder in RDX. Before using div, you must zero out RDX (using xor %rdx, %rdx) to ensure correct results.
AArch64: The UDIV instruction only provides the quotient. We then compute the remainder by multiplying the quotient by the divisor and subtracting (MSUB can do both steps in a single instruction):
udiv x1, x19, x3   // x1 = quotient (number / 10)
mul x2, x1, x3     // x2 = quotient * divisor
sub x2, x19, x2    // x2 = remainder (number - quotient * divisor)
ASCII Conversion:
For each digit (0–9), adding the immediate 0x30 (48 in decimal) converts it to its corresponding ASCII code. For hexadecimal digits (10–15), adding 0x37 converts them to ‘A’–‘F’.
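The division and conversion steps can be sketched in C as follows. The helper names digit_to_ascii and split_decimal are hypothetical; they only mirror the arithmetic described above and are not the lab's actual routines.

```c
/* Hypothetical helper: convert one digit value (0-15) to its ASCII
 * character, using the same offsets as the assembly code. */
char digit_to_ascii(unsigned value)
{
    if (value < 10)
        return (char)(value + 0x30);   /* 0-9   -> '0'-'9' */
    return (char)(value + 0x37);       /* 10-15 -> 'A'-'F' */
}

/* Hypothetical helper: split a number in the range 0-99 into its tens
 * and ones digits. The remainder is computed as number - quotient * 10,
 * the same way the udiv + mul + sub sequence does it on AArch64. */
void split_decimal(unsigned number, unsigned *tens, unsigned *ones)
{
    unsigned quotient = number / 10;   /* udiv (or div on x86_64) */
    *tens = quotient;
    *ones = number - quotient * 10;    /* mul + sub */
}
```

For a hexadecimal version, the same idea applies with a divisor of 16, and digit_to_ascii handles the values above 9.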
D. Conditional Branching and Zero Suppression
To avoid printing leading zeros, we implemented conditional logic:
x86_64: We used cmp $0, %rax followed by je .no_tens to check if the tens digit is zero.
AArch64: We executed CMP x12, #0 and B.EQ no_tens to branch over the code that prints the tens digit if it is zero.
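The branch-over-the-tens-digit logic looks roughly like this in C. The function format_index and its suppress_zero flag are hypothetical names introduced for illustration, and the sketch assumes values in the range 0–99.

```c
#include <stddef.h>

/* Hypothetical helper: write a value in the range 0-99 into out as
 * decimal text and return the number of digits written. out must have
 * room for at least 3 bytes (two digits plus the terminating NUL).
 * When suppress_zero is nonzero, a zero tens digit is skipped, which
 * corresponds to taking the je / b.eq branch to no_tens. */
size_t format_index(unsigned number, char *out, int suppress_zero)
{
    size_t length = 0;
    unsigned tens = number / 10;
    unsigned ones = number % 10;

    if (!(suppress_zero && tens == 0))
        out[length++] = (char)(tens + '0');   /* tens digit */
    out[length++] = (char)(ones + '0');       /* ones digit, always printed */
    out[length] = '\0';
    return length;
}
```

With suppression enabled, 7 is formatted as "7"; with it disabled, the same value is formatted as "07".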
E. Debugging Techniques
Each debugging tool provided a different view into the program’s execution. Objdump confirms the static structure of the binary, while gdb shows dynamic register states.
3. Architectural Comparison: x86_64 vs. AArch64
The differences below illustrate the evolution of CPU design: AArch64’s uniform register file and simpler, more orthogonal instructions contrast with the legacy and complexity inherent in x86_64.
Register Architecture
x86_64:
Uses 16 general-purpose registers (RAX, RBX, …, R15). Instructions in AT&T syntax require $ for immediates and % for registers. Division is complex; for example, div uses RDX:RAX.
AArch64:
Provides 31 general-purpose registers (X0–X30) with a more uniform naming scheme. AArch64 instructions have a fixed 32-bit width, and registers are used consistently (e.g., X0–X7 for function arguments). The system call interface is simpler: X8 for the syscall number and X0–X5 for arguments.
Instruction Set and Syntax
x86_64:
Rich, complex instruction set with variable-length instructions. Memory addressing is flexible (e.g., using base + index*scale). The AT&T syntax’s source-first, destination-second ordering contrasts with Intel’s more natural destination-first syntax.
AArch64:
A true RISC design, with fixed-length instructions and a load-store architecture (memory operations and arithmetic operations are separate). It uses a more straightforward operand order (destination first) and does not require prefixes like % or $.
System Call Interface
x86_64:
Syscalls use RAX for the syscall number and RDI, RSI, RDX, etc. for arguments. The syscall instruction is used to transition to kernel mode.
AArch64:
Syscalls use X8 for the syscall number and X0–X5 for arguments. The SVC #0 instruction is used for invoking system calls.
4. Optional Challenge: Multiplication Tables
For an extra challenge, I implemented a times tables program in AArch64 assembly. This program uses nested loops to calculate and print the product of numbers from 1 to 12. The approach was:
Nested Loops:
The outer loop iterates over multipliers (1–12), and the inner loop iterates over multiplicands (1–12).
Arithmetic Operations:
Multiply the loop indices to get the product, then convert the product to a formatted string.
Formatted Output:
Use subroutines to format numbers with proper spacing and leading-zero suppression.
Separators:
Print a line (e.g., “-------------”) between each table.
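The overall structure can be sketched in C as shown here. This is not the AArch64 program itself: format_row and print_times_tables are hypothetical names, the row layout is an assumption, and snprintf's width specifiers stand in for the hand-written spacing and zero-suppression subroutines.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format one row, e.g. " 3 x  4 =  12".
 * The %2d and %3d widths pad with spaces, which gives the same
 * effect as suppressing leading zeros in a fixed-width field. */
int format_row(int multiplier, int multiplicand, char *out, size_t size)
{
    return snprintf(out, size, "%2d x %2d = %3d",
                    multiplier, multiplicand, multiplier * multiplicand);
}

/* Print the tables for 1 through 12 with a separator line between them. */
void print_times_tables(void)
{
    char row[32];

    for (int multiplier = 1; multiplier <= 12; multiplier++) {            /* outer loop */
        for (int multiplicand = 1; multiplicand <= 12; multiplicand++) {  /* inner loop */
            format_row(multiplier, multiplicand, row, sizeof row);
            puts(row);
        }
        if (multiplier < 12)
            puts("-------------");                                        /* separator */
    }
}
```

In the assembly version, each of these steps (the multiply, the digit conversion, and the padding) has to be written out explicitly, which is where most of the work in the challenge lies.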
Reflections
Lab 5 was an intense and rewarding experience. Writing assembly forced me to think in terms of individual instructions, registers, and syscalls. There’s no hiding behind abstractions: you directly control every operation.
Source Code Links:
Written by

Samarth Sharma