Diving into the Machine: Lab 5
data:image/s3,"s3://crabby-images/8ffe5/8ffe597b4dc06ccb349371f673748aff4cabc64b" alt="Samarth Sharma"
Introduction
In Lab 5, we delved into the fundamentals of 64-bit assembly programming by implementing simple loop-based programs on both x86_64 and AArch64 platforms. Our tasks included writing assembly code to print a message repeatedly, convert loop index values to their ASCII representations (both decimal and hexadecimal), and format the output with or without leading zeros.
1. Lab Setup
Extracting the Examples
The lab examples were provided in the archive `/public/spo600-assembler-lab-examples.tgz`. Unpacking was done using:
```sh
cd ~
tar xvf /public/spo600-assembler-lab-examples.tgz
```
This yielded a directory structure with both C and assembly versions of “Hello World” programs for x86_64 and AArch64.
2. Writing Assembly Programs
Our primary goal was to create programs that print a looped message with numeric output. We designed our loops by hand: selecting registers, setting up initial values, and carefully crafting conditional branches.
A. Looping Structures and Register Management
Both architectures require explicit loop construction. In our programs, we used a dedicated register as a counter:
- x86_64: We used, for example, `%r15` as the loop counter. The loop is implemented by incrementing `%r15` and using `cmp` followed by a conditional jump (`jl`, “jump if less”) to repeat the loop.
- AArch64: We used a general-purpose register (e.g., `x19`) as the loop counter. ARM’s compare and conditional-branch instructions (`CMP` followed by `B.LT` or `B.NE`) form the core of the loop. The uniformity of AArch64’s registers simplifies management: there is no need to distinguish between register views (`W0` vs. `X0`) except when operand size matters.
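The loop shape described above (initialise the counter, run the body, increment, compare, branch back) can be sketched in C with an explicit `goto`, so each assembly step stays visible. This is an illustrative model, not the lab code: `run_loop`, `min`, and `max` are hypothetical names, and the body only records the index where the real program would print its message.

```c
/* Hypothetical model of a bottom-tested assembly loop.
 * Records each index visited into out[] and returns how many iterations ran.
 * Like the assembly version, the body runs at least once even if min >= max. */
int run_loop(int min, int max, int *out) {
    int count = 0;
    int i = min;              /* initialise the counter (%r15 / x19)       */
loop:
    out[count++] = i;         /* loop body: the real program prints here   */
    i++;                      /* increment the counter                     */
    if (i < max) goto loop;   /* compare, then branch back: cmp + jl / CMP + B.LT */
    return count;
}
```

Because the test sits at the bottom, `run_loop(5, 5, buf)` still executes the body once, which is the same behaviour a `cmp`/`jl` placed after the body produces.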
B. System Calls and Data Output
Since we avoided C libraries, output was handled using direct Linux syscalls. The syscalls are invoked differently on each platform:
- x86_64:
  - Syscall number: placed in `RAX` (e.g., `1` for `write`, `60` for `exit`).
  - Arguments: passed in `%rdi`, `%rsi`, `%rdx`, `%r10`, `%r8`, and `%r9`, in that order.
  - Invocation: the `syscall` instruction traps into the kernel.
- AArch64:
  - Syscall number: placed in `X8` (e.g., `64` for `write`, `93` for `exit`).
  - Arguments: passed in `X0`–`X5`, in order.
  - Invocation: `SVC #0` (Supervisor Call) traps into the kernel.
Technical Detail:
For example, to print a string in AArch64, we set up the registers as follows:
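```assembly
    mov  x0, #1          // file descriptor 1 = stdout
    adr  x1, msg         // x1 = address of the message buffer
    mov  x2, #len        // x2 = message length in bytes
    mov  x8, #64         // syscall number for write
    svc  #0              // perform the syscall

.data
msg:    .ascii  "Loop\n" // example message text (assumed)
len =   . - msg          // length computed by the assembler
```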
This is a stark contrast to higher-level I/O functions, which manage these details for you.
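For comparison, the same raw system call can be made from C without going through stdio. The sketch below is an illustration, not part of the lab: it assumes a Linux/glibc system, where `syscall()` (declared in `<unistd.h>`) loads the number and arguments into the registers each architecture expects, and `<sys/syscall.h>` provides the `SYS_*` constants. The compile-time checks confirm the per-architecture numbers listed above; `raw_write` is a hypothetical helper name.

```c
#include <sys/syscall.h>     /* SYS_write, SYS_exit */
#include <unistd.h>          /* syscall() */

/* Syscall numbers differ per architecture: these are the values that end up
 * in RAX (x86_64) or X8 (AArch64). */
#if defined(__x86_64__)
_Static_assert(SYS_write == 1 && SYS_exit == 60, "x86_64 syscall numbers");
#elif defined(__aarch64__)
_Static_assert(SYS_write == 64 && SYS_exit == 93, "AArch64 syscall numbers");
#endif

/* Hypothetical helper: issues write(fd, buf, len) through the raw syscall
 * interface; glibc's syscall() executes `syscall` or `svc #0` for us.
 * Returns the number of bytes written, or -1 on error. */
long raw_write(int fd, const char *buf, unsigned long len) {
    return syscall(SYS_write, fd, buf, len);
}
```

Calling `raw_write(1, "Loop\n", 5)` writes the message to stdout; the wrapper hides exactly the register setup that the assembly version does by hand.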
C. Converting Numbers to ASCII
A major challenge was converting a loop index (an integer) into its ASCII representation. The process involves:
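Division and Remainder Calculation:
- x86_64: The `div` instruction divides the 128-bit dividend in `RDX:RAX` by a 64-bit divisor, leaving the quotient in `RAX` and the remainder in `RDX`. `div` has no immediate form, so a divisor such as 10 must first be loaded into a register. Before using `div`, you must zero out `RDX` (using `xor %rdx, %rdx`); otherwise whatever is left in `RDX` becomes the upper half of the dividend.
- AArch64: The `UDIV` instruction only provides the quotient. We then use `MSUB` to compute the remainder:

```assembly
udiv x1, x19, x3        // x1 = x19 / x3 = quotient (x3 holds 10)
msub x2, x1, x3, x19    // x2 = x19 - (x1 * x3) = remainder
// equivalent two-instruction form:
//   mul x2, x1, x3
//   sub x2, x19, x2
```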
ASCII Conversion:
For each digit value 0–9, adding the immediate `0x30` (48 in decimal) produces its ASCII code (`'0'` is `0x30`). For hexadecimal digit values 10–15, adding `0x37` (55) produces ‘A’–‘F’, because 10 + 0x37 = 0x41, the code for ‘A’.
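Putting the two steps together, the arithmetic can be checked in C before committing it to registers. This is a hedged model rather than the lab code: `x86_div` and `arm_udiv_msub` are hypothetical functions that mimic what `div` and `udiv`/`msub` compute, `digit_to_ascii` applies the `0x30`/`0x37` offsets, and `format_hex2` assumes a value of 0–255 printed as two hex digits. `unsigned __int128` is a GCC/Clang extension used only to represent `RDX:RAX`.

```c
#include <stdint.h>

/* Model of x86_64 `div r/m64`: divides the 128-bit value RDX:RAX by divisor,
 * leaving the quotient in RAX and the remainder in RDX. (The real instruction
 * raises a divide error if the quotient overflows 64 bits; not modelled.) */
void x86_div(uint64_t *rdx, uint64_t *rax, uint64_t divisor) {
    unsigned __int128 dividend = ((unsigned __int128)*rdx << 64) | *rax;
    *rax = (uint64_t)(dividend / divisor);
    *rdx = (uint64_t)(dividend % divisor);
}

/* Model of the AArch64 pair `udiv` + `msub`. */
void arm_udiv_msub(uint64_t n, uint64_t d, uint64_t *q, uint64_t *r) {
    *q = n / d;            /* udiv x1, x19, x3     */
    *r = n - (*q) * d;     /* msub x2, x1, x3, x19 */
}

/* Digit value 0-15 to ASCII: 0-9 -> '0'-'9' (+0x30), 10-15 -> 'A'-'F' (+0x37). */
char digit_to_ascii(unsigned d) {
    return (char)(d < 10 ? d + 0x30 : d + 0x37);
}

/* Two hex digits: the same divide/remainder idea with a divisor of 16.
 * buf must hold 3 bytes. */
void format_hex2(unsigned n, char *buf) {
    buf[0] = digit_to_ascii(n / 16);   /* quotient  -> high digit */
    buf[1] = digit_to_ascii(n % 16);   /* remainder -> low digit  */
    buf[2] = '\0';
}
```

Calling `x86_div` with a nonzero value already in `rdx` turns the dividend into rdx·2^64 + rax, so the quotient comes out wrong, which is exactly the bug that `xor %rdx, %rdx` prevents.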
D. Conditional Branching and Zero Suppression
To avoid printing leading zeros, we implemented conditional logic:
- x86_64: We used `cmp $0, %rax` followed by `JE .no_tens` to check whether the tens digit is zero.
- AArch64: We executed `CMP x12, #0` and `B.EQ no_tens` to branch over the code that prints the tens digit when it is zero.
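The tens-digit check can be modelled in C with the same compare-and-skip shape. `format_decimal` is a hypothetical helper that assumes the index is in the range 0–99 and writes only the digits (no message text); the `goto` mirrors `JE .no_tens` / `B.EQ no_tens`.

```c
#include <stddef.h>

/* Hypothetical helper: writes n (assumed 0-99) as decimal digits into buf,
 * skipping the tens digit when it is zero. buf needs room for 3 bytes.
 * Returns the number of digits written (1 or 2). */
size_t format_decimal(unsigned n, char *buf) {
    unsigned tens = n / 10;       /* quotient  */
    unsigned ones = n % 10;       /* remainder */
    size_t len = 0;

    if (tens == 0) goto no_tens;  /* cmp $0 + JE / CMP #0 + B.EQ */
    buf[len++] = (char)(tens + 0x30);
no_tens:
    buf[len++] = (char)(ones + 0x30);  /* the ones digit is always printed */
    buf[len] = '\0';
    return len;
}
```

Only the tens digit is conditional; the ones digit is written unconditionally so that index 0 still prints as "0".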
E. Debugging Techniques
Each debugging tool provided a different view into the program’s execution. `objdump -d` confirms the static structure of the binary, while `gdb` shows register states as the program runs. Comparing against the assembly that `gcc -S` generates from equivalent C code is a sanity check that your low-level logic is sound.
3. Architectural Comparison: x86_64 vs. AArch64
These differences illustrate the evolution of CPU design: AArch64’s uniform register file and simpler, more orthogonal instructions contrast with the legacy features and accumulated complexity of x86_64.
Register Architecture
- x86_64: Uses 16 general-purpose registers (`RAX`, `RBX`, …, `R15`). Instructions in AT&T syntax require `$` for immediates and `%` for registers. Division is awkward: `div` implicitly uses the `RDX:RAX` pair for its dividend and its results.
- AArch64: Provides 31 general-purpose registers (`X0`–`X30`) with a more uniform naming scheme. ARM instructions are a fixed 32 bits wide, and registers are used consistently (e.g., `X0`–`X7` for function arguments). The system call interface is simpler: `X8` holds the syscall number and `X0`–`X5` hold the arguments, the same registers used for function arguments (on x86_64 the fourth syscall argument goes in `R10` rather than the `RCX` used for function calls).
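The point about register views (`W0` vs. `X0`) becomes concrete when you model what happens to the upper bits of a 64-bit register after a narrower write. The C functions below are a hypothetical model, not real CPU or library APIs: on AArch64 a write to `Wn` zero-extends into `Xn`; on x86_64 a 32-bit write (`EAX`) also zero-extends into `RAX`, but an 8-bit write (`AL`) leaves bits 8–63 untouched, one of the legacy partial-register rules.

```c
#include <stdint.h>

/* AArch64: writing Wn (the 32-bit view) zero-extends into the full Xn. */
uint64_t aarch64_write_w(uint64_t x_old, uint32_t value) {
    (void)x_old;                         /* old upper 32 bits are discarded */
    return (uint64_t)value;
}

/* x86_64: writing EAX also zero-extends into RAX... */
uint64_t x86_write_eax(uint64_t rax_old, uint32_t value) {
    (void)rax_old;
    return (uint64_t)value;
}

/* ...but writing AL (8-bit) keeps bits 8-63 of RAX unchanged. */
uint64_t x86_write_al(uint64_t rax_old, uint8_t value) {
    return (rax_old & ~(uint64_t)0xFF) | value;
}
```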
Instruction Set and Syntax
- x86_64: Rich, complex instruction set with variable-length instructions. Memory addressing is flexible (e.g., using base + index*scale). The AT&T syntax’s source-first, destination-second ordering contrasts with Intel’s more natural destination-first syntax.
- AArch64: A true RISC design, with fixed-length instructions and a load-store architecture (memory access and arithmetic are separate instructions). It uses a more straightforward operand order (destination first) and does not require prefixes like `%` or `$`.
System Call Interface
- x86_64: Syscalls use `RAX` for the syscall number and `RDI`, `RSI`, `RDX`, etc. for arguments. The `syscall` instruction is used to transition to kernel mode.
- AArch64: Syscalls use `X8` for the syscall number and `X0`–`X5` for arguments. The `SVC #0` instruction is used to invoke system calls.
4. Optional Challenge: Multiplication Tables
For an extra challenge, I implemented a times tables program in AArch64 assembly. This program uses nested loops to calculate and print the product of numbers from 1 to 12. The approach was:
- Nested Loops: The outer loop iterates over multipliers (1–12), and the inner loop iterates over multiplicands (1–12).
- Arithmetic Operations: Multiply the loop indices to get the product, then convert the product to a formatted string.
- Formatted Output: Use subroutines to format numbers with proper spacing and leading-zero suppression.
- Separators: Print a line (e.g., “-------------”) between each table.
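The times-table structure described above can be sketched in C. This is not the author’s AArch64 program: the 3-character right-aligned field, the “a x b = p” line layout, and the helper names `format3` and `print_tables` are assumptions, and `printf`/`puts` stand in for the `write` syscalls the assembly would use.

```c
#include <stdio.h>

/* Hypothetical formatter: right-aligns n (assumed 0-999) in a 3-character
 * field, replacing leading zeros with spaces. buf must hold 4 bytes. */
void format3(unsigned n, char *buf) {
    buf[0] = (char)(n >= 100 ? '0' + n / 100 : ' ');
    buf[1] = (char)(n >= 10 ? '0' + (n / 10) % 10 : ' ');
    buf[2] = (char)('0' + n % 10);      /* ones digit always printed */
    buf[3] = '\0';
}

/* Nested loops: outer over multipliers, inner over multiplicands. */
void print_tables(void) {
    char a[4], b[4], p[4];
    for (unsigned i = 1; i <= 12; i++) {          /* multiplier   */
        for (unsigned j = 1; j <= 12; j++) {      /* multiplicand */
            format3(j, a);
            format3(i, b);
            format3(i * j, p);                    /* product      */
            printf("%s x %s = %s\n", a, b, p);
        }
        puts("-------------");                    /* separator between tables */
    }
}
```

Padding with spaces instead of zeros handles spacing and leading-zero suppression in one step; an interior zero (as in 105) is still printed, because only digits to the left of the first non-zero digit are replaced.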
Conclusion and Reflections
Lab 5 was an intense and rewarding journey into the heart of assembly programming. Here are some key takeaways:
- Understanding at the Machine Level: Writing assembly forced me to think in terms of individual instructions, registers, and syscalls. There’s no hiding behind abstractions; you directly control every operation.
- Debugging and Verification: Tools like `objdump`, `gcc -S`, and `gdb` were indispensable. They provided multiple perspectives on the code, from static binary structure to dynamic register states.
- Architectural Differences: Comparing x86_64 and AArch64 taught me that while both are 64-bit, their design philosophies differ. AArch64’s regular, RISC-based design contrasts with x86_64’s complex, CISC-based design, affecting everything from loop construction to system call invocation.
Source Code Links:
Written by
data:image/s3,"s3://crabby-images/8ffe5/8ffe597b4dc06ccb349371f673748aff4cabc64b" alt="Samarth Sharma"
Samarth Sharma
Looping around thinking to write it down...