Introduction

In Lab 5, we explored the fundamentals of 64-bit assembly programming by implementing simple loop-based programs on both x86_64 and AArch64 platforms. The task was to write assembly code that prints a message repeatedly, converts the loop index to its ASCII representation (both decimal and hexadecimal), and formats the output with or without leading zeros.

1. Lab Setup

Extracting the Examples

We were provided with the archive /public/spo600-assembler-lab-examples.tgz, which we unpacked using:

cd ~
tar xvf /public/spo600-assembler-lab-examples.tgz

This produced a directory structure containing both C and assembly versions of “Hello World” programs for x86_64 and AArch64.

2. Writing Assembly Programs

First, we created programs that print a looped message with numeric output. This gave us hands-on practice with manually selecting registers, setting up initial values, and carefully crafting conditional branches.

A. Looping Structures and Register Management

In our programs, we used a dedicated register as a counter:

x86_64: We used, for example, %r15 as the loop counter. The loop is implemented by incrementing %r15 and using cmp followed by a conditional jump (jl for “jump if less”) to repeat the loop.
AArch64: We used a general-purpose register (e.g., x19) as the loop counter. A CMP followed by a conditional branch (B.LT or B.NE) forms the core of the loop. AArch64’s uniform register file simplifies management: any general-purpose register can serve as the counter, and the choice between the 32-bit W and 64-bit X views of a register (W0 vs. X0) only matters when operand size does.
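
The loop shape described above (body first, then increment, compare, and branch back) can be modelled in C as a bottom-tested loop. This is only a sketch for illustration: the function name `run_loop` and its parameters are hypothetical, and the assembly mnemonics in the comments show roughly which instruction each C line stands in for.

```c
#include <assert.h>

/* Count from `start` up to (but not including) `limit`, the way the
   assembly loop does: body first, then increment, compare, branch back.
   Returns how many times the body ran. */
long run_loop(long start, long limit)
{
    assert(start < limit);   /* a bottom-tested loop always runs at least once */

    long index = start;      /* plays the role of %r15 (x86_64) or x19 (AArch64) */
    long iterations = 0;

    do {
        iterations++;        /* loop body; the real program prints here */
        index++;             /* inc %r15          /  add x19, x19, #1 */
    } while (index < limit); /* cmp, then jl loop /  cmp, then b.lt loop */

    return iterations;
}
```

Calling `run_loop(0, 6)` runs the body six times, for index values 0 through 5.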

B. System Calls and Data Output

Because we avoided the C library, output was handled with direct Linux syscalls. The two architectures invoke them differently:

x86_64:
- Syscall Number: Placed in RAX (e.g., 1 for write, 60 for exit).
- Arguments: Passed in registers %rdi, %rsi, %rdx, etc.
- Invocation: The syscall instruction triggers the kernel.
AArch64:
- Syscall Number: Placed in X8 (e.g., 64 for write, 93 for exit).
- Arguments: Passed in X0–X5 in order.
- Invocation: SVC #0 (Supervisor Call) invokes the kernel.

For example, to print a string in AArch64, we set up the registers as follows:

mov     x0, #1         // stdout file descriptor
adr     x1, <buffer>   // pointer to our message (adr loads the address)
mov     x2, #<length>  // message length
mov     x8, #64        // syscall number for write
svc     #0             // perform syscall

This is a stark contrast to higher-level I/O functions, which manage these details for us.
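
For comparison, the same raw write can be issued from C through the `syscall()` wrapper that glibc provides on Linux. This is a minimal sketch and not part of the lab code: the helper name `raw_write_stdout` is hypothetical, and it assumes a Linux system where `SYS_write` is defined in `<sys/syscall.h>`.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Write `msg` to stdout with the raw write syscall, bypassing stdio.
   Returns the number of bytes written, or -1 on error. */
long raw_write_stdout(const char *msg)
{
    assert(msg != NULL);

    /* SYS_write is 1 on x86_64 and 64 on AArch64; the header supplies
       the right number for the architecture being compiled for. */
    return syscall(SYS_write,
                   1,            /* file descriptor: stdout */
                   msg,          /* pointer to the message  */
                   strlen(msg)); /* message length in bytes */
}
```

The three arguments map onto %rdi, %rsi, %rdx on x86_64 and X0, X1, X2 on AArch64, which is exactly the setup the assembly version performs by hand.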

C. Converting Numbers to ASCII

A major challenge was converting a loop index (an integer) into its ASCII representation. The process involves:

Division and Remainder Calculation:
- x86_64: The div instruction divides the 128-bit dividend in RDX:RAX by its operand (e.g., a register holding 10), leaving the quotient in RAX and the remainder in RDX. Before using div, you must zero RDX (using xor %rdx, %rdx) so that stale upper bits do not corrupt the result.
- AArch64: The UDIV instruction only provides the quotient. We then compute the remainder by multiplying the quotient by the divisor and subtracting the result from the original value (a single MSUB can combine these two steps):
```
  udiv    x1, x19, x3        // x1 = quotient (number / 10)
  mul     x2, x1, x3         // x2 = quotient * divisor
  sub     x2, x19, x2        // x2 = remainder
```
ASCII Conversion:
For each digit (0–9), adding the immediate 0x30 (48 in decimal) converts it to its corresponding ASCII code. For hexadecimal digits (10–15), adding 0x37 converts them to ‘A’–‘F’.
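
The quotient, remainder, and offset steps above can be modelled in C as follows. This is a sketch under stated assumptions: the function names `digit_to_ascii` and `to_two_digits` are hypothetical, and the remainder is computed as value minus quotient times base to mirror the AArch64 multiply-and-subtract sequence.

```c
#include <assert.h>

/* Convert a digit value 0..15 to its ASCII character by adding an offset. */
char digit_to_ascii(unsigned digit)
{
    assert(digit < 16);
    return (char)(digit < 10 ? digit + 0x30   /* 0..9   -> '0'..'9' */
                             : digit + 0x37); /* 10..15 -> 'A'..'F' */
}

/* Split `value` into two digits in the given base (10 or 16) and store
   them as a NUL-terminated string in `out`. */
void to_two_digits(unsigned value, unsigned base, char out[3])
{
    assert(base == 10 || base == 16);
    assert(value < base * base);                  /* must fit in two digits */

    unsigned quotient  = value / base;            /* div / udiv */
    unsigned remainder = value - quotient * base; /* RDX after div, or mul + sub */

    out[0] = digit_to_ascii(quotient);
    out[1] = digit_to_ascii(remainder);
    out[2] = '\0';
}
```

For example, the value 30 becomes "30" in base 10 and "1E" in base 16.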

D. Conditional Branching and Zero Suppression

To avoid printing leading zeros, we implemented conditional logic:

x86_64: We used cmp $0, %rax followed by je .no_tens to skip printing the tens digit when it is zero.
AArch64: We used CMP x12, #0 followed by B.EQ no_tens to branch over the code that prints the tens digit when it is zero.
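
The branch-over-the-tens-digit logic can be sketched in C like this. The function name `format_no_leading_zero` is hypothetical, and the sketch assumes a loop index below 100; the `if` stands in for the compare and conditional branch to `no_tens`.

```c
#include <assert.h>
#include <stddef.h>

/* Store `value` (0..99) in decimal in `out` without a leading zero.
   Returns the number of characters stored (1 or 2); `out` is NUL-terminated. */
size_t format_no_leading_zero(unsigned value, char out[3])
{
    assert(value < 100);

    unsigned tens = value / 10;
    unsigned ones = value % 10;
    size_t len = 0;

    if (tens != 0) {                     /* cmp with 0, then je / b.eq no_tens */
        out[len++] = (char)('0' + tens); /* emitted only for 10..99 */
    }
    out[len++] = (char)('0' + ones);     /* no_tens: the ones digit is always emitted */
    out[len] = '\0';
    return len;
}
```

The returned length matters in the assembly version too, because it becomes the byte count passed to the write syscall.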

E. Debugging Techniques

Each debugging tool provided a different view into the program’s execution: objdump confirms the static structure of the binary, while gdb shows register state as the program runs.

3. Architectural Comparison: x86_64 vs. AArch64

The differences below illustrate the evolution of CPU design: AArch64’s uniform register file and simpler, more orthogonal instructions contrast with the legacy complexity of x86_64.

Register Architecture

x86_64:
Uses 16 general-purpose registers (RAX, RBX, …, R15). Instructions in AT&T syntax require $ for immediates and % for registers. Division is complex; for example, div uses RDX:RAX.
AArch64:
Provides 31 general-purpose registers (X0–X30) with a more uniform naming scheme. AArch64 instructions have a fixed 32-bit width, and registers are used consistently (e.g., X0–X7 for function arguments). The system call interface is similarly regular: X8 for the syscall number and X0–X5 for arguments.

Instruction Set and Syntax

x86_64:
Rich, complex instruction set with variable-length instructions. Memory addressing is flexible (e.g., using base + index*scale). The AT&T syntax’s source-first, destination-second ordering contrasts with Intel’s more natural destination-first syntax.
AArch64:
A true RISC design, with fixed-length instructions and a load-store architecture (memory operations and arithmetic operations are separate). It uses a more straightforward operand order (destination first) and does not require prefixes like % or $.

System Call Interface

x86_64:
Syscalls use RAX for the syscall number and RDI, RSI, RDX, etc. for arguments. The instruction syscall is used to transition to kernel mode.
AArch64:
Syscalls use X8 for the syscall number and X0–X5 for arguments. The SVC #0 instruction is used for invoking system calls.

4. Optional Challenge: Multiplication Tables

For an extra challenge, I implemented a times tables program in AArch64 assembly. This program uses nested loops to calculate and print the product of numbers from 1 to 12. The approach was:

Nested Loops:
The outer loop iterates over multipliers (1–12), and the inner loop iterates over multiplicands (1–12).
Arithmetic Operations:
Multiply the loop indices to get the product, then convert the product to a formatted string.
Formatted Output:
Use subroutines to format numbers with proper spacing and leading-zero suppression.
Separators:
Print a line (e.g., “-------------”) between each table.
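
The overall structure can be sketched in C as shown below. This is a model of the approach and not the AArch64 program itself: the names `format_row` and `print_tables` and the exact row layout are assumptions, and `snprintf` stands in for the hand-written formatting subroutines.

```c
#include <assert.h>
#include <stdio.h>

#define TABLE_MAX 12u

/* Format one row, right-aligning each number with spaces instead of zeros.
   Returns the number of characters written (excluding the terminator). */
int format_row(unsigned multiplicand, unsigned multiplier, char *out, size_t size)
{
    assert(multiplicand >= 1 && multiplicand <= TABLE_MAX);
    assert(multiplier >= 1 && multiplier <= TABLE_MAX);
    return snprintf(out, size, "%2u x %2u = %3u",
                    multiplicand, multiplier, multiplicand * multiplier);
}

/* Print all twelve tables, with a separator line between consecutive tables.
   Returns the number of rows printed. */
unsigned print_tables(void)
{
    char line[32];
    unsigned rows = 0;

    for (unsigned multiplier = 1; multiplier <= TABLE_MAX; multiplier++) {       /* outer loop */
        for (unsigned multiplicand = 1; multiplicand <= TABLE_MAX; multiplicand++) { /* inner loop */
            format_row(multiplicand, multiplier, line, sizeof line);
            puts(line);
            rows++;
        }
        if (multiplier < TABLE_MAX) {
            puts("-------------"); /* separator only between tables */
        }
    }
    return rows;
}
```

In the assembly version, each loop counter needs its own register, and the inner counter has to be reset every time the outer loop advances.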

Reflections

Lab 5 was intense and rewarding. Writing assembly forced me to think in terms of individual instructions, registers, and syscalls. There’s no hiding behind abstractions: you directly control every operation.

Source Code Links:

Diving into the Machine: Lab 5