Diving into the Machine: Lab 5

Samarth Sharma

Introduction

In Lab 5, we delved into the fundamentals of 64-bit assembly programming by implementing simple loop-based programs on both x86_64 and AArch64 platforms. Our tasks included writing assembly code to print a message repeatedly, convert loop index values to their ASCII representations (both decimal and hexadecimal), and format the output with or without leading zeros.


1. Lab Setup

Extracting the Examples

The lab examples were provided in the archive /public/spo600-assembler-lab-examples.tgz. Unpacking was done using:

cd ~
tar xvf /public/spo600-assembler-lab-examples.tgz

This yielded a directory structure with both C and assembly versions of “Hello World” programs for x86_64 and AArch64.


2. Writing Assembly Programs

Our primary goal was to create programs that print a looped message with numeric output. We designed our loops by hand: selecting registers, setting up initial values, and carefully crafting conditional branches.

A. Looping Structures and Register Management

Both architectures require explicit loop construction. In our programs, we used a dedicated register as a counter:

  • x86_64: We used a register such as %r15 as the loop counter. The loop increments %r15, compares it against the limit with cmp, and repeats via a conditional jump (jl, “jump if less”).

  • AArch64: We used a general-purpose register (e.g., x19) as the loop counter. A CMP followed by a conditional branch (B.LT or B.NE) forms the core of the loop. The uniform register file simplifies bookkeeping. The only partial-register distinction is between the 64-bit X and 32-bit W views of the same register (X0 vs. W0), and that matters only when operand size does.


B. System Calls and Data Output

Since we avoided C libraries, output was handled using direct Linux syscalls. The syscalls are invoked differently on each platform:

  • x86_64:

    • Syscall Number: Placed in RAX (e.g., 1 for write, 60 for exit).

    • Arguments: Passed in %rdi, %rsi, %rdx, %r10, %r8, and %r9, in that order.

    • Invocation: The syscall instruction triggers the kernel.

  • AArch64:

    • Syscall Number: Placed in X8 (e.g., 64 for write, 93 for exit).

    • Arguments: Passed in X0–X5 in order.

    • Invocation: SVC #0 (Supervisor Call) invokes the kernel.

Technical Detail:
For example, to print a string in AArch64, we set up the registers as follows:

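mov     x0, #1         // stdout file descriptor
adr     x1, msg        // address of our message (mov cannot load a label's address)
mov     x2, #len       // message length in bytes
mov     x8, #64        // syscall number for write
svc     #0             // perform syscall

.data
msg:    .ascii  "Loop\n"   // example message
len =   . - msg            // assembler computes the length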

This stands in stark contrast to higher-level I/O functions such as printf, which handle the register setup and the system call for you.


C. Converting Numbers to ASCII

A major challenge was converting a loop index (an integer) into its ASCII representation. The process involves:

  1. Division and Remainder Calculation:

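    • x86_64: The div instruction divides the 128-bit dividend in RDX:RAX by a register or memory operand (e.g., a register holding 10), leaving the quotient in RAX and the remainder in RDX. Before using div, you must zero out RDX (using xor %rdx, %rdx). Otherwise leftover bits in RDX become the high half of the dividend, giving a wrong result or a divide error if the quotient no longer fits in 64 bits.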

    • AArch64: The UDIV instruction only provides the quotient, so we use MSUB to compute the remainder (MSUB Xd, Xn, Xm, Xa computes Xa - Xn * Xm):

        mov     x3, #10            // divisor
        udiv    x1, x19, x3        // x1 = quotient (number / 10)
        msub    x2, x1, x3, x19    // x2 = x19 - (x1 * x3) = remainder
      
  2. ASCII Conversion:
    For each digit (0–9), adding the immediate 0x30 (48 in decimal) converts it to its corresponding ASCII code. For hexadecimal digits (10–15), adding 0x37 converts them to ‘A’–‘F’.


D. Conditional Branching and Zero Suppression

To avoid printing leading zeros, we implemented conditional logic:

  • x86_64: After dividing by 10, the tens digit (the quotient) is in %rax, so we used cmp $0, %rax followed by je .no_tens to skip printing it when it is zero.

  • AArch64: We executed CMP x12, #0 and B.EQ no_tens to branch over the code that prints the tens digit if it is zero.


E. Debugging Techniques

Each debugging tool provided a different view into the program’s execution. objdump -d confirms the static structure of the binary (the instructions that were actually assembled), and gdb shows register states as the program runs. Comparing against gcc-generated assembly (gcc -S) is a sanity check that the low-level logic is sound.


3. Architectural Comparison: x86_64 vs. AArch64

These differences illustrate the evolution of CPU design: AArch64’s uniform register file and simpler, more orthogonal instructions contrast with the legacy and complexity inherent in x86_64.

Register Architecture

  • x86_64:
    Uses 16 general-purpose registers (RAX, RBX, …, R15). Instructions in AT&T syntax require $ for immediates and % for registers. Division relies on implicit operands: div takes its dividend from RDX:RAX and writes the quotient to RAX and the remainder to RDX.

  • AArch64:
    Provides 31 general-purpose registers (X0–X30) with a more uniform naming scheme. Every instruction is a fixed 32 bits wide, and registers are used consistently (e.g., X0–X7 for function arguments). The system call interface is simpler: X8 for the syscall number and X0–X5 for arguments.

Instruction Set and Syntax

  • x86_64:
    Rich, complex instruction set with variable-length instructions. Memory addressing is flexible (e.g., using base + index*scale). The AT&T syntax’s source-first, destination-second ordering contrasts with Intel’s more natural destination-first syntax.

  • AArch64:
    A true RISC design, with fixed-length instructions and a load-store architecture (only loads and stores access memory; arithmetic operates on registers). It uses a more straightforward operand order (destination first) and does not need the % and $ prefixes, although immediates are conventionally written with #.

System Call Interface

  • x86_64:
    Syscalls use RAX for the syscall number and RDI, RSI, RDX, etc. for arguments. The instruction syscall is used to transition to kernel mode.

  • AArch64:
    Syscalls use X8 for the syscall number and X0–X5 for arguments. The SVC #0 instruction is used for invoking system calls.


4. Optional Challenge: Multiplication Tables

For an extra challenge, I implemented a times tables program in AArch64 assembly. This program uses nested loops to calculate and print the product of numbers from 1 to 12. The approach was:

  • Nested Loops:
    The outer loop iterates over multipliers (1–12), and the inner loop iterates over multiplicands (1–12).

  • Arithmetic Operations:
    Multiply the loop indices to get the product, then convert the product to a formatted string.

  • Formatted Output:
    Use subroutines to format numbers with proper spacing and leading-zero suppression.

  • Separators:
    Print a line (e.g., “-------------”) between each table.


Conclusion and Reflections

Lab 5 was an intense and rewarding journey into the heart of assembly programming. Here are some key takeaways:

  • Understanding at the Machine Level:
    Writing assembly forced me to think in terms of individual instructions, registers, and syscalls. There’s no hiding behind abstractions—you directly control every operation.

  • Debugging and Verification:
    Tools like objdump, gcc -S, and gdb were indispensable. They provided multiple perspectives on the code, from static binary structure to dynamic register states.

  • Architectural Differences:
    Comparing x86_64 and AArch64 taught me that while both are 64-bit, their design philosophies differ. AArch64’s regular, RISC-based design contrasts with the complex, CISC-based x86_64, affecting everything from loop construction to system call invocation.

Source Code Links:
