Introduction to 64-bit Assembly
In this blog, I am going to talk about the x86 and AArch64 assemblers and their basic instructions, and we are going to code a simple loop that prints its index. Seems like a simple task, right? Guess again, as we are going to do it in assembly language for two different architectures (x86 and AArch64). It is going to be an awful lot of code to print a simple index, so bear with me until the end.
AArch64 Assembler
.text
.globl _start
min = 0 /* starting value for the loop index; **note that this is a symbol (constant)**, not a variable */
max = 10 /* loop exits when the index hits this number (loop condition is i<max) */
_start:
mov x19, min
loop:
/* ... body of the loop ... do something useful here ... */
add x19, x19, 1
cmp x19, max
b.ne loop
mov x0, 0 /* status -> 0 */
mov x8, 93 /* exit is syscall #93 */
svc 0 /* invoke syscall */
AArch64 is a Reduced Instruction Set Computer (RISC) architecture. In simple words, each instruction is simple and does one task at a time. The code above is a fairly simple piece of AArch64 assembly that loops 10 times but doesn't do anything inside the loop. The program starts at the label _start, and in the last 3 lines of code we use a syscall to exit the program. Notice that our program starts at _start, but if we build with the C compiler instead of calling the assembler directly, the entry point we write is the label main. Enough talking, now let's get this code to print "Loop" 10 times. We will be using this quick start guide as our reference.
To do that, we define our message, "Loop", in the .data section. Then we load the x0 register with 1, the file descriptor for stdout; x1 with the address of our message; x2 with the length of the message; and x8 with 64, the syscall number for write. Finally, we invoke the system call with svc 0, which prints the message to the screen.
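As a rough cross-check of that register setup, here is a minimal Python sketch (not part of the original lab) that makes the same kind of call through os.write, which on Linux wraps the write system call. The names MSG and print_loop are made up for illustration, and the fd parameter is an assumption added so the function can be pointed at something other than stdout.

```python
import os

MSG = b"Loop\n"  # same bytes as the msg label in .data


def print_loop(count, fd=1):
    """Write MSG to file descriptor fd `count` times; return total bytes written."""
    total = 0
    for _ in range(count):
        # os.write(fd, data) plays the role of x0 = fd, x1 = address, x2 = length
        total += os.write(fd, MSG)
    return total
```

Calling print_loop(10) should write the message to stdout ten times, which is what the assembly listing below is meant to do.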
.text
.globl _start
min = 0 /* starting value for the loop index; **note that this is a symbol (constant)**, not a variable */
max = 10 /* loop exits when the index hits this number (loop condition is i<max) */
_start:
mov x19, min
loop:
/* Print the Message */
mov x0, 1 /* file descriptor: 1 is stdout */
adr x1, msg /* message location (memory address) */
mov x2, len /* message length (bytes) */
mov x8, 64 /* write is syscall #64 */
svc 0 /* invoke syscall */
/* Continue the Loop */
add x19, x19, 1
cmp x19, max
b.ne loop
/* Exit the Program */
mov x0, 0 /* status -> 0 */
mov x8, 93 /* exit is syscall #93 */
svc 0 /* invoke syscall */
.data
msg: .ascii "Loop\n"
len= . - msg
Here is the output of the above code:
Now let's print the index number along with "Loop", it should look something like this:
To do that, we first add a filler character to our message (we are using '#'), and inside the loop we replace this filler with our index. First we have to convert the index to a character, which we can do simply by adding 48 (the ASCII code for '0') to its value. Here is the assembly code that prints the loop index.
.text
.globl _start
min = 0 /* starting value for the loop index; **note that this is a symbol (constant)**, not a variable */
max = 10 /* loop exits when the index hits this number (loop condition is i<max) */
_start:
mov x19, min
loop:
/* Convert Loop counter to char */
add x15, x19, '0'
adr x14, msg+6
strb w15, [x14]
/* Print the Message */
mov x0, 1 /* file descriptor: 1 is stdout */
adr x1, msg /* message location (memory address) */
mov x2, len /* message length (bytes) */
mov x8, 64 /* write is syscall #64 */
svc 0 /* invoke syscall */
/* Continue the Loop */
add x19, x19, 1
cmp x19, max
b.ne loop
/* Exit the Program */
mov x0, 0 /* status -> 0 */
mov x8, 93 /* exit is syscall #93 */
svc 0 /* invoke syscall */
.data
msg: .ascii "Loop: #\n"
len= . - msg
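The conversion in this listing can be modelled in a few lines of Python. This is only a sketch for checking the expected bytes before assembling; index_to_char and render are hypothetical helper names, and the template and offset simply mirror msg and msg+6.

```python
def index_to_char(i):
    """Add the code for '0' (48 decimal, 0x30 hex) to the index, as the add instruction does."""
    return chr(i + ord("0"))


def render(i, template=b"Loop: #\n", offset=6):
    """Return the message with the filler at `offset` replaced by the index character."""
    buf = bytearray(template)
    buf[offset] = i + ord("0")  # like strb w15, [x14] with x14 = msg+6
    return bytes(buf)
```

For indexes 0 to 9 this gives the digits '0' to '9', but index_to_char(10) is ':' (the character after '9' in ASCII), so the trick only works for a single digit.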
The above code has a limitation: it can only print a single-digit index. Let's modify our code so that it can print numbers up to 30. We can do that by dividing our index by 10: the quotient is the first digit and the remainder is the second digit. The only problem is that the udiv instruction in AArch64 doesn't calculate the remainder, so we will use the msub (multiply-subtract) instruction to calculate it as index - 10 * quotient.
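To make the arithmetic concrete, here is a small Python model of the two instructions. It is a sketch under simplifying assumptions: the function names just borrow the mnemonics, the values are plain non-negative integers, and 64-bit wraparound is ignored.

```python
def udiv(n, m):
    """Unsigned divide, quotient only, like udiv Xd, Xn, Xm."""
    return n // m


def msub(n, m, a):
    """Multiply-subtract, a - n*m, like msub Xd, Xn, Xm, Xa."""
    return a - n * m


def two_digits(index, base=10):
    """Split an index into (first digit, second digit) the way the listing does."""
    q = udiv(index, base)     # x11 = x10 / x12
    r = msub(base, q, index)  # x13 = x10 - x12 * x11
    return q, r
```

For example, two_digits(27) works out as 27 // 10 = 2 and 27 - 10 * 2 = 7.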
.text
.globl _start
min = 0 /* starting value for the loop index; **note that this is a symbol (constant)**, not a variable */
max = 31 /* loop exits when the index hits this number (loop condition is i<max) */
div = 10
_start:
mov x19, min
loop:
/* Initializing the registers */
mov x10, x19
mov x12, div
/* Calculating the Quotient (First Digit) */
udiv x11, x10, x12
/* Calculating the Remainder (Second Digit) */
msub x13, x12, x11, x10
/* Convert the quotient to a char and replace the First Digit in the message */
add x15, x11, '0'
adr x14, msg+6
strb w15, [x14]
/* Convert the remainder to a char and replace the Second Digit in the message */
add x16, x13, '0'
adr x17, msg+7
strb w16, [x17]
/* Print the Message */
mov x0, 1 /* file descriptor: 1 is stdout */
adr x1, msg /* message location (memory address) */
mov x2, len /* message length (bytes) */
mov x8, 64 /* write is syscall #64 */
svc 0 /* invoke syscall */
/* Continue the Loop */
add x19, x19, 1
cmp x19, max
b.ne loop
/* Exit the Program */
mov x0, 0 /* status -> 0 */
mov x8, 93 /* exit is syscall #93 */
svc 0 /* invoke syscall */
.data
msg: .ascii "Loop: ##\n" /* two filler characters, at msg+6 and msg+7, so the newline is not overwritten */
len= . - msg
Here is the output of the above code:
Let's make this output a little prettier by suppressing the leading zero. To do that, we use the cmp and b.eq instructions to create a conditional branch that skips straight to storing the second digit if the first digit is zero.
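The branch logic can be sketched in Python as follows. This is an illustration only: render_line is a hypothetical name, and the template assumes the message reserves two blank slots at offsets 6 and 7 for the digits. Unlike the assembly, which reuses one buffer across iterations, this sketch starts from a fresh copy each time; the results match because the first digit never goes back to zero as the index grows.

```python
def render_line(index, template=b"Loop:" + b" " * 3 + b"\n"):
    """Return the message for one index, leaving the first digit blank when it is zero."""
    buf = bytearray(template)
    tens, ones = divmod(index, 10)
    if tens != 0:                # cmp x11, 0 / b.eq second skips this store
        buf[6] = tens + ord("0")
    buf[7] = ones + ord("0")     # the second digit is always stored
    return bytes(buf)
```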
.text
.globl _start
min = 0 /* starting value for the loop index; **note that this is a symbol (constant)**, not a variable */
max = 31 /* loop exits when the index hits this number (loop condition is i<max) */
div = 10
_start:
mov x19, min
loop:
/* Convert Loop counter to char */
mov x10, x19
mov x12, div
udiv x11, x10, x12
msub x13, x12, x11, x10
cmp x11, 0
b.eq second
first:
add x15, x11, '0'
adr x14, msg+6
strb w15, [x14]
second:
add x16, x13, '0'
adr x17, msg+7
strb w16, [x17]
/* Print the Message */
mov x0, 1 /* file descriptor: 1 is stdout */
adr x1, msg /* message location (memory address) */
mov x2, len /* message length (bytes) */
mov x8, 64 /* write is syscall #64 */
svc 0 /* invoke syscall */
/* Continue the Loop */
add x19, x19, 1
cmp x19, max
b.ne loop
/* Exit the Program */
mov x0, 0 /* status -> 0 */
mov x8, 93 /* exit is syscall #93 */
svc 0 /* invoke syscall */
.data
msg: .ascii "Loop:   \n" /* "Loop:" followed by three spaces; msg+6 and msg+7 are blank digit slots */
len= . - msg
Here is the output of this code:
Now let's jump to the x86 assembler.
x86 Assembler
We are going to code the same loop, but this time with the x86 assembler. Here is the starter code, which is just a simple loop that prints nothing. We will be using this quick start guide as our reference.
.text
.globl _start
min = 0 /* starting value for the loop index; **note that this is a symbol (constant)**, not a variable */
max = 10 /* loop exits when the index hits this number (loop condition is i<max) */
_start:
mov $min,%r15 /* loop index */
loop:
/* ... body of the loop ... do something useful here ... */
inc %r15 /* increment index */
cmp $max,%r15 /* see if we're done */
jne loop /* loop if we're not */
mov $0,%rdi /* exit status */
mov $60,%rax /* syscall sys_exit */
syscall
x86 is a Complex Instruction Set Computer (CISC) architecture. In simple words, a single instruction can be complex and do multiple tasks. Now let's print "Loop" using this code. To do that, we add our msg in the .section .data. After that it is pretty much the same as AArch64: we set up the registers with the file descriptor, the address of the message, and the length of the message, put the syscall number in rax, and finally perform a syscall.
.text
.globl _start
min = 0 /* starting value for the loop index; **note that this is a symbol (constant)**, not a variable */
max = 10 /* loop exits when the index hits this number (loop condition is i<max) */
_start:
mov $min,%r15 /* loop index */
loop:
/* Print the Message */
movq $len,%rdx /* message length */
movq $msg,%rsi /* message location */
movq $1,%rdi /* file descriptor stdout */
movq $1,%rax /* syscall sys_write */
syscall
inc %r15 /* increment index */
cmp $max,%r15 /* see if we're done */
jne loop /* loop if we're not */
mov $0,%rdi /* exit status */
mov $60,%rax /* syscall sys_exit */
syscall
.section .data
msg: .ascii "Loop #\n"
len = . - msg
Here is the output of the above code:
Now, Let's add an index number in the message by replacing the '#' as before.
.text
.globl _start
min = 0 /* starting value for the loop index; **note that this is a symbol (constant)**, not a variable */
max = 10 /* loop exits when the index hits this number (loop condition is i<max) */
_start:
mov $min,%r15 /* loop index */
loop:
/* Convert Loop Counter to char */
mov %r15, %r14
add $'0', %r14
mov %r14b, msg+6
/* Print the Message */
movq $len,%rdx /* message length */
movq $msg,%rsi /* message location */
movq $1,%rdi /* file descriptor stdout */
movq $1,%rax /* syscall sys_write */
syscall
/* Continue the Loop */
inc %r15 /* increment index */
cmp $max,%r15 /* see if we're done */
jne loop /* loop if we're not */
/* Exit the program */
mov $0,%rdi /* exit status */
mov $60,%rax /* syscall sys_exit */
syscall
.section .data
msg: .ascii "Loop: #\n"
len = . - msg
Here is the output of the above code:
Now let's extend this loop to print numbers up to 30. Luckily, as x86 is a CISC architecture, the div instruction does calculate the remainder, unlike udiv on AArch64. It divides the 128-bit value held in rdx:rax by its operand, so we set rdx to 0 before dividing. Afterwards, the first digit, which is the quotient, is stored in the rax register, and the second digit, which is the remainder, is stored in the rdx register.
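Here is a small Python model of what the 64-bit unsigned div does, to show why rdx has to be cleared first. It is a sketch, not an emulator: div64 and MASK64 are made-up names, and the overflow case is reported as a Python exception where the CPU would raise a divide error.

```python
MASK64 = (1 << 64) - 1


def div64(rdx, rax, divisor):
    """Divide the 128-bit value rdx:rax by divisor; return (new rax, new rdx)."""
    dividend = (rdx << 64) | rax
    quotient, remainder = divmod(dividend, divisor)
    if quotient > MASK64:
        # The quotient must fit in rax; the CPU raises a divide error otherwise.
        raise OverflowError("quotient does not fit in 64 bits")
    return quotient, remainder
```

With rdx cleared, div64(0, 27, 10) gives (2, 7). If rdx still held a leftover value such as the message length from the previous write, the dividend would be enormous and the quotient would no longer be the first digit.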
.text
.globl _start
min = 0 /* starting value for the loop index; **note that this is a symbol (constant)**, not a variable */
max = 31 /* loop exits when the index hits this number (loop condition is i<max) */
div = 10
_start:
mov $min,%r15 /* loop index */
loop:
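/* Initialize the Registers */
mov $0, %rdx /* clear the upper half of the dividend (rdx:rax) */
mov %r15, %rax /* lower half of the dividend: the loop index */
mov $div, %r13 /* divisor */
/* Perform the Division */
div %r13
/* Move the Quotient and Remainder to different registers */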
mov %rax, %r12
mov %rdx, %r11
/* Replace the First Digit in the message */
add $'0', %r12
mov %r12b, msg+6
/* Replace the Second Digit in the message */
add $'0', %r11
mov %r11b, msg+7
/* Print the Message */
movq $len,%rdx /* message length */
movq $msg,%rsi /* message location */
movq $1,%rdi /* file descriptor stdout */
movq $1,%rax /* syscall sys_write */
syscall
/* Continue the Loop */
inc %r15 /* increment index */
cmp $max,%r15 /* see if we're done */
jne loop /* loop if we're not */
/* Exit the program */
mov $0,%rdi /* exit status */
mov $60,%rax /* syscall sys_exit */
syscall
.section .data
msg: .ascii "Loop: ##\n"
len = . - msg
Here is the output of the above program:
Now let's remove the leading zeros. We do that by using the cmp and je instructions.
.text
.globl _start
min = 0 /* starting value for the loop index; **note that this is a symbol (constant)**, not a variable */
max = 31 /* loop exits when the index hits this number (loop condition is i<max) */
div = 10
_start:
mov $min,%r15 /* loop index */
loop:
/* Convert Loop Counter to char */
mov $0, %rdx
mov %r15, %rax
mov $div, %r13
div %r13
mov %rax, %r12
mov %rdx, %r11
cmp $0, %r12
je second
first:
add $'0', %r12
mov %r12b, msg+6
second:
add $'0', %r11
mov %r11b, msg+7
/* Print the Message */
movq $len,%rdx /* message length */
movq $msg,%rsi /* message location */
movq $1,%rdi /* file descriptor stdout */
movq $1,%rax /* syscall sys_write */
syscall
/* Continue the Loop */
inc %r15 /* increment index */
cmp $max,%r15 /* see if we're done */
jne loop /* loop if we're not */
/* Exit the program */
mov $0,%rdi /* exit status */
mov $60,%rax /* syscall sys_exit */
syscall
.section .data
msg: .ascii "Loop:   \n" /* "Loop:" followed by three spaces; msg+6 and msg+7 are blank digit slots */
len = . - msg
My Experience
When coding in assembler, surprisingly, I was never stuck on a single error for a long time, which does happen when I code in other languages, maybe because I was accomplishing a simple task. My existing knowledge of the 6502 processor really helped me understand both assembly languages. It made me comfortable working with registers instead of variables and labels instead of functions. One thing that I was happy about when coding in 64-bit assembly was that I no longer had to store values in memory and load them back into a register for mathematical operations, as I had far more registers to work with (16 general-purpose registers on x86_64 and 31 on AArch64) than the three (A, X, and Y) on the 6502.
If I had to choose between x86 and AArch64, I would go with AArch64, as both the instructions and the syntax are very simple. Debugging in AArch64 is also easier than in x86, as there are very few assembler errors; at least I did not face any during this lab. There were some semantic errors in my code, though. For instance, I was adding 30 to the index value to convert it to a character, but I didn't realize that I had to add 30 in hexadecimal (0x30, which is 48 in decimal), not 30 in decimal. These were some of the minor errors that I faced during this lab. As always, all the above code can be found here. Please feel free to propose changes by opening a pull request.
Sources:
Tyler, C. (n.d.). Software portability and optimization. matrix.senecapolytechnic.ca. matrix.senecapolytechnic.ca/~chris.tyler/wi..
Written by Steven David Pillay