Structure of an Assembly Program
Assembly Program
The assembly program in this example copies one string to another. An assembly program includes:
labels
directives
assembly instructions
program comments
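The listing itself is not reproduced in this text. A minimal sketch that matches the description in this section might look like the following. The labels strcpy, stop, srcStr, and dstStr and the directives come from the text; the string contents, the area names, and the loop label are assumptions made for illustration.

```asm
        AREA    myData, DATA, READWRITE
        ALIGN
srcStr  DCB     "The source string.", 0        ; assumed text, null-terminated
dstStr  DCB     "The destination string.", 0   ; assumed text, long enough to hold srcStr

        AREA    stringCopy, CODE, READONLY     ; area name is an assumption
        EXPORT  __main                         ; make __main visible to the linker
        ALIGN
        ENTRY
__main  PROC
strcpy  LDR     r1, =srcStr                    ; r1 = address of the source string
        LDR     r0, =dstStr                    ; r0 = address of the destination string
loop    LDRB    r2, [r1], #1                   ; load one byte, then advance the source pointer
        STRB    r2, [r0], #1                   ; store the byte, then advance the destination pointer
        CMP     r2, #0                         ; was it the NULL terminator?
        BNE     loop                           ; if not, copy the next byte
stop    B       stop                           ; stay here forever
        ENDP
        END
```

The loop copies the terminating NULL byte as well, so the destination is a valid string when the program reaches stop.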
Labels
A label, such as strcpy, stop, srcStr, and dstStr, represents the memory address of the data or instruction marked with that label. The assembler replaces each label with its memory address, or its memory address offset, when generating the executable.
A label must start at the beginning of a line, without any leading space.
A label can be a function name (such as "__main"), which is the memory address of the first instruction of the function. The "__main" label is exported so that the linker can find and resolve it.
Directives
Directives provide information that assists the assembler. The example uses the directives PROC and ENDP to declare the start and the end of a function (also called a subroutine).
END indicates the end of an assembly program file. AREA defines code or data regions. ENTRY designates the initial entry into the program. ALIGN specifies the requirement of memory address alignment. DCB allocates and defines data.
Assembly Instructions
An assembly instruction is a machine command that controls the program flow or manipulates data. Some instructions are pseudo instructions, which are not real machine commands but are allowed in assembly language code. The assembler translates a pseudo instruction, such as "LDR r1, =srcStr" in the example code, into a real instruction. Pseudo instructions make the job of writing assembly language programs easier.
Comments
A comment is a text annotation that explains the programmer's intentions or assumptions. It aims to improve communication between programmers and code readability. A comment in an assembly program starts with a semicolon. The assembler ignores everything after the semicolon on that line.
The example has two areas:
Data Area:
The data area defines two strings: srcStr and dstStr. The program allocates memory space for both strings and gives them initial values. The NULL character terminates a string.
Code Area:
The code area includes a function named __main, which is equivalent to the main() function in a C program. This program copies string srcStr to string dstStr.
Cortex-M Assembly Instructions Categories
Most assembly instructions of Cortex-M3 can be classified into the following four categories:
arithmetic, shift, and logic instructions
data movement instructions
compare and branch instructions
miscellaneous instructions for various functions such as debugging
In addition to these instructions, Cortex-M4 and M7 also support:
digital signal processing instructions
floating-point instructions
Arithmetic, Shift, and Logic instructions
Arithmetic Instructions
Addition:
ADD
ADC (add with carry)
Subtraction:
SUB
RSB (reverse subtract)
SBC (subtract with carry)
Multiplication:
MUL
MLA (multiply with accumulate)
MLS (multiply with subtract)
SMULL (signed long multiply)
UMULL (unsigned long multiply)
SMLAL (signed long multiply with accumulate)
UMLAL (unsigned long multiply with accumulate)
Division:
SDIV (signed)
UDIV (unsigned)
Saturation:
SSAT (signed)
USAT (unsigned)
Extension:
SXTB (sign-extend a byte)
SXTH (sign-extend a halfword)
UXTB (zero-extend a byte)
UXTH (zero-extend a halfword)
Bit field Extract:
SBFX (signed extraction)
UBFX (unsigned extraction)
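A few of these instructions are sketched below as isolated fragments, not as a complete program. The register choices are arbitrary assumptions; each comment states what the destination register holds afterwards.

```asm
        ; 64-bit addition: r1:r0 = r1:r0 + r3:r2
        ADDS    r0, r0, r2          ; add the low words and update the carry flag
        ADC     r1, r1, r3          ; add the high words plus the carry

        RSB     r4, r5, #0          ; r4 = 0 - r5 (negate r5)
        MLA     r4, r5, r6, r7      ; r4 = r7 + r5 * r6
        UMULL   r4, r5, r6, r7      ; r5:r4 = r6 * r7 (unsigned 64-bit product)
        SDIV    r4, r5, r6          ; r4 = r5 / r6 (signed, rounded toward zero)

        SSAT    r4, #8, r5          ; r4 = r5 clamped to the range -128..127
        USAT    r4, #8, r5          ; r4 = r5 clamped to the range 0..255
        SXTB    r4, r5              ; r4 = low byte of r5, sign-extended to 32 bits
        UXTH    r4, r5              ; r4 = low halfword of r5, zero-extended to 32 bits
        UBFX    r4, r5, #4, #8      ; r4 = bits 11..4 of r5, zero-extended
```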
Shift Instructions
LSL (logical shift left)
LSR (logical shift right)
ASR (arithmetic shift right)
ROR (rotate right)
RRX (rotate right with extend)
Logic Instructions
AND (bitwise AND)
ORR (bitwise OR)
EOR (bitwise Exclusive OR)
ORN (bitwise OR NOT)
MVN (move NOT or bitwise NOT)
Bit Instructions
BFC (bit field clear)
BFI (bit field insert)
BIC (bit clear)
CLZ (count leading zeros)
RBIT (reverse bit order in a word)
REV (reverse byte order in a word)
REV16 (reverse byte order in each halfword independently)
REVSH (reverse byte order in the bottom halfword, and sign-extend to 32 bits)
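The shift, logic, and bit instructions can be illustrated with the fragments below. This is only a sketch with arbitrarily chosen registers and constants; it is not a complete program.

```asm
        LSL     r0, r1, #2          ; r0 = r1 shifted left by 2 (r1 * 4)
        LSR     r0, r1, #3          ; r0 = r1 shifted right by 3, zeros shifted in
        ASR     r0, r1, #3          ; r0 = r1 shifted right by 3, sign bit replicated
        ROR     r0, r1, #8          ; r0 = r1 rotated right by 8 bits

        AND     r0, r1, #0xFF       ; keep only the low byte of r1
        ORR     r0, r0, #0x80       ; set bit 7
        EOR     r0, r0, r1          ; r0 = r0 XOR r1
        MVN     r0, r1              ; r0 = bitwise NOT of r1

        BIC     r0, r0, #0x0F       ; clear the low 4 bits of r0
        BFI     r0, r1, #8, #4      ; copy the low 4 bits of r1 into bits 11..8 of r0
        CLZ     r2, r0              ; r2 = number of leading zero bits in r0
        REV     r3, r0              ; r3 = r0 with its four bytes in reverse order
```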
Data Movement Instructions
Read Data Memory
LDRB (load byte)
LDRH (load halfword)
LDR (load word)
LDRD (load doubleword)
LDRSB (load signed byte)
LDRSH (load signed halfword)
LDM, LDMDB, LDMFD (load multiple words)
LDREXB, LDREXH, LDREX (load register exclusive with a byte, halfword, and word)
LDRT (load with unprivileged access)
POP (load from stack)
Write Data Memory
STRB (store byte)
STRH (store halfword)
STR (store word)
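STRD (store doubleword)
(There are no STRSB or STRSH instructions: a store writes the low-order byte or halfword of a register, so signed and unsigned values both use STRB and STRH.)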
STM, STMDB, STMFD (store multiple words)
STREXB, STREXH, STREX (store register exclusive with a byte, halfword, and word)
STRT (store with unprivileged access)
PUSH (store into stack)
Data Copy Instructions
MOV (move)
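MOVT (move a 16-bit immediate into the top halfword)
MOVW (move a 16-bit immediate into the bottom halfword and clear the top halfword)
MRS (move from a special register to a general-purpose register)
MSR (move from a general-purpose register to a special register)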
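The fragments below sketch how some of the data movement instructions work together. The address 0x20000000 is an assumption (the start of on-chip SRAM on many Cortex-M devices), and the byte values in the comments assume the usual little-endian memory configuration.

```asm
        MOVW    r0, #0x5678         ; r0 = 0x00005678
        MOVT    r0, #0x1234         ; r0 = 0x12345678 (top halfword replaced)

        LDR     r1, =0x20000000     ; assumed RAM address, for illustration only
        STR     r0, [r1]            ; memory bytes are now 78 56 34 12 (little-endian)
        LDRH    r2, [r1]            ; r2 = 0x00005678
        LDRB    r3, [r1, #3]        ; r3 = 0x00000012
        LDRSB   r4, [r1, #3]        ; r4 = 0x00000012 (0x12 is positive, so no sign bits added)

        PUSH    {r4, r5}            ; save r4 and r5 on the stack
        POP     {r4, r5}            ; restore them

        MRS     r5, PRIMASK         ; read the PRIMASK special register into r5
        MSR     PRIMASK, r5         ; write it back
```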
Compare and Branch Instructions
Data Compare Instructions
CMP (compare)
CMN (compare negative)
TST (test)
TEQ (test equal)
IT (if-then)
Branch Instructions
B (branch)
CBZ (compare and branch on zero)
CBNZ (compare and branch on non-zero)
TBB (table branch byte)
TBH (table branch halfword)
Subroutine Instructions
BL (branch with link)
BLX (branch with link and exchange)
BX (branch and exchange)
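As a rough illustration, the fragment below combines a compare with an IT block, a countdown loop built on CBZ, and a subroutine call. The labels loop, done, stop, and helper are hypothetical names chosen for this sketch.

```asm
        ; r0 = max(r0, r1), treating both as signed numbers
        CMP     r0, r1              ; compute r0 - r1 and set the flags
        IT      LT                  ; the next instruction runs only if r0 < r1
        MOVLT   r0, r1

        ; count r2 down to zero
loop    CBZ     r2, done            ; leave the loop when r2 == 0
        SUB     r2, r2, #1
        B       loop

done    BL      helper              ; call helper; the return address is saved in lr
stop    B       stop                ; stay here forever

helper  ADD     r0, r0, #1          ; hypothetical subroutine body
        BX      lr                  ; return to the caller
```

CBZ and CBNZ can only branch forward over a short distance and only test the low registers r0 to r7, which is why the loop above jumps back with an ordinary B instruction.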
Miscellaneous Instructions
BKPT (breakpoint)
NOP (no operation)
SEV (set event)
WFE (wait for event)
WFI (wait for interrupt)
CPSID (interrupt disable)
CPSIE (interrupt enable)
DMB (data memory barrier)
DSB (data synchronization barrier)
ISB (instruction synchronization barrier)
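One common use of these instructions is a short critical section followed by putting the processor to sleep. The sketch below assumes r0 already holds the address of a counter shared with an interrupt handler; that counter is hypothetical.

```asm
        CPSID   i                   ; set PRIMASK: mask interrupts with configurable priority
        LDR     r1, [r0]            ; read the shared counter (r0 = its address, assumed)
        ADD     r1, r1, #1
        STR     r1, [r0]            ; write it back without being interrupted
        CPSIE   i                   ; clear PRIMASK: interrupts are enabled again

        DSB                         ; make sure the store has completed
        WFI                         ; sleep until the next interrupt
```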
Assembly Directives
In assembly programs, directives are not machine instructions. Instead, they give the assembler key information for translating the source program, such as declaring constants and symbolic names, defining data layout, allocating memory space, and specifying the program structure and entry point.
Some commonly used directives are:
Directive | Function |
AREA | Make a new block of data or code |
ENTRY | Declare an entry point where the program execution starts |
ALIGN | Align data or code to a memory boundary |
DCB | Allocate one or more bytes of data |
DCW | Allocate one or more halfwords of data |
DCD | Allocate one or more words of data |
DCFS | Allocate single-precision floating-point numbers |
DCFD | Allocate double-precision floating-point numbers |
SPACE | Allocate a zeroed block of memory |
FILL | Allocate a block of memory and fill with a given value |
EQU | Give a symbol name to a numeric constant |
RN | Give a symbol name to a register |
EXPORT | Declare a symbol and make it referable by other source files |
IMPORT | Provide a symbol defined outside the current source file |
INCLUDE/GET | Include a separate source file within the current source file |
PROC | Declare the start of a procedure |
ENDP | Designate the end of a procedure |
END | Designate the end of a source file |
A typical skeleton of an assembly program is as follows:
        AREA    myData, DATA, READWRITE   ; Define a data section
Array   DCD     1, 2, 3, 4, 5             ; Define an array with five integers
        AREA    myCode, CODE, READONLY    ; Define a code section
        EXPORT  __main                    ; Make __main visible to the linker
        ENTRY                             ; Mark the entrance to the entire program
__main  PROC                              ; PROC marks the beginning of a subroutine
        ...                               ; Assembly program starts here
        ENDP                              ; Mark the end of a subroutine
        END                               ; Mark the end of a program
AREA
An application consists of one or multiple data and code areas. The AREA directive indicates to the assembler the start of a new data or code section.
A code section contains a list of instructions, and a data section includes the declaration and initialization of variables.
An area is a basic independent and indivisible unit processed by the linker. Each area should have a name, and areas within the same source file cannot share the same name. An assembly program must have at least one code area. By default, a code area can only be read (READONLY), and a data area may be read from and written to (READWRITE).
ENTRY
The ENTRY directive marks the first instruction to be executed within an application.
There must be one and only one ENTRY directive in an application, no matter how many source files the application has.
When there is no ENTRY directive, the linker issues a warning. When a single source file contains more than one ENTRY directive, the assembler reports an error.
For applications written in C or C++, the entry point is in the C library's initialization function, not directly visible to programmers.
END
The END directive indicates the end of a source file.
Each assembly program file must end with this directive.
Suppose we have two assembly source files A and B. When A uses either GET or INCLUDE to include B, the assembler returns to A after reaching END in B, and continues to assemble the rest of A. The END directive of the top-level file informs the assembler to complete the application.
PROC and ENDP
These are function or subroutine definition directives.
PROC and ENDP mark the beginning and the end of a function (also called a subroutine or procedure).
PROC stands for "procedure" and ENDP means "end of procedure".
A single source file can contain multiple subroutines. However, PROC and ENDP cannot be nested. We cannot define a subroutine within another subroutine.
A C program must have at least one function named main(). Similarly, an assembly program must have at least one subroutine named __main.
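As a sketch of how several subroutines sit side by side in one source file, the example below defines __main and a second, hypothetical subroutine named twice. Each PROC is closed by its own ENDP before the next one begins.

```asm
        AREA    myCode, CODE, READONLY
        EXPORT  __main
        ENTRY

__main  PROC
        MOV     r0, #5              ; argument passed in r0
        BL      twice               ; call the subroutine; r0 = 10 on return
stop    B       stop                ; stay here forever
        ENDP

twice   PROC                        ; hypothetical subroutine: doubles r0
        ADD     r0, r0, r0
        BX      lr                  ; return to the caller
        ENDP

        END
```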
DCB, DCW, DCD, DCQ, SPACE, and FILL
These are data allocation directives.
An assembly program needs to reserve space in the data memory for variables and set their initial contents. Commonly used data allocation directives are:
Directive | Description | Memory Space |
DCB | Define Constant Byte | Reserve 8-bit values |
DCW | Define Constant Halfword | Reserve 16-bit values |
DCD | Define Constant Word | Reserve 32-bit values |
DCQ | Define Constant Doubleword | Reserve 64-bit values |
SPACE | Define Zeroed Bytes | Reserve a number of zeroed bytes |
FILL | Define Initialized Bytes | Reserve and fill each byte with a value |
The following example shows how to declare an initialized string, initialized integer arrays, an array of floating-point values, a zeroed memory region, and a few variables in different formats.
        AREA    myData, DATA, READWRITE
hello   DCB     "Hello World!",0      ; Allocate a string that is null terminated
dollar  DCB     2, 10, 0, 200         ; Allocate integers ranging from -128 to 255
scores  DCFS    2.0, 3.5, -0.8, 4.0   ; Allocate 4 words containing single-precision floating-point values
miles   DCW     100, 200, 50, 0       ; Allocate integers between -32768 and 65535
p       SPACE   255                   ; Allocate 255 bytes of zeroed memory space
f       FILL    20, 0xFF, 1           ; Allocate 20 bytes and set each byte to 0xFF
binary  DCB     2_01010101            ; Allocate a byte in binary
octal   DCB     8_73                  ; Allocate a byte in octal
char    DCB     'A'                   ; Allocate a byte initialized to ASCII of 'A'
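To show how code uses data allocated this way, here is a minimal sketch that sums a five-element word array and stores the result. The names array and total are hypothetical, and the sketch assumes the data area holds its initial values when the program starts (as in a simulator, or after startup code has initialized RAM).

```asm
        AREA    myData, DATA, READWRITE
array   DCD     1, 2, 3, 4, 5       ; five 32-bit integers
total   SPACE   4                   ; one zeroed word for the result

        AREA    myCode, CODE, READONLY
        EXPORT  __main
        ENTRY
__main  PROC
        LDR     r0, =array          ; r0 = address of the first element
        MOV     r1, #5              ; r1 = number of elements left
        MOV     r2, #0              ; r2 = running sum
loop    LDR     r3, [r0], #4        ; load one word, then advance by 4 bytes
        ADD     r2, r2, r3
        SUBS    r1, r1, #1          ; decrement the counter and update the flags
        BNE     loop                ; repeat until the counter reaches zero
        LDR     r0, =total
        STR     r2, [r0]            ; total = 1 + 2 + 3 + 4 + 5 = 15
stop    B       stop
        ENDP
        END
```

The pointer advances by 4 after each load because DCD allocates 32-bit words; an array declared with DCW or DCB would use LDRH or LDRB with a step of 2 or 1.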
EQU and RN
EQU and RN make an assembly program easier to understand.
The EQU directive associates a symbolic name with a numeric constant. Like "#define" in a C program, EQU can be used to define a constant in assembly code.
; Interrupt Number Definition (IRQn)
BusFault_IRQn   EQU     -11     ; Cortex-M Bus Fault Interrupt
SVCall_IRQn     EQU     -5      ; Cortex-M Supervisor Call (SVC) Interrupt
PendSV_IRQn     EQU     -2      ; Cortex-M PendSV Interrupt
SysTick_IRQn    EQU     -1      ; Cortex-M System Tick Interrupt
The RN directive gives a symbolic name to a register.
Dividend        RN      6       ; Defines Dividend as a name for register r6
Divisor         RN      5       ; Defines Divisor as a name for register r5
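Once defined, these names can be used wherever a constant or a register is expected. The sketch below is an illustration only: BUF_SIZE, count, ptr, buffer, and next are hypothetical names. It fills a buffer with the value 0xFF.

```asm
BUF_SIZE EQU    64                  ; hypothetical buffer size in bytes
count   RN      4                   ; hypothetical name for r4
ptr     RN      5                   ; hypothetical name for r5

        AREA    myData, DATA, READWRITE
buffer  SPACE   BUF_SIZE            ; reserve BUF_SIZE zeroed bytes

        AREA    myCode, CODE, READONLY
        EXPORT  __main
        ENTRY
__main  PROC
        LDR     ptr, =buffer        ; ptr (r5) = address of the buffer
        MOV     count, #BUF_SIZE    ; count (r4) = 64
        MOV     r0, #0xFF
next    STRB    r0, [ptr], #1       ; store one byte, then advance the pointer
        SUBS    count, count, #1
        BNE     next                ; repeat until count reaches zero
stop    B       stop
        ENDP
        END
```

Changing the buffer size now means editing a single EQU line instead of every place the number 64 appears.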
ALIGN
To improve performance, many processors require that the starting memory address of an instruction or a variable must be a multiple of 2**n. For example, an address aligned to a word boundary must be divisible by 4 (i.e., 2**2).
If instructions or data are not appropriately aligned in memory, some processors generate a misalignment fault signal and abort the memory access.
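Cortex-M3, Cortex-M4, and Cortex-M7 processors allow unaligned data accesses for ordinary single loads and stores, at the cost of performance: multiple memory accesses may be required to fetch a misaligned data item. Cortex-M0 and Cortex-M0+ processors do not support unaligned accesses and generate a fault instead.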
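The divisibility rule also makes alignment easy to handle at run time with bit operations. The fragment below is a sketch, assuming r0 holds an arbitrary byte address and r1 holds an address to be tested; it is not taken from the original example.

```asm
        ; Round r0 up to the next multiple of 8 (2**3)
        ADD     r0, r0, #7          ; add 2**3 - 1
        BIC     r0, r0, #7          ; clear the low 3 bits
                                    ; e.g. 0x20000001 becomes 0x20000008,
                                    ; and 0x20000008 stays 0x20000008

        ; Test whether r1 is aligned to a word boundary
        TST     r1, #3              ; Z flag is set if the low 2 bits are both zero
```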
The following example shows the usage of ALIGN:
        AREA    myCode, CODE, ALIGN = 3   ; Memory address begins at a multiple of 8
        ADD     r0, r1, r2                ; Instructions start at a multiple of 8
        AREA    myData, DATA, ALIGN = 2   ; Address begins at a multiple of 4
a       DCB     0xFF                      ; The first byte of a word (4 bytes)
        ALIGN   4, 3                      ; Align to the last byte of a word
b       DCB     0x33                      ; Set the fourth byte of a 4-byte word
c       DCB     0x44                      ; Add a byte to make next data misaligned
        ALIGN                             ; Force the next data to be aligned
d       DCD     0x12345                   ; Skip three bytes and store the word
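The layout of the data area will be as follows: a is at offset 0, b at offset 3, c at offset 4, and d occupies offsets 8 to 11. The bytes at offsets 1, 2, 5, 6, and 7 are padding inserted by the ALIGN directives.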
EXPORT and IMPORT
EXPORT and IMPORT share symbols between different source files.
The EXPORT directive declares a symbol and makes it visible to the linker.
The IMPORT directive gives the assembler a symbol that is not defined in the current assembly file. IMPORT is like the "extern" keyword in C.
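As a sketch of how the two directives pair up, consider two source files. The file names main.s and util.s and the subroutine name add3 are hypothetical. The first file imports add3 and calls it:

```asm
; main.s (hypothetical file name)
        AREA    myCode, CODE, READONLY
        EXPORT  __main
        IMPORT  add3                ; add3 is defined in another source file
        ENTRY
__main  PROC
        MOV     r0, #1
        MOV     r1, #2
        MOV     r2, #3
        BL      add3                ; r0 = 1 + 2 + 3 = 6 on return
stop    B       stop
        ENDP
        END
```

The second file defines add3 and exports it so that the linker can resolve the reference made in the first file:

```asm
; util.s (hypothetical file name)
        AREA    myUtil, CODE, READONLY
        EXPORT  add3                ; make add3 visible to the linker
add3    PROC                        ; returns r0 + r1 + r2 in r0
        ADD     r0, r0, r1
        ADD     r0, r0, r2
        BX      lr
        ENDP
        END
```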
INCLUDE or GET
The INCLUDE or GET directive includes an assembly source file within another source file.
It is useful for including constant symbols that are defined with EQU and stored in a separate source file.
For example, suppose constants are defined with EQU directives and stored in a separate assembly file called "constants.s". To include these constants, we can use the simple statement "INCLUDE constants.s".
        INCLUDE constants.s         ; Load Constant Definitions
        AREA    myCode, CODE, READONLY
        EXPORT  __main
        ENTRY
__main  PROC
        ...
        ENDP
        END
Written by
Ahmed Gouda