Structure of an Assembly Program

Ahmed Gouda

Assembly Program

The assembly program below copies one string to another. An assembly program includes:

  • labels

  • directives

  • assembly instructions

  • program comments

Labels

Labels, such as strcpy, stop, srcStr, and dstStr, represent the memory address of the data or instruction they mark. When generating the executable, the assembler replaces each label with its memory address or its memory address offset.

A label must start at the beginning of a line, with no leading whitespace.

A label can be a function name (such as "__main"), which is the memory address of the first instruction of a function. The "__main" label is exported so that the linker can find and resolve it.

Directives

Directives provide information that assists the assembler. The example uses the directives PROC and ENDP to declare the start and the end of a function (also called a subroutine).

END indicates the end of an assembly program file. AREA defines code or data regions. ENTRY designates the initial entry into the program. ALIGN specifies the requirement of memory address alignment. DCB allocates and defines data.

Assembly Instructions

An assembly instruction is a machine command that controls the program flow or manipulates data. Some instructions are pseudo instructions, which are not real machine commands but are allowed in assembly language code. The assembler translates a pseudo instruction, such as "LDR r1, =srcStr" in the example code, into a real instruction. Pseudo instructions make the job of writing assembly language programs easier.

Comments

A comment is a text annotation that explains the programmer's intentions or assumptions. It aims to improve inter-programmer communication and code readability. A comment in an assembly program starts with a semicolon. Assemblers ignore everything after the semicolon on that line.

The example has two areas:

  1. Data Area:
    The data area defines two strings: srcStr and dstStr. The program allocates memory space for both strings and gives them initial values. The NULL character terminates a string.

  2. Code Area:
    The code area includes a function named __main, which is equivalent to the main() function in a C program. This program copies string srcStr to string dstStr.
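Since the code area is compared to main() in a C program, the copy loop can also be pictured in C. The following is only a minimal sketch of the behavior described above, not a translation of the assembly listing: the function name copy_string is an assumption made for illustration, and the comments relating each step to LDRB and STRB are a guess at how such a loop is usually written.

```c
#include <stddef.h>

/* Hypothetical C model of a string copy loop: read one byte from src,
   write it to dst, and stop once the null terminator has been copied. */
size_t copy_string(char *dst, const char *src)
{
    size_t i = 0;
    for (;;) {
        char c = src[i];      /* comparable to a byte load (LDRB)   */
        dst[i] = c;           /* comparable to a byte store (STRB)  */
        if (c == '\0') {      /* the null character ends the string */
            break;
        }
        i++;
    }
    return i;                 /* characters copied, excluding the null */
}
```

The destination buffer must be large enough for the source string plus its terminator, just as dstStr must reserve enough space in the data area.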

Cortex-M Assembly Instructions Categories

Most assembly instructions of Cortex-M3 can be classified into the following four categories:

  • arithmetic, shift, and logic instructions

  • data movement instructions

  • compare and branch instructions

  • miscellaneous instructions for various functions such as debugging

In addition to these instructions, Cortex-M4 and M7 also support:

  • digital signal processing instructions

  • floating-point instructions

Arithmetic, Shift, and Logic instructions

Arithmetic Instructions

Addition:

  • ADD

  • ADC (add with carry)

Subtraction:

  • SUB

  • RSB (reverse subtract)

  • SBC (subtract with carry)

Multiplication:

  • MUL

  • MLA (multiply with accumulate)

  • MLS (multiply with subtract)

  • SMULL (signed long multiply)

  • UMULL (unsigned long multiply)

  • SMLAL (signed long multiply with accumulate)

  • UMLAL (unsigned long multiply with accumulate)

Division:

  • SDIV (signed)

  • UDIV (unsigned)

Saturation:

  • SSAT (signed)

  • USAT (unsigned)

Extension:

  • SXTB (sign-extend a byte)

  • SXTH (sign-extend a halfword)

  • UXTB (zero-extend a byte, also called unsigned extend)

  • UXTH (zero-extend a halfword, also called unsigned extend)

Bit field Extract:

  • SBFX (signed extraction)

  • UBFX (unsigned extraction)
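The saturation, extension, and bit field instructions listed above are easier to picture with a small model. The C functions below are a sketch of what each instruction computes on a 32-bit register value; the function names simply mirror the mnemonics and are not part of any library, and optional operands such as shifts and rotations are left out.

```c
#include <stdint.h>

/* SXTB: sign-extend the low byte to 32 bits. */
int32_t sxtb(uint32_t x)
{
    int32_t v = (int32_t)(x & 0xFFu);
    return (v ^ 0x80) - 0x80;
}

/* UXTB: zero-extend the low byte to 32 bits. */
uint32_t uxtb(uint32_t x)
{
    return x & 0xFFu;
}

/* SSAT: clamp x to the signed n-bit range, with n in 1..32. */
int32_t ssat(int32_t x, unsigned n)
{
    int64_t max = ((int64_t)1 << (n - 1)) - 1;
    int64_t min = -((int64_t)1 << (n - 1));
    if (x > max) return (int32_t)max;
    if (x < min) return (int32_t)min;
    return x;
}

/* USAT: clamp x to the unsigned n-bit range 0..2^n - 1, with n in 0..31. */
uint32_t usat(int32_t x, unsigned n)
{
    int64_t max = ((int64_t)1 << n) - 1;
    if (x < 0) return 0u;
    if (x > max) return (uint32_t)max;
    return (uint32_t)x;
}

/* UBFX: extract `width` bits starting at bit `lsb`, zero-extended.
   Assumes width >= 1 and lsb + width <= 32. */
uint32_t ubfx(uint32_t x, unsigned lsb, unsigned width)
{
    uint32_t mask = (width == 32u) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (x >> lsb) & mask;
}

/* SBFX: same field as UBFX, but sign-extended from the field's top bit. */
int32_t sbfx(uint32_t x, unsigned lsb, unsigned width)
{
    uint32_t field = ubfx(x, lsb, width);
    uint32_t sign = 1u << (width - 1u);
    return (int32_t)((int64_t)(field ^ sign) - (int64_t)sign);
}
```

For instance, ssat(300, 8) clamps to 127, and sbfx(0x0000F000, 12, 4) reads the 4-bit field 0xF as -1.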

Shift Instructions

  • LSL (logical shift left)

  • LSR (logical shift right)

  • ASR (arithmetic shift right)

  • ROR (rotate right)

  • RRX (rotate right with extend)
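The difference between these shifts is in what fills the vacated bits. The sketch below models each one in C for shift amounts from 0 to 31; the function names mirror the mnemonics for illustration only, and the carry flag is modeled explicitly only for RRX, where it is part of the result.

```c
#include <stdint.h>

/* LSL: shift left, zeros enter from the right. */
uint32_t lsl(uint32_t x, unsigned n) { return x << n; }

/* LSR: shift right, zeros enter from the left. */
uint32_t lsr(uint32_t x, unsigned n) { return x >> n; }

/* ASR: shift right, copies of the sign bit enter from the left. */
uint32_t asr(uint32_t x, unsigned n)
{
    uint32_t shifted = x >> n;
    if (x & 0x80000000u) {
        /* Negative value: set the n vacated top bits to 1. */
        shifted |= (uint32_t)~(0xFFFFFFFFu >> n);
    }
    return shifted;
}

/* ROR: bits shifted out on the right re-enter on the left. */
uint32_t ror(uint32_t x, unsigned n)
{
    n &= 31u;
    if (n == 0u) return x;
    return (x >> n) | (x << (32u - n));
}

/* RRX: rotate right by one bit through the carry flag.
   The old carry enters bit 31; the old bit 0 becomes the new carry. */
uint32_t rrx(uint32_t x, unsigned carry_in, unsigned *carry_out)
{
    *carry_out = x & 1u;
    return (x >> 1) | ((uint32_t)(carry_in & 1u) << 31);
}
```

With these definitions, asr(0x80000000, 4) gives 0xF8000000 while lsr(0x80000000, 4) gives 0x08000000.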

Logic Instructions

  • AND (bitwise AND)

  • ORR (bitwise OR)

  • EOR (bitwise Exclusive OR)

  • ORN (bitwise OR NOT)

  • MVN (move NOT or bitwise NOT)

Bit Instructions

  • BFC (bit field clear)

  • BFI (bit field insert)

  • BIC (bit clear)

  • CLZ (count leading zeros)

  • RBIT (reverse bit order in a word)

  • REV (reverse byte order in a word)

  • REV16 (reverse byte order in each halfword independently)

  • REVSH (reverse byte order in the bottom halfword, and sign-extend to 32 bits)
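A few of the bit instructions above can be modeled in C to make their effect concrete. This is a sketch under the assumption of 32-bit register values; the function names mirror the mnemonics and are otherwise hypothetical.

```c
#include <stdint.h>

/* BIC: clear every bit of x that is set in mask. */
uint32_t bic(uint32_t x, uint32_t mask) { return x & ~mask; }

/* CLZ: count the zero bits above the most significant 1 (32 if x is 0). */
unsigned clz(uint32_t x)
{
    unsigned count = 0;
    if (x == 0u) return 32u;
    while ((x & 0x80000000u) == 0u) {
        x <<= 1;
        count++;
    }
    return count;
}

/* RBIT: reverse the order of all 32 bits. */
uint32_t rbit(uint32_t x)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (x & 1u);
        x >>= 1;
    }
    return r;
}

/* REV: reverse the order of the four bytes in a word. */
uint32_t rev(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* REV16: swap the two bytes inside each halfword independently. */
uint32_t rev16(uint32_t x)
{
    return ((x >> 8) & 0x00FF00FFu) | ((x << 8) & 0xFF00FF00u);
}
```

REV is the usual way to convert a word between little-endian and big-endian byte order: rev(0x12345678) gives 0x78563412, whereas rev16(0x12345678) gives 0x34127856.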

Data Movement Instructions

Read Data Memory

  • LDRB (load byte)

  • LDRH (load halfword)

  • LDR (load word)

  • LDRD (load doubleword)

  • LDRSB (load signed byte)

  • LDRSH (load signed halfword)

  • LDM, LDMDB, LDMFD (load multiple words)

  • LDREXB, LDREXH, LDREX (load register exclusive with a byte, halfword, and word)

  • LDRT (load with unprivileged access, even when executed in privileged mode)

  • POP (load from stack)

Write Data Memory

  • STRB (store byte)

  • STRH (store halfword)

  • STR (store word)

  • STRD (store doubleword)

  • STM, STMDB, STMFD (store multiple words)

  • STREXB, STREXH, STREX (store register exclusive with a byte, halfword, and word)

  • STRT (store with unprivileged access, even when executed in privileged mode)

  • PUSH (store into stack)

Data Copy Instructions

  • MOV (move)

  • MOVT (move top)

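  • MOVW (move wide: write a 16-bit immediate to the bottom halfword)

  • MRS (move from a special register to a general-purpose register)

  • MSR (move from a general-purpose register to a special register)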
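MOVW and MOVT are commonly used as a pair to build a full 32-bit constant in a register, 16 bits at a time. The C sketch below models that effect; the function names mirror the mnemonics and are only illustrative.

```c
#include <stdint.h>

/* MOVW: write a 16-bit immediate to the bottom halfword and
   clear the top halfword of the destination register. */
uint32_t movw(uint16_t imm16)
{
    return (uint32_t)imm16;
}

/* MOVT: write a 16-bit immediate to the top halfword and
   leave the bottom halfword of the register unchanged. */
uint32_t movt(uint32_t reg, uint16_t imm16)
{
    return (reg & 0x0000FFFFu) | ((uint32_t)imm16 << 16);
}
```

Loading 0x12345678 then corresponds to movt(movw(0x5678), 0x1234): the low half is written first, and MOVT fills in the top half without disturbing it.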

Compare and Branch Instructions

Data Compare Instructions

  • CMP (compare)

  • CMN (compare negative)

  • TST (test)

  • TEQ (test equal)

  • IT (if-then)
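Compare instructions do not store a result; they only update the N, Z, C, and V condition flags that later conditional instructions test. The following C sketch models how CMP derives those flags from the subtraction a - b. The flags_t type and the helper names are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the four condition flags. */
typedef struct {
    bool n;  /* negative: bit 31 of the result          */
    bool z;  /* zero: the result is 0                   */
    bool c;  /* carry: set when no borrow occurred      */
    bool v;  /* overflow: signed result does not fit    */
} flags_t;

/* CMP a, b: compute a - b, discard the result, keep the flags. */
flags_t cmp(uint32_t a, uint32_t b)
{
    uint32_t result = a - b;
    flags_t f;
    f.n = (result >> 31) != 0u;
    f.z = (result == 0u);
    f.c = (a >= b);
    f.v = (((a ^ b) & (a ^ result)) >> 31) != 0u;
    return f;
}

/* Condition EQ: the operands were equal. */
bool cond_eq(flags_t f) { return f.z; }

/* Condition LT: signed less than. */
bool cond_lt(flags_t f) { return f.n != f.v; }

/* Condition HI: unsigned higher. */
bool cond_hi(flags_t f) { return f.c && !f.z; }
```

The same flags answer both signed and unsigned questions. After comparing 0x80000000 with 1, cond_lt is true (as signed numbers, the first operand is the most negative value) and cond_hi is also true (as unsigned numbers, it is larger).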

Branch Instructions

  • B (branch)

  • CBZ (compare and branch on zero)

  • CBNZ (compare and branch on non-zero)

  • TBB (table branch byte)

  • TBH (table branch halfword)

Subroutine Instructions

  • BL (branch with link)

  • BLX (branch with link and exchange)

  • BX (branch and exchange)

Miscellaneous Instructions

  • BKPT (breakpoint)

  • NOP (no operation)

  • SEV (set event)

  • WFE (wait for event)

  • WFI (wait for interrupt)

  • CPSID (interrupt disable)

  • CPSIE (interrupt enable)

  • DMB (data memory barrier)

  • DSB (data synchronization barrier)

  • ISB (instruction synchronization barrier)

Assembly Directives

In assembly programs, directives are not machine instructions. Instead, they give the assembler key information for translating the source program, such as declaring constants and symbolic names, defining data layout, allocating memory space, and specifying the program structure and entry point.

Some commonly used directives are:

Directive      Function
AREA           Make a new block of data or code
ENTRY          Declare an entry point where the program execution starts
ALIGN          Align data or code to a memory boundary
DCB            Allocate one or more bytes of data
DCW            Allocate one or more halfwords of data
DCD            Allocate one or more words of data
DCFS           Allocate single-precision floating-point numbers
DCFD           Allocate double-precision floating-point numbers
SPACE          Allocate a zeroed block of memory
FILL           Allocate a block of memory and fill with a given value
EQU            Give a symbol name to a numeric constant
RN             Give a symbol name to a register
EXPORT         Declare a symbol and make it referable by other source files
IMPORT         Provide a symbol defined outside the current source file
INCLUDE/GET    Include a separate source file within the current source file
PROC           Declare the start of a procedure
ENDP           Designate the end of a procedure
END            Designate the end of a source file

A typical skeleton of an assembly program looks like this:

        AREA myData, DATA, READWRITE    ; Define a data section
Array   DCD 1, 2, 3, 4, 5               ; Define an array with five integers

        AREA myCode, CODE, READONLY     ; Define a code section
        EXPORT __main                   ; Make __main visible to the Linker
        ENTRY                           ; Mark the entrance to the entire program
__main  PROC                            ; PROC marks the beginning of subroutine
        ...                             ; Assembly program starts here
        ENDP                            ; Mark the end of a subroutine
        END                             ; Mark the end of a program

AREA

An application consists of one or multiple data and code areas. The AREA directive indicates to the assembler the start of a new data or code section.

A code section contains a list of instructions, and a data section includes the declaration and initialization of variables.

An area is a basic independent and indivisible unit processed by the linker. Each area should have a name, and areas within the same source file cannot share the same name. An assembly program must have at least one code area. By default, a code area can only be read (READONLY), and a data area may be read from and written to (READWRITE).

ENTRY

The ENTRY directive marks the first instruction to be executed within an application.

There must be one and only one ENTRY directive in an application, no matter how many source files the application has.

When there is no ENTRY directive, the linker generates an error message. When a single source file contains multiple ENTRY directives, the assembler gives an error message.

For applications written in C or C++, the entry point is in the C library's initialization function, not directly visible to programmers.

END

The END directive indicates the end of a source file.

Each assembly program file must end with this directive.

Suppose we have two assembly source files A and B. When A uses either GET or INCLUDE to include B, the assembler returns to A after reaching END in B, and continues to assemble the rest of A. The END directive of the top-level file informs the assembler to complete the application.

PROC and ENDP

These are function or subroutine definition directives.

PROC and ENDP mark the beginning and the end of a function (also called a subroutine or procedure).

PROC stands for "procedure" and ENDP means "end of procedure".

A single source file can contain multiple subroutines. However, PROC and ENDP cannot be nested. We cannot define a subroutine within another subroutine.

A C program must have at least one function named main(). Similarly, an assembly program must have at least one subroutine named __main.

DCB, DCW, DCD, DCQ, SPACE, and FILL

These are data allocation directives.

An assembly program needs to reserve space in the data memory for variables and set their initial contents. Commonly used data allocation directives are:

Directive   Description                  Memory Space
DCB         Define Constant Byte         Reserve 8-bit values
DCW         Define Constant Halfword     Reserve 16-bit values
DCD         Define Constant Word         Reserve 32-bit values
DCQ         Define Constant Doubleword   Reserve 64-bit values
SPACE       Define Zeroed Bytes          Reserve a block of zeroed bytes
FILL        Define Initialized Bytes     Reserve and fill each byte with a value

The example below shows how to declare an initialized string, initialized integer arrays, a zeroed memory region, and a few variables in different formats.

         AREA  myData, DATA, READWRITE
hello    DCB   "Hello World!",0    ; Allocate a string that is null terminated
dollar   DCB   2, 10, 0, 200       ; Allocate integers ranging from -128 to 255
scores   DCFS  2.0, 3.5, -0.8, 4.0 ; Allocate 4 single-precision floating-point words
miles    DCW   100, 200, 50, 0     ; Allocate integers between -32768 and 65535
p        SPACE 255                 ; Allocate 255 bytes of zeroed memory space
f        FILL  20, 0xFF, 1         ; Allocate 20 bytes and set each byte to 0xFF
binary   DCB   2_01010101          ; Allocate a byte in binary
octal    DCB   8_73                ; Allocate a byte in octal
char     DCB   'A'                 ; Allocate a byte initialized to ASCII of 'A'
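For readers more familiar with C, the allocations above have rough C counterparts. The sketch below is only an analogy: a C compiler is free to place and pad these objects differently from the assembler, the names binary_value, octal_value, and char_value are invented for illustration, and scores is modeled as an array of 32-bit floats.

```c
#include <stdint.h>
#include <string.h>

char     hello[]  = "Hello World!";              /* 12 characters plus the null */
uint8_t  dollar[] = { 2, 10, 0, 200 };           /* four bytes                  */
float    scores[] = { 2.0f, 3.5f, -0.8f, 4.0f }; /* four 32-bit values          */
uint16_t miles[]  = { 100, 200, 50, 0 };         /* four halfwords              */
uint8_t  p[255];                                 /* zeroed, like SPACE 255      */
uint8_t  f[20];                                  /* filled by init_fill() below */

uint8_t  binary_value = 0x55;   /* 2_01010101 in the assembler's notation */
uint8_t  octal_value  = 073;    /* 8_73 in the assembler's notation       */
uint8_t  char_value   = 'A';    /* ASCII code 65                          */

/* C has no portable initializer that repeats a byte, so the effect of
   FILL 20, 0xFF, 1 is modeled with memset at run time. */
void init_fill(void)
{
    memset(f, 0xFF, sizeof f);
}
```

Objects with static storage duration, such as p here, start out zeroed in C, which matches what SPACE provides.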

EQU and RN

EQU and RN make an assembly program easier to read.

The EQU directive associates a symbolic name with a numeric constant. Like "#define" in a C program, EQU can be used to define a constant in assembly code.

; Interrupt Number Definition (IRQn)
BusFault_IRQn    EQU    -11        ; Cortex-M Bus Fault Interrupt
SVCall_IRQn      EQU    -5         ; Cortex-M Supervisor Call (SVC) Interrupt
PendSV_IRQn      EQU    -2         ; Cortex-M Pend SV Interrupt
SysTick_IRQn     EQU    -1         ; Cortex-M System Tick Interrupt
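To make the "#define" comparison concrete, the same constants could be written in C roughly as follows. This is a sketch: the helper is_core_exception is hypothetical and relies on the convention that processor exceptions use negative numbers while device interrupts start at 0.

```c
/* C counterparts of the EQU definitions above. The parentheses keep
   the negative values safe when the macros appear inside expressions. */
#define BusFault_IRQn  (-11)
#define SVCall_IRQn    (-5)
#define PendSV_IRQn    (-2)
#define SysTick_IRQn   (-1)

/* Hypothetical helper: negative numbers denote processor exceptions. */
int is_core_exception(int irqn)
{
    return irqn < 0;
}
```

As with EQU, these names occupy no memory; each use is replaced by its numeric value before the code is translated.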

The RN directive gives a symbolic name to a register.

Dividend    RN    6        ; Define Dividend as a name for register r6
Divisor     RN    5        ; Define Divisor as a name for register r5

ALIGN

To improve performance, many processors require the starting memory address of an instruction or a variable to be a multiple of 2**n. For example, an address aligned to a word boundary must be divisible by 4 (i.e., 2**2).

If instructions or data are not appropriately aligned in memory, some processors generate a misalignment fault signal and abort the memory access.

Cortex-M processors allow unaligned memory accesses at the cost of performance. Multiple memory accesses may be required to fetch a misaligned data item or instruction.

The following example shows the usage of ALIGN:

    AREA myCode, CODE, ALIGN = 3    ; Memory address begins at a multiple of 8
    ADD r0, r1, r2                  ; Instructions start at a multiple of 8

    AREA myData, DATA, ALIGN = 2    ; Address begins at a multiple of 4
a   DCB 0xFF                        ; The first byte of a word (4 bytes)
    ALIGN 4, 3                      ; Align to the last byte of a word
b   DCB 0x33                        ; Set the fourth byte of a 4-byte word
c   DCB 0x44                        ; Add a byte to make next data misaligned
    ALIGN                           ; Force the next data to be aligned
d   DCD 0x12345                     ; Skip three bytes and store the word
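The offsets in the data area above can be checked with a little arithmetic. The C sketch below models the two forms of ALIGN used in the example; the function names are assumptions, and offsets are counted from the start of myData, which itself begins on a multiple of 4.

```c
#include <stdint.h>

/* Plain ALIGN on a power-of-two boundary: round addr up to the
   next multiple of boundary (unchanged if already aligned). */
uint32_t align_up(uint32_t addr, uint32_t boundary)
{
    return (addr + boundary - 1u) & ~(boundary - 1u);
}

/* "ALIGN boundary, offset": advance to the next address whose
   remainder modulo boundary equals offset (offset < boundary). */
uint32_t align_to(uint32_t addr, uint32_t boundary, uint32_t offset)
{
    while ((addr % boundary) != offset) {
        addr++;
    }
    return addr;
}
```

Walking through the example: a occupies offset 0, so the next free offset is 1. ALIGN 4, 3 moves to align_to(1, 4, 3) = 3, where b is stored. c follows at offset 4, leaving offset 5 free. The final ALIGN moves to align_up(5, 4) = 8, skipping three bytes, and d occupies offsets 8 through 11.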

The layout of the data area is shown in the figure below:

EXPORT and IMPORT

EXPORT and IMPORT define and locate symbols that are shared between different source files.

The EXPORT directive declares a symbol and makes it visible to the linker.

The IMPORT directive tells the assembler about a symbol that is not defined in the current assembly file. IMPORT is like the "extern" keyword in C.

INCLUDE or GET

The INCLUDE or GET directive includes an assembly source file within another source file.

It is useful to include constant symbols defined by using EQU and stored in a separate source file.

For example, suppose constants are defined by using EQU directives and stored in a separate assembly file called "constants.s". To include these constants, we can use the simple statement "INCLUDE constants.s".

        INCLUDE constants.s        ; Load Constant Definitions
        AREA myCode, CODE, READONLY
        EXPORT __main
        ENTRY
__main  PROC
        ...
        ENDP
        END