basics of arm assembly

Sep 26, 2021 | embedded

Before working with ARM assembly, you need to have an ARM cross-compiler toolchain installed. On Debian-based distros you can run sudo apt install gcc-arm-linux-gnueabi (or a versioned variant such as gcc-5-arm-linux-gnueabi); Arch users can install the arm-none-eabi-gcc package using pacman.

Now open up your favourite text editor and type out the following boilerplate code:

.global _start
.section .text

_start:

.section .data

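Some notes about the template above: .global _start exports the _start symbol so that the linker can find it, .section .text holds the code, the _start: label marks where execution begins, and .section .data holds initialised data.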

testing a simple instruction

.global _start
.section .text

_start:
    mov r0, #4

.section .data

Here the mov instruction moves the immediate value 4 into the r0 register. Immediate values are prefixed with a # sign in ARM assembly.

Let’s save it as 001.asm.

compiling and executing

First we assemble the file, using arm-none-eabi-as on Arch (or arm-linux-gnueabi-as on Debian), into an intermediate object file. The ARM gcc driver will then link that object file into an executable ELF binary.

$ arm-none-eabi-as 001.asm -o 001.o
$ file 001.o
001.o: ELF 32-bit LSB relocatable, ARM, EABI5 version 1 (SYSV), not stripped
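
Because the assembler's name depends on which toolchain package is installed, a small shell helper can pick whichever one is on the PATH. This is only a sketch: pick_tool is a name made up for this post, not part of binutils or gcc.

```shell
# pick_tool: print the first of the given command names that exists on PATH.
# (hypothetical helper, not part of any toolchain)
pick_tool() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Try the Arch name first, then the Debian one.
AS=$(pick_tool arm-none-eabi-as arm-linux-gnueabi-as) \
    || echo "no ARM assembler found on PATH" >&2
```

With AS set, the assemble step becomes "$AS" 001.asm -o 001.o on either distro.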

Now we pass the object file through the linker.

$ arm-none-eabi-gcc 001.o -o 001.elf -nostdlib
$ file 001.elf
001.elf: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), statically linked, not stripped

Note that we passed the -nostdlib flag to the linker. Without it, gcc links in the C runtime startup code and the standard libc. That startup code defines its own _start symbol, which clashes with ours, and it calls a main function that is not defined in our code.

running the executable

You are probably not working on a machine with an ARM processor (as of 2021), so you need to install qemu, which is “a free and open-source hypervisor”, to run the binary. Arch users can run sudo pacman -S qemu-arch-extra.

$ qemu-arm ./001.elf
Illegal instruction (core dumped)

Our program crashed because it never exited properly: it ran the mov instruction and then continued executing whatever bytes came after it in memory. We can use xxd 001.elf to see that there is a lot of content in the .elf file after the mov instruction.

In order to make sure that the program exits properly, we need to make use of system calls.

When we usually write a program, it runs in user mode as opposed to kernel mode. A user-mode process can’t end itself, so it has to use a software interrupt to ask the kernel to end it.

In ARM assembly on Linux, we put the system call number in register r7 to specify what the kernel should do, and we pass its arguments in registers r0 onwards (up to r6 under the EABI calling convention) to specify how it should be done.

syscall table

Visit this link to view the syscall table for arm 32-bit.

See the row where it reads exit. We will now use the info given in this row to make our exit syscall.

exit

.global _start
.section .text

_start:
    mov r7, #0x1    @ syscall number 1 = exit
    mov r0, #13     @ exit status

    swi 0           @ trap into the kernel

.section .data

As given in the table, we load the syscall number 0x1 into register r7 and the exit status into r0. The status can be any number we like, although the shell only reports its low 8 bits, so values from 0 to 255 are the useful range.

Let us now try executing this code.

$ arm-none-eabi-as 001.asm -o 001.o
$ arm-none-eabi-gcc 001.o -o 001.elf -nostdlib
$ qemu-arm 001.elf
$ echo $?
13

So no errors this time and the return value is what we specified in our assembly code.
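
The way $? picks up the exit status can be tried without any ARM code. The sketch below uses a child shell as a stand-in for our program. status_of is a hypothetical helper name, and the second result assumes a shell such as bash or dash that passes the value straight to the kernel.

```shell
# status_of: run a child shell that exits with the given value
# and print the status that the parent shell observes.
status_of() {
    status=0
    sh -c "exit $1" || status=$?
    echo "$status"
}

status_of 13    # prints 13, like our program
status_of 300   # prints 44 on bash and dash, because only 300 mod 256 survives
```

This is why a value above 255 in r0 would not come back unchanged from echo $?.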

writing to STDOUT

In order to write data to the screen, we once again refer to the syscall table, and this time the write syscall is of interest to us. In Linux, every process starts with three standard file descriptors: 0 is STDIN, 1 is STDOUT and 2 is STDERR.

As we are going to print text on the screen, the file descriptor 1 - STDOUT is where we should write our data.
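
The shell exposes the same descriptor numbers in its redirection syntax, which gives a quick way to see that 1 and 2 are separate streams. A small sketch, where emit is a hypothetical name used only for this illustration:

```shell
# emit: write one line to descriptor 1 (STDOUT) and one to descriptor 2 (STDERR).
emit() {
    printf 'to stdout\n' >&1
    printf 'to stderr\n' >&2
}

emit 2>/dev/null   # descriptor 2 is discarded, only "to stdout" is shown
emit 1>/dev/null   # descriptor 1 is discarded, only "to stderr" is shown
```

The value we place in r0 for the write syscall is this same descriptor number.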

.global _start
.section .text

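_start:
    mov r7, #0x4        @ syscall number 4 = write
    mov r0, #1          @ file descriptor 1 = STDOUT
    ldr r1, =message    @ address of the string
    mov r2, #13         @ number of bytes to write

    swi 0

    mov r7, #0x01       @ syscall number 1 = exit
    mov r0, #42         @ exit status

    swi 0

.section .data
    message:
    .ascii "Hello, World\n"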

According to the table, write is system call 4, so we move 0x4 into the r7 register. Next we load 1 into r0 because we are going to write to file descriptor 1. In the data section we define the message label and use the .ascii directive to store the string Hello, World\n at that address.

The ldr instruction with the =message form loads the address of message into the r1 register. The last argument the write system call needs is the length of the string in bytes, which is 13 here (the \n counts as a single byte), so we load it into the register r2.
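
Counting the bytes by hand is error-prone, since \n is a single byte even though it is typed as two characters. One way to double-check the value for r2 is to let the shell count. This is a sketch: byte_len is a hypothetical helper name, and it only mirrors .ascii for simple escapes like \n.

```shell
# byte_len: print how many bytes the given string occupies once
# backslash escapes such as \n have been expanded by printf's %b.
byte_len() {
    printf '%b' "$1" | wc -c | tr -d ' '
}

byte_len 'Hello, World\n'   # prints 13
```

If the message changes, rerunning this gives the new number to put in mov r2.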

We also add the instructions for an exit syscall at the end as without it the program will crash (as seen previously) after printing our message.

We can now assemble, link and run our program to see that it prints the message and exits successfully.

$ arm-none-eabi-as 001.asm -o 001.o
$ arm-none-eabi-gcc 001.o -o 001.elf -nostdlib
$ qemu-arm 001.elf
Hello, World

Neat!

I prepared this article as a text version of a video created by @lowlevellearning on youtube, therefore the credit for the content goes to him. Do check out his channel, he makes amazing content on embedded systems and I have learnt a lot from his videos and streams.
