ARM Memory Load & Store: Readings and Examples

Memory instructions: load and store

Pre-Lab Reading:

Please watch the video that is posted on this week’s outline on ARM Load & Store operations.
Read the “DaveSpace ARM Tutorial”, paying particular attention to the sections “Single register data transfer” and “Addressing modes”.
Read the Raspberry Pi tutorial on memory and load/store. Also look at Chapter 8 - Arrays and Structures.
Read through the following tutorial on ARM Load & Store (memory read and memory write) and try the code shown in the tutorial:

Because ARM processors can only perform data processing on registers, interactions with memory only come in two flavors: (1) loading values from memory into registers, and (2) storing values into memory from registers.
The basic instructions for that are ldr (LoaD Register) and str (STore Register), which load and store words. Again, the most general form uses two registers and an Op2:

op{cond}{type} Rd, [Rn, Op2]

Here op is either ldr or str. Because they're so similar in syntax, we will just use ldr for the remainder of the discussion on syntax, except where the two differ. The condition flag again goes directly behind the base opcode. The type refers to the datatype to load (or store), which can be a word, a halfword or a byte. The word forms do not use any extension, halfwords use -h or -sh, and bytes use -b or -sb. The extra s indicates a signed byte or halfword: because the registers are 32 bits wide, a loaded byte or halfword has its top bits either sign-extended or zero-extended, depending on the desired datatype. We only use the word format in our labs.
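To make the difference between the plain and the signed forms concrete, here is a small Python sketch of what happens to the top bits when a byte or halfword lands in a 32-bit register. The helper names zero_extend and sign_extend are made up for this illustration; they only model the arithmetic and are not part of any ARM toolchain.

```python
def zero_extend(value, bits):
    """Model ldrb/ldrh: keep the low `bits` bits and clear everything above them."""
    return value & ((1 << bits) - 1)


def sign_extend(value, bits):
    """Model ldrsb/ldrsh: copy the sign bit into all higher bits of a 32-bit register."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):      # sign bit set, so the value is negative
        value -= 1 << bits
    return value & 0xFFFFFFFF          # view the result as a 32-bit register


byte_in_memory = 0xF0                          # hypothetical byte stored in RAM
print(hex(zero_extend(byte_in_memory, 8)))     # 0xf0
print(hex(sign_extend(byte_in_memory, 8)))     # 0xfffffff0
```

The same byte therefore reads as 240 with the unsigned form and as -16 with the signed form.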

The first register here, Rd can be either the destination or source register. The thing between brackets always denotes the memory address; ldr means load from memory, in which case Rd is the destination, and str means store to memory, so Rd would be the source there. Rn is known as the base register, for reasons that we will go into later, and Op2 often serves as an offset. The combination works very much like array indexing and pointer arithmetic in C and C++.

As we discussed in class, there are three general ways to access memory. The first is register indirect addressing, which gets the address from a register, as in ‘ldr Rd, [Rn]’. An extension of this is pre-indexed addressing, which adds an offset to the base register before the load; its base form is ‘ldr Rd, [Rn, Op2]’. This is very much like an array access. For example, ‘ldr r1, [r0, r2, lsl #2]’ corresponds to r0_w[r2]: a word-array load that uses r2 as the index, so the byte offset is r2’s value times 4.
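The address arithmetic can be sketched in Python. This is only a model under stated assumptions: the base address 0x1000, the little-endian byte layout and the helper names load_word and ldr_pre_indexed are invented for the illustration.

```python
import struct

BASE = 0x1000                                          # hypothetical address of the array
memory = struct.pack("<7I", 2, 3, 5, 7, 11, 13, 17)    # seven little-endian words


def load_word(address):
    """Read the 32-bit word stored at a byte address."""
    return struct.unpack_from("<I", memory, address - BASE)[0]


def ldr_pre_indexed(rn, rm=0, shift=0):
    """Model ldr Rd, [Rn, Rm, lsl #shift]; Rn itself is left unchanged."""
    return load_word(rn + (rm << shift))


r0 = BASE
r2 = 3
print(ldr_pre_indexed(r0))           # 2, same as ldr r1, [r0]
print(ldr_pre_indexed(r0, r2, 2))    # 7, same as ldr r1, [r0, r2, lsl #2]
```

Shifting the index left by 2 multiplies it by 4, the size of a word, which is why the register can hold a plain array index.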

Addressing modes

And then there are the so-called write-back modes. In the pre-index mode, the final address was made up of Rn+Op2, but that had no effect on Rn. With write-back, the final address is put in Rn. This can be useful for walking through arrays because you won't need an actual index.
There are two forms of write-back: pre-indexing and post-indexing. Pre-indexing write-back computes the address just like the normal pre-indexed mode and then stores that address back in Rn; it is indicated by an exclamation mark after the brackets: ‘ldr Rd, [Rn, Op2]!’. Post-indexing uses Rn alone as the address and doesn't add Op2 to Rn until after the memory access; its format is ‘ldr Rd, [Rn], Op2’.
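A minimal Python sketch of the three forms follows, with each function returning the loaded value together with the new value of Rn. The base address and the function names are assumptions made for this illustration only.

```python
WORDS = [2, 3, 5, 7, 11, 13, 17]
BASE = 0x1000                        # hypothetical address of the first word


def load_word(address):
    return WORDS[(address - BASE) // 4]


def ldr_offset(rn, op2):
    """ldr Rd, [Rn, Op2]   -> address is Rn+Op2, Rn is unchanged."""
    return load_word(rn + op2), rn


def ldr_pre_writeback(rn, op2):
    """ldr Rd, [Rn, Op2]!  -> address is Rn+Op2, and Rn becomes that address."""
    address = rn + op2
    return load_word(address), address


def ldr_post_indexed(rn, op2):
    """ldr Rd, [Rn], Op2   -> address is Rn, and Rn becomes Rn+Op2 afterwards."""
    return load_word(rn), rn + op2


r0 = BASE
r6, r0 = ldr_offset(r0, 4)           # r6 = 3, r0 still BASE
r8, r0 = ldr_pre_writeback(r0, 4)    # r8 = 3, r0 = BASE + 4
r9, r0 = ldr_post_indexed(r0, 4)     # r9 = 3, r0 = BASE + 8
r10, r0 = ldr_post_indexed(r0, 4)    # r10 = 5, r0 = BASE + 12
print(r6, r8, r9, r10, hex(r0))
```

Note that the pre-indexed write-back load and the first post-indexed load read the same word, because the first one already moved the pointer.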

Assemble and link the following program then head into the debugger to watch what happens:

/* -- copy_primes.s */

.text
.global _start
_start:
mov r1, #4

/* get the address of primes into r0 */
ldr r0, =primes

/* indirect addressing and pre/post indexing */
ldr r5, [r0] /* r5 = primes[0]; */
ldr r6, [r0, r1] /* r6 = primes[1]; */
ldr r7, [r0, r1, lsl #1] /* r7 = primes[2] */
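/* These two lines return primes[2] as the exit status and skip everything up to end; */
/* remove them to step through the write-back and store code below */
mov r0, r7
B end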
/* Pre- and post-indexing write-back */
ldr r8, [r0, #4]! /* primes++; r8= *primes */
ldr r9, [r0], #4 /* r9= *primes; primes++ */
ldr r10, [r0], #4 /* r10= *primes; primes++ */
ldr r11, [r0], #4 /* r11= *primes; primes++ */

ldr r0, =copy_primes /* set address to store */
str r5, [r0], #4 /* store to memory and increment index */
str r6, [r0], #4
str r7, [r0], #4

end:

mov r7, #1
swi 0

.data

.balign 4
/* Declares an array with fixed values as listed (primes) */
primes: .word 2,3,5,7,11,13,17
/* Allocates space for 7 new values (7 * 4 = 28 bytes); .skip fills them with zeros */
copy_primes: .skip 28

Debugging and Inspecting Arrays with gdb – try it!!!
You might start by using the debugger to see how the register values change with the different indexing options for load / store. To use the debugger, we need to assemble our code with an additional option, -g, which adds the debugging information that gdb will use. Use:
as -g -o copy_primes.o copy_primes.s
ld -o copy_primes copy_primes.o
Once you have linked your code, invoke gdb, inspect the code, disassemble the code, and set some break points:
gdb copy_primes
(gdb) list
(gdb) disassemble _start
(gdb) b _start
(gdb) b end
(gdb) q
For example, here’s an interaction that shows how the index incrementing works as we store the primes back into the copy_primes array:
(gdb) r
(gdb) s
(gdb) info registers r0
r0 0x205d8 132568

Inspecting Arrays in gdb

There’s a handy way to dump out the contents of an array within the debugger. To do this we need to print the label after casting it to an appropriate C-style array. Here’s an example:
GNU gdb (Raspbian 7.7.1+dfsg-5+rpi1) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
…
(gdb) break _start
Temporary breakpoint 1 at 0x103e8
Starting program: /home/pi/dev/arm-asm-cs-271/copy_primes
Temporary breakpoint 1, 0x000103e8 in main ()
(gdb) break end
Breakpoint 2 at 0x1041c
(gdb) r

Breakpoint 2, 0x0001041c in end ()
(gdb) p (int[6])primes
$4 = {2, 3, 5, 7, 11, 13}
(gdb) p (int[6])copy_primes
$7 = {2, 3, 5, 7, 11, 13}
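What the cast in the p command does can be mimicked in Python: the bytes at the label are simply reinterpreted as consecutive 32-bit integers. The sketch below assumes a little-endian layout, which is the usual configuration on a Raspberry Pi running Linux, and the helper name as_int_array is made up for the illustration.

```python
import struct

# The 28 bytes that ".word 2,3,5,7,11,13,17" would occupy, assuming little-endian storage.
raw = struct.pack("<7i", 2, 3, 5, 7, 11, 13, 17)


def as_int_array(data, count):
    """Reinterpret the first count*4 bytes as signed 32-bit integers, like (int[count])label."""
    return list(struct.unpack_from("<%di" % count, data, 0))


print(as_int_array(raw, 6))    # [2, 3, 5, 7, 11, 13]
print(raw[:8].hex())           # 0200000003000000, the first two words byte by byte
```

Asking for 6 elements shows only the first 24 bytes, which is why 17 is missing from the gdb output above even though the array holds seven values.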

Part 1: Modify your code from Lab AAL3 Part 3 so that it LOADS the N1, N2, N3 values from MEMORY into the registers. Use a label and a .data block to put these into three consecutive memory locations. Then store the answer into a fourth memory location immediately following the three original values; use a second label with a .skip directive or an initialized value to set up this location.

Modify your code from Lab AAL3 Part 4 so that it STORES the first 25 Fibonacci numbers that are generated into sequential memory locations. Use a label and a .skip directive to set up 25 memory words, then store the Fib sequence in those locations. That is, every time you generate a Fibonacci number, you store it in RAM, including the initial two, which have the values 1 and 1. Name the block fibs and use the p command of gdb to show the values. Capture this as an image to turn in as part of your answers. Try to use gdb to see what is going on with your program.
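To have reference values to compare against what gdb prints for fibs, a short Python sketch can generate the same 25 numbers. The function name first_fibs is made up for this illustration, and the sketch assumes the sequence starts 1, 1 as stated above.

```python
def first_fibs(count):
    """Return the first `count` Fibonacci numbers, starting 1, 1."""
    fibs = [1, 1]
    while len(fibs) < count:
        fibs.append(fibs[-1] + fibs[-2])
    return fibs[:count]


fibs = first_fibs(25)
print(fibs)
print(len(fibs) * 4, "bytes needed for the block")   # 25 words of 4 bytes each
print(max(fibs) < 2**31)                              # True: every value fits in a 32-bit word
```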

Modify your “Hello World” program from Lab 2 so that it asks the user for their first initial (e.g. J) and then responds with “Hi _!” (e.g. “Hi J!”). The Linux system call to read is Syscall 3, meaning that you put 3 into R7 to indicate the system call number. When you use this call, the system looks in R0 for the device to read from (0 = keyboard), looks in R2 for the number of characters to read (in this case 1), and places the characters into memory starting at the address in R1. Syscall 3 is essentially the inverse operation to the write system call that your “Hello World” program uses to print.

Part B

Pre-lab reading:

For the “DaveSpace ARM Tutorial”, read the sections “Multiple Register Data Transfer”, “The Stack”, and “A Call Chain”.
Read Chapter 9 and Chapter 10 of the ARM assembler in Raspberry Pi.

Registers and the Procedure Call Standard

There are 16 32-bit core (integer) registers visible to the ARM instruction set, labeled r0-r15 (or R0-R15). In this specification uppercase is used when the register has a fixed role in the procedure call standard. Table 2, Core registers and AAPCS usage, summarizes the uses of the core registers in this standard. In addition to the core registers there is one status register (CPSR) that is available for use in conforming code.
Register  Synonym  Special  Role in the procedure call standard
r15                PC       The Program Counter.
r14                LR       The Link Register.
r13                SP       The Stack Pointer.
r12                IP       The Intra-Procedure-call scratch register.
r11       v8                Variable-register 8.
r10       v7                Variable-register 7.
r9        v6       SB, TR   Platform register - defined by the platform standard.
r8        v5                Variable-register 5.
r7        v4                Variable-register 4.
r6        v3                Variable-register 3.
r5        v2                Variable-register 2.
r4        v1                Variable-register 1.
r3        a4                Argument / scratch register 4.
r2        a3                Argument / scratch register 3.
r1        a2                Argument / result / scratch register 2.
r0        a1                Argument / result / scratch register 1.

The first four registers r0-r3 (a1-a4) are used to pass argument values into a subroutine and to return a result value from a function. They may also be used to hold intermediate values within a routine (but, in general, only between subroutine calls). A called function may overwrite these registers and is not required to restore them on return, so saving any values that are still needed before a function call is the responsibility of the caller.

Register r12 (IP) may be used by a linker as a scratch register between a routine and any subroutine it calls. It can also be used within a routine to hold intermediate values between subroutine calls.
The role of register r9 is platform specific. A platform that has no need for such a special register may designate r9 as an additional callee-saved variable register, v6.
Typically, the registers r4-r8, r10 and r11 (v1-v5, v7 and v8) are used to hold the values of a routine’s local variables. A subroutine must preserve the contents of the registers r4-r8, r10, r11 and SP. Preserving the original values in these registers and restoring the values prior to the return is the responsibility of the callee.
In all variants of the procedure call standard, registers r12-r15 have special roles. In these roles they are labeled IP, SP, LR and PC.
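One way to use these rules while debugging is to compare register values noted down before and after a call (for example from gdb's info registers) and flag any callee-saved register that changed. The Python sketch below is a hypothetical helper for that bookkeeping; the snapshot values are invented, and r9 is left out because its role is platform specific.

```python
# Registers a subroutine must preserve according to the rules above.
CALLEE_SAVED = ["r4", "r5", "r6", "r7", "r8", "r10", "r11", "sp"]


def clobbered_registers(before, after):
    """Return the callee-saved registers whose values differ between two snapshots."""
    return [reg for reg in CALLEE_SAVED if before[reg] != after[reg]]


# Invented snapshots: the callee changed r5 and forgot to restore it.
before = {reg: 0 for reg in CALLEE_SAVED}
before["sp"] = 0x7EFFF000
after = dict(before)
after["r5"] = 42

print(clobbered_registers(before, after))   # ['r5']
```

Changes to r0-r3 and r12 are deliberately not reported, since a callee is allowed to overwrite those.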

AAL5 Part 1: Start by stepping through the following example code, a recursive function that performs multiplication using the ADD instruction. Look at each line and predict what the result will be, then single-step that line and check your prediction; repeat this for each line of code. Then change the values moved into r0 and r1 and test your understanding.
.text
.global _start

_start:
stmfd sp!, {lr}
mov r0, #4
mov r1, #3
bl mult
mov r0, r3
end:
ldmfd sp!, {lr}
mov r7, #1
swi 0

/* recursive multiplication using only addition */
/* load multiple, ldm, and store multiple, stm */

mult:
stmfd sp!, {lr}
cmp r1, #0
bne else
mov r3, #0
b endMult

else:
sub r1, r1, #1
bl mult
add r3, r3, r0

endMult:
ldmfd sp!, {lr}
bx lr

In the C programming language, it would look like this:
int mult(int x, int y) {
    if (y == 0)
        return 0;
    return mult(x, y-1) + x;
}

int main(void) {
    return mult(4,3);
}

The same code in Python is:
def mult(x, y):
    """ assuming y >=0 and is an integer """
    if y == 0:
        return 0
    return x + mult(x, y - 1)

print("mult(4, 3)", 4*3, mult(4, 3))

Note that this is somewhat easier than general recursive functions as we don’t have to save any local values to the stack other than the return addresses.
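To see how the stack behaves in the assembly version, the recursion can be unrolled in Python with an explicit list standing in for the stack. This is only a model: the name mult_with_stack is made up, the string "lr" stands in for a saved return address, and the push done in _start is not included.

```python
def mult_with_stack(x, y):
    """Mimic the assembly mult: push lr on every call, add r0 once per return."""
    stack = []                 # each entry stands for one saved lr (4 bytes)
    r0, r1, r3 = x, y, 0

    # Going down: every call runs stmfd sp!, {lr}, then tests r1.
    while True:
        stack.append("lr")     # stmfd sp!, {lr}
        if r1 == 0:
            r3 = 0             # base case: mov r3, #0
            break
        r1 -= 1                # sub r1, r1, #1 followed by bl mult
    deepest = len(stack)

    # Coming back up: the base case pops its lr and returns ...
    stack.pop()                # ldmfd sp!, {lr} in the base case
    while stack:
        r3 += r0               # add r3, r3, r0 in the caller
        stack.pop()            # ldmfd sp!, {lr}, then bx lr
    return r3, deepest


result, depth = mult_with_stack(4, 3)
print(result)                  # 12
print(depth * 4, "bytes of stack at the deepest point")   # 16
```

The function is entered y + 1 times, so the stack holds y + 1 return addresses at its deepest point.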

AAL5 Part 2: Modify your code from Lab #4 Part 1 so that it now has a “main”, really a starting point like _start, and a separate function “largest” that takes 3 parameters (r0, r1, r2) and returns the largest of the three in r4. The “main” will load these values from memory, call the function and then store the result back into memory.

As this function is a “leaf” function, and should only use R0-R3, it can be written more simply than the recursive example in Part 1, as it will not need to save and restore values from the stack. Still, your code must use stmfd to call the function and ldmfd to pop the activation record.

AAL5 Part 3: The sum of the numbers between 0 and n is the nth triangle number, tri(n). Triangle numbers can be calculated recursively via tri(n) = tri(n-1) + n. Write code that has a “main” that simply calls a RECURSIVE triangle function with a single parameter of 12 (retrieved from a memory location labeled “trinput”). This function will return the 12th triangle number, and the main will then store only this one value back into memory in a location labeled “trinumber”. This recursive function implements the loop construction from Lab 4. Triangle will have to save at least one parameter to the runtime stack in addition to the return addresses. [Hint: write this function in Python first!!!]
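A possible Python starting point for the recursion is sketched below. It assumes tri(0) = 0 as the base case, which the handout does not state explicitly, and it checks the result against the closed form n(n+1)/2.

```python
def tri(n):
    """Recursive triangle number, assuming n >= 0 and tri(0) = 0."""
    if n == 0:
        return 0
    # n must survive the recursive call; in assembly that is the value to save on the stack.
    return tri(n - 1) + n


print(tri(12))                    # 78
print(tri(12) == 12 * 13 // 2)    # True, matches the closed form n(n+1)/2
```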

AAL5 Part 4: Write code that has a “main” that simply calls a RECURSIVE fibonacci function with a single parameter of 16 (retrieved from a memory location labeled “fibinput”). This function will return the 16th Fibonacci value, and the main will then store only this one value back into memory in a location labeled “fibnumber”. Fibonacci will have to save at least the two parameters to the runtime stack in addition to the return addresses.

The KEYs to doing this lab successfully!!!
Start early
Test and do “stepwise refinement”
Be ready to use gdb to debug, even if you believe your code never has bugs

Here is what the recursive Python version of the code might look like:
def fib(n):
    if n <= 1:
        return 1
    return fib(n - 1) + fib(n - 2)

print(fib(16))
# in your ARM assembly implementation, you may have to store fib(n-1), then call fib(n-2)!!!
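The remark about storing fib(n-1) can be made visible by passing an explicit list that plays the role of the runtime stack. This is a sketch under stated assumptions, not the required solution: the name fib_with_stack and the choice of what gets pushed are made up for the illustration, and in assembly the return address would be saved as well.

```python
def fib_with_stack(n, stack):
    """Recursive Fibonacci (fib(0) = fib(1) = 1) that saves its values on an explicit stack."""
    stack.append(n)                              # save the parameter
    if n <= 1:
        result = 1
    else:
        first = fib_with_stack(n - 1, stack)
        stack.append(first)                      # save fib(n-1) before the second call
        second = fib_with_stack(n - 2, stack)
        first = stack.pop()                      # restore fib(n-1)
        result = first + second
    stack.pop()                                  # discard the saved parameter
    return result


stack = []
print(fib_with_stack(16, stack))   # 1597
print(stack)                       # [] because every push was matched by a pop
```

An empty stack at the end is the Python counterpart of sp returning to its starting value.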
