Tutorial 7 (Nov 14): External variables
[external1.s][1085B] Example 1: Declaring and accessing global variables (example provided by Prof. Manzara)
[external2.s][2163B] Example 2: Declaring and accessing global arrays, and using .bss
External variables are used in assembly to implement C language global and static local variables. Previously, we used registers or local variables to store data. Comparing all three makes the differences clearer.
Registers
Registers are located directly on the CPU, so they can be accessed with very little latency. However, in general, the closer memory is to the CPU, the more expensive it is, which is why there are only a few registers. As such, we tend to reserve registers for heavily used data/variables.
Local variables
Since registers are so limited, we cannot store large arrays or structs in the registers. Instead, we allocate stack memory (RAM) for local variables. The stack uses high memory, and each closed subroutine call has its own stack frame. The "local" part represents the scope, as each local variable is only available to the block of code it is allocated in. In A3, we allocated i, j, and array V[] using STP, which meant the variables were allocated at the start of the subroutine. The memory gets deallocated at the end of the subroutine with LDP, meaning the variable is available throughout the subroutine. In A3, we also allocated temp, in the middle of the subroutine. Until we allocated temp, it was not available to the subroutine. It was also unavailable to the code after we deallocated temp, therefore temp was local to only part of the code inside the subroutine.
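In C terms, the two kinds of local variable described above might look like the sketch below. The function name reverse_first and the array size are made up for illustration and are not the A3 code. The variables i, j, and v are in scope for the whole function, while temp is in scope only inside the inner block.

```c
#define SIZE 5

/* Hypothetical example: copies src into a local array, reverses the copy,
   and returns the element that ends up first. */
int reverse_first(const int *src)
{
    int i, j;        /* in scope for the whole function */
    int v[SIZE];     /* local array, lives in this call's stack frame */

    for (i = 0; i < SIZE; i++)
        v[i] = src[i];

    for (i = 0, j = SIZE - 1; i < j; i++, j--) {
        int temp = v[i];   /* temp is only in scope inside this block */
        v[i] = v[j];
        v[j] = temp;
    }

    /* temp is out of scope here; i, j, and v are still available */
    return v[0];
}
```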
External variables
Sometimes we need to share information between function calls (static/class variables) or between multiple files (global variables). The common link between these is the persistence of data outside of individual function calls. Variables/data declared in the .data section are available to the entire file, and resemble static global variables. By using the .global directive, we can make these variables global variables, available to other compilation units such as other files or C code.
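As a rough C illustration of the three kinds of persistent data mentioned above, consider the sketch below. All of the names (shared_total, file_calls, next_id, calls_so_far) are hypothetical and chosen only for this example.

```c
int shared_total = 0;        /* global: other files can reach it (like using .global) */
static int file_calls = 0;   /* visible in this file only (like .data without .global) */

/* Hands out 1, 2, 3, ... on successive calls. */
int next_id(void)
{
    static int id = 0;       /* static local: keeps its value between calls */

    file_calls++;
    id++;
    shared_total += id;
    return id;
}

/* Reports how many times next_id() has been called. */
int calls_so_far(void)
{
    return file_calls;
}
```

None of the three variables is on the stack, so each one keeps its value after next_id() returns.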
 | Local Variables | Local Variables | External Variables | External Variables |
Memory Allocation | decrement SP (middle of subroutine) | STP (start of subroutine) | .data/.bss (without .global) | .data/.bss (with .global) |
Scope | code block | subroutine | file | program |
Lifetime | code block | subroutine | program | program |
.text, .data, .bss sections
In A1-A4, we didn't specify the sections for our code. The default section is the .text section, which is read-only. It holds our program instructions, as well as read-only data such as constants and string literals. The .data section contains read/write data, which can be programmer-initialized with pseudo-ops such as .word. It can also contain uninitialized data, by using the .skip pseudo-op. Finally, the .bss section contains memory that is not initialized by the programmer, and generally only uses the .skip pseudo-op. All memory allocated in .bss is zeroed before program execution.
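For comparison, here is a small C sketch of data that a compiler would typically place in each kind of section. The exact placement depends on the compiler and linker, so the comments describe the usual case rather than a guarantee, and the names are hypothetical.

```c
#include <stddef.h>

const char greeting[] = "hello";  /* read-only data: cannot be modified at run time */
int counter = 42;                 /* initialized and writable: typically .data */
int buffer[8];                    /* no initializer: typically .bss, zeroed before main runs */

/* Returns 1 if every element of buffer is zero, 0 otherwise. */
int buffer_is_zeroed(void)
{
    for (size_t i = 0; i < sizeof buffer / sizeof buffer[0]; i++) {
        if (buffer[i] != 0)
            return 0;
    }
    return 1;
}
```

Calling buffer_is_zeroed() before anything writes to buffer returns 1, which matches the rule that .bss memory starts out zeroed.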
Pseudo-ops
This isn't anything new, since you have already been declaring string literals for printf(), using fmt: .string "myString Format".
.data
a_m: .byte 10 // 1 byte = 8 bits = number of bits to encode a single character (ASCII)
b_m: .hword 20 // 2 bytes = 16 bits
c_m: .word 30 // 4 bytes = 32 bits = int
d_m: .dword 40 // 8 bytes = 64 bits
arraya_m: .skip 5*4 // 20 bytes (5 * 4) of uninitialized memory
arrayb_m: .word 10, 20, 30, 40, 50 // 20 bytes (array of 5 words/ints * 4 bytes each)
arrayc_m: .dword 10, 20, 30, 40, 50 // 40 bytes (array of 5 dwords * 8 bytes each)
sa_m: .string "this string is null-terminated" // .string automatically adds a 0 byte (.byte 0) to terminate the string
sb_m: .asciz "this string is null terminated too" // .asciz is the same as .string
sc_m: .ascii "this string is not null-terminated" // .ascii does not add a 0 byte to the end, and is not null-terminated
char_m: .byte 'a' // ASCII characters are encoded using 7 bits, and can be stored in a byte (8 bits)
chars_m: .byte 'h', 'e', 'l', 'l', 'o' // a string is usually comprised of a char[] array
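The sizes in the comments above can be cross-checked against C's fixed-width types. The sketch below is only an approximate C counterpart of the listing, with hypothetical variable names; it assumes the usual 8-bit byte.

```c
#include <stdint.h>

int8_t  a = 10;                             /* .byte  : 1 byte  */
int16_t b = 20;                             /* .hword : 2 bytes */
int32_t c = 30;                             /* .word  : 4 bytes */
int64_t d = 40;                             /* .dword : 8 bytes */
int32_t arrayb[5] = {10, 20, 30, 40, 50};   /* 5 * 4 = 20 bytes */
int64_t arrayc[5] = {10, 20, 30, 40, 50};   /* 5 * 8 = 40 bytes */

/* Like .string/.asciz: the compiler appends a 0 byte, so this takes 6 bytes. */
char with_nul[] = "hello";

/* Like .ascii or a list of .byte values: exactly 5 bytes, no terminator. */
char without_nul[5] = {'h', 'e', 'l', 'l', 'o'};
```

Applying sizeof to each variable gives the byte counts in the comments. Note that without_nul must not be passed to functions such as printf("%s") that expect a terminating 0 byte.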
.global directive
To make a labelled block of code or data available to other files or compilation units, we need to use the .global directive. This is also something you already know how to do. The format is usually:
.global <label_name> // the symbol <label_name> becomes visible to the linker, so other files can refer to it
Remember this?
.global main
main: stp x29, x30, [sp, -16]!
mov x29, sp
...
We used .global to make the main() subroutine global, so that it can be called from outside our file when the OS starts the program. If you forget to include .global main, you get this linker error when building with gcc:
undefined reference to `main'
Since we label all our global variables (using the labels as names), we can do the same thing to make them global:
.data
.global a_m // global int = 10
a_m: .word 10
.global array_m // global int[5] = [10, 20, 30, 40, 50]
array_m: .word 10, 20, 30, 40, 50
.global empty_m // global int[5], not initialized
empty_m: .skip 5*4
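From the C side, symbols exported this way could be reached with extern declarations. The sketch below is self-contained, so it also defines the three variables in C as stand-ins; in the tutorial's setup those definitions would come from the .data section of the .s file and only the extern lines would appear in the C file. The helper copy_and_sum is hypothetical.

```c
/* Declarations a C file would use to reach the assembly symbols. */
extern int a_m;
extern int array_m[5];
extern int empty_m[5];

/* Stand-in definitions so this sketch links on its own.
   Assumption: in a real build these come from the assembly file instead. */
int a_m = 10;
int array_m[5] = {10, 20, 30, 40, 50};
int empty_m[5];

/* Hypothetical helper: copies array_m into empty_m and
   returns a_m plus the sum of the copied elements. */
int copy_and_sum(void)
{
    int sum = a_m;

    for (int i = 0; i < 5; i++) {
        empty_m[i] = array_m[i];
        sum += array_m[i];
    }
    return sum;
}
```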
Accessing (store/load) external variables
Believe it or not, you also know how to do this already. A label is a name for the address where its data begins. Given an address, you can load a number of bytes from it. Accessing external variables combines two things you know:
- Getting the address from a label.
- Loading from an address, using LDR and offsets.
.data
index_m: .word 0
array_m: .word 10, 20, 30, 40, 50
.text
.global main
main: ...
adrp x28, index_m // adrp gets the address of the 4 KB page that contains index_m
add x28, x28, :lo12:index_m // add the lower 12 bits of index_m's address to get the full address
ldr w19, [x28] // using x28 as a pointer, load the value of index
add w19, w19, 1 // modify index
str w19, [x28] // store index back to its address
adrp x27, array_m // adrp gets the address of the 4 KB page that contains array_m
add x27, x27, :lo12:array_m // add the lower 12 bits of array_m's address to get the full address
ldr w20, [x27, w19, SXTW 2] // x27 as base address, index*4 as offset
add w20, w20, 1 // modify value at array[index]
str w20, [x27, w19, SXTW 2] // store value back to its position in the array
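For reference, the load/modify/store sequence above corresponds roughly to the following C. This is a sketch that reuses the labels as variable names; the function name bump is hypothetical.

```c
int index_m = 0;
int array_m[5] = {10, 20, 30, 40, 50};

/* Increments index_m, then increments the array element it now selects. */
void bump(void)
{
    index_m = index_m + 1;                     /* ldr, add, str on index_m */
    array_m[index_m] = array_m[index_m] + 1;   /* ldr, add, str at array_m + index_m*4 */
}
```

After one call, index_m is 1 and array_m[1] has changed from 20 to 21. Because both variables live outside the stack, a second call continues from those values instead of starting over.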