Assignment 3
rand()
bl rand
Since we are using the gcc (GNU Compiler Collection) compiler, we can use the C libraries included in the implementation. Similar to how we call C's printf() function using bl printf, we can call the rand() function using bl rand. So far, we have only used registers w0-w7 to pass arguments, but they are also used to return results. Calling bl rand will place a random number in w0.
The range of the randon number depends on the implementation. With gcc, RAND_MAX is 2^32. As shown in the assignment's C code, you will have to perform a bitwise AND to mod 256 the result.
Pseudo-random
You will notice that your program always generates the same sequence of numbers using rand(), and as such they are not truly random. This is NOT a problem for your assignment.
Why? One of the biggest problems in CPSC (and especially computer security) is generating truly random numbers. A computer is a machine, and a machine must follow rules. Pseudo-random numbers are actually generated using some algorithm. Many algorithms depend on what's called a seed, applying some transformation to the seed to generate a new number. A common (and very insecure) way to simulate random numbers is to seed rand() with the current time, which is always changing. As you already encountered, using the same seed will generate the same sequence every time. If a malicious attacker knows the algorithm and the seed being used, it is very simple for them to crack an encryption.
Bubble sort
While you should have learnt what a bubble sort is from CPSC 331, here's a quick explanation.
- Watch this video.
- How it works:
- In the video, we try to sort [3, 2, 9, 6, 5].
- This array has 5 elements. We want to find the last element in the sorted array, which will be the biggest number in the array. This is why index i in the assignment starts from the last element. Once we find the biggest number, we no longer need to include the last element in the sorting. Instead, we look for the biggest number in the first four elements, and place it in the fourth slot. The fourth element is now sorted, we continue to sort the first three elements, etc.
- To identify the biggest element, we want to search from the start (index j = 0). Imagine a competition, where the first element is the challenger and it's trying to move up to be the biggest number. We first compare [3, 2]. Since 3 is bigger than 2, 3 wins and moves up (do a swap). The array is now [2, 3, 9, 6, 5]. Next, 3 competes with 9. However, 9 is bigger than 3, so that's as far as 3 goes. We don't have to swap them. Since 9 is the winner, 9 continues to challenge the remaining numbers, so we compare [9, 6]. 9 is bigger than 6, so we swap again and the array becomes [2, 3, 6, 9, 5]. Finally, 9 is bigger than 5, leaving us with [2, 3, 6, 5, 9]. We can easily tell 9 is the biggest number, hence it is in the last position.
- Now that we know the 5th and last element is sorted, we repeat the process with the first four elements of the array, [2, 3, 6, 5]. The sorted result is [2, 3, 5, 6], and we know 6 is the biggest number here. We now have [2, 3, 5][6][9]. Now we repeat again with [2, 3, 5]. Keep doing this until we've determined the number in every position, ie. until the entire array is sorted.
One-dimensional array
UPDATE: The previous example stored index i in register w19. I've updated the example so that it is storing and loading the values from the Stack instead, when necessary. You need to store v[], i, j, and temp all on the Stack for A3. Notice how we are always loading i from the Stack, and storing it into w19, before using it. After we have modified i, we also make sure to write the new value back to the Stack.
Get the code here: 1darray.s (4383 bytes)
Get the code from terminal: wget "www.edwinckc.com/uploads/355/1darray.s"
// Initialize an array of size 10, where array[i] = i*2.
// Print out the numbers after.
fmt1: .string "array[%d] is %d.\n"
// We simply call these assembly "equates".
// They are similar to using m4 macros or register equates (below).
size = 10
arraybase = 16 // This is the base address of the
// array, which is stored directly
// after the 16 bytes allocated for
// the frame record (x29 and x30).
i_s = 56 // This is the address of our index i,
// directly after our array in memory.
alloc = -(16 + 40 + 4) & -16
dealloc = -alloc
// x29 and x30 are known collectively as the frame record.
// We store the frame record on the stack, along with any
// additional stack variable. To do this, we first need to
// calculate the amount of space required for all these
// variables. The frame record has two registers, and
// each register is 8 bytes (64-bits), hence by default
// we decrement sp by 16. The reason we decrement is because
// stack memory grows backwards (addresses become smaller).
// Next, we need to allocate space for the array as well as
// the index i_s. We will use words (4 bytes), and the array
// size is 10, so the array will take 4*10 = 40 bytes. The
// index will take another 4 bytes, so we need to allocate
// 16+40+4 = 60 bytes. However, the stack must be quadword
// alligned, meaning addresses must be divisible by 16.
// To guarantee this, we take -(60) and perform a bitwise
// AND with -16. The last 4 bits of -16 are 0s, and doing
// a bitwise AND with -16 will clear the last 4 bits of -60.
// The result is -60&-16 = -64, which is divisible by 16.
// .req is Register EQuate, which is similar to using macros
// .req is built into ARMv8, you don't need to use m4
fp .req x29
lr .req x30
.balign 4 // Stack addresses must be divisible
// by 16, so we quadword align all
// the instructions. A quadword is
// 16 bytes (1 word = 4 bytes).
.global main
main: stp x29, x30, [sp, alloc]!
mov fp, sp // Same as mov x29, sp
add x28, fp, arraybase // Calculate array base address.
// This is just 16 bytes after SP,
// directly after the frame record.
//---LOOP 1: Create an array where array[i] = i*2.-----------------//
//---The resulting array is [0, 2, 4, 6, 8, 10, 12, 14, 16, 18].---//
mov w19, 0 // Initialize index to 0.
str w19, [fp, i_s] // Write index i to stack. b test // Branch to optimized pre-test.
loop: mov w21, 2 // 2x multipler for i.
ldr w19, [fp, i_s] // Read index i from stack. mul w20, w19, w21 // Calculate i*2.
str w20, [x28, w19, SXTW 2] // We want to store i*2 at array[i]
// x21 = $fp + 16, the base address.
// "w19, SXTW 2" does two things:
// First, SXTW w19 to x19, since our
// addresses are 64 bits. Second, the
// "2" after SXTW performs an LSL,
// shifting the numbers left by 2.
// That's the same as scaling by 4.
ldr w19, [fp, i_s] // Get current index i from stack
add w19, w19, 1 // i++
str w19, [fp, i_s] // Save current index i back to stack
test: cmp w19, size // If i < 10...
b.lt loop // ... do loop.
//---LOOP 1 END----------------------------------------------------//
//---LOOP 2: Iterate through the loop and print out the numbers.---//
mov w19, 0 // Initialize index i to 0 again.
str w19, [fp, i_s] // Write index i to stack. b test2 // Branch to optimized pre-test.
loop2: ldr w19, [fp, i_s] // Read index i from the stack.
ldr w20, [x28, w19, SXTW 2] // Read the number at array[i]. // x28 is the base address of the array.
// Offset by i*4 bytes.
// Same as above, SXTW 2 to sign extend
// w19 to x19, then left-shift by 2
// to scale w19 by 4, giving us the
// correct offset in bytes.
adrp x0, fmt // Here we just print the values.
add w0, w0, :lo12:fmt1
mov w1, w19
mov w2, w20
bl printf
ldr w19, [fp, i_s] // Get current index i from stack
add w19, w19, 1 // i++
str w19, [fp, i_s] // Save current index i back to stack
test2: ldr w19, [fp_is] // Read index i from stack.
cmp w19, size // If i < 10... b.lt loop2 // ... do loop.
//---LOOP 2 END---------------------------------------------------------------//
done: mov w0, 0 // Return 0 to OS
ldp fp, lr, [sp], dealloc // Deallocate stack memory.
ret // Return to calling code in OS.