Binary Search Trees (Mar 8)
As promised, since most people are busy reviewing for the midterm, here is what we did in the tutorial today:
- For both size and height, we want to identify a recursive algorithm. Assuming we are given some tree T as input, there should be two recursive calls: one with the left sub-tree of T as input, one with the right sub-tree of T as input.
- Size(T)
- root of T + size of left sub-tree + size of right sub-tree
- 1 + Size(T.left) + Size(T.right)
- Height(T)
- if T is a leaf (both left and right sub-trees are null), return 0
- else, return 1+max[ Height(T.left), Height(T.right) ], treating the height of a null sub-tree as -1 so the max is still well-defined when only one child exists
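The two algorithms above can be sketched in Java. The Node class and the convention that a null tree has height -1 are assumptions made here for illustration; they are not part of the handout:

```java
// Minimal binary-tree node, assumed for this sketch.
class Node {
    int value;
    Node left, right;
    Node(int value, Node left, Node right) {
        this.value = value; this.left = left; this.right = right;
    }
}

class TreeStats {
    // Size(T): 1 for the root, plus the sizes of both sub-trees.
    static int size(Node t) {
        if (t == null) return 0;
        return 1 + size(t.left) + size(t.right);
    }

    // Height(T): a leaf has height 0; otherwise 1 + max of sub-tree heights.
    // Using height(null) = -1 keeps the formula correct when one child is missing.
    static int height(Node t) {
        if (t == null) return -1;
        return 1 + Math.max(height(t.left), height(t.right));
    }
}
```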
- All of the questions a) to h) refer to the algorithm at the top of Q8.
- Refer to slides 16-17 of Lecture #18-19 for an example of sketching a proof for a recursive algorithm. Another example in slides 18-21 of Lecture #15.
- The two main parts are the induction proof and termination.
- In the induction proof, we want to show that if the originating i-th call is correct, then the (i+1)-st recursive call is also correct.
- Show that if the preconditions of the originating i-th call are satisfied, then the preconditions of the i+1st recursive call are also satisfied.
- This is similar to how we proved a loop invariant for showing the correctness of loops.
- Show that if the postconditions of the i+1st recursive call are satisfied, then the postconditions of the originating i-th call are also satisfied.
- When we worked with loops, we showed that the loop ended, and then showed that the results imply the postconditions. With recursion, however, each recursive call needs to return some result back to the originating call. As such, we care most about satisfying the postconditions of the outer-most call.
- We then want to write out a recurrence relation (like Q5 of A1), and use the relation to prove that the recursion terminates. Check Slides 18-21 of Lecture #15.
- Three cases for recurrence relation, given input tree T of size n.
- Generic case: Steps(n) = Steps(a) + Steps(n-a-1) + Θ(n).
- Worst case: Steps(n) = Steps(n-1) + Steps(0) + Θ(n).
- Best case: Steps(n) = 2Steps(n/2) + Θ(n).
In each case, we add the cost of the current call, which is Θ(n), to the cost of both recursive calls. Each call costs Θ(n) because we need to create a new ArrayList and add every element of T. The recursive calls take the left and right sub-trees as inputs. In the generic case, one sub-tree has size a and the other has size n-a-1; the -1 comes from excluding the root, which is not included in the recursive calls' inputs. In the worst case, the tree is either left-heavy or right-heavy, so one side will have a null sub-tree while the other has a sub-tree with n-1 elements. Finally, the best case is when the tree is balanced: the sizes of the left and right sub-trees are fairly even, approximately n/2. (It's actually (n-1)/2 with floors and ceilings, since we exclude the root, but that won't affect our final result.)
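As a quick sanity check (not part of the tutorial), the worst-case and best-case recurrences can be evaluated numerically, treating the Θ(n) term as exactly n steps and Steps(0) as 1. Doubling n should roughly quadruple the worst-case count (quadratic growth) but only slightly more than double the best-case count (n log n growth):

```java
class RecurrenceDemo {
    // Worst case: Steps(n) = Steps(n-1) + Steps(0) + n
    static long worstSteps(int n) {
        if (n == 0) return 1;                 // Steps(0): constant work on an empty tree
        return worstSteps(n - 1) + 1 + n;     // one (n-1)-call, one empty call, n copy steps
    }

    // Best case: Steps(n) = 2*Steps(n/2) + n
    static long bestSteps(int n) {
        if (n <= 1) return 1;
        return 2 * bestSteps(n / 2) + n;      // two half-size calls plus n copy steps
    }

    public static void main(String[] args) {
        System.out.println("worst: " + worstSteps(1024) + " -> " + worstSteps(2048));
        System.out.println("best:  " + bestSteps(1024) + " -> " + bestSteps(2048));
    }
}
```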
- We want to use the above recurrence relations to prove that Steps(n) is in O(size(T) x height(T)). O-notation means an upper bound, so we can use either the generic case, or the worst-case.
- O(size(T) x height(T))
- In the worst case, height is equal to n-1.
- O(n x (n-1))
- O(n^2)
- Generic case
- Steps(n) = Steps(a) + Steps(n-a-1) + Θ(n)
- a < n and n-a-1 < n: each recursive call takes only part of n as input. To maximize the number of recursive calls, we set a = n-1, meaning we decrease the input size by the smallest possible unit (1) on each recursive call. The number of recursive calls is then equal to the height, which is n-1 in the worst case.
- The cost of each iteration is still Θ(n).
- (number of recursive calls) x (cost of each call) = (n-1) x Θ(n) = Θ(n^2)
- Worst case
- Steps(n) = Steps(n-1) + Steps(0) + Θ(n)
- Steps(n) = Steps(n-1) + Θ(n); we can drop Steps(0) since it is Θ(1), which is dominated by Θ(n)
- If each recursive call takes n-1 input size, then there will be n-1 recursive calls. Each call costs Θ(n).
- (n-1) x Θ(n) = Θ(n^2)
- O(size(T) x height(T))
- For a value x stored in a node whose depth in T is a positive integer d, a lower bound for the number of copies of x written into ArrayLists is d+1. For a value x at depth d, there is 1 originating call to the algorithm, plus d recursive calls, before x's node becomes the root of the recursive input. For each of those d+1 calls, we need to copy x into a new ArrayList. Only once x's node lies above the current recursive input (that is, it is a proper ancestor of the root of the current input) will x no longer be copied.
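The d+1 count can be checked with a small simulation. This is a hypothetical reconstruction, not the Q8 code: for the worst-case path-shaped tree of n nodes, where the node at depth i holds value i, the call on the sub-tree rooted at depth d copies every value at depth d or below into a fresh ArrayList, then recurses on the (n-d-1)-node sub-tree:

```java
class CopyCount {
    // counts[i] ends up as the number of ArrayLists that value i was written into.
    static void visit(int d, int n, int[] counts) {
        if (d >= n) return;                       // empty sub-tree: nothing to copy
        for (int i = d; i < n; i++) counts[i]++;  // copy the whole input into a new ArrayList
        visit(d + 1, n, counts);                  // recurse on the sub-tree below the root
    }
}
```

The value at depth d is contained in the original input plus the first d recursive inputs, so it is copied exactly d+1 times, matching the lower bound above.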
- Using the recurrence relation for the worst-case, we can show that the worst-case of the algorithm is in Ω(size(T) × height(T)). That is, the algorithm will take at least size(T) x height(T) steps.
- Steps(n) = Steps(n-1) + Steps(0) + Θ(n)
- In the worst case, the number of recursive calls (n-1) is equal to the height of the tree (n-1).
- The cost of each call is still Θ(n), and the size of the tree is also n.
- (n-1) x Θ(n) = Θ(n^2)
- Using the recurrence relation for the best-case, we can show that the best-case algorithm is in Ω(size(T) × log(size(T))). That is, the algorithm will take at least size(T) x log(size(T)) steps.
- First, note that in the best case (a perfect tree), height is equal to log2(n+1)-1; for example, n = 7 gives log2(8)-1 = 2. You can check slide 7 of Lecture #18-19.
- As such, we are trying to show that the minimum cost, in the best case, is size(T) x log(size(T)) = n log(n).
- Steps(n) = 2Steps(n/2) + Θ(n)
- Recall that, given a recurrence relation where the input n is halved for each call, the depth of the recursion is log2(n) levels. Refer to slides 18-21 of Lecture #15. (This was also used for the bonus question of A1.)
- At each level of the recursion, the calls process n elements in total, so each level costs Θ(n) steps. With log2(n) levels, Θ(n) x log2(n) = Θ(n log2(n)).
- We can start off by using the recurrence relations above. Regardless of whether we are talking about running time or storage size, the input is still in terms of n. The part we have to check is Θ(n), which we had as the running time cost of each call. For each call, it turns out we also need Θ(n) storage space, since we are storing a fraction of n each time we make a new ArrayList in a recursive call. As such, the asymptotic bounds of storage space should be the same as that of the running time, for this particular algorithm.
- When compared to the algorithm in the lecture supplement "Tree Traversals" in Week #6 (check the schedule), the algorithm in the tutorial is less efficient. The tutorial algorithm requires copying the subtrees for every recursive call, resulting in a huge number of copies given a large input tree T. The supplement algorithm (slide 10) uses a stack instead, and each node of the tree is only pushed once and popped once. As a result, it is more efficient in terms of both running time and storage space.
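The stack-based approach from the supplement can be sketched as follows; this is an illustration of the push-once/pop-once idea (here as a preorder traversal), not a copy of slide 10, and the Node class is assumed:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;

// Minimal binary-tree node, assumed for this sketch.
class Node {
    int value;
    Node left, right;
    Node(int value, Node left, Node right) {
        this.value = value; this.left = left; this.right = right;
    }
}

class StackTraversal {
    // Each node is pushed once and popped once, so the traversal runs in
    // Θ(size(T)) time and never copies a sub-tree.
    static ArrayList<Integer> preorder(Node root) {
        ArrayList<Integer> out = new ArrayList<>();
        Deque<Node> stack = new ArrayDeque<>();
        if (root != null) stack.push(root);
        while (!stack.isEmpty()) {
            Node t = stack.pop();
            out.add(t.value);
            if (t.right != null) stack.push(t.right);  // right first, so left pops first
            if (t.left != null) stack.push(t.left);
        }
        return out;
    }
}
```

The stack never holds more than height(T)+1 nodes on a path-shaped tree, and each node contributes one push and one pop, which is where the efficiency over the subtree-copying algorithm comes from.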