Analysis of Running Time of Algorithms (Jan 25)
^ Given the Distinct Entries algorithm fom Tutorial #3 (Jan 23), find the worst-case, best-case, and average-case running times.
[T04WarmupExercises.pdf][75KB]
To get a better understanding of the upper and lower bounds of worst/best/average cases, please look at the longer example provided by Prof. Jacobson first.
It is often difficult to identify the exact worst/best/average times for any input size n, given any algorithm. As such, we use upper and lower bounds to describe it. You might think that given the upper bound of the worst-case, we already known the absolute worst running time for any given input n. Then why do we care about the lower bound? First, you should know that the upper bound does NOT mean the actual worst-case for any given input. Rather, it is the worst POSSIBLE case. If I asked you to guess a number less than 100, you may find it very difficult. If, however, I told you to guess between 80 and 100, it would be much easier. In many cases, the worst-case for any input size n might actually be much lower than the upper bound. Without both the lower bound and upper bound, we may not know this. Also, if we have tight lower and upper bounds, then we can get a much better approximation for the worst-case of any given input size n.
- The first step is to rewrite the for-loops into while-loops. The loop variants and invariants will stay the same, but it will be easier to analyze the algorithm.
Notice how we split each for-loop into three parts: the initial value of the loop index, the loop test, and the iterator. It is easier to count the number of instructions now, and the loop test is isolated. Also, we rewrotei++
andj++
intoi=i+1
andj=j+1
respectively. An increment likei++
is actually composed of an addition plus an assignment, hence the unit cost of this one line is 2. - Upper bound of the worst-case.
- To get the running-time of the outer loop, we need to know the cost of the outer loop's body. Since there is another loop inside this body, we will first examine the inner loop.
- Inner loop body:
- In general, to get the worst-case running time, we want to maximize the number of times our loops will execute. In the conditional,
if (A[j] == A[i])
, then the entire alrogithm will return and halt. Since we want the algorithm to continue, we don't want the body of the conditional to execute. As such, we assume A[j] != A[i], and we do not count the return statement in our analysis. - The entire conditional has a cost of 1 (if the body does not execute), for the condition test itself. The line
j=j+1
has a cost of 2. Adding the two parts, the inner loop body has a total cost of 3.
- In general, to get the worst-case running time, we want to maximize the number of times our loops will execute. In the conditional,
- Inner loop executions:
- Next we need to know how many times the loop will execute, in the worst-case. We can determine this using the loop variant, and the initial value of the loop counter.
- The loop variant for the inner loop is i - j. If this isn't clear, please refer to the lecture notes, as it is very important to know.
- The initial value of j is 0; we know this from the first part of the for-loop, which we rewrote into
int j = 0
. - Note that we do NOT know the value of i, since the value of i varies inside the outer loop.
- The number of executions in the worst-case is i -j, when j = 0, therefore i - 0 = i.
- The inner loop will execute i times in the worst case.
- We know this is true because the worst-case is when all entries are distinct, and every entry has to be compared.
- Inner loop total cost:
- From the lecture notes, we know that given k executions, loop test cost Ttest, and loop body cost Tbody, the total cost is: k(Tbody) + (k+1)(Ttest)
- We multiply Ttest by (k+1), because after k executions, the loop test will execute one last time, before failing and exiting the loop.
- Ttest is just 1, for the comparison j < i
- k = i
Tbody = 3
Ttest = 1
k(Tbody) + (k+1)(Ttest) =
i(3) + (i+1)(1) =
3i + i + 1 =
4i + 1
- Outer loop body:
- Now that we know the total cost of the inner loop, we can begin to solve the outer loop.
- The outer loop body consists of
int j = 0
(cost = 1), the inner loop (cost = 4i + 1), andi=i+1
(cost = 2). - The outer loop's body has a cost of 4i + 4.
- Outer loop executions:
- Same as above, we use the loop variant (A.length - i) and the initial value of the loop counter (
int i = 1
) to determine the number of times the outer loop will execute in the worst-case. - We usually refer to A.length as n, so the loop variant is n - i. Since the initial value of i is 1, the loop will execute n - 1 times in the worst-case.
- Same as above, we use the loop variant (A.length - i) and the initial value of the loop counter (
- Outer loop total cost:
- Using the above formula, k(Tbody) + (k+1)(Ttest)...
- k = n - 1
Tbody = 4i + 4
Ttest = 1 -
We first isolate the sum, to make it easier to calculate.
The sum of the first n - 1 numbers if (n)(n-1)/2.
As mentioned on the course website, this is a formula you should have seen in MATH 271.We need to use this because the cost of the outer loop body is dependent on the value of i. This makes sense, because as index i increases, there are more entries before it to be compared with. For example, if i = 2, A[2] is compared with A[0] and A[1]. If i = 3, then A[3] is compared with A[0], A[1], and A[2].
- Algorithm total cost
- The very last step is to add the cost of
int i = 1
(cost = 1) andreturn true
(cost = 1). - 2n2 + 3n - 4 + 2 =
2n2 + 3n - 2
- The very last step is to add the cost of
- Lower bound of the worst-case.
- To find the lower bound, we need to find some function such that for every input size n, there is at least one input that will be at least this slow. Two inputs of the same size may have different running times. Take A = [1, 2, 3] and A = [1, 1, 1] for example. Both are of length n = 3, but the first array will take longer to process because it needs to do three comparisons. The latter only needs to do one comparison, before returning false (not distinct). If at least one input for each possible input size n is slower than our function, then our function is a valid lower bound.
- Finding a good lower bound is a little more tricky, because it requires a good understanding of the algorithm, and some intuition on your part.
- The idea is, once again, to maximize the number of times our loops will execute. In some cases, this is not possible for all inputs. Coincidentally, for this particular algorithm, it is indeed possible. As we already found out earlier, this happens when all the entries are disctinct, forcing the algorithm to make every possible comparison. As such, we can actually re-use the above proof and say that the lower bound is 2n2 + 3n - 2.
- Since our lower and upper bounds are equal, we can say that the upper bound is tight.
- Lower bound of the best-case running time.
- The best-case running time is easier to find, because we will do the opposite from above: minimize the loop executions.
- Looking at the outer loop test, we know the loop only executes if i < A.length, same as n > i. We know i = 1 before the first loop test. Then if n ≤ 1, the outer loop will not execute at all. Of course, the inner loop won't execute either. In the preconditions from tutorial 3, we know n is a positive integer. As such, the absolute best-case is when n = 1.
- The reason is that if n = 1, then there is only one element in the array and there are no comparisons to be made.
- The total cost here is just 3. One each for
int i = 1
, the outer loop test, and thereturn true
statement at the end.
- Upper bound of the best-case running time.
- The loops are essentially doing comparisons. To minimize the loops, we need to minimize the number of comparisons made.
- The easiest way is to have A[j] == A[i] on the very first iteration of the loops, so that the algorithm will immediately return and halt.
- Let's say for all input sizes n ≥ 2, we pick In such that A[0] == A[1]. Then the running time will be exactly 6, one for each line until the algorithm returns false.
- But what about n = 1? Since our rule for In says that A[0] == A[1], it means n ≥ 2.
- Luckily, we already know the running time when n = 1 is just 3, which is faster than the upper bound of 6. That means that for all input sizes n, 6 is a valid upper bound of the best-case.
- Average-case. While the average-case is very useful, it is also very difficult to determine. The average-case is not just the average of the best- and worst-case, because we do not know the distribution of our inputs. For some algorithms, the average case could also be very close to the worst-case. For this algorithm, the running time depends on size n of the input, as well as the chances of any input A[i] being equal to A[j].
What happens when n = 1?
This question was raised in class, and I wasn't able to answer it clearly at the time.
Worst-case lower/upper bound: 2n2 + 3n - 2
Best-case lower bound: 3
Best-case upper bound: 6
When n = 1, the worst-case is 2(1)2 + 3(1) - 2 = 2 + 3 - 2 = 3.
When n = 1, the best-case is between 3 and 6.
It just so happens that when n = 1, the best and worst case are the same.