CHPT 22 Elementary graph algorithms

Representing the graph

Adjacent list

It consists of an array of vertices. In each entry u, adj[u] stores all the vertices v that has an edge u,v connected. The asymptotic space is O(E + V) but counter-intuitively, we need more space to store undirected graph than directed because for directed graph, an edge (u, v) will only be stored in adj[u]. But for an undirected graph, we need to store v in adj[u] and u in adj[v].

Adjacency matrix

A |V| * |V| matrix such that A_{ij} = 1 if (i,j) \in E. It is symmetric so for undirected graph we can only store the upper trangule. We can also store weighted graph. A_{ij} = w for (i,j) \in E whose weight is w. For unconnected vertices, the weights between would be inf.(or 0 or NULL)


A graph traversal algo that explores the shallow vertices first. And the simple path from s to v forms a shortest path. The algorithm explores all vertices with distance k first before exploring vertices with distance k + 1

We can see that this algo use a queue (FIFO) to imitate the exploring the shallowest point first algo.

Grey nodes are nodes that haven’t explored all its adjacent vertices. The while loop from line 10 to 18 will run as long as there are grey nodes.


single bit store color

Show that using a single bit to store each vertex color suffices by arguing that the
BFS procedure would produce the same result if lines 5 and 14 were removed.

u, the predecessor needs to become black once all its adjacent nodes are discovered(greyed). The for loop at line 12 did exactly that.

2 coloring problem

n nodes, r pair, give an algo that in O(n + r) time to decide if we can have pair color1 with color 2.

Even level of BFS is group 1 and odd level of BFS is group 2.

Diameter of a tree.

The diameter of a tree T is the largest of all shortest-path distances in the tree. Gave an algo that finds it.

Run BFS from a any node v . It will find the node that’s furthest from it. Then run BFS again from the furthest node, the largest distance will be the diameter.


Explores edges out of the most recently discovered vertex that still has unexplored edges leaving it.

This algo uses recursion.

Classification of edges

 classes of edges
Its from here!


iterative DFS

DFS tree has only outgoing edge

Explain how a vertex u of a directed graph can end up in a depth-first tree containing only u, even though u has both incoming and outgoing edges in G.

DFS for connected component

Show that we can use a depth-first search of an undirected graph G to identify the
connected components of G, and that the depth-first forest contains as many trees
as G has connected components. More precisely, show how to modify depth-first
search so that it assigns to each vertex v an integer label between 1 and k,
where k is the number of connected components of G, such that u:cc = c:cc if
and only if u and c are in the same connected component.

Topological sort(on a dag)

DAG is directed acyclic graph.
A topoligical sort on a dag in a linear order of all its vertices such that if G contains an edge(u,v) then u appears before v in the ordering.
We can think of it as a ordering such that all directed edge go from left to right.

**It is often used for scheduling ** where each directed edge (u,v) means u a prerequisite for v.


Call DFS(G) to compute the finish times for each vertex v.
As they finish insert them into the front of a linkedlist.
return the linked list.

Why it works.

lemma 1: a directed graph is acyclic if and only if a DFS of G yields bo back edges.
An edge (u,v) is a back edge if v is an ancestor of u.

First we want to show 1 if no back edge -> no cycle
Then we want to show 2. if no cycle -> no back edge
Using contra positive we know 1 is equivalent to if cycle -> G has back edge and 2 is equivalent to if back-edge -> has cycle
if DFS has back edge (u,v):
v is an ancestor of u, means there is a path from v to u. The back edge will form a cycle.

CLRS chpt 12 Binary tree

Search tree data structure supports search, max, min, insert, delete, predecessor and successor. Basic operation is bounded by the height of the tree(even though the worst case can have \Theta(n). Binary tree has a pointer to parent and left and right child.

Binary search tree

Binary search tree satisfy the binary search tree property: Let x be a node, if y is a left child of x, x.key > y.key. If y is a right child of x, then x.key < y.key.

The binary search tree allows us to print the node in sorted order through in order traversal.

if x != null:

This algorithm will go down to the far left until it reaches the leftest element. It will attempt to find leftest element’s children, but there isn’t any so it only prints and go back up to the parent that called it. The parent is the next node to be printed, then the right node of the parent. This pattern goes on… Essentially we are squashing down the tree such that the parents fit between the two children and make a sorted list.

It takes O(N) time to in order walk through the tree.


Min heap property

the value of each node is greater than or equal to the value of its parent, with the minimum-value element at the root. It is different from binary tree property because min heap only knows the relationship between children and parent, not between children. On the contrary, we know the left child is smaller than the parent and the right child.

Non recursive in order

We can use a stack as a aux data structure to hold parent’s value. 1. we init an empty stack 2. we init current node as root 3. push the current node onto the stack and set the current = current.left 4. If current is null and the stack is not empty then we pop the stack, print it and set current = poped.right. Then we go to step 3 5.If the stack is empty, we are done.

stack s
current = root
While not s.isEmpty() or current == root:
    if current != Null:
        current = current.left
        poped = s.pop() #pop the parent
        current = poped.right

Pre order traversal

In preorder, we want to print the root before the children.

if x != null:

Post order traversal

In post order, we want to print the root after the children

if x! = null:

Argue that since sorting n elements takes O(nlgn) time in the worst case in

the comparison model, any comparison-based algorithm for constructing a binary search tree from an arbitrary list of n elements takes O(nlgn) time in the worst case.

Suppose not, then we can construct a binary search tree in o(nlgn) time(not big O, it means it is strictly smaller than nlgn). Then we can just do a in-order traversal and print. In order takes O(n) so sorting is o(nlgn) instead of O(nlgn) =><= so constructing binary search tree must be O(nlgn).

Querying a binary search tree

We want to find max, min and search. All of these operations are bounded by the height of the tree.

tree search

    if x.key == k or x == null:
        return x
    if x.key < k:
        tree-search(x.right, k)
        tree-search(x.left, k)

This search will form a path in the BST and if we didn’t find it, it will return null. This is very helpful. One less boundary check.

An iterative version is:

    while x.key != k or x!= null:
        if x.key < k:
            x = x.right
           x = x.left
    return x

If we found x.key == k, then we will return the node. Else we will go to the bottom of the tree and return null.

find min and find max

while x.left != null:
    x = x.left
return x

the min is the left most node of the tree.

Same with find max. find max is at the right most of the tree.

while x.right != null:
    x = x.right
return x

###successor and predecessor
If all keys are distinct, then successor of x will be the element with the smallest value that's bigger than x.key
    if x.right != null:
        return find_min(x.right)
    y = x.parent
    while y != null and x == y.right:
        x = y
        y = y.parent
    return y

The code has two section, the first one is when our node has a right node, then the next biggest would be the right node’s min. The second part of the when the node is a right node and has no right node. Then traverse back the tree until the current node is a left node of the parent y. then that parent would be bigger than the node under exam.


recursive find max and find min

    if x.right == null:
        return x
    x = x.right

Tree predecessor

    if x.left != null: 
    return find_max(x.left) 
    else: return x.parent

Show that if a node in a binary search tree has two children, then its successor has no left child and its predecessor has no right child.

If a node has two children x.left and x.right, then the successor will be the min of the subtree rooted at x.right. it will be the left most child of the subtree therefore no left child. The predecessor will be the max of the subtree rooted at x.left. If predecessor has right child then it won’t be the max. (This can be proved using contradiction)

predecessor and ancestor II

Consider a binary search tree T whose keys are distinct. Show that if the right subtree of a node x in T is empty and x has a successor y, then y is the lowest ancestor of x whose left child is also an ancestor of x. (Recall that every node is its own ancestor.)

Insert and delete

We insert through traverse through the tree till we hit a null that can fit our inserted node.

We need to alter the tree and z

    y = null x = T.root 
    while x != null: 
        y = x 
    if z.key > x.key: x = x.right 
    else x = x.left 
    #Now we have traverse till null, y points to last not-null node 
    if y == null: #if the tree is empty: 
        T.root = z 
    if z.key > y.key: y.right = z 
    elif z.key < y.key y.left = z

Delete is more intricate since we might need to move more than 1 nodes around We have 4 cases when trying to remove z

 1. The deleted node z has no left child: replace z with z's right child
 2. The deleted node z has no right child: replace z with z's left child
 3. The deleted node z has two child and its predecessor is its right child(right child has no left child). replace z with z's right child.
 4. The deleted node z has two child and its predecessor isn't its right child(right child has  left child). We need to traverse down z's right child until we find the leftmost node *left_most*, which is the predecessor of z. Then replace *left_most* with it's right child(recall predecessor shouldn't have left child). Then we replace z with *left_most*.

The graphs on p297 might be helpful. 

We will need a transplant subroutine which will point one node's parent point to another node. (It won't change what the node children though).


recursive insert

insert(r, x): 
    if r == null: 
        if x.key > r.parent.key : r.parent = x 
        else r.parent.left = x 
        if x.key > r.key: insert(r.right, x)
        else insert(r.left, x)

nodes examined for searching

Suppose that we construct a binary search tree by repeatedly inserting distinct values into the tree. Argue that the number of nodes examined in searching for a value in the tree is one plus the number of nodes examined when the value was first inserted into the tree.
When the node was first inserted, assume it takes n steps. It takes the same path as the path to search for this node since both these steps use the same comparison method. We need to traverse over the searched node to reach null to return. That’s where the + 1 comes from.

Another kind of insertion sort

We can sort a given set of n numbers by first building a binary search tree containing these numbers (using TREE-INSERT repeatedly to insert the numbers one by one) and then printing the numbers by an inorder tree walk. What are the worstcase and best-case running times for this sorting algorithm?

The worst case for tree insertion is to insert in increasing order. That would require us to traverse the whole list to insert. (1 + 2 + 3 + … + n- 1 + n) That would be \Theta(n^2)

CLRS chpt 11: hashtable

We use hashtable when we only want to insert, search and delete. There are three main stream methood for hashtable: 1. direct address. 2. chainning and 3. open addressing.

Direct addressing.

We directly access the array that contains the pointer to satellite data. It works well when the universe of k is small and we can afford to allocate enough space.(we assume each key is unique). Since everyone has slot already, why not store the data directly?

Exercise for direct addressing


A bit vector is simply an array of bits (0s and 1s). A bit vector of length m takes much less space than an array of m pointers. Describe how to use a bit vector to represent a dynamic set of distinct elements with no satellite data. Dictionary operations should run in O(1) time.

Given a bit vector b, it contains 1 in position k if k is in the dynamic set.


Suggest how to implement a direct-address table in which the keys of stored elements do not need to be distinct and the elements can have satellite data. All three dictionary operations (INSERT, DELETE, and SEARCH) should run in O(1) time. (Don’t forget that DELETE takes as an argument a pointer to an object to be deleted, not a key.)

The keys are not distinct so we could let T[x.key] points to a doubly linked list and store all the data that has the same key. Insert takes O(1). When we delete, we take pointer to the object as an argument and removing a node from linked list is O(1).
Search will take key as the argument and check if T[k] is null so O(1).

Down side of direct addressing

We need to allocate space for all possible key in the universe and this become infeasible when Universe is large. More importantly the actual keys used might be way smaller than U. So we use hashtable


Hashtable uses hash function to computer key for the table. For an element with key k, we will store it at slot h(k). But multiple key values might be hashed into the same value: h(k1) = h(k2)… when k1 != k2 !=…. a collision has occurred.

What constitutes a good hashing function

A good hashing function will appear to be “random” while being deterministic(else we can’t retrieve).

Load factor

Assume uniform hashing, where any given element is equally likely to be hashed into any of the m slot. n = number of elements in the dictionary. m = number of slot in the table. n/m is the load factor, the average length of linked list pointed by the table.


Collision occurs when two element’s key got hash to the same value. Then we can do Chaining; we will store the elements with the same h(k) into the same linked list.

Run time of Insert, search and delete


we will insert at the front of the doubly linked list pointed by h(x.key). O(1) time

Search(T,k) k is the key value.

Go to h(k) first and traverse the linked list until we either find k or fail to find it. It takes O(n/m) to search. (1 if we fail, we need to traverse the whole list, O(n/m)
(2 if we the key is in the dictionary…well we need to take the average of 1 plus the expected number of elements added to x’s list after x was added to the list(because we have to traverse through all the element before x, who are added after x

\dfrac{1}{n} \sum_{i = 1}^{n} 1 + \sum_{j = i + 1}^{n}{E(X_{(ij)}

Where X_{ij} is the indicator variable that h(ki) = h(kj). The inner summation will sum the expected value of collision after x is inserted. And the earlier the x is inserted, the larger the number of sum will be.  P(h(k_i)) == h(k_j)) = 1/mby uniform hashing. So the inner loop sums to\dfrac{n}{m}. We then take the average of it and it will still be\dfrac{n}{m}. (Please ignore the gross math omission but the general idea is close.  <h3>Exercise</h3>  <h4>1</h4>  Professor Marley hypothesizes that he can obtain substantial performance gains by modifying the chaining scheme to keep each list in sorted order. How does the professor's modification affect the running time for successful searches, unsuccessful searches, insertions, and deletions?  It will make insert slower to O(n/m) since we need to traverse. It will <strong>NOT</strong> make delesion faster since the order doesn't help delete to be asymptotically faster. But it will help search to become faster if instead of using linked list we just use an array! Because we can use binary search. Times to search reduce to O(1 + lg(n/m))  <h1>Hash function</h1>  <h2>Division method</h2>  h(k) = k mod m. For this method, we want m not to be power of two. Otherwise it is just shifting bits and will not have the "random" effect a good hash function would have  <h2>Multiplication  method</h2>  h(k) = \floor{m (kA mod1)}Where kA mod1 is the fractional part of the kA.  This method has the advantage that m can be power of 2.  <h2>Universal hashing</h2>  We randomly choose hash function from a class of suitable function. The probability of going to a bad one is slim. (Just like quick sort).  <h1>Open addressing</h1>  In open addressing, all element occupy the table itself. SO NO POINTER and the element stored is the key. And the table can fill up. No more than m element can be stored in the table and the load factor is never greater than 1. Instead of pointer, for each hashing, we generate a sequence of slots to be examined in case of collision. To insert and search, we <strong>probe</strong> the table until we find a suitable position for our element.  h : U x {1,2,3…m – 1} = {1,2,3… m – 1}The sequence of probe depends on the hash value of the element. For every key k, we want the prob sequence to be<(h(k,0), h(k,1)….h(k, m – 1)>. It is a permutation of {1,2,3...m}. so that eventually we consider every slot when the table is filling up.  <pre><code class="python">hash_insert(T, k):    i = 0     while T[h(h,i)] != null and i != m:  #probe until we either find null( empty) or have no space left         i += 1     if i != m:         T[h(k,i)] = k     else:error "no space left" </code></pre>  The same logic goes for search. We either find it in the permutation sequence or finish the whole sequence...  <pre><code class="python">search(T,k): i = 0 while T[h(h,i)] != null and i != m:       i +=1 if i != m:     return i </code></pre>  But for delete, we can not just delete the key in the table. Because removing key will make it impossible to search. We can insert a DELETE instead of Null when delete a key.  We define h'(k,i), an ordinary hash function, to be the auxiliary hash function.  <h2>Linear probing</h2>  h(k) = (h'(k) + 1) mod m)linear probing will search h'(k) then h'(k) + 1.. h'(k) + m - 1 and then wrap around and search until h'(k) - 1. It will iterate through the whole table eventually. It suffers from <strong>primary clustering</strong> and long run sequence will build up the table and search will take longer. <strong>the initial starting position will dictate the sequence</strong>  <h2>Quardratic</h2>  h(k) = (h'(k) + c_1 i + c_2 i^2) mod mThe offset is provided by the polynomial. It works better than linear probing but we must constrain the use ofc_1, c_2, m.  <strong>the initial starting position will dictate the sequence as well</strong>  <h2>Double</h2>  h(k) = h_1(k) + i h_2(k) mod mwhere h1 and h2 are both aux hash function. The first probe position ish_1(k)and then we offset each probe with(h_2(k) + i) mod m
So the probe sequence depends on two hashing function is not dictated by the initial starting point.

Perfect hashing

Two level hashing using universal hashing at each level.

CHPT10 Basic Data structure

We can implement stack and queue using array and pointers.

Stack and Queue


Stack uses array and a pointer that indicates the top of the stack.


  1. Check empty : if the is at the beginning it is empty:
if == 0:
    return True
    return False
  1. Push we can push to the top + 1 of the array We need to check if we “overflow”, if we ran out of space in the stack.
    if + 1 > len(S): error overflow = + 1
    S[] = x
  1. Pop the last element at, if the stack is empty then just return error.


Queue has insert and delete as well. They are called Enqueue and Dequeue. They follow FIFO, just like the line. The guy who first entered it will leave it first. Since the operations will happen on both side of the queue, we will need TWO pointers to keep the head and the tail. And we can allow the Queue “wrap around” in the array. The elements will consist of Q.head, Q.head + 1, Q.head + 2, Q.head + 3… Q tail – 2, Q tail – 1.
The queue is empty when Q.head == Q.tail. The queue is full when Q.head == Q.tail + 1


  1. Enqueue(Q,x)

    Q[Q.tail] = x if Q.tail == len(Q): Q.tail = 0 #wrap around to 0 else: Q.tail = Q.tail + 1

So the Q.tail will always point to something outside of the Queue, except when the queue is full.


x = Q[Q.head()]
if Q.head == len(Q):
    Q.head = 0
else: Q.head += 1
return x

##Deque(double ended queue)
Deque allows insertation and deletion at both ends.
Deque can be used in solving this problem:
[Sliding window maximum][1]

##Problems in stack and queue

###How to implement a queue using two stacks?
Observe that when we pop an stack off and store the result in another stack, the order will be reversed.
pop S1=[1,2,3] off until empty and push then onto an empty stack S2 we get [3,2,1] and the element we want to deque, in this case, number 1 now can be poped off from S2. This might take O(n) time at worst because we have to pop S1 empty(potentially size n) to access the element at the very beginning.

###How to implement a stack using two queues.
Given two queues q1 and q2. when we push to stack we push to stack, we push to q1. When when pop, we want to access the top of the q1. We can dequeue q1 to q2 until q1 is empty and the top element will be at the very beginning of q2 and we can dequeue q2. This takes O(n) at worst

#Linked list
A linear ordered data structure where the order is determined by the pointer.
Single:where each node keeps only the next  
Double:Each node keeps prev and next  
Circular: Head node's prev = tail. = head

##Common ops
###Search element and return that node
search(L, x):
    while  L.head != None:
        if x != L.value:
            L.head =
        else: return L

###Insert front(x is a node
insert_front(L,x): = L.nead
    if L.head != None: L.head.prev = x
    L.head = x


Need to add in the edge case when the element found is at the beginning(L.head)

LIST-DELETE.(L,x): (x is a node)
if x.prev != None: =
    L.head =

if != None: #Not at the tail = x.prev
    x.prev = None

But with the use of Dummy variable,( a nominal head) we can make the code clearer. It is called Sentinel. It reminds me of the thing in the Matrix.

Instead of using L.head to point to the first node, we use L.nil. It will lie between head and tail and L.head.prev = L.nil and = L.nil. Subsequently = L.head and L.nil.prev = tail.

If we add then in then delete would become python
node = search(L,x): = = node.prev

Since even x is at the beginning, == and that is the head. and we update head with If node is at the very end, then will be L.nil and L.nil.prev points to the end of the tail, in this case, node.prev.

Questions from the book about linked-list

in search we have two conditions: while( x!= L.nil and x.key !=k). How to shrink it to one condition check?

We can let L.nil.value = k and it will return L.nil if none is found. 2.Union 2 set S1 and S2 in O(1) time.
S1 and S2 are in doubly linked list. We want to let the end of S1 points to the start of S2. We use sentinels.\ Let l1.nil be S1’s sentinel and l2.nil be S2’s sentinel. We first let l1.nil.prev points to This let the tail of l1 points to the beginning of l2.

We then let = l1.nil.prev. This let the head of l2 points to the tail of l1.

We also have to let = l2.nil.prev. Head of l1 points to the tail of l2

And l2.prev =

We need 4 operations. It makes sense since we have to change two sentinel and each has two pointers.

How to reverse a single linked list?

First we could do this with recursion. Then non recursively


Binary tree.

We could use a ternary linked list with parent, left and right children pointer.

tree with at most k children

Each node will have k + 1 poitner, where 1 is for pointer to the parent. If you don’t have that many children, just let them point to null.

Unbounded children?

The previous implementation would break down. it breaks down because the parent has to keep track of all the children. But we could solve this by **left children, right sibling ** method. We will have three pointers in a node. one for parent and one for the left most child and one for the sibling on the right.

We lose access to all children at the parent but we can store then in O(n) space.



Write an O(n) time recursive procedure that, given an n-node binary tree, prints out the key of each node in the tree.


Write an O(n) time nonrecursive procedure that, given an n-node binary tree, prints out the key of each node in the tree. Use a stack as an auxiliary data structure.

Hello again CLRS

I have picked up this book again and this time I will make notes as I read through. It will help me think more thoroughly about the contend and in the future I will make them public so for people who stumble upon my study notes, these notes might become useful to them.

This is not a comprehensive study guide, please only use it as a reference.


Chapter 2, getting started

Insertion sort

The idea is really simple. When you sort an array of number, you start with two sides. One side will hold the sorted elements, the other will hold the rest of the unsorted. In the very beginning, the sorted part will hold one element, which is sorted by itself. Then we pick the second element will be inserted into the sorted part of the array based on the preference(non-descending or non-increasing).

This process continues in a loop. We start with the 1… (i – 1) element sorted in the array and we pick the ith element and insert it into the right place. If we want non-descending and the ith element is bigger than (i – 1)element, then we will move the (i – 1)element to ith element’s place, effectively making space for the ith element. The psedo code is:

We assume the array starts at position 1.

for j = 2 to A.length   
    key = A[j] //insert A[j] into the sorted array A[1..J-1]   
    i = j - 1   
    while i > 0 and A[i] > key // while we haven't reached the first elem of sorted array and the key is smaller than the current element A
        A[i + 1] = A[i]
        i --
    A[i + 1] = key

Loop invariants and the correctness of the insertion sort

This algorithm is in place. To show the correctness of the algorithm, we will usee something called loop invariant, which means at the start of each iteration, the subarrayA[1..j – 1] consists of the same elements originally in A[1..j – 1] but in sorted order. To show this, we need to prove three things:
1. Initialization: it is true at the beginning of the loop. (The first element is sorted by itself.) 2. Maintenance: it is true before an iteration of a loop and it is also true before the next iteration. (after an iteration, the sorted part will remain sorted with one additional element, the ith element.) 3. Termination: when the loop stops, the invariant gives us useful information.(When then loop stops, the subarray A[1..n] will be sorted.

It is very similar to induction except induction will never terminate.

Analyzing algorithms

We want to predict the resource an algorithm needs such as memory and time.


RAM, No concurrent operations. No memory hierarchy such as cache or virtual effect.

Input size

It depends on the problem to decide which notion should be used for input size. For sorting, it is better to use the number of element, but for graph problem, it is better to use two numbers: the number of vertices and edges.

Running time

Of an algorithm is the number of primitive ops or steps executed.

We will now analyze the insertion sort. First the complicated way.

The c_1 and c_2 are costs of the instruction. For each j = 2,3,...n where n = A.length, we denote t_j to be the number of times the while loop will be executed for the value of j

Then the running time is the sum of each statement’s time. We pay special attention to line 5 – 7. For line 5, the time this statement will be run is determined by the number of comparison needed to find the right place for key. For each key, the time t_j will differ. But the sum of these t_j multiplies by the time it takes to execute the comparison will give us the total time of the while statement. The same for the statement that follows while. But the test is executed one more time than the body.

The formula is shown above. The order of the input will hugely affect the time. Assuming the list is already sorted. Then for each comparison in the while statement, we only need to compare once therefore t_j will be 1 for all j and our formula becomes The expression is of the form an + b and it is linear function of n.

But if the array is in reverse order then each comparison in while loop will take j times. and

the formula will be above. It is of the form an^2 + bn + c and this is a quardratic function.

Worst case and average case analysis

We usually will try to find the worst-case running time, the longest running time for any input of size n. There are three reason as to why we want to find worst case.

  1. The worst-case will provide us with a guarantee.
  2. The worst case often happens. EG: For database search, the worst case is not finding the content and that happened quite often.
  3. The “average case is often as bad as the worst case. For our insertion sort, assume half of the array is sorted and half is not. So we basically will need to sort the unsorted half, which has input size of \dfrac{n}{2} and it is still a quadratic function.

Order of growth

We want to make some simplification. We are interested in the order of growth. So we consider only the leading term of the formula an^2. We also ignore the leading coefficient since it is far less significant than the n’s effect. So we are left with n^2. The worst-case running time is \Theta(n^2).

Designing algorithms

For our insertion sort, we have an incremental approach. We first start with a singleton sorted list, then make the sorted list bigger and bigger one element after another. We will now learn about divide-and-conqure

The divide-and-conquer approach

Lots of good algorithms are recursive; Solving a problem by breaking the problem up into smaller and similar problems and after solving them, combine the solutions to find the final solution. Divide-and-conquer is one.

  • Divide: Divide the problem into sub problems
  • Conquer: Solve the subproblem recursively(breaking them up again and again). If the subproblems are small enough, we will solve them straghtforwardly.
  • Combine: Combine the solutions of subproblems to get the original.

For merge sort: We will divide the unsorted list into 2 unsorted list(divide). Then we will sort the two unsorted list recursively using merge sort.(conquer). Then merge the two sorted list to get the original answer.

To merge, we need to get an auxiliary function merge(A, p, q, r), where A is the list, and we assume A[1,p] and A[p + 1, q] are both sorted.

The merge function will start with an empty array and pick the smaller of the two array’s first element. Then after picking an element from that array, we push all other element forward so we have a new first element in that array. This process will repeat till one array is empty. Then we will just append the other array to the end of the sorted array.

The psedo code has one trick: It won’t check if we have reached the end for both array. Instead it will use an sentinel value, \infty. Since we will put the smaller of the two top element into the sorted array and nothing is smaller than \infty, we virtually append the array to the sorted array.

Merge(A, p, q, r)
    n1 = q - p + 1 // length of the first sorted array
    n2 = r - q    //length of the second sorted array
    let L[1... n1 + 1] and R[1..n2 + 1] be new arrays //leave 1 extra slot for sentinel value
    for i = 1 to n1 
        L[i] = A[ p + i - 1]
    for j = 1 to n2 
        R[j] = A[ q + j]

    L[n1 + 1] = infty
    R[n2 + 1] = infty

    i = 1
    j = 1

   for k = p to r
       if L[i] <= R[j]
            A[k] = L[i]
            i += 1
            A[k] = R[j]
            j += 1

We need to check the loop invariant for this algorithm. The loop invariant is at the beginning of the last for loop, A[p..k – 1] contain k – p number of the smallest element among array L and R. And L[i] and R[j] are the smallest element in their arrays that aren’t copied into A. The initialization is good since at the beginning the A array is “empty” and k – p = 0. And L[i] and R[j] have the smallest element(sorted, not touch, so the first one is the smallest).

Maintenance: Picking one element from the two array and insert will give us a A[p..k] that’s sorted. And L and R will remain sorted.

Termination: This algorithm terminate when k = r. So the array A[p..r] will be sorted which is the whole array.

We can see that merge’s largest iteration is a single for loop and has n step. In each step, the operation takes constant time. So the running time is \Theta(n).

Now go back to merge sort

Merge_sort(A, p, r)
    if p < r
    q = floor((p + q)/2)
    Merge_sort(A, p, q)
    Merge_sort(A, q + 1, r)
    Merge(A, p, q, r)

It will look like:

To find the running time, we assume that we will divide the problem into a parts and each one is \dfrac{1}{b} the size of the original. Ignoring the time it takes to divide and combine

[T(n) = aT(\dfrac{n/b})]

We will need master theorem to find the exact running time. But we can informally deduct the time for merge sort.
Let’s assume that the n is exactly a power of 2. Then we will make a recursive tree. The next level from the top will have two subtree that has size n/2. Assuming the time to do divide and combine step for each element is c, the total cost of the two subtree is c(n/2 + n/2) = cn. The next level will have 4 (n/4) subtree. The total time is 4c(n/4) = cn.
The i level below the top has 2^i node and will takes 2^i * \dfrac{1}{2^i} *c = cn. The total number of level is lgn + 1.
So the total cost is level * cost at each level = cn * lgn = \Theta(nlogn).

Chapter 6 Heap sort

Intro to heapsort

In sorting, data are rarely isolated. Each data element is called record that contains a key, the value to be sorted, and a satellite data, the actual content of the data.

Why sorting:

  • Some problems are inherently sorting problem
  • Sorting can be a key sub-routine for some other algorithms
  • We can prove a non-trivial lower bound for sorting and our best upper bounds match with the lower bound asymptotically. So we know the sorting algorithms are asymptotically optimal. We can also use the lower bound to prove other thing

Comparison sorts

Sorts that determine the sorted order based on comparison is called comparison sort. Insertion, quick, merge and heap sort are all comparison sort.

But if we can gather information about the sorted order of the input by means other than comparing elements, we can beat comparison sort. Bucket sort and radix sort.

Order statistics:

the ith order statistics of a set of n number is the ith smallest number in the set. With a randomized algorithm, we can find the ith order statictics in O(n) time

Overview of heap sort

Heap sort can run O(nlgn) but it is also in place( unlike merge sort). The term “heap” was coined in the context of heapsort. But now people also refer to it for garbage collection.


A (binary)heap is a data structure is an array that we can view as almost a binary tree(except the lowest level, which is filled from left up to the end.
An array A that represent the heap has two attribute, one is A.length, representing the array’s length, the other is A.heapsize, representing the heap size. Note these two are different, A.length \geq A.heapsize and only A[1..A.heapsize] are valid member of a heap.

The root of tree is A[1] and given the index i, we can know the index its parent, left child and right child

  • parent = floor(i / 2)
  • left_child = 2i
  • right_child = 2i + 1

We can use bit operations. parent = i >> 1, left_child = i << 1, right_child = i<<1 + 1.

There are two kinds of heap, max-heap and min-heap. Max-heap has max-heap property that keeps every parent bigger than its children. A[parent(i)] \geq A[i]. The largest elem is the root A[1]. Min-heap is the opposite. parents are smaller or equal to their children.

Heap is a binary tree and the height of the tree, the number of edges on the longest simple downward path from the node to a leaf, is \Theta(lgn)

Maintain the heap

To do so, we will use max-heapify function that takes in an array A and an index i. It will assume the left and right child of A[i] will be max heap but A[i] itself might violate the heap property(smaller than its children.) So max-heapify will float down A[i] so the sub tree rooted at ith position in the array will have the heap property.

    l = 2i
    r = 2i + 1
    max_pos = i

    if l <= A.heapsize and max < A[l] 
        max_pos = l

    else max_pos = i

    if r<= A.heapsize and A[r] > A[max_pos]
        max_pos = r

    if max_pos != i
        swap A[i] with A[max_pos]
        max_heapify(A, max_pos)

The fix up among children and parents are Theta(1).
The children’s sub tree can have size at most 2n/3, when the bottom level of the tree is exactly half full. The running time is T(n) <= T(2n/3) + Theta(1).

By Master Theorem, the run time is T(n) = O(lgn). Alternative, we can switch from n to height. The running time is O(h).

Building a heap

We can use max-heapify in a bottom-up manner. Start from the leaves node and go through the remaining node to run heapify.

    A.heapsize = A.length
    for i = floor(n/2) to 1
        max_heapify(A, i)

Use loop invariant to show it will maintain the heap properity. The invariant statement is:

at the start iteration of for, each node i + 1, i+ 2 …n is the root of a max-heap.

  • Init: at the beginning, we know floor(n/2) + 1 … n are all leaves node, so they are trivial max heap.
  • maintain: on ith iteration, we know all nodes with index higher than i are heaps and any children of nodes i to 1 will be a max-heap. So node i’s children will be root of max heap. So we can run max_heapify for all node i through 1,
  • Terminate: when i = 0 it terminate. At that time node 1 through n are all root of some heap. Especially 1 is the root.

To find a tighter upperbbound we need to use the knowledge that a heap at height h has at most celling(n/ 2^{h + 1}) for any height.
We know max-heapify run on height h will take O(h)

(1)   \begin{align*} \sum_{h = 0}^{lgn} ceil(\dfrac{n}{2^{h + 1}} O(h) = O(n \cdot \sum_{h = 0}^{lgn} \dfrac{1}{2^{h + 1}} \end {align*} Use formula A.8 to replace the last sum with x = 1/2. we got  \begin{align*} O(n) \end{align*}

Heap sort algorithm

We first start with BUILD_MAX_HEAP, and we will have a heap of size n. Since it is a heap, the max element of the array will be A[1]. We swap A[1] with A[n] and reduce the size of the heap to n – 1. Then we Hipify the heap. Then A[1] will hold the second largest elem. Swap it with A[n – 1] and do it again. We stop when the heap size is 2.

    for i = n downto 2
        A[i] = A[1]
        A.heapsize - = 1

The BUILD_MAX_HEAP takes O(n) time, and each MAX_HIPIFY takes O(lgn) time and we do it n times. So the running time is O(nlgn).

Priority Queue

IT is the most popular application of the heap. It is a data structure for storing a set of element S. Each element has an associated value called key. A max priority queue supports:

  • Insert(S,x) inserts an element into S.
  • Maximum(S) returns the element with the largest priority
  • Extract_max(S) removes and returns the elem with the largest element
  • Increase_key(S,x,k) increases element x’s priority to k (k should already be bigger than x’s priority.

Max-priority queue are used as job scheduler on time share computer.
Alternatively, we have min-priority queue. It is used in simulation where time occurence is the key. We want to have the next closest value to run.

We will write some pseudo code for the functions

    if A.heapsize < 1
        error "heap underflow"

    max = A[1]
    A[1] = A[A.heapsize]  //we will reduce the heapsize by 1, so move the last one to 1's position.
    A.heapsize -= 1
    MAX_Heapify(A, 1)

    return max

The running time is O(lgn) since the max_heapify takes O(lgn) time.

// increase the priority of the element at A[i] to key

    if key < A[i]
        error "new key smaller than curr key"

    A[i] = key
    while i > 1 and A[parent(i)] < A[i]
        swap A[i] with A[parent(i)]
        i = parent(i)

Since key can only be bigger than key, the subtree rooted at i are still max-heap. But the parents of i might not keep the heap property.(If key is supper, big, it should have been the root.) So we need to traverse the key at ith element up the tree till it’s smaller than its current parent or become the root.

Since the path going up to the root(height) is of order O(lgn), the running time of Incrase_key is O(lgn).

Insert(A, key)
    A.heapsize += 1
    A[A.heapsize] = -infty
    Max_increase_key(A, A[A.heapsize], key)

We first make one additional space for the new key value and assign that slog -infty, so no matter what the key value is, we can still increase A[A.heapsize].



min and max number of elem in a heap of height h.

Since a heap is a binary tree(ish) structure, we know the number from root to second to last layer will be 2^h. The least the last level can have is 1 node, so the min is 2^h + 1. The max the tree can have is a full binary tree and that is 2^{h + 1} - 1. (Bit tricks to show 2^h = \sum_{i = 0}^{h - 1} 2^h

An n-element heap has height floor(lgn)

We can write n = 2^m - 1 + k where m is as large as possible. The heap is a complete binary tree plus additional leaves. The binary tree is of height m - 1 so the heap’s height will be m - 1 + 1 (plus the last level leaves. So h = lgn.

Show with the array representation for storing an n-elem heap, the leaves are the nodes indexed by floor(n/2) + 1, floor(n/2) + 2…

We know the left child formula is 2i, where i is the parent index. To let the leaves don’t have any children, we need to make 2i an invalid index. One way to do so is let 2i > n, i > n/2. So let i > floor(n/2) + 1 will make 2i > n. So floor(n/2) + 1… will be the leaves index.

Chapter 4: divide and conquer


We have explored divide-and-conquer before in merge sort. In it, we use recursive function to solve it and once the problem is small enough, we will solve straightforwardly. Sometimes, we have to solve the sub-problems differently from the original and the solving cost is part of the combine step.


It is an equation or inequality that describes a function in terms of its value on smaller input.

We have three ways to solve recurrence relationship.

  • Substitution method, give the recurrence a guess for bound then use induction to prove it.
  • Recursion-tree method, convert the recursion into a tree whose node is the cost of each subprobelm.
  • Master method

Some technicalities

We usually ignore the floor and ceiling function since they rarely makes any difference. (change more than a constant factor).

Maximum-subarray problem

We are trying to find a subarray whose sum will be the largest among all possible subarray.

Brute Force

It is pretty easy, we just compose every possible subarray and keep track of the max and location of the max-making subarray.

    n = A.length
    max = 0
    max_right = 0
    max_left = 0
    for i from 0 to n
        subarray_sum = 0
        for j from i to n
            subarray_sum = subarray_sum + A[j] //kept track of subarray_sum so no recompute
            if subarray_sum > max
                max = subarray_sum
                max_right = j
                max_left = i

To find the running time, there is a intuitive way; We need to select 2 end points out of all n points and $ {n \choose 2} \in O(n)$

It is not fast enough.

Divide and conquer

There are only three possible ways for the maximum subarray to exit. Assume the left side is l, right side is r and the midpoint is m. The midpoint will either be

  1. Entirely in A[l, m]
  2. Cross the midpoint m
  3. Entirely in A[m, r]

If we are in situation 1 or 3, its good since they will just be smaller instances of the original problem.

In the middle situation

If the maximum sbuarray is in the middle, then it is not a smaller instance of the original problem because now we have to cross the mid point. Since any max_subarray in the mid point situation must have the form A[i..m] and A[m + 1.. j], we can just find the max subarray for these two subarray.

find_max_crossing_array(A, low, mid, high)

    left_sum = -infty
    sum = 0
    for i = mid downto low
        sum = sum + A[i]
        if sum > left_sum
            left_sum = sum
            max-left = i

    right_sum = -infty
    sum = 0
    for i = mid upto high
        sum = sum + A[i]
        if sum > right_sum
            right_sum = sum
            max_right = i

    return(max-left, max-right, left-sum + right-sum

Now with find middle at hand, we can solve the max_subarray problem

find_max_subarray(A, low, high)
    if high == low
        return (low, high, A[low]) // base case

    else mid = floor((low + high) / 2)

    (left-low, left-high, left-sum) = find_max_subarray(A, low, mid)
    (right-low, right-high, right-sum ) = find_max_subarray(A, mid | 1, high)
    (cross-low, cross-high, cross-sum) = find_max_crossing_array(A, low, mid, high)

    if left-sum is the biggest
        return (left-low, left-high, left-sum)
    else if right-sum is the biggest
        return (right-low, right-high, right-sum ) 
        return (cross-low, cross-high, cross-sum)


On each term, we divide the problem size by 2 and solve two of these subproblems. $T(n) = 2T(\dfrac{n}{2}) + \Theta(n)$ where the last linear term is the find_max_mid_array’s cost.

It is the same recurrence relationship as merge sort. So the running time is $n \log n$

Another method in linear time

There is a linear method called Kadane’s method that will do in linear time. It starts with a question: if we know the maximum subarray ending at i is Bi, what will be the maximum subarray ending at i + 1? The answer is it will either contain the previous maximum sub-array as prefix or just by itself depending on who is bigger. So in other word, we the maximum sub array ending at i + 1 will be the one that gives value of $max(B_i + A_{i + 1}, A_i)$. The only case to discard prefix is when the prefix subarray is negative.

    max_ending_here = A[0]
    max_so_far = A[0]
    for i from 1 = A.length
        max_ending_here = max(x, max_ending_here + x)
        max_so_far = max(max_so_far, max_ending_here)

    return max_so_far

Strassen matrix multiplication

The usual way will yield $O(n^3)$ running times.

Naive divide_and_conquer

We know that block matrix multiplication is the same with matrix multiplication. So we can partition the matrix evenly and multiply them in the usual way.

$C_{11} = A_{11} * B_{11} + A_{12} * B_{21}$ and so on.

So the algorithm for square matrix multiplication is:

We can see the number of sub problems is 8 for each recurrence. The problem size becomes 1/2 of the original. (We aren’t counting entries here). And the addition of the matrices takes $O(n^2)$ time. So

$T(n) = 8T(n/2) + O(n^2)$ From master theorem,

We can use indexing to avoid copying entries. So no cost is incurred. With master theorem, we know it is still $O(n^3)$

Strassen’s method

The key is the make more additions and subtraction and reduce one fewer multiplication. In other word, making the recursion tree less bushy. We will have 7 intermediate matrices that are the product of 7 matrix multiplication. Then using these 7 matrices to perform additions and subtractions to get the result we want. It is faster based on Master theorem.

The substitution method for solving recurrences

  1. Guess the form of the solution
  2. Use induction to find the constants and show the solution works.

Sometimes the boundary condition can’t hold but since for asymptotic notation

Chapter 5, Randomized algorithm


A randomized algorithm is an algorithm whose behaviors are not only determined by inputs but also by a random number generator.

Hiring problem

You are the HR of a company and you need to recruit assistants. You will fire the current one if you find a better one. To do so, you need to get help from recruiting company who will charge c_0 for each interview and c_1 if that person is hired.

The cost of this problem highly depends on the order of the interview. The best case is to interview the best at the very front so the total cost would be c_1 + (n - 1) c_0 where n is the number of interviewee. The worst case is when the ability of the candidates are in strictly ascending order. So you have to hire and fire every one except the last guy, yielding a cost of n(c_0 + c_1).

We have to find the Average running time, basically averaging the running time over all possible input. But to do so, we need knowledge, or ASSMUPTION about the distribution of the input.

Prereq for analyze the problem

For hiring problem, we can assume the interviewee came in random order. There are n candidates so there are n! ways to order them(permutation).

I(A) = 1 if A occurs otherwise I(A) = 0.

EG: We define X_H to be the indicator variable of a coin toss. and X_H = 1 if head, else X_H = 0.

(1)   \begin{align*} E(X_H) = E(I_H) \linebreak = 1 \cdot Pr(H) + 0 \cdot Pr(T) \linebreak = \dfrac{1}{2} + 0 = \dfrac{1}{2} \linebreak \end{align*}

We can guess the expected value of a indicator variable of A is the probability of A.

Lemma 1:Given a sample space S and an event A in the sample space S, let X_A = I{A}. Then E(X_A) = Pr(A). Pf

(2)   \begin{equation*} E(X_A) = 1 \cdot Pr(A) + 0 \cdot Pr(\bar{A}) \ = Pr(A) \end{equation*}

Analyze the hiring problem

Define X_n to be the number of hiring made in the process, we want to find E(X_n). X_n is interviewing n people and it is consists of x_1, x_2...x_n, which are indicator variable about whether the ith person is hired or not. and X_n = x_1 + x_2 + .. +x_n

E(x_i), the expected value of the i-th person being hired means that the i th person must be better than all i - 1 people ahead of him. For a random distribution, the chance of the i-th person being the best is \dfrac{1}{i}. So E(x_1) = \dfrac{1}{i}.

(3)   \begin{align*} E(X) = E[\sum_{i = 0}^{n} x_i] \ = \sum_{i = 0}^{n} E(x_i) \ = \sum_{i = 0}^{n} \dfrac{1}{i} \ = ln(n) + O(1) (equation A.7) \end{align*}

So the average hiring cost will be O(c_1 ln(n), which is significantly better than O(c_1 n).

Randomized algorithms

For the hiring problem, we assume that our input follows uniform distribution. But sometimes we don’t have that knowledge. Another fact about our previous algorithm is that for a particular input, the time cost will always be the same. In other word, the algorithm is deterministic and the randomness rests in the input’s distribution.

But we want to know the cost even if we don’t know the distribution. We can **impose ** a distribution on the input. In the hiring problem, we permute the input. No particular input elicits its worst-case behavior.

The only thing to change

    randomly permute the order of the candidate
    best = 0
    for i = 1 to n
        interview candidate i
        if i.value > best
            best = i.value
            hire i

To differentiate this from our previous notation, we call the cost of this algorithm expected cost, instead of average cost.

Randomly permuting arrays

One way of permuting the array is to give each entry a random priority and sort it base on the priority.

    n = A.length
    P = [1..n]  //randomly create an array
    for i = 1 to n
        P[i] = Random(1, n**3)

    sort A, using P as a sort key

We know sorting is O(nlgn).

We need to prove now that the array is uniform random permutation, it is likely to produce every permutation of 1 through n.

Lemma 4: Our procedure produce a uniform random permutation of inputs, assuming all priorities are distinct.

pf: we want to show that the probability of any permutation showing up is \dfrac{1}{n!}. For any array (1,2,…n), we generate permutation (perm(1), perm(2)… perm(n)), let E_1 be the event that 1 receives the perm(i)th smallest priority then we want to find the probability that all i, event E_i occurs:

(4)   \begin{align*} Pr{E_1 \cap E_2 \cap ... \cap E_n} \end{align*}

We know this probability joint probability will give us

Quick Math fact

Sum of expectation

Linearity of expectation is the property that the expected value of the sum of random variables is equal to the sum of their individual expected values, regardless of whether they are independent.

A better way to do this is to generate the array in place.

    n = A.length
    for i = 1 to n
        A[i] = randomly pick from A[i...n]

We will use a loop invariant to show that at the end, the algorithm will produce a uniform random permutation.

Before we start, it is good to know what is a k-permutation. On a set n, a k-perm is a sequence containing k of the n elements. There are \dfrac{n!}{(n - k)!} such possible k-permutations. And we know n-permutation will be n!

Lemma 5: randomize_in_place will give a uniform random permutation pf:

  • Initialization: at the beginning, for i = 1, we know A[1] will be picked out of A[1]. So we have 1-permutation and the n possibility.
  • Maintenance: assume at just before ith iteration and our assumption that for i - 1th, A[i – 1] follows a uniform distribution and each possible permutation will appear in A[0..i -1] with possibility \dfrac{(n - i + 1)!}{n!}. Now adding one more element to the subarray, we got A[0..i] and there are n - i + 1 possible element to pick for A[i], so the probability for any permutation for subarray A[1..i] is \dfrac{(n - i + 1)!}{n!} \cdot \dfrac{1}{n - i + 1)}( = \dfrac{(n - i )!}{n!}, which is the formula for i-permutation.

Let’s make it more formalize. Let’s call E_i the event in which the first i – 1 iterations have created the particular (i – 1) permutation(x_o, x_1, x_2,...x_{i - 1} in A[1..i – 1]. By the loop invariant, Pr[E_i] = \dfrac{(n - i + 1)!}{n!}. Let’s denote E_2 be the event that the ith iteration put a particular x_i in position i. Pr[E_2] = \dfrac{1}{n - i + 1}. So if we want to know the the prob of permutation (x_0, x_1,...x_i), we want to know Pr[E_1 \cap E_2].

Pr[E_1 \cap E_2] = Pr[E_2 | E_1] \cdot Pr[E_1].

Advanced random stuff

Talked about brithday paradox There is the classic way then there is a way using indicator variable. It is simpler but an approximation. We define X_{ij}for 1 \leq i < j \leq k, X_{ij}= I{person I and J have the same birthyday} = 1 if they do else 0.

E[X_{ij} = \dfrac{1}{n}

Let X be the variable that counts the number of pairs of individuals having the same birthday,

X = \sum_{i = 1}^{ k} \sum_{j = i + 1}^ k X_{ij}

Taking the expected value of it;

E{X} = E[\sum_{i = 1}^{ k} \sum_{j = i + 1}^ k X_{ij}]

= \sum_{i = 1}^{ k} \sum_{j = i + 1}^ k E[X_ij]

= k choose 2 \dfrac{1}{n} = \dfrac{k(k - 1)}{2n}.

We can think of it like this: The outer loop pick one person who are on the ith position, then the inner loop has ith to kth position to be chosen, effectively making it a k choose 2.

So the expected number of pairs that will have the same birthday will be \dfrac{k(k - 1)}{2n}. We want at least 1 pair so k(k - 1) > 2n, k^2 \geq 2n + 1, k = \geq \sqrt{2n + 1}. We got k = 28.

I should talk about some distribution, Geometric, Bernoulli, Poisson and so on in another post.