Chapter 5: Randomized Algorithms

Randomized?

A randomized algorithm is an algorithm whose behavior is determined not only by its input but also by values produced by a random-number generator.

Hiring problem

You are the HR manager of a company and you need to recruit an assistant. You will fire the current assistant whenever you find a better candidate. To do so, you get help from a recruiting agency that charges c_0 for each interview and an additional c_1 each time a person is hired.

The cost of this process depends heavily on the order of the interviews. The best case is when the best candidate is interviewed first, so only one hire is made and the total cost is n c_0 + c_1, where n is the number of interviewees. The worst case is when the candidates arrive in strictly ascending order of ability: you hire every candidate, firing each one except the last, yielding a cost of n(c_0 + c_1).

We therefore look for the average running time, i.e., the running time averaged over all possible inputs. But to do so, we need knowledge of, or an assumption about, the distribution of the inputs.

Prerequisites for analyzing the problem

For the hiring problem, we can assume the interviewees come in random order. There are n candidates, so there are n! ways to order them (permutations).

The indicator variable of an event A is defined by I(A) = 1 if A occurs, otherwise I(A) = 0.

E.g., we define X_H to be the indicator variable of a coin toss: X_H = 1 if the toss comes up heads, else X_H = 0.

(1)   \begin{align*} E(X_H) &= E(I_H) \\ &= 1 \cdot Pr(H) + 0 \cdot Pr(T) \\ &= \dfrac{1}{2} + 0 = \dfrac{1}{2} \end{align*}

This suggests that the expected value of an indicator variable of A is exactly the probability of A.

Lemma 1: Given a sample space S and an event A in the sample space S, let X_A = I{A}. Then E(X_A) = Pr(A).

Proof:

(2)   \begin{equation*} E(X_A) = 1 \cdot Pr(A) + 0 \cdot Pr(\bar{A}) = Pr(A) \end{equation*}
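As a sanity check, Lemma 1 can also be verified numerically; the following is a small simulation sketch of our own (the function name and trial count are arbitrary):

```python
import random

def indicator_mean(trials=100_000, seed=1):
    """Estimate E[X_H] for a fair coin by averaging the indicator variable."""
    rng = random.Random(seed)
    # X_H = 1 if the toss is heads, 0 otherwise.
    total = sum(1 if rng.random() < 0.5 else 0 for _ in range(trials))
    return total / trials

print(indicator_mean())  # close to Pr(H) = 0.5
```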

Analyzing the hiring problem

Define X_n to be the number of hires made in the process; we want to find E(X_n). Over n interviews, X_n decomposes into x_1, x_2, ..., x_n, where x_i is the indicator variable for whether the i-th person is hired, and X_n = x_1 + x_2 + ... + x_n.

E(x_i), the probability that the i-th person is hired, requires the i-th person to be better than all i - 1 people ahead of them. For a uniformly random order, the chance that the i-th person is the best of the first i is \dfrac{1}{i}. So E(x_i) = \dfrac{1}{i}.

(3)   \begin{align*} E(X_n) &= E\left[\sum_{i = 1}^{n} x_i\right] \\ &= \sum_{i = 1}^{n} E(x_i) \\ &= \sum_{i = 1}^{n} \dfrac{1}{i} \\ &= \ln(n) + O(1) \quad \text{(equation A.7)} \end{align*}

So the average hiring cost will be O(c_1 \ln(n)), which is significantly better than the worst-case O(c_1 n).
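As an illustration (our own sketch, not from the text), a quick simulation of the hiring process on randomly ordered candidates tracks the harmonic number H_n = ln(n) + O(1):

```python
import random

def hires_in_random_order(n, rng):
    """Count hires when n candidates of distinct ability arrive in random order."""
    candidates = list(range(n))
    rng.shuffle(candidates)
    best, hires = -1, 0
    for c in candidates:
        if c > best:       # better than everyone seen so far: hire
            best = c
            hires += 1
    return hires

def average_hires(n=100, trials=20_000, seed=1):
    rng = random.Random(seed)
    return sum(hires_in_random_order(n, rng) for _ in range(trials)) / trials

harmonic = sum(1 / i for i in range(1, 101))  # H_100, about 5.19
print(average_hires(), harmonic)              # the two should be close
```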

Randomized algorithms

For the hiring problem, we assumed that the input follows a uniform distribution. But sometimes we don't have that knowledge. Note also that for a particular input, our previous algorithm always incurs the same cost. In other words, the algorithm is deterministic, and the randomness rests in the input's distribution.

But we want a cost guarantee even when we don't know the distribution. We can **impose** a distribution on the input: in the hiring problem, we permute the input ourselves. Then no particular input elicits the worst-case behavior.

The only thing to change is to permute the candidates first:

randomize_hire(n)
    randomly permute the order of the candidates
    best = 0    // sentinel: a dummy candidate 0 with value 0
    for i = 1 to n
        interview candidate i
        if i.value > best
            best = i.value
            hire i
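A Python translation of this pseudocode might look as follows (the candidate list, seed parameter, and use of numeric ability values are our own assumptions):

```python
import random

def randomize_hire(values, seed=None):
    """Randomly permute the candidates, then hire whenever a new best appears.

    Returns the list of hired values, in hiring order.
    """
    rng = random.Random(seed)
    candidates = list(values)
    rng.shuffle(candidates)          # impose the uniform distribution ourselves
    best = float("-inf")             # sentinel, worse than every candidate
    hired = []
    for v in candidates:             # "interview candidate i"
        if v > best:
            best = v
            hired.append(v)          # "hire i"
    return hired

hired = randomize_hire([3, 1, 4, 1, 5, 9, 2, 6], seed=0)
print(hired)  # whatever the order, the last hire is always the maximum, 9
```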

To distinguish this from our previous notion, we call the cost of this algorithm the expected cost, instead of the average cost.

Randomly permuting arrays

One way of permuting the array is to give each entry a random priority and sort the array based on those priorities.

Permute_by_sorting(A)
    n = A.length
    let P[1..n] be a new array   // holds the random priorities
    for i = 1 to n
        P[i] = Random(1, n**3)

    sort A, using P as sort keys

We know sorting is O(n \lg n).
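A Python sketch of permute-by-sorting (the helper name is ours; ties between priorities are possible but unlikely with n³-sized keys, and the multiset of elements is preserved either way):

```python
import random

def permute_by_sorting(a, seed=None):
    """Return a permutation of a, ordered by independently drawn random priorities."""
    rng = random.Random(seed)
    n = len(a)
    # One random priority in [1, n^3] per element, as in the pseudocode.
    priorities = [rng.randint(1, n ** 3) for _ in a]
    # Sort the elements using the priorities as keys: O(n lg n).
    return [x for _, x in sorted(zip(priorities, a), key=lambda p: p[0])]

print(permute_by_sorting([1, 2, 3, 4, 5], seed=0))
```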

We now need to prove that the result is a uniform random permutation: the procedure is equally likely to produce every permutation of 1 through n.

Lemma 4: Our procedure produces a uniform random permutation of the input, assuming all priorities are distinct.

Proof: We want to show that the probability of any particular permutation showing up is \dfrac{1}{n!}. For the array (1, 2, ..., n), consider a target permutation (perm(1), perm(2), ..., perm(n)). Let E_i be the event that element i receives the perm(i)-th smallest priority. We want the probability that E_i occurs for all i:

(4)   \begin{align*} Pr\{E_1 \cap E_2 \cap \cdots \cap E_n\} \end{align*}

By the chain rule of conditional probability, this joint probability is \dfrac{1}{n} \cdot \dfrac{1}{n - 1} \cdots \dfrac{1}{1} = \dfrac{1}{n!}: once the first i - 1 elements have received their required priority ranks, element i must land on one specific rank among the n - i + 1 remaining ones.

Quick Math fact

Sum of expectation

Linearity of expectation is the property that the expected value of the sum of random variables is equal to the sum of their individual expected values, regardless of whether they are independent.
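A tiny exact check of this fact, using the sum of two fair dice (our own example):

```python
from itertools import product
from fractions import Fraction

faces = range(1, 7)
# Exact expectation of X + Y over the 36 equally likely outcomes of two dice.
e_sum = Fraction(sum(x + y for x, y in product(faces, faces)), 36)
# Expectation of a single die.
e_x = Fraction(sum(faces), 6)
print(e_sum, e_x + e_x)  # both are 7: E[X + Y] = E[X] + E[Y]
```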

A better way to permute the array is to generate the permutation in place.

randomize_in_place(A)
    n = A.length
    for i = 1 to n
        swap A[i] with A[Random(i, n)]
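Written out in Python with the swap made explicit, this is the classic Fisher–Yates shuffle (a sketch; names are ours):

```python
import random

def randomize_in_place(a, seed=None):
    """Permute a uniformly at random, in place, in O(n) time."""
    rng = random.Random(seed)
    n = len(a)
    for i in range(n):
        # Pick a uniformly random index in [i, n) and swap it into position i.
        j = rng.randrange(i, n)
        a[i], a[j] = a[j], a[i]
    return a

print(randomize_in_place([1, 2, 3, 4, 5], seed=0))
```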

We will use a loop invariant to show that at the end, the algorithm will produce a uniform random permutation.

Before we start, it is good to know what a k-permutation is. Given a set of n elements, a k-permutation is a sequence containing k of the n elements. There are \dfrac{n!}{(n - k)!} possible k-permutations, and so an n-permutation is one of n! possibilities.

Lemma 5: randomize_in_place produces a uniform random permutation.

Proof:

  • Initialization: just before the first iteration (i = 1), the subarray A[1..0] is empty, and it holds the (only) 0-permutation with probability \dfrac{n!}{n!} = 1, so the invariant holds trivially.
  • Maintenance: assume that just before the ith iteration, each possible (i - 1)-permutation appears in A[1..i - 1] with probability \dfrac{(n - i + 1)!}{n!}. The ith iteration picks A[i] uniformly among the n - i + 1 remaining elements, so each particular permutation of the subarray A[1..i] appears with probability \dfrac{(n - i + 1)!}{n!} \cdot \dfrac{1}{n - i + 1} = \dfrac{(n - i)!}{n!}, which is exactly the i-permutation formula.
  • Termination: at i = n + 1, each n-permutation appears in A[1..n] with probability \dfrac{0!}{n!} = \dfrac{1}{n!}, i.e., the permutation is uniform.

Let's make this more formal. Let E_1 be the event that the first i - 1 iterations have created the particular (i - 1)-permutation (x_1, x_2, ..., x_{i - 1}) in A[1..i - 1]. By the loop invariant, Pr[E_1] = \dfrac{(n - i + 1)!}{n!}. Let E_2 be the event that the ith iteration puts a particular x_i in position i; Pr[E_2 | E_1] = \dfrac{1}{n - i + 1}. So the probability of the permutation (x_1, ..., x_i) is Pr[E_1 \cap E_2].

Pr[E_1 \cap E_2] = Pr[E_2 | E_1] \cdot Pr[E_1] = \dfrac{1}{n - i + 1} \cdot \dfrac{(n - i + 1)!}{n!} = \dfrac{(n - i)!}{n!}.
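For a small n, the uniformity claim can even be checked exhaustively: forcing every possible sequence of random choices shows that each of the n! permutations is produced exactly once (a verification sketch of our own):

```python
from itertools import product
from collections import Counter
from math import factorial

def shuffle_with_choices(n, choices):
    """Run randomize_in_place on [0..n-1], with the random index for step i
    forced to choices[i] (an index in [i, n))."""
    a = list(range(n))
    for i, j in zip(range(n), choices):
        a[i], a[j] = a[j], a[i]
    return tuple(a)

n = 4
counts = Counter(
    shuffle_with_choices(n, choices)
    for choices in product(*(range(i, n) for i in range(n)))
)
# 4 * 3 * 2 * 1 = 24 choice sequences produce the 24 permutations, once each.
print(len(counts), set(counts.values()))  # 24 {1}
```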

Advanced random stuff

We talked about the birthday paradox. There is the classic analysis, and then there is a way using indicator variables; it is simpler but an approximation. We define X_{ij}, for 1 \leq i < j \leq k, as X_{ij} = I{person i and person j have the same birthday}, i.e., 1 if they do and 0 otherwise.

E[X_{ij}] = \dfrac{1}{n}

Let X be the variable that counts the number of pairs of individuals having the same birthday:

X = \sum_{i = 1}^{k} \sum_{j = i + 1}^{k} X_{ij}

Taking the expected value:

E[X] = E\left[\sum_{i = 1}^{k} \sum_{j = i + 1}^{k} X_{ij}\right] = \sum_{i = 1}^{k} \sum_{j = i + 1}^{k} E[X_{ij}] = \binom{k}{2} \dfrac{1}{n} = \dfrac{k(k - 1)}{2n}

We can think of it like this: the outer loop picks the person in the ith position, and the inner loop runs over positions i + 1 through k, so the double sum counts every unordered pair exactly once, giving \binom{k}{2} terms.

So the expected number of pairs sharing a birthday is \dfrac{k(k - 1)}{2n}. For at least one expected pair we need k(k - 1) \geq 2n; with n = 365, the smallest such k is 28.
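The threshold can be computed directly (assuming n = 365 days):

```python
def expected_pairs(k, n=365):
    """Expected number of birthday-sharing pairs among k people: k(k-1)/(2n)."""
    return k * (k - 1) / (2 * n)

# Smallest k with at least one expected matching pair.
k = next(k for k in range(2, 1000) if expected_pairs(k) >= 1)
print(k, round(expected_pairs(k), 3))  # 28 1.036
```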

I should talk about some distributions, such as Geometric, Bernoulli, and Poisson, in another post.
