# Randomized?

A randomized algorithm is an algorithm whose behaviors are not only determined by inputs but also by a random number generator.

# Hiring problem

You are the HR of a company and you need to recruit assistants. You will fire the current one if you find a better one. To do so, you need to get help from recruiting company who will charge for each interview and if that person is hired.

The cost of this problem highly depends on the order of the interview. The best case is to interview the best at the very front so the total cost would be where n is the number of interviewee. The worst case is when the ability of the candidates are in strictly ascending order. So you have to hire and fire every one except the last guy, yielding a cost of .

We have to find the **Average running time**, basically averaging the running time over all possible input. But to do so, we need knowledge, or **ASSMUPTION** about the distribution of the input.

## Prereq for analyze the problem

For hiring problem, we can assume the interviewee came in random order. There are n candidates so there are ways to order them(permutation).

if A occurs otherwise = 0.

EG: We define to be the indicator variable of a coin toss. and if head, else .

(1)

We can guess the expected value of a indicator variable of A is the probability of A.

Lemma 1:Given a sample space S and an event A in the sample space S, let . Then Pf

(2)

## Analyze the hiring problem

Define to be the number of hiring made in the process, we want to find . is interviewing n people and it is consists of , which are indicator variable about whether the th person is hired or not. and

, the expected value of the i-th person being hired means that the i th person must be better than all people ahead of him. For a random distribution, the chance of the i-th person being the best is . So .

(3)

So the average hiring cost will be , which is significantly better than .

# Randomized algorithms

For the hiring problem, we assume that our input follows uniform distribution. But sometimes we don’t have that knowledge. Another fact about our previous algorithm is that for a particular input, the time cost will always be the same. In other word, the algorithm is deterministic and the randomness rests in the input’s distribution.

But we want to know the cost even if we don’t know the distribution. We can **impose ** a distribution on the input. In the hiring problem, we permute the input. *No particular input elicits its worst-case behavior*.

The only thing to change

```
randomize_hire(n)
randomly permute the order of the candidate
best = 0
for i = 1 to n
interview candidate i
if i.value > best
best = i.value
hire i
```

To differentiate this from our previous notation, we call the cost of this algorithm expected cost, instead of average cost.

## Randomly permuting arrays

One way of permuting the array is to give each entry a random priority and sort it base on the priority.

```
Permute_by_sorting(A)
n = A.length
P = [1..n] //randomly create an array
for i = 1 to n
P[i] = Random(1, n**3)
sort A, using P as a sort key
```

We know sorting is O(nlgn).

We need to prove now that **the array is uniform random permutation**, it is likely to produce every permutation of 1 through n.

Lemma 4: Our procedure produce a uniform random permutation of inputs, assuming all priorities are distinct.

pf: we want to show that the probability of any permutation showing up is . For any array (1,2,…n), we generate permutation (perm(1), perm(2)… perm(n)), let be the event that 1 receives the perm(i)th smallest priority then we want to find the probability that all i, event occurs:

(4)

We know this probability joint probability will give us

# Quick Math fact

## Sum of expectation

Linearity of expectation is the property that the expected value of the sum of random variables is equal to the sum of their individual expected values, regardless of whether they are independent.

A better way to do this is to generate the array in place.

```
generate_in_place(A)
n = A.length
for i = 1 to n
A[i] = randomly pick from A[i...n]
```

We will use a loop invariant to show that at the end, the algorithm will produce a uniform random permutation.

Before we start, it is good to know what is a k-permutation. On a set n, a k-perm is a sequence containing k of the n elements. There are such possible k-permutations. And we know n-permutation will be n!

Lemma 5: randomize_in_place will give a uniform random permutation pf:

- Initialization: at the beginning, for i = 1, we know A[1] will be picked out of A[1]. So we have 1-permutation and the n possibility.
- Maintenance: assume at just before ith iteration and our assumption that for th, A[i – 1] follows a uniform distribution and each possible permutation will appear in A[0..i -1] with possibility . Now adding one more element to the subarray, we got A[0..i] and there are possible element to pick for A[i], so the probability for any permutation for subarray A[1..i] is , which is the formula for i-permutation.

Let’s make it more formalize. Let’s call the event in which the first i – 1 iterations have created the *particular* (i – 1) permutation in A[1..i – 1]. By the loop invariant, . Let’s denote be the event that the ith iteration put a particular in position i. . So if we want to know the the prob of permutation , we want to know .

.

# Advanced random stuff

Talked about brithday paradox There is the classic way then there is a way using indicator variable. It is simpler but an approximation. We define for , = I{person I and J have the same birthyday} = 1 if they do else 0.

Let X be the variable that counts the number of pairs of individuals having the same birthday,

Taking the expected value of it;

= k choose 2 .

We can think of it like this: The outer loop pick one person who are on the ith position, then the inner loop has ith to kth position to be chosen, effectively making it a k choose 2.

So the expected number of pairs that will have the same birthday will be . We want at least 1 pair so . We got k = 28.

I should talk about some distribution, Geometric, Bernoulli, Poisson and so on in another post.

#