## Insertion sort

    vector<int> sortArray(vector<int>& nums) {
        // Insertion sort.
        // Before inserting x: [ sorted | ... > x ... | x | unsorted ]
        // After inserting x:  [ smaller than x | x | larger than x | unsorted ]
        //
        // We need to move x to the right place. The right place means
        // everything larger than x is to x's right (up to the sorted
        // boundary) and everything smaller than x is to its left
        // (back to the beginning).
        //
        // Loop invariant: after the kth iteration, the first k + 1
        // elements are in (relative) order. When k == nums.size(), the
        // invariant means the whole array is in order; if not, we can
        // show a contradiction.
        int cur_elem_pos = 1;
        for (; cur_elem_pos < nums.size(); cur_elem_pos++) {
            int insert_index = cur_elem_pos;
            int elem_to_be_insert = nums[cur_elem_pos];
            while (insert_index > 0 &&
                   nums[insert_index - 1] > elem_to_be_insert) {
                nums[insert_index] = nums[insert_index - 1];
                insert_index--;
            }
            nums[insert_index] = elem_to_be_insert;
        }
        return nums;
    }
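The loop invariant can also be checked mechanically. A Python sketch (a re-implementation for illustration, not the C++ above) that asserts the invariant after every iteration:

```python
def insertion_sort(nums):
    """Sort nums in place; after iteration cur, nums[0..cur] is sorted."""
    for cur in range(1, len(nums)):
        elem = nums[cur]
        i = cur
        # Shift every element larger than elem one slot to the right.
        while i > 0 and nums[i - 1] > elem:
            nums[i] = nums[i - 1]
            i -= 1
        nums[i] = elem
        # Loop invariant: the first cur + 1 elements are in order.
        assert all(nums[j] <= nums[j + 1] for j in range(cur))
    return nums
```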


# Matching arithmetic expression

An arithmetic grammar for +, -, *, / and parentheses:

grammar Expr;

/** The start rule; begin parsing here. */
prog:   stat+ ;         // a program contains multiple statements
// a statement is either: expr NEWLINE, an assignment NEWLINE, or a bare NEWLINE
stat:   expr NEWLINE
|   ID '=' expr NEWLINE
|   NEWLINE
;

expr:   expr ('*'|'/') expr
|   expr ('+'|'-') expr
|   INT
|   ID
|   '(' expr ')'
;

ID  :   [a-zA-Z]+ ;      // match identifiers
INT :   [0-9]+ ;         // match integers
NEWLINE:'\r'? '\n' ;     // return newlines to parser (is end-statement signal)
WS  :   [ \t]+ -> skip ; // toss out whitespace



Some highlights:

1. A grammar consists of rules for syntactic structure and rules for vocabulary symbols (tokens).
2. A program consists of multiple statements.
3. Separate the alternatives of a rule with |.
4. Group symbols with parentheses into subrules, e.g. ('*' | '/').
5. ANTLR allows left recursion (a rule invoking itself at the start of an alternative).

To test, we run

    $ antlr4 Expr.g4
    $ javac Expr*.java
    $ grun Expr prog -gui t.expr   # launches org.antlr.v4.runtime.misc.TestRig

We will see the parse tree for the input in t.expr; here we invoke the parser with rule prog. Calling a rule method is like invoking that rule, and we can call any parser rule method.

# Building a calculator

We need to perform some actions using a parse-tree visitor. To do so we need to:

1. Label the alternatives of a rule so we can have a different visitor method per alternative, i.e. different behavior for different input:

    expr:   expr op=('*'|'/') expr  # MulDiv
        |   expr op=('+'|'-') expr  # AddSub
        |   INT                     # int
        |   ID                      # id
        |   '(' expr ')'            # parens
        ;

2. Define token names for the operator literals so we can reference the token names as Java constants in the visitor:

    MUL : '*' ;
    DIV : '/' ;

This time we use the visitor pattern, and ANTLR will generate a method for each labeled alternative:

    $ antlr4 -no-listener -visitor LabeledExpr.g4


It will also generate a default visitor implementation that we can subclass. We need to override the methods associated with the statement and expression alternatives.

An interesting override:

    @Override
    public Integer visitMulDiv(LabeledExprParser.MulDivContext ctx) {
        int left = visit(ctx.expr(0));  // get value of left subexpression
        int right = visit(ctx.expr(1)); // get value of right subexpression
        if ( ctx.op.getType() == LabeledExprParser.MUL ) return left * right;
        return left / right; // must be DIV
    }



Here, when we visit a MulDiv node, we first evaluate the left and right subexpressions. Then we use ctx.op.getType() to determine which operator it is, and apply the operation accordingly.

# Translating Java source code

    import java.util.List;
    import java.util.Map;

    public class Demo {
        void f(int x, String y) { }
        int[ ] g(/*no args*/) { return null; }
        List<Map<String, Integer>>[] h() { return null; }
    }


We want to translate the source code into an interface (like a header definition) while preserving whitespace and comments.

    interface IDemo {
        void f(int x, String y);
        int[ ] g(/*no args*/);
        List<Map<String, Integer>>[] h();
    }



We are given a Java grammar and build a parse tree. We derive the interface name from the class name and grab each method's signature (return type, name, and argument list).

We want to know how a class is declared and how a method is defined:

classDeclaration
:   'class' Identifier typeParameters? ('extends' type)?
('implements' typeList)?
classBody
;

methodDeclaration
:   type Identifier formalParameters ('[' ']')* methodDeclarationRest
|   'void' Identifier formalParameters methodDeclarationRest
;


We will use antlr4 to generate the parser and listeners:

    $ antlr4 Java.g4


We run

    $ ls Java*.java

and get

    JavaBaseListener.java  JavaListener.java
    JavaLexer.java         JavaParser.java


We then write a class extending JavaBaseListener ("Java" here meaning a listener for the Java grammar). We need three strategies:

1. Print the interface header at the start of a class.
2. Print a terminating } at the end of the class definition.
3. When we see each method declaration, print out its signature.
    @Override
    public void enterClassDeclaration(JavaParser.ClassDeclarationContext ctx) {
        System.out.println("interface I"+ctx.Identifier()+" {");
    }

    @Override
    public void exitClassDeclaration(JavaParser.ClassDeclarationContext ctx) {
        System.out.println("}");
    }

    @Override
    public void enterMethodDeclaration(JavaParser.MethodDeclarationContext ctx) {
        // need parser to get tokens
        TokenStream tokens = parser.getTokenStream();
        String type = "void";
        if ( ctx.type()!=null ) {
            type = tokens.getText(ctx.type());
        }
        String args = tokens.getText(ctx.formalParameters());
        System.out.println("\t"+type+" "+ctx.Identifier()+args+";");
    }


We also need to write a main to run the parser and walk the tree with listener.

    public class ExtractInterfaceTool {
        public static void main(String[] args) throws Exception {
            String inputFile = null;
            if ( args.length>0 ) inputFile = args[0];
            InputStream is = System.in;
            if ( inputFile!=null ) {
                is = new FileInputStream(inputFile);
            }
            ANTLRInputStream input = new ANTLRInputStream(is);

            JavaLexer lexer = new JavaLexer(input);
            CommonTokenStream tokens = new CommonTokenStream(lexer);
            JavaParser parser = new JavaParser(tokens);
            ParseTree tree = parser.compilationUnit(); // parse

            ParseTreeWalker walker = new ParseTreeWalker(); // create standard walker
            ExtractInterfaceListener extractor = new ExtractInterfaceListener(parser);
            walker.walk(extractor, tree); // initiate walk of tree with listener
        }
    }


# Embedding actions in a grammar

For more control, we can embed actions in the grammar. These actions are copied into the generated recursive-descent parser. Actions are code snippets surrounded by curly braces.

Disclaimer: the code snippets are from the ANTLR reference. If you find this note to be infringing upon its copyright, please contact me and I will take it down.

## LC 81, search in rotated array 2

The sorted array is rotated once, and we might have duplicate (i.e., [0,0,1,2,2,5,6] might become [2,5,6,0,0,1,2]).

We will still use binary search, but this time duplicates give us an edge case:

[1,1,1,1,1,1,1,1,1,2,3,4] rotated → [1,2,3,4,1,1,1,1,1,1,1]

When nums[low] == nums[mid] (e.g. both equal 1 here), we don't know which part is sorted, so we can only do low++ (or high--) to edge toward an interval where we know at least one of [low, mid] or [mid, high] is sorted.

    class Solution {
    public:
        bool search(vector<int>& nums, int target) {
            int low = 0;
            int high = nums.size() - 1;
            while (low <= high) {
                int mid = low + (high - low) / 2;
                if (nums[mid] == target)
                    return true;
                if (nums[low] < nums[mid]) {            // [low, mid] is sorted
                    if (nums[low] <= target && target < nums[mid]) {
                        high = mid - 1;
                    }
                    else {
                        low = mid + 1;
                    }
                }
                else if (nums[mid] < nums[low]) {       // [mid, high] is sorted
                    if (nums[mid] < target && target <= nums[high]) {
                        low = mid + 1;
                    }
                    else {
                        high = mid - 1;
                    }
                }
                else {
                    low++;
                }
            }
            return false;
        }
    };


# bugs

## else if condition

In the else-if, I first used

    else if (nums[mid] < nums[high])

to decide whether [mid, high] is sorted, but it fails on test cases such as [3,1,1], target = 3:
low = 0, high = 2, mid = 1.

We see that nums[low] is not less than nums[mid], so [low, mid] is not sorted, which means [mid, high] should be sorted; but our else-if condition is false because nums[mid] == nums[high]. So we should instead make the else-if the complement of

    if (nums[low] < nums[mid])

which is

    else if (nums[low] > nums[mid])

(not the exact complement, since equality is handled by the final else).
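The failing case is easy to reproduce. A Python sketch of the corrected search (mirroring the C++ above), which handles [3,1,1] with target 3:

```python
def search(nums, target):
    low, high = 0, len(nums) - 1
    while low <= high:
        mid = low + (high - low) // 2
        if nums[mid] == target:
            return True
        if nums[low] < nums[mid]:        # [low, mid] is sorted
            if nums[low] <= target < nums[mid]:
                high = mid - 1
            else:
                low = mid + 1
        elif nums[low] > nums[mid]:      # [mid, high] is sorted
            if nums[mid] < target <= nums[high]:
                low = mid + 1
            else:
                high = mid - 1
        else:                            # nums[low] == nums[mid]: can't tell
            low += 1
    return False
```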

## LC 58. Length of Last Word

Given a string s consisting of upper/lower-case alphabets and space characters ' ', return the length of the last word in the string.

If the last word does not exist, return 0.

# Idea

Keep a counter, lastWordLen, for the length of the last word encountered, initialized to 0. We also need a counter for the current word's length.

Then we iterate through the list, if we see a non-whitespace, we increment the curWordLen by 1.

If we see a whitespace

1. if curWordLen is 0, the previous letter is also white space and it won’t constitute a word, so we do nothing
2. else, we update lastWordLen and set curWordLen to 0.

# Edge case

## Empty input

Since lastWordLen is 0 and will never be updated, we correctly return 0.

## Single word

The for loop only updates lastWordLen when it sees a whitespace. If the input ends with a word (e.g. a single word with no trailing space), we need a final check after the loop: if the current word length is nonzero, update lastWordLen.

    int lengthOfLastWord(string s) {
        int lastWordLen = 0;
        int lenCtr = 0;
        for (int i = 0; i < s.size(); i++) {
            if (s[i] != ' ') {
                lenCtr += 1;
            }
            else {
                if (lenCtr != 0) {
                    lastWordLen = lenCtr;
                }
                lenCtr = 0;
            }
        }
        if (lenCtr != 0)
            lastWordLen = lenCtr;
        return lastWordLen;
    }
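In a language with built-in string splitting, the same logic collapses to a couple of lines (an alternative formulation, not the note's approach):

```python
def length_of_last_word(s):
    # split() with no arguments drops all whitespace runs,
    # so leading/trailing spaces are handled automatically.
    words = s.split()
    return len(words[-1]) if words else 0
```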


## Recovery system in database

We will use the log and checkpoints to recover.

# Transaction roll back

If we try to roll back transaction Ti:

## Scan the log backward

We scan the log backward for each record <Ti, Xj, V1, V2> that belongs to Ti.

For each such record, we write V1 (the old value) back to Xj.

We then write a redo-only log record <Ti, Xj, V1> to the log. This is called a compensation log record. We don't need to store both old and new values here, because if the system crashes during recovery we only need to REDO these records to finish the undo.

Once we find <Ti start> in our backward scan, we stop and add a <Ti abort> record to the log.
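The backward scan can be sketched in a few lines of Python (a toy illustration, not a real log manager; the record layout as tuples like ('update', ti, x, old, new) is an assumption of this sketch):

```python
def rollback(log, ti, db):
    """Undo transaction ti: scan the log backward, restore old values,
    and append redo-only compensation records as we go."""
    for rec in reversed(log[:]):                 # iterate over a snapshot
        if rec[0] == 'update' and rec[1] == ti:
            _, _, x, old, _new = rec
            db[x] = old                          # restore the old value
            log.append(('redo-only', ti, x, old))  # compensation record
        elif rec == ('start', ti):
            log.append(('abort', ti))            # <Ti abort>
            break
```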

# Recovery after a crash

## REDO phase

Scan the log forward from the last checkpoint and replay all transactions.
This replay includes updates from both rolled-back and uncommitted transactions.

We also need to determine the list of incomplete transactions: those in L (the active-transaction list in the <checkpoint L> record), plus those that started after the checkpoint and have neither a commit nor an abort record. This list is called the undo-list and will be used to roll back during UNDO.

## UNDO Phase

Roll back all transactions in the undo-list by scanning the log backward from the end.

If the DB finds an update record of a transaction Ti in the undo-list, it restores the old value.

When the DB sees a <Ti start> and Ti is in the undo-list, that means we have undone all of Ti's changes, and we can write a <Ti abort> record to the log to signify that this transaction is complete.

Undo terminates when we have seen the <Ti start> records of all transactions in the undo-list.
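The two phases can be sketched end to end (same hypothetical tuple-based log layout as above; an illustration, not a real recovery manager):

```python
def recover(log, db):
    # REDO phase: replay every update forward from the start (standing in
    # for the last checkpoint) and build the undo-list as we go.
    undo_list = set()
    for rec in log:
        if rec[0] == 'start':
            undo_list.add(rec[1])
        elif rec[0] == 'update':
            db[rec[2]] = rec[4]               # replay the new value
        elif rec[0] == 'redo-only':
            db[rec[2]] = rec[3]               # replay compensation record
        elif rec[0] in ('commit', 'abort'):
            undo_list.discard(rec[1])
    # UNDO phase: scan backward, rolling back every incomplete transaction.
    for rec in reversed(log[:]):              # snapshot: appends not re-seen
        if not undo_list:
            break
        if rec[0] == 'update' and rec[1] in undo_list:
            db[rec[2]] = rec[3]               # restore the old value
            log.append(('redo-only', rec[1], rec[2], rec[3]))
        elif rec[0] == 'start' and rec[1] in undo_list:
            log.append(('abort', rec[1]))     # <Ti abort>
            undo_list.discard(rec[1])
```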

# Fuzzy checkpoint

A fuzzy checkpoint allows updates to resume once the <checkpoint L> record has been written, but before the modified buffer blocks are written to disk.

If a crash happens before all modified buffer blocks are written to disk, we might have an incomplete checkpoint.
We deal with this by storing a last-checkpoint pointer at a fixed position on disk, and we do not update this pointer when we first write the <checkpoint L> record.

Writing the <checkpoint L> record creates a list of modified buffer blocks, and last-checkpoint is only updated once all of those blocks have been written to disk.

## LC 39 Combination Sum

Given a set of candidate numbers (candidates) (without duplicates) and a target number (target), find all unique combinations in candidates where the candidate numbers sum to target.

The same repeated number may be chosen from candidates unlimited number of times.

# DFS

We basically try to find paths in tree that sum up to target. At the beginning, we pick a number as the root and initialize the total_sum to that number. If total_sum < target, we proceed to “branch off”, recursively calling DFS and adding leaves.

## update solution sets

We update the solution set by appending the current path when target reaches 0.

## return

return terminates the current branch of the recursion, both on success (target == 0) and on overshoot (target < 0).

## Avoid duplicates

### How does duplicates occur

Duplicates occur because when we branch off the tree, we don't exclude the elements used in previous paths.
To exclude duplicates, when we make recursive calls at each level, we make sure each call only considers candidates from its own start index onward.

# code

    class Solution(object):
        def __init__(self):
            self.all_path = []

        def combinationSum(self, candidates, target):
            """
            :type candidates: List[int]
            :type target: int
            :rtype: List[List[int]]
            """
            candidates.sort()
            self.dfs(candidates, 0, target, [])
            return self.all_path

        def dfs(self, nums, index, target, path):
            if target < 0:
                return
            if target == 0:
                self.all_path.append(path)
                return
            for i in range(index, len(nums)):
                self.dfs(nums, i, target - nums[i], path + [nums[i]])


## Exclude duplicates

In the for loop, we append nums[i] to the path, and we pass i as the new start index so the calls afterward will only consider nums[i]…nums[end].
This avoids duplicates: any combination summing to target that mixes elements from nums[0]…nums[i-1] with nums[i]…nums[end] was already found by the earlier call that started at the smaller index, so a start index of i will not re-add that solution.
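For example, the same DFS compacted into one standalone function (a re-statement of the code above, not a new algorithm):

```python
def combination_sum(candidates, target):
    paths = []
    def dfs(index, remaining, path):
        if remaining == 0:
            paths.append(path)
            return
        if remaining < 0:
            return
        for i in range(index, len(candidates)):
            # Passing i (not index) keeps later branches from reusing
            # earlier candidates, which is what prevents duplicates.
            dfs(i, remaining - candidates[i], path + [candidates[i]])
    dfs(0, target, [])
    return paths

# combination_sum([2, 3, 6, 7], 7) → [[2, 2, 3], [7]]
```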

## 41. First Missing Positive

Given an unsorted integer array, find the smallest missing positive integer.

Your algorithm should run in O(n) time and use constant extra space.

# O(n) sorts
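One standard O(n)-time, O(1)-space approach (a common technique, sketched here under the assumption that this is the intended direction) uses the array itself as the hash table: swap each value v into slot v − 1, then scan for the first slot whose value is wrong.

```python
def first_missing_positive(nums):
    n = len(nums)
    for i in range(n):
        # Keep swapping nums[i] into its home slot nums[i] - 1 until the
        # current slot holds a value that cannot be placed (out of range
        # or already in its home slot).
        while 0 < nums[i] <= n and nums[nums[i] - 1] != nums[i]:
            home = nums[i] - 1
            nums[i], nums[home] = nums[home], nums[i]
    for i in range(n):
        if nums[i] != i + 1:
            return i + 1
    return n + 1
```

Each element is swapped into its home slot at most once, so the total work is O(n) despite the nested loop.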

## Database index and hashing

We want to locate where our records are. We may resort to an index, just as when looking up a keyword we go to the index at the back of a book.

# Basic concepts:

## Indices

Indices are used to locate records. The attributes used to look records up are called the search key. They don't have to be a key or superkey.

## operations and performance affected by choice of indices

### Access type:

We can look up records with a specific attribute value (point query) or within a range of values (range query).

# Ordered indices

Ordered indices store the search-key values in sorted order.

## Clustering indices

Records in the file might themselves be stored in some sorted order. A clustering index is an index whose search key also defines the sequential order of the file.
It is also called a primary index.

## Nonclustering indices

The index order is different from the sequential order of the records; a separate list holds pointers to the physical rows. These are also called secondary indices.

# Index entry

An index entry is a search-key value together with pointers to one or more records that have that search-key value.

## Dense index

An index entry appears for every search-key value in the file.

## CLRS chpt 11: hashtable

We use a hashtable when we only want to insert, search, and delete. There are three mainstream methods: 1. direct addressing, 2. chaining, and 3. open addressing.

## Direct addressing

We directly index an array that contains pointers to the satellite data. It works well when the universe of keys is small and we can afford to allocate a slot for every possible key (we assume each key is unique). Since every key has its own slot, we could even store the data directly in the table.

### 1

A bit vector is simply an array of bits (0s and 1s). A bit vector of length m takes much less space than an array of m pointers. Describe how to use a bit vector to represent a dynamic set of distinct elements with no satellite data. Dictionary operations should run in O(1) time.

Given a bit vector b, it contains 1 in position k if k is in the dynamic set.
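A sketch of the bit-vector dictionary, using a Python integer as the bit array for brevity (a real implementation would use a fixed-size array of machine words):

```python
class BitVectorSet:
    """Dynamic set of distinct keys drawn from {0, ..., m-1},
    with no satellite data. All three operations are O(1)."""
    def __init__(self, m):
        self.bits = 0            # bit k is 1 iff k is in the set
    def insert(self, k):
        self.bits |= (1 << k)
    def delete(self, k):
        self.bits &= ~(1 << k)
    def search(self, k):
        return (self.bits >> k) & 1 == 1
```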

### 2

Suggest how to implement a direct-address table in which the keys of stored elements do not need to be distinct and the elements can have satellite data. All three dictionary operations (INSERT, DELETE, and SEARCH) should run in O(1) time. (Don’t forget that DELETE takes as an argument a pointer to an object to be deleted, not a key.)

The keys are not distinct, so we let T[x.key] point to a doubly linked list storing all elements with that key. INSERT takes O(1) (insert at the head). DELETE takes a pointer to the object as its argument, and removing a node from a doubly linked list is O(1).
SEARCH takes a key k and checks whether T[k] is null (returning the head of the list if not), so it is O(1).
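A sketch of this exercise with explicit doubly linked nodes (so DELETE, given the node itself, really is O(1)):

```python
class Node:
    def __init__(self, key, data):
        self.key, self.data = key, data
        self.prev = self.next = None

class DirectAddressTable:
    def __init__(self, m):
        self.slots = [None] * m   # slot k: head of list of elements with key k
    def insert(self, node):       # O(1): splice in at the head
        head = self.slots[node.key]
        node.next, node.prev = head, None
        if head:
            head.prev = node
        self.slots[node.key] = node
    def delete(self, node):       # O(1): takes the node itself, not a key
        if node.prev:
            node.prev.next = node.next
        else:
            self.slots[node.key] = node.next
        if node.next:
            node.next.prev = node.prev
    def search(self, k):          # O(1): head of the list at slot k (or None)
        return self.slots[k]
```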

## Down side of direct addressing

We need to allocate space for every possible key in the universe, which becomes infeasible when the universe U is large. More importantly, the set of keys actually used might be far smaller than U. So we use a hashtable.

# Hashtable

A hashtable uses a hash function to compute a slot from the key: an element with key k is stored at slot h(k). But multiple keys might hash to the same value, h(k1) = h(k2) with k1 != k2; this is a collision.

## What constitutes a good hashing function

A good hash function appears to be "random" while being deterministic (otherwise we couldn't retrieve).

Assume simple uniform hashing, where any given element is equally likely to hash into any of the m slots. With n = number of elements in the dictionary and m = number of slots, α = n/m is the load factor: the average length of the linked lists hanging off the table.

## Chaining

A collision occurs when two elements' keys hash to the same value. With chaining, we store all elements with the same h(k) in the same linked list.
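A minimal chained hash table sketch, with Python lists standing in for the linked lists and h(k) = k mod m as the hash function:

```python
class ChainedHashTable:
    def __init__(self, m):
        self.m = m
        self.table = [[] for _ in range(m)]   # one chain per slot
    def _h(self, k):
        return k % self.m                     # division-method hash
    def insert(self, k, v):
        self.table[self._h(k)].insert(0, (k, v))  # insert at chain head
    def search(self, k):
        for key, v in self.table[self._h(k)]:     # walk the chain at h(k)
            if key == k:
                return v
        return None
    def delete(self, k):
        chain = self.table[self._h(k)]
        self.table[self._h(k)] = [(key, v) for key, v in chain if key != k]
```

Keys 10 and 3 collide when m = 7 (both hash to 3), so they share a chain.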

### Run time of Insert, search and delete

#### Insert(T,x)

We insert at the front of the doubly linked list pointed to by T[h(x.key)]: O(1) time.

#### Search(T, k), where k is the key

Go to slot h(k) and traverse the linked list until we either find k or reach the end.

1. If the search is unsuccessful, we traverse the whole list at h(k); its expected length is α = n/m, so the expected time is Θ(1 + α).
2. If the key is in the dictionary, the expected cost is 1 plus the expected number of elements added to x's list after x was inserted (we traverse the elements ahead of x, which were inserted after it). Under simple uniform hashing, P(h(k_i) = h(k_j)) = 1/m, and the expected time again works out to Θ(1 + α).

# Hash functions

- Division method: h(k) = k mod m
- Multiplication method: h(k) = ⌊m (kA mod 1)⌋ for a constant 0 < A < 1

# Open addressing

All elements are stored in the table itself, and the hash function takes the probe number as a second argument:

h : U × {0, 1, …, m − 1} → {0, 1, …, m − 1}

For each key k this yields a probe sequence ⟨h(k, 0), h(k, 1), …, h(k, m − 1)⟩, ideally a permutation of the slot numbers.

- Linear probing: h(k, i) = (h′(k) + i) mod m
- Quadratic probing: h(k, i) = (h′(k) + c₁i + c₂i²) mod m for constants c₁, c₂
- Double hashing: h(k, i) = (h₁(k) + i·h₂(k)) mod m

With double hashing the probe sequence depends on two hash functions, so it is not dictated solely by the initial slot.
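The double-hashing probe sequence is easy to illustrate directly (h1 and h2 below are arbitrary example choices; h2(k) must be relatively prime to m for the sequence to visit every slot):

```python
def probe_sequence(k, m, h1, h2):
    """Slots visited by double hashing: h(k, i) = (h1(k) + i*h2(k)) mod m."""
    return [(h1(k) + i * h2(k)) % m for i in range(m)]

# With m = 7, h1(k) = k mod 7, h2(k) = 1 + (k mod 5):
seq = probe_sequence(10, 7, lambda k: k % 7, lambda k: 1 + k % 5)
# h1(10) = 3 and h2(10) = 1, so seq == [3, 4, 5, 6, 0, 1, 2]
```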

# Perfect hashing

Two level hashing using universal hashing at each level.