Insertion sort

vector<int> sortArray(vector<int>& nums) {
    // Insertion sort.
    // [sorted | ... > x ... | x | ...]
    // after inserting x:
    // [smaller than x | x | larger than x | not sorted]
    // We need to move x to the right place. The right place means
    // everything larger than x is to x's right (up to the sorted
    // boundary) and everything smaller than x is to its left (up to
    // the beginning).
    //
    // Loop invariant: after the kth iteration, the first k + 1
    // elements are in (relative) order. When k == nums.size(), this
    // invariant means the whole array is in order. (If not, we can
    // derive a contradiction.)
    int cur_elem_pos = 1;
    for (; cur_elem_pos < nums.size(); cur_elem_pos++) {
        int insert_index = cur_elem_pos;
        int elem_to_be_insert = nums[cur_elem_pos];
        while (insert_index > 0 &&
               nums[insert_index - 1] > elem_to_be_insert) {
            nums[insert_index] = nums[insert_index - 1];
            insert_index--;
        }
        nums[insert_index] = elem_to_be_insert;
    }
    return nums;
}
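The loop invariant above can be checked mechanically. A small Python sketch of the same algorithm (function name is mine) that asserts the invariant after every outer iteration:

```python
def insertion_sort(nums):
    """Insertion sort; after iteration k, nums[0..k] is relatively sorted."""
    for cur in range(1, len(nums)):
        elem = nums[cur]
        i = cur
        # shift everything larger than elem one slot to the right
        while i > 0 and nums[i - 1] > elem:
            nums[i] = nums[i - 1]
            i -= 1
        nums[i] = elem
        # loop invariant: the first cur + 1 elements are in order
        assert all(nums[j] <= nums[j + 1] for j in range(cur))
    return nums
```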


Learning ANTLR 3: More examples are coming

Matching arithmetic expression

An arithmetic grammar for +-x/ and parenthesis:

grammar Expr;

/** The start rule; begin parsing here. */
prog:   stat+ ;         // a program contains multiple statement
//a statement can either be expr \n or assignment \n or \n
stat:   expr NEWLINE                // expression statement
    |   ID '=' expr NEWLINE         // assignment statement
    |   NEWLINE                     // blank line
    ;

expr:   expr ('*'|'/') expr
    |   expr ('+'|'-') expr
    |   INT
    |   ID
    |   '(' expr ')'
    ;

ID  :   [a-zA-Z]+ ;      // match identifiers
INT :   [0-9]+ ;         // match integers
NEWLINE:'\r'? '\n' ;     // return newlines to parser (is end-statement signal)
WS  :   [ \t]+ -> skip ; // toss out whitespace

Some highlights

  1. A grammar consists of rules for syntactic structure and rules for vocabulary symbols (tokens).
  2. A program consists of multiple statements.
  3. Separate the alternatives of a rule with |.
  4. Group symbols into subrules with parentheses, e.g. ('*' | '/').
  5. We can use left recursion (a rule that invokes itself at the start of an alternative).

To test, we run

$ antlr4 Expr.g4
$ javac Expr*.java
$ grun Expr prog -gui t.expr # launches org.antlr.v4.runtime.misc.TestRig

We will see the parse tree for the input in t.expr when we **invoke the parser with rule prog**.

Calling a rule method is like invoking that rule. We can call any parser rule method.

Building a calculator

We need to perform some actions using parse-tree visitor.

To do so we need to:

  1. Label the alternative rules so that we get a different visitor method per alternative, i.e. a different event for each kind of input.

expr: expr op=('*'|'/') expr # MulDiv
    | expr op=('+'|'-') expr # AddSub
    | INT                    # int
    | ID                     # id
    | '(' expr ')'           # parens
    ;

  2. Define token names for the operator literals (e.g. MUL, DIV, ADD, SUB) so we can reference them as Java constants in the visitor.

This time, we will use the visitor pattern and ANTLR will generate a method for each labeled alternative name

$ antlr4 -no-listener -visitor LabeledExpr.g4

It will also also generate default visitor implementation and we can subclass it. We will need to override methods associated with statement and expression alternatives.

Some interesting override is:

    @Override
    public Integer visitMulDiv(LabeledExprParser.MulDivContext ctx) {
        int left = visit(ctx.expr(0));  // get value of left subexpression
        int right = visit(ctx.expr(1)); // get value of right subexpression
        if ( ctx.op.getType() == LabeledExprParser.MUL ) return left * right;
        return left / right; // must be DIV
    }
Here, when we visit a multiply or divide node, we first evaluate the left and right subexpressions. Then we use ctx.op.getType() to determine the type of the operator and apply the corresponding operation.

Translation of java Source code

import java.util.List;
import java.util.Map;
public class Demo {
    void f(int x, String y) { }
    int[ ] g(/*no args*/) { return null; }
    List<Map<String, Integer>>[] h() { return null; }
}

We want to translate the source code into an interface (just like a header definition) while preserving whitespace and comments.

interface IDemo {
    void f(int x, String y);
    int[ ] g(/*no args*/);
    List<Map<String, Integer>>[] h();
}

We are given a Java grammar and build a parse tree. We will derive the interface name from the class name and grab each method signature (return type, name, and argument list).

We want to know how a class is declared and how a method is defined. The relevant grammar fragments are:

classDeclaration
    :   'class' Identifier typeParameters? ('extends' type)?
        ('implements' typeList)?
    ;

methodDeclaration
    :   type Identifier formalParameters ('[' ']')* methodDeclarationRest
    |   'void' Identifier formalParameters methodDeclarationRest
    ;

We will use antlr4 to get the visitors

$ antlr4 Java.g4

Listing the generated files with

ls Java*.java

shows the generated classes, including JavaLexer.java, JavaParser.java, and JavaBaseListener.java.

We will then write a class extending JavaBaseListener ("Java" here means a listener for the Java grammar). We need three strategies:

  1. Print the interface header at the start of a class.
  2. Print a terminating } at the end of the class definition.
  3. When we see a method declaration, print out its signature.
    public void enterClassDeclaration(JavaParser.ClassDeclarationContext ctx) {
        System.out.println("interface I"+ctx.Identifier()+" {");
    }

    public void exitClassDeclaration(JavaParser.ClassDeclarationContext ctx) {
        System.out.println("}");
    }

    public void enterMethodDeclaration(JavaParser.MethodDeclarationContext ctx) {
        // need the parser to get the token stream
        TokenStream tokens = parser.getTokenStream();
        String type = "void";
        if ( ctx.type()!=null ) {
            type = tokens.getText(ctx.type());
        }
        String args = tokens.getText(ctx.formalParameters());
        System.out.println("\t"+type+" "+ctx.Identifier()+args+";");
    }
We also need to write a main to run the parser and walk the tree with listener.

public class ExtractInterfaceTool {
    public static void main(String[] args) throws Exception {
        String inputFile = null;
        if ( args.length>0 ) inputFile = args[0];
        InputStream is = System.in;
        if ( inputFile!=null ) {
            is = new FileInputStream(inputFile);
        }
        ANTLRInputStream input = new ANTLRInputStream(is);

        JavaLexer lexer = new JavaLexer(input);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        JavaParser parser = new JavaParser(tokens);
        ParseTree tree = parser.compilationUnit(); // parse

        ParseTreeWalker walker = new ParseTreeWalker(); // create standard walker
        ExtractInterfaceListener extractor = new ExtractInterfaceListener(parser);
        walker.walk(extractor, tree); // initiate walk of tree with listener
    }
}

Embedding actions in a grammar

For more control, we can embed actions in the grammar. These actions are copied into the generated recursive-descent parser. Actions are code snippets surrounded by curly braces.

Disclaimer: code snippets are from the ANTLR reference. If you find this note to be infringing upon its copyright, please contact me and I will take it down.

LC 81, Search in Rotated Sorted Array II

The sorted array is rotated at some pivot, and we might have duplicates (i.e., [0,0,1,2,2,5,6] might become [2,5,6,0,0,1,2]).

We will still use binary search, but this time the duplicates give us an edge case:
[1,1,1,1,1,1,1,1,1,2,3,4] rotated -> [1,2,3,4,1,1,1,1,1,1,1,1]
Here nums[low] == nums[mid] == nums[high] == 1, and we can't tell which part is sorted. So we can either low++ or high-- to edge toward an interval where we know at least one of [low, mid] or [mid, high] is sorted.

class Solution {
public:
    bool search(vector<int>& nums, int target) {
        int low = 0;
        int high = nums.size() - 1;
        while (low <= high) {
            int mid = low + (high - low) / 2;
            if (nums[mid] == target)
                return true;
            if (nums[low] < nums[mid]) {          // [low, mid] is sorted
                if (nums[low] <= target && target < nums[mid])
                    high = mid - 1;
                else
                    low = mid + 1;
            }
            else if (nums[mid] < nums[low]) {     // [mid, high] is sorted
                if (nums[mid] < target && target <= nums[high])
                    low = mid + 1;
                else
                    high = mid - 1;
            }
            else {                                // nums[low] == nums[mid]
                low++;
            }
        }
        return false;
    }
};


else if condition

In the else if, I first used

else if(nums[mid] < nums[high])

to decide whether [mid, high] is sorted, but it fails on a test case such as [3,1,1] with target = 3, where
low = 0, high = 2, mid = 1.

We see that nums[low] is not less than nums[mid], so [low, mid] is not sorted, which means [mid, high] should be sorted. But the else if condition is false because nums[mid] == nums[high]. So we should instead make the else if the complement of

if(nums[low] < nums[mid])

which is

else if(nums[low] > nums[mid])

(not the exact complement, since equality is handled by the final else).
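The corrected condition can be verified quickly. The same algorithm in a Python sketch (function name is mine), exercised on the [3,1,1] case that broke the earlier condition:

```python
def search_rotated(nums, target):
    """Binary search in a rotated sorted array that may contain duplicates."""
    low, high = 0, len(nums) - 1
    while low <= high:
        mid = low + (high - low) // 2
        if nums[mid] == target:
            return True
        if nums[low] < nums[mid]:        # [low, mid] is sorted
            if nums[low] <= target < nums[mid]:
                high = mid - 1
            else:
                low = mid + 1
        elif nums[low] > nums[mid]:      # [mid, high] is sorted
            if nums[mid] < target <= nums[high]:
                low = mid + 1
            else:
                high = mid - 1
        else:                            # nums[low] == nums[mid]: can't tell, shrink
            low += 1
    return False
```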

LC 58. Length of Last Word

Given a string s consisting of upper/lower-case alphabets and space characters ' ', return the length of the last word in the string.

If the last word does not exist, return 0.


Keep a counter, lastWordLen, for the length of the last word encountered, initialized to 0. We also need a counter for the length of the current word.

Then we iterate through the string; if we see a non-whitespace character, we increment the current word length by 1.

If we see a whitespace

  1. if curWordLen is 0, the previous character was also whitespace and doesn't end a word, so we do nothing
  2. else, we update lastWordLen and set curWordLen to 0.

Edge case

Empty input

Since lastWordLen is 0 and is never updated, returning 0 is correct.

Single word

The for loop only updates lastWordLen when it sees a whitespace character. If the input is a single word, we need a final check after the loop to update lastWordLen.

    int lengthOfLastWord(string s) {
        int lastWordLen = 0;
        int lenCtr = 0;
        for (int i = 0; i < s.size(); i++) {
            if (s[i] != ' ') {
                lenCtr += 1;
            }
            else {
                if (lenCtr != 0) {
                    lastWordLen = lenCtr;
                }
                lenCtr = 0;
            }
        }
        if (lenCtr != 0)
            lastWordLen = lenCtr;
        return lastWordLen;
    }
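The same logic ports to a short Python check (function name is mine), covering the empty, single-word, and trailing-space edge cases:

```python
def length_of_last_word(s):
    """Length of the last word; a word ends at a space or at end of string."""
    last_word_len = 0
    cur_len = 0
    for ch in s:
        if ch != ' ':
            cur_len += 1
        else:
            if cur_len != 0:          # a word just ended
                last_word_len = cur_len
            cur_len = 0
    if cur_len != 0:                  # input ended inside a word
        last_word_len = cur_len
    return last_word_len
```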


Recovery system in database

We will use a log and checkpoints to recover.

Transaction rollback

If we try to roll back a transaction Ti:

Scan the log backward

We scan backward through every log record <Ti, Xj, V1, V2> that belongs to Ti.

Then we write V1 (the old value) back to Xj.

We also write a redo-only log record <Ti, Xj, V1> to the log. This is called a compensating log record. (We don't need to store both the old and new value here, because if the system crashes during recovery, we only need to REDO these records to finish the UNDO.)

Once we find the <Ti start> record in our backward scan, we stop and add a <Ti abort> record to the log.

Recovery after a crash

REDO phase

Scan the log forward from the last checkpoint record and replay all updates.
This replay includes updates of rolled-back and uncommitted transactions as well (repeating history).

We also need to determine the list of incomplete transactions. They are either in the <checkpoint L> record (the transactions active during the checkpoint) or started after the checkpoint and have neither a commit nor an abort record. This list is called the undo-list and is used for rollback during the UNDO phase.

UNDO Phase

Roll back all transactions in the undo-list by scanning the log backward from the end.

If the DB finds an update record of a transaction Ti in the undo-list, it restores the old value.

When the DB sees a <Ti start> record and Ti is in the undo-list, that means we have undone all of Ti's changes, and it writes a <Ti abort> record to the log to signify that the rollback of this transaction is complete.

UNDO terminates once we have seen the <Ti start> record of every Ti in the undo-list.
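The two phases can be played out on a toy in-memory log. This is a sketch only: the tuple-based log format (`(txn, 'start')`, `(txn, 'commit')`, `(txn, var, old, new)`) is an invented illustration, not a real DBMS format:

```python
def recover(log, db):
    """Toy ARIES-style pass: REDO everything, then UNDO incomplete txns."""
    # REDO phase: replay every update in log order (repeating history),
    # while collecting the undo-list of incomplete transactions.
    undo_list = set()
    for rec in log:
        if rec[1] == 'start':
            undo_list.add(rec[0])
        elif rec[1] in ('commit', 'abort'):
            undo_list.discard(rec[0])
        else:                            # (txn, var, old_val, new_val)
            _, var, _, new = rec
            db[var] = new
    # UNDO phase: scan backward, restoring old values for undo-list txns.
    for rec in reversed(log):
        if len(rec) == 4 and rec[0] in undo_list:
            _, var, old, _ = rec
            db[var] = old                # a real system would also emit a
                                         # redo-only compensating record here
        elif rec[1] == 'start' and rec[0] in undo_list:
            undo_list.discard(rec[0])    # would append an abort record here
    return db
```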

Fuzzy checkpoint

A fuzzy checkpoint allows updates to resume once the <checkpoint L> record has been written, but before the modified buffer blocks are written to disk.

If a crash happens before all pages in the modified buffers are written to disk, we might have an incomplete checkpoint.
We deal with this by storing a last-checkpoint pointer at a fixed position on disk, and we do not update this pointer when we first write the <checkpoint L> record.

The <checkpoint L> record contains a list of modified buffer blocks, and last-checkpoint is only updated once all of those modified blocks have been written to disk.

LC 39 Combination Sum

Given a set of candidate numbers (candidates) (without duplicates) and a target number (target), find all unique combinations in candidates where the candidate numbers sum to target.

The same number may be chosen from candidates an unlimited number of times.


We basically try to find paths in a tree that sum to target. At the beginning, we pick a number as the root and initialize total_sum to that number. If total_sum < target, we proceed to "branch off", recursively calling DFS and adding leaves.

update solution sets

We update the solution set by appending the current path when the remaining target == 0.


return is used to terminate the recursion at the two base cases.

Avoid duplicates

How do duplicates occur?

Duplicates occur because when we branch off the tree, we don't exclude the elements already considered at earlier indices.
To exclude duplicates, when we make recursive calls at each level, we make sure each call only branches on candidates from its start index onward.


class Solution(object):
    def __init__(self):
        self.all_path = []

    def combinationSum(self, candidates, target):
        """
        :type candidates: List[int]
        :type target: int
        :rtype: List[List[int]]
        """
        self.bfs(candidates, 0, target, [])
        return self.all_path

    def bfs(self, nums, index, target, path):
        if target < 0:
            return
        if target == 0:
            self.all_path.append(path)
            return
        for i in range(index, len(nums)):
            self.bfs(nums, i, target - nums[i], path + [nums[i]])

Exclude duplicates

In the for loop, we append nums[i] to the path. Also, we pass i as the new index, so the calls afterward will only consider nums[i]…nums[end].
This avoids duplicates: any combination summing to target that uses both nums[0]…nums[i - 1] and nums[i]…nums[end] has already been found by a call starting at an earlier index, so a call starting at index i will not add that solution again.
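A standalone sketch of the same DFS (function names are mine), which can be sanity-checked against the classic example [2,3,6,7] with target 7:

```python
def combination_sum(candidates, target):
    """Passing i (not 0) as the start index prevents duplicate combinations."""
    paths = []

    def dfs(index, remaining, path):
        if remaining < 0:
            return                      # overshot: dead branch
        if remaining == 0:
            paths.append(path)          # path sums exactly to target
            return
        for i in range(index, len(candidates)):
            # reuse candidates[i] is allowed, but never go back before index i
            dfs(i, remaining - candidates[i], path + [candidates[i]])

    dfs(0, target, [])
    return paths
```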

Database index and hashing

We want to locate where our records are, so we may resort to an index, just like going to the index at the back of a book when trying to find a keyword.

Basic concepts:


Indices are used to locate records. The attributes used to look up records are called the search key. They don't have to be a key or super key.

Ordered indices: it is based on sorted order of the records

Hash indices

Operations and performance affected by the choice of indices

Access type:

We can look up records with a specific attribute value, or do a range lookup.

Space overhead



Ordered indices

Ordered indices store index entries in sorted order of the search key.

Clustering indices

Records in the file might themselves be stored in some sorted order. A clustering index is an index whose search key also defines the sequential order of the file.
They are also called primary indices.

Nonclustering indices

The index order is different from the sequential order of the records. The index is a second list that holds pointers to the physical rows.

Index entry

It is a search key value and a pointer to one or **more than one** records with that search key value.

Dense index

An index entry appears for every search key value in the file.

CLRS chpt 11: hashtable

We use a hashtable when we only need to insert, search, and delete. There are three mainstream methods: 1. direct addressing, 2. chaining, and 3. open addressing.

Direct addressing.

We directly index an array that contains pointers to satellite data. It works well when the universe of keys is small and we can afford to allocate enough space (we assume each key is unique). Since every key already has its own slot, we can even store the data in the slot directly.

Exercise for direct addressing


A bit vector is simply an array of bits (0s and 1s). A bit vector of length m takes much less space than an array of m pointers. Describe how to use a bit vector to represent a dynamic set of distinct elements with no satellite data. Dictionary operations should run in O(1) time.

Given a bit vector b, position k contains 1 if and only if k is in the dynamic set.
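A minimal sketch of that bit-vector set, using a single Python integer as the bit array (class and method names are mine):

```python
class BitVectorSet:
    """Dynamic set of distinct keys from {0, ..., m-1}, no satellite data."""
    def __init__(self, m):
        self.m = m
        self.bits = 0                      # a Python int serves as the bit vector

    def insert(self, k):                   # O(1): set bit k
        self.bits |= (1 << k)

    def delete(self, k):                   # O(1): clear bit k
        self.bits &= ~(1 << k)

    def search(self, k):                   # O(1): test bit k
        return (self.bits >> k) & 1 == 1
```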


Suggest how to implement a direct-address table in which the keys of stored elements do not need to be distinct and the elements can have satellite data. All three dictionary operations (INSERT, DELETE, and SEARCH) should run in O(1) time. (Don’t forget that DELETE takes as an argument a pointer to an object to be deleted, not a key.)

The keys are not distinct, so we let T[x.key] point to a doubly linked list storing all the elements with the same key. INSERT takes O(1). For DELETE, we take a pointer to the object as the argument, and removing a node from a doubly linked list is O(1).
SEARCH takes a key as the argument and checks whether T[k] is null, so O(1).
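A sketch of that scheme, assuming keys in {0, ..., m-1} (class and field names are mine):

```python
class Node:
    def __init__(self, key, data):
        self.key, self.data = key, data
        self.prev = self.next = None

class DirectAddressTable:
    """T[key] heads a doubly linked list of all elements with that key."""
    def __init__(self, m):
        self.T = [None] * m

    def insert(self, x):                 # O(1): splice node at the front
        x.next = self.T[x.key]
        if x.next is not None:
            x.next.prev = x
        x.prev = None
        self.T[x.key] = x

    def delete(self, x):                 # O(1): takes the node, not the key
        if x.prev is not None:
            x.prev.next = x.next
        else:
            self.T[x.key] = x.next
        if x.next is not None:
            x.next.prev = x.prev

    def search(self, k):                 # O(1): return the list head (or None)
        return self.T[k]
```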

Down side of direct addressing

We need to allocate space for every possible key in the universe, which becomes infeasible when the universe is large. More importantly, the set of keys actually used might be much smaller than U. So we use a hashtable.


A hashtable uses a hash function to compute the slot for each key. For an element with key k, we store it at slot h(k). But multiple keys might hash to the same value: when h(k1) = h(k2) with k1 != k2, a collision has occurred.

What constitutes a good hashing function

A good hash function appears "random" while being deterministic (otherwise we couldn't retrieve what we stored).

Load factor

Assume simple uniform hashing, where any given element is equally likely to hash into any of the m slots. Let n be the number of elements in the dictionary and m the number of slots in the table. Then n/m is the load factor: the average length of a chain in the table.


A collision occurs when two elements' keys hash to the same value. Then we can do chaining: we store all elements with the same h(k) in the same linked list.
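A minimal chaining sketch, using Python lists as the chains and the division-method hash as an assumption (class name is mine):

```python
class ChainedHashTable:
    """Collision resolution by chaining; each slot holds a list of (k, v)."""
    def __init__(self, m):
        self.m = m
        self.slots = [[] for _ in range(m)]

    def _h(self, k):
        return k % self.m                      # division-method hash (assumed)

    def insert(self, k, v):
        self.slots[self._h(k)].insert(0, (k, v))   # insert at chain head, O(1)

    def search(self, k):
        for key, v in self.slots[self._h(k)]:      # expected O(1 + n/m)
            if key == k:
                return v
        return None
```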

Run time of Insert, search and delete


Insert(T, x)

We insert at the front of the doubly linked list pointed to by h(x.key). O(1) time.

Search(T,k) k is the key value.

Go to slot h(k) and traverse the linked list until we either find k or reach the end. It takes O(1 + n/m) on average:
(1) if the search fails, we traverse the whole list, O(n/m);
(2) if the key is in the dictionary, we take the average over all elements x of 1 plus the expected number of elements added to x's list after x was inserted (because we traverse all the elements before x in the list, which were added after x):

\dfrac{1}{n} \sum_{i=1}^{n} \left( 1 + \sum_{j=i+1}^{n} E[X_{ij}] \right)

where X_{ij} is the indicator variable for the event h(k_i) = h(k_j). The inner summation counts the expected number of collisions after x is inserted, and the earlier x was inserted, the larger this sum. P(h(k_i) = h(k_j)) = 1/m by uniform hashing, so the inner sum is at most n/m in expectation. Taking the average over i still gives roughly n/m. (Please ignore the gross math omissions; the general idea is close.)

Exercise

1

Professor Marley hypothesizes that he can obtain substantial performance gains by modifying the chaining scheme to keep each list in sorted order. How does the professor's modification affect the running time for successful searches, unsuccessful searches, insertions, and deletions?

It will make insert slower, O(n/m), since we need to traverse to find the insertion point. It will **NOT** make deletion faster, since ordering doesn't help deletion asymptotically. But search can become faster if, instead of a linked list, we use a sorted array, because then we can binary search: search time drops to O(1 + lg(n/m)).

Hash function

Division method

h(k) = k mod m. For this method, we want m not to be a power of two; otherwise h just takes the low-order bits and will not have the "random" effect a good hash function should have.

Multiplication method

h(k) = floor(m * (kA mod 1)), where kA mod 1 is the fractional part of kA. This method has the advantage that m can be a power of 2.

Universal hashing

We randomly choose the hash function from a suitable class of functions. The probability of picking a bad one is slim (just like the random pivot in quicksort).

Open addressing

In open addressing, all elements occupy the table itself. NO POINTERS: the stored element is the key. The table can fill up; no more than m elements can be stored, and the load factor is never greater than 1.

Instead of chaining, for each key we generate a sequence of slots to examine in case of collision. To insert and search, we **probe** the table until we find a suitable position for our element:

h : U x {0, 1, ..., m - 1} -> {0, 1, ..., m - 1}

The probe sequence depends on the key: for every key k, we want the probe sequence <h(k,0), h(k,1), ..., h(k, m-1)> to be a permutation of <0, 1, ..., m-1>, so that every slot is eventually considered as the table fills up.

hash_insert(T, k):
    i = 0
    # probe until we find an empty slot or run out of slots
    while i != m and T[h(k, i)] != null:
        i += 1
    if i != m:
        T[h(k, i)] = k
    else:
        error "hash table overflow"

The same logic goes for search: we either find the key along the probe sequence, or hit an empty slot / exhaust the whole sequence.

hash_search(T, k):
    i = 0
    while i != m and T[h(k, i)] != null:
        if T[h(k, i)] == k:
            return h(k, i)
        i += 1
    return null

But for delete, we cannot just set the slot back to null, because that would break searches that probe past it. Instead, we store a special DELETED marker when deleting a key.

We define h'(k), an ordinary hash function, as the auxiliary hash function.

Linear probing

h(k, i) = (h'(k) + i) mod m. Linear probing searches h'(k), then h'(k) + 1, ..., h'(k) + m - 1, then wraps around and searches up to h'(k) - 1, so it eventually examines the whole table. It suffers from **primary clustering**: long runs of occupied slots build up and searches take longer. **The initial starting position dictates the entire sequence.**

Quadratic probing

h(k, i) = (h'(k) + c_1 i + c_2 i^2) mod m. The offset is provided by the polynomial. It works better than linear probing, but we must constrain the choice of c_1, c_2, and m. **The initial starting position dictates the sequence as well.**

Double hashing

h(k, i) = (h_1(k) + i * h_2(k)) mod m, where h_1 and h_2 are both auxiliary hash functions. The first probe position is h_1(k), and each subsequent probe is offset by h_2(k).
So the probe sequence depends on two hash functions and is not dictated by the initial starting point alone.
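The probing and tombstone logic above can be sketched concretely. A minimal open-addressing table using linear probing with h'(k) = k mod m and a DELETED marker (all names are mine, not CLRS's):

```python
DELETED = object()   # tombstone so searches can probe past removed keys

class LinearProbingTable:
    def __init__(self, m):
        self.m = m
        self.T = [None] * m

    def _probe(self, k, i):
        return (k + i) % self.m        # h(k, i) = (h'(k) + i) mod m

    def insert(self, k):
        for i in range(self.m):
            j = self._probe(k, i)
            if self.T[j] is None or self.T[j] is DELETED:
                self.T[j] = k
                return j
        raise OverflowError("hash table overflow")

    def search(self, k):
        for i in range(self.m):
            j = self._probe(k, i)
            if self.T[j] is None:      # a truly empty slot ends the sequence
                return None
            if self.T[j] == k:
                return j
        return None

    def delete(self, k):
        j = self.search(k)
        if j is not None:
            self.T[j] = DELETED        # not None, or later keys become unreachable
```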

Perfect hashing

Two-level hashing, using universal hashing at each level.