Divide & conquer 2: binary search

class Solution {
    int search(vector<int>& nums, int target) {
        return bin_search(nums, 0, nums.size() - 1, target);
    int bin_search(vector<int>& nums, int l, int h, int target){
        if(l > h) return -1;
        int mid = (l + h)  / 2;
        if(nums[mid] == target) return mid;
        if(nums[mid] > target){
            // target in [l: mid - 1]
            return bin_search(nums, l, mid - 1, target);
        return bin_search(nums, mid + 1, h, target);


One thing that helps with this recursive problem is to set the boundary of input for the algorithm.
For binary search, a bin_search(nums, int l, int h, int target), specify that this will search [l, h] for target. So for the subproblems, you will know that it will search between [l, mid – 1] and [mid + 1, h] with bin_search.

Merge sort

It uses divide and conquer and use O(n) extra memory to achieve the O(nlogn) run time

class Solution { public: vector<int> sortArray(vector<int>& nums) { int l = 0, r = nums.size() -1; vector<int> temp(r-l+1); mergeSort(nums, temp, l, r); return nums; } void mergeSort(vector<int>& A, vector<int>& temp, int l, int r){ if(l >= r) return; int m = (l+r)/2; mergeSort(A, temp, l, m); mergeSort(A, temp, m+1, r); merge(A, temp, l, m, r); } void merge(vector<int>& A, vector<int>& temp, int l, int m, int r){ int p1 = l, p2 = m+1, p_temp = 0; while(p1 <= m && p2 <= r){ if(A[p1] <= A[p2]) temp[p_temp++] = A[p1++]; else temp[p_temp++] = A[p2++]; } while(p1 < m+1) temp[p_temp++] = A[p1++]; while(p2 < r+1) temp[p_temp++] = A[p2++]; for(int i = 0; i < r-l+1; i++){ A[l+i] = temp[i]; } } };

The function MergeSort(nums, temp, start, end) will sort an array that starts at index start and whose last element is at index end.
So the subsequent subproblem will sort [start, middle], and [middle + 1, end].

Merge will mergetwo sorted subarrays A[l, m] and A[m + 1, r].

NOTICE, we uses index, not size in the while in merge that’s why we have p1 <= m.
If we use size, notice that an array that starts from index l and ends on index m will have size m – l + 1 and since p1 starts at l, it needs to go forward m steps to reach the last element.

Network programming

Client-server model

In this model, an application consists of one server and multiple client. Server manages resources and provide services to client. 1. Client request services 2. Server manipulate resources to satisfy the request 3. Server sends a response to the client


A network is a hierarchal system of wires and boxes that’s organized by geography proximity. EG: LAN, WAN.


Lowest Level is LAN. It is consisted of a bunch of hosts connected by wires to a hub. Each host has a 48 bits address. (MAC address) EG:00:16:ea:e3:54:e6.

Host sends each other via frame. Broadcast, so everybody sees every bits.

Next level: internet(lowercase)

internet stands for inter-connected network that connects incompatible LANs together.

internets is usuallyy ad hoc so there is no set topology and we send data via hops across routers.


Protocols are a set of rules that govern how hosts and routers should communicate. It smoothes out differences between network

Internet Protocols(IP) does: 1. Provide naming scheme such that each host and router can be identified by a unique address. The uniform format is called host address.

  1. Provide deliver mechanisms: Provide the standard unit of transmission called packet that’s consisted of header and payload. Header consists of the packet size, source host address and so on.

The Internet

The Internet can be thought of as a world collection of hosts with the following properities:

  1. Hosts are mapped to 32 bits address.
  2. IP address is mapped to a set of identifier called domain name.
  3. A process on the Internet can communicate with another process via connection.

IP address

IP addresses are stored in IP address struct. The struct contains 1 unsigned int

 /* Internet address structure */
struct in_addr {
    uint32_t  s_addr; /* network byte order (big-endian) */


We can convert the hex to human readable dotted decimal form:

0x8002C2F2 = 128.2.194,242

Domain name

The Internet maintains a mapping between IP addresses to domain names called DNS domain name server.

The mapping is not one-to-one. A domain name can map to 1 address, multiple addresses or no address.

The node of the tree represent domain names formed by the path to the root.

Internet connection

Client and server communicates by sending streams of byte over a connection. Each connection is full duplex and point-to-point

A socket is an endpoint of a connection.

A port is a 16 bits that identifies a process: 1. ephemeral port: ports automatically assigned by kernel when client makes a request. 2. Well-known port : Associated with some service provided by the server. EG port 80 for http

Each connection is defined by a socket pair

(client ip: port, server ip: port)

# Socket 

For Linux Kernel, socket is an endpoint of communication. For program, it is just a file with a file descriptor.

A socket address is a struct that contain info about the ip address and port. We have to use a generic socket struct then cast it to different socket address struct since when socket was designed, we can't use generic pointer `void*`.

<img src="http://kedizheng.com/wp-content/uploads/2019/08/Screen-Shot-2019-08-19-at-4.20.59-PM.png" alt="" width="1424" height="490" class="alignnone size-full wp-image-957" />

<img src="http://kedizheng.com/wp-content/uploads/2019/08/Screen-Shot-2019-08-19-at-4.21.51-PM.png" alt="" width="1416" height="616" class="alignnone size-full wp-image-959" />

Above is an IPV4 socket address. For functions to use socket address correctly, we need to first cast `struct sockaddr_in*` to `struct sockaddr *`

## Convert from string representation of sockaddr to struct
we use `getaddrinfo` to do so.

int getaddrinfo(const char *host,            /* Hostname or address */
                const char *service,         /* Port or service name */
                const struct addrinfo *hints,/* Input parameters */
                struct addrinfo **result);   /* Output linked list */

Given host and port number, returns result that points to a linked list of addrinfo structs. In each addrinfo contains a pointer to sockaddr that can be used to bind, accept and listen.

In hints, we specify more control about the connection. Such as IP type and socket types.

The reason getddrinfo returns linked list of addresses structure is because the default behavior is to return 3 structs for 3 different types of connection. 1 for connection, 1 for raw socket and 1 for data gram.

Also a host might have multiple IP address.

It is a linked list because an address for a host might have multiple sockets.

EG: Twitter might have multiple sockets listening?

We can get the host info from struct with

int getnameinfo(const SA *sa, socklen_t salen, /* In: socket addr */
char *host, size_t hostlen,/* Out: host */
char *serv, size_t servlen,/* Out: service */
int flags); /* optional flags */

Socket interface

Creating socket

int socket(int domain, int type, int protocol);

Both client and server uses this function to create socket. EG:

int sockfd = socket(AF_INET, SOCK_STREAM, 0);

AF_INET means we use IPV4 and SOCK_STREAM means this is an endpoint of a connection.

Better to use getaddrinfo() to get info for socket since it will be protocol independent.

 struct addrinfo *p = ...;
int clientfd = socket(p->ai_family, p->ai_socktype, p->ai_protocol);

Connect (by client)

A client establishes connection via

int connect(int clientfd, SA *addr, socklen_t addrlen);

It will block until a connection is successfully established. If successful, we can now read/write via clientfd. The connection is characterized by a pair (x:y,addr.sin_addr:addr.sin_port).


int bind(int sockfd, SA *addr, socklen_t addrlen);

Where typdef struct sockaddr SA

It will ask the kernel to associate the socket address struct with the socket file descriptor.

After bind, process can read and write from sockfd.


By default, program will assume a socket opened by socket(...) is an active socket from the client end. Use listen will turn the socket from active socket * to *listening socket.

 int listen(int sockfd, int backlog);

backlog is an hint for the server about how many requests can be queued before rejecting further requests.


int accept(int listenfd, SA *addr, int *addrlen);

accept(...) will wait for connection request bounded to socket at listenfd, then fill in the client address in addr and the size of socket address length at addrlen.

If successful, it will return a connected descriptor to communicate with the client via UNIX IO.

Connected fd is different from listening socket. A listen socket is created once and will listen for Connection requests. A connected socket is created for the connection and exists only as long as it takes for the server to service the client.

We have such distinct so we can create concurrent server to connect with many client. EG: when a new request comes, we fork a child to handle.

Web server

System level I/O with Unix/Linux


A linux file is a series of m bytes /[b_0, b_1…b_{m – 1} ]

All I/O devices are such as terminal, Network, printers are modeled as file.

Openning a file

An application opens a file by request a file from Kernel. The Kernel returns a small non-negative integer called File descriptor. The kernel keeps info about the file, and the app only keeps track of the descriptor.

Change file position

Kernel keeps a current position k for each file and it starts at 0. App can explicitly change it via sleek.

Read/write file

Read a file copies n bytes from the file to memory starting at k. It will then increment the file position k by n.

If the file size is m and m < n then Kernel will trigger a condition called EOF. There is no explicit EOF character associated with the file though…

Writing is similar. It will write n bytes to the file starting at k and increment the current position k by n.

File type

Regular file

contains arbitrary data

Directory file

A file contains multiple links to other file, mapping file names to files, including directory files. Each directory files contain 2 files at least. The first is “.”, which points to itself. The other is “..”, which points to the parent.


File used to communicate with other process across networks.

Open and close a file

Openning a file

Open will takes a filename and flags and user mode and return the smallest int that’s not used in by the current processor as the file descriptor. For example, the first file a process open will has fd == 3. (0 is stdin, 1 is stdout and 2 is stderr).

int open(char *filename, int flags, mode_t mode);

flags indicate how to access the file. The flag can also be ored to create bit mask that gives additional info. We can do

open('temp.txt, O_RDONLY | O_TRUNC, 0);

Close file

int close(int fd);

Will close the file associated with the given fd. Closing a closed file is error.

Always check return codes, even for seemingly benign functions such as close()

Read/Write file

ssize_t read(int fd, void* buf, size_t n);

will read at most n byte from fd starting at the current position and store it into buf. A return value of -1 indicates Error. 0 indicates EOF. Else the return value indicates the bytes read.

ssize_t write(int fd, const void* buf, size_t n);

copy at most n bytes from buf to fd at the current position.

Short Count

For both read and write, the return value might be smaller than n. This is called Short count. It can happen due to three things: 1. Encounter EOF 2. Reading from text line in terminal. If we are reading from a terminal, no matter how big the n is, read will just read a text line and return the length of the text. 3. Reading from socket.

Standard I/O

C standard library libc.h has high level file functions such as fopen, fclose, and so on. Open and close file: fopen, fclose Read and write bytes; fread, fwrite Read and write text line fget, fput Format read write fprintf, fscanf

standard IO models file as stream, abstraction for file descriptor and a buffer.

Applications often read/write one byte at a time and doing read, write is expensive because it requires kernel call. So we use buffered IO that put contents into a buffer first.

EG: stdout uses buffer. It will flush when we call fflush(stdout) or when the function exits.

Most of the time, we would prefer standard IO but it has problem with: 1. Can not access meta data 2. Not async-data safe (not safe for signal handling) 3. Not suitable for network socket 4. Restrictions

Standard IO uses full duplex so we can use the stream both to read and write but the buffer causes problem. More on standard IO restriction: 1. Can’t use input right after output because of buffer. If we want to use it, we must flush it with fflush 2. An output can’t follow input function without calling fseek or rewind. It causes problem for network since we can’t fseek on a socket.

We can open two streams one for read and one for write but we need to close them

FILE* fpin, *fpout;
fpin = fopen(socked, "r");
fpout = fopen(socked, "w");

We need to close both streams to release the resource. But since both streams have the same underlying socket, the second close will cause problem.

When to use Unix IO

  1. Inside signal handler
  2. When performance matter a lot

What not to use for binary file

Text-oriented IO

such as fget

string function

Such as strlen because ‘\0’ will be interpreted as null terminator

How kernel represent open files

To represent files in Unix kernel, we need three tables:

Descriptor table: Each process keeps a descriptor table where each entry is an open file. Each entry points to an entry in the file table.

File table: File table keeps track of the current position offset of each file, and a reference count of the file and a pointer to entry to v-node table. Closing a file will decrement the reference count of the associated file table. File table is shared across all processes

v-node table: Keeps stat info about that table.

Two file descriptor represents two different files.

Two file descriptors that point to the same file. EG, do two open on the same file name.

Forking will let children inherit files opened by parents
Before forking:

After forking:

We also increase the reference count by 1 for each file.

IO redirection

Linux IO allows user to redirect standard output to file on disk

$ ls > foo.txt

will redirect the output of ls to foo.txt.

It achieves this by using dup2(int oldfd, int newfd);
For each process, dup2 will copy oldfd’s table entry to that of the new file descriptor.

EG: dup2(4,1) will copy the pointer pointed to by file descriptor 4 to the entry of file descriptor 1 in the file descriptor table

if newfd is open, then we will close it first(and decrement the refcount).

Issues with Concurrency

Thread (un)safety

4 types of thread-unsafe functions

Access unprotected shared data

Easy to fix, add lock

Function that keeps state across multiple invocation.

Such as rand. solution: stop using state via static variable. Only solution is to change the function such that it doesn’t use static and provide that value via variable.

function that return pointer to static

function that use thread-unsafe function

Thread reentrant

Thread-reentrant is thread function that doesn’t rely on any shared data.
Not the same as thread safe. A function can be thread-safe but not reentrant.


Race condition occurs when the correctness of the program depends on the order of execution.


A pair (or multiple) threads that are blocked because they need resources from other threads.
For mutex, we can prevent it by locking it in one order and unlocking it in reverse order.


Semaphore is a concept invented by Dutch Compute Scientist Dijkstra.
It is a global counter that can only be changed by 2 atomic operations: P(s) and V(s).
P will decrement s if s > 0 and return immediately but would suspend the thread if s = 0.
Once s > 0 due to V operation, the thread is restarted and decrement s and return control.
P stands for proberen( to test) in Dutch.

V will increment s by one. If there are threads waiting for s to become non-zero, V will restart exactly one thread that will complete and decrement s.

V stands for verhogen(to increment) in Dutch.

Binary semephore(where the max value of s is 1) is called lock. P will lock and V will unlock.

Counting Semaphore

Semaphore can be used as a counter and tells us about resources needed by threads.
A classic example is Producer-consumer problem. There is a shared buffer among producer and consumer threads and producers can produce when there are empty slots in the buffer. Consumer can consume if there are goods made by producer.

We will use 3 semaphores. A binary semaphore to lock the buffer. A counting Semaphore to count empty space(resource for producer) and a counting semaphore for how many good(resource for consumer ).

When a producer try to produce, it will do

P(empty_space_semaphore)   //check for empty space
P(lock)                    //lock
buf[loc].write()           //produce
V(goods_semaphore)         //Announce goods have been made

When consume, it will be basically the same, but with

P(goods_semaphore)        //consume
V(empty_space_semaphore)  //add empty space

Why I picked up CPP Concurrency in Acation

The BuzzDB development has led me to the world of multi-thread development. I only had posix thread experience from system classes and reading the doc has become inadequate for the development need.

I have searched only for good book that systematically introduces c++11’s std::thread and gladly I found one: CPP Concurrency in Action.

The author of this book is Anthony Williams. He is one of the main contributor of Boost ‘s thread library and subsequently become a cpp committee member and was part of the group that’s behind std::thread. I have skimmed through the first chapter and found his language accessible. It should be a fun ride!

Our school library provides free access for Ebooks and I picked up a “copy” from there.

Reference not found for xxx

 In Gtest

If main is not found, you might missed something in cmake, such as not didn’t add executable.

In polymorphic classes

not define/override some virtual function


Multiple inherentence (in the case of Operator and Project in BuzzDB)

Actually the problem caused when I change the base class signature and not recompile from scratch. Solved it by make clean, cmake .., and make.