Divide & conquer 2: binary search

``````class Solution {
public:
int search(vector<int>& nums, int target) {
return bin_search(nums, 0, nums.size() - 1, target);
}
int bin_search(vector<int>& nums, int l, int h, int target){
if(l > h) return -1;
int mid = (l + h)  / 2;
if(nums[mid] == target) return mid;
if(nums[mid] > target){
// target in [l: mid - 1]
return bin_search(nums, l, mid - 1, target);
}
return bin_search(nums, mid + 1, h, target);

}
};
``````

One thing that helps with this recursive problem is to set the boundary of input for the algorithm.
For binary search, a `bin_search(nums, int l, int h, int target)`, specify that this will search [l, h] for target. So for the subproblems, you will know that it will search between [l, mid – 1] and [mid + 1, h] with `bin_search`.

Merge sort

It uses divide and conquer and use O(n) extra memory to achieve the O(nlogn) run time

``````class Solution {
public:
vector<int> sortArray(vector<int>& nums) {
int l = 0, r = nums.size() -1;
vector<int> temp(r-l+1);
mergeSort(nums, temp, l, r);
return nums;
}

void mergeSort(vector<int>& A, vector<int>& temp, int l, int r){
if(l >= r)
return;
int m = (l+r)/2;
mergeSort(A, temp, l, m);
mergeSort(A, temp, m+1, r);
merge(A, temp, l, m, r);
}

void merge(vector<int>& A, vector<int>& temp, int l, int m, int r){
int p1 = l, p2 = m+1, p_temp = 0;
while(p1 <= m && p2 <= r){
if(A[p1] <= A[p2])
temp[p_temp++] = A[p1++];
else
temp[p_temp++] = A[p2++];
}
while(p1 < m+1)
temp[p_temp++] = A[p1++];
while(p2 < r+1)
temp[p_temp++] = A[p2++];
for(int i = 0; i < r-l+1; i++){
A[l+i] = temp[i];
}
}
};
``````

The function `MergeSort(nums, temp, start, end)` will sort an array that starts at index start and whose last element is at index end.
So the subsequent subproblem will sort [start, middle], and [middle + 1, end].

Merge will mergetwo sorted subarrays A[l, m] and A[m + 1, r].

NOTICE, we uses index, not size in the `while` in `merge` that’s why we have p1 <= m.
If we use size, notice that an array that starts from index l and ends on index m will have size m – l + 1 and since p1 starts at l, it needs to go forward m steps to reach the last element.

Client-server model

In this model, an application consists of one server and multiple client. Server manages resources and provide services to client. 1. Client request services 2. Server manipulate resources to satisfy the request 3. Server sends a response to the client

Networks

A network is a hierarchal system of wires and boxes that’s organized by geography proximity. EG: LAN, WAN.

LAN

Lowest Level is LAN. It is consisted of a bunch of hosts connected by wires to a hub. Each host has a 48 bits address. (MAC address) EG:00:16:ea:e3:54:e6.

Host sends each other via frame. Broadcast, so everybody sees every bits.

Next level: internet(lowercase)

internet stands for inter-connected network that connects incompatible LANs together.

internets is usuallyy ad hoc so there is no set topology and we send data via hops across routers.

protocols

Protocols are a set of rules that govern how hosts and routers should communicate. It smoothes out differences between network

Internet Protocols(IP) does: 1. Provide naming scheme such that each host and router can be identified by a unique address. The uniform format is called host address.

1. Provide deliver mechanisms: Provide the standard unit of transmission called packet that’s consisted of header and payload. Header consists of the packet size, source host address and so on.

The Internet

The Internet can be thought of as a world collection of hosts with the following properities:

1. Hosts are mapped to 32 bits address.
2. IP address is mapped to a set of identifier called domain name.
3. A process on the Internet can communicate with another process via connection.

IP addresses are stored in IP address struct. The struct contains 1 unsigned int

`````` /* Internet address structure */
uint32_t  s_addr; /* network byte order (big-endian) */
};
``````

EG:0x8002C2F2

We can convert the hex to human readable dotted decimal form:

0x8002C2F2 = 128.2.194,242

Domain name

The Internet maintains a mapping between IP addresses to domain names called DNS domain name server.

The mapping is not one-to-one. A domain name can map to 1 address, multiple addresses or no address.

The node of the tree represent domain names formed by the path to the root.

Internet connection

Client and server communicates by sending streams of byte over a connection. Each connection is full duplex and point-to-point

A socket is an endpoint of a connection.

A port is a 16 bits that identifies a process: 1. ephemeral port: ports automatically assigned by kernel when client makes a request. 2. Well-known port : Associated with some service provided by the server. EG port 80 for http

Each connection is defined by a socket pair

``````(client ip: port, server ip: port)

# Socket

For Linux Kernel, socket is an endpoint of communication. For program, it is just a file with a file descriptor.

A socket address is a struct that contain info about the ip address and port. We have to use a generic socket struct then cast it to different socket address struct since when socket was designed, we can't use generic pointer `void*`.

<img src="http://kedizheng.com/wp-content/uploads/2019/08/Screen-Shot-2019-08-19-at-4.20.59-PM.png" alt="" width="1424" height="490" class="alignnone size-full wp-image-957" />

<img src="http://kedizheng.com/wp-content/uploads/2019/08/Screen-Shot-2019-08-19-at-4.21.51-PM.png" alt="" width="1416" height="616" class="alignnone size-full wp-image-959" />

Above is an IPV4 socket address. For functions to use socket address correctly, we need to first cast `struct sockaddr_in*` to `struct sockaddr *`

## Convert from string representation of sockaddr to struct
we use `getaddrinfo` to do so.

```c
const char *service,         /* Port or service name */
const struct addrinfo *hints,/* Input parameters */
``````

Given host and port number, returns result that points to a linked list of addrinfo structs. In each addrinfo contains a pointer to sockaddr that can be used to `bind`, `accept` and `listen`.

In `hints`, we specify more control about the connection. Such as IP type and socket types.

The reason `getddrinfo` returns linked list of addresses structure is because the default behavior is to return 3 structs for 3 different types of connection. 1 for connection, 1 for raw socket and 1 for data gram.

Also a host might have multiple IP address.

It is a linked list because an address for a host might have multiple sockets.

EG: Twitter might have multiple sockets listening?

We can get the host info from struct with

``````int getnameinfo(const SA *sa, socklen_t salen, /* In: socket addr */
char *host, size_t hostlen,/* Out: host */
char *serv, size_t servlen,/* Out: service */
int flags); /* optional flags */

``````

Socket interface

Creating socket

``````int socket(int domain, int type, int protocol);
``````

Both client and server uses this function to create socket. EG:

``````int sockfd = socket(AF_INET, SOCK_STREAM, 0);
``````

AF_INET means we use IPV4 and SOCK_STREAM means this is an endpoint of a connection.

Better to use `getaddrinfo()` to get info for socket since it will be protocol independent.

`````` struct addrinfo *p = ...;
int clientfd = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
``````

Connect (by client)

A client establishes connection via

``````int connect(int clientfd, SA *addr, socklen_t addrlen);
``````

It will block until a connection is successfully established. If successful, we can now read/write via clientfd. The connection is characterized by a pair `(x:y,addr.sin_addr:addr.sin_port)`.

Bind

``````int bind(int sockfd, SA *addr, socklen_t addrlen);
``````

Where `typdef struct sockaddr SA`

It will ask the kernel to associate the socket address struct with the socket file descriptor.

After bind, process can read and write from `sockfd`.

Listen

By default, program will assume a socket opened by `socket(...)` is an active socket from the client end. Use listen will turn the socket from active socket * to *listening socket.

`````` int listen(int sockfd, int backlog);
``````

`backlog` is an hint for the server about how many requests can be queued before rejecting further requests.

Accept

``````int accept(int listenfd, SA *addr, int *addrlen);
``````

`accept(...)` will wait for connection request bounded to socket at listenfd, then fill in the client address in addr and the size of socket address length at addrlen.

If successful, it will return a connected descriptor to communicate with the client via UNIX IO.

Connected fd is different from listening socket. A listen socket is created once and will listen for Connection requests. A connected socket is created for the connection and exists only as long as it takes for the server to service the client.

We have such distinct so we can create concurrent server to connect with many client. EG: when a new request comes, we fork a child to handle.

UNIX I/O

A linux file is a series of m bytes /[b_0, b_1…b_{m – 1} ]

All I/O devices are such as terminal, Network, printers are modeled as file.

Openning a file

An application opens a file by request a file from Kernel. The Kernel returns a small non-negative integer called File descriptor. The kernel keeps info about the file, and the app only keeps track of the descriptor.

Change file position

Kernel keeps a current position k for each file and it starts at 0. App can explicitly change it via sleek.

Read a file copies n bytes from the file to memory starting at k. It will then increment the file position k by n.

If the file size is m and m < n then Kernel will trigger a condition called EOF. There is no explicit EOF character associated with the file though…

Writing is similar. It will write n bytes to the file starting at k and increment the current position k by n.

File type

Regular file

contains arbitrary data

Directory file

A file contains multiple links to other file, mapping file names to files, including directory files. Each directory files contain 2 files at least. The first is “.”, which points to itself. The other is “..”, which points to the parent.

Socket

File used to communicate with other process across networks.

Open and close a file

Openning a file

Open will takes a filename and flags and user mode and return the smallest int that’s not used in by the current processor as the file descriptor. For example, the first file a process open will has fd == 3. (0 is stdin, 1 is stdout and 2 is stderr).

``````int open(char *filename, int flags, mode_t mode);
``````

flags indicate how to access the file. The flag can also be ored to create bit mask that gives additional info. We can do

``````open('temp.txt, O_RDONLY | O_TRUNC, 0);
``````

Close file

``````int close(int fd);
``````

Will close the file associated with the given fd. Closing a closed file is error.

Always check return codes, even for seemingly benign functions such as close()

``````ssize_t read(int fd, void* buf, size_t n);
``````

will read at most n byte from fd starting at the current position and store it into buf. A return value of -1 indicates Error. 0 indicates EOF. Else the return value indicates the bytes read.

``````ssize_t write(int fd, const void* buf, size_t n);
``````

copy at most n bytes from buf to fd at the current position.

Short Count

For both read and write, the return value might be smaller than n. This is called Short count. It can happen due to three things: 1. Encounter EOF 2. Reading from text line in terminal. If we are reading from a terminal, no matter how big the n is, `read` will just read a text line and return the length of the text. 3. Reading from socket.

Standard I/O

C standard library libc.h has high level file functions such as fopen, fclose, and so on. Open and close file: `fopen`, `fclose` Read and write bytes; `fread`, `fwrite` Read and write text line `fget`, `fput` Format read write `fprintf`, `fscanf`

standard IO models file as stream, abstraction for file descriptor and a buffer.

Applications often read/write one byte at a time and doing `read`, `write` is expensive because it requires kernel call. So we use buffered IO that put contents into a buffer first.

EG: stdout uses buffer. It will flush when we call `fflush(stdout)` or when the function exits.

Most of the time, we would prefer standard IO but it has problem with: 1. Can not access meta data 2. Not async-data safe (not safe for signal handling) 3. Not suitable for network socket 4. Restrictions

Standard IO uses full duplex so we can use the stream both to read and write but the buffer causes problem. More on standard IO restriction: 1. Can’t use input right after output because of buffer. If we want to use it, we must flush it with `fflush` 2. An output can’t follow input function without calling `fseek` or `rewind`. It causes problem for network since we can’t `fseek` on a socket.

We can open two streams one for read and one for write but we need to close them

``````FILE* fpin, *fpout;
fpin = fopen(socked, "r");
fpout = fopen(socked, "w");
``````

We need to close both streams to release the resource. But since both streams have the same underlying socket, the second close will cause problem.

When to use Unix IO

1. Inside signal handler
2. When performance matter a lot

What not to use for binary file

Text-oriented IO

such as `fget`

string function

Such as `strlen` because ‘\0’ will be interpreted as null terminator

How kernel represent open files

To represent files in Unix kernel, we need three tables:

Descriptor table: Each process keeps a descriptor table where each entry is an open file. Each entry points to an entry in the file table.

File table: File table keeps track of the current position offset of each file, and a reference count of the file and a pointer to entry to v-node table. Closing a file will decrement the reference count of the associated file table. File table is shared across all processes

v-node table: Keeps `stat` info about that table.

Two file descriptor represents two different files.

Two file descriptors that point to the same file. EG, do two `open` on the same file name.

Forking will let children inherit files opened by parents
Before forking:

After forking:

We also increase the reference count by 1 for each file.

IO redirection

Linux IO allows user to redirect standard output to file on disk

``````\$ ls > foo.txt
``````

will redirect the output of ls to foo.txt.

It achieves this by using `dup2(int oldfd, int newfd);`
For each process, `dup2` will copy oldfd’s table entry to that of the new file descriptor.

EG: `dup2(4,1)` will copy the pointer pointed to by file descriptor 4 to the entry of file descriptor 1 in the file descriptor table

if newfd is open, then we will close it first(and decrement the refcount).

Function that keeps state across multiple invocation.

Such as rand. solution: stop using state via static variable. Only solution is to change the function such that it doesn’t use static and provide that value via variable.

function that return pointer to static

Not the same as thread safe. A function can be thread-safe but not reentrant.

Race

Race condition occurs when the correctness of the program depends on the order of execution.

A pair (or multiple) threads that are blocked because they need resources from other threads.
For mutex, we can prevent it by locking it in one order and unlocking it in reverse order.

Semaphores

Semaphore is a concept invented by Dutch Compute Scientist Dijkstra.
It is a global counter that can only be changed by 2 atomic operations: P(s) and V(s).
P will decrement s if s > 0 and return immediately but would suspend the thread if s = 0.
Once s > 0 due to V operation, the thread is restarted and decrement s and return control.
P stands for proberen( to test) in Dutch.

V will increment s by one. If there are threads waiting for s to become non-zero, V will restart exactly one thread that will complete and decrement s.

V stands for verhogen(to increment) in Dutch.

Binary semephore(where the max value of s is 1) is called lock. P will lock and V will unlock.

Counting Semaphore

Semaphore can be used as a counter and tells us about resources needed by threads.
A classic example is Producer-consumer problem. There is a shared buffer among producer and consumer threads and producers can produce when there are empty slots in the buffer. Consumer can consume if there are goods made by producer.

We will use 3 semaphores. A binary semaphore to lock the buffer. A counting Semaphore to count empty space(resource for producer) and a counting semaphore for how many good(resource for consumer ).

When a producer try to produce, it will do

``````P(empty_space_semaphore)   //check for empty space
P(lock)                    //lock
buf[loc].write()           //produce
V(lock)
V(goods_semaphore)         //Announce goods have been made
``````

When consume, it will be basically the same, but with

``````P(goods_semaphore)        //consume
...
``````

Why I picked up CPP Concurrency in Acation

The BuzzDB development has led me to the world of multi-thread development. I only had posix thread experience from system classes and reading the doc has become inadequate for the development need.

I have searched only for good book that systematically introduces c++11’s std::thread and gladly I found one: CPP Concurrency in Action.

The author of this book is Anthony Williams. He is one of the main contributor of Boost ‘s thread library and subsequently become a cpp committee member and was part of the group that’s behind std::thread. I have skimmed through the first chapter and found his language accessible. It should be a fun ride!

Our school library provides free access for Ebooks and I picked up a “copy” from there.

In Gtest

If main is not found, you might missed something in cmake, such as not didn’t add executable.

In polymorphic classes

not define/override some virtual function

https://stackoverflow.com/questions/12573816/what-is-an-undefined-reference-unresolved-external-symbol-error-and-how-do-i-fix

Multiple inherentence (in the case of Operator and Project in BuzzDB)

Actually the problem caused when I change the base class signature and not recompile from scratch. Solved it by make clean, cmake .., and make.