Chapter 2(representing and manipulating information) notes

Info storage


We want to see bit pattern. Binary is too cumbersome and decimal needs to be converted so we have Hax. The val of a single byte can range from 00_{16} - FF_{16}

Convert from decimal to hex: (1)Divide the decimal number by 16 repeatedly: x = q \cdot 16 + r (2)Then r will be the least significant digit of the hex. (3) repeat this process on q

Convert from hex to decimal: Multiply each digit of the hex by the appropriate power of 16. 0x 7AF = 7 * 16^2 + 10 * 16^1 + 15*16^0


Word size: the normal size of int or pointer. Very important because it dictates the size of the virtual address space. Since pointer is as long as word length, so the address range from 0 - 2^w = 1.

Pointers need to use the full size of the word so for a 64-bit machine the pointer will be 8 bytes long. But for 32-bit machine it will only need 4 bytes. Potentials for bugs since some 32-bit programs would store pointer in int type but that won’t work for 64bits.

Address for multi bytes objects

Multi-bytes object is stored on contiguous sequences of bytes and the address is the smallest address of the bytes used.

Byte ordering(Important, came up in Network)

0x87654321 Little Endian: The beginning(smallest) address will store the least significant bit(on the right most of the number) 12, 34,56,78

low \rightarrow high

Big endian: start with storing most significant to least(Left to right, natural for us)

It matters in

(a)Network: If little communicate with big through a string of bytes, the bytes will be received in reverse order.

(b)Reading bytes generated by little endian: Need to reverse it. Least on the left and most sig on the right.


A string in C is an array of chars terminated by the null. Each char is represented by some encoding (ASCII).

Text data is more independent than binary data.

Boolean algebra

Multiplication distributes over addition, so does AND distributes over OR. We can’t distribute addition over multiplication, But we can distribute or over AND. a | (b & c) = (a | b) & (a | c)

Each element is its own inverse in bool algebra. a ^ a = 0 where 0 is an all-zero vector. This holds even we rearrange and combine them in different orders. (a ^ b) ^ a = b. An very interesting application is in place swap

void inplace_swap(int *x, int *y){
    *y = *x ^ *y;
    *x = *x ^ *y;
    *y = *x ^ *y;

If the x stores a and y stores b, then

step & *x & *y

initial & a & b

1 & a & a ^ b

2 & a ^ a ^ b = b & a ^ b

3 & b & a ^ b ^ b = b

(The Latex tabular is broken lol)

Bit masking

using bit patter to indicate a selected set of bits within a word.

x & 0xFF will get the least significant byte of x.

~0 will get a mask of all ones, regardless of word size(portable)

Logic operatior && || and ~

OR, AND, NOT in logic and any non-zero arg is TRUE and the operation will return True or False. It has a short circuit, if the result of the expression can be determined by evaluating the first arg, then the operation won’t eval the second args. So a && 5 / a won’t ever cause a division by 0 and p && *p++ won’t dereferencing a null pointer

A way to do x == y is to

!( x ^ y)

x ^ y will give us all zero if all the bits match. Then the NOT will return 1 only if all bits are zeros.

shift operation in C.

x = [x_{n -1},x_{n -2},...,x_{0}] , we apply x << k, shifting k bits to the left and drop off the k most significant bits, filling the right with k zeros. We got [x_{n k -1},x_{n k -2},...,x_{0}, 0 ... k zeros ...]. Shift associate from left to right. x<<j<<y = (x<<j)<<y

Right shift. Two kinds. 1:arithmatic, 2: logic. The logic one is similar to left shift, shifting k elements to the right and leave k zeros on the left. Arithmetic will shift to the right by k, but instead of filling k zeros on the left, it will have k x_{n - 1}, the most significant bit.

Integer representation

Two’s complement

B2T_w (x) = -x_{w + 1}2^{w - 1}+ \sum_{i = 0}^{w - 2}x_i 2^i We can see the most significant bit(sign bit) is a negative weight. When the sign bit is 1, the whole number has to be negative.

The range of two’s complement is asymmetric; |Tmin| = |Tmax| + 1 because half of the bit patterns are for negative and half are for non-negative, which includes zero so the |Tmax| will be 1 smaller than Tmin.

Other representations:

One’s complement: Similar to two’s complement, except the negative weight of the most sig dig is 2^{w - 1} - 1.

Sign magnitude. The sig dig determine the sign.

Both method has two ways to representation zero. NOT cool. Note the different position of apostrophes: Two’s complement versus Ones’ complement. The term “two’s complement” arises from the fact that for nonnegative x we compute a w-bit representation of −x as 2w − x (a single two). The term “ones’ complement” comes from the property that we can compute −x in this notation as [111 . . . 1]− x (multiple ones).

Conversion between signed and unsigned.

When convert between sign and unsigned whose word sizes are the same, the numerical value might change but the bit values stay identical. (

short int v = -12345;
unsigned short uv = (unsigned short) v;

v = -12345,where -12345 = - 2^16 + 53191 uv = 53191 . Because now we interpret the two’s complement as the unsigned. The numerical value difference between unsigned and two’s complement with the same bit pattern is 0 if x >=0 is x = 2^w if x_{w - 1} = 1(negative)

When an operation is performed on an int and an unsigned int, the int will be dragged to to unsigned, assuming the number is nonnegative.

Problem for relational operators < and >. eg: -1 < 0U. -1 will be thought as nonnegative so -1 is actually 4294….. and 4293…. < 0 is false.

Expanding bit representation

Type promotion, expand space. For unsign we use zero expansion, adding zeros to the left. For two’s complement, we use sign extension, adding copies of the most significant bit to the left. [x_{w - 1}, x_{w - 2}....x_1] ] [x_{w - 1},x_{w - 1},x_{w - 1},x_{w - 1},x_{w - 2}, ...x_1] Case in point: 101 = -3, 1101 = -8 + 4 + 1 = -3. The newly added 4th bit will cancel out half with the previous 3rd bit, therefore we will subtract the same number. (To prove formally, use induction)

If we are doing

short sx = -12345;
unsigned uy = sx;

uy = 4294954951

Which means we first convert short to (unsigned)(int) sx.


w-bit number to k-bit number w > k, we will drop the leading w – k bits. For unsigned number x, the numerical value is equivalent to x mod 2^k. Because 2^k mod 2^k = 0 , i > k. (4 mod 2 = 0, 8 mod 2 = 0).

For two’s complement, it is very similar. We first convert it to unsign, then unsign to two’s complement

Integer Arithmetic

Word size inflation: To to addition for two k bits number might require k + 1 bts. can not place any bound on the word size required to fully represent the result. If x + y < 2^k the leading bit of k + 1 bit is 0, so the numerical value of this int addition is the same as the exact math addition.

If 2^{k } < x + y < 2^{k + 1} , then overflows happened, and we discard the leading k + 1 bit,(truncate) and this is equivalant to x + y - 2^k.

To check if overflow happens or not for s = a + b, we can check s >= b. if this is true, then overflow didn’t happened. Because a + b \geq a so s \geq a. On the other hand if overflow happens, s = a + b - 2^k and a < 2^k so a - 2^k < 0 ,s = (a -2^k) + b < b. So if overflows happened, s < b, s < a

Floating point


For IEEE floating point, we have V = (-1)^s * M * 2^E. Where s stands a single sign bit. Significand Mis a fractional binary number that ranges either between 1 and 2 – \episilon or between 0 and 1 – \episilon. The exponent E weights the value by a ( possibly negative) power of 2.

For single precision, s, exp, and frac are 1, k = 8, n = 23.

For double precision, s, exp, and frac are 1, k = 11, n = 52.

Three cases of floating point

Normalized value.

When exp is neither all 0 nor all 1(255). It is interpreted as a sign integer in biased form. E = e - Bias where e is the unsigned number with bit representation e_{k-1}....e_{1}e_{0} and Bias is a bias value equal to 2^{k - 1} - 1.(127 for single precision). So the exponent ranges from -126 to 127.

Implied leading bit: the leading bit is always one so we can ignore it and use the leftover bit for one bit more precision. To see that,we use an example:

0.0123 = 1.23 * 10^{-2}. Same with binary, we can always shift the exponent so the leading bit is one.

Denormalized values

When the exponent field is all ZEROS. Exponent value is E = 1 – Bias and the significand value is M = f. The value of the fraction field is without an implied leading 1.

It can reps 0 since a normalized value’s M is always M > 1.

Floating points rounding.

Find the closest matching number in floating point number format. The design decision for rounding is when the result is between two possible result. For example, If we want to round to the closest integer and our number is 2.5, we have 2 and 3 to choose from.

The default is round-to-even, which rounds the number either up or down so the least significant digit is 0(even). To do so will avoid statistical bias.
Only bit pattern XXX.YYYY100…, where X and Y are arbitrary numbers, will need round-to-even because it is half way between two results.

Floating ops.

Floating additions are communicative but not associative.
Floating point multiplication is communicative but not associative due to possible overflow or losee of precision.

Random findings

(a) Floating point arithmetic is not associative! A case in point is 3.14 + (1e20 – 1e20) = 3.14 but for most computer (3.14 + 1e20) - 1e20 = 0

(b) The C++ use the same exact numerical representations and ops as C. But Java uses a new set of standards! Java is more specific on the formats.


zhenk14 > gcc -std=c99 prog.c

is using ISO C99 standard, which took over ANSI (c89)

(d) using a cast to allow an object to be referenced according to a different data type from which it is created. Discouraged but used in system programming.

(e) long in 32-bit machine is as long as an int?

IA32 floating point arithmetic use a special 80 bit extended precision floating point format. All single and double precision floating point number are converted to 80 bits to perform arith ops. After the operations, the results are converted back to store in the memory.

About C

We can dereference with the array notation

byte_pointer start;
start[3] //This will access the 4th elem stored in the array pointed by the byte_pointer.

In general, start[i] will pointed i position beyond the loc pointed by start.


Typedef can name type and is very similar to var declaration.

pointer creation and deref

The C “address of” operator & creates a pointer.

int *byte_pointer = &x //x is an int

Will create a pointer that points to the location that holds x. The type of the pointer depends on the type of x.


(byte_pointer ) &x //x is an int

will cast the pointer to x, which is an int, to type byte_pointer and the program will reference it that way. Nothing change in the byte level, just the compiler to refer the data according to byte_pointer type.

Strlen() doesn’t count the null terminator

shift operations

If we are trying to shift k bits when the data is w and w < K then in C it is not specified but generally it should perform k mod w. JUST DONT DO IT Also shift has lower precedence than addition.

Floating point ops

C doesn’t require IEEE. But it is is math.h.

Leave a Reply

Your email address will not be published. Required fields are marked *

To create code blocks or other preformatted text, indent by four spaces:

    This will be displayed in a monospaced font. The first four 
    spaces will be stripped off, but all other whitespace
    will be preserved.
    Markdown is turned off in code blocks:
     [This is not a link](

To create not a block, but an inline code span, use backticks:

Here is some inline `code`.

For more help see