chapter3 Machine-level representation of programs

History prospective


The first chips to have 32 bits architecture in Intel. It has flat address mode, which Linux still uses today. When we compile with GCC, it will assume it is compiling for i386.

Pentium 4E

Added hyperthreading, a method to run two programs in one processor.


The first multi-core processor. Didn’t support multi-threading


Up to 4 cores.

x86 comes from I386. Makes you wonder how “severe” the back-compatability issues are if they can’t even shake off the old name lol

Program encoding

gcc -01 -o p p1.c p2.c

gcc is the default c compiler.

-01 is the level of optimization

-o will specify the name of the compiled program, in this case p.

First the C preprocessor will expand the source code by including the in #include and macros defined by #define.

Then the compiler will generate assembly code of the expanded c code p1.s and p2.s. Next.

Data Format

ATT format

Also known as the GAS format. Instructions with multiple operands will be in reverse order.


Condition codes

CPU has a set of single bit condition registers. The famous ones are


Carry flag. Will be one if the most recent ops generated a carry out.


zero flag. Recent ops generated 0


Signed flag. If most recent ops has negative result.


If most recent ops generates overflow in two’s complement.

Fun commands


Almost all ops (except leal, since it only copies address) will change condition registers.

We will concern ourselves with CMP(compare) and TEST. These two are ops that will set conditions code without altering other registers.

CMP behaves the same way as SUB. TEST behaves the same way as AND.


SET is a class of ops that will set a bit to 0 or 1, usually one of the flag register or the lower byte of general register, based on combinations of the condition codes.

For example, setl(setwhen signed <) will effectively do

D <- SF ^ OF

Based on the calculation t = a – b.

When a < b and a - b < 0, SF = 1 and OF = 0.

When a < b but a – b > 0, an overflow has occured and OF = 1, SF = 0.

So SF ^ OF = 1 if a < b and D will be set by setl.

For a comprehensive list of set ops, please check out the book.


Test is the almost the same as AND but it doesn’t change the destination register. It will change ZF when the result is 0, and sign flag when the result’s sign bit is set . We usually do

<br />TEST %eax %eax

Then check the sign flag.

Control structure

The code that C translate to Assembly will use a lot of GOTO style code.

WE use different jump statements.


JMP will just jump to a labeled location. It jumps unconditionally.

1: Direct jump. The jump target is encoded as part of the instruction.

2: Indirect jump. Jump target is read from a register or a memory location.

Indirect jump use ‘*’, followed by an operand specifier.

<br />jmp *%eax

use the value in reigister %eax as the jump target.

<br />jmp *(%eax)

will read the jump target from memory.

Conditional jump

They can only be direct. The way to jump to target are directed by PC relative. It directs the difference between the address of the target instruction and the address of the instruction immediately following the jump.  This PC relative applies when we disassembled the assembly code.

(I would love to post some assembly code here but there is no code tag for it)

If we want to jump to byte 0x17 through a jump, and our next byte is 0xa,

then the jump value is 0x17 – 0xa = 0xd.

THE PROGRAM COUNTER WOULD HAVE THE ADDRESS OF THE INSTRUCTION FOLLOWING THE JUMP WHEN PERFORMING PC RELATIVE ADDRESSING. Because in the old day, updating the PC with the next time is the first step in executing an instruction.


Do while, while all use conditional jump.

Most of compiler generate loop based on do-while form of a loop.




while (test-expr);

We translate it using goto, which resembles the assembly code:



if (test-expr)

goto loop;

while loop



It might terminate the loop before the first execution. To translate it to do-while we need to add a conditional jump to skip the first execution of the body.


goto done;




goto loop;


For loop

for (init-expr; test-expr; update-expr)


First translate it to while

i = init-expr;




Then translate it to go-to…

i = init-expr;

if (!text-expr) goto done;







Continue will jump directly to the end of the current loop. Need to jump to the update-section.


Conditional move

It will test a condition and move the data according to that condition. Because it is really hard to predict where if statement will jump to, so pipelineing’s ability to do calculation beforehand is limited. A misprediction of where the next code will be will let CPU discard much of the work done before.

int func(int x, int y){
if (x - y > 0)
    return x - y;
    return x + y;

If the machine predicts the sign of x – y wrong, (eg, predicting it to be positive but it is negative), then it has to discard the result of x – y and calculate x + y and return it. But for conditional moves, it will check conditions and then based on the result, move the appropriate data into appropriate register. No jumps so no works will be discarded. But we have to calculate both x – y and x + y.

int func_cond_move(int x , int y){
int tval = x - y;
int rval = x + y;
int test = x > y;
if(test) rval = tval; //cond move
return rval;

But if one of the conditional expression might lead to a error or side effect, we CAN’T use conditional moves.

int cread(int *xp){
return( xp? *xp: 0);

This code will return *xp if xp is anything but zero. So even given an invalid pointer that is not 0, we will deference that address since in conditional move, we evaluate all statements


Switch has a multi-way branching ability and the time it takes to perform the switch is INDEPENDENT OF THE CASES.

In the assembly code of the switch code, we will first “bring down” the cases number to start at zero. For example, if you have cases ranging from 100 to 106, then GCC will bring it back to 0-6. Then GCC use a switch table, an array that stores code for different cases.

jmp *.L7(, %eax, 4)

We use an indirect jump to jump to the location pointed by L7 + 4*%eax, where %eax stores the case number. L7 is the start of the switch table. We multiple by 4 because the word size is 4 bytes.

In the C code, the switch table is

<br />static void *jt[7]= {

&&loc_A, &&loc_def, &&loc_b...


And it stores address of codes through the && that creates a pointer for code location.


A procedure call involves both data and control transfer. And it must allocates for local vars and deallocated them when the procedure ends. This is implemented through the program stack.

Stack frame

The protion of the stack allocated for a single procedure is called stack frame. The stack grow from higher address to lower address. The stack top has the lowest address and is pointed to by %rsp, the stack pointer.The stack bottom is pointed to by %rbp, the frame poitner.( you can think of the b as “base”) Since the top can be always moving, we always access information relative to the frame pointer %rbp.

The stack grows down so the top of stack will always have lower or equal address than the bottom. In other word, RSP <= RBP. And to allocate space on the stack for the current procedure, we will subtract RSP. To deallocate, we add to RSP.

When P( the caller) calls Q( the calee) the argument to Q are contained in P. On top of that, the return address within P where to program will resume execution, is pushed onto the stack, forming the end of P. (Before Q pushes anything onto the stack) The stack frame of Q will start with the saved value of the frame pointer and copies of any other saved register values.

Q would also uses the stack for any local vars that can’t be stored in registered. 1. Not enough register 2. Local vars are arrays so must be accessed by array references 3. &, the address operator is applied and we must be able to generate an address for it.

Stack frame is also used for storing arguments to any procedure it calls. We can allocate space for stack by decrementing the stack pointer( the top of the stack) then deallocate it later by incrementing it.

Leaf procedure

x86-64 will only allocate stack space if needed so if a procedure has fewer than 6 arguments(since x86 will store the first 6 args on the registers) and doesn’t call other function(calling other functions require the caller push the return address onto the stack), that procedure won’t need stack frame.

Control transfer

When procedure P calls Q, x86-64 calls instruction Call Q and it will push an address A onto the stack and sets the PC to the beginning of Q. The address A is called return address * and is computed as the instruction right next to the *call instruction.

The counterpart is called ret and will pop A off the stack and sets the PC to A.

When P calls Q, at least 8 bytes of storage has to be allocated by subtracting 8 from RSP.

Data Transfer

Usually data transfer is implemented through registers. Args are passed through RDI, RSI and so on. Return value is passed back through RAX. x86-64 use up to six register to pass integral(pointer and integer) arguments in per-defined orders. For 64 bits, the order is : rdi, rsi, rdx, rcx, r8 and r9. If the function has more then 6 arguments, x86 will push the rest of the arguments onto the stack. The 7th argument will be on the top of the stack(last to push on) because we will pop it first.

If we call a function with 8 arguments, we will see the 7th argument at RSP + 16 and the 6th at RSP + 8. (RSP is the beginning of the return address).

Local storage on the stack.

Sometimes we might need to store things onto the stack. Three situations are likely to cause this: 1. Not enough register. 2. The address operator is applied to a local variable and we need to generate an address for that variable.(Registers don’t have address). 3.We encountered an array or structure.

For example, if we want to swap two values through pointers, The calling code of this function has to allocate a stack frame because of the presence of the address operator.

<br />TEST %eax %eax
void swap(int *xp, int*up){ 
    long x = *xp; 
    long y = *yp; 
    *xp = y; 
    *yp = x; 

Local storage in the registers

Program registers are resources shared by all procedures and only ONE procedure can be active at a given time. So we must prevent overwrite when one function calls another function. There are two kinds of register in terms of storage. The first one is callee-saved. As the name indicates, the called function will save or won’t touch the value stored in callee-saved registers. The second kind of register is Caller-saved register.

By convention %rbx % rbp and %r12-%r15 are callee-saved registers. If the called function will alter the register, the function will push the original values onto the stack and before it returns, pop them off to the respective registers. They will be stored in a portion of the stack frame called “Saved register”. The data in callee-saved registers won’t be currupted by the callee

All other registers (except %rsp) are Caller-saved register. It is on the caller to save the values before calling callee.

We often use callee-saved register to hold results from previous calculation. After we push the callee-saved registers value onto the stack and finish the work in the function, we will need to pop them off into the register in reverse order to follow the stack convention.

Recursive calls

The stack convention allows each procedure to have their own private space and local variables in multiple outstanding procedures won’t interfere with each other. With this, we can do recursive calls. The book uses Factorial as an example. You should try to write the assembly for Fib to try out.


Array in C aggregrigate scalar data into larger data type. But C generages pointer to element within the array and can perform pointer arithmatics. If we declear an array that stores n data of size L, the total space allocated will be n * L. If the starting address of the array is X_a, the address of the ith element will be X_a + i*L.

Random Finding


LEA doesn’t change any flag register so it is used a lot by compiler to do simple arithmatics to help the schedulor.

Let’s assume a at %ecx, b at % edx.

LEAL (%ecx, %edx), %ebx

will store a + b in %ebx

We can also use LEA to do left shift

LEAL (%edx, %edx), %edx

Will shift( %eax) to the left by 1 bit and store at %eax.

Test non negative

We will use

TEST %eax, %eax

To see if eax is non-negative since TEST will flip the zero and sign bit based on the result of %ead & % eax.

Calculating Parity

We can use XOR to calculate parity.

<br />int parity(unsigned x){

int val = 0;

while (x != 0){

val ^= x;

x >> 1;


return 0x1 & val;


This code will calculate the parity of x. It will shift x’s bit pattern to the right one by one and XOR it against the previous x bit pattern, effectively go through all digits to check parity at the least significant bit.

The least significant bit in value will start with 0 and flip to 1 through XOR if x has 1 in the least significant bit. The XOR result will be the next val.

The next val will be check against the right-shifted x. If the second to last bit of x is 1, which will become the least significant bit of the right-shifted x, will be XOR against val. If x = 11, we will have 11 XOR 01 = 10. A 0 at the least significant bit means the parity is even.

Flipping bits

It is just amazing how the book does it!!!

Leave a Reply

Your email address will not be published. Required fields are marked *

To create code blocks or other preformatted text, indent by four spaces:

    This will be displayed in a monospaced font. The first four 
    spaces will be stripped off, but all other whitespace
    will be preserved.
    Markdown is turned off in code blocks:
     [This is not a link](

To create not a block, but an inline code span, use backticks:

Here is some inline `code`.

For more help see