Logging and recovery

Big picture

Why we want logs

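The goal is atomicity: either all or none of the modifications made by a transaction are reflected in the database.
To achieve this, we first output information describing the modifications to stable storage, without modifying the database itself.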

Why we need logs

Logs let us guarantee two things:

  • all modifications made by a committed transaction are reflected in the database, possibly in the course of recovery after a crash
  • no change made by an aborted transaction remains in the database


A log is a sequence of log records that describe the updates made to the database.

Update log record

It usually has these 4 fields:

  1. Transaction id
  2. Data item id, usually the location of the data item on disk: a block identifier plus an offset within the block
  3. Old value
  4. New value

Old value allows us to undo

If a transaction aborts, we can undo its change by replacing the data item with the old value.

New value allows us to redo

If a transaction committed but its changes didn't make it to disk, we can redo the transaction by replacing the data item with the new value.
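A minimal Python sketch of an update log record and the two ways recovery uses it. The class name `UpdateRecord`, its field names, and modelling the database as a dict keyed by `(block_id, offset)` are assumptions for illustration, not any real DBMS API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UpdateRecord:
    """An update log record (field names are hypothetical)."""
    tx_id: str      # transaction id, e.g. "T1"
    block_id: int   # disk block holding the data item
    offset: int     # offset of the data item within the block
    old_value: int  # value before the write, used by undo
    new_value: int  # value after the write, used by redo


# Toy "database": data item location -> value.
Database = dict[tuple[int, int], int]


def undo(db: Database, rec: UpdateRecord) -> None:
    """Undo: put the old value back."""
    db[(rec.block_id, rec.offset)] = rec.old_value


def redo(db: Database, rec: UpdateRecord) -> None:
    """Redo: write the new value; applying it twice gives the same result."""
    db[(rec.block_id, rec.offset)] = rec.new_value


rec = UpdateRecord("T1", block_id=7, offset=16, old_value=100, new_value=50)
db: Database = {(7, 16): 100}
redo(db, rec)  # db[(7, 16)] is now 50
```

Because the record stores whole values rather than deltas, redo can safely be repeated if recovery itself is interrupted.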

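Start log record

<Ti, start>: transaction Ti has started.

Commit log record

<Ti, commit>: transaction Ti has committed.

Abort log record

<Ti, abort>: transaction Ti has aborted.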

How a transaction happens


  1. The transaction performs some computation in its own private part of main memory
  2. The transaction modifies the data block in the disk buffer in main memory
  3. The database system executes a page write to write the data block back to disk
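The three steps can be sketched with two dicts standing in for the disk and the in-memory disk buffer. The helper names `read`, `write` and `output`, and treating each data item as its own block, are simplifying assumptions.

```python
# Hypothetical model: each data item lives in its own block.
disk = {"A": 1000, "B": 2000}  # blocks on disk
buffer = {}                    # disk buffer in main memory


def read(item):
    """Return the item's value, first bringing its block into the buffer if needed."""
    if item not in buffer:
        buffer[item] = disk[item]
    return buffer[item]


def write(item, value):
    """Step 2: modify the block in the buffer only; the disk is untouched."""
    read(item)  # make sure the block is buffered
    buffer[item] = value


def output(item):
    """Step 3: page write, copying the buffered block back to disk."""
    disk[item] = buffer[item]


# Step 1: compute new balances for a transfer of 50 from A to B.
a = read("A") - 50
b = read("B") + 50
write("A", a)
write("B", b)
output("A")  # a crash right now leaves A updated on disk but not B
```

The gap between step 2 and step 3 is exactly where a crash can leave the disk inconsistent, which is what the log has to repair.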

When to modify

Deferred modification

A transaction doesn't modify the database until it has completed.

Immediate modification

The transaction modifies the database while it is still active.
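A rough sketch of the difference, using a dict as the database. `DeferredTx` and `ImmediateTx` are hypothetical classes; a real deferred scheme would also log its writes before applying them at commit.

```python
class DeferredTx:
    """Deferred modification: writes are held back until the transaction completes."""

    def __init__(self, db):
        self.db = db
        self.pending = {}

    def write(self, item, value):
        self.pending[item] = value    # database not touched yet

    def commit(self):
        self.db.update(self.pending)  # all writes applied at once


class ImmediateTx:
    """Immediate modification: writes reach the database while the transaction is active."""

    def __init__(self, tx_id, db, log):
        self.tx_id, self.db, self.log = tx_id, db, log

    def write(self, item, value):
        # Log old and new values first, so the change can later be undone or redone.
        self.log.append((self.tx_id, item, self.db[item], value))
        self.db[item] = value
```

Only the immediate scheme ever needs undo, because only it lets uncommitted values reach the database.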

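Crash scenarios

Committed, but not all changes made it to the disk

Transaction Ti committed, but some of its modifications had not reached the disk when the system crashed.

We need to set each data item Ti updated to the new value in its log record (redo).

Modified the database, but then aborted

Transaction Ti modified the database but then aborted.

We need to erase the changes Ti made to the database: set each data item it updated to the old value in its log record (undo).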
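Both scenarios can be replayed against a toy log. The tuple layout `(tx_id, item, old_value, new_value)` and the helper names `redo_tx` and `undo_tx` are assumptions. Redo walks the log forward, while undo walks it backward.

```python
# Hypothetical log entries: (tx_id, item, old_value, new_value)
log = [
    ("T1", "A", 1000, 950),
    ("T1", "B", 2000, 2050),  # T1 committed, but this write never reached disk
    ("T2", "C", 700, 600),    # T2 aborted after this write reached disk
]
disk = {"A": 950, "B": 2000, "C": 600}  # state found after the crash


def redo_tx(disk, log, tx):
    """Set every item tx wrote to the new value, in log order."""
    for t, item, _old, new in log:
        if t == tx:
            disk[item] = new


def undo_tx(disk, log, tx):
    """Set every item tx wrote back to the old value, newest write first."""
    for t, item, old, _new in reversed(log):
        if t == tx:
            disk[item] = old


redo_tx(disk, log, "T1")
undo_tx(disk, log, "T2")
# disk is now {"A": 950, "B": 2050, "C": 700}
```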

Order matters during recovery

The order in which redo applies updates is important; otherwise we might end up with the wrong value.
To preserve the order of updates, recovery usually scans the log from front to back and performs redo for each log record it encounters.
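A two-record example shows why. This sketch assumes both T1 and T2 committed and that each log entry is a `(tx_id, item, old_value, new_value)` tuple (a hypothetical layout).

```python
# T1 wrote A first, then T2 overwrote it; both committed.
log = [("T1", "A", 10, 20), ("T2", "A", 20, 30)]


def redo_in(order):
    """Redo every entry in the given order, starting from the original value of A."""
    db = {"A": 10}
    for _tx, item, _old, new in order:
        db[item] = new
    return db


correct = redo_in(log)               # log order: A ends at 30
wrong = redo_in(list(reversed(log)))  # out of order: A ends at 20
```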


A transaction Ti has committed once its <Ti, commit> log record has been output to stable storage.

This is what ensures transaction atomicity.

If the system crashes before <Ti, commit> reaches stable storage, we need to roll back Ti.
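A small sketch of the commit point, assuming log records are tuples such as `("commit", "T1")` and that `stable_log` holds only what has already reached stable storage (both illustrative assumptions).

```python
# Log records that have already reached stable storage.
stable_log = [
    ("start", "T1"), ("update", "T1", "A", 1000, 950), ("commit", "T1"),
    ("start", "T2"), ("update", "T2", "B", 2000, 2050),
]
# T2's commit record was still in the in-memory log buffer when the system crashed.
log_buffer = [("commit", "T2")]


def is_committed(tx, stable_log):
    """Ti counts as committed only once <Ti, commit> is on stable storage."""
    return ("commit", tx) in stable_log
```

After this crash, T1 is committed, while T2 is not and has to be rolled back, even though its commit record had already been created in memory.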

Another look at REDO and UNDO


UNDO is needed during recovery if transaction Ti has <Ti, start> in the log but neither <Ti, commit> nor <Ti, abort>. We need to roll back all of Ti's changes and then write <Ti, abort> to indicate that the undo has completed.
This makes sure every transaction in the log is complete, ending with either a commit or an abort record.
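A sketch of the rollback, following the common textbook convention of logging each restored old value as a redo-only record before writing the final abort record. The tuple layouts, including `("redo_only", tx, item, value)`, and the name `undo_transaction` are hypothetical.

```python
def undo_transaction(tx, log, db):
    """Roll back tx by scanning its log records backward."""
    for rec in reversed(list(log)):  # iterate over a copy; we append to log below
        if rec[0] == "update" and rec[1] == tx:
            _, _, item, old, _new = rec
            db[item] = old
            log.append(("redo_only", tx, item, old))  # log the restoring write
        elif rec[0] == "start" and rec[1] == tx:
            break  # reached the beginning of tx
    log.append(("abort", tx))  # the undo of tx has completed


log = [("start", "T1"), ("update", "T1", "A", 10, 20), ("update", "T1", "B", 5, 6)]
db = {"A": 20, "B": 6}
undo_transaction("T1", log, db)
```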


Recall that we need to REDO during recovery if transaction T1 has <T1, start> and either <T1, commit> or <T1, abort> in the log.

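Why do we need REDO when we see an abort?

Usually, when a transaction aborts, we don't want to do its work again. But we REDO it anyway, because its <T1, abort> record might come from an UNDO. The writes made while rolling it back are also in the log, so redoing everything in log order replays the rollback too, and the transaction ends up fully undone.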
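Putting the two rules together, a recovery pass might look like this sketch: a forward REDO pass that repeats every logged write (including the rollback writes of aborted transactions), then a backward UNDO pass for transactions that never finished. The tuple record layouts and the name `recover` are assumptions, and checkpoints are ignored here.

```python
def recover(log, db):
    """Hypothetical recovery over a list of tuple log records."""
    # REDO pass: replay every logged write in log order.
    undo_list = set()
    for rec in log:
        kind, tx = rec[0], rec[1]
        if kind == "start":
            undo_list.add(tx)
        elif kind == "update":
            _, _, item, _old, new = rec
            db[item] = new
        elif kind == "redo_only":
            _, _, item, value = rec
            db[item] = value
        elif kind in ("commit", "abort"):
            undo_list.discard(tx)  # finished: no UNDO needed

    # UNDO pass: roll back unfinished transactions, scanning backward.
    for rec in reversed(list(log)):  # iterate over a copy; we append to log below
        if not undo_list:
            break
        kind, tx = rec[0], rec[1]
        if tx not in undo_list:
            continue
        if kind == "update":
            _, _, item, old, _new = rec
            db[item] = old
            log.append(("redo_only", tx, item, old))  # log the restoring write
        elif kind == "start":
            log.append(("abort", tx))  # rollback of tx is complete
            undo_list.remove(tx)


log = [
    ("start", "T1"), ("update", "T1", "A", 100, 90), ("commit", "T1"),
    ("start", "T2"), ("update", "T2", "B", 50, 60),
    ("redo_only", "T2", "B", 50), ("abort", "T2"),
    ("start", "T3"), ("update", "T3", "C", 7, 8),  # T3 never finished
]
db = {"A": 100, "B": 60, "C": 8}  # state found after the crash
recover(log, db)
```

Here the disk still held T2's value 60 for B; redoing T2's logged rollback write is what restores B to 50, even though T2 aborted.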


Recall that during recovery we need to scan the log from the beginning, which can take a long time. Checkpoints limit how much of the log has to be examined.

Strict checkpoint

  • doesn't permit any updates while the checkpoint is in progress
  • outputs all modified buffer blocks to disk

It follows these steps:

  1. Output all log records currently in main memory to stable storage
  2. Output all modified blocks in the buffer to disk
  3. Output a log record <checkpoint L> to stable storage, where L is a list of the active transactions
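The three steps as a Python sketch. The parameter names, the dict-based buffer and disk, and the `("checkpoint", L)` tuple are hypothetical; the function assumes the caller has already stopped all updates, as a strict checkpoint requires.

```python
def strict_checkpoint(log_buffer, stable_log, buffer, dirty, disk, active):
    """Perform a strict checkpoint; the caller must have paused all updates."""
    # 1. Output all log records currently in main memory to stable storage.
    stable_log.extend(log_buffer)
    log_buffer.clear()
    # 2. Output every modified buffer block to disk.
    for block in dirty:
        disk[block] = buffer[block]
    dirty.clear()
    # 3. Output <checkpoint L>, where L lists the currently active transactions.
    stable_log.append(("checkpoint", sorted(active)))


log_buffer = [("update", "T2", "B", 5, 6)]
stable_log = [("start", "T2")]
buffer, dirty, disk = {"B": 6}, {"B"}, {"B": 5}
strict_checkpoint(log_buffer, stable_log, buffer, dirty, disk, active={"T2"})
```

Flushing the log before the data blocks keeps the log ahead of the database on disk, so every block written in step 2 already has its log records on stable storage.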

Transactions that completed before the checkpoint need neither REDO nor UNDO.
A transaction Ti whose <Ti, commit> or <Ti, abort> record appears before the <checkpoint L> record must have had its changes written to disk, because the checkpoint outputs all modified blocks to disk. So there is no need to REDO it.

REDO and UNDO only need to be applied to transactions in L (those active during the checkpoint) and to transactions that started after the checkpoint.
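A sketch of how recovery might pick the transactions to consider, by scanning back to the most recent `<checkpoint L>` record. The tuple layouts and the name `transactions_to_recover` are assumptions.

```python
def transactions_to_recover(log):
    """Return the ids of transactions that recovery must consider."""
    # Scan backward for the most recent <checkpoint L> record.
    for i in range(len(log) - 1, -1, -1):
        if log[i][0] == "checkpoint":
            active = set(log[i][1])  # L: active when the checkpoint was taken
            started_after = {rec[1] for rec in log[i + 1:] if rec[0] == "start"}
            return active | started_after
    # No checkpoint: every transaction in the log must be considered.
    return {rec[1] for rec in log if rec[0] == "start"}


log = [
    ("start", "T1"), ("update", "T1", "A", 1, 2), ("commit", "T1"),
    ("start", "T2"), ("update", "T2", "B", 3, 4),
    ("checkpoint", ["T2"]),
    ("start", "T3"), ("update", "T3", "C", 5, 6), ("commit", "T3"),
]
to_recover = transactions_to_recover(log)  # {"T2", "T3"}; T1 finished before the checkpoint
```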

Fuzzy checkpoint

We could instead perform a fuzzy checkpoint.
During a fuzzy checkpoint, transactions are allowed to perform updates while the checkpoint is in progress.
