Big picture
Why we want logs
The goal is atomicity: either all of a transaction's modifications are applied, or none are.
To achieve this, we first output information describing each modification to stable storage, without modifying the database itself.
Why we need logs
Logs help ensure that all modifications made by committed transactions are reflected in the database
This might happen in the course of recovery
Logs also ensure that no change made by an aborted transaction persists in the db
Log
The log is a sequence of log records that describe the updates made to the db.
Update log
An update log record usually has these 4 fields:
- Transaction id
- Data item id, usually a disk location plus an offset
- Old value
- New value
The old value allows us to undo
If a transaction aborts, we can undo its change.
The new value allows us to redo
When a transaction has committed but its change didn’t make it to the disk, we can redo the change.
Just replace the data item with the new value
Start log
Commit log
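A minimal sketch of the three record kinds as Python dataclasses. The class and field names are assumptions made for illustration; only the four fields of the update record come from the notes above.

```python
from dataclasses import dataclass
from typing import Any, List, Union


@dataclass(frozen=True)
class Start:
    """Start record: transaction `txn` has begun."""
    txn: str


@dataclass(frozen=True)
class Update:
    """Update record with the four fields listed in the notes."""
    txn: str    # transaction id
    item: str   # data item id (stands in for disk location + offset)
    old: Any    # value before the write, used for undo
    new: Any    # value after the write, used for redo


@dataclass(frozen=True)
class Commit:
    """Commit record: transaction `txn` has committed."""
    txn: str


LogRecord = Union[Start, Update, Commit]

# The log itself is an append-only sequence of records.
log: List[LogRecord] = [
    Start("T1"),
    Update("T1", "A", old=100, new=50),
    Commit("T1"),
]
```

Here `log[1]` carries everything needed to move data item A in either direction: `old` for undo, `new` for redo.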
How a transaction happens
Steps
- The transaction performs some computation
- The transaction modifies the data block in the disk buffer
- The DB executes a page write to write the data block back to disk
When to modify
Deferred modification
The transaction doesn’t modify the database until it has committed.
Immediate modification
The transaction modifies the database while it is still active.
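The difference between the two policies can be sketched with a dict standing in for the database. Both function names are hypothetical, and the "snapshot" is only a way to show what the database looks like while the transaction is still active.

```python
def run_deferred(db, writes):
    """Deferred modification: hold writes aside, apply them only at commit."""
    pending = dict(writes)      # db is untouched while the transaction runs
    seen_mid_txn = dict(db)     # database state while the transaction is active
    db.update(pending)          # commit point: all writes are applied together
    return seen_mid_txn


def run_immediate(db, writes):
    """Immediate modification: each write reaches db before commit."""
    seen_mid_txn = None
    for item, value in writes:
        db[item] = value
        if seen_mid_txn is None:
            seen_mid_txn = dict(db)  # state after the first write, before commit
    return seen_mid_txn


writes = [("A", 50), ("B", 250)]

db_deferred = {"A": 100, "B": 200}
mid_deferred = run_deferred(db_deferred, writes)

db_immediate = {"A": 100, "B": 200}
mid_immediate = run_immediate(db_immediate, writes)
```

Both databases end up identical, but `mid_immediate` already shows A changed. That is why immediate modification needs the old value in the log: a crash or abort at that point leaves a change that has to be undone.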
Crashing scenario
Not all changes made it to the disk
The transaction committed, but not all of its modifications reached the disk
Redo
We need to set the data item to the new value in the log record
The transaction modified the db but then aborted
We need to erase the changes it made to the db
Undo
Set the data item specified in the log record to the old value
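Both operations are one-line assignments once the update record is available. A small sketch, assuming a dict as the database and a hypothetical `Update` tuple with the four fields of an update log record:

```python
from collections import namedtuple

# Hypothetical update record: transaction id, data item id, old value, new value.
Update = namedtuple("Update", ["txn", "item", "old", "new"])


def redo(db, rec):
    """Set the data item to the new value stored in the log record."""
    db[rec.item] = rec.new


def undo(db, rec):
    """Set the data item back to the old value stored in the log record."""
    db[rec.item] = rec.old


rec = Update("T1", "A", old=100, new=50)

# Scenario 1: T1 committed, but the disk still holds the old value.
db_committed = {"A": 100}
redo(db_committed, rec)
redo(db_committed, rec)   # applying redo twice leaves the same result

# Scenario 2: T1 wrote A and then aborted.
db_aborted = {"A": 50}
undo(db_aborted, rec)
```

Because each operation writes an absolute value rather than a delta, repeating it is harmless, which matters if the system crashes again in the middle of recovery.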
Order matters during recovery
The order in which redo applies updates is important.
Otherwise we might end up with the wrong value.
Recovery usually scans the log once and performs the redo for each log record as it is encountered.
This preserves the order of updates
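A tiny example of why the order matters, assuming two committed transactions that both wrote data item A. The tuple layout (txn, item, old, new) is an assumption for illustration.

```python
# T1 changed A from 100 to 50, then T2 changed A from 50 to 70.
log = [
    ("T1", "A", 100, 50),
    ("T2", "A", 50, 70),
]


def redo_all(db, records):
    """Apply the new value of every record, in the order given."""
    for _txn, item, _old, new in records:
        db[item] = new
    return db


in_log_order = redo_all({"A": 100}, log)
out_of_order = redo_all({"A": 100}, reversed(log))
```

`in_log_order` ends with A = 70, the value the last committed writer left. `out_of_order` ends with A = 50, which silently loses T2's update.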
Commit
A transaction Ti has committed once its <Ti, commit> log record has been output to stable storage.
This ensures transaction atomicity
If the system crashes before <Ti, commit> reaches stable storage, we need to roll back Ti.
Another look at REDO and UNDO
UNDO
UNDO is needed during recovery if a transaction Ti has <Ti, start> in the log but neither <Ti, commit> nor <Ti, abort>
This makes sure every transaction is complete, ending in either a commit or an abort.
REDO
Recall that we need to REDO during recovery if we find that a transaction T1 has <T1, start> and either <T1, commit> or <T1, abort> in the log
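The REDO rule can be sketched as one pass over the log. The record layout (a tuple whose first two elements are the kind and the transaction id) and the function name are assumptions. UNDO is taken here as the complementary case: started, but neither committed nor aborted.

```python
def classify(log):
    """Return (redo_set, undo_set) of transaction ids found in the log."""
    started, finished = set(), set()
    for kind, txn, *_rest in log:
        if kind == "start":
            started.add(txn)
        elif kind in ("commit", "abort"):
            finished.add(txn)
    redo_set = started & finished   # has start AND (commit or abort)
    undo_set = started - finished   # has start but no commit/abort
    return redo_set, undo_set


log = [
    ("start", "T1"),
    ("update", "T1", "A", 100, 50),
    ("commit", "T1"),
    ("start", "T2"),
    ("update", "T2", "B", 200, 250),
    ("abort", "T2"),
    ("start", "T3"),
    ("update", "T3", "C", 1, 2),
]

redo_set, undo_set = classify(log)
```

With this log, T1 and T2 land in the redo set and T3, which was still running at the crash, lands in the undo set.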
Why do we need REDO when we see an abort?
Usually, when a transaction aborts, we don’t need to run it again. But recovery still REDOes its log records anyway
Because the
Checkpoints
Recall that during recovery we would need to scan the log file from the beginning, which takes a long time. Checkpoints shorten that scan.
Strict checkpoint
- Doesn’t permit any updates while the checkpoint is in progress
- Outputs all modified buffer blocks to disk
It follows these steps:
- Output all log records that are in main memory to stable storage
- Output all modified buffer blocks to disk
- Output a <checkpoint L> log record to stable storage, where L is the list of transactions active at the time of the checkpoint
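The steps above can be sketched with in-memory stand-ins: lists for the log buffer and the stable log, dicts for the dirty buffer blocks and the disk. All names are hypothetical, and the last step is assumed to append a checkpoint record carrying the list L of active transactions.

```python
def strict_checkpoint(log_buffer, stable_log, dirty_blocks, disk, active_txns):
    """Run a strict checkpoint; no updates are assumed to happen meanwhile."""
    # Step 1: output all log records still in main memory to the stable log.
    stable_log.extend(log_buffer)
    log_buffer.clear()
    # Step 2: output every modified buffer block to disk.
    disk.update(dirty_blocks)
    dirty_blocks.clear()
    # Step 3: append the checkpoint record with the list of active transactions.
    stable_log.append(("checkpoint", sorted(active_txns)))


log_buffer = [("start", "T2"), ("update", "T2", "A", 100, 50)]
stable_log = [("start", "T1"), ("commit", "T1")]
dirty_blocks = {"A": 50}
disk = {"A": 100, "B": 200}

strict_checkpoint(log_buffer, stable_log, dirty_blocks, disk, {"T2"})
```

The log records are flushed before the data blocks, so every change that reaches the disk already has its log record in stable storage.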
Database modifications made by transactions that completed before the checkpoint don’t need redo or undo.
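A transaction Ti that has its <Ti, commit> or <Ti, abort> record before the checkpoint record can therefore be skipped during recovery.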
REDO and UNDO only need to be applied to the transactions in L (those active during checkpointing) and to transactions that started after the checkpoint.
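A sketch of how recovery could pick out the transactions it has to look at. The record layout and function name are assumptions. Besides the list L from the checkpoint record, it also collects transactions whose start record appears after the checkpoint, since those were not yet running when L was written.

```python
def transactions_to_examine(log):
    """Return the transaction ids recovery must consider for redo/undo."""
    # Find the most recent checkpoint record (-1 if the log has none).
    last_cp = -1
    for i, rec in enumerate(log):
        if rec[0] == "checkpoint":
            last_cp = i
    # Start from the active list L stored in that checkpoint record.
    candidates = set(log[last_cp][1]) if last_cp >= 0 else set()
    # Add every transaction that started after the checkpoint.
    for rec in log[last_cp + 1:]:
        if rec[0] == "start":
            candidates.add(rec[1])
    return candidates


log = [
    ("start", "T1"),
    ("commit", "T1"),
    ("start", "T2"),
    ("checkpoint", ["T2"]),
    ("start", "T3"),
    ("commit", "T2"),
]

candidates = transactions_to_examine(log)
```

T1 finished before the checkpoint and is skipped. Only T2 (in L) and T3 (started later) are examined. Without any checkpoint record the function falls back to scanning the whole log.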
Fuzzy checkpoint
Alternatively, we can use a fuzzy checkpoint.
During a fuzzy checkpoint, transactions are allowed to keep performing updates.