Index

We want to locate where our records are. We might have to resort to index, just like when we are trying to find a key word, we go to the indice in the back of a book.

Basic concepts:

Indices

Indices are used to locate record; Attributes that are used to loop up are called search key. They don’t have to be Key or super key.

Ordered indices: it is based on sorted order of the records

Hash indices

operations and performance affected by choice of indices

Access type:

We can have specific attribute look up or range look up

Space overhead

Deletion

Insert

Ordered indices

Ordered indices are indices that store indices in sorted order.

Clustering indices

Record in the files might themselves be sorted in some order. A clustering index is an index whose search key also defines the sequential order of the file. They are also called primary index.

Nonclustering indices

Indices order is different from the sequential order of the records. There is a second list that has pointers to the physical rows.

Index entry

It is a search key value and pointer to one or **more than one ** records that has the specific search key value.

Dense index

Index entry appears for every search key value in the file. If multiple records have the same search key, then the index’s pointer will point to the first record. But for non-clustering index, each index entry will need store pointers to all records that has the specific search key. (sequential scan from pointer won’t work since the record is not in order).

Sparse index

Sparse index doesn’t store index entry for every search key in the records. It can only be used by clustering index. Because to locate a record, we will find the largest index entry that’s smaller than the search key we are looking for. Then do a sequential scan.

Comparisons and compromise

Dense index is fast but takes up more space Sparse index is slower but take less space. Compromise: Use sparse index to point to the beginning of each block. Since the cost of processsing in DB is to drag blocks from disk to memory.

Multilevel index

Store a index record of the original index. For comparison purpose, let’s compared this two level index with binary search.

For a clustering index, we can use binary search and the time is O(log_2 N). So for a 10,000 block of index, it will take at most 14 block reads to find the desired index. Also, if overflow blocks are used, binary search is not even available.

We can construct a multilevel index, treating the index just a regular sequential file and construct a sparse outer index on the original index. (Because the original index is sorted, we can use sparse outer index).

To find a record, we first find the largest outer index that’s smaller than the search key we want. The pointer in that index will point to a block of index. Then we can do sequential scan.

let’s assume 10,000 block of orignal index would need 100 block of outer sparse index. Since 100 block is small enough, it could be stored in memory so there will be no block read from disk. This is a huge improvement from 14 block reads required by binary search.

borrowed from tutorialspoint

Index update

We need to update index when a record is inserted or delete. We also need to update index when the search key attribute is altered.

Insertion

First we need to look up.

Dense index insertion

If the search key is not in the index, Db inserts an index entry at the appropriate place.

Else,

  1. If the index entry stores pointers of all records that have the same search key value, append this record’s pointer to the index entry.
  2. Else if the index entry only stores the first record with the same search key value, DB will place the record being inserted after the other records that have the same search key value.

Sparse index insertion

We first need to assume each index stores an entry for each block.

  1. If our record is stored in a new block. we store the search key value in index entry.
  2. ELIF the new record has the least search key value in its block, making that record the first of its block, we update the index entry of that block pointing to our record.
  3. Else we do not change the index

Delete for dense index

First, we need to look up the search key

If the delete record is the only record with the search key, just remove the record and index entry.

If the dense index stores all the pointers to records with the same search key, then remove pointer to to-be-deleted record

Elif the dense index stores the pointer of the first record that has the same search key value and if the to-be-deleted record is the first record in the search-key pointer, we need to update the pointer to point the next next record that has the same search key value.

Delete for sparse

If the index doesn’t have the search key value of the to-be-deleted record, then do nothing

elif

  1. If the deleted record is the only record with that search key and the next record in the search key order doesn’t have index pointing at it, update the index entry to point to the next record by search key order. But if that record already has an index entry pointing at it, we will delete the index record.
  2. ELIF if multiple records have the same record, we update the index to point to the next record that has the same search key value.

Leave a Reply

Your email address will not be published. Required fields are marked *

To create code blocks or other preformatted text, indent by four spaces:

    This will be displayed in a monospaced font. The first four 
    spaces will be stripped off, but all other whitespace
    will be preserved.
    
    Markdown is turned off in code blocks:
     [This is not a link](http://example.com)

To create not a block, but an inline code span, use backticks:

Here is some inline `code`.

For more help see http://daringfireball.net/projects/markdown/syntax