Learn SQL | Indexing and Optimization

Indexes in SQL

Indexes are used to retrieve data from the database more quickly than otherwise. The users cannot see the indexes, they are just used to speed up searches/queries.

Types of Indexes

There are three types of indexes:

- Unique Index
- Clustered Index
- Non-Clustered Index

Unique Index

Unique Indexes are used not just for performance but for data integrity as well, as they do not allow the field to have duplicate values if the column is unique indexed.

Clustered Index

Clustered Indexes are the indexes that physically sort the data rows in the table on the basis of their key values. There can be only one clustered index on a table.

Non-Clustered Index

Non-Clustered Indexes are the indexes where the logical order of the index does not match the physical stored order of the rows on disk. The leaf layer of a non-clustered index does not consist of the data pages. Instead, the leaf nodes contain index rows.

Query Optimization Techniques

Query optimization is the process of selecting the most efficient way to execute a query. The process of optimization is a complex one, as it involves making decisions based on the available statistics and the query plan. The query optimizer is the component of the database management system that performs this optimization.

Types of Query Optimization Techniques

There are several query optimization techniques:

- Join Elimination
- Indexing
- Query Rewriting
- Query Decomposition
- Query Compilation
- Query Execution

Join Elimination

Join elimination is the process of removing joins from a query that are not necessary for the query to return the correct results. This can be done by using indexes or by rewriting the query to eliminate the join.

Indexing

Indexing is the process of creating indexes on the columns that are frequently used in queries. This can speed up the query by allowing the database to quickly locate the rows that match the query criteria.

Query Rewriting

Query rewriting is the process of rewriting a query to make it more efficient. This can involve changing the order of operations in the query or rewriting the query to use a different algorithm.

Query Decomposition

Query decomposition is the process of breaking a complex query into smaller, simpler queries that can be executed more efficiently. This can be done by using subqueries or by rewriting the query to eliminate unnecessary operations.

Query Compilation

Query compilation is the process of compiling a query into an executable form that can be executed by the database engine. This can involve optimizing the query plan and generating efficient code to execute the query.

Query Execution

Query execution is the process of executing the compiled query and returning the results to the user. This can involve reading data from disk, processing the data, and returning the results to the user.

Normalization and Denormalization

Normalization is the process of organizing data in a database. This includes creating tables and establishing relationships between those tables according to rules designed both to protect the data and to make the database more flexible by eliminating redundancy and inconsistent dependency.

Types of Normalization

There are several types of normalization:

- First Normal Form (1NF)
- Second Normal Form (2NF)
- Third Normal Form (3NF)
- Boyce-Codd Normal Form (BCNF)
- Fourth Normal Form (4NF)
- Fifth Normal Form (5NF)

First Normal Form (1NF)

First Normal Form (1NF) is the first step in the normalization process. It requires that the table have a primary key and that each column in the table contain only atomic values.

Second Normal Form (2NF)

Second Normal Form (2NF) requires that the table be in 1NF and that all non-key attributes be fully functionally dependent on the primary key.

Third Normal Form (3NF)

Third Normal Form (3NF) requires that the table be in 2NF and that all non-key attributes be non-transitively dependent on the primary key.

Boyce-Codd Normal Form (BCNF)

Boyce-Codd Normal Form (BCNF) requires that the table be in 3NF and that all determinants be candidate keys.

Fourth Normal Form (4NF)

Fourth Normal Form (4NF) requires that the table be in BCNF and that there be no multi-valued dependencies.

Fifth Normal Form (5NF)

Fifth Normal Form (5NF) requires that the table be in 4NF and that there be no join dependencies on any subset of the candidate key.

Denormalization

Denormalization is the process of adding redundant data to a database to improve read performance. This can involve adding redundant columns to a table or duplicating data in multiple tables.