MS SQL Server indexes

--An index is an on-disk structure associated with a table or view that speeds up retrieval of rows from that table or view. An index contains keys built from one or more columns of the table or view. SQL Server stores these keys in a balanced tree (B-tree) structure that supports fast lookup of rows by their key values.

--Clustered indexes sort and store the data rows of a table or view based on their key values, that is, the columns included in the index definition. There can be only one clustered index per table because the data rows can be sorted in only one order.
--Data rows are stored in sorted order only if the table has a clustered index. A table with a clustered index is called a clustered table. A table without one stores its rows in an unordered structure called a heap.

--A nonclustered index has the same B-tree structure as a clustered index, with two important differences:
--a nonclustered index does not change the physical order of the rows in the table, and its leaf pages consist of index keys and bookmarks (row locators).

--Clustered indexes usually provide faster data retrieval than nonclustered indexes. They usually turn out to be faster for updates too, but not when many updates hit the same place in the middle of the table.

--The reason a clustered index tends to be faster is this: when the system scans a clustered index, it does not need to leave the B-tree structure to read data pages, because those pages are already present at the leaf level of the tree.

--A nonclustered index also generally requires more I/O operations than the corresponding clustered index.

--After traversing its own B-tree, a nonclustered index has to read the data pages or, if the table has a clustered index on other column(s), traverse the B-tree of the clustered index.

--So a clustered index can be significantly faster than a table scan, even if its selectivity is quite poor (the query returns a lot of rows).

CREATE TABLE tsql.dbo.NI (
    ID int NOT NULL,
    T char(8) NULL
);

CREATE TABLE tsql.dbo.NCI (
    ID int NOT NULL,
    T char(8) NULL
);

--Create a clustered index

CREATE CLUSTERED INDEX IX_1
ON tsql.dbo.NCI(ID);

--Create a nonclustered index on a table

CREATE NONCLUSTERED INDEX IX_2
ON tsql.dbo.NCI(T);

--Add test data
DECLARE @i INT = 100000;
DECLARE @t CHAR(1) = 'T';

WHILE @i > 0
BEGIN
    INSERT INTO tsql.dbo.NI VALUES (@i, @t + CAST(@i AS char(6)));
    INSERT INTO tsql.dbo.NCI VALUES (@i, @t + CAST(@i AS char(6)));
    SET @i -= 1;
END

--Queries on a table with indexes
SELECT ID, T FROM tsql.dbo.NCI
ORDER BY ID, T;

SELECT ID, COUNT(*) AS C FROM tsql.dbo.NCI
GROUP BY ID, T;

SELECT ID, T FROM tsql.dbo.NCI
WHERE ID > 4000 AND ID < 55000 AND T LIKE 'T%';

--Query using both indexes
USE tsql;
SELECT CAST(dbo.NCI.ID AS VARCHAR)
FROM dbo.NCI
GROUP BY dbo.NCI.ID
UNION ALL
SELECT dbo.NCI.T
FROM dbo.NCI
GROUP BY dbo.NCI.T

--Index information
SELECT index_type_desc, index_depth, index_level,
    page_count, record_count
FROM sys.dm_db_index_physical_stats
    (DB_ID(N'tsql'), OBJECT_ID(N'dbo.NCI'), NULL, NULL, 'DETAILED');

--Deleting indexes
IF EXISTS (SELECT name FROM sys.indexes
    WHERE name = N'IX_1')
    DROP INDEX IX_1 ON tsql.dbo.NCI;

IF EXISTS (SELECT name FROM sys.indexes
    WHERE name = N'IX_2')
    DROP INDEX IX_2 ON tsql.dbo.NCI;

In the previous article, we introduced ways to optimize relational databases and discussed how clustered and nonclustered indexes work in the context of reducing query execution time. Now it's time to put this knowledge into practice by learning how to create indexes that optimize an MS SQL database.

Let me remind you of the definition of the Staffs table schema that we will work with:

Staffs table

Let's say we need to create a non-clustered index for the Staffs table, which will optimize the following query:

SELECT Id, Name, Job FROM Stuffs WHERE SALARY > 1000 AND Photo IS NOT NULL

The index key will be the SALARY and Photo columns, since the selection is filtered by these fields. And the Id, Name and Job columns will be the columns included in the index.

The general command syntax is as follows:

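USE <database_name>
GO

CREATE NONCLUSTERED INDEX <index_name> ON <table_name>
(<column_name> ASC -- index key columns
)
INCLUDE (<column_name> -- included columns
)
GO

In our case, the statement will look like this:

CREATE NONCLUSTERED INDEX IDX_StaffsSearch ON Stuffs
(Salary, Photo) INCLUDE (Id, Name, Job)
GO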

We have created a non-clustered index. Or rather, a non-clustered covering index. This means that the index contains all the fields necessary to execute the query and SQL Server will not access the base table when executing the query.

If our code were like this:

CREATE NONCLUSTERED INDEX IDX_StaffsSearch ON Stuffs
(Salary, Photo) INCLUDE (Id)
GO

In this case, the index ceases to be a covering index, since it does not include all the columns used in the query. The optimizer may still use this index when executing the query, but it will be much less efficient, because every matching row then requires an additional lookup in the base table.

The clustered index is created using the following command:

CREATE UNIQUE CLUSTERED INDEX IDX_Stsffsid ON Stuffs (Id)

Here a unique clustered index was created based on the table's primary key (Id column).

Real example

Let's now develop a scenario in which we can realistically evaluate the degree of performance gain in the case of using indexes.

Let's create a new database:

CREATE DATABASE TestDB;

And a single Customers table, which will consist of four columns:

CREATE TABLE [dbo].[Customers](
    [Id] [int] NOT NULL,
    [Num1] [int] NULL,
    [Num2] [int] NULL,
    [Num3] [int] NULL)
GO

Now let's fill our table with random data. The Id column will be increased in a loop, and the remaining three columns of the table will be filled with random numbers using a peculiar version of the random function:

DECLARE @i int = 0;

WHILE (@i < 500000)
BEGIN
    INSERT INTO Customers(Id, Num1, Num2, Num3)
    VALUES(@i, abs(checksum(newid())), abs(checksum(newid())), abs(checksum(newid())));
    SET @i = @i + 1;
END

This script adds half a million records to the table one row at a time, so be patient: it can take several minutes to run.

Everything is ready for the test. We will evaluate the performance characteristics of the query. Since the query execution time may depend on the specific machine, we will analyze a more independent indicator - the number of logical reads.

To enable statistics collection mode, you must run the following command:

SET STATISTICS IO ON

Now, after each query is executed, the Messages tab will show the I/O statistics for that query.

We are only interested in the value of the logical reads parameter.

So, there are no indexes in our table yet. Let's run the following three queries and record the number of logical reads for each query in the results table below:

1) SELECT Id, Num1, Num2 FROM Customers WHERE Id = 2000

2) SELECT Id, Num1, Num2 FROM Customers WHERE Id >= 0 AND Id < 1000

3) SELECT Id, Num1, Num2 FROM Customers WHERE Id >= 0 AND Id < 5000

These queries will return 1 row, 1000 rows and 5000 rows, respectively. Without indexes, the performance indicator (number of logical reads) is the same for all three queries, 1621, because each of them scans the entire table.

Next, let's create a non-clustered, non-covering index on the Id column and run the same three queries a second time:

CREATE NONCLUSTERED INDEX TestIndex1 ON dbo.Customers(Id);

We see that for the second and third queries, which return a fairly large number of rows, the index we created did not improve performance. However, for the query that returns a single row, the speedup was huge. Thus, we can conclude that it makes sense to create non-covering indexes when optimizing queries that return a single row.

Now let's create a covering index, thereby achieving maximum performance.

First, let's delete the previous index:

USE TestDB
GO
DROP INDEX Customers.TestIndex1

And let's create a new index:

CREATE NONCLUSTERED INDEX TestIndex2 ON dbo.Customers(Id) INCLUDE (Num1, Num2);

Now let's run our queries a third time and write the results into a table with three columns: No indexes, Non-covering index, and Covering index.

It’s easy to see that the performance increase has been enormous. Thus, we have increased the speed of query execution tens of times. When running a database that stores millions of rows, this performance gain will be quite noticeable.

In this article, we looked at an example of optimizing a database by creating indexes. It is worth noting that index design has to be done individually for each query. To build an index that will truly optimize query performance, you must carefully analyze the query itself and its execution plan.

Efficient index building is one of the best ways to improve the performance of a database application. Without the use of indexes, SQL Server is like a reader trying to find a word in a book by looking at each page. If the book has a subject index (index), the reader can search for the necessary information much more quickly.

In the absence of an index, the SQL server, when retrieving data from a table, will scan the entire table and check each row to see if the query criteria are met. Such a full scan can be disastrous for the performance of the entire system, especially if there is a lot of data in the tables.

One of the most important tasks when working with a database is building an optimal index to improve system performance. Most major databases provide tools to view the query execution plan and help you tune and optimize indexes. This article highlights several good rules of thumb that apply when creating or modifying indexes in a database. First, let's look at situations where indexing improves performance and where indexing can hurt.

Useful indexes

So, table indexing will be useful when searching for a specific record in a table using the WHERE clause. Such queries include, for example, queries that search for a range of values, queries that match an exact value, and queries that join two tables.

For example, the following queries against the Northwind database will run more efficiently when building an index on the UnitPrice column.

Delete from Products Where UnitPrice=1
Select * from products Where UnitPrice between 14 AND 16
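
The index itself is not shown in the text. A minimal sketch of how it might be created follows, assuming the standard Northwind schema in which the table is dbo.Products. The index name IX_Products_UnitPrice is made up for illustration.

```sql
-- Hypothetical index name; assumes dbo.Products from the Northwind sample database.
-- A plain nonclustered index whose only key column is UnitPrice.
CREATE NONCLUSTERED INDEX IX_Products_UnitPrice
    ON dbo.Products (UnitPrice);
```

With this index in place, both the equality search and the BETWEEN range search above can locate matching rows through the index instead of scanning the whole table. Whether the optimizer actually does so depends on table size and statistics.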

Because index entries are stored in sorted order, indexing is also useful for queries with an ORDER BY clause. Without an index, records are loaded and sorted while the query is running. An index on UnitPrice allows the DBMS to simply scan the index and retrieve rows by reference when processing the following query. If you want the rows in descending order, the index can simply be scanned in reverse.

Select * From Products order by UnitPrice ASC

Grouping records with the GROUP BY clause also often requires sorting, so an index on the UnitPrice column will also help the following query, which counts the number of products at each price:

Select count(*), UnitPrice From Products Group by UnitPrice

Indexes are useful for maintaining a unique value for a column, since the DBMS can easily look at the index to see if the value already exists. For this reason, primary keys are always indexed.

Disadvantages of Indexing

Indexes degrade system performance during record changes. Any time a query is executed to change data in a table, the index must also change. To select the optimal number of indexes, you need to test the database and monitor its performance. Static systems, where databases are used primarily for data retrieval, such as reporting, can contain more indexes to support read-only queries. Databases with a large number of transactions to change data will need a small number of indexes to provide higher throughput.
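
One possible way to monitor this trade-off in SQL Server is to compare how often each index is read with how often it has to be maintained. The sketch below uses the sys.dm_db_index_usage_stats dynamic management view. Its counters are reset when the instance restarts, so the numbers are only meaningful after a representative workload has run.

```sql
-- Reads (seeks + scans + lookups) versus writes (maintenance updates)
-- for every index on user tables in the current database.
SELECT  OBJECT_NAME(i.object_id)                      AS table_name,
        i.name                                        AS index_name,
        s.user_seeks + s.user_scans + s.user_lookups  AS reads,
        s.user_updates                                AS writes
FROM    sys.indexes AS i
JOIN    sys.dm_db_index_usage_stats AS s
        ON  s.object_id = i.object_id
        AND s.index_id  = i.index_id
WHERE   s.database_id = DB_ID()
  AND   OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
ORDER BY writes DESC;
```

An index with many writes and few or no reads is a candidate for review, since it costs maintenance work without helping queries.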

Indexes take up additional space on disk and in RAM. The exact size will depend on the number of records in the table, as well as the number and size of columns in the index. In most cases, this is not a major issue since disk space is now easy to sacrifice for better performance.
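
To see how much space indexes actually take, one option in SQL Server is the sys.dm_db_partition_stats view. This is a rough sketch that reports used pages per index, converted to kilobytes on the basis that a SQL Server page is 8 KB.

```sql
-- Approximate space used by each index (and heap) on user tables
-- in the current database. For a heap, index_name is NULL.
SELECT  OBJECT_NAME(ps.object_id)      AS table_name,
        i.name                         AS index_name,
        SUM(ps.used_page_count) * 8    AS used_kb
FROM    sys.dm_db_partition_stats AS ps
JOIN    sys.indexes AS i
        ON  i.object_id = ps.object_id
        AND i.index_id  = ps.index_id
WHERE   OBJECTPROPERTY(ps.object_id, 'IsUserTable') = 1
GROUP BY ps.object_id, i.name
ORDER BY used_kb DESC;
```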

Building an Optimal Index

Simple index

A simple index is an index that uses the values of a single field in a table. Keeping the index key simple and small is beneficial for two reasons. First, a database puts a heavy load on the disk. Large index keys force the database to perform more I/O operations, which limits performance.

Second, because index entries are frequently compared, smaller keys are cheaper to compare. For these two reasons, a single integer column makes the best index key: it is small and easy to compare. Character strings, on the other hand, require character-by-character comparison and attention to collation settings.

Selective index

The most efficient indexes are those with a low percentage of duplicate values. For example, a telephone directory for a city in which almost everyone has the last name Smith will not be as useful if the entries in it are sorted by last name.

An index with a high percentage of unique values is also called a selective index. Obviously, a unique index has the greatest selectivity, since it does not contain duplicate values. Many DBMSs track statistics about each index and know how many distinct values each index contains. These statistics are used when generating a query execution plan.
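
As a rough illustration, the selectivity of a candidate column can be estimated by hand before creating an index. The sketch below assumes the Northwind dbo.Products table used earlier. The ratio it computes is an informal measure, not the statistics object the optimizer maintains internally.

```sql
-- Ratio of distinct values to total rows: a value close to 1 means
-- almost every row has its own value (highly selective); a value close
-- to 0 means many duplicates. COUNT(DISTINCT ...) ignores NULLs.
SELECT  COUNT(*)                   AS total_rows,
        COUNT(DISTINCT UnitPrice)  AS distinct_values,
        CAST(COUNT(DISTINCT UnitPrice) AS float)
            / NULLIF(COUNT(*), 0)  AS selectivity   -- NULL for an empty table
FROM    dbo.Products;
```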

Covering Indexes

Indexes consist of a column of data on which the index itself is built and a pointer to the corresponding row. It's like a book's index: it just contains the keywords and a link to a page you can go to for more information. Typically the DBMS will follow pointers to a row from the index to collect all the information needed for the query. However, if the index contains all the columns needed in the query, the information can be retrieved without accessing the table itself.

Let's consider the index on the UnitPrice column that was already mentioned above. The DBMS can execute the following query using only the index entries.

Select Count(*), UnitPrice From Products Group by UnitPrice

This type of query is called a covering query because all the columns being queried can be retrieved from a single index. For the most important queries, you may want to consider creating a covering index for the best possible performance. Such indexes are likely to be composite (using more than one column), which is the opposite of the first principle: create simple indexes. Obviously, choosing the optimal number of columns in an index can only be assessed through testing and monitoring the performance of the database in various situations.
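
In SQL Server, a covering index does not have to put every column into the key. A sketch of one possible approach, assuming the Northwind dbo.Products table with its ProductName column (the index name is hypothetical), keeps the key simple and carries the extra column at the leaf level with INCLUDE:

```sql
-- Hypothetical covering index: UnitPrice is the only key column,
-- ProductName is stored at the leaf level as an included column.
CREATE NONCLUSTERED INDEX IX_Products_UnitPrice_ProductName
    ON dbo.Products (UnitPrice)
    INCLUDE (ProductName);

-- Every column this query touches is present in the index,
-- so it can be answered without reading the base table.
SELECT ProductName, UnitPrice
FROM   dbo.Products
WHERE  UnitPrice BETWEEN 14 AND 16;
```

This keeps the comparison key small, in line with the simple-index principle, while still covering the query.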

Clustered index

Many databases have one special index on a table, where all the data from a row is contained in the index. In SQL Server, such an index is called a clustered index. A clustered index can be compared to a telephone directory because each index entry contains all the information you need and does not contain links to obtain additional data.

As a general rule, every non-trivial table should have a clustered index. If it is possible to create only one index on a table, make it clustered. In SQL Server, when a primary key is created, a clustered index is created automatically (if the table does not already have one), using the primary key column as the index key. A clustered index is the most efficient index (if used, it covers the entire query), and in many DBMSs such an index helps to manage the space used for storing tables effectively, since otherwise (without a clustered index) table rows are stored in an unordered structure called a heap.

Be careful when selecting columns for a clustered index. If you change a record and change the value of a column in a clustered index, the database is forced to move the index entries (to keep them in sorted order). Remember, the index entries of a clustered index contain all of the column values, so changing the value of a key column is comparable to executing a DELETE statement followed by an INSERT statement, which will obviously cause performance problems if done frequently. For this reason, clustered indexes are often built on primary key or foreign key columns. If such key values change at all, they change very rarely.

Conclusion

Determining the correct indexes to use in a database requires careful analysis and testing of the system. The practices presented in this article are good rules for constructing indexes. After applying these methods, you will need to retest your specific application under your specific hardware, memory, and operations conditions.

One of the most important ways to achieve high performance in SQL Server is the use of indexes. An index speeds up queries by providing quick access to rows of data in a table, much like an index in a book helps you quickly find the information you need. In this article I will give a brief overview of indexes in SQL Server and explain how they are organized in the database and how they help speed up database queries.

Indexes are created on table and view columns. Indexes provide a way to quickly search data based on the values in those columns. For example, if you create an index on a primary key and then search for a row of data using the primary key values, SQL Server will first find the index value and then use the index to quickly find the entire row of data. Without an index, a full scan of all rows in the table will be performed, which can have a significant performance impact.
You can create an index on most columns in a table or view. The exception is mainly columns with data types for storing large objects (LOB), such as image, text or varchar(max). You can also create indexes on columns that store data in XML format, but these indexes are structured slightly differently from the standard ones and are beyond the scope of this article. The article also does not discuss columnstore indexes. Instead, I focus on the indexes that are most commonly used in SQL Server databases.
An index consists of a set of pages (index nodes) organized in a tree structure called a balanced tree (B-tree). This structure is hierarchical: it starts with a root node at the top of the hierarchy and ends with leaf nodes at the bottom.

When you query an indexed column, the query engine starts at the root node and works its way down through the intermediate nodes, with each intermediate level containing more detailed information about the data. The query engine continues to move through the index nodes until it reaches the leaf level. For example, if you are looking for the value 123 in an indexed column, the query engine first uses the root level to determine which page to read at the first intermediate level. In this example, the first page covers values from 1 to 100 and the second covers values from 101 to 200, so the query engine goes to the second page of that intermediate level. There it finds that it should go on to the third page of the next intermediate level. From there, the query engine reads the index entry itself at the leaf level. Index leaves contain either the table data itself or only a pointer to the rows with the data, depending on the type of index: clustered or nonclustered.

Clustered index
A clustered index stores the actual rows of data in the leaves of the index. Returning to the previous example, this means that the row of data associated with the key value 123 is stored in the index itself. An important characteristic of a clustered index is that all values are sorted in a specific order, either ascending or descending. Therefore, a table or view can have only one clustered index. In addition, data in a table is stored in sorted form only if a clustered index has been created on that table.
A table that does not have a clustered index is called a heap.
Non-clustered index
Unlike a clustered index, the leaves of a nonclustered index contain only the columns (the key) on which the index is defined, plus a pointer to the rows with the actual data in the table. This means that the query engine needs an additional operation to locate and retrieve the required data. The content of the pointer depends on how the data is stored: in a clustered table or in a heap. If the pointer refers to a clustered table, it holds the clustered index key, which is used to find the actual data. If the pointer refers to a heap, it holds a row identifier that locates the specific data row. A nonclustered index does not determine the order of the table's rows the way a clustered index does, but you can create more than one nonclustered index on a table or view, up to 999. This does not mean that you should create as many indexes as possible: indexes can either improve or degrade system performance.
In addition to creating multiple nonclustered indexes, you can also add extra columns (included columns) to a nonclustered index: the leaves of the index then store not only the values of the indexed columns but also the values of these non-key columns. This approach lets you work around some of the restrictions placed on indexes. For example, you can include a column that cannot be used as an index key, or stay within the index key length limit (900 bytes in most cases).
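
To make the last point concrete, here is a minimal sketch using a hypothetical dbo.Articles table (the table, column and index names are invented for illustration). An nvarchar(max) column cannot be an index key column, but it can be carried as an included column.

```sql
-- Hypothetical table for illustration only.
CREATE TABLE dbo.Articles (
    Id    int           NOT NULL PRIMARY KEY,
    Title nvarchar(200) NOT NULL,   -- 400 bytes at most, fine as an index key
    Body  nvarchar(max) NULL        -- not allowed as an index key column
);

-- Title is the key; Body is stored only at the leaf level of the index,
-- so it does not count toward the key size limit.
CREATE NONCLUSTERED INDEX IX_Articles_Title
    ON dbo.Articles (Title)
    INCLUDE (Body);
```

Including a large column makes the index itself larger, so this is a trade-off worth measuring rather than a default choice.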

Types of Indexes

In addition to being either clustered or nonclustered, an index can be further configured as a composite index, a unique index, or a covering index.
Composite index
Such an index contains more than one column. You can include up to 16 columns in an index, but their total length is limited to 900 bytes (these limits apply to older versions; SQL Server 2016 and later allow up to 32 key columns, and up to 1,700 bytes for nonclustered indexes). Both clustered and nonclustered indexes can be composite.
Unique index
This index ensures that each value in the indexed column is unique. If the index is composite, uniqueness applies to the combination of all columns in the index, not to each individual column. For example, if you create a unique index on the columns NAME and SURNAME, then the full name must be unique, but duplicates in the first or last name alone are possible.
A unique index is created automatically when you define a primary key or unique constraint on a column:
  • Primary key
    When you define a primary key constraint on one or more columns, SQL Server automatically creates a unique clustered index, unless the table already has a clustered index (in that case, a unique non-clustered index is created on the primary key).
  • Unique constraint
    When you define a unique constraint, SQL Server automatically creates a unique non-clustered index. You can specify that a unique clustered index be created instead if the table does not yet have a clustered index.
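
The automatic index creation described above can be observed directly. The following sketch uses a hypothetical dbo.Employees table (all names are invented for illustration) and then lists the indexes SQL Server created for its constraints.

```sql
-- Hypothetical table with a primary key and a unique constraint.
CREATE TABLE dbo.Employees (
    EmployeeId int          NOT NULL CONSTRAINT PK_Employees PRIMARY KEY,
    Email      varchar(100) NOT NULL CONSTRAINT UQ_Employees_Email UNIQUE
);

-- List the indexes that back the two constraints.
SELECT name, type_desc, is_unique, is_primary_key, is_unique_constraint
FROM   sys.indexes
WHERE  object_id = OBJECT_ID('dbo.Employees');
```

With the default settings, PK_Employees should appear as a unique clustered index and UQ_Employees_Email as a unique nonclustered index.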
Covering index
Such an index allows a specific query to immediately obtain all the necessary data from the leaves of the index without additional access to the records of the table itself.

Designing Indexes

As useful as indexes can be, they must be designed carefully. Because indexes can take up significant disk space, you don't want to create more indexes than necessary. In addition, indexes are automatically updated when the data row itself is updated, which can lead to additional resource overhead and performance degradation. When designing indexes, several considerations regarding the database and queries against it must be taken into account.
Database
As noted earlier, indexes can improve system performance because they provide the query engine with a fast way to find data. However, you should also take into account how often you intend to insert, update, or delete data. When you change data, the indexes must also be changed to reflect the corresponding actions on the data, which can significantly reduce system performance. Consider the following guidelines when planning your indexing strategy:
  • For tables that are updated frequently, use as few indexes as possible.
  • If the table contains a large amount of data but changes are infrequent, use as many indexes as necessary to improve the performance of your queries. However, think carefully before indexing small tables, because an index seek may take longer than simply scanning all rows.
  • For clustered indexes, try to keep the key as short as possible. The best approach is to build a clustered index on columns that have unique values and do not allow NULL. This is why a primary key is often used as the clustered index.
  • The uniqueness of the values in a column affects the performance of the index. In general, the more duplicates a column has, the worse the index performs; the more unique values there are, the better it performs. Use a unique index whenever possible.
  • For a composite index, take into account the order of the columns in the index. Columns that are used in WHERE expressions (for example, WHERE FirstName = 'Charlie') must come first in the index. Subsequent columns should be listed based on the uniqueness of their values (columns with the highest number of unique values come first).
  • You can also create an index on computed columns if they meet certain requirements. For example, the expressions used to obtain the value of the column must be deterministic (always return the same result for a given set of input parameters).
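
As an illustration of the last guideline, here is a minimal sketch of an index on a computed column. The dbo.OrderLines table and all names in it are hypothetical. The example also assumes the session uses the SET options that indexed computed columns require (such as QUOTED_IDENTIFIER ON), which are the defaults in SQL Server Management Studio.

```sql
-- Hypothetical table: LineTotal is computed from two other columns.
-- int * money is deterministic and precise, so the column qualifies.
CREATE TABLE dbo.OrderLines (
    OrderLineId int   NOT NULL PRIMARY KEY,
    Quantity    int   NOT NULL,
    UnitPrice   money NOT NULL,
    LineTotal   AS (Quantity * UnitPrice)
);

-- Ask SQL Server whether the computed column can be indexed (1 = yes).
SELECT COLUMNPROPERTY(OBJECT_ID('dbo.OrderLines'), 'LineTotal', 'IsIndexable') AS is_indexable;

-- Index the computed column like any other column.
CREATE NONCLUSTERED INDEX IX_OrderLines_LineTotal
    ON dbo.OrderLines (LineTotal);
```

An expression that calls a non-deterministic function such as GETDATE() would not qualify, and the CREATE INDEX statement would fail.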
Database queries
Another consideration when designing indexes is what queries are being run against the database. As stated earlier, you must consider how often the data changes. Additionally, the following principles should be used:
  • Try to insert or modify as many rows as possible in one query, rather than doing it in several single queries.
  • Create a non-clustered index on columns that are frequently used as search predicates in WHERE clauses and as join conditions in JOIN.
  • Consider indexing columns used in row lookup queries for exact value matches.

And now, actually:

14 questions about indexes in SQL Server that you were embarrassed to ask

Why can't a table have two clustered indexes?

Want a short answer? A clustered index is a table. When you create a clustered index on a table, the storage engine sorts all rows in the table in ascending or descending order, according to the index definition. A clustered index is not a separate entity like other indexes, but a mechanism for sorting data in a table and facilitating quick access to data rows.
Let's imagine that you have a table containing the history of sales transactions. The Sales table includes information such as order ID, line position within the order, product number, quantity, order number and date, and so on. You create a clustered index on the columns OrderID and LineID, sorted in ascending order, as shown in the following T-SQL code:
CREATE UNIQUE CLUSTERED INDEX ix_oriderid_lineid ON dbo.Sales(OrderID, LineID);
When you run this script, all rows in the table will be physically sorted first by the OrderID column and then by LineID, but the data itself will remain in a single logical block, the table. For this reason, you cannot create two clustered indexes. There is only one table with one set of data, and that table can be sorted in only one specific order.

If a clustered table provides many benefits, then why use a heap?

You're right. Clustered tables are great and most of your queries will perform better on tables that have a clustered index. But in some cases you may want to leave the tables in their natural, pristine state, i.e. in the form of a heap, and create only non-clustered indexes to keep your queries running.
The heap, as you remember, stores data in random order. Typically, the storage subsystem adds data to a table in the order in which it is inserted, but the storage subsystem also likes to move rows around for more efficient storage. As a result, you have no chance to predict in what order the data will be stored.
If the query engine needs to find data without the benefit of a nonclustered index, it will do a full scan of the table to find the rows it needs. On very small tables this is usually not a problem, but as the heap grows in size, performance quickly drops. Of course, a non-clustered index can help by using a pointer to the file, page and row where the required data is stored, which is usually a much better alternative to a table scan. Even so, it is hard to match the benefits of a clustered index when it comes to query performance.
However, the heap can help improve performance in certain situations. Consider a table with a lot of inserts but few updates or deletes. For example, a log table is primarily used to insert values until it is archived. On a heap, you won't see the page splits and data fragmentation that you can get with a clustered index, because the rows are simply added to the end of the heap. Too many page splits can have a significant negative impact on performance. In general, the heap allows you to insert data relatively painlessly, and you won't have to deal with the storage and maintenance overhead of a clustered index.
But a lack of updates and deletes should not be considered the only reason. The way the data is queried is also an important factor. For example, you shouldn't use a heap if you frequently query ranges of data or the data you query often needs to be sorted or grouped.
All this means is that you should only consider using the heap when you're working with very small tables or all of your interaction with the table is limited to inserting data and your queries are extremely simple (and you're using non-clustered indexes anyway). Otherwise, stick with a well-designed clustered index, such as one defined on a simple ascending key field, like a widely used column with IDENTITY.
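
To put this advice into practice, it helps to know which tables are currently heaps. A sketch of one way to list them, followed by a hypothetical log table (dbo.EventLog and its columns are invented for illustration) clustered on an ever-increasing IDENTITY column:

```sql
-- List user tables in the current database that are stored as heaps.
SELECT OBJECT_SCHEMA_NAME(object_id) AS schema_name,
       OBJECT_NAME(object_id)        AS table_name
FROM   sys.indexes
WHERE  type = 0   -- 0 = HEAP
  AND  OBJECTPROPERTY(object_id, 'IsUserTable') = 1;

-- Hypothetical table clustered on a simple ascending key:
-- new rows always land at the end of the clustered index.
CREATE TABLE dbo.EventLog (
    EventId  int IDENTITY(1,1) NOT NULL,
    LoggedAt datetime2         NOT NULL,
    Message  nvarchar(400)     NULL,
    CONSTRAINT PK_EventLog PRIMARY KEY CLUSTERED (EventId)
);
```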

How do I change the default index fill factor?

Changing the default index fill factor is one thing. Understanding how the default value works is another matter. But first, take a few steps back. The index fill factor determines how much of each leaf-level page is filled with index data before a new page is started. For example, if the fill factor is set to 90, then when the index is built each leaf page is filled to 90%, and the remaining 10% is left free for future growth.
By default, the index fill factor in SQL Server is 0, which is the same as 100. As a result, all new indexes automatically inherit this setting unless you specify a different value in your code or change the default behavior. You can use SQL Server Management Studio to adjust the default value or run the system stored procedure sp_configure. For example, the following set of T-SQL commands sets the value to 90 (you must first enable advanced options):
EXEC sp_configure 'show advanced options', 1;
GO
RECONFIGURE;
GO
EXEC sp_configure 'fill factor', 90;
GO
RECONFIGURE;
GO
After changing the index fill factor value, you need to restart the SQL Server service. You can then check the configured value by running sp_configure without the second argument:
EXEC sp_configure 'fill factor'
GO
This command should return a value of 90. As a result, all newly created indexes will use this value. You can test this by creating an index and querying its fill factor value:
USE AdventureWorks2012; -- your database
GO
CREATE NONCLUSTERED INDEX ix_people_lastname ON Person.Person(LastName);
GO
SELECT fill_factor FROM sys.indexes
WHERE object_id = object_id('Person.Person') AND name = 'ix_people_lastname';
In this example, we created a non-clustered index on the Person table in the AdventureWorks2012 database. After creating the index, we can get its fill factor value from the sys.indexes catalog view. The query should return 90.
However, let's imagine that we deleted the index and created it again, but now we specified a specific fill factor value:
CREATE NONCLUSTERED INDEX ix_people_lastname ON Person.Person(LastName) WITH (fillfactor=80);
GO
SELECT fill_factor FROM sys.indexes
WHERE object_id = object_id('Person.Person') AND name = 'ix_people_lastname';
This time we added the WITH clause and the fillfactor option to our CREATE INDEX statement and specified the value 80. The SELECT statement now returns that value.
So far, everything has been pretty straightforward. Where you can really get burned is when you create an index that relies on the default fill factor, assuming you know what that default is. For example, someone tinkers with the server settings and stubbornly sets the index fill factor to 20. Meanwhile, you continue to create indexes, assuming the default is 0. Unless you check the server setting, or create an index and then check its value as we did in our examples, you have no way of knowing which fill factor your indexes received. Otherwise, you will have to wait until query performance drops so much that you begin to suspect something.
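One way to avoid that surprise is to read the server-wide setting directly before creating indexes. The query below is a small sketch that reads the sys.configurations catalog view. The value column holds the configured setting, and value_in_use holds the setting currently in effect. The two can differ until the service is restarted.

```sql
-- Server-wide default fill factor: configured vs. currently running value.
SELECT name, value, value_in_use
FROM sys.configurations
WHERE name = N'fill factor (%)';
```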
Another issue you should be aware of is rebuilding indexes. As with creating an index, you can specify the fill factor when you rebuild it. However, unlike CREATE INDEX, a rebuild does not use the server default, despite what you might expect. If you do not explicitly specify a fill factor, SQL Server reuses the value the index had before the rebuild. For example, the following ALTER INDEX statement rebuilds the index we just created:
ALTER INDEX ix_people_lastname ON Person.Person REBUILD;
GO
SELECT fill_factor FROM sys.indexes
WHERE object_id = OBJECT_ID('Person.Person') AND name = 'ix_people_lastname';
When we check the fill factor value, we will get a value of 80, because that is what we specified when we last created the index. The default value is ignored.
As you can see, changing the index fill factor is not that difficult. It is much harder to know the current value and to understand when it is applied. If you always specify the fill factor explicitly when creating and rebuilding indexes, you always know the result, and you no longer have to worry about someone else changing the server settings and causing indexes to be created with a ridiculously low fill factor.
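As an illustration of being explicit, the sketch below rebuilds the example index with a stated fill factor and then lists the stored fill factor of every index on user tables in the current database. The value 90 is only an example, not a recommendation.

```sql
-- Rebuild with an explicit fill factor instead of relying on the stored value.
ALTER INDEX ix_people_lastname ON Person.Person
REBUILD WITH (FILLFACTOR = 90);
GO

-- Audit: stored fill factor of all indexes on user tables (heaps excluded).
SELECT OBJECT_SCHEMA_NAME(i.object_id) AS table_schema,
       OBJECT_NAME(i.object_id)        AS table_name,
       i.name                          AS index_name,
       i.fill_factor
FROM sys.indexes AS i
WHERE i.index_id > 0
  AND OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
ORDER BY i.fill_factor, table_name, index_name;
```

A fill_factor of 0 in this listing means the index was filled completely, the same as 100.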

Is it possible to create a clustered index on a column that contains duplicates?

Yes and no. Yes, you can create a clustered index on a key column that contains duplicate values. No, the key values cannot remain non-unique. Let me explain. If you create a non-unique clustered index on a column, the storage engine adds a uniquifier to each duplicate value to ensure uniqueness, so that every row in the clustered table can be identified.
For example, you might decide to create a clustered index on the LastName column of a customer table. The column contains the values Franklin, Hancock, Washington, and Smith. Then you insert the values Adams, Hancock, Smith, and Smith again. But the key value must be unique, so the storage engine modifies the duplicates so that they look something like this: Adams, Franklin, Hancock, Hancock1234, Smith, Smith4567, Smith5678, and Washington.
At first glance, this approach seems fine, but the integer uniquifier increases the size of the key, which can become a problem if there are many duplicates, because these values become the basis of nonclustered indexes and foreign key references. For these reasons, you should create unique clustered indexes whenever possible. If that is not possible, at least choose columns with a very high proportion of unique values.
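One possible way to keep the LastName ordering without relying on a uniquifier is to append a column that makes the key unique. The sketch below assumes a hypothetical dbo.Customers table. The table and index names are illustrative and are not taken from the article.

```sql
-- Hypothetical table for illustration only.
CREATE TABLE dbo.Customers (
    CustomerID int IDENTITY(1,1) NOT NULL,
    LastName   nvarchar(50)      NOT NULL
);
GO

-- Non-unique variant: duplicate LastName values would receive a hidden uniquifier.
-- CREATE CLUSTERED INDEX cix_customers_lastname ON dbo.Customers(LastName);

-- Unique variant: the identity column makes every key value distinct.
CREATE UNIQUE CLUSTERED INDEX cix_customers_lastname_id
    ON dbo.Customers(LastName, CustomerID);
```

The trade-off is a wider clustered key on every row, so this is worth weighing against simply clustering on CustomerID alone.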

How is the table stored if a clustered index has not been created?

SQL Server supports two types of tables: clustered tables, which have a clustered index, and heap tables, or simply heaps. Unlike clustered tables, the data in a heap is not sorted in any way. In essence, it is a pile (heap) of data. When you add a row to such a table, the storage engine simply appends it to the end of a page. When the page is full, subsequent rows go to a new page. In most cases, you'll want to create a clustered index on a table to take advantage of sorted data and faster queries (try looking up a phone number in an unsorted address book). However, if you choose not to create a clustered index, you can still create nonclustered indexes on the heap. In this case, each index row holds a pointer to a heap row. The pointer consists of the file ID, the page number, and the row number on the page.
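To find out which tables in a database are stored as heaps, you can look for the heap entry that sys.indexes keeps for every table without a clustered index. A minimal sketch for the current database:

```sql
-- User tables stored as heaps (no clustered index).
SELECT OBJECT_SCHEMA_NAME(i.object_id) AS table_schema,
       OBJECT_NAME(i.object_id)        AS table_name
FROM sys.indexes AS i
WHERE i.type_desc = N'HEAP'
  AND OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
ORDER BY table_schema, table_name;
```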

How do unique constraints and primary keys relate to table indexes?

A primary key and a unique constraint both ensure that the values in a column are unique. You can create only one primary key per table, and it cannot contain NULL values. You can create several unique constraints on a table, and each of them allows a single NULL.
When you create a primary key, the storage engine also creates a unique clustered index, provided a clustered index does not already exist. However, you can override the default behavior and have a nonclustered index created instead. If a clustered index already exists when you create the primary key, a unique nonclustered index is created.
When you create a unique constraint, the storage engine creates a unique, nonclustered index. However, you can specify the creation of a unique clustered index if one has not been created previously.
In general, a unique value constraint and a unique index are the same thing.
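The rules above can be observed on a small test table. The sketch below uses a hypothetical dbo.DemoOrders table. All object names are made up for illustration. It overrides the primary key default, adds a unique constraint, places the clustered index on a different column, and then lists the indexes SQL Server created.

```sql
-- Hypothetical table for illustration only.
CREATE TABLE dbo.DemoOrders (
    OrderID     int         NOT NULL,
    OrderNumber varchar(20) NOT NULL,
    OrderDate   date        NOT NULL,
    -- Override the default: back the primary key with a nonclustered index.
    CONSTRAINT PK_DemoOrders PRIMARY KEY NONCLUSTERED (OrderID),
    -- A unique constraint is backed by a unique nonclustered index by default.
    CONSTRAINT UQ_DemoOrders_OrderNumber UNIQUE (OrderNumber)
);
GO

-- The clustered index goes on another column.
CREATE CLUSTERED INDEX cix_DemoOrders_OrderDate ON dbo.DemoOrders(OrderDate);
GO

-- Inspect what was created.
SELECT name, type_desc, is_unique, is_primary_key, is_unique_constraint
FROM sys.indexes
WHERE object_id = OBJECT_ID(N'dbo.DemoOrders');
```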

Why are clustered and non-clustered indexes called B-tree in SQL Server?

Basic indexes in SQL Server, clustered or nonclustered, are distributed across sets of pages called index nodes. These pages are organized in a specific hierarchy with a tree structure called a balanced tree. At the top level there is the root node, at the bottom there are the leaf nodes, with intermediate nodes between the top and bottom levels, as shown in the figure:


The root node provides the main entry point for queries attempting to retrieve data through the index. Starting from this node, the query engine initiates a navigation down the hierarchical structure to the appropriate leaf node containing the data.
For example, imagine that a query requests the rows with a key value of 82. The query engine starts at the root node, which points to the appropriate intermediate node, in our case 1-100. From intermediate node 1-100 it moves to node 51-100, and from there to leaf node 76-100. If this is a clustered index, the leaf node contains the data of the row whose key equals 82. If this is a nonclustered index, the leaf node contains a pointer to the row in the clustered table or to a specific row in the heap.
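The number of levels such a lookup has to pass through can be read from metadata. As a small sketch, assuming the ix_people_lastname index created earlier exists, the INDEXPROPERTY function reports the depth of the tree, from the root down to the leaf level:

```sql
-- Number of levels in the index tree (root, intermediate levels, leaf).
SELECT INDEXPROPERTY(
           OBJECT_ID(N'Person.Person'),
           N'ix_people_lastname',
           'IndexDepth') AS index_depth;
```

Even large indexes tend to be only a few levels deep, which is why a lookup through the tree touches so few pages.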

How can an index even improve query performance if you have to traverse all these index nodes?

First, indexes don't always improve performance. Too many incorrectly created indexes turn the system into a quagmire and degrade query performance. It's more accurate to say that if indexes are carefully applied, they can provide significant performance gains.
Think of a huge book dedicated to SQL Server performance tuning (the paper version, not the electronic one). Imagine you want to find information about configuring Resource Governor. You can run your finger down every page of the entire book, or you can open the index at the back and find the exact page number of the information you need (provided the book has been indexed properly). This will certainly save you significant time, even though you must first consult a completely different structure (the index) to get the information you need from the primary structure (the book).
Like a book index, an index in SQL Server allows you to run precise queries on the data you need instead of completely scanning all the data contained in a table. For small tables, a full scan is usually not a problem, but large tables take up many pages of data, which can result in significant query execution time unless an index exists to allow the query engine to immediately obtain the correct location of the data. Imagine getting lost at a multi-level road junction in front of a major metropolis without a map and you'll get the idea.

If indexes are so great, why not just create one on every column?

No good deed goes unpunished. At least that's the case with indexes. Of course, indexes work great as long as you are running SELECT queries, but as soon as frequent INSERT, UPDATE, and DELETE statements come into play, the landscape changes very quickly.
When you request data with a SELECT statement, the query engine finds the index, moves through its tree structure, and locates the data it is looking for. What could be simpler? But things change when you issue a modification statement such as UPDATE. Yes, for the first part of the statement, the query engine can again use the index to locate the row being modified, and that's good news. And if the change is a simple one that does not touch key columns, the process is completely painless. But if the change causes the pages containing the data to split, or changes the value of a key column so that the row has to move to another index node, the index may need to be reorganized, which affects all associated indexes and operations and results in a widespread drop in performance.
Similar processes occur with a DELETE statement. An index can help locate the data being deleted, but deleting the data itself may result in page reshuffling. As for INSERT, the main enemy of all indexes: you start adding a large amount of data, the indexes have to be changed and reorganized, and everyone suffers.
So consider the types of queries run against your database when deciding what kinds of indexes to create and how many. More doesn't mean better. Before adding a new index to a table, consider not only the cost of the queries it supports, but also the disk space it consumes and the cost of maintaining it, which can have a domino effect on other operations. Your index design strategy is one of the most important aspects of your implementation and should take many factors into account, from the size of the index and the number of unique values to the types of queries the index will support.
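One way to put numbers on this trade-off is to compare how often each index is read with how often it has to be maintained. The sketch below queries sys.dm_db_index_usage_stats for the current database. Note that these counters are reset when the SQL Server service restarts, so they only reflect activity since the last start.

```sql
-- Reads (seeks, scans, lookups) vs. writes (updates) per index.
SELECT OBJECT_NAME(i.object_id) AS table_name,
       i.name                   AS index_name,
       us.user_seeks,
       us.user_scans,
       us.user_lookups,
       us.user_updates
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS us
       ON us.object_id   = i.object_id
      AND us.index_id    = i.index_id
      AND us.database_id = DB_ID()
WHERE OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
ORDER BY us.user_updates DESC;
```

An index with many updates and few or no seeks, scans, or lookups is a candidate for closer review.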

Is it necessary to create a clustered index on a column with a primary key?

You can create a clustered index on any column that meets the required conditions. It is true that a clustered index and a primary key constraint are a match made in heaven: when you create a primary key, a clustered index is created automatically if one does not already exist. However, you may decide that a clustered index would perform better elsewhere, and often that decision will be justified.
The main purpose of a clustered index is to sort all the rows in your table based on the key column specified when defining the index. This provides quick search and easy access to table data.
A table's primary key can be a good choice because it uniquely identifies each row in the table without requiring additional data. In some cases, the best choice is a surrogate primary key, which is not only unique but also small and sequentially increasing, making nonclustered indexes based on it more efficient. The query optimizer also likes this combination of a clustered index and a primary key because joining tables on it is faster than joining in some other way that does not use the primary key and its clustered index. Like I said, it's a match made in heaven.
Finally, it is worth noting that there are several aspects to consider when creating a clustered index: how many nonclustered indexes will be based on it, how often the values in the key column will change, and how large the key is. When the values in the clustered index columns change, or the index does not perform as expected, all other indexes on the table can be affected. A clustered index should be based on the most stable column, one whose values increase in a specific order rather than change at random. The index must support queries against the table's most frequently accessed data, so that those queries take full advantage of the fact that the data is sorted and available at the leaf nodes of the index. If the primary key fits this scenario, use it. If not, choose a different set of columns.

What if you index a view, is it still a view?

A view is a virtual table that generates data from one or more tables. Essentially, it is a named query that retrieves data from the underlying tables when you query the view. You can improve query performance by creating a clustered index and nonclustered indexes on the view, much as you create indexes on a table. The main caveat is that you must first create a unique clustered index; only then can you create nonclustered ones.
When an indexed view (materialized view) is created, the view definition itself remains a separate entity. It is, after all, just a hardcoded SELECT statement stored in the database. But the index is a completely different story. When you create a clustered or nonclustered index on a view, the data is physically saved to disk, just like a regular index. In addition, when data changes in the underlying tables, the view's index is updated automatically (so you may want to avoid indexing views on tables that change frequently). In any case, the view remains a view of the underlying tables, but one that has been materialized, with indexes to go with it.
Before you can create an index on a view, the view must meet several restrictions. For example, a view can reference only base tables, not other views, and those tables must be in the same database. There are many other restrictions, so be sure to check the SQL Server documentation for all the dirty details.
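As a rough illustration of the steps, the sketch below creates a schema-bound aggregate view over the Sales.SalesOrderDetail table from AdventureWorks and then materializes it with a unique clustered index. The view and index names are hypothetical. It also assumes the session uses the SET options required for indexed views (the defaults in SQL Server Management Studio satisfy them).

```sql
-- The view must be schema-bound and use two-part table names.
CREATE VIEW Sales.vOrderQuantities
WITH SCHEMABINDING
AS
SELECT SalesOrderID,
       SUM(OrderQty) AS TotalQty,
       COUNT_BIG(*)  AS LineCount   -- required when the view uses GROUP BY
FROM Sales.SalesOrderDetail
GROUP BY SalesOrderID;
GO

-- The first index on a view must be unique and clustered.
CREATE UNIQUE CLUSTERED INDEX cix_vOrderQuantities
    ON Sales.vOrderQuantities(SalesOrderID);
GO
```

Once the clustered index exists, additional nonclustered indexes can be created on the view in the usual way.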

Why use a covering index instead of a composite index?

First, let's make sure we understand the difference between the two. A composite index is simply a regular index that contains more than one key column. You might use multiple key columns to guarantee that each row in a table is unique, for example to enforce a multi-column primary key, or to optimize frequently executed queries that filter on several columns. In general, however, the more key columns an index contains, the less efficient it becomes, which means that composite indexes should be used judiciously.
As stated, a query can benefit greatly if all the data it needs is available at the leaf level of the index it uses. This is not a problem for a clustered index because all the data is already there (which is why it's so important to think carefully when you create a clustered index). But the leaf level of a nonclustered index contains only the key columns. To access any other data, the query engine must take additional steps, which can add significant overhead to your queries.
This is where the covering index comes to the rescue. When you define a nonclustered index, you can specify included columns in addition to the key columns. For example, let's say your application frequently queries the OrderID and OrderDate columns in the Sales table:
SELECT OrderID, OrderDate FROM Sales WHERE OrderID = 12345;
You can create a compound non-clustered index on both columns, but the OrderDate column will only add index maintenance overhead without serving as a particularly useful key column. The best solution would be to create a covering index on the key column OrderID and additionally included column OrderDate:
CREATE NONCLUSTERED INDEX ix_orderid ON dbo.Sales(OrderID) INCLUDE (OrderDate);
This avoids the disadvantages of indexing redundant key columns while keeping the benefit of having the data at the leaf level when queries run. The included column is not part of the key, but its data is stored in the leaf nodes of the index. This can improve query performance without widening the index key. In addition, included columns are subject to fewer restrictions than key columns.

Does the number of duplicates in a key column matter?

When you create an index, you must try to reduce the number of duplicates in your key columns. Or more precisely: try to keep the repetition rate as low as possible.
If you are working with a composite index, duplication applies to all key columns taken together. A single column can contain many duplicate values, but there should be minimal repetition across the combination of index columns. For example, if you create a composite nonclustered index on the FirstName and LastName columns, you can have many John values and many Doe values, but you want as few John Doe combinations as possible, preferably just one.
The ratio of unique values in a key column is called index selectivity. The more unique values there are, the higher the selectivity: a unique index has the greatest possible selectivity. The query engine really likes columns with high selectivity, especially if those columns appear in the WHERE clauses of your most frequently executed queries. The more selective the index, the faster the query engine can reduce the size of the result set. The downside, of course, is that columns with relatively few unique values will rarely be good candidates for indexing.
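Selectivity can be estimated directly from the data as the number of distinct key values divided by the number of rows. The sketch below does this for the Person.Person table from AdventureWorks, first for LastName alone and then for the FirstName and LastName pair. A result close to 1 means nearly every key value is unique.

```sql
-- Selectivity of a single column.
SELECT COUNT(DISTINCT LastName) * 1.0 / NULLIF(COUNT(*), 0) AS lastname_selectivity
FROM Person.Person;

-- Selectivity of a composite key: count distinct column pairs.
SELECT (SELECT COUNT(*)
        FROM (SELECT DISTINCT FirstName, LastName
              FROM Person.Person) AS pairs) * 1.0
       / NULLIF((SELECT COUNT(*) FROM Person.Person), 0) AS fullname_selectivity;
```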

Is it possible to create a non-clustered index on only a specific subset of a key column's data?

By default, a nonclustered index contains one row for each row in the table. Of course, you can say the same about a clustered index, given that such an index is the table. But for a nonclustered index, the one-to-one relationship is an important concept because, starting with SQL Server 2008, you can create a filtered index that limits the rows it includes. A filtered index can improve query performance because it is smaller and carries filtered statistics that are more accurate than full-table statistics, which leads to better execution plans. A filtered index also requires less storage space and has lower maintenance costs. The index is updated only when data that matches the filter changes.
In addition, a filtered index is easy to create. In the CREATE INDEX statement, you just specify the filter condition in a WHERE clause. For example, you can filter all rows containing NULL out of the index, as shown in the code:
CREATE NONCLUSTERED INDEX ix_trackingnumber ON Sales.SalesOrderDetail(CarrierTrackingNumber) WHERE CarrierTrackingNumber IS NOT NULL;
We can, in fact, filter out any data that is not important to critical queries. But be careful, because SQL Server imposes several restrictions on filtered indexes, such as the inability to create a filtered index on a view, so read the documentation carefully.
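To confirm that an index really is filtered and to see the stored predicate, you can check the has_filter and filter_definition columns of sys.indexes. A short sketch, assuming the ix_trackingnumber index from the example above has been created:

```sql
-- Shows whether the index is filtered and the predicate it was created with.
SELECT name, has_filter, filter_definition
FROM sys.indexes
WHERE object_id = OBJECT_ID(N'Sales.SalesOrderDetail')
  AND name = N'ix_trackingnumber';
```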
It may also be that you can achieve similar results by creating an indexed view. However, a filtered index has several advantages, such as the ability to reduce maintenance costs and improve the quality of your execution plans. Filtered indexes can also be rebuilt online. Try this with an indexed view.

A few more words from the translator

The purpose of publishing this translation on Habrahabr was to tell or remind you about the SimpleTalk blog from RedGate.
It publishes many entertaining and interesting posts.
I am not affiliated with RedGate, its products, or their sale.

As promised, books for those who want to know more
I personally recommend three very good books (the links lead to the Kindle versions in the Amazon store):

    Microsoft SQL Server 2012 T-SQL Fundamentals (Developer Reference)
    Author Itzik Ben-Gan
    Publication Date: July 15, 2012
    The author, a master of his craft, provides basic knowledge about working with databases.
    If you've forgotten everything or never knew, it's definitely worth reading.

    Indexes are database objects that map every value in a table column to the ROWIDs of all the rows in the table that contain that value.

    ROWID is a pseudo-column that uniquely identifies a row in a table and describes the exact physical location of that row. Based on this information, Oracle can subsequently find the data associated with the table row. Each time a row is moved, exported, imported, or subjected to any other operation that changes its location, its ROWID changes because the row occupies a different physical position. Storing a ROWID requires 80 bits (10 bytes). A ROWID consists of four components: object number (32 bits), relative file number (10 bits), block number (22 bits), and row number (16 bits). These identifiers are displayed as 18-character strings indicating the location of the data in the database, with each character encoded in base-64 using the characters A-Z, a-z, 0-9, +, and /. The first six characters are the data object number, the next three the relative file number, the next six the block number, and the last three the row number.

    Example:

    SELECT fam, ROWID FROM student;

    FAM     ROWID
    ------  ------------------
    IVANOV  AAAA3kAAGAAAAGsAAA
    PETROV  AAAA3kAAGAAAAGsAAB
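The four components described above can also be extracted individually instead of being read off the 18-character string. The following is a sketch in Oracle SQL, using the DBMS_ROWID package and the student table from the example. The column aliases are arbitrary.

```sql
-- Break each ROWID into its four components.
SELECT fam,
       DBMS_ROWID.ROWID_OBJECT(ROWID)       AS object_no,
       DBMS_ROWID.ROWID_RELATIVE_FNO(ROWID) AS file_no,
       DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID) AS block_no,
       DBMS_ROWID.ROWID_ROW_NUMBER(ROWID)   AS row_no
FROM student;
```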

    In an Oracle database, indexes are used for different purposes: to ensure the uniqueness of values, to improve the performance of searching for records in a table, and so on. Performance improves when the search criteria for the table reference the indexed column or columns. In Oracle, indexes can be created on any table column except LONG columns. Indexes can make the difference between a sluggish application and a high-performance one, especially when working with large tables. However, before deciding to create an index, you need to weigh the pros and cons for system performance. Performance will not improve if you simply create an index and forget about it.

    Although the biggest performance improvement comes from indexing a column in which all values are unique, you can get similar results for columns that contain duplicate or NULL values. Column values do not have to be unique for an index to be created. Here are some recommendations to help you achieve the desired performance boost with a standard index; we'll also look at the balance between performance and disk space consumption when creating an index.

    Using indexes to look up information in tables can provide significant performance improvements over scanning tables whose columns are not indexed. However, choosing the right index is not at all easy. Of course, a column whose values are all unique is the preferred candidate for a B-tree index, but a column that does not meet this requirement is still a good candidate as long as no more than about 10% of its rows contain identical values. "Switch" or "flag" columns, for example those that store a person's gender, are not suitable for B-tree indexes. Columns that store only a small number of distinct values, or some kind of indicator such as "valid" or "invalid", "active" or "inactive", "yes" or "no", are not suitable either. Finally, reverse-key indexes are generally used where Oracle Parallel Server is installed and running and you need to maximize the level of parallelism in the database.