Query tuning

What is Query Tuning?

SQL can be said to be one of the indispensable skills for every programmer. Writing a query is actually quite simple. However, just writing the query is not enough; ensuring that the query is always executed effectively in practice is the difficult problem. That is why we must evaluate query performance and optimize the query ( Query Tuning ) if the query performance is low.

Query Tuning can be understood as rewriting a query so that it is executed faster, more efficiently, and is an inevitable thing to do if the current query has too low performance. So how to optimize the query, how to optimize? Let's find out together!

Query Processing và Query Execution

To learn how to optimize, we first need to understand how the process of processing and executing a query takes place.

As you can see, the process includes:

Step 1 (Parse and binding): The query will be parsed into one parse-treeand bound again into one.algebrized-tree
- Input: sql statement
- Output: relational algebra expression.
- For example:
For example:
Input
```
SELECT balance FROM account 
WHERE balance < 2500
```
Output
𝑝𝑏𝑎𝑙𝑎𝑛𝑐It is<2500(Pi𝑏𝑎𝑙𝑎𝑛𝑐It is(𝑎𝑐𝑐𝑂in𝑛𝑡))pbalance<2500( Pibalance(account))
or
Pi𝑏𝑎𝑙𝑎𝑛𝑐It is(𝑝𝑏𝑎𝑙𝑎𝑛𝑐It is<2500(𝑎𝑐𝑐𝑂in𝑛𝑡))Pibalance( pbalance<2500(account))
Step 2: Optimization (Query Optimization)
- Input: relational algebra expression
- Output: query plan
In the following section, I will talk more about this step.
Step 3: Query Execution
- Input: query plan
- Output: query results

Query Optimization

The purpose of this step is to find the best execution plan for the query.

Note:
Input: relational algebra expression
Output: query plan

The optimizer lists all possible plans, gets information about the current state of the database, evaluates the cost of each plan, and then selects the best plan. . This process includes 3 steps:

1. Equivalent transformation

Two algebraic relational expressions are considered equivalence if they both produce the same result on the same legal data set (database that meets all integrity constraints specified in the strategy). database map)

The equivalence transformation rule is to transform a relational algebra expression into another equivalent relational algebra expression.

2. Create query plans

Algebraic expressions are not query plans
An algebraic expression can generate multiple query plans

In this step, we will add:

The indexes will be used, for joins or selects,...
Algorithms used: sort, merge or join,...
...

For example:

3. Estimate costs

The purpose of this step is to find the best query plan

These are costs ƯỚC TÍNH, not actual costs

This cost is estimated based on factors such as:

Number of tuples per relation
Number of blocks on disk for each relation
Number of distinct values per field
histogram gives the value of each field

Based on the estimated cost, the "cheapest" query plan will be selected for the query optimization process.

Why Query Tuning?

After learning about Query Processing, do you think Query optimization is enough to optimize queries? The answer is not enough, because:

Transformations are only part of the query plan.
Only part of the plans are used.
Costs are estimates only.

If you perform Query Tuning to optimize the query right from the beginning of Query Processing, the Query Optimization process will be simplified and more effective. So we still need to do Query Tuning.

When is Query Tuning needed?

When queries run too slow

Queries accessing too much memory (retrieving from multiple tables)
Indexed queries are not used.

Some ways to Query Tuning

Get only the data you need:

Avoid SELECT *

Listing specific fields when using SELECT will help eliminate unnecessary fields from the query.

Name the result when SELECT to check the EXISTS condition

 SELECT driverslicensenr, name                                   
 FROM Drivers                                             
 WHERE EXISTS                                             
    (SELECT 'check'                                              
    FROM Fines                                               
    WHERE fines.driverslicensenr = drivers.driverslicensenr);

Avoid DISTINCT
DISTINCT will increase query time, so if you don't need it, it's best not to use it, especially for unique fields.
Limit the data you return by using TOP, LIMIT, or ROWNUM

Limit the use of nested queries

Using subqueries may not be effective if these subqueries are not correlated (attributes of the outer query are not used in subqueries).

As in the example in avoiding SELECT *, using INNER JOIN is more effective than using subquery:

 SELECT driverslicensenr, name                                   
 FROM Drivers                                             
 INNER JOIN fines ON fines.driverslicensenr = drivers.driverslicensenr;

Don't use HAVING when WHERE alone is enough

HAVING is often used with GROUP BY to conditionally limit the groups returned. However, if you use this clause in your query, the index will not be used, which can slow down the query.

SELECT state, COUNT(*)
FROM Drivers
WHERE state IN ('GA', 'TX')
GROUP BY state
ORDER BY state

SELECT state, COUNT(*)
FROM Drivers
GROUP BY state
HAVING state IN ('GA', 'TX')
ORDER BY state

The two queries above will return the same results on the same database but query_01will be faster because query_01they will set conditions to limit the returned results and query_02synthesize all results then use HAVING to remove the returned results. state other than "GA" and "TX".

Use `clustering indexes`to join

SELECT Employee.ssnum
FROM Employee, Student
WHERE Employee.name = Student.name

SELECT Employee.ssnum
FROM Employee, Student
WHERE Employee.ssnum = Student.ssnum

query_bwill be more effective query_bbecause:

Joining on two cluster indexes will allow merge join => better query efficiency.
Integer comparison is faster than string comparison

Note some system issues

LIKE and index
When using the LIKE operator, index will not be used if the pattern contains %or _. Furthermore, this query can also retrieve a lot of unnecessary records.

OR and index

Some systems never use index when using the OR operator. The reason is because indexes are used to quickly locate or look up data without having to search every row in the database.

SELECT driverslicensenr, name
FROM Drivers
WHERE driverslicensenr = 123456
    OR driverslicensenr = 678910
    OR driverslicensenr = 345678;

can be replaced with

SELECT driverslicensenr, name
FROM Drivers
WHERE driverslicensenr IN (123456, 678910, 345678);

NOT and index

Similar to OR, index can also be used with the NOT operator instead

SELECT driverslicensenr, name
FROM Drivers
WHERE NOT (year > 1980);

Let's write:

SELECT driverslicensenr, name
FROM Drivers
WHERE year <= 1980;

AND and index

Similar to OR and AND, it will slow down the query in some cases where it is used ineffectively such as:

SELECT driverslicensenr, name
SELECT driverslicensenr, name
FROM Drivers
WHERE year >= 1960 AND year <= 1980;

you can change it to:

SELECT driverslicensenr, name
FROM Drivers
WHERE year BETWEEN 1960 AND 1980;

Order in the FROM clause: Ordering in FROM is not appropriate and will slow down your query unless the query has JOIN too many tables (8 tables or more).

PreviousPerformance Tuning NextInterview ques

Last updated 1 year ago