23 Oct Optimizing SQL Queries for Screaming Fast Applications
The performance of your application is highly dependent on how well your database is design. Designing high performing and normalized SQL Databases can be challenging. At Daxima we’ve designed hundreds of applications and databases and have identified some tried and true steps that will help you design the most normalized and fastest performing database.
If you’ve ever designed a SQL database, then you know the importance of indexes. Indexes, when done correctly, can dramatically increase read times during queries. Tables with large amounts of data specially should always be indexed.
However, creating the correct index can be tricky. Here are a few tips:
- Create indexes on fields that are the most selective in the table in order.
- The primary key is usually the most selective index followed by any types of foreign key.
- Do not use indexes on a column where functions are used commonly in the where clause.
- Permitted values, dates, and booleans are columns you would not place indexes on.
- Any change in DB structure can cause problems with your indexes. Make sure to keep them updated at all times.
- Indexes can get fragmented over time – so make sure to have a process in place to either defragment or rebuild the indexes on a regular basis.
Querying data that joins multiple tables can sometimes retrieve redundant data. Keys are usually one of the most types of data that’s redundantly pulled in many queries. To avoid this, only pull the columns you need and avoid the asterisk (*) character as much as possible. When creating queries with possible redundant rows, left join multiple tables using DISTINCT or GROUP BY to get unique rows. Also, If you only want a certain amount of rows, use the LIMIT or SET ROWCOUNT functions. Avoid using a LIKE operator and the wildcard character (%) as they may slow down the query execution especially if there is an index on the field.
A correlated subquery is one that uses values from the outer query. This can be inefficient with the subquery evaluatd once for each row processed by the outer query. Correlated subqueries run all the rows from the subquery first, then the rows from the main. These queries use the EXISTS in the WHERE clause which can find results that are not necessary. Instead use INNER JOIN if possible to speed up the execution.
Temp tables can help speed up a query, but only in certain situations. Temp tables are sometimes necessary when querying large data that needs to be compressed into a smaller subset. This will only help when querying a very large table (1 million rows plus) and you need only 25% or less of that data. If your query doesn’t meet the above requirement, please don’t use a temp table.
The HAVING clause only returns rows where aggregate values meet the specified conditions. It is very similar to the WHERE clause which is why it can get used incorrectly. When HAVING is used when it should be a WHERE, it will slow down your query as it only gets executed after all the rows are returned. You should always use WHERE unless a GROUP BY field is filtered, then use HAVING.