Best Practices

Web Application Performance Tuning

September 12th, 2020 - Vincent Luciani

These best practices will help you tune the performance of your web application. Use the quick links below to jump directly to the type of information you are interested in.



General


Use of caching
Use application caching for your API (specific to the programming language).
If you use application caching, implement a cache flush operation and a way to inspect what is cached.
Cache duration: determine the maximum time you can wait for fresh information and set the cache expiration accordingly.
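The caching ideas above can be sketched in a few lines. This is a minimal single-process TTL cache (class and method names are hypothetical; a real API would more likely use functools.lru_cache, cachetools or Redis), showing the three operations discussed: expiration, cache flush and cache information.

```python
import time

# Minimal single-process TTL cache sketch (all names are hypothetical).
class TTLCache:
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expiry_timestamp, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expiry, value = entry
        if time.monotonic() > expiry:  # stale entry: evict and report a miss
            del self._store[key]
            return None
        return value

    def set(self, key, value):
        self._store[key] = (time.monotonic() + self.ttl, value)

    def flush(self):
        # the "cache flush" operation: drop everything at once
        self._store.clear()

    def info(self):
        # the "cache information" operation: what is cached and until when
        return {key: expiry for key, (expiry, _) in self._store.items()}
```

Choosing ttl_seconds is the "cache duration" decision above: the longest delay you can tolerate before serving fresh information.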
Different types of caching
If two pieces of information can tolerate different refresh delays, put them on two separate API calls and give each its own caching rules. Example: translation of labels versus product data.
Hosting provider bandwidth
Check that the service provider's network is well connected to the outside world:
- check the guaranteed connection speed in the offer
- test a sample with webpagetest.org
- for an application, have the server near your visitors (ask for this specifically)
- if visitors come from all around the world, use load balancing based on their location
Less information exchanged
Minimize the amount of data sent between the servers at each request.
Server location
Check that you did not put a component of the application in the wrong location: for example, the database in Asia and the API in US-east.
If your company uses VPCs, make sure all components are on the same VPC. If that is not possible, check that there is no firewall between them and check the connection speed between the two VPCs (is there a peering?).
Testing
Test access to the application from different parts of the world (using webpagetest.org).
Check the latest hardware offers from your cloud provider
Be aware of new types of hardware offered by your cloud provider. For example, they may offer a new generation of CPU that is more stable, more efficient or faster.
If possible, get the following information from the cloud provider:
- amount, type and speed of RAM (DDR is double data rate and is faster)
- CPU: speed, number of cores, size of the L1 and L2 caches (L1 is smaller and closer to the core than L2, so it is faster), architecture (64-bit is faster than 32-bit), etc.
- disk: speed and type (preferably SSD, but be careful: some providers may still offer HDD)
Application using more than one disk in parallel
Especially interesting for databases or processes that write heavily to disk: use more than one disk and parallelize the writes.
Use multiple cores
If you have more than one core, make sure you actually use them. If the server runs on a single thread, run several server processes in parallel.
Software as a service on the cloud
Especially interesting for databases, but this can be used for APIs too. Search for "best database as a service" on a search engine. Use software as a service instead of installing the application server on your own machine.
Load balancing
Balance the load between several servers. Be careful if your application has a login: in that case, you need to synchronize the sessions between the servers.
Use up-to-date software versions
If you use a serverless architecture or host your database in the cloud, this does not apply. But if you run the application yourself, make sure your servers are up to date (web server, database server, runtime environment, programming language).
Check the influence of antimalware and other services
Make sure you choose antimalware that does not kill your application when running on the same server.
Monitor how much CPU, RAM and disk your antimalware is using.
Check whether other services running on the same server can be removed.
Check that the mechanisms connecting to the backend are in place and tuned
Set up timeouts to avoid blocking the server with transactions that take too long.
Add a retry mechanism with a maximum number of retries to avoid blocking the server.
Make sure the timeout on the application is consistent with the timeout on the backend: timeout calling > timeout called.
Fine-tune the max pool size when using connection pools: the maximum number of connections that can be created to satisfy client requests.
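A sketch of these mechanisms in Python (the URL, timeout and retry count are illustrative assumptions): each attempt has its own timeout, and the retry loop is bounded. Keep the caller's total budget (max_retries × timeout) below the timeout of whoever calls you, per the "timeout calling > timeout called" rule.

```python
import urllib.request

# Sketch of a backend call with a per-request timeout and a bounded retry
# loop (URL, timeout and retry count are illustrative assumptions).
def call_backend(url, timeout_s=2.0, max_retries=3):
    last_error = None
    for _ in range(max_retries):
        try:
            with urllib.request.urlopen(url, timeout=timeout_s) as resp:
                return resp.read()
        except OSError as exc:  # covers URLError, timeouts, connection refused
            last_error = exc
    # give up after max_retries so one slow backend cannot block the server
    raise RuntimeError(f"backend unreachable after {max_retries} attempts") from last_error
```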
Use an in-memory database
Use an in-memory database like Redis instead of keeping information in large files (useful for batch processing, storing large amounts of settings - like translations - or caching at application level).
Use zip when processing files
This applies to batch processing: when keeping information for further processing, zip it when storing it and unzip it when reading it.
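A minimal Python sketch of this tip using the standard gzip module (function names and the JSON payload are illustrative): intermediate batch data is compressed on write and decompressed on read.

```python
import gzip
import json

# Sketch: keep intermediate batch data gzipped on disk.
def store_intermediate(path, records):
    with gzip.open(path, "wt", encoding="utf-8") as f:
        json.dump(records, f)  # compressed on the way to disk

def load_intermediate(path):
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)  # decompressed on the way back
```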

Website


Use of CDN
Cache all static files using a CDN.
Minify scripts
Minify JavaScript, CSS and HTML files.
In case of a CMS, make sure the web pages are pre-cached
The CMS should provide a mechanism to pre-cache the web pages after the corresponding content is created.
Use of HTTP/2
Use a server that supports the HTTP/2 protocol.
Use of AMP
Use AMP for content pages (articles, presentations).
AMP pages and their normal-page equivalents must be linked together using a meta tag.
AMP pages allow no custom JavaScript.
Answers from the server are zipped
The response header should contain: Content-Encoding: gzip
The request header contains: Accept-Encoding: gzip, deflate, br
Compression is set up on the application server.
Lazy loading of images
Put in place a JavaScript mechanism that loads images only once the page has loaded (onload).
Lazy loading of everything below the fold
Separate what is below the fold (images, but also CSS), meaning what visitors do not see initially. Load what is above the fold first, and only then what is below the fold.
Reduce the size of images
I use, for example, tinyPNG.com.
File expiration
Set up file expiration so that files can stay in the visitor's browser cache for a while - the next time they visit the website, it will load faster for them.
In .htaccess for Apache:
ExpiresActive On
# Set up 1 week caching on static files
<FilesMatch "\.(xml|txt|html|js|css)$">
ExpiresDefault A604800
Header append Cache-Control "proxy-revalidate"
</FilesMatch>
Fewer requests sent to the server
Bundle JS files together.
Image files can be bundled together using sprites.
Use SVG instead of bitmap images for small icons.
Testing
https://yellowlab.tools/
webpagetest.org

SQL


Index
An index:
- speeds up retrieval
- slows down updates
- has less effect if a large percentage of rows share the same value in the indexed column
Tune queries included in functions
Check the queries used by your functions and tune them as well.
Use indexes when joining two tables
If you are joining two tables in a query, create an index on both columns used to join these two tables.
Composite index: do not omit the first column
If your index uses two columns:
- do not omit the first column in the order by clause
- list the columns in the right order in your order by clause
Clustered index for ranges
Create a clustered index on the column most often used for range selections. This will physically order the data in the table by this column (you can see it when executing a select without order by).
You can only have one clustered index per table.
A clustered index allows duplicates unless it is declared unique.
Use joins instead of correlated subqueries
Rewriting a correlated subquery as a join can be faster, though not in all DBMSes: some optimizers already perform this rewrite themselves.
Do not use non-deterministic functions on the left-hand side of a comparison
Query before tuning:
select * from client_status where datediff(day, status_date, getdate()) > 50
Because status_date is wrapped in a function, the DBMS cannot use an index on it: the expression must be evaluated for every row.
After tuning:
declare @date_threshold date
select @date_threshold = dateadd(day, -50, getdate())
select * from client_status
where status_date < @date_threshold
(Note the direction of the comparison: a row older than 50 days has a status_date smaller than the threshold.)
Add a column containing reversed values when searching for a string at the end of a column
Create a new column in the same table (call it column B).
Use a trigger so that each time you insert or update a row, column B receives the reversed value of the column you want to use in the where clause (call it column A).
When searching for a string at the end of column A, match the reverse of that string against the beginning of column B, where an index can be used.
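A sketch of this technique using SQLite via Python's sqlite3 module (names invented). SQLite has no built-in REVERSE() function, so here the application fills column B itself; in a DBMS that has REVERSE(), this logic would live in the trigger described above.

```python
import sqlite3

# Sketch of the reversed-column trick (schema invented): name_rev plays the
# role of "column B" and is maintained by the application.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (name TEXT, name_rev TEXT)")
conn.execute("CREATE INDEX idx_files_name_rev ON files(name_rev)")

def insert_file(name):
    # what the trigger would do: store the reversed value alongside
    conn.execute("INSERT INTO files VALUES (?, ?)", (name, name[::-1]))

for n in ("report.pdf", "photo.png", "notes.pdf"):
    insert_file(n)

# "name ends with .pdf" becomes an index-friendly prefix match on name_rev
suffix = ".pdf"
rows = conn.execute(
    "SELECT name FROM files WHERE name_rev LIKE ? || '%'", (suffix[::-1],)
).fetchall()
```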
Do not select all columns
In your query, select only the columns necessary for further processing.
Execute updates, inserts and deletes in batches
Separate the full list of updates, inserts or deletes into chunks (of 5000 operations, for example), and process them chunk by chunk.
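A Python sketch of chunked execution with SQLite (chunk size, table and data are illustrative): each chunk runs in its own short transaction instead of one statement per row or one giant transaction.

```python
import sqlite3

# Split a large list of operations into chunks and run each chunk in its
# own transaction (chunk size of 5000 is the example from the text).
def chunked(items, size):
    for i in range(0, len(items), size):
        yield items[i:i + size]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER)")
rows = [(i,) for i in range(12_000)]
for chunk in chunked(rows, 5000):
    with conn:  # one commit per chunk keeps transactions short
        conn.executemany("INSERT INTO events VALUES (?)", chunk)
```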
Split voluminous tables into several tables
If a table stores a huge amount of information, you can split its data between several tables. Each resulting table will contain fewer rows than the original table.
Use a temporary table when joining tables
When joining two voluminous tables, you can speed up the query with the following technique:
- query one of the two tables
- store the result of this query in a temporary table
- join this temporary table with the second table
To choose which of the two tables to use in step one, pick the one that produces the least voluminous temporary table.
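The steps above, sketched with SQLite via Python's sqlite3 module (schema and data invented): the smaller filtered side goes into a temp table, which is then joined with the second table.

```python
import sqlite3

# Sketch of the temporary-table technique (schema and data invented).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, client_id INTEGER, amount REAL);
    CREATE TABLE clients (id INTEGER, country TEXT);
    INSERT INTO clients VALUES (1, 'FR'), (2, 'DE'), (3, 'FR');
    INSERT INTO orders VALUES (10, 1, 5.0), (11, 2, 7.5), (12, 3, 2.5);
""")
conn.executescript("""
    -- steps 1 and 2: query one table, keep only what the join needs
    CREATE TEMP TABLE tmp_fr_clients AS
        SELECT id FROM clients WHERE country = 'FR';
""")
# step 3: join the (small) temporary table with the second table
total = conn.execute("""
    SELECT SUM(o.amount)
    FROM orders o JOIN tmp_fr_clients t ON o.client_id = t.id
""").fetchone()[0]
```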
Delete records from underlying tables (possibly archiving the deleted records)
If you have voluminous tables, you can increase SQL performance by moving data that your application no longer needs into separate tables.
To apply this tip, you need to find rules that determine which data is no longer useful.
Rebuild indexes and tables
Use a rebuild function to rebuild the index.
Another option is to drop the index and then create it again.
Drop indexes and foreign keys before loading data
Drop indexes and foreign keys, execute the inserts, deletes and updates, then recreate the indexes and foreign keys.
Online versus batch
Every time the user requests an update, insert or delete, decide:
- whether it is critical to execute the operation as soon as the user requests it
- whether the user must wait for the operation to complete before using the application further
If offline processing is possible:
- tell the user that the request is going to be processed
- execute the operations offline
- once done, tell the user that the action is completed

Put the most selective criteria first in the index and in the where clause
If your index includes several columns (composite index), the index should start with the column containing the fewest duplicate values (the most selective column).
This implies putting the columns in the right order when creating the composite index. You should also order these columns accordingly in your where clause.
Use less storage space in each column
Choose the minimum type necessary:

  • If the value is just yes or no (0 or 1), choose a bit. A bit column effectively uses 1 bit: in SQL Server, up to 8 bit columns are stored together in 1 byte.

  • In the case of a number: if you are sure it will be between 0 and 255 and will have no decimals, use a tinyint (the size of a tinyint is 1 byte, which is 8 bits).

  • In the case of a number: if you are sure it will stay within -32,768 to 32,767 and will have no decimals, use a smallint (size: 2 bytes).

  • In the case of a number: if you are sure it will be less than 2,147,483,647 and will have no decimals, use an integer (size: 4 bytes). If not, use a bigint (size: 8 bytes).

  • In the case of a number that has decimals, find out the total number of digits really needed as well as the number of digits needed after the decimal point. The total number of digits determines the storage size: 1-9 digits: 5 bytes; 10-19 digits: 9 bytes; 20-28 digits: 13 bytes; 29-38 digits: 17 bytes. The more digits you use after the decimal point, the fewer digits you can use before it.

  • If you store dates without hours, minutes and seconds, use smalldatetime (or the date type, 3 bytes, if your DBMS supports it).

  • In the case of text: when all values in the column have the same length, use char(x). It uses 2 bytes less space for the same data length than varchar(x). Example: airport codes, whose length is always 3 letters. A char(3) field always has a storage size of 3 bytes. A varchar(3) field has a storage size equal to the length of the data in the field + 2 bytes. As the data length is always 3 in our example, the storage size will always be 3 + 2 = 5 bytes.

  • In the case of text with varying lengths, it is best to use varchar over char. Example: a column contains strings whose biggest reasonable length is 100 characters and whose average length is 25. The storage size with varchar(100) is then on average 25 + 2 = 27 bytes, whereas the storage size with char(100) is always 100 bytes.


If you choose varchar, you could determine the biggest reasonable length of the strings contained in the column. But as the storage size of varchar(x) depends on the string size, why bother determining what to put instead of x? Example: you know that the strings stored will never exceed 200 characters, but the storage size of each string will be the same whether you declare the column as varchar(200) or varchar(1000). Indeed, the storage size of each string is equal to the string's actual size + 2.

You will find some experts arguing that it is irrelevant for performance whether you declare varchar(200) or varchar(1000). Others say that it can impact performance: their argument is that varchar(200) gives the DBMS an indication of the estimated storage size per field in this column. As every indication can influence how the DBMS plans to execute the query, it can make sense to give an estimate as close as possible to reality to help the DBMS do its job well.

Choose between select count greater than 0 and exists
In some cases a query containing "where exists" performs differently from a query that counts rows and checks whether the count is greater than 0. Both return the same result, but their speed may differ, so try both.
With a count:
select *
from clients c
where
(select count(*)
from salesperson s
where c.salesperson_id = s.id) > 0
With exists:
select *
from clients c
where exists
(select 1
from salesperson s
where c.salesperson_id = s.id)