Cache-Control
Last updated
Cache-control is an HTTP header that dictates browser caching behavior. In a nutshell, when someone visits a website, their browser will save certain resources, such as images and website data, in a store called the cache. When that user revisits the same website, cache-control sets the rules which determine whether that user will have those resources loaded from their local cache, or whether the browser will have to send a request to the server for fresh resources. In order to understand cache-control in greater depth, a basic understanding of browser caching and HTTP headers is required.
As explained above, browser caching is when a web browser saves website resources so it doesn’t have to fetch them again from a server. For example, a background image on a website might be saved locally in cache so that when a user visits that page for the second time, the image will load from the user’s local files and the page will load much faster.
Browsers will only store these resources for a specified period of time, known as the time to live (TTL). If a user requests a cached resource after the TTL has expired, the browser will have to reach out to the server again and download a fresh copy of the resource. How do browsers and web servers know the TTL for each resource? This is where HTTP headers come into play.
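The TTL behavior described above can be sketched as a toy in-memory cache. This is only an illustration, not how any real browser is implemented; the class name TTLCache, the fetch callback, and the injectable clock are all hypothetical names chosen for the example.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Toy cache: each entry expires ttl_seconds after it was stored."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so the example can be tested without sleeping
        self._entries: Dict[str, Tuple[float, bytes]] = {}

    def get(self, url: str, fetch: Callable[[str], bytes]) -> bytes:
        now = self.clock()
        entry = self._entries.get(url)
        if entry is not None:
            stored_at, body = entry
            if now - stored_at < self.ttl_seconds:
                return body  # still within the TTL: serve the local copy
        # Missing or expired: go back to the server and store a fresh copy.
        body = fetch(url)
        self._entries[url] = (now, body)
        return body
```

Calling get twice within the TTL invokes fetch only once; after the TTL has passed, the next call fetches again.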
Headers consist of key-value pairs which are separated by a colon. For cache-control, the ‘key’, or the part to the left of the colon, is always ‘cache-control’. The ‘value’ is what’s found on the right of the colon, and there can be one or several comma-separated values for cache control.
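To make the key-value structure concrete, here is a minimal sketch that splits a cache-control value into its comma-separated parts. The function name parse_cache_control is made up for this example, and the parsing is deliberately simplified: it does not handle quoted arguments that themselves contain commas.

```python
from typing import Dict, Optional


def parse_cache_control(header_value: str) -> Dict[str, Optional[str]]:
    """Split a cache-control value into a {name: argument} mapping.

    Values without an argument (for example "public") map to None.
    """
    directives: Dict[str, Optional[str]] = {}
    for part in header_value.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        name, sep, argument = part.partition("=")
        # Names are case-insensitive, so normalize them to lowercase.
        directives[name.strip().lower()] = argument.strip().strip('"') if sep else None
    return directives


print(parse_cache_control("public, max-age=3600"))
# {'public': None, 'max-age': '3600'}
```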
These values are called directives, and they dictate who can cache a resource as well as how long those resources can be cached before they must be updated. Below we go through some of the most common cache-control directives:
Private
Setting the directive to private means the response will not be stored by any proxy or other shared cache; only the client's own cache may keep it.
Don't make the mistake of thinking that setting this header makes data more secure. You still need to use SSL/TLS.
Public
If the directive is set to public, the response can be cached by proxies in addition to the client, so one cached copy can serve many other users.
no-store
no-store means the content must not be cached at all, by the client or by any intermediary.
no-cache
no-cache indicates that the cache may keep a copy, but the cached content must be revalidated with the server before it is used.
max-age=seconds
max-age sets the number of seconds for which a response may be cached. For example, if cache-control looks like this:
cache-control: public, max-age=3600
It means the content is public and has a time limit of 60 minutes.
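As a quick check of the arithmetic, the sketch below pulls max-age out of a header value and converts it to a duration. The value "public, max-age=3600" is an illustrative header that matches the 60-minute description, and the helper name max_age is hypothetical.

```python
import re
from datetime import timedelta
from typing import Optional


def max_age(header_value: str) -> Optional[timedelta]:
    """Return the max-age value as a timedelta, or None if it is absent."""
    match = re.search(r"(?:^|,)\s*max-age\s*=\s*(\d+)", header_value, re.IGNORECASE)
    return timedelta(seconds=int(match.group(1))) if match else None


lifetime = max_age("public, max-age=3600")
print(lifetime)  # 1:00:00
```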
s-maxage=seconds
The s- prefix in s-maxage stands for shared. This directive specifically targets shared caches such as proxies and CDNs. Like max-age, it takes the number of seconds for which a response may be cached. When it is present, it overrides max-age and the Expires header for shared caches.
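The override rule can be sketched as a small lookup. This is a simplified illustration that assumes the directives have already been parsed into a mapping of names to string arguments; the function name freshness_lifetime and the shared_cache flag are hypothetical, and the fallback to the Expires header is left out.

```python
from typing import Mapping, Optional


def freshness_lifetime(directives: Mapping[str, Optional[str]],
                       shared_cache: bool) -> Optional[int]:
    """Return the lifetime in seconds that applies to this kind of cache."""
    s_maxage = directives.get("s-maxage")
    if shared_cache and s_maxage is not None:
        return int(s_maxage)  # shared caches prefer s-maxage over max-age
    max_age = directives.get("max-age")
    if max_age is not None:
        return int(max_age)
    return None  # no explicit lifetime given by these two directives


parsed = {"public": None, "max-age": "60", "s-maxage": "600"}
print(freshness_lifetime(parsed, shared_cache=True))   # 600
print(freshness_lifetime(parsed, shared_cache=False))  # 60
```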
must-revalidate
Sometimes, when there are network problems and content cannot be retrieved from the server, a cache may serve old (stale) content without validating it.
If must-revalidate is present, stale content is never served in that situation: the data must be revalidated with the server before it is served.
proxy-revalidate
proxy-revalidate is similar to must-revalidate, but it applies only to shared caches such as proxies.
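The difference between the two directives can be sketched as a decision a cache makes once an entry has gone stale. The function name handle_stale_entry and the three result strings are invented for this example, and "serve-stale" is only something a cache may choose to do, not something it is required to do.

```python
from typing import Set


def handle_stale_entry(directives: Set[str], shared_cache: bool,
                       origin_reachable: bool) -> str:
    """Decide what to do with a cached response whose lifetime has expired."""
    if origin_reachable:
        return "revalidate"  # normal case: check with the server first
    strict = "must-revalidate" in directives or (
        shared_cache and "proxy-revalidate" in directives
    )
    if strict:
        return "error"  # e.g. answer with 504 Gateway Timeout rather than stale data
    return "serve-stale"  # permitted, at the cache's discretion


print(handle_stale_entry({"max-age", "proxy-revalidate"},
                         shared_cache=False, origin_reachable=False))
# serve-stale
```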
Mixing values
You can combine directives in many different ways, but no-cache/no-store and public/private are exceptions. If you set both no-store and no-cache, no-store takes precedence over no-cache.
As for private/public, a response to an unauthenticated request is generally treated as public, and a response to an authenticated request is treated as private.
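The precedence between no-store and no-cache can be illustrated with two small checks. This is a rough sketch under simplifying assumptions: real caches also look at the request method, the status code, and other headers, and the function names may_store and must_revalidate_before_use are made up for the example.

```python
from typing import Set


def may_store(directives: Set[str], shared_cache: bool) -> bool:
    """Whether this cache is allowed to keep a copy at all."""
    if "no-store" in directives:
        return False  # wins over everything else, including no-cache
    if shared_cache and "private" in directives:
        return False  # private responses stay out of shared caches
    return True


def must_revalidate_before_use(directives: Set[str]) -> bool:
    """Only meaningful when may_store() returned True."""
    return "no-cache" in directives


both = {"no-store", "no-cache"}
print(may_store(both, shared_cache=False))  # False
```

With both directives set, nothing is stored, so the revalidation question raised by no-cache never comes up.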
So far I've only talked about how cached content is considered fresh, but not about how the client validates it with the server. Now I will talk about the headers used for this purpose:
ETag
ETag was introduced in HTTP/1.1. It is just a unique identifier that the server attaches to a resource. The client then uses the ETag to make conditional HTTP requests.
You can understand it as: "Please give me this resource if its ETag is not the same as the ETag I have." The content is downloaded only if the ETags differ.
The method by which the ETag is generated is not defined by the HTTP specification; often a hash function is used to assign an ETag to each version of the resource. There are two types of ETag, strong and weak:
A strong ETag means that two resources are exactly the same, with no difference between them, while a weak ETag means that two resources, although not exactly the same, can be considered equivalent.
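A minimal sketch of the ETag exchange, seen from the server side, might look like the following. It assumes a strong ETag derived from a SHA-256 hash of the body, which is only one possible choice since the generation method is left to the server. The function names make_etag and respond are hypothetical, and the comparison is simplified: a real server would also handle a list of ETags, the * wildcard, and weak ETags (which carry a W/ prefix) in the If-None-Match header.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive a strong ETag from the content (assumed scheme: truncated SHA-256)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes,
            if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a possibly conditional GET."""
    etag = make_etag(body)
    if if_none_match is not None and if_none_match.strip() == etag:
        # The client's copy is still current: send no body.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body


status, headers, _ = respond(b"hello", None)               # first visit
status_again, _, _ = respond(b"hello", headers["ETag"])    # revalidation
print(status, status_again)  # 200 304
```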
Last-Modified
The server may include a Last-Modified header indicating the date and time that certain content was last modified.
When the content becomes stale, the client makes a conditional request that includes the last-modified time it has in a header called If-Modified-Since. If the resource has not changed since that time, the server confirms this with a 304 Not Modified response, and the cached content is considered fresh again for another n seconds. If the resource has changed, the new content is downloaded from the server and replaces the content the client has.
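The Last-Modified check can be sketched on the server side with the standard library's HTTP date helpers. The function name status_for is hypothetical, and the sketch only decides the status code; it assumes the server knows the resource's modification time as a timezone-aware datetime.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def status_for(last_modified: datetime, if_modified_since: Optional[str]) -> int:
    """Return 304 if the client's copy is still current, otherwise 200."""
    if if_modified_since is None:
        return 200  # unconditional request
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable date: ignore the condition
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds before comparing.
    if last_modified.replace(microsecond=0) <= since:
        return 304
    return 200


modified = datetime(2015, 10, 21, 7, 28, tzinfo=timezone.utc)
header = format_datetime(modified, usegmt=True)
print(header)                        # Wed, 21 Oct 2015 07:28:00 GMT
print(status_for(modified, header))  # 304
```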
Browser caching is a great way to both preserve resources and improve user experience on the Internet, but without cache-control, it would be very brittle. Every resource on every site would be bound by the same caching rules, meaning that sensitive information would be cached the same way as public information, and frequently-updated resources would be cached for the same amount of time as ones that rarely change.
Cache-control adds the flexibility that makes browser caching truly useful, letting developers dictate how each resource will be cached. It also lets developers set special rules for intermediaries, which is a factor in why sites that use a CDN, like the Cloudflare CDN, tend to perform better than sites that don’t.