Token bucket algorithm
The token bucket algorithm is widely used for rate limiting. It is simple, well understood and commonly used by Internet companies. Both Amazon and Stripe use this algorithm to throttle their API requests.
The token bucket algorithm works as follows:
A token bucket is a container that has a pre-defined capacity. Tokens are put in the bucket at preset rates periodically. Once the bucket is full, no more tokens are added. As shown in the picture below, the token bucket capacity is 4. The refiller puts 2 tokens into the bucket every second. Once the bucket is full, extra tokens will overflow.
Each request consumes one token. When a request arrives, we check if there are enough tokens in the bucket. The picture below explains how it works.
If there are enough tokens, we take one token out for each request, and the request goes through.
If there are not enough tokens, the request is dropped.
The picture below illustrates how token consumption, refill, and rate-limiting logic work. In this example, the token bucket size is 4, and the refill rate is 4 tokens per minute.
The token bucket algorithm takes two parameters:
Bucket size: the maximum number of tokens allowed in the bucket
Refill rate: number of tokens put into the bucket every second
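The sketch below shows one way these two parameters could drive a bucket in Python. It is a minimal illustration, assuming a lazy refill computed from the elapsed time on each request rather than a background refiller; the names TokenBucket and allow are illustrative and not from the text.

```python
import time


class TokenBucket:
    """A single token bucket, refilled lazily from the elapsed time."""

    def __init__(self, bucket_size: int, refill_rate: float):
        self.bucket_size = bucket_size    # maximum number of tokens in the bucket
        self.refill_rate = refill_rate    # tokens added per second
        self.tokens = float(bucket_size)  # start with a full bucket
        self.last_refill = time.monotonic()

    def _refill(self) -> None:
        # Add tokens for the time elapsed since the last refill,
        # capped at the bucket size so extra tokens overflow.
        now = time.monotonic()
        elapsed = now - self.last_refill
        self.tokens = min(self.bucket_size, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now

    def allow(self) -> bool:
        # Each request consumes one token; if none is left, the request is dropped.
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With a bucket size of 4 and a refill rate of 2 tokens per second (the numbers from the first picture), a burst of 4 requests would go through immediately and the fifth would be dropped until the refiller adds a new token: `[TokenBucket(4, 2).allow() for _ in range(5)]` would typically return `[True, True, True, True, False]`.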
How many buckets do we need? This varies, and it depends on the rate-limiting rules. Here are a few examples.
It is usually necessary to have different buckets for different API endpoints. For instance, if a user is allowed to make 1 post per second, add 150 friends per day, and like 5 posts per second, 3 buckets are required for each user.
If we need to throttle requests based on IP addresses, each IP address requires a bucket.
If the system allows a maximum of 10,000 requests per second, it makes sense to have a global bucket shared by all requests.
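One way to maintain a separate bucket per rule is to key buckets by the rule and the client identifier. The sketch below reuses the TokenBucket class from the earlier example and assumes a hypothetical in-memory dictionary and rule table; in practice the counters would usually live in a shared store such as Redis.

```python
# Hypothetical per-rule limits: (bucket size, refill rate in tokens per second).
RULES = {
    "post_message": (1, 1),             # 1 post per second
    "add_friend": (150, 150 / 86400),   # 150 friends per day
    "like_post": (5, 5),                # 5 likes per second
}

buckets: dict[tuple[str, str], TokenBucket] = {}


def allow_request(rule: str, client_id: str) -> bool:
    # Look up (or create) the bucket for this rule and client,
    # then try to take one token from it.
    key = (rule, client_id)
    if key not in buckets:
        size, rate = RULES[rule]
        buckets[key] = TokenBucket(size, rate)
    return buckets[key].allow()
```

The same structure covers the other cases: keying by IP address gives one bucket per IP, and a single fixed key gives a global bucket shared by all requests.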
Pros
The algorithm is easy to implement.
Memory efficient.
Token bucket allows a burst of traffic for short periods. A request can go through as long as there are tokens left.
Cons
Two parameters in the algorithm are bucket size and token refill rate. However, it might be challenging to tune them properly.