This was written via Markdown. Apologies if its unclear, or your glasses are missing 😅.

Given:

  • a bucket b of a specific size n.
  • a token i representing a unit of consumption e.g. each token may represent an object/element such as # of API requests, # of TCP packets, # of order basket item quantities, integer prices etc. Basically, what you want “consumed” from the recipient.
  • a token refill rate which determines how fast rr tokens are added to the bucket per t seconds (can be larger time periods e.g. minutes, hours, days).
  • a consumption rate r which determines how fast i tokens are consumed from the bucket. This can be an API request rate per second, purchase requests per second etc.., each having a given number of “tokens” they want to consume per given time period.

The formula is (as far as I understand) as follows:

  1. the bucket b is initialized with n number of tokens. If not initialized with initial tokens, all initial requests will fail until at least 1 token is available for consumption, given requirement #3 below.
  2. For every consumption request, a given number of tokens #i is consumed from the bucket.
    • if #i is equal to or more than n, the tokens are consumed, and the remainder calculated and saved back into bucket b as n for this request. The request passes.
    • if #i is less than n, the request is ideally denied, but other mechanisms can be applied e.g. waiting for i tokens to be available. So n remains untouched for this specific request. The request fails.
  3. Given the rate which determines how fast tokens are refilled i.e. if every rr/t period of time passes, a single token is added to the bucket, up to a maximum of size n:
    • if the rate of refill is 5 tokens every 10 seconds, divide 5 tokens / 10 seconds, and you get 1 token every 2 seconds on best effort (i.e. depends on availability of resources for every action/request). So if your consumption rate is 10 tokens every 10 seconds, that makes your consumption rate r to be 10 tokens/10 seconds, thus r == 1/s. Bucket can only refill at effectively 5 tokens/10 seconds, making the rr/t == 0.5/s i.e. 1 token every 2 seconds, thus half your requests will probably fail.
    • if the rate of refill is 20 tokens every 5 seconds, divide 20 tokens / 5 seconds, and you get 4 tokens every 1 second on best effort. Thus if your consumption rate is 10 tokens every 10 seconds, that makes your consumption rate r to be effectively 10 tokens/10 seconds, thus r == 1/s. Bucket can be refilled at 20 tokens every 5 seconds, that makes the effective minimum 4 requests/1 second i.e. rr/t == 4/s. All your requests should probably go through.

Disclaimer

  • This has NOT taken into account bursts! The above is an ideal situation which will happen as part of daily life, but definitely expect the bursts! I might add that section later.
  • Race conditions are also possible, so beware!

Why I like this algorithm? Don’t be greedy 😁. BUT! When you have an audience whose usage you can determine or predict beforehand, this algorithm makes great sense to me! Payments APIs, Rate limiting for uploads, static content which might be exposed, any requests for sensitive info etc..