System design paradigm: Caching
The Problem
Caching is one of the two ways (the other is replication) to scale read-heavy applications. Assume we have an application that handles 100K read requests per second. After load balancing the HTTP requests from clients, those read requests still hit the query backend (a traditional DB or some in-memory storage). If the backend can handle only 10K QPS, what should we do?
Pareto distribution of the queries
The queries can be abstracted as key lookups (most of the time, that's exactly what they are). There is then a distribution of request volume across the keys, and this distribution is almost always a Pareto distribution (power law).
That means a small percentage of keys (e.g. 15%) accounts for a large percentage of the requests (e.g. 90%). We can put that 15% of most frequently used data in a cache that can handle a much higher QPS. The app servers first query the cache and only go to the DB when the key is not there. In our example, that happens for only 10% of the requests, so only 10% of the QPS reaches the DB. This is the reasoning behind caching.
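A quick back-of-the-envelope check of this argument, using the example figures above (100K QPS total, 90% of requests absorbed by the cache); the numbers are illustrative, not measurements:

```python
# Back-of-the-envelope: how much read traffic still reaches the DB behind a cache.
total_qps = 100_000        # incoming read requests per second
cache_hit_ratio = 0.90     # the ~15% hottest keys cover ~90% of requests

db_qps = total_qps * (1 - cache_hit_ratio)
print(f"QPS hitting the DB: {db_qps:,.0f}")  # 10,000 -- within the backend's 10K limit
```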
Distributed look-aside cache
The distributed look-aside cache sits in front of the DB and is shared by all app servers. The cache itself is usually an in-memory DB that handles reads very fast. Most…
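A minimal sketch of the look-aside read path, assuming a Redis-like shared cache and a hypothetical `db_lookup` function standing in for the real backend query (names and TTL are illustrative, not from the original):

```python
import json
import redis  # shared in-memory cache, one instance serving all app servers

cache = redis.Redis(host="cache-host", port=6379)
CACHE_TTL_SECONDS = 300  # illustrative; real TTL depends on staleness tolerance


def db_lookup(key: str) -> dict:
    """Placeholder for the real DB query, assumed to exist elsewhere."""
    raise NotImplementedError


def get(key: str) -> dict:
    # 1. The app server asks the cache first.
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    # 2. On a miss, fall back to the DB...
    value = db_lookup(key)
    # 3. ...and populate the cache so subsequent reads for this key hit it.
    cache.set(key, json.dumps(value), ex=CACHE_TTL_SECONDS)
    return value
```

Note that in the look-aside pattern it is the application, not the cache, that is responsible for falling back to the DB and filling the cache on a miss.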