System design paradigm: Caching
The Problem
Caching is one of the two ways (the other is replication) to scale read-heavy applications. Assume we have an application that handles 100K read requests per second. After load balancing the HTTP requests from clients, those read requests still hit the query backend (a traditional DB or some in-memory storage). If the backend can handle only 10K QPS, what should we do?
Pareto distribution of the queries
The queries can be abstracted as key lookups (most of the time, that's exactly what they are). There is then a distribution of request volume across the keys, and this distribution is almost always a Pareto distribution (power law).
That means a small percentage of keys (e.g. 15%) accounts for a large percentage of the requests (e.g. 90%). We can put that 15% of most frequently used data in a cache that can handle a much higher QPS. The app servers first query the cache and only go to the DB when the key is not there. In our example, that happens for only 10% of the requests, so only 10% of the QPS reaches the DB. This is the reasoning behind caching.
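A quick back-of-the-envelope check of this argument, using the example figures above (100K QPS total, 90% of requests absorbed by the cache); the numbers are illustrative, not measurements:

```python
# Back-of-the-envelope: how much read traffic still reaches the DB behind a cache.
total_qps = 100_000        # incoming read requests per second
cache_hit_ratio = 0.90     # the ~15% hottest keys cover ~90% of requests

db_qps = total_qps * (1 - cache_hit_ratio)
print(f"QPS hitting the DB: {db_qps:,.0f}")  # 10,000 -- within the backend's 10K limit
```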
Distributed look-aside cache
The distributed look-aside cache sits in front of the DB and is shared by all app servers. The cache itself is usually an in-memory DB that handles reads very fast. Most…
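A minimal sketch of the look-aside read path, assuming a Redis-like shared cache and a hypothetical `db_lookup` function standing in for the real backend query (names and TTL are illustrative, not from the original):

```python
import json
import redis  # shared in-memory cache, one instance serving all app servers

cache = redis.Redis(host="cache-host", port=6379)
CACHE_TTL_SECONDS = 300  # illustrative; real TTL depends on staleness tolerance


def db_lookup(key: str) -> dict:
    """Placeholder for the real DB query, assumed to exist elsewhere."""
    raise NotImplementedError


def get(key: str) -> dict:
    # 1. The app server asks the cache first.
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    # 2. On a miss, fall back to the DB...
    value = db_lookup(key)
    # 3. ...and populate the cache so subsequent reads for this key hit it.
    cache.set(key, json.dumps(value), ex=CACHE_TTL_SECONDS)
    return value
```

Note that in the look-aside pattern it is the application, not the cache, that is responsible for falling back to the DB and filling the cache on a miss.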