Learning System Architecture Together - Cache

tags: System Architecture

The text below is taken from the Scalability for Dummies series.

Original text

After following Part 2 of this series, you now have a scalable database solution. You have no fear of storing terabytes anymore and the world is looking fine. But just for you. Your users still have to suffer slow page requests when a lot of data is fetched from the database. The solution is to implement a cache.

By "cache" I always mean in-memory caches like Memcached or Redis. Please never do file-based caching; it makes cloning and auto-scaling of your servers just a pain.

But back to in-memory caches. A cache is a simple key-value store, and it should reside as a buffering layer between your application and your data storage. Whenever your application has to read data, it should first try to retrieve the data from your cache. Only if it's not in the cache should it then try to get the data from the main data source.
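The read path described above can be sketched in a few lines. This is only an illustration under stated assumptions: `InMemoryCache` is a hypothetical stand-in for a Memcached or Redis client (a plain dictionary here), and `read_through` and `load_from_database` are names invented for the example.

```python
from typing import Callable, Dict, List, Optional


class InMemoryCache:
    """Hypothetical stand-in for a Memcached or Redis client."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._store.get(key)

    def set(self, key: str, value: str) -> None:
        self._store[key] = value


def read_through(cache: InMemoryCache, key: str, load: Callable[[str], str]) -> str:
    """Return the cached value, falling back to the main data source on a miss."""
    value = cache.get(key)
    if value is not None:
        return value       # cache hit: the data source is never touched
    value = load(key)      # cache miss: ask the main data source
    cache.set(key, value)  # keep the result for the next reader
    return value


# Usage: record how often the simulated database is actually queried.
db_calls: List[str] = []


def load_from_database(key: str) -> str:
    db_calls.append(key)
    return f"row for {key}"


cache = InMemoryCache()
first = read_through(cache, "user:1", load_from_database)
second = read_through(cache, "user:1", load_from_database)
```

Both reads return the same value, but only the first one reaches the simulated database; the second is answered from the cache.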

Why should you do that? Because a cache is lightning-fast. It holds every dataset in RAM, and requests are handled as fast as technically possible. For example, Redis can do several hundred thousand read operations per second when hosted on a standard server. Writes, especially increments, are also very, very fast. Try that with a database!

There are two patterns for caching your data, an old one and a new one:

#1 - Cached Database Queries

That's still the most commonly used caching pattern. Whenever you do a query to your database, you store the result dataset in the cache. A hashed version of your query is the cache key. The next time you run the query, you first check the cache to see if there is already a result.
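A minimal sketch of this pattern, assuming an in-memory SQLite database and a plain dictionary standing in for Memcached or Redis. The names `query_cache`, `cache_key`, and `cached_query` are hypothetical, and hashing the bound parameters together with the SQL text is an assumption of this example; the article only says the query is hashed.

```python
import hashlib
import json
import sqlite3
from typing import Any, Dict, List, Tuple

query_cache: Dict[str, str] = {}  # hypothetical stand-in for Memcached/Redis


def cache_key(sql: str, params: Tuple[Any, ...]) -> str:
    """Hash the query text and its parameters into a fixed-length key."""
    raw = json.dumps([sql, list(params)])
    return "query:" + hashlib.sha256(raw.encode("utf-8")).hexdigest()


def cached_query(conn: sqlite3.Connection, sql: str,
                 params: Tuple[Any, ...] = ()) -> List[List[Any]]:
    key = cache_key(sql, params)
    hit = query_cache.get(key)
    if hit is not None:
        return json.loads(hit)           # result served from the cache
    rows = [list(row) for row in conn.execute(sql, params).fetchall()]
    query_cache[key] = json.dumps(rows)  # store the result dataset
    return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.execute("INSERT INTO products VALUES (1, 'mug', 4.5)")

sql = "SELECT name, price FROM products WHERE id = ?"
first = cached_query(conn, sql, (1,))   # miss: runs against SQLite
second = cached_query(conn, sql, (1,))  # hit: answered from query_cache
```

Note that the key says nothing about which tables or rows the result came from; it is just a digest of the query.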

This pattern has several issues. The main issue is expiration. It is hard to delete a cached result when you cache a complex query (who has not?). When one piece of data changes (for example, a table cell), you need to delete all cached queries that may include that table cell. You get the point?
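To make the expiration problem concrete, here is a small sketch under the same kind of assumptions (in-memory SQLite, a dictionary as the cache, a hypothetical `cached` helper). Two unrelated queries both depend on one price; after that price changes, both cached results are stale, and the hashed keys give no hint about which entries to delete.

```python
import hashlib
import sqlite3
from typing import Dict, List, Tuple

cache: Dict[str, List[Tuple]] = {}  # hypothetical stand-in for Memcached/Redis


def cached(conn: sqlite3.Connection, sql: str) -> List[Tuple]:
    key = hashlib.sha256(sql.encode("utf-8")).hexdigest()
    if key not in cache:
        cache[key] = conn.execute(sql).fetchall()
    return cache[key]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.executemany("INSERT INTO products VALUES (?, ?, ?)",
                 [(1, "mug", 4.5), (2, "pen", 1.0)])

cheapest = "SELECT name FROM products ORDER BY price LIMIT 1"
total = "SELECT SUM(price) FROM products"
cached(conn, cheapest)
cached(conn, total)

# One table cell changes...
conn.execute("UPDATE products SET price = 0.5 WHERE id = 1")

# ...and both cached results are now out of date.
stale_cheapest = cached(conn, cheapest)
stale_total = cached(conn, total)

# The keys are opaque hashes, so nothing records which entries depended
# on that cell. The blunt fix in this sketch is to drop everything.
cache.clear()
fresh_cheapest = cached(conn, cheapest)
fresh_total = cached(conn, total)
```

Clearing the whole cache works, but it throws away every other result as well, which is the cost this pattern carries.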

#2 - Cached Objects

That's my strong recommendation and I always prefer this pattern. In general, see your data as an object like you already do in your code (classes, instances, etc.). Let your class assemble a dataset from your database and then store the complete instance of the class or the assembled dataset in the cache.

Sounds theoretical, I know, but just look at how you normally code. You have, for example, a class called "Product" which has a property called "data". It is an array containing prices, texts, pictures, and customer reviews of your product. The property "data" is filled by several methods in the class doing several database requests which are hard to cache, since many things relate to each other.

Now, do the following: when your class has finished the "assembling" of the data array, directly store the data array, or better yet the complete instance of the class, in the cache! This allows you to easily get rid of the object whenever something changes and makes the overall operation of your code faster and more logical.
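The "Product" example might look roughly like this. It is a sketch, not the author's code: `fetch` is a hypothetical placeholder for the real database requests, the dictionary `cache` stands in for Memcached or Redis, and the `product:<id>` key format is an assumption. The sketch stores the assembled data array as JSON; storing the complete instance would need an object serializer such as `pickle` instead.

```python
import json
from typing import Any, Dict, List

cache: Dict[str, str] = {}   # hypothetical stand-in for Memcached/Redis
db_requests: List[str] = []  # records every simulated database request


def fetch(table: str, product_id: int) -> Any:
    """Hypothetical database call; a real one would run a query."""
    db_requests.append(f"{table}:{product_id}")
    return f"{table} of product {product_id}"


class Product:
    def __init__(self, product_id: int, data: Dict[str, Any]) -> None:
        self.product_id = product_id
        self.data = data

    @staticmethod
    def cache_key(product_id: int) -> str:
        return f"product:{product_id}"  # one predictable key per object

    @classmethod
    def assemble(cls, product_id: int) -> "Product":
        # Several related requests fill the "data" property.
        parts = ("prices", "texts", "pictures", "reviews")
        return cls(product_id, {name: fetch(name, product_id) for name in parts})

    @classmethod
    def load(cls, product_id: int) -> "Product":
        key = cls.cache_key(product_id)
        hit = cache.get(key)
        if hit is not None:
            return cls(product_id, json.loads(hit))
        product = cls.assemble(product_id)
        cache[key] = json.dumps(product.data)  # store the assembled data array
        return product

    @classmethod
    def invalidate(cls, product_id: int) -> None:
        cache.pop(cls.cache_key(product_id), None)


a = Product.load(7)    # assembles: four simulated database requests
b = Product.load(7)    # served from the cache: no further requests
Product.invalidate(7)  # something changed: one delete, by a known key
c = Product.load(7)    # reassembled and cached again
```

Because the key is derived from the object's identity rather than from a query hash, invalidation is a single delete.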

And the best part: it makes asynchronous processing possible! Just imagine an army of worker servers that assemble your objects for you! The application just consumes the latest cached object and nearly never touches the database anymore!
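One way to picture the worker idea on a single machine is a background thread fed by a job queue. This is an illustrative sketch only: a real setup would use separate worker servers and a shared cache, while here `cache` is a dictionary, and `rebuild_jobs`, `assemble_article`, `worker`, and `render` are all hypothetical names.

```python
import queue
import threading
from typing import Dict, Optional

cache: Dict[str, dict] = {}   # hypothetical stand-in for Memcached/Redis
rebuild_jobs = queue.Queue()  # IDs of objects that need (re)assembling


def assemble_article(article_id: int) -> dict:
    """Placeholder for the expensive database work."""
    return {"id": article_id, "html": f"<h1>Article {article_id}</h1>"}


def worker() -> None:
    while True:
        article_id = rebuild_jobs.get()
        if article_id is None:  # sentinel value: shut the worker down
            rebuild_jobs.task_done()
            break
        cache[f"article:{article_id}"] = assemble_article(article_id)
        rebuild_jobs.task_done()


def render(article_id: int) -> Optional[dict]:
    """The application only consumes what the workers have prepared."""
    return cache.get(f"article:{article_id}")


thread = threading.Thread(target=worker, daemon=True)
thread.start()
for article_id in (1, 2, 3):
    rebuild_jobs.put(article_id)
rebuild_jobs.join()     # wait until the worker has processed every job
rebuild_jobs.put(None)  # ask the worker to stop
thread.join()

page = render(2)
```

The request path (`render`) never calls `assemble_article`; it only reads what the worker has already placed in the cache.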

Some ideas of objects to cache:

  • user sessions (never use the database!)

  • fully rendered blog articles

  • activity streams

  • user<->friend relationships

As you may have already realized, I am a huge fan of caching. It is easy to understand, very simple to implement, and the result is always breathtaking.

In general, I am more a friend of Redis than Memcached, because I love the extra database features of Redis like persistence and the built-in data structures like lists and sets. With Redis and clever keying there may be a chance that you can even get completely rid of a database. But if you just need to cache, take Memcached, because it scales like a charm.
