Learning System Architecture Together - Clones

tags: 系統架構

The original text below is taken from the Scalability for Dummies series, with a line-by-line Chinese translation.

Scalability for Dummies - Part 1: Clones

Just recently I was asked what it would take to make a web service massively scalable.
最近我被問到需要做什麼才能使網路服務擁有巨大的延展性。

My answer was lengthy, and it may be interesting to other people as well.
我的回答很冗長,不過或許其他人也會感興趣。

So I share it with you here in my blog and split it into parts to make it easier to read.
所以我把它放在我的部落格上,並把它分段讓它變得好閱讀一些。

New parts are released on a regular basis.
新的段落會定期的釋出。

Have fun, and your comments are always welcome!
祝你看得開心,這裡隨時歡迎你的留言!

Part 1 - Clones

Public servers of a scalable web service are hidden behind a load balancer.
一個具備延展性的網路服務,其公開伺服器會被隱藏在負載平衡器之後。

This load balancer evenly distributes load (requests from your users) across your group/cluster of application servers.
負載平衡器會將負載(來自使用者的請求)平均分配到你的應用程式伺服器群組/叢集。
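The source does not say which distribution strategy the load balancer uses. As a minimal sketch, assuming simple round-robin rotation (one common option among several), the behaviour might look like the following. `RoundRobinBalancer` and the server names are hypothetical and only serve the illustration.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out servers in a fixed rotation, one per incoming request."""

    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("at least one server is required")
        self._rotation = cycle(servers)

    def pick(self):
        # Each call returns the next server in the rotation.
        return next(self._rotation)


# Hypothetical server names, not taken from the article.
balancer = RoundRobinBalancer(["server-1", "server-2", "server-3"])

# Four consecutive requests are spread over the cluster in turn.
assigned = [balancer.pick() for _ in range(4)]
print(assigned)  # ['server-1', 'server-2', 'server-3', 'server-1']
```

Because the balancer keeps no record of who sent a request, the same user can land on a different server every time.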

That means that if, for example, user Steve interacts with your service, he may be served at his first request by server 2, then with his second request by server 9 and then maybe again by server 2 on his third request.
也就是說,如果用戶 Steve 想要使用你的服務, 他第一次的請求可能是由 server 2 負責處理, 而第二次的請求可能會換成 server 9, 再之後的第三次請求也許又會變成 server 2 來處理。

Steve should always get the same results for his request, independent of which server he "landed on".
無論 Steve 的請求跑到哪一台伺服器,他應該都要拿到相同的結果。

That leads to the first golden rule for scalability:
這可以推導出關於延展性的第一條鐵則:

Every server contains exactly the same codebase and does not store any user-related data, like sessions or profile pictures, on local disk or in memory.
所有的伺服器應運行著相同的程式碼, 且不應該將用戶相關的資料儲存在本地硬碟或是記憶體, 像是 sessions 或是 簡介照片。

Sessions need to be stored in a centralized data store which is accessible to all your application servers.
Sessions 需要被儲存在一個能夠被所有應用程式伺服器存取的中心化資料存儲。

It can be an external database or an external persistent cache, like Redis.
這個資料存儲可以是外部資料庫,或是外部的永續型快取 (Redis 之類的)。

An external persistent cache will have better performance than an external database.
相較於外部資料庫,使用外部永續快取的效能會比較好。

By external I mean that the data store does not reside on the application servers. Instead, it is somewhere in or near the data center of your application servers.
這裡的外部,指的是資料存儲並不位於應用程式伺服器上, 而是位於應用程式伺服器所在的資料中心內部或附近。
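To make the rule concrete, here is a minimal sketch of two clones sharing one session store. `SessionStore` is a hypothetical in-memory stand-in for an external store such as Redis, used here only so the example runs on its own. `AppServer` and the `visits` counter are likewise assumptions made for illustration.

```python
class SessionStore:
    """In-memory stand-in for an external store such as Redis (assumption)."""

    def __init__(self):
        self._data = {}

    def get(self, session_id):
        # Return a copy so callers cannot mutate stored state by accident.
        return dict(self._data.get(session_id, {}))

    def set(self, session_id, session):
        self._data[session_id] = dict(session)


class AppServer:
    """A clone: keeps no user data itself, only a reference to the store."""

    def __init__(self, name, store):
        self.name = name
        self._store = store

    def handle(self, session_id):
        # Read, update, and write back the session on every request.
        session = self._store.get(session_id)
        session["visits"] = session.get("visits", 0) + 1
        self._store.set(session_id, session)
        return session["visits"]


store = SessionStore()
server_2 = AppServer("server-2", store)
server_9 = AppServer("server-9", store)

# Steve's three requests hit server 2, server 9, then server 2 again.
visits = [server.handle("steve") for server in (server_2, server_9, server_2)]
print(visits)  # [1, 2, 3]
```

The counter keeps increasing no matter which server answers, because neither server holds the session locally.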

But what about deployment?
但這樣要如何部署?

How can you make sure that a code change is sent to all your servers without one server still serving old code?
要怎麼確定程式碼異動都有同步到各個伺服器,而沒有任何一台伺服器還在運行舊程式碼?

This tricky problem is fortunately already solved by the great tool Capistrano.
所幸,這個棘手的問題已經被 Capistrano 解決了。

It requires some learning, especially if you are not into Ruby on Rails,
它需要一些學習成本,尤其當你不是寫 Ruby on Rails,

but it is definitely worth the effort.
但它絕對值得你投入這份心力。
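The deployment concern can be expressed as a simple check: after a rollout, no server should report an older release than the target. The sketch below is not how Capistrano works internally. It only illustrates the condition being guarded against, and the `stale_servers` helper, the server names, and the release identifiers are all hypothetical.

```python
def stale_servers(deployed, target_release):
    """Return the names of servers that are not running target_release."""
    return sorted(
        name for name, release in deployed.items() if release != target_release
    )


# Hypothetical snapshot: server name -> release identifier it reports.
deployed = {
    "server-1": "release-42",
    "server-2": "release-42",
    "server-9": "release-41",  # missed the last deployment
}

print(stale_servers(deployed, "release-42"))  # ['server-9']
```

An empty result means every clone serves the same code.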

After "outsourcing" your sessions and serving the same codebase from all your servers,
在我們將 session 委外,並讓所有伺服器都運行相同程式碼之後,

you can now create an image file from one of these servers (AWS calls this AMI - Amazon Machine Image).
你就可以從其中一台伺服器建立一份鏡像檔 (AWS 稱之為 AMI - Amazon Machine Image)。

Use this AMI as a "super-clone" that all your new instances are based upon. Whenever you start a new instance/clone, just do an initial deployment of your latest code and you are ready!
把這個 AMI 當作所有新實體的基礎,也就是「超級複製品」。每當你啟動新的實體/複製品時,只要先部署一次最新的程式碼,就準備就緒了!
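The reason for the initial deployment is that the image is a snapshot, so the code baked into it can lag behind the latest release. A minimal simulation of that idea follows. It does not call any AWS API, and `Image`, `Instance`, `launch_clone`, the image ID, and the release names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Image:
    """Snapshot of a server; the code baked into it may be out of date."""
    image_id: str
    baked_release: str


@dataclass
class Instance:
    name: str
    release: str


def launch_clone(name, image, latest_release):
    # Boot from the image: the clone starts with the baked-in code.
    instance = Instance(name=name, release=image.baked_release)
    # Initial deployment: bring the clone up to the latest release.
    instance.release = latest_release
    return instance


# Hypothetical image taken when "v1" was current; "v3" is current now.
image = Image(image_id="ami-example", baked_release="v1")
clones = [launch_clone(f"server-{n}", image, "v3") for n in (10, 11)]
print([clone.release for clone in clones])  # ['v3', 'v3']
```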
