Multi-core Programming for Software Architecture

Feb 17

Randy discusses the problem software architecture faces on multi-core machines and how multi-core programming can address it.

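Nearly all major software systems in use today were created before multi-core computers arrived. Multi-processor computers have been around for a while, but not as everyday machines, and very few software systems take full advantage of them.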

More Hardware vs. Multi-core Programming

Why can’t you throw more hardware at it and expect it to run faster? The issue is almost always characterized as “shared resources.” A shared resource is anything that one task or thread must use without worrying about another task changing it while it is in use. The concepts of synchronization, locking, mutual exclusion and “critical sections” were developed to deal with the safe sharing of common resources.

The traditional mechanisms for creating safe sharing among tasks are called semaphores, queues or events. Used correctly, these multicore programming primitives allow everything from incrementing a shared number to completing a database transaction to be done without the corruption that would occur if tasks were not properly sequenced, or “serialized.”
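As a small illustration of serializing access to a shared number, here is a minimal sketch using a mutex from Python's standard `threading` module. The names `SharedCounter` and `count_with_threads` are made up for this example and do not come from any particular product.

```python
import threading


class SharedCounter:
    """A shared number whose updates are serialized by a lock."""

    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        # Critical section: only one thread at a time may run this block.
        with self._lock:
            self.value += 1


def count_with_threads(num_threads: int, increments_each: int) -> int:
    """Have several threads increment one counter and return the final value."""
    counter = SharedCounter()

    def worker() -> None:
        for _ in range(increments_each):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


total = count_with_threads(4, 10_000)
print(total)  # 40000: no increment is lost because updates are serialized
```

The lock keeps the result correct, but it also means the four threads take turns at the counter, which is exactly the cost the rest of this article is about.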

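In the single-CPU era, multiprocessing or multithreading created the illusion that several tasks were executing at the same time. When a synchronization primitive blocked one task, that was not a problem, because the single CPU simply switched to another task that was not blocked. As long as there was always something for the CPU to do, a software system with shared resources was not degraded much by synchronization.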

A Multi-core Example

Why can’t multiple cores just make this go 2 or 4 times faster? I’ll try to use an analogy to explain it. Imagine a person at a desk with an inbox, a computer, and an outbox. The job is to take the next item from the inbox and read it, enter information from the item into the computer, get a response from the computer, write it down on a form, prepare it for delivery and put it in the outbox. This cycle takes 5 minutes per item on average, and the boss needs an output of 30 per hour, not the current 12. To solve this problem, the boss puts another person (core) at the same desk and allows the computer to swivel. Getting started, both people reach into the inbox, grab the same item and tear it in two. Starting over, they agree to notify each other before grabbing. Now they each read their next item and are ready for data entry. Unfortunately, only one can do this at a time, so the other waits, then turns the monitor around and starts data entry. In the meantime, the first person has prepared the response for the outbox, grabbed the next item, and has to wait a short time for the computer monitor.

The Problem

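The good news is that output has gone from 12 items per hour to 18. The bad news is that the boss still needs 30. Another person is added to the desk, and output goes from 18 items per hour to 22. The next person moves it to 24. One more person moves it back down to 22, because that person, although just as skilled at processing items, is actually interfering with the work already under way.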

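This is what happens to software systems that are given more cores. The first few help a little, but soon a point is reached where performance goes down.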
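To make the diminishing returns concrete, the short sketch below works through the numbers from the analogy (12, 18, 22, 24 and 22 items per hour for one to five people, against a target of 30). Only the figures come from the story; the variable and function names are chosen for illustration.

```python
# Items per hour from the analogy, keyed by the number of people at the desk.
items_per_hour = {1: 12, 2: 18, 3: 22, 4: 24, 5: 22}
TARGET_PER_HOUR = 30  # what the boss needs


def speedup(people: int) -> float:
    """Output relative to one person working alone."""
    return items_per_hour[people] / items_per_hour[1]


def per_person(people: int) -> float:
    """Average items per hour contributed by each person."""
    return items_per_hour[people] / people


for people in sorted(items_per_hour):
    print(
        f"{people} at the desk: {items_per_hour[people]} items/h, "
        f"speedup {speedup(people):.2f}x, "
        f"{per_person(people):.1f} items/h per person"
    )

# The head count with the highest output, and whether the target was ever met.
best = max(items_per_hour, key=items_per_hour.get)
target_reached = max(items_per_hour.values()) >= TARGET_PER_HOUR
print(best, target_reached)  # 4 False
```

The target needs a speedup of 30 / 12 = 2.5, while the shared desk peaks at 24 / 12 = 2.0 with four people and falls back when a fifth is added.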

The Solution

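If, instead, the second person is given another desk, inbox, computer and outbox, the output will nearly double. Only nearly, because extra time is needed to fill two inboxes and empty two outboxes rather than just one, but it is clearly a good trade-off. This solution costs more, but it pays for itself quickly.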
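In software terms, separate desks mean that each worker gets private input and output, with sharing limited to handing out the work and collecting the results. The following is a minimal sketch of that structure; `process_item`, `work_at_own_desk` and `run_desks` are hypothetical names, and squaring a number merely stands in for the real per-item work. Note that in CPython threads share one interpreter lock, so this shows the shape of the design rather than a measured speedup; a process pool is the usual choice for CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def process_item(item: int) -> int:
    """Stand-in for the read, enter and respond cycle on one item."""
    return item * item


def work_at_own_desk(inbox: list[int]) -> list[int]:
    """One worker drains a private inbox into a private outbox; no locks needed."""
    outbox = []
    for item in inbox:
        outbox.append(process_item(item))
    return outbox


def run_desks(items: list[int], desks: int) -> list[int]:
    # Filling the inboxes: the small extra cost paid up front.
    inboxes = [items[i::desks] for i in range(desks)]
    with ThreadPoolExecutor(max_workers=desks) as pool:
        outboxes = list(pool.map(work_at_own_desk, inboxes))
    # Emptying the outboxes: the small extra cost paid at the end.
    return [result for outbox in outboxes for result in outbox]


results = run_desks(list(range(10)), desks=2)
print(sorted(results))
```

Nothing inside `work_at_own_desk` touches shared state, so adding a desk adds capacity instead of adding contention.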

Changing a software system to split up the work is not so easy, especially software that is well established and mature. It’s not just a change in algorithms, it’s a change in architecture.

If you search the web for multi-core optimization, you will find articles about shared L2 cache or matrix processing. Many OS and chip vendors say that they have done the optimization for you, but they don’t say how. Some have proposed programming languages with parallel processing constructs added to the language. All of these are fine and helpful, but they don’t fix an architectural problem any more than sending the people who share a single inbox to a speed-reading class would. Speed-reading is great, but in the absence of a new desk architecture, it has very limited benefit.

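Few articles discuss system-level and multi-core programming architecture, which is where the major system-level performance gains are found. It turns out that there is only one main design principle for creating a software system that scales to multiple cores: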

Step one: Obtain all the resources needed for the task

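Step one is best if it involves no blocking at all. Second best is to make the locking very fine-grained, meaning that one task can own one portion of a resource while another task owns a different portion.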
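One common way to make locking fine-grained is to split a resource into portions that each have their own lock, sometimes called lock striping. The sketch below assumes a simple in-memory key-value store; `StripedStore` and its methods are hypothetical names for illustration and are not taken from any database product.

```python
import threading


class StripedStore:
    """Key-value store split into stripes, each guarded by its own lock."""

    def __init__(self, stripes: int = 8) -> None:
        self._locks = [threading.Lock() for _ in range(stripes)]
        self._data = [{} for _ in range(stripes)]

    def _stripe(self, key: str) -> int:
        # Keys are spread over the stripes by hash.
        return hash(key) % len(self._locks)

    def add(self, key: str, amount: int) -> None:
        i = self._stripe(key)
        with self._locks[i]:  # blocks only tasks using this same stripe
            self._data[i][key] = self._data[i].get(key, 0) + amount

    def get(self, key: str) -> int:
        i = self._stripe(key)
        with self._locks[i]:
            return self._data[i].get(key, 0)


store = StripedStore()


def worker(key: str) -> None:
    for _ in range(1000):
        store.add(key, 1)


# Eight threads, two per key; threads on different stripes do not wait on each other.
threads = [threading.Thread(target=worker, args=(f"item-{n % 4}",)) for n in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print([store.get(f"item-{n}") for n in range(4)])  # [2000, 2000, 2000, 2000]
```

With a single lock over the whole store, every update would queue behind every other one; with stripes, tasks collide only when their keys happen to land in the same portion.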

Step two: Perform the task

Step two may be time consuming, but in a correct multicore programming architecture, it is not blocking other tasks. It can execute in parallel. An important principle of step two is that the sum of the time taken for all cores can be much more than the time it would take one CPU, but the wall-clock time is much less because they are done at the same time. For example, if step two takes a single-CPU system 100ms, then twelve tasks would take 1200ms. In a re-architected multi-core system with 4 cores requiring 180ms for step two, 4 parallel cores each performing the task 3 times (twelve tasks) would be done in 540ms. It hasn’t cut the time by 4, but the stage is set for true scaling when even more cores are added.
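The arithmetic in this example can be checked with a few lines of code. This is a simplified model that assumes equal-length tasks running in lockstep rounds; the function name `wall_clock_ms` is invented here, and the 12-core figure is an extrapolation that assumes the per-task cost stays at 180ms.

```python
import math


def wall_clock_ms(tasks: int, cores: int, ms_per_task: int) -> int:
    """Elapsed time when equal tasks run in rounds of `cores` at a time."""
    rounds = math.ceil(tasks / cores)
    return rounds * ms_per_task


single_cpu = wall_clock_ms(tasks=12, cores=1, ms_per_task=100)  # 1200 ms
four_cores = wall_clock_ms(tasks=12, cores=4, ms_per_task=180)  # 540 ms

# Total CPU time goes up (12 * 180 = 2160 ms) while wall-clock time goes down.
total_cpu_ms = 12 * 180
speedup = single_cpu / four_cores  # about 2.2x, not 4x

# Assumption: with 12 cores and the same 180 ms per task, one round suffices.
twelve_cores = wall_clock_ms(tasks=12, cores=12, ms_per_task=180)  # 180 ms

print(single_cpu, four_cores, total_cpu_ms, round(speedup, 2), twelve_cores)
```

The re-architected version spends almost twice as much CPU time in total, yet finishes in less than half the wall-clock time on 4 cores.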

Step three: Quickly merge or re-integrate the results of the task

Step three is hard, especially for a database management system, where ACID principles must be followed. Saving transactions safely to disk in a consistent and durable manner requires synchronization around shared resources. The key here is to do as much work beforehand as possible, so that the re-integration is quick. This is probably why step two takes longer. Step two does more work (in parallel) so that the serialized portion is short. It’s a massively worthwhile trade-off.
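The idea of doing the heavy work privately and keeping the serialized part short can be sketched as follows. This is only an in-memory illustration, not a database transaction and not ACID-compliant; `Ledger`, `commit` and `run_task` are hypothetical names.

```python
import threading


class Ledger:
    """Shared result that must be updated by one task at a time."""

    def __init__(self) -> None:
        self.entries: list[tuple[int, int]] = []
        self.total = 0
        self._lock = threading.Lock()

    def commit(self, prepared_entries: list[tuple[int, int]], prepared_total: int) -> None:
        # Serialized part: only cheap merging of values computed beforehand.
        with self._lock:
            self.entries.extend(prepared_entries)
            self.total += prepared_total


def run_task(ledger: Ledger, raw_items: range) -> None:
    # Step two: the expensive work happens on private data with no lock held.
    prepared = [(item, item * item) for item in raw_items]
    prepared_total = sum(value for _, value in prepared)
    # Step three: a brief, serialized merge into the shared ledger.
    ledger.commit(prepared, prepared_total)


ledger = Ledger()
batches = [range(0, 5), range(5, 10), range(10, 15)]
threads = [threading.Thread(target=run_task, args=(ledger, batch)) for batch in batches]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(ledger.entries), ledger.total)  # 15 1015
```

Each task holds the lock just long enough to append its prepared entries and add one number, so other tasks wait for very little time no matter how long the preparation took.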

The trick now is to take existing software and re-architect it, not just optimize it, so that it scales across multiple cores.
