Parallel Programming 4
Performance Optimization : Locality, Communication, and Contention
Total communication time : overhead + occupancy + network delay
With pipelined communication, execution can become memory bandwidth-bound!
Arithmetic intensity = amount of computation (e.g., instructions) / amount of communication (e.g., bytes)
-> the higher, the better
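As a rough illustration of the ratio above, the sketch below computes arithmetic intensity for two made-up kernels. The kernels, the 4-byte element size, and the function name `arithmetic_intensity` are assumptions chosen for the example, not part of the original notes.

```python
def arithmetic_intensity(num_operations: float, bytes_moved: float) -> float:
    """Operations performed per byte communicated (higher is better)."""
    if bytes_moved <= 0:
        raise ValueError("bytes_moved must be positive")
    return num_operations / bytes_moved


# Hypothetical kernel 1: c[i] = a[i] + b[i] on 4-byte floats.
# Per element: 1 add, and 2 loads + 1 store = 12 bytes moved.
vector_add = arithmetic_intensity(num_operations=1, bytes_moved=12)

# Hypothetical kernel 2: a[i] = a[i] * a[i] + a[i], updated in place.
# Per element: 2 operations, and 1 load + 1 store = 8 bytes moved.
fused_update = arithmetic_intensity(num_operations=2, bytes_moved=8)

print(f"vector add  : {vector_add:.3f} ops/byte")
print(f"fused update: {fused_update:.3f} ops/byte")
```

Under these assumptions the second kernel does three times as much work per byte moved, so it is less likely to be limited by memory bandwidth.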
Inherent communication
The grid on the right has the larger communication cost.
- Inherent communication : information that fundamentally must be moved between processors to carry out the algorithm given the specified assignment
- Artifactual communication : all other communication
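To see how inherent communication depends on the assignment, here is a minimal sketch comparing two hypothetical ways of dividing an N x N nearest-neighbour grid among P processors: horizontal strips and square blocks. These two partitions, the function names, and the problem sizes are assumptions for illustration (P is taken to be a perfect square that divides the grid evenly); they are not necessarily the assignments shown in the original figure.

```python
import math


def strip_partition(n, p):
    """N x N grid split into P horizontal strips of N/P rows each."""
    work = n * n // p
    # An interior strip exchanges one full row with each of its two neighbours.
    comm = 2 * n
    return work, comm


def block_partition(n, p):
    """N x N grid split into sqrt(P) x sqrt(P) square blocks."""
    side = n // math.isqrt(p)
    work = side * side
    # An interior block exchanges one edge of `side` elements
    # with each of its four neighbours.
    comm = 4 * side
    return work, comm


n, p = 1024, 16  # assumed sizes, chosen so everything divides evenly
for name, partition in (("strips", strip_partition), ("blocks", block_partition)):
    work, comm = partition(n, p)
    print(f"{name}: work={work} comm={comm} work/comm={work / comm:.0f}")
```

Both assignments give each processor the same amount of work, but the block assignment moves half as many boundary elements in this configuration, so its computation-to-communication ratio is twice as high.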
Every time 4 elements are computed, three cache lines must be loaded.
If the grid is instead computed as follows, only two cache lines are loaded for every 6 elements computed.
In other words, make good use of spatial locality!
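The effect of traversal order on cache line loads can be sketched with a tiny simulation. Everything here is an assumption made for illustration: a 16 x 16 row-major grid, 4 elements per cache line, a 6-line LRU cache, and a simplified stencil that reads only an element and its vertical neighbours. It is not the exact configuration behind the numbers quoted above.

```python
from collections import OrderedDict

N = 16          # grid is N x N, stored row-major (assumed)
LINE = 4        # elements per cache line (assumed)
CAPACITY = 6    # cache holds this many lines, LRU replacement (assumed)


def count_line_loads(visit_order):
    """Replay a traversal through a small LRU cache and count line loads."""
    cache = OrderedDict()
    loads = 0
    for i, j in visit_order:
        # Simplified stencil: read the element and its vertical neighbours.
        for row in (i - 1, i, i + 1):
            line = (row * N + j) // LINE
            if line in cache:
                cache.move_to_end(line)        # mark as most recently used
            else:
                loads += 1
                cache[line] = True
                if len(cache) > CAPACITY:
                    cache.popitem(last=False)  # evict least recently used
    return loads


def row_major_order():
    # Sweep each interior row from left to right.
    return [(i, j) for i in range(1, N - 1) for j in range(N)]


def blocked_order(width=LINE):
    # Finish one strip of columns top to bottom before moving to the next.
    return [(i, j)
            for jb in range(0, N, width)
            for i in range(1, N - 1)
            for j in range(jb, min(jb + width, N))]


row_major_loads = count_line_loads(row_major_order())
blocked_loads = count_line_loads(blocked_order())
print("row-major line loads:", row_major_loads)
print("blocked line loads  :", blocked_loads)
```

With these parameters the row-major sweep evicts every line before the next row can reuse it, while the blocked order reuses each line for three consecutive rows, so it loads far fewer lines for the same 224 element updates.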
Contention
- Flat communication
- Tree structured communication
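One way to picture the difference between the two patterns is a sum reduction over P nodes. The sketch below is an assumed example, not taken from the original notes: it counts how many messages the busiest node receives when every node sends directly to node 0 (flat) versus when partial sums are combined pairwise (tree). The function names are hypothetical, and the input list is assumed to be non-empty.

```python
def flat_reduce(values):
    """Every node sends its value directly to node 0.

    Returns (sum, messages received by the busiest node).
    """
    received = [0] * len(values)
    total = values[0]
    for src in range(1, len(values)):
        total += values[src]
        received[0] += 1
    return total, max(received)


def tree_reduce(values):
    """Pairwise combining: in each round, node i receives from node i + stride.

    Returns (sum, messages received by the busiest node).
    """
    partial = list(values)
    received = [0] * len(values)
    stride = 1
    while stride < len(partial):
        for dst in range(0, len(partial) - stride, 2 * stride):
            partial[dst] += partial[dst + stride]
            received[dst] += 1
        stride *= 2
    return partial[0], max(received)


values = list(range(16))
print("flat:", flat_reduce(values))
print("tree:", tree_reduce(values))
```

For 16 nodes, node 0 receives 15 messages in the flat scheme but only 4 in the tree scheme, and each of those 4 arrives in a different round, which is why the tree structure reduces contention at a single node at the price of more rounds.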
Reducing communication costs
- Reduce overhead of communication to sender/receiver
- Reduce latency of communication
- Reduce contention
- Increase communication/computation overlap
  - e.g., asynchronous communication, pipelining, multi-threading, pre-fetching, out-of-order execution
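Communication/computation overlap through asynchronous requests can be sketched as follows. This is a minimal example under stated assumptions: `fetch_chunk` is a hypothetical stand-in for a remote read (it only sleeps to mimic latency), `compute` is hypothetical local work, and a single background thread from the standard `concurrent.futures` module plays the role of the asynchronous communication engine.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_chunk(index, delay=0.01):
    """Hypothetical stand-in for a remote read; sleeps to mimic latency."""
    time.sleep(delay)
    return list(range(index * 4, index * 4 + 4))


def compute(chunk):
    """Hypothetical local work on one chunk."""
    return sum(x * x for x in chunk)


def process_blocking(num_chunks):
    # Fetch, then compute, strictly one after the other.
    return sum(compute(fetch_chunk(i)) for i in range(num_chunks))


def process_overlapped(num_chunks):
    # Request chunk i+1 before computing on chunk i, so the fetch
    # proceeds in the background while the computation runs.
    total = 0
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fetch_chunk, 0)
        for i in range(num_chunks):
            chunk = pending.result()
            if i + 1 < num_chunks:
                pending = pool.submit(fetch_chunk, i + 1)
            total += compute(chunk)
    return total


print(process_blocking(5), process_overlapped(5))
```

Both versions produce the same result; the overlapped version only saves time when the computation on one chunk takes long enough to hide a meaningful part of the next fetch.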