Parallel Programming 2
Parallel Programming Basics
Problem Decomposition -> Assignment -> Orchestration -> Mapping
Problem Decomposition
Break up problem into tasks that can be carried out in parallel
Create at least enough tasks to keep all execution units on a machine busy
Key challenge: identifying dependencies
Amdahl's Law: dependencies limit the maximum speedup achievable through parallelism
If part of the work must be executed sequentially, that part cannot be processed in parallel.
Therefore: maximum speedup due to parallel execution <= 1/s
s = the fraction of total work that is inherently sequential
The larger the sequential fraction, the lower the achievable speedup.
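The bound above is easy to check numerically. A minimal sketch (the helper name `amdahl_speedup` is illustrative, not from the course material): with sequential fraction s and p processors, speedup = 1 / (s + (1 - s)/p), which approaches 1/s as p grows.

```python
def amdahl_speedup(s, p):
    """Upper bound on speedup: sequential fraction s, p processors.

    The sequential part still takes time s; the parallel part (1 - s)
    is divided across p processors.
    """
    return 1.0 / (s + (1.0 - s) / p)

# Even with a huge number of processors, s = 0.1 caps speedup near 10x.
print(amdahl_speedup(0.1, 1_000_000))
```

Note that doubling the processor count gives diminishing returns long before the 1/s ceiling is reached.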
In most cases, the programmer is responsible for decomposing a program into independent tasks
Assignment
Assigning tasks to threads
Goals: achieve good workload balance, reduce communication costs
Can be performed statically (before the application runs) or dynamically (as it executes)
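The static/dynamic distinction can be sketched in a few lines of Python (the function names here are illustrative assumptions): a static scheme partitions tasks up front, while a dynamic scheme has workers pull tasks from a shared queue at run time, which balances load when task costs vary.

```python
import queue
import threading

def static_assign(tasks, n_workers):
    """Static assignment: partition tasks among workers before execution.

    Worker i gets tasks[i::n_workers]; the split never changes at run time.
    """
    return [tasks[i::n_workers] for i in range(n_workers)]

def dynamic_assign(tasks, n_workers, work):
    """Dynamic assignment: workers repeatedly pull from a shared task queue."""
    q = queue.Queue()
    for t in tasks:
        q.put(t)
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                t = q.get_nowait()   # grab the next available task
            except queue.Empty:
                return               # no work left
            r = work(t)
            with lock:               # protect the shared results list
                results.append(r)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results

print(static_assign(list(range(6)), 2))
print(sorted(dynamic_assign(list(range(5)), 2, lambda x: x * x)))
```

Static assignment has no scheduling overhead but risks load imbalance; dynamic assignment pays queue overhead in exchange for better balance.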
Orchestration
- Structuring communication
- Scheduling tasks
- Organizing data structures in memory
Goals: reduce the costs of communication/synchronization, preserve locality of data reference, reduce overhead, etc.
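One common orchestration pattern is phased computation with a barrier: every thread finishes writing its own partition before any thread reads a neighbor's result. A minimal sketch (the two-phase neighbor-sum workload here is a made-up example):

```python
import threading

N_THREADS = 4
barrier = threading.Barrier(N_THREADS)
data = [0] * N_THREADS
results = [0] * N_THREADS

def phase_worker(i):
    data[i] = i * 10                 # phase 1: write own partition
    barrier.wait()                   # sync: all writes done before any reads
    left = data[(i - 1) % N_THREADS]
    results[i] = data[i] + left      # phase 2: safely read neighbor's data

threads = [threading.Thread(target=phase_worker, args=(i,))
           for i in range(N_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)
```

Without the barrier, a fast thread could read a neighbor's slot before it was written, which is exactly the kind of communication-structuring problem orchestration addresses.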
Mapping
Mapping "threads" to hardware execution units
- Mapping by the operating system
- map a thread to a HW execution context on a CPU core
- Mapping by the compiler
- map ISPC program instances to vector instruction lanes
- Mapping by the hardware
- map CUDA thread blocks to GPU cores