Parallel Programming 2

Parallel Programming Basics

Problem Decomposition -> Assignment -> Orchestration -> Mapping

Problem Decomposition

Break up problem into tasks that can be carried out in parallel

Create at least enough tasks to keep all execution units on a machine busy

Identifying Dependencies!!

Amdahl's Law : dependencies limit maximum speedup due to parallelism

If some portion of the work must be executed sequentially, that portion cannot be processed in parallel.

๋”ฐ๋ผ์„œ maximum speedup due to parallel execution <= 1/s


S = the fraction of total work that is inherently sequential

The larger the sequential fraction, the more the speedup is limited.
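The bound above is easy to sanity-check numerically. Below is a minimal sketch; the helper name `amdahl_speedup` is ours, not from the lecture:

```python
def amdahl_speedup(s, p):
    """Amdahl's Law: speedup on p processors when a fraction s
    of the total work is inherently sequential."""
    return 1.0 / (s + (1.0 - s) / p)

# With 10% sequential work, even a huge machine tops out near 1/S = 10x.
print(amdahl_speedup(0.1, 2))        # modest gain on 2 processors
print(amdahl_speedup(0.1, 10**9))    # approaches the 1/S = 10x ceiling
```

As `p` grows, the parallel term `(1 - s)/p` vanishes and the speedup converges to `1/S`, matching the bound stated above.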

In most cases , the programmer is responsible for decomposing a program into independent tasks

Assignment

Assigning tasks to threads

Goals: achieve good workload balance, reduce communication costs

Can be performed statically (before application is run) or dynamically
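The static/dynamic distinction can be sketched in plain Python threads; the function names (`static_assign`, `dynamic_run`) are illustrative, not a standard API. Static assignment fixes the partition before execution, while dynamic assignment lets idle workers pull the next task from a shared queue at run time:

```python
import queue
import threading

def static_assign(tasks, n_threads):
    """Static assignment: partition tasks before the program runs
    (here, round-robin by index)."""
    return [tasks[i::n_threads] for i in range(n_threads)]

def dynamic_run(tasks, n_threads, work):
    """Dynamic assignment: worker threads grab tasks from a shared
    queue as they become free, which balances uneven task costs."""
    q = queue.Queue()
    for t in tasks:
        q.put(t)
    results, lock = [], threading.Lock()

    def worker():
        while True:
            try:
                t = q.get_nowait()
            except queue.Empty:
                return
            r = work(t)
            with lock:
                results.append(r)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results

print(static_assign(list(range(8)), 2))
print(sorted(dynamic_run(list(range(8)), 3, lambda x: x * x)))
```

Dynamic assignment costs a little synchronization on the queue, but it adapts when some tasks take much longer than others.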

Orchestration

  • Structuring communication
  • Scheduling tasks
  • Organizing data structures in memory

Goals: reduce costs of communication/sync, preserve locality of data reference, reduce overhead, etc.
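One common orchestration structure is a barrier between phases of a computation: no thread may start phase 2 until every thread has finished phase 1. A minimal sketch with Python's `threading.Barrier` (the phase names and logging are illustrative):

```python
import threading

N = 4
barrier = threading.Barrier(N)   # orchestration: phase-boundary sync point
phase_log = []
lock = threading.Lock()

def worker(i):
    # Phase 1: each thread does its share of the work.
    with lock:
        phase_log.append(("phase1", i))
    barrier.wait()               # block until all N threads reach here
    # Phase 2: safe to proceed; all phase-1 results now exist.
    with lock:
        phase_log.append(("phase2", i))

threads = [threading.Thread(target=worker, args=(i,)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(phase_log)
```

The barrier guarantees that every "phase1" entry precedes every "phase2" entry, whatever order the scheduler runs the threads in.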

Mapping

Mapping "threads" to hardware execution units

  • Mapping by the operating system
    • map a thread to a HW execution context on a CPU core
  • Mapping by the compiler
    • map ISPC program instances to vector instruction lanes
  • Mapping by the hardware
    • map CUDA thread blocks to GPU cores