Parallel Programming 3
Performance Optimization : Work Distribution and Scheduling
Goals
- Balance workload onto available execution resources
- Reduce communication (to avoid stalls)
- Reduce extra work (overhead) performed to increase parallelism, manage assignment, reduce communication.
When is static assignment applicable?
- When the cost of work and the amount of work is predictable
- not all jobs have same cost, but same cost on average
Dynamic assignment
Shared work queue : a list of work to do
- Fine granularity partitioning
workload balance가 좋지만 synchronization cost가 크다. ( 작업 한번 할 때 마다 sync )
- Coarse granularity partitioning
Synchronization cost가 감소한다. ( 적절하게 task를 잘 나눠야 한다! )
Many tasks < - > Few tasks
Many tasks : good workload but high overhead
Few tasks : minimize overhead but not good workload
Single work queue 를 사용할 때의 synchronization problem을 해결하기 위해 distributed queue를 사용할 수도 있다.
When local work queue is empty, steal work from another work queue
Fork-join parallelism
Cilk Plus
cilk_spawn : pthread_create와 유사
cilk_sync : pthread_join과 유사
- Run continuation first ( “child stealing” )
- Caller thread spawns work for all iterations before executing any of it
- O(N) space for spawned work
- Run child first ( “continuation stealing” )
- Caller thread only creates one item to steal
- child stealing 방식에 비해 work queue가 작다
Cilk uses greedy join scheduling