Parallel Programming 3

November 20, 2021

Performance Optimization : Work Distribution and Scheduling

Goals

Balance workload onto available execution resources
Reduce communication (to avoid stalls)
Reduce extra work (overhead) performed to increase parallelism, manage assignment, reduce communication.

Shared work queue : a list of work to do

workload balance가 좋지만 synchronization cost가 크다. ( 작업 한번 할 때 마다 sync )

Synchronization cost가 감소한다. ( 적절하게 task를 잘 나눠야 한다! )

Many tasks < - > Few tasks

Many tasks : good workload but high overhead

Few tasks : minimize overhead but not good workload

Single work queue 를 사용할 때의 synchronization problem을 해결하기 위해 distributed queue를 사용할 수도 있다.

When local work queue is empty, steal work from another work queue

Cilk Plus

cilk_spawn : pthread_create와 유사

cilk_sync : pthread_join과 유사

Run continuation first ( “child stealing” )
- Caller thread spawns work for all iterations before executing any of it
- O(N) space for spawned work
Run child first ( “continuation stealing” )
- Caller thread only creates one item to steal
- child stealing 방식에 비해 work queue가 작다

Cilk uses greedy join scheduling