Parallel Programming 3

Performance Optimization : Work Distribution and Scheduling

Goals

  • Balance workload onto available execution resources
  • Reduce communication (to avoid stalls)
  • Reduce extra work (overhead) performed to increase parallelism, manage assignment, reduce communication.

 

When is static assignment applicable?

  • When the cost of work and the amount of work is predictable
  • not all jobs have same cost, but same cost on average

image

Dynamic assignment

Shared work queue : a list of work to do

image

  • Fine granularity partitioning

workload balance가 좋지만 synchronization cost가 크다. ( 작업 한번 할 때 마다 sync )

  • Coarse granularity partitioning

Synchronization cost가 감소한다. ( 적절하게 task를 잘 나눠야 한다! )

Many tasks < - > Few tasks

Many tasks : good workload but high overhead

Few tasks : minimize overhead but not good workload

 

Single work queue 를 사용할 때의 synchronization problem을 해결하기 위해 distributed queue를 사용할 수도 있다.

When local work queue is empty, steal work from another work queue

 

Fork-join parallelism

Cilk Plus

cilk_spawn : pthread_create와 유사

cilk_sync : pthread_join과 유사

image

  • Run continuation first ( “child stealing” )
    • Caller thread spawns work for all iterations before executing any of it
    • O(N) space for spawned work
  • Run child first ( “continuation stealing” )
    • Caller thread only creates one item to steal
    • child stealing 방식에 비해 work queue가 작다

Cilk uses greedy join scheduling