Parallel Programming 6

Cache Coherence

  • Write-through : when the CPU uses data, the data is stored in the cache; every write updates the cache and memory at the same time
    • Easy to maintain data consistency
    • Lower performance, since every write goes to memory
  • Write-back : a write is held in the cache only and is recorded to memory only when the block is evicted from the cache
    • Faster than write-through
    • The cache and memory can temporarily hold different data
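
The trade-off between the two policies can be sketched with a toy model. This is a minimal illustration, not a model of real hardware: the `Cache` class, its `write_back` flag, and the `run` helper are hypothetical names introduced here, and memory is just a Python dict.

```python
class Cache:
    """Toy cache in front of a dict-backed memory (hypothetical model)."""

    def __init__(self, memory, write_back):
        self.memory = memory
        self.write_back = write_back
        self.lines = {}          # addr -> cached value
        self.dirty = set()       # addrs modified but not yet written to memory
        self.memory_writes = 0   # how many times memory was updated

    def _store(self, addr):
        self.memory[addr] = self.lines[addr]
        self.memory_writes += 1

    def write(self, addr, value):
        self.lines[addr] = value
        if self.write_back:
            self.dirty.add(addr)     # defer the memory update
        else:
            self._store(addr)        # write-through: update memory immediately

    def evict(self, addr):
        if addr in self.dirty:       # write-back: flush only on eviction
            self._store(addr)
            self.dirty.discard(addr)
        self.lines.pop(addr, None)


def run(write_back):
    """Write A three times, then evict; report writes and staleness."""
    memory = {"A": 0}
    cache = Cache(memory, write_back)
    for value in (1, 2, 3):
        cache.write("A", value)
    stale_before_evict = memory["A"] != cache.lines["A"]
    cache.evict("A")
    return cache.memory_writes, stale_before_evict, memory["A"]
```

In this sketch `run(False)` (write-through) touches memory on every write and never leaves it stale, while `run(True)` (write-back) touches memory once but leaves it out of date until the eviction.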

Why the cache coherence problem arises : each processor has its own private cache

Shared cache

  • One single cache shared by all processors

But shared cache becomes bottleneck

Private cache

Add per-core caches

  • Reduces latency
  • Increases throughput
  • Decreases energy

Possible problem

A ๊ฐ’์„ Shared Cache์—์„œ ์ฐพ์œผ๋ฉด ์ œ๋Œ€๋กœ update๊ฐ€ ๋˜์–ด์žˆ์ง€ ์•Š๋‹ค

Hardware-based solutions

  • Directory-based coherence implementations
  • Snooping-based coherence implementations

MI(VI) coherence protocol

Simplest form is a two-state "valid/invalid" protocol

If a core wants a copy, it must find the existing valid copy and "invalidate" it

When the current processor reads or writes a block it holds as invalid : its state changes to valid

When the block is invalid in the current processor : it has nothing to do, even if another processor has a read/write miss

When the block is valid in the current processor and another processor reads or writes it : the current processor changes its state to invalid and sends the data

On a cache miss, how is the valid copy found?

  • Snooping : broadcast to all, whoever has it responds
  • Directory : track sharers with a separate structure
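
The MI(VI) rules with snooping can be sketched for a single block as follows. This is a hedged illustration rather than a hardware description: the `Core` and `Bus` classes, the `log` list, and the single-block simplification are all assumptions made here, and evictions are not modeled.

```python
class Core:
    def __init__(self):
        self.state = "I"     # "V" (valid) or "I" (invalid)
        self.value = None


class Bus:
    """Snooping bus for one block: a miss is broadcast to every other core."""

    def __init__(self, n_cores, memory_value=0):
        self.cores = [Core() for _ in range(n_cores)]
        self.memory = memory_value
        self.log = []

    def _acquire(self, requester):
        me = self.cores[requester]
        if me.state == "V":
            return                       # hit: no bus traffic needed
        data = self.memory               # default source if nobody holds it
        for i, other in enumerate(self.cores):
            if i != requester and other.state == "V":
                data = other.value       # the valid holder responds...
                other.state, other.value = "I", None   # ...and invalidates
                self.log.append(f"P{i} invalidated, sent data to P{requester}")
        me.state, me.value = "V", data

    def read(self, requester):
        self._acquire(requester)
        return self.cores[requester].value

    def write(self, requester, value):
        self._acquire(requester)
        self.cores[requester].value = value
```

For example, after `bus = Bus(2)` and `bus.write(0, 5)`, a call to `bus.read(1)` obtains 5 from P0 and leaves P0 invalid, so only one core ever holds the block, even for reads. That restriction is what the Shared state in MSI relaxes.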

MSI coherence protocol

Modified, Shared, Invalid

When the current processor reads a block in the Invalid state : it fetches the value (from another processor if one holds it) and the block enters the Shared state

When the current processor writes a block in the Invalid state : the block changes to the Modified state
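
The two transitions above fit into the full MSI state machine for one block, seen from the current processor. The table below is a sketch: the transitions out of Invalid come from the notes, the remaining entries are the usual textbook MSI transitions, and the event names (`local_read`, `remote_write`, and so on) are labels invented for this example.

```python
# (current state, event) -> next state, for one cache block in one core
MSI = {
    ("I", "local_read"):   "S",   # read miss: fetch the block, share it
    ("I", "local_write"):  "M",   # write miss: fetch and own the block
    ("I", "remote_read"):  "I",
    ("I", "remote_write"): "I",
    ("S", "local_read"):   "S",
    ("S", "local_write"):  "M",   # upgrade: other copies are invalidated
    ("S", "remote_read"):  "S",
    ("S", "remote_write"): "I",
    ("M", "local_read"):   "M",
    ("M", "local_write"):  "M",
    ("M", "remote_read"):  "S",   # supply the dirty data, keep a shared copy
    ("M", "remote_write"): "I",   # supply the dirty data, give up the block
}


def run(events, state="I"):
    """Apply a sequence of events and return the final state."""
    for event in events:
        state = MSI[(state, event)]
    return state
```

For instance, `run(["local_write", "remote_read"])` ends in `"S"`: the block is first Modified locally and then downgraded when another processor reads it.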

MESI coherence protocol

In the MSI protocol, a load miss brings the block in as Shared, and a later write to the same block must broadcast to every processor, even when no other processor holds a copy. MESI adds the Exclusive state to avoid this unnecessary broadcast.

When the current processor reads a block in the Invalid state : Shared if another processor also holds the block, otherwise Exclusive

Why a write miss or write hit in the current processor moves the block to the Modified state : so that the dirty block can be written back later
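
The benefit of the Exclusive state can be seen by counting bus transactions for a load followed by a store to the same block. This is a rough sketch, assuming one bus transaction per miss and per upgrade; the function `load_then_store` and its parameters are hypothetical names used only here.

```python
def load_then_store(protocol, others_have_copy):
    """Return (final_state, bus_transactions) for a load, then a store,
    to one block that starts Invalid in the current processor."""
    bus = 0

    # Load miss from Invalid: always needs a bus read.
    bus += 1
    if protocol == "MESI" and not others_have_copy:
        state = "E"          # sole holder: Exclusive
    else:
        state = "S"          # MSI has no E state; MESI shares if others hold it

    # Store hit on the block that was just loaded.
    if state == "S":
        bus += 1             # must broadcast an invalidate/upgrade
    # From E the move to M is silent: no bus transaction.
    state = "M"
    return state, bus
```

With no other sharers, the MSI case costs two bus transactions and the MESI case only one; when another processor does hold the block, both protocols need the second broadcast.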

  • Snooping/broadcast-based cache coherence : suited to a small number of processors. Low latency, but more traffic than the directory approach
    • No explicit state
    • 2 hops (P0->memory->P0)
  • Directory-based cache coherence
    • Track sharers of blocks
    • 3 hops (P0->memory->P1->P0)
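
The traffic difference between the two approaches can be sketched by counting invalidation messages on a write. This is a simplified illustration, assuming one message per target core; `invalidations_on_write` and its parameters are hypothetical names.

```python
def invalidations_on_write(n_cores, sharers, writer, use_directory):
    """Number of invalidation messages sent when `writer` stores to a block."""
    if use_directory:
        # The directory tracks sharers, so only they are contacted.
        targets = set(sharers) - {writer}
    else:
        # Snooping keeps no explicit state: broadcast to every other core.
        targets = set(range(n_cores)) - {writer}
    return len(targets)
```

With 16 cores and two sharers, the directory sends 2 messages where snooping sends 15, which is why snooping fits small systems and directories scale better, at the cost of the extra hop through the directory.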