Dynamic Warp Subdivision for Integrated Branch and ... - Rutgers CS


Dynamic Warp Subdivision for Integrated Branch and Memory Divergence Tolerance
Jiayuan Meng, David Tarjan, Kevin Skadron
University of Virginia
37th International Symposium on Computer Architecture (ISCA 2010)

SIMD Warp Execution
•  SIMD execution unit – warp
•  If a warp stalls? – Warp interleaving
•  If no warps are ready to execute? – Idle cycles; throughput reduced
•  Warp interleaving disadvantages
   –  Cache contention
   –  Increases cost of register file
•  Proposed solution – Dynamic warp subdivision
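The trade-off above can be illustrated with a toy model. This is a minimal sketch, not the simulator used in the talk: it assumes every instruction is a load that stalls its warp for a fixed `mem_latency`, and the function name and parameters are hypothetical.

```python
def simulate_interleaving(num_warps, mem_latency, instrs_per_warp):
    """Toy model: each instruction stalls its warp for mem_latency cycles.

    Returns (total_cycles, idle_cycles), where an idle cycle is one in which
    no warp was ready to issue.
    """
    ready_at = [0] * num_warps                 # cycle at which each warp may issue again
    remaining = [instrs_per_warp] * num_warps  # instructions left per warp
    cycle = 0
    idle = 0
    while any(remaining):
        for w in range(num_warps):
            # Issue from the first warp that is ready and still has work.
            if remaining[w] and ready_at[w] <= cycle:
                remaining[w] -= 1
                ready_at[w] = cycle + 1 + mem_latency
                break
        else:
            idle += 1                          # no warp could issue this cycle
        cycle += 1
    return cycle, idle
```

With one warp and a 3-cycle latency the pipeline idles between the two instructions; with four warps the same latency is fully hidden, at the price of four warps' worth of registers and cache footprint.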

Why DWS?
•  Intra-warp latency hiding approach
•  “Warp-splits” – independent scheduling entities
•  Exploited in two cases
   –  Branch divergence
   –  Memory latency divergence
•  No overhead on registers and cache
•  Improved memory level parallelism and latency hiding

[Figure: A (warp), B (stall), C (ready); panels "Memory divergence" and "Memory divergence – Warp-splits"]

DWS Upon Branch divergence

Conventional mechanism – Branch divergence and re-convergence

DWS Upon Branch divergence
Conventional – Only one active branch path at a time
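A minimal sketch of why the conventional scheme keeps only one branch path active at a time, assuming a simple re-convergence stack. The function names and the `(path, mask)` trace format are hypothetical, chosen only for illustration.

```python
def serialize_divergent_branch(predicates, then_len, else_len):
    """Return a per-cycle trace of (path, active_mask) for one divergent branch.

    predicates[i] is True if thread i takes the branch. The two paths are
    pushed on a stack and executed one after the other, never concurrently.
    """
    then_mask = list(predicates)
    else_mask = [not p for p in predicates]
    stack = []
    if any(else_mask):
        stack.append(("else", else_mask, else_len))
    if any(then_mask):
        stack.append(("then", then_mask, then_len))  # popped first
    trace = []
    while stack:
        path, mask, length = stack.pop()
        trace.extend((path, mask) for _ in range(length))
    return trace


def simd_utilization(trace):
    """Fraction of SIMD lanes doing useful work over the whole trace."""
    active = sum(sum(mask) for _, mask in trace)
    return active / (len(trace) * len(trace[0][1]))
```

For a 4-wide warp whose threads split evenly, half of the lanes sit idle on every cycle of both paths; warp-splits aim to let the two paths be scheduled independently instead.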

DWS Upon Branch divergence
•  Delayed re-convergence – Advantages
   –  Memory request issued earlier; prefetching for others

Warp-split subdivision
•  Aggressive subdivision – narrow warp-split
•  Which branches are allowed to subdivide?
•  Heuristic approach – subdivide upon branches whose post-dominator is followed by a basic block of considerable length
•  Advantages of the heuristic approach
   –  Run-ahead threads not too far ahead
   –  Early memory request, prefetching with delayed re-convergence

Stack-based and PC-based Reconvergence

Results

DWS Upon Memory Divergence

Initiating misses earlier

Initiating misses earlier + Data prefetching
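The idea of splitting on memory divergence can be sketched as follows. This is an illustrative model only: the function name and the `hit` lookup are hypothetical, and a real implementation tracks warp-splits in hardware rather than in lists.

```python
def subdivide_on_memory_divergence(threads, hit):
    """Split a warp into warp-splits when only some threads miss the cache.

    threads: list of thread ids in the warp.
    hit:     mapping from thread id to True (cache hit) or False (miss).
    Returns a list of warp-splits; a single entry means no subdivision.
    """
    run_ahead = [t for t in threads if hit[t]]    # can continue immediately
    stalled = [t for t in threads if not hit[t]]  # must wait for memory
    if run_ahead and stalled:
        return [run_ahead, stalled]
    return [list(threads)]                        # all hit or all miss: keep the warp whole
```

The run-ahead split can reach its next load sooner, which is how later misses get initiated earlier and how its accesses can act as prefetches for the stalled split.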

Preventing over-subdivision
•  Aggressive split
•  Lazy split
•  Revive split
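The slide names the three policies without defining them. The sketch below encodes one assumed reading: aggressive splits on every memory divergence, lazy splits only when no other warp is ready, and revive additionally splits a warp that diverged earlier once nothing else is ready. The enum, function, and parameter names are hypothetical.

```python
from enum import Enum


class SplitPolicy(Enum):
    AGGRESSIVE = "aggressive"
    LAZY = "lazy"
    REVIVE = "revive"


def should_subdivide(policy, diverges_now, diverged_earlier, other_warps_ready):
    """Decide whether to subdivide a warp (assumed policy definitions).

    diverges_now:      the current access hits for some threads and misses for others.
    diverged_earlier:  the warp is already stalled on an earlier divergent access.
    other_warps_ready: some other warp could issue this cycle.
    """
    if policy is SplitPolicy.AGGRESSIVE:
        return diverges_now
    if policy is SplitPolicy.LAZY:
        return diverges_now and not other_warps_ready
    # REVIVE: lazy behaviour, plus splitting a previously diverged warp
    # once the pipeline would otherwise sit idle.
    return (diverges_now or diverged_earlier) and not other_warps_ready
```

Under this reading, the lazy and revive variants avoid creating narrow warp-splits while other warps can still hide the latency.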

Re-converge or run-ahead?
•  If not re-converged early – Same instruction sequence executed by warp-splits
•  If re-converged early – Run-ahead warp-split stalls; can’t issue outgoing memory requests
•  When to re-converge? – Need knowledge of future cache misses
•  Results
   –  Only based on memory divergence – poor performance
   –  Branch limited re-convergence – a little performance gain

Results

Implementation – DWS Upon Memory Divergence

Results
•  Compared with adaptive slip
•  Influencing factors – Frequency of branch and memory divergence, length of memory latencies, ability of WPU to hide latency with existing warps

Conclusion
•  Drawback: Doubles complexity and hardware cost in scheduling (WST)
•  Future work: Speculate on cache miss frequency and miss latency to decide when to subdivide a warp
