Onyx: A Prototype Phase-Change Memory Storage Array
Ameen Akel*, Adrian Caulfield, Todor Mollov, Rajesh Gupta, Steven Swanson
Non-Volatile Systems Laboratory, Department of Computer Science and Engineering, University of California, San Diego
*Now at Micron Technology
4 KB Operation Request Latencies
[Figure: chart of write and read request latency (us, log scale from 0.01 to 10000) for Disk, Flash, Current PCM, and Projected PCM]
Advantages of Studying PCM SSDs
• Understand current PCM performance
– With current storage infrastructure
– Versus other NV tech: e.g. Flash SSDs
• PCM performance may differ from simulation
– Variance in write latency due to data
– Wear-out characteristics
• Use real applications to gauge performance
• Understand how software should change for PCM
• Prepare to integrate future-generation PCM
Overview
• Motivation
• PCM Devices
– Technology Overview
– Micron P8P Devices
• Onyx Architecture
– Logical Architecture
– PCM DIMMs
– Physical Architecture
• Performance Analysis
• Applications and Conclusions
PCM: The Device Level
• PCM storage medium: Chalcogenide
– Resistance depends on molecular phase
• Writes
– Heaters are attached to the chalcogenide
– Current passed through heaters to change phase
– Allows bit-alterable writes
• Reads
– Measure resistance through chalcogenide area
– Resistance sensed by ability to sink current
[Figure credit: M. Breitwisch et al., VLSI '07]
[Figure: XRD measurements of the amorphous, fcc, and hexagonal phases. Credit: M. Wuttig et al., FP6 Project CAMELS]
PCM Write Operations in Depth
• Material heated to…
– > 600 °C then cooled quickly → amorphous
– ~ 350 °C then cooled slowly → crystalline
• Set and reset
– Reset: 0 state
– Set: 1 state
[Figure labels: 10 ns, 50-150 ns]
PCM Projections
• Future PCM latency projections*:
  Operation   Latency
  Read        48 ns
  Set         150 ns
  Reset       40 ns
• Process node progression: 90, 45, 32, 20, 9 nm
*B. C. Lee et al. Architecting Phase Change Memory as a Scalable DRAM Alternative. ISCA 2009.
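
The gap between the projected set and reset latencies is one reason write latency can vary with the data being written. The sketch below is a rough illustration only, under assumptions that are not stated in the slides: all bits of a word are programmed in parallel, only bits that change are programmed, and the slowest required operation determines the latency. The function name `write_latency_ns` is hypothetical.

```python
# Projected latencies from the table above (Lee et al., ISCA 2009).
SET_NS = 150    # programming a bit to the 1 state
RESET_NS = 40   # programming a bit to the 0 state


def write_latency_ns(old: int, new: int) -> int:
    """Estimate the latency of overwriting the word `old` with `new`.

    Assumptions (not from the slides): bits are programmed in parallel,
    unchanged bits are skipped, and the slowest operation dominates.
    """
    to_set = ~old & new      # bits that go from 0 to 1
    to_reset = old & ~new    # bits that go from 1 to 0
    if to_set:
        return SET_NS
    if to_reset:
        return RESET_NS
    return 0                 # nothing to program


print(write_latency_ns(0b0000, 0b1010))  # at least one bit needs a set
print(write_latency_ns(0b1111, 0b0101))  # only resets are needed
print(write_latency_ns(0b0110, 0b0110))  # identical data
```

Under this model the same request size can cost anywhere from nothing to a full set time, which a simulator using a single fixed write latency would not capture.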
P8P PCM
• First-generation NOR-flash replacement
• Part: NP8P128A13B1760E (P8P)
• Process Node: 90 nm
• Capacity: 16 MB
• Per Device Bandwidth, Latency, Current
– Write (64 bytes): 0.5 MB/s, 120 us, 35 mA
– Read (16 bytes): 48.6 MB/s, 314 ns, 15 mA
• Lifetime: One million writes until first bit error
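
The per-device bandwidth figures above appear to follow directly from the access size and latency. The sketch below checks that arithmetic, assuming one operation in flight at a time and 1 MB = 2^20 bytes. Both assumptions are inferred from the numbers, not stated in the slides, and the helper name `bandwidth_mib_per_s` is hypothetical.

```python
MIB = 1024 * 1024  # assumed meaning of "MB" in the slide


def bandwidth_mib_per_s(bytes_per_op: int, latency_s: float) -> float:
    """Bandwidth of back-to-back operations with no overlap."""
    return bytes_per_op / latency_s / MIB


write_bw = bandwidth_mib_per_s(64, 120e-6)   # 64-byte write, 120 us
read_bw = bandwidth_mib_per_s(16, 314e-9)    # 16-byte read, 314 ns

print(f"write: {write_bw:.1f} MB/s")
print(f"read:  {read_bw:.1f} MB/s")
```

With these assumptions the results round to the 0.5 MB/s and 48.6 MB/s quoted for a single device, so any higher array bandwidth has to come from running many devices in parallel.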
Moneta: SSD for Emulated Fast NVMs
• DRAM-based NV-SSD emulator
• Learn by building
– Hardware: Controller & interconnect
– Software: Driver, file system, apps
• Uses optimized software stack
– Decreases request latency
– Improves request concurrency
[Figure: system diagram with labels Application, File System, OS IO Stack, Moneta Driver, CPU, DRAM, PCIe, Moneta]
Onyx: Phase-Change Memory SSD
• Based on Moneta*
– Shares hardware
– Shares software stack
• PCM replaces DRAM
– Uses real PCM
– Custom PCM controller
[Figure: system diagram with labels Application, File System, OS IO Stack, Onyx Driver, CPU, DRAM, PCIe, Onyx, PCM]
*A. M. Caulfield et al. Moneta: A high-performance storage array architecture for next-generation, non-volatile memories. MICRO 2010.
Moneta/Onyx Architecture
[Figure: block diagram with labels Ring Control, Request Queue, Scoreboard, DMA Control, Transfer Buffers, Tag Status Registers, Host via PIO, Host via DMA, Ring (4 GB/s), and four blocks labeled 2GB PCM]
Onyx PCM Controller
• Request Completion
– Late Completion: on PCM write completion
– Early Completion: on request reception
• Start-Gap Wear Leveling*
– Low overhead wear leveling (two registers + logic)
– Prevents hot spots from wearing out memory
– Rotates line in memory every gap interval
*M. K. Qureshi et al. Enhancing lifetime and security of PCM-based main memory with start-gap wear leveling. MICRO 42.
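
The start-gap idea can be sketched in a few lines. The model below is a minimal illustration of the general scheme from the cited paper, not the Onyx controller itself: it keeps one spare line plus the two registers (`start` and `gap`) and moves the gap by one line every `gap_interval` writes. The class name, parameter names, and the in-memory list standing in for PCM are all hypothetical.

```python
class StartGap:
    """Toy model of start-gap wear leveling over num_lines logical lines."""

    def __init__(self, num_lines: int, gap_interval: int):
        self.n = num_lines
        self.interval = gap_interval
        self.start = 0                        # register 1: rotation offset
        self.gap = num_lines                  # register 2: index of the empty slot
        self.mem = [None] * (num_lines + 1)   # one spare physical line
        self.writes = 0

    def translate(self, logical: int) -> int:
        """Map a logical line to its current physical slot."""
        physical = (logical + self.start) % self.n
        if physical >= self.gap:              # slots at or past the gap shift by one
            physical += 1
        return physical

    def _move_gap(self) -> None:
        if self.gap == 0:
            # Wrap around: the last slot's line moves to slot 0, start advances.
            self.mem[0] = self.mem[self.n]
            self.mem[self.n] = None
            self.gap = self.n
            self.start = (self.start + 1) % self.n
        else:
            # Copy the line just below the gap into the gap, then move the gap down.
            self.mem[self.gap] = self.mem[self.gap - 1]
            self.mem[self.gap - 1] = None
            self.gap -= 1

    def write(self, logical: int, value) -> None:
        self.mem[self.translate(logical)] = value
        self.writes += 1
        if self.writes % self.interval == 0:
            self._move_gap()

    def read(self, logical: int):
        return self.mem[self.translate(logical)]


# Hammer a single hot line and record which physical slots absorb the writes.
sg = StartGap(num_lines=8, gap_interval=4)
slots_used = set()
for i in range(400):
    slots_used.add(sg.translate(0))
    sg.write(0, i)
print(sorted(slots_used))
```

The address translation needs only an add, a modulo, and a compare, which is why the scheme fits in a small amount of controller logic; the cost is one extra line copy per gap interval.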
Closer Look at a PCM DIMM
• 8 Ranks of 5 PCM devices
– 64 data bits + 16 ECC bits
– Effectively 16 ranks per memory interface
• Shared control and data lines
• Capacity: 640 MB / DIMM
[Figure: one rank of Devices 0-4 sharing Address[0:25], with data lines Data[0:15], Data[16:31], Data[32:47], Data[48:63], and Data[64:79]]
Prototyping Advanced SSDs
• Built on RAMP’s BEE3 board
– Four FPGAs connected in a ring
– Four DIMM slots per FPGA
– PCIe 1.1 x8 host connection
• System capacity: 10 GB
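
The DIMM and system capacities follow from the device counts given in the slides. The short calculation below reproduces them. The constant names are hypothetical, and the 16-bit device width is read off the Data[0:15] through Data[64:79] labels in the DIMM figure.

```python
DEVICE_MB = 16          # P8P capacity per device
DEVICE_WIDTH_BITS = 16  # per the Data[0:15] ... Data[64:79] labels
DEVICES_PER_RANK = 5
RANKS_PER_DIMM = 8
DIMMS_PER_FPGA = 4
FPGAS = 4

rank_width_bits = DEVICE_WIDTH_BITS * DEVICES_PER_RANK       # 64 data + 16 ECC
dimm_mb = DEVICE_MB * DEVICES_PER_RANK * RANKS_PER_DIMM      # all five devices counted
system_gb = dimm_mb * DIMMS_PER_FPGA * FPGAS / 1024

print(rank_width_bits, "bits per rank")
print(dimm_mb, "MB per DIMM")
print(system_gb, "GB in the system")
```

Note that the 640 MB and 10 GB figures only work out if all five devices in a rank are counted, including the one that carries the 16 ECC bits.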
Read Performance
[Figure: bandwidth (MB/s, axis 0 to 2000) versus request size (0.5 KB to 1024 KB) for Onyx, FusionIO, and Moneta]
Write Performance
[Figure: bandwidth (MB/s, axis 0 to 2000) versus request size (0.5 KB to 1024 KB) for Onyx-Late, Onyx-Early, FusionIO, and Moneta]
BerkeleyDB Performance
[Figure: transactions per second (axis 0 to 8000) on the BTree and HashTable BDB benchmarks for Onyx, FusionIO, and Moneta]
Potential PCM Applications
• As a read cache
– First-gen PCM read speeds compete with flash
– Next-gen PCM should improve read performance
• Replace DRAM in high-performance apps
– PCM cost will likely drop below DRAM
– Will scale aggressively past DRAM
• Outpace flash in high-performance SSDs
– Reduces complexity of management
– Provides higher-rated lifetime
– Saves power, logic, and design time
Conclusions
• Onyx designed to maximize PCM performance
• More improvements possible as PCM scales
– Onyx architecture will scale with PCM
– Onyx will benefit from faster reads and writes
• PCM simplifies SSD management relative to flash and improves small access performance
Thank You! Questions?