PCM Devices - Flash Memory Summit

Onyx: A Prototype Phase-Change Memory Storage Array

Ameen Akel*, Adrian Caulfield, Todor Mollov, Rajesh Gupta, Steven Swanson

Non-Volatile Systems Laboratory, Department of Computer Science and Engineering, University of California, San Diego

*Now at Micron Technology

4 KB Operation Request Latencies
[Figure: log-scale operation request latency (us, 0.01 to 10000) for writes and reads on Disk, Flash, Current PCM, and Projected PCM.]

Advantages of Studying PCM SSDs
• Understand current PCM performance
  – With current storage infrastructure
  – Versus other NV tech, e.g., flash SSDs
• PCM performance may differ from simulation
  – Variance in write latency due to data
  – Wear-out characteristics
• Use real applications to gauge performance
• Understand how software should change for PCM
• Prepare to integrate future-generation PCM

Overview
• Motivation
• PCM Devices
  – Technology Overview
  – Micron P8P Devices
• Onyx Architecture
  – Logical Architecture
  – PCM DIMMs
  – Physical Architecture
• Performance Analysis
• Applications and Conclusions

PCM: The Device Level
• PCM storage medium: Chalcogenide
  – Resistance depends on molecular phase
• Writes
  – Heaters are attached to the chalcogenide
  – Current passed through heaters to change phase
  – Allows bit-alterable writes
• Reads
  – Measure resistance through chalcogenide area
  – Resistance sensed by ability to sink current
[Figure credit: M. Breitwisch et al., VLSI '07]

[Figure: XRD measurements of the amorphous, fcc, and hexagonal phases. Credit: M. Wuttig et al., FP6 Project CAMELS.]

PCM Write Operations in Depth
• Material heated to…
  – > 600 °C then cooled quickly → Amorphous
  – ~ 350 °C then cooled slowly → Crystalline
• Set and reset
  – Reset – 0 state
  – Set – 1 state
[Figure: timing labels 10 ns and 50-150 ns]

PCM Projections
• Future PCM latency projections*:

  Operation   Latency
  Read        48 ns
  Set         150 ns
  Reset       40 ns

• Process node progression: 90, 45, 32, 20, 9 nm

*B. C. Lee et al. Architecting Phase Change Memory as a Scalable DRAM Alternative. ISCA 2009.
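The earlier point about write latency varying with the data can be illustrated with the projected Set and Reset latencies above. The following is only a sketch under stated assumptions: it supposes that only bits that change are programmed and that all cells in a word are programmed in parallel, so the slowest required operation sets the latency. The function name `write_latency_ns` and the 16-bit width are hypothetical and do not describe the behavior of any particular device.

```python
# Projected latencies from the table above (Lee et al., ISCA 2009).
SET_NS = 150    # Set writes the 1 state
RESET_NS = 40   # Reset writes the 0 state


def write_latency_ns(old: int, new: int, width: int = 16) -> int:
    """Estimate the latency of overwriting `old` with `new`.

    Assumptions (not from the slides): unchanged bits are skipped and all
    changed cells are programmed in parallel, so the latency is that of
    the slowest operation that is actually needed.
    """
    mask = (1 << width) - 1
    to_set = ~old & new & mask      # bits going 0 -> 1 need a Set
    to_reset = old & ~new & mask    # bits going 1 -> 0 need a Reset
    latency = 0
    if to_reset:
        latency = max(latency, RESET_NS)
    if to_set:
        latency = max(latency, SET_NS)
    return latency


print(write_latency_ns(0xFFFF, 0x0000))  # only Resets are needed
print(write_latency_ns(0x0000, 0xFFFF))  # only Sets are needed
print(write_latency_ns(0x00FF, 0x00FF))  # nothing changes
```

Under this model, a write that only clears bits finishes in the Reset time, while any write that sets at least one bit pays the full Set time.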

P8P PCM
• First-generation NOR-flash replacement
• Part: NP8P128A13B1760E (P8P)
• Process Node: 90 nm
• Capacity: 16 MB Per Device
• Bandwidth, Latency, Current
  – Write (64 bytes): 0.5 MB/s, 120 us, 35 mA
  – Read (16 bytes): 48.6 MB/s, 314 ns, 15 mA
• Lifetime: One million writes until first bit error
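The per-device bandwidth figures above appear to follow directly from the access size and latency. The short check below assumes one operation in flight at a time and binary megabytes (1 MB = 1024 × 1024 bytes); both are assumptions that happen to reproduce the slide's numbers, not statements from the datasheet. The helper name `bandwidth_mib_s` is hypothetical.

```python
MIB = 1024 * 1024  # assumption: the slide's "MB" is a binary megabyte


def bandwidth_mib_s(bytes_per_op: int, latency_s: float) -> float:
    """Bandwidth of back-to-back operations with no overlap."""
    return bytes_per_op / latency_s / MIB


write_bw = bandwidth_mib_s(64, 120e-6)   # 64-byte write in 120 us
read_bw = bandwidth_mib_s(16, 314e-9)    # 16-byte read in 314 ns

print(f"write: {write_bw:.2f} MB/s")  # write: 0.51 MB/s
print(f"read:  {read_bw:.1f} MB/s")   # read:  48.6 MB/s
```

The read-to-write gap of roughly two orders of magnitude per device is what the array-level design has to hide through parallelism across devices.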


Moneta: SSD for Emulated Fast NVMs
• DRAM-based NV-SSD emulator
• Learn by building
  – Hardware – Controller & interconnect
  – Software – Driver, file system, apps
• Uses optimized software stack
  – Decreases request latency
  – Improves request concurrency
[Figure: host software stack (Application, File System, OS IO Stack, Moneta Driver) on a CPU with DRAM, connected over PCIe to the Moneta DRAM array.]

Onyx: Phase-Change Memory SSD
• Based on Moneta*
  – Shares hardware
  – Shares software stack
• PCM replaces DRAM
  – Uses real PCM
  – Custom PCM controller
[Figure: host software stack (Application, File System, OS IO Stack, Onyx Driver) on a CPU with DRAM, connected over PCIe to the Onyx PCM array.]

*A. M. Caulfield et al. Moneta: A high-performance storage array architecture for next-generation, non-volatile memories. MICRO 2010.

Moneta/Onyx Architecture
[Figure: block diagram with Ring Control, Request Queue, Scoreboard, DMA Control, Transfer Buffers, Tag Status Registers, and a ring (4 GB/s) connecting four 2 GB PCM blocks; the host connects via PIO and via DMA.]

Onyx PCM Controller
• Request Completion
  – Late Completion – On PCM write completion
  – Early Completion – On request reception
• Start-Gap Wear Leveling*
  – Low-overhead wear leveling (two registers + logic)
  – Prevents hot spots from wearing out memory
  – Rotates line in memory every gap interval

*M. K. Qureshi et al. Enhancing lifetime and security of PCM-based main memory with start-gap wear leveling. MICRO 42.
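The two-register scheme can be sketched in software. The model below follows the address mapping described by Qureshi et al. (a Start register, a Gap register, and one spare line); it is an illustrative sketch, not the Onyx controller's hardware implementation. The class name `StartGap` and its parameters are hypothetical.

```python
class StartGap:
    """Toy model of start-gap wear leveling over N logical lines.

    Physical memory has N + 1 slots; one slot (the gap) is always unused.
    """

    def __init__(self, num_lines: int, gap_interval: int):
        self.n = num_lines
        self.gap_interval = gap_interval   # writes between gap movements
        self.start = 0                     # Start register
        self.gap = num_lines               # Gap register (spare slot index)
        self.mem = [None] * (num_lines + 1)
        self.writes_since_move = 0

    def physical(self, logical: int) -> int:
        """Map a logical line to its current physical slot."""
        pa = (logical + self.start) % self.n
        return pa + 1 if pa >= self.gap else pa

    def _move_gap(self) -> None:
        if self.gap == 0:
            # Gap wraps around: the last slot moves to slot 0 and Start advances.
            self.mem[0] = self.mem[self.n]
            self.gap = self.n
            self.start = (self.start + 1) % self.n
        else:
            # Shift the line just below the gap into the gap.
            self.mem[self.gap] = self.mem[self.gap - 1]
            self.gap -= 1

    def write(self, logical: int, value) -> None:
        self.mem[self.physical(logical)] = value
        self.writes_since_move += 1
        if self.writes_since_move == self.gap_interval:
            self.writes_since_move = 0
            self._move_gap()

    def read(self, logical: int):
        return self.mem[self.physical(logical)]


# A hot logical line ends up spread over several physical slots.
sg = StartGap(num_lines=4, gap_interval=2)
for line in range(4):
    sg.write(line, line * 10)
slots_used = set()
for i in range(100):
    slots_used.add(sg.physical(0))
    sg.write(0, i)
print(sorted(slots_used))
```

Each gap movement costs one extra line copy, so a longer gap interval lowers the write overhead at the price of slower rotation.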

Closer Look at a PCM DIMM
• 8 Ranks of 5 PCM devices
  – 64 data bits + 16 ECC bits
  – Effectively 16 ranks per memory interface
• Shared control and data lines
• Capacity: 640 MB / DIMM
[Figure: one rank. Address[0:25] is shared by Devices 0 to 4, whose data lines are Data[0:15], Data[16:31], Data[32:47], Data[48:63], and Data[64:79].]
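The DIMM numbers above can be cross-checked with a little arithmetic. The sketch below takes the 16 MB device capacity from the P8P slide and the 16-bit device width from the Data[...] labels in the figure; the variable names are hypothetical.

```python
DEVICE_MB = 16          # capacity per P8P device
DEVICE_DATA_BITS = 16   # per-device width, read off the Data[0:15] labels
DEVICES_PER_RANK = 5
RANKS_PER_DIMM = 8
DATA_BITS = 64          # data portion of each rank-wide word

rank_bits = DEVICES_PER_RANK * DEVICE_DATA_BITS
ecc_bits = rank_bits - DATA_BITS
dimm_mb = RANKS_PER_DIMM * DEVICES_PER_RANK * DEVICE_MB

print(f"rank width: {rank_bits} bits ({DATA_BITS} data + {ecc_bits} ECC)")
print(f"DIMM capacity: {dimm_mb} MB")
```

Five 16-bit devices give an 80-bit rank, which matches the 64 data plus 16 ECC bits on the slide, and eight such ranks give the stated 640 MB per DIMM.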

Prototyping Advanced SSDs
• Built on RAMP's BEE3 board
  – Four FPGAs connected in a ring
  – Four DIMM slots per FPGA
  – PCIe 1.1 x8 host connection
• System capacity: 10 GB
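The system capacity and the host link limit follow from the figures on this and the previous slides. In the sketch below, the 640 MB DIMM capacity and the 64-of-80-bit data fraction come from the DIMM slide; the PCIe 1.1 signaling rate (2.5 GT/s per lane with 8b/10b encoding) is general background that does not appear in the slides. Variable names are hypothetical.

```python
FPGAS = 4
DIMM_SLOTS_PER_FPGA = 4
DIMM_MB = 640               # raw capacity per DIMM, including ECC devices
DATA_BITS, RANK_BITS = 64, 80

raw_mb = FPGAS * DIMM_SLOTS_PER_FPGA * DIMM_MB
raw_gb = raw_mb // 1024
data_gb = raw_mb * DATA_BITS // RANK_BITS // 1024

# PCIe 1.1 (background fact, not from the slides): 2.5 GT/s per lane,
# 8b/10b encoding, so 8 payload bits per 10 transferred bits.
LANES = 8
lane_bytes_per_s = 2_500_000_000 * 8 // 10 // 8
pcie_mb_s = LANES * lane_bytes_per_s // 1_000_000

print(f"raw capacity:  {raw_gb} GB")
print(f"data capacity: {data_gb} GB")
print(f"PCIe 1.1 x8:   {pcie_mb_s} MB/s per direction, before protocol overhead")
```

The 10 GB figure is therefore raw capacity; excluding the ECC bits leaves 8 GB, which is consistent with the four 2 GB PCM blocks in the architecture diagram. The x8 link bounds host bandwidth at about 2000 MB/s per direction.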


Read Performance
[Figure: bandwidth (MB/s, 0 to 2000) versus request size (0.5 KB to 1024 KB) for Onyx, FusionIO, and Moneta.]

Write Performance
[Figure: bandwidth (MB/s, 0 to 2000) versus request size (0.5 KB to 1024 KB) for Onyx-Late, Onyx-Early, FusionIO, and Moneta.]

BerkeleyDB Performance
[Figure: transactions per second (0 to 8000) on the BTree and HashTable BDB benchmarks for Onyx, FusionIO, and Moneta.]

Potential PCM Applications
• As a read cache
  – First-gen PCM read speeds compete with flash
  – Next-gen PCM should improve read performance
• Replace DRAM in high-performance apps
  – PCM cost will likely drop below DRAM
  – Will scale aggressively past DRAM
• Outpace flash in high-performance SSDs
  – Reduces complexity of management
  – Provides higher-rated lifetime
  – Saves power, logic, and design time

Conclusions
• Onyx designed to maximize PCM performance
• More improvements possible as PCM scales
  – Onyx architecture will scale with PCM
  – Onyx will benefit from faster reads and writes
• PCM simplifies SSD management relative to flash and improves small-access performance

Thank You! Questions?
