Programmable At-Speed Array and Functional BIST for Embedded DRAM LSI

Masaji Kume, Katsutoshi Uehara, Minoru Itakura, Hideo Sawamoto
Enterprise Server Division, Hitachi Ltd.
1 Horiyamashita, Hadano-shi, Kanagawa-ken, 259-1392 Japan
{masaji.kume, katsutoshi.uehara, minoru.itakura, hideo.sawamoto}@itg.hitachi.co.jp

Toru Kobayashi, Masatoshi Hasegawa
Micro Device Division, Hitachi Ltd.
2326 Imai, Oume-shi, Tokyo, 198-0023 Japan
{t-kobaya, m-hase}@mdd.hitachi.co.jp

Hideki Hayashi
Hitachi ULSI Systems Co. Ltd.
2326 Imai, Oume-shi, Tokyo, 198-0023 Japan
[email protected]

Abstract

A new approach to DFT (Design For Test) for an Embedded DRAM LSI is proposed in this paper. A single powerful BIST engine is implemented on the LSI; it executes not only the array BIST for the DRAM and SRAM macros but also functional BIST for the whole chip. The engine was implemented in an Embedded DRAM cache LSI, which is presented together with measured results.

1. Introduction

Embedded DRAM LSI has been a promising technology because of its high performance and low system cost, and it is often used in cache chips for high-performance servers or in router tables for network applications. One or more high-performance DRAM macros with high bandwidth and high pin count are implemented on a chip, which can also have SRAM macros serving as data buffers for reads from and writes to the DRAM macros. The chip also has random logic to arbitrate commands coming from other chips and to control the macros accordingly. Recently, with the accelerating upward trend in operating frequency, the internal clock frequency has reached the GHz order.

Testing such an Embedded DRAM LSI involves several conflicting requirements. The cost of testing needs to be lowered, but high test coverage is required for reliable products. As the operating frequency increases and higher yields are expected, methods to sort and grade products become increasingly important, and precise means of measuring performance need to be pursued. A method to support DRAM fuse repair also needs to be considered.

Recently, various at-speed BISTs have been proposed to cope with these issues [1][2][3]. However, these proposals mainly focus on testing each of the macros independently. In this paper we propose a methodology for testing an Embedded DRAM LSI as a whole.

This paper first describes the key items that needed to be considered for the Embedded DRAM cache LSI that we were designing. It then describes the solution to these requirements and finally presents the evaluation results for the solution.

2. Background

2.1 The Embedded DRAM Cache LSI

The chip we designed is a cache chip for a high-performance server [4][5]. Figure 1 shows the relationship between the Embedded DRAM cache LSI and the other components in the system.

Paper 35.1 988

ITC INTERNATIONAL TEST CONFERENCE 0-7803-8580-2/04 $20.00 Copyright 2004 IEEE

Figure 1. System Block Diagram (blocks shown: Processor, Embedded DRAM cache LSI, Memory Controller, System Memory, I/O Sub System)

The cache LSI is located between the processor and its memory controller and functions as an external data cache for the processor. The cache tag is implemented in the processor, not in the cache LSI. The processor sends cache access commands or memory access commands to the cache LSI, which then accesses the cache data in its embedded DRAM macros. In the case of a read, the readout data is transferred to the processor or the memory controller. When the command is for a memory access, it is transferred through the cache LSI to the memory controller, which accesses the main memory accordingly. If the memory access is a read, the read data is returned to the cache LSI. The cache LSI transfers the data to the processor while it writes a copy of the data to its internal cache. The cache LSI processes multiple outstanding requests at a time.

Figure 2 shows the die plot of the Embedded DRAM cache LSI. This LSI is fabricated using a Lg.=0.18um process with six layers of metal and implements eight DRAM macros, eight 4-port SRAM macros and two 2-port SRAM macros with capacities of 18 Mbit, 9 Kbit and 18 Kbit respectively.

Figure 2. Die Plot of the Embedded DRAM cache LSI

Figure 3. Block diagram of the Embedded DRAM cache LSI (blocks shown: Processor Side 8B interface, I/F macro, 5:1 MUX, Address Control macro, DRAM macro (18Mb), 3:1 MUX, SRAM (9kb, 4ports), SRAM 18kb 2port, 4:1 MUX, I/F macro, Memory Side 8B interface)

Figure 3 shows the block diagram of the Embedded DRAM cache LSI. The LSI has four groups of data clusters; each cluster has two DRAM macros for cache data and two 4-port SRAM macros used as data buffers for DRAM read and write. The chip has a total of eight DRAM macros and eight 4-port SRAM macros. It also has one 2-port SRAM (physically implemented as two 18 Kbit 2-port SRAMs) used as a bypass path from main memory to the processor. The chip was designed to achieve an internal operating frequency of 700MHz in the worst case, and the data paths and control paths are staged with flip-flops to achieve it, although those flip-flops are omitted in Figure 3 for simplicity.

2.2 Design Points for Test Methodology for the Embedded DRAM cache LSI

Our goals for testing this chip are:
- to reduce test running cost
- to enable precise performance measurement
- to minimize silicon area impact
- to enable flexible test pattern modification/addition

As shown previously, this cache chip has several kinds of components to be tested, such as the DRAM macros, two kinds of SRAM macros and the random logic. The test should cover not only the components by themselves but also the interconnections between them. We categorized the objects to be tested as follows:
- DRAM internal
- 4-port SRAM internal
- 2-port SRAM internal
- random logic
- interconnection between DRAM and random logic
- interconnection between SRAM and random logic

There are various test methodologies proposed: BIST vs. direct access test, or scan-based test vs. functional test [1][6]. Our experience, however, raises the following concerns:
- When a RAM is tested with array BIST, only the macro under test operates; the other macros and the random logic are idle, so possible noise effects from these components can be overlooked. This may affect the performance measurement results. The same can be said of other combinations of DRAM, SRAM and random logic. In addition to the traditional array BIST approach, testability for concurrent operations of multiple macros and random logic should be added.
- Although scan-based test has better test coverage than functional test, it also covers functionally untestable faults and may lower the yield unnecessarily [7][8].
- It is difficult to correlate the performance result from the scan-based test with that measured in the system. In other words, our experience shows that the maximum frequency measured by the scan-based test does not always correlate with that measured when the chip is plugged into the system. One reason is that the switching rate of the logic during scan-based test differs from that in the system. Another reason is that, because of chip design restrictions, the release/capture clock distribution configuration is not exactly the same as the system clock distribution configuration, nor is the capture clock timing, including the rise/fall time, the same as in the system.

Functional test can make multiple macros operate concurrently, just as when the chip is plugged into the system. Functional test also exercises only functionally testable paths, so it is free from functionally untestable faults and over-testing [8]. Functional testing has been in use in many applications for a long time, and functional test performance measurement results correlate reliably with the actual performance in the system. In addition, because the data path structure of the cache LSI is relatively simple, the cost of testing (including test pattern design cost) can be limited while a satisfactory level of coverage is achieved. So, not only for testing the random logic and the interconnection between RAM macros and random logic but also for testing concurrent operations, we chose to use a programmable at-speed functional BIST.

For the RAM macros, each type of macro has a large interface signal count, so a direct access methodology for DRAM/SRAM testing would require expensive testers with many fast pins. In order to achieve the goals, a programmable at-speed array BIST was implemented in the LSI.

These decisions led to the choice of implementing both functional BIST and array BIST methods on the cache LSI, as summarized in Table I.

Table I: Test Methodology for each test domain

test domain                                             test method
DRAM macro                                              programmable at-speed array (DRAM) BIST
4-port SRAM macro                                       programmable at-speed array (SRAM) BIST
2-port SRAM macro                                       programmable at-speed array (SRAM) BIST
random logic                                            programmable at-speed functional BIST
interconnect between DRAM and random logic              programmable at-speed functional BIST
interconnect between SRAM and random logic              programmable at-speed functional BIST
concurrent operations of multiple macros and logic      programmable at-speed functional BIST


3. SNAF: At-speed Array and Functional BIST engine

3.1 Overview of the Self iNtegrated test for Array and Function (SNAF)

There are two approaches to realizing a BIST that can test multiple macros: one is a shared, centralized BIST engine on the chip, and the other is a dedicated BIST engine for each macro. The choice is a trade-off between silicon area and savings in design, test development and test cost. The latter approach can achieve these savings because the design can be re-used in other applications [9]. Given the silicon area restrictions of the chip, we chose the shared BIST engine approach. Furthermore, in order to minimize the silicon area, we devised a single BIST engine to control all of the DRAM BIST, SRAM BIST and functional BIST. This BIST engine is named SNAF, which stands for Self iNtegrated Test for Array and Function. SNAF is programmable and is used for all of the tests on the RAM macros as well as the functional test of the LSI; it therefore needs only one set of memory to store the programming code, which significantly reduces the silicon area.

Figure 4: SNAF Block Diagram

Figure 4 shows the block diagram of the SNAF. SNAF has three test modes, namely SNA_DRAM mode, SNA_SRAM mode and SNF mode. The programming code, which is entered through the JTAG port, specifies the mode for the test. The Pattern Generator block, the Analyzer block and their interconnects are staged with flip-flops so that they operate at speed, though the flip-flops are omitted in the figure for simplicity. In the design phase, all the paths related to SNAF testing were timed and tuned to a more severe standard than ordinary paths, so that the test circuits do not fail before the circuit under test fails.

SNA_DRAM is a test mode for DRAM array BIST, in which DRAM macros are tested one at a time. It can specify the data bit pattern and can also generate various random patterns from a specified seed. Test patterns generated in the Pattern Generator under control of the programming code are imposed on the RAM macros; the pattern read out from the RAM macros is compared with the expected pattern generated in the Pattern Generator, and the test result (compare match or mismatch) is output through an LSI pin.

SNA_SRAM is a test mode for SRAM array BIST. It has similar functionality to SNA_DRAM except that its test target is the SRAM macros rather than the DRAM macros.

SNF is a test mode for functional BIST, which tests the functionality of the whole cache LSI. Pseudo processor commands to access the cache or the main memory are generated. SNF can test concurrent operations of the multiple DRAM macros, SRAM macros and random logic shown in Figure 3. Pseudo commands from the processor and the memory controller are generated in the Pattern Generator according to the programming code, and the commands are imposed through the chip interface macros on the internal macros. The LSI performs functional operations according to these pseudo commands and returns response commands and response data to the processor or the memory controller. SNAF calculates a signature from the response commands and checks the ECC of the data. The SNAF outputs the signature and the ECC check result through the LSI output pins. When the test completes, a tester checks whether the final signature is equal to the expected value, which has been calculated using a DA system, and whether the ECC check result is pass or fail.

Figure 5 shows the sequence of steps of the SNF test vector generation. A test designer writes SNAF programming code, and it is simulated against the behavior-level description of the cache LSI. In this simulation every response to the processor is simulated, and so is the signature generated from the responses; this simulated signature becomes the expected value of the signature. Next, the SNAF programming code and the expected signature are compiled into SNAF test vectors: the SNAF programming code is compiled into binary patterns to be stored in the cache LSI, and the expected signature is the vector which a tester compares with the actual signature from the cache LSI.

In SNAF testing, the programming code is entered slowly through the JTAG ports, and the test result is signaled as a DC signal, such as pass/fail in SNA_DRAM and SNA_SRAM or the ECC check result and the final signature in SNF. This means that SNAF does not require an expensive tester with many high-frequency pins, which keeps the testing cost reasonably low.

Figure 5. Steps of SNF Test Pattern Generation (the SNF programming code and the behavior file for the cache LSI feed the behavior simulation, which yields the expected signature; the programming code and the expected signature are compiled into the SNF test vector used by the tester)

3.2. Two Level Code Structure

One unique feature of the SNAF is its two-level code structure: it has a higher level of code called Microcode and a lower level of code called Nanocode. The basic idea of this two-level structure is to allow the control logic for test pattern sequence generation to be shared among the DRAM, SRAM and functional BISTs.

Generally speaking, a programmable BIST controls the test pattern generation sequence by executing program words one by one. Each word consists of multiple groups of bits, and each group controls one of the address generator, data generator, branch and branch conditions, read/write, refresh of DRAM and so on [3]. The controls for the address generator, data generator, branch and branch conditions are similar for DRAM BIST, SRAM BIST and even functional BIST, and they can be unified into one, while the way to control read/write is specific to the type of BIST and should be specified in a different manner. For example, in the DRAM BIST, read/write is controlled by the sequence of RAS, CAS and Write Enable signals; in the multi-port SRAM BIST, it is controlled by the sequence of multiple sets of Port Enable and Write Enable signals; and in the functional BIST, read/write is controlled by commands from a processor, which for the cache LSI consist of six bits, with the read or write encoded into combinations of these bits.

Also note that it is often necessary to alter the read/write controls flexibly during testing. For example, when testing DRAM, a test engineer may want to insert a cycle between the assertion of RAS and the assertion of CAS, whereas for functional test, a test engineer may want to add a write cache command after a read cache command.

With these observations in mind, we designed the two-level code structure in SNAF so that Microcode specifies what is common to all of the BIST modes, whereas Nanocode specifies read/write access at the bit pattern level, cycle by cycle, which is specific to each BIST mode. Furthermore, because Nanocode is useful for controlling test patterns cycle by cycle, we added functionality to alter the address and data patterns specified by Microcode. One example is that, for SNA_SRAM, the data pattern for one write port can be altered so that it is the inverted data pattern of another write port for specified cycles.

In the implementation of the cache LSI, there are 32 words of Microcode, each twelve bits long, and they control the following:
- the sequence of the test pattern generation
- the read/write address for RAM macro access, or the pseudo processor command
- the data pattern for RAM macro access and the expected read-out data pattern, or the data pattern of the pseudo data from the processor or memory controller
- the timing at which the test pattern specified by Nanocode is applied to the RAM macros under test or to the pseudo input of the processor interface
- miscellaneous settings such as the DRAM refresh interval or the seed for random pattern generators

Table II summarizes the SNAF Microcode format implemented in the cache LSI.

Table II. Microcode format (bits 0 to 11)

Instruction                 Fields
Call Nanocode               Addr.Gen.Cntrl. | Nanocode Addr.Cntrl. | Cntrl
Branch                      Addr.Gen.Cntrl. | Branch Condition & Target Addr.
Load Immediate              Immediate Value | Dest.Reg.
ADD/SUB                     Source Reg.0 | Cntrl | Source Reg.1
Call Nanocode and Branch    Addr.Gen.Cntrl. | Nanocode Addr.Cntrl. | Branch Condition and Target Addr.
Control                     misc. Control

The Microcode has several kinds of instructions. They are Call_Nanocode, Branch, Load_Immediate, ADD, SUB, Call_Nanocode_and_Branch and Control, as listed in Table II. Microcode is processed iteratively: in each iteration the SNAF reads the next Microcode instruction, executes it, and then increments the instruction address. The Branch instruction is used to change the next instruction address as required by the program flow. Inside the SNAF, there are various registers to control test pattern generation and to analyze the test result. The ADD/SUB and Load_Immediate instructions are used to set these registers.

A particular instruction of note is the Call_Nanocode instruction. It specifies the start address of the Nanocode and the number of Nanocode words to be executed. Each cycle, one word of Nanocode is read from the current address and the read address is then incremented. For each Nanocode word read, the specified test pattern is imposed on the target under test. After the specified number of words have been read, the Call instruction completes and the next Microcode instruction is executed.

The Call_Nanocode_and_Branch instruction combines a Call_Nanocode instruction and a Branch instruction, with some limitations on the start address, the number of Nanocode words, and the branch target Microcode address. Control instructions include various miscellaneous instructions, such as the END instruction for specifying the end of the program and the BGN_RAND/STP_RAND instructions to begin/stop generating random patterns.

In the implementation of the cache LSI, there are 16 words of Nanocode, each 14 bits long, and they control the following.

For SNA_DRAM:
- control of RAS and CAS
- control of the Write Enables
- control of Read Data muxing
- control of data pattern generation specific to DRAM macros

For SNA_SRAM:
- control of the Write Enables
- control of addresses for concurrent accesses on multiple ports specific to SRAM macros
- control of data pattern generation specific to SRAM macros

For SNF:
- control of the pseudo processor command
- control of addresses specific to functional test

Figure 6 summarizes the SNAF Nanocode format implemented in the cache LSI.
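
The fetch, execute and increment loop of the Microcode, together with the Call_Nanocode behaviour, can be sketched as a small interpreter. This is only an illustration under stated assumptions: instructions are modelled as Python tuples rather than 12-bit words, the branch condition is assumed to be "register is non-zero" (the paper does not list the available conditions), and the register names and Nanocode word names (`ACT`, `WR`, `RD`) are hypothetical.

```python
def run_microcode(microcode, nanocode, max_steps=10_000):
    """Execute Microcode until END; return the Nanocode words applied, in order."""
    regs = {}
    applied = []  # one Nanocode word is imposed on the target per cycle
    pc = 0
    for _ in range(max_steps):
        op, *args = microcode[pc]
        pc += 1  # default flow: increment the instruction address
        if op == "END":
            return applied, regs
        if op == "LOAD_IMM":
            dest, value = args
            regs[dest] = value
        elif op in ("ADD", "SUB"):
            dest, src0, src1 = args
            delta = regs[src1] if op == "ADD" else -regs[src1]
            regs[dest] = regs[src0] + delta
        elif op == "CALL_NANO":
            start, count = args
            # read `count` words from `start`, incrementing the read address
            for addr in range(start, start + count):
                applied.append(nanocode[addr])
        elif op == "BRANCH_NZ":  # assumed condition: branch if register != 0
            reg, target = args
            if regs[reg] != 0:
                pc = target
        else:
            raise ValueError(f"unknown instruction {op!r}")
    raise RuntimeError("program did not reach END")


# Hypothetical program: apply a three-word Nanocode sequence twice.
nanocode = ["ACT", "WR", "RD"]
microcode = [
    ("LOAD_IMM", "count", 2),
    ("LOAD_IMM", "one", 1),
    ("CALL_NANO", 0, 3),
    ("SUB", "count", "count", "one"),
    ("BRANCH_NZ", "count", 2),
    ("END",),
]
applied, regs = run_microcode(microcode, nanocode)
```

The loop body lives in Microcode and is mode-independent, while the per-cycle access pattern lives in Nanocode, which mirrors the split between the two code levels described in Section 3.2.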

Figure 6. Nanocode format (one word layout for each of the SNA DRAM, SNA SRAM and SNF modes; fields shown: RAS Cntrl, CAS Cntrl, Write Enable, Port Enable Cntrl, Command, Read Cntrl, Addr. Cntrl., Write Data Cntrl, Data Cntrl., Tag. Cntrl.)

3.3. Test Sequence using the SNAF

The SNAF can be controlled through the JTAG ports and a clock input as shown in Figure 4. The testing sequence using SNAF is:
(1) initialize the core of the SNAF BIST engine
(2) store the SNAF programming code
(3) the SNAF executes the stored programming code
(4) a tester checks the result pins

TRST in the JTAG port initializes the core of the SNAF. The SNAF programming code consists of three parts: Microcode, Nanocode and initial values for some of the internal registers in the SNAF. These registers include those for specifying the test mode of the SNAF (SNA_DRAM, SNA_SRAM or SNF) and whether the PLL is used or not. The SNAF programming code is input into the cache LSI serially through the JTAG port. The bit sequence includes a start bit at its head and a stop bit at its tail, and when the SNAF detects the start bit and the stop bit, it automatically starts the SNAF testing process. SNAF initializes the target under test, starts reading from the first address of the Microcode, and continues processing until it reaches an END instruction. The SNAF has an output pin to indicate whether testing is in progress or not. It also has an output pin to show whether the test passed or failed. Because the SNAF supplies a deterministic test, the tester can know how many cycles it takes to complete the test. The tester should wait long enough for the test to complete and then check the two pins to confirm that the test is complete and whether it passed or failed.

3.3.1. Programming of SNA

Figure 7 shows an example of SNA_DRAM programming. In this example, it is assumed that the programming code is written through the JTAG ports such that the SNA_DRAM mode is selected and that the Microcode and the Nanocode specify the sequence of instructions and bit patterns shown in Figure 7. A Call instruction at Microcode address 3 calls Nanocode at addresses 0 through 6. In the SNA_DRAM mode, each bit in the Nanocode controls the test pattern imposed on the RAS, CAS and other control ports of the target DRAM macro under test. The SNAF Pattern Generator reads out the contents of Nanocode addresses 0 through 6 cycle by cycle and imposes the test pattern generated from this Nanocode on the target DRAM macro under test. In Figure 7, for example, the Nanocode specifies that the bit pattern for the Control_1 port should be 0100000, and the time chart shows the sequence of the test pattern imposed on the Control_1 port cycle by cycle, which is 0 for the first cycle, 1 for the second cycle, 0 for the third cycle, 0 for the fourth cycle and so on.

Figure 7. Programming Example of SNA (Microcode: addresses 0 to 4 hold Load Register, Load Register, Add (Address Modify.), Call 0, 6 and Branch; addresses 27 to 31 hold Load Register, Add (Address Modify.), Call 7, 11, Branch and End. Nanocode for SNA_DRAM: addresses 0 to 15, with bits 0 to 5 labelled RAS, CAS, Cntrl0, Cntrl1, Cntrl2 and Cntrl3. Time chart: RAS, CAS, Cntrl0, Cntrl1, Cntrl2 and Cntrl3 at the DRAM macro)
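
The relation between Nanocode words and the per-port time chart described for Figure 7 can be sketched as a transposition: each Nanocode word holds one bit per control port, and reading addresses 0 through 6 in consecutive cycles yields one waveform per port. In the sketch below only the Control_1 pattern (0100000) is taken from the text; the RAS, CAS and Control_0 patterns, the port ordering and the helper names are assumptions made for illustration.

```python
def words_from_columns(columns):
    """Build one Nanocode word per address from per-port bit strings."""
    names = list(columns)
    depth = len(columns[names[0]])
    words = [[int(columns[name][addr]) for name in names] for addr in range(depth)]
    return names, words


def replay(names, words, start, end):
    """Read words start..end (inclusive), one per cycle; return per-port waveforms."""
    waves = {name: [] for name in names}
    for addr in range(start, end + 1):
        for name, bit in zip(names, words[addr]):
            waves[name].append(bit)
    return waves


columns = {
    "RAS": "1000000",        # hypothetical pattern
    "CAS": "0010000",        # hypothetical pattern
    "Control_0": "0001000",  # hypothetical pattern
    "Control_1": "0100000",  # pattern quoted in the text
}
names, words = words_from_columns(columns)

# "Call 0, 6" in the Figure 7 Microcode: replay Nanocode addresses 0 through 6.
waves = replay(names, words, 0, 6)
```

Inserting an extra cycle between two control events, as mentioned in Section 3.2, then amounts to inserting one more Nanocode word between the two addresses.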

3.3.2. Programming of SNF

Figure 8 shows an example of SNF programming and an overview of its operation. The semantics of the Microcode are the same as for SNA, shown in Figure 7, but those of the Nanocode differ from the SNA mode. The Nanocode specifies pseudo request commands from the processor, such as the cache read command and the cache write command. Because the pseudo commands are submitted according to the Microcode and Nanocode programming, there is a concern of command overflow.


In the system, the processor submits commands and observes the responses coming back from the cache chip. From these, the processor knows how many commands are in progress (outstanding) in the cache chip and can control the submission of new commands so that the number of outstanding commands does not overflow the command buffer in the cache chip. One solution for SNF would be for the test pattern designer to insert a sufficient interval between pseudo command submissions to guarantee that overflow never occurs, but in practice this is very inefficient: it takes a long time to debug, and it makes patterns with many commands packed into a short duration (in other words, crowded command patterns) difficult to write. The solution in SNAF is to implement hardware that tracks the number of outstanding commands. When the hardware recognizes that submitting another command would cause an overflow, it holds off submission until another command is allowed.
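
The overflow protection can be pictured as a counter of outstanding commands that stalls submission when the command buffer is full. The following is a behavioural sketch, not the hardware described in the paper: the buffer depth, the fixed response latency and the function name `simulate` are assumptions, and NOP is treated as consuming a cycle without occupying a buffer entry.

```python
from collections import deque


def simulate(commands, buffer_depth, latency):
    """Issue pseudo commands in order, stalling while the buffer is full.

    Returns the (command, issue cycle) pairs and the peak number of
    outstanding commands observed.
    """
    outstanding = deque()  # completion cycles of commands still in flight
    pending = deque(commands)
    issued = []
    peak = 0
    cycle = 0
    while pending:
        # Retire commands whose responses have come back by this cycle.
        while outstanding and outstanding[0] <= cycle:
            outstanding.popleft()
        cmd = pending[0]
        if cmd == "NOP":
            pending.popleft()
        elif len(outstanding) < buffer_depth:
            pending.popleft()
            outstanding.append(cycle + latency)
            issued.append((cmd, cycle))
            peak = max(peak, len(outstanding))
        # Otherwise stall: the same command is retried in the next cycle.
        cycle += 1
    return issued, peak


# A crowded pattern: four back-to-back commands, room for two outstanding.
issued, peak = simulate(["RC", "WC", "RC", "RC"], buffer_depth=2, latency=5)
```

With this kind of throttle in place, a pattern designer can write back-to-back commands without hand-inserting gaps, at the cost that the exact issue cycles depend on response timing.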

Program listing shown in Figure 8: the Microcode is the same as in Figure 7 (addresses 0 to 4: Load Register, Load Register, Add (Address Modify.), Call 0, 6, Branch; addresses 27 to 31: Load Register, Add (Address Modify.), Call 7, 11, Branch, End). The Nanocode for SNF has addresses 0 to 15 with command bits CMD0 to CMD3; addresses 0 to 6 hold NOP, RC (Read Cache), WC (Write Cache), NOP, NOP, RC (Read Cache), NOP.

3.4 Diagnosis Capability for SNAF

A BIST engine must offer high testability and coverage, and also the capability for diagnosis. First, during the development phase, when sample chips are being evaluated and verified, the actual critical paths need to be isolated and studied. Second, during the manufacturing phase, DRAM fuse repair needs to be applied according to the DRAM fail bit maps collected. For SRAM, pulse width trimming also needs to be enabled.

To allow this, 32 output pins are allocated as diagnostic pins to permit monitoring of data. The sequence information of a test pattern can be collected through these diagnostic pins, so that we can isolate which portion of a test pattern detects the failures.

During SNA, the fail bit map data are collected through the 32 diagnostic pins using a tester. One of the pins indicates when an error is detected. Other bits include the address of the failed word and the location of the failed bits in the word. The tester collects these bits when the error detection signal is asserted. The fail bit map can then be constructed and a fuse repair pattern selected.

During SNA_SRAM, the SNAF is able to test several pulse width settings for the pre-charge duration or the write-enable duration. By testing them with SNAF, an optimum setting can be identified for each chip. The chip is designed so that these settings can also be set by fuses, and after the SNAF test, the fuses are cut according to the test result. Figure 9 summarizes the test flow and fuse repair.
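
Constructing a fail bit map from the diagnostic pin samples can be sketched as follows. The paper states only that one pin flags an error and that other pins carry the failed word address and the failed bit locations; the specific 32-bit layout used here (bit 31 as the error flag, bits 30 to 16 as the word address, bits 15 to 0 as a failing-bit mask) and the function names are assumptions for illustration.

```python
ERROR_FLAG_BIT = 31   # assumed position of the error detection pin
ADDR_SHIFT = 16       # assumed: bits 30..16 carry the failed word address
ADDR_MASK = 0x7FFF
FAIL_MASK = 0xFFFF    # assumed: bits 15..0 mark the failed bits in the word


def collect_fbm(samples):
    """Accumulate a fail bit map {word address: failing-bit mask}."""
    fbm = {}
    for sample in samples:
        if not (sample >> ERROR_FLAG_BIT) & 1:
            continue  # the tester only captures while the error flag is asserted
        addr = (sample >> ADDR_SHIFT) & ADDR_MASK
        fbm[addr] = fbm.get(addr, 0) | (sample & FAIL_MASK)
    return fbm


def failing_cells(fbm):
    """List (word address, bit position) pairs for every failing cell."""
    return sorted(
        (addr, bit)
        for addr, mask in fbm.items()
        for bit in range(16)
        if (mask >> bit) & 1
    )


samples = [
    0x0000_0000,                       # no error this cycle
    (1 << 31) | (5 << 16) | 0b0100,    # word 5, bit 2 fails
    (1 << 31) | (5 << 16) | 0b0001,    # word 5, bit 0 fails on a later pass
    (7 << 16) | 0xFFFF,                # flag not set: ignored
    (1 << 31) | (9 << 16) | 0x8000,    # word 9, bit 15 fails
]
fbm = collect_fbm(samples)
cells = failing_cells(fbm)
```

The resulting list of failing cells is the kind of input from which a fuse repair pattern would be calculated; the repair calculation itself is outside this sketch.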

Figure 8. Programming example of SNF (block diagram: Processor Side I/F macro, MUX, SNAF, Addr. Cntrl, MUX, four Data Components, Memory Side I/F macro; pseudo request commands generated by SNF: NOP, RC, WC, NOP, NOP, RC, NOP)

Figure 9. Test Flow and Fuse Repair (SNAF in Wafer Test: SNA_DRAM to collect FBM and SNA_SRAM to try various pulse widths; Calculate Fuse Repair Pattern; Cut Fuses; Packaging; SNAF in Package Test: SNA_DRAM, SNA_SRAM and SNF)

In the SNF mode, internal operations can be observed through the diagnostic pins, and using them we can isolate failing portions. A feature to vary the phase between the random logic and the RAM macros is implemented in the SNF; it helps determine whether a failure is related to the interconnection between the random logic and a RAM macro.

Because the SRAM and the internal logic operate at the same frequency as the internal clock, diagnosis such as collecting the FBM would require high-speed testers. One way to relieve this is to implement a serial-parallel converter at the monitoring pin, but although this halves the output pin's signal frequency, it doubles the pin count required for monitoring. In SNAF, a multiplexer is added as shown in Figure 10, and for diagnosis we run one test pattern twice, for example once to observe internal signal 0 and once to observe internal signal 1 in Figure 10.

Figure 10. Serial to Parallel Converter (signals shown: internal signal 0, internal signal 1, output pin signal 0 and output pin signal 1, carrying data items A to H; blocks shown: serial-to-parallel converter and MUX)

4. Measurement Result

Because each SNAF mode tests its own target portions of the chip, as summarized in Table I, the shmoo results for a sample may vary depending on the mode. The real performance of the sample should be taken as the lowest of those measured by SNA_DRAM, SNA_SRAM and SNF. Each sample may have different AC faults, so testing with all test modes is necessary for precise performance measurement.

Figure 11. Shmoo Results of the SNAF test (panels from the top: SNA_DRAM, SNA_SRAM, SNF; vertical axis: Vdd (V), 1.70 to 2.10; horizontal axes: Cycle Time (ps), 1773 down to 1029, and Freq. (MHz), 564 up to 972; P marks a pass)

Figure 11 shows the shmoo results of the SNAF test for one cache LSI sample. From the top, the SNA_DRAM, SNA_SRAM and SNF shmoos are shown. For this sample, the performance measured by SNF is the lowest, and very similar performance is measured in SNA_DRAM. Such a result indicates that the performance limiter of this sample most probably resides in the random logic, an interconnection or a DRAM macro. The SNA_SRAM result shows that the SRAM macros on this sample work fine. To localize the fault more precisely, further testing with SNAF's other diagnostic functions, such as the feature to vary the phase between the random logic and the RAM macros mentioned in 3.4, is usually necessary.

Figure 12. Shmoo Results of the SNAF test (different sample) (same panels and axes as Figure 11)

Figure 12 shows the shmoo results for another sample. For this particular sample, the DRAM macro is the performance limiter, because the SNA_DRAM performance measurement shows far lower performance than the others. The SNA_SRAM and SNF performance measurements show a higher frequency limit, and this result indicates that the AC fault which gates this sample's performance resides in a DRAM macro.
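
The rule that a sample's real performance is the lowest of the three mode results, and the relation between the two horizontal axes of the shmoo plots, can be written out as a short calculation. The per-mode minimum passing cycle times below are hypothetical values chosen from the axis ticks, not measurements read from Figure 11 or Figure 12, and the function names are assumptions.

```python
def freq_mhz(cycle_ps: float) -> float:
    """Convert a cycle time in picoseconds to a frequency in MHz."""
    return 1e6 / cycle_ps


def performance_limit(min_cycle_ps: dict):
    """Return (limiting mode, its frequency in MHz).

    The limiting mode is the one with the longest minimum passing cycle
    time, i.e. the lowest maximum frequency.
    """
    limiter = max(min_cycle_ps, key=min_cycle_ps.get)
    return limiter, freq_mhz(min_cycle_ps[limiter])


# Hypothetical shortest passing cycle time per mode at one supply voltage.
sample = {"SNA_DRAM": 1339, "SNA_SRAM": 1091, "SNF": 1153}
limiter, fmax = performance_limit(sample)
```

In this made-up case the DRAM array test limits the sample, which is the same kind of conclusion drawn for the sample of Figure 12.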

Table III. Features of SNAF

target under test     DRAM macros, SRAM macros, random logic and their interconnection
test methodology      array BIST for RAM macros
                      functional BIST for logic and interconnection
                      both BISTs are programmable
                      test at speed
control pins          JTAG + 2 pins
silicon area          2% of the LSI, 50K Gates
others                Fail Bit Map collectable for RAM macros

Figure 13: Correlation between SNAF test and System test

We measured performance with SNAF. Then, with the same samples, we measured the performance in a system. The relation between these is shown in Figure 13. The lowest performance measured in the three SNAF modes is used as the performance in SNAF. The figure shows good correlation between the SNAF performance measurement and the system performance measurement, to within +/- 3% error, and indicates that the SNAF performance measurement is a useful measure for predicting the performance of a chip when it is plugged into a system.

5. Conclusion

As a DFT scheme for an Embedded DRAM cache LSI, a programmable at-speed Array and Functional BIST engine, SNAF, is proposed. SNAF unifies DRAM BIST engines, SRAM BIST engines and functional BIST engines into one powerful BIST engine. It supplies a test methodology that not only tests each RAM macro independently, one at a time, but also executes functional test, which enables testing of the interconnections between RAM macros and random logic. By combining scan-based test, which tests random logic, with SNAF, an Embedded DRAM cache LSI is tested fully at speed. It was demonstrated that SNAF has high test coverage and excellent performance measurement capability by showing that the performance measured by SNAF correlates with the performance in a system to within 3% error. The main features of the SNAF implemented in the cache LSI are summarized in Table III.

References

[1] R. Rajsuman, "Design and Test of Large Embedded Memories: An Overview", IEEE Design & Test of Computers, Volume 18, Number 3, May 2001, pp. 16-27
[2] R. McConnell, R. Rajsuman, E. Nelson, J. Dreibelbis, "Test and Repair of Large Embedded DRAMs: Part 1", 2001 International Test Conference, Sept. 2001, pp. 163-172
[3] J. Dreibelbis, J. Barth, H. Kalter, R. Kho, "Processor-Based Built-In Self-Test for Embedded DRAM", IEEE Journal of Solid-State Circuits, Vol. 33, No. 11, Nov. 1998, pp. 1731-1740
[4] M. Nakayama, H. Sakakibara, M. Kusunoki, K. Kurita et al, "A 16MB cache DRAM LSI with Internal 35.8GB/s Memory Bandwidth for Simultaneous Read and Write Operation", 2000 IEEE International Solid-State Circuits Conference, 0-7803-5853-8/00
[5] H. Sakakibara, M. Nakayama, M. Kusunoki, K. Kurita et al, "A 750MHz 144Mb Cache DRAM LSI with Speed Scalable Design and Programmable at-Speed Function-Array BIST", 2003 IEEE International Solid-State Circuits Conference, 0-7803-7703-9/03
[6] K. Tumin, C. Vargas, R. Patterson, C. Nappi, "Scan vs. Functional Testing - A Comparative Effectiveness Study on Motorola's MMC2107 TM", 2000 International Test Conference, Oct. 2000, pp. 443-450
[7] J. Rearick, "Too Much Delay Fault Coverage Is a Bad Thing", 2001 International Test Conference, Sept. 2001, pp. 624-633
[8] A. Krstic, J. Liou, K. Cheng, L. Wang, "On Structural vs. Functional Testing for Delay Faults", Proceedings of IEEE International Symposium on Quality Electronic Design, March 2003
[9] E. Nelson, J. Dreibelbis, R. McConnell, "Test and Repair of Large Embedded DRAMs: Part 2", 2001 International Test Conference, Sept. 2001, pp. 173-181