(12) United States Patent
     Valmiki et al.

US007200703B2

(10) Patent No.: US 7,200,703 B2
(45) Date of Patent: Apr. 3, 2007

(54) CONFIGURABLE COMPONENTS FOR EMBEDDED SYSTEM DESIGN

(76) Inventors:
     Ramanujan K. Valmiki, Villa #23-1, Adarsh Palm Meadows, Airport Varthur-Whitefield, Ramagundahalli, Bangalore (IN) 560066;
     Ashok Halambi, 865, 20 C Main, 8 Block, Koramangla, Bangalore (IN) 590095;
     Madhuri Mandava, D/O Late Dr. C. Mastan Rao, G/A Lakhsmi Apts., Peda Waltair, Vishakapatnam (IN) 530017;
     Seru Srinivas, #43, 4th Cross, Marenahalli, 2nd Phase, JP Nagar, Bangalore (IN) 560078;
     Shashank Dabral, C/o Dr. Mahavir Dabral, #601, Kalyan Apartments, Sector-24, Indira Nagar, Lucknow (IN) 560034;
     Marimuthu Kumar, S/o C.K. Murugan, 5/29, Kandhan Illam, Golden Nagar, Bharathiar University, Coimbatore (IN) 560034;
     Bill Safelski, 1081, Camino Ricardo, San Jose, CA (US) 30339

(*)  Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 141 days.

(21) Appl. No.: 10/863,550

(22) Filed: Jun. 8, 2004

(65) Prior Publication Data
     US 2005/0273542 A1    Dec. 8, 2005

(51) Int. Cl.
     G06F 13/00 (2006.01)

(52) U.S. Cl. ........ 710/306; 712/29

(58) Field of Classification Search ........ 710/305-317, 710/8-19, 22-31, 36-38; 716/1, 12, 16-18; 712/16-22, 28-31
     See application file for complete search history.

(56) References Cited

     U.S. PATENT DOCUMENTS

     5,799,203 A *       8/1998    Lee et al. ................ 710/8
     5,867,400 A *       2/1999    El-Ghoroury et al. ........ 716/1
     6,438,737 B1 *      8/2002    Morelli et al. ............ 716/16
     6,459,644 B2 *     10/2002    Mizushima et al. .......... 365/230.01
     6,477,691 B1 *     11/2002    Bergamashi/Rab et al. ..... 716/12
     6,591,403 B1 *      7/2003    Bass et al. ............... 716/5
     6,952,816 B2 *     10/2005    Gupta et al. .............. 716/18
     7,020,764 B2 *      3/2006    Kubota et al. ............. 712/37
     7,024,654 B2 *      4/2006    Bersch et al. ............. 716/16
     2001/0025363 A1     9/2001    Ussery et al. ............. 716/1
     2003/0120460 A1     6/2003    Aubury .................... 702/182
     2003/0126563 A1     7/2003    Nakajima .................. 716/1

     * cited by examiner

Primary Examiner: Mark H. Rinehart
Assistant Examiner: Raymond N Phan
(74) Attorney, Agent, or Firm: Ash Tankha; Lipton, Weinberger & Husick

(57) ABSTRACT

A system and method of designing an accelerator for a processor-based system. The accelerator design problem is partitioned into a data communicate module design problem and a data compute core module design problem. The hardware design of the data communicate module is achieved through a predetermined communication template which is customized for the particular application. The communication template has individual configurable communication components and a programmable control flow path. The components of the communicate template include a host bus interface, a memory bus interface, a direct memory access, a local memory and a control module. The combination of the communication components in a single configurable communication template and their optimized interconnections increase the speed of data transfer and data control processes in the accelerator. The hardware design of the data compute core module can be achieved through custom hardware design or by automatically generating hardware from software description.

9 Claims, 21 Drawing Sheets

[Representative drawing: block diagram with reference numerals in the 500 series. Legible block labels: MEMORY, HOST BUS INTERFACE UNIT (HBIU), MEMORY BUS INTERFACE UNIT (MBIU), COMMAND STATUS UNIT (CSU), DIRECT MEMORY ACCESS, RAM BACKPLANE, COMPUTE CORE.]

U.S. Patent    Apr. 3, 2007    Sheet 1 of 21    US 7,200,703 B2

[FIG. 1: block diagram. Legible block labels: ACCELERATOR-1, ACCELERATOR-2, MEMORY SUB SYSTEM, HOST PROCESSOR, COPROCESSOR, PERIPHERALS.]

FIG. 1

U.S. Patent    Apr. 3, 2007    Sheet 2 of 21    US 7,200,703 B2

[FIG. 2: flowchart with three steps in sequence.]

201  PARTITIONING
202  CONFIGURING COMMUNICATION TEMPLATE
203  CONFIGURING INTERFACE OF COMPUTE CORE

FIG. 2
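As a rough illustration of the partitioning step in FIG. 2, the sketch below splits a small kernel into a communicate part, which moves data into local buffers, and a compute core, which only touches those buffers. The kernel z[i] = x[i]*c1 + y[i]*c2 is the one given in the sample application on sheet 4. Everything else is an assumption made for illustration and is not the patent's interface: the names local_mem_t, communicate_in, compute_core, communicate_out and run_accelerator are hypothetical, and memcpy merely stands in for DMA and receptor transfers.

```c
#include <string.h>

#define N 88  /* array length used by the sample application */

/* Hypothetical local memory of the accelerator (illustrative only). */
typedef struct {
    int c1, c2;      /* scalars, received through a receptor transfer */
    int x[N], y[N];  /* input arrays, filled by DMA */
    int z[N];        /* output array, drained by DMA */
} local_mem_t;

/* Communicate side, input direction: memcpy stands in for DMA transfers. */
static void communicate_in(local_mem_t *lm, int c1, int c2,
                           const int *x, const int *y)
{
    lm->c1 = c1;
    lm->c2 = c2;
    memcpy(lm->x, x, sizeof lm->x);
    memcpy(lm->y, y, sizeof lm->y);
}

/* Compute core: reads and writes local memory only. */
static void compute_core(local_mem_t *lm)
{
    for (int i = 0; i < N; i++)
        lm->z[i] = lm->x[i] * lm->c1 + lm->y[i] * lm->c2;
}

/* Communicate side, output direction. */
static void communicate_out(const local_mem_t *lm, int *z)
{
    memcpy(z, lm->z, sizeof lm->z);
}

/* Run the three stages once on host arrays of length N. */
static void run_accelerator(int c1, int c2,
                            const int *x, const int *y, int *z)
{
    static local_mem_t lm;
    communicate_in(&lm, c1, c2, x, y);
    compute_core(&lm);
    communicate_out(&lm, z);
}
```

Keeping compute_core free of any host-side pointers is the point of the split in this sketch: the communicate functions could be swapped for a different transfer mechanism without touching the kernel.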

U.S. Patent    Apr. 3, 2007    Sheet 3 of 21    US 7,200,703 B2

COMPONENTS OF COMMUNICATION TEMPLATE    CUSTOMIZABLE OPTIONS OF EACH COMPONENT

LOCAL MEMORY UNIT       TOTAL ADDRESS SPACE (ADDRESS MAP (RAM#1, SIZE), RAM#2, RAM#3, ...)
                        NUMBER OF RAM BANKS
                        SIZE OF EACH RAM BANK
                        LOCAL ADDRESS OFFSET FOR EACH RAM BANK
                        NUMBER OF PORTS
                        TYPE OF EACH PORT (READ, WRITE, READ-WRITE)
                        READ LATENCY FOR READ AND READ/WRITE PORT
                        WRITE LATENCY FOR WRITE AND READ/WRITE PORT
                        READ-WRITE DATA SPLIT (YES/NO) (TWO DIFFERENT BUSES FOR READ AND WRITE)

DMA                     NUMBER OF CHANNELS
                        BURST SIZE LIMIT FOR EACH CHANNEL (8, 8, 16, ETC.)
                        ADDRESS LIST HARDWARE (YES/NO)
                        DMA CHANNEL MODES (STRIDE, OFFSET, USER DEFINED)

EXTERNAL INTERFACE      NUMBER OF INTERFACES
                        RECEPTOR INTERFACE COUNT
                        INITIATOR INTERFACE COUNT
                        BUS INTERFACE PROTOCOL STANDARD (AMBA, OPB, ETC.)
                        ADDRESS BUS WIDTH
                        DATA BUS WIDTH

CCU                     SINGLE OR MULTIPLE CLOCK DOMAINS
                        FREQUENCY OF OPERATION
                        SINGLE STEP SUPPORT (YES/NO)
                        READ, WRITE, READ-WRITE CONFIGURATION FOR PORTS
                        READ-WRITE DATA SPLIT (YES/NO) (TWO DIFFERENT BUSES FOR READ AND WRITE)
                        READ/WRITE LATENCY
                        READ STROBE (YES/NO)
                        NUMBER OF CONCURRENT OPERATIONS (NUMBER OF THREADS)

FIG. 3
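One way to picture the options in FIG. 3 is as a configuration record that a template generator might consume. The sketch below is an assumption for illustration only. The type and field names (template_config_t, ram_bank_t, config_is_valid), the fixed array bounds and the validation rule are hypothetical and do not come from the patent; only the option names mirror a subset of the table.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative bounds, not taken from the patent. */
#define MAX_BANKS    8
#define MAX_PORTS    4
#define MAX_CHANNELS 4

typedef enum { PORT_READ, PORT_WRITE, PORT_READ_WRITE } port_type_t;
typedef enum { DMA_STRIDE, DMA_OFFSET, DMA_USER_DEFINED } dma_mode_t;

typedef struct {
    size_t size;          /* size of this RAM bank */
    size_t local_offset;  /* local address offset of this bank */
} ram_bank_t;

/* Hypothetical record covering a subset of the FIG. 3 options. */
typedef struct {
    /* Local memory unit */
    size_t      total_address_space;
    size_t      num_banks;
    ram_bank_t  banks[MAX_BANKS];
    size_t      num_ports;
    port_type_t port_type[MAX_PORTS];
    bool        read_write_data_split;  /* separate read and write buses */
    /* DMA */
    size_t      num_channels;
    unsigned    burst_size_limit[MAX_CHANNELS];
    dma_mode_t  channel_mode[MAX_CHANNELS];
    bool        address_list_hardware;
    /* External interface */
    unsigned    receptor_interface_count;
    unsigned    initiator_interface_count;
    unsigned    address_bus_width;
    unsigned    data_bus_width;
} template_config_t;

/* Hypothetical sanity check: counts stay within the bounds above, and
 * every RAM bank is non-empty, fits in the total address space, and
 * does not overlap an earlier bank. */
static bool config_is_valid(const template_config_t *cfg)
{
    if (cfg->num_banks == 0 || cfg->num_banks > MAX_BANKS) return false;
    if (cfg->num_ports > MAX_PORTS) return false;
    if (cfg->num_channels > MAX_CHANNELS) return false;

    for (size_t i = 0; i < cfg->num_banks; i++) {
        const ram_bank_t *a = &cfg->banks[i];
        if (a->size == 0 || a->local_offset > cfg->total_address_space ||
            a->size > cfg->total_address_space - a->local_offset)
            return false;
        for (size_t j = 0; j < i; j++) {
            const ram_bank_t *b = &cfg->banks[j];
            if (a->local_offset < b->local_offset + b->size &&
                b->local_offset < a->local_offset + a->size)
                return false;
        }
    }
    return true;
}
```

Checking the bank layout before generating hardware is a design choice of this sketch. The table itself only lists the options and does not say how they are validated.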

U.S. Patent    Apr. 3, 2007    Sheet 4 of 21    US 7,200,703 B2

// Sample application snippet
// Two input data arrays are used to produce an output array

// The DSP equivalent expression is
// for i=0 through to 87, z[i] = x[i]*c1 + y[i]*c2;

// Primary inputs
int c1, c2;        // Received through receptor transfer
int x[88], y[88];  // DMA input
// Primary output
int z[88];         // DMA output
// Local Loop Variables
int i;

// Compute loop
for (i=0; i