(12) United States Patent
Valmiki et al.

(10) Patent No.: US 7,200,703 B2
(45) Date of Patent: Apr. 3, 2007

(54) CONFIGURABLE COMPONENTS FOR EMBEDDED SYSTEM DESIGN

(76) Inventors: Ramanujan K. Valmiki, Villa #23-1, Adarsh Palm Meadows, Airport Varthur-Whitefield, Ramagundahalli, Bangalore (IN) 560066; Ashok Halambi, 865, 20 C Main, 8 Block, Koramangla, Bangalore (IN) 590095; Madhuri Mandava, D/O Late Dr. C. Mastan Rao, G/A Lakhsmi Apts., Peda Waltair, Vishakapatnam (IN) 530017; Seru Srinivas, #43, 4th Cross, Marenahalli, 2nd Phase, JP Nagar, Bangalore (IN) 560078; Shashank Dabral, C/o Dr. Mahavir Dabral, #601, Kalyan Apartments, Sector-24, Indira Nagar, Lucknow (IN) 560034; Marimuthu Kumar, S/o C.K. Murugan, 5/29, Kandhan Illam, Golden Nagar, Bharathiar University, Coimbatore (IN) 560034; Bill Safelski, 1081, Camino Ricardo, San Jose, CA (US) 30339

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 141 days.

(21) Appl. No.: 10/863,550

(22) Filed: Jun. 8, 2004

(65) Prior Publication Data
US 2005/0273542 A1    Dec. 8, 2005

(51) Int. Cl.
G06F 13/00 (2006.01)

(52) U.S. Cl. .................. 710/306; 712/29

(58) Field of Classification Search .................. 710/305-317, 710/8-19, 22-31, 36-38; 716/1, 12, 16-18; 712/16-22, 28-31
See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS
5,799,203 A *     8/1998   Lee et al. .................. 710/8
5,867,400 A *     2/1999   El-Ghoroury et al. .......... 716/1
6,438,737 B1 *    8/2002   Morelli et al. .............. 716/16
6,459,644 B2 *   10/2002   Mizushima et al. ............ 365/230.01
6,477,691 B1 *   11/2002   Bergamaschi/Rab et al. ...... 716/12
6,591,403 B1 *    7/2003   Bass et al. ................. 716/5
6,952,816 B2 *   10/2005   Gupta et al. ................ 716/18
7,020,764 B2 *    3/2006   Kubota et al. ............... 712/37
7,024,654 B2 *    4/2006   Bersch et al. ............... 716/16
2001/0025363 A1   9/2001   Ussery et al. ............... 716/1
2003/0120460 A1   6/2003   Aubury ...................... 702/182
2003/0126563 A1   7/2003   Nakajima .................... 716/1

* cited by examiner

Primary Examiner: Mark H. Rinehart
Assistant Examiner: Raymond N Phan
(74) Attorney, Agent, or Firm: Ash Tankha; Lipton, Weinberger & Husick

(57) ABSTRACT

A system and method of designing an accelerator for a processor-based system. The accelerator design problem is partitioned into a data communicate module design problem and a data compute core module design problem. The hardware design of the data communicate module is achieved through a predetermined communication template which is customized for the particular application. The communication template has individual configurable communication components and a programmable control flow path. The components of the communicate template include a host bus interface, a memory bus interface, a direct memory access, a local memory and a control module. The combination of the communication components in a single configurable communication template and their optimized interconnections increase the speed of data transfer and data control processes in the accelerator. The hardware design of the data compute core module can be achieved through custom hardware design or by automatically generating hardware from software description.
9 Claims, 21 Drawing Sheets

[Front-page drawing (reference numeral 500): memory, host bus interface unit (HBIU), memory bus interface unit (MBIU), command/status receptor, direct memory access unit, a unit labeled CSU, RAM backplane, and compute core.]
U.S. Patent    Apr. 3, 2007    Sheet 1 of 21    US 7,200,703 B2

[FIG. 1: processor-based system with a host processor, a coprocessor, accelerator-1, accelerator-2, a memory subsystem, and peripherals.]
FIG. 2
201  PARTITIONING
202  CONFIGURING COMMUNICATION TEMPLATE
203  CONFIGURING INTERFACE OF COMPUTE CORE
FIG. 3: Components of the communication template and the customizable options of each component

LOCAL MEMORY UNIT
- Total address space (address map: RAM#1, size; RAM#2; RAM#3; ...)
- Number of RAM banks
- Size of each RAM bank
- Local address offset for each RAM bank
- Number of ports
- Type of each port (read, write, read/write)
- Read latency for read and read/write ports
- Write latency for write and read/write ports
- Read-write data split (yes/no) (two different buses for read and write)

DMA
- Number of channels
- Burst size limit for each channel (8, 8, 16, etc.)
- Address list hardware (yes/no)
- DMA channel modes (stride, offset, user defined)

EXTERNAL INTERFACE
- Number of interfaces
- Receptor interface count
- Initiator interface count
- Bus interface protocol standard (AMBA, OPB, etc.)
- Address bus width
- Data bus width

CCU
- Single or multiple clock domains
- Frequency of operation
- Single step support (yes/no)

[component label not legible]
- Read, write, read-write configuration for ports
- Read-write data split (yes/no) (two different buses for read and write)
- Read/write latency
- Read strobe (yes/no)
- Number of concurrent operations (number of threads)
// Sample application snippet
// Two input data arrays are used to produce an output array
// The DSP equivalent expression is
// for i=0 through to 87, z[i] = x[i]*c1 + y[i]*c2;

// Primary inputs
int c1, c2;        // Received through receptor transfer
int x[88], y[88];  // DMA input

// Primary output
int z[88];         // DMA output

// Local Loop Variables
int i;

// Compute loop
for (i=0; i