IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. 4, NO. 4, NOVEMBER 1996
Computer-Aided Design of Fuzzy Systems Based on Generic VHDL Specifications
Thomas Hollstein, Saman K. Halgamuge, Member, IEEE, and Manfred Glesner, Member, IEEE
Abstract-Fuzzy systems implemented in hardware can operate with much higher performance than software implementations on standard microcontrollers. In this paper, three types of fuzzy systems and related hardware architectures are discussed: standard fuzzy controllers, FuNe I fuzzy systems, and fuzzy classifiers based on a neural network structure. Two computer-aided design (CAD) packages for automatic hardware synthesis of standard fuzzy controllers are presented: a hard-wired implementation of a complete fuzzy system on one or more field programmable gate arrays (FPGA's), and a modular toolbox called fuzzyCAD for synthesis of reprogrammable fuzzy controllers whose architectures follow from constraints specified by the designer. In the fuzzyCAD system, an efficient design methodology has been implemented which covers a large design space in terms of signal representations and component architectures as well as system architectures. Very high-speed integrated-circuits hardware description language (VHDL) descriptions and the use of powerful synthesis tools allow different technologies to be targeted easily and efficiently. In the last part of this paper, properties and hardware realizations of fuzzy classifiers based on a neural network are introduced. Finally, future perspectives and possible enhancements of the existing toolkits are outlined.
Fig. 1. General MIMO fuzzy system.
Fig. 2. Standard fuzzy controller (Mamdani).
I. INTRODUCTION
The functionality of a fuzzy system can be acquired either from expert knowledge or from training data. This source of knowledge has a basic impact on the fuzzy system structure to be applied. Generally, a multiple-input multiple-output (MIMO) fuzzy system can be abstracted as a system with $n$ inputs $\{X_1, \ldots, X_n\}$ and $m$ outputs $\{Z_1, \ldots, Z_m\}$ (Fig. 1). Assume that the functionality of this system is described by the functional relation

$$F: X \rightarrow Z \tag{1}$$

with $X = \{X_1, \ldots, X_n\}$ and $Z = \{Z_1, \ldots, Z_m\}$. If $F$ is acquired from expert knowledge, the fuzzy system can be realized by a classical standard fuzzy controller implementation, which has been introduced by Mamdani [13]. Fig. 2 shows an example of the operation of a standard fuzzy controller. Without a priori knowledge of $F$, the number and form of the rules, the membership functions, and the defuzzification parameters have to be generated by neurofuzzy software based on training data $\{X, Z\}$. Examples of two such models known in the neurofuzzy field are shown in Figs. 3 and 4.

Manuscript received April 10, 1996. The authors are with the Institute of Microelectronic Systems, Darmstadt University of Technology, Darmstadt, Germany. Publisher Item Identifier S 1063-6706(96)06587-3. 1063-6706/96$05.00 © 1996 IEEE

Fig. 3. FuNe I fuzzy system.
Fig. 4. Classifier fuzzy system.

Comparing the standard fuzzy controller operation with these alternative fuzzy systems, it is obvious that the major difference is in the conclusion/defuzzification parts. The FuNe I fuzzy system performs the defuzzification by weighted addition of singletons, passing the result through a sigmoid function. In the classifier fuzzy system, which is also a neural network that can be interpreted as a fuzzy system [8], [5], defuzzification is not required since a decision about the membership to an output class [2] is sufficient.

Fig. 5. Design and configuration flow: standard fuzzy controllers (Mamdani) and FuNe I configurable systems.
Fig. 6. Design and configuration flow: neurofuzzy classifier.
Fig. 7. Generated fuzzy hardware.

The three introduced fuzzy models are considered by the authors for designing fuzzy hardware. Every type of fuzzy system can be implemented on standard microcontrollers. As soon as dedicated fuzzy hardware is taken into consideration, restrictions can apply due to specialized hardware modules. Assume that a software programmable fuzzy hardware covers a functionality space $S = \{S_1, \ldots, S_k\}$, where all $S_i$ ($i \in \{1, \ldots, k\}$) are possible fuzzy systems which can be realized. Any projected fuzzy system functionality $F$ can be mapped on this hardware if

$$\exists S_i \in S \quad \text{with} \quad S_i = F, \quad i \in \{1, \ldots, k\}. \tag{2}$$
If $|S| = 1$, the hardware is a fully hard-wired implementation with a fixed rule base. This special case is especially interesting for rapid prototyping on RAM-based field programmable gate arrays (FPGA's), where a software programmable rule base is not required, since the whole system can be resynthesized if rule modifications are required.

In the following sections, hardware implementations and synthesis toolkits for the previously introduced fuzzy system types will be described. The general structure for standard fuzzy controllers and a similar hardware architecture, which can be configured by the software neurofuzzy system FuNe I [4], are shown in Fig. 5. Based on a generic net-list module library, hard-wired implementations of standard fuzzy controllers for rapid prototyping purposes can be generated for an FPGA technology (Xilinx). By use of the advanced system fuzzyCAD, which is based on generic very high-speed integrated-circuits hardware description language (VHDL) descriptions, reprogrammable fuzzy controllers can be synthesized for different application-specific integrated circuit (ASIC) target technologies. The system architecture and module selection are influenced by interaction with the designer to meet the required timing and area constraints. By selecting a dedicated defuzzification module, a special architecture can be synthesized which can be configured by FuNe I (off-line software training).

The hardware requirements for the third system (neurofuzzy classifier) are totally different since the structure of this system is based on a three-layer neural network. This fuzzy-interpretable neural network can be trained on the chip. The number of neurons in the hidden layer is not fixed and can be varied during the training process. Therefore, a special generic systolic array architecture is required for the hardware implementation. An initial structure (number of hidden neurons, initial edge weights) can be programmed based on optionally available expert knowledge. The structure of this system is shown in Fig. 6. This special fuzzy system structure and the related hardware are described in the last part of this contribution.

Fig. 8. Fuzzification block.
Fig. 9. Rule inference.
II. RAPID PROTOTYPING OF FUZZY SYSTEMS ON FPGA TARGET ARCHITECTURES
A toolkit, FUZ2LCA, for automatic generation of application-specific fuzzy controllers from a high-level fuzzy language input has been successfully implemented and tested [6]. A similar approach was described recently in [11]. The advantage of a direct hard-wired implementation is the minimum hardware overhead in both the data path and the controller of the design, which leads to a minimum number of logic cells required on the target devices. Static random access memory (SRAM) based FPGA's are well suited for prototyping purposes due to their reprogrammability. An additional advantage is that the configurable logic cells may also be used efficiently for an on-chip realization of small SRAM memory blocks.

The compiler FUZ2LCA automatically creates a design of a complete standard fuzzy controller, based on a net-list module library with generic parameters. A large design space in terms of timing and area can be covered since the designer can choose the number of computation units working in parallel. Fuzzy systems written either in the C programming language or in a fuzzy programming language (e.g., Togai's FPL) can be synthesized and converted to Xilinx net-list format (XNF). This enables the user to define the fuzzy system in a problem-specific manner. Problems arising in mapping specifications of large fuzzy systems can be solved by effectively partitioning the design into several FPGA's.

Each fuzzy system design consists of three modules or functional units: fuzzification, rule inference, and composition/defuzzification. All modules have their own local controllers, allowing them to operate independently. The user can set parameters depending on the availability of hardware resources and the required speed, so that a highly parallel design, a completely sequential design, or a compromise can be created. Due to the high time consumption of many commonly used methods, the defuzzification unit should normally operate in parallel to the fuzzification and inference units. The system controller supports both sequential and pipeline modes, depending on user-selectable parameters. Due to its modularity, FUZ2LCA can be easily extended by adding alternative modules. In addition to the FPGA's, external memories are needed for storing antecedent and consequent membership functions (MSF in Fig. 7).

Fig. 11. Efficient implementation of MOA defuzzification.
Fig. 12. Fuzzy truck control.
A. Fuzzification Unit
Membership functions $\mu_{X_k,i}$ can be easily stored as lookup tables using two different external RAM blocks [14]. All odd-numbered membership functions ($i \in \{1, 3, 5, \ldots\}$) are stored in an "odd" RAM block, while the even-numbered membership functions ($i \in \{0, 2, 4, \ldots\}$) are stored in an "even" RAM block (Fig. 8). The restriction of this method is that at most two membership functions can overlap, but the RAM blocks can be accessed efficiently in parallel.

B. Rule Inference
Inference is the process where the evaluation of the premise and the consequent membership function of a single rule is performed (Fig. 9 and left-most part of Fig. 10), whereas in the composition, the inference results of many rules are combined (center part of Fig. 10, here a "max" operation). Three different types of rule evaluators can be generated: simple evaluators that can either read or negate the membership value of an input; Min/Max rule evaluators for rules with less complexity; and complex rule evaluators with a maximum of 16 Min/Max operations and parenthesis hierarchies.

The premises are evaluated by using a single rule evaluator or several in parallel, depending on the rule-base complexity and timing constraints. The initially implemented but easily extendable inference/composition method is Min/Max. The outcome of the composition is directly piped into the parallel-running defuzzification (no additional intermediate memory required).

C. Defuzzification
The defuzzification unit is normally the most time-consuming module, especially if a very resource-consuming method such as center of gravity (COG) is implemented. Two steps are taken to overcome this problem: the defuzzification module is always generated as a module running in parallel, and less time-consuming methods are generated with efficient hardware structures. Midpoint of area (MOA), also known as center of area (COA), and mean of maxima (MOM) are the standard defuzzification methods [3] which can be implemented efficiently in hardware. Considering the composition output curve as $w$ (normalized to the maximum value unity: $0 \le w \le 1$) and denoting $Z_{val}$ as the finite set of possible normalized output values of a fuzzy controller with $Z_{val} = \{z_0, \ldots, z_i, \ldots, z_n\}$, the different defuzzification methods can be formalized as follows:
COG, the center of gravity of the area:

$$z_h = \frac{\sum_{i=1}^{n} w_i z_i}{\sum_{i=1}^{n} w_i} \tag{3}$$

where $w_1, \ldots, w_n$ are the curve segments which originate from the corresponding consequent membership function segments $\mu_{z,1}, \ldots, \mu_{z,n}$ (after composition).

MOA, the middle of the area: $z_h$, where

$$\sum_{i=1}^{h} w_i = \sum_{i=h}^{n} w_i. \tag{4}$$

MOM, the center of gravity of the area under the maxima of the fuzzy output:

$$z_h = \frac{1}{|M|} \sum_{i \in M} z_i, \qquad M = \{\, i \mid w_i = \max(w_1, \ldots, w_n) \,\}. \tag{5}$$

COM, the middle of the area under the maxima of the fuzzy output, introduced in [6]: $z_h$, where

$$\sum_{i=1}^{h} s_i = \sum_{i=h}^{n} s_i \tag{6}$$

$$s_i = \begin{cases} 1, & w_i = \max_k (w_k) \\ 0, & w_i < \max_k (w_k). \end{cases} \tag{7}$$

In each case, the defuzzified crisp output is $z_h$.

The standard MOA method and two variations of it can be generated as defuzzification units [6]. The hardware implementation of the MOA method is depicted in Fig. 10. The pointers $z_u$ (upper) and $z_l$ (lower) cover the output range starting from the upper and the lower limits, respectively, and move stepwise toward their meeting point. The area underneath the fuzzy output shape is added to a register as the pointer on the left moves, and it is subtracted from this register as the pointer on the right moves. The meeting point $z_h$ equals the crisp output since the integration is performed such that the balance between the two partial areas is optimized in every step (see Fig. 11: $A_1 = A_2$).

For application support of the controller chip, a bit stream generator program for binary rule coding will be provided. The bit stream can be stored in an electrically erasable programmable read-only memory (EEPROM) which is located on the board adjacent to the fuzzy controller and is read once after power-up (Fig. 14).

Fig. 13. Structure: processor instantiating.
Fig. 14. External rule configuration memory.
Fig. 15. Design flow: generic fuzzy processor.
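The defuzzification methods and the two-pointer MOA integration described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the generated hardware: the function names, the list-based representation of $w$ and $Z_{val}$, and the exact rule for choosing which pointer moves next are illustrative choices. It shows why MOA needs only additions, subtractions, and a sign test, while COG needs multiplications and a division.

```python
def cog(w, z):
    """Center of gravity: weighted mean of the output values (uses * and /)."""
    return sum(wi * zi for wi, zi in zip(w, z)) / sum(w)


def mom(w, z):
    """Mean of maxima: average of the output values where w is maximal."""
    peak = max(w)
    maxima = [zi for wi, zi in zip(w, z) if wi == peak]
    return sum(maxima) / len(maxima)


def moa_two_pointer(w, z):
    """Midpoint of area with a lower and an upper pointer.

    The accumulator holds (area swept from the left) minus (area swept
    from the right). Its sign decides which pointer moves next, so the
    two partial areas stay balanced until the pointers meet.
    Assumption: ties (acc == 0) move the lower pointer.
    """
    lower, upper = 0, len(w) - 1
    acc = 0
    while lower < upper:
        if acc <= 0:
            acc += w[lower]   # left pointer moves: add area
            lower += 1
        else:
            acc -= w[upper]   # right pointer moves: subtract area
            upper -= 1
    return z[lower]


# Hypothetical symmetric output curve on seven output values.
w_example = [0, 1, 2, 3, 2, 1, 0]
z_example = [0, 1, 2, 3, 4, 5, 6]
```

For the symmetric example curve all three functions return the same crisp value. On strongly asymmetric curves the two-pointer result is the grid point where the pointers meet, so it can differ slightly from the exact area midpoint.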
Fig. 16. Overlap-free membership function storage (axes: crisp input, degree of membership; one panel per RAM block).
Fig. 17. Membership function approximation (axes: crisp input, degree of membership).
Since complex operations such as multiplications or divisions are not involved in the MOA strategies, these methods are much faster than COG.

D. Application Example
After several tests, the compiler has been successfully applied for generating a fuzzy controller for the fuzzy truck with trailer described in [10]. This fuzzy controller consists of 11 fuzzy rules, two inputs, and one output (each with five membership functions) and employs MAX/MIN inference/composition and MOA defuzzification (Fig. 12). A four-bit version of the generated fuzzy controller (the accuracy is sufficient for this application) could be implemented in an XC4006 FPGA, and only 42 μs are needed for calculating a new output. This result can be compared with standard solutions such as the DSP TMS320 (150 μs) and special fuzzy solutions such as the Togai ASIC FC110 (32 μs).

III. FUZZYCAD: NEW MODULE-ORIENTED VHDL-BASED DESIGN APPROACH
Based on the experiences with the previously described automated fuzzy controller implementations on FPGA's, a completely modular fuzzy controller design toolkit is developed. Compared to the FPGA solution, which was a pure rapid prototyping approach, the VHDL-based approach provides more flexibility and is intended to become a CAD system for flexible customer-specific solutions. VHDL was chosen as the description language since a lot of design experience with VHDL specifications and the SYNOPSYS simulation and synthesis toolkits was already available. The new system is not restricted to one target technology, and the realized controller is fully reprogrammable by software. The user is able to design a MIMO standard fuzzy controller according to the requirements of one or more application domains. The advantages of the library-oriented VHDL concept are the flexibility concerning integration of new modules and the possibility of making a rough estimate of the resulting timing and area costs. Another benefit of this concept is the reduced simulation effort, since the modules have already been tested many times (reuse of design components). Therefore, the main simulation effort lies in validating whether or not the selected bit widths are sufficient for the required computation accuracy.

The fuzzy controller parameters which are determined and fixed by the design process are: the number $n_{inp}$ of input and $n_{outp}$ of output signals; the number $n_{MFinp}$ of membership functions (MF's) for input signals; the number $n_{MFout}$ of MF's for output signals; the maximum overlap $ov_{max}$ of MF's (the maximum number of MF's which can produce a nonzero value for one crisp input value); the MF storage technique; the external and internal bit widths; the number and capabilities of parallel-running rule evaluation modules; and the defuzzification method. In combination with the previously mentioned parameters, the user has an influence on the timing and area of the resulting implementation.

Fig. 18. Structure of fuzzification unit.

A. Overview: General Structure
The toolkit consists of a modular generic VHDL description library and a configuration software tool. Selection of VHDL modules and setting of the generic instantiation parameters of VHDL modules are performed automatically or interactively by this CAD program according to specific user requirements. The structure for instantiation is shown in Fig. 13.

B. The Design Flow
The complete design flow is shown in Fig. 15. Using the fuzzy controller design software fuzzyCAD, a VHDL description of the complete controller is generated. This VHDL source code can be mapped onto a standard cell or FPGA target library using a high-level design tool (SYNOPSYS, in our case). With vendor-specific design software, this net list can be compiled to a physical implementation (layout). Simulation can be performed on every level of abstraction to validate the processor functionality.

C. Internal Membership-Function Representation
Overlap-Free MF Storage: For efficient defuzzification, an overlap-free membership storage is very useful. The maximum overlap $ov_{max}$ determines the number of RAM blocks $R_0, \ldots, R_{ov_{max}-1}$ required for storing the MF's. Generally, a membership function $MF_i$, $i \in \{0, \ldots, n_{MF} - 1\}$, will be stored in the RAM module $R_x$, where $x = i \bmod ov_{max}$. Fig. 16 shows an illustration for $ov_{max} = 3$. Overlap-free MF storage is very effective for defuzzification, since the $ov_{max}$ RAM banks storing the MF's of an output variable can be processed in parallel for inference computation. This is important because the defuzzification operation is the critical bottleneck concerning performance.

MF Shape: Piecewise Linear Representation: In many neurofuzzy approaches, where fuzzy systems are automatically generated ([4], [5], [8]), the resulting membership functions can be bell shaped. The typical approach utilizing lookup tables for fuzzification is inefficient in many cases because of the requirement of huge fuzzification memory. One solution to this problem is the approximation of membership functions using straight lines, as shown in Fig. 17. In this example, each membership function is represented with a maximum of eight straight lines, reducing the required memory capacity. In case of implementing a sigmoidal membership function, 256 bytes are needed for a simple lookup table, compared to 24 bytes for the approach with membership function approximation. Each line $Y = m_i \cdot (X - x_i) + y_i$ is characterized by three parameters: the tangential coefficient (slope) $m_i$ and the coordinates $x_i$ and $y_i$ of the left-most position of the line. Three memory words, distributed over three short memories or concatenated to one bit string, may contain those parameters.

MF Shape: Lookup Table Representation: For high-speed MF access, the lookup table representation is well suited. For every crisp input value, the MF-related fuzzified values can be read directly out of the memory without additional computation effort. This method is sometimes also more efficient for MF's with bent shapes, which would require a lot of base points for piecewise linear representation.

Generic VHDL entity of the fuzzification unit (see Section III-D):

    library ieee;
    use ieee.std_logic_1164.all;
    -- The types T_FUZZIFICATION_DMAADR_OUT and T_FUZZIFICATION_DATA_OUT
    -- are declared in a package that is not shown here.

    entity fuzzification is
      generic (
        var_width            : integer := 1;  -- bitwidth of MF number
        num_width            : integer := 3;  -- bitwidth subfunction of MF
        x_width              : integer := 4;  -- bitwidth MF def. coordinates
        y_width              : integer := 4;
        m_width              : integer := 4;
        adr_width_ram_mf     : integer := 5;  -- RAM depth MF storage
        adr_width_ram_result : integer := 3;  -- RAM depth result storage
        overlap              : integer := 1   -- degree of MF overlap
      );
      port (
        clk                      : in  std_logic;  -- system clock
        reset                    : in  std_logic;  -- global reset
        w_request                : in  std_logic;  -- MF write request
        fuzzifikation_start      : in  std_logic;  -- start fuzzification
        data_in                  : in  std_logic_vector(var_width + num_width + x_width + y_width + m_width - 1 downto 0);
        data_valid               : in  std_logic;  -- MF input data valid
        x_in                     : in  std_logic_vector((2**var_width) * x_width - 1 downto 0);
        w_request_akn            : out std_logic;  -- acknowledge MF write request
        write_akn                : out std_logic;  -- acknowledge writing MF data
        fuzzifikation_end        : out std_logic;  -- flag: fuzzification done
        -- DMA read access on results
        DMA_ram_result_activate  : in  std_logic;
        DMA_ram_result_en        : in  std_logic;
        DMA_ram_result_re        : in  std_logic;
        DMA_ram_result_adr       : in  T_FUZZIFICATION_DMAADR_OUT;
        DMA_fuzzifikation_result : out T_FUZZIFICATION_DATA_OUT
      );
    end fuzzification;
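The overlap-free bank assignment and the piecewise linear membership-function representation can be sketched in software as follows. This is a behavioural illustration under assumptions, not the VHDL module: the function names, the tuple layout `(x_i, y_i, m_i)`, the example triangle, and the use of one byte per stored parameter are all hypothetical choices made for the example.

```python
from bisect import bisect_right


def ram_bank(mf_index, ov_max):
    """Overlap-free storage: MF_i is placed in RAM block R_x, x = i mod ov_max."""
    return mf_index % ov_max


def eval_piecewise(segments, x):
    """Evaluate Y = m_i * (X - x_i) + y_i on the segment that contains x.

    segments: list of (x_i, y_i, m_i) tuples sorted by x_i, where
    (x_i, y_i) is the left-most point of line i and m_i is its slope.
    The lookup corresponds to one search through the MF memory.
    """
    starts = [seg[0] for seg in segments]
    i = max(bisect_right(starts, x) - 1, 0)
    x_i, y_i, m_i = segments[i]
    return m_i * (x - x_i) + y_i


# Hypothetical triangular MF on an 8-bit input range with its peak at x = 128.
triangle = [
    (0, 0.0, 0.0),
    (64, 0.0, 1 / 64),
    (128, 1.0, -1 / 64),
    (192, 0.0, 0.0),
]

# Memory comparison, assuming one byte per parameter and per table entry.
bytes_piecewise = 8 * 3      # eight lines, three parameters each
bytes_lookup = 2 ** 8        # one entry per 8-bit crisp input value
```

With `ov_max = 3`, membership functions 0 to 4 land in banks 0, 1, 2, 0, 1, so neighbouring (possibly overlapping) functions never share a bank and can be read in parallel.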
D. Fuzzification Unit
Since the defuzzification is the most time-consuming unit, fuzzification can be performed serially without influence on the global timing behavior of the circuit. Fig. 18 shows the data path structure of this unit. Crisp inputs are captured in input registers and applied to the fuzzifier serially. The fuzzifier can be realized as a lookup table. Since fuzzification is not too time critical, in the presented approach the membership functions are stored in piecewise linear form and the fuzzified values are computed by interpolation. This also implies one search through the MF memory per input variable. The generic VHDL entity of the fuzzification unit is listed above.

E. Inference and Defuzzification Modules
The composition and defuzzification unit works similarly to the defuzzification unit in the previously described FPGA prototyping system, using the midpoint-of-area method (MOA). Compared to the FUZ2LCA solution, the MF overlap can now be any value $ov_{max}$. The operational flow is as follows. The integration begins at the zero point of the output variable's value range. This value is applied to the MF memory, and a certain number ($\le ov_{max}$) of nonzero MF values and the corresponding MF identifiers are read out. This can be done fully in parallel since the MF's are stored overlap free. Then a minimum operation is performed on these MF values and the corresponding rule weights. The results are fed into a max-tree, and the final value is used for the MOA defuzzification (integration), i.e., it is added to the accumulator or subtracted from it (depending on the actual integration direction). Depending on the sign of the integration result (stored in the accumulator register), the integration direction for the next step is determined. Fig. 19 shows the operational unit for defuzzification.

Fig. 19. MOA defuzzification: inference and integration unit.

F. Programmable Rule-Evaluation Kernel
The rule-evaluation kernel is programmable by software. During the design phase, the number of parallel-running rule evaluators, their type, and the rule-memory size are fixed. Three classes of rules can be processed.
1) Trivial Rules: if a is high then z is medium.
2) Normal Rules: conditions chained in a sequence with AND, OR, and NOT operators.
3) Hierarchical Rules: multilevel nesting with parentheses and the operators AND, OR, and NOT.
For every class of rules, instances of rule evaluators can be created. For normal applications, classes one and two are of primary interest. Rule evaluators of a class $n$ may also evaluate rules of class $m$ with $m < n$. The user has direct influence on the number of rule evaluators created for each class. For low-performance applications, it may be sufficient to create only one rule evaluator of the most complex class to be processed later on. All rules will then be processed sequentially, and the chip area (cost) is minimal. For high-performance applications, multiple rule evaluators of one or different classes may operate in parallel. Each evaluator can process multiple rules sequentially.
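The three rule classes can be illustrated with a small recursive evaluator. This is a sketch only: the nested-tuple rule encoding, the function name, and the example membership values are hypothetical, and the mapping AND to minimum, OR to maximum, and NOT to the complement $1 - \mu$ is the usual Min/Max convention assumed here rather than a statement about the generated rule-evaluator hardware.

```python
def evaluate(expr, mu):
    """Evaluate a rule premise on already fuzzified inputs.

    expr is either a (variable, term) pair that is looked up in mu, or a
    tuple ("NOT", e), ("AND", e1, e2, ...) or ("OR", e1, e2, ...).
    """
    op = expr[0]
    if op == "NOT":
        return 1.0 - evaluate(expr[1], mu)
    if op == "AND":
        return min(evaluate(e, mu) for e in expr[1:])
    if op == "OR":
        return max(evaluate(e, mu) for e in expr[1:])
    return mu[expr]  # leaf: degree of membership of one input in one term


# Hypothetical fuzzified inputs.
mu = {("a", "high"): 0.75, ("b", "low"): 0.25, ("c", "medium"): 0.5}

trivial = ("a", "high")                                   # class 1
normal = ("AND", ("a", "high"), ("NOT", ("b", "low")))    # class 2
hierarchical = ("OR",                                     # class 3
                ("AND", ("a", "high"), ("b", "low")),
                ("NOT", ("c", "medium")))
```

An evaluator that handles the nested form also handles the two simpler forms, which mirrors the statement that an evaluator of class $n$ can process rules of any class $m < n$.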
IV. IMPLEMENTATION OF FuNe I FUZZY SYSTEMS
Two possibilities can be considered for the real-time implementation of fuzzy systems automatically generated by neurofuzzy systems. The first and simple method is the direct implementation of all neurons and interconnections. The second method is to implement the result as a fuzzy system.
In case of a FuNe I multilayer perceptron-based system, the computation time and the complexity of the hardware needed for the first method are much higher than for the second method. For fuzzy interpretable neural networks based on nearest prototype classification, however, the first method is more appropriate, as described in Section V. Although the FuNe I-type fuzzy systems designed with off-line software training can be implemented in commercially available fuzzy processors, an application-specific design would increase the speed. The design must be easily configurable for different generated fuzzy systems.

The first hardware implementation has been a simple FPGA design. The FuNe I fuzzy system with four inputs and three outputs, extracted from the popular Iris data set [1], is implemented in a single Xilinx XC4005 FPGA chip. This design is used in a prototype board that can be connected to a personal computer via an ISA bus for the visualization of classification results. The typical approach utilizing lookup tables for fuzzification can be inefficient in cases with high fan-in because of the requirement of huge fuzzification memory. One solution to this problem is the approximation of membership functions using straight lines, as indicated in a previous section (see also Fig. 17). A comparison of the performance of the two designs is summarized in Table I.

TABLE I
COMPARISON OF THE TWO FuNe I FPGA DESIGNS

Features                                      Design 1                        Design 2
Inputs                                        4                               128
Outputs                                       3                               4
Mem. func. per input                          4                               16
No. of rules                                  128                             256
Type of FPGA                                  XC4005                          XC4006
No. of sigmoids (EPROMs)                      3 (256 bytes)                   4 (256 bytes)
No. of memory units for storing mem. func.    1 (256 bytes per mem. func.)    3 (8 bytes per mem. func.)
Speed in million rules per second             1.25                            1.25

V. REAL-TIME FUZZY INTERPRETABLE CLASSIFIERS
Although FuNe I fuzzy systems can be efficiently configured in FPGA-based prototype boards, as discussed above, the off-line neural network training used for designing them is hardly implementable in FPGA's due to area limitations. However, several new methods for the generation of fuzzy classification systems that can be implemented in FPGA's have been presented: dynamic vector quantization (DVQ) variations (DVQ2 and DVQ3) as improved versions of learning vector quantization networks [5]; and cubic basis function networks (CBFN) (deduced from the well-known radial basis function networks) with modified restricted coulomb energy (MRCE) learning, presented in [7].

Since those methods can be considered as nearest prototype neural networks, the distances between an input vector and all the reference vectors are calculated to decide upon the class membership of an input vector. The prerequisite is the selection of a distance measure with low computational intensity.

Fig. 20. Points of equal "distances" for different distance measures (axes: $X_1$, $X_2$).

TABLE II
NUMBER OF FALSE CLASSIFICATIONS FOR THE TWO DISTANCE MEASURES

Data set    City block distance    Maximum distance in all dimensions
Iris        6                      6
Solder      3                      5
Digit       0                      26

The distance measure used in competitive learning can be more generally defined by the Minkowski metric [12]
$$\rho(\vec{I}, \vec{W}) = \left( \sum_{d=1}^{n} |I_d - w_d|^{\lambda} \right)^{1/\lambda}. \tag{8}$$

The most commonly used measures, the Euclidean distance ($\lambda = 2$), the city block distance ($\lambda = 1$), and the maximum ($\lambda \to \infty$), can be derived from this general form.

A. Computationally Feasible Distance Measures
The Euclidean distance, though reported to give good simulation results, requires a multiplication operation per dimension, which is a disadvantage in hardware implementation. Therefore, the city block distance and the maximum measure are compared:

$$\rho_{city\text{-}block\text{-}dist}(\vec{I}, \vec{W}) = \sum_{d=1}^{n} |I_d - w_d| \tag{9}$$

$$\rho_{max\text{-}dist}(\vec{I}, \vec{W}) = \max_{d} |I_d - w_d|, \qquad 1 \le d \le n. \tag{10}$$

Both measures describe the points with equal "distances" as squares for a two-dimensional input space (inputs $\vec{I} = \{X_1, X_2\}$), as shown in Fig. 20. The Iris data set [1] is a real-world data set, which can also be considered a benchmark, with four inputs and three classes. The Solder data set is from a real-world application [9] with 23 inputs, classifying solder joints into two classes, "good" or "bad." In this paper, the authors use another real-world data set, Digit, from optical digit recognition, containing 36 preprocessed inputs and ten outputs [4]. The complexity of the data sets Iris, Solder, and Digit increases in terms of the number of inputs. The comparison in Table II indicates that the city block distance is better, especially for complicated large data sets such as Solder and Digit.
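The distance measures in (8)-(10) and the nearest prototype decision can be written down in a few lines. The sketch below is an illustration under assumptions: the function names, the list-based vectors, and the example prototypes are hypothetical and are not taken from the data sets discussed in the paper.

```python
def minkowski(x, w, lam):
    """General Minkowski metric; needs a power operation per dimension."""
    return sum(abs(a - b) ** lam for a, b in zip(x, w)) ** (1.0 / lam)


def city_block(x, w):
    """City block distance (lambda = 1): subtract, absolute value, add."""
    return sum(abs(a - b) for a, b in zip(x, w))


def max_dist(x, w):
    """Maximum measure (lambda -> infinity): subtract, absolute value, compare."""
    return max(abs(a - b) for a, b in zip(x, w))


def nearest_prototype(x, prototypes, dist=city_block):
    """Return the class label of the closest reference vector.

    prototypes: list of (reference_vector, class_label) pairs.
    """
    return min(prototypes, key=lambda p: dist(x, p[0]))[1]


# Hypothetical two-dimensional prototypes.
prototypes = [([0, 0], "A"), ([5, 5], "B")]
```

Neither `city_block` nor `max_dist` contains a multiplication, which is the property that makes them attractive for the FPGA implementation discussed in the text.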
Fig. 21. Classification error for different fixed-point formats (horizontal axis: accuracy [bit]; curves: iris-data, solder-data, tyre-data).
Fig. 23. Systolic array (one-dimensional) for nearest neighbor classification.
B. Fixed-Point Calculation

A reduced fixed-point format should be found that does not significantly degrade performance. The error versus the number of bits is analyzed for different data sets, limiting the number of reference vectors generated per class to one. The simulation results clearly indicate that fewer than six bits are needed even for complex classification examples (Fig. 21). The number of neurons per class remains constant throughout the simulation, since dynamically added neurons would otherwise also compensate for the reduced computational accuracy.

Fig. 22. Systolic array (two-dimensional) for nearest neighbor classification.

C. FPGA Architecture for Parallel Processing

One can consider most of the tasks solved by neural networks as operations on arrays (e.g., input vectors and weights). These tasks are often solved with highly parallel architectures. Very often, systolic arrays (as the structures best suited to this class of problems) are used. For performing nearest neighbor decisions with $m$ reference vectors of dimension $n$, a two-dimensional systolic array with $m$ rows and $n + 1$ columns can be used (Fig. 22). Reference vectors are stored in processing elements (PE's), one in each row of the systolic array. The right-most additional column is used to find the smallest distance.
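The data flow of the two-dimensional array can be mimicked in software. The following Python sketch is a purely functional model under stated assumptions: each row is assumed to accumulate the city block distance of (9) from PE to PE, and the additional column is assumed to keep a running minimum. It is not cycle accurate, and the function name and test vectors are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def systolic_nearest_neighbor(
    x: Sequence[int], references: Sequence[Sequence[int]]
) -> Tuple[int, int]:
    """Functional model of an m x (n + 1) array: returns (winning row, distance)."""
    if not references:
        raise ValueError("at least one reference vector is required")
    n = len(x)
    smallest: Optional[int] = None  # value held by the right-most column
    winner = -1
    for row, weights in enumerate(references):
        if len(weights) != n:
            raise ValueError("reference vector dimension does not match the input")
        partial = 0  # partial sum handed from PE to PE along the row
        for column in range(n):
            # PE (row, column) holds weights[column] and sees input element x[column].
            partial += abs(x[column] - weights[column])
        # The additional column compares the row result with the smallest so far.
        if smallest is None or partial < smallest:
            smallest, winner = partial, row
    assert smallest is not None
    return winner, smallest


# Illustrative values only (three reference vectors, three dimensions).
example_references: List[List[int]] = [[0, 0, 0], [3, 6, 2], [9, 9, 9]]
example_result = systolic_nearest_neighbor([3, 7, 1], example_references)
```

In hardware all rows work concurrently; the nested loops here only reproduce the result of that computation, not its timing.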
Fig. 24. SIMD array.
In the case of CBFN, each PE additionally stores elements $r_j = \{r_{j1}, r_{j2}, \ldots\}$, which determine the extensions of the hyperbox from the center $W_j = \{W_{j1}, W_{j2}, \ldots\}$ along the dimensions $d = \{1, 2, \ldots\}$; the input vector $I = \{X_1, X_2, \ldots\}$ is then presented to the array. Whatever signal a PE receives from its left neighbor in the row, it has to pass the original input element $X_d$ to the next PE in the column. If a PE receives “1” from its neighbor and $|X_d - W_{jd}| \le r_{jd}$, it passes the “1” to the neighbor in the row; otherwise it passes “0” to the neighbor in the row. The output of the following PE's in the row will thereafter be “0,” since the input element is not in the attraction region of this hyperbox. One-dimensional systolic arrays can also be considered for implementation (Fig. 23); then no parameter data are stored in the PE's, and the parameters of all hyperboxes are applied to the array one after another. If the number of reference vectors (hyperboxes) is $m$ and the number of dimensions is $n$, $m(n + 2) + n$ cycles are needed to make a nearest neighbor decision. The last PE in the row stores the smallest distance found so far and compares it with its input. In this way, the smallest stored distance is updated. Considering the hardware restrictions of FPGA's, systolic arrays with many elements can hardly be implemented. It is also possible to build an array with a feedback loop.

If single instruction multiple data (SIMD) arrays are used for nearest neighbor classification, reference vectors are stored in local memories, either connected externally or integrated into each PE. It takes $n$ cycles for a PE to compute the distance between an $n$-dimensional reference vector and an input vector. If there are $k$ PE's, then $k$ distances are computed in parallel. It takes $n + k$ cycles to get the result if the outputs of all PE's are sent to a common data bus sequentially. To classify $m$ input vectors, it takes $m(n + k)$ cycles. A SIMD array solution that is more appropriate to the DVQ variants is illustrated in Section V-D.

Comparison of Different Architectures: Considering the large application Digit, there should be up to ten output classes. The feature space should have up to 36 dimensions, and the number of dynamically generated neurons can be limited to 150. For every dimension of the feature space, 6 to 8 bits can be used. The first architecture (Fig. 22) is hardly implementable on four FPGA's because of the high number of PE's. Although these PE's are simple, not more than ten of them can be implemented on one chip (with external memory). It is also important to consider that there are many connections between elements, which, in turn, take up a lot of the routing resources of the FPGA. If the generated neurons are assigned to PE's, the number of PE's needed for this solution is very large: 150 * 36 + 150 = 5550. The second and the third solutions seem to be more suitable for FPGA's; however, the first of these two architectures is still difficult to implement (Fig. 23). If there are 36 dimensions, 37 PE's are needed. This means that three of the FPGA chips should contain nine PE's each, and one chip, ten PE's. If three parameters are input to each PE simultaneously ($X_d$, $W_{jd}$, $r_{jd}$), every PE needs 3 * 8 = 24 I/O pins, and overall, 10 * 24 = 240 I/O pins are needed, but an XC4013 FPGA has only 192 I/O blocks. If data are presented to the PE's sequentially in 4-bit portions, the number of available I/O blocks is sufficient, but this is, of course, twice as time consuming, and every PE also needs three 4-bit registers for input data. The third solution (one-dimensional systolic array with a feedback loop) is somewhat slower than the second one but seems to be more easily implementable: the number of PE's is smaller, as is the number of registers needed for input data. Since the neurons representing reference vectors are dynamically created, finding the optimal number of PE's is complicated, and the highest classification speed is seldom achieved. None of the three architectures presented above allows training. In the case of the cubic-basis function network (CBFN) with modified restricted Coulomb energy (MRCE) learning, the dimension that causes the misclassification must be known. Here, the systolic array solution shows only whether the classification result is correct or not. If a SIMD array is used for implementing a nearest neighbor classifier and CBFN, every PE needs at least four to five registers and a separate control unit that can be implemented in a DSP. If an application works in real time and the calculation speed for a single input vector has to be as high as possible, a SIMD array is the best solution.

Fig. 25. Registers of a processing unit (ac: calculated distance; db: data bus buffer; ir: instruction register; enb: enable buffer; cnt: RAM addressing register; w: RAM for weight vectors).

D. SIMD Array with FPGA's
Each dynamically generated neuron is assigned to a processing unit in an FPGA, which calculates the distance between an input vector and the neuron. Since the distance to all the reference vectors (processing elements) has to be calculated for each input vector, a SIMD array can be used for this purpose. Since no multiplication is used in training or in recall operation, the proposed algorithms seem to be very effective. Although a real-time implementation with a DSP board is sufficient for classification benchmarks such as the Iris data, it is absolutely necessary to implement the parallel parts of the algorithms with the SIMD array for complicated applications (e.g., the Digit data set, where the number of generated reference vectors is at least 46 and exceeds 40 in the case of DVQ3). Even though the clock rate of FPGA's is several times lower than that of a powerful DSP solution, the overall speed gain is much higher for such applications due to the exploitation of the massively parallel implementation of processing units in FPGA's. Depending on the application, either CBFN/MRCE, DVQ2, or DVQ3 can be selected. In the case of highly representative training data sets, CBFN/MRCE is appropriate. For other applications, either DVQ2 or DVQ3 is more suitable, since the CBFN/MRCE method does not have the ability to generalize properly. For highly overlapping data sets, DVQ3 gives the best results due to its excellent generalization capability.

Distance Calculation: The registers implemented in a processing unit are shown in Fig. 25. The 64-word RAM w with a six-bit word length stores the weight vectors for 64 dimensions. The accumulator ac stores the calculated “distance,” and cnt is used for indirect addressing of the RAM. Table III shows the number of configurable logic blocks (CLB's) occupied by each register of a PE. The following register transfers are implemented for calculation of the city block distance:

adac calculates the city block distance according to (9): $ac \leftarrow ac + |w(cnt) - db|$; $cnt \leftarrow cnt + 1$;
ldcnt loads the index register cnt: $cnt \leftarrow db$;
rdac reads the accumulator ac of a PE: $dbus \leftarrow ac$;
ldac initializes the accumulator ac: $ac \leftarrow db$;
ldwc initializes the weight vectors of a neuron: $w(cnt) \leftarrow db$; $cnt \leftarrow cnt + 1$.

The processing elements described in VHDL are synthesized with a commercial tool (Synopsys) to get the FPGA net list (see also Fig. 26).

Fig. 26. Structure of a PE.

Fig. 27. Calculation of minimum distance.

TABLE III
NUMBER OF CLB'S NEEDED FOR REGISTERS
Number of CLBs: 12, 6, 1.5, 3, 3, 0.5
[Column headings only partly legible: the last two are cnt and enb.]

Nearest Neighbor Calculation: After the calculation of the city block distances, the minimum distance has to be determined. The method presented in Fig. 27 uses a wired-OR bus for this purpose [7].
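A bit-serial minimum search over a wired-OR bus, as described in this subsection, can be modeled in software. The Python sketch below is an illustration under assumptions, not the implementation of [7]: the function name, the list-based model of the PE's, and the sample distances are hypothetical and are not the values of Fig. 27. Each PE is represented by one stored distance, and the bus is modeled as the logical OR of the bits written by the active PE's.

```python
from typing import List, Sequence, Tuple


def wired_or_minimum(distances: Sequence[int], width: int) -> Tuple[int, List[int]]:
    """Return (minimum distance, indices of the PE's still active at the end)."""
    if not distances:
        raise ValueError("at least one distance is required")
    mask = (1 << width) - 1
    if any(d < 0 or d > mask for d in distances):
        raise ValueError("every distance must fit into the given bit width")

    inverted = [(~d) & mask for d in distances]  # invert all the bits
    active = [True] * len(distances)             # PE's enabled by the controller
    bus_word = 0                                 # bits observed on the bus, MSB first

    for bit in range(width - 1, -1, -1):         # start at the MSB
        written = [(value >> bit) & 1 for value in inverted]
        # Wired-OR: the bus carries "1" if any active PE writes a "1".
        bus = 0
        for is_active, bit_value in zip(active, written):
            if is_active:
                bus |= bit_value
        if bus == 1:
            # Deactivate every PE that wrote a "0" while the bus showed "1".
            active = [a and b == 1 for a, b in zip(active, written)]
        bus_word = (bus_word << 1) | bus

    minimum = (~bus_word) & mask                 # invert the final result
    winners = [index for index, a in enumerate(active) if a]
    return minimum, winners


# Hypothetical four-bit distances of four PE's.
sample_minimum, sample_winners = wired_or_minimum([0b1011, 0b0010, 0b0110, 0b1101], 4)
```

Because the distances are inverted first, the largest inverted value survives on the bus, and inverting it again yields the smallest distance. The search needs one bus cycle per bit, independent of the number of PE's.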
1) Invert all the bits of the distances.
2) Starting from the most significant bit (MSB), for every bit position:
   a) All PE's activated by the controller write the current bit of their distance to the wired-OR bus.
   b) If the resulting bit on the bus is “1”: deactivate all the PE's that have written a “0” to the bus, and move to the next bit.
   c) If the resulting bit on the bus is “0”: move to the next bit.
3) The final result is inverted to get the minimum distance.

In the example shown in Fig. 27, the minimum distance is “0010.”

VI. CONCLUSIONS AND FUTURE WORK

The presented compiler FUZ2LCA for automatic generation of fuzzy controller implementations on FPGA's has been tested with several application examples. Since rules are hard wired, this concept had to be improved for the automated design of fuzzy ASIC's where reprogrammability of the rules is required. Since high-level design entry makes possible the mapping onto different target technologies (standard cell libraries, FPGA libraries, etc.), a VHDL-based approach is well suited for a fuzzyCAD toolkit. The additional advantage is that the system can already be simulated on the behavioral level to validate whether the selected bit widths for internal and external signals are sufficient for achieving a required computation precision. Since the whole system is instantiated from a generic VHDL library, basic rough design faults can be excluded. The basic modules of the VHDL library are already available. Currently the modules are integrated into the complete controller structure. The fuzzyCAD design manager is currently implemented with a hypertext-based user interface. Furthermore, implementation cost models serving as estimators for timing and area will be developed. Additionally, a defuzzification module for a FuNe I configurable fuzzy system is currently being specified in VHDL. The neurofuzzy approaches can either deliver fuzzy modules that can be implemented by the generic fuzzy processor, or they are hardware-friendly and fuzzy-interpretable neural structures that are directly considered as fuzzy hardware solutions.

REFERENCES

[1] E. Anderson, “The irises of the Gaspe Peninsula,” Bull. Amer. Iris Soc., vol. 59, pp. 2-5, 1935.
[2] J. C. Bezdek, “A review of probabilistic, fuzzy, and neural models for pattern recognition,” J. Intell. Fuzzy Syst., vol. 1, pp. 1-35, 1993.
[3] D. Driankov, H. Hellendoorn, and M. Reinfrank, An Introduction to Fuzzy Control. New York: Springer-Verlag, 1993.
[4] S. K. Halgamuge and M. Glesner, “Neural networks in designing fuzzy systems for real world applications,” Int. J. Fuzzy Sets Syst., vol. 65, no. 1, pp. 1-12, 1994 (North Holland).
[5] S. K. Halgamuge and M. Glesner, “Fuzzy neural networks: Between functional equivalence and applicability,” IEE Int. J. Neural Syst., vol. 6, no. 2, pp. 185-196, 1995 (World Scientific Publ.).
[6] S. K. Halgamuge, T. Hollstein, A. Kirschbaum, and M. Glesner, “Automatic generation of application specific fuzzy controllers for rapid prototyping,” presented at IEEE Int. Conf. Fuzzy Syst., Orlando, FL, June 1994.
[7] S. K. Halgamuge, W. Pochmuller, C. Grimm, and M. Glesner, “Fuzzy interpretable dynamically developing neural networks with FPGA-based implementation,” presented at 4th Int. Conf. Microelectron. Neural Networks Fuzzy Syst., Torino, Italy, Sept. 1994.
[8] S. K. Halgamuge, W. Pochmuller, and M. Glesner, “An alternative approach for generation of membership functions and fuzzy rules based on radial and cubic basis function networks,” Int. J. Approximate Reasoning, vol. 12, no. 3/4, pp. 279-298, Apr./May 1995.
[9] S. K. Halgamuge, W. Pochmuller, and M. Glesner, “A rule-based prototype system for automatic classification in industrial quality control,” in IEEE Int. Conf. Neural Networks, San Francisco, CA, Mar. 1993, pp. 238-243.
[10] S. K. Halgamuge, T. A. Runkler, and M. Glesner, “A hierarchical hybrid fuzzy controller for realtime reverse driving support of vehicles with long trailers,” presented at IEEE Int. Conf. Fuzzy Syst., Orlando, FL, June 1994.
[11] D. L. Hung, “Dedicated digital fuzzy hardware,” IEEE Micro, Chips, Syst., Software, Applicat., vol. 15, no. 4, pp. 31-39, Aug. 1995.
[12] T. Kohonen, Self-Organization and Associative Memory. New York: Springer-Verlag, 1989.
[13] E. H. Mamdani and S. Assilian, “An experiment in linguistic synthesis with a fuzzy logic controller,” Int. J. Man-Machine Studies, vol. 7, no. 1, pp. 1-13, 1975.
[14] H. Surmann and A. P. Ungering, “Fuzzy rule-based systems on general-purpose processors,” IEEE Micro, Chips, Syst., Software, Applicat., vol. 15, no. 4, pp. 40-48, Aug. 1995.
Thomas Hollstein received the Dipl.-Ing. degree in electrical engineering from Darmstadt University of Technology, Germany, in 1991. He is currently working toward the Ph.D. degree at Darmstadt University of Technology. Since 1991, he has been a Research Assistant at the Institute of Microelectronic Systems, Darmstadt University of Technology, Germany. As a member of the CAD research group, his research interests are CAD for integrated circuits, hardware/software co-design, system-level specifications, and hardware implementations of fuzzy systems.
Saman K. Halgamuge (M’85-M’96) received the B.Sc. degree in electronic and telecommunication engineering from the University of Moratuwa, Sri Lanka, in 1985, and the Dipl.-Ing. and Dr.-Ing. degrees in computer engineering from Darmstadt University of Technology, Germany, in 1990 and 1995, respectively. In 1985, he worked as an Electronic Engineer at the Ceylon Electricity Board, Colombo, Sri Lanka. From 1990 to 1995, he worked as a Research Associate at Darmstadt University of Technology, Germany, where he taught graduate/undergraduate courses. Currently, he is a Lecturer in Computer Systems Engineering at the University of South Australia at Adelaide and is associated with the following research groups, all at this university: Telecommunications Systems Engineering Centre of the Institute for Telecommunications Research, and Knowledge-Based Engineering Systems Centre. He is also associated with the Cooperative Research Centre for Sensor Signal and Information Processing, in Adelaide. He has published more than 40 conference/journal papers and contributed to five books in the areas of data analysis, mechatronics, neural networks, genetic algorithms, and fuzzy systems. His research interests also include automatic target tracking, data fusion, and manufacturing systems.
Manfred Glesner (M’93) received the Dipl.-Phys. degree in applied physics and electrical engineering from the Saarland University, Saarbruecken, Germany, in 1969, and the Ph.D. degree from the same university in 1975, with research on the application of nonlinear optimization techniques in computer-aided design of electronic circuits. From 1975 to 1981, he was a Lecturer at the Saarland University, Saarbruecken, Germany, in the areas of electronics CAD and control. In 1981, he was appointed as an Associate Professor for electrical engineering at Darmstadt University of Technology, Germany. In 1989, Darmstadt University appointed him Full Professor, holding the new Chair of Microelectronics System Design. His current research is in advanced design tools for microelectronic circuits, VLSI digital signal processing, and innovative system applications of microelectronics. Dr. Glesner is a member of several technical societies and is active in organizing international conferences.