Expert Systems with Applications 32 (2007) 358–363
www.elsevier.com/locate/eswa

Neural network based modeling of HfO2 thin film characteristics using Latin Hypercube Sampling

Kyoung Eun Kweon a, Jung Hwan Lee a, Young-Don Ko a, Min-Chang Jeong b, Jae-Min Myoung b, Ilgu Yun a,*

a Semiconductor Engineering Laboratory, Department of Electrical and Electronics Engineering, Yonsei University, 134 Shinchon-Dong, Seodaemun-Gu, Seoul 120-749, Republic of Korea
b Information and Electronic Materials Research Laboratory, Department of Materials Science and Engineering, Yonsei University, 134 Shinchon-Dong, Seodaemun-Gu, Seoul 120-749, Republic of Korea

Abstract

In this paper, neural network based modeling of the electrical characteristics of HfO2 thin films grown by metal organic molecular beam epitaxy was investigated. The accumulation capacitance and the hysteresis index were extracted as the main responses to examine the characteristics of the HfO2 dielectric films. The input process parameters were extracted by analyzing the process conditions and the characterization of the films. X-ray diffraction was used to analyze the characteristic variation under the different process conditions. To build the process model, a neural network was trained with the error back-propagation algorithm, and its initial weights and biases were selected by the Latin Hypercube Sampling method. This modeling methodology allows the process recipes to be optimized and the manufacturability to be improved.

© 2005 Elsevier Ltd. All rights reserved.

Keywords: HfO2; Process modeling; Neural networks; Latin Hypercube Sampling

* Corresponding author. Tel.: +82 2 2123 4619; fax: +82 2 313 2879. E-mail address: [email protected] (I. Yun).
0957-4174/$ - see front matter © 2005 Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2005.11.032

1. Introduction

Industrial demand for highly integrated and multifunctional circuits is driving higher circuit density and the scaling down of semiconductor devices. According to the technology roadmap of the Semiconductor Industry Association (SIA) (Semiconductor Industry Association, 2000), the gate oxide thickness will be reduced to less than 1 nm for 0.05-μm metal-oxide-semiconductor field-effect transistors (MOSFETs) in the near future. At this scale, MOSFETs cannot work properly because of physical limits such as excessive gate tunneling leakage and gate oxide reliability (Wilk, Wallace, & Anthony, 2001). Therefore, high-k dielectric materials such as Al2O3, ZrO2, and HfO2 have attracted great attention as candidates to replace current gate oxides such as SiO2 (Cho et al., 2003; Cho, Wang, Sha, & Chang, 2002; Gusev et al., 2000; Lee, Kang, Nieh, Qi, & Lee, 2000; Lee, Kang, et al., 2000; Qi et al., 2000; Zhu, Li, & Liu, 2004). Among these candidates, HfO2 has emerged as one of the most promising dielectric materials owing to its large band-gap energy, high dielectric constant and high breakdown field.

Neural networks have been studied and successfully applied in semiconductor manufacturing for process modeling, for example of molecular beam epitaxy and plasma-enhanced chemical vapor deposition processes (Han, Ceiler, Bidstrup, Kohl, & May, 1994; Lee, Ko et al., 2000, 2000). In this paper, the electrical properties of HfO2 thin films, namely the accumulation capacitance (Cacc) and the hysteresis index, were investigated via a neural network model using the error back-propagation algorithm. The accumulation capacitance (Cacc) is defined as the capacitance in the strong accumulation region, and the hysteresis index is defined as the width of the hysteresis loop generated by a bidirectional voltage sweep. Latin Hypercube Sampling (LHS) was used to generate the weights and the biases of the neural networks, and the modeling results were verified using statistical analysis.
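As a rough illustration of how LHS could be used to draw initial weights and biases, the following Python sketch divides an interval into equal-probability strata and draws one value from each. The function names `lhs_uniform` and `init_network`, the use of NumPy, and the way the sampled values are assigned to layers are assumptions made for illustration only. The (-0.5, 0.5) range and the 3-4-3-2 structure are the values reported later in this paper; the authors' actual procedure may differ.

```python
import numpy as np


def lhs_uniform(n_samples, low=-0.5, high=0.5, rng=None):
    """Draw n_samples values from (low, high), one per equal-probability stratum."""
    rng = np.random.default_rng() if rng is None else rng
    # Stratum i covers [i/N, (i+1)/N); pick one uniform point inside each stratum.
    u = (np.arange(n_samples) + rng.random(n_samples)) / n_samples
    rng.shuffle(u)  # randomize which parameter receives which stratum
    return low + (high - low) * u


def init_network(layer_sizes=(3, 4, 3, 2), rng=None):
    """Assumed scheme: a single LHS draw covering every weight and bias."""
    rng = np.random.default_rng() if rng is None else rng
    pairs = list(zip(layer_sizes[:-1], layer_sizes[1:]))
    n_params = sum((n_in + 1) * n_out for n_in, n_out in pairs)
    values = lhs_uniform(n_params, rng=rng)
    params, start = [], 0
    for n_in, n_out in pairs:
        n_w = n_in * n_out
        weights = values[start:start + n_w].reshape(n_in, n_out)
        biases = values[start + n_w:start + n_w + n_out]
        params.append((weights, biases))
        start += n_w + n_out
    return params
```

Calling `lhs_uniform(100)` places exactly one value in each of 100 equal-width bins over (-0.5, 0.5), whereas simple random sampling typically leaves some bins empty and others crowded.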

2. Experiments

HfO2 thin films were grown on a p-type Si (1 0 0) substrate, from which the native oxide was chemically removed with a (50:1) H2O:HF solution prior to growth by metal organic molecular beam epitaxy (MOMBE). Hafnium-tetra-butoxide [Hf(O·t-C4H9)4] was chosen as the MO precursor because it has an appropriate vapor pressure and a relatively low decomposition temperature. High-purity (99.999%) oxygen gas was used as the oxidant. Hf-t-butoxide was introduced into the main chamber using Ar as a carrier gas through a bubbling cylinder. The bubbler was maintained at a constant temperature to supply a constant vapor pressure of the Hf source. The apparatus is shown schematically in Fig. 1. High-purity Ar carrier gas passed through the bubbler containing the Hf source. The gas line from the bubbler to the nozzle was heated to the same temperature. The mixture of Ar and metal-organic gases, heated at the tip of the nozzle, flowed into the main chamber. The introduced Hf source decomposed into Hf and ligand parts when it reached the substrate, which was maintained at high temperature, and the Hf ions combined with O2 gas supplied from another nozzle. The base pressure and working pressure were 10^-9 and 10^-7 Torr, respectively. The HfO2 films grown by MOMBE were annealed at 700 °C for 2 min in N2 ambient. The process conditions are summarized in Table 1.

Au dots were deposited to evaluate the electrical properties of the grown HfO2 samples. A stainless shadow mask was used to make regular Au dots, and the hole diameter in the mask was 0.2 mm. The choice of the electrode metal and the accurate definition of the electrode area influence the analysis of the electrical properties of HfO2.

Table 1
Summary of process conditions

Process variable          Range
Substrate temperature     450–550 °C
Bubbler temperature       130 °C (fixed)
Nozzle temperature        270 °C (fixed)
Base pressure             10^-9 Torr
Working pressure          10^-7 Torr
Gas flow (Ar)             3–5 sccm
Gas flow (O2)             3–5 sccm
Growth time               30 min

Fig. 1. The schematic of MOMBE systems. (Labeled parts: main chamber, loadlock chamber, substrate heater, substrate holder, shutter, turbo molecular pump, Ar and O2 nozzles, view port, mass flow controllers, leak valve, pressure gauge, bubbler with heater, bubbler valves, inlet and outlet.)

3. Modeling scheme

3.1. Design of experiments

To characterize the high-k dielectric properties, the input factors were extracted from the controllable process variables of the MOMBE equipment. These factors are the substrate temperature (Tsub), the Ar gas flow (Ar) and the O2 gas flow (O2). Generally, a factorial design assigns two levels to each factor, called ‘high’ and ‘low’. The full factorial design specifies all possible high (+)/low (−) combinations of the input factors. To account for the curvature effect, a two-level design with a center point was carried out (Montgomery, Keats, Perry, Thompson, & Messina, 2000). The full factorial design matrix with one center point is summarized in Table 2.

3.2. Latin Hypercube Sampling

The Latin Hypercube Sampling (LHS) is used in this study to select randomized values for the weights and the

biases, which are the parameters of the neural networks. The LHS method is a stratified sampling technique in which the distribution of each random variable is divided into intervals of equal probability. The LHS method generates a sample of size N from n variables: the range of each variable is partitioned into N nonoverlapping intervals, each with probability 1/N, and one value is randomly selected from within each interval (Swidzinski & Chang, 2000). Unlike simple random sampling, the LHS method covers the full sampling range by maximally satisfying each marginal distribution. The distributions of the sampled values for the two methods are illustrated in Fig. 2, where 100 samples were generated in the range (−0.5, 0.5). The values sampled by the LHS method are more uniformly distributed than those obtained by simple random sampling. Therefore, unbiased random values of the weights and biases for the neural networks were selected via the LHS method.

Table 2
Factorial design matrix

Run   Tsub [°C]   Ar [sccm]   O2 [sccm]   Remark
1     450         3           3           Full factorial design
2     450         3           5           Full factorial design
3     450         5           3           Full factorial design
4     450         5           5           Full factorial design
5     550         3           3           Full factorial design
6     550         3           5           Full factorial design
7     550         5           3           Full factorial design
8     550         5           5           Full factorial design
9     500         4           4           Center point

Fig. 2. Two different distributions of the sampling values: (a) the simple random sampling and (b) LHS. (Histograms of frequency versus value over the range −0.5 to 0.5.)

3.3. Neural networks

Neural networks are utilized to model the nonlinear relationship between inputs and outputs in semiconductor process modeling. The networks consist of three layers: the input layer, the hidden layer and the output layer. They are composed of simple processing units called neurons, interconnections, and weights that are assigned to the interconnections between neurons (May, 1994). Each neuron computes the weighted sum of its inputs filtered by a nonlinear sigmoid transfer function.

Fig. 3. Typical feed-forward neural networks. (Inputs x1 ... xm feed the input layer, followed by the hidden layer(s) h1 ... hk and the output layer with responses y1 ... yn; the connections carry weights W.)

The neural networks in this work were trained with the error back-propagation (BP) algorithm. Error BP neural networks consist of several layers of neurons which receive, process and transmit critical information regarding the relationships between the input parameters and the corresponding responses. Generally, the weight update mechanism of the BP algorithm is defined by the following (Chen, 1996):

$$w_{ijk}(n+1) = w_{ijk}(n) + \eta\,\Delta w_{ijk}(n) \tag{1}$$

where w_ijk is the connection strength between the jth neuron in layer (k − 1) and the ith neuron in layer k, Δw_ijk is the calculated change in that weight which reduces the error function of the network, and η is the learning rate. This algorithm has been shown to be very effective in learning arbitrary nonlinear mappings between noisy sets of input and output factors. The schematic of a general feed-forward neural network is shown in Fig. 3. The neural network parameters used in this study are summarized in Table 3. These networks were trained on nine experimental runs. Two trials were used as testing data in order to verify the fitness of the NNet outputs for the

results of the training data. The root mean square errors (RMSEs) of the training for Cacc and the hysteresis are 0.76 and 0.03, respectively. The RMSEs for the testing are 0.76 and 0.03, respectively.

Table 3
Summary of the neural network parameters

NNet parameter        Value
NNet structure        3-4-3-2
NNet learning rate    0.0003
NNet momentum         0.04

4. Results and discussion

The neural network model results and the residual plots for Cacc and the hysteresis are illustrated in Figs. 4 and 5, where the squares represent the training data and the triangles represent the testing data used for prediction. The modeling results show good agreement between the predicted and the measured responses. The residual plots for all responses are randomly distributed, with no special patterns or features, indicating that the results satisfy the statistical assumptions for the residuals (Mayers & Montgomery, 1995). The statistical significance of the three input factors is listed in Table 4 at the significance level α = 0.05. For the accumulation capacitance (Cacc), Tsub and Ar are significant factors, whereas Ar and O2 are significant factors for the hysteresis index.

Fig. 4. The neural network modeling results for Cacc: (a) the measured vs. the predicted values and (b) the residual plot. (Axes: (a) Cacc (network outputs) [pF] versus Cacc (experimental data) [pF]; (b) residuals versus run order.)

Fig. 5. The neural network modeling results for the hysteresis: (a) the measured vs. the predicted values and (b) the residual plot. (Axes: (a) hysteresis (network outputs) [V] versus hysteresis (experimental data) [V]; (b) residuals versus run order.)

Table 4
Statistical significance level

Factor   Significance level
         Cacc      Hysteresis
Tsub     0.002     0.416
Ar       0.007     0.011
O2       0.088     0.039

Fig. 6. The response surface plots for Cacc: (a) O2 = 4 sccm and (b) Tsub = 500 °C.

The response surface plots of the accumulation capacitance are shown in Fig. 6 for O2 fixed at 4 sccm and for Tsub fixed at 500 °C, respectively. The accumulation capacitance is proportional to the dielectric constant and inversely proportional to the equivalent oxide thickness

(EOT). With increasing Tsub, the fully decomposed Hf source [Hf(O·t-C4H9)4] creates hydrocarbon-rich circumstances. The incorporation of the hydrocarbons limits the crystallite size and makes the tetragonal phase dominant in the film. It was found that a small O2/Ar ratio causes a hydrocarbon-rich plasma and limits the crystal size, so that the accumulation capacitance (Cacc) is increased (Kim et al., 2004). As the substrate temperature (Tsub) is increased, the oxide thickness is decreased and Cacc is increased. In the 2θ XRD scan shown in Fig. 7, the tetragonal phase is observed at 30.3°. The presence of the tetragonal phase means that the crystallite size is limited and small, because the tetragonal phase can be stabilized only in very small crystallites (Garvie, 1978; Garvie & Gross, 1985). As shown in Fig. 7, the intensity of the tetragonal phase increases as the substrate temperature is increased from 450 °C to 550 °C. It can be interpreted that the tetragonal phase contributes to the reduction of the oxide thickness (Kim et al., 2004).

The response surface plots of the hysteresis are shown in Fig. 8 for Tsub fixed at 500 °C and for O2 fixed at 4 sccm, respectively. As shown in Fig. 8(a), the hysteresis, which is proportional to the interfacial trap density (Dit), increases with decreasing O2/Ar ratio, because a superior oxide interface forms and Dit decreases as the O2/Ar ratio increases (Wilk et al., 2001). During the growth of HfO2 on a Si substrate, Hf is deposited and reacts with the oxygen:

$$\mathrm{HfO_2^{bulk}} + \mathrm{Hf} + \mathrm{O_2} \rightarrow \mathrm{HfO_2^{bulk}} \tag{1}$$

However, when the oxygen is not enough to react with Hf, oxygen vacancies (V_O) are created:

$$\mathrm{HfO_2^{bulk}} + \mathrm{Hf} \rightarrow \mathrm{HfO_2^{bulk}} + 2V_O \tag{2}$$
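The hysteresis index discussed here is defined earlier in the paper as the width of the hysteresis loop from a bidirectional voltage sweep, and Cacc as the capacitance in strong accumulation. The Python sketch below shows one way such responses might be extracted from a pair of C-V sweeps. It is only an illustration under stated assumptions: the sigmoid-shaped toy curves, the function names, the choice of the mid-capacitance level for measuring the loop width, and the fraction of the sweep treated as accumulation are all hypothetical and are not the authors' measurement procedure.

```python
import numpy as np


def toy_cv_curve(v, c_acc, c_min, v_shift, slope=0.15):
    """Hypothetical sigmoid C-V curve: high capacitance at negative bias."""
    return c_min + (c_acc - c_min) / (1.0 + np.exp((v - v_shift) / slope))


def crossing_voltage(v, c, level):
    """Voltage at which a monotonically decreasing C-V curve crosses `level`."""
    # np.interp expects increasing x values, so reverse the decreasing curve.
    return float(np.interp(level, c[::-1], v[::-1]))


def extract_responses(v, c_forward, c_reverse, acc_fraction=0.1):
    """Return (accumulation capacitance, hysteresis width) from two sweeps."""
    # Accumulation capacitance: mean over the most negative part of the sweep.
    v_cut = v.min() + acc_fraction * (v.max() - v.min())
    c_acc = float(np.mean(c_forward[v <= v_cut]))
    # Hysteresis width: voltage gap between the sweeps at mid capacitance.
    level = 0.5 * (c_forward.max() + c_forward.min())
    v_fwd = crossing_voltage(v, c_forward, level)
    v_rev = crossing_voltage(v, c_reverse, level)
    return c_acc, abs(v_rev - v_fwd)


# Toy data (assumed values): 20 pF accumulation, 0.8 V shift between sweeps.
v = np.linspace(-3.0, 3.0, 601)
c_fwd = toy_cv_curve(v, c_acc=20.0, c_min=5.0, v_shift=-0.5)
c_rev = toy_cv_curve(v, c_acc=20.0, c_min=5.0, v_shift=0.3)
c_acc, width = extract_responses(v, c_fwd, c_rev)
```

On these toy curves the sketch recovers roughly the 20 pF accumulation capacitance and the 0.8 V loop width that were built into the synthetic data. Real C-V data would need noise handling and a more careful choice of reference level.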

As shown in Fig. 8(b), sufficient oxygen vacancy mobility decomposes the interfacial layer of SiO2 and creates the interfacial silicate. It was found that these decomposition reactions take place actively when Tsub is lower than 500 °C (Copel & Reuter, 2003). In addition, charges are trapped by the oxygen vacancies as the voltage is swept bidirectionally. The interfacial trap density and the hysteresis index are increased by the trapped charges. As Tsub increases from 450 °C to 550 °C, the decomposition does not happen actively and the hysteresis index is decreased. Based on this analysis, the modeling results show good agreement with the physical mechanism.

Fig. 7. The 2θ XRD scan: (a) Tsub = 450 °C, Ar = 5 sccm, and O2 = 5 sccm and (b) Tsub = 550 °C, Ar = 5 sccm, and O2 = 5 sccm. (Axes: intensity (a.u.) versus 2θ; labeled peaks: t(1 1 1), m(−1 1 1), m(1 1 1), m(2 0 0) and Au.)

Fig. 8. The response surface plots for the hysteresis: (a) Tsub = 500 °C and (b) O2 = 4 sccm.

5. Conclusion

The electrical characteristics of HfO2 thin films were investigated via an error BP neural network model using Latin Hypercube Sampling, and neural network models correlating the process conditions with the electrical characteristics were developed. The Latin Hypercube Sampling method was used to generate the weights and the biases randomly, with equal probability distribution within a specific interval. From these results, the neural network modeling can explain the comprehensive effects of the varying process conditions on the responses in accordance with the physical mechanisms. The methodology allows the electrical properties to be predicted with respect to the process conditions and can improve the manufacturability.

Acknowledgement

This work was supported by the Brain Korea 21 Project in 2005.

References

Chen, C. H. (1996). Fuzzy logic and neural network handbook. McGraw-Hill.
Cho, B.-O., Chang, J. P., Min, J.-H., Moon, S. H., Kim, Y. W., & Levin, I. (2003). Material characteristics of electrically tunable zirconium oxide thin films. Journal of Applied Physics, 93(1), 745–749.
Cho, B.-O., Wang, J., Sha, L., & Chang, J. P. (2002). Tuning the electrical properties of zirconium oxide thin films. Applied Physics Letters, 80(6), 1052–1054.
Copel, M., & Reuter, M. C. (2003). Decomposition of interfacial SiO2 during HfO2 deposition. Applied Physics Letters, 83(16), 3398–3400.
Garvie, R. C. (1978). Stabilization of the tetragonal structure in zirconia microcrystals. Journal of Physical Chemistry, 82(2), 218–224.

Garvie, R. C., & Gross, M. F. (1985). Intrinsic size dependence of the phase transformation temperature in zirconia microcrystals. Journal of Material Science, 21(4), 1253–1257.
Gusev, E. P., Copel, M., Cartier, E., Baumvol, I. J. R., Krug, C., & Gribelyuk, M. A. (2000). High-resolution depth profiling in ultrathin Al2O3 films on Si. Applied Physics Letters, 76(2), 176–178.
Han, S., Ceiler, M., Bidstrup, S., Kohl, P., & May, G. (1994). Modeling the properties of PECVD silicon dioxide films using optimized back propagation neural networks. IEEE Transactions on Components, Packaging, and Manufacturing Technology, Part A, 17(2), 174–182.
Kim, M.-S., Ko, Y.-D., Hong, J.-H., Jeong, M.-C., Myoung, J.-M., & Yun, I. (2004). Characteristics and processing effects of ZrO2 thin films grown by metal-organic molecular beam epitaxy. Applied Surface Science, 227, 387–398.
Lee, B. H., Kang, L., Nieh, R., Qi, W.-J., & Lee, J. C. (2000). Thermal stability and electrical characteristics of ultrathin hafnium oxide gate dielectric reoxidized with rapid thermal annealing. Applied Physics Letters, 76(14), 1926–1928.
Lee, K. K., Brown, T., Dagnall, G., Bicknell-Tassius, R., Brown, A., & May, G. (2000). Using neural networks to construct models of the molecular beam epitaxy process. IEEE Transactions on Semiconductor Manufacturing, 13(1), 34–45.
May, G. (1994). Manufacturing ICs the neural way. IEEE Spectrum, 47–51.
Mayers, R. H., & Montgomery, D. C. (1995). Response surface methodology. New York: Wiley.
Montgomery, D. C., Keats, J. B., Perry, L. A., Thompson, J. R., & Messina, W. S. (2000). Using statistically designed experiments for process development and improvement: an application in electronics manufacturing. Robotics and Computer Integrated Manufacturing, 16, 55–63.
Qi, W.-J., Nieh, R., Lee, B. H., Kang, L., Jeon, Y., & Lee, J. C. (2000). Electrical and reliability characteristics of ZrO2 deposited directly on Si for gate dielectric application. Applied Physics Letters, 77(20), 3269–3271.
Semiconductor Industry Association (SIA) (2000). The national technology roadmap for semiconductors.
Swidzinski, J. F., & Chang, K. (2000). Nonlinear statistical modeling and yield estimation technique for use in Monte Carlo simulations. IEEE Transactions on Microwave Theory and Techniques, 48(12), 2316–2324.
Wilk, G. D., Wallace, R. M., & Anthony, J. M. (2001). High-k gate dielectrics: current status and materials properties considerations. Journal of Applied Physics, 89(10), 5243–5275.
Zhu, J., Li, Y. R., & Liu, Z. G. (2004). Fabrication and characterization of pulsed laser deposited HfO2 films for high-k gate dielectric applications. Journal of Physics D: Applied Physics, 37, 2896–2900.