Spoken Inquiry Discrimination Using Bag-of-Words for Speech-Oriented Guidance System

Haruka Majima¹, Rafael Torres¹, Yoko Fujita¹, Hiromichi Kawanami¹, Tomoko Matsui², Hiroshi Saruwatari¹, Kiyohiro Shikano¹

¹Graduate School of Information Science, Nara Institute of Science and Technology, Japan
²Department of Statistical Modeling, The Institute of Statistical Mathematics, Japan

{haruka-m, rafael-t, kawanami, sawatari, shikano}@is.naist.jp, [email protected]

Abstract

We investigate a method for discriminating between invalid and valid spoken inquiries received by a speech-oriented guidance system operating in a real environment. Invalid spoken inquiries include background voices, which are not directly uttered to the system, and nonsense utterances. Such spoken inquiries should be rejected beforehand. We previously reported a method that uses the likelihood values of Gaussian mixture models (GMMs) to discriminate invalid spoken inquiries from valid ones. In this paper, we improve the performance by using other information in spoken inquiries in addition to the likelihood values. This information includes bag-of-words (BOW), utterance duration, and signal-to-noise ratio (SNR). To handle these multiple types of information, we use a support vector machine (SVM) with a radial basis function (RBF) kernel and the maximum entropy (ME) method, and we compare their performance. In the experiments, we achieve an F-measure of 86.6% for SVM and 84.2% for ME, while the F-measure of the GMM-based method is 81.7%.

Index Terms: speech-oriented guidance system, spoken inquiry discrimination, support vector machine, maximum entropy, bag-of-words

1. Introduction

Automatic speech recognition (ASR) has been widely applied to dictation, Voice Search, and car navigation, to name a few. In this paper, we describe the speech-oriented guidance system Takemaru-kun [1], which aims to realize a natural speech interface using ASR. Takemaru-kun is a real-environment speech-oriented guidance system. It adopts example-based question answering (QA), which can respond flexibly to users' questions on demand. An answer to a user's question is selected by referring to the ...

... Moreover, we employ it together with the likelihood values of GMMs and the duration and SNR information of input utterances to improve the overall performance of discrimination between invalid and valid inputs.

In this paper, we describe effective feature combinations for invalid input discrimination. We use the BOW from the 10-best ASR results to train models with either SVM or ME. The outline of the paper is as follows. First, we give an overview of the Takemaru-kun system. Then, we describe our proposed method with several effective features using SVM or ME, and we show experimental results using real users' utterances. Finally, we conclude the paper.

2. Speech-oriented guidance system Takemaru-kun

The Takemaru-kun system [1] (Figure 1) is a real-environment speech-oriented guidance system. It is placed inside the entrance hall of the Ikoma City North Community Center, located in the Prefecture of Nara, Japan. The system has been operated daily since Nov. 2002. It provides guidance to visitors regarding the center facilities, services, neighboring sightseeing, weather forecasts, news, and the agent itself, among other information. Users can also activate a Web search feature that searches the Internet for Web pages containing the uttered keywords. The system is also intended to serve as a field test of a speech interface and to collect actual utterance data ...

The system displays an animated agent on the front monitor. The agent is Takemaru-kun, the mascot character of Ikoma City. The interaction with the system follows a one-question-to-one-answer ...
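The introduction states that BOW features are taken from the 10-best ASR results. This excerpt does not say how the ten hypotheses are pooled. The following Python sketch therefore assumes one simple choice: word counts are summed over all N-best hypotheses and normalized to relative frequencies over a fixed vocabulary. The function `bow_from_nbest`, the whitespace tokenization, and the toy vocabulary are hypothetical illustrations, not the authors' code.

```python
from collections import Counter
from typing import Dict, List, Sequence


def bow_from_nbest(nbest: Sequence[str], vocab: Dict[str, int]) -> List[float]:
    """Pool word counts over N-best ASR hypotheses into a BOW vector.

    Assumptions (not from the paper): hypotheses are whitespace-tokenized,
    out-of-vocabulary words are dropped, and counts are normalized to
    relative frequencies so the vector sums to 1 (or stays all-zero).
    """
    counts = Counter(w for hyp in nbest for w in hyp.split() if w in vocab)
    total = sum(counts.values())
    vec = [0.0] * len(vocab)
    for word, c in counts.items():
        vec[vocab[word]] = c / total
    return vec


# Hypothetical 3-word vocabulary and a truncated N-best list.
vocab = {"toilet": 0, "where": 1, "weather": 2}
nbest = ["where toilet", "where toilet is", "weather"]
print(bow_from_nbest(nbest, vocab))  # [0.4, 0.4, 0.2]
```

Summing over all hypotheses gives more weight to words that recur across the N-best list. Weighting each hypothesis by its rank or recognition score would be an equally plausible variant.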

Figure 5: F-measures of SVM and ME for combination patterns (1) and (5) to (7) (x-axis: combination of features, with (1) GMM, BOW; (5) GMM, BOW, duration; (6) GMM, BOW, SNR; (7) GMM, BOW, duration, SNR; series: SVM, ME)
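The figures compare SVM and ME by F-measure, the harmonic mean of precision and recall. This excerpt does not state which class (valid or invalid inquiries) is treated as positive. The sketch below therefore takes the positive label as a parameter. `f_measure` is a hypothetical helper, not code from the paper.

```python
from typing import Hashable, Sequence


def f_measure(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
              positive: Hashable) -> float:
    """F-measure (F1): harmonic mean of precision and recall for `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels; which class counts as "positive" is an assumption here.
truth = ["invalid", "invalid", "valid", "valid", "invalid"]
pred = ["invalid", "valid", "invalid", "valid", "invalid"]
print(round(f_measure(truth, pred, positive="invalid"), 3))  # 0.667
```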

5. Conclusions

We investigated discrimination between invalid and valid spoken inquiries using multiple features: the likelihood values of GMMs, BOWs, utterance durations, and SNRs. We compared the SVM and ME methods by F-measure. Both methods outperformed the conventional method, which uses only the likelihood values of GMMs. In particular, SVM performed best and achieved an F-measure of 86.6%. Our future work includes further investigation of effective methods for combining different kinds of features.
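The conclusions compare SVM with the maximum entropy (ME) method. For a two-class problem such as valid versus invalid inquiries, an ME classifier over real-valued features is equivalent to logistic regression. The sketch below trains such a model by batch gradient ascent with an L2 penalty, which corresponds to a Gaussian prior. Reference [5] suggests the authors used the Stanford maximum entropy software, whose optimizer and settings may differ. Treat the learning rate, epoch count, penalty, and label coding (1 = invalid) as assumptions.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: Sequence[float], x: Sequence[float]) -> float:
    """P(y = 1 | x); the last weight is the bias term."""
    return sigmoid(w[-1] + sum(wj * xj for wj, xj in zip(w, x)))


def train_maxent(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.1, epochs: int = 500,
                 l2: float = 0.01) -> List[float]:
    """Binary ME (logistic regression) fit by batch gradient ascent on the
    L2-penalized conditional log-likelihood. Labels are 0/1."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = yi - predict_proba(w, xi)  # observed minus expected
            for j, v in enumerate(xi):
                grad[j] += err * v
            grad[-1] += err
        for j in range(d):
            w[j] += lr * (grad[j] / n - l2 * w[j])
        w[-1] += lr * grad[-1] / n  # bias is not penalized
    return w


# Toy 1-D data: label 1 stands for "invalid" (an assumption).
X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [0, 0, 1, 1]
w = train_maxent(X, y)
print([round(predict_proba(w, x)) for x in X])  # [0, 0, 1, 1]
```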

6. Acknowledgements

This work was partially supported by CREST (Core Research for Evolutional Science and Technology), Japan Science and Technology Agency (JST).

Table 4: Combination patterns of features

Pattern  Combination of features
(1)      GMM likelihood, BOW
(2)      GMM likelihood, duration
(3)      GMM likelihood, SNR
(4)      GMM likelihood, duration, SNR
(5)      GMM likelihood, BOW, duration
(6)      GMM likelihood, BOW, SNR
(7)      GMM likelihood, BOW, duration, SNR
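Each pattern in Table 4 is a subset of four feature groups, and every pattern includes the GMM likelihood values. The following minimal Python sketch selects and concatenates the groups for a given pattern. The group keys, the `features_for_pattern` helper, and the placeholder values (such as the number of GMM likelihoods) are hypothetical and do not come from the paper.

```python
from typing import Dict, List, Sequence, Tuple

# Table 4 patterns as subsets of feature groups (key names are hypothetical).
PATTERNS: Dict[int, Tuple[str, ...]] = {
    1: ("gmm", "bow"),
    2: ("gmm", "duration"),
    3: ("gmm", "snr"),
    4: ("gmm", "duration", "snr"),
    5: ("gmm", "bow", "duration"),
    6: ("gmm", "bow", "snr"),
    7: ("gmm", "bow", "duration", "snr"),
}


def features_for_pattern(pattern: int,
                         groups: Dict[str, Sequence[float]]) -> List[float]:
    """Concatenate the feature groups selected by a Table 4 pattern, in a fixed order."""
    vec: List[float] = []
    for name in PATTERNS[pattern]:
        vec.extend(groups[name])
    return vec


# Placeholder values for one utterance (not data from the paper).
groups = {
    "gmm": [-51.2, -63.8],   # e.g. log-likelihoods from two GMMs
    "bow": [0.4, 0.4, 0.2],  # BOW vector from the N-best list
    "duration": [1.8],       # seconds
    "snr": [17.5],           # dB
}
print(features_for_pattern(4, groups))  # [-51.2, -63.8, 1.8, 17.5]
```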

Figure 4: F-measures of SVM and ME for combination patterns (1) to (4)
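The SVM described in the abstract uses an RBF kernel, K(x, z) = exp(−γ‖x − z‖²). The combined feature vector mixes quantities on very different scales: log-likelihoods, relative word frequencies, seconds, and decibels. Scaling each feature before training is therefore common practice with LIBSVM [4]. This excerpt does not say whether or how the authors scaled their features. The sketch below shows per-feature min-max scaling to [−1, 1] and the kernel itself. The helper names, the scaling range, and the γ value are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def fit_minmax(X: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """Per-feature (min, max) computed on training data."""
    return [(min(col), max(col)) for col in zip(*X)]


def apply_minmax(x: Sequence[float], ranges: Sequence[Tuple[float, float]],
                 lo: float = -1.0, hi: float = 1.0) -> List[float]:
    """Linearly map each feature from its training range to [lo, hi]."""
    out = []
    for v, (mn, mx) in zip(x, ranges):
        if mx == mn:
            out.append(0.0)  # constant feature: arbitrary fixed value
        else:
            out.append(lo + (hi - lo) * (v - mn) / (mx - mn))
    return out


def rbf_kernel(x: Sequence[float], z: Sequence[float], gamma: float) -> float:
    """K(x, z) = exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


# Toy training set: [GMM log-likelihood, duration in s] (values invented).
train = [[-60.0, 0.5], [-50.0, 1.5], [-40.0, 2.5]]
ranges = fit_minmax(train)
scaled = [apply_minmax(x, ranges) for x in train]
print(scaled)  # [[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]]
print(rbf_kernel(scaled[0], scaled[2], gamma=0.5))
```

The scaling ranges are fitted on training data only and then reused unchanged for test utterances. This keeps training and test features in the same coordinate system.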

7. References

[1] R. Nisimura, A. Lee, H. Saruwatari, and K. Shikano, "Public Speech-Oriented Guidance System with Adult and Child Discrimination Capability", in Proc. ICASSP 2004, vol. 1, pp. 433-436, 2004.

[2] H. Sakai, T. Cincarek, H. Kawanami, H. Saruwatari, K. Shikano, and A. Lee, "Voice Activity Detection applied to hands-free spoken dialogue robot based on decoding using acoustic and language model", in Proc. 1st International Conference on Robot Communication and Coordination (ROBOCOMM 2007), Article No. 16, 8 pages, 2007.

[3] A. Lee, K. Nakamura, R. Nishimura, H. Saruwatari, and K. Shikano, "Noise Robust Real World Spoken Dialogue System using GMM Based Rejection of Unintended Inputs", in Proc. International Conference on Spoken Language Processing, pp. 847-850, 2004.

[4] C. Chang and C. Lin, "LIBSVM: a Library for Support Vector Machines", 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm

[5] C. Manning and D. Klein, "Optimization, Maxent Models, and Conditional Estimation without Magic", Tutorial at HLT-NAACL 2003 and ACL 2003. Software available at http://nlp.stanford.edu/software/classifier.shtml

[6] A. Lee, T. Kawahara, and K. Shikano, "Julius - an Open Source Realtime Large Vocabulary Recognition Engine", in Proc. Eurospeech 2001, pp. 1691-1694, 2001.