(12) United States Patent
     Borah et al.

(10) Patent No.: US 7,720,012 B1
(45) Date of Patent: May 18, 2010

(54) SPEAKER IDENTIFICATION IN THE PRESENCE OF PACKET LOSSES

(75) Inventors: Deva K. Borah, Las Cruces, NM (US); Philip De Leon, Las Cruces, NM (US)

(73) Assignee: Arrowhead Center, Inc., Las Cruces, NM (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 1344 days.

(21) Appl. No.: 11/178,959

(22) Filed: Jul. 11, 2005

Related U.S. Application Data

(60) Provisional application No. 60/586,889, filed on Jul. 9, 2004.

(51) Int. Cl. H04L 12/16 (2006.01)

(52) U.S. Cl.: 370/260; 370/230; 382/159; 704/232; 704/246; 704/256.1; 709/204

(58) Field of Classification Search: 370/352, 370/260, 270, 230; 379/100.06; 455/556.2; 704/246, 232, 256.1; 709/204; 382/159
     See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS

4,363,102 A          12/1982   Holmgren et al.
6,041,299 A *         3/2000   Schuster et al.     704/232
6,195,639 B1          2/2001   Feltstrom et al.
6,389,392 B1          5/2002   Pawlewski et al.
6,584,494 B1 *        6/2003   Manabe et al.       709/204
6,751,590 B1 *        6/2004   Chaudhari et al.    704/246
6,772,119 B2 *        8/2004   Chaudhari et al.    704/246
7,457,242 B2 *       11/2008   Beightol et al.     370/230
7,472,063 B2 *       12/2008   Nefian et al.       704/256.1
7,617,101 B2 *       11/2009   Chang et al.        704/232
2002/0103639 A1       8/2002   Chang et al.
2002/0164070 A1 *    11/2002   Kuhner et al.       382/159
2003/0036905 A1       2/2003   Toguri et al.
2003/0088414 A1 *     5/2003   Huang et al.        704/246
2003/0120489 A1       6/2003   Krasnansky et al.
2003/0198195 A1 *    10/2003   Li                  370/260
2005/0276235 A1      12/2005   Lee et al.          370/270

OTHER PUBLICATIONS

Mayorga, P., Besacier, L., Lamy, R., Serignat, J.-F., "Audio packet loss over IP and speech recognition", Nov. 30, 2003, IEEE, 607-612.*

Besacier, L., et al., "GSM Speech Coding and Speaker Recognition", Proc. IEEE ICASSP'00, (Jun. 2000), 1-4.

Campbell, Joseph P., et al., "Testing with the YOHO CD-ROM Voice Verification Corpus", Proc. Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP), (1995), 341-344.

Davis, Steven B., et al., "Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Sentences", IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-28, No. 4, (Aug. 1980), 357-366.

(Continued)

Primary Examiner: Gerald Gauthier

(74) Attorney, Agent, or Firm: Justin R. Jackson; Philip D. Askenazy; Peacock Myers, P.C.

(57) ABSTRACT

A system, method, and apparatus for identifying a speaker of an utterance, particularly when the utterance has portions of it missing due to packet losses. Different packet loss models are applied to each speaker's training data in order to improve accuracy, especially for small packet sizes.

20 Claims, 6 Drawing Sheets

[Representative drawing (block diagram): input u(n); blocks labeled Silence Removal, Pre-emphasis, Magnitude-Squared |X(m,k)|^2, Mel-scale Filterbank, and Log-energy; output Y(m). Reference numerals 12, 14, 20, and 22 appear in the drawing.]

OTHER PUBLICATIONS (continued, Page 2)

Goldsmith, J., et al., "Capacity, Mutual Information, and Coding for Finite-State Markov Channels", IEEE Trans. Signal Processing, vol. 42, (May 1996), 868-886.

Quatieri, Thomas F., "Discrete-Time Speech Signal Processing: Principles and Practice", Prentice-Hall, Inc., New Jersey, 2002, textbook description on 3 pages provided, (2002).

Reynolds, Douglas A., et al., "Robust Text-Independent Speaker Identification Using Gaussian Mixture Speaker Models", IEEE Transactions on Speech and Audio Processing, vol. 3, No. 1, (Jan. 1995), 72-83.

* cited by examiner

[FIG. 1 (Sheet 1 of 6): block diagram with blocks labeled Pre-emphasis, Magnitude-Squared |X(m,k)|^2, Mel-scale Filterbank, and Log-energy; output Y(m). Reference numerals 12, 14, 20, and 22 appear in the drawing.]
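The block labels in FIG. 1 (pre-emphasis, magnitude-squared spectrum, mel-scale filterbank, log-energy) resemble a conventional log mel-filterbank front end. The sketch below is only an illustration of that general kind of pipeline, not the patent's implementation: the pre-emphasis coefficient `alpha`, the Hamming window, the triangular filter shape, the mel formula, the 8 kHz sampling rate, and all function names are assumptions, and the silence-removal stage is omitted.

```python
import numpy as np

def hz_to_mel(f):
    # A common mel-scale convention (assumed; the source does not give one).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters spaced evenly on the mel scale from 0 to fs/2."""
    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(fs / 2), n_filters + 2))
    bin_freqs = np.arange(n_fft // 2 + 1) * fs / n_fft
    bank = np.zeros((n_filters, len(bin_freqs)))
    for i in range(n_filters):
        lo, mid, hi = edges_hz[i : i + 3]
        rising = (bin_freqs - lo) / (mid - lo)
        falling = (hi - bin_freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def log_mel_energies(u, fs=8000, frame_len=256, n_filters=20, alpha=0.97):
    """Return one row of log filterbank energies per non-overlapping frame."""
    # Pre-emphasis: x(n) = u(n) - alpha * u(n-1)
    x = np.append(u[0], u[1:] - alpha * u[:-1])
    n_frames = len(x) // frame_len
    frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)
    # Magnitude-squared spectrum |X(m,k)|^2 for frame m and frequency bin k
    power = np.abs(np.fft.rfft(frames * np.hamming(frame_len), axis=1)) ** 2
    # Mel-scale filterbank followed by log-energy
    bank = mel_filterbank(n_filters, frame_len, fs)
    return np.log(power @ bank.T + 1e-12)

rng = np.random.default_rng(0)
features = log_mel_energies(rng.standard_normal(8000))
print(features.shape)
```

Non-overlapping frames keep the sketch short; practical front ends usually use overlapping frames.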

[FIG. 2 (Sheet 2 of 6): plot against frequency f (Hz), x-axis from 0 to 4000 Hz; curves and y-axis label not recoverable.]

[FIG. 3 (Sheet 3 of 6): plot against Packet Size (samples/packet), x-axis from 0 to 250; y-axis ticks from 30 to 100, label not recoverable; legend: 20% packet loss, 40% packet loss.]

[FIG. 5 (Sheet 5 of 6): plot against identification data length (sec), x-axis from 2 to 16; y-axis ticks from 75 to 95, label not recoverable; legend: 8 samples/packet, 64 samples/packet.]

[FIG. 6 (Sheet 6 of 6): plot against identification data length (sec), x-axis from 2 to 16; y-axis ticks from 82 to 100, label not recoverable; legend: 50% packet loss, 20% packet loss, No loss.]

known speakers, the training data comprising packet losses, and the test data comprising utterances from an unknown speaker, cause the computer to
extract one or more features from training data comprising utterances from known speakers, the training data comprising packet losses;
obtain at least one parameter set corresponding to the features of the training data of each known speaker;
extract one or more features from test data comprising utterances from an unknown speaker;
determine a probability for each parameter set that the features from the test data arise from that parameter set; and
identify the unknown speaker by determining which known speaker's parameter set maximizes the probability.

17. The computer readable material of claim 16 wherein the test data comprises packet losses.

18. The computer readable material of claim 17 wherein the computer estimates a packet loss rate of the test data and applies the test data packet loss rate to the training data.

19. The computer readable material of claim 16 wherein the computer generates a plurality of parameter sets corresponding to each known speaker, each such parameter set comprising a different packet loss rate.

20. The computer readable material of claim 16 wherein the parameter sets are stored on a computer-readable storage medium.
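The claimed steps (apply packet losses to training data, fit a parameter set per known speaker, score the test features against each set, pick the maximum) can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the patented method: packets are dropped independently at a fixed rate (a Bernoulli loss model), the features are log energies in equal-width FFT bands, each parameter set is a single diagonal Gaussian standing in for whatever speaker model is actually used, and the "speakers" are synthetic tones in noise. All function and variable names are hypothetical.

```python
import numpy as np

def drop_packets(signal, packet_size, loss_rate, rng):
    """Drop whole packets independently with probability loss_rate and
    concatenate the survivors (assumed Bernoulli loss model)."""
    n_packets = len(signal) // packet_size
    packets = signal[: n_packets * packet_size].reshape(n_packets, packet_size)
    keep = rng.random(n_packets) >= loss_rate
    return packets[keep].reshape(-1)

def extract_features(signal, frame_len=256, n_bands=8):
    """Toy per-frame features: log energy in n_bands equal-width FFT bands."""
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    power = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1)) ** 2
    bands = np.array_split(power[:, 1:], n_bands, axis=1)  # skip the DC bin
    return np.log(np.stack([b.sum(axis=1) for b in bands], axis=1) + 1e-12)

def train_parameter_set(features):
    """Parameter set: per-dimension mean and variance of one Gaussian."""
    return features.mean(axis=0), features.var(axis=0) + 1e-6

def log_likelihood(features, params):
    """Total log-likelihood of all frames under a diagonal Gaussian."""
    mean, var = params
    per_dim = np.log(2 * np.pi * var) + (features - mean) ** 2 / var
    return float(-0.5 * per_dim.sum())

def identify(test_features, parameter_sets):
    """Return the speaker whose parameter set maximizes the likelihood."""
    scores = {name: log_likelihood(test_features, p)
              for name, p in parameter_sets.items()}
    return max(scores, key=scores.get)

def synthetic_speaker(freq_hz, n_samples, rng, fs=8000):
    """Stand-in for real speech: a tone in white noise (illustration only)."""
    t = np.arange(n_samples) / fs
    return np.sin(2 * np.pi * freq_hz * t) + 0.3 * rng.standard_normal(n_samples)

rng = np.random.default_rng(0)
speakers = {"A": 750.0, "B": 1750.0, "C": 2750.0}

# Train each speaker's parameter set on data that has suffered packet loss.
parameter_sets = {}
for name, freq in speakers.items():
    lossy = drop_packets(synthetic_speaker(freq, 40000, rng), 64, 0.2, rng)
    parameter_sets[name] = train_parameter_set(extract_features(lossy))

# Test utterance from speaker "B", also with packet loss.
test = drop_packets(synthetic_speaker(1750.0, 16000, rng), 64, 0.2, rng)
result = identify(extract_features(test), parameter_sets)
print(result)
```

To mirror claims 18 and 19, one could train several parameter sets per speaker at different values of `loss_rate` and score the test data against the set whose rate is closest to an estimate of the test data's loss rate; how that estimate is obtained is not specified in this excerpt.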
