US007720012B1

(12) United States Patent
     Borah et al.

(10) Patent No.: US 7,720,012 B1
(45) Date of Patent: May 18, 2010

(54) SPEAKER IDENTIFICATION IN THE PRESENCE OF PACKET LOSSES

(75) Inventors: Deva K. Borah, Las Cruces, NM (US); Philip De Leon, Las Cruces, NM (US)

(73) Assignee: Arrowhead Center, Inc., Las Cruces, NM (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 1344 days.

(21) Appl. No.: 11/178,959

(22) Filed: Jul. 11, 2005

Related U.S. Application Data

(60) Provisional application No. 60/586,889, filed on Jul. 9, 2004.

(51) Int. Cl. H04L 12/16 (2006.01)

(52) U.S. Cl. 370/260; 370/230; 382/159; 704/232; 704/246; 704/256.1; 709/204

(58) Field of Classification Search 370/352, 370/260, 270, 230; 379/100.06; 455/556.2; 704/246, 232, 256.1; 709/204; 382/159
     See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS

4,363,102 A        12/1982   Holmgren et al.
6,041,299 A *       3/2000   Schuster et al.     704/232
6,195,639 B1        2/2001   Feltstrom et al.
6,389,392 B1        5/2002   Pawlewski et al.
6,584,494 B1 *      6/2003   Manabe et al.       709/204
6,751,590 B1 *      6/2004   Chaudhari et al.    704/246
6,772,119 B2 *      8/2004   Chaudhari et al.    704/246
7,457,242 B2 *     11/2008   Beightol et al.     370/230
7,472,063 B2 *     12/2008   Nefian et al.       704/256.1
7,617,101 B2 *     11/2009   Chang et al.        704/232
2002/0103639 A1     8/2002   Chang et al.
2002/0164070 A1 *  11/2002   Kuhner et al.       382/159
2003/0036905 A1     2/2003   Toguri et al.
2003/0088414 A1 *   5/2003   Huang et al.        704/246
2003/0120489 A1     6/2003   Krasnansky et al.
2003/0198195 A1 *  10/2003   Li                  370/260
2005/0276235 A1    12/2005   Lee et al.          370/270

OTHER PUBLICATIONS

Mayorga, P., Besacier, L., Lamy, R., Serignat, J.-F., "Audio packet loss over IP and speech recognition", Nov. 30, 2003, IEEE, 607-612. *
Besacier, L., et al., "GSM Speech Coding and Speaker Recognition", Proc. IEEE ICASSP'00, (Jun. 2000), 1-4.
Campbell, Joseph P., et al., "Testing with the YOHO CD-ROM Voice Verification Corpus", Proc. Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP), (1995), 341-344.
Davis, Steven B., et al., "Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Sentences", IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-28, No. 4, (Aug. 1980), 357-366.

(Continued)

Primary Examiner: Gerald Gauthier
(74) Attorney, Agent, or Firm: Justin R. Jackson; Philip D. Askenazy; Peacock Myers, P.C.

(57) ABSTRACT

A system, method, and apparatus for identifying a speaker of an utterance, particularly when the utterance has portions of it missing due to packet losses. Different packet loss models are applied to each speaker's training data in order to improve accuracy, especially for small packet sizes.

20 Claims, 6 Drawing Sheets
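As a rough illustration of the idea stated in the abstract, applying a packet loss model to a speaker's training data before feature extraction, the sketch below drops whole packets from a sample sequence. It assumes a memoryless (Bernoulli) loss model in which lost packets are simply removed; the abstract refers to "different packet loss models" without fixing one here, so this is only one possible choice. The function name `apply_packet_loss` and its parameters are hypothetical.

```python
import random


def apply_packet_loss(samples, packet_size, loss_rate, seed=0):
    """Drop whole packets independently with probability loss_rate.

    Assumption: a memoryless (Bernoulli) loss model in which lost packets
    are removed, so the surviving packets are concatenated in order.
    """
    if packet_size <= 0:
        raise ValueError("packet_size must be positive")
    if not 0.0 <= loss_rate <= 1.0:
        raise ValueError("loss_rate must be in [0, 1]")
    rng = random.Random(seed)  # seeded so the loss pattern is reproducible
    kept = []
    for start in range(0, len(samples), packet_size):
        packet = samples[start:start + packet_size]
        if rng.random() >= loss_rate:  # packet survives
            kept.extend(packet)
    return kept


# Toy usage: 64 samples, 8 samples/packet, 20% loss rate.
clean = list(range(64))
lossy = apply_packet_loss(clean, packet_size=8, loss_rate=0.2)
```

Training features would then be extracted from `lossy` rather than `clean`, so that the speaker model reflects the same kind of gaps expected in the test utterance.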
OTHER PUBLICATIONS

Goldsmith, J., et al., "Capacity, Mutual Information, and Coding for Finite-State Markov Channels", IEEE Trans. Signal Processing, vol. 42, (May 1996), 868-886.
Quatieri, Thomas F., "Discrete-Time Speech Signal Processing: Principles and Practice", Prentice-Hall, Inc., New Jersey, 2002, textbook description on 3 pages provided, (2002).
Reynolds, Douglas A., et al., "Robust Text-Independent Speaker Identification Using Gaussian Mixture Speaker Models", IEEE Transactions on Speech and Audio Processing, vol. 3, No. 1, (Jan. 1995), 72-83.

* cited by examiner
FIG. 1 (Sheet 1 of 6): Block diagram with input u(n) and stages labeled Silence Removal, Pre-emphasis, Magnitude-Squared, Mel-scale Filterbank, and Log-energy; intermediate signal X(m,k); output Y(m); reference numerals 12, 14, 20, and 22.
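The stage labels recoverable from FIG. 1 (pre-emphasis, magnitude-squared spectrum, mel-scale filterbank, log-energy) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the pre-emphasis coefficient, the mel-scale formula, the triangular filter shape, and the 8 kHz sampling rate (suggested only by the 0 to 4000 Hz axis of FIG. 2) are all assumptions, and the silence-removal stage and whatever final step produces Y(m) are omitted. All function names are hypothetical.

```python
import cmath
import math


def pre_emphasis(u, alpha=0.97):
    """y(n) = u(n) - alpha * u(n-1); alpha is an assumed, typical value."""
    return [u[0]] + [u[n] - alpha * u[n - 1] for n in range(1, len(u))]


def power_spectrum(frame):
    """Magnitude-squared DFT |X(m,k)|^2 for k = 0..N/2 (naive O(N^2) DFT)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        x_k = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spec.append(abs(x_k) ** 2)
    return spec


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale, one row per filter."""
    top = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(top * i / (num_filters + 1)) for i in range(num_filters + 2)]
    bins = [f * n_fft / sample_rate for f in edges_hz]  # fractional DFT bin positions
    bank = []
    for i in range(1, num_filters + 1):
        lo, mid, hi = bins[i - 1], bins[i], bins[i + 1]
        row = []
        for k in range(n_fft // 2 + 1):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))  # rising edge
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))  # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def log_mel_energies(frame, bank, floor=1e-12):
    """Log of each filter's weighted sum of the power spectrum (floored)."""
    spec = power_spectrum(frame)
    return [math.log(max(sum(w * p for w, p in zip(row, spec)), floor)) for row in bank]


# Toy usage: one 64-sample frame of a 1 kHz tone at an assumed 8 kHz rate.
rate, size = 8000, 64
tone = [math.sin(2 * math.pi * 1000 * t / rate) for t in range(size)]
energies = log_mel_energies(pre_emphasis(tone), mel_filterbank(8, size, rate))
```

The naive DFT keeps the sketch dependency-free; a real system would use an FFT and process overlapping windowed frames.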
FIG. 2 (Sheet 2 of 6): Plot against frequency f (Hz), horizontal axis from 0 to 4000 Hz.
FIG. 3 (Sheet 3 of 6): Plot against packet size (samples/packet), horizontal axis from 0 to 250, vertical axis ticks from 30 to 100 (axis label not recoverable), with curves for 20% packet loss and 40% packet loss.
FIG. 5 (Sheet 5 of 6): Plot against identification data length (sec), horizontal axis from 2 to 16, vertical axis ticks from 75 to 95 (axis label not recoverable), with curves for 8 samples/packet and 64 samples/packet.
FIG. 6 (Sheet 6 of 6): Plot against identification data length (sec), horizontal axis ticks from 2 to 16, vertical axis ticks from 82 to 100 (axis label not recoverable), with curves for 50% packet loss, 20% packet loss, and no loss.
known speakers, the training data comprising packet losses, and the test data comprising utterances from an unknown speaker, cause the computer to
extract one or more features from training data comprising utterances from known speakers, the training data comprising packet losses;
obtain at least one parameter set corresponding to the features of the training data of each known speaker;
extract one or more features from test data comprising utterances from an unknown speaker;
determine a probability for each parameter set that the features from the test data arise from that parameter set; and
identify the unknown speaker by determining which known speaker's parameter set maximizes the probability.

17. The computer readable material of claim 16 wherein the test data comprises packet losses.

18. The computer readable material of claim 17 wherein the computer estimates a packet loss rate of the test data and applies the test data packet loss rate to the training data.

19. The computer readable material of claim 16 wherein the computer generates a plurality of parameter sets corresponding to each known speaker, each such parameter set comprising a different packet loss rate.

20. The computer readable material of claim 16 wherein the parameter sets are stored on a computer-readable storage medium.
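The final steps of the claim above (score the test features against every parameter set, then pick the known speaker whose set maximizes the probability) might look like the sketch below. It assumes each parameter set is a diagonal-covariance Gaussian mixture, in the spirit of the Reynolds reference cited by the patent, and that each speaker has one set per packet loss rate as in claim 19. The claims do not prescribe this model; the function names, the `(speaker, loss_rate)` keying, and the toy speakers and numbers are hypothetical.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def log_likelihood(features, gmm):
    """Sum over frames of log sum_i w_i N(x; mean_i, var_i).

    gmm is a list of (weight, mean, var) components with positive weights.
    """
    total = 0.0
    for x in features:
        terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
        peak = max(terms)  # log-sum-exp for numerical stability
        total += peak + math.log(sum(math.exp(t - peak) for t in terms))
    return total


def identify_speaker(features, parameter_sets):
    """parameter_sets maps (speaker, loss_rate) -> gmm; returns the best speaker."""
    best_key = max(
        parameter_sets,
        key=lambda key: log_likelihood(features, parameter_sets[key]),
    )
    return best_key[0]


# Toy usage: two hypothetical speakers, each with models for two loss rates.
models = {
    ("alice", 0.0): [(1.0, [0.0, 0.0], [1.0, 1.0])],
    ("alice", 0.2): [(1.0, [0.5, 0.5], [1.0, 1.0])],
    ("bob", 0.0): [(1.0, [5.0, 5.0], [1.0, 1.0])],
    ("bob", 0.2): [(1.0, [5.5, 5.5], [1.0, 1.0])],
}
test_features = [[4.8, 5.1], [5.3, 5.2]]
speaker = identify_speaker(test_features, models)
```

Working in the log domain avoids underflow when many frames are scored, and taking the maximum over all of a speaker's sets means the best-matching loss rate is selected implicitly.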