This project is a text-independent speaker verification system that classifies Mel-Frequency Cepstral Coefficients (MFCCs) using a minimum-distance classifier and a Gaussian Mixture Model (GMM) Log-Likelihood Ratio (LLR) classifier. The system was implemented in MATLAB, with training and test data stored in WAV files. I developed custom matching and testing routines based upon minimum-distance classification, extracted feature vectors using the melcepst function from the Voicebox toolkit, and used an open source GMM library. For testing, I used 8 speakers (4 male, 4 female) from the popular TIMIT speaker database, each saying two phonetically diverse sentences. One sentence was used for training and the other for testing. Manually tuning the threshold for the minimum-distance classifier resulted in a classification accuracy of 91%. Cody developed this project independently as part of a Digital Signal Processing course. A presentation and report follow. The code is available on GitHub.
The presentation provides an overview of the theory behind MFCCs, minimum-distance classification, and log-likelihood ratio classification using Gaussian Mixture Models, and discusses the experimental results of my implementation.
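To make the minimum-distance idea concrete, here is a minimal sketch in Python rather than the project's MATLAB. It rests on assumptions not stated above: each speaker is modeled by the mean of their training feature vectors, a test utterance is summarized the same way, and the claim is accepted when the Euclidean distance between the two means falls at or below a threshold. The function names and the two-dimensional toy vectors are hypothetical stand-ins for real MFCC frames.

```python
import math

def mean_vector(frames):
    """Average a list of equal-length feature vectors (e.g. MFCC frames)."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / n for d in range(dim)]

def euclidean(a, b):
    """Euclidean distance between two vectors of the same length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def verify_min_distance(train_frames, test_frames, threshold):
    """Return (accepted, distance) for a claimed speaker.

    Assumption: the speaker model is the mean training vector, and the
    claim is accepted when the test mean lies within `threshold` of it.
    """
    model = mean_vector(train_frames)
    distance = euclidean(model, mean_vector(test_frames))
    return distance <= threshold, distance

# Toy 2-D "features" standing in for real MFCC frames (illustrative only).
enrolled = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2]]
genuine = [[1.1, 2.1], [0.9, 1.9]]
impostor = [[4.0, -1.0], [4.2, -0.8]]

print(verify_min_distance(enrolled, genuine, threshold=1.0))
print(verify_min_distance(enrolled, impostor, threshold=1.0))
```

The threshold trades false accepts against false rejects, which is why it has to be tuned against held-out utterances rather than fixed in advance.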
The report expands on the material in the presentation. In particular, it describes both the theory and the practical details of a reference implementation of a text-independent speaker verification system. It also discusses a more advanced classification technique based upon Gaussian Mixture Models (GMMs), and it closes with the results of a set of experiments performed using my reference implementation.
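The GMM-based scoring can be sketched in a similar way. The example below is an illustration under stated assumptions, not the report's implementation: it assumes diagonal-covariance GMMs whose parameters are already trained, and it assumes the log-likelihood ratio compares the claimed speaker's model against a separate background model, averaged over frames. All names and parameter values are hypothetical.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(x, gmm):
    """Log-likelihood of x under a GMM given as (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)  # log-sum-exp trick for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def average_llr(frames, speaker_gmm, background_gmm):
    """Mean per-frame log-likelihood ratio: speaker model vs. background."""
    total = sum(
        gmm_log_likelihood(x, speaker_gmm) - gmm_log_likelihood(x, background_gmm)
        for x in frames
    )
    return total / len(frames)

# Hypothetical pre-trained models over 2-D toy features.
speaker_gmm = [(0.5, [1.0, 2.0], [0.5, 0.5]), (0.5, [1.5, 2.5], [0.5, 0.5])]
background_gmm = [(0.5, [0.0, 0.0], [4.0, 4.0]), (0.5, [3.0, 3.0], [4.0, 4.0])]

genuine_frames = [[1.1, 2.1], [1.4, 2.4]]
impostor_frames = [[4.0, -1.0], [4.2, -0.8]]

# A positive score favors the claimed speaker, a negative one the background.
print(average_llr(genuine_frames, speaker_gmm, background_gmm))
print(average_llr(impostor_frames, speaker_gmm, background_gmm))
```

As with the distance classifier, a decision follows from comparing the score to a threshold; zero is a natural starting point, since it marks the point at which both models explain the frames equally well.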