

Text-Independent Speaker Verification

A text-independent speaker verification system based on classifying Mel-Frequency Cepstral Coefficients (MFCCs) with a minimum-distance classifier and a Gaussian Mixture Model (GMM) Log-Likelihood Ratio (LLR) classifier. I implemented the system in MATLAB, with training and test data stored in WAV files. I developed custom matching and testing routines based on minimum-distance classification, extracted feature vectors using the melcepst function from the Voicebox toolkit, and used an open-source GMM library. For testing, I used 8 speakers (4 male, 4 female) from the popular TIMIT speaker database, each saying two phonetically diverse sentences. One sentence was used for training and the other for testing. Manually tuning the threshold for the minimum-distance classifier resulted in 91% classification accuracy. I developed this project independently as part of a Digital Signal Processing course. A presentation and report follow. The code is available on GitHub.

The presentation provides an overview of the theory behind MFCCs, minimum-distance classification, and log-likelihood ratio classification using Gaussian Mixture Models, and discusses the experimental results of my implementation.
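To make the minimum-distance idea concrete, here is a minimal sketch in Python. It is not the MATLAB code from the project. It assumes one common variant: each utterance is summarized by the mean of its feature frames, and a claimed identity is accepted when the Euclidean distance between the training mean and the test mean falls at or below a hand-picked threshold. The function names, the toy 2-dimensional "features", and the threshold value are all hypothetical.

```python
import math

def mean_vector(frames):
    """Average a list of equal-length feature vectors (e.g. MFCC frames)."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / n for d in range(dim)]

def euclidean(a, b):
    """Euclidean distance between two vectors of the same length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def verify_min_distance(train_frames, test_frames, threshold):
    """Return (accepted, distance) for a claimed identity.

    Assumption: the speaker is modeled by a single mean vector, and the
    claim is accepted when the test mean lies within `threshold` of it.
    """
    distance = euclidean(mean_vector(train_frames), mean_vector(test_frames))
    return distance <= threshold, distance

# Toy 2-dimensional features standing in for real MFCC frames.
enrolled = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2]]
same_speaker = [[1.1, 2.1], [0.9, 1.9]]
impostor = [[4.0, -1.0], [4.2, -0.8]]

accepted_same, d_same = verify_min_distance(enrolled, same_speaker, threshold=0.5)
accepted_imp, d_imp = verify_min_distance(enrolled, impostor, threshold=0.5)
```

In practice the threshold trades false accepts against false rejects, which is why it has to be tuned on held-out recordings rather than fixed in advance.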


The report elaborates on the material in the presentation above. In particular, it describes both the theory and the practical details of a reference implementation of a text-independent speaker verification system. It also discusses a more advanced classification technique based on Gaussian Mixture Models (GMMs). Finally, it discusses the results of a set of experiments performed with my reference implementation.
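As a rough illustration of GMM log-likelihood ratio scoring, the Python sketch below compares how well a speaker model and a background model explain a set of test frames. It is not the project's MATLAB code. It assumes both models are already trained, use diagonal covariances, and are given as (weights, means, variances) tuples; all names and the toy parameters are hypothetical. A positive average LLR means the frames fit the claimed speaker better than the background model.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(x, weights, means, variances):
    """Log of the weighted sum of component densities (log-sum-exp for stability)."""
    terms = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def average_llr(frames, speaker_gmm, background_gmm):
    """Mean per-frame log-likelihood ratio: speaker model vs. background model."""
    total = sum(
        gmm_log_likelihood(f, *speaker_gmm) - gmm_log_likelihood(f, *background_gmm)
        for f in frames
    )
    return total / len(frames)

# Hypothetical pre-trained models: (weights, means, variances).
speaker_gmm = ([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]])
background_gmm = ([1.0], [[3.0, 3.0]], [[4.0, 4.0]])

llr_target = average_llr([[0.2, 0.1], [0.9, 1.1]], speaker_gmm, background_gmm)
llr_impostor = average_llr([[3.0, 3.0], [3.5, 2.5]], speaker_gmm, background_gmm)
```

A verification decision would then compare the average LLR against a threshold (zero is a natural starting point), accepting the claimed identity when the score is above it.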




21 Responses


  1. Saad says

    Hi, I really liked your speaker recognition code. Can you please tell me how many test sounds you used?

    Thanks

  2. Saad says

    OK, you used 8 speakers. Can you please send the test recordings to my email?

    Thanks in advance

    • codyaray says

      Thanks Saad. Unfortunately, I completed this project a while ago and don't have the data files anymore. If you have access to the TIMIT data set, though, you can find an ample supply of speaker recordings. Otherwise, it's easy enough to hook a mic up to your computer (if one isn't built in, as on a Mac) and record your own data set. Just enlist the help of a few friends or co-workers.

      PS – If you’re a student, feel free to use this code for reference, but I’d encourage you to develop your own solution still. :)

  3. Saad says

    Thanks, codyaray, for your kind advice. I'm working on it, but first I want to understand the different code that is available.
    Your code is giving me the following error:
    Attempted to access PC(2); index out of bounds because numel(PC)=1.

    Error in Probability_of_Cluster_given_X (line 2)
    PY = PC(Label_of_Cluster)*Mixing_Coefficient(X,Means(:,Label_of_Cluster),Variances(:,Label_of_Cluster));

    Error in GMM (line 41)
    Probability_of_Cluster_given_Point(i,j) = Probability_of_Cluster_given_X(Input(:,j),Mu,Variances,PC,i);

    Error in main (line 63)
    [training_idx1, training_mu1, training_sigma1] = GMM(training_features1', No_of_Clusters, No_of_Iterations);

    Any help, please?

    • codyaray says

      Try the timit.m file. It was the one I actually used. However, you have to have all the files present for it to work properly. If you only have a subset, comment out some of the code for interacting with the other files.

  4. codyaray says

    My guess is that the index-out-of-bounds error occurs because No_of_Clusters or No_of_Iterations is higher than it needs to be. But I haven't used this code in a couple of years, so I don't really recall the details off the top of my head.

    • Saad says

      I have reduced No_of_Clusters and No_of_Iterations, but now it is giving me a different error:

      Undefined function or variable 'background_mu'.

      Error in main (line 91)
      log(Cluster_Probability(testing_features1', training_mu1)) - log(Cluster_Probability(testing_features1', background_mu))

      • codyaray says

        As stated above, use timit.m instead of main.m.

        • Saad says

          No, both timit.m and main.m give me the same errors. Also, can you explain how I can make it run in real time?

          Thanks

          • codyaray says

            Sorry, I can't help you any further. Now I recall (and the presentation shows) that the GMM implementation wasn't quite complete. I no longer have access to MATLAB to help complete it. This code should give you a guideline, though, not solve your homework problems for you. You need to understand what's going on to really make use of it.

  5. Saad says

    Can you please tell me one more thing: what is 'background_mu'? Is it a variable you assigned somewhere, or is it a function?

  6. ritika says

    @saad could you finish the thing?

  7. ajay mishra says

    Hi, I really want your code for the speaker recognition technique. Can you please give it to me as soon as possible? Also, can you tell me how many test sounds you used?

    regards
    ajay mishra

    • codyaray says

      Both a link to the code and the number of test sounds are included in the article. Please read. Also note that the GMM implementation was incomplete, if that’s what you’re seeking. Good luck.

  8. rania says

    hi
    please send your code about this

  9. tony says

    hi
    Can we use an SVM for speaker recognition?

  10. Priya says

    hiii…

    Nowadays, phone log-likelihood ratios (PLLR) are used for speaker recognition. Can you please give some idea of how to implement them in MATLAB?

    • codyaray says

      Sorry – I haven’t touched this in a long time. If you figure it out, feel free to drop a note here with resources for others.

  11. Joseph says

    hello codyaray

    Is this speaker-independent or speaker-dependent?
    I'm asking because I'm searching for a speaker-independent system.

    thanks a lot

  12. codyaray says

    Joseph – As this is speaker *verification*, it only makes sense in a speaker-dependent context. That is, the objective is to verify whether a test sample corresponds to a known speaker.

    -Cody


