Features

Copyright 2014-2015 Anthony Larcher and Sylvain Meignier

frontend provides methods to process an audio signal in order to extract useful parameters for speaker verification.

frontend.features.compute_delta(features, win=3, method='filter', filt=array([ 0.25, 0.5, 0.25, 0., -0.25, -0.5, -0.25]))

features is a 2D ndarray; each row of features is a frame.

Parameters:
  • features – the feature frames from which to compute the delta coefficients
  • win – sets the length of the computation window. The size of the window is (win × 2) + 1 frames
  • method – method used to compute the delta coefficients: “diff” or “filter”
  • filt – coefficients of the filter used in “filter” mode. The default is similar to the SPRO4 filter, filt=np.array([.2, .1, 0, -.1, -.2])
Returns:

the delta coefficients computed on the original features.
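
The following is a minimal sketch of the two delta modes, not the module's own code. The name compute_delta_sketch is hypothetical, and repeating the first and last frames win times to handle the borders is an assumption. In “filter” mode the filter must have (win × 2) + 1 taps, which the default 7-tap filter satisfies for win=3; in “diff” mode the delta is simply x[t + win] − x[t − win].

```python
import numpy as np

def compute_delta_sketch(features, win=3, method="filter",
                         filt=np.array([.25, .5, .25, 0., -.25, -.5, -.25])):
    """Hypothetical sketch: delta coefficients of a (n_frames, n_dims) array."""
    if method == "diff":
        # Plain difference x[t + win] - x[t - win]
        filt = np.zeros(2 * win + 1)
        filt[0], filt[-1] = 1.0, -1.0
    if len(filt) != 2 * win + 1:
        raise ValueError("filt must contain 2 * win + 1 taps")
    # Assumption: repeat the first and last frames win times to limit border effects
    padded = np.pad(features, ((win, win), (0, 0)), mode="edge")
    # np.convolve flips filt, so filt[0] weights the most "future" frame;
    # "valid" mode returns exactly one value per original frame
    return np.column_stack([np.convolve(padded[:, d], filt, mode="valid")
                            for d in range(features.shape[1])])
```

With the default filter, a coefficient that rises by 1 per frame gets a delta of 4 away from the borders, because the filter taps are not normalised; a constant coefficient gets a delta of 0 since the taps sum to zero.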

frontend.features.hz2mel(f)

Convert an array of frequencies in Hz into mel.

Parameters: f – frequency to convert, in Hz
Returns: the equivalent value on the mel scale.
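
The documentation does not state which mel formula is used. As a sketch, assuming the widely used form m = 2595 log10(1 + f / 700), the conversion is a one-liner; with this formula 1000 Hz maps to roughly 1000 mel.

```python
import numpy as np

def hz2mel(f):
    # Assumed formula m = 2595 * log10(1 + f / 700); the module's constants may differ
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

print(hz2mel([0.0, 1000.0, 8000.0]))
```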
frontend.features.mel2hz(m)

Convert an array of mel values into Hz.

Parameters: m – ndarray of mel values to convert into Hz.
Returns: the equivalent values in Hertz.
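
Under the same assumption (m = 2595 log10(1 + f / 700), not confirmed by the documentation), the inverse conversion is obtained by solving for f:

```python
import numpy as np

def mel2hz(m):
    # Inverse of the assumed formula m = 2595 * log10(1 + f / 700)
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

print(mel2hz([0.0, 1000.0, 2840.0]))
```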
frontend.features.mfcc(input, lowfreq=100, maxfreq=8000, nlinfilt=0, nlogfilt=24, nwin=256, nfft=512, fs=16000, nceps=13, midfreq=1000, shift=0.01, get_spec=False, get_mspec=False)

Compute Mel Frequency Cepstral Coefficients.

Parameters:
  • input – input signal from which the coefficients are computed. Input audio is assumed to be raw 16-bit PCM.
  • lowfreq – lower limit of the filtered frequency band. Default is 100 Hz.
  • maxfreq – upper limit of the filtered frequency band. Default is 8000 Hz.
  • nlinfilt – number of linear filters to use in low frequencies. Default is 0.
  • nlogfilt – number of log-linear filters to use in high frequencies. Default is 24.
  • nwin – length of the sliding window. Default is 256.
  • nfft – number of points for the Fourier transform. Default is 512.
  • fs – sampling frequency of the original signal. Default is 16000 Hz.
  • nceps – number of cepstral coefficients to extract. Default is 13.
  • midfreq – frequency boundary between linear and log-linear filters. Default is 1000 Hz.
  • shift – time shift between two consecutive analysis frames, in seconds. Default is 0.01 (10 ms).
Returns:

the cepstral coefficients in an ndarray, as well as the log-spectrum in the mel domain in an ndarray.

Note

MFCCs are computed as follows:

  • Pre-processing in the time domain (pre-emphasis)
  • Compute the spectrum amplitude by windowing with a Hamming window
  • Filter the signal in the spectral domain with a triangular filter bank whose filters are approximately linearly spaced on the mel scale and have equal bandwidth on the mel scale
  • Compute the DCT of the log-spectrum
  • Log-energy is returned as the first coefficient of the feature vector.

For more details, refer to [Davis80].
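
The steps in the note can be sketched as follows. This is an illustration, not the module's implementation: mfcc_sketch is a hypothetical name, it takes a precomputed filter-bank matrix fbank of shape (nfilt, nfft // 2 + 1) (for instance one built by trfbank) instead of the lowfreq/maxfreq/nlinfilt/nlogfilt arguments, and the pre-emphasis factor 0.97, the unnormalised DCT-II, the 1e-10 log floor and overwriting c0 with the log-energy are all assumptions.

```python
import numpy as np

def mfcc_sketch(signal, fbank, fs=16000, nwin=256, nfft=512, shift=0.01,
                nceps=13, prefac=0.97):
    """Hypothetical MFCC sketch; fbank has shape (nfilt, nfft // 2 + 1)."""
    # 1. Pre-emphasis y[n] = x[n] - prefac * x[n-1] (prefac=0.97 is an assumption)
    x = np.asarray(signal, dtype=float)
    x = np.append(x[0], x[1:] - prefac * x[:-1])
    # Slice into frames of nwin samples, one frame every shift seconds
    step = int(round(shift * fs))
    nframes = 1 + (len(x) - nwin) // step
    idx = np.arange(nwin)[None, :] + step * np.arange(nframes)[:, None]
    frames = x[idx]
    log_energy = np.log(np.sum(frames ** 2, axis=1) + 1e-10)
    # 2. Magnitude spectrum of each Hamming-windowed frame
    spec = np.abs(np.fft.rfft(frames * np.hamming(nwin), n=nfft))
    # 3. Log of the filter-bank outputs: log-spectrum in the mel domain
    mspec = np.log(spec @ fbank.T + 1e-10)
    # 4. DCT-II of the log mel spectrum, keeping nceps coefficients
    nfilt = fbank.shape[0]
    k = np.arange(nceps)[:, None]
    n = np.arange(nfilt)[None, :]
    ceps = mspec @ np.cos(np.pi * k * (n + 0.5) / nfilt).T
    # 5. Store the log-energy as the first coefficient (overwrites c0)
    ceps[:, 0] = log_energy
    return ceps, mspec
```

With the default settings, one second of 16 kHz audio gives 99 frames (160-sample shift, 256-sample window), so ceps has shape (99, 13) and mspec has shape (99, nfilt).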

frontend.features.trfbank(fs, nfft, lowfreq, maxfreq, nlinfilt, nlogfilt, midfreq=1000)

Compute triangular filterbank for cepstral coefficient computation.

Parameters:
  • fs – sampling frequency of the original signal.
  • nfft – number of points for the Fourier Transform
  • lowfreq – lower limit of the filtered frequency band
  • maxfreq – upper limit of the filtered frequency band
  • nlinfilt – number of linear filters to use in low frequencies
  • nlogfilt – number of log-linear filters to use in high frequencies
  • midfreq – frequency boundary between linear and log-linear filters
Returns:

the filter bank and the central frequencies of each filter
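
A minimal sketch of such a filter bank, under assumptions the documentation does not confirm: nlinfilt filters evenly spaced in Hz between lowfreq and midfreq, nlogfilt filters evenly spaced on the mel scale (assumed formula m = 2595 log10(1 + f / 700)) between midfreq and maxfreq (or from lowfreq when nlinfilt is 0), and unit-peak triangles. trfbank_sketch and its helpers are hypothetical names. Each filter i spans three consecutive edge frequencies, so nfilt filters need nfilt + 2 edges.

```python
import numpy as np

def hz2mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel2hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def trfbank_sketch(fs, nfft, lowfreq, maxfreq, nlinfilt, nlogfilt, midfreq=1000):
    """Hypothetical triangular filter bank; returns (fbank, centre frequencies)."""
    if nlinfilt > 0:
        # Linearly spaced edges on [lowfreq, midfreq)
        lin_edges = lowfreq + (midfreq - lowfreq) / nlinfilt * np.arange(nlinfilt)
        log_start = midfreq
    else:
        lin_edges = np.array([])
        log_start = lowfreq
    # Mel-spaced edges from log_start to maxfreq
    log_edges = mel2hz(np.linspace(hz2mel(log_start), hz2mel(maxfreq), nlogfilt + 2))
    edges = np.concatenate([lin_edges, log_edges])  # nfilt + 2 edges
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)        # frequency of each FFT bin
    nfilt = nlinfilt + nlogfilt
    fbank = np.zeros((nfilt, freqs.size))
    for i in range(nfilt):
        left, centre, right = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - left) / (centre - left)
        falling = (right - freqs) / (right - centre)
        # Triangle with peak 1 at the centre frequency, zero outside [left, right]
        fbank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fbank, edges[1:-1]
```

For example, trfbank_sketch(16000, 512, 100, 8000, 0, 24) returns a (24, 257) matrix; multiplying a magnitude spectrum of 257 bins by its transpose gives the 24 filter outputs.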
