Features
Copyright 2014-2015 Anthony Larcher and Sylvain Meignier
frontend provides methods to process an audio signal in order to extract
useful parameters for speaker verification.
-
frontend.features.compute_delta(features, win=3, method='filter', filt=array([ 0.25, 0.5, 0.25, 0., -0.25, -0.5, -0.25]))
features is a 2D ndarray; each row of features is one frame.
Parameters: |
- features – the feature frames to compute the delta coefficients
- win – parameter that sets the length of the computation window.
The size of the window is (win x 2) + 1
- method – method used to compute the delta coefficients;
can be diff or filter
- filt – definition of the filter to use in “filter” mode. The default is
described as similar to SPRO4: filt=np.array([.2, .1, 0, -.1, -.2]), although the
signature above lists a 7-tap array as the default value.
|
Returns: | the delta coefficients computed on the original features.
|
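Example: the “filter” mode can be pictured as running a short FIR filter along the time axis of every feature column. The sketch below is illustrative only and is not the library's implementation: the name delta_sketch, the edge padding, and the choice of the 5-tap SPRO4-like filter quoted above are assumptions made for this example. The filter length plays the role of (win x 2) + 1.

```python
import numpy as np

def delta_sketch(features, filt=np.array([0.2, 0.1, 0.0, -0.1, -0.2])):
    """Illustrative delta computation (hypothetical helper, not the library code)."""
    features = np.asarray(features, dtype=float)
    half = len(filt) // 2
    # Assumption: repeat the first and last frames so every frame has a full window.
    padded = np.pad(features, ((half, half), (0, 0)), mode="edge")
    delta = np.empty_like(features)
    for j in range(features.shape[1]):
        # np.convolve flips the kernel, so filt[0] weights the frame furthest ahead.
        delta[:, j] = np.convolve(padded[:, j], filt, mode="valid")
    return delta

# One row per frame, one column per coefficient.
frames = np.arange(10.0).reshape(-1, 1) * np.array([[1.0, 2.0]])
deltas = delta_sketch(frames)
```

With this filter, a feature that grows by a constant amount per frame yields that amount as its delta on interior frames; the first and last frames are affected by the padding.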
-
frontend.features.hz2mel(f)
Convert an array of frequencies in Hz to mel.
Parameters: | f – array of frequencies in Hz to convert |
Returns: | the equivalent values on the mel scale. |
-
frontend.features.mel2hz(m)
Convert an array of mel values to Hz.
Parameters: | m – ndarray of mel values to convert to Hz. |
Returns: | the equivalent values in Hertz. |
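Example: the two conversions hz2mel and mel2hz are inverses of each other. The page does not state which mel formula is used, so the sketch below assumes the widely used mel = 2595 * log10(1 + f / 700); the function names hz2mel_sketch and mel2hz_sketch are hypothetical.

```python
import numpy as np

def hz2mel_sketch(f):
    # Assumed formula: mel = 2595 * log10(1 + f / 700)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel2hz_sketch(m):
    # Inverse of the assumed formula above.
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

freqs = np.array([0.0, 100.0, 1000.0, 8000.0])
mels = hz2mel_sketch(freqs)
recovered = mel2hz_sketch(mels)  # matches freqs up to rounding error
```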
-
frontend.features.mfcc(input, lowfreq=100, maxfreq=8000, nlinfilt=0, nlogfilt=24, nwin=256, nfft=512, fs=16000, nceps=13, midfreq=1000, shift=0.01, get_spec=False, get_mspec=False)
Compute Mel Frequency Cepstral Coefficients.
Parameters: |
- input – input signal from which the coefficients are computed.
Input audio is expected to be raw 16-bit PCM.
- lowfreq – lower limit of the frequency band filtered.
Default is 100Hz.
- maxfreq – higher limit of the frequency band filtered.
Default is 8000Hz.
- nlinfilt – number of linear filters to use in low frequencies.
Default is 0.
- nlogfilt – number of log-linear filters to use in high frequencies.
Default is 24.
- nwin – length of the sliding window.
Default is 256.
- nfft – number of points for the Fourier Transform. Default is 512.
- fs – sampling frequency of the original signal. Default is 16000Hz.
- nceps – number of cepstral coefficients to extract.
Default is 13.
- midfreq – frequency boundary between linear and log-linear filters.
Default is 1000Hz.
- shift – shift between two analyses. Default is 0.01 (10ms).
|
Returns: | the cepstral coefficients in an ndarray, as well as
the log-spectrum in the mel domain in an ndarray.
|
Note
MFCC are computed as follows:
- Pre-processing in the time domain (pre-emphasis)
- Compute the amplitude spectrum after applying a Hamming window
- Filter the signal in the spectral domain with a triangular filter bank, whose filters are approximately linearly spaced on the mel scale and have equal bandwidth on the mel scale
- Compute the DCT of the log-spectrum
- Log-energy is returned as the first coefficient of the feature vector.
For more details, refer to [Davis80].
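Example: the steps listed in the note can be sketched with NumPy alone. This is an illustration under stated assumptions, not the library's code: the name mfcc_sketch, the pre-emphasis coefficient prefac=0.97, the mel formula, the small constant added before each logarithm, and the choice to return the log-energy followed by nceps cepstral coefficients are all assumptions. Only mel-spaced filters are built (the nlinfilt=0 case).

```python
import numpy as np

def _hz2mel(f):
    # Assumed mel formula: 2595 * log10(1 + f / 700)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def _mel2hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mfcc_sketch(signal, fs=16000, nwin=256, nfft=512, shift=0.01,
                nfilt=24, nceps=13, lowfreq=100.0, maxfreq=8000.0,
                prefac=0.97):
    """Hypothetical MFCC pipeline following the steps in the note."""
    signal = np.asarray(signal, dtype=float)
    # 1. Pre-emphasis in the time domain (prefac is an assumed coefficient).
    emphasized = np.append(signal[0], signal[1:] - prefac * signal[:-1])
    # 2. Overlapping frames, Hamming window, amplitude spectrum.
    hop = int(round(shift * fs))
    nframes = 1 + (len(emphasized) - nwin) // hop
    idx = np.arange(nwin)[None, :] + hop * np.arange(nframes)[:, None]
    frames = emphasized[idx] * np.hamming(nwin)
    log_energy = np.log(np.sum(frames ** 2, axis=1) + 1e-10)
    spectrum = np.abs(np.fft.rfft(frames, n=nfft, axis=1))
    # 3. Triangular filters whose edges are evenly spaced on the mel scale.
    edges = _mel2hz(np.linspace(_hz2mel(lowfreq), _hz2mel(maxfreq), nfilt + 2))
    bins = np.arange(nfft // 2 + 1) * fs / nfft
    rising = (bins[None, :] - edges[:-2, None]) / (edges[1:-1] - edges[:-2])[:, None]
    falling = (edges[2:, None] - bins[None, :]) / (edges[2:] - edges[1:-1])[:, None]
    fbank = np.maximum(0.0, np.minimum(rising, falling))
    mel_spec = np.log(spectrum @ fbank.T + 1e-10)
    # 4. DCT-II of the log mel spectrum, keeping coefficients 1..nceps.
    k = np.arange(1, nceps + 1)[:, None]
    n = np.arange(nfilt)[None, :]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2.0 * nfilt))
    ceps = mel_spec @ basis.T
    # 5. Log-energy goes first in the feature vector.
    return np.hstack([log_energy[:, None], ceps]), mel_spec

rng = np.random.default_rng(0)
features, mel_spec = mfcc_sketch(rng.standard_normal(16000))  # 1 s at 16 kHz
```

Because the DCT coefficients of order 1 and above ignore a constant offset in the log mel spectrum, scaling the input signal changes only the log-energy column in this sketch.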
-
frontend.features.trfbank(fs, nfft, lowfreq, maxfreq, nlinfilt, nlogfilt, midfreq=1000)
Compute triangular filterbank for cepstral coefficient computation.
Parameters: |
- fs – sampling frequency of the original signal.
- nfft – number of points for the Fourier Transform
- lowfreq – lower limit of the frequency band filtered
- maxfreq – higher limit of the frequency band filtered
- nlinfilt – number of linear filters to use in low frequencies
- nlogfilt – number of log-linear filters to use in high frequencies
- midfreq – frequency boundary between linear and log-linear filters
|
Returns: | the filter bank and the central frequencies of each filter
|
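Example: a triangular filter bank can be built by placing filter edges on a frequency grid and letting each filter rise from its left edge to its centre and fall to its right edge. The sketch below is a simplified illustration, not the library's trfbank: it builds only mel-spaced filters (no linear section, so nlinfilt and midfreq are not modelled), and the name mel_filterbank_sketch and the mel formula are assumptions.

```python
import numpy as np

def _hz2mel(f):
    # Assumed mel formula: 2595 * log10(1 + f / 700)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def _mel2hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank_sketch(fs, nfft, lowfreq, maxfreq, nfilt):
    """Hypothetical helper: triangular filters with mel-spaced centres."""
    # nfilt + 2 edge frequencies: each filter spans three consecutive edges.
    edges = _mel2hz(np.linspace(_hz2mel(lowfreq), _hz2mel(maxfreq), nfilt + 2))
    # Centre frequency of every FFT bin from 0 to fs / 2.
    bin_freqs = np.arange(nfft // 2 + 1) * fs / nfft
    fbank = np.zeros((nfilt, bin_freqs.size))
    for i in range(nfilt):
        left, centre, right = edges[i], edges[i + 1], edges[i + 2]
        rising = (bin_freqs - left) / (centre - left)
        falling = (right - bin_freqs) / (right - centre)
        # Triangle: the smaller of the two slopes, clipped at zero.
        fbank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fbank, edges[1:-1]

fbank, centres = mel_filterbank_sketch(fs=16000, nfft=512, lowfreq=100.0,
                                       maxfreq=8000.0, nfilt=24)
```

Each row of fbank holds the weights of one filter over the nfft // 2 + 1 spectrum bins, so applying the bank to an amplitude spectrum is a single matrix product.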