Copyright 2014-2015 Anthony Larcher
frontend provides methods to process an audio signal in order to extract useful parameters for speaker verification.
Convert mu-law PCM values P to linear values X, with an optional scale factor S. Usage: lin = pcmu2lin(pcmu), where pcmu contains a vector of mu-law values in the range 0 to 255. No checking is performed to see that numbers are in this range.
Output values are divided by the scale factor s:
s | Output range |
---|---|
1 | +-8031 (integer values) |
4004.2 | +-2.005649 (default) |
8031 | +-1 |
8159 | +-0.9843118 (+-1 nominal full scale) |
The default scaling factor 4004.189931 is equal to sqrt((2207^2 + 5215^2)/2); this follows ITU standard G.711. The sine wave with PCM-Mu values [158 139 139 158 30 11 11 30] has a mean square value of unity, corresponding to 0 dBm0.
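The decoding and scaling described above can be sketched as follows. This is a minimal illustration, not the module's own code: the name pcmu2lin_sketch is hypothetical, and the arithmetic assumes the standard G.711 mu-law layout (inverted bits, top bit as sign, 3-bit segment, 4-bit mantissa).

```python
import numpy as np

def pcmu2lin_sketch(pcmu, s=4004.189931):
    """Decode mu-law codes (0..255) to linear values divided by scale factor s.

    Hypothetical helper for illustration; no range checking is done.
    """
    p = np.asarray(pcmu, dtype=np.float64)
    m = 15 - np.mod(p, 16)              # 4-bit mantissa (stored inverted)
    q = np.floor(p / 128)               # top bit: 1 -> positive, 0 -> negative
    e = (127 - p - m + 128 * q) / 16    # segment number, an integer in 0..7
    x = (m + 16.5) * 2.0 ** e - 16.5    # unsigned magnitude before scaling
    return (q - 0.5) * x * (4.0 / s)    # apply sign and divide by s

# The 0 dBm0 reference sine wave quoted in the text.
sine = [158, 139, 139, 158, 30, 11, 11, 30]
print(pcmu2lin_sketch(sine, s=1))          # integer-valued output
print(np.mean(pcmu2lin_sketch(sine) ** 2)) # mean square with the default s
```

With s=1 the reference sine decodes to values of magnitude 2207 and 5215, which is where the terms in the default scaling factor come from.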
Read a 1 or 2-channel audio file in SPHERE, WAVE or RAW PCM format. The format is determined from the file extension.
Parameters: | inputFileName – name of the file to read from |
---|---|
Returns: | the signal as a numpy array and the sampling frequency |
Read a sequence of features in HTK format
Parameters: | inputFileName – name of the file to read from |
---|---|
Returns: | a tuple (d, fp, dt, tc, t) described below |
Note
d = data: column vector for waveforms, 1 row per frame for other types
fp = frame period in seconds
dt = data type (also includes Voicebox code for generating data)
- WAVEFORM Acoustic waveform
- LPC Linear prediction coefficients
- LPREFC LPC Reflection coefficients: -lpcar2rf([1 LPC]);LPREFC(1)=[];
- LPCEPSTRA LPC Cepstral coefficients
- LPDELCEP LPC cepstral+delta coefficients (obsolete)
- IREFC LPC Reflection coefficients (16 bit fixed point)
- MFCC Mel frequency cepstral coefficients
- FBANK Log Filter bank energies
- MELSPEC linear Mel-scaled spectrum
- USER User defined features
- DISCRETE Vector quantised codebook
- PLP Perceptual Linear prediction
- ANON
one or more of the following modifiers
t = text version of type code e.g. LPC_C_K
This function is a translation of the MATLAB code from VOICEBOX, a MATLAB toolbox for speech processing by Mike Brookes. Home page: VOICEBOX <http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html>
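To make the relation between the frame period, the data type and the text type code concrete, here is a small sketch that decodes an HTK header. It assumes the standard HTK layout (a 12-byte big-endian header holding the frame count, the sample period in 100 ns units, the bytes per frame and the parameter kind) and the usual modifier letters; the function name parse_htk_header is hypothetical and is not part of this module.

```python
import struct

# Base kinds in type-code order, matching the list above.
HTK_KINDS = ["WAVEFORM", "LPC", "LPREFC", "LPCEPSTRA", "LPDELCEP", "IREFC",
             "MFCC", "FBANK", "MELSPEC", "USER", "DISCRETE", "PLP", "ANON"]
# One letter per modifier bit, starting at bit 6 (assumed HTK convention).
HTK_MODIFIERS = "ENDACZK0VT"

def parse_htk_header(header):
    """Decode a 12-byte HTK header into (n_frames, fp, frame_bytes, tc, t)."""
    n_frames, period, frame_bytes, tc = struct.unpack(">iihh", header[:12])
    tc &= 0xFFFF                       # treat the parameter kind as unsigned
    base = tc & 0x3F                   # low 6 bits select the base kind
    name = HTK_KINDS[base] if base < len(HTK_KINDS) else "???"
    for bit, letter in enumerate(HTK_MODIFIERS):
        if tc & (1 << (bit + 6)):      # append each modifier that is set
            name += "_" + letter
    return n_frames, period * 1e-7, frame_bytes, tc, name

# Example: 100 frames, 10 ms frame period, LPC with the C and K modifiers.
example = struct.pack(">iihh", 100, 100000, 48, 1 | 0o2000 | 0o10000)
print(parse_htk_header(example))
```

The last element of the result is the text version of the type code, LPC_C_K for this example header.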
Read label file in ALIZE format
Parameters: |
|
---|---|
Returns: | a logical array |
Read a signal from a single-channel 16-bit PCM file
Parameters: | inputFileName – name of the PCM file to read. |
---|---|
Returns: | the audio signal read from the file as an ndarray. |
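Reading headerless 16-bit PCM can be sketched with numpy as below. This is an illustration under stated assumptions rather than the module's implementation: the name read_pcm16_sketch, the little-endian default and the conversion to float32 are all choices made for this example.

```python
import numpy as np

def read_pcm16_sketch(path, byte_order="<"):
    """Read headerless signed 16-bit PCM samples into a float32 ndarray.

    byte_order is "<" for little endian (assumed default) or ">" for big endian.
    """
    data = np.fromfile(path, dtype=np.dtype(byte_order + "i2"))
    return data.astype(np.float32)
```

A raw PCM file carries no sampling frequency or channel count, so the caller has to know both in advance.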
Read a SPHERE audio file
Parameters: |
|
---|
Note
Scaling:
- ‘s’ Auto scale to make data peak = +-1 (use with caution if reading in chunks)
- ‘r’ Raw unscaled data (integer values)
- ‘p’ Scaled to make +-1 equal full scale
- ‘o’ Scale to bin centre rather than bin edge (e.g. 127 rather than 127.5 for 8 bit values, can be combined with n+p,r,s modes)
- ‘n’ Scale to negative peak rather than positive peak (e.g. 128.5 rather than 127.5 for 8 bit values, can be combined with o+p,r,s modes)
Format
- ‘l’ Little endian data (Intel,DEC) (overrides indication in file)
- ‘b’ Big endian data (non Intel/DEC) (overrides indication in file)
File I/O
- ‘f’ Do not close file on exit
- ‘d’ Look in data directory: voicebox(‘dir_data’)
- ‘w’ Also read the annotation file *.wrd if present (as in TIMIT)
- ‘t’ Also read the phonetic transcription file *.phn if present (as in TIMIT)
- NMAX maximum number of samples to read (or -1 for unlimited [default])
- NSKIP number of samples to skip from start of file (or -1 to continue from previous read when FFX is given instead of FILENAME [default])
Returns: | a tuple (Y, FS) |
---|
Note
Y data matrix of dimension (samples,channels)
FS sample frequency in Hz
WRD{*,2} cell array with word annotations: WRD{*,:}={[t_start t_end],’text’}, where times are in seconds; only present if the ‘w’ option is given
PHN{*,2} cell array with phoneme annotations: PHN{*,:}={[t_start t_end],’phoneme’}, where times are in seconds; only present if the ‘t’ option is given
FFX Cell array containing
- filename
- header information
- first header field name
- first header field value
- format string (e.g. NIST_1A)
- file id
- current position in file
- dataoff byte offset in file to start of data
- order byte order (l or b)
- nsamp number of samples
- number of channels
- nbytes bytes per data value
- bits number of bits of precision
- fs sample frequency
- min value
- max value
- coding 0=PCM,1=uLAW + 0=no compression,10=shorten,20=wavpack,30=shortpack
- file not yet decompressed
- temporary filename
If no output parameters are specified, header information will be printed. The code to decode shorten-encoded files is not yet released with this toolkit.
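The header fields listed above (format string, number of channels, sample frequency, byte order and so on) come from the text header at the start of a SPHERE file. The sketch below parses such a header, assuming the usual NIST layout: a NIST_1A line, a header-size line, then "name -type value" lines terminated by end_head. The function name parse_sphere_header is hypothetical.

```python
def parse_sphere_header(raw):
    """Parse the text header of a NIST SPHERE file from its leading bytes."""
    lines = raw.decode("ascii", errors="replace").split("\n")
    if lines[0].strip() != "NIST_1A":
        raise ValueError("not a SPHERE file")
    header = {"header_size": int(lines[1].strip())}
    for line in lines[2:]:
        line = line.strip()
        if line == "end_head":
            break                       # end of the header fields
        if not line or line.startswith(";"):
            continue                    # skip blank lines and comments
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue                    # ignore malformed lines
        key, kind, value = parts
        if kind == "-i":
            header[key] = int(value)    # integer field
        elif kind == "-r":
            header[key] = float(value)  # real-valued field
        else:
            header[key] = value         # string field, e.g. -s3
    return header

# Small hand-written header used only as an example.
example = (b"NIST_1A\n   1024\nchannel_count -i 1\nsample_rate -i 16000\n"
           b"sample_coding -s3 pcm\nend_head\n")
print(parse_sphere_header(example))
```

The audio data starts at the byte offset given by the header size, which corresponds to the dataoff entry above.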
Read a feature stream in SPRO4 format
Parameters: |
|
---|---|
Returns: | a sequence of features as an ndarray |
Read signal from a wave file
Parameters: | inputFileName – name of the wave file to read. |
---|---|
Returns: | the audio signal read from the file as an ndarray. |
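As an illustration of reading a wave file into an ndarray, the sketch below uses the standard-library wave module. It is not the module's own reader: the name read_wav_sketch, the 16-bit-only restriction and the choice to also return the sampling frequency are assumptions made for this example.

```python
import wave
import numpy as np

def read_wav_sketch(path):
    """Read a 16-bit PCM WAVE file; return a (samples, channels) array and fs."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit samples")
        fs = wf.getframerate()
        n_channels = wf.getnchannels()
        frames = wf.readframes(wf.getnframes())
    # WAVE stores samples little endian, interleaved by channel.
    sig = np.frombuffer(frames, dtype="<i2").reshape(-1, n_channels)
    return sig, fs
```

For a single-channel file the result has shape (samples, 1); calling sig[:, 0] gives a flat signal.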
Save labels in ALIZE format
Parameters: |
|
---|
Write feature file in HTK format
Parameters: |
|
---|