A class for acoustic feature management. FeaturesServer should be used to extract acoustic features (MFCC or LFCC) from audio files in SPHERE, WAV or RAW PCM format. It can also be used to read and write acoustic features from and to disk in SPRO4 or HTK format.
| Attribute | Description |
|---|---|
| `input_dir` | directory from which audio or feature files are loaded |
| `input_file_extension` | extension of the incoming files |
| `label_dir` | directory where label files are read and written |
| `label_files_extension` | extension of the label files to read and write |
| `from_file` | format of the input files: `audio`, `spro4` or `htk`; for audio files, the exact format is given by the file extension |
| `config` | predefined configuration for speaker diarization or speaker recognition at 8 or 16 kHz; default is speaker recognition at 8 kHz |
| `single_channel_extension` | list containing a single extension to append to the audio filename when processing a single-channel file; default is empty, meaning the feature file has the same name as the audio file |
| `double_channel_extension` | list of two extensions to append to the audio filename when processing two-channel files; default is `['_a', '_b']` |
| `sampling_frequency` | sampling frequency in Hz; default is `None`, in which case it is determined when the audio file is read |
| `lower_frequency` | lower frequency limit of the filter bank |
| `higher_frequency` | upper frequency limit of the filter bank |
| `linear_filters` | number of linear filters to use for LFCC extraction |
| `log_filters` | number of log-spaced (mel-scale) filters to use for MFCC extraction |
| `window_size` | length of the sliding analysis window, in seconds |
| `shift` | time shift between two consecutive feature vectors |
| `ceps_number` | number of cepstral coefficients to extract |
| `snr` | SNR level used by SNR-based voice activity detection |
| `vad` | type of voice activity detection: `'snr'`, `'energy'` (a three-Gaussian energy detector), or `'label'` to read the speech/non-speech information from precomputed label files |
| `feat_norm` | normalization applied to the acoustic features: `'cms'` for cepstral mean subtraction, `'mvn'` for mean-variance normalization, or `'stg'` for short-term Gaussianization |
| `log_e` | boolean; keep the log energy |
| `delta` | boolean; append the first derivative of the cepstral coefficients |
| `double_delta` | boolean; append the second derivative of the cepstral coefficients |
| `rasta` | boolean; apply RASTA filtering |
| `keep_all_features` | boolean; if `False`, only the frames labeled as speech by the VAD are saved; if `True`, all frames are saved and a label file is produced |
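The `feat_norm`, `delta` and `double_delta` options correspond to standard post-processing steps on a `(n_frames, n_coefficients)` feature array. The sketch below shows one common way to compute cepstral mean subtraction, mean-variance normalization and regression-based deltas with NumPy. The function names are hypothetical, the ±2-frame regression window is an assumption, and FeaturesServer's own implementation (including the order in which it applies these steps) may differ.

```python
import numpy as np


def cms(feat: np.ndarray) -> np.ndarray:
    """Cepstral mean subtraction: remove each dimension's mean over all frames."""
    return feat - feat.mean(axis=0)


def mvn(feat: np.ndarray, eps: float = 1e-10) -> np.ndarray:
    """Mean-variance normalization: zero mean and unit variance per dimension."""
    std = feat.std(axis=0)
    # eps guards against division by zero for constant dimensions
    return (feat - feat.mean(axis=0)) / np.maximum(std, eps)


def compute_delta(feat: np.ndarray, win: int = 2) -> np.ndarray:
    """First derivative by linear regression over +/- win frames.

    Edge frames are handled by repeating the first and last frames.
    """
    n = len(feat)
    padded = np.pad(feat, ((win, win), (0, 0)), mode="edge")
    # Original frame t sits at padded index t + win, so these slices
    # give frames t + k and t - k for every t at once.
    num = sum(
        k * (padded[win + k : win + k + n] - padded[win - k : win - k + n])
        for k in range(1, win + 1)
    )
    return num / (2 * sum(k * k for k in range(1, win + 1)))
```

Stacking `feat`, `compute_delta(feat)` and `compute_delta(compute_delta(feat))` with `np.hstack` gives the usual static + delta + double-delta layout.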
Load a cepstral feature array (cep) from an audio file or a feature (e.g. MFCC) file. This method loads all channels available in the file.

| Field | Description |
|---|---|
| Parameters | `show` – the name of the show to load |
| Returns | the cep array and the label array |
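Loading all channels of a multichannel audio file amounts to de-interleaving its samples. As a minimal sketch, assuming a 16-bit PCM WAV file and using only the standard-library `wave` module plus NumPy (the helper `read_wav_channels` is hypothetical and not part of FeaturesServer):

```python
import wave

import numpy as np


def read_wav_channels(path: str) -> np.ndarray:
    """Return a (n_channels, n_samples) float array from a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        n_channels = wf.getnchannels()
        raw = wf.readframes(wf.getnframes())
    # WAV samples are little-endian and interleaved: c0, c1, c0, c1, ...
    samples = np.frombuffer(raw, dtype="<i2").astype(np.float64)
    # One row per frame, one column per channel; transpose to one row per channel
    return samples.reshape(-1, n_channels).T
```

Each row can then be processed separately; for a two-channel file, the `double_channel_extension` suffixes (`'_a'` and `'_b'` by default) distinguish the resulting feature files.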
Load a list of feature files and stack them into a single ndarray. The list of files is split into sublists that are processed in parallel.
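The split-then-stack strategy can be sketched as follows. This is an illustration under stated assumptions: `stack_features_parallel` and the per-file loader `load_fn` are hypothetical, the list is cut into contiguous sublists so the stacked rows keep the input file order, and a thread pool is used only to keep the example self-contained (a process pool may suit CPU-bound loading better).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

import numpy as np


def stack_features_parallel(
    file_list: Sequence[str],
    load_fn: Callable[[str], np.ndarray],
    num_workers: int = 4,
) -> np.ndarray:
    """Load files in parallel sublists and stack all frames into one array."""
    if not file_list:
        raise ValueError("file_list is empty")
    # Contiguous sublists (ceiling division) so the output order matches file_list
    chunk = -(-len(file_list) // num_workers)
    sublists = [list(file_list[i : i + chunk]) for i in range(0, len(file_list), chunk)]

    def load_sublist(names: List[str]) -> np.ndarray:
        # Each worker stacks the (n_frames, dim) arrays of its own files
        return np.vstack([load_fn(name) for name in names])

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # pool.map returns results in submission order
        parts = list(pool.map(load_sublist, sublists))
    return np.vstack(parts)
```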
Save the cep array to a file.

| Field | Description |
|---|---|
| Raises | an exception if the feature format is unknown |
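HTK feature files start with a 12-byte big-endian header (number of frames, frame period in 100 ns units, bytes per frame, parameter kind) followed by big-endian 32-bit floats. The sketch below writes that layout and raises `ValueError` for any other format, mirroring the documented behavior. The function `save_features`, its 10 ms `shift` default and the choice of the HTK `USER` parameter kind (code 9) are assumptions; SPRO4 output is deliberately not sketched.

```python
import struct

import numpy as np

HTK_USER = 9  # HTK parameter kind code for user-defined features


def save_features(path: str, cep: np.ndarray, fmt: str, shift: float = 0.01) -> None:
    """Write a (n_frames, dim) array; only 'htk' is implemented in this sketch."""
    if fmt != "htk":
        raise ValueError(f"unknown or unsupported feature format: {fmt!r}")
    data = np.asarray(cep, dtype=">f4")  # HTK stores big-endian float32
    n_frames, dim = data.shape
    samp_period = int(round(shift * 1e7))  # frame period in 100 ns units
    # nSamples (int32), sampPeriod (int32), sampSize in bytes (int16), parmKind (int16)
    header = struct.pack(">iihh", n_frames, samp_period, dim * 4, HTK_USER)
    with open(path, "wb") as f:
        f.write(header)
        f.write(data.tobytes())
```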
Take a list of audio files and extract their features.

| Field | Description |
|---|---|
| Parameters | `audio_file_list` – an array of strings containing the names of the audio files to process |
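To show how the attributes in the table above (`window_size`, `shift`, `log_filters`, `ceps_number`, `lower_frequency`, `higher_frequency`) fit together, here is a minimal MFCC sketch for one signal. It follows a common textbook recipe (Hamming-windowed frames, power spectrum, triangular mel filter bank, log, DCT-II) and is an assumption, not FeaturesServer's actual pipeline, which may add pre-emphasis, different filter shapes or DCT normalization. The function `mfcc` is hypothetical; `higher_frequency` must not exceed `fs / 2`.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mfcc(signal, fs, window_size=0.025, shift=0.01, log_filters=24,
         ceps_number=13, lower_frequency=0.0, higher_frequency=None):
    """Return a (n_frames, ceps_number) array of MFCCs for a mono signal."""
    signal = np.asarray(signal, dtype=float)
    higher_frequency = higher_frequency or fs / 2.0
    win = int(round(window_size * fs))  # window length in samples
    hop = int(round(shift * fs))  # frame shift in samples
    n_frames = 1 + (len(signal) - win) // hop
    # Row t of idx holds the sample indices [t*hop, t*hop + win)
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(win)
    n_fft = 1 << (win - 1).bit_length()  # next power of two >= win
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2

    # Triangular filters whose edges are equally spaced on the mel scale
    mel_pts = np.linspace(hz_to_mel(lower_frequency), hz_to_mel(higher_frequency),
                          log_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fbank = np.zeros((log_filters, n_fft // 2 + 1))
    for m in range(1, log_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):  # rising slope
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling slope
            fbank[m - 1, k] = (right - k) / max(right - center, 1)
    log_energy = np.log(power @ fbank.T + 1e-10)

    # Unnormalized DCT-II of the log filter-bank energies, keeping ceps_number terms
    n = np.arange(log_filters)
    dct = np.cos(np.pi / log_filters * (n[None, :] + 0.5)
                 * np.arange(ceps_number)[:, None])
    return log_energy @ dct.T
```

With `fs = 8000`, the 25 ms window is 200 samples and the 10 ms shift is 80 samples, so one second of audio yields `1 + (8000 - 200) // 80 = 98` frames.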
Extract features from an audio file using parallel computation.
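How FeaturesServer divides the work within one file is not stated here. One possible approach is sketched below: split the range of frame indices into contiguous blocks and give each worker the samples its frames need, including the overlap of `win - hop` samples at block edges, so the result matches a single-pass computation. Per-frame log energy stands in for the full feature extractor, and both function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def frame_log_energy(x: np.ndarray, win: int, hop: int) -> np.ndarray:
    """Log energy of each frame (win samples long, hop samples apart)."""
    n_frames = 1 + (len(x) - win) // hop
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    return np.log(np.sum(x[idx] ** 2, axis=1) + 1e-10)


def parallel_frame_features(x: np.ndarray, win: int, hop: int,
                            num_workers: int = 4) -> np.ndarray:
    """Compute per-frame features in parallel over contiguous frame blocks."""
    n_frames = 1 + (len(x) - win) // hop
    bounds = np.linspace(0, n_frames, num_workers + 1).astype(int)
    blocks = [(a, b) for a, b in zip(bounds[:-1], bounds[1:]) if b > a]

    def work(block):
        first, last = block  # this worker handles frames [first, last)
        # Samples from the start of frame `first` to the end of frame `last - 1`
        segment = x[first * hop : (last - 1) * hop + win]
        return frame_log_energy(segment, win, hop)

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # pool.map keeps block order, so concatenation restores frame order
        return np.concatenate(list(pool.map(work, blocks)))
```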