io

Copyright 2014-2015 Anthony Larcher

frontend provides methods to process an audio signal in order to extract useful parameters for speaker verification.

frontend.io.pcmu2lin(p, s=4004.189931)

Convert mu-law PCM values to linear values. Usage: lin = pcmu2lin(p, s), where p contains a vector of mu-law values in the range 0 to 255. No check is made that the values lie in this range.

Output values are divided by the scale factor s:

  • s = 1: output range +-8031 (integer values)
  • s = 4004.2: output range +-2.005649 (default)
  • s = 8031: output range +-1
  • s = 8159: output range +-0.9843118 (+-1 nominal full scale)

The default scaling factor 4004.189931 is equal to sqrt((2207^2 + 5215^2)/2), which follows ITU standard G.711: with this factor, the sine wave with PCM-Mu values [158 139 139 158 30 11 11 30] has a mean square value of unity, corresponding to 0 dBm0.
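
The decoding and scaling described above can be sketched in pure Python. This is an illustrative reimplementation of the usual G.711 mu-law expansion, not the library's source; the name mulaw_to_linear is hypothetical.

```python
def mulaw_to_linear(p, s=4004.189931):
    """Decode mu-law codes (0..255) to linear values divided by s.

    Hypothetical helper written for illustration; it performs no range
    checking, like the documented function.
    """
    t = 4.0 / s
    out = []
    for code in p:
        m = 15 - (code % 16)                   # inverted 4-bit mantissa
        q = code // 128                        # sign bit: 1 is positive, 0 is negative
        e = (127 - code - m + 128 * q) // 16   # segment exponent, 0..7
        magnitude = (m + 16.5) * 2 ** e - 16.5
        out.append((q - 0.5) * magnitude * t)
    return out


# The reference tone quoted above has unit mean square with the default s.
tone = [158, 139, 139, 158, 30, 11, 11, 30]
decoded = mulaw_to_linear(tone)
mean_square = sum(v * v for v in decoded) / len(decoded)
```

With s=1 the extreme codes 0 and 128 decode to -8031 and +8031, which matches the output range listed for that scale factor.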

frontend.io.read_audio(inputFileName)

Read a 1 or 2-channel audio file in SPHERE, WAVE or RAW PCM format. The format is determined from the file extension.

Parameters: inputFileName – name of the file to read from
Returns: the signal as a numpy array and the sampling frequency

frontend.io.read_htk(inputFileName, labelFileName='', selectedLabel='', framePerSecond=100)

Read a sequence of features in HTK format

Parameters: inputFileName – name of the file to read from
Returns: a tuple (d, fp, dt, tc, t) described below

Note

  • d = data: column vector for waveforms, 1 row per frame for other types

  • fp = frame period in seconds

  • dt = data type (also includes Voicebox code for generating data)

    0. WAVEFORM Acoustic waveform
    1. LPC Linear prediction coefficients
    2. LPREFC LPC Reflection coefficients: -lpcar2rf([1 LPC]);LPREFC(1)=[];
    3. LPCEPSTRA LPC Cepstral coefficients
    4. LPDELCEP LPC cepstral+delta coefficients (obsolete)
    5. IREFC LPC Reflection coefficients (16 bit fixed point)
    6. MFCC Mel frequency cepstral coefficients
    7. FBANK Log Filter bank energies
    8. MELSPEC linear Mel-scaled spectrum
    9. USER User defined features
    10. DISCRETE Vector quantised codebook
    11. PLP Perceptual Linear prediction
    12. ANON
  • tc = full type code = dt plus (optionally) one or more of the following modifiers

    • 64 _E Includes energy terms
    • 128 _N Suppress absolute energy
    • 256 _D Include delta coefs
    • 512 _A Include acceleration coefs
    • 1024 _C Compressed
    • 2048 _Z Zero mean static coefs
    • 4096 _K CRC checksum (not implemented yet)
    • 8192 _0 Include 0’th cepstral coef
    • 16384 _V Attach VQ index
    • 32768 _T Attach delta-delta-delta index
  • t = text version of type code e.g. LPC_C_K

This function is a translation of the MATLAB code from VOICEBOX, a MATLAB toolbox for speech processing by Mike Brookes. Home page: VOICEBOX <http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html>
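
The relation between tc, dt and t can be illustrated with a short sketch. It assumes the base data types are numbered from 0 (WAVEFORM) to 12 (ANON), as in the write_htk type code list, and that the modifiers occupy the bits from 64 upwards; the name describe_type_code is hypothetical and not part of the module.

```python
# Base data types, indexed by the low 6 bits of the type code.
HTK_BASE_TYPES = [
    "WAVEFORM", "LPC", "LPREFC", "LPCEPSTRA", "LPDELCEP", "IREFC", "MFCC",
    "FBANK", "MELSPEC", "USER", "DISCRETE", "PLP", "ANON",
]
# Modifier suffixes in bit order: 64 is _E, 128 is _N, ..., 32768 is _T.
HTK_MODIFIERS = "ENDACZK0VT"


def describe_type_code(tc):
    """Split a full type code into (dt, text version). Illustrative only."""
    dt = tc & 63                       # base data type
    text = HTK_BASE_TYPES[dt]          # raises IndexError for unknown types
    for bit, letter in enumerate(HTK_MODIFIERS, start=6):
        if tc & (1 << bit):
            text += "_" + letter
    return dt, text


example = describe_type_code(1 + 1024 + 4096)  # LPC, compressed, with CRC
```

For the code 1 + 1024 + 4096 this yields dt = 1 and the text form LPC_C_K, the example given in the note.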

frontend.io.read_label(inputFileName, selectedLabel='speech', framePerSecond=100)

Read label file in ALIZE format

Parameters:
  • inputFileName – the label file name
  • selectedLabel – the label to return. Default is ‘speech’.
  • framePerSecond – number of frames per second. Used to convert the frame number into time. Default is 100.
Returns: a logical array
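
A minimal sketch of how such a logical array could be built. It assumes that each line of the label file holds a start time, an end time (both in seconds) and a label name separated by whitespace; this layout, and the name labels_to_mask, are assumptions made for illustration rather than a description of the actual implementation.

```python
def labels_to_mask(lines, selected_label="speech", frames_per_second=100):
    """Turn 'start end label' lines into a per-frame list of booleans.

    Hypothetical helper: the line layout is an assumption about the
    label format, and the mask ends at the last selected segment.
    """
    segments = []
    for line in lines:
        fields = line.split()
        if len(fields) != 3 or fields[2] != selected_label:
            continue
        start = int(round(float(fields[0]) * frames_per_second))
        stop = int(round(float(fields[1]) * frames_per_second))
        segments.append((start, stop))
    n_frames = max((stop for _, stop in segments), default=0)
    mask = [False] * n_frames
    for start, stop in segments:
        for i in range(start, stop):
            mask[i] = True
    return mask


mask = labels_to_mask(["0.00 0.03 speech", "0.03 0.05 noise", "0.06 0.08 speech"])
```

Here framePerSecond plays the role of the conversion factor between times in the file and frame indices in the array.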

frontend.io.read_pcm(inputFileName)

Read a signal from a single-channel 16-bit PCM file

Parameters: inputFileName – name of the PCM file to read.
Returns: the audio signal read from the file in an ndarray.

frontend.io.read_sph(inputFileName, mode='p')

Read a SPHERE audio file

Parameters:
  • inputFileName – name of the file to read
  • mode – string of option characters taken from the lists below (default is ‘p’)

Note

  • Scaling:

    • ‘s’ Auto scale to make data peak = +-1 (use with caution if reading in chunks)
    • ‘r’ Raw unscaled data (integer values)
    • ‘p’ Scaled to make +-1 equal full scale
    • ‘o’ Scale to bin centre rather than bin edge (e.g. 127 rather than 127.5 for 8 bit values, can be combined with n+p,r,s modes)
    • ‘n’ Scale to negative peak rather than positive peak (e.g. 128.5 rather than 127.5 for 8 bit values, can be combined with o+p,r,s modes)
  • Format

    • ‘l’ Little endian data (Intel,DEC) (overrides indication in file)
    • ‘b’ Big endian data (non Intel/DEC) (overrides indication in file)
  • File I/O

    • ‘f’ Do not close file on exit
    • ‘d’ Look in data directory: voicebox(‘dir_data’)
    • ‘w’ Also read the annotation file *.wrd if present (as in TIMIT)
    • ‘t’ Also read the phonetic transcription file *.phn if present (as in TIMIT)
  • NMAX maximum number of samples to read (or -1 for unlimited [default])
  • NSKIP number of samples to skip from start of file (or -1 to continue from previous read when FFX is given instead of FILENAME [default])
Returns: a tuple (Y, FS)

Note

  • Y data matrix of dimension (samples,channels)

  • FS sample frequency in Hz

  • WRD{*,2} cell array with word annotations: WRD{*,:}={[t_start t_end],’text’}, where times are in seconds; only present if the ‘w’ option is given

  • PHN{*,2} cell array with phoneme annotations: PHN{*,:}={[t_start t_end],’phoneme’}, where times are in seconds; only present if the ‘t’ option is given

  • FFX Cell array containing

    1. filename
    2. header information
      1. first header field name
      2. first header field value
    3. format string (e.g. NIST_1A)
    4. numeric file information
      1. file id
      2. current position in file
      3. dataoff byte offset in file to start of data
      4. order byte order (l or b)
      5. nsamp number of samples
      6. number of channels
      7. nbytes bytes per data value
      8. bits number of bits of precision
      9. fs sample frequency
      10. min value
      11. max value
      12. coding 0=PCM,1=uLAW + 0=no compression,10=shorten,20=wavpack,30=shortpack
      13. file not yet decompressed
    5. temporary filename

If no output parameters are specified, header information will be printed. The code to decode shorten-encoded files is not yet released with this toolkit.
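
To make the header information concrete, here is a sketch of parsing the ASCII header at the start of a SPHERE file: a format string such as NIST_1A, the header size in bytes, then one "name -type value" field per line until end_head. The function name parse_sphere_header is hypothetical, and the sketch covers only the header, not the sample data or any of the mode options.

```python
def parse_sphere_header(raw):
    """Parse the leading bytes of a SPHERE file into header fields.

    Illustrative only. Returns (format string, header size, fields).
    """
    lines = raw.decode("ascii", errors="replace").split("\n")
    format_string = lines[0].strip()
    header_size = int(lines[1].strip())
    fields = {}
    for line in lines[2:]:
        line = line.strip()
        if line == "end_head":
            break
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue                      # skip blank or malformed lines
        name, kind, value = parts
        if kind == "-i":
            fields[name] = int(value)     # integer field
        elif kind == "-r":
            fields[name] = float(value)   # real-valued field
        else:
            fields[name] = value          # "-sN": string of N characters
    return format_string, header_size, fields


sample = (b"NIST_1A\n   1024\nsample_rate -i 16000\nchannel_count -i 1\n"
          b"sample_byte_format -s2 01\nend_head\n")
parsed = parse_sphere_header(sample)
```

In practice the caller would pass only the first header_size bytes of the file; the samples start at that byte offset.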

frontend.io.read_spro4(inputFileName, labelFileName='', selectedLabel='', framePerSecond=100)

Read a feature stream in SPRO4 format

Parameters:
  • inputFileName – name of the feature file to read from
  • labelFileName – name of the label file to read, if required. By default, no label file is read.
  • selectedLabel – label to select in the label file. Default is none.
  • framePerSecond – number of frames per second. Used to convert the frame number into time. Default is 100.
Returns: a sequence of features in an ndarray

frontend.io.read_wav(inputFileName)

Read a signal from a WAVE file

Parameters: inputFileName – name of the WAVE file to read.
Returns: the audio signal read from the file in an ndarray.
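
For comparison, a WAVE file can also be read with the Python standard library alone. The sketch below is not the implementation of read_wav; it assumes 16-bit samples, returns plain lists rather than an ndarray, and the name read_wav_sketch is hypothetical.

```python
import struct
import wave


def read_wav_sketch(path):
    """Return (channels, sampling rate) for a 16-bit WAVE file.

    Illustrative only: channels is a list holding one list of integer
    samples per channel.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit samples")
        rate = wav.getframerate()
        n_channels = wav.getnchannels()
        raw = wav.readframes(wav.getnframes())
    # WAVE stores 16-bit samples as little-endian signed integers,
    # interleaved channel by channel.
    samples = struct.unpack("<%dh" % (len(raw) // 2), raw)
    channels = [list(samples[c::n_channels]) for c in range(n_channels)]
    return channels, rate
```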

frontend.io.save_label(outputFileName, label, selectedLabel='speech', framePerSecond=100)

Save labels in ALIZE format

Parameters:
  • outputFileName – name of the file to write to
  • label – labels to write to the file, given as an ndarray of booleans
  • selectedLabel – label to write to the file. Default is ‘speech’.
  • framePerSecond – number of frames per second. Used to convert the frame number into time. Default is 100.
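
The conversion from a boolean array to label lines can be sketched as a run-length pass over the frames. As an assumption made for illustration, each output line holds a start time, an end time (in seconds) and the label name; the name mask_to_label_lines is hypothetical.

```python
def mask_to_label_lines(mask, selected_label="speech", frames_per_second=100):
    """Turn a per-frame boolean sequence into 'start end label' lines.

    Hypothetical helper: the line layout is an assumption, and two
    decimals are only sufficient for 100 frames per second.
    """
    lines = []
    start = None
    # A trailing False closes a run that reaches the last frame.
    for i, flag in enumerate(list(mask) + [False]):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            lines.append("%.2f %.2f %s" % (
                start / frames_per_second, i / frames_per_second, selected_label))
            start = None
    return lines


label_lines = mask_to_label_lines(
    [True, True, True, False, False, False, True, True])
```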

frontend.io.write_htk(features, outputFileName, fp, tc)

Write feature file in HTK format

Parameters:
  • features – sequence of features to write
  • outputFileName – name of the file to write to
  • fp – frame period in seconds
  • tc – type code = the sum of a data type and (optionally) one or more of the listed modifiers

    • 0 WAVEFORM Acoustic waveform
    • 1 LPC Linear prediction coefficients
    • 2 LPREFC LPC Reflection coefficients: -lpcar2rf([1 LPC]);LPREFC(1)=[];
    • 3 LPCEPSTRA LPC Cepstral coefficients
    • 4 LPDELCEP LPC cepstral+delta coefficients (obsolete)
    • 5 IREFC LPC Reflection coefficients (16 bit fixed point)
    • 6 MFCC Mel frequency cepstral coefficients
    • 7 FBANK Log Filter bank energies
    • 8 MELSPEC linear Mel-scaled spectrum
    • 9 USER User defined features
    • 10 DISCRETE Vector quantised codebook
    • 11 PLP Perceptual Linear prediction
    • 12 ANON
    • 64 _E Includes energy terms hd(1)
    • 128 _N Suppress absolute energy hd(2)
    • 256 _D Include delta coefs hd(3)
    • 512 _A Include acceleration coefs hd(4)
    • 1024 _C Compressed hd(5)
    • 2048 _Z Zero mean static coefs hd(6)
    • 4096 _K CRC checksum (not implemented yet) hd(7) (ignored)
    • 8192 _0 Include 0’th cepstral coef hd(8)
    • 16384 _V Attach VQ index hd(9)
    • 32768 _T Attach delta-delta-delta index hd(10)
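
To show where fp and tc end up, here is a sketch of the 12-byte header that begins an HTK parameter file: number of frames, frame period in units of 100 ns, bytes per frame, and the type code, stored big-endian. It assumes uncompressed 4-byte float coefficients; the name pack_htk_header is hypothetical and the sketch does not write the feature data itself.

```python
import struct


def pack_htk_header(n_frames, fp, n_coefs, tc):
    """Build the 12-byte HTK header for uncompressed float32 features.

    Illustrative only; fp is the frame period in seconds.
    """
    samp_period = int(round(fp * 1e7))   # seconds to 100 ns units
    samp_size = 4 * n_coefs              # bytes per frame (float32 assumed)
    # The type code is packed unsigned so that the 32768 (_T) bit fits.
    return struct.pack(">iihH", n_frames, samp_period, samp_size, tc)


header = pack_htk_header(250, 0.01, 13, 6 + 64)  # 13 MFCC_E coefficients
```

A frame period of 0.01 s is stored as 100000, and the type code 6 + 64 corresponds to MFCC with the _E modifier from the list above.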

frontend.io.write_pcm(data, outputFileName)

Write a signal to a single-channel 16-bit PCM file

Parameters:
  • data – audio signal to write in a RAW PCM file.
  • outputFileName – name of the file to write
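
A raw PCM file has no header, so writing and reading it amounts to packing and unpacking 16-bit integers. The sketch below assumes little-endian byte order, which the documentation does not state, and uses hypothetical names; it works on plain lists rather than ndarrays.

```python
import struct


def write_pcm_sketch(samples, path):
    """Write integer samples as headerless 16-bit PCM (little-endian assumed)."""
    with open(path, "wb") as f:
        f.write(struct.pack("<%dh" % len(samples), *samples))


def read_pcm_sketch(path):
    """Read headerless 16-bit PCM back into a list of integers."""
    with open(path, "rb") as f:
        raw = f.read()
    n = len(raw) // 2                    # ignore a trailing odd byte, if any
    return list(struct.unpack("<%dh" % n, raw[:2 * n]))
```

Because the file carries no sampling rate or channel count, both have to be known by the caller.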

frontend.io.write_spro4(features, outputFileName)

Write a feature stream in SPRO4 format.

Parameters:
  • features – sequence of features to write
  • outputFileName – name of the file to write to
