Transformation functions
- phoneshift.transform(wav:ndarray, fs:float, pbf:float=1.0, pbfs:ndarray=None, esf:float=1.0, esp:boolean=True, psf:float=1.0, psf_max:float=2.0, clipper_knee:float=None, winlen_inner:float=0.020*fs, timestep:float=0.005*fs, f0_min:float=27.5, f0_max:float=3520, info:boolean=False) -> ndarray[float32]
This is the generic function to transform a voice signal while applying multiple audio effects. See also below for more functions dedicated to specific tasks.
Note
It assumes the signal is monophonic, like a voice, a flute, a violin, saxophone, etc.
It is not recommended to use it on polyphonic signals like a piano, a guitar, a drum set, etc.
- Parameters:
wav – Input signal. Currently, spatialization in a multichannel signal is not preserved: multichannel signals are averaged across the channel dimension, processed, and then duplicated to the same number of channels.
fs – Sampling rate [Hz].
pbf –
Playback factor to do time scaling [coefficient, def. 1.0].
Note
The method is designed so that no global time drift is possible. However, because internal frames must be processed to ensure signal continuity, audio events might be slightly shifted locally.
For example, with a speed-up of 2, an audio event at 60 s might end up at 30.005 s instead of 30 s. Nevertheless, there is no time drift: an audio event at 120 s might end up at 60.005 s, not 60.010 s.
pbfs – Time-varying playback factor [2D ndarray, def. None]. A 2D numpy array of shape (N, 2), where N is the number of given [time, pbf] pairs. The first column is the time in seconds, relative to the original signal (not the transformed one). The second column is the pbf playback factor (as above).
esf – Envelope scaling factor [coefficient, def. 1.0].
esp – Preserve spectral envelope [boolean, def. True]. Also known as “formants preservation”.
psf – Pitch scaling factor [coefficient, def. 1.0].
psf_max – Maximum value for pitch scaling factor [coefficient, def. 2.0].
clipper_knee – Clipper knee amplitude [linear amplitude, def. None, common 0.66]. This prevents the signal from clipping at 1.0 when it is saved to a file, which would create audio glitches. The knee amplitude is the point where the clipper starts to act, so that the signal never goes above ±1.0 in amplitude. The lower the value, the fewer the glitches but the more the signal is distorted. Set it to None to disable it.
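The arithmetic in the pbf note above can be sketched as follows. This only covers the ideal mapping for a constant pbf, assuming (as in the note's example) that a factor of 2 means "twice as fast". The helper name ideal_output_time is hypothetical and not part of phoneshift, and the small local shift mentioned in the note is not modeled.

```python
def ideal_output_time(t_in: float, pbf: float) -> float:
    """Ideal position (in seconds) of an event after constant time scaling.

    Assumption: a playback factor of 2 means "twice as fast", so an event
    at t_in seconds lands at t_in / pbf seconds. Hypothetical helper, not
    part of phoneshift.
    """
    if pbf <= 0:
        raise ValueError("pbf must be strictly positive")
    return t_in / pbf


# The two events from the pbf note, sped up by a factor of 2.
for t_in in (60.0, 120.0):
    print(f"{t_in:6.1f} s -> {ideal_output_time(t_in, 2.0):6.1f} s (ideal)")
```

The actual positions may differ from these ideal values by a few milliseconds locally, but according to the note the difference does not accumulate over time.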
Note
The following arguments are used to optimize the audio quality and speed of the processing. Changing them is not recommended unless you know what you are doing.
You can use transform_timescaling and transform_pitchscaling, which set them automatically for you depending on the task.
- Parameters:
winlen_inner – Inner window length [#samples, def. 0.020*fs]. This is the window length used for the inner processing. The bigger the value, the more stable the sound but the processing will be slower.
timestep – Time step [#samples, def. 0.005*fs]. This is the time step from one frame to the next. The smaller the value, the more stable the sound but the slower the processing.
f0_min – Minimum value for the fundamental frequency [Hz, def. 440/16=27.5]. This prevents the pitch from going too low and creating audio glitches.
f0_max – Maximum value for the fundamental frequency [Hz, def. 440*8=3520]. This prevents the pitch from going too high and creating audio glitches.
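To get a feel for the documented defaults, the sketch below evaluates them for a few common sampling rates. The helper default_frame_settings is hypothetical and only restates the default expressions listed above. Whether phoneshift rounds fractional sample counts internally is not stated here, so the values are left as floats.

```python
def default_frame_settings(fs: float) -> dict:
    """Documented defaults of the technical arguments for a sampling rate fs.

    Hypothetical helper: it only re-evaluates the default expressions
    given in the parameter list, it does not query phoneshift.
    """
    return {
        "winlen_inner": 0.020 * fs,  # inner window length, in samples
        "timestep": 0.005 * fs,      # step from one frame to the next, in samples
        "f0_min": 440.0 / 16,        # lowest allowed fundamental frequency, in Hz
        "f0_max": 440.0 * 8,         # highest allowed fundamental frequency, in Hz
    }


for fs in (16000, 44100, 48000):
    settings = default_frame_settings(fs)
    # Number of time steps covered by one inner window.
    ratio = settings["winlen_inner"] / settings["timestep"]
    print(fs, settings["winlen_inner"], settings["timestep"], ratio)
```

With these defaults the inner window spans four time steps whatever the sampling rate, so consecutive windows overlap by about 75%.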
- Info:
If True, also returns an extra dict with various information about how the processing went [def. False].
- Returns:
- ndarray[float32] - The modified signal.
Shape will be the same as the input signal. The type will always be float32 since the whole processing runs on float32 precision.
info[dict] - Processing information [optional: only if argument info=True]
- Examples:
import phoneshift
import soundfile

wav, fs = soundfile.read('path/to/audio.wav')
syn = phoneshift.transform(wav, fs, psf=2.0)
soundfile.write('syn.wav', syn, fs)
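A time-varying playback factor could be prepared along the following lines. This is a minimal sketch: make_pbfs_pairs and transform_with_varying_speed are hypothetical helpers, sorting the pairs by time and rejecting non-positive factors are precautions assumed here rather than documented requirements, and how the factor evolves between two given times is not specified in this section.

```python
def make_pbfs_pairs(pairs):
    """Validate and sort [time, pbf] pairs for the pbfs argument.

    Hypothetical helper (not part of phoneshift). Times are in seconds,
    relative to the original signal, as described for pbfs.
    """
    rows = sorted((float(t), float(f)) for t, f in pairs)
    for t, f in rows:
        if t < 0:
            raise ValueError(f"negative time: {t}")
        if f <= 0:
            raise ValueError(f"non-positive playback factor: {f}")
    return [list(row) for row in rows]


def transform_with_varying_speed(path_in, path_out, pairs):
    """Read a file, apply a time-varying playback factor, write the result."""
    # Imported here so the helper above stays usable without audio libraries.
    import numpy as np
    import phoneshift
    import soundfile

    wav, fs = soundfile.read(path_in)
    pbfs = np.asarray(make_pbfs_pairs(pairs), dtype=float)  # shape (N, 2)
    syn = phoneshift.transform(wav, fs, pbfs=pbfs)
    soundfile.write(path_out, syn, fs)


# Normal speed at 0 s, twice as fast at 2 s, back to normal speed at 4 s.
pairs = make_pbfs_pairs([(4.0, 1.0), (0.0, 1.0), (2.0, 2.0)])
print(pairs)
```

Calling transform_with_varying_speed('path/to/audio.wav', 'syn.wav', pairs) would then process a file, keeping in mind that the times refer to the original signal and not to the transformed one.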
- Processing flow:
The function transform is based on an Overlap-Add process whose base implementation is freely available here. The different processing operations are done in the following order:
- phoneshift.transform_timescaling(wav: ndarray, fs: float, **kwargs)
Same arguments and return values as transform().
This function alters a few technical arguments of transform() in order to optimize speed for time scaling, without compromising audio quality.
- phoneshift.transform_pitchscaling(wav: ndarray, fs: float, **kwargs)
Same arguments and return values as transform().
This function alters a few technical arguments of transform() in order to optimize speed for pitch scaling only, without compromising audio quality.
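One possible way to choose between the three functions is sketched below. The selection rule is an assumption drawn from the descriptions above (time scaling only, pitch scaling only, anything else), and pick_transform_name and process_file are hypothetical helpers, not part of phoneshift.

```python
def pick_transform_name(pbf: float = 1.0, psf: float = 1.0, esf: float = 1.0) -> str:
    """Pick the phoneshift function that suits the requested factors.

    Hypothetical helper: the rule below is an assumption based on the
    function descriptions, not something phoneshift provides.
    """
    time_only = pbf != 1.0 and psf == 1.0 and esf == 1.0
    pitch_only = psf != 1.0 and pbf == 1.0 and esf == 1.0
    if time_only:
        return "transform_timescaling"
    if pitch_only:
        return "transform_pitchscaling"
    return "transform"  # generic function for everything else


def process_file(path_in, path_out, **factors):
    """Read a file, apply the selected function, write the result.

    Only the keyword arguments pbf, psf and esf are accepted here.
    """
    # Imported here so the selection logic stays usable on its own.
    import phoneshift
    import soundfile

    wav, fs = soundfile.read(path_in)
    func = getattr(phoneshift, pick_transform_name(**factors))
    syn = func(wav, fs, **factors)
    soundfile.write(path_out, syn, fs)


print(pick_transform_name(pbf=0.5))           # time scaling only
print(pick_transform_name(psf=2.0))           # pitch scaling only
print(pick_transform_name(pbf=0.5, psf=2.0))  # both at once
```

Since both dedicated functions take the same arguments as transform(), falling back to the generic function is always possible when a request does not match one of the two specialized cases.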