---
title: `simple_interpolation`
keywords: fastai
sidebar: home_sidebar
summary: "A Pandas implementation of the Brownian Bridge interpolation algorithm. Wiener processes are assumed to build `std()`."
description: "A Pandas implementation of the Brownian Bridge interpolation algorithm. Wiener processes are assumed to build `std()`."
nb_path: "index.ipynb"
---

Interpolation rocks, but doing it poorly can alter the original features of your data. A Brownian bridge, done well, preserves the volatility of the original data. Mixing that with a bit of theory on the stock market (Wiener processes), we built a simple interpolation library.

Read about the algorithm in the "Brownian bridge algo" section below.

Install

pip install simple_interpolation

How to use

{% raw %}
# Sample data with gaps (e.g. X column, values 3-5)
df
X Y
0 0 8.089846
1 1 11.793489
2 2 9.026726
3 6 8.996177
4 7 11.221730
5 8 8.398122
6 9 8.845667
7 10 11.454700
8 11 11.431745
9 12 7.050733
10 13 10.009420
11 14 6.964674
12 15 9.541557
13 16 11.656722
14 19 11.062303
15 20 11.302763
16 21 13.042057
17 22 7.405670
18 23 8.986057
19 24 7.554964
20 25 10.467688
21 26 9.416683
22 27 10.038665
23 28 5.519665
24 45 10.184922
25 46 11.661662
26 47 9.748401
27 48 11.023116
28 49 9.298167
{% endraw %} {% raw %}
patched_df = interpolate_gaps( df ) #, plot = True )
patched_df
No datetime column: assuming first column 'X' as X-axis
std() built with Wiener method
Will interpolate if X-column interval is more than 1.7675
Processed 0.00% of gaps
Ended succesfully
Ended execution
X Y interpolated
0 0.0000 8.089846 0
1 1.0000 11.793489 0
2 2.0000 9.026726 0
3 3.0000 8.907296 1
4 4.0000 8.786089 1
5 5.0000 9.240995 1
6 6.0000 8.996177 0
7 7.0000 11.221730 0
8 8.0000 8.398122 0
9 9.0000 8.845667 0
10 10.0000 11.454700 0
11 11.0000 11.431745 0
12 12.0000 7.050733 0
13 13.0000 10.009420 0
14 14.0000 6.964674 0
15 15.0000 9.541557 0
16 16.0000 11.656722 0
17 17.5000 11.359512 1
18 19.0000 11.062303 0
19 20.0000 11.302763 0
20 21.0000 13.042057 0
21 22.0000 7.405670 0
22 23.0000 8.986057 0
23 24.0000 7.554964 0
24 25.0000 10.467688 0
25 26.0000 9.416683 0
26 27.0000 10.038665 0
27 28.0000 5.519665 0
28 29.0625 6.991474 1
29 30.1250 8.235653 1
30 31.1875 8.458524 1
31 32.2500 8.610869 1
32 33.3125 9.604926 1
33 34.3750 10.543888 1
34 35.4375 11.191181 1
35 36.5000 10.989237 1
36 37.5625 11.268674 1
37 38.6250 11.928758 1
38 39.6875 11.400602 1
39 40.7500 11.028734 1
40 41.8125 10.771130 1
41 42.8750 10.656800 1
42 43.9375 10.545092 1
43 45.0000 10.184922 0
44 46.0000 11.661662 0
45 47.0000 9.748401 0
46 48.0000 11.023116 0
47 49.0000 9.298167 0
{% endraw %}

Brownian bridge algo: the theory

The Brownian bridge lets us interpolate large gaps while preserving the volatility of the series (which is supplied as an input!). Read more about it under "Brownian bridge".

Wiener method to obtain the relevant std()

In a Wiener process the volatility (variance) over a time interval $\Delta_t$ is $$var = \Delta_t$$ so $$std = \sqrt{var} = \sqrt{\Delta_t}$$ This sets how the local volatility should be analyzed.

So, if we have $std_{year}$ (or $std_{whole\ series}$), we can get the daily value from: $$std_{year} = std_{day} \cdot \sqrt{365} \Rightarrow std_{day} = \frac{std_{year}}{\sqrt{365}}$$

Applying the same scaling down to minutes gives $std_{minute}$, which is the "basic building block" of the volatility in our case.

Having $std_{minute}$, we then do a "bottom-up" process building the gap:

{% raw %} $$ std_{gap} = std_{minute} \cdot \sqrt{number\_of\_mins\_in\_gap}$$ {% endraw %}

(Advice from Miguel, my colleague at ING)
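
The scaling above can be sketched in a few lines of Python. This is only an illustration of the square-root rule, not the library's own code: the function names `std_per_step` and `std_for_gap` and all the numbers are hypothetical.

```python
import math


def std_per_step(std_total: float, n_steps: float) -> float:
    """Scale a std measured over n_steps down to one step (Wiener scaling)."""
    return std_total / math.sqrt(n_steps)


def std_for_gap(std_step: float, n_steps_in_gap: float) -> float:
    """Build the std of a gap bottom-up from the per-step std."""
    return std_step * math.sqrt(n_steps_in_gap)


# Hypothetical numbers, for illustration only.
std_year = 0.20
std_day = std_per_step(std_year, 365)        # std_year / sqrt(365)
std_minute = std_per_step(std_day, 24 * 60)  # one day = 1440 minutes
std_gap = std_for_gap(std_minute, 90)        # a 90-minute gap
```

Going down to the minute and back up to the gap uses the same rule in both directions, so scaling down by `n` steps and up again by `n` steps returns the original std.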

Fixed timesteps

> You can use the `fixed_freq` argument to round the interpolated X points to a certain timestep. `fixed_freq` defaults to 'min'. Any Pandas offset alias is valid, see: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-offset-aliases
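
To get a feel for what an offset alias does, here is a small Pandas-only sketch. It does not call the library: the timestamps are made up, and it only shows how aliases such as 'min' or '5min' floor a timestamp, which is the kind of rounding described above.

```python
import pandas as pd

# Hypothetical timestamps of interpolated points (not produced by the library).
points = pd.Series(pd.to_datetime([
    "2021-01-01 00:01:30",
    "2021-01-01 00:02:45",
]))

# Floor each timestamp to a whole minute: 00:01:00 and 00:02:00.
floored_to_minute = points.dt.floor("min")

# The same call with a coarser alias: both points fall back to 00:00:00.
floored_to_5min = points.dt.floor("5min")
```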

Implementation of the rounding (you probably don't need to read this)

This constraint takes us outside the pure Brownian bridge, because the bridge only interpolates midpoints, through:

\begin{cases} x_m = \frac{x_0 + x_1}{2} \\ y_m = \frac{y_0 + y_1}{2} + std \end{cases}
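
A minimal sketch of this midpoint rule, applied recursively until the spacing is small enough, could look like the following. It is not the library's implementation. The function name `bridge_midpoints`, the parameter `std_step=1.0` and the choice of a Gaussian displacement with std $std_{step} \cdot \sqrt{\Delta x} / 2$ (the conditional std of a Wiener process at the midpoint of a bridge) are assumptions made for illustration. The threshold 1.7675 and the endpoints are taken from the example output above.

```python
import math
import random


def bridge_midpoints(x0, y0, x1, y1, std_step, min_dx, rng):
    """Recursively fill (x0, x1) with midpoints until the spacing is <= min_dx.

    Each midpoint is the average of its two neighbours plus a random
    displacement. Assumption: the displacement is Gaussian with
    std = std_step * sqrt(dx) / 2.
    """
    dx = x1 - x0
    if dx <= min_dx:
        return []
    xm = (x0 + x1) / 2
    ym = (y0 + y1) / 2 + rng.gauss(0.0, std_step * math.sqrt(dx) / 2)
    left = bridge_midpoints(x0, y0, xm, ym, std_step, min_dx, rng)
    right = bridge_midpoints(xm, ym, x1, y1, std_step, min_dx, rng)
    return left + [(xm, ym)] + right


# The largest gap of the example data: X jumps from 28 to 45.
rng = random.Random(0)  # seeded so the sketch is repeatable
points = bridge_midpoints(28, 5.519665, 45, 10.184922,
                          std_step=1.0, min_dx=1.7675, rng=rng)
xs = [x for x, _ in points]
```

With these inputs the gap is halved four times, so `xs` holds 15 points spaced 1.0625 apart, from 29.0625 to 43.9375. The Y values depend on the random draws and on the assumed `std_step`, so they will not match the example output.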

But if we round to whole minutes, this midpoint $x_m$ may not fall on a minute-exact timestamp (imagine the first interpolated point in a gap of 3 min: it would land at 1.5 min). So we round $x_m$ and look for its associated Y displacement $\Delta y$:

\begin{cases} x'_m = x_m + \Delta x_{toroundtomin} \\ y'_m = y_m + \Delta y \end{cases}

To get the associated $\Delta y$ we must use the slope (derivative) of the straight line between the points $(x_0, y_0)$ and $(x_1, y_1)$.

So:

1- Round $x_m$ down to the nearest minute (floor()-like), which gives us $x'_m$ and $\Delta x_{toroundtomin}$

2- The deltas in X and Y are related by the derivative, which the Brownian bridge implicitly assumes to be linear between the two endpoints, so $\Delta y$ is quite straightforward to calculate:

{% raw %} $$ \Delta y := \frac{dy}{dx} \Delta x \Rightarrow \Delta y \approx \frac{y_1 - y_0}{x_1 - x_0} \Delta x_{toroundtomin} $$ {% endraw %}

So we would have everything for the Y correction.
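
The two steps can be sketched as follows. This is a simplified illustration, not the library's code: the function name `round_midpoint` and its `step` parameter are hypothetical, X is treated as a plain number of minutes, and the random std displacement of the bridge is left out so that only the rounding correction is visible.

```python
import math


def round_midpoint(x0, y0, x1, y1, step=1.0):
    """Floor the midpoint's x to a multiple of `step` and shift y along the chord."""
    xm = (x0 + x1) / 2
    ym = (y0 + y1) / 2                 # random std displacement omitted here
    xm_rounded = math.floor(xm / step) * step
    dx = xm_rounded - xm               # delta x needed to round (<= 0 when flooring)
    slope = (y1 - y0) / (x1 - x0)      # slope of the straight line between endpoints
    dy = slope * dx                    # associated delta y
    return xm_rounded, ym + dy


# A gap of 3 minutes: the midpoint at 1.5 min is floored to 1 min,
# and y moves from 13.0 to 12.0 along a line of slope 2.
x_rounded, y_rounded = round_midpoint(0.0, 10.0, 3.0, 16.0)
```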