---
title: simple_interpolation
keywords: fastai
sidebar: home_sidebar
summary: "Interpolation rocks, but it can screw up the original features of your data series if done poorly. One algorithm that at least preserves the original volatility is the **Brownian Bridge**, which takes that volatility as an input. This is an implementation of the Brownian Bridge to be used with **Pandas**."
description: "Interpolation rocks, but it can screw up the original features of your data series if done poorly. One algorithm that at least preserves the original volatility is the **Brownian Bridge**, which takes that volatility as an input. This is an implementation of the Brownian Bridge to be used with **Pandas**."
nb_path: "00_core.ipynb"
---
{% raw %}
{% endraw %} {% raw %}
{% endraw %} {% raw %}

interpolate_gaps[source]

interpolate_gaps(df_to_patch, method='wiener', fixed_freq='min', plot=False)

Searches for gaps and interpolates with a Brownian Bridge, which preserves volatility of the whole trend.

::returns:: patched, interpolated DataFrame with 3 columns: "X_original_name", "Y_original_name", "interpolated" (flag to mark interpolated rows)

{% endraw %} {% raw %}
{% endraw %} {% raw %}
df
X Y
0 0 11.844657
1 1 9.800014
2 2 7.690769
3 6 9.601032
4 7 9.723581
5 8 11.044739
6 9 10.988900
7 10 9.685440
8 11 9.287421
9 12 10.877917
10 13 10.431381
11 14 11.607592
12 15 11.600781
13 16 11.329190
14 19 8.718075
15 20 9.301708
16 21 9.355985
17 22 9.580390
18 23 9.732817
19 24 7.719776
20 25 11.907776
21 26 10.568831
22 27 8.139414
23 28 10.942830
24 45 9.851524
25 46 12.324436
26 47 11.604481
27 48 10.379039
28 49 10.806593
{% endraw %} {% raw %}
patched_df = interpolate_gaps( df ) #, plot = True )
patched_df
No datetime column: assuming first column 'X' as X-axis
std() built with Wiener method
Will interpolate if X-column interval is more than 1.7675
Processed 0.00% of gaps
Ended succesfully
Ended execution
X Y interpolated
0 0.0000 11.844657 0
1 1.0000 9.800014 0
2 2.0000 7.690769 0
3 3.0000 8.514332 1
4 4.0000 9.240396 1
5 5.0000 9.612237 1
6 6.0000 9.601032 0
7 7.0000 9.723581 0
8 8.0000 11.044739 0
9 9.0000 10.988900 0
10 10.0000 9.685440 0
11 11.0000 9.287421 0
12 12.0000 10.877917 0
13 13.0000 10.431381 0
14 14.0000 11.607592 0
15 15.0000 11.600781 0
16 16.0000 11.329190 0
17 17.5000 10.023632 1
18 19.0000 8.718075 0
19 20.0000 9.301708 0
20 21.0000 9.355985 0
21 22.0000 9.580390 0
22 23.0000 9.732817 0
23 24.0000 7.719776 0
24 25.0000 11.907776 0
25 26.0000 10.568831 0
26 27.0000 8.139414 0
27 28.0000 10.942830 0
28 29.0625 10.612403 1
29 30.1250 10.259143 1
30 31.1875 9.728705 1
31 32.2500 9.665752 1
32 33.3125 9.766997 1
33 34.3750 9.694638 1
34 35.4375 9.658282 1
35 36.5000 9.638956 1
36 37.5625 9.611720 1
37 38.6250 9.259513 1
38 39.6875 9.235943 1
39 40.7500 9.435961 1
40 41.8125 9.248638 1
41 42.8750 9.225836 1
42 43.9375 9.502167 1
43 45.0000 9.851524 0
44 46.0000 12.324436 0
45 47.0000 11.604481 0
46 48.0000 10.379039 0
47 49.0000 10.806593 0
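
The log above reports that intervals on the X column wider than 1.7675 get interpolated. How the library derives that threshold is not shown on this page, so the sketch below takes the number as given and only illustrates one way such gaps could be located with pandas. `find_gaps` is a hypothetical helper, not part of the library; the X values are those of the sample `df`.

```python
import pandas as pd

def find_gaps(df, x_col, threshold):
    """Return (start, end) X values of every interval wider than threshold."""
    steps = df[x_col].diff()           # first entry is NaN, which never exceeds the threshold
    is_gap_end = steps > threshold     # True on the row that closes a gap
    ends = df.loc[is_gap_end, x_col]
    starts = ends - steps[is_gap_end]  # step back to the row that opens the gap
    return list(zip(starts.tolist(), ends.tolist()))

# X values of the sample df shown earlier on this page
x_values = list(range(0, 3)) + list(range(6, 17)) + list(range(19, 29)) + list(range(45, 50))
sample = pd.DataFrame({"X": x_values})

gaps = find_gaps(sample, "X", threshold=1.7675)
print(gaps)
```

This finds three gaps (2 to 6, 16 to 19, and 28 to 45), which are the same stretches flagged with `interpolated = 1` in the patched output.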
{% endraw %} {% raw %}

check_dataframe[source]

check_dataframe(df)

Collection of tests done first on the dataframe to ensure it's properly formatted for interpolation, and identification of x, y columns to be used.

Checks:

  • 2 columns only (or 1 + datetime index)
  • looks for datetime presence (affects the next steps)

::returns:: tuple ( dataframe_valid :: boolean, params_deduced :: dict that stores flags and the X, Y column names )
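
The two documented checks could look roughly like the sketch below. It is a hypothetical re-creation for illustration, not the library's code: the function name and the dictionary keys (`datetime_index`, `datetime_column`, `x_col`, `y_col`) are assumptions, and the real `params_deduced` may store different flags.

```python
import pandas as pd

def check_dataframe_sketch(df):
    """Validate the frame's shape and work out which columns act as X and Y."""
    params = {"datetime_index": False, "datetime_column": False,
              "x_col": None, "y_col": None}

    # Case 1: a single value column indexed by datetimes
    if isinstance(df.index, pd.DatetimeIndex) and df.shape[1] == 1:
        params["datetime_index"] = True
        params["y_col"] = df.columns[0]
        return True, params

    # Case 2: exactly two columns, the first one taken as the X axis
    if df.shape[1] != 2:
        return False, params
    x_col, y_col = df.columns
    params["x_col"], params["y_col"] = x_col, y_col
    params["datetime_column"] = bool(pd.api.types.is_datetime64_any_dtype(df[x_col]))
    return True, params

valid, params = check_dataframe_sketch(pd.DataFrame({"X": [0, 1, 2, 6], "Y": [1.0, 2.0, 3.0, 4.0]}))
print(valid, params)
```

Taking the first column as X when no datetime is present mirrors the log message printed by `interpolate_gaps` in the example run.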

{% endraw %} {% raw %}
{% endraw %} {% raw %}

plot_results[source]

plot_results(patched)

Plot the original series and the interpolated data.

{% endraw %} {% raw %}
{% endraw %} {% raw %}

bb[source]

bb(x0, y0, x1, y1, params_deduced, variance, interpolated, fixed_freq)

Brownian Bridge algo implementation.

::returns:: nothing, but adds (xm, ym) tuples to an interpolated list.
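
A minimal sketch of a bridge built by recursive midpoint bisection. For Brownian motion with variance rate sigma², the midpoint conditioned on both endpoints is normally distributed with mean (y0 + y1) / 2 and standard deviation sigma * sqrt(x1 - x0) / 2. The spacing in the sample output (for example, steps of 17/16 = 1.0625 across the gap from 28 to 45) is consistent with repeated bisection, but the library's actual code is not shown on this page. The function name and the parameters `sigma`, `max_step`, `rng`, and `out` are assumptions made for illustration and differ from the `bb` signature.

```python
import random

def brownian_bridge_sketch(x0, y0, x1, y1, sigma, max_step, rng, out):
    """Fill the interval (x0, x1) with bridge points at most max_step apart.

    Points are appended to `out` as (x, y) tuples in increasing x order.
    """
    width = x1 - x0
    if width <= max_step:
        return  # interval is already fine enough
    xm = 0.5 * (x0 + x1)
    # Conditional midpoint of a Brownian bridge pinned at both ends
    ym = rng.gauss(0.5 * (y0 + y1), sigma * width ** 0.5 / 2.0)
    brownian_bridge_sketch(x0, y0, xm, ym, sigma, max_step, rng, out)
    out.append((xm, ym))
    brownian_bridge_sketch(xm, ym, x1, y1, sigma, max_step, rng, out)

# Endpoints of the widest gap in the sample df; sigma = 0.5 is an arbitrary choice
points = []
brownian_bridge_sketch(28.0, 10.942830, 45.0, 9.851524,
                       sigma=0.5, max_step=1.7675,
                       rng=random.Random(0), out=points)
print(len(points), points[0][0], points[-1][0])
```

With `sigma=0` the sketch collapses to linear interpolation, which is a quick sanity check that the noise term is the only source of randomness.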

{% endraw %} {% raw %}
{% endraw %} {% raw %}

get_local_std[source]

get_local_std(delta, df_to_patch, params_deduced, gap_index, method='wiener')

Calculates the variance to be used in the interpolation. It must differ depending on the gap length.

Two methods: 'local_scan' and 'wiener'. 'local_scan' is a naive approach kept mostly for learning purposes; 'wiener' is the default and is always recommended.

Assuming a Wiener process (like in stock market theory) then std ~ sqrt(delta_t), and in the gap case delta_t := delta_t_gap, so:

std_gap = std_whole / sqrt( delta_t_gap )

::returns:: std, float to be used in the brownian bridge
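
The square-root scaling that the Wiener assumption rests on (std ~ sqrt(delta_t)) can be checked numerically on a simulated random walk. The sketch below is not part of the library and does not reproduce `get_local_std`; `increment_std` is a hypothetical helper, and the walk length and seed are arbitrary.

```python
import random
import statistics

def increment_std(path, lag):
    """Standard deviation of path[i + lag] - path[i] over all valid i."""
    diffs = [path[i + lag] - path[i] for i in range(len(path) - lag)]
    return statistics.pstdev(diffs)

# Simulate a discrete Wiener process with unit-variance Gaussian steps
rng = random.Random(42)
path = [0.0]
for _ in range(20000):
    path.append(path[-1] + rng.gauss(0.0, 1.0))

# Quadrupling the time lag should roughly double the standard deviation
ratio = increment_std(path, 4) / increment_std(path, 1)
print(round(ratio, 2))
```

The printed ratio should land close to sqrt(4) = 2, up to sampling noise.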

{% endraw %} {% raw %}
{% endraw %}