# Computing a Rolling Median¶

## Rolling Median in C++¶

A powerful use case for a skip list is in the computation of a rolling fraction, for example a rolling median.

Here is a reasonable C++ attempt at doing that with the arguments:

`data`

- A vector of data of type`T`

of length`L`

.`win_length`

- a ‘window’ size. The median is computed over this number of values.`result`

- a destination vector for the result. This will either end up with`L - win_length`

values, alternatively is will be`L`

long and start with`win_length`

`NaN`

s.

Rolling median code using a skip list might look like this, error checking is omitted:

```
#include "SkipList.h"
template <typename T>
void rolling_median(const std::vector<T> data,
size_t win_length,
std::vector<T> &result) {
OrderedStructs::SkipList::HeadNode<T> sl;
result.clear();
for (size_t i = 0; i < data.size(); ++i) {
sl.insert(data[i]);
if (i >= win_length) {
result.push_back(sl.at(win_length / 2));
sl.remove(data[i - win_length]);
}
}
}
```

If you are working with C arrays (such as Numpy arrays) then this C’ish approach might be better, again error checking omitted:

```
#include "SkipList.h"
template <typename T>
void rolling_median(const T *src, size_t count, size_t win_length, T *dest) {
OrderedStructs::SkipList::HeadNode<T> sl;
const T *tail = src;
for (size_t i = 0; i < count; ++i) {
sl.insert(*src);
if (i + 1 >= win_length) {
*dest = sl.at(win_length / 2);
++dest;
sl.remove(*tail);
++tail;
}
++src;
}
}
```

Multidimensional Numpy arrays have a stride value which is omitted in the above code but is simple to add. See *RollingMedian.h* and *test/test_rolling_median.cpp* for further examples.

Rolling percentiles require a argument that says what fraction of the window the required value lies. Again, this is easy to add.

### Even Window Length¶

The above code assumes that if the window length is even that the median is at `(window length - 1) / 2`

. A more plausible median for even sized window lengths is the mean of `(window length - 1) / 2`

and `window length / 2`

. This requires that the mean of two types is meaningful which it will not be for strings.

## Rolling Median in Python¶

Here is an example of computing a rolling median of a `numpy`

1D array.
This creates an array with the same length as the input starting with `window_length`

`NaN`

s:

```
import numpy as np
import orderedstructs
def simple_python_rolling_median(vector: np.ndarray,
window_length: int) -> np.ndarray:
"""Computes a rolling median of a numpy vector returning a new numpy
vector of the same length.
NaNs in the input are not handled but a ValueError will be raised."""
if vector.ndim != 1:
raise ValueError(
f'vector must be one dimensional not shape {vector.shape}'
)
skip_list = orderedstructs.SkipList(float)
ret = np.empty_like(vector)
for i in range(len(vector)):
value = vector[i]
skip_list.insert(value)
if i >= window_length - 1:
# // 4 for lower quartile
# * 3 // 4 for upper quartile etc.
median = skip_list.at(window_length // 2)
skip_list.remove(vector[i - window_length + 1])
else:
median = np.nan
ret[i] = median
return ret
```

This can be called thus:

```
np_array = np.arange(10.0)
print('Original:', np_array)
result = simple_python_rolling_median(np_array, 3)
print(' Result:', result)
```

And the result will be:

```
Original: [0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
Result: [nan nan 1. 2. 3. 4. 5. 6. 7. 8.]
```

## Handling NaNs¶

Not-a-number (NaN) values can not be inserted into a Skip List as they are not comparable to anything (including themselves).
An attempt to call `insert()`

, `index()`

, `has()`

, `remove()`

with a NaN will raise an error.
In C++ this will throw a `OrderedStructs::SkipList::FailedComparison`

.
In Python it will raise a `ValueError`

.
This section looks at handling NaNs in Python.

Here are several ways of handling NaNs:

- Propogate the Exception.
- Make the Median NaN.
- Forward Filling.

### Propogate the Exception¶

Here is a rolling median that will raise a `ValueError`

if there is a NaN in the input.

```
def rolling_median_no_nan(vector: typing.List[float],
window_length: int) -> typing.List[float]:
"""Computes a rolling median of a vector of floats and returns the results.
NaNs will throw an exception."""
skip_list = orderedstructs.SkipList(float)
ret: typing.List[float] = []
for i in range(len(vector)):
value = vector[i]
skip_list.insert(float(value))
if i >= window_length:
median = skip_list.at(window_length // 2)
skip_list.remove(vector[i - window_length])
else:
median = math.nan
ret.append(median)
return ret
```

### Make the Median NaN¶

Here is a rolling median that will make the median NaN if there is a NaN in the input. Incidentally this is the approach that numpy takes.

```
def rolling_median_with_nan(vector: typing.List[float],
window_length: int) -> typing.List[float]:
"""Computes a rolling median of a vector of floats and returns the results.
NaNs will be consumed."""
skip_list = orderedstructs.SkipList(float)
ret: typing.List[float] = []
for i in range(len(vector)):
value = vector[i]
if math.isnan(value):
median = math.nan
else:
skip_list.insert(float(value))
if i >= window_length:
median = skip_list.at(window_length // 2)
remove_value = vector[i - window_length]
if not math.isnan(remove_value):
skip_list.remove(remove_value)
else:
median = math.nan
ret.append(median)
return ret
```

The first row is the input, the second the output. Window length is 5:

```
[0.0, math.nan, 2.0, 3.0, 4.0, 5.0, 6.0, math.nan, 8.0, 9.0],
[math.nan, math.nan, math.nan, math.nan, math.nan, 3.0, 4.0, math.nan, 4.0, 5.0],
```

### Forward Filling¶

Another approach is to replace the NaN with the previous value.
This is *very popular* in FinTech and is commonly know as *Forward Filling*.
Here is an implementation:

```
def forward_fill(vector: typing.List[float]) -> None:
"""Forward fills NaNs in-place."""
previous_value = math.nan
for i in range(len(vector)):
value = vector[i]
if math.isnan(value):
vector[i] = previous_value
if not math.isnan(value):
previous_value = value
def rolling_median_with_nan_forward_fill(vector: typing.List[float],
window_length: int) -> typing.List[float]:
"""Computes a rolling median of a vector of floats and returns the results.
NaNs will be forward filled."""
forward_fill(vector)
return rolling_median_no_nan(vector, window_length)
```

The first row is the input, the second the output. Window length is 5:

```
[0.0, math.nan, 2.0, 3.0, 4.0, 5.0, 6.0, math.nan, 8.0, 9.0],
[math.nan, math.nan, math.nan, math.nan, math.nan, 2.0, 3.0, 4.0, 5.0, 6.0],
```

Another example, the first row is the input, the second the output. Window length is 5:

```
[0.0, math.nan, 2.0, math.nan, 4.0, 5.0, 6.0, math.nan, 8.0, 9.0],
[math.nan, math.nan, math.nan, math.nan, math.nan, 2.0, 2.0, 4.0, 5.0, 6.0],
```

There is no ‘right way’ to handle NaNs. They are always problematic. For example what is the ‘right way’ to sort a sequence of values that may include NaNs?