# Quantile Function of Mixture Distributions in Python

Finite mixtures are just convex combinations of two or more component distributions. Given weights $w_1, \dots, w_n$ and component distributions $P_i$ or densities $p_i$, the distribution and density functions are easy to compute, $$ F(x) = \sum_{i=1}^n w_i P_i(x) $$ $$ f(x) = \sum_{i=1}^n w_i p_i(x) $$ but the inverse distribution $F^{-1}(p)$ or quantile function is not so straightforward. Often, there is no closed-form solution. But fortunately this can be calculated easily by just solving $F(x) = p$ numerically for $x$ (for such families such as Gaussian mixtures, there are better ways to do this).

The following Python code does this and provides an interface similar to other the `scipy.stats`

probability distributions.

```
import functools
import numpy as np
from scipy.optimize import root_scalar
def _vectorize_float(f):
vectorized = np.vectorize(f, otypes=[float], signature="(),()->()")
@functools.wraps(f)
def wrapper(*args):
return vectorized(*args)
return wrapper
class MixtureDistribution:
def __init__(self, distributions, weights):
self._distributions = list(distributions)
self._weights = list(weights)
if not (all(w >= 0 for w in self._weights) and sum(self._weights) == 1):
raise ValueError("Invalid weight vector.")
if len(self._distributions) != len(self._weights):
raise ValueError("Mixtures and weights must have the same length.")
if len(self._distributions) < 2:
raise ValueError("Must have at least two component distributions.")
@_vectorize_float
def pdf(self, x):
return sum(w * d.pdf(x) for w, d in zip(self._weights, self._distributions))
@_vectorize_float
def cdf(self, x):
return sum(w * d.cdf(x) for w, d in zip(self._weights, self._distributions))
@_vectorize_float
def ppf(self, p):
bracket = [min(dist.ppf(p) for dist in self._distributions),
max(dist.ppf(p) for dist in self._distributions)]
r = root_scalar(
f=lambda x: self.cdf(x) - p,
fprime=self.pdf,
bracket=bracket,
x0=0
)
assert r.converged
return r.root
```

The percentage-point function inverts the cdf using the standard `root_scalar`

function. As a cdf is monotonic, the bracketing interval must be bounded by the mininum and maximum of the quantitle function across all the component distributions. (The `_vectorize_float`

bit isn't conceptually important: this just allows my functions to be called with numpy arrays; for some reason, numpy's `vectorize`

function cannot be used as a decorator.)

Then we can create and call the mixture distribution as follows,

```
from scipy.stats import norm
m = MixtureDistribution([norm(-2), norm(-2)], [1/2, 1/2])
m.cdf(0) == 0.5
>> True
```

which makes sense.