Given a stream of pairs (id, weight), write a function that selects an id at random, with probability proportional to its weight (compared to the total weight sum when the stream has ended). The stream size is unknown in advance and you're allowed O(1) memory.
Example: if the stream was {(1,2),(2,2),(3,4),(4,10)}, then id 2 would be returned with probability 1/9.
Let $w_i$ be the weight of the ith pair and $s_i = \sum\limits_{j=1}^i w_j$ (sum of all weights seen so far).
After processing all n stream pairs, the ith pair should be selected with probability $\frac{w_i}{s_n}$.
We select the first pair of the stream. Then, for each new pair i, we select it with probability $\frac{w_i}{s_i}$.
Proof
Let p(i) be the probability of selecting pair i after processing the whole stream. p(i) is the probability of selecting pair i and not selecting any of the subsequent pairs. So $p(i) = \frac{w_i}{s_i} * (1 - \frac{w_{i+1}}{s_{i+1}}) * ... * (1 - \frac{w_n}{s_n})$
$s_{i+1} = s_i + w_{i+1}$
$1 - \frac{w_{i+1}}{s_{i+1}} = 1 - \frac{w_{i+1}}{s_i + w_{i+1}} = \frac{s_i}{s_{i+1}}$
Solution
Since we are only allowed to use constant space, we need to process the stream values online.
Proof
Let p(i) be the probability of selecting pair i after processing the whole stream. p(i) is the probability of selecting pair i and not selecting any of the subsequent pairs. So $p(i) = \frac{w_i}{s_i} * (1 - \frac{w_{i+1}}{s_{i+1}}) * ... * (1 - \frac{w_n}{s_n})$
$s_{i+1} = s_i + w_{i+1}$
$1 - \frac{w_{i+1}}{s_{i+1}} = 1 - \frac{w_{i+1}}{s_i + w_{i+1}} = \frac{s_i}{s_{i+1}}$
Plugging into the above equation
$p(i) = \frac{w_i}{s_i} * \frac{s_i}{s_{i+1}} * \frac{s_{i+1}}{s_{i+2}} * \frac{s_{i+2}}{s_{i+3}} * ... * \frac{s_{n-2}}{s_{n-1}} * \frac{s_{n-1}}{s_n} = \frac{w_i}{s_n}$
$p(i) = \frac{w_i}{s_i} * \frac{s_i}{s_{i+1}} * \frac{s_{i+1}}{s_{i+2}} * \frac{s_{i+2}}{s_{i+3}} * ... * \frac{s_{n-2}}{s_{n-1}} * \frac{s_{n-1}}{s_n} = \frac{w_i}{s_n}$
Q.E.D.
You can find a sample implementation on github.
Sem comentários:
Enviar um comentário