quarta-feira, 28 de janeiro de 2015

Selecting values from a stream at random

Saw an interesting question today at CareerCup:
Given a stream of pairs (id, weight), write a function that selects an id at random, with probability proportional to its weight (compared to the total weight sum when the stream has ended). The stream size is unknown in advance and you're allowed O(1) memory. 
Example: if the stream was {(1,2),(2,2),(3,4),(4,10)}, then id 2 would be returned with probability 1/9.

Let $w_i$ be the weight of the ith pair and $s_i = \sum\limits_{j=1}^i w_j$ (sum of all weights seen so far).
After processing all n stream pairs, the ith pair should be selected with probability $\frac{w_i}{s_n}$.

Solution

Since we are only allowed to use constant space, we need to process the stream values online.

We select the first pair of the stream. Then, for each new pair i, we select it with probability $\frac{w_i}{s_i}$.

Proof

Let p(i) be the probability of selecting pair i after processing the whole stream. p(i) is the probability of selecting pair i and not selecting any of the subsequent pairs. So $p(i) = \frac{w_i}{s_i} * (1 - \frac{w_{i+1}}{s_{i+1}}) * ... * (1 - \frac{w_n}{s_n})$

$s_{i+1} = s_i + w_{i+1}$

$1 - \frac{w_{i+1}}{s_{i+1}} = 1 - \frac{w_{i+1}}{s_i + w_{i+1}} = \frac{s_i}{s_{i+1}}$

Plugging into the above equation

$p(i) = \frac{w_i}{s_i} * \frac{s_i}{s_{i+1}} * \frac{s_{i+1}}{s_{i+2}} * \frac{s_{i+2}}{s_{i+3}} * ... * \frac{s_{n-2}}{s_{n-1}} * \frac{s_{n-1}}{s_n} = \frac{w_i}{s_n}$

Q.E.D.

You can find a sample implementation on github.



Sem comentários:

Enviar um comentário