Probabilistic Retrieval Models
For further discussion we'll make two important assumptions:
The usefulness of a relevant document does not depend on how many documents the user has already seen. (In reality it does: the more documents we see, the less useful each new one becomes, but the model ignores this.)
The relevance of \(D_i\) to \(Q\) is independent of the other documents \(D_j\) in the collection. Therefore we can estimate relevance for each document separately.
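Because each document's relevance is assumed to be independent of the others, its probability can be computed on its own, and ranking reduces to a plain sort by that probability. A minimal Python sketch, assuming the probabilities have already been estimated somehow (the function name `rank_by_relevance` and the scores are hypothetical, for illustration only):

```python
def rank_by_relevance(p_relevant: dict[str, float]) -> list[str]:
    """Order document ids by decreasing estimated P(R = r | D, Q).

    Each score was computed for its document alone (independence
    assumption), so no document's score changes based on where the
    others end up in the list; ranking is just a sort.
    """
    return sorted(p_relevant, key=p_relevant.get, reverse=True)


# Hypothetical relevance probabilities for one query (made-up numbers).
scores = {"d1": 0.2, "d2": 0.9, "d3": 0.5}
print(rank_by_relevance(scores))  # ['d2', 'd3', 'd1']
```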
Notation
Let \( R \in \{r, \neg r\} \) be a binary random variable that indicates relevance:
\(r\) represents the event that document \(D\) is relevant;
\(\neg r\) represents the event that \(D\) is not relevant.
We need to estimate the probability that a document \(D\) is relevant with respect to a query \(Q\). In other words, we need to find:
\(P(R=r|D, Q) \) - the probability that \(D\) is relevant to \(Q\)
\(P(R=\neg r| D, Q)\) - the probability that \(D\) is not relevant to \(Q\)
Applying Bayes' theorem to each of these probabilities gives:
\(P(R=r|D,Q)=\frac{P(D, Q|R=r)P(R=r)}{P(D,Q)}\)
\(P(R=\neg r| D,Q)=\frac{P(D,Q|R=\neg r)P(R=\neg r)}{P(D,Q)}\)
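Both posteriors share the denominator \(P(D,Q)\), which by the law of total probability equals \(P(D,Q|R=r)P(R=r)+P(D,Q|R=\neg r)P(R=\neg r)\). The minimal Python sketch below plugs made-up numbers into the two formulas to check that the posteriors sum to 1 and that their ratio does not depend on \(P(D,Q)\) at all. The names `RelevanceModel`, `posteriors`, and `odds` are hypothetical, and the likelihoods and prior are illustrative assumptions, not estimates from real data.

```python
from dataclasses import dataclass


@dataclass
class RelevanceModel:
    """Hypothetical inputs for a single (D, Q) pair; the numbers are illustrative."""
    lik_rel: float     # P(D, Q | R = r)
    lik_nonrel: float  # P(D, Q | R = not r)
    prior_rel: float   # P(R = r)


def posteriors(m: RelevanceModel) -> tuple[float, float]:
    """Return (P(R = r | D, Q), P(R = not r | D, Q)) using Bayes' theorem."""
    prior_nonrel = 1.0 - m.prior_rel  # R is binary, so P(R = not r) = 1 - P(R = r)
    # P(D, Q) via the law of total probability over both values of R
    evidence = m.lik_rel * m.prior_rel + m.lik_nonrel * prior_nonrel
    p_rel = m.lik_rel * m.prior_rel / evidence
    p_nonrel = m.lik_nonrel * prior_nonrel / evidence
    return p_rel, p_nonrel


def odds(m: RelevanceModel) -> float:
    """Ratio P(R = r | D, Q) / P(R = not r | D, Q); P(D, Q) cancels out."""
    return (m.lik_rel * m.prior_rel) / (m.lik_nonrel * (1.0 - m.prior_rel))


model = RelevanceModel(lik_rel=0.03, lik_nonrel=0.01, prior_rel=0.2)
p_rel, p_nonrel = posteriors(model)
print(round(p_rel, 2), round(p_nonrel, 2), round(odds(model), 2))  # 0.43 0.57 0.75
```

With these numbers \(P(D,Q) = 0.03 \cdot 0.2 + 0.01 \cdot 0.8 = 0.014\), so the posteriors are \(3/7\) and \(4/7\); their ratio, 0.75, is obtained without ever computing \(P(D,Q)\).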
Written by Oleg Kleiman