Processing math: 27%

Formulas for Bayesian A/B Testing

By Evan Miller

October 1, 2015 (Changes)

This page collects a few formulas I’ve derived for evaluating A/B tests in a Bayesian context. The formulas on this page are closed-form, so you don’t need to do complicated integral evaluations; they can be computed with simple loops and a decent math library. The advantage of Bayesian formulas over the traditional frequentist formulas is that you don’t have to collect a pre-ordained sample size in order to get a valid result. (See How Not To Run An A/B Test for more context on the “peeking” problem, and Simple Sequential A/B Testing for a frequentist solution to the problem.)

Table of Contents

  1. A/B testing: binary outcomes
    1. Derivation
    2. Equivalent Formulas
    3. Implementation
  2. A/B/C testing: binary outcomes
    1. Derivation
    2. Implementation
  3. A/B/C/D testing: binary outcomes
    1. Derivation
    2. Implementation
  4. A/B testing: count data
    1. Derivation
    2. Implementation
  5. A/B/C testing: count data
    1. Derivation
    2. Implementation

A/B testing: binary outcomes

For a binary-outcome test (e.g. a test of conversion rates), the probability that B will beat A in the long run is given by:

Pr(pB>pA)=αB1i=0B(αA+i,βB+βA)(βB+i)B(1+i,βB)B(αA,βA)

Where:

Derivation

Not for the timid! If calculus isn’t your cup of chamomile, skip right to the implementation section.

The beta distribution is a convenient prior distribution for modeling a binomial parameter p. Starting from a non-informative prior,1 the distribution of p after S successes and F failures is given by Beta(S+1,F+1). For the remainder of the discussion let α=S+1 and β=F+1, which are just the two parameters of the beta distribution for the belief.

Now suppose we have two experimental branches (A and B) and have a Bayesian belief for each one:

pABeta(αA,βA)pBBeta(αB,βB)

Using the pdf of the beta distribution, we can get the total probability that pB is greater than pA by integrating the joint distribution over all values for which pB>pA:

Pr(pB>pA)=101pApαA1A(1pA)βA1B(αA,βA)pBαB1(1pB)βB1B(αB,βB)dpBdpA

Evaluating the inner integral, the equation becomes:

Pr(pB>pA)=110pαA1A(1pA)βA1B(αA,βA)IpA(αB,βB)dpA

Where Ix is the regularized incomplete beta function. Using the identity Ix(1,b)=1(1x)b, the recursive relationship

Ix(a,b)=Ix(a1,b)xa1(1x)b(a1)B(a1,b)

and the fact that α and β are integers, we can express Ix as:

Ix(a,b)=1(1x)ba1j=1xaj(1x)b(aj)B(aj,b)

Or equivalently:

Ix(a,b)=1a1i=0xi(1x)b(b+i)B(1+i,b)

The probability integral (2) can therefore be written:

Pr(pB>pA)=110pαA1A(1pA)βA1B(αA,βA)(1αB1i=0piA(1pA)βB(βB+i)B(1+i,βB))dpA=11+10pαA1A(1pA)βA1B(αA,βA)αB1i=0piA(1pA)βB(βB+i)B(1+i,βB)dpA=10αB1i=0pαA1+iA(1pA)βA+βB1(βB+i)B(αA,βA)B(1+i,βB)dpA=αB1i=010pαA1+iA(1pA)βA+βB1(βB+i)B(αA,βA)B(1+i,βB)dpA=αB1i=0B(αA+i,βA+βB)(βB+i)B(αA,βA)B(1+i,βB)10pαA1+iA(1pA)βA+βB1B(αA+i,βA+βB)dpA

Finally:

Pr(pB>pA)=αB1i=0B(αA+i,βA+βB)(βB+i)B(1+i,βB)B(αA,βA)

Update, 6/8/2014: Chris Stucchio has derived a decision rule and asymptotic formula using the above formula. Check them out!

Equivalent Formulas

It’s possible to derive similar formulas that sum over the other three parameters:

Pr(pB>pA)=1αA1i=0B(αB+i,βB+βA)(βA+i)B(1+i,βA)B(αB,βB)Pr(pB>pA)=βA1i=0B(βB+i,αA+αB)(αA+i)B(1+i,αA)B(αB,βB)Pr(pB>pA)=1βB1i=0B(βA+i,αA+αB)(αB+i)B(1+i,αB)B(αA,βA)

The above formulas can be found with symmetry arguments.

Implementation

Here’s a quick implementation of (6) in Julia:

using SpecialFunctions # for logbeta

function probability_B_beats_A(α_A, β_A, α_B, β_B)
    total = 0.0
    for i = 0:(α_B-1)
        total += exp(logbeta(α_A+i, β_B+β_A)
            - log(β_B+i) - logbeta(1+i, β_B) - logbeta(α_A, β_A))
    end
    return total
end

The beta function produces very large numbers, so if you’re getting infinite values in your program, be sure to work with logarithms, as in the code above. Your standard library’s log-beta function will come in handy here.

If you don’t have log-beta available, it’s easy enough to define one with the log-gamma function and the identity:

log(B(a,b))=log(Γ(a))+log(Γ(b))log(Γ(a+b))

If you have neither log-beta nor log-gamma available, first rewrite the equation in terms of gamma function:

Pr(pB>pA)=αB1i=0Γ(αA+i)Γ(βA+βB)Γ(1+i+βB)Γ(αA+βA)(βB+i)Γ(αA+i+βA+βB)Γ(1+i)Γ(βB)Γ(αA)Γ(βA)

Using the property that Γ(z)=(z1)!, notice that there are an equal number of multiplicative terms in the numerator and denominator. If you alternately multiply and divide one term at a time, you should be able to arrive at an answer without encountering numerical overflow.

As there are four equivalent formulas available, implementers concerned with computational efficiency may want to choose the formula that requires the smallest number of iterations for a particular set of α and β values.

A final word of caution to implementers: when calling these functions, don’t forget to add 1 to the success and failure counts! Otherwise your results will be slightly off.


A/B/C testing: binary outcomes

It is possible to extend the binary-outcome formula to three test groups, call them A, B, and C. The probability that C will beat both A and B in the long run is:

Pr(pC>max

Where:

Note that this formula can be computed in O(\alpha_A \alpha_B) time (see the implementation section below).

Derivation

Start with a Bayesian belief for each of three experimental branches (A, B, and C):

\begin{array}{lr} p_A \sim {\rm Beta}(\alpha_A, \beta_A) \\ p_B \sim {\rm Beta}(\alpha_B, \beta_B) \\ p_C \sim {\rm Beta}(\alpha_C, \beta_C) \\ \end{array}

Calling the pdf of the beta distribution f(p|\alpha, \beta) = f(p), we can get the total probability that p_C is greater than both p_A and p_B by integrating the joint distribution over all values for which p_C > p_A and p_C > p_B:

{\rm Pr}(p_C > \max{\{p_A, p_B\}}) = \int_0^1 \int_0^{p_C} \int_0^{p_C} f(p_A) f(p_B) f(p_C) dp_A dp_B dp_C

Evaluating the inner two integrals, the equation becomes:

\begin{equation} \label{binary_abc_pr_eval_inner} {\rm Pr}(p_C > \max{\{p_A, p_B\}}) = \int_0^1 I_{p_C}(\alpha_A, \beta_A) I_{p_C}(\alpha_B, \beta_B) f(p_C) dp_C \end{equation}

Using the identity for I_X \eqref{ibeta_sum}, we have:

{\rm Pr} = \int_0^1 \left(1-\sum_{i=0}^{\alpha_A-1}\frac{p_C^i (1-p_C)^{\beta_A}}{(\beta_A+i)B(1+i,\beta_A)}\right) \left(1-\sum_{i=0}^{\alpha_B-1}\frac{p_C^i (1-p_C)^{\beta_B}}{(\beta_B+i)B(1+i,\beta_B)}\right) f(p_C) dp_C \\

Multiplying out the parenthetical terms and integrating them separately:

{\rm Pr} = 1 - \int_0^1 \sum_{i=0}^{\alpha_A-1}\frac{p_C^i (1-p_C)^{\beta_A}}{(\beta_A+i)B(1+i,\beta_A)} f(p_C) dp_C - \int_0^1 \sum_{i=0}^{\alpha_B-1}\frac{p_C^i (1-p_C)^{\beta_B}}{(\beta_B+i)B(1+i,\beta_B)} f(p_C) dp_C + \int_0^1 \sum_{i=0}^{\alpha_A-1}\frac{p_C^i (1-p_C)^{\beta_A}}{(\beta_A+i)B(1+i,\beta_A)} \sum_{i=0}^{\alpha_B-1}\frac{p_C^i (1-p_C)^{\beta_B}}{(\beta_B+i)B(1+i,\beta_B)} f(p_C) dp_C

From the previous derivation, we can rewrite the first two integrals as {\rm Pr}(p_A > p_C) and {\rm Pr}(p_B > p_C), and consolidate the terms inside the third integral:

\begin{array}{ll} {\rm Pr} &=& 1 - {\rm Pr}(p_A > p_C) - {\rm Pr}(p_B > p_C) \\ && + \int_0^1 \sum_{i=0}^{\alpha_A-1} \sum_{j=0}^{\alpha_B-1} \frac{p_C^{i+j}(1-p_C)^{\beta_A+\beta_B}}{(\beta_A+i)(\beta_B+j)B(1+i,\beta_A)B(1+j,\beta_B)} \frac{p_C^{\alpha_C-1}(1-p_C)^{\beta_C-1}}{B(\alpha_C, \beta_C)} dp_C \\ \\ &=& 1 - {\rm Pr}(p_A > p_C) - {\rm Pr}(p_B > p_C) \\ && + \int_0^1 \sum_{i=0}^{\alpha_A-1} \sum_{j=0}^{\alpha_B-1} \frac{p_C^{i+j+\alpha_C-1}(1-p_C)^{\beta_A+\beta_B+\beta_C-1}}{(\beta_A+i)(\beta_B+j)B(1+i,\beta_A)B(1+j,\beta_B)B(\alpha_C, \beta_C)} dp_C \end{array}

Finally, evaluating the integral we have:

\begin{array}{ll} {\rm Pr}(p_C > \max\{p_A, p_B\}) &=& 1 - {\rm Pr}(p_A > p_C) - {\rm Pr}(p_B > p_C) \\ && + \sum_{i=0}^{\alpha_A-1} \sum_{j=0}^{\alpha_B-1} \frac{B(i+j+\alpha_C, \beta_A+\beta_B+\beta_C)}{(\beta_A+i)(\beta_B+j)B(1+i,\beta_A)B(1+j,\beta_B)B(\alpha_C, \beta_C)} \end{array}

Implementation

In Julia (taking advantage of probability_B_beats_A defined above):

using SpecialFunctions # for logbeta

function probability_C_beats_A_and_B(α_A, β_A, α_B, β_B, α_C, β_C)
    total = 0.0
    for i = 0:(α_A-1)
        for j = 0:(α_B-1)
          total += exp(logbeta(α_C+i+j, β_A+β_B+β_C) - log(β_A+i) - log(β_B+j)
              - logbeta(1+i, β_A) - logbeta(1+j, β_B) - logbeta(α_C, β_C))
        end
    end
    return (1 - probability_B_beats_A(α_C, β_C, α_A, β_A)
              - probability_B_beats_A(α_C, β_C, α_B, β_B) + total)
end

See the hints above if you don’t have a log-beta function in your language.


A/B/C/D testing: binary outcomes

The framework may be extended indefinitely to more variants, at increasing computational expense (and algebraic complexity). A derivation of a four-variant case (computable in O(\alpha_A \alpha_B \alpha_C) time) follows.

Derivation

Beginning with

{\rm Pr}(p_D > \max\{p_A, p_B, p_C\}) = \int_0^1 \int_0^{p_D} \int_0^{p_D} \int_0^{p_D} f(p_A) f(p_B) f(p_C) f(p_D) dp_A dp_B dp_C dp_D

The three inner integrals evaluate to

{\rm Pr}(p_D > \max\{p_A, p_B, p_C\}) = \int_0^1 I_{p_D}(\alpha_A, \beta_A) I_{p_D}(\alpha_B, \beta_B) I_{p_D}(\alpha_C, \beta_C) f(p_D) dp_D

Using the identity

\begin{equation} \label{beta_inverse_identity} I_x(a, b) = 1 - I_{1-x}(b, a) \end{equation}

We can write for the moment

\begin{array}{ll} {\rm Pr}(p_D > \max\{p_A, p_B, p_C\}) &=& \int_0^1 I_{p_D}(\alpha_A, \beta_A) I_{p_D}(\alpha_B, \beta_B) (1-I_{1-p_D}(\beta_C, \alpha_C)) f(p_D) dp_D \\ &=& \int_0^1 I_{p_D}(\alpha_A, \beta_A) I_{p_D}(\alpha_B, \beta_B) f(p_D) dp_D \\ && -\int_0^1 I_{p_D}(\alpha_A, \beta_A) I_{p_D}(\alpha_B, \beta_B) I_{1-p_D}(\beta_C, \alpha_C) f(p_D) dp_D \\ \end{array}

Rewriting the first term

{\rm Pr}(p_D > \max\{p_A, p_B, p_C\}) = {\rm Pr}(p_D > \max\{p_A, p_B\}) -\int_0^1 I_{p_D}(\alpha_A, \beta_A) I_{p_D}(\alpha_B, \beta_B) I_{1-p_D}(\beta_C, \alpha_C) f(p_D) dp_D \\

And then rewriting the second term

{\rm Pr} = {\rm Pr}(p_D > \max\{p_A, p_B\}) -\int_0^1 (1-I_{1-p_D}(\beta_A, \alpha_A)) (1-I_{1-p_D}(\beta_B, \alpha_B)) I_{1-p_D}(\beta_C, \alpha_C) f(p_D) dp_D \\

Multiplying terms and separating the integrals

\begin{array}{ll} {\rm Pr} &=& {\rm Pr}(p_D > \max\{p_A, p_B\}) -\int_0^1 I_{1-p_D}(\beta_C, \alpha_C) f(p_D) dp_D \\ && +\int_0^1 I_{1-p_D}(\beta_A, \alpha_A) I_{1-p_D}(\beta_C, \alpha_C) f(p_D) dp_D \\ && +\int_0^1 I_{1-p_D}(\beta_B, \alpha_B) I_{1-p_D}(\beta_C, \alpha_C) f(p_D) dp_D \\ && -\int_0^1 I_{1-p_D}(\beta_A, \alpha_A) I_{1-p_D}(\beta_B, \alpha_B) I_{1-p_D}(\beta_C, \alpha_C) f(p_D) dp_D \\ \end{array}

Equations \eqref{ibeta_sum} and \eqref{beta_inverse_identity} imply

\begin{equation} \label{beta_inverse_sum} I_{1-p_D}(\beta, \alpha) = \sum_{i=0}^{\alpha-1} \frac{p_D^{i}(1 - p_D)^\beta}{(\beta+i)B(1+i,\beta)} \end{equation}

Rewriting the first integral using the two-variant formula, and the next two integrals using the three-variant formula,

\begin{array}{ll} {\rm Pr} &=& {\rm Pr}(p_D > \max\{p_A, p_B\}) -{\rm Pr}(p_C > p_D) \\ && -\left(1-{\rm Pr}(p_A > p_D) - {\rm Pr}(p_C > p_D) - {\rm Pr}(p_D > \max\{p_A, p_C\})\right) \\ && -\left(1-{\rm Pr}(p_B > p_D) - {\rm Pr}(p_C > p_D) - {\rm Pr}(p_D > \max\{p_B, p_C\})\right) \\ && -\int_0^1 I_{1-p_D}(\beta_A, \alpha_A) I_{1-p_D}(\beta_B, \alpha_B) I_{1-p_D}(\beta_C, \alpha_C) f(p_D) dp_D \\ \end{array}

Combining

\begin{array}{ll} {\rm Pr} &=& -2 + {\rm Pr}(p_A > p_D) + {\rm Pr}(p_B > p_D) + {\rm Pr}(p_C > p_D) \\ && + {\rm Pr}(p_D > \max\{p_A, p_B\}) \\ && + {\rm Pr}(p_D > \max\{p_A, p_C\}) \\ && + {\rm Pr}(p_D > \max\{p_B, p_C\}) \\ && -\int_0^1 I_{1-p_D}(\beta_A, \alpha_A) I_{1-p_D}(\beta_B, \alpha_B) I_{1-p_D}(\beta_C, \alpha_C) f(p_D) dp_D \\ \end{array}

Or

\begin{array}{ll} {\rm Pr} &=& 1 - {\rm Pr}(p_D > p_A) - {\rm Pr}(p_D > p_B) - {\rm Pr}(p_D > p_C) \\ && + {\rm Pr}(p_D > \max\{p_A, p_B\}) \\ && + {\rm Pr}(p_D > \max\{p_A, p_C\}) \\ && + {\rm Pr}(p_D > \max\{p_B, p_C\}) \\ && -\int_0^1 I_{1-p_D}(\beta_A, \alpha_A) I_{1-p_D}(\beta_B, \alpha_B) I_{1-p_D}(\beta_C, \alpha_C) f(p_D) dp_D \\ \end{array}

Using \eqref{beta_inverse_sum} the final integral evaluates to

\sum_{i=0}^{\alpha_A-1}\sum_{j=0}^{\alpha_B-1}\sum_{k=0}^{\alpha_C-1} \frac{B(i+j+k+\alpha_D, \beta_A + \beta_B + \beta_C + \beta_D)}{(\beta_A+1)(\beta_B+1)(\beta_C+1)B(1+i,\beta_A)B(1+j,\beta_B)B(1+k,\beta_C)B(\alpha_D,\beta_D)}

Yielding at last

\begin{array}{ll} {\rm Pr}(p_D > \max\{p_A, p_B, p_C\}) &=& 1 - {\rm Pr}(p_D > p_A) - {\rm Pr}(p_D > p_B) - {\rm Pr}(p_D > p_C) \\ && + {\rm Pr}(p_D > \max\{p_A, p_B\}) \\ && + {\rm Pr}(p_D > \max\{p_A, p_C\}) \\ && + {\rm Pr}(p_D > \max\{p_B, p_C\}) \\ && - \sum_{i=0}^{\alpha_A-1}\sum_{j=0}^{\alpha_B-1}\sum_{k=0}^{\alpha_C-1} \frac{B(i+j+k+\alpha_D, \beta_A + \beta_B + \beta_C + \beta_D)}{(\beta_A+i)(\beta_B+j)(\beta_C+k)B(1+i,\beta_A)B(1+j,\beta_B)B(1+k,\beta_C)B(\alpha_D,\beta_D)} \end{array}

Implementation

In Julia (taking advantage of probability_C_beats_A_and_B and probability_B_beats_A):

using SpecialFunctions # for logbeta

function probability_D_beats_ABC(α_A, β_A, α_B, β_B, α_C, β_C, α_D, β_D)
    total = 0.0
    for i = 0:(α_A-1)
        for j = 0:(α_B-1)
            for k = 0:(α_C-1)
              total += exp(logbeta(α_D+i+j+k, β_A+β_B+β_C+β_D) 
                           - log(β_A+i) - log(β_B+j) - log(β_C+k)
                           - logbeta(1+i, β_A) - logbeta(1+j, β_B) - logbeta(1+k, β_C) 
                           - logbeta(α_D, β_D))
            end
        end
    end
    return (1 - probability_B_beats_A(α_A, β_A, α_D, β_D)
              - probability_B_beats_A(α_B, β_B, α_D, β_D)
              - probability_B_beats_A(α_C, β_C, α_D, β_D)
              + probability_C_beats_A_and_B(α_A, β_A, α_B, β_B, α_D, β_D)
              + probability_C_beats_A_and_B(α_A, β_A, α_C, β_C, α_D, β_D)
              + probability_C_beats_A_and_B(α_B, β_B, α_C, β_C, α_D, β_D)
              - total)
end

If the above code is too slow for you, try building lookup tables to cache the results of log and logbeta.


A/B testing: count data

Analyzing count data (for example, if you’re comparing the number of sales per salesman, or number of sales week over week) requires a different formula. The probability that group 1 has a higher arrival rate than group 2 is given by:

{\rm Pr}(\lambda_1 > \lambda_2) = \sum_{k=0}^{\alpha_1-1} \frac{ (\beta_1 + \beta_2)^{-(k+\alpha_2)} \beta_1^k \beta_2^{\alpha_2} }{(k+\alpha_2)B(k+1,\alpha_2)}

Here the \alpha values represent the total event counts in each group, and the \beta values represent the exposure, that is, the relative opportunity for events to occur. For example, if the first group was “exposed” to events for twice as long as the second group, you would set \beta_1 = 2 and \beta_2 = 1. (Alternatively, you could set \beta_1 = 20 and \beta_2 = 10; the math works out the same.) In the above equation, B is the beta function.

A derivation and implementation follow; they both closely mirror the binary-outcome case.

Derivation

Not for the timid! You may want to skip right to the implementation section.

Let \lambda_1 and \lambda_2 be the Poisson parameter for each group. With a gamma-distributed prior belief, the posterior beliefs are given by:

\begin{array}{lr} \lambda_1 \sim {\rm Gamma}(\alpha_1, \beta_1) \\ \lambda_2 \sim {\rm Gamma}(\alpha_2, \beta_2) \end{array}

Using the pdf of the gamma distribution, we can get the total probability that \lambda_1 is greater than \lambda_2 by integrating the joint distribution over all values for which \lambda_1 > \lambda_2:

\begin{equation} {\rm Pr}(\lambda_1 > \lambda_2) = \int_0^\infty \int_{\lambda_2}^\infty \frac{\beta_1^{\alpha_1} \lambda_1^{\alpha_1 - 1} e^{-\beta_1 \lambda_1}}{\Gamma(\alpha_1)} \frac{\beta_2^{\alpha_2} \lambda_2^{\alpha_2 - 1} e^{-\beta_2 \lambda_2}}{\Gamma(\alpha_2)} d\lambda_1 d\lambda_2 \end{equation}

Evaluating the inner integral, the equation becomes:

\begin{equation} \label{count_ab_pr_eval_inner} \int_0^\infty Q(\alpha_1, \beta_1 \lambda_2) \frac{\beta_2^{\alpha_2} \lambda_2^{\alpha_2-1} e^{-\beta_2 \lambda_2}}{\Gamma(\alpha_2)} d\lambda_2 \end{equation}

Where Q is the upper incomplete regularized gamma function. Using the identity Q(1,z) = e^{-z} and the recursive relationship

\begin{equation} Q(a + n, z) = Q(a, z) + z^a e^{-z} \sum_{k=0}^{n-1} \frac{z^k}{\Gamma(a+k+1)} \end{equation}

we can express Q as:

\begin{equation} \label{igamma_sum} Q(n, z) = e^{-z}\sum_{k=0}^{n-1}\frac{z^k}{\Gamma(k+1)} \end{equation}

The probability integral \eqref{count_ab_pr_eval_inner} can therefore be written:

\begin{array}{ll} {\rm Pr}(\lambda_1 > \lambda_2) &=& \int_0^\infty e^{-\beta_1 \lambda_2} \left( \sum_{k=0}^{\alpha_1-1} \frac{(\beta_1 \lambda_2)^k}{\Gamma(k+1)}\right) \frac{\beta_2^{\alpha_2} \lambda_2^{\alpha_2-1} e^{-\beta_2 \lambda_2}}{\Gamma(\alpha_2)} d\lambda_2 \\ &=& \sum_{k=0}^{\alpha_1-1} \int_0^\infty \frac{e^{-\beta_1 \lambda_2} (\beta_1 \lambda_2)^k}{\Gamma(k+1)} \frac{\beta_2^{\alpha_2} \lambda_2^{\alpha_2-1} e^{-\beta_2 \lambda_2}}{\Gamma(\alpha_2)} d\lambda_2 \\ &=& \sum_{k=0}^{\alpha_1-1} \frac{\beta_1^k \beta_2^{\alpha_2}}{\Gamma(k+1) \Gamma(\alpha_2)} \int_0^\infty e^{-(\beta_1+\beta_2) \lambda_2} \lambda_2^{k+\alpha_2-1} d\lambda_2 \\ &=& \sum_{k=0}^{\alpha_1-1} \frac{\beta_1^k \beta_2^{\alpha_2}}{\Gamma(k+1) \Gamma(\alpha_2)} (\beta_1 + \beta_2)^{-(k+\alpha_2)} \Gamma(k+\alpha_2) \\ &=& \sum_{k=0}^{\alpha_1-1} \beta_1^k \beta_2^{\alpha_2} (\beta_1 + \beta_2)^{-(k+\alpha_2)} \frac{\Gamma(k+\alpha_2)}{\Gamma(k+1) \Gamma(\alpha_2)} \frac{k+\alpha_2}{k+\alpha_2} \end{array}

Using \Gamma(z+1) = z\Gamma(z), we have:

\begin{equation} \label{gamma_version} {\rm Pr}(\lambda_1 > \lambda_2) = \sum_{k=0}^{\alpha_1-1} \beta_1^k \beta_2^{\alpha_2} (\beta_1 + \beta_2)^{-(k+\alpha_2)} \frac{\Gamma(k+\alpha_2+1)}{\Gamma(k+1) \Gamma(\alpha_2)} \frac{1}{k+\alpha_2} \end{equation}

Replacing the gamma functions with a beta function, we have:

\begin{equation} \label{count_ab_pr_alpha_b} {\rm Pr}(\lambda_1 > \lambda_2) = \sum_{k=0}^{\alpha_1-1} \frac{ (\beta_1 + \beta_2)^{-(k+\alpha_2)} \beta_1^k \beta_2^{\alpha_2} }{(k+\alpha_2)B(k+1,\alpha_2)} \end{equation}

Implementation

Here’s a quick implementation of \eqref{count_ab_pr_alpha_b} in Julia:

using SpecialFunctions # for logbeta

function probability_1_beats_2(α_1, β_1, α_2, β_2)
    total = 0.0
    for k = 0:(α_1-1)
        total += exp(k * log(β_1) + α_2 * log(β_2) - (k+α_2) * log(β_1 + β_2)
            - log(k + α_2) - logbeta(k + 1, α_2))
    end
    return total
end

The beta function produces very large numbers, so if you’re getting infinite values in your program, be sure to work with logarithms, as in the code above. Your standard library’s log-beta function will come in handy here.

If you don’t have log-beta available, it’s easy enough to define one with the log-gamma function and the identity:

\log(B(a, b)) = \log(\Gamma(a)) + \log(\Gamma(b)) - \log(\Gamma(a+b))

For more life-changing tips see the implementation section for the binary-outcome case.


A/B/C testing: count data

The count data formula above can be extended to three groups:

{\rm Pr}(\lambda_1 > \max\{\lambda_2, \lambda_3\}) = 1 - {\rm Pr}(\lambda_2 > \lambda_1) - {\rm Pr}(\lambda_3 > \lambda_1) \\ + \sum_{k=0}^{\alpha_2-1} \sum_{l=0}^{\alpha_3-1} \frac{\beta_1^{\alpha_1} \beta_2^k \beta_3^l}{(\beta_1 + \beta_2 + \beta_3)^{(k+l+\alpha_1)}} \frac{\Gamma(k+l+\alpha_1)}{\Gamma(k+1) \Gamma(l+1) \Gamma(\alpha_1)}

Where:

Derivation

Start with a Bayesian belief for each of three experimental branches (A, B, and C):

\begin{array}{lr} \lambda_1 \sim {\rm Gamma}(\alpha_1, \beta_1) \\ \lambda_2 \sim {\rm Gamma}(\alpha_2, \beta_2) \\ \lambda_2 \sim {\rm Gamma}(\alpha_3, \beta_3) \end{array}

Calling the pdf of the gamma distribution g(\lambda|\alpha, \beta) = g(\lambda), we can construct the total probability that \lambda_1 is greater than \lambda_2 and \lambda_3 with a triple integral:

{\rm Pr}(\lambda_1 > \max\{\lambda_2, \lambda_3\}) = \int_0^\infty \int_0^{\lambda_1} \int_0^{\lambda_1} g(\lambda_3) g(\lambda_2) g(\lambda_1) d\lambda_3 d\lambda_2 d\lambda_1

Evaluating the two inner integrals (using the upper incomplete regularized gamma function as before), we can write:

{\rm Pr}(\lambda_1 > \max\{\lambda_2, \lambda_3\}) = \int_0^\infty (1-Q(\alpha_2, \beta_2 \lambda_1))(1-Q(\alpha_3, \beta_3 \lambda_1)) g(\lambda_1) d\lambda_1

Multiplying out the parenthetical terms and distributing the integral across the four resulting terms:

{\rm Pr} = 1 - \int_0^\infty Q(\alpha_2, \beta_2 \lambda_1) g(\lambda_1) d\lambda_1 - \int_0^\infty Q(\alpha_3, \beta_3 \lambda_1) g(\lambda_1) d\lambda_1 \\ + \int_0^\infty Q(\alpha_2, \beta_2 \lambda_1) Q(\alpha_3, \beta_3 \lambda_1) g(\lambda_1) d\lambda_1

Following the previous derivation, we can rewrite the first two integrals as {\rm Pr}(\lambda_2 > \lambda_1) and {\rm Pr}(\lambda_3 > \lambda_1), and then write the remaining Q terms as summations:

{\rm Pr} = 1 - {\rm Pr}(\lambda_2 > \lambda_1) - {\rm Pr}(\lambda_3 > \lambda_1) \\ + \int_0^\infty e^{-\beta_2\lambda_1} \left(\sum_{k=0}^{\alpha_2-1} \frac{(\beta_2 \lambda_1)^k}{\Gamma(k+1)} \right) e^{-\beta_3\lambda_1} \left(\sum_{l=0}^{\alpha_3-1} \frac{(\beta_3 \lambda_1)^l}{\Gamma(l+1)} \right) g(\lambda_1) d\lambda_1

Writing out g explicitly and consolidating terms:

{\rm Pr} = 1 - {\rm Pr}(\lambda_2 > \lambda_1) - {\rm Pr}(\lambda_3 > \lambda_1) \\ + \int_0^\infty e^{-\lambda_1(\beta_1+\beta_2+\beta_3)} \sum_{k=0}^{\alpha_2-1} \sum_{l=0}^{\alpha_3-1} \frac{\beta_1^{\alpha_1} \beta_2^k \beta_3^l \lambda_1^{(k+l+\alpha_1-1)}}{\Gamma(k+1)\Gamma(l+1)\Gamma(\alpha_1)} d\lambda_1

Bringing the integral inside the summations:

{\rm Pr} = 1 - {\rm Pr}(\lambda_2 > \lambda_1) - {\rm Pr}(\lambda_3 > \lambda_1) \\ + \sum_{k=0}^{\alpha_2-1} \sum_{l=0}^{\alpha_3-1} \frac{\beta_1^{\alpha_1} \beta_2^k \beta_3^l}{\Gamma(k+1)\Gamma(l+1)\Gamma(\alpha_1)} \int_0^\infty e^{-\lambda_1(\beta_1+\beta_2+\beta_3)} \lambda_1^{(k+l+\alpha_1-1)} d\lambda_1

And evaluating it:

{\rm Pr} = 1 - {\rm Pr}(\lambda_2 > \lambda_1) - {\rm Pr}(\lambda_3 > \lambda_1) \\ + \sum_{k=0}^{\alpha_2-1} \sum_{l=0}^{\alpha_3-1} \frac{\beta_1^{\alpha_1} \beta_2^k \beta_3^l}{\Gamma(k+1)\Gamma(l+1)\Gamma(\alpha_1)} \frac{\Gamma(k+l+\alpha_1)}{(\beta_1+\beta_2+\beta_3)^{(k+l+\alpha_1)}}

Rearranging, we finally have:

\begin{equation} \label{count_abc_pr_alpha_b} {\rm Pr} = 1 - {\rm Pr}(\lambda_2 > \lambda_1) - {\rm Pr}(\lambda_3 > \lambda_1) \\ + \sum_{k=0}^{\alpha_2-1} \sum_{l=0}^{\alpha_3-1} \frac{\beta_1^{\alpha_1} \beta_2^k \beta_3^l}{(\beta_1+\beta_2+\beta_3)^{(k+l+\alpha_1)}} \frac{\Gamma(k+l+\alpha_1)}{\Gamma(k+1)\Gamma(l+1)\Gamma(\alpha_1)} \end{equation}

It also is possible to rewrite the gamma functions in terms of a single, multivariate beta function, but that doesn’t necessarily increase the equation’s clarity.

Implementation

In Julia, leveraging the probability_1_beats_2 function above, we can compute \eqref{count_abc_pr_alpha_b} with:

using SpecialFunctions # for loggamma

function probability_1_beats_2_and_3(α_1, β_1, α_2, β_2, α_3, β_3)
    total = 0.0
    for k = 0:(α_2-1)
        for l = 0:(α_3-1)
            total += exp(α_1 * log(β_1) + k * log(β_2) + l * log(β_3) 
                - (k+l+α_1) * log(β_1 + β_2 + β_3)
                + loggamma(k+l+α_1) - loggamma(k+1) - loggamma(l+1) - loggamma(α_1))
        end
    end
    return (1 - probability_1_beats_2(α_2, β_2, α_1, β_1)
              - probability_1_beats_2(α_3, β_3, α_1, β_1) + total)
end

And that’s a wrap!


Notes

  1. An informative prior can also be used here, with the restriction that \alpha and \beta remain integers.


Changes


You’re reading evanmiller.org, a random collection of math, tech, and musings. If you liked this you might also enjoy:


Get new articles as they’re published, via LinkedIn, Twitter, or RSS.


Want to look for statistical patterns in your MySQL, PostgreSQL, or SQLite database? My desktop statistics software Wizard can help you analyze more data in less time and communicate discoveries visually without spending days struggling with pointless command syntax. Check it out!


Wizard
Statistics the Mac way

Back to Evan Miller’s home pageSubscribe to RSSLinkedInTwitter