Probabilistic programming from scratch

Update 2018-09-09: If you're interested in the topic of this post, you might enjoy the talk I gave at QCon NY in June 2018. That talk combines the pedagogical code in the O'Reilly Orioles discussed below, with aspects of the more production/enterprise-oriented work on probabilistic programming we did at Fast Forward Labs.

O’Reilly just published a series of Orioles (interactive video tutorials) by me about probabilistic programming in Python. You can read the highlights for free. A Safari login is required for the Orioles themselves.

The approach: from scratch, by simulation

My goal in the series is that you understand every line of code. I attempt to ensure this by implementing things from scratch, and choosing a Bayesian inference algorithm that is particularly transparent and non-mathematical.

[Figure: Posterior comparison]

The “from scratch” approach means I don’t import any libraries. This was inspired by Joel Grus’s great book Data Science from Scratch, which uses core data structures (lists, dictionaries, etc.) to implement fairly sophisticated algorithms such as Naive Bayes and feed-forward neural networks.

The series also draws from Jake VanderPlas’s PyCon talk Statistics for Hackers. In that talk, Jake shows that, by taking advantage of the fact that computers can repeat things very quickly, it’s often simplest to simulate your way to the answer to a statistical question. With this approach, you avoid the need to memorize a zoo of frequentist statistical tests of dubious applicability. Peter Norvig took a similar approach in his beautiful notebooks on Probability (A Concrete Introduction and Probability, Paradox and the Reasonable Person Principle).
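To give a flavor of that style (this toy example is mine, not Jake's or Peter's), here's how you might answer a simple statistical question, the chance of seeing 22 or more heads in 30 tosses of a fair coin, by simulation rather than by reaching for a binomial test:

import random

def prob_at_least(n_heads, n_tosses=30, n_trials=100_000):
    """Estimate P(at least n_heads in n_tosses of a fair coin) by simulation."""
    hits = 0
    for _ in range(n_trials):
        heads = sum(random.random() < 0.5 for _ in range(n_tosses))
        if heads >= n_heads:
            hits += 1
    return hits / n_trials

print(prob_at_least(22))  # prints roughly 0.008

No formulas and no lookup tables: just repeat the experiment a hundred thousand times and count.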

If you have a non-mathematical programming background, or your calculus is a little rusty, or you’re the kind of person who likes to compose together simple well-understood tools to answer complex questions, you might like this approach.

Approximate Bayesian Computation

In real-world problems, with lots of data, many unknown parameters, and hierarchical structure, inference is usually done with industrial-strength probabilistic programming systems such as PyMC3 or Stan. These systems embed well-tested and extremely fast implementations of algorithms such as Hamiltonian Monte Carlo with the No-U-Turn Sampler, or Automatic Differentiation Variational Inference. These algorithms are great, but they require a deep mathematical background to truly understand.

Instead, I use Approximate Bayesian Computation (ABC). ABC may not be the fastest algorithm, but it's not wrong, and if you can wait long enough, it always works. And crucially, given my goal that you understand every line of code, it's a great example of the "simulate your way to the answer" approach that relies on the computer's ability to repeat calculations.

Here’s the entire Approximate Bayesian Computation algorithm:

  1. Draw a sample from the prior
  2. Simulate the process you observed, assuming the sample from the prior is the correct value
  3. If the outcome of the simulation matches the data you observed, the sample from the prior is now a sample from the posterior
  4. Repeat 1-3 until you have enough samples from the posterior to answer a concrete statistical question.

In Python, that looks like this:

def posterior_sampler(data, prior_sampler, simulate):
    """Yield samples from the posterior by Approximate Bayesian Computation."""
    for p in prior_sampler:
        if simulate(p) == data:
            yield p

It’s up to the user of this function to write prior_sampler (a generator that yields samples from the prior) and simulate (a function that simulates the process you observed).

But otherwise, that’s it. Short and sweet!

If you don’t know what the prior or posterior are, read the highlights article on the O’Reilly website.

And if you have a Safari login and want to see this algorithm used to solve real problems (including online learning), the connection with Bayes's Theorem made more explicit, and a comparison with PyMC3, check out the Orioles.