How to do inverse transformation sampling in scipy and numpy

Let’s say you have some data that follows a certain probability distribution. You can create a histogram and visualize the probability distribution, but now you want to sample from it. How do you go about doing this with Python?

gaussian mixture

The short answer:

The long answer:

You do inverse transform sampling, which is just a method to transform a uniform random variable so that it has the probability distribution we want. The idea is that the cumulative distribution function (CDF) for the histogram you have maps the random variable’s space of possible values to the interval [0, 1]. If you invert it, you can sample uniform random numbers and transform them to your target distribution!
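Why this works can be written in one line. This is a sketch that assumes the CDF F is continuous and strictly increasing, so that the inverse F^{-1} exists; the symbols U, X and F are introduced here for illustration:

```latex
U \sim \mathrm{Uniform}(0,1),\quad X = F^{-1}(U)
\;\Longrightarrow\;
P(X \le x) = P\big(F^{-1}(U) \le x\big) = P\big(U \le F(x)\big) = F(x).
```

So X has exactly the CDF F, which is the distribution we wanted to sample from.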

How the inverse CDF looks for the above Gaussian mixture

To implement this, we calculate the CDF for each bin in the histogram (red points above) and interpolate it using scipy’s interpolate functions. Then we just need to sample uniform random points and pass them through the inverse CDF! Here is how it looks:
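The steps above could be sketched roughly as follows. This is a minimal illustration and not necessarily the original code from the post: the function name `inverse_transform_sampling`, its parameters, and the example Gaussian mixture are all assumptions made here, and `scipy.interpolate.interp1d` is just one of SciPy's interpolation functions that fits the description.

```python
import numpy as np
from scipy import interpolate


def inverse_transform_sampling(data, n_bins=40, n_samples=1000, rng=None):
    """Draw n_samples points distributed like the histogram of `data`."""
    rng = np.random.default_rng() if rng is None else rng

    # Histogram normalized so that it integrates to 1.
    hist, bin_edges = np.histogram(data, bins=n_bins, density=True)

    # CDF evaluated at every bin edge: 0 at the left edge, 1 at the right edge.
    cum_values = np.zeros(bin_edges.shape)
    cum_values[1:] = np.cumsum(hist * np.diff(bin_edges))
    cum_values /= cum_values[-1]  # guard against floating-point drift

    # Inverse CDF: swap the roles of x and y and interpolate linearly.
    # (Empty bins produce flat stretches in the CDF; linear interpolation
    # simply jumps across them.)
    inv_cdf = interpolate.interp1d(cum_values, bin_edges)

    # Uniform numbers in [0, 1) pushed through the inverse CDF.
    return inv_cdf(rng.random(n_samples))


# Example data: a two-component Gaussian mixture (parameters are arbitrary).
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2.0, 0.5, 5000), rng.normal(2.0, 1.0, 5000)])
new_samples = inverse_transform_sampling(data, n_bins=40, n_samples=5000, rng=rng)
```

Plotting histograms of `data` and `new_samples` on top of each other is a quick way to check that the two agree. The number of bins controls how much of the fine structure of the original distribution survives.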

New samples in red, original in blue

A lovely new minesweeper on Android I made

Today I am finally releasing my little minesweeper for Android! I’ve been working on this as a hobby for the past few weekends, and now it is finally smooth enough to let other people see it! The problem with most minesweeper applications on the market is that they are either really ugly or haven’t really figured out how to adapt the original mouse controls to a touchscreen. I set out to solve these two problems so I can play some mines on my phone!

To solve the ugliness problem, I drew some tiles in Photoshop in a very minimal style, to distract the eyes as little as possible and let you focus on the game. Here is how it turned out.

mines on the Nexus 7 (kind of depressing how a screenshot from a tiny tablet doesn’t fit my 15-inch MacBook Pro’s screen at 100% resolution)

To navigate the board, you can use the normal multitouch gestures like pan and pinch to zoom. To place a flag, you can long-press a tile, or you can double-tap an open tile and drag to a closed tile. These gestures won’t let you win speed competitions, but they’re pretty good if you’re lazily solving the board.

drag from an open tile…
and put a flag

You also get some pretty sweet statistics when you win or lose!

boom!

Download it here!

Gamma distribution approximation to the negative binomial distribution

In a recent data analysis project I was fitting a negative binomial distribution to some data when I realized that the gamma distribution was an equally good fit. By equally good I mean the MLE fits were numerically indistinguishable. This intrigued me. On the internet I could find only a cryptic sentence on Wikipedia saying the negative binomial is a discrete analog of the gamma, and a paper talking about bounds on how closely the negative binomial approximates the gamma, but nobody really explains why this is the case. So here is a quick physicist’s derivation of the limit for large k.
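Before any derivation, the claim is easy to check numerically. The sketch below is not the derivation from the post, and it does not use the post's parameter k, which this excerpt does not define. It uses SciPy's own (n, p) parameterization of the negative binomial and relies on the standard result that, for small p, the distribution is close to a gamma with shape n and scale 1/p. The values n = 5 and p = 0.01 are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical parameters: n successes, small success probability p.
n, p = 5, 0.01

# Support points: number of failures before the n-th success.
k = np.arange(0, 3000)

# Negative binomial pmf in SciPy's (n, p) parameterization.
nb_pmf = stats.nbinom(n, p).pmf(k)

# Gamma pdf with shape n and scale 1/p, evaluated at the same integers.
gamma_pdf = stats.gamma(a=n, scale=1.0 / p).pdf(k)

# Largest pointwise gap, compared with the height of the pmf's peak.
max_abs_diff = np.max(np.abs(nb_pmf - gamma_pdf))
peak = nb_pmf.max()
print(f"max |pmf - pdf| = {max_abs_diff:.2e}, peak pmf = {peak:.2e}")
```

For these values the two curves differ by only a few percent of the peak height, and the gap shrinks further as p gets smaller.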
