Dimensionality reduction 101: linear algebra, hidden variables and generative models

Suppose you are faced with a high dimensional dataset and want to find some structure in the data: often there are only a few causes, but lots of different data points are generated due to noise corruption. How can we infer these causes? Here I’m going to cover the simplest method to do this inference: we will assume the data is generated by a linear transformation of the hidden causes. In this case, it is quite simple to recover the parameters of this transformation and therefore determine the hidden (or latent) variables which represent their cause.
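To make the setting concrete, here is a minimal sketch of such a linear generative model in python (all names, dimensions and the noise level are made up for illustration): a few latent variables are passed through a mixing matrix, and the latent directions are then recovered with a standard linear method (here the SVD, i.e. PCA).

import numpy as np

rng = np.random.default_rng(0)
n_latent, n_observed, n_samples = 2, 10, 500

# hidden causes: a few low dimensional latent variables
z = rng.normal(size=(n_samples, n_latent))

# a linear transformation plus observation noise produces the high dimensional data
W = rng.normal(size=(n_latent, n_observed))
x = z @ W + 0.1 * rng.normal(size=(n_samples, n_observed))

# recover the latent directions from the SVD of the centered data (PCA)
x_centered = x - x.mean(axis=0)
U, S, Vt = np.linalg.svd(x_centered, full_matrices=False)
z_estimate = x_centered @ Vt[:n_latent].T  # projection onto the top components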

Continue reading “Dimensionality reduction 101: linear algebra, hidden variables and generative models”

Quick introduction to gaussian mixture models with python

Usually we like to model probability distributions with gaussian distributions. Not only are they the maximum entropy distributions if we only know the mean and variance of a dataset, but the central limit theorem also guarantees that a random variable formed by summing many independent random variables will be approximately gaussian distributed. But what to do when we have a multimodal distribution like this one?

A gaussian mixture with 2 dimensional features

A gaussian distribution would not represent this very well. So what’s the next best thing? Add another gaussian! A gaussian mixture model is defined as a weighted sum of gaussians $$P(x)=\sum_i w_i \, \mathcal{G}(x;\, \mu_i, \Sigma_i)$$ with means $\mu_i$, covariance matrices $\Sigma_i$ and weights $w_i$ that sum to one. Continue reading “Quick introduction to gaussian mixture models with python”
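As a quick sketch of how such a mixture can be fit in python (here using scikit-learn’s GaussianMixture; the full post may well use a different implementation, and the toy data below is made up):

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# two well separated clusters of 2 dimensional features as toy multimodal data
data = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(200, 2)),
    rng.normal(loc=[3.0, 3.0], scale=0.8, size=(200, 2)),
])

# fit a mixture of two gaussians with expectation-maximisation
gmm = GaussianMixture(n_components=2, covariance_type="full").fit(data)

print(gmm.weights_)      # the weights w_i
print(gmm.means_)        # the means mu_i
print(gmm.covariances_)  # the covariance matrices Sigma_i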

Automatic segmentation of microscopy images

A few months back I was posed the problem of automatically segmenting brightfield images of bacteria such as this:

Brightfield image of a bacterial colony

I thought this was a really simple problem, so I started applying some filters to the image and playing with morphological operations. You can isolate dark spots in the image by applying a threshold to each pixel. The resulting binary image can then be cleaned up with the different morphological operators, hopefully leaving each individual cell as its own connected component. It turns out there is a reason people stopped using these methods in the 90s: they don’t really work. If two cells are close enough, there won’t be a large enough difference in brightness to separate them and they will remain stuck together.
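For reference, this is roughly the kind of threshold-and-morphology pipeline I mean, sketched with scikit-image (the filename and all parameter values are made up; this is not the code from the project):

from skimage import io, filters, morphology, measure

# load the brightfield image as a grayscale array (hypothetical path)
image = io.imread("colony.png", as_gray=True)

# cells appear dark, so threshold and keep pixels below an automatic cutoff
binary = image < filters.threshold_otsu(image)

# clean up the binary mask with morphological operations
binary = morphology.binary_opening(binary, morphology.disk(2))
binary = morphology.remove_small_objects(binary, min_size=50)

# label connected components, hoping each one is a single cell
labels = measure.label(binary)
print(labels.max(), "putative cells found")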

Continue reading “Automatic segmentation of microscopy images”

Implementing a recurrent neural network in python

In one of my recent projects I was interested in learning a regression for a quite complicated data set (I will detail the model in a later post; for now, suffice it to say that it is a high dimensional time series). The goal is to have a model which, given an input time series and an initial condition, is able to predict the output at subsequent times. One good tool to tackle this problem is the recurrent neural network. Let’s look at how it works and how to implement it easily in python using the excellent theano library.

A simple feed forward neural network consists of several layers of neurons: units which sum up the input from the previous layer plus a constant bias and pass the result through a nonlinear function (usually a sigmoid). Neural networks of this kind are known to be universal function approximators (i.e. with enough layers and/or neurons you can approximate any function arbitrarily well). This means that when you don’t have an explicit probabilistic model for your data, but just want a nonparametric model of the input-output relation, a neural network is (in theory, not necessarily in practice) a great choice. Continue reading “Implementing a recurrent neural network in python”
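To make the recurrence itself concrete before diving into theano, here is a minimal numpy sketch of the forward pass of a simple recurrent network (the weights are random and untrained, and all sizes are made up for illustration):

import numpy as np

rng = np.random.default_rng(0)
n_input, n_hidden, n_output = 3, 16, 1

# randomly initialised parameters (in practice these would be trained,
# e.g. with backpropagation through time)
W_in = 0.1 * rng.normal(size=(n_input, n_hidden))
W_rec = 0.1 * rng.normal(size=(n_hidden, n_hidden))
W_out = 0.1 * rng.normal(size=(n_hidden, n_output))
b_h = np.zeros(n_hidden)
b_y = np.zeros(n_output)

def rnn_forward(x_seq, h0):
    # run the recurrence h_t = tanh(x_t W_in + h_{t-1} W_rec + b_h)
    # and return the output sequence y_t = h_t W_out + b_y
    h = h0
    outputs = []
    for x_t in x_seq:
        h = np.tanh(x_t @ W_in + h @ W_rec + b_h)
        outputs.append(h @ W_out + b_y)
    return np.array(outputs)

# a dummy input time series and initial hidden state
x_seq = rng.normal(size=(100, n_input))
y_seq = rnn_forward(x_seq, h0=np.zeros(n_hidden))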

Kernel density estimation

Sometimes you need to estimate a probability distribution from a set of discrete points. You could build a histogram of the measurements, but that provides little information about the regions in phase space with no measurements (it is very likely you won’t have enough points to span the whole phase space). So the data set must be smoothed, as we did with time series. As in that case, we can describe the smoothing by a convolution with a kernel. In this case, the formula is very simple: $$f(x)=\frac{1}{N}\sum_{i=1}^{N} K(x-x_i)$$

The choice of K is an art, but the standard choice is the gaussian kernel, as we’ve seen before. Let’s try this out on some simulated data.
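Something along these lines, for example (a sketch with a made-up two component mixture in one dimension):

import numpy as np

rng = np.random.default_rng(0)

# a toy 1D gaussian mixture: weights, means and standard deviations of the components
weights = np.array([0.3, 0.7])
means = np.array([-2.0, 3.0])
stds = np.array([0.8, 1.5])

def sample_mixture(n):
    # pick a component for each sample according to its weight, then sample from it
    components = rng.choice(len(weights), size=n, p=weights)
    return rng.normal(means[components], stds[components])

samples = sample_mixture(1000)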

 

Gaussian mixture we will sample from

And now let’s apply KDE to some sampled data.
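A sketch of the kernel sum itself, with the gaussian kernel written out so the bandwidth sigma is explicit (the sample sizes and sigmas mirror the plots below; the mixture is the same made-up one as above):

import numpy as np

rng = np.random.default_rng(0)

def kde(x, samples, sigma):
    # f(x) = (1/N) sum_i K(x - x_i) with a gaussian kernel of width sigma
    diffs = x[:, None] - samples[None, :]
    kernels = np.exp(-0.5 * (diffs / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    return kernels.mean(axis=1)

# draw samples from the toy mixture
components = rng.choice(2, size=1000, p=[0.3, 0.7])
samples = rng.normal(np.array([-2.0, 3.0])[components], np.array([0.8, 1.5])[components])

x = np.linspace(-8.0, 10.0, 500)
f_100_wide = kde(x, samples[:100], sigma=1.0)   # 100 samples, sigma 1
f_1000_wide = kde(x, samples, sigma=1.0)        # 1000 samples, sigma 1
f_1000_narrow = kde(x, samples, sigma=0.1)      # 1000 samples, sigma 0.1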

 

100 samples, sigma 1
1000 samples, sigma 1

The choice of standard deviation makes a big difference in the final result. For small amounts of data we want a reasonably high sigma, to smooth out the large variations in the data. But if we have a lot of points, a lower sigma will represent the original distribution more faithfully:

1000 samples, sigma 0.1

Information processing systems post-mortem

Slides from the talk

Yesterday I gave an informal talk about information processing systems and lessons learned from the fields of AI and biology. This was a mix of introductory information theory and some philosophical ramblings.

While creating this talk I took the time to review several concepts from machine learning and AI. In Jaynes’ book about probability theory, bayesian inference is presented as a completely general system for logic under uncertainty. The gist of the argument is that an inference system which obeys certain internal consistency requirements must use probability theory as a formal framework. A hypothetical information processing system should obey such consistency requirements when assigning levels of plausibility to all pieces of information, which means its workings should be built upon probability theory. As a bonus, all the theory is developed, so we need only apply it!
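The workhorse of such a system is of course Bayes’ theorem, $$P(H \mid D) = \frac{P(D \mid H)\,P(H)}{P(D)}$$ which tells us how to update the plausibility of a hypothesis $H$ once we observe data $D$.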

To implement such a system we make a connection with biology. I started by arguing that an organism which wants to maximise its long term population growth must be efficient at decoding environmental inputs and responding to them. Thus, if we define the long term viability of an organism implementing a given information processing system as a fitness function, we can obtain good implementations of our system by maximising that function.

Continue reading “Information processing systems post-mortem”