## Parallel programming with opencl and python

In the next few posts I’ll cover my experiences with learning how to program efficient parallel programs on gpus using opencl. Because the machine I got was a mac pro with the top of the line gpus (7 teraflops) I needed to use opencl, which is a bit complex and confusing at first glance. It also requires a lot of boilerplate code which makes it really hard to just jump in and start experimenting. I ultimately decided to use pyopencl, which allows us to do the boring boilerplate stuff in just a few lines of python and focus on the actual parallel programs (the kernels).

First, a few pointers on what I read. A great introduction to the abstract concepts of parallel programming is the udacity course Introduction to parallel programming. They use C and CUDA to illustrate the concepts, which means you can’t directly apply what you see there on a computer with a non nvidia gpu. To learn the opencl api itself, I used the book OpenCL in Action: How to Accelerate Graphics and Computation. As for pyopencl, the documentation is a great place to start. You can also find all the python code I used in github. Continue reading “Parallel programming with opencl and python”

## Automatic segmentation of microscopy images

A few months back I was posed the problem of automatically segmenting brightfield images of bacteria such as this:

I thought this was a really simple problem so I started applying some filters to the image and playing with morphology operations. You can isolate dark spots in the image by applying a threshold to each pixel. The resulting binary image can be modified by using the different morphological operators, and hopefully identifying each individual cell. Turns out there is a reason people stopped using these methods in the 90s and the reason is they don’t really work. If the cells are close enough, there won’t be a great enough difference in brightness to separate the two particles and they will remain stuck.

## An introduction to smoothing time series in python. Part III: Kalman Filter

If we have a mathematical model for the system in study, we can use that information to dramatically improve the quality of our prediction. Like in the previous filtering methods, we are taking advantage of the fact that to estimate the system state at some time $t$ we can use not only the information available at that time but also the information from the past (and the future if smoothing). But model free methods tend to want to reduce large deviations from one point in time to the next, while we may actually expect that at some specific time point the system does jump drastically in value. Adding in the model allows us to take that into account.

Let’s start by looking at the Kalman Filter, which is the optimal estimator for linear and gaussian systems. Let us define such a system first in the discrete case:

$$x_{n+1} = Ax_{n}+\xi \\ y_{n+1} = Bx_{n+1}+\zeta$$

The stochastic process in $x$ is the underlying process we want to follow. Not only is the process in $x$ a brownian process (additive white noise denoted by $\xi$), we are unable to observe it directly. The observation is denoted by $y$ and is a function of $x$ corrupted by (again) additive white noise $\zeta$. The gaussian assumption is often a reasonable approximation to the problem’s noise statistics because the timescale of whichever microscopic process produces randomness is usually much smaller than the one of the actual dynamics, allowing the central limit theorem to kick in.

## Using Beautiful Soup to convert Springpad notes to Evernote

A few months back I decided to migrate all my work notes from springpad to evernote because I found evernote more robust and simpler. I still keep my recipe collection on springpad though! Looks yummy.

Anyway, surprisingly there were some scripts going from evernote to springpad but not the other way around, which is a bit suprising because both services use a sort of HTML notation to export their notes so it’s pretty simple to convert notes from one format to another. So I used the nice python library Beautiful Soup to parse the HTML and convert it to the other format.

With evernote it’s a bit tricky to get it to accept everything because any noncompliant HTML entity throws off the whole process, but after some trial and error I managed to fix it. As an aside, if you want a bit more power when editing your evernote notes, this service lets you directly edit the HTML of any note. It comes in pretty handy when you clip something from the web and it comes nested in some crazy divs. I hope evernote’s HTML format will not change too much in the near future and this script will stay helpful. In any case the header contains the evernote client version at the time, so I hope even if it changes they will recognize/honor the old version. Find the code after the jump.

## Neuromorphic computing

Lately I’ve been giving some thought on quantitative measures for the brain’s processing power. Let us take the number of brain neurons as $10^{11}$ and synapses as $10^{14}$, according to wikipedia. I am going to assume a very simplified computational model for a neuron:

$$y_i = \phi\left( \sum_j w_{ij} x_j \right)$$

With $\phi$ an activation function such as a step function or a sigmoidal function. So every time a neuron spikes, a computation is being performed. From our previous numbers, the sum in the $j$ has around $10^3$ components on average, which means there are one thousand additions and one thousand multiplications performed per spike! Let’s consider each of these a floating point operation on a computer. Then we have 2000 flops (I will call flops the plural of a flop and flop/s flops per second, because the terminology is really confusing) per spike. Let us assume an action potential (a spike) for a neuron lasting 1ms as an upper bound. Then a neuron can spike up to 1000 times per second. Thus we have $1000*10^{11}*2000=2*10^{17}$ flop/s, or 200 petaflop/s.

That is only an order of magnitude higher than the fastest supercomputers at the time of writing. It seems at least a first approximation to the brain would be achievable in realtime in the next few years correct? Not with the current architectures. An obvious oversight such a simplistic calculation makes is the concurrency of the calculation. The 2000 calculations for a given spike are performed simultaneously in a neuron, whereas a traditional von neumann architecture would need to do these calculations in steps: first perform all the multiplications, then sum all the results to a single value. Even a massively parallel architecture would need one clock cycle for the multiplications, and 10 clock cycles to integrate all values (by summing them in pairs you need $log_2 (1000)$ clock cycles). The number of flop/s is the same, but you need to run your machine 11 times faster, which destroys power efficiency (you don’t want your brain consuming megawatts of power).

An even greater problem is the memory bandwidth. At each clock cycle, you need to move $10^{11}$ numbers to your computational cores, compute their new values and move them back. If each is a double we are in the order of 800GB/s each way just for the main operation (i.e. not counting any temporary variables for the sum reduction above discussed), which does seem out of reach of current supercomputers, and also not very power efficient.

The brain is not affected by these problems since the memory is an integral part of the computational infrastructure. In fact synaptic weights and connections are both the software and the memory of the brain’s architecture. Of course, the way information is encoded is not very well understood, and there are likely many mechanisms to do so. The wikipedia page on neural coding is quite interesting. In any case it is clear that synaptic weights are not enough to fully describe the brain’s architecture, as glial cells might also play a role in memory formation and neuronal connectivity. However, spike timing dependent plasticity (STDP, associated with Hebb’s rule) seems to be an adequate coarse grained description of how synapse weights are determined (at the time of writing).

With this in mind, memristors seem to be an appropriate functional equivalent to our coarse grained description of the brain. In a memristor, resistance is proportional to the intensity of the current that flows through it. Thus you can engineer a system where connections which are often used are reinforced. By combining memristive units with transistors, it is in principle possible to create an integrate and fire unit similar to a neuron. A device to emulate STDP could also be implemented. The biggest hurdle seems to be connection density. In a planar implementation, only 4 nearest neighbor connections can be implemented straightforwardly. To reach an order of 1000 connections (not necessarily with nearest neighbors) per unit, 3D structures will need to be used. At the current time however there are no promising techniques to enable the reliable construction of such structures. I foresee that self assembly will play a large role in this field, again taking heavy inspiration from the way nature does things.

In spite of these hurdles, I am excited. With progress in science getting harder each year, the only way to continue to discover nature’s secrets will be to enhance our cognitive capabilities be it through biological or electrical engineering.