## Simple pattern formation with cellular automata

A cellular automaton is a dynamical system where space, time and dynamic variable are all discrete. The system is thus composed of a lattice of cells (discrete space), each described by a state (discrete dynamic variable) which evolve into the next time step (discrete time) according to a dynamic rule.

x_i^{t+1} = f(x_i^t, \Omega_i^t, \xi)

This rule generally depends on the state of the target cell $x_i^t$, the state of its neighbors $\Omega_i^t$, and a number of auxiliary external variables $\xi$. Since all these inputs are discrete, we can enumerate them and then define the dynamic rule by a transition table. The transition table maps each possible input to the next state for the cell. As an example consider the elementary 1D cellular automaton. In this case the neighborhood consists of only the 2 nearest neighbors $\Omega_i^t = \{x_{i-1}^t, x_{i+1}^t\}$ and no external variables.

In general, there are two types of neighborhoods, commonly classified as Moore or Von Neumann. A Moore neighborhood of radius $r$ corresponds to all cells within a hypercube of size $r$ centered at the current cell. In 2D we can write it as $\Omega_{ij}^t = \{x^t_{kl}:|i-k|\leq r \wedge |j-l|\leq r\}\setminus x^t_{ij}$. The Von Neumann neighborhood is more restrictive: only cells within a manhattan distance of $r$ belong to the neighborhood. In 2D we write $\Omega_{ij}^t = \{x^t_{kl}:|i-l|+|j-k| \leq r\}\setminus x^t_{ij}$.

Finally it is worth elucidating the concept of totalistic automata. In high dimensional spaces, the number of possible configurations of the neighborhood $\Omega$ can be quite large. As a simplification, we may consider instead as an input to the transition table the sum of all neighbors in a specific state $N_k = \sum_{x \in \Omega}\delta(x = k)$. If there are only 2 states, we need only consider $N_1$, since $N_0 = r – N_1$. For an arbitrary number $m$ of states, we will obviously need to consider $m-1$ such inputs to fully characterize the neighborhood. Even then, each input $N_k$ can take $r+1$ different values, which might be too much. In such cases we may consider only the case when $N_k$ is above some threshold. Then we can define as an input the boolean variable

P_{k,T}=\begin{cases}
1& \text{if $N_k \geq T$},\\
0& \text{if $N_k < T$}.
\end{cases}

In the simulation you can find here, I considered a cellular automaton with the following properties: number of states $m=2$; moore neighborhood with radius $r=1$; lattice size $L_x \times L_y$; and 3 inputs for the transition table:

• Current state $x_{ij}^t$
• Neighborhood state $P_{1,T}$ with $T$ unspecified
• One external input $\xi$
\xi_{ij}=\begin{cases}
1& \text{if $i \geq L_x/2$},\\
0& \text{if $i < L_x/2$}.
\end{cases}
• Initial condition $x_{ij} = 0 \; \forall_{ij}$

For these conditions a deterministic simulation of these conditions yields only a few steady states: homogeneous 1 or 0, half the lattice 1 and the other 0, and oscillation between a combination of the previous.

One possibility would be to add noise to the cellular automaton in order to provide more interesting dynamics. There are two ways to add noise to a cellular automaton:

The most straightforward way is to perform the following procedure at each time step:

• Apply the deterministic dynamics to the whole lattice
• For each lattice site $ij$, invert the state $x_{ij}$ with probability $p$

This procedure only works of course for $m=2$. In the case of more states there is no obvious way to generalize the procedure and we need to use a proper monte carlo method to get the dynamics.

A second way is to implement a probabilistic cellular automaton. In this case the transition table is generalized to a markov matrix: each input is now mapped not to a specific state but rather to a set of probabilities for a transition to each state ($m$ probabilities). Naturally for each input these sum to one. In this case we have $m$ times more parameters than before.

## Solving stochastic differential equations with theano

In systems biology we often need to solve diffusion equations of the type $$df = f(x,t) dt + g(x,t)dW$$ where W is a white noise process; they’re the most common example of a stochastic differential equation (SDE). There are only very few cases for which we can analytically solve this equation, such as when either f or g are constant or just depend linearly on x. Of course most interesting cases involve complicated f and g functions, so we need to solve them numerically.

One way to solve this is to use a straightforward variant of the Euler method. You just need to do the usual time discretization $\delta t = T/N$ (with T the total time and N the number of steps) and then update your current value with the deterministic contribution $\delta t \times f$ and the stochastic contribution $\sqrt{\delta t} \times g$. If you are not familiar with stochastic calculus you may be wondering why is the time step multiplier $\sqrt{\delta t}$ for the stochastic part.

## Simulating tissues with pressure

One small project I did was to code up a simulation of a growing tissue which feels pressure and where each cell has a dynamic state which depends on its neighbors and the pressure it feels. The idea is to reproduce some essential properties of morphogenesis. You can look at the code here. I am going to talk about the most interesting parts of the code.

I initialize the cells in an ordered lattice, with random perturbations in their positions, except those which are in the borders (bottom, left, right). Those are static and do not evolve in the simulation like the others. They are there just to represent the pressure from the rest of the body (huge) on the simulated tissue (tiny). This is not a very realistic assumption because many developmental systems have a size of the order of the body size, but we have to start somewhere!

All cells are connected with springs, which simulate adhesive and pressure forces in the tissue. If left alone, the system relaxes into a hexagonal configuration, since this minimises the spring potential energy. I integrate the harmonic oscillator equations using a fourth order Adams Moulton algorithm.

Now, it is important to realize that there are two time scales in the system: the pressure equilibration and cell lifetimes. We can assume the mechanical pressure equilibrates very fast, while cell divisions take their time. So what we do is run the oscillator system until equlibrium for each time step of the cellular state evolution, which we will talk about later.

## Modeling excitable media with cellular automata

While researching for my seminar I came across a class of cellular automata which models spiral waves in excitable media. Because these models are so simple I had some fun implementing them in processing. Processing is great because you can use the javascript version to embed the visualization in a webpage directly and everyone can play with it. Here are some of the models I played around with:

• Spiral, a model developed by Gerhard and Schuster to simulate a chemical reaction.
• CCA, a simple cyclic cellular automaton.
• StochCCA, where I added stochasticity to the previous model. This is useful to assign more weight to the 4 neighbors of the Von Neumann neighborhood than to the remaining 4 which complete the Moore neighborhood. This appears to make space “more isotropic” and makes the waves actually circular.

## Neuromorphic computing

Lately I’ve been giving some thought on quantitative measures for the brain’s processing power. Let us take the number of brain neurons as $10^{11}$ and synapses as $10^{14}$, according to wikipedia. I am going to assume a very simplified computational model for a neuron:

$$y_i = \phi\left( \sum_j w_{ij} x_j \right)$$

With $\phi$ an activation function such as a step function or a sigmoidal function. So every time a neuron spikes, a computation is being performed. From our previous numbers, the sum in the $j$ has around $10^3$ components on average, which means there are one thousand additions and one thousand multiplications performed per spike! Let’s consider each of these a floating point operation on a computer. Then we have 2000 flops (I will call flops the plural of a flop and flop/s flops per second, because the terminology is really confusing) per spike. Let us assume an action potential (a spike) for a neuron lasting 1ms as an upper bound. Then a neuron can spike up to 1000 times per second. Thus we have $1000*10^{11}*2000=2*10^{17}$ flop/s, or 200 petaflop/s.

That is only an order of magnitude higher than the fastest supercomputers at the time of writing. It seems at least a first approximation to the brain would be achievable in realtime in the next few years correct? Not with the current architectures. An obvious oversight such a simplistic calculation makes is the concurrency of the calculation. The 2000 calculations for a given spike are performed simultaneously in a neuron, whereas a traditional von neumann architecture would need to do these calculations in steps: first perform all the multiplications, then sum all the results to a single value. Even a massively parallel architecture would need one clock cycle for the multiplications, and 10 clock cycles to integrate all values (by summing them in pairs you need $log_2 (1000)$ clock cycles). The number of flop/s is the same, but you need to run your machine 11 times faster, which destroys power efficiency (you don’t want your brain consuming megawatts of power).

An even greater problem is the memory bandwidth. At each clock cycle, you need to move $10^{11}$ numbers to your computational cores, compute their new values and move them back. If each is a double we are in the order of 800GB/s each way just for the main operation (i.e. not counting any temporary variables for the sum reduction above discussed), which does seem out of reach of current supercomputers, and also not very power efficient.

The brain is not affected by these problems since the memory is an integral part of the computational infrastructure. In fact synaptic weights and connections are both the software and the memory of the brain’s architecture. Of course, the way information is encoded is not very well understood, and there are likely many mechanisms to do so. The wikipedia page on neural coding is quite interesting. In any case it is clear that synaptic weights are not enough to fully describe the brain’s architecture, as glial cells might also play a role in memory formation and neuronal connectivity. However, spike timing dependent plasticity (STDP, associated with Hebb’s rule) seems to be an adequate coarse grained description of how synapse weights are determined (at the time of writing).

With this in mind, memristors seem to be an appropriate functional equivalent to our coarse grained description of the brain. In a memristor, resistance is proportional to the intensity of the current that flows through it. Thus you can engineer a system where connections which are often used are reinforced. By combining memristive units with transistors, it is in principle possible to create an integrate and fire unit similar to a neuron. A device to emulate STDP could also be implemented. The biggest hurdle seems to be connection density. In a planar implementation, only 4 nearest neighbor connections can be implemented straightforwardly. To reach an order of 1000 connections (not necessarily with nearest neighbors) per unit, 3D structures will need to be used. At the current time however there are no promising techniques to enable the reliable construction of such structures. I foresee that self assembly will play a large role in this field, again taking heavy inspiration from the way nature does things.

In spite of these hurdles, I am excited. With progress in science getting harder each year, the only way to continue to discover nature’s secrets will be to enhance our cognitive capabilities be it through biological or electrical engineering.

## Simulating networks of nonlinear stochastic systems

arXiv:1209.3700

In this paper we attempt to find a computationally efficient way to numerically simulate networks with nonlinear stochastic dynamics. With this I mean a continuous dynamical model where the differential equation for each variable depends nonlinearly on some or all variables of the system and has additive noise. If $x$ is a vector with all variables and $\eta$ is a random vector of the same size as $x$ with some unspecified distribution, the dynamics can be compactly described as $$\frac{d x}{dt}=f(x,t)+\eta$$

The challenge lies in the nonlinearity combined with stochasticity. Were only one of them to be present, the problem would be simple. A deterministic nonlinear problem can be straightforwardly be integrated with an ODE package, while a linear stochastic system can be reduced to a system of ODEs for the moments of the probability distribution function (PDF). A full solution would require a Monte Carlo algorithm to simulate a sufficient number of paths to allow us to estimate the PDF of $x$ at each time point. For networks with many nodes we are haunted by the curse of dimensionality, as the volume needed to be sampled increases exponentially and so do the number of simulated paths required to get a good approximation of the distribution at later time points. In systems where there is a well defined mode around which most of the probability mass is concentrated we should be able to derive an analytic approximation which is more tractable. This is exactly what we try to do in the paper. Continue reading “Simulating networks of nonlinear stochastic systems”

## Information processing systems post-mortem

Slides from the talk

Yesterday I gave an informal talk about information processing systems and lessons learned from the fields of AI and biology. This was a mix of introductory information theory and some philosophical ramblings.

While creating this talk I took the time to review several concepts from machine learning and AI. In Jaynes’ book about probability theory, bayesian inference is presented as a completely general system for logic under uncertainty. The gist of the argument is that an inference system which obeys certain internal consistency requirements must use probability theory as a formal framework. A hypothetical information processing system should obey such consistency requirements when assigning levels of plausibility to all pieces of information, which means its workings should be built upon probability theory. As a bonus, all the theory is developed, so we need only apply it!

To implement such a system we make a connection with biology. I started by arguing that an organism which wants to maximise its long term population growth must be efficient at decoding environmental inputs and responding to them. Thus if we define long term viability of an organism implementing a given information processing system as a finess function, we can obtain good implementations of our system by maximising such a function.