Is the Central Limit Theorem an engine for biological stability?

Biological information systems, like any others, struggle constantly with randomness. Our bodies are precision instruments that measure a great many things at the same time – light, vibrations, gas pressures, concentrations of salts and hormones, to mention a few. Any of these measurements can be thought of as a sample. Now, randomness can cause a single sample to lie quite far from the true value. A possible solution is to resample the sample! This is not intuitive, and I will explain it below. Perhaps this is the reason why many signalling pathways in biology have so many links in the chain from receiver to effector!

The Central Limit Theorem states that if you draw a sample from a population, calculate the mean of the sample, and repeat this many times, the sample means will be approximately normally distributed around the true mean of the original population, provided the population has a finite variance and the samples are reasonably large. The spread of those means shrinks as the sample size grows. This means that even if the original population has a wild distribution, the mean of a larger sample tends to lie closer to the true mean.
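
For readers who prefer symbols, here is the classical textbook form of the theorem for independent, identically distributed measurements. The notation (X_i, mu, sigma, n) is my own choice for illustration and is not tied to any particular biological quantity.

```latex
Let $X_1, \dots, X_n$ be independent draws from the same distribution
with mean $\mu$ and finite variance $\sigma^2$, and let
\[
  \bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i .
\]
Then, as $n \to \infty$,
\[
  \sqrt{n}\,\bigl(\bar{X}_n - \mu\bigr) \xrightarrow{\;d\;} \mathcal{N}(0, \sigma^2),
\]
so for large $n$ the sample mean is approximately normal with
standard deviation $\sigma / \sqrt{n}$.
```

The practical message is in the last line: averaging n measurements cuts the noise by a factor of roughly the square root of n.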

Take a look at this example to see what it means:

Image from Wikipedia

Here, the original distribution is on the top left – highly irregular. But if we take samples of two numbers at a time from this distribution and plot their means, we end up with the distribution on the top right – already a great step towards normality! With three and four in the sample, we get the bottom left and bottom right, respectively.
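
The same effect is easy to reproduce at home. Below is a minimal sketch, assuming a made-up lumpy distribution (70% of values between 0 and 1, 30% between 8 and 10) as a stand-in for the irregular one in the figure; the function names are hypothetical and only the Python standard library is used.

```python
import random
import statistics


def draw_irregular(rng):
    """One draw from a deliberately lumpy distribution (an assumption
    for illustration, not the distribution in the Wikipedia figure)."""
    if rng.random() < 0.7:
        return rng.uniform(0.0, 1.0)   # most values sit near zero
    return rng.uniform(8.0, 10.0)      # a second lump far away


def sample_means(n, trials=20000, seed=0):
    """Draw `trials` samples of size `n` and return the mean of each."""
    rng = random.Random(seed)
    return [
        statistics.fmean(draw_irregular(rng) for _ in range(n))
        for _ in range(trials)
    ]


def spread_by_sample_size(sizes=(1, 2, 3, 4, 16)):
    """Map each sample size to (mean of the means, std dev of the means)."""
    result = {}
    for n in sizes:
        means = sample_means(n)
        result[n] = (statistics.fmean(means), statistics.stdev(means))
    return result


if __name__ == "__main__":
    for n, (centre, spread) in spread_by_sample_size().items():
        print(f"n={n:2d}  mean of means={centre:.3f}  std dev of means={spread:.3f}")
```

The true mean of this toy distribution is 3.05. The mean of the means should stay close to that value for every sample size, while their standard deviation should fall roughly as one over the square root of the sample size, so samples of 16 should be about four times steadier than single draws.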

Nearly all cell surface receptors signal through a pathway of messenger molecules. Not just one, but a whole cascade. The traditional explanation for this phenomenon is that the signal can be easily amplified in this way. But perhaps the real driver is the stability of the readout that can be gained. There are similar organisational features in other places too, for example in the transmission of visual information from the retina. The signals pass through a few serially arranged neurons on their way to the visual cortex. Perhaps this is what prevents our field of view from flickering? (The rods are exquisitely sensitive and can detect a single photon.)

Perhaps I should write this up and submit it to the journal Medical Hypotheses? (This is one of the few scientific journals that require no proof whatsoever, and as a result the journal contains everything from well-supported testable hypotheses to completely far-out ideas, such as the benefits of masturbation against nasal congestion.)

This entry was posted on Wednesday, March 25th, 2009 at 1:26 pm and is filed under Information theory, Life, Science.

4 Responses to Is the Central Limit Theorem an engine for biological stability?

This is a very cool idea, but note that the central limit theorem in its classical form only applies when the underlying distributions have a finite mean and variance (plus a couple of other regularity conditions). This practically means that it doesn’t apply to samples from a Cauchy or other fat-tailed distribution that lacks a finite variance, let alone skewness and kurtosis, and that bad things happen if you force a finite-moment assumption on such samples (you get wild errors sooner or later).

It is essential to keep track of such regularity conditions when moving from abstract statistics of textbooks to real systems. What reasons can we produce to show that these conditions are met in the biological systems you mention?
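
This caveat can be checked numerically. Here is a minimal sketch, assuming a standard Cauchy distribution as the fat-tailed example and a standard normal as the well-behaved control; the function names are hypothetical. It compares the interquartile range of the sample means, because the standard deviation is not defined for the Cauchy case.

```python
import math
import random
import statistics


def cauchy_draw(rng):
    """One standard Cauchy draw via the inverse-CDF transform."""
    return math.tan(math.pi * (rng.random() - 0.5))


def normal_draw(rng):
    """One standard normal draw, used as the finite-variance control."""
    return rng.gauss(0.0, 1.0)


def iqr_of_means(draw, n, trials=4000, seed=1):
    """Interquartile range of `trials` sample means, each over `n` draws."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(draw(rng) for _ in range(n))
        for _ in range(trials)
    ]
    q1, _, q3 = statistics.quantiles(means, n=4)
    return q3 - q1


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(
            f"n={n:3d}  normal IQR={iqr_of_means(normal_draw, n):.3f}  "
            f"Cauchy IQR={iqr_of_means(cauchy_draw, n):.3f}"
        )
```

For the normal control the spread of the means should shrink steadily as n grows. For the Cauchy draws it should stay at about 2 no matter how many values are averaged, because the mean of n standard Cauchy values is itself standard Cauchy. Averaging buys nothing there.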

Interesting! Thanks for the well-informed comment!

I must admit that the mathematics of finite moments is something I don’t fully understand.

However, to a humble medical scientist like myself, it seems that the biological systems in question are in every step based on counting discrete numbers of molecules, so to speak, and therefore have to yield distributions with a defined mean, variance, and so on.

It perplexes me that I can’t find anything on PubMed or Google Scholar on this topic. Perhaps I haven’t found the right search terms. Will keep trying, and the results (if any) will appear on the blog!

Oops, I asked to be emailed about follow-ups but never got one about your comment (maybe spam filter), sorry for the delay.

It’s a strong possibility that the biological systems you’re interested in do follow the law of large numbers and the central limit theorem. I raise the issue only because in my narrow experience with biological systems, rich randomness can arise in surprising places, e.g., multi-fractal heartbeat, which might not submit itself to the central limit theorem.

I look forward to learning more about the possibilities for biological systems implementing resampling techniques in wetware.

just wanted to add that one should distinguish between the probability distribution p_i of every single realization (sample = measurement) and the probability distribution p_sum of the sum of the samples. it is the latter, suitably rescaled, that converges to the Gauss distribution.

so what a biological system needs to do in order to make use of the central limit theorem, i.e., to estimate the mean input value (a measurement like temperature, pressure, toxic stimuli…) and possibly the variance for a given p_i, is to take into account the number of measurements. this might be realized through the structure of the system itself. as far as i’m aware, robustness against noise, fluctuations, wrong measurements… in any biological information system is ensured by multiple feedback mechanisms on different timescales.
