
Today I’m looking at the free energy principle (FEP) by the British neuroscientist Karl Friston. The FEP basically states that to resist the natural tendency to disorder, adaptive agents must minimize surprise. This has implications for the gemba, as you’ll see.
A good example to explain this is to say, “Successful fish typically find themselves surrounded by water, and very atypically find themselves out of water, since being out of water for an extended time will lead to a breakdown of homoeostatic (autopoietic) relations.”1
Here, the free energy refers to an information-theoretic construct:
“Because the distribution of ‘surprising’ events is in general unknown and unknowable, organisms must instead minimize a tractable proxy, which according to the FEP turns out to be ‘free energy.’ Free energy in this context is an information-theoretic construct that (i) provides an upper bound on the extent to which sensory data is atypical (‘surprising’); and (ii) can be evaluated by an organism, because it depends eventually only on sensory input and an internal model of the environmental causes of sensory input.”1
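To make the quoted bound concrete, here is a minimal numerical sketch. It assumes a toy discrete generative model with two hidden causes; the priors and likelihoods are made-up illustrative numbers, not values from the cited review. Surprise is -ln p(o). Free energy for any belief q(s) is F = sum over s of q(s)[ln q(s) - ln p(o, s)], which works out to surprise plus KL[q(s) || p(s | o)], so it can never fall below the surprise.

```python
import math

# Toy generative model (illustrative numbers only):
# two hidden causes s, and the likelihood of observing o = "surrounded by water".
prior = {"pond": 0.9, "dry_land": 0.1}          # p(s)
likelihood = {"pond": 0.95, "dry_land": 0.05}   # p(o | s)

def evidence():
    """p(o): marginal probability of the observation under the model."""
    return sum(prior[s] * likelihood[s] for s in prior)

def surprise():
    """Surprise (surprisal) = -ln p(o)."""
    return -math.log(evidence())

def posterior():
    """Exact posterior p(s | o) by Bayes' rule."""
    z = evidence()
    return {s: prior[s] * likelihood[s] / z for s in prior}

def free_energy(q):
    """Variational free energy F = sum_s q(s) [ln q(s) - ln p(o, s)]."""
    return sum(q[s] * (math.log(q[s]) - math.log(prior[s] * likelihood[s]))
               for s in q if q[s] > 0)

rough_guess = {"pond": 0.5, "dry_land": 0.5}    # an imperfect internal belief
print(f"surprise            = {surprise():.4f}")
print(f"F (rough belief)    = {free_energy(rough_guess):.4f}")
print(f"F (exact posterior) = {free_energy(posterior()):.4f}")
```

The rough belief gives a free energy well above the surprise; the exact posterior brings it down to the surprise itself. That is why an organism that cannot compute surprise directly can still push it down by minimizing free energy.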
In FEP, our brains are viewed as predictive engines, or Bayesian inference engines. This idea builds on predictive coding (or predictive processing), which traces back to the 19th-century German physician and physicist Hermann von Helmholtz. The main idea is that the brain is organized hierarchically and continually tries to predict what’s going to happen based on the sensory data it has already received.
As philosopher Andy Clark explains, our brain isn’t a cognitive couch potato waiting for sensory input to make sense of what’s going on. It’s actively predicting what’s going to happen next. This is why minimizing surprise is important.
For example, when lifting a closed container, we predict that it’s going to have a certain weight based on our previous experiences and the container’s visual signal. We’re surprised if the container is light in weight and can be lifted easily. We have similar experiences when we miss a step on the staircase. From a mathematical standpoint, we can say that when our internal model matches the sensory input, we’re not surprised.
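One common way to put a number on this is surprisal, -ln p(observation), evaluated under the brain’s prediction. Here is a minimal sketch assuming the prediction about the container’s weight can be represented as a Gaussian; the 2.0 kg mean and 0.3 kg spread are hypothetical values chosen only for illustration.

```python
import math

def gaussian_surprise(x, mean, sd):
    """Surprise -ln p(x) of observation x under a Gaussian prediction N(mean, sd^2), in nats."""
    return 0.5 * math.log(2 * math.pi * sd ** 2) + (x - mean) ** 2 / (2 * sd ** 2)

# Hypothetical prediction: the closed container "should" weigh about 2.0 kg.
predicted_mean_kg, predicted_sd_kg = 2.0, 0.3

# As expected, slightly light, and nearly empty.
for lifted_kg in (2.0, 1.8, 0.2):
    s = gaussian_surprise(lifted_kg, predicted_mean_kg, predicted_sd_kg)
    print(f"lifted {lifted_kg:.1f} kg -> surprise {s:.2f} nats")
```

Surprise is lowest when the sensed weight matches the prediction and grows quickly as the mismatch widens, which is the jolt we feel when a "full" container turns out to be empty.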
This mismatch can be quantified with the Kullback-Leibler (KL) divergence from information theory, which measures how much one probability distribution differs from another. The lower the divergence, the better the fit between the model and the sensory input, and the lower the surprise. The hierarchy works in both directions: predictions flow top down, while sensory data flow bottom up. If a prediction matches the sensory data, nothing needs to be passed up the chain. However, when there’s a significant difference between the top-down prediction and the incoming, bottom-up sensory data, that difference (the prediction error) is passed up the chain.
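Both ideas in the paragraph above can be sketched in a few lines. This is a minimal illustration assuming that the prediction and the sensed input can each be written as a probability distribution over the same three outcomes; the numbers are hypothetical. Note that KL divergence is asymmetric, and here the sensed distribution is compared against the prediction. The residual function mirrors the predictive-coding idea that only the unexplained part of the signal travels upward.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions over the same outcomes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def prediction_error(predicted, sensed):
    """Residual the prediction failed to explain; all zeros when they match."""
    return [s - p for p, s in zip(predicted, sensed)]

# Hypothetical distributions over three outcomes (say heavy, medium, light).
predicted   = [0.70, 0.20, 0.10]
close_match = [0.65, 0.25, 0.10]
poor_match  = [0.05, 0.15, 0.80]

print(f"KL, close match: {kl_divergence(close_match, predicted):.3f}")
print(f"KL, poor match:  {kl_divergence(poor_match, predicted):.3f}")
print("error passed up, exact match:", prediction_error(predicted, predicted))
print("error passed up, poor match: ",
      [round(e, 2) for e in prediction_error(predicted, poor_match)])
```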
One of my favorite examples to explain this further is to imagine that you’re in the shower with your radio playing. You can faintly hear the radio in the shower. When your favorite song plays on the radio, you feel like you can hear it better than when an unfamiliar song is played. This is because your brain is able to better predict what is going to happen, and the prediction helps smooth out the incoming auditory signals.
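The radio example can be sketched as a Bayesian combination of a prior (what you expect the next note to be) and a noisy observation (what reaches your ears through the shower). This is a minimal sketch assuming both are Gaussian; the note values, noise, and variances are invented for illustration. A familiar song is modeled as a confident prior centered on the true note, and an unfamiliar song as a vague prior around a generic guess.

```python
def fuse(prior_mean, prior_var, obs, obs_var):
    """Conjugate Gaussian update: weight prior and observation by their precisions (1/variance)."""
    prior_precision = 1.0 / prior_var
    obs_precision = 1.0 / obs_var
    post_var = 1.0 / (prior_precision + obs_precision)
    post_mean = post_var * (prior_precision * prior_mean + obs_precision * obs)
    return post_mean, post_var

true_notes = [3.0, 5.0, 4.0, 6.0, 2.0]      # hypothetical "pitch" values in the song
shower_noise = [0.8, -1.1, 0.5, -0.7, 1.2]  # fixed noise so the example is repeatable
obs_var = 1.0                               # how muffled the radio is

def mean_abs_error(prior_for):
    """Average distance between the brain's estimate and the true note."""
    errors = []
    for note, noise in zip(true_notes, shower_noise):
        heard = note + noise
        prior_mean, prior_var = prior_for(note)
        estimate, _ = fuse(prior_mean, prior_var, heard, obs_var)
        errors.append(abs(estimate - note))
    return sum(errors) / len(errors)

familiar = mean_abs_error(lambda note: (note, 0.1))    # you know what comes next
unfamiliar = mean_abs_error(lambda note: (4.0, 25.0))  # only a vague guess
raw = sum(abs(n) for n in shower_noise) / len(shower_noise)

print(f"raw signal error: {raw:.2f}")
print(f"unfamiliar song:  {unfamiliar:.2f}")
print(f"familiar song:    {familiar:.2f}")
```

With the confident prior, the estimate sits much closer to the true note than the raw, muffled signal does; the vague prior gives no real benefit.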
British neuroscientist Anil Seth has a great quote regarding the predictive processing idea, “perception is controlled hallucination.”
Clark explains this further:
“Perception itself is a kind of controlled hallucination.... [T]he sensory information here acts as feedback on your expectations. It allows you to often correct them and to refine them.
“... [T]o perceive the world is to successfully predict our own sensory states. The brain uses stored knowledge about the structure of the world and the probabilities of one state or event following another to generate a prediction of what the current state is likely to be, given the previous one and this body of knowledge. Mismatches between the prediction and the received signal generate error signals that nuance the prediction or (in more extreme cases) drive learning and plasticity.
“Predictive coding models suggest that what emerges first is the general gist (including the general affective feel) of the scene, with the details becoming progressively filled in as the brain uses that larger context—time and task allowing—to generate finer and finer predictions of detail. There is a very real sense in which we properly perceive the forest before the trees.
“What we perceive (or think we perceive) is heavily determined by what we know, and what we know (or think we know) is constantly conditioned on what we perceive (or think we perceive).
“... [T]he task of the perceiving brain is to account for (to accommodate or ‘explain away’) the incoming or ‘driving’ sensory signal by means of a matching top-down prediction. The better the match, the less prediction error then propagates up the hierarchy. The higher-level guesses are thus acting as priors for the lower-level processing, in the fashion (as remarked earlier) of so-called ‘empirical Bayes.’”
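Clark’s description of predicting the current state "given the previous one" from the "probabilities of one state or event following another," and then correcting that prediction with the received signal, closely resembles a recursive Bayesian filter. The sketch below is a minimal hidden-Markov-style illustration under assumed numbers: a hypothetical two-state world ("quiet" and "busy") with made-up transition and sensor probabilities. It is an analogy, not Clark’s or Friston’s model. The surprise of each observation plays the role of the error signal.

```python
import math

STATES = ("quiet", "busy")
# Hypothetical P(next state | current state).
TRANSITION = {"quiet": {"quiet": 0.8, "busy": 0.2},
              "busy":  {"quiet": 0.3, "busy": 0.7}}
# Hypothetical sensor model P(observation | state).
SENSOR = {"quiet": {"low_noise": 0.9, "high_noise": 0.1},
          "busy":  {"low_noise": 0.2, "high_noise": 0.8}}

def predict(belief):
    """Top-down prediction of the current state from the previous belief."""
    return {s: sum(belief[prev] * TRANSITION[prev][s] for prev in STATES)
            for s in STATES}

def update(prediction, observation):
    """Correct the prediction with the sensory signal; also return its surprise."""
    joint = {s: prediction[s] * SENSOR[s][observation] for s in STATES}
    evidence = sum(joint.values())  # predicted probability of what was sensed
    posterior = {s: joint[s] / evidence for s in STATES}
    return posterior, -math.log(evidence)

belief = {"quiet": 0.5, "busy": 0.5}
for obs in ["low_noise", "low_noise", "low_noise", "high_noise", "high_noise"]:
    belief, surprise = update(predict(belief), obs)
    print(f"{obs:10s} surprise={surprise:.2f}  P(quiet)={belief['quiet']:.2f}")
```

The first "high_noise" reading after a quiet run produces a spike in surprise. The belief is then revised, so the second one is less surprising: the error signal has driven learning.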
The question of what happens when the prediction doesn’t match is best explained by Friston:
“The free energy considered here represents a bound on the surprise inherent in any exchange with the environment, under expectations encoded by its state or configuration. A system can minimize free energy by changing its configuration to change the way it samples the environment, or to change its expectations. These changes correspond to action and perception, respectively, and lead to an adaptive exchange with the environment that is characteristic of biological systems. This treatment implies that the system’s state and structure encode an implicit and probabilistic model of the environment.”
Our brains are continuously sampling the data coming in and making predictions. When there’s a mismatch between the prediction and the data, we have three options:
1. Update our model to match the incoming data.
2. Act on the environment so that the incoming data match the model. (Or resample the incoming data.)
3. Ignore the mismatch and do nothing.
Option 3 won’t always yield good results. Option 1 is a learning process in which we update our internal model based on the new evidence. Option 2 reflects strong confidence in our internal model and in our ability to change the environment. Alternatively, we may suspect that something is wrong with the incoming data and need to gather more before proceeding.
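The three options can be compared in a toy setting. This is a minimal sketch assuming a Gaussian internal model with a fixed spread; the gemba-style numbers (an expected cycle time of 20 s against an observed 24 s) and the 0.5 step sizes are hypothetical. Perception moves the expectation toward the data, action moves the data toward the expectation, and ignoring leaves both alone.

```python
import math

def surprise(obs, expected, sd=1.0):
    """-ln N(obs; expected, sd^2): how unexpected the observation is under the model."""
    return 0.5 * math.log(2 * math.pi * sd ** 2) + (obs - expected) ** 2 / (2 * sd ** 2)

def perceive(expected, obs, rate=0.5):
    """Option 1: revise the internal model toward the data (learning)."""
    return expected + rate * (obs - expected), obs

def act(expected, obs, gain=0.5):
    """Option 2: change the world (or how it is sampled) so the data move toward the model."""
    return expected, obs + gain * (expected - obs)

def ignore(expected, obs):
    """Option 3: leave both the model and the world unchanged."""
    return expected, obs

expected, observed = 20.0, 24.0  # hypothetical: expected 20 s cycle time, observed 24 s
print(f"before  : {surprise(observed, expected):.2f}")
for option in (perceive, act, ignore):
    e, o = option(expected, observed)
    print(f"{option.__name__:8s}: {surprise(o, e):.2f}")
```

In this symmetric toy, perception and action shrink the mismatch by the same amount. Which one is appropriate depends on how much you trust the model versus the data, which is the judgment the options above describe.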
The ideas from FEP can also deepen our understanding of how we balance maintaining the status quo (exploit) against going outside our comfort zone (explore). To paraphrase the English polymath George Spencer-Brown, “the first act of cognition is to differentiate (act of distinction).” We start by differentiating: me versus everything else. We experience and “bring forth” the world around us by constructing it inside our mind. Because the world is so complex, this construction must be a simplified version of it. We care only about the correlations that matter to us in our local environment, because those matter most for our survival and sustenance.
This leads to a tension. In the short term, we want to look for things that confirm our hypotheses and maintain the status quo. In the long run, however, that alone won’t sustain us; we must also explore and look for things we don’t yet know about. This longer-term view prepares us to adapt to an ever-changing environment. We need a balance between the two.
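One way the active-inference literature formalizes this balance is to score options by both their expected payoff (pragmatic value, exploit) and the information they are expected to yield (epistemic value, explore). The sketch below is a simplified illustration in that spirit, not Friston’s expected-free-energy formula. It assumes two hypothetical options, each with a belief over three possible success rates, scored as expected success plus a "curiosity" weight times the expected information gain (the mutual information between the next outcome and the success rate). All numbers are invented for illustration.

```python
import math

RATES = [0.2, 0.5, 0.8]  # hypothetical possible success rates for an option

def binary_entropy(p):
    """Entropy of a yes/no outcome with success probability p, in nats."""
    return -sum(x * math.log(x) for x in (p, 1 - p) if x > 0)

def expected_success(belief):
    """Pragmatic (exploit) value: belief-weighted success probability."""
    return sum(b * r for b, r in zip(belief, RATES))

def info_gain(belief):
    """Epistemic (explore) value: mutual information between next outcome and rate."""
    return binary_entropy(expected_success(belief)) - sum(
        b * binary_entropy(r) for b, r in zip(belief, RATES))

def score(belief, curiosity):
    return expected_success(belief) + curiosity * info_gain(belief)

OPTIONS = {
    "familiar routine": [0.05, 0.15, 0.80],  # well understood, usually works
    "new approach": [1 / 3, 1 / 3, 1 / 3],   # little is known about it yet
}

def choose(curiosity):
    return max(OPTIONS, key=lambda name: score(OPTIONS[name], curiosity))

for curiosity in (0.0, 5.0):
    print(f"curiosity={curiosity}: choose {choose(curiosity)}")
```

With no weight on information, the agent sticks with the familiar routine; with enough weight, the poorly understood option becomes worth trying because there is more to learn from it.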
The FEP can scale from “I model the world” to “We model the world” to “We model ourselves modeling the world.” As part of a larger human system, we can co-create a shared model of our environment and collaborate to minimize free energy, which sustains us as a society.
Final words
FEP is a fascinating field, and I encourage readers to explore the works of Karl Friston, Andy Clark, and others. I’ll finish with an insight from Friston, who suggests that minimizing free energy is also a way to recognize one’s existence:
“Avoiding surprises means that one has to model and anticipate a changing and itinerant world. This implies that the models used to quantify surprise must themselves embody itinerant wandering through sensory states (because they have been selected by exposure to an inconstant world): Under the free-energy principle, the agent will become an optimal (if approximate) model of its environment. This is because, mathematically, surprise is also the negative log-evidence for the model entailed by the agent. This means minimizing surprise maximizes the evidence for the agent (model).
“Put simply, the agent becomes a model of the environment in which it is immersed. This is exactly consistent with the Good Regulator theorem of Conant and Ashby (1970). This theorem, which is central to cybernetics, states that ‘every Good Regulator of a system must be a model of that system.’... Like adaptive fitness, the free-energy formulation is not a mechanism or magic recipe for life; it is just a characterization of biological systems that exist. In fact, adaptive fitness and (negative) free energy are considered by some to be the same thing.”
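Friston’s point that surprise is the negative log-evidence for the model the agent embodies can be illustrated with a toy model comparison. The sketch below assumes independent yes/no observations and two hypothetical fixed models, one that expects to be in water and one that expects to be in air. It illustrates the identity only and is not a model from Friston’s work.

```python
import math

def total_surprise(observations, p_in_water):
    """Sum of -ln p(o) for a model expecting 'water' with probability p_in_water.

    For independent observations this equals the model's negative log-evidence.
    """
    return -sum(math.log(p_in_water if o == "water" else 1 - p_in_water)
                for o in observations)

observations = ["water"] * 9 + ["air"]  # hypothetical: a fish's life, mostly in water
models = {"expects water": 0.9, "expects air": 0.1}

for name, p in models.items():
    s = total_surprise(observations, p)
    print(f"{name:14s} surprise={s:.2f}  log-evidence={-s:.2f}")
```

The model that is least surprised by the fish’s actual experience is, by the same arithmetic, the one with the most evidence; in that sense a successful fish "is" a good model of water.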
Always keep on learning...
1 Buckley, Christopher L.; Kim, Chang Sub; McGregor, Simon; and Seth, Anil K. “The free energy principle for action and perception: A mathematical review.” Journal of Mathematical Psychology, vol. 81, 2017.
Published Jan. 19, 2025, in Harish Jose’s blog.