Book Review - After Phrenology

Categories: Neuroscience   Book Review

Written on March 23, 2021

After Phrenology by Michael L. Anderson makes three main arguments (ordered here by how compelling I found them):

  1. Brain regions are functionally differentiated but not functionally specialized. Each brain region is differentiated, just as a hammer is different from a knife; however, each region still contributes to many different functions, just as a knife can be used not only to cut vegetables but also to spread butter and tighten screws. Anderson reviews research showing that any specific brain region is involved in a large number of different cognitive operations. This runs counter to the popular focus on modular, localized cognition that fMRI studies emphasize and that has persisted since the days of phrenology.

  2. All cognition and our study of it should be action oriented and “embodied”. Cognition interfaces with our environment and ensures our survival through the production of action. There is no point in having senses if no action can be taken in response to them! Anderson uses this interface between brain and environment to argue against popular symbol-based theories of cognition.

  3. The brain’s computational abilities are made possible by its dynamic functional connections. These connections run both between brain regions and out to the external environment. Our cognition occurs at four different levels, each with unique properties: genetic, synaptic, chemical, and environmental. The chemical level refers to extrasynaptic chemical gradients used for signalling via volume transmission, something I was previously unaware of (and I suspect the same may be true for other readers).


I believe After Phrenology effectively hammered home a deeper appreciation for just how complex the task of unravelling the secrets of cognition will be. A favourite phrase from a friend, “evolution did not optimize for interpretability”, comes to mind as evolution goes about its greedy optimization, creating a mish-mash of interdependencies between and across the brain components it has at hand. Keeping in mind just how interconnected the brain is, both within itself and with our environment, will be useful for any future research on cognition. More broadly, it is useful to be aware of our bias to assign single, modular functions to the components of any system.

After Phrenology has made me more skeptical of fMRI than I already was. It has also made me more interested in volume transmission and the dynamic routing between brain modules. This dynamic routing reminds me of a recent deep learning paper that successfully leveraged dynamic switching between a number of neuronal submodules.

Having previously been unaware of dynamicism, I found the critiques of symbol-based processing refreshing. However, I feel that there are fundamental definitional problems with Anderson’s discussion of symbols that are emblematic of the book as a whole: it is often too high level and avoids specific details about mechanism or implementation. In trying to resolve these definitional issues, I am ironically more bullish about Vector Symbolic Architectures (VSAs) than I was before reading this critique. It seems to me that VSAs can address the symbol-based issues outlined, while also being compatible with a biologically plausible implementation of Turing complete symbolic systems, as noted in this paper and shown to work powerfully in this work. Update: I have since read How to Build a Brain, which further supports my belief in the compatibility of VSAs.

The focus on action oriented learning and embodied cognition was appreciated and fit well with my general beliefs in evolutionary biology, the Free Energy Principle and Active Inference. Action orientation has interesting implications for deep learning that are supported by a recent DeepMind paper that emphasizes how all symbols must have utility for an end task and both be created by and have subjective value to the agent itself, not to us humans. The paper also acknowledges a great way to learn symbols might be through social interaction, which fits with After Phrenology’s argument that all language is for the purpose of socialization.

I believe it is also noteworthy that DeepMind’s MuZero does not decode any of its symbolic reconstructions back into pixels. This is in stark contrast to many model based deep reinforcement learning approaches like this and this where the symbolic representation the agent learns is forced to reconstruct its input1.

Beyond not forcing the learners to have symbolic representations that are interpretable to us as humans, the action oriented perspective also rejects modelling parts of the world that are not relevant to action and, by proxy, survival. This paper cleverly frames this problem as forcing the agent to only learn about things that it can directly control or be affected by. For example, a land based creature does not often need to use its limited computation to model a cloud being blown across the sky. One natural and successful way of creating this action oriented approach is by simply trying to sequentially predict the next thing, which has led to a number of very impressive successes, this and this paper come most readily to mind.

Anderson’s Support of Dynamicism

Anderson is a strong advocate for dynamicism and an opponent of symbolic processing in the brain. His argument rests on symbols being unnecessary, for four different reasons that I first sketch and then explain in more detail:

  1. Dynamical equations can be used instead of symbols – think of heuristics, closed feedback loops, or model free reinforcement learning. For example, we catch a ball using a heuristic, not solving a complex physics equation.

  2. Our environments make learning and representation problems easier – our surroundings and physical body constrain the space of decisions/actions, both through the laws of physics and through social norms. This makes it possible to decide what to represent and which actions to take without needing symbols.

  3. The environment itself can be used to store and represent symbols – rather than remembering and reasoning through everything in my head, I can write things down, use an abacus, etc.

  4. We often forget that the fundamental purpose of the brain is to increase the odds we stay alive (and, by proxy, reproduce). It is incorrect to think of the brain as an independent representation machine that takes in sensory input and performs computations on it (like our computers) without performing any physical actions in the environment.

Going into more detail on each of these points:

1. Dynamical equations can be used instead of symbols:

The “optical acceleration cancellation” heuristic used to catch something works by moving so as to cancel out the perceived vertical acceleration of the object. We perform this operation in a closed loop between our perception of the ball and our physical position in space, without needing to rely upon any symbols.
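The geometry behind the heuristic can be checked numerically. The sketch below is my own illustration, not from the book, and the launch parameters are arbitrary: for an observer standing exactly at the landing point, the tangent of the ball’s elevation angle rises at a constant rate (zero optical acceleration), while for an observer standing too deep it decelerates – the cue to run forward.

```python
import numpy as np

def tan_elevation(t, fielder_x, vx=10.0, vz=20.0, g=9.8):
    """Tangent of the ball's elevation angle as seen by a fielder at fielder_x.

    The ball is launched from the origin with horizontal speed vx and
    vertical speed vz; the fielder stands downrange of the ball.
    """
    ball_x = vx * t
    ball_z = vz * t - 0.5 * g * t ** 2
    return ball_z / (fielder_x - ball_x)

vx, vz, g = 10.0, 20.0, 9.8
flight_time = 2 * vz / g          # time until the ball lands
landing_x = vx * flight_time      # where it lands

t = np.arange(0.5, 3.51, 0.5)     # sample times well inside the flight

# Standing at the landing point: tan(elevation) grows linearly in time,
# so its discrete second difference (the "optical acceleration") is ~0.
optical_accel = np.diff(tan_elevation(t, landing_x), n=2)
print(np.allclose(optical_accel, 0.0, atol=1e-9))   # True

# Standing 10 m too deep: the image decelerates (negative optical
# acceleration), which the heuristic converts into "run forward".
too_deep = tan_elevation(t, landing_x + 10.0)
print(np.diff(too_deep, n=2).max() < 0)             # True
```

The controller never needs to know the projectile equations; it only has to keep nudging its position until the optical acceleration reads zero.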

As another example, the centrifugal governor is a clever feedback device that keeps the speed of an engine constant. The device operates in a dynamic, closed loop that can be described by a differential equation. Instead of using a differential equation, we could discretize each component of the device, represent each with a symbol, and compute how they all relate. However, Anderson and other dynamicists argue that this symbolic representation is not only unnecessarily complex but also hides the real nature of embodied cognition: constant interaction with the environment. I believe this discrepancy between the symbol-free and symbol-based views is analogous to the difference between model-free and model-based reinforcement learning, respectively. To date, the model-free programs used by many RL agents have been surprisingly successful in ways that model-based approaches have often failed to match.
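To make the dynamicist point concrete, here is a toy sketch of my own (not from the book; all constants are arbitrary) of a governor-style feedback loop, integrated directly as a one-line differential equation with no symbolic representation of valves or flyballs anywhere in the loop:

```python
# Euler integration of a centrifugal-governor-style feedback loop:
# the "valve" opening is reduced in proportion to how far the engine
# speed exceeds the set point, so the speed settles at the set point.
set_point = 100.0   # desired engine speed (arbitrary units)
load = 5.0          # torque the engine must overcome
gain = 0.5          # governor feedback gain
inertia = 2.0       # speed change per unit of net torque

omega = 50.0        # initial engine speed
dt = 0.01
for _ in range(2000):                          # simulate 20 time units
    valve = load - gain * (omega - set_point)  # governor closes valve as omega rises
    omega += inertia * (valve - load) * dt     # d(omega)/dt = inertia * net torque

print(round(omega, 3))  # 100.0
```

The loop implements d(omega)/dt = -inertia * gain * (omega - set_point), so the error decays exponentially to zero; nothing in the loop discretizes the device into symbolic parts.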

2. Our environments make learning and representation problems easier:

Many symbolic beliefs about the brain stem from the perceived difficulty of inferring sensory stimuli: converting 2D, noisy images of our surroundings into a vibrant 3D reconstruction of the world is so difficult that we must surely use inference to reconstruct an internal representation. However, Anderson argues that the combination of (i) receiving highly redundant external stimuli and (ii) the sophistication of our sensory organs makes the representation problem easier than is often assumed. The redundancy here is both temporal and contextual. For example, a single image of a mug on my retina may be difficult to infer; however, many images of this mug from different angles, and in the context of my office, make inference much easier.

Anderson goes even further, arguing that the holy grail of symbolic processing – language – is also very much a symbol-free dynamical system oriented towards social, conversational settings. Anderson argues that language may in fact be easier to learn than we assume, because only language that is easy to learn and use will propagate. Moreover, in social settings there are many conversational and behavioural norms that are followed to enable successful communication (eg. tone and posture mirroring2) and that constrain the degrees of freedom for what to say. My own poor grasp of English grammar, despite my ability to communicate and even write (with questionable success, for certain!), comes to mind as another way in which I have learnt the conventions of English without a robust, symbolic, self-contained logical understanding of how language functions.

As further evidence, while many languages can theoretically support infinite recursion, in practice we struggle to use more than three nested statements – something that is quite tractable for statistical language approaches to model.

3. The environment itself can be used to store and represent symbols:

Anderson reviews evidence that when we perform mathematics, much of our symbolic manipulation relies on visual and tactile heuristics – a far cry from siloed internal representations. For example, students who know that multiplication occurs before addition in an equation will see the multiplication symbol as being physically closer to the numbers being multiplied. Another experiment was motivated by the observation that when we perform calculations, we work from left to right to compute the answer. By putting a moving background behind math problems and altering the speed and direction of the background, researchers were able to change problem solving speed. Interestingly, they also found this effect was stronger in those who were more mathematically capable.

This wonderful quote also comes to mind:

When historian Charles Weiner looked over a pile of Richard Feynman’s notebooks, he called them a wonderful ‘record of his day-to-day work’.

“No, no!”, Feynman objected strongly.

“They aren’t a record of my thinking process. They are my thinking process. I actually did the work on the paper.”

“Well,” Weiner said, “The work was done in your head, but the record of it is still here.”

“No, it’s not a record, not really. It’s working. You have to work on paper and this is the paper. Okay?”, Feynman explained.

Source: Clive Thompson (2014). Smarter Than You Think. p. 7

4. We often forget that the fundamental purpose of the brain is to increase the odds we stay alive:

As noted earlier, without the ability to take action there is no purpose in having sensory organs, let alone process symbols of any kind. Our sensory stimuli are inexorably intertwined with our physical location and environment. There is a continual and mutually dependent feedback loop between our observations and our actions. We cannot think of the brain as being an isolated computational machine.

Anderson also points out that thinking of the brain as a computer brings with it wrong assumptions: that it uses discrete symbols, applied sequentially, in discretized time segments. Everything about cognition is continuous in time, parallelized, and performed against a high level of background activity. And aside from background noise the brain must be robust to, this background activity can be highly consequential: for example, this wild paper showed that Alzheimer’s pathology in mice was treated by pulsing light at the 40 Hz gamma frequency into their eyes!

Dynamicism is compatible with Vector Symbolic Architectures

While Anderson’s arguments are interesting, and it makes sense that we often form heuristics rather than disentangling the inner workings of the world, I do not believe his critiques conflict with all forms of symbolic processing. We may not have unique, isolated symbols for every thought and calculation, all operating under one logical framework. However, even in the simple differential equation used to catch a falling object, there are variables corresponding to the object’s and our body’s positions in space that will be represented by some neural firing pattern and can be considered symbols. In fact, Anderson directly acknowledges this when discussing how the variables in these equations are represented by vectors made up of neural population codes. Moreover, he cites work by Smolensky, one of the founders of Vector Symbolic Architectures (VSAs), on how these vectors can be combined into superpositions to represent probabilistic and multimodal outputs3. Are these variables, which can be manipulated by mathematical operations, not symbols? In addition, while the environment may be used as an external symbol system, Anderson says nothing about memories and how they are stored and recalled, even though these clearly come from inside our heads and not the environment.
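To make the superposition idea concrete, here is a minimal sketch of my own (not from the book, and only loosely in the spirit of Smolensky’s tensor-product work) using holographic reduced representations, one family of VSA: roles and fillers are random vectors, binding is circular convolution, and a single superposition vector can still be queried for an individual filler.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 2048  # high dimensionality keeps retrieved vectors distinguishable from noise

def rand_vec():
    # Random vector with expected unit norm.
    return rng.normal(0.0, 1.0 / np.sqrt(d), d)

def bind(a, b):
    # Circular convolution binds a role to a filler.
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def unbind(trace, role):
    # Circular correlation (convolution with the involution of the role)
    # approximately inverts bind().
    inv = np.concatenate(([role[0]], role[1:][::-1]))
    return bind(trace, inv)

def cos(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

agent, patient = rand_vec(), rand_vec()   # role vectors
cat, mouse = rand_vec(), rand_vec()       # filler vectors

# One vector holds the whole structure "agent=cat, patient=mouse".
sentence = bind(agent, cat) + bind(patient, mouse)

retrieved = unbind(sentence, agent)       # noisy reconstruction of `cat`
print(cos(retrieved, cat) > 0.4)          # True: clearly resembles `cat`
print(abs(cos(retrieved, mouse)) < 0.2)   # True: unrelated to `mouse`
```

The retrieved vector is only an approximation of the stored filler, which is exactly the kind of graceful, noisy, distributed symbol manipulation that seems compatible with neural population codes.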

Beyond Anderson’s critiques being compatible with VSAs, I remain skeptical of the dynamic-equation/model-free argument as an account of all learning and responses. Dynamical systems are limited4 in their computational capacities in ways that Turing machines are not. Moreover, Turing-complete systems are easy to create (they include cellular automata such as Conway’s Game of Life and Wolfram’s Rule 110) and could hypothetically be implemented by only four simple enzymes performing combinatory logic on strands of RNA. As a result, if there were compelling reasons for evolution to create a Turing machine (one that leverages symbols rather than differential equations), it would be very tractable for it to do so. And compelling reasons, with regards to the computational capacity and flexibility of Turing machines, arguably exist.
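Just how easy “easy to create” is can be seen by writing out Rule 110 in full. This is my own sketch (the rule table itself is Wolfram’s): a complete update rule in a few lines, for a system that is nonetheless provably Turing complete.

```python
def rule110_step(cells):
    """One synchronous update of Rule 110 on a circular tape of 0s and 1s."""
    n = len(cells)
    out = []
    for i in range(n):
        left, centre, right = cells[i - 1], cells[i], cells[(i + 1) % n]
        pattern = (left << 2) | (centre << 1) | right
        out.append((110 >> pattern) & 1)  # 110 = 0b01101110 encodes the rule table
    return out

# A single live cell grows the rule's characteristic left-leaning pattern.
row = [0, 0, 0, 0, 0, 1, 0, 0]
for _ in range(3):
    row = rule110_step(row)
print(row)  # [0, 0, 1, 1, 0, 1, 0, 0]
```

All of the rule’s computational power lives in a single 8-bit lookup table – a vivid illustration of how little machinery universal, symbol-shuffling computation actually requires.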

Other interesting facts

  • On the dynamic utilization of neural resources: It was known that the visual cortex is utilized in blind braille readers and that this utilization is functional, as targeted TMS was able to interfere with performance. An interesting experiment kept sighted people in total darkness for five days straight. During this time they were taught braille, and within the first few days of having no visual stimulation their visual cortex also appeared to be used for braille and to be functionally important to its comprehension. By the fifth day they made fewer errors than sighted participants who had not been blindfolded. When they returned to light after the five days, the visual cortex was no longer used for braille processing and their performance declined somewhat, becoming less statistically distinguishable from those never blindfolded. This is strong evidence for rapid plasticity between brain regions and weak evidence that recruiting more neurons improves performance.

  • We know that there are connections between tactile and visual regions of the brain, so the observed connections between these sensory modalities in sighted participants were not de novo – but they were still much stronger than usual. This dynamic utilization of extra cognitive resources is impressive in the speed at which it occurred. It also raises the question of what functional benefits came from being able to use more neural resources during this period. Could participants learn braille faster than they otherwise would have? Or retain it better? These findings deserve further investigation (and the questions may already have been answered! I have not looked beyond these observations at this point).

  • One way in which this dynamic computation occurs that is put forward by Anderson is through extrasynaptic signalling via volume transmission. One interesting study found that a reduction in hippocampal anisotropy (the variance of diffusion that occurs in different directions) was related to a reduction in learning ability for rats tested on the Morris water maze. The degree of anisotropy has also been shown to be related to aging and neurological disorders. I need to read these papers and learn more about volume transmission but my initial assumption is that by having more bias in the diffusion directions (greater anisotropy), you are able to provide more information content to specific brain regions (eg. the hippocampus) to enable learning.

  • Pyramidal neurons have synapses with only 1% of neurons in the range of their dendrites.

  • Only 5-20% of the input to the primary visual cortex comes from the senses; the remainder consists of recurrent connections from elsewhere.

  • I am more skeptical of whole brain emulation than I used to be. Even if you could record every single synapse in the human brain, you would also need to account for extrasynaptic signalling. You would also need to model neuron-glia interactions. And to the extent that embodied cognition and symbols are stored in the environment, this would also need to be taken into account.

  • Narwhal tusks, which I was under the impression had unknown utility, are in fact a sense organ tuned to salinity differentials that specify the freezing of the water’s surface overhead.

Thanks to Joe Choo-Choy for reading a draft of this piece and providing useful edits and discussion! All errors and confusions that remain are mine and mine alone.


  1. Interestingly, this is also a problem with the Free Energy Principle using Variational Inference in the way it currently does. While this does enable a probabilistic model of observations to be built, it seems like this representation should not be learnt at the pixel level; instead, a probabilistic model should be learnt in an autoregressive fashion using the chain rule of probability. 

  2. The coolest example of social mirroring I am aware of is highlighted in The Secret of Our Success by Joseph Henrich, discussing this paper. During talk-show interviews, the researchers could predict how smoothly the conversation would go based upon whether one person mimicked the behaviour of the other. This occurred when one person acknowledged they had lower social status and deferred to mirroring the other. Excerpt taken from The Secret of Our Success, pg. 125: “For example, when two people are having a positive conversational experience, getting to know one another, they will be unconsciously mimicking each other, in their body positions, vocal frequencies, movements, and facial expressions—a patterning known as the Chameleon effect. Interestingly, however, since prestige subordinates are keener on understanding what their higher ups are thinking, wanting, and believing, they engage in relatively more mimicry—that is, subordinates unconsciously mimic prestigious individuals more than vice versa. One study of vocal mimicry involved CNN’s longtime talk-show host, Larry King. Researchers analyzed the low-frequency vocal patterns used by King and his guests to see whether King altered his vocal patterns to match the guest, or vice versa. Prior research had established that one of the ways that conversationalists mimic each other is by syncing up their low-frequency vocal patterns. But who accommodates to whom? Twenty-five guests were analyzed, ranging from Bill Clinton to Dan Quayle (U.S. vice-president, 1989–93). As expected, when Larry was interviewing someone perceived to be highly prestigious, Larry shifted his vocal frequencies to match his guest’s patterns. However, when he was interviewing those perceived to be of lower status than Larry himself, it was the guests who automatically and unconsciously shifted to match Larry’s frequency. 
Larry most strongly accommodated to George Bush, a sitting American president, as well as to Liz Taylor, Ross Perot, Mike Wallace, and a presidential candidate, Bill Clinton. Meanwhile, Dan Quayle, Robert Strauss, and Spike Lee accommodated to Larry. Sometimes neither person shifted to match the other, such as when Larry interviewed a young Al Gore. These conversations were perceived as difficult, perhaps because both individuals saw themselves as being of higher status than their partner, so neither would defer.” 

  3. For a very interesting introduction to VSAs and their symbolic potential (which I am very bullish on), see this excellent paper

  4. See this for a Socratic dialogue covering the same arguments. Also see the slides I made for a journal club, but only look at them if you lack the time to read the actual paper – I really recommend reading the paper instead, as it is well written and far more detailed.