Steven Pinker, author of The Blank Slate, has released a new book titled When Everyone Knows That Everyone Knows. It’s about “common knowledge”: instances when everyone knows something, everyone knows that everyone knows it, and so on. This is linked to game theory, and Pinker covers several “games” throughout the book. Pinker promises that studying common knowledge can reveal the “Mysteries of Money, Power, and Everyday Life”. He promises a lot:
In this book I’ll expand on that theory and show how common knowledge also explains fundamental features of societal organization, such as political power and financial markets; some of the design specs of human nature, such as laughter and tears; and countless curiosities of private and public life, such as bubbles and crashes, road rage, anonymous donations, long goodbyes, revolutions that come out of nowhere, social media shaming mobs, and academic cancel culture.
He massively overpromised. The book begins as a dreadful slog; all of it should just say:
[Voss’s Rewrite] Common knowledge fulfills three criteria:
(1) Everyone knows it
(2) (1) is known to each person
(3) (2) is known to each person
I believe coordination powers are enhanced when these criteria are satisfied. Human intelligence likely evolved to support common knowledge. Much of instinctive human political behavior, personal and large-scale, probably evolved to spread common knowledge.
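For reference, the textbook formalization that Pinker’s sources use goes further than three criteria; it is an infinite tower of “everyone knows” operators. Here is a minimal sketch in standard epistemic-logic notation (my recap of the usual definition, not anything from the book):

```latex
% K_i p : agent i knows p;   E p : everyone knows p;   C p : p is common knowledge
\[
  E\,p \;\equiv\; \bigwedge_{i} K_i\, p,
  \qquad
  C\,p \;\equiv\; E\,p \wedge E\,E\,p \wedge E\,E\,E\,p \wedge \cdots
       \;=\; \bigwedge_{n \ge 1} E^{\,n} p .
\]
```

It is exactly this all-or-nothing, infinitely iterated definition that I take issue with below.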
I get it, the book is written for an IQ range that can’t handle a reasonable informational density. But there isn’t even an article or chapter for smart people hidden somewhere inside. Pinker fails to make his overly broad case; what he does show is merely common sense.
It’s trivial and obvious that if knowledge affects human behavior, it will affect it more the more widespread it is. That’s enough reason for blank-slatists to oppose the spread of HBD knowledge. You don’t need this overly discrete, artificial concept of “common knowledge” to explain it. Apparently Pinker secretly agrees, because his chapter 8 on cancel culture is just a long whinge session. He doesn’t tie cancel culture into his “theory” at all.
I don’t see why I, as a quantitative sociobiologist who studies memetics, should assign special significance to “common knowledge” instead of just using a continuous measure of how widespread an idea is in a population. His book fails to make the case for this. Are his readers really so low-information that it never occurred to them that widespread ideas might differ from niche ideas?
And game theory still isn’t relevant to real human science. Pinker doesn’t make a case otherwise. Dare I say he only uses it because he is, or was (he’s 71 years old), personal friends with the puzzleheads who worked on it. The reason game theory is irrelevant is that it doesn’t fit real life. The model assumptions are unrealistic and false, just like those of neoclassical microeconomics. Real life has massive populations, not a handful of players. There are rarely hard restrictions on communication. And people aren’t working toward a well-defined reward; they’re obeying an ensemble d’instincts¹ without always getting clear feedback. Statistical models can fit this and yield measurable parameters related to information spread and behavioral change; game-theoretic models never yield measurables, which makes them unscientific.
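To show what I mean by a continuous measure with measurable parameters, here is a minimal sketch with synthetic data; the logistic diffusion curve and the parameter names are my choices for illustration, not anything from Pinker or the game theorists. You fit the curve to the fraction of a population holding an idea over time, and the fitted ceiling, spread rate, and midpoint are the measurables.

```python
# A minimal sketch: fit a logistic diffusion curve to (synthetic) data on what
# fraction of a population holds an idea over time. The fitted parameters
# K (ceiling), r (spread rate), and t0 (midpoint) are the kind of measurable
# quantities I mean. The data and parameter values are made up for illustration.
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, K, r, t0):
    return K / (1.0 + np.exp(-r * (t - t0)))

rng = np.random.default_rng(0)
t = np.linspace(0, 20, 40)                        # e.g. years since the idea appeared
true_share = logistic(t, K=0.6, r=0.8, t0=10.0)   # hypothetical "true" diffusion
observed = np.clip(true_share + rng.normal(0, 0.02, t.size), 0, 1)

(K_hat, r_hat, t0_hat), _ = curve_fit(logistic, t, observed, p0=[0.5, 0.5, 8.0])
print(f"ceiling={K_hat:.2f}, spread rate={r_hat:.2f}, midpoint={t0_hat:.1f}")
```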
I was already familiar with, and critical of, game theory, although I suppose this book could introduce it to an intelligent 15-year-old. The only thing in the text that was new to me was the Aumann Agreement Theorem, a relatively trivial 1976 result by Pinker’s friend Robert Aumann. Pinker’s other friend Scott Aaronson has a good blog post on it, which Pinker cites.
I think a critical reading of this post is a pretty good way to see the issues with the theorem from a sociobiological perspective.
I’ll start with the “Muddy Children Puzzle,” which is one of the greatest logic puzzles ever invented. How many of you have seen this one?
OK, so the way it goes is, there are a hundred children playing in the mud. Naturally, they all have muddy foreheads. At some point their teacher comes along and says to them, as they all sit around in a circle: “stand up if you know your forehead is muddy.” No one stands up. For how could they know? Each kid can see all the other 99 kids’ foreheads, so knows that they’re muddy, but can’t see his or her own forehead. (We’ll assume that there are no mirrors or camera phones nearby, and also that this is mud that you don’t feel when it’s on your forehead.)
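For context, the excerpt stops before the puzzle’s standard resolution: the teacher then publicly announces that at least one forehead is muddy and repeats the question, and with k muddy children, all of them stand up on the k-th round. Here is a minimal simulation of that induction, on the puzzle’s own perfect-logician assumptions (my sketch, not code from Aaronson or Pinker):

```python
# Standard muddy-children induction. After the public announcement "at least
# one of you is muddy", a child who sees m muddy foreheads reasons: "if those
# m kids were the only muddy ones, they would all have stood up by round m;
# nobody has stood up, so I must be muddy too", and stands on round m + 1.
def simulate(muddy):
    """muddy[i] is True iff child i has a muddy forehead (at least one must)."""
    n = len(muddy)
    stood_up = [False] * n
    round_no = 0
    while not any(stood_up):
        round_no += 1
        for i in range(n):
            seen = sum(muddy[j] for j in range(n) if j != i)  # foreheads child i can see
            if seen == round_no - 1:
                stood_up[i] = True
    return round_no, sum(stood_up)

print(simulate([True] * 100))                 # (100, 100): everyone stands on round 100
print(simulate([True, True] + [False] * 98))  # (2, 2): the two muddy kids stand on round 2
```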
The problem is that even without mirrors, every kid knows he is a kid, that he was playing in the mud, and that 99 out of 99 of the other kids who were playing with him have muddy foreheads. Therefore, each infers that he himself has a muddy forehead. Consequently, in real life, they will stand up. Modeling the logic of agents who behave in a way that humans don’t behave is not going to be very useful to sociobiology. The example motivating the theorem is already pointing us down the wrong path.
But seriously, let me give you an example I stole from Steven Pinker, from his wonderful book The Stuff of Thought. Two people of indeterminate gender—let’s not make any assumptions here—go on a date. Afterward, one of them says to the other: “Would you like to come up to my apartment to see my etchings?” The other says, “Sure, I’d love to see them.”
This is such a cliché that we might not even notice the deep paradox here. It’s like with life itself: people knew for thousands of years that every bird has the right kind of beak for its environment, but not until Darwin and Wallace could anyone articulate why (and only a few people before them even recognized there was a question there that called for a non-circular answer).
In our case, the puzzle is this: both people on the date know perfectly well that the reason they’re going up to the apartment has nothing to do with etchings. They probably even both know the other knows that. But if that’s the case, then why don’t they just blurt it out: “would you like to come up for some intercourse?” (Or “fluid transfer,” as the John Nash character put it in the Beautiful Mind movie?)
So here’s Pinker’s answer. Yes, both people know why they’re going to the apartment, but they also want to avoid their knowledge becoming common knowledge. They want plausible deniability. There are several possible reasons: to preserve the romantic fantasy of being “swept off one’s feet.” To provide a face-saving way to back out later, should one of them change their mind: since nothing was ever openly said, there’s no agreement to abrogate. In fact, even if only one of the people (say A) might care about such things, if the other person (say B) thinks there’s any chance A cares, B will also have an interest in avoiding common knowledge, for A’s sake.
Married couples don’t do this when alone. Unfamiliar couples may do this when alone because premarital sex is risky and the man is trying to tempt the woman into it very slowly. She’ll get scared and say no if she is asked to full-on consent to sex at the elevator door. Couples who are overheard are not trying to evade common knowledge; in a marriage, it is common knowledge that you have sex with your spouse. You do not announce to others specifically when you are planning on having sex because that is pornographic in nature; it may inspire feelings of arousal or envy in others and lead to conflict.
For a bunch of self-styled logicians, the logic here is extremely facile. I guess logic isn’t a substitute for deep subject-matter expertise. This is what happens when a group of puzzle-brained pure math people with no real training in human behavioral biology, genetics, and evolution try to tackle these issues.
OK, now for a darker example of common knowledge in action. If you read accounts of Nazi Germany, or the USSR, or North Korea or other despotic regimes today, you can easily be overwhelmed by this sense of, “so why didn’t all the sane people just rise up and overthrow the totalitarian monsters? Surely there were more sane people than crazy, evil ones. And probably the sane people even knew, from experience, that many of their neighbors were sane—so why this cowardice?” Once again, it could be argued that common knowledge is the key. Even if everyone knows the emperor is naked; indeed, even if everyone knows everyone knows he’s naked, still, if it’s not common knowledge, then anyone who says the emperor’s naked is knowingly assuming a massive personal risk. That’s why, in the story, it took a child to shift the equilibrium. Likewise, even if you know that 90% of the populace will join your democratic revolt provided they themselves know 90% will join it, if you can’t make your revolt’s popularity common knowledge, everyone will be stuck second-guessing each other, worried that if they revolt they’ll be an easily-crushed minority. And because of that very worry, they’ll be correct!
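The mechanism being described is a bare coordination game. As a toy sketch using the quoted numbers (the model choice is mine, not Aaronson’s): 90% of people would join, but only if they expect at least 90% participation, so both “nobody revolts” and “90% revolt” are self-consistent outcomes, and which one you land on depends on what expectation can be made common.

```python
# Toy version of the quoted revolt scenario: 90% of people will join a revolt,
# but only if they expect at least 90% participation. Both "nobody moves" and
# "the 90% all move" are self-fulfilling; the expectation people share decides
# which. Numbers come from the quoted passage; the model itself is my sketch.
WILLING = 0.90     # fraction who would join given enough expected support
THRESHOLD = 0.90   # participation each of them needs to expect before joining

def actual_turnout(expected_turnout):
    return WILLING if expected_turnout >= THRESHOLD else 0.0

for expectation in (0.00, 0.89, 0.90):
    actual = actual_turnout(expectation)
    tag = "self-fulfilling" if abs(actual - expectation) < 1e-9 else "not an equilibrium"
    print(f"expected {expectation:.2f} -> actual {actual:.2f}  ({tag})")
```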
These regimes were/are popular. Thinking otherwise is getting one-shotted by enemy propaganda. None of these regimes abused the majority, only minorities, so their popularity was never undermined.
(4) When I first learned about this stuff 12 years ago, it seemed obvious to me that a lot of it could be dismissed as irrelevant to the real world for reasons of complexity. I.e., sure, it might apply to ideal reasoners with unlimited time and computational power, but as soon as you impose realistic constraints, this whole Aumannian house of cards should collapse. As an example, if Alice and Bob have common priors, then sure they’ll agree about everything if they effectively share all their information with each other! But in practice, we don’t have time to “mind-meld,” swapping our entire life experiences with anyone we meet. So one could conjecture that agreement, in general, requires a lot of communication. So then I sat down and tried to prove that as a theorem. And you know what I found? That my intuition here wasn’t even close to correct!
In more detail, I proved the following theorem. Suppose Alice and Bob are Bayesians with shared priors, and suppose they’re arguing about (say) the probability of some future event—or more generally, about any random variable X bounded in [0,1]. So, they have a conversation where Alice first announces her expectation of X, then Bob announces his new expectation, and so on. The theorem says that Alice’s and Bob’s estimates of X will necessarily agree to within ±ε, with probability at least 1-δ over their shared prior, after they’ve exchanged only O(1/(δε²)) messages. Note that this bound is completely independent of how much knowledge they have; it depends only on the accuracy with which they want to agree! Furthermore, the same bound holds even if Alice and Bob only send a few discrete bits about their real-valued expectations with each message, rather than the expectations themselves.
Yeah, so let’s try to apply this to mutational load theory. It should apply cleanly. I started out with priors that were pretty much normal for this sphere. After reading about it extensively, my prior on the mutational pressure on leftism moved from something like N(0, 0.01) to N(0.07, 0.02). I’d like to induce agreement to within epsilon = 0.01. I am not really sure from his explanation what delta should be. The probability it converges to agreement within epsilon? I’d like that to be 99%, I guess. Or maybe 90%, which could be interpreted as: I have a protocol that takes a certain amount of time, which I’m solving for here, and it works 90% of the time. That means if I use it on 10 people from the broader HBD sphere who start with the same priors I started with (a fair assumption), 9 out of 10 will converge to where I want. So delta, the allowed failure probability, can be 0.10.
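Plugging those numbers into the quoted bound, and generously treating the constant hidden by the big-O as 1 (an assumption on my part, since the theorem as quoted doesn’t give it):

```python
# Back-of-the-envelope check of the O(1/(δ·ε²)) message bound, with the
# big-O constant taken to be 1 (an assumption; the real constant is unknown).
def message_bound(delta, eps):
    return 1 / (delta * eps**2)

print(f"{message_bound(0.10, 0.01):,.0f} messages")  # 100,000
print(f"{message_bound(0.01, 0.01):,.0f} messages")  # 1,000,000
```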
So that comes out to exchanging a whopping 100,000 messages. If I change delta to 0.01, that would be 1,000,000 messages. How long do I think it takes to exchange a message? What even is a message in this situation?
There’s no way to tell. All I can think is that if they’re exactly like me, and they have the exact same priors I had, then they’ll agree with me after reading everything I read. I’d estimate this would take them between 100 and 500 hours of reading, potentially long enough to learn an entire foreign language. Because you’ve got to read and understand all of the studies on the topic, all of the books, and this requires a certain level of mathematical and statistical knowledge which is beyond an entire undergraduate degree in statistics, and then you need a ton of sociobiological knowledge on top of that. So it really does take a long time, just like learning a new language.
In this case, Aaronson’s theorem has become pointless because it’s impossible to map it onto reality. His original intuition is not so debunked after all.
So I think in some cases disagreement is driven by the fact that only experts can know the truth with certainty, and it’s too costly for most people to become an expert. One could apply this to global warming, vaccines, mutational load, and even other HBD topics. One question, though, is how honest non-expert dissent is. How do we treat internal credibility intervals in people without expertise? Is it ever “honest” to oppose experts? Is there some expectation that a Bayesian agent knows it should have less certainty than an expert agent? Or do perfect Bayesians struggle with certainty bounds? They haven’t addressed that either, but it’s very important in real life. Instead they have effectively denied that expertise is difficult, and it isn’t reasonable to assume that expertise can be common knowledge. They have done the opposite of dealing with this issue in a realistic, scientific way.
The other major issue is the common priors assumption. Now, it’s unclear how to treat credibility, but I will say that any prior should get drowned out by sufficient data. In discourses where data is thin, priors will predominate. But to non-experts, data is always thin, since they don’t extensively engage with what is there. So in thin discourses, or among non-experts who nonetheless have to act, you should expect disagreement to be driven by different priors.
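Here is what I mean by priors getting drowned out, as a minimal sketch with made-up numbers: two agents start with sharply different Beta priors over the same unknown rate, see the same data, and their posterior means converge.

```python
# Two agents with very different Beta priors over an unknown rate observe the
# same coin flips; with enough data their posterior means converge. All numbers
# here (priors, true rate, sample sizes) are hypothetical, for illustration only.
import random

random.seed(42)
TRUE_RATE = 0.7
priors = {"Alice": (1, 9), "Bob": (9, 1)}   # Beta(a, b) priors with means 0.1 and 0.9

flips = [random.random() < TRUE_RATE for _ in range(10_000)]

for n in (0, 10, 100, 10_000):
    heads = sum(flips[:n])
    posterior_means = {
        name: (a + heads) / (a + b + n)      # Beta-Binomial posterior mean
        for name, (a, b) in priors.items()
    }
    print(n, {name: round(m, 3) for name, m in posterior_means.items()})
```

With no data they disagree by 0.8; after ten thousand shared observations they agree to within a thousandth. The interesting case is precisely the thin-data one.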
What do they say about this?
(2) Or—and this is an obvious one—you could reject the assumption of common priors. After all, isn’t a major selling point of Bayesianism supposed to be its subjective aspect, the fact that you pick “whichever prior feels right for you,” and are constrained only in how to update that prior? If Alice’s and Bob’s priors can be different, then all the reasoning I went through earlier collapses. So rejecting common priors might seem appealing. But there’s a paper by Tyler Cowen and Robin Hanson called “Are Disagreements Honest?”—one of the most worldview-destabilizing papers I’ve ever read—that calls that strategy into question. What it says, basically, is this: if you’re really a thoroughgoing Bayesian rationalist, then your prior ought to allow for the possibility that you are the other person. Or to put it another way: “you being born as you,” rather than as someone else, should be treated as just one more contingent fact that you observe and then conditionalize on! And likewise, the other person should condition on the observation that they’re them and not you. In this way, absolutely everything that makes you different from someone else can be understood as “differing information,” so we’re right back to the situation covered by Aumann’s Theorem. Imagine, if you like, that we all started out behind some Rawlsian veil of ignorance, as pure reasoning minds that had yet to be assigned specific bodies. In that original state, there was nothing to differentiate any of us from any other—anything that did would just be information to condition on—so we all should’ve had the same prior. That might sound fanciful, but in some sense all it’s saying is: what licenses you to privilege an observation just because it’s your eyes that made it, or a thought just because it happened to occur in your head? Like, if you’re objectively smarter or more observant than everyone else around you, fine, but to whatever extent you agree that you aren’t, your opinion gets no special epistemic protection just because it’s yours.
Shouldn’t you converge on the average belief unless you have special knowledge? In thin discourses, I think people do this, weighted by their priors, and then diverge according to special life experiences. But others won’t listen to them, because one person’s experience doesn’t change the average belief of the population: another Bayesian, weighing the views of others, will not change their estimate of the population mean just because one person in 7 billion reports an outlier experience.
I do agree that lying doesn’t work well over the long term, so it’s probably not very relevant to the population equilibrium of beliefs, and that in discourses with a lot of information, priors won’t matter much. However, keep in mind that belief isn’t the same thing as behavior; behavior will vary a lot even given uniform belief.
I do find that populations tend to believe largely the same things, expertise aside. I find my divergent beliefs are mainly due to niche expertise, which takes significant time and effort to transmit. My divergent behaviors, including, e.g., my moral foundations phenotype, mostly do not hinge on belief at all; those are genetic.
Tying this back to common knowledge, I don’t see any reason to think the sterile, discrete definition of “common knowledge” used by Pinker’s friends is useful. I would say, however, that everything useful will become widespread knowledge, time and population intelligence permitting. This follows from my memetics model. That won’t induce behavioral uniformity though, since people don’t act merely on information. They genuinely have divergent, instinctive interests.
Pinker’s book certainly gives you something to think about, although it isn’t very convincing in any regard, especially to the degree that it tries to reduce the evolution and variability of human behavior to merely a function of widespread ideas.
¹ Pinker frequently uses French loan phrases.