Yesterday was Martin Luther King Jr. Day. This year, the US celebrated by inaugurating Trump to the presidency a second time. Those who watched the ceremony saw a speech by a political power broker.
To some people, his gesture was a clear indication of the power broker’s political ideology.
Many other people tried to explain away the ideological interpretation. Some argued that he was making a “Roman Salute”,1 or that he was pantomiming “my heart goes out to you”2 (if only he knew of some other way to gesticulate that3). Others explained that it was just an awkward gesture.
Different priors, different explanations
Let’s build a model to explore how prior beliefs can contribute to people’s differing explanations of the same data.
We start by building a generative model that simulates what the observer thinks the broker would do, given different combinations of latent causal factors.
Unconditioned model
In this generative model, the broker can choose to make the salute gesture or some other gesture4. He might be ideologically pro-democracy or pro-fascism, and his demeanor might be suave or awkward. The causal structure is that I and D are causal antecedents of G:
Figure 1: { P(I, D, G) }
We can specify these causal relationships as a memo model that enumerates over all realizations of the variables, { \{ (i, d, g) : i \in I, d \in D, g \in G \} }, to infer the joint probability { P(I, D, G) }.
import jax
import jax.numpy as jnp
from memo import memo
from enum import IntEnum

class Gesture(IntEnum):
    SOMETHINGELSE = 0
    SALUTE = 1

class Ideology(IntEnum):
    DEMOCRACY = 0
    FASCISM = 1

class Demeanor(IntEnum):
    SUAVE = 0
    AWKWARD = 1

@jax.jit
def gesture_pmf(gesture, ideology, demeanor):
    ### P(Gesture=SALUTE | Ideology=DEMOCRACY, Demeanor=SUAVE)
    p_salute__dem_suave = 0.001
    ### P(Gesture=SALUTE | Ideology=DEMOCRACY, Demeanor=AWKWARD)
    p_salute__dem_awk = 0.2
    ### P(Gesture=SALUTE | Ideology=FASCISM, Demeanor=SUAVE)
    p_salute__fasc_suave = 0.9
    ### P(Gesture=SALUTE | Ideology=FASCISM, Demeanor=AWKWARD)
    p_salute__fasc_awk = 0.92

    ### P(Gesture=SALUTE | Ideology, Demeanor)
    p_salute = jnp.array([
        [p_salute__dem_suave, p_salute__fasc_suave],
        [p_salute__dem_awk, p_salute__fasc_awk],
    ])[demeanor, ideology]

    ### P(Gesture | Ideology, Demeanor)
    return jnp.array([1 - p_salute, p_salute])[gesture]

@memo
def speech_simulation[_g: Gesture, _i: Ideology, _d: Demeanor](prior_fasc=0.5):
    observer: knows(_d, _i, _g)
    observer: thinks[
        broker: given(d in Demeanor, wpp=1),
        broker: chooses(i in Ideology, wpp=(
            prior_fasc if i == {Ideology.FASCISM} else 1 - prior_fasc
        )),
        broker: chooses(g in Gesture, wpp=gesture_pmf(g, i, d)),
    ]
    return observer[
        Pr[
            broker.g == _g,
            broker.i == _i,
            broker.d == _d,
        ]
    ]

### observer 1
print("Observer 1, with a uniform prior belief about "
      "whether the broker is pro-fascism or pro-democracy")
res1joint = speech_simulation(
    prior_fasc=0.5, print_table=True, return_aux=True, return_xarray=True
)
print("\n\n")

### observer 2
print("Observer 2, who thinks that the broker "
      "being pro-fascism is unlikely a priori")
res2joint = speech_simulation(
    prior_fasc=0.01, print_table=True, return_aux=True, return_xarray=True
)
We see that observer 1 thinks that the broker is much more likely to make the salute gesture if he harbors a pro-fascist ideology. Thus, observer 1 will interpret the salute as highly indicative of the broker’s ideology.
Observer 2 thinks that the broker is unlikely to produce the salute at all, but in the unlikely event he does, it would be because he’s socially awkward.
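These qualitative readings follow from the marginal probability of the salute, { P(G{=}\text{salute}) = \sum_{i,d} P(i)\,P(d)\,P(\text{salute} \mid i, d) }. As a standalone check in plain NumPy (a sketch outside the memo model, using the same conditional probabilities as gesture_pmf; the helper name is ours), we can compute this marginal under both observers' priors:

```python
import numpy as np

# P(G=SALUTE | I, D), rows: ideology (DEMOCRACY, FASCISM),
# cols: demeanor (SUAVE, AWKWARD) -- same numbers as gesture_pmf
p_salute = np.array([[0.001, 0.2],
                     [0.9,   0.92]])

def p_salute_marginal(prior_fasc):
    """P(G=SALUTE) = sum_{i,d} P(i) P(d) P(SALUTE | i, d)."""
    p_i = np.array([1 - prior_fasc, prior_fasc])  # P(Ideology)
    p_d = np.array([0.5, 0.5])                    # P(Demeanor), uniform
    return float(p_i @ p_salute @ p_d)

print(f"Observer 1: P(SALUTE) = {p_salute_marginal(0.5):0.4f}")   # ~0.505
print(f"Observer 2: P(SALUTE) = {p_salute_marginal(0.01):0.4f}")  # ~0.109
```

Under observer 2's prior, a salute is roughly five times less probable a priori, which is exactly why observer 2 reaches for the "he's just awkward" explanation when it occurs.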
Parsing the model
Recall from Bayes' rule that we infer the posterior from the product of the prior and likelihood: { P(\mathcal{H} \mid \mathcal{D}) \propto P(\mathcal{H}) \, P(\mathcal{D} \mid \mathcal{H}) }, and that the prior and likelihood are a factorization of the joint probability { P(\mathcal{H}, \mathcal{D}) = P(\mathcal{H}) \, P(\mathcal{D} \mid \mathcal{H})}.
The model above expresses the joint distribution { P(I, D, G) } by specifying a likelihood model: { P(G \mid I, D) }, and a prior: { P(I, D) }. In this case, I and D are independent a priori, so the joint prior can be factorized as { P(I, D) = P(I) \, P(D) }.
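As a quick sanity check on this factorization, we can assemble the full joint { P(I, D, G) } as a product of the two priors and the likelihood table in plain NumPy (a sketch using the same probabilities as gesture_pmf, not part of the memo model) and confirm that it sums to 1:

```python
import numpy as np

p_i = np.array([0.5, 0.5])    # P(Ideology): DEMOCRACY, FASCISM
p_d = np.array([0.5, 0.5])    # P(Demeanor): SUAVE, AWKWARD
# P(G=SALUTE | I, D), indexed [ideology, demeanor]
p_salute = np.array([[0.001, 0.2],
                     [0.9,   0.92]])
# P(G | I, D), indexed [ideology, demeanor, gesture]
p_g = np.stack([1 - p_salute, p_salute], axis=-1)

# P(I, D, G) = P(I) P(D) P(G | I, D)
joint = p_i[:, None, None] * p_d[None, :, None] * p_g
print(joint.sum())  # ≈ 1.0: a proper joint distribution
```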
To infer the probability of ideology and demeanor given a gesture, we condition the joint model on G. In other words, we infer the posterior, { P(I, D \mid G) }, by applying Bayes' rule:
\begin{align*}
P(I, D \mid G) &= \frac{P(I, D, G)}{P(G)} \\ \\
& = \frac{P(I) \, P(D) \, P(G \mid I, D)}{P(G)} \\ \\
& = \frac{P(I) \, P(D) \, P(G \mid I, D)}{\sum\limits_{i \in I, \, d \in D} \,P(I{=}i, D{=}d, G)}
\end{align*}
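This derivation can be checked by hand in plain NumPy: build the unnormalized joint at { G{=}\text{salute} }, normalize, and read off the marginals (a sketch with the same CPT values as gesture_pmf; the helper name is ours, not part of memo):

```python
import numpy as np

def posterior_given_salute(prior_fasc):
    """P(I, D | G=SALUTE) via Bayes' rule, indexed [ideology, demeanor]."""
    p_i = np.array([1 - prior_fasc, prior_fasc])   # P(Ideology)
    p_d = np.array([0.5, 0.5])                     # P(Demeanor)
    p_salute = np.array([[0.001, 0.2],             # P(SALUTE | I, D)
                         [0.9,   0.92]])
    unnorm = p_i[:, None] * p_d[None, :] * p_salute  # P(I, D, G=SALUTE)
    return unnorm / unnorm.sum()                     # divide by P(G=SALUTE)

post1 = posterior_given_salute(0.5)  # observer 1's uniform prior
print(f"P(FASCISM | SALUTE) = {post1[1].sum():0.4f}")     # 0.9005
print(f"P(AWKWARD | SALUTE) = {post1[:, 1].sum():0.4f}")  # 0.5542
```

These numbers agree with what the memo model reports for observer 1; setting prior_fasc=0.01 reproduces observer 2's posteriors.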
Conditioned model
Equipped with a generative model, we can infer what latent causes were likely to have generated the observation, according to each observer.
This is accomplished by conditioning the model of { P(I, D, G) } on the data (e.g., { G{=}\text{salute} }). In memo, we can do this with
agent: observes [frame.representation] is value.
In our case, the agent is observer, who is resolving uncertainty about the state of the representation g in the frame of the broker, and the query variable is _g. Thus:
observer: observes [broker.g] is _g
Note that the query value (_g in this case) is in the frame of the observer, not the frame of the broker. This is demarcated by having _g outside of the brackets (remember, brackets denote entering a frame). We will later see how this syntax permits false beliefs.
Observing _g means conditioning the model on that specific realization of G:
Figure 2: { P(I, D \mid G) }
We are now inferring the conditional distribution { P(I, D \mid G)}.
@memo
def speech_observation[_g: Gesture, _i: Ideology, _d: Demeanor](prior_fasc=0.5):
    observer: knows(_d, _i, _g)
    observer: thinks[
        broker: given(d in Demeanor, wpp=1),
        broker: chooses(i in Ideology, wpp=(
            prior_fasc if i == {Ideology.FASCISM} else 1 - prior_fasc
        )),
        broker: chooses(g in Gesture, wpp=gesture_pmf(g, i, d)),
    ]

    ### observe gesture ###
    observer: observes [broker.g] is _g

    return observer[
        Pr[
            ### replace _g with {Gesture.SALUTE} to ignore probs
            ### for when gesture is something other than salute
            broker.g == _g,
            broker.i == _i,
            broker.d == _d,
        ]
    ]

###
### How do these two observers' priors affect their belief updates
### when they observe the gesture?
###

### observer 1
print("Observer 1 who, before observing the speech, "
      "had a uniform prior belief about whether the broker "
      "is pro-fascism or pro-democracy")
res1 = speech_observation(
    prior_fasc=0.5, print_table=True, return_aux=True, return_xarray=True
)
print("\n\n")

### observer 2
print("Observer 2 who, before observing the speech, "
      "thought that the broker being pro-fascism was unlikely")
res2 = speech_observation(
    prior_fasc=0.01, print_table=True, return_aux=True, return_xarray=True
)
Note that the probability values returned by the model no longer sum to 1 across the whole table. This is because conditioning restricts the possibility space—for each observed gesture, we are only considering the possibilities in the worlds where that gesture occurred. Thus, we expect the posterior probabilities { P(I, D \mid G{=}g) } to sum to one for each value of g. Is that consistent with the output?
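To see the bookkeeping concretely, we can condition a plain-NumPy version of the joint on each value of G separately (a sketch mirroring the probabilities in gesture_pmf, outside of memo): each conditional slice is normalized on its own, so the table stacking both conditionals sums to |G| = 2.

```python
import numpy as np

p_i = np.array([0.5, 0.5])                 # P(Ideology)
p_d = np.array([0.5, 0.5])                 # P(Demeanor)
p_salute = np.array([[0.001, 0.2],
                     [0.9,   0.92]])
p_g = np.stack([1 - p_salute, p_salute], axis=-1)      # P(G | I, D)
joint = p_i[:, None, None] * p_d[None, :, None] * p_g  # P(I, D, G)

# Condition on each gesture value: P(I, D | G=g) = P(I, D, G=g) / P(G=g)
cond = joint / joint.sum(axis=(0, 1), keepdims=True)
print(cond[:, :, 1].sum())  # ≈ 1.0: posterior given SALUTE is normalized
print(cond.sum())           # ≈ 2.0: one normalized distribution per value of g
```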
Comparing the marginal probabilities, we see
### observer 1 ###
print("Observer 1 thinks that...")

### P(FASCISM | SALUTE; observer1)
xa1 = res1.aux.xarray
pr1 = xa1.loc["SALUTE"]
print("\n there's a high probability that the broker is pro-fascism:")
print(f" P(FASCISM | SALUTE) = {pr1.loc['FASCISM', :].sum():0.4f}")
print(f" P(DEMOCRACY | SALUTE) = {pr1.loc['DEMOCRACY', :].sum():0.4f}")

### P(AWKWARD | SALUTE; observer1)
print("\n the broker might be socially awkward or suave, could go "
      "either way (i.e. observer has low confidence):")
print(f" P(AWKWARD | SALUTE) = {pr1.loc[:, 'AWKWARD'].sum():0.4f}")
print(f" P(SUAVE | SALUTE) = {pr1.loc[:, 'SUAVE'].sum():0.4f}")
print("\n")

### observer 2 ###
print("Observer 2 thinks that...")

### P(FASCISM | SALUTE; observer2)
xa2 = res2.aux.xarray
pr2 = xa2.loc["SALUTE"]
print("\n the broker is unlikely to be pro-fascism:")
print(f" P(FASCISM | SALUTE) = {pr2.loc['FASCISM', :].sum():0.4f}")
print(f" P(DEMOCRACY | SALUTE) = {pr2.loc['DEMOCRACY', :].sum():0.4f}")

### P(AWKWARD | SALUTE; observer2)
print("\n the broker is just socially awkward:")
print(f" P(AWKWARD | SALUTE) = {pr2.loc[:, 'AWKWARD'].sum():0.4f}")
print(f" P(SUAVE | SALUTE) = {pr2.loc[:, 'SUAVE'].sum():0.4f}")
Observer 1 thinks that...
there's a high probability that the broker is pro-fascism:
P(FASCISM | SALUTE) = 0.9005
P(DEMOCRACY | SALUTE) = 0.0995
the broker might be socially awkward or suave, could go either way (i.e. observer has low confidence):
P(AWKWARD | SALUTE) = 0.5542
P(SUAVE | SALUTE) = 0.4458
Observer 2 thinks that...
the broker is unlikely to be pro-fascism:
P(FASCISM | SALUTE) = 0.0838
P(DEMOCRACY | SALUTE) = 0.9162
the broker is just socially awkward:
P(AWKWARD | SALUTE) = 0.9540
P(SUAVE | SALUTE) = 0.0460
Exercises
Describe these models in terms of Bayes’ rule. What’s the prior, likelihood, and posterior in these models? What is happening mathematically when we go from the first model to the second model?
Are observers 1 and 2 equally rational? Explain.
Adjust the likelihood and prior probabilities to match your beliefs about different people. Explain your adjustments and the effects. Did your adjustments bring the cognition predicted by the model closer to the patterns of cognition you were targeting?
Extend the model in some fashion. You could add more causes, or more types of observations. You could model how inference is affected by observing one gesture (which could be more easily explained away as noisy movement production) versus multiple similar gestures (which could imply a deliberate signal). You could convert a binary variable into a discretized linear variable (e.g. turn Gesture into a perceptual similarity metric that expresses how confusable a gesture is with a fascist salute). Maybe the observers didn’t see the video but rather heard about it from someone they have differing degrees of trust in (see Jaynes, 2003, and also the exercise below)—perhaps the observers think the person tends to be hyperbolic or understated, or maybe the person is in their ingroup or outgroup.
Optional
Jaynes (2003, Chapter 5, Section 3) describes how the same data can cause observers’ opinions to diverge. Extend the political power broker model above so that the observers update their beliefs in opposite directions given the same data. Describe why these changes lead to belief polarization.
For an empirical study that applies these ideas to actual behavior, see Botvinik-Nezer et al. (2023).
Render env
%reset -f
import sys
import platform
import importlib.metadata

print("Python:", sys.version)
print("Platform:", platform.system(), platform.release())
print("Processor:", platform.processor())
print("Machine:", platform.machine())

print("\nPackages:")
for name, version in sorted(
    ((dist.metadata["Name"], dist.version)
     for dist in importlib.metadata.distributions()),
    key=lambda x: x[0].lower()  # sort case-insensitively
):
    print(f"{name}=={version}")
Botvinik-Nezer, Rotem, Jones, Matt, & Wager, Tor D. (2023). A belief systems analysis of fraud beliefs following the 2020 US election. Nature Human Behaviour, 7(7), 1106–1119. https://doi.org/10.1038/s41562-023-01570-4
Jaynes, E. T. (2003). Probability Theory: The Logic of Science (G. Larry Bretthorst, Ed.; 1st ed.). Cambridge University Press. https://doi.org/10.1017/CBO9780511790423
Footnotes
The Roman salute is not described by any historical Latin source. The gesture was anachronistically credited to the ancient Romans in the 18th century and has since been used by various political movements and nation-states. Many of those pushing this interpretation appear to be unaware that it is also known as the fascist salute, which could strike one as odd considering that they are so familiar with the gesture that they can confidently classify it on sight.