Priors and explanation

Inauguration speech

Ambiguity and explanation

Yesterday was Martin Luther King Jr. Day. This year, the US celebrated by inaugurating Trump to the presidency a second time. Those who watched the ceremony saw a speech by a political power broker.



To some people, his gesture was a clear indication of the power broker’s political ideology.



Many other people tried to explain away the ideological interpretation. Some argued that he was making a “Roman Salute”,1 or that he was pantomiming “my heart goes out to you”2 (if only he knew of some other way to gesticulate that3). Others explained that it was just an awkward gesture.


Different priors, different explanations

Let’s build a model to explore how prior beliefs can contribute to people’s differing explanations of the same data.

We start by building a generative model that simulates what the observer thinks the broker would do, given different combinations of latent causal factors.

Unconditioned model

In this generative model, the broker can choose to make the salute gesture or some other gesture4. He might be ideologically pro-democracy or pro-fascism, and his demeanor might be suave or awkward. The causal structure is that I and D are causal antecedents of G:

 

 

Figure 1: { P(I, D, G) }

We can specify these causal relationships as a memo model that enumerates over all realizations of the variables, { \{ (i, d, g) : i \in I, d \in D, g \in G \} }, to infer the joint probability { P(I, D, G) }.

import jax
import jax.numpy as jnp
from memo import memo
from enum import IntEnum

class Gesture(IntEnum):
    SOMETHINGELSE = 0
    SALUTE = 1

class Ideology(IntEnum):
    DEMOCRACY = 0
    FASCISM = 1

class Demeanor(IntEnum):
    SUAVE = 0
    AWKWARD = 1

@jax.jit
def gesture_pmf(gesture, ideology, demeanor):
    ### P(Gesture=SALUTE | Ideology=DEMOCRACY, Demeanor=SUAVE)
    p_salute__dem_suave = 0.001
    ### P(Gesture=SALUTE | Ideology=DEMOCRACY, Demeanor=AWKWARD)
    p_salute__dem_awk = 0.2
    ### P(Gesture=SALUTE | Ideology=FASCISM, Demeanor=SUAVE)
    p_salute__fasc_suave = 0.9
    ### P(Gesture=SALUTE | Ideology=FASCISM, Demeanor=AWKWARD)
    p_salute__fasc_awk = 0.92

    ### P(Gesture=SALUTE | Ideology, Demeanor)
    p_salute = jnp.array([
        [p_salute__dem_suave, p_salute__fasc_suave],
        [p_salute__dem_awk, p_salute__fasc_awk],
    ])[demeanor, ideology]
    
    ### P(Gesture | Ideology, Demeanor)
    return jnp.array([1 - p_salute, p_salute])[gesture]

@memo
def speech_simulation[
    _g: Gesture, 
    _i: Ideology, 
    _d: Demeanor
](prior_fasc=0.5):
    observer: knows(_d, _i, _g)
    observer: thinks[
        broker: given(d in Demeanor, wpp=1),
        broker: chooses(i in Ideology, wpp=(
            prior_fasc if i == {Ideology.FASCISM} 
            else 1 - prior_fasc
        )),
        broker: chooses(g in Gesture, wpp=gesture_pmf(g, i, d)),
    ]

    return observer[ 
        Pr[
            broker.g == _g,
            broker.i == _i,
            broker.d == _d,
        ] 
    ]

### observer 1
print(
    "Observer 1, with a uniform prior belief about "
    "whether the broker is pro-fascism or pro-democracy")
res1joint = speech_simulation(
    prior_fasc=0.5, 
    print_table=True, return_aux=True, return_xarray=True)
print("\n\n")

### observer 2
print(
    "Observer 2, who thinks that the broker "
    "being pro-fascism is unlikely a priori")
res2joint = speech_simulation(
    prior_fasc=0.01, 
    print_table=True, return_aux=True, return_xarray=True)
Observer 1, with a uniform prior belief about whether the broker is pro-fascism or pro-democracy
+---------------+--------------+--------------+------------------------+
| _g: Gesture   | _i: Ideology | _d: Demeanor | speech_simulation      |
+---------------+--------------+--------------+------------------------+
| SOMETHINGELSE | DEMOCRACY    | SUAVE        | 0.24975000321865082    |
| SOMETHINGELSE | DEMOCRACY    | AWKWARD      | 0.20000000298023224    |
| SOMETHINGELSE | FASCISM      | SUAVE        | 0.025000005960464478   |
| SOMETHINGELSE | FASCISM      | AWKWARD      | 0.019999995827674866   |
| SALUTE        | DEMOCRACY    | SUAVE        | 0.0002500000118743628  |
| SALUTE        | DEMOCRACY    | AWKWARD      | 0.05000000074505806    |
| SALUTE        | FASCISM      | SUAVE        | 0.22499999403953552    |
| SALUTE        | FASCISM      | AWKWARD      | 0.23000000417232513    |
+---------------+--------------+--------------+------------------------+



Observer 2, who thinks that the broker being pro-fascism is unlikely a priori
+---------------+--------------+--------------+------------------------+
| _g: Gesture   | _i: Ideology | _d: Demeanor | speech_simulation      |
+---------------+--------------+--------------+------------------------+
| SOMETHINGELSE | DEMOCRACY    | SUAVE        | 0.49450501799583435    |
| SOMETHINGELSE | DEMOCRACY    | AWKWARD      | 0.3959999978542328     |
| SOMETHINGELSE | FASCISM      | SUAVE        | 0.0005000000819563866  |
| SOMETHINGELSE | FASCISM      | AWKWARD      | 0.0003999999025836587  |
| SALUTE        | DEMOCRACY    | SUAVE        | 0.0004950000438839197  |
| SALUTE        | DEMOCRACY    | AWKWARD      | 0.0989999994635582     |
| SALUTE        | FASCISM      | SUAVE        | 0.0044999998062849045  |
| SALUTE        | FASCISM      | AWKWARD      | 0.004600000102072954   |
+---------------+--------------+--------------+------------------------+

We see that observer 1 thinks that the broker is much more likely to make the salute gesture if he harbors a pro-fascist ideology. Thus, observer 1 will interpret the salute as highly indicative of the broker’s ideology.

Observer 2 thinks that the broker is unlikely to produce the salute at all, but in the unlikely event he does, it would be because he’s socially awkward.
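These two readings can be checked directly against the joint tables. The sketch below rebuilds the joint by plain enumeration, without memo, and then sums cells to get the relevant marginals and conditionals. The names `P_SALUTE`, `joint`, `marginal`, and `AXIS` are hypothetical helpers introduced here for illustration; the likelihood values are copied from `gesture_pmf`, and the uniform demeanor prior of 0.5 is an assumption read off from `given(d in Demeanor, wpp=1)`.

```python
from itertools import product

# P(Gesture=SALUTE | ideology, demeanor), copied from gesture_pmf above.
P_SALUTE = {
    ("DEMOCRACY", "SUAVE"): 0.001,
    ("DEMOCRACY", "AWKWARD"): 0.2,
    ("FASCISM", "SUAVE"): 0.9,
    ("FASCISM", "AWKWARD"): 0.92,
}

# Position of each variable inside a (gesture, ideology, demeanor) key.
AXIS = {"gesture": 0, "ideology": 1, "demeanor": 2}


def joint(prior_fasc):
    """Enumerate P(G, I, D) as {(gesture, ideology, demeanor): probability}."""
    p_ideology = {"DEMOCRACY": 1 - prior_fasc, "FASCISM": prior_fasc}
    p_demeanor = {"SUAVE": 0.5, "AWKWARD": 0.5}  # assumed uniform (wpp=1)
    table = {}
    for i, d in product(p_ideology, p_demeanor):
        p_cause = p_ideology[i] * p_demeanor[d]  # P(I=i) P(D=d)
        table[("SALUTE", i, d)] = p_cause * P_SALUTE[(i, d)]
        table[("SOMETHINGELSE", i, d)] = p_cause * (1 - P_SALUTE[(i, d)])
    return table


def marginal(table, **fixed):
    """Sum the joint over all cells that match the fixed variable values."""
    return sum(
        p for key, p in table.items()
        if all(key[AXIS[name]] == value for name, value in fixed.items())
    )


joint1 = joint(prior_fasc=0.5)   # observer 1
joint2 = joint(prior_fasc=0.01)  # observer 2

# Observer 1: P(SALUTE | FASCISM) versus P(SALUTE | DEMOCRACY)
p_salute_given_fasc = (
    marginal(joint1, gesture="SALUTE", ideology="FASCISM")
    / marginal(joint1, ideology="FASCISM")
)
p_salute_given_dem = (
    marginal(joint1, gesture="SALUTE", ideology="DEMOCRACY")
    / marginal(joint1, ideology="DEMOCRACY")
)

# Observer 2: P(SALUTE), and P(AWKWARD | SALUTE)
p_salute_obs2 = marginal(joint2, gesture="SALUTE")
p_awkward_given_salute_obs2 = (
    marginal(joint2, gesture="SALUTE", demeanor="AWKWARD") / p_salute_obs2
)

print(f"Observer 1: P(SALUTE | FASCISM)   = {p_salute_given_fasc:0.4f}")
print(f"Observer 1: P(SALUTE | DEMOCRACY) = {p_salute_given_dem:0.4f}")
print(f"Observer 2: P(SALUTE)             = {p_salute_obs2:0.4f}")
print(f"Observer 2: P(AWKWARD | SALUTE)   = {p_awkward_given_salute_obs2:0.4f}")
```

Because every quantity is a sum of cells from the joint, the same two helpers can be reused for any other marginal or conditional one might want to read off the tables.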

Parsing the model

Recall from Bayes’ rule that we infer the posterior from the product of the prior and likelihood: { P(\mathcal{H} \mid \mathcal{D}) \propto P(\mathcal{H}) \, P(\mathcal{D} \mid \mathcal{H}) }, and that the prior and likelihood are a factorization of the joint probability { P(\mathcal{H}, \mathcal{D}) = P(\mathcal{H}) \, P(\mathcal{D} \mid \mathcal{H})}.

The model above expresses the joint distribution { P(I, D, G) } by specifying a likelihood model: { P(G \mid I, D) }, and a prior: { P(I, D) }. In this case, I and D are independent a priori, so the joint prior can be factorized as { P(I, D) = P(I) \, P(D) }.

To infer the probability of ideology and demeanor given a gesture, we condition the joint model on G. In other words, we infer the posterior, { P(I, D \mid G) }, by applying Bayes’ rule:

\begin{align*} P(I, D \mid G) = \frac{P(I, D, G)}{P(G)} & = \frac{P(I) \, P(D) \, P(G \mid I, D)}{P(G)} \\ \\ & = \frac{P(I) \, P(D) \, P(G \mid I, D)}{\sum\limits_{i \in I, \, d \in D} \,P(I{=}i, D{=}d, G)} \end{align*}
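
As a worked instance of this expression, take observer 1, for whom { P(I{=}\text{fascism}) = 0.5 }, and assume the uniform demeanor prior { P(D{=}d) = 0.5 } implied by `given(d in Demeanor, wpp=1)`. Plugging in the likelihood values from `gesture_pmf`, the evidence and one cell of the posterior work out as follows (values rounded):

\begin{align*} P(G{=}\text{salute}) & = \sum\limits_{i \in I, \, d \in D} P(I{=}i) \, P(D{=}d) \, P(G{=}\text{salute} \mid I{=}i, D{=}d) \\ & = 0.5 \times 0.5 \times (0.001 + 0.2 + 0.9 + 0.92) = 0.50525 \\ \\ P(I{=}\text{fascism}, D{=}\text{awkward} \mid G{=}\text{salute}) & = \frac{0.5 \times 0.5 \times 0.92}{0.50525} \approx 0.4552 \end{align*}

The evidence term is the sum of the four SALUTE rows of observer 1’s joint table above.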

Conditioned model

Equipped with a generative model, we can infer what latent causes were likely to have generated the observation, according to each observer.

This is accomplished by conditioning the model of { P(I, D, G) } on the data (e.g., { G{=}\text{salute} }). In memo, we can do this with

agent: observes [frame.representation] is value.

In our case, the agent is observer, who is resolving uncertainty about the state of the representation g in the frame of the broker, and the query variable is _g. Thus:

observer: observes [broker.g] is _g

Note that the query value (_g in this case) is in the frame of the observer, not the frame of the broker. This is demarcated by having _g outside of the brackets (remember, brackets denote entering a frame). We will later see how this syntax permits false beliefs.

Observing _g means conditioning the model on that specific realization of G:

 

 

Figure 2: { P(I, D \mid G) }

We are now inferring the conditional distribution { P(I, D \mid G)}.

@memo
def speech_observation[
    _g: Gesture, 
    _i: Ideology, 
    _d: Demeanor
](prior_fasc=0.5):
    observer: knows(_d, _i, _g)
    observer: thinks[
        broker : given(d in Demeanor, wpp=1),
        broker : chooses(i in Ideology, wpp=(
            prior_fasc if i == {Ideology.FASCISM} 
            else 1 - prior_fasc
        )),
        broker : chooses(g in Gesture, wpp=gesture_pmf(g, i, d)),
    ]

    ### observe gesture ###
    observer: observes[broker.g] is _g

    return observer[
        Pr[
            ### replace _g with {Gesture.SALUTE} to ignore probs
            ### for when gesture is something other than salute
            broker.g == _g,
            broker.i == _i,
            broker.d == _d,
        ]
    ]


###
# How do these two observers' priors affect their belief updates 
# when they observe the gesture?
###

### observer 1
print(
    f"Observer 1 who, before observing the speech, "
    "had a uniform prior belief about whether the broker "
    "is pro-fascism or pro-democracy"
)
res1 = speech_observation(
    prior_fasc=0.5, 
    print_table=True, return_aux=True, return_xarray=True)

print("\n\n")

### observer 2
print(
    f"Observer 2 who, before observing the speech, "
    "thought that the broker being pro-fascism was unlikely"
)
res2 = speech_observation(
    prior_fasc=0.01, 
    print_table=True, return_aux=True, return_xarray=True)
Observer 1 who, before observing the speech, had a uniform prior belief about whether the broker is pro-fascism or pro-democracy
+---------------+--------------+--------------+------------------------+
| _g: Gesture   | _i: Ideology | _d: Demeanor | speech_observation     |
+---------------+--------------+--------------+------------------------+
| SOMETHINGELSE | DEMOCRACY    | SUAVE        | 0.5048004388809204     |
| SOMETHINGELSE | DEMOCRACY    | AWKWARD      | 0.404244601726532      |
| SOMETHINGELSE | FASCISM      | SUAVE        | 0.05053058639168739    |
| SOMETHINGELSE | FASCISM      | AWKWARD      | 0.04042445123195648    |
| SALUTE        | DEMOCRACY    | SUAVE        | 0.0004948045825585723  |
| SALUTE        | DEMOCRACY    | AWKWARD      | 0.09896091371774673    |
| SALUTE        | FASCISM      | SUAVE        | 0.4453240931034088     |
| SALUTE        | FASCISM      | AWKWARD      | 0.45522022247314453    |
+---------------+--------------+--------------+------------------------+



Observer 2 who, before observing the speech, thought that the broker being pro-fascism was unlikely
+---------------+--------------+--------------+------------------------+
| _g: Gesture   | _i: Ideology | _d: Demeanor | speech_observation     |
+---------------+--------------+--------------+------------------------+
| SOMETHINGELSE | DEMOCRACY    | SUAVE        | 0.5547478199005127     |
| SOMETHINGELSE | DEMOCRACY    | AWKWARD      | 0.4442424774169922     |
| SOMETHINGELSE | FASCISM      | SUAVE        | 0.0005609123036265373  |
| SOMETHINGELSE | FASCISM      | AWKWARD      | 0.0004487296682782471  |
| SALUTE        | DEMOCRACY    | SUAVE        | 0.004558221437036991   |
| SALUTE        | DEMOCRACY    | AWKWARD      | 0.9116441607475281     |
| SALUTE        | FASCISM      | SUAVE        | 0.04143837094306946    |
| SALUTE        | FASCISM      | AWKWARD      | 0.04235922545194626    |
+---------------+--------------+--------------+------------------------+

Note that the probability values in each table no longer sum to 1 overall. This is because conditioning restricts the possibility space: for each value of _g, we are now only considering the possibilities in the world where that _g occurred. Thus, we expect the posterior probabilities { P(I, D \mid G{=}g) } to sum to one for each g. Is that consistent with the output?

Comparing the marginal probabilities, we see:
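
One way to check this is to redo the conditioning by hand. The sketch below is a plain-Python stand-in for the memo model, not the memo implementation itself: it multiplies prior by likelihood for each (ideology, demeanor) pair and divides by the evidence { P(G{=}g) }. The names `P_SALUTE` and `posterior` are hypothetical, the likelihood values are copied from `gesture_pmf`, and the demeanor prior of 0.5 is an assumption based on `given(d in Demeanor, wpp=1)`.

```python
# P(Gesture=SALUTE | ideology, demeanor), copied from gesture_pmf above.
P_SALUTE = {
    ("DEMOCRACY", "SUAVE"): 0.001,
    ("DEMOCRACY", "AWKWARD"): 0.2,
    ("FASCISM", "SUAVE"): 0.9,
    ("FASCISM", "AWKWARD"): 0.92,
}


def posterior(prior_fasc, gesture):
    """Return P(I, D | G=gesture) as {(ideology, demeanor): probability}."""
    p_ideology = {"DEMOCRACY": 1 - prior_fasc, "FASCISM": prior_fasc}
    p_demeanor = 0.5  # assumed uniform over SUAVE and AWKWARD
    unnormalized = {}
    for (i, d), p_s in P_SALUTE.items():
        likelihood = p_s if gesture == "SALUTE" else 1 - p_s
        unnormalized[(i, d)] = p_ideology[i] * p_demeanor * likelihood
    evidence = sum(unnormalized.values())  # P(G=gesture)
    return {cause: p / evidence for cause, p in unnormalized.items()}


# Each conditional distribution is normalized separately, one per gesture.
for prior_fasc in (0.5, 0.01):
    for gesture in ("SOMETHINGELSE", "SALUTE"):
        total = sum(posterior(prior_fasc, gesture).values())
        print(f"prior_fasc={prior_fasc}, G={gesture}: sum = {total:0.6f}")
```

Each printed sum covers one value of _g, which corresponds to four rows of a table above rather than all eight.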

### observer 1 ###
print(f"Observer 1 thinks that...")
### P(FASCISM | SALUTE; observer1)
xa1 = res1.aux.xarray
pr1 = xa1.loc["SALUTE"]
print(f"\n    there's a high probability that the broker is pro-fascism:")
print(f"        P(FASCISM | SALUTE)   = {pr1.loc["FASCISM", :].sum():0.4f}")
print(f"        P(DEMOCRACY | SALUTE) = {pr1.loc["DEMOCRACY", :].sum():0.4f}")

### P(AWKWARD | SALUTE; observer1)
print(f"\n    the broker might be socially awkward or suave, could go either way (i.e. observer has low confidence):")
print(f"        P(AWKWARD | SALUTE) = {pr1.loc[:, "AWKWARD"].sum():0.4f}")
print(f"        P(SUAVE | SALUTE)   = {pr1.loc[:, "SUAVE"].sum():0.4f}")

print("\n")
### observer 2 ###
print(f"Observer 2 thinks that...")
### P(FASCISM | SALUTE; observer2)
xa2 = res2.aux.xarray
pr2 = xa2.loc["SALUTE"]
print(f"\n    the broker is unlikely to be pro-fascism:")
print(f"        P(FASCISM | SALUTE)   = {pr2.loc["FASCISM", :].sum():0.4f}")
print(f"        P(DEMOCRACY | SALUTE) = {pr2.loc["DEMOCRACY", :].sum():0.4f}")

### P(AWKWARD | SALUTE; observer2)
print(f"\n    the broker is just socially awkward:")
print(f"        P(AWKWARD | SALUTE) = {pr2.loc[:, "AWKWARD"].sum():0.4f}")
print(f"        P(SUAVE | SALUTE)   = {pr2.loc[:, "SUAVE"].sum():0.4f}")
Observer 1 thinks that...

    there's a high probability that the broker is pro-fascism:
        P(FASCISM | SALUTE)   = 0.9005
        P(DEMOCRACY | SALUTE) = 0.0995

    the broker might be socially awkward or suave, could go either way (i.e. observer has low confidence):
        P(AWKWARD | SALUTE) = 0.5542
        P(SUAVE | SALUTE)   = 0.4458


Observer 2 thinks that...

    the broker is unlikely to be pro-fascism:
        P(FASCISM | SALUTE)   = 0.0838
        P(DEMOCRACY | SALUTE) = 0.9162

    the broker is just socially awkward:
        P(AWKWARD | SALUTE) = 0.9540
        P(SUAVE | SALUTE)   = 0.0460
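
To see how the prior alone drives this difference, the posterior probability of ideology can be written in odds form: posterior odds equal prior odds times a likelihood ratio. The sketch below is an illustration under stated assumptions, not part of the memo model: `p_fascism_given_salute` is a hypothetical helper, the likelihood values are copied from `gesture_pmf`, and demeanor is marginalized out with an assumed uniform prior of 0.5.

```python
def p_fascism_given_salute(prior_fasc):
    """P(FASCISM | SALUTE) from prior odds times the likelihood ratio."""
    # Marginalize demeanor out of the likelihood (assumed uniform prior).
    lik_fasc = 0.5 * 0.9 + 0.5 * 0.92   # P(SALUTE | FASCISM)
    lik_dem = 0.5 * 0.001 + 0.5 * 0.2   # P(SALUTE | DEMOCRACY)
    likelihood_ratio = lik_fasc / lik_dem

    prior_odds = prior_fasc / (1 - prior_fasc)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)


# Sweep the prior; 0.5 and 0.01 correspond to observers 1 and 2.
for prior_fasc in (0.01, 0.1, 0.5, 0.9):
    p = p_fascism_given_salute(prior_fasc)
    print(f"prior_fasc = {prior_fasc:0.2f} -> P(FASCISM | SALUTE) = {p:0.4f}")
```

In this sketch the likelihood ratio is the same for every observer; the only input that changes across the sweep is `prior_fasc`.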

Exercises

  1. Describe these models in terms of Bayes’ rule. What are the prior, likelihood, and posterior in these models? What is happening mathematically when we go from the first model to the second model?

  2. Are observers 1 and 2 equally rational? Explain.

  3. Adjust the likelihood and prior probabilities to match your beliefs about different people. Explain your adjustments and the effects. Did your adjustments bring the cognition predicted by the model closer to the patterns of cognition you were targeting?

  4. Extend the model in some fashion. You could add more causes, or more types of observations. You could model how inference is affected by observing one gesture (which could be more easily explained away as noisy movement production) versus multiple similar gestures (which could imply a deliberate signal). You could convert a binary variable into a discretized linear variable (e.g. turn Gesture into a perceptual similarity metric that expresses how confusable a gesture is with a fascist salute). Maybe the observers didn’t see the video but rather heard about it from someone they have differing degrees of trust in (see Jaynes, 2003, and also the exercise below) – perhaps the observers think the person tends to be hyperbolic or understated, or maybe the person is in their ingroup or outgroup.

Optional

  1. Jaynes (2003, Chapter 5, Section 3) describes how the same data can cause observers’ opinions to diverge. Extend the political power broker model above so that the observers update their beliefs in opposite directions given the same data. Describe why these changes lead to belief polarization.

    For an empirical study that applies these ideas to actual behavior, see Botvinik-Nezer et al. (2023).


%reset -f
import sys
import platform
import importlib.metadata

print("Python:", sys.version)
print("Platform:", platform.system(), platform.release())
print("Processor:", platform.processor())
print("Machine:", platform.machine())

print("\nPackages:")
for name, version in sorted(
    ((dist.metadata["Name"], dist.version) for dist in importlib.metadata.distributions()),
    key=lambda x: x[0].lower()  # Sort case-insensitively
):
    print(f"{name}=={version}")
Python: 3.13.2 (main, Feb  5 2025, 18:58:04) [Clang 19.1.6 ]
Platform: Darwin 23.6.0
Processor: arm
Machine: arm64

Packages:
annotated-types==0.7.0
anyio==4.8.0
appnope==0.1.4
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
arrow==1.3.0
asttokens==3.0.0
async-lru==2.0.4
attrs==25.1.0
babel==2.17.0
beautifulsoup4==4.13.3
bleach==6.2.0
certifi==2025.1.31
cffi==1.17.1
cfgv==3.4.0
charset-normalizer==3.4.1
click==8.1.8
comm==0.2.2
contourpy==1.3.1
cycler==0.12.1
debugpy==1.8.13
decorator==5.2.1
defusedxml==0.7.1
distlib==0.3.9
executing==2.2.0
fastjsonschema==2.21.1
filelock==3.17.0
fonttools==4.56.0
fqdn==1.5.1
h11==0.14.0
httpcore==1.0.7
httpx==0.28.1
identify==2.6.8
idna==3.10
importlib_metadata==8.6.1
ipykernel==6.29.5
ipython==9.0.1
ipython_pygments_lexers==1.1.1
ipywidgets==8.1.5
isoduration==20.11.0
jax==0.5.2
jaxlib==0.5.1
jedi==0.19.2
Jinja2==3.1.6
joblib==1.4.2
json5==0.10.0
jsonpointer==3.0.0
jsonschema==4.23.0
jsonschema-specifications==2024.10.1
jupyter-cache==1.0.1
jupyter-events==0.12.0
jupyter-lsp==2.2.5
jupyter_client==8.6.3
jupyter_core==5.7.2
jupyter_server==2.15.0
jupyter_server_terminals==0.5.3
jupyterlab==4.3.5
jupyterlab_pygments==0.3.0
jupyterlab_server==2.27.3
jupyterlab_widgets==3.0.13
kiwisolver==1.4.8
MarkupSafe==3.0.2
matplotlib==3.10.1
matplotlib-inline==0.1.7
memo-lang==1.1.0
mistune==3.1.2
ml_dtypes==0.5.1
nbclient==0.10.2
nbconvert==7.16.6
nbformat==5.10.4
nest-asyncio==1.6.0
networkx==3.4.2
nodeenv==1.9.1
notebook_shim==0.2.4
numpy==2.2.3
opt_einsum==3.4.0
optype==0.9.1
overrides==7.7.0
packaging==24.2
pandas==2.2.3
pandas-stubs==2.2.3.241126
pandocfilters==1.5.1
parso==0.8.4
pexpect==4.9.0
pillow==11.1.0
platformdirs==4.3.6
plotly==5.24.1
pre_commit==4.1.0
prometheus_client==0.21.1
prompt_toolkit==3.0.50
psutil==7.0.0
ptyprocess==0.7.0
pure_eval==0.2.3
pycparser==2.22
pydantic==2.10.6
pydantic_core==2.27.2
Pygments==2.19.1
pygraphviz==1.14
pyparsing==3.2.1
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
python-json-logger==3.3.0
pytz==2025.1
PyYAML==6.0.2
pyzmq==26.2.1
referencing==0.36.2
requests==2.32.3
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rpds-py==0.23.1
ruff==0.9.10
scikit-learn==1.6.1
scipy==1.15.2
scipy-stubs==1.15.2.0
seaborn==0.13.2
Send2Trash==1.8.3
setuptools==75.8.2
six==1.17.0
sniffio==1.3.1
soupsieve==2.6
SQLAlchemy==2.0.38
stack-data==0.6.3
tabulate==0.9.0
tenacity==9.0.0
terminado==0.18.1
threadpoolctl==3.5.0
tinycss2==1.4.0
toml==0.10.2
tornado==6.4.2
tqdm==4.67.1
traitlets==5.14.3
types-python-dateutil==2.9.0.20241206
types-pytz==2025.1.0.20250204
typing_extensions==4.12.2
tzdata==2025.1
uri-template==1.3.0
urllib3==2.3.0
virtualenv==20.29.3
wcwidth==0.2.13
webcolors==24.11.1
webencodings==0.5.1
websocket-client==1.8.0
widgetsnbextension==4.0.13
xarray==2025.1.2
zipp==3.21.0

References

Botvinik-Nezer, Rotem, Jones, Matt, & Wager, Tor D. (2023). A belief systems analysis of fraud beliefs following the 2020 US election. Nature Human Behaviour, 7(7), 1106–1119. https://doi.org/10.1038/s41562-023-01570-4
Jaynes, E. T. (2003). Probability Theory: The Logic of Science (G. Larry Bretthorst, Ed.; 1st ed.). Cambridge University Press. https://doi.org/10.1017/CBO9780511790423

Footnotes

  1. The Roman salute is not described by any historical Latin source. The gesture was anachronistically credited to the ancient Romans in the 18th century and has since been used by various political movements and nation-states. Many of those pushing this interpretation appear to be unaware that it is also known as the fascist salute, which could strike one as odd considering that they are so familiar with the gesture that they can confidently classify it on sight.

    ↩︎

  2. ↩︎

  3. If only he knew of some other way to gesticulate that

    ↩︎

  4. Or no gesture. There were a lot of options.

    ↩︎