Asking questions of models by conditional inference
Cognition and conditioning
We have built up a tool set for constructing probabilistic generative models. These can represent knowledge about causal processes in the world: running one of these programs generates a particular outcome by sampling a “history” for that outcome. However, the power of a causal model lies in the flexible ways it can be used to reason about the world. In Generative Models 1 we ran generative models forward to reason about outcomes from initial conditions. Generative models also enable reasoning in other ways. For instance, if we have a generative model in which X is the output of a process that depends on Y, we may ask: “assuming I have observed a certain X, what must Y have been?” That is, we can reason backward from outcomes to initial conditions. More generally, we can make hypothetical assumptions and reason about the generative history: “assuming something, how did the generative model run?”
Much of cognition can be understood in terms of conditional inference. In its most basic form, causal attribution is conditional inference: given some observed effects, what were the likely causes? Predictions are conditional inferences in the opposite direction: given that I have observed some cause, what are its likely effects? These inferences can be described by conditioning a probabilistic program that expresses a causal model. The acquisition of that causal model, or learning, is also conditional inference at a higher level of abstraction: given our general knowledge of how causal relations operate in the world, and some observed events in which candidate causes and effects co-occur in various ways, what specific causal relations are likely to hold between these observed variables?
To see how the same concepts apply in a domain that is not usually thought of as causal, consider language. The core questions of interest in the study of natural language are all at heart conditional inference problems. Given beliefs about the structure of my language, and an observed sentence, what should I believe about the syntactic structure of that sentence? This is the parsing problem. The complementary problem of speech production is related: given the structure of my language (and beliefs about others’ beliefs about that), and a particular thought I want to express, how should I encode the thought? Finally, the acquisition problem: given some data from a particular language, and perhaps general knowledge about universals of grammar, what should we believe about that language’s structure? This problem is simultaneously the problem facing the linguist and the child trying to learn a language.
Parallel problems of conditional inference arise in visual perception, social cognition, and virtually every other domain of cognition. In visual perception, we observe an image or image sequence that is the result of rendering a three-dimensional physical scene onto our two-dimensional retinas. A probabilistic program can model both the physical processes at work in the world that produce natural scenes, and the imaging processes (the “graphics”) that generate images from scenes. Perception can then be seen as conditioning this program on some observed output image and inferring the scenes most likely to have given rise to it.
When interacting with other people, we observe their actions, which result from a planning process, and often want to guess their desires, beliefs, emotions, or future actions. Planning can be modeled as a program that takes as input an agent’s mental states (beliefs, desires, etc.) and produces action sequences—for a rational agent, these will be actions that are likely to produce the agent’s desired states reliably and efficiently. A rational agent can plan their actions by conditional inference to infer what steps would be most likely to achieve their desired state. Action understanding, or interpreting an agent’s observed behavior, can be expressed as conditioning a planning program (a “theory of mind”) on observed actions to infer the mental states that most likely gave rise to those actions, and to predict how the agent is likely to act in the future.
Below is a program that implements the following:
if {A{=}\mathtt{true}}, then {P(B{=}\mathtt{true}) = 0.7}
if {A{=}\mathtt{false}}, then {P(B{=}\mathtt{true}) = 0.1}
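Here is a sketch of example_joint, consistent with the models that appear later in this chapter (the exact enum definitions for A and B, with FALSE/TRUE members, are assumed):

```python
from enum import IntEnum
from memo import memo

class A(IntEnum):
    FALSE = 0
    TRUE = 1

class B(IntEnum):
    FALSE = 0
    TRUE = 1

@memo
def example_joint[_a: A, _b: B]():
    agent: chooses(a in A, wpp=1)  # A is true or false with equal probability
    agent: chooses(b in B, wpp=(
        (0.7 if b == {B.TRUE} else 0.3)   # P(B=true | A=true)  = 0.7
        if a == {A.TRUE} else
        (0.1 if b == {B.TRUE} else 0.9)   # P(B=true | A=false) = 0.1
    ))
    return Pr[agent.a == _a, agent.b == _b]

_ = example_joint(print_table=True)
```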
We can also write these relationships as conditional probabilities:

{P(B{=}\mathtt{true} \mid A{=}\mathtt{true}) = 0.7}

{P(B{=}\mathtt{true} \mid A{=}\mathtt{false}) = 0.1}
The table below shows the joint probability, {P(A, \, B)}:

|  | B{=}\mathtt{true} | B{=}\mathtt{false} |
|---|---|---|
| A{=}\mathtt{true} | 0.35 | 0.15 |
| A{=}\mathtt{false} | 0.05 | 0.45 |
Can you calculate these values based on the code for example_joint? Make sure that you can.
Solution

Since {P(A{=}\mathtt{true}) = P(A{=}\mathtt{false}) = 0.5}, each entry of the joint is the prior on A times the conditional probability of B: {0.5 \times 0.7 = 0.35}, {0.5 \times 0.3 = 0.15}, {0.5 \times 0.1 = 0.05}, and {0.5 \times 0.9 = 0.45}.
Recovering latent probabilities from a joint distribution
Suppose we did not have access to the program that generated this table—we can see the joint distribution P(A, B), but we do not know the causal structure or the conditional probabilities that produced it. Can we recover the latent probabilities from the joint distribution alone?
Using only the table above, calculate the marginal probabilities {P(A{=}\mathtt{true})} and {P(B{=}\mathtt{true})}.
Using only the table above, calculate the conditional probabilities {P(B{=}\mathtt{true} \mid A{=}\mathtt{true})} and {P(B{=}\mathtt{true} \mid A{=}\mathtt{false})}.
This follows from the definition of conditional probability: {P(X \mid Y) = \frac{P(X, Y)}{P(Y)}} where {P(Y) > 0}.
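As a check on these calculations, here is a minimal sketch in plain jax.numpy (using the joint table values from above) that computes the marginals and conditionals by summing and normalizing:

```python
import jax.numpy as jnp

# joint P(A, B): rows index A (true, false), columns index B (true, false)
joint = jnp.array([[0.35, 0.15],
                   [0.05, 0.45]])

p_a = joint.sum(axis=1)               # marginal P(A):        [0.5, 0.5]
p_b = joint.sum(axis=0)               # marginal P(B):        [0.4, 0.6]
p_b_given_a = joint / p_a[:, None]    # conditional P(B | A): [[0.7, 0.3],
                                      #                        [0.1, 0.9]]
print(p_a, p_b, p_b_given_a, sep="\n")
```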
Returning to treating example_joint as an explicit model whose code we have access to, how could we modify it to output the marginal and conditional probabilities?
```python
@memo
def example_marginal_a[_a: A]():
    agent: chooses(a in A, wpp=1)
    agent: chooses(b in B, wpp=(
        (0.7 if b == {B.TRUE} else 0.3)
        if a == {A.TRUE} else
        (0.1 if b == {B.TRUE} else 0.9)
    ))
    return Pr[agent.a == _a]

_ = example_marginal_a(print_table=True)
```
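A natural next attempt at the conditional is to fix A to true and query the corresponding row of the joint. A sketch of such an attempt (the model name is illustrative):

```python
@memo
def example_conditional_attempt[_b: B]():
    agent: chooses(a in A, wpp=1)
    agent: chooses(b in B, wpp=(
        (0.7 if b == {B.TRUE} else 0.3)
        if a == {A.TRUE} else
        (0.1 if b == {B.TRUE} else 0.9)
    ))
    return Pr[agent.a == {A.TRUE}, agent.b == _b]

_ = example_conditional_attempt(print_table=True)
```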
Why doesn’t this give us the conditional distribution? Look at the table: it shows P(A{=}\mathtt{true}, \, B{=}b)—the joint probability restricted to the row where A{=}\mathtt{true}—not P(B \mid A{=}\mathtt{true}). These values don’t sum to 1 because we haven’t normalized by P(A{=}\mathtt{true}).
But the problem is deeper than normalization. In many PPLs, you could condition on a realized value of a random variable. But in memo, the agent itself chose the value of A—it has no uncertainty about its own choice, so there is nothing to condition on. Conditioning requires epistemic uncertainty: an agent who doesn’t know the value and must update their beliefs upon observing it. To express this in memo, we need to model a second agent whose choices the first agent is uncertain about.
Let’s say that the agent has a mental model of a friend’s choice of A and B:
```python
@memo
def mental_model_conditional_a[_a: A, _b: B]():
    agent: knows(_a, _b)
    agent: thinks[
        friend: knows(_a, _b),
        friend: chooses(a in A, wpp=1),
        friend: chooses(b in B, wpp=(
            (0.7 if b == {B.TRUE} else 0.3)
            if a == {A.TRUE} else
            (0.1 if b == {B.TRUE} else 0.9)
        )),
    ]
    ### condition the model
    agent: observes [friend.a] is _a
    return agent[Pr[friend.a == _a, friend.b == _b]]

_ = mental_model_conditional_a(print_table=True)
```
Now the agent doesn’t have inherent access to the friend’s knowledge about the state of A, but can observe it (e.g. if the friend shares that information).
By conditioning the model on an observed value of A, the model now computes a posterior distribution rather than a prior (unconditional) distribution. Whereas example_joint() computes the joint distribution {P(A, B)}, mental_model_conditional_a computes the conditional distribution {P(B \mid A)}—the posterior over B given the observed value of A.
Try removing one of the conditions from the return statement. What do you get when the return statement is:
agent[Pr[friend.a == _a, friend.b == _b]]
agent[Pr[friend.a == _a]]
agent[Pr[friend.b == _b]]
agent[Pr[friend.a]]
agent[Pr[friend.b]]
Now for each of these return statements, try removing _a: A and/or _b: B from the definition. Make sure you understand the effect of each modification (including why some combinations do not compile, and why others return redundant information or less information than you might want).
Statistical dependence
Suppose we observe that {A{=}\mathtt{true}}. What should we believe1 about B? From the program, {P(B{=}\mathtt{true} \mid A{=}\mathtt{true}) = 0.7}, which matches what we calculated above. If instead we observed {A{=}\mathtt{false}}, there’d be only a 10\% chance that {B{=}\mathtt{true}}.
The key observation: knowing something about A changes what we should believe about B. Different values of A yield different conditional distributions over B—that is, {P(B \mid A{=}\mathtt{true}) \neq P(B \mid A{=}\mathtt{false})}. This is the intuitive idea behind statistical dependence, which we define formally below.
The converse also holds: knowing the state of B updates what we should believe about A. Verify, using the joint distribution and the definition of conditional probability, that:

{P(A{=}\mathtt{true} \mid B{=}\mathtt{true}) = 0.875}

{P(A{=}\mathtt{true} \mid B{=}\mathtt{false}) = 0.25}
When information about one variable conveys information about another variable, the variables are statistically dependent.
We write this as {B \not\perp A}, where \perp denotes statistical independence and \not\perp denotes statistical dependence.
Defining statistical independence
What if knowing something about A never gave you information about B? Then the conditional distribution of B would be the same regardless of the value of A:

{P(B \mid A{=}\mathtt{true}) = P(B \mid A{=}\mathtt{false}) = P(B)}

If this is the case, A and B are statistically independent: {B \perp A} (and thus {A \perp B}).
In general, if {P(B \mid A) = P(B)}, then A and B are statistically independent, since information about A does not change the belief about B, i.e.:
P(B \mid A) = P(B) ~~\iff~~ B \perp A
The \iff symbol means “if and only if” (“iff”). That is, {\spadesuit \iff \clubsuit} means that \spadesuit is true exactly when \clubsuit is true: each side implies the other.
Recall the definition of conditional probability: {P(A \mid B) = \frac{P(A, B)}{P(B)}}, which can be rewritten as:
P(A, \, B) = P(B) \; P(A \mid B)
or, equivalently,
P(A, \, B) = P(A) \; P(B \mid A)
Substituting the relationship from above, { P(B \mid A) = P(B) ~~\iff~~ B \perp A }, into this factorization of the joint, we arrive at the definition of statistical independence:
{P(A, \, B) = P(A) \; P(B) ~~\iff~~ A \perp B}
In other words,
A \perp B \\
\iff \\
P(A{=}a, \, B{=}b) = P(A{=}a) \; P(B{=}b) \\
~~\forall~~ (a, b) \in { \mathcal{A} \times \mathcal{B} }
The \forall symbol means “for all”, as in for every value that a and b can take.
If it is ever the case that this equality does not hold, then the random variables are statistically dependent (A \not\perp B):
A \not\perp B
~~~\text{if }~~ \\
{ P(A{=}a, \, B{=}b) \neq P(A{=}a) \; P(B{=}b) } \\
~~\text{ for some }~~ (a, b) \in \mathcal{A} \times \mathcal{B}
Note that statistical dependence is symmetric: if A and B are statistically dependent, then {P(B \mid A) \neq P(B)} and {P(A \mid B) \neq P(A)}. Make sure you can show why this must be the case. (Hint: start from the definition of conditional probability and the factorization of the joint.)
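One route, as a sketch of the argument: suppose {P(B{=}b \mid A{=}a) \neq P(B{=}b)} for some pair (a, b); any such witness pair has {P(A{=}a) > 0} and {P(B{=}b) > 0}. Then

P(A{=}a, \, B{=}b) = P(A{=}a) \; P(B{=}b \mid A{=}a) \neq P(A{=}a) \; P(B{=}b)

and dividing through by {P(B{=}b)} gives

P(A{=}a \mid B{=}b) = \frac{P(A{=}a, \, B{=}b)}{P(B{=}b)} \neq P(A{=}a).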
Dependence vs correlation
“Statistical dependence” is not the same as “correlation.” When people say “correlation” they typically mean linear (Pearson) correlation, which measures only the strength of a linear relationship. Variables can be statistically dependent yet have zero linear correlation.
Consider the program below. B is causally dependent on A. Do you expect A and B to be statistically dependent? Do you expect A and B to be correlated? If you’re having trouble thinking through it, try drawing the shape of the data generated by the model. Once you’ve made your prediction, run the program.
```python
from jax.scipy.stats.norm import pdf as normpdf

A = jnp.linspace(-1, 1, 11)
B = jnp.linspace(-1, 1.5, 11)

@jax.jit
def B_pdf(b, a):
    # B depends on A: B ~ Normal(a**2, 0.1)
    return normpdf(b, a * a, 0.1)

@memo
def f[_a: A, _b: B]():
    agent: chooses(a in A, wpp=1)
    agent: chooses(b in B, wpp=B_pdf(b, a))
    return Pr[agent.a == _a, agent.b == _b]
```
Result
```python
res = f()

def lobf(x, y):
    # line of best fit: linear least-squares fit evaluated at the unique x values
    import numpy as np
    return (np.unique(x), np.poly1d(jnp.polyfit(x, y, 1))(np.unique(x)))

def plot_results(x, y, ax=None, **kwargs):
    if ax is None:
        fig, ax = plt.subplots()
    ax.scatter(x, y)
    _ = ax.set_xlabel(kwargs.get("xlabel", None))
    _ = ax.set_ylabel(kwargs.get("ylabel", None))
    if "xlim" in kwargs:
        _ = ax.set_xlim(kwargs["xlim"])
    if "ylim" in kwargs:
        _ = ax.set_ylim(kwargs["ylim"])
    ax.plot(*lobf(x, y), color="red")
    ax.text(0.75, 0.43, f"r = {jnp.corrcoef(x, y)[0,1]:0.3f}")

# conditional expectation E[B | A = a] for each value of a
b_ = jnp.array([jnp.dot(B, res[i, :] / res[i, :].sum()) for i in range(len(A))])

fig, ax = plt.subplots()
plot_results(A, b_, ax=ax,
             xlabel=r"$\operatorname{\mathbf{E}} \left[ A \right]$",
             ylabel=r"$\operatorname{E}[B]$",
             xlim=(-1.1, 1.1), ylim=(-0.05, 1.05))
```
It should be clear that A and B are statistically dependent,2 but have a Pearson correlation of zero. (How would you prove that {A \not\perp B}? Hint: find specific values of a for which {P(B \mid A{=}a)} differs.)
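One way to check this numerically (a sketch, reusing the table res = f() computed above): compare the joint distribution to the product of its marginals; any nonzero difference witnesses dependence.

```python
joint = res / res.sum()                    # normalize the table into a joint distribution
pa = joint.sum(axis=1, keepdims=True)      # marginal over A (column vector)
pb = joint.sum(axis=0, keepdims=True)      # marginal over B (row vector)
print(jnp.max(jnp.abs(joint - pa * pb)))   # > 0, so A and B are statistically dependent
```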
Compare the f program above to this one:
```python
normpdfjit = jax.jit(normpdf)

B0 = jnp.linspace(0, 1.2, 20)
B = jnp.linspace(-0.5, 1.5, 20)
C = jnp.array([-1, 1])
A = jnp.linspace(-1, 1, 11)

@memo
def g[_a: A, _b: B]():
    agent: chooses(b0 in B0, wpp=1)                              # common cause
    agent: chooses(b in B, wpp=normpdfjit(b, b0, 0.1))           # B ~ Normal(b0, 0.1)
    agent: chooses(c in C, wpp=1)                                # random sign
    agent: chooses(a in A, to_maximize=-abs(a - c * b0**0.5))    # A closest to c * sqrt(b0)
    return Pr[agent.a == _a, agent.b == _b]

res = g()
b_ = jnp.array([jnp.dot(B, res[i, :] / res[i, :].sum()) for i in range(len(A))])

fig, ax = plt.subplots()
plot_results(A, b_, ax=ax,
             xlabel=r"$\mathbf{E}[A]$", ylabel=r"$\mathbf{E}[B]$",
             xlim=(-1.1, 1.1), ylim=(-0.05, 1.05))
```
In f, B causally depends on A (since B \sim \mathcal{N}(A^2, 0.1)). In g, A and B share a common cause (b0) but neither directly causes the other. These are different causal structures, yet they produce the same statistical signature: dependence between A and B with zero linear correlation.
To borrow a phrase from Richard McElreath (McElreath, 2020), the causes are not in the data. The joint distribution P(A, B) does not uniquely determine the causal structure that generated it.
Dependence can be context-specific
Statistical dependence is a global property: it is assessed over all possible values of the variables. But dependence can vanish in subsets of the data.
In the program below, A and B are statistically dependent (A \not\perp B) when assessed over their full support. However, if you restrict attention to the cases where A \geq 0, the relationship between A and B disappears—B no longer varies with A in that region.
```python
normpdfjit = jax.jit(normpdf)

A = jnp.linspace(-1, 1, 11)
B = jnp.linspace(-0.5, 1.5, 20)

@memo
def h[_a: A, _b: B]():
    agent: chooses(a in A, wpp=1)
    agent: chooses(b in B, wpp=(
        normpdfjit(b, a**2, 0.1)   # for a < 0, B ~ Normal(a**2, 0.1)
        if a < 0 else
        normpdfjit(b, 0, 0.1)      # for a >= 0, B ~ Normal(0, 0.1), independent of a
    ))
    return Pr[agent.a == _a, agent.b == _b]

res = h()
fig, ax = plt.subplots()
plot_results(A,
             jnp.array([jnp.dot(B, res[i, :] / res[i, :].sum()) for i in range(len(A))]),
             ax=ax,
             xlabel=r"$\mathbf{E}[A]$", ylabel=r"$\mathbf{E}[B]$",
             xlim=(-1.1, 1.1), ylim=(-0.25, 1.05))
```
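A quick numerical check of this claim (a sketch, reusing res = h() from above): for every {a \geq 0} the conditional distribution {P(B \mid A{=}a)} is identical, so B carries no information about A within that region.

```python
cond = res / res.sum(axis=1, keepdims=True)   # P(B | A = a), one row per value of a
rows = cond[A >= 0]                           # restrict to the region a >= 0
print(jnp.max(jnp.abs(rows - rows[0])))       # ~0: the rows are all the same
```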
The important point: to determine whether {P(B \mid A) = P(B)}, we must check all possible values of A and B—not just those in a convenient subset. When we assert independence or dependence, we are making a claim over the entire support: {P(B{=}b \mid A{=}a) = P(B{=}b) \;\; \forall \; (a,b) \in \mathcal{A} \times \mathcal{B}}.
In causal inference, systematically examining a relationship across all levels of a variable is called stratification. Independence claims always implicitly require full stratification—a relationship that holds in a subset may not hold globally, and vice versa.
This points to one of the things that makes science hard. We don’t always know all of the causes, let alone measure them, let alone measure them at every possible value.
This is a central issue in trying to learn generative models by observing data. The causes are not in the data. In a more philosophical sense, is it even possible to observe a cause?3 If what our sense systems deliver are statistical regularities—patterns of co-occurrence, coincidence, correspondence, covariance, and association—how do people learn causally-structured mental models of the world and other minds? One view, developed by Gopnik, Tenenbaum, Griffiths and others (e.g. Gopnik et al., 2004; Griffiths & Tenenbaum, 2009; Tenenbaum et al., 2011), argues that evolution has equipped us with inductive biases—something like an abstract causal grammar—that constrain learning and enable children to acquire causal knowledge from sparse data. On this account, the prior over causal structures is itself innate (or at least matures early), even though the specific causal models are learned.
We have distinguished statistical dependence (a symmetric property of joint distributions) from causal dependence (a directed, structural property of generative processes). We have seen that different causal structures can produce identical joint distributions. In the next section, we examine how conditioning on additional variables can change the dependence relationships between other variables—creating dependence where there was none (explaining away), or removing it (screening off). These patterns are the key to understanding how evidence propagates through causal models.
Discussion
Consider the implications of these ideas for neural / connectionist models. If a feedforward network learns input-output mappings from data, what kind of “causal knowledge” can it represent? What are the limitations compared to a structured generative model?
Relate this to ideas we’ve discussed previously:
The role of overhypotheses and inductive constraints/biases in learning generative models. Consider Chomsky’s “poverty of the stimulus” argument: children acquire complex grammars from limited data, suggesting strong innate constraints on the hypothesis space.
The role of “enactive” or “embodied” cognition. How might the ability to causally intervene in the world—to push, pull, and manipulate objects—change the problem of learning from data? How does this relate to the “code interpreter” of ChatGPT, or to reinforcement learning from human feedback (RLHF)?
Is it possible to observe a cause? Suppose you observe me knock over my coffee cup. You see the cup fall, and you attribute it to my arm movement. But this attribution is itself an inference—one you learned to make by observing statistical regularities in how objects interact over your lifespan. At a more fundamental level, your perceptual system had to infer the spatial structure and physical properties of the scene, and that there are objects at all, from patterns of retinal stimulation—purely associative information.
Exercises
Joint distributions and marginals
Consider three equally-likely weather states (sunny, cloudy, rainy) and two equally-likely activities (indoors, outdoors). Write a memo model that computes the joint distribution P(\text{Weather}, \text{Activity}) assuming the two variables are independent. Verify from the output that P(W{=}w, A{=}a) = P(W{=}w) \; P(A{=}a) for all entries.
Now modify the model so that the activity depends on the weather: if sunny, P(\text{outdoors}) = 0.8; if cloudy, P(\text{outdoors}) = 0.5; if rainy, P(\text{outdoors}) = 0.1. From the joint distribution, compute (by hand) the marginal P(\text{Activity}). Are Weather and Activity statistically independent? Justify using the definition.
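A sketch of one way to set up the independent version (the enum and model names here are illustrative, following the pattern of the models in this chapter); the dependent version only changes the wpp for the activity choice:

```python
from enum import IntEnum

class Weather(IntEnum):
    SUNNY = 0
    CLOUDY = 1
    RAINY = 2

class Activity(IntEnum):
    INDOORS = 0
    OUTDOORS = 1

@memo
def weather_activity_joint[_w: Weather, _act: Activity]():
    agent: chooses(w in Weather, wpp=1)     # three equally likely weather states
    agent: chooses(act in Activity, wpp=1)  # activity chosen independently of weather
    return Pr[agent.w == _w, agent.act == _act]

_ = weather_activity_joint(print_table=True)
```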
Conditioning in memo
Using the weather/activity model from the previous exercise, write a memo model with observes to compute the posterior P(\text{Weather} \mid \text{Activity}{=}\text{outdoors}). Which weather state is most likely given that the person is outdoors?
Hint
You need a two-agent model: an outer agent who has a mental model of a friend who chooses weather and activity. The agent then observes [friend.a] is _a, where _a is the query variable fixed to the outdoors value.
Revisit the example_joint model from the beginning of this chapter. Modify mental_model_conditional_a to condition on {A{=}\mathtt{false}} and return the full conditional distribution P(B \mid A{=}\mathtt{false}). Verify that P(B{=}\mathtt{true} \mid A{=}\mathtt{false}) = 0.1.
Hard constraints with observes_that
In addition to observes [X] is Y, which conditions on the equality of two random variables, memo provides observes_that[condition] for conditioning on arbitrary boolean expressions. This is a hard constraint: it sets the probability to zero for any outcome that violates the condition.
Write a memo model where a friend chooses an integer X uniformly from \{1, 2, \ldots, 10\}. The agent observes that the friend’s number is greater than 6. Return the posterior distribution P(X \mid X > 6). What should the result be? Verify your answer.
```python
### uncomment and fill in
# import jax.numpy as jnp
# from memo import memo
#
# X = jnp.arange(1, 11)
#
# @memo
# def ex_hard_constraint[_x: X]():
#     ...
#
# _ = ex_hard_constraint(print_table=True)
```
The posterior should be uniform over \{7, 8, 9, 10\}, i.e. P(X{=}x \mid X > 6) = 0.25 for x \in \{7, 8, 9, 10\} and 0 otherwise.
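One way to fill in the template, as a sketch that mirrors the two-agent pattern used elsewhere in this chapter and the observes_that form described above:

```python
import jax.numpy as jnp
from memo import memo

X = jnp.arange(1, 11)

@memo
def ex_hard_constraint[_x: X]():
    agent: knows(_x)
    agent: thinks[
        friend: knows(_x),
        friend: chooses(x in X, wpp=1),   # uniform over 1..10
    ]
    agent: observes_that[friend.x > 6]    # hard constraint: outcomes with x <= 6 get probability zero
    return agent[Pr[friend.x == _x]]

_ = ex_hard_constraint(print_table=True)
```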
Write a model where a friend draws two integers X, Y independently and uniformly from \{1, \ldots, 6\} (like rolling two dice). The agent observes that the sum X + Y = 7. Return the joint posterior P(X, Y \mid X + Y = 7).
There are 6 outcomes where X + Y = 7: (1,6), (2,5), (3,4), (4,3), (5,2), (6,1). Each has posterior probability 1/6.
Soft conditioning with observes_event
For noisy or continuous observations, memo provides observes_event(wpp=likelihood), which reweights outcomes by a non-negative likelihood function rather than zeroing them out.
This is analogous to factor(log_likelihood) or observe in WebPPL.
Suppose a friend picks a number \mu uniformly from \{0, 1, 2, \ldots, 10\}. You receive a noisy measurement of \mu: the observation is 5.5, with Gaussian noise of standard deviation 1.5. Write a memo model using observes_event to compute the posterior P(\mu \mid \text{observation}{=}5.5).
```python
### uncomment and fill in
# from jax.scipy.stats.norm import pdf as normpdf
#
# Mu = jnp.arange(0, 11)
#
# @memo
# def noisy_obs[_mu: Mu]():
#     ...
#
# _ = noisy_obs(print_table=True)
```
Solution
```python
from jax.scipy.stats.norm import pdf as normpdf

Mu = jnp.arange(0, 11)

@memo
def noisy_obs[_mu: Mu]():
    agent: knows(_mu)
    agent: thinks[
        friend: knows(_mu),
        friend: chooses(mu in Mu, wpp=1),
    ]
    agent: observes_event(wpp=normpdf(5.5, friend.mu, 1.5))
    return agent[Pr[friend.mu == _mu]]

_ = noisy_obs(print_table=True)
```
The posterior should peak at \mu = 5 and \mu = 6, with probabilities decreasing symmetrically away from 5.5.
Statistical dependence
Consider the following model. Without running it, determine whether C and D are statistically independent. Justify your answer by examining whether P(D \mid C) depends on the value of C. Then verify by running the model.
```python
### uncomment to run
# from enum import IntEnum
# class C(IntEnum):
#     OFF = 0
#     ON = 1
# class D(IntEnum):
#     OFF = 0
#     ON = 1
#
# @memo
# def cd_joint[_c: C, _d: D]():
#     agent: chooses(c in C, wpp=1)
#     agent: chooses(d in D, wpp=(
#         0.6 if d == {D.ON} else 0.4
#     ))
#     return Pr[agent.c == _c, agent.d == _d]
#
# _ = cd_joint(print_table=True)
```
Solution
C and D are statistically independent. The wpp for D does not reference C at all, so P(D \mid C) = P(D) for all values of C. We can verify from the joint table: P(C{=}\texttt{ON}, D{=}\texttt{ON}) = 0.3 = 0.5 \times 0.6 = P(C{=}\texttt{ON}) \, P(D{=}\texttt{ON}), and similarly for all other cells.
Now modify the model above so that D depends on C: when C{=}\texttt{ON}, P(D{=}\texttt{ON}) = 0.9; when C{=}\texttt{OFF}, P(D{=}\texttt{ON}) = 0.2. Compute the joint, then condition on D{=}\texttt{ON} and compute the posterior P(C \mid D{=}\texttt{ON}). Compare this to the prior P(C). What does the comparison tell you?
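A sketch of the conditioning step, mirroring mental_model_conditional_a and reusing the C and D enums from the model above (the model name is illustrative); reading off the rows where _d is ON gives {P(C \mid D{=}\texttt{ON})}:

```python
@memo
def cd_posterior[_c: C, _d: D]():
    agent: knows(_c, _d)
    agent: thinks[
        friend: knows(_c, _d),
        friend: chooses(c in C, wpp=1),
        friend: chooses(d in D, wpp=(
            (0.9 if d == {D.ON} else 0.1)   # P(D=ON | C=ON)  = 0.9
            if c == {C.ON} else
            (0.2 if d == {D.ON} else 0.8)   # P(D=ON | C=OFF) = 0.2
        )),
    ]
    agent: observes [friend.d] is _d
    return agent[Pr[friend.c == _c]]

_ = cd_posterior(print_table=True)
```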
From joint to conditional (by hand)
Consider a joint distribution over random variables X \in \{0, 1\} and Y \in \{0, 1, 2\} given by the following table:
|  | Y{=}0 | Y{=}1 | Y{=}2 |
|---|---|---|---|
| X{=}0 | 0.15 | 0.25 | 0.10 |
| X{=}1 | 0.05 | 0.20 | 0.25 |
Compute P(X) and P(Y).
Compute P(Y \mid X{=}0) and P(Y \mid X{=}1).
Are X and Y statistically independent? Justify using the definition.
Compute P(X{=}1 \mid Y{=}2).
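A sketch for checking your answers numerically with jax.numpy (the joint table is entered with X indexing rows and Y indexing columns):

```python
import jax.numpy as jnp

joint = jnp.array([[0.15, 0.25, 0.10],    # X = 0
                   [0.05, 0.20, 0.25]])   # X = 1

p_x = joint.sum(axis=1)                   # P(X)
p_y = joint.sum(axis=0)                   # P(Y)
p_y_given_x = joint / p_x[:, None]        # P(Y | X), one row per value of X
independent = bool(jnp.allclose(joint, p_x[:, None] * p_y[None, :]))
p_x1_given_y2 = joint[1, 2] / p_y[2]      # P(X=1 | Y=2)

print(p_x, p_y, p_y_given_x, independent, p_x1_given_y2, sep="\n")
```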
Gopnik, Alison, Glymour, Clark, Sobel, David M., Schulz, Laura E., Kushnir, Tamar, & Danks, David. (2004). A theory of causal learning in children: Causal maps and Bayes nets. Psychological Review, 111(1), 3–32. https://doi.org/10.1037/0033-295X.111.1.3
Griffiths, Thomas L., & Tenenbaum, Joshua B. (2009). Theory-based causal induction. Psychological Review, 116(4), 661–716. https://doi.org/10.1037/a0017201
McElreath, Richard. (2020). Statistical rethinking: A Bayesian course with examples in R and Stan (Second edition). Chapman & Hall/CRC.
Tenenbaum, Joshua B., Kemp, Charles, Griffiths, Thomas L., & Goodman, Noah D. (2011). How to Grow a Mind: Statistics, Structure, and Abstraction. Science, 331(6022), 1279–1285. https://doi.org/10.1126/science.1192788
Footnotes
Here “belief” means credence—the probability mass a rational agent assigns to a proposition, given their information. This is the standard Bayesian usage, distinct from the binary notion of belief (believing something to be true or false) in epistemology.↩︎
The state of A changes which states of B are probable: e.g. if {A > 0.5} we expect B to be larger than if {-0.1 < A < 0.1}, i.e. { \operatorname{\mathbf{E}} \left[ B \mid A > 0.5 \right] > \operatorname{\mathbf{E}} \left[ B \mid -0.1 < A < 0.1 \right] }.↩︎
Philosophers including David Hume have argued against the observability of causation.↩︎