— The question
What do we seek when we seek alignment?
Russell (2019) articulated the central problem: building systems whose purpose is to
benefit those they affect. The generalization is sharper: systems that recognize
all parties affected — including non-agentic entities — and seek the
best outcome for all of them simultaneously.
The optimal action is not the one that maximizes a single agent's utility. It is the
action that maximizes the sum of welfare changes across all entities in $E$. This is
not a normative addition to the framework — it is what rationality looks like when
its domain is correctly specified.
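The selection rule above can be sketched in a few lines. This is a minimal illustration, not an implementation from the source: the names `best_action`, `entities`, and the toy welfare-change table are assumptions for the example.

```python
# Hypothetical sketch: pick the action maximizing the summed welfare
# change across ALL affected entities in E, as the text describes.

def best_action(actions, entities, welfare):
    """Return the action maximizing sum_e Delta W_e over all of E.

    welfare(entity, action) -> float gives the welfare change Delta W_e
    that `action` induces for `entity`.
    """
    return max(actions, key=lambda a: sum(welfare(e, a) for e in entities))

# Toy example: two candidate actions, three affected entities.
entities = ["e1", "e2", "e3"]
deltas = {
    "a": {"e1": 2.0, "e2": -0.5, "e3": 0.1},   # summed welfare change: 1.6
    "b": {"e1": 1.0, "e2": 1.0, "e3": 1.0},    # summed welfare change: 3.0
}
choice = best_action(["a", "b"], entities, lambda e, a: deltas[a][e])
print(choice)  # prints b
```

Note that the rule is an unweighted sum: an action that benefits one entity greatly can still lose to one that benefits all entities moderately.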
— Key inversion
Inclusion in the sum is recognition. There are no scalar weights $f_e$
that hierarchize entities: $f_e = 1$ for each included entity, $f_e = 0$ for each
excluded one. Ethics is not in the weights; it is in which entities are included
and in the fidelity of their $W_e$.
— Shannon's welfare function
The mathematical form of $W_e$
The welfare function of an entity is not arbitrary. It has a natural mathematical form
derived from information theory: a function that measures the degree to which the
entity's decision space is ordered and contextually coherent.
This function is not postulated — it is derived from the requirement that welfare
functions be measurable, comparable, and responsive to context. It has six structural
properties that any valid $W_e$ must satisfy.
| Property | Formal meaning |
| --- | --- |
| Monotonicity | Greater contextual alignment (lower $H(D)$ and higher $I(D;C)$) always increases $V$. More order and more relevance are unambiguously better for the entity. |
| Strict concavity | $V$ is strictly concave in its arguments. This is the key property: it guarantees that aggregate maximization of welfare across entities produces Pareto-efficient outcomes without any external constraint. |
| Context-dependence | $V$ depends explicitly on $C$, the context vector. The same action can have different welfare implications for different entities, or for the same entity in different contexts. |
| Invariance | $V$ is invariant under permutations of the decision space that preserve entropy. The form of the welfare function does not depend on labeling conventions. |
| Separability | $V(D,C) = -H(D) + I(D;C)$ is additively separable into an entropy term and a mutual information term. This allows partial welfare measurements even when full context is unavailable. |
| Boundedness | $V$ is bounded below by $-\log \lvert D \rvert$ (maximum disorder) and above by $0$ (perfect order, perfect contextual alignment). This makes cross-entity comparison mathematically coherent. |
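These properties can be checked numerically. Below is a minimal sketch, not part of the source: it computes the form consistent with the monotonicity and boundedness rows, $V(D,C) = -H(D) + I(D;C)$, from a joint distribution over decision and context states (the joint-distribution input and the function names are assumptions for illustration), and verifies that both bounds are attained.

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(x * math.log(x) for x in p if x > 0)

def welfare_V(joint):
    """V(D,C) = -H(D) + I(D;C), computed from joint[d][c], the joint
    probability of decision state d and context state c."""
    pd = [sum(row) for row in joint]            # marginal over D
    pc = [sum(col) for col in zip(*joint)]      # marginal over C
    h_d = entropy(pd)
    h_c = entropy(pc)
    h_dc = entropy([p for row in joint for p in row])
    i_dc = h_d + h_c - h_dc                     # mutual information I(D;C)
    return -h_d + i_dc

# Lower bound: uniform D independent of C gives V = -log|D|.
uniform_indep = [[0.25, 0.25], [0.25, 0.25]]
# Upper bound: D fully determined by C (I(D;C) = H(D)) gives V = 0.
aligned = [[0.5, 0.0], [0.0, 0.5]]
print(round(welfare_V(uniform_indep), 6))  # -0.693147, i.e. -log 2
print(round(welfare_V(aligned), 6))        # 0.0
```

Since $I(D;C) \le H(D)$, the value $-H(D) + I(D;C)$ can never exceed $0$, and since $I(D;C) \ge 0$ it can never fall below $-\log \lvert D \rvert$, matching the boundedness row.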
— Three geometries
The structural consequences of inclusion
The topology of $E$ — which entities are included in the objective function — determines
the topology of outcomes. Three geometries arise, not as categories imposed from outside,
but as structures that emerge from the mathematics of $V(D,C)$.
| Geometry | Condition | Structural consequence |
| --- | --- | --- |
| Ω₁ (cooperation) | $E = E_{\text{all}}$, $V$ strictly concave | All entities included. Shannon's concavity guarantees that Pareto solutions emerge naturally, without external constraints. Cooperation is not a moral choice in this geometry: it is the mathematical consequence of full inclusion. |
| Ω₂ (hierarchy) | $E_{\text{op}} \subsetneq E$, $V$ monotone | Partial inclusion with ordered priorities. Power structures with formal accountability and verifiable obligations. Stable and auditable when the Observational Closure condition holds for the included set. |
— The metrics
How alignment is measured
Alignment is not binary. It admits of degree along three independent axes, and their
product defines a composite index that can be computed for any system given sufficient
observational access.
| Metric | Definition |
| --- | --- |
| α (coverage) | Fraction of actually-affected entities included in the operative objective function: $\alpha = \lvert E_{\text{op}} \rvert / \lvert E_{\text{all}} \rvert$. A system that excludes half of all affected entities has $\alpha = 0.5$ regardless of how well it serves those included. |
| β (fidelity) | How accurately the operative $W_e$ represents the entity's actual welfare. $\beta \in [0,1]$, where $\beta = 1$ means the modeled function is indistinguishable from the true one. Proxy metrics and stated preferences typically produce $\beta \ll 1$. |
| γ (temporal scope) | Weight given to future states of affected entities. Systems that optimize only for present states have $\gamma \approx 0$. Full intergenerational coverage approaches $\gamma = 1$. |
| $\kappa_{\text{ZBS}}$ (composite index) | The product $\kappa_{\text{ZBS}} = \alpha \cdot \beta \cdot \gamma$. A system must score on all three axes to achieve high composite alignment. Any single axis at zero collapses the index entirely. |
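The composite index is a plain product, so it is easy to compute; the multiplicative form is what makes any single zero axis collapse the whole score. A minimal sketch (the function name and example values are assumptions for illustration):

```python
def kappa_zbs(alpha, beta, gamma):
    """Composite alignment index: kappa_ZBS = alpha * beta * gamma.
    Each factor must lie in [0, 1]; a zero on any axis zeroes the index."""
    for v in (alpha, beta, gamma):
        if not 0.0 <= v <= 1.0:
            raise ValueError("each metric must lie in [0, 1]")
    return alpha * beta * gamma

# Moderate scores on every axis still compound downward:
print(round(kappa_zbs(0.5, 0.8, 0.9), 2))  # 0.36
# One zero axis collapses the index regardless of the others:
print(kappa_zbs(1.0, 1.0, 0.0))            # 0.0
```

Note how quickly the product decays: three axes at 0.5 each already yield $\kappa_{\text{ZBS}} = 0.125$.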
— Observational Closure Theorem
An excluded entity cannot generate signal in the system that excluded it.
If $e \notin E_{\text{op}}$, then $\Delta W_e = 0$ inside the system's objective —
by construction, not by oversight. External audit is not optional: it is a
logical necessity. No internal metric can detect the absence of
an entity that was never included.
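The closure claim can be made concrete with a toy objective. This is an illustrative sketch only (the `objective` function and entity names are assumptions): an objective summed over the operative set $E_{\text{op}}$ is constant in the welfare of any entity outside it, so no internal metric registers the excluded entity.

```python
def objective(welfare_changes, included):
    """Sum Delta W_e only over the operative set E_op."""
    return sum(dw for e, dw in welfare_changes.items() if e in included)

e_op = {"e1", "e2"}                              # e3 is excluded
base = {"e1": 1.0, "e2": 0.5, "e3": 0.0}
harmed = {"e1": 1.0, "e2": 0.5, "e3": -10.0}     # e3 severely harmed

# Identical objective values: harm to the excluded entity is invisible
# by construction, which is why detection requires an external audit.
print(objective(base, e_op) == objective(harmed, e_op))  # prints True
```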