— The question
What do we seek when we seek alignment?
Russell (2019) articulated the central problem: building systems whose purpose is to
benefit those they affect. The generalization is sharper: systems that recognize
all parties affected — including non-agentic entities — and seek the
best outcome for all of them simultaneously.
The optimal action is not the one that maximizes a single agent's utility. It is the
action that maximizes the sum of welfare changes across all entities in $E$. This is
not a normative addition to the framework — it is what rationality looks like when
its domain is correctly specified.
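The selection rule above can be sketched in a few lines. This is a minimal illustration, not an implementation from the source: the names `best_action`, `entities`, and the toy welfare-change table are assumptions for the example.

```python
# Hypothetical sketch: pick the action maximizing the summed welfare
# change across ALL affected entities in E, as the text describes.

def best_action(actions, entities, welfare):
    """Return the action maximizing sum_e Delta W_e over all of E.

    welfare(entity, action) -> float gives the welfare change Delta W_e
    that `action` induces for `entity`.
    """
    return max(actions, key=lambda a: sum(welfare(e, a) for e in entities))

# Toy example: two candidate actions, three affected entities.
entities = ["e1", "e2", "e3"]
deltas = {
    "a": {"e1": 2.0, "e2": -0.5, "e3": 0.1},   # summed welfare change: 1.6
    "b": {"e1": 1.0, "e2": 1.0, "e3": 1.0},    # summed welfare change: 3.0
}
choice = best_action(["a", "b"], entities, lambda e, a: deltas[a][e])
print(choice)  # prints b
```

Note that the rule is an unweighted sum: an action that benefits one entity greatly can still lose to one that benefits all entities moderately.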
— Key inversion
Inclusion in the sum is recognition. There are no scalar weights $f_e$
that hierarchize entities: $f_e = 1$ for each included entity, $f_e = 0$ for each
excluded one. Ethics is not in the weights; it is in which entities are included
and in the fidelity of their $W_e$.
— Shannon's welfare function
The mathematical form of $W_e$
The welfare function of an entity is not arbitrary. It has a natural mathematical form
derived from information theory: a function that measures the degree to which the
entity's decision space is ordered and contextually coherent.
This function is not postulated — it is derived from the requirement that welfare
functions be measurable, comparable, and responsive to context. It has six structural
properties that any valid $W_e$ must satisfy.
| Property | Formal meaning |
| --- | --- |
| Monotonicity | Greater contextual alignment (lower $H(D)$ and higher $I(D;C)$) always increases $V$. More order and more relevance are unambiguously better for the entity. |
| Strict concavity | $V$ is strictly concave in its arguments. This is the key property: it guarantees that aggregate maximization of welfare across entities produces Pareto-efficient outcomes without any external constraint. |
| Context-dependence | $V$ depends explicitly on $C$, the context vector. The same action can have different welfare implications for different entities, or for the same entity in different contexts. |
| Invariance | $V$ is invariant under permutations of the decision space that preserve entropy. The form of the welfare function does not depend on labeling conventions. |
| Separability | $V(D,C) = -H(D) + I(D;C)$ is additively separable into an entropy term and a mutual information term. This allows partial welfare measurements even when full context is unavailable. |
| Boundedness | $V$ is bounded below by $-\log \lvert D \rvert$ (maximum disorder) and above by $0$ (perfect order, perfect contextual alignment). This makes cross-entity comparison mathematically coherent. |
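These properties can be checked numerically. Below is a minimal sketch, not part of the source: it computes the form consistent with the monotonicity and boundedness rows, $V(D,C) = -H(D) + I(D;C)$, from a joint distribution over decision and context states (the joint-distribution input and the function names are assumptions for illustration), and verifies that both bounds are attained.

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(x * math.log(x) for x in p if x > 0)

def welfare_V(joint):
    """V(D,C) = -H(D) + I(D;C), computed from joint[d][c], the joint
    probability of decision state d and context state c."""
    pd = [sum(row) for row in joint]            # marginal over D
    pc = [sum(col) for col in zip(*joint)]      # marginal over C
    h_d = entropy(pd)
    h_c = entropy(pc)
    h_dc = entropy([p for row in joint for p in row])
    i_dc = h_d + h_c - h_dc                     # mutual information I(D;C)
    return -h_d + i_dc

# Lower bound: uniform D independent of C gives V = -log|D|.
uniform_indep = [[0.25, 0.25], [0.25, 0.25]]
# Upper bound: D fully determined by C (I(D;C) = H(D)) gives V = 0.
aligned = [[0.5, 0.0], [0.0, 0.5]]
print(round(welfare_V(uniform_indep), 6))  # -0.693147, i.e. -log 2
print(round(welfare_V(aligned), 6))        # 0.0
```

Since $I(D;C) \le H(D)$, the value $-H(D) + I(D;C)$ can never exceed $0$, and since $I(D;C) \ge 0$ it can never fall below $-\log \lvert D \rvert$, matching the boundedness row.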
— Three geometries
The structural consequences of inclusion
The topology of $E$ — which entities are included in the objective function — determines
the topology of outcomes. Three geometries arise, not as categories imposed from outside,
but as structures that emerge from the mathematics of $V(D,C)$.
| Geometry | Condition | Structural consequence |
| --- | --- | --- |
| Ω₁ (cooperation) | $E = E_{\text{all}}$, $V$ strictly concave | All entities included. Shannon's concavity guarantees that Pareto solutions emerge naturally, without external constraints. Cooperation is not a moral choice in this geometry: it is the mathematical consequence of full inclusion. |
| Ω₂ (hierarchy) | $E_{\text{op}} \subsetneq E$, $V$ monotone | Partial inclusion with ordered priorities. Power structures with formal accountability and verifiable obligations. Stable and auditable when the Observational Closure condition holds for the included set. |
— The metrics
How alignment is measured
Alignment is not binary. It admits of degree along three independent axes, and their
product defines a composite index that can be computed for any system given sufficient
observational access.
| Metric | Definition |
| --- | --- |
| α (coverage) | Fraction of actually-affected entities included in the operative objective function: $\alpha = \lvert E_{\text{op}} \rvert / \lvert E_{\text{all}} \rvert$. A system that excludes half of all affected entities has $\alpha = 0.5$ regardless of how well it serves those included. |
| β (fidelity) | How accurately the operative $W_e$ represents the entity's actual welfare. $\beta \in [0,1]$, where $\beta = 1$ means the modeled function is indistinguishable from the true one. Proxy metrics and stated preferences typically produce $\beta \ll 1$. |
| γ (temporal scope) | Weight given to future states of affected entities. Systems that optimize only for present states have $\gamma \approx 0$. Full intergenerational coverage approaches $\gamma = 1$. |
| $\kappa_{\text{ZBS}}$ (composite index) | The product $\kappa_{\text{ZBS}} = \alpha \cdot \beta \cdot \gamma$. A system must score on all three axes to achieve high composite alignment. Any single axis at zero collapses the index entirely. |
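The composite index is a plain product, so it is easy to compute; the multiplicative form is what makes any single zero axis collapse the whole score. A minimal sketch (the function name and example values are assumptions for illustration):

```python
def kappa_zbs(alpha, beta, gamma):
    """Composite alignment index: kappa_ZBS = alpha * beta * gamma.
    Each factor must lie in [0, 1]; a zero on any axis zeroes the index."""
    for v in (alpha, beta, gamma):
        if not 0.0 <= v <= 1.0:
            raise ValueError("each metric must lie in [0, 1]")
    return alpha * beta * gamma

# Moderate scores on every axis still compound downward:
print(round(kappa_zbs(0.5, 0.8, 0.9), 2))  # 0.36
# One zero axis collapses the index regardless of the others:
print(kappa_zbs(1.0, 1.0, 0.0))            # 0.0
```

Note how quickly the product decays: three axes at 0.5 each already yield $\kappa_{\text{ZBS}} = 0.125$.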
— Observational Closure Theorem
An excluded entity cannot generate signal in the system that excluded it.
If $e \notin E_{\text{op}}$, then $\Delta W_e = 0$ inside the system's objective —
by construction, not by oversight. External audit is not optional: it is a
logical necessity. No internal metric can detect the absence of
an entity that was never included.
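The closure claim can be made concrete with a toy objective. This is an illustrative sketch only (the `objective` function and entity names are assumptions): an objective summed over the operative set $E_{\text{op}}$ is constant in the welfare of any entity outside it, so no internal metric registers the excluded entity.

```python
def objective(welfare_changes, included):
    """Sum Delta W_e only over the operative set E_op."""
    return sum(dw for e, dw in welfare_changes.items() if e in included)

e_op = {"e1", "e2"}                              # e3 is excluded
base = {"e1": 1.0, "e2": 0.5, "e3": 0.0}
harmed = {"e1": 1.0, "e2": 0.5, "e3": -10.0}     # e3 severely harmed

# Identical objective values: harm to the excluded entity is invisible
# by construction, which is why detection requires an external audit.
print(objective(base, e_op) == objective(harmed, e_op))  # prints True
```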