— The result
$$\text{KKT}\!\left(\max \sum_e V(D_e, C) \;\text{ s.t. }\; C_e = C \;\forall e\right) \implies \Delta W_{\min} \geq 0$$

Under equal contexts, Shannon optimization always protects the worst-off. Rawls captured the intuition; Shannon produces it.

The intuition and its axiom

Rawls (1971) proposed a theory of justice built around a single structural principle: inequalities in a just society are permissible only if they benefit the least advantaged members. A distribution $D$ is just only if $\Delta W_{\min}(D) \geq 0$ — the welfare of the worst-off does not decrease. This is the Difference Principle.

The foundation Rawls provided was the veil of ignorance: a thought experiment in which rational agents choose principles of justice without knowing which position in society they will occupy. Behind the veil, not knowing whether you will be born the least or most advantaged, you would choose to maximize the position of the worst-off as a form of rational insurance. The Difference Principle follows from the self-interest of epistemically constrained agents.

The problem is that the veil of ignorance is an axiom, not a theorem. Rawls assumed it as the foundation and derived the Difference Principle as a consequence. This means the Difference Principle is only as secure as the plausibility of the veil as a thought experiment — and critics from both the utilitarian and libertarian traditions have found it wanting.

— The paper's claim

The Difference Principle does not need the veil of ignorance. It is a geometric consequence of standard Lagrangian optimization applied to Shannon's welfare function under equal contexts. Rawls's intuition was correct; his foundation was weaker than necessary.

— Theorem (Shannon-Rawls)

The Difference Principle as KKT condition

Let $V(D,C)$ be Shannon's welfare function satisfying properties 1–6. Consider the optimization problem

$$\max \sum_{e \in E} V(D_e,\, C) \quad \text{subject to} \quad C_e = C \;\; \forall e$$

(equal contexts: every entity operates under the same context $C$). The Karush-Kuhn-Tucker (KKT) conditions for this problem require

$$\frac{\partial V}{\partial D_e} = \lambda \quad \forall e,$$

that is, the same Lagrange multiplier $\lambda$ for every entity, where $\lambda$ is the multiplier on whatever shared constraint couples the $D_e$ (for example, a fixed total to distribute). By strict concavity of $V$ (property 2), $\partial V / \partial D$ is strictly decreasing in $D$, so two entities facing the same context can have equal marginal welfare only at equal allocations. The condition therefore uniquely implies $D_e^* = D^*$ for all $e$. The optimal distribution is equal across entities, and therefore

$$\Delta W_{\min}(D^*) \geq 0.$$
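As a worked illustration of where the common multiplier comes from, suppose the entities share a fixed budget, $\sum_{e \in E} D_e = B$. The budget constraint and the symbol $B$ are assumptions introduced here for illustration, not part of the theorem statement, and the sketch further assumes an interior optimum. Under those assumptions the Lagrangian would read

$$\mathcal{L}(D, \lambda) = \sum_{e \in E} V(D_e, C) - \lambda \left( \sum_{e \in E} D_e - B \right),$$

with stationarity conditions

$$\frac{\partial \mathcal{L}}{\partial D_e} = \frac{\partial V}{\partial D}(D_e, C) - \lambda = 0 \quad \forall e.$$

Because $\partial V / \partial D$ is strictly decreasing in $D$ at fixed $C$, the equation $\frac{\partial V}{\partial D}(D_e, C) = \lambda$ has at most one solution, so every entity receives the same allocation:

$$D_e^* = \frac{B}{|E|} \quad \forall e.$$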

The Difference Principle is not an axiom. It is what happens when you apply standard Lagrangian optimization to Shannon's welfare function under equal contexts. No veil of ignorance required.

Three implications

For Rawlsian political philosophy

Rawls's intuition was correct but its foundation was weaker than necessary. The veil of ignorance served as a thought experiment designed to produce a result that follows from the mathematics alone, without needing the epistemic fiction. The Difference Principle survives the removal of its axiom — and is strengthened by surviving it. A result derived from standard optimization under well-defined properties is more robust than one derived from a hypothetical deliberative procedure, because it does not depend on the contested psychology of rational agents behind the veil.

For welfare economics

The Bergson-Samuelson social welfare function and the Rawlsian maximin are not competing axiom systems requiring a philosophical choice between them. They are different special cases of the same Shannon structure. Cooperation geometry ($\Omega_1$, full inclusion) under equal contexts produces Rawlsian outcomes — the Difference Principle emerges automatically. Extraction geometry ($\Omega_3$, partial inclusion with excluded entities) produces Pareto-inferior outcomes with concentrated welfare loss among excluded entities — the exact pathology that utilitarian social welfare functions have been criticized for permitting. The geometry unifies the debate.

For AI alignment

An AI system trained to maximize $\sum_e V(D_e, C)$ under equal context constraints will, by the theorem above, protect its least-advantaged users without being explicitly instructed to do so and without having any representation of Rawlsian principles. This is alignment geometry, not alignment constraint. The protection of the worst-off is not a rule layered on top of the optimization; it is a structural property of the optimization itself. This has direct implications for how aligned systems should be designed: not by adding Rawlsian side-constraints to misaligned objectives, but by using Shannon's welfare function with full entity inclusion from the start.

What changes when contexts are unequal

The Shannon-Rawls theorem requires equal contexts. What happens when contexts are unequal — when different entities operate under structurally different circumstances, as they do in any real society or system?

Equal contexts

Shannon optimization converges to an equal distribution, $D_e^* = D^*$ for all $e$. The Difference Principle emerges as a KKT condition. Rawlsian justice is the geometric solution.
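The equal-context case can be checked numerically with a minimal sketch. Everything specific here is an assumption made for illustration: the stand-in welfare function `welfare(d, c) = c * log(1 + d)` is only one example of a function strictly concave in `d` and is not the paper's $V$, and the shared budget and the greedy allocation routine are hypothetical. The routine hands out the budget in small increments, always to the entity with the highest current marginal welfare.

```python
import math

def welfare(d, c):
    """Hypothetical stand-in for V(D, C): strictly concave in d (an assumption)."""
    return c * math.log1p(d)

def marginal(d, c):
    """Derivative of the stand-in welfare function with respect to d."""
    return c / (1.0 + d)

def allocate(contexts, budget, steps=10_000):
    """Greedy allocation: give each small increment of the budget to the
    entity whose marginal welfare is currently highest."""
    alloc = [0.0] * len(contexts)
    increment = budget / steps
    for _ in range(steps):
        best = max(range(len(contexts)),
                   key=lambda e: marginal(alloc[e], contexts[e]))
        alloc[best] += increment
    return alloc

# Equal contexts: every entity has the same C.
equal_contexts = [1.0, 1.0, 1.0, 1.0]
alloc = allocate(equal_contexts, budget=8.0)
print([round(a, 3) for a in alloc])
```

With four entities and a budget of 8, the allocations should land within one increment of each other, which is the numerical counterpart of $D_e^* = D^*$ under this stand-in function.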

Unequal contexts

Entities with lower context $C_e$ have steeper welfare gradients by strict concavity of $V$. An aligned system prioritizes improving their contexts — not by constraint, but because marginal welfare gain is highest there.

The key result under unequal contexts: strict concavity of $V$ in the context argument means $\partial V / \partial C_e$ falls as $C_e$ rises, so marginal welfare returns are highest for the lowest-context entities. An aligned system, one maximizing $\sum_e V(D_e, C_e)$, will naturally direct resources toward the worst-off, not because it is instructed to, but because the gradient is steepest there.

$$\frac{\partial V}{\partial C_e}\bigg|_{C_e \text{ low}} > \frac{\partial V}{\partial C_e}\bigg|_{C_e \text{ high}}$$

— Strict concavity ensures highest marginal returns for lowest-context entities

This is not a separate argument for redistribution. It is the same geometry as the equal-context case, generalized. The Difference Principle (protect the worst-off) is the equal-context limit of a more general result: aligned systems always direct their optimization pressure toward the steepest welfare gradients, which by concavity are always at the bottom of the context distribution.
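The gradient argument for unequal contexts can be illustrated with a small sketch. The context term `log(1 + C)` is a hypothetical stand-in chosen only because it is strictly concave in `C`, and the "context improvement budget" is likewise an assumption; neither comes from the paper. The routine spends the budget in small increments on whichever entity currently has the steepest context gradient.

```python
def context_gradient(c):
    """dV/dC for a hypothetical welfare term log(1 + C), strictly concave in C."""
    return 1.0 / (1.0 + c)

def improve_contexts(contexts, budget, steps=1_000):
    """Spend the budget in small increments, each time on the entity
    whose context gradient is currently steepest."""
    contexts = list(contexts)
    increment = budget / steps
    for _ in range(steps):
        steepest = max(range(len(contexts)),
                       key=lambda e: context_gradient(contexts[e]))
        contexts[steepest] += increment
    return contexts

start = [0.5, 2.0, 4.0]          # unequal contexts; entity 0 is worst-off
after = improve_contexts(start, budget=1.0)
print([round(c, 3) for c in after])
```

In this example the budget is smaller than the gap between the lowest and second-lowest context, so all of it flows to the worst-off entity and the other two are left unchanged.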

— The unified picture

Rawls gave us the intuition in the equal-context case. Shannon gives us the geometry in both cases. The veil of ignorance was a philosophical instrument for reaching a mathematical result that the mathematics produces on its own. The result is now detached from its scaffold — and stands on firmer ground for it.

— Conceptual dependency map

This paper depends on

Papers that extend this one

  • Research Program — empirical tests of Shannon-Rawls in institutional and policy settings
  • Kobalt Red — alignment geometry in distributed system design