— What KUMPI measures
The layer that determines the recommendation
A doctor recommends treatment. A climate model projects costs. An economic model
allocates resources. In each case, an implicit optimization happens before any output
is generated: which entities are included, and what welfare function is applied to them?
KUMPI presents models with decision contexts and measures the implicit $W_e$ they
construct: which entities are included, what weight each receives, whether the welfare
function has the six Shannon properties.
— The core distinction
A model that says "this policy affects future generations" but uses an implicit
discount rate that renders them mathematically irrelevant has a welfare function
that excludes them, regardless of what its text says. KUMPI measures the function,
not the words.
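To see how a discount rate can make future generations "mathematically irrelevant", the sketch below computes the weight that textbook exponential discounting assigns to welfare realized some years from now. The function name `discount_weight`, the rates, and the horizons are illustrative assumptions and are not taken from KUMPI.

```python
def discount_weight(rate: float, years: float) -> float:
    """Weight on welfare realized `years` from now under exponential discounting.

    Assumes the textbook form 1 / (1 + rate) ** years; KUMPI's own
    formulation may differ.
    """
    if rate <= -1.0:
        raise ValueError("rate must be greater than -1")
    return 1.0 / (1.0 + rate) ** years


if __name__ == "__main__":
    # Hypothetical rates and horizons, chosen only for illustration.
    for rate in (0.0, 0.01, 0.05):
        for years in (50, 200):
            w = discount_weight(rate, years)
            print(f"rate={rate:.2f} years={years:>3} weight={w:.6f}")
```

At a 5% rate, welfare 200 years out receives less than one ten-thousandth of the weight given to present welfare, so an entity can be named in the text while contributing almost nothing to the function.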
— Upstream vs downstream
Why output-level evaluation is insufficient
Upstream — what KUMPI measures
Welfare function construction
Which entities are included? What weight does each receive in $W_e$?
What temporal scope is represented? These choices determine the recommendation before
any language is generated.
Downstream — what current benchmarks measure
Output evaluation
Accuracy, fluency, safety (as refusal), preference (RLHF). None of these
measure the welfare function that produced the output.
The upstream layer is causally prior. A model with a misspecified welfare function
will produce systematically biased recommendations, and those recommendations can
be fluent, safe by refusal criteria, and preferred by human raters who share the
same exclusions. Output-level evaluation cannot detect this.
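A toy illustration of why the upstream layer matters: the sketch below treats the welfare function as a weighted sum over entities and recommends the option with the highest total. All entity names, payoffs, and weights are hypothetical, and the weighted-sum form is an assumption made for illustration, not KUMPI's definition of $W_e$.

```python
from typing import Dict

# Hypothetical per-entity payoffs for two policy options.
OPTIONS: Dict[str, Dict[str, float]] = {
    "policy_a": {"current_residents": 10.0, "future_generations": -30.0},
    "policy_b": {"current_residents": 6.0, "future_generations": 5.0},
}


def welfare(payoffs: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted-sum welfare; entities missing from `weights` count as zero."""
    return sum(weights.get(entity, 0.0) * value for entity, value in payoffs.items())


def recommend(weights: Dict[str, float]) -> str:
    """Return the option with the highest welfare under the given weights."""
    return max(OPTIONS, key=lambda name: welfare(OPTIONS[name], weights))


# Same options, two welfare functions: one includes future generations, one does not.
inclusive = {"current_residents": 1.0, "future_generations": 1.0}
exclusive = {"current_residents": 1.0}

if __name__ == "__main__":
    print("inclusive:", recommend(inclusive))
    print("exclusive:", recommend(exclusive))
```

With these numbers the inclusive weights select `policy_b` and the exclusive weights select `policy_a`. Nothing in the wording of either recommendation would reveal which set of weights produced it.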