Generic Decision Model

Every decision has the same abstract shape: a decision maker with a handful of available actions, a system that might be in one of several possible states, and an outcome that depends on both. The generic decision model strips this to its skeleton: one player, a finite set of actions, a finite set of states, and a number for every (action, state) pair telling the player what they get.

What changes between problems is not the shape but the information climate — how much does the player know about which state will materialize? The answer splits the world into three sharply different cases.

Three classes of decision situations

A decision under certainty is one where every rule, action, and consequence is fully known — including which state the system is in. The decision maker has complete information about the situation they’re choosing in.

Chess is a clean illustration. At any board position you know the full state, the legal moves available to you, and the exact resulting position for any move you make — there are no hidden variables, no dice rolls, no surprise from the system. The certainty breaks the moment something unknown enters the picture: simultaneous play (where both players commit before seeing each other’s move), opponents who occasionally throw in a random move, or incomplete-information variants where pieces are hidden.

A decision under risk is one where the state is not known in advance, but a probability distribution over the possible states is known. The decision maker knows how likely each state is, even though they cannot pin down which one will actually occur.

Insurance pricing and casino games live here — the actuary doesn’t know which policyholders will file claims this year, and the gambler doesn’t know which face the die will land on, but the probabilities are calibrated well enough to compute expected values and act on them.

A decision under uncertainty is one where the state is not known in advance and no probability distribution over the states is available. The decision maker faces a set of possibilities they can enumerate but cannot weight.

Launching a product into a brand-new market or planning around a one-off event are classic uncertainty problems — you can list the scenarios that might unfold, but there’s no track record to attach probabilities to.

Notice the gradient. Certainty fixes exactly which state occurs; risk fixes the full probability distribution over the states; uncertainty fixes only the list of states that are possible, with no information about how likely each is. The three climates call for genuinely different strategies, which is the entire point of the rest of this chapter.

The model

The generic decision model consists of:

  • one decision maker (or player) $S$, choosing against a passive system;
  • $m$ alternative actions (also called strategies) $A_1, \dots, A_m$ available to the player;
  • $n$ possible initial states $s_1, \dots, s_n$ the system may be in;
  • a payoff (or rating) $N_{ij} \in \mathbb{R}$ for each pair: the value the player receives when they take action $A_i$ in state $s_j$;
  • the payoff matrix $N = (N_{ij})_{1 \le i \le m,\ 1 \le j \le n}$, collecting every action-state combination into one $m \times n$ table.

There’s only one player here — the system is treated as a passive thing, not as a second strategic agent. That choice is what makes this a decision model rather than a game; multi-player extensions, where the “system” is itself a strategic chooser, follow in later sections.

The payoff $N_{ij}$ is the reward the player walks away with for a given (action, state) pair. It can be anything that admits a real-valued scale: profit in euros, score in a game, utility in some abstract unit, or a negative cost. Bigger is better.

Read the payoff matrix as a lookup table: rows index actions, columns index states, and the entry $N_{ij}$ is what the player gets in the cell where row $i$ meets column $j$.

Strategy under certainty

When the state is known, the decision becomes mechanical. Say the realized state is $s_j$. That pins the player to the $j$-th column of the payoff matrix, and the only thing left is to choose which row. The best row is the one with the largest entry in that column:

$$N_{\hat{i}j} = \max_{i} N_{ij}$$

The optimal action is the index $\hat{i} \in \{1, \dots, m\}$ that attains that maximum. There’s nothing subtle here: perfect information collapses the decision to plain lookup.
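The column lookup can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the 3x3 matrix `N` holds made-up values, the function name `best_action_under_certainty` is hypothetical, and indices are 0-based, so column `j = 1` stands for state $s_2$.

```python
# Hypothetical payoff matrix (made-up values): rows are actions, columns are states.
N = [
    [3, 7, 1],
    [4, 4, 4],
    [9, 0, 2],
]

def best_action_under_certainty(payoffs, j):
    """Return (0-based row index, payoff) of the largest entry in column j."""
    column = [row[j] for row in payoffs]
    # max() returns the first maximal index, so the earliest row wins ties.
    i_hat = max(range(len(column)), key=column.__getitem__)
    return i_hat, column[i_hat]

i_hat, value = best_action_under_certainty(N, 1)  # the known state is s_2
print(f"action {i_hat + 1} with payoff {value}")
```

Only one column is ever read, which is why no attitude toward risk enters the choice.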

The strategies that follow in this chapter all live downstream of the case where that’s no longer true. Once the state is unknown, the column-picking trick fails (which column do you even read?), and every strategy from here on has to wrestle with the missing information in its own way.

Strategies under risk

When the state is no longer known, the column-picking trick fails because there is no single column to read. Each of the three strategies below sidesteps that by collapsing each row of the payoff matrix to a single number, then picking the row with the best resulting number. None of them needs probabilities; they work off the payoff matrix alone, which is why the same rules apply equally well when no distribution over states is available either.

Throughout this section, $\hat{i} \in \{1, \dots, m\}$ denotes the index of the chosen action; the hat marks “this is what the decision maker picks.” The state index carries no hat, because the player doesn’t choose the state; the symbol $j_{\hat{i}}$ just names the state that the inner $\min$ or $\max$ singles out for the chosen action $\hat{i}$. The expression $N_{\hat{i}j_{\hat{i}}}$ is then the payoff matrix entry at row $\hat{i}$, column $j_{\hat{i}}$: the payoff value the chosen strategy is built around.

Caution (max-min payoff)

The caution strategy (also called max-min payoff) picks an action whose worst possible payoff is as good as possible:

$$N_{\hat{i}j_{\hat{i}}} = \max_i \min_j N_{ij}$$

For each candidate action $A_i$, take the smallest entry across its row: the payoff if the worst-case state for that action materializes. Then pick the action whose worst case is highest.

Read it as a hedge against bad luck. You assume the system is out to get you, look at the floor each action guarantees, and pick the action with the highest floor. Whichever state actually materializes, the chosen action’s payoff cannot drop below that floor. That’s also why the strategy is nicknamed the payoff-maximizer — it maximizes the guaranteed floor, not the best-case ceiling.
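A minimal Python sketch of the rule, assuming a made-up 3x3 payoff matrix and a hypothetical function name `maximin`; indices are 0-based, so row index 1 is action $A_2$.

```python
# Hypothetical payoff matrix (made-up values): rows are actions, columns are states.
N = [
    [3, 7, 1],
    [4, 4, 4],
    [9, 0, 2],
]

def maximin(payoffs):
    """Caution rule: return (0-based row index, floor) of the row with the highest minimum."""
    floors = [min(row) for row in payoffs]  # worst case of each action
    # max() returns the first maximal index, so the earliest row wins ties.
    i_hat = max(range(len(floors)), key=floors.__getitem__)
    return i_hat, floors[i_hat]

i_hat, floor = maximin(N)
# Whichever state occurs, the chosen row never pays less than its floor.
assert all(payoff >= floor for payoff in N[i_hat])
print(f"action {i_hat + 1}, guaranteed floor {floor}")
```

For this matrix the row floors are 1, 4 and 0, so the sketch settles on the second action even though the third action contains the largest single payoff.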

Full risk (max-max payoff)

The full-risk strategy (also called max-max payoff) picks an action whose best possible payoff is as good as possible:

$$N_{\hat{i}j_{\hat{i}}} = \max_i \max_j N_{ij}$$

For each candidate action $A_i$, take the largest entry across its row: the payoff if the most favorable state for that action materializes. Then pick the action whose best case is highest.

This is the gambler’s pick: assume luck is on your side, look at the ceiling of each action, and chase the action with the highest ceiling. There is no protection if the state turns out otherwise — the same action’s worst case might be terrible. The strategy bets everything on the upside.
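The lack of protection is easy to see in a small Python sketch. The matrix values are made up for illustration and the function name `maximax` is hypothetical; the last lines report the worst case of the action that the rule selects.

```python
# Hypothetical payoff matrix (made-up values): rows are actions, columns are states.
N = [
    [3, 7, 1],
    [4, 4, 4],
    [9, 0, 2],
]

def maximax(payoffs):
    """Full-risk rule: return (0-based row index, ceiling) of the row with the highest maximum."""
    ceilings = [max(row) for row in payoffs]  # best case of each action
    i_hat = max(range(len(ceilings)), key=ceilings.__getitem__)
    return i_hat, ceilings[i_hat]

i_hat, ceiling = maximax(N)
worst_case = min(N[i_hat])  # what the same action pays if luck runs out
print(f"action {i_hat + 1}: ceiling {ceiling}, worst case {worst_case}")
```

Here the rule chases the 9 in the third row, whose worst case is the lowest entry of the whole matrix.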

Alternative-caution (min-max regret)

The third strategy reframes the problem in terms of missed opportunity. The relevant question becomes: given the state that actually materializes, how much payoff did I give up by not picking the best action for that state? Minimizing the worst-case version of that gap gives a qualitatively different decision rule.

Two pieces of notation set up the formula. First, for each state $s_j$, write

$$N_j = \max_i N_{ij}$$

for the largest entry of column $j$: the score the player would walk away with in state $s_j$ if they knew the state in advance and picked the best action for it.

The regret (also called opportunity loss or risk) of action $A_i$ in state $s_j$ is the gap between the state’s best achievable payoff $N_j$ and what $A_i$ actually scores in that state:

$$R_{ij} = N_j - N_{ij}$$

It measures how much the player would kick themselves afterward for picking $A_i$ when the state turned out to be $s_j$, that is, how far from the best-possible-for-that-state outcome they ended up. By construction $R_{ij} \ge 0$, and $R_{ij} = 0$ exactly when $A_i$ happens to be the best action for state $s_j$.

Collected for every pair, these entries form the regret matrix (or opportunity-loss matrix or risk matrix) $R = (R_{ij})$: the payoff matrix with each column re-zeroed at its column maximum.

The alternative-caution strategy (also called min-max risk or min-max regret) picks an action whose worst-case regret across states is as small as possible:

$$R_{\hat{i}j_{\hat{i}}} = \min_i \max_j R_{ij}$$

For each candidate action $A_i$, take the largest entry across its row of the regret matrix: the most the player could regret choosing it. Then pick the action whose worst-case regret is smallest.
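The two steps, building the regret matrix and then applying min-max to it, might be sketched in Python as follows. The matrix values are made up for illustration, and the names `regret_matrix` and `minimax_regret` are hypothetical.

```python
# Hypothetical payoff matrix (made-up values): rows are actions, columns are states.
N = [
    [3, 7, 1],
    [4, 4, 4],
    [9, 0, 2],
]

def regret_matrix(payoffs):
    """Return R with R[i][j] = (maximum of column j) - payoffs[i][j]."""
    column_max = [max(column) for column in zip(*payoffs)]
    return [[best - value for best, value in zip(column_max, row)] for row in payoffs]

def minimax_regret(payoffs):
    """Return (0-based row index, regret) of the row whose largest regret is smallest."""
    worst = [max(row) for row in regret_matrix(payoffs)]  # worst regret per action
    i_hat = min(range(len(worst)), key=worst.__getitem__)  # earliest row wins ties
    return i_hat, worst[i_hat]

R = regret_matrix(N)
i_hat, regret = minimax_regret(N)
print(R)
print(f"action {i_hat + 1}, worst-case regret {regret}")
```

Every column of `R` contains at least one zero, because the column maximum is attained by some action.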

Read this as caution about missing out rather than caution about losing. Where the caution strategy maximizes the worst absolute payoff, alternative-caution (also called the risk-minimizer) minimizes the worst FOMO (fear of missing out: the gap between what you actually got and what was achievable in that state). The two strategies can pick different actions because regret pivots the comparison against each column’s maximum rather than against absolute payoffs.

To see how the three strategies pull in different directions, take a $2 \times 2$ case with one dramatic row and one safe row.

Suppose $m = n = 2$ with payoff entries $N_{11} = 0$, $N_{12} = 100$, and $N_{21} = N_{22} = 1$:

$$N = \begin{pmatrix} 0 & 100 \\ 1 & 1 \end{pmatrix}.$$

Action $1$ is the dramatic row: nothing in state $s_1$ but the jackpot in state $s_2$. Action $2$ is the safe row: a guaranteed payoff of $1$ either way.

The column maxima are $N_1 = 1$ (best payoff achievable in $s_1$) and $N_2 = 100$ (best payoff achievable in $s_2$). Subtracting each $N_{ij}$ from its column’s maximum gives the regret matrix

$$R = \begin{pmatrix} 1 & 0 \\ 0 & 99 \end{pmatrix}.$$

The only large entry is $R_{22} = 99$: picking the safe action in the jackpot state walks past the $100$.

Applying each strategy:

  • Caution (max-min payoff): row $1$’s worst case is $0$, row $2$’s worst case is $1$. The maximum of those is $1$, attained by action $2$.
  • Full risk (max-max payoff): row $1$’s best case is $100$, row $2$’s best case is $1$. The maximum is $100$, attained by action $1$.
  • Alternative-caution (min-max regret): row $1$’s worst regret is $1$, row $2$’s worst regret is $99$. The minimum is $1$, attained by action $1$.

The surprise is that full risk and alternative-caution agree on action $1$, despite being motivated by opposite urges. Full risk reaches for the $100$ in state $s_2$; alternative-caution flees from the regret of missing it. Both routes lead to row $1$. Caution is the only one that resists: row $1$’s floor is $0$, and the guaranteed $1$ of row $2$ beats it.

The three strategies above sit at extremes — pure worst-case, pure best-case, or pure worst-case regret. The next two soften that by either blending the extremes or by injecting an explicit guess about how likely each state is.

Pessimism–optimism (Hurwicz)

Rather than committing entirely to the worst case or the best case, the Hurwicz strategy weights the two. For each action $A_i$, write the worst and best entries of its row as

$$m_i = \min_j N_{ij}, \qquad M_i = \max_j N_{ij},$$

and fix a pessimism weight $\alpha \in [0, 1]$ that decides how much importance to give the worst case relative to the best.

The pessimism–optimism strategy (also called the Hurwicz strategy) picks the action $\hat{i}$ that maximizes the weighted blend of its worst and best payoffs:

$$\alpha \cdot m_{\hat{i}} + (1 - \alpha) \cdot M_{\hat{i}} = \max_{i} \big[\alpha \cdot m_i + (1 - \alpha) \cdot M_i\big]$$

The two extreme settings of $\alpha$ recover earlier strategies: $\alpha = 1$ puts all weight on the worst case and gives caution, while $\alpha = 0$ puts all weight on the best case and gives full risk. Any $\alpha$ strictly between $0$ and $1$ produces a strategy that mixes the two.

The Hurwicz strategy gives the modeler a single dial to set their disposition toward risk. Picking $\alpha = 0.5$ treats best and worst case as equally informative; pushing $\alpha$ toward $1$ pulls the decision toward caution; pushing toward $0$ pulls it toward the gambler. There is no objectively correct $\alpha$: it encodes the decision maker’s temperament rather than a property of the problem.
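To see the dial at work, the following Python sketch applies the rule to the $2 \times 2$ payoff matrix from the earlier example for a few values of $\alpha$. The function name `hurwicz` and the particular $\alpha$ values are illustrative assumptions; indices are 0-based.

```python
# Payoff matrix of the 2x2 example: row 0 is the dramatic action, row 1 the safe one.
N = [
    [0, 100],
    [1, 1],
]

def hurwicz(payoffs, alpha):
    """Return (0-based row index, score) maximizing alpha*min(row) + (1 - alpha)*max(row)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    scores = [alpha * min(row) + (1 - alpha) * max(row) for row in payoffs]
    i_hat = max(range(len(scores)), key=scores.__getitem__)  # earliest row wins ties
    return i_hat, scores[i_hat]

for alpha in (0.0, 0.5, 0.995, 1.0):
    i_hat, score = hurwicz(N, alpha)
    print(f"alpha={alpha}: action {i_hat + 1}, score {score:.3f}")
```

In this example the dramatic row scores $100(1 - \alpha)$ and the safe row scores $1$, so the choice only flips to the safe action once $\alpha$ exceeds $0.99$. A single large payoff can dominate the blend for almost every setting of the dial.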

Principle of insufficient reason (Laplace)

The Laplace strategy goes a different way. Instead of working only with worst and best cases, it converts the uncertainty problem into a risk problem by assuming the missing distribution: if we have no information to weight one state over another, treat all states as equally likely and maximize expected payoff under that uniform distribution.

The principle of insufficient reason (also called the Laplace strategy) picks the action whose row average is highest:

$$E_{\hat{i}} = \max_{i} \frac{1}{n} \sum_{j=1}^{n} N_{ij}$$

For each action $A_i$, the inner expression $E_i = \frac{1}{n} \sum_j N_{ij}$ is the unweighted average of its payoffs across all $n$ states, or equivalently its expected payoff when every state is treated as equally likely (probability $1/n$ each).

The rationale is symmetry: if you genuinely have no reason to weight one state above another, the most neutral default is to weight them all equally. That assumption converts the uncertainty problem into a risk problem under a uniform prior and lets you fall back on plain expected-value maximization — which under uniform weights is just the row average.
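As a rough Python sketch, the row-average rule applied to the $2 \times 2$ payoff matrix from the earlier example looks like this. The function name `laplace` is a hypothetical choice, and indices are 0-based.

```python
# Payoff matrix of the 2x2 example: row 0 is the dramatic action, row 1 the safe one.
N = [
    [0, 100],
    [1, 1],
]

def laplace(payoffs):
    """Return (0-based row index, average) of the row with the highest mean payoff."""
    # Uniform weights 1/n over the n states reduce expected payoff to the row mean.
    averages = [sum(row) / len(row) for row in payoffs]
    i_hat = max(range(len(averages)), key=averages.__getitem__)  # earliest row wins ties
    return i_hat, averages[i_hat]

i_hat, average = laplace(N)
print(f"action {i_hat + 1}, average payoff {average}")
```

The row averages are 50 and 1, so under the uniform assumption the dramatic action wins here, matching full risk and alternative-caution on this matrix.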

Quiz