Two-player Zero-sum Game

The generic decision model pitted a single decision maker against a passive system — the system might end up in any of several states, but it never chose its state to hurt the player. Many real situations do not respect that asymmetry. When two firms compete for a market, two armies pick troop dispositions, or two players sit down across a chessboard, the “other side” is not weather but a second decision maker actively choosing against you.

The two-player zero-sum game is the first model that takes that seriously. The passive system is replaced by a second rational player whose interests are directly opposed to the first’s: whatever one wins, the other loses by exactly the same amount.

The model

A two-player zero-sum game consists of:

  • two players S1S_1 and S2S_2;
  • mm actions A1,,AmA_1, \dots, A_m available to S1S_1, and nn actions B1,,BnB_1, \dots, B_n available to S2S_2 (with mm and nn not necessarily equal);
  • a payoff matrix A=(aij)Rm×nA = (a_{ij}) \in \mathbb{R}^{m \times n}, where aija_{ij} is the amount S2S_2 pays S1S_1 when S1S_1 chooses AiA_i and S2S_2 chooses BjB_j.

A play of the game has each player commit to one of their actions; the outcome is the matrix entry at the intersection of the two choices. Both players see the entire matrix before deciding — there is no hidden information.

The zero-sum qualifier names the conservation law: S1S_1‘s win plus S2S_2‘s loss is zero on every outcome, with no third party absorbing or contributing value. S1S_1 is the win player (or row chooser), the maximizer who tries to make aija_{ij} as large as possible. S2S_2 is the loss player (or column chooser), the minimizer who tries to make aija_{ij} as small as possible. A negative entry aij<0a_{ij} < 0 simply means S1S_1 pays S2S_2 that amount — entirely favorable to the loss player, who is therefore happy with negative numbers in the matrix.

The notation shifts slightly from the generic decision model. The payoff matrix is now AA with entries aija_{ij} rather than NN with entries NijN_{ij}, and the column index jj no longer ranges over passive states — it ranges over S2S_2‘s actions. Rows still index S1S_1‘s actions; the (row, column) cell still records the value the row player receives.

Sequential play

If the players act one after the other and each sees the other’s choice before deciding, whoever moves second has the simpler problem: they look at the row or column pinned down by the first player’s commitment and pick the best entry on it.

  • If S1S_1 moves first by choosing row AiA_i, then S2S_2 sees that row and picks the column whose entry is smallest, because S2S_2 minimizes:

    S1:Ai        S2:Bjiwithai,ji=minjaij.S_1 : A_i \;\;\Longrightarrow\;\; S_2 : B_{j_i} \quad \text{with} \quad a_{i, j_i} = \min_{j} a_{ij}.

    Reads as: ”S1S_1 commits to AiA_i, so S2S_2 replies with BjiB_{j_i}, where jij_i is the column at which row ii bottoms out.” The subscript jij_i depends on ii because the response column changes with whichever row the first mover picked.

  • If S2S_2 moves first by choosing column BjB_j, then S1S_1 sees that column and picks the row whose entry is largest, because S1S_1 maximizes:

    S2:Bj        S1:Aijwithaij,j=maxiaij.S_2 : B_j \;\;\Longrightarrow\;\; S_1 : A_{i_j} \quad \text{with} \quad a_{i_j, j} = \max_{i} a_{ij}.

    Reads as: ”S2S_2 commits to BjB_j, so S1S_1 replies with AijA_{i_j}, where iji_j is the row at which column jj peaks.” Again the subscript iji_j depends on jj.

If several columns tie for the row-minimum, jij_i is not pinned down — any of the tied columns satisfies the equation. The same goes for iji_j when several rows tie for the column-maximum. The next section addresses this by letting the response be a set of indices instead of a single one.

Best response functions

The optimal-response problem is symmetric enough to formalize. Given an opponent’s commitment, each player has a set of actions that achieve their best possible payoff — usually just one action, but two or more actions can tie for the optimum. To accommodate ties, the response is recorded as a set of indices rather than a single index.

In a two-player zero-sum game, each player’s best response function (also called the player’s response mapping) maps every opponent action to the set of own actions that achieve the player’s optimal payoff against it.

  • S1S_1’s best response function:

    r1:{1,,n}P({1,,m}),j{i:aij=maxiai,j}.r_1 : \{1, \dots, n\} \to \mathcal{P}(\{1, \dots, m\}), \qquad j \mapsto \{\, i : a_{ij} = \max_{i'} a_{i',j} \,\}.

    Given S2S_2‘s column jj, the set r1(j)r_1(j) collects every row that hits the column’s maximum entry.

  • S2S_2’s best response function:

    r2:{1,,m}P({1,,n}),i{j:aij=minjai,j}.r_2 : \{1, \dots, m\} \to \mathcal{P}(\{1, \dots, n\}), \qquad i \mapsto \{\, j : a_{ij} = \min_{j'} a_{i,j'} \,\}.

    Given S1S_1‘s row ii, the set r2(i)r_2(i) collects every column that hits the row’s minimum entry.

The codomain P(X)\mathcal{P}(X) — the power set of XX, also written 2X2^X, meaning the set of all subsets of XX — is what lets the response be a set of optimal actions: when several actions tie for the best payoff against the opponent’s commitment, r1r_1 or r2r_2 returns all of them at once.

Simultaneous play

The setup changes when the players act at the same time, neither seeing the other’s choice in advance. The asymmetry of sequential play is gone: S2S_2 can no longer wait for S1S_1‘s row before picking its column, and vice versa. Each player has to predict the other’s move without observing it.

The standard modeling assumption is that both players are rational, both know the full matrix, and both play defensively: each assumes the opponent is as smart and self-interested as themselves, and will respond in whatever way hurts them most. Each player is then choosing in the spirit “pick the best option for me, assuming the opponent picks the worst option for me” — the same caution strategy the generic decision model used against a passive system, only now the “worst case” is produced by a thinking opponent rather than blind chance.

S1S_1’s reasoning. “If I commit to row AiA_i, the worst case is that S2S_2 best-responds with column jir2(i)j_i \in r_2(i) — the column-minimum of row ii, exactly the response S2S_2 would pick as the second mover. To make my worst case as good as possible, I should pick the row whose worst case is largest.” That gives S1S_1‘s max-min rule:

ai^,ji^=maximinjaij.a_{\hat{i}, j_{\hat{i}}} = \max_i \min_j a_{ij}.

Scan each row’s smallest entry (the inner minjaij\min_j a_{ij} for each fixed ii), then pick the row whose smallest entry is the largest (the outer maxi\max_i). The chosen row index is i^\hat{i}, and the column where row i^\hat{i} bottoms out is ji^j_{\hat{i}}. The value ai^,ji^a_{\hat{i}, j_{\hat{i}}} is S1S_1‘s guaranteed payoff — whatever S2S_2 actually does, S1S_1‘s payoff cannot drop below this floor.

The hat distinguishes the actually committed row i^\hat{i} from the abstract row index ii used in the max. The subscripted column ji^j_{\hat{i}} is the column S1S_1 expects S2S_2 to play — it lives inside S1S_1‘s prediction. The game being simultaneous, S2S_2 never sees i^\hat{i} and so does not literally choose ji^j_{\hat{i}} in response; ji^j_{\hat{i}} is part of S1S_1‘s reasoning, not S2S_2‘s actual move.

S2S_2’s reasoning. Symmetrically: “If I commit to column BjB_j, the worst case is that S1S_1 best-responds with row ijr1(j)i_j \in r_1(j) — the column-maximum of column jj. To make my worst case as good as possible (the smallest payoff to S1S_1), I should pick the column whose worst case is smallest.” That gives S2S_2‘s min-max rule:

aij^,j^=minjmaxiaij.a_{i_{\hat{j}}, \hat{j}} = \min_j \max_i a_{ij}.

Scan each column’s largest entry, then pick the column whose largest entry is the smallest. The chosen column index is j^\hat{j}, and the row where column j^\hat{j} peaks is ij^i_{\hat{j}} — the row S2S_2 expects S1S_1 to play, again a piece of S2S_2‘s reasoning rather than S1S_1‘s actual move. The value aij^,j^a_{i_{\hat{j}}, \hat{j}} is S2S_2‘s guaranteed loss ceiling — whatever S1S_1 does, S2S_2‘s payment cannot rise above it.

The maximin–minimax inequality

The actually-played outcome of the simultaneous game is the entry ai^,j^a_{\hat{i}, \hat{j}}S1S_1 plays its cautious row i^\hat{i} and S2S_2 plays its cautious column j^\hat{j}. The hypothetical ji^j_{\hat{i}} and ij^i_{\hat{j}} above never get played; they only existed inside each player’s prediction of the other.

These four indices link together by a chain of inequalities. Starting from S1S_1‘s guaranteed payoff:

ai^,ji^  =  minjai^,j    ai^,j^    maxiai,j^  =  aij^,j^.a_{\hat{i}, j_{\hat{i}}} \;=\; \min_j a_{\hat{i}, j} \;\le\; a_{\hat{i}, \hat{j}} \;\le\; \max_i a_{i, \hat{j}} \;=\; a_{i_{\hat{j}}, \hat{j}}.

Reading left to right:

  • The first == holds by definition of ji^j_{\hat{i}}: it’s the column where row i^\hat{i} achieves its minimum, so ai^,ji^a_{\hat{i}, j_{\hat{i}}} is that row-minimum.
  • The first \le is “a row’s minimum is at most any other entry on that row” — the row-minimum of row i^\hat{i} is at most ai^,j^a_{\hat{i}, \hat{j}}, which is just one of the entries on that row.
  • The second \le is “any entry of a column is at most that column’s maximum” — the entry ai^,j^a_{\hat{i}, \hat{j}} sits in column j^\hat{j} and its column-maximum is maxiai,j^\max_i a_{i, \hat{j}}.
  • The last == holds by definition of ij^i_{\hat{j}}: it’s the row where column j^\hat{j} achieves its maximum.

Dropping the two equalities that just rename row-minimum and column-maximum, the chain shrinks to a clean three-term inequality between the matrix entries themselves:

ai^,ji^    ai^,j^    aij^,j^.a_{\hat{i}, j_{\hat{i}}} \;\le\; a_{\hat{i}, \hat{j}} \;\le\; a_{i_{\hat{j}}, \hat{j}}.

Now substitute the max-min and min-max definitions back into the two outer terms — ai^,ji^=maximinjaija_{\hat{i}, j_{\hat{i}}} = \max_i \min_j a_{ij} and aij^,j^=minjmaxiaija_{i_{\hat{j}}, \hat{j}} = \min_j \max_i a_{ij} — and the chain takes its named form, the maximin–minimax inequality:

maximinjaij    ai^,j^    minjmaxiaij.\max_i \min_j a_{ij} \;\le\; a_{\hat{i}, \hat{j}} \;\le\; \min_j \max_i a_{ij}.

S1S_1‘s guaranteed payoff never exceeds S2S_2‘s guaranteed loss ceiling, and the actual outcome of the simultaneous game sits between them.

Strict inequality vs equality. The two ends of the chain tell each player a different story about whether their cautious commitment is stable:

  • If any inequality is strict, then at least one player can do better than their floor (or strictly better than their ceiling), so at least one player has an incentive to deviate from their cautious choice. The configuration is not stable.
  • If both inequalities collapse to equalities, neither player can improve unilaterally. The cautious commitments are mutual best responses, and the game has settled at a fixed point. That equality case has its own name.

Saddle points

When the maximin and minimax values coincide,

maximinjaij  =  ai^,j^  =  minjmaxiaij,\max_i \min_j a_{ij} \;=\; a_{\hat{i}, \hat{j}} \;=\; \min_j \max_i a_{ij},

the entry ai^,j^a_{\hat{i}, \hat{j}} sits simultaneously at the minimum of its row and the maximum of its column.

A saddle point (also called an equilibrium) of a two-player zero-sum game is a pair (i^,j^)(\hat{i}, \hat{j}) whose payoff entry ai^,j^a_{\hat{i}, \hat{j}} is simultaneously the minimum of its row and the maximum of its column:

ai,j^    ai^,j^    ai^,jfor all i{1,,m},j{1,,n}.a_{i, \hat{j}} \;\le\; a_{\hat{i}, \hat{j}} \;\le\; a_{\hat{i}, j} \qquad \text{for all } i \in \{1, \dots, m\},\, j \in \{1, \dots, n\}.

Equivalently, i^r1(j^)\hat{i} \in r_1(\hat{j}) and j^r2(i^)\hat{j} \in r_2(\hat{i}) — the cautious commitments are each other’s best responses.

The name comes from the same horse-saddle picture as the calculus saddle point: the matrix entry ai^,j^a_{\hat{i}, \hat{j}} is a low point as you walk sideways along its row and a high point as you walk up and down its column — two opposing curvatures meeting at one cell, just like a saddle dips along one axis and rises along the other.

Why nobody wants to deviate. At the equality, the full inequality chain pins down to the saddle entry:

ai^,j    ai^,ji^  =  ai^,j^  =  aij^,j^    ai,j^a_{\hat{i}, j} \;\ge\; a_{\hat{i}, j_{\hat{i}}} \;=\; a_{\hat{i}, \hat{j}} \;=\; a_{i_{\hat{j}}, \hat{j}} \;\ge\; a_{i, \hat{j}}

for every row index ii and column index jj, which collapses to

ai,j^    ai^,j^    ai^,jfor all i,j.a_{i, \hat{j}} \;\le\; a_{\hat{i}, \hat{j}} \;\le\; a_{\hat{i}, j} \quad \text{for all } i, j.

The two outer inequalities are the deviation-incentive statements, written in each player’s own “better / worse” direction:

  • ai,j^ai^,j^a_{i, \hat{j}} \le a_{\hat{i}, \hat{j}}: if S2S_2 stays at j^\hat{j}, no other row gives S1S_1 a better outcome than i^\hat{i} does. Any unilateral deviation by S1S_1 can only make S1S_1‘s outcome worse.
  • ai^,j^ai^,ja_{\hat{i}, \hat{j}} \le a_{\hat{i}, j}: if S1S_1 stays at i^\hat{i}, no other column gives S2S_2 a better outcome than j^\hat{j} does (S2S_2 wants the entry small, and no ai^,ja_{\hat{i}, j} is smaller than ai^,j^a_{\hat{i}, \hat{j}}). Any unilateral deviation by S2S_2 can only make S2S_2‘s outcome worse.

Neither player improves by moving unilaterally. The cautious commitment is stable — and that is what makes the saddle point a genuine equilibrium.

The Battle of the Bismarck Sea (2 March 1943) is a classic illustration. A Japanese reinforcement convoy had to sail from New Britain to New Guinea by one of two routes — a northern route through stormy weather (low visibility) or a southern route through clear weather (high visibility). Both took three days. American aircraft could be deployed to search one of the two routes for the convoy, and each commander had to choose without seeing the other’s choice. The payoff is the number of days the US can bomb the convoy once it is spotted: the US wants this large and Japan wants it small. With the US as S1S_1 (row chooser) and Japan as S2S_2 (column chooser), the estimated payoff matrix was

Japan NJapan SUS N22US S13\begin{array}{c|cc} & \text{Japan N} & \text{Japan S} \\ \hline \text{US N} & 2 & 2 \\ \text{US S} & 1 & 3 \end{array}

The cost accounting is straightforward. The convoy is at sea for three days total, and bombing can only happen on a day where (a) the US has already located the convoy and (b) visibility is good enough to attack. Two effects can subtract from the three-day ceiling: a search day is lost whenever the US starts on the wrong route and has to redeploy, and the storm on the northern route costs another day of usable visibility.

The four entries unpack as follows. (North, North): convoy spotted at once, but the storm costs one day of visibility → two days of bombing out of three. (North, South): the US wastes one day searching the storm front before redeploying south, where the remaining two days are all bombable in clear weather. (South, North): the US wastes one day searching the empty south, redeploys north, and then loses one more day to the storm — leaving one day. (South, South): convoy spotted at once in clear weather, three full days out of three.

Cautious analysis.

  • US row-minimums: min(2,2)=2\min(2, 2) = 2 for US north and min(1,3)=1\min(1, 3) = 1 for US south. The largest row-minimum is 22, so the US’s max-min commitment is i^=\hat{i} = north, guaranteeing at least two days of bombing whatever Japan does.
  • Japan column-maximums: max(2,1)=2\max(2, 1) = 2 for Japan north and max(2,3)=3\max(2, 3) = 3 for Japan south. The smallest column-maximum is 22, so Japan’s min-max commitment is j^=\hat{j} = north, giving up at most two days whatever the US does.

Both ends land on the same value, maximinjaij=2=minjmaxiaij\max_i \min_j a_{ij} = 2 = \min_j \max_i a_{ij}, so the entry ai^,j^=2a_{\hat{i}, \hat{j}} = 2 is a saddle point — the minimum of its row and the maximum of its column. The cautious equilibrium is (US searches north, Japan sails north) with payoff 22 days. This is exactly what happened: both commanders chose the northern route, the convoy was spotted on the first day, and the bombing ran for roughly the predicted two days before the convoy reached port.

Why neither side reaches for a more attractive cell. Read the matrix at face value and two cells look tempting: (South, South) pays 33, the dream outcome for the US, and (South, North) pays 11, the dream outcome for Japan. Each upside requires the other player to obligingly sit in the matching wrong row or column.

  • If the US unilaterally deviates from north to south while Japan stays north, the US drops from 22 days to 11 — strictly worse.
  • If Japan unilaterally deviates from north to south while the US stays north, Japan still gives up 22 days — no improvement.

Neither side gains by moving away from (North, North), which is exactly what the saddle-point inequalities ai,j^ai^,j^ai^,ja_{i, \hat{j}} \le a_{\hat{i}, \hat{j}} \le a_{\hat{i}, j} guarantee. The larger entry 33 exists on paper but is unreachable as long as the opponent is also playing cautiously — chasing it would require betting against a rational adversary, which the caution assumption refuses to do.

Not every payoff matrix has a saddle point. Take the symmetric 2×22 \times 2 game

A=(5555).A = \begin{pmatrix} 5 & -5 \\ -5 & 5 \end{pmatrix}.

Both diagonal entries are 55 (good for S1S_1, bad for S2S_2) and both off-diagonal entries are 5-5 (the reverse). The cautious analysis is symmetric:

  • Each row has minimum min(5,5)=5\min(5, -5) = -5, so maximinjaij=5\max_i \min_j a_{ij} = -5. Whichever row S1S_1 commits to, the cautious floor sits at 5-5.
  • Each column has maximum max(5,5)=5\max(5, -5) = 5, so minjmaxiaij=5\min_j \max_i a_{ij} = 5. Whichever column S2S_2 commits to, the cautious ceiling sits at 55.

The maximin–minimax inequality reads 5ai^,j^5-5 \le a_{\hat{i}, \hat{j}} \le 5 — a gap of 1010 between the two guarantees, and no entry of AA sits simultaneously at the minimum of its row and the maximum of its column. There is no saddle point.

The instability is visible if you try to follow the deviation arguments from the previous example. Whatever pair the players land on, exactly one of them strictly prefers to switch, and the switch sends the chase to a cell where the other player now wants to switch:

(1,1)=5S2(1,2)=5S1(2,2)=5S2(2,1)=5S1(1,1)=5.(1,1) = 5 \xrightarrow{\scriptstyle S_2} (1,2) = -5 \xrightarrow{\scriptstyle S_1} (2,2) = 5 \xrightarrow{\scriptstyle S_2} (2,1) = -5 \xrightarrow{\scriptstyle S_1} (1,1) = 5.

The deviation cycle closes on itself and never lands on a fixed cell. No pure pair (Ai,Bj)(A_i, B_j) is simultaneously a fixed point of both best response functions — each player can always do strictly better by reacting to the other.

The cautious framework can detect games like this — the maximin and minimax values fail to coincide, opening a visible gap — but it cannot tell the players how to play once the gap is detected. In a stronger sense, the game is not yet playable: there is no pure pair of commitments that both players are willing to settle on, because every such pair leaves at least one of them a strictly better unilateral move. Concretely, neither player has a single action they can commit to and stand by — even the cautious row i^\hat{i} looks right to S1S_1 only until S1S_1 imagines the cautious column j^\hat{j} already in play, at which point a different row would have done strictly better; the same reversal happens for S2S_2 on the other side. Pure-strategy game theory has nothing to recommend until a saddle point is restored, and there are three ways to restore one:

  • Add more actions. Extending the matrix with extra rows or columns can introduce new cells that sit at the row-minimum / column-maximum of their cross. This is the least invasive fix — the existing actions keep their meaning and the option set is simply enlarged.
  • Change payoff values. Re-estimating the aija_{ij} entries — for instance from sharper intelligence about what each scenario is actually worth — can shift the row-minimums and column-maximums into alignment without changing the action sets.
  • Allow probabilistic commitments. Leave both the actions and the matrix alone, and instead enlarge what counts as a “commitment”: each player chooses a probability distribution over actions and lets a random draw pick the actual move on the spot. This is the idea of mixed strategies, taken up next.

Mixed strategies

The shift is from picking a single action to picking how often each action is played. A mixed strategy is a probability distribution over the player’s actions: S1S_1 assigns a probability pip_i to each row AiA_i, S2S_2 assigns a probability qjq_j to each column BjB_j, and the actual play of each round is sampled from those distributions — coin flip, dice roll, random-number generator — at decision time. The opponent never gets to predict which action you’ll play, only the long-run frequencies. The idea is due to John von Neumann.

A mixed strategy for S1S_1 in a two-player zero-sum game is a probability distribution p=(p1,,pm)p = (p_1, \dots, p_m) over the rows:

pi0for all i,i=1mpi=1.p_i \ge 0 \quad \text{for all } i, \qquad \sum_{i=1}^m p_i = 1.

The reading is ”S1S_1 plays AiA_i with probability pip_i.” Similarly, a mixed strategy for S2S_2 is a probability distribution q=(q1,,qn)q = (q_1, \dots, q_n) over the columns, with qj0q_j \ge 0 and jqj=1\sum_j q_j = 1.

A pure strategy is the degenerate case where one pi=1p_i = 1 (or one qj=1q_j = 1) and the rest are zero — the player commits deterministically to a single action. Pure strategies sit inside the mixed-strategy framework as a special case, so allowing mixed strategies strictly enlarges the option set rather than replacing it.

Once each player’s strategy is a probability distribution, the outcome of a single play is itself random: S1S_1 draws a row from pp, S2S_2 draws a column from qq, and the resulting payoff is the matrix entry at the drawn cell. The two draws are independent, so cell (i,j)(i, j) comes up with probability piqjp_i q_j, and the natural payoff for the mixed game is the expected value across all cells.

The expected payoff of a pair of mixed strategies (p,q)(p, q) for the payoff matrix A=(aij)A = (a_{ij}) is

a(p,q)=i=1mj=1npiaijqj.a(p, q) = \sum_{i=1}^m \sum_{j=1}^n p_i \, a_{ij} \, q_j.

Each cell aija_{ij} contributes weighted by the probability piqjp_i q_j of actually landing on it. The reading is “average payoff over many independent plays of (p,q)(p, q).”

This expected payoff replaces the raw matrix entry as the quantity the players reason about: S1S_1 still wants a(p,q)a(p, q) large and S2S_2 still wants it small, but they now optimize over continuous probability distributions pp and qq rather than discrete actions. The maximin and minimax constructions and the saddle-point definition carry over verbatim, with a(p,q)a(p, q) in the role of ai^,j^a_{\hat{i}, \hat{j}}.

Apply the mixed framework to the no-saddle counterexample with matrix

A=(5555).A = \begin{pmatrix} 5 & -5 \\ -5 & 5 \end{pmatrix}.

Each player has two actions, so the distributions are parametrized by a single probability — p2=1p1p_2 = 1 - p_1 and q2=1q1q_2 = 1 - q_1 are forced by normalization. Expanding the expected payoff cell by cell,

a(p,q)=5p1q15p1(1q1)5(1p1)q1+5(1p1)(1q1).a(p, q) = 5 p_1 q_1 - 5 p_1 (1 - q_1) - 5 (1 - p_1) q_1 + 5 (1 - p_1)(1 - q_1).

Multiplying out and collecting terms,

a(p,q)=20p1q110p110q1+5.a(p, q) = 20 \, p_1 q_1 - 10 \, p_1 - 10 \, q_1 + 5.

By the symmetry of AA, the natural equilibrium candidate is (p1,q1)=(1/2,1/2)(p_1, q_1) = (1/2, 1/2) — each player flips a fair coin between the two actions. Substituting,

a(1/2,1/2)=201410121012+5=0.a(1/2, 1/2) = 20 \cdot \tfrac{1}{4} - 10 \cdot \tfrac{1}{2} - 10 \cdot \tfrac{1}{2} + 5 = 0.

To verify (1/2,1/2)(1/2, 1/2) is a saddle of a(p,q)a(p, q), hold one variable at 1/21/2 and see what the function does as the other varies:

  • Fix q1=1/2q_1 = 1/2. Then a(p1,1/2)=2012p110p15+5=0a(p_1, 1/2) = 20 \cdot \tfrac{1}{2} p_1 - 10 p_1 - 5 + 5 = 0 for every p1[0,1]p_1 \in [0, 1]. S1S_1 cannot improve its expected payoff by deviating from a 50/5050/50 mix while S2S_2 stays uniform.
  • Fix p1=1/2p_1 = 1/2. Then a(1/2,q1)=0a(1/2, q_1) = 0 for every q1[0,1]q_1 \in [0, 1] in the same way. S2S_2 cannot improve either.

Neither player gains by shifting their distribution: (1/2,1/2)(1/2, 1/2) is a saddle point in the mixed-strategy space, the expected payoff there is 00, and the maximin–minimax gap of 1010 from the pure-strategy analysis has collapsed entirely. The cautious recommendation that did not exist in pure strategies now exists in mixed strategies, and it tells each player to randomize uniformly.

Existence of equilibria

The worked example was reassuring — randomization rescued the no-saddle counterexample — but it was only one matrix. The general result is much stronger.

Minimax theorem (von Neumann, 1928). Every two-player zero-sum game with a finite payoff matrix admits at least one saddle point once mixed strategies are allowed. Equivalently, there exist probability distributions p^\hat{p} and q^\hat{q} such that

maxpminqa(p,q)  =  a(p^,q^)  =  minqmaxpa(p,q),\max_p \min_q a(p, q) \;=\; a(\hat{p}, \hat{q}) \;=\; \min_q \max_p a(p, q),

where each max and min ranges over all probability distributions over the players’ actions.

Two things make this result load-bearing. It is unconditional: any payoff matrix, any shape, no special structure assumed. The pure-strategy game may have had a wide maximin–minimax gap, may have admitted no stable pair at all — but the moment probability distributions enter, the gap always closes and a cautious equilibrium always exists. Every finite two-player zero-sum game is, in principle, playable.

The saddle point is also computable. The conditions defining p^\hat{p} and q^\hat{q} can each be written as a linear programming (LP) problem — optimizing a linear objective subject to linear inequality constraints — and any standard LP solver returns the optimal mixed strategies and the resulting expected payoff. The theory is constructive: given the matrix, finding the equilibrium is a finite calculation, not an existence statement with no algorithm attached.

Practical caveat. The expected-payoff interpretation is sharpest when the game is played repeatedly. Across many independent rounds, a(p^,q^)a(\hat{p}, \hat{q}) is the realized average payoff — that is what an expectation guarantees. In a one-off play, both players randomize once and the result is whatever the single sample happens to be; the expected value is the right thing to optimize in advance, but it is not what either side actually receives in that single round. Mixed strategies remain the cautious recommendation, but their practical edge sharpens with repetition.

Out of scope. Everything in this chapter assumed both players know the full payoff matrix exactly. Real games often hide information: a player may see only their own row of payoffs, or face uncertainty about which actions the opponent has available, or have a private “type” the opponent cannot observe. Such games with incomplete information form a separate branch of game theory with substantially different machinery, and they are not pursued further here.

Quiz