Two-player Zero-sum Game
The generic decision model pitted a single decision maker against a passive system — the system might end up in any of several states, but it never chose its state to hurt the player. Many real situations do not respect that asymmetry. When two firms compete for a market, two armies pick troop dispositions, or two players sit down across a chessboard, the “other side” is not weather but a second decision maker actively choosing against you.
The two-player zero-sum game is the first model that takes that seriously. The passive system is replaced by a second rational player whose interests are directly opposed to the first’s: whatever one wins, the other loses by exactly the same amount.
The model
A two-player zero-sum game consists of:
- two players and ;
- actions available to , and actions available to (with and not necessarily equal);
- a payoff matrix , where is the amount pays when chooses and chooses .
A play of the game has each player commit to one of their actions; the outcome is the matrix entry at the intersection of the two choices. Both players see the entire matrix before deciding — there is no hidden information.
The zero-sum qualifier names the conservation law: ‘s win plus ‘s loss is zero on every outcome, with no third party absorbing or contributing value. is the win player (or row chooser), the maximizer who tries to make as large as possible. is the loss player (or column chooser), the minimizer who tries to make as small as possible. A negative entry simply means pays that amount — entirely favorable to the loss player, who is therefore happy with negative numbers in the matrix.
The notation shifts slightly from the generic decision model. The payoff matrix is now with entries rather than with entries , and the column index no longer ranges over passive states — it ranges over ‘s actions. Rows still index ‘s actions; the (row, column) cell still records the value the row player receives.
Sequential play
If the players act one after the other and each sees the other’s choice before deciding, whoever moves second has the simpler problem: they look at the row or column pinned down by the first player’s commitment and pick the best entry on it.
-
If moves first by choosing row , then sees that row and picks the column whose entry is smallest, because minimizes:
Reads as: ” commits to , so replies with , where is the column at which row bottoms out.” The subscript depends on because the response column changes with whichever row the first mover picked.
-
If moves first by choosing column , then sees that column and picks the row whose entry is largest, because maximizes:
Reads as: ” commits to , so replies with , where is the row at which column peaks.” Again the subscript depends on .
If several columns tie for the row-minimum, is not pinned down — any of the tied columns satisfies the equation. The same goes for when several rows tie for the column-maximum. The next section addresses this by letting the response be a set of indices instead of a single one.
Best response functions
The optimal-response problem is symmetric enough to formalize. Given an opponent’s commitment, each player has a set of actions that achieve their best possible payoff — usually just one action, but two or more actions can tie for the optimum. To accommodate ties, the response is recorded as a set of indices rather than a single index.
In a two-player zero-sum game, each player’s best response function (also called the player’s response mapping) maps every opponent action to the set of own actions that achieve the player’s optimal payoff against it.
-
’s best response function:
Given ‘s column , the set collects every row that hits the column’s maximum entry.
-
’s best response function:
Given ‘s row , the set collects every column that hits the row’s minimum entry.
The codomain — the power set of , also written , meaning the set of all subsets of — is what lets the response be a set of optimal actions: when several actions tie for the best payoff against the opponent’s commitment, or returns all of them at once.
Simultaneous play
The setup changes when the players act at the same time, neither seeing the other’s choice in advance. The asymmetry of sequential play is gone: can no longer wait for ‘s row before picking its column, and vice versa. Each player has to predict the other’s move without observing it.
The standard modeling assumption is that both players are rational, both know the full matrix, and both play defensively: each assumes the opponent is as smart and self-interested as themselves, and will respond in whatever way hurts them most. Each player is then choosing in the spirit “pick the best option for me, assuming the opponent picks the worst option for me” — the same caution strategy the generic decision model used against a passive system, only now the “worst case” is produced by a thinking opponent rather than blind chance.
’s reasoning. “If I commit to row , the worst case is that best-responds with column — the column-minimum of row , exactly the response would pick as the second mover. To make my worst case as good as possible, I should pick the row whose worst case is largest.” That gives ‘s max-min rule:
Scan each row’s smallest entry (the inner for each fixed ), then pick the row whose smallest entry is the largest (the outer ). The chosen row index is , and the column where row bottoms out is . The value is ‘s guaranteed payoff — whatever actually does, ‘s payoff cannot drop below this floor.
The hat distinguishes the actually committed row from the abstract row index used in the max. The subscripted column is the column expects to play — it lives inside ‘s prediction. The game being simultaneous, never sees and so does not literally choose in response; is part of ‘s reasoning, not ‘s actual move.
’s reasoning. Symmetrically: “If I commit to column , the worst case is that best-responds with row — the column-maximum of column . To make my worst case as good as possible (the smallest payoff to ), I should pick the column whose worst case is smallest.” That gives ‘s min-max rule:
Scan each column’s largest entry, then pick the column whose largest entry is the smallest. The chosen column index is , and the row where column peaks is — the row expects to play, again a piece of ‘s reasoning rather than ‘s actual move. The value is ‘s guaranteed loss ceiling — whatever does, ‘s payment cannot rise above it.
The maximin–minimax inequality
The actually-played outcome of the simultaneous game is the entry — plays its cautious row and plays its cautious column . The hypothetical and above never get played; they only existed inside each player’s prediction of the other.
These four indices link together by a chain of inequalities. Starting from ‘s guaranteed payoff:
Reading left to right:
- The first holds by definition of : it’s the column where row achieves its minimum, so is that row-minimum.
- The first is “a row’s minimum is at most any other entry on that row” — the row-minimum of row is at most , which is just one of the entries on that row.
- The second is “any entry of a column is at most that column’s maximum” — the entry sits in column and its column-maximum is .
- The last holds by definition of : it’s the row where column achieves its maximum.
Dropping the two equalities that just rename row-minimum and column-maximum, the chain shrinks to a clean three-term inequality between the matrix entries themselves:
Now substitute the max-min and min-max definitions back into the two outer terms — and — and the chain takes its named form, the maximin–minimax inequality:
‘s guaranteed payoff never exceeds ‘s guaranteed loss ceiling, and the actual outcome of the simultaneous game sits between them.
Strict inequality vs equality. The two ends of the chain tell each player a different story about whether their cautious commitment is stable:
- If any inequality is strict, then at least one player can do better than their floor (or strictly better than their ceiling), so at least one player has an incentive to deviate from their cautious choice. The configuration is not stable.
- If both inequalities collapse to equalities, neither player can improve unilaterally. The cautious commitments are mutual best responses, and the game has settled at a fixed point. That equality case has its own name.
Saddle points
When the maximin and minimax values coincide,
the entry sits simultaneously at the minimum of its row and the maximum of its column.
A saddle point (also called an equilibrium) of a two-player zero-sum game is a pair whose payoff entry is simultaneously the minimum of its row and the maximum of its column:
Equivalently, and — the cautious commitments are each other’s best responses.
The name comes from the same horse-saddle picture as the calculus saddle point: the matrix entry is a low point as you walk sideways along its row and a high point as you walk up and down its column — two opposing curvatures meeting at one cell, just like a saddle dips along one axis and rises along the other.
Why nobody wants to deviate. At the equality, the full inequality chain pins down to the saddle entry:
for every row index and column index , which collapses to
The two outer inequalities are the deviation-incentive statements, written in each player’s own “better / worse” direction:
- : if stays at , no other row gives a better outcome than does. Any unilateral deviation by can only make ‘s outcome worse.
- : if stays at , no other column gives a better outcome than does ( wants the entry small, and no is smaller than ). Any unilateral deviation by can only make ‘s outcome worse.
Neither player improves by moving unilaterally. The cautious commitment is stable — and that is what makes the saddle point a genuine equilibrium.
The Battle of the Bismarck Sea (2 March 1943) is a classic illustration. A Japanese reinforcement convoy had to sail from New Britain to New Guinea by one of two routes — a northern route through stormy weather (low visibility) or a southern route through clear weather (high visibility). Both took three days. American aircraft could be deployed to search one of the two routes for the convoy, and each commander had to choose without seeing the other’s choice. The payoff is the number of days the US can bomb the convoy once it is spotted: the US wants this large and Japan wants it small. With the US as (row chooser) and Japan as (column chooser), the estimated payoff matrix was
The cost accounting is straightforward. The convoy is at sea for three days total, and bombing can only happen on a day where (a) the US has already located the convoy and (b) visibility is good enough to attack. Two effects can subtract from the three-day ceiling: a search day is lost whenever the US starts on the wrong route and has to redeploy, and the storm on the northern route costs another day of usable visibility.
The four entries unpack as follows. (North, North): convoy spotted at once, but the storm costs one day of visibility → two days of bombing out of three. (North, South): the US wastes one day searching the storm front before redeploying south, where the remaining two days are all bombable in clear weather. (South, North): the US wastes one day searching the empty south, redeploys north, and then loses one more day to the storm — leaving one day. (South, South): convoy spotted at once in clear weather, three full days out of three.
Cautious analysis.
- US row-minimums: for US north and for US south. The largest row-minimum is , so the US’s max-min commitment is north, guaranteeing at least two days of bombing whatever Japan does.
- Japan column-maximums: for Japan north and for Japan south. The smallest column-maximum is , so Japan’s min-max commitment is north, giving up at most two days whatever the US does.
Both ends land on the same value, , so the entry is a saddle point — the minimum of its row and the maximum of its column. The cautious equilibrium is (US searches north, Japan sails north) with payoff days. This is exactly what happened: both commanders chose the northern route, the convoy was spotted on the first day, and the bombing ran for roughly the predicted two days before the convoy reached port.
Why neither side reaches for a more attractive cell. Read the matrix at face value and two cells look tempting: (South, South) pays , the dream outcome for the US, and (South, North) pays , the dream outcome for Japan. Each upside requires the other player to obligingly sit in the matching wrong row or column.
- If the US unilaterally deviates from north to south while Japan stays north, the US drops from days to — strictly worse.
- If Japan unilaterally deviates from north to south while the US stays north, Japan still gives up days — no improvement.
Neither side gains by moving away from (North, North), which is exactly what the saddle-point inequalities guarantee. The larger entry exists on paper but is unreachable as long as the opponent is also playing cautiously — chasing it would require betting against a rational adversary, which the caution assumption refuses to do.
Not every payoff matrix has a saddle point. Take the symmetric game
Both diagonal entries are (good for , bad for ) and both off-diagonal entries are (the reverse). The cautious analysis is symmetric:
- Each row has minimum , so . Whichever row commits to, the cautious floor sits at .
- Each column has maximum , so . Whichever column commits to, the cautious ceiling sits at .
The maximin–minimax inequality reads — a gap of between the two guarantees, and no entry of sits simultaneously at the minimum of its row and the maximum of its column. There is no saddle point.
The instability is visible if you try to follow the deviation arguments from the previous example. Whatever pair the players land on, exactly one of them strictly prefers to switch, and the switch sends the chase to a cell where the other player now wants to switch:
The deviation cycle closes on itself and never lands on a fixed cell. No pure pair is simultaneously a fixed point of both best response functions — each player can always do strictly better by reacting to the other.
The cautious framework can detect games like this — the maximin and minimax values fail to coincide, opening a visible gap — but it cannot tell the players how to play once the gap is detected. In a stronger sense, the game is not yet playable: there is no pure pair of commitments that both players are willing to settle on, because every such pair leaves at least one of them a strictly better unilateral move. Concretely, neither player has a single action they can commit to and stand by — even the cautious row looks right to only until imagines the cautious column already in play, at which point a different row would have done strictly better; the same reversal happens for on the other side. Pure-strategy game theory has nothing to recommend until a saddle point is restored, and there are three ways to restore one:
- Add more actions. Extending the matrix with extra rows or columns can introduce new cells that sit at the row-minimum / column-maximum of their cross. This is the least invasive fix — the existing actions keep their meaning and the option set is simply enlarged.
- Change payoff values. Re-estimating the entries — for instance from sharper intelligence about what each scenario is actually worth — can shift the row-minimums and column-maximums into alignment without changing the action sets.
- Allow probabilistic commitments. Leave both the actions and the matrix alone, and instead enlarge what counts as a “commitment”: each player chooses a probability distribution over actions and lets a random draw pick the actual move on the spot. This is the idea of mixed strategies, taken up next.
Mixed strategies
The shift is from picking a single action to picking how often each action is played. A mixed strategy is a probability distribution over the player’s actions: assigns a probability to each row , assigns a probability to each column , and the actual play of each round is sampled from those distributions — coin flip, dice roll, random-number generator — at decision time. The opponent never gets to predict which action you’ll play, only the long-run frequencies. The idea is due to John von Neumann.
A mixed strategy for in a two-player zero-sum game is a probability distribution over the rows:
The reading is ” plays with probability .” Similarly, a mixed strategy for is a probability distribution over the columns, with and .
A pure strategy is the degenerate case where one (or one ) and the rest are zero — the player commits deterministically to a single action. Pure strategies sit inside the mixed-strategy framework as a special case, so allowing mixed strategies strictly enlarges the option set rather than replacing it.
Once each player’s strategy is a probability distribution, the outcome of a single play is itself random: draws a row from , draws a column from , and the resulting payoff is the matrix entry at the drawn cell. The two draws are independent, so cell comes up with probability , and the natural payoff for the mixed game is the expected value across all cells.
The expected payoff of a pair of mixed strategies for the payoff matrix is
Each cell contributes weighted by the probability of actually landing on it. The reading is “average payoff over many independent plays of .”
This expected payoff replaces the raw matrix entry as the quantity the players reason about: still wants large and still wants it small, but they now optimize over continuous probability distributions and rather than discrete actions. The maximin and minimax constructions and the saddle-point definition carry over verbatim, with in the role of .
Apply the mixed framework to the no-saddle counterexample with matrix
Each player has two actions, so the distributions are parametrized by a single probability — and are forced by normalization. Expanding the expected payoff cell by cell,
Multiplying out and collecting terms,
By the symmetry of , the natural equilibrium candidate is — each player flips a fair coin between the two actions. Substituting,
To verify is a saddle of , hold one variable at and see what the function does as the other varies:
- Fix . Then for every . cannot improve its expected payoff by deviating from a mix while stays uniform.
- Fix . Then for every in the same way. cannot improve either.
Neither player gains by shifting their distribution: is a saddle point in the mixed-strategy space, the expected payoff there is , and the maximin–minimax gap of from the pure-strategy analysis has collapsed entirely. The cautious recommendation that did not exist in pure strategies now exists in mixed strategies, and it tells each player to randomize uniformly.
Existence of equilibria
The worked example was reassuring — randomization rescued the no-saddle counterexample — but it was only one matrix. The general result is much stronger.
Minimax theorem (von Neumann, 1928). Every two-player zero-sum game with a finite payoff matrix admits at least one saddle point once mixed strategies are allowed. Equivalently, there exist probability distributions and such that
where each max and min ranges over all probability distributions over the players’ actions.
Two things make this result load-bearing. It is unconditional: any payoff matrix, any shape, no special structure assumed. The pure-strategy game may have had a wide maximin–minimax gap, may have admitted no stable pair at all — but the moment probability distributions enter, the gap always closes and a cautious equilibrium always exists. Every finite two-player zero-sum game is, in principle, playable.
The saddle point is also computable. The conditions defining and can each be written as a linear programming (LP) problem — optimizing a linear objective subject to linear inequality constraints — and any standard LP solver returns the optimal mixed strategies and the resulting expected payoff. The theory is constructive: given the matrix, finding the equilibrium is a finite calculation, not an existence statement with no algorithm attached.
Practical caveat. The expected-payoff interpretation is sharpest when the game is played repeatedly. Across many independent rounds, is the realized average payoff — that is what an expectation guarantees. In a one-off play, both players randomize once and the result is whatever the single sample happens to be; the expected value is the right thing to optimize in advance, but it is not what either side actually receives in that single round. Mixed strategies remain the cautious recommendation, but their practical edge sharpens with repetition.
Out of scope. Everything in this chapter assumed both players know the full payoff matrix exactly. Real games often hide information: a player may see only their own row of payoffs, or face uncertainty about which actions the opponent has available, or have a private “type” the opponent cannot observe. Such games with incomplete information form a separate branch of game theory with substantially different machinery, and they are not pursued further here.