Optima
Where root-finding asks what input drives the output to zero?, optimization asks the dual question: what input drives the output to its largest (or smallest) value? “Largest” and “smallest” only make sense for an ordered output, so the function we study is now a scalar field — vector input, single real output — and the target has shifted from finding roots (inputs at which vanishes) to finding extrema (also called optima), inputs at which attains a largest or smallest value.
Two independent distinctions sharpen the idea:
- maximum vs minimum — largest value vs smallest value.
- global vs local — largest/smallest across the entire domain vs largest/smallest only within some neighborhood of the candidate point.
That gives four cases, each a different way a point can be extremal for .
Global Extrema
A global extremum compares against ‘s value at every other point of the domain.
Let be a scalar field. A point is a global maximum of if
and a global minimum of if
In either case is the corresponding global maximum value or global minimum value of , and is a global extremum — collectively global extrema.
In words: nowhere in the domain does go higher than (for a maximum) or lower (for a minimum). The condition is strong precisely because the comparison ranges over all of — there is nowhere left for a counterexample to hide.
Local Extrema
A local extremum weakens the comparison: only has to win on its own turf — some ε-ball around the point.
Let be a scalar field. A point is a local maximum of if there exists some such that
and a local minimum of if there exists some such that
In either case is the corresponding local maximum value or local minimum value, and is a local extremum — collectively local extrema.
The existential does the heavy lifting: it says some neighborhood, however small, witnesses the inequality. A point on a small hilltop with a much higher peak elsewhere in the domain still counts as a local maximum, because we are only asked to beat the immediate neighborhood.
Every global extremum is automatically a local extremum — any neighborhood works once the inequality already holds globally. The converse fails: a function can have many local maxima but only one global maximum (or none at all).
Strict Extrema
The four definitions above use non-strict inequalities (, ), which allow ties: another point in the comparison region can attain the same extreme value as and the definition still applies. Every point on a flat plateau, for instance, qualifies as a maximum together with the others.
If the inequalities are tightened to strict ones — in place of , in place of — then is the unique point in the comparison region at which the extreme value is attained. Such a point is called a strict local or global extremum:
- A strict global maximum satisfies for every with , and analogously for a strict global minimum with .
- A strict local maximum satisfies for every with , for some , and analogously for a strict local minimum with .
Strict extrema are isolated in their comparison region — no other point comes within reach of the extreme value, so the maximum (or minimum) is genuinely a single peak (or trough) rather than the highest tier of several tied points.
Stationary Points
How do we actually find an extremum? In 1D, a smooth has every local extremum either at a point where vanishes or on the boundary of the domain. The exact same dichotomy lifts to higher dimensions, with the gradient playing the role of .
For a continuously partially differentiable scalar field , every local extremum at an inner point of occurs at a point where the gradient vanishes:
The only other candidates for a local extremum are points on the boundary of .
Why the gradient must vanish at inner extrema
Suppose is a local extremum at an inner point of . Fix every coordinate of except the -th and watch how varies along that one axis. Concretely, define
— a 1D function that takes a step of size along the -th coordinate axis and reads off the corresponding -value. For in a small interval this is well-defined: is an inner point, so a small ball around it stays inside .
Since is a local extremum of , the 1D function inherits the property — it has a 1D local extremum at . The 1D criterion then forces , and by the definition of the partial derivative,
Repeating this for every axis gives , which is exactly .
The argument fails on the boundary because may not have a full ball around it inside — moving by might step outside the domain. The 1D restriction may then only be defined on one side of , which is too little for the 1D extremum criterion to apply.
So the points where the gradient vanishes deserve their own name — they are exactly the candidates for inner extrema.
A point is a stationary point (also called a critical point) of a scalar field if the gradient vanishes there:
Practically, this is what makes the problem tractable. Instead of hunting for extrema across all of (an uncountable set), we hunt for the finitely many roots of in the interior, and check the boundary of separately. The boundary side requires different machinery and is not pursued here; the rest of this chapter focuses on the inner, stationary-point side.
Saddle Points
The implication runs in only one direction. Every inner local extremum is a stationary point — but not every stationary point is an extremum. The gradient can vanish at without the function attaining a local maximum or minimum there at all.
The smallest example sits in . Consider
The gradient vanishes at the origin, so is stationary. Yet it is not a local extremum: in any neighborhood of the function takes both positive values (just to the right, where ) and negative ones (just to the left, where ), so is neither smallest nor largest nearby.
The surface is a ramp that does not depend on : flat along the -axis, ascending cubically along the -axis. The gradient vanishes not only at the origin but along the entire line — every point of the form is stationary, since the formula depends only on . Yet none of these points is a local extremum: moving in either direction immediately wins or loses height, so the ramp briefly flattens but never peaks or troughs.
Such points get their own name.
Let be a scalar field. A saddle point of is a stationary point that is neither a local maximum nor a local minimum of .
The example above is a saddle point in this technical sense, but it does not look like one — it’s a flat ramp that happens to plateau along . The geometric archetype, the surface that gives the name, is
At the origin the gradient vanishes, so is stationary. But the curvature pulls in opposite ways along the two axes:
- Along the -axis (), has a local minimum at .
- Along the -axis (), has a local maximum at .
Walking out from along takes you uphill; walking out along takes you downhill. Neither side wins, and the surface twists into the horse-saddle shape every rider can picture — uphill front-and-back, downhill side-to-side. The name “saddle point” generalizes from this prototype to any stationary point with the same defining feature: zero gradient, no local extremum.
The takeaway is that the gradient alone cannot tell extrema apart from saddle points: identifies a candidate, but distinguishing the cases requires looking at second-order information through the Hessian.
Definiteness of a Symmetric Matrix
To carry the 1D second-derivative test into dimensions, we first need a way to talk about the “sign” of a matrix. The 1D second derivative is a single number — positive means the curve opens upward (local minimum), negative means it opens downward (local maximum), zero is inconclusive. In dimensions the second derivative is the whole Hessian matrix, and curvature now varies with direction: a saddle point literally curves upward along one axis and downward along another. The single-number “sign” picture has to be replaced with something richer.
The replacement is definiteness. For a symmetric matrix , the quadratic form produces a single scalar for each input , and the sign of that scalar — across all non-zero — is exactly what classifies .
A symmetric matrix is
- positive definite if for all ,
- negative definite if for all ,
- positive semi-definite if for all ,
- negative semi-definite if for all .
If is neither positive nor negative semi-definite, it is indefinite.
The cases form a graded scale. Strict positivity of the form everywhere off the origin is positive definite; relaxing the strict inequality to allow zero values is positive semi-definite. The negative side mirrors this, and indefinite is what’s left: the form is positive in some directions and negative in others as varies, so neither semi-definite condition can hold.
The exclusion of in the strict cases is mandatory, not cosmetic — for every , so without the exclusion the strict inequality would fail at the origin for every matrix and the condition would be unsatisfiable.
Eigenvalue Criterion
Checking the inequality at every non-zero direction in is, taken literally, an infinite test. The shortcut is to read the definiteness off the eigenvalues of . Let be an eigenvalue of — guaranteed to be real, since is symmetric. Then:
- is positive definite all ,
- is negative definite all ,
- is positive semi-definite all ,
- is negative semi-definite all ,
- is indefinite some and some .
Intuition: why eigenvalue signs decide the form’s sign
The picture becomes concrete once we view it in the right coordinate system. For a symmetric , the eigenvectors can be chosen as an orthonormal basis — perpendicular unit axes — and along each axis does nothing but scale by its eigenvalue. Writing any in this eigenbasis as , the quadratic form collapses into a clean weighted sum of squares:
The squared coefficients are non-negative regardless of which we picked, so only the eigenvalues control the sign of each term — and the form is just their sum. If every , the form is a sum of non-negative pieces and comes out positive for every non-zero (positive definite). If every , the form is negative for every non-zero . Mixed signs let some terms push the sum up and others pull it down: depending on which eigen-axes leans toward, the form lands positive or negative — that is exactly the indefinite case. A zero eigenvalue erases its axis’s contribution entirely, so the form vanishes along that direction even when — the loophole that separates semi-definite from definite.
For diagonal matrices the eigenvalues sit on the diagonal, so the criterion is immediate. For general symmetric matrices it reduces an infinite test to a finite computation: find the roots of the characteristic polynomial and inspect their signs.
Classifying Stationary Points
Both pieces are now in place. The Hessian of a scalar field is symmetric, so its definiteness is well-defined — and at a stationary point of , that definiteness decides which of the four cases (local minimum, local maximum, saddle, inconclusive) the point falls into.
Let with open be a twice continuously partially differentiable scalar field, and let be a stationary point of . Then:
- negative definite is a local maximum of ,
- positive definite is a local minimum of ,
- indefinite is a saddle point of ,
- semi-definite but not definite the test is inconclusive.
In eigenvalue terms, the semi-definite but not definite case is exactly when at least one eigenvalue of is zero and the rest share a single sign — no mixed signs to push it indefinite, and a zero eigenvalue blocking strict definiteness.
Intuition from Taylor approximation
Each case maps cleanly onto the 1D intuition. The connection runs through the second-order Taylor approximation: near a stationary point, the expansion gives
— the gradient term has dropped out because . So the change in as we step away from is, to leading order, the quadratic form . If that form is positive in every direction (positive definite Hessian), goes up no matter which way we step — a local minimum. Negative in every direction is a local maximum. Mixed signs across directions (indefinite Hessian) means rises along some directions and drops along others — the saddle picture from the previous section. And when the form is non-negative or non-positive but vanishes along at least one direction (semi-definite but not definite), the second-order term cannot decide on its own, just as leaves the 1D test inconclusive — a higher-order analysis would be needed to settle the case.
Putting it all together gives a complete recipe for finding the inner extrema of a scalar field on an open domain:
- Compute and solve to find all stationary points.
- At each stationary point , compute the Hessian .
- Determine its definiteness — typically via the eigenvalue signs — and classify as local minimum, local maximum, saddle point, or inconclusive.
2D saddle-point shortcut. The determinant of any square matrix equals the product of its eigenvalues. Specializing to a Hessian, — so a negative determinant forces the two eigenvalues to opposite signs, making the (symmetric) Hessian indefinite. At a stationary point , the single check is therefore enough to conclude is a saddle point — no eigenvalue computation needed.