Optima

Where root-finding asks what input drives the output to zero?, optimization asks the dual question: what input drives the output to its largest (or smallest) value? “Largest” and “smallest” only make sense for an ordered output, so the function we study is now a scalar field $f : D \subseteq \mathbb{R}^n \to \mathbb{R}$ — vector input, single real output — and the target has shifted from finding roots (inputs at which $f$ vanishes) to finding extrema (also called optima), inputs at which $f$ attains a largest or smallest value.

Two independent distinctions sharpen the idea:

maximum vs minimum — largest value vs smallest value.
global vs local — largest/smallest across the entire domain vs largest/smallest only within some neighborhood of the candidate point.

That gives four cases, each a different way a point $\mathbf{x}_0 \in D$ can be extremal for $f$ .

Global Extrema

A global extremum compares $f(\mathbf{x}_0)$ against $f$ ‘s value at every other point of the domain.

Let $f : D \subseteq \mathbb{R}^n \to \mathbb{R}$ be a scalar field. A point $\mathbf{x}_0 \in D$ is a global maximum of $f$ if

f(\mathbf{x}_0) \geq f(\mathbf{x}) \quad \forall \mathbf{x} \in D

and a global minimum of $f$ if

f(\mathbf{x}_0) \leq f(\mathbf{x}) \quad \forall \mathbf{x} \in D

In either case $f(\mathbf{x}_0)$ is the corresponding global maximum value or global minimum value of $f$ , and $\mathbf{x}_0$ is a global extremum — collectively global extrema.

In words: nowhere in the domain does $f$ go higher than $f(\mathbf{x}_0)$ (for a maximum) or lower (for a minimum). The condition is strong precisely because the comparison ranges over all of $D$ — there is nowhere left for a counterexample to hide.

Local Extrema

A local extremum weakens the comparison: $\mathbf{x}_0$ only has to win on its own turf — some ε-ball around the point.

Let $f : D \subseteq \mathbb{R}^n \to \mathbb{R}$ be a scalar field. A point $\mathbf{x}_0 \in D$ is a local maximum of $f$ if there exists some $\varepsilon > 0$ such that

f(\mathbf{x}_0) \geq f(\mathbf{x}) \quad \forall \mathbf{x} \in B_\varepsilon(\mathbf{x}_0)

and a local minimum of $f$ if there exists some $\varepsilon > 0$ such that

f(\mathbf{x}_0) \leq f(\mathbf{x}) \quad \forall \mathbf{x} \in B_\varepsilon(\mathbf{x}_0)

In either case $f(\mathbf{x}_0)$ is the corresponding local maximum value or local minimum value, and $\mathbf{x}_0$ is a local extremum — collectively local extrema.

The existential $\exists \varepsilon > 0$ does the heavy lifting: it says some neighborhood, however small, witnesses the inequality. A point on a small hilltop with a much higher peak elsewhere in the domain still counts as a local maximum, because we are only asked to beat the immediate neighborhood.

Every global extremum is automatically a local extremum — any neighborhood works once the inequality already holds globally. The converse fails: a function can have many local maxima but only one global maximum (or none at all).

Strict Extrema

The four definitions above use non-strict inequalities ( $\geq$ , $\leq$ ), which allow ties: another point in the comparison region can attain the same extreme value as $\mathbf{x}_0$ and the definition still applies. Every point on a flat plateau, for instance, qualifies as a maximum together with the others.

If the inequalities are tightened to strict ones — $>$ in place of $\geq$ , $<$ in place of $\leq$ — then $\mathbf{x}_0$ is the unique point in the comparison region at which the extreme value is attained. Such a point is called a strict local or global extremum:

A strict global maximum satisfies $f(\mathbf{x}_0) > f(\mathbf{x})$ for every $\mathbf{x} \in D$ with $\mathbf{x} \neq \mathbf{x}_0$ , and analogously for a strict global minimum with $<$ .
A strict local maximum satisfies $f(\mathbf{x}_0) > f(\mathbf{x})$ for every $\mathbf{x} \in B_\varepsilon(\mathbf{x}_0)$ with $\mathbf{x} \neq \mathbf{x}_0$ , for some $\varepsilon > 0$ , and analogously for a strict local minimum with $<$ .

Strict extrema are isolated in their comparison region — no other point comes within reach of the extreme value, so the maximum (or minimum) is genuinely a single peak (or trough) rather than the highest tier of several tied points.

Stationary Points

How do we actually find an extremum? In 1D, a smooth $f$ has every local extremum either at a point where $f'$ vanishes or on the boundary of the domain. The exact same dichotomy lifts to higher dimensions, with the gradient $\nabla f$ playing the role of $f'$ .

For a continuously partially differentiable scalar field $f : D \subseteq \mathbb{R}^n \to \mathbb{R}$ , every local extremum at an inner point of $D$ occurs at a point where the gradient vanishes:

\mathbf{x}_0 \text{ is a local extremum of } f \text{ at an inner point} \implies \nabla f(\mathbf{x}_0) = \mathbf{0}

The only other candidates for a local extremum are points on the boundary of $D$ .

Why the gradient must vanish at inner extrema

Suppose $\mathbf{x}_0$ is a local extremum at an inner point of $D$ . Fix every coordinate of $\mathbf{x}_0$ except the $i$ -th and watch how $f$ varies along that one axis. Concretely, define

g_i(t) = f(\mathbf{x}_0 + t\,\mathbf{e}_i)

— a 1D function that takes a step of size $t$ along the $i$ -th coordinate axis and reads off the corresponding $f$ -value. For $t$ in a small interval $(-\delta, \delta)$ this is well-defined: $\mathbf{x}_0$ is an inner point, so a small ball around it stays inside $D$ .

Since $\mathbf{x}_0$ is a local extremum of $f$ , the 1D function $g_i$ inherits the property — it has a 1D local extremum at $t = 0$ . The 1D criterion then forces $g_i'(0) = 0$ , and by the definition of the partial derivative,

g_i'(0) = f_{x_i}(\mathbf{x}_0) = 0

Repeating this for every axis $i = 1, \ldots, n$ gives $f_{x_1}(\mathbf{x}_0) = \cdots = f_{x_n}(\mathbf{x}_0) = 0$ , which is exactly $\nabla f(\mathbf{x}_0) = \mathbf{0}$ .

The argument fails on the boundary because $\mathbf{x}_0$ may not have a full ball around it inside $D$ — moving by $-t\,\mathbf{e}_i$ might step outside the domain. The 1D restriction $g_i$ may then only be defined on one side of $t = 0$ , which is too little for the 1D extremum criterion to apply.

So the points where the gradient vanishes deserve their own name — they are exactly the candidates for inner extrema.

A point $\mathbf{a} \in D$ is a stationary point (also called a critical point) of a scalar field $f : D \subseteq \mathbb{R}^n \to \mathbb{R}$ if the gradient vanishes there:

\nabla f(\mathbf{a}) = \mathbf{0}

Practically, this is what makes the problem tractable. Instead of hunting for extrema across all of $D$ (an uncountable set), we hunt for the finitely many roots of $\nabla f = \mathbf{0}$ in the interior, and check the boundary of $D$ separately. The boundary side requires different machinery and is not pursued here; the rest of this chapter focuses on the inner, stationary-point side.

Saddle Points

The implication runs in only one direction. Every inner local extremum is a stationary point — but not every stationary point is an extremum. The gradient can vanish at $\mathbf{x}_0$ without the function attaining a local maximum or minimum there at all.

The smallest example sits in $\mathbb{R}^2$ . Consider

f : \mathbb{R}^2 \to \mathbb{R}, \quad f(x, y) = x^3

The gradient $\nabla f(x, y) = (3x^2, 0)^\top$ vanishes at the origin, so $(0, 0)$ is stationary. Yet it is not a local extremum: in any neighborhood of $(0, 0)$ the function takes both positive values (just to the right, where $f(x, 0) = x^3 > 0$ ) and negative ones (just to the left, where $f(x, 0) = x^3 < 0$ ), so $f(0, 0) = 0$ is neither smallest nor largest nearby.

f : \mathbb{R}^2 \to \mathbb{R}, \quad f(x, y) = x^3, \quad \nabla f(x, y) = \begin{pmatrix} 3x^2 \\ 0 \end{pmatrix}

The surface is a ramp that does not depend on $y$ : flat along the $y$ -axis, ascending cubically along the $x$ -axis. The gradient vanishes not only at the origin but along the entire line $x = 0$ — every point of the form $(0, y_0)$ is stationary, since the formula $\nabla f = (3x^2, 0)^\top$ depends only on $x$ . Yet none of these points is a local extremum: moving in either $\pm x$ direction immediately wins or loses height, so the ramp briefly flattens but never peaks or troughs.

Such points get their own name.

Let $f : D \subseteq \mathbb{R}^n \to \mathbb{R}$ be a scalar field. A saddle point of $f$ is a stationary point $\mathbf{a} \in D$ that is neither a local maximum nor a local minimum of $f$ .

The $x^3$ example above is a saddle point in this technical sense, but it does not look like one — it’s a flat ramp that happens to plateau along $x = 0$ . The geometric archetype, the surface that gives the name, is

f : \mathbb{R}^2 \to \mathbb{R}, \quad f(x, y) = x^2 - y^2

f : \mathbb{R}^2 \to \mathbb{R}, \quad f(x, y) = x^2 - y^2, \quad \nabla f(x, y) = \begin{pmatrix} 2x \\ -2y \end{pmatrix}

At the origin the gradient $\nabla f(x, y) = (2x, -2y)^\top$ vanishes, so $(0, 0)$ is stationary. But the curvature pulls in opposite ways along the two axes:

Along the $x$ -axis ( $y = 0$ ), $f(x, 0) = x^2$ has a local minimum at $x = 0$ .
Along the $y$ -axis ( $x = 0$ ), $f(0, y) = -y^2$ has a local maximum at $y = 0$ .

Walking out from $(0, 0)$ along $x$ takes you uphill; walking out along $y$ takes you downhill. Neither side wins, and the surface twists into the horse-saddle shape every rider can picture — uphill front-and-back, downhill side-to-side. The name “saddle point” generalizes from this prototype to any stationary point with the same defining feature: zero gradient, no local extremum.

The takeaway is that the gradient alone cannot tell extrema apart from saddle points: $\nabla f(\mathbf{x}_0) = \mathbf{0}$ identifies a candidate, but distinguishing the cases requires looking at second-order information through the Hessian.

Definiteness of a Symmetric Matrix

To carry the 1D second-derivative test into $n$ dimensions, we first need a way to talk about the “sign” of a matrix. The 1D second derivative $f''(a)$ is a single number — positive means the curve opens upward (local minimum), negative means it opens downward (local maximum), zero is inconclusive. In $n$ dimensions the second derivative is the whole Hessian matrix, and curvature now varies with direction: a saddle point literally curves upward along one axis and downward along another. The single-number “sign” picture has to be replaced with something richer.

The replacement is definiteness. For a symmetric matrix $M$ , the quadratic form $\mathbf{x}^\top M \mathbf{x}$ produces a single scalar for each input $\mathbf{x} \in \mathbb{R}^n$ , and the sign of that scalar — across all non-zero $\mathbf{x}$ — is exactly what classifies $M$ .

A symmetric matrix $M \in \mathbb{R}^{n \times n}$ is

positive definite if $\mathbf{x}^\top M \mathbf{x} > 0$ for all $\mathbf{x} \in \mathbb{R}^n \setminus \{\mathbf{0}\}$ ,
negative definite if $\mathbf{x}^\top M \mathbf{x} < 0$ for all $\mathbf{x} \in \mathbb{R}^n \setminus \{\mathbf{0}\}$ ,
positive semi-definite if $\mathbf{x}^\top M \mathbf{x} \geq 0$ for all $\mathbf{x} \in \mathbb{R}^n$ ,
negative semi-definite if $\mathbf{x}^\top M \mathbf{x} \leq 0$ for all $\mathbf{x} \in \mathbb{R}^n$ .

If $M$ is neither positive nor negative semi-definite, it is indefinite.

The cases form a graded scale. Strict positivity of the form everywhere off the origin is positive definite; relaxing the strict inequality to allow zero values is positive semi-definite. The negative side mirrors this, and indefinite is what’s left: the form is positive in some directions and negative in others as $\mathbf{x}$ varies, so neither semi-definite condition can hold.

The exclusion of $\mathbf{x} = \mathbf{0}$ in the strict cases is mandatory, not cosmetic — $\mathbf{0}^\top M \mathbf{0} = 0$ for every $M$ , so without the exclusion the strict inequality $> 0$ would fail at the origin for every matrix and the condition would be unsatisfiable.

Eigenvalue Criterion

Checking the inequality $\mathbf{x}^\top M \mathbf{x} > 0$ at every non-zero direction in $\mathbb{R}^n$ is, taken literally, an infinite test. The shortcut is to read the definiteness off the eigenvalues of $M$ . Let $\lambda_i$ be an eigenvalue of $M$ — guaranteed to be real, since $M$ is symmetric. Then:

$M$ is positive definite $\iff$ all $\lambda_i > 0$ ,
$M$ is negative definite $\iff$ all $\lambda_i < 0$ ,
$M$ is positive semi-definite $\iff$ all $\lambda_i \geq 0$ ,
$M$ is negative semi-definite $\iff$ all $\lambda_i \leq 0$ ,
$M$ is indefinite $\iff$ some $\lambda_i > 0$ and some $\lambda_j < 0$ .

Intuition: why eigenvalue signs decide the form’s sign

The picture becomes concrete once we view it in the right coordinate system. For a symmetric $M$ , the eigenvectors can be chosen as an orthonormal basis $\mathbf{v}_1, \ldots, \mathbf{v}_n$ — $n$ perpendicular unit axes — and along each axis $M$ does nothing but scale by its eigenvalue. Writing any $\mathbf{x}$ in this eigenbasis as $\mathbf{x} = c_1 \mathbf{v}_1 + \cdots + c_n \mathbf{v}_n$ , the quadratic form collapses into a clean weighted sum of squares:

\mathbf{x}^\top M \mathbf{x} = \lambda_1 c_1^2 + \lambda_2 c_2^2 + \cdots + \lambda_n c_n^2

The squared coefficients $c_i^2$ are non-negative regardless of which $\mathbf{x}$ we picked, so only the eigenvalues control the sign of each term — and the form is just their sum. If every $\lambda_i > 0$ , the form is a sum of non-negative pieces and comes out positive for every non-zero $\mathbf{x}$ (positive definite). If every $\lambda_i < 0$ , the form is negative for every non-zero $\mathbf{x}$ . Mixed signs let some terms push the sum up and others pull it down: depending on which eigen-axes $\mathbf{x}$ leans toward, the form lands positive or negative — that is exactly the indefinite case. A zero eigenvalue erases its axis’s contribution entirely, so the form vanishes along that direction even when $\mathbf{x} \neq \mathbf{0}$ — the loophole that separates semi-definite from definite.

For diagonal matrices the eigenvalues sit on the diagonal, so the criterion is immediate. For general symmetric matrices it reduces an infinite test to a finite computation: find the roots of the characteristic polynomial and inspect their signs.

Classifying Stationary Points

Both pieces are now in place. The Hessian of a $C^2$ scalar field is symmetric, so its definiteness is well-defined — and at a stationary point of $f$ , that definiteness decides which of the four cases (local minimum, local maximum, saddle, inconclusive) the point falls into.

Let $f : D \subseteq \mathbb{R}^n \to \mathbb{R}$ with $D$ open be a twice continuously partially differentiable scalar field, and let $\mathbf{x}_0 \in D$ be a stationary point of $f$ . Then:

$H_f(\mathbf{x}_0)$ negative definite $\implies$ $\mathbf{x}_0$ is a local maximum of $f$ ,
$H_f(\mathbf{x}_0)$ positive definite $\implies$ $\mathbf{x}_0$ is a local minimum of $f$ ,
$H_f(\mathbf{x}_0)$ indefinite $\implies$ $\mathbf{x}_0$ is a saddle point of $f$ ,
$H_f(\mathbf{x}_0)$ semi-definite but not definite $\implies$ the test is inconclusive.

In eigenvalue terms, the semi-definite but not definite case is exactly when at least one eigenvalue of $H_f(\mathbf{x}_0)$ is zero and the rest share a single sign — no mixed signs to push it indefinite, and a zero eigenvalue blocking strict definiteness.

Intuition from Taylor approximation

Each case maps cleanly onto the 1D intuition. The connection runs through the second-order Taylor approximation: near a stationary point, the expansion gives

f(\mathbf{x}_0 + \mathbf{u}) \approx f(\mathbf{x}_0) + \tfrac{1}{2}\, \mathbf{u}^\top H_f(\mathbf{x}_0)\, \mathbf{u}

— the gradient term has dropped out because $\nabla f(\mathbf{x}_0) = \mathbf{0}$ . So the change in $f$ as we step away from $\mathbf{x}_0$ is, to leading order, the quadratic form $\mathbf{u}^\top H_f(\mathbf{x}_0)\, \mathbf{u}$ . If that form is positive in every direction (positive definite Hessian), $f$ goes up no matter which way we step — a local minimum. Negative in every direction is a local maximum. Mixed signs across directions (indefinite Hessian) means $f$ rises along some directions and drops along others — the saddle picture from the previous section. And when the form is non-negative or non-positive but vanishes along at least one direction (semi-definite but not definite), the second-order term cannot decide on its own, just as $f''(a) = 0$ leaves the 1D test inconclusive — a higher-order analysis would be needed to settle the case.

Putting it all together gives a complete recipe for finding the inner extrema of a $C^2$ scalar field on an open domain:

Compute $\nabla f$ and solve $\nabla f(\mathbf{x}) = \mathbf{0}$ to find all stationary points.
At each stationary point $\mathbf{x}_0$ , compute the Hessian $H_f(\mathbf{x}_0)$ .
Determine its definiteness — typically via the eigenvalue signs — and classify $\mathbf{x}_0$ as local minimum, local maximum, saddle point, or inconclusive.

2D saddle-point shortcut. The determinant of any square matrix equals the product of its eigenvalues. Specializing to a $2 \times 2$ Hessian, $\det(H_f(\mathbf{x}_0)) = \lambda_1 \lambda_2$ — so a negative determinant forces the two eigenvalues to opposite signs, making the (symmetric) Hessian indefinite. At a stationary point $\mathbf{x}_0 \in \mathbb{R}^2$ , the single check $\det(H_f(\mathbf{x}_0)) < 0$ is therefore enough to conclude $\mathbf{x}_0$ is a saddle point — no eigenvalue computation needed.