Taylor Expansion
Recall from 1D calculus that the Taylor polynomial of order $k$ approximates a smooth function $f$ near a base point $a$ using its first $k$ derivatives:
$$T_k(x) = \sum_{j=0}^{k} \frac{f^{(j)}(a)}{j!}\,(x-a)^j = f(a) + f'(a)(x-a) + \frac{f''(a)}{2}(x-a)^2 + \cdots + \frac{f^{(k)}(a)}{k!}(x-a)^k.$$
The zeroth-order polynomial $T_0(x) = f(a)$ is just the constant value $f(a)$, a flat line at that height. The first-order polynomial adds the linear correction $f'(a)(x-a)$, producing the tangent line at $a$. The second-order polynomial adds the quadratic correction $\tfrac{1}{2} f''(a)(x-a)^2$, bending the line into a parabola whose curvature matches $f$ at $a$. Each step uses one more piece of local information to track $f$ better near the base point.
The same construction extends to scalar fields $f : \mathbb{R}^n \to \mathbb{R}$, with the gradient $\nabla f$ playing the role of $f'$ and the Hessian $H_f$ playing the role of $f''$.
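The order-by-order improvement can be checked numerically. The following is a minimal sketch, assuming the illustrative choice $f = \exp$ with base point $a = 0$ (so every derivative at $a$ equals 1); the helper name `taylor_1d` is hypothetical and not part of the original text.

```python
import math

def taylor_1d(derivs, a, x):
    """Evaluate sum_k derivs[k] / k! * (x - a)**k, where derivs[k] = f^(k)(a)."""
    return sum(d / math.factorial(k) * (x - a) ** k for k, d in enumerate(derivs))

# Assumed example: f = exp, whose derivatives at a = 0 are all equal to 1.
a, x = 0.0, 0.5
exact = math.exp(x)

# Errors of the order-0, order-1, and order-2 polynomials at the same point x.
errors = [abs(exact - taylor_1d([1.0] * (k + 1), a, x)) for k in range(3)]
for k, err in enumerate(errors):
    print(f"order {k}: error {err:.4f}")
```

Each added order uses one more derivative at the base point, and for this example the error at $x = 0.5$ drops with every step.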
Multivariate Taylor Polynomials
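Let $f : D \to \mathbb{R}$ be a twice (i.e. second-order) partially differentiable scalar field on an open and convex domain $D \subseteq \mathbb{R}^n$, and let $\mathbf{a} \in D$ be a base point. The zeroth-, first-, and second-order Taylor polynomials of $f$ at $\mathbf{a}$ are the scalar fields $T_{0,f,\mathbf{a}},\, T_{1,f,\mathbf{a}},\, T_{2,f,\mathbf{a}} : D \to \mathbb{R}$ defined by:
$$\begin{aligned}
T_{0,f,\mathbf{a}}(\mathbf{x}) &= f(\mathbf{a}), \\
T_{1,f,\mathbf{a}}(\mathbf{x}) &= f(\mathbf{a}) + \langle \nabla f(\mathbf{a}),\, \mathbf{x}-\mathbf{a} \rangle, \\
T_{2,f,\mathbf{a}}(\mathbf{x}) &= f(\mathbf{a}) + \langle \nabla f(\mathbf{a}),\, \mathbf{x}-\mathbf{a} \rangle + \tfrac{1}{2}(\mathbf{x}-\mathbf{a})^\top H_f(\mathbf{a})\,(\mathbf{x}-\mathbf{a}).
\end{aligned}$$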
Each $T_{k,f,\mathbf{a}}$ takes a point $\mathbf{x} \in D$ and returns a single real number, so every Taylor polynomial is itself a scalar field, an approximation of $f$ in the same shape as $f$. The subscripts encode three pieces of information at once: the order $k$ of the approximation, the function $f$ being approximated, and the base point $\mathbf{a}$ at which the approximation is centered. Only $\mathbf{x}$ varies as we evaluate across the domain; $\mathbf{a}$ is fixed.
The two domain conditions earn their place. Open ensures every base point $\mathbf{a} \in D$ has breathing room, which is exactly the condition needed for $\nabla f(\mathbf{a})$ and $H_f(\mathbf{a})$ to be well-defined, since computing partial derivatives at $\mathbf{a}$ requires probing $f$ at points just to either side of $\mathbf{a}$ along every axis. Convex ensures that for any $\mathbf{a}, \mathbf{x} \in D$ the entire line segment between them stays inside $D$, so the approximation is well-defined on the whole segment along which the polynomial is meant to track $f$.
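The three Taylor polynomials in this definition translate almost line for line into code. The following is a minimal sketch, assuming the illustrative field $f(x, y) = e^{x}\cos y$ and base point $(0, 0)$, with the value, gradient, and Hessian at the base point worked out by hand; the helper names `dot`, `matvec`, and `taylor_polys` are hypothetical.

```python
import math

def dot(u, v):
    """Standard inner product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))

def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [dot(row, v) for row in M]

def taylor_polys(f_a, grad_a, hess_a, a, x):
    """Return (T0, T1, T2) at x, given f(a), grad f(a), and H_f(a)."""
    h = [xi - ai for xi, ai in zip(x, a)]        # displacement x - a
    t0 = f_a                                     # constant term
    t1 = t0 + dot(grad_a, h)                     # plus linear correction
    t2 = t1 + 0.5 * dot(h, matvec(hess_a, h))    # plus quadratic correction
    return t0, t1, t2

# Assumed example: f(x, y) = exp(x) * cos(y) at a = (0, 0).
f = lambda p: math.exp(p[0]) * math.cos(p[1])
a = [0.0, 0.0]
f_a = 1.0                            # f(0, 0)
grad_a = [1.0, 0.0]                  # (e^x cos y, -e^x sin y) at (0, 0)
hess_a = [[1.0, 0.0], [0.0, -1.0]]   # second partials at (0, 0)

x = [0.1, 0.2]
t0, t1, t2 = taylor_polys(f_a, grad_a, hess_a, a, x)
errors = [abs(f(x) - t) for t in (t0, t1, t2)]
print(t0, t1, t2, errors)
```

Only `x` changes between evaluations; the base-point data `f_a`, `grad_a`, and `hess_a` are computed once and reused.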
Before turning to the per-order analysis, it helps to see the construction at work in the simplest possible setting — 1D, where everything is directly drawable.
The example above is a 1D demo, not itself a multivariate Taylor polynomial. The function is a single-variable function $f : \mathbb{R} \to \mathbb{R}$, used here purely for visual intuition, since the multivariate case is hard to draw on a flat page. The blue curve is $f$ and the red dot marks the base point at $x = a$. The flat gray line is $T_0$: it sits at height $f(a)$ and ignores $x$ entirely. The orange line is $T_1$: it passes through $(a, f(a))$ with slope $f'(a)$, the tangent line to $f$ at $a$. The green parabola is $T_2$: it adds the quadratic correction $\tfrac{1}{2} f''(a)(x-a)^2$, bending the tangent line so its curvature matches $f$ at $a$. Notice how each higher order hugs the blue curve over a wider window around $a$ before drifting off.
The multivariate Taylor polynomial follows exactly the same principle; only the dimensions grow. The 1D slope $f'(a)$ is replaced by the gradient $\nabla f(\mathbf{a})$, and the 1D curvature $f''(a)$ by the Hessian $H_f(\mathbf{a})$. The three “shapes” carry over directly: the flat line becomes a flat hyperplane at height $f(\mathbf{a})$, the tangent line becomes a tangent hyperplane (a tangent plane in 2D), and the fitted parabola becomes a quadric surface. The picture in 1D is the picture in $n$D, just lifted into more dimensions.
Zeroth Order: A Flat Plateau
The zeroth-order Taylor polynomial
$$T_{0,f,\mathbf{a}}(\mathbf{x}) = f(\mathbf{a})$$
is a constant scalar field: it does not depend on $\mathbf{x}$ at all. Its graph is a horizontal hyperplane sitting at height $f(\mathbf{a})$, like a flat plateau spread across the entire domain. It matches $f$ exactly at $\mathbf{a}$ and ignores how $f$ varies anywhere else. It is the crudest possible approximation, but the natural starting point.
First Order: The Tangent Hyperplane
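The first-order Taylor polynomial
$$T_{1,f,\mathbf{a}}(\mathbf{x}) = f(\mathbf{a}) + \langle \nabla f(\mathbf{a}),\, \mathbf{x}-\mathbf{a} \rangle$$
adds a linear correction on top of $T_{0,f,\mathbf{a}}$. The inner product $\langle \nabla f(\mathbf{a}),\, \mathbf{x}-\mathbf{a} \rangle$ predicts how much $f$ would change under a displacement $\mathbf{x}-\mathbf{a}$ if its rate of change stayed perfectly constant. This is exactly the "directional derivative times step length" picture, scaled to whatever displacement we plug in.
Geometrically, the graph of $T_{1,f,\mathbf{a}}$ is the tangent hyperplane to the graph of $f$ at $(\mathbf{a}, f(\mathbf{a}))$: for $n = 2$ the familiar tangent plane in $\mathbb{R}^3$, for $n = 1$ the tangent line. It is the multivariate generalization of the affine approximation already used in the Jacobian discussion: $f(\mathbf{x}) \approx f(\mathbf{a}) + J_f(\mathbf{a})(\mathbf{x}-\mathbf{a})$, with $J_f(\mathbf{a}) = \nabla f(\mathbf{a})^\top$ in the scalar-field case.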
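One way to see what "tangent" means numerically is that the first-order error shrinks faster than the step itself. The following is a minimal sketch, assuming the illustrative field $f(x, y) = e^{x}\cos y$ at base point $(0, 0)$, where $f = 1$ and the gradient is $(1, 0)$ (both worked out by hand for this example); the function names are hypothetical.

```python
import math

def f(x, y):
    # Assumed example field, not taken from the text.
    return math.exp(x) * math.cos(y)

def t1(x, y):
    # f(a) + <grad f(a), x - a> with a = (0, 0) and grad f(a) = (1, 0).
    return 1.0 + 1.0 * x + 0.0 * y

direction = (0.6, 0.8)   # a unit vector; any fixed direction would do
ratios = []
for step in (1e-1, 1e-2, 1e-3):
    x, y = step * direction[0], step * direction[1]
    ratios.append(abs(f(x, y) - t1(x, y)) / step)
    print(f"step {step:g}: error / step = {ratios[-1]:.2e}")
```

The ratio of error to step length keeps falling as the step shrinks, which is the defining property of a tangent (first-order) approximation.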
Second Order: A Quadratic Bowl
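The second-order Taylor polynomial
$$T_{2,f,\mathbf{a}}(\mathbf{x}) = f(\mathbf{a}) + \langle \nabla f(\mathbf{a}),\, \mathbf{x}-\mathbf{a} \rangle + \tfrac{1}{2}(\mathbf{x}-\mathbf{a})^\top H_f(\mathbf{a})\,(\mathbf{x}-\mathbf{a})$$
adds a quadratic correction on top of the tangent hyperplane. The quadratic form $(\mathbf{x}-\mathbf{a})^\top H_f(\mathbf{a})\,(\mathbf{x}-\mathbf{a})$ uses the Hessian to encode local curvature (how the slope itself changes as we step away from $\mathbf{a}$), and the factor $\tfrac{1}{2}$ matches the Taylor coefficient $\tfrac{1}{2!}$ from the 1D case.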
$T_{2,f,\mathbf{a}}$ is a polynomial of degree at most 2 in the coordinates of $\mathbf{x}$, so its graph is a quadric surface: a paraboloid, saddle, or trough depending on the eigenvalues of $H_f(\mathbf{a})$. It is the best polynomial of degree at most 2 that hugs $f$ near $\mathbf{a}$, faithfully reproducing both the slope and the bending of $f$ at the base point.
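The dependence of the shape on the Hessian's eigenvalues can be made concrete for $n = 2$, where a symmetric matrix has closed-form eigenvalues. The following is a minimal sketch; the function names, the tolerance `tol`, and the returned labels are illustrative assumptions, and the classification looks only at the quadratic part of the polynomial.

```python
import math

def eig_sym_2x2(h11, h12, h22):
    """Eigenvalues of the symmetric matrix [[h11, h12], [h12, h22]], smallest first."""
    mean = 0.5 * (h11 + h22)
    radius = math.hypot(0.5 * (h11 - h22), h12)
    return mean - radius, mean + radius

def classify(h11, h12, h22, tol=1e-12):
    """Label the quadric shape by the signs of the two eigenvalues."""
    lo, hi = eig_sym_2x2(h11, h12, h22)
    if lo > tol:
        return "paraboloid opening upward"      # both eigenvalues positive
    if hi < -tol:
        return "paraboloid opening downward"    # both eigenvalues negative
    if lo < -tol and hi > tol:
        return "saddle"                         # mixed signs
    return "degenerate (trough or flat)"        # at least one zero eigenvalue

print(classify(2.0, 0.0, 3.0))
print(classify(1.0, 0.0, -1.0))
print(classify(1.0, 1.0, 1.0))
```

In higher dimensions the same sign test applies, but the eigenvalues would normally come from a numerical linear algebra routine rather than a closed form.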
Matching at the Base Point
The formula for $T_{2,f,\mathbf{a}}$ did not fall out of the sky. It was reverse-engineered to satisfy a very specific design goal: at the base point $\mathbf{a}$, the polynomial should be indistinguishable from $f$ up to second order, with the same height, the same slope in every direction, and the same curvature in every direction. This section verifies that the formula in the definition actually meets that specification.
This is also what makes $T_{2,f,\mathbf{a}}$ trustworthy as a local approximation. If the value, gradient, and Hessian of $T_{2,f,\mathbf{a}}$ all agree with those of $f$ at $\mathbf{a}$, then near $\mathbf{a}$ (where the displacement $\mathbf{x}-\mathbf{a}$ is small) the two functions agree on everything a second-order expansion can capture; for sufficiently smooth $f$, what remains is third order and beyond, which shrinks rapidly as $\mathbf{x} \to \mathbf{a}$. At the base point itself, $T_{2,f,\mathbf{a}}$ is a perfect local clone of $f$; the further we move, the more the clone drifts.
Each of the three checks below is a one-line substitution: plugging in $\mathbf{x} = \mathbf{a}$ collapses the displacement $\mathbf{x}-\mathbf{a}$ to zero, leaving only the term we care about. The algebra is mechanical; the point is that the formula was designed so each substitution recovers exactly what was prescribed.
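The claim that the leftover error is third order can be probed numerically: if it is, halving the step should shrink the error by roughly a factor of $2^3 = 8$. The following is a minimal sketch, assuming the illustrative smooth field $f(x, y) = e^{x}\cos y$ at base point $(0, 0)$, with gradient $(1, 0)$ and Hessian $\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$ computed by hand; the names are hypothetical.

```python
import math

def f(x, y):
    # Assumed example field, not taken from the text.
    return math.exp(x) * math.cos(y)

def t2(x, y):
    # f(a) + <grad f(a), h> + 0.5 * h^T H_f(a) h with a = (0, 0),
    # grad f(a) = (1, 0), H_f(a) = [[1, 0], [0, -1]].
    return 1.0 + x + 0.5 * (x * x - y * y)

direction = (0.6, 0.8)        # fixed unit direction
steps = [0.1, 0.05, 0.025]    # each step is half the previous one
errors = [abs(f(s * direction[0], s * direction[1])
              - t2(s * direction[0], s * direction[1])) for s in steps]

# Ratio of consecutive errors; values near 8 indicate third-order decay.
ratios = [errors[i] / errors[i + 1] for i in range(len(errors) - 1)]
print(ratios)
```

The ratios settle close to 8 for this example, consistent with an error dominated by the first neglected (cubic) term.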
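Function value matches: the graphs touch.
$$T_{2,f,\mathbf{a}}(\mathbf{a}) = f(\mathbf{a})$$
Geometrically, the graphs of $T_{2,f,\mathbf{a}}$ and $f$ pass through the same point $(\mathbf{a}, f(\mathbf{a}))$; they are pinned together at the base point.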
Verify by substitution
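Substitute $\mathbf{x} = \mathbf{a}$ into $T_{2,f,\mathbf{a}}$. Each correction term carries a factor of $(\mathbf{x}-\mathbf{a})$, which becomes $\mathbf{0}$, so both terms vanish:
$$T_{2,f,\mathbf{a}}(\mathbf{a}) = f(\mathbf{a}) + \langle \nabla f(\mathbf{a}),\, \mathbf{0} \rangle + \tfrac{1}{2}\,\mathbf{0}^\top H_f(\mathbf{a})\,\mathbf{0} = f(\mathbf{a}).$$
This is the multivariate counterpart of the 1D identity $T_2(a) = f(a)$.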
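Gradient matches: the graphs tilt the same way.
$$\nabla T_{2,f,\mathbf{a}}(\mathbf{a}) = \nabla f(\mathbf{a})$$
The graphs don’t merely touch at $\mathbf{a}$, they touch tangentially: the tangent hyperplane to $T_{2,f,\mathbf{a}}$ at $\mathbf{a}$ coincides with the tangent hyperplane to $f$ at $\mathbf{a}$. Concretely, $T_{2,f,\mathbf{a}}$ has the same directional derivative as $f$ in every direction at the base point.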
Verify by differentiating once
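Differentiate $T_{2,f,\mathbf{a}}$ once with respect to $\mathbf{x}$, then substitute $\mathbf{x} = \mathbf{a}$:
$$\begin{aligned}
\nabla T_{2,f,\mathbf{a}}(\mathbf{x}) &= \nabla_{\mathbf{x}}\Big[ f(\mathbf{a}) + \langle \nabla f(\mathbf{a}),\, \mathbf{x}-\mathbf{a} \rangle + \tfrac{1}{2}(\mathbf{x}-\mathbf{a})^\top H_f(\mathbf{a})\,(\mathbf{x}-\mathbf{a}) \Big] \\
&= \mathbf{0} + \nabla f(\mathbf{a}) + H_f(\mathbf{a})\,(\mathbf{x}-\mathbf{a}), \\
\nabla T_{2,f,\mathbf{a}}(\mathbf{a}) &= \nabla f(\mathbf{a}) + H_f(\mathbf{a})\,\mathbf{0} = \nabla f(\mathbf{a}).
\end{aligned}$$
Each differentiation removes one "$(\mathbf{x}-\mathbf{a})$" factor from any term that still has one, exactly like the 1D power rule sending $(x-a)^k$ to $k(x-a)^{k-1}$, dropping the polynomial degree by one.
The point worth pausing on: we differentiate with respect to $\mathbf{x}$, not with respect to $\mathbf{x}-\mathbf{a}$. But because $\mathbf{a}$ is a fixed constant, independent of $\mathbf{x}$, the shift contributes nothing under differentiation. Component-wise, $\partial (x_i - a_i)/\partial x_j = \delta_{ij}$, so the Jacobian of $\mathbf{x}-\mathbf{a}$ with respect to $\mathbf{x}$ is the identity matrix $I$. The 1D version is the same one-liner: $\frac{d}{dx}(x-a)^2 = 2(x-a)\cdot\frac{d}{dx}(x-a) = 2(x-a)$, exactly as if we were differentiating $u^2$ with $u = x-a$ on its own. So derivatives with respect to $\mathbf{x}$ behave identically to derivatives with respect to $\mathbf{x}-\mathbf{a}$, and the power-rule intuition transfers without modification.
Walking through the three terms of $T_{2,f,\mathbf{a}}$:
- Constant term $f(\mathbf{a})$: zero factors of $(\mathbf{x}-\mathbf{a})$ to begin with, so differentiating gives $\mathbf{0}$ outright.
- Linear term $\langle \nabla f(\mathbf{a}),\, \mathbf{x}-\mathbf{a} \rangle$: one factor, which differentiation consumes, leaving the constant $\nabla f(\mathbf{a})$ with no $(\mathbf{x}-\mathbf{a})$ factor remaining. This mirrors $\frac{d}{dx}\big[c\,(x-a)\big] = c$ in 1D.
- Quadratic term $\tfrac{1}{2}(\mathbf{x}-\mathbf{a})^\top H_f(\mathbf{a})\,(\mathbf{x}-\mathbf{a})$: two factors, of which differentiation consumes one, leaving $H_f(\mathbf{a})\,(\mathbf{x}-\mathbf{a})$, which still carries one factor.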
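The third bullet hides the most computation, since the matrix syntax of the quadratic term above keeps its factors less visible than $(x-a)^2$ does in 1D. Writing $\mathbf{h} = \mathbf{x}-\mathbf{a}$ and $H = H_f(\mathbf{a})$ for compactness, the 1D pattern is the familiar $\frac{d}{dh}\big[\tfrac{1}{2} c\,h^2\big] = c\,h$: the $\tfrac{1}{2}$ exists exactly to cancel the $2$ that the power rule pulls out of $h^2$. The matrix version is the same pattern applied to a quadratic form instead of a scalar quadratic. To see it concretely in $n = 2$, take a symmetric Hessian $H = \begin{pmatrix} H_{11} & H_{12} \\ H_{21} & H_{22} \end{pmatrix}$ with $H_{12} = H_{21}$ and carry out the matrix multiplication step by step: first the inner $H\mathbf{h}$, then $\mathbf{h}^\top$ contracted against it, then expand and combine:
$$\begin{aligned}
q(\mathbf{h}) := \tfrac{1}{2}\,\mathbf{h}^\top H\,\mathbf{h}
&= \tfrac{1}{2} \begin{pmatrix} h_1 & h_2 \end{pmatrix} \begin{pmatrix} H_{11}h_1 + H_{12}h_2 \\ H_{21}h_1 + H_{22}h_2 \end{pmatrix} \\
&= \tfrac{1}{2}\big[ h_1(H_{11}h_1 + H_{12}h_2) + h_2(H_{21}h_1 + H_{22}h_2) \big] \\
&= \tfrac{1}{2}\big[ H_{11}h_1^2 + H_{12}h_1h_2 + H_{21}h_1h_2 + H_{22}h_2^2 \big] \\
&= \tfrac{1}{2}H_{11}h_1^2 + H_{12}h_1h_2 + \tfrac{1}{2}H_{22}h_2^2.
\end{aligned}$$
The two $h_1h_2$ terms in the second-to-last line are where the cross-term doubles: the off-diagonal entry $H_{12}$ contributes one copy and $H_{21}$ contributes another, and they combine because $H_{12} = H_{21}$ by symmetry. Now take the partials of $q$; the leading $\tfrac{1}{2}$ absorbs each $2$ that comes out of differentiation:
$$\frac{\partial q}{\partial h_1} = H_{11}h_1 + H_{12}h_2, \qquad \frac{\partial q}{\partial h_2} = H_{12}h_1 + H_{22}h_2 = H_{21}h_1 + H_{22}h_2.$$
Stacking these as a column reproduces $H\mathbf{h}$ exactly. The same structure holds in any dimension: for symmetric $H$,
$$\nabla_{\mathbf{h}}\big[\tfrac{1}{2}\,\mathbf{h}^\top H\,\mathbf{h}\big] = H\mathbf{h},$$
which is the multivariate analog of $\frac{d}{dh}\big[\tfrac{1}{2} c\,h^2\big] = c\,h$. Translating back to the original variables:
$$\nabla_{\mathbf{x}}\big[\tfrac{1}{2}(\mathbf{x}-\mathbf{a})^\top H_f(\mathbf{a})\,(\mathbf{x}-\mathbf{a})\big] = H_f(\mathbf{a})\,(\mathbf{x}-\mathbf{a}).$$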
This is exactly the term that appears on the second line of the gradient computation above.
So after one round of differentiation, the linear term has run out of $(\mathbf{x}-\mathbf{a})$ factors and become a constant, while the quadratic term still carries one. Substituting $\mathbf{x} = \mathbf{a}$ then sends that remaining factor to $\mathbf{0}$. This is the counterpart of the 1D identity $T_2'(a) = f'(a)$.
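The identity that the gradient of $\tfrac{1}{2}\mathbf{h}^\top H\,\mathbf{h}$ is $H\mathbf{h}$ for symmetric $H$ can be sanity-checked with central finite differences. The following is a minimal sketch; the matrix `H`, the point `h`, the step `eps`, and the helper names are arbitrary illustrative choices.

```python
def quad_form(H, h):
    """q(h) = 0.5 * h^T H h for a square matrix H given as nested lists."""
    n = len(h)
    return 0.5 * sum(h[i] * H[i][j] * h[j] for i in range(n) for j in range(n))

def numeric_grad(func, h, eps=1e-6):
    """Central-difference approximation of the gradient of func at h."""
    grad = []
    for i in range(len(h)):
        up = list(h)
        up[i] += eps
        dn = list(h)
        dn[i] -= eps
        grad.append((func(up) - func(dn)) / (2 * eps))
    return grad

H = [[2.0, -1.0], [-1.0, 3.0]]    # arbitrary symmetric example
h = [0.5, -0.25]

numeric = numeric_grad(lambda v: quad_form(H, v), h)
exact = [sum(H[i][j] * h[j] for j in range(2)) for i in range(2)]   # H h
print(numeric, exact)
```

For a non-symmetric matrix the same check would instead agree with $\tfrac{1}{2}(H + H^\top)\mathbf{h}$, which is why the symmetry assumption matters in the derivation.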
Hessian matches: the graphs bend the same way.
$$H_{T_{2,f,\mathbf{a}}}(\mathbf{a}) = H_f(\mathbf{a})$$
Beyond touching and tilting, $T_{2,f,\mathbf{a}}$ also curves like $f$ at $\mathbf{a}$: along every direction through the base point, the rate at which the slope itself changes is the same for both.
Verify by differentiating twice
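Differentiate once more: the gradient line $\nabla T_{2,f,\mathbf{a}}(\mathbf{x}) = \nabla f(\mathbf{a}) + H_f(\mathbf{a})\,(\mathbf{x}-\mathbf{a})$ has its own Jacobian taken term by term:
$$H_{T_{2,f,\mathbf{a}}}(\mathbf{x}) = J_{\mathbf{x}}\big[\nabla f(\mathbf{a})\big] + J_{\mathbf{x}}\big[H_f(\mathbf{a})\,(\mathbf{x}-\mathbf{a})\big] = 0 + H_f(\mathbf{a}) = H_f(\mathbf{a}).$$
This is where the contrast with the previous two checks shows up. There is no $(\mathbf{x}-\mathbf{a})$ left to substitute: two differentiations have already stripped both factors from the quadratic term (one per round), and the lower-order terms differentiated to zero along the way. The result is the constant matrix $H_f(\mathbf{a})$, with no dependence on $\mathbf{x}$ at all, so $H_{T_{2,f,\mathbf{a}}}(\mathbf{x}) = H_f(\mathbf{a})$ holds at every point $\mathbf{x}$, not just at $\mathbf{x} = \mathbf{a}$. This is the counterpart of the 1D identity $T_2''(a) = f''(a)$.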
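That the Hessian of the second-order polynomial is the same constant matrix everywhere can be checked by finite differences at two different points. The following is a minimal sketch; the base-point data (`a`, `f_a`, `g`, `H`) are arbitrary made-up numbers standing in for a value, gradient, and symmetric Hessian, and the helper names are hypothetical.

```python
def t2(x, f_a, g, H, a):
    """Second-order Taylor polynomial built from f(a), gradient g, and Hessian H."""
    h = [xi - ai for xi, ai in zip(x, a)]
    n = len(h)
    lin = sum(g[i] * h[i] for i in range(n))
    quad = 0.5 * sum(h[i] * H[i][j] * h[j] for i in range(n) for j in range(n))
    return f_a + lin + quad

def numeric_hessian(func, x, eps=1e-3):
    """Central-difference approximation of all second partials of func at x."""
    n = len(x)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(si, sj):
                p = list(x)
                p[i] += si * eps
                p[j] += sj * eps
                return func(p)
            out[i][j] = (shifted(1, 1) - shifted(1, -1)
                         - shifted(-1, 1) + shifted(-1, -1)) / (4 * eps * eps)
    return out

# Made-up base-point data for illustration.
a, f_a = [1.0, -2.0], 3.0
g = [0.5, -1.5]
H = [[2.0, -1.0], [-1.0, 3.0]]

poly = lambda x: t2(x, f_a, g, H, a)
hess_at_a = numeric_hessian(poly, a)             # at the base point
hess_far = numeric_hessian(poly, [4.0, 0.5])     # at a point far from a
print(hess_at_a, hess_far)
```

Both estimates reproduce `H` up to rounding, in line with the observation that no displacement factor survives two differentiations.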
These three conditions are not just consequences of the formula; they are the formula, read backward. Demand a degree-2 polynomial whose value, gradient, and Hessian at $\mathbf{a}$ match those of $f$, solve for the coefficients, and the expression in the definition is the only one that works. That uniqueness is what singles out $T_{2,f,\mathbf{a}}$ among all degree-2 polynomials. Higher-order Taylor polynomials extend the same idea (pin the third-order partials too, then the fourth, and so on), with each new order locking down one more layer of local behavior at $\mathbf{a}$.
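The "read backward" argument can be sketched in a few lines. Write a general polynomial of degree at most 2 in the displacement from the base point, with unknown coefficients $c \in \mathbb{R}$, $\mathbf{b} \in \mathbb{R}^n$, and a symmetric matrix $A \in \mathbb{R}^{n \times n}$. These names are introduced here only for illustration; taking $A$ symmetric loses no generality, since only the symmetric part of a matrix contributes to a quadratic form.

$$q(\mathbf{x}) = c + \langle \mathbf{b},\, \mathbf{x}-\mathbf{a} \rangle + \tfrac{1}{2}(\mathbf{x}-\mathbf{a})^\top A\,(\mathbf{x}-\mathbf{a})$$

The same substitution and differentiation steps used in the three checks give

$$q(\mathbf{a}) = c, \qquad \nabla q(\mathbf{a}) = \mathbf{b}, \qquad H_q(\mathbf{a}) = A.$$

Imposing the three matching conditions therefore forces

$$c = f(\mathbf{a}), \qquad \mathbf{b} = \nabla f(\mathbf{a}), \qquad A = H_f(\mathbf{a}),$$

assuming $H_f(\mathbf{a})$ is symmetric. No freedom is left in the coefficients, which is the uniqueness claim in concrete form.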