Partial Differentiation
Consider a scalar field of the form:
$$f: D \subseteq \mathbb{R}^n \to \mathbb{R}, \qquad x = (x_1, \dots, x_n)^\top \mapsto f(x).$$
This is a function that takes a point in $n$-dimensional space and returns a single real number, like a temperature field in a room or an elevation map over terrain.
Directional Derivative
Recall from 1D calculus that the derivative of $f: \mathbb{R} \to \mathbb{R}$ at a point $x$ is:
$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}.$$
This is the slope of the function at $x$: how much does $f$ change when we take a tiny step of size $h$ away from $x$? Geometrically, it is the same intuition behind the slope $m$ of a straight line: the ratio of the vertical rise to the horizontal run, i.e., $m = \Delta y / \Delta x$.
In multiple dimensions, the same question becomes richer: we can step away from a point in infinitely many directions. This is where the directional derivative comes in: it tells us the rate of change of $f$ when we stand at a point $a$ and look in a specific direction $v$.
Here, $a \in D$ is the base point, the location in the domain where we want to measure how fast $f$ is changing. Think of $a$ as your exact current GPS coordinate while standing perfectly still; $v$ is the direction you are facing. Crucially, the directional derivative is a local quantity: it captures the behavior of $f$ specifically at $a$, not globally.
For each unit vector $v \in \mathbb{R}^n$ (with $\|v\| = 1$), we take a tiny step of size $h$ from $a$ in the direction $v$, measure the change in $f$, and divide by the step size. This is the same slope formula as before, now generalized to any direction in $n$-dimensional space:
$$\frac{f(a + h v) - f(a)}{h}.$$
As $h \to 0$, the step size becomes infinitesimally small, and we recover the instantaneous rate of change in the chosen direction.
The directional derivative of a scalar field $f: D \subseteq \mathbb{R}^n \to \mathbb{R}$ at a point $a \in D$ in the direction of a unit vector $v$ (with $\|v\| = 1$) is:
$$\frac{\partial f}{\partial v}(a) = \partial_v f(a) = f_v(a) = \lim_{h \to 0} \frac{f(a + h v) - f(a)}{h}.$$
All three notations, $\frac{\partial f}{\partial v}(a)$, $\partial_v f(a)$, and $f_v(a)$, refer to the same quantity: the rate of change of $f$ at the base point $a$ along the direction $v$.
Requiring $v$ to be a unit vector ensures that the step $h v$ has length exactly $|h|$, making the derivative a pure measure of directional rate of change, independent of the magnitude of $v$.
Since $v$ can point in any direction on the unit sphere in $\mathbb{R}^n$, there are infinitely many directional derivatives at any given point $a$. In principle, this limit may not exist for every combination of $f$, $a$, and $v$. However, throughout this course we will work with sufficiently smooth functions and always assume the limit exists (the “happy scenario”).
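The limit above can be approximated numerically. The following is a minimal sketch, not part of the original notes: the helper name `directional_derivative`, the example field `bowl` (that is, $f(x, y) = x^2 + y^2$), and the step size are all illustrative assumptions, and a central difference stands in for the true limit.

```python
import math

def directional_derivative(f, a, v, h=1e-6):
    """Central-difference estimate of the directional derivative of f at a along v."""
    norm = math.sqrt(sum(c * c for c in v))
    u = [c / norm for c in v]  # normalize so the step has length exactly h
    forward = [ai + h * ui for ai, ui in zip(a, u)]
    backward = [ai - h * ui for ai, ui in zip(a, u)]
    return (f(forward) - f(backward)) / (2 * h)

def bowl(p):
    # Hypothetical example field f(x, y) = x^2 + y^2
    return p[0] ** 2 + p[1] ** 2

a = [1.0, 2.0]
slope_x = directional_derivative(bowl, a, [1.0, 0.0])     # exact value: 2
slope_diag = directional_derivative(bowl, a, [1.0, 1.0])  # exact value: 6 / sqrt(2)
```

The same base point gives different slopes depending on the direction, which is exactly the point of the definition.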
Partial Derivatives
Having infinitely many directions to check is impractical. The saving insight is that we are working in a linear vector space: any direction $v$ can be expressed as a linear combination of the vectors $e_1, \dots, e_n$ of the standard basis (also called the canonical basis) of $\mathbb{R}^n$. For smooth functions, this means the behavior of $f$ in every direction is fully determined by its behavior along the $n$ coordinate directions, one per argument of the function. We do not need to sweep through all directions; the coordinate directions are enough.
Substituting $v = e_i$ into the directional derivative formula means we step from $a$ by a tiny amount $h$ purely along the $i$-th axis, holding all other coordinates fixed. The resulting limit is the rate of change of $f$ with respect to its $i$-th argument $x_i$.
The partial derivative of $f$ at a point $a \in D$ with respect to the variable $x_i$ is:
$$\frac{\partial f}{\partial x_i}(a) = \partial_i f(a) = f_{x_i}(a) = \lim_{h \to 0} \frac{f(a + h e_i) - f(a)}{h}.$$
As always, we assume this limit exists. Each partial derivative answers a single focused question: how fast does $f$ change when only the $i$-th coordinate of $a$ is nudged, while all others stay fixed? For smooth functions, the $n$ partial derivatives together provide a complete first-order description of how $f$ varies locally.
When all of them exist at a given point, the property gets a name.
A scalar field $f: D \subseteq \mathbb{R}^n \to \mathbb{R}$ is partially differentiable at a point $a \in D$ if all $n$ of its partial derivatives $\partial_1 f(a), \dots, \partial_n f(a)$ exist. It is partially differentiable on $D$ if this holds at every point of $D$.
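As a rough illustration of "nudge one coordinate, hold the rest fixed", the sketch below estimates partial derivatives by central differences. The helper `partial_derivative` and the example field $f(x, y, z) = xy + z^2$ are assumptions made for this example only.

```python
def partial_derivative(f, a, i, h=1e-6):
    """Central-difference estimate of the i-th partial derivative of f at a."""
    forward, backward = list(a), list(a)
    forward[i] += h   # step along the i-th axis only
    backward[i] -= h  # all other coordinates stay fixed
    return (f(forward) - f(backward)) / (2 * h)

def field(p):
    # Hypothetical example field f(x, y, z) = x*y + z^2
    x, y, z = p
    return x * y + z ** 2

a = [1.0, 2.0, 3.0]
# Exact partials are (y, x, 2z) = (2, 1, 6) at this point.
partials = [partial_derivative(field, a, i) for i in range(3)]
```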
The Gradient
When $f$ is partially differentiable at $a$, its $n$ partial derivatives can be collected into a single vector, the gradient.
The gradient of $f$ at a point $a \in D$ is the column vector of all partial derivatives:
$$\nabla f(a) = \begin{pmatrix} \partial_1 f(a) \\ \vdots \\ \partial_n f(a) \end{pmatrix} \in \mathbb{R}^n.$$
Gradient as a reusable formula: In practice, you do not evaluate the gradient at a specific point directly. Instead, you first derive the partial derivative formulas symbolically, keeping the variables $x_1, \dots, x_n$ free, and assemble them into $\nabla f(x)$. This gives you a general expression valid for any point in the domain. Evaluating it at a specific $a$ is then just substitution.
The gradient lives in the same space as the input, not the output. So for a scalar field $f: \mathbb{R}^2 \to \mathbb{R}$, the gradient at any point is a 2D vector; for $f: \mathbb{R}^3 \to \mathbb{R}$, it’s a 3D vector.
This means we can evaluate $\nabla f$ at every point in the domain, producing a gradient field $\nabla f: D \to \mathbb{R}^n$, a vector field on $D$. Think of it as an arrow attached to each input point on the flat “floor” map, pointing in the direction of steepest ascent at that location.
The symbol $\nabla$ is called the nabla (or del) operator. Differential operators like $\nabla$ will be explored in more depth later in the course.
The gradient is the payoff for working in a linear vector space. When $\partial_1 f, \dots, \partial_n f$ are continuous, any directional derivative reduces to an inner product with the gradient:
$$\partial_v f(a) = \nabla f(a)^\top v = \langle \nabla f(a), v \rangle.$$
This is the key trick: instead of computing a separate tortuous limit for every possible diagonal direction $v$, we compute the gradient once (just $n$ partial derivatives) and recover any directional derivative for free via an inner product (dot product). The gradient packages all local directional information into a single vector.
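The inner-product shortcut can be checked numerically. The sketch below is illustrative only: the helper `gradient`, the example field $f(x, y) = x^2 y$, the base point, and the direction are all assumptions. It compares the gradient route with a direct finite-difference estimate of the limit.

```python
def gradient(f, a, h=1e-6):
    """Central-difference estimate of the gradient of f at a."""
    grad = []
    for i in range(len(a)):
        fwd, bwd = list(a), list(a)
        fwd[i] += h
        bwd[i] -= h
        grad.append((f(fwd) - f(bwd)) / (2 * h))
    return grad

def field(p):
    # Hypothetical example field f(x, y) = x^2 * y
    return p[0] ** 2 * p[1]

a = [1.0, 2.0]
v = [0.6, 0.8]  # unit vector, since 0.36 + 0.64 = 1

# Route 1: gradient once, then an inner product. Exact gradient is (2xy, x^2) = (4, 1).
grad = gradient(field, a)
via_gradient = sum(g * c for g, c in zip(grad, v))

# Route 2: estimate the directional limit directly along v.
h = 1e-6
fwd = [ai + h * vi for ai, vi in zip(a, v)]
bwd = [ai - h * vi for ai, vi in zip(a, v)]
via_limit = (field(fwd) - field(bwd)) / (2 * h)
```

Both routes should agree up to numerical error (the exact value here is $4 \cdot 0.6 + 1 \cdot 0.8 = 3.2$), but the gradient only has to be computed once for all directions.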
Let . Using standard differentiation rules, the three partial derivatives are:
Assembling them into the gradient:
Steepest Ascent and Descent
The gradient does more than package the partial derivatives: it points in a very specific geometric direction. Suppose $f: D \subseteq \mathbb{R}^n \to \mathbb{R}$ (with $D$ open) is a scalar field with continuous partial derivatives $\partial_1 f, \dots, \partial_n f$, and let $a \in D$ be a point where $\nabla f(a) \neq 0$. Then $f$ has its steepest ascent at $a$ in the direction $\nabla f(a)$, and its steepest descent in the opposite direction $-\nabla f(a)$.
Intuitively: standing on a hillside, the gradient points straight uphill along the steepest route, and its negative points straight downhill (the way a ball would roll). Every other direction trades some uphill progress for sideways motion.
This follows directly from the inner-product formula $\partial_v f(a) = \nabla f(a)^\top v$: among all unit vectors $v$, this inner product is largest when $v$ is perfectly aligned with $\nabla f(a)$ and smallest when it points the opposite way. The one strict condition is $\nabla f(a) \neq 0$: at a point where the gradient vanishes, there is no preferred direction (the ground is flat).
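One way to convince yourself of this is a brute-force search over directions. The sketch below assumes a hypothetical field $f(x, y) = x^2 + 3y$ with gradient $(2, 3)$ at $a = (1, 1)$, samples unit vectors on the circle, and records which one gives the largest slope $\nabla f(a)^\top v$.

```python
import math

# Gradient of the hypothetical field f(x, y) = x^2 + 3y at a = (1, 1): (2x, 3) = (2, 3)
grad = [2.0, 3.0]

steps = 3600  # number of sampled directions on the unit circle
best_angle, best_slope = 0.0, -math.inf
for k in range(steps):
    angle = -math.pi + 2.0 * math.pi * k / steps
    v = (math.cos(angle), math.sin(angle))   # unit vector
    slope = grad[0] * v[0] + grad[1] * v[1]  # directional derivative via inner product
    if slope > best_slope:
        best_angle, best_slope = angle, slope

gradient_angle = math.atan2(grad[1], grad[0])  # direction of the gradient itself
gradient_norm = math.hypot(grad[0], grad[1])   # slope obtained in that direction
```

Up to the sampling resolution, `best_angle` matches `gradient_angle`, and the maximal slope matches the length of the gradient.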
Isolines
Alongside the gradient, another geometric object captures the shape of a scalar field: the set of points where $f$ takes a constant value.
Let $f: D \subseteq \mathbb{R}^n \to \mathbb{R}$ be a scalar field. For a value $c \in \mathbb{R}$, the isoline (also called contour line or level set) of $f$ at level $c$ is:
$$N_c = \{x \in D \mid f(x) = c\}.$$
Think of a topographic map: each contour traces the points at the same elevation. On a weather map, each isotherm connects places with the same temperature. More generally, isolines slice the domain into level sets on which $f$ is constant. (Note: A notation like $N_{f(a)}$ is just a formal way of saying “the specific isoline that passes through our base point $a$”.)
The gradient $\nabla f(a)$ at a point $a$ is always perpendicular to the isoline passing through $a$.
To see why, pick any unit vector $v$ that points along the isoline (a tangent vector). Elevation doesn’t change in that direction, so the directional derivative vanishes:
$$\partial_v f(a) = \nabla f(a)^\top v = 0.$$
Since a zero inner product means two vectors are orthogonal, the steepest ascent direction $\nabla f(a)$ is forced to be perpendicular to the isoline.
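A small numerical check of the orthogonality claim, under assumptions chosen for illustration: the field is $f(x, y) = x^2 + y^2$, whose isolines are circles around the origin, and the base point is $(3, 4)$ on the circle of radius 5. The tangent to a circle at $(x, y)$ points along $(-y, x)$.

```python
def bowl(p):
    # Assumed example field f(x, y) = x^2 + y^2; its isolines are circles
    return p[0] ** 2 + p[1] ** 2

a = [3.0, 4.0]                # lies on the isoline f = 25
grad = [2 * a[0], 2 * a[1]]   # analytic gradient (2x, 2y) = (6, 8)
tangent = [-a[1] / 5.0, a[0] / 5.0]  # unit tangent to the circle at a

# Inner product of gradient and tangent: should vanish.
dot = grad[0] * tangent[0] + grad[1] * tangent[1]

# A small step along the tangent barely changes f (only at second order in the step).
step = 1e-4
moved = [a[0] + step * tangent[0], a[1] + step * tangent[1]]
change = bowl(moved) - bowl(a)
```

The change in $f$ along the tangent is of order `step ** 2`, while a step of the same size along the gradient would change $f$ at first order.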
The “tilted ramp” picture
Imagine standing on a tilted ramp. The isoline is a horizontal line painted across the ramp (where height never changes). If you take a step at an angle (diagonally across the ramp), you are “wasting” part of your step’s length moving sideways along the isoline, meaning you gain less vertical height. To gain the absolute maximum height possible in a single step (the direction of steepest ascent, which is the gradient), you must dedicate 100% of your step to moving forward, wasting zero energy on sideways movement. The only way to move with zero sideways drift is to walk exactly at a 90-degree angle to the horizontal isoline.
In the example above, the 2D floor is the domain $D$ of $f$: the heatmap encodes $f(x, y)$ as color, the dark rings are isolines (level sets $f(x, y) = c$), and the red arrows are the gradient field $\nabla f$. This is where $f$, its isolines, and its gradient actually live. The 3D bowl is the graph of $f$, a visualization aid, not a separate object. The same isolines and gradient arrows are lifted onto it: the rings become horizontal cross-sections of the bowl, and the arrows become tangent vectors pointing in the steepest-ascent direction along the surface.
Arrow length. The arrows are short near the center and long at the boundary, and not by coincidence. For $f(x, y) = x^2 + y^2$:
$$\nabla f(x, y) = \begin{pmatrix} 2x \\ 2y \end{pmatrix}, \qquad \|\nabla f(x, y)\| = 2\sqrt{x^2 + y^2} = 2r.$$
The gradient magnitude is just twice the radial distance $r$. Geometrically: near the origin the bowl is nearly flat, so there’s barely any slope to point along; near the rim it’s steep, so the gradient is large. Exactly at the origin, $\nabla f(0, 0) = 0$: you’re at the minimum, there’s no downhill direction, and the gradient has nothing to say.
Arrow direction. Every arrow points straight outward, perpendicular to the isoline it sits on. That’s not specific to this example — it’s always true: the gradient is orthogonal to the level set. If you stood on the 3D bowl and asked “which way is straight up the slope?”, the answer is exactly where the lifted arrow points.
Second-Order Partial Derivatives
The partial derivatives $\partial_1 f, \dots, \partial_n f$ of a scalar field $f: D \subseteq \mathbb{R}^n \to \mathbb{R}$ are themselves scalar fields on $D$: each one assigns to every point $a \in D$ the rate of change of $f$ along the corresponding coordinate axis. We assume $D$ is an open set here, which simply means every point of $D$ has a bit of breathing room around it that still lies inside $D$; no point sits right on the edge. That matters because computing $\partial_i f(a)$ means peeking at $f$ just to either side of $a$ along the $i$-th axis; if $a$ were on the boundary, some of those nearby probe points would fall outside $D$, where $f$ isn’t defined, and the limit couldn’t be formed at all. Openness rules that case out, so the limit is at least well-posed at every point of $D$.
When all of these derived scalar fields are continuous, we give the situation its own name.
A scalar field $f: D \subseteq \mathbb{R}^n \to \mathbb{R}$ (with $D$ open) is continuously partially differentiable on $D$ if all of its partial derivatives exist and are continuous on $D$, that is, if $\partial_1 f, \dots, \partial_n f: D \to \mathbb{R}$ are continuous functions.
If these first-order partial derivatives are themselves partially differentiable, we can differentiate a second time, taking a partial derivative of a partial derivative.
The second-order partial derivative of $f$ with respect to $x_i$ and $x_j$ is the partial derivative of $\partial_i f$ taken once more with respect to $x_j$:
$$\frac{\partial}{\partial x_j}\left(\frac{\partial f}{\partial x_i}\right) = \frac{\partial^2 f}{\partial x_j\, \partial x_i} = \partial_j \partial_i f = \partial_{x_j} \partial_{x_i} f = f_{x_i x_j}.$$
All four expressions on the right denote exactly the same quantity; they are interchangeable notations for the same second-order partial derivative. Read the operator forms right-to-left like peeling an onion (first differentiate with respect to $x_i$, then differentiate the result with respect to $x_j$); the subscript form $f_{x_i x_j}$ lists the variables left-to-right in the order of differentiation.
The intuition mirrors the 1D case. In 1D, the second derivative $f''$ measures curvature: whether the slope is growing or shrinking as we move along the axis. On a hill, $f''$ tells you whether the climb is getting steeper or starting to level out. The same picture carries over to higher dimensions: an unmixed derivative $\partial_i \partial_i f$ describes how the slope along the $i$-th axis is itself changing as we step further along $x_i$, and a mixed derivative $\partial_j \partial_i f$ (with $i \neq j$) describes how a slope along one axis varies as we step along another.
Smoothness
The construction extends arbitrarily: differentiate a partial derivative once more to get a third-order partial derivative, again for fourth-order, and so on without limit. When all partial derivatives up to order $k$ exist and are continuous, we get a property worth naming.
A scalar field $f: D \subseteq \mathbb{R}^n \to \mathbb{R}$ (with $D$ open) is $k$-times continuously partially differentiable on $D$ if all of its partial derivatives up to order $k$ exist and are continuous on $D$.
These nested smoothness levels get their own family of names, the classes $C^k$, each one packaging “$f$ has $k$ continuous orders of derivatives” into a single label.
For $k \in \mathbb{N}_0 \cup \{\infty\}$, a scalar field $f: D \subseteq \mathbb{R}^n \to \mathbb{R}$ (with $D$ open) is of class $C^k$ as defined below:
$$C^0(D) = \{f: D \to \mathbb{R} \mid f \text{ is continuous}\},$$
$$C^k(D) = \{f: D \to \mathbb{R} \mid f \text{ is } k\text{-times continuously partially differentiable}\} \quad (k \geq 1),$$
$$C^\infty(D) = \bigcap_{k \in \mathbb{N}_0} C^k(D).$$
So $C^0(D)$ is just the continuous functions (no jumps) on $D$; $C^k(D)$ for $k \geq 1$ asks for $k$ continuous orders of partial derivatives; and $C^\infty(D)$ is the set of functions whose partial derivatives of every order exist and are continuous (perfectly smooth, no matter how many times you differentiate them). Two milestones from this hierarchy matter most going forward: $f \in C^1(D)$ means all first-order partial derivatives are continuous, so we can assemble them into the gradient at every point of $D$; and $f \in C^2(D)$ means all partial derivatives up to second order are continuous, which is exactly what is needed to assemble them into the Hessian.
In practice, we rarely care about the exact value of $k$; we just want enough continuous derivatives for whatever theorem or computation we’re doing. That looser notion gets its own name.
A function $f$ is called a smooth function if it is of class $C^k$ with $k$ high enough for the problem at hand, that is, $f \in C^k(D)$ for a sufficiently large $k$.
The required $k$ depends on context: if a result needs second-order partial derivatives to be continuous, “smooth enough” means $f \in C^2(D)$; if third-order derivatives are needed, $f \in C^3(D)$, and so on. In most practical settings one simply assumes $f \in C^\infty(D)$ to avoid having to track the exact order.
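Once $f$ reaches $C^2$ smoothness, a useful property kicks in: the order in which we take partial derivatives stops mattering.
If $f: D \subseteq \mathbb{R}^n \to \mathbb{R}$ is a scalar field with $f \in C^2(D)$, then for all $i, j \in \{1, \dots, n\}$:
$$\partial_i \partial_j f = \partial_j \partial_i f.$$
Differentiating with respect to $x_i$ first and then $x_j$ gives the same answer as differentiating in the opposite order. For any function smooth enough to land in $C^2(D)$, mixed second-order partial derivatives commute (i.e. $\partial_i \partial_j = \partial_j \partial_i$ when applied to such functions).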
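The commutation of mixed partials can be observed numerically by nesting finite differences in both orders. This is only a sketch: the higher-order helper `partial` and the example field $f(x, y) = x^2 y^3$ are illustrative assumptions, and nested differences amplify rounding error, so agreement holds only up to a tolerance.

```python
def partial(f, i, h=1e-4):
    """Return a function that approximates the i-th partial derivative of f."""
    def df(p):
        fwd, bwd = list(p), list(p)
        fwd[i] += h
        bwd[i] -= h
        return (f(fwd) - f(bwd)) / (2 * h)
    return df

def field(p):
    # Hypothetical example field f(x, y) = x^2 * y^3 (infinitely smooth)
    return p[0] ** 2 * p[1] ** 3

a = [1.0, 2.0]
x_then_y = partial(partial(field, 0), 1)(a)  # differentiate by x first, then by y
y_then_x = partial(partial(field, 1), 0)(a)  # differentiate by y first, then by x
# Exact value of both mixed partials: 6 * x * y^2 = 24 at (1, 2)
```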
Hessian Matrix
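With the first-order partial derivatives, we organized the $n$ values into a single vector, the gradient. Second-order partials are richer: for each pair $(i, j)$ there is one derivative $\partial_i \partial_j f$, giving $n^2$ numbers in total. The natural way to organize them is as an $n \times n$ matrix. The pattern continues into higher orders (third-order partial derivatives need a three-index object with $n^3$ entries, a tensor; fourth-order ones live in $n^4$ entries, and so on), but at second order, this matrix has its own name.
Let $f: D \subseteq \mathbb{R}^n \to \mathbb{R}$ be a second-order partially differentiable scalar field. The Hessian matrix of $f$ at a point $a \in D$, denoted $H_f(a)$, is the $n \times n$ matrix of all second-order partial derivatives of $f$ at $a$:
$$H_f(a) = \begin{pmatrix} \partial_1 \partial_1 f(a) & \cdots & \partial_1 \partial_n f(a) \\ \vdots & \ddots & \vdots \\ \partial_n \partial_1 f(a) & \cdots & \partial_n \partial_n f(a) \end{pmatrix}.$$
The entry in row $i$, column $j$ is $\partial_i \partial_j f(a)$, the rate at which the $j$-th partial derivative of $f$ changes as we step along $x_i$. The diagonal entries $\partial_i \partial_i f(a)$ are the pure second derivatives along each axis; the off-diagonal entries (for $i \neq j$) are the mixed partials.
If $f \in C^2(D)$, the mixed partials satisfy $\partial_i \partial_j f = \partial_j \partial_i f$ for all $i, j$, so the Hessian is symmetric: $H_f(a) = H_f(a)^\top$.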
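A minimal numerical sketch of the Hessian, assuming a hypothetical field $f(x, y) = x^2 y + y^3$ and a standard four-point central-difference stencil (neither is taken from the notes). Entry $(i, j)$ is estimated by shifting coordinates $i$ and $j$ by $\pm h$.

```python
def hessian(f, a, h=1e-4):
    """Four-point central-difference estimate of the Hessian of f at a."""
    n = len(a)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(si, sj):
                # Evaluate f with coordinate i moved by si*h and coordinate j by sj*h
                p = list(a)
                p[i] += si * h
                p[j] += sj * h
                return f(p)
            H[i][j] = (shifted(1, 1) - shifted(1, -1)
                       - shifted(-1, 1) + shifted(-1, -1)) / (4 * h * h)
    return H

def field(p):
    # Hypothetical example field f(x, y) = x^2 * y + y^3
    x, y = p
    return x ** 2 * y + y ** 3

H = hessian(field, [1.0, 2.0])
# Exact Hessian is [[2y, 2x], [2x, 6y]] = [[4, 2], [2, 12]] at (1, 2)
```

The two off-diagonal entries agree up to rounding, as expected for a $C^2$ function.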
Stationary Points and Extrema
The gradient and Hessian are not brand-new objects: they are the $n$-dimensional counterparts of the first and second derivatives we already know from 1D calculus, and they carry over the same roles.
In 1D, finding the extrema of a smooth function $f$ follows a two-step recipe: first solve $f'(x) = 0$ to locate candidate points (bottoms of valleys, tops of peaks), then evaluate $f''(x)$ at those candidates. A positive value means the curve opens upward (local minimum), a negative value means it opens downward (local maximum), and zero leaves the test inconclusive.
The exact same logic plays out in $n$ dimensions, with the gradient $\nabla f$ playing the role of $f'$ and the Hessian $H_f$ playing the role of $f''$. The “zero-slope” condition now becomes $\nabla f(a) = 0$: every partial derivative vanishes at once. Points where this holds are called stationary points (defined formally in the optima chapter).
Geometrically, at a stationary point every directional derivative is zero: no matter which way you step, the slope of $f$ is flat at that point. Any actual change in $f$ as you move away only shows up through the curvature (how the surface bends), not through the slope itself. Stationary points are therefore exactly the candidates for local minima and maxima, the same way $f'(x) = 0$ produces the candidates in 1D. Not every stationary point is an extremum, however: some, such as saddle points, fall into neither the minimum nor the maximum category.
To tell the cases apart, we turn to second-order information: the Hessian. Informally, the Hessian describes the local curvature of $f$ in every direction at once, and at a stationary point it plays exactly the same classifying role that the sign of $f''$ plays in 1D: it decides whether the point is a minimum, a maximum, or something else. A precise criterion for reading this off the Hessian comes later in the optima chapter; for now it is enough to remember that gradient and Hessian together give us the multivariate extension of the 1D “set $f'(x) = 0$, then check $f''(x)$” recipe.
The partial derivatives of a scalar field $f$ should not be viewed as the actual derivative of $f$; they are not a direct extension of the real-function derivative $f'$. In multiple dimensions the notion of “a derivative” is genuinely more subtle: each partial derivative only captures how $f$ changes along a single coordinate axis, and on their own they do not account for the fact that the input can move in any direction, not just along the axes. The partials do, however, assemble into a single linear object called the total differential, which is the proper $n$-dimensional analog of $f'$. We will not develop the total differential in this chapter (for our purposes the gradient, as a bundle of partial derivatives, is enough), but it is worth knowing that strictly speaking, partial and total differentiation are different things.
Multivariate Functions
Every definition so far (partial derivatives, gradient, Hessian, continuous partial differentiability, class $C^k$) has been stated for scalar fields $f: D \subseteq \mathbb{R}^n \to \mathbb{R}$. Extending them to multivariate functions $f: D \subseteq \mathbb{R}^n \to \mathbb{R}^m$ is mechanical: a multivariate function is nothing more than an $m$-tuple of scalar fields stacked vertically:
$$f(x) = \begin{pmatrix} f_1(x) \\ \vdots \\ f_m(x) \end{pmatrix}.$$
Each component function $f_i: D \to \mathbb{R}$ is itself a scalar field, exactly the kind of object every earlier definition applied to. Every earlier property therefore lifts componentwise:
- Partially differentiable: $f$ is partially differentiable at $a$ (or on $D$) if and only if every component $f_1, \dots, f_m$ is partially differentiable at $a$ (or on $D$).
- $k$-times continuously partially differentiable: $f$ is $k$-times continuously partially differentiable at $a$ (or on $D$) if and only if every $f_i$ is.
- Class $C^k$: $f$ belongs to $C^k(D)$ if and only if every $f_i$ belongs to $C^k(D)$.
Nothing new to prove: each property is simply applied to each component in turn.
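One term gets its own label in the multivariate setting: when all partial derivatives of $f$ not only exist but are also continuous, we say $f$ is differentiable.
A multivariate function $f: D \subseteq \mathbb{R}^n \to \mathbb{R}^m$ is differentiable at a point $a \in D$ (or on $D$) if all of its partial derivatives $\frac{\partial f_i}{\partial x_j}$, for $i = 1, \dots, m$ and $j = 1, \dots, n$, exist and are continuous at $a$ (or throughout $D$).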
Jacobian Matrix
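Once a multivariate function is differentiable (every component contributing continuous partial derivatives), the natural way to organize all of its first-order information is into a single matrix. For a scalar field we collected $n$ partials into the gradient vector. With $m$ components, each carrying $n$ partials, we now have $m \cdot n$ numbers in total, and they fit cleanly into an $m \times n$ matrix.
Let $f: D \subseteq \mathbb{R}^n \to \mathbb{R}^m$ be differentiable on $D$. The Jacobian matrix of $f$ at a point $a \in D$, denoted $Df(a)$, is the $m \times n$ matrix whose entry in row $i$ and column $j$ is the partial derivative of the $i$-th component with respect to the $j$-th variable:
$$Df(a) = \begin{pmatrix} \frac{\partial f_1}{\partial x_1}(a) & \cdots & \frac{\partial f_1}{\partial x_n}(a) \\ \vdots & & \vdots \\ \frac{\partial f_m}{\partial x_1}(a) & \cdots & \frac{\partial f_m}{\partial x_n}(a) \end{pmatrix} \in \mathbb{R}^{m \times n}.$$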
Reading the matrix row by row makes the structure obvious: the $i$-th row is exactly the gradient of the $i$-th component function $f_i$, written as a row vector. The Jacobian is therefore a vertical stack of component gradients, each row carrying the full first-order story of one output coordinate, and each column carrying the response of all outputs to a nudge in one input variable.
An alternative notation often seen in the literature is $J_f(a)$, used interchangeably with $Df(a)$.
As a useful special case, when $m = 1$ the function reduces to a scalar field and the Jacobian has only a single row, which is exactly the transposed gradient:
$$Df(a) = \nabla f(a)^\top.$$
So the gradient and the Jacobian are not separate ideas: the gradient is just the Jacobian’s only row when there is only one output to track.
The deeper purpose of the Jacobian is to provide the best linear approximation of $f$ near $a$. If $f$ is differentiable at $a$, then for a small step $\Delta x \in \mathbb{R}^n$:
$$f(a + \Delta x) \approx f(a) + Df(a)\, \Delta x,$$
with the approximation getting sharper as $\|\Delta x\| \to 0$. This is the multivariate version of the slope picture used earlier to introduce the directional derivative: in 1D, $f(x + h) \approx f(x) + f'(x)\, h$ is the very same tangent line, just rearranged to predict the function value a small step past $x$. The scalar slope $f'(x)$ is replaced by the matrix $Df(a)$, which acts on the displacement vector $\Delta x$ to produce the predicted change in every output coordinate at once. A more detailed treatment follows in the discussion of coordinate transformations.
A particularly tidy special case: when $f$ is itself a linear function, $f(x) = Ax + b$ for some constant matrix $A \in \mathbb{R}^{m \times n}$ and constant vector $b \in \mathbb{R}^m$, the Jacobian is just $A$ at every point of the domain. Each component is the linear combination $f_i(x) = \sum_{j=1}^n a_{ij} x_j + b_i$, whose partial derivative with respect to $x_j$ is the constant $a_{ij}$, so the Jacobian’s $(i, j)$ entry is $a_{ij}$ everywhere. The linear approximation then stops being an approximation and becomes the exact equality $f(a + \Delta x) = f(a) + A\, \Delta x$: a linear function coincides with its tangent linear map everywhere.
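The linear-approximation role of the Jacobian can be sketched numerically. Everything named here is an illustrative assumption: the helper `jacobian`, the polar-to-Cartesian map used as the example function, and the step sizes. The Jacobian is estimated column by column with central differences and then used to predict the function value a small step away.

```python
import math

def jacobian(f, a, h=1e-6):
    """Central-difference Jacobian; row i is the gradient of component i."""
    m = len(f(a))
    J = [[0.0] * len(a) for _ in range(m)]
    for j in range(len(a)):
        fwd, bwd = list(a), list(a)
        fwd[j] += h
        bwd[j] -= h
        f_fwd, f_bwd = f(fwd), f(bwd)
        for i in range(m):
            J[i][j] = (f_fwd[i] - f_bwd[i]) / (2 * h)
    return J

def polar(p):
    # Hypothetical example map (r, t) -> (r cos t, r sin t)
    r, t = p
    return [r * math.cos(t), r * math.sin(t)]

a = [2.0, math.pi / 6]
J = jacobian(polar, a)  # exact: [[cos t, -r sin t], [sin t, r cos t]]

step = [1e-3, -2e-3]    # small displacement
base = polar(a)
predicted = [base[i] + sum(J[i][j] * step[j] for j in range(2)) for i in range(2)]
actual = polar([a[0] + step[0], a[1] + step[1]])
errors = [abs(p - q) for p, q in zip(predicted, actual)]
```

The prediction error is of second order in the step length, which is what "best linear approximation" means in practice.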
Calculation Rules
The Jacobian inherits a familiar set of computation rules, exactly the multivariate counterparts of the standard differentiation rules from 1D calculus. Sums differentiate term by term, scalars factor out, products obey a product rule, and compositions follow a chain rule. The only twist in higher dimensions is that the chain rule turns into a matrix multiplication of two Jacobians, so the order of multiplication now matters.
Let $f, g: D \subseteq \mathbb{R}^n \to \mathbb{R}^m$ be partially differentiable. Then for every $x \in D$:
- Additivity: $D(f + g)(x) = Df(x) + Dg(x)$
- Homogeneity: $D(\lambda f)(x) = \lambda\, Df(x)$ for all $\lambda \in \mathbb{R}$
- Product rule: $D(f^\top g)(x) = f(x)^\top Dg(x) + g(x)^\top Df(x)$
Additivity and homogeneity together say that the Jacobian behaves as a linear operator on partially differentiable functions, directly mirroring the linearity of the 1D derivative, $(f + g)' = f' + g'$ and $(\lambda f)' = \lambda f'$. The product rule echoes the Leibniz pattern $(fg)' = f'g + fg'$, but since $f$ and $g$ are vector-valued the natural scalar product is the inner product $f^\top g$, and each factor carries a transpose to match the row-vector shapes the Jacobian expects.
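The remaining rule needs two functions whose dimensions are compatible for composition. Let $f: D \subseteq \mathbb{R}^n \to \mathbb{R}^m$ and $g: D' \subseteq \mathbb{R}^m \to \mathbb{R}^l$ with $f(D) \subseteq D'$, so the composition $g \circ f: D \to \mathbb{R}^l$ is well-defined. Then for every $x \in D$:
$$D(g \circ f)(x) = Dg(f(x))\, Df(x).$$
This is the composition rule, also widely known as the chain rule. The right-hand side is a matrix product: $Dg(f(x))$ is $l \times m$, $Df(x)$ is $m \times n$, and the result is $l \times n$, exactly the shape required for $D(g \circ f)(x)$. It is the direct generalization of the 1D chain rule $(g \circ f)'(x) = g'(f(x))\, f'(x)$, with scalar multiplication replaced by matrix multiplication.
Reading off the entry in row $i$ and column $j$ of $D(g \circ f)(x)$ recovers the familiar partial-derivative form, where $y_1, \dots, y_m$ denote the arguments of $g$:
$$\frac{\partial (g \circ f)_i}{\partial x_j}(x) = \sum_{k=1}^{m} \frac{\partial g_i}{\partial y_k}(f(x))\, \frac{\partial f_k}{\partial x_j}(x).$$
Here $i$ is the row index (the output component of the composition) and $j$ is the column index (the input variable $x_j$). The sum over $k$ runs through the $m$ intermediate outputs of $f$, which is exactly the inner product of the $i$-th row of $Dg(f(x))$ with the $j$-th column of $Df(x)$: the row-times-column rule of matrix multiplication, written out one entry at a time.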
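The chain rule can be checked numerically by comparing the Jacobian of a composition with the product of the two individual Jacobians. The sketch below assumes two hypothetical maps, `inner` ($f(x, y) = (xy,\ x + y)$) and `outer` ($g(u, v) = (u^2,\ uv)$), and a finite-difference helper `jacobian`; none of these come from the notes.

```python
def jacobian(f, a, h=1e-6):
    """Central-difference Jacobian of f at a (rows: outputs, columns: inputs)."""
    m = len(f(a))
    J = [[0.0] * len(a) for _ in range(m)]
    for j in range(len(a)):
        fwd, bwd = list(a), list(a)
        fwd[j] += h
        bwd[j] -= h
        f_fwd, f_bwd = f(fwd), f(bwd)
        for i in range(m):
            J[i][j] = (f_fwd[i] - f_bwd[i]) / (2 * h)
    return J

def inner(p):
    # Hypothetical f(x, y) = (x*y, x + y)
    return [p[0] * p[1], p[0] + p[1]]

def outer(q):
    # Hypothetical g(u, v) = (u^2, u*v)
    return [q[0] ** 2, q[0] * q[1]]

def composed(p):
    return outer(inner(p))

x = [1.0, 2.0]
Dg = jacobian(outer, inner(x))  # Jacobian of g, evaluated at f(x)
Df = jacobian(inner, x)         # Jacobian of f, evaluated at x

# Matrix product Dg(f(x)) * Df(x): row i of Dg times column j of Df
product = [[sum(Dg[i][k] * Df[k][j] for k in range(2)) for j in range(2)]
           for i in range(2)]
direct = jacobian(composed, x)  # Jacobian of the composition itself
# Exact result for both: [[8, 4], [8, 5]]
```

Note the order of the factors: swapping them would generally give a different matrix, or one of the wrong shape.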
Differential Operators
A differential operator is an abstract mapping that takes a function as input and returns another function as output, with the (partial) derivatives of the input doing the structural work. The operators in this section all eat a function defined on a domain $D \subseteq \mathbb{R}^n$ (a scalar field or a vector field, depending on the operator) and produce a new function as output. We assume throughout that every partial derivative of the input function exists and is continuous on $D$, so the constructions below are well-defined at every point.
A useful organizing idea before diving in: most of the operators that follow can be expressed as a single, more primitive operator, the nabla operator $\nabla$, combined with the standard vector products (scalar multiplication, inner product, cross product). Setting up that primitive carefully first makes everything else mechanical.
Nabla Operator
The nabla operator is the most fundamental differential operator on $\mathbb{R}^n$, and one we have already been using implicitly: it is exactly the object that turned a scalar field into its gradient back in the gradient definition, where writing $\nabla f$ meant “stack the partial derivatives of $f$ into a column vector”. Pulling it out as a stand-alone operator simply makes that move explicit and lets the same primitive serve the divergence and the rotation below. It is best read as a formal column “vector” whose entries are the partial-derivative operators: a column of operators, not numbers, waiting to be applied to a function.
The nabla operator (also called the del operator) on $\mathbb{R}^n$ is the formal column vector of partial-derivative operators:
$$\nabla = \begin{pmatrix} \frac{\partial}{\partial x_1} \\ \vdots \\ \frac{\partial}{\partial x_n} \end{pmatrix}.$$
Input: a function on $D \subseteq \mathbb{R}^n$, either a scalar field $f: D \to \mathbb{R}$ or a vector field $v: D \to \mathbb{R}^n$, depending on which “vector multiplication” is used. Output: a new function whose shape depends on the multiplication.
By itself $\nabla$ is not a function and has no value at a point; it is purely a column of operators. It only produces a result once paired with an actual function via one of the standard vector products, and the product determines what the result looks like:
- Applying to a scalar field: $\nabla f$. Think of $\nabla$ as a formal column vector of operators; applying it to $f$ means each slot acts via partial differentiation, giving exactly the gradient.
- Inner product with a vector field: $\nabla^\top v = \langle \nabla, v \rangle$. Pair each $\frac{\partial}{\partial x_i}$ with the matching component $v_i$, apply, then sum, exactly like an inner product but with application replacing multiplication; the result is the divergence.
- Cross product with a vector field: $\nabla \times v$. This works exactly like a cross product but with application replacing multiplication; the result is the rotation.
Laplace Operator
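The Laplace operator $\Delta$ takes a scalar field $f: D \subseteq \mathbb{R}^n \to \mathbb{R}$ and returns a scalar field $\Delta f: D \to \mathbb{R}$ of the same shape:
$$\Delta f(x) = \sum_{i=1}^{n} \frac{\partial^2 f}{\partial x_i^2}(x).$$
At each point $x$, it sums the $n$ pure second-order partial derivatives of $f$, one for each dimension, leaving the mixed partials $\partial_i \partial_j f$ for $i \neq j$ out.
Equivalently, $\Delta f(x)$ is the trace of the Hessian at $x$: the diagonal entries of $H_f(x)$ are exactly $\partial_1 \partial_1 f(x), \dots, \partial_n \partial_n f(x)$, and the trace adds them.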
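As a rough numerical illustration, the Laplacian can be estimated by summing one-dimensional second differences, one per axis, which is the same as summing the diagonal of a finite-difference Hessian. The helper `laplacian` and the example field $f(x, y) = x^2 y + y^3$ are assumptions for this sketch.

```python
def laplacian(f, a, h=1e-4):
    """Sum of central second differences along each coordinate axis."""
    center = f(a)
    total = 0.0
    for i in range(len(a)):
        fwd, bwd = list(a), list(a)
        fwd[i] += h
        bwd[i] -= h
        # Pure second derivative along axis i; mixed partials never appear
        total += (f(fwd) - 2.0 * center + f(bwd)) / (h * h)
    return total

def field(p):
    # Hypothetical example field f(x, y) = x^2 * y + y^3
    x, y = p
    return x ** 2 * y + y ** 3

value = laplacian(field, [1.0, 2.0])
# Exact: f_xx + f_yy = 2y + 6y = 16 at (1, 2)
```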
Divergence
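The divergence takes a vector field $v: D \subseteq \mathbb{R}^n \to \mathbb{R}^n$ and returns a scalar field $\operatorname{div} v: D \to \mathbb{R}$; the $n$ component functions collapse into a single number at each point:
$$\operatorname{div} v(x) = \nabla^\top v(x) = \sum_{i=1}^{n} \frac{\partial v_i}{\partial x_i}(x).$$
Each $\frac{\partial}{\partial x_i}$ acts on its matching component $v_i$, and the results are summed: the formal inner product of $\nabla$ and $v$, with application replacing multiplication.
Equivalently, $\operatorname{div} v(x)$ is the trace of the Jacobian at $x$: the diagonal entries of $Dv(x)$ are exactly $\frac{\partial v_1}{\partial x_1}(x), \dots, \frac{\partial v_n}{\partial x_n}(x)$, and the trace adds them.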
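The "trace of the Jacobian" reading translates directly into code: differentiate component $i$ along axis $i$ and add up the results. The helper `divergence` and the example vector field $v(x, y, z) = (x^2,\ xy,\ z)$ below are illustrative assumptions.

```python
def divergence(v, a, h=1e-6):
    """Sum of the diagonal Jacobian entries, estimated by central differences."""
    total = 0.0
    for i in range(len(a)):
        fwd, bwd = list(a), list(a)
        fwd[i] += h
        bwd[i] -= h
        # Component i differentiated along axis i
        total += (v(fwd)[i] - v(bwd)[i]) / (2 * h)
    return total

def flow(p):
    # Hypothetical vector field v(x, y, z) = (x^2, x*y, z)
    x, y, z = p
    return [x ** 2, x * y, z]

value = divergence(flow, [1.0, 2.0, 3.0])
# Exact: 2x + x + 1 = 4 at x = 1
```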
Rotation
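The rotation (curl) takes a 3D vector field $v: D \subseteq \mathbb{R}^3 \to \mathbb{R}^3$ and returns a 3D vector field $\operatorname{rot} v: D \to \mathbb{R}^3$ of the same shape:
$$\operatorname{rot} v(x) = \nabla \times v(x) = \begin{pmatrix} \partial_2 v_3(x) - \partial_3 v_2(x) \\ \partial_3 v_1(x) - \partial_1 v_3(x) \\ \partial_1 v_2(x) - \partial_2 v_1(x) \end{pmatrix}.$$
Each entry follows the cyclic pattern of the cross product, with $\partial_i$ applying to $v_j$ rather than multiplying.
The rotation is defined only in $\mathbb{R}^3$, since the cross product used here is specific to three dimensions.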
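The cyclic pattern of the rotation can be written out entry by entry. The sketch below assumes a hypothetical rigid swirl $v(x, y, z) = (-y,\ x,\ 0)$ around the $z$-axis and estimates each partial derivative with a central difference; the helper names are illustrative.

```python
def component_partial(v, a, component, axis, h=1e-6):
    """Central-difference estimate of d v_component / d x_axis at a."""
    fwd, bwd = list(a), list(a)
    fwd[axis] += h
    bwd[axis] -= h
    return (v(fwd)[component] - v(bwd)[component]) / (2 * h)

def rotation(v, a):
    """Curl of a 3D vector field, following the cross-product pattern."""
    def d(component, axis):
        return component_partial(v, a, component, axis)
    return [
        d(2, 1) - d(1, 2),  # d2 v3 - d3 v2
        d(0, 2) - d(2, 0),  # d3 v1 - d1 v3
        d(1, 0) - d(0, 1),  # d1 v2 - d2 v1
    ]

def swirl(p):
    # Hypothetical field v(x, y, z) = (-y, x, 0): rotation about the z-axis
    x, y, z = p
    return [-y, x, 0.0]

curl = rotation(swirl, [1.0, 2.0, 3.0])
# Exact: (0, 0, 2) at every point
```

The result points along the axis of rotation, and its length reflects how fast the field swirls around that axis.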
A few mathematical identities between these differential operators — outside the scope of this lecture, but a useful addition to your mathematical toolkit. Under the assumptions that and are two vector fields of correct dimensionality, is a scalar field of correct dimensionality, and using the notation convention , the following identities hold: