Quadrature

The word quadrature comes from Latin quadratura — literally squaring, the classical geometric problem of constructing a square of equal area to a given figure. Mathematicians later borrowed the term for computing the area under a curve, and today it stands more broadly for computing any definite integral. This chapter is concerned with high-dimensional quadrature: extending the definite integral of one variable into multiple dimensions, integrating scalar fields over 2D regions and 3D solids.

Double and triple integrals

The 1D definite integral abf(x)dx\int_a^b f(x) \, dx accumulates a 1D scalar field over an interval [a,b][a, b]. In two variables, we accumulate over a 2D region — a rectangle D=[a,b]×[c,d]D = [a, b] \times [c, d] — and call the result the double integral of ff over DD.

For a scalar field f:D=[a,b]×[c,d]Rf : D = [a, b] \times [c, d] \to \mathbb{R} on a rectangle, the double integral of ff over DD is the iterated integral

Df(x,y)dxdy=cdabf(x,y)dxdy=cd(abf(x,y)dx)dy\iint_D f(x, y) \, dx \, dy = \int_c^d \int_a^b f(x, y) \, dx \, dy = \int_c^d \left( \int_a^b f(x, y) \, dx \right) dy

equivalently obtained by integrating in the opposite order:

Df(x,y)dydx=abcdf(x,y)dydx=ab(cdf(x,y)dy)dx\iint_D f(x, y) \, dy \, dx = \int_a^b \int_c^d f(x, y) \, dy \, dx = \int_a^b \left( \int_c^d f(x, y) \, dy \right) dx

To evaluate an iterated integral, peel one layer at a time from the inside out. The inner integral runs first, treating the other variable as a constant — that produces a function of one variable, which the outer integral then accumulates. Each dxdx or dydy marks the variable being integrated at that step; the parentheses are just for clarity.

The same construction lifts to three variables. For a scalar field on a box D=[a,b]×[c,d]×[e,f]R3D = [a, b] \times [c, d] \times [e, f] \subseteq \mathbb{R}^3, three nested integrals give the triple integral:

For a scalar field f:D=[a,b]×[c,d]×[e,f]Rf : D = [a, b] \times [c, d] \times [e, f] \to \mathbb{R} on a 3D box, the triple integral of ff over DD is

Df(x,y,z)dxdydz=efcdabf(x,y,z)dxdydz=ef(cd(abf(x,y,z)dx)dy)dz\begin{aligned} \iiint_D f(x, y, z) \, dx \, dy \, dz &= \int_e^f \int_c^d \int_a^b f(x, y, z) \, dx \, dy \, dz \\ &= \int_e^f \left( \int_c^d \left( \int_a^b f(x, y, z) \, dx \right) dy \right) dz \end{aligned}

The recipe extends to any number of variables: peel off one variable at a time, evaluating the innermost integral first.

Fubini’s theorem — the order doesn’t matter

We wrote the double integral in two orders (dxdydx \, dy versus dydxdy \, dx) and asserted both give the same answer. That equivalence is not automatic — it’s a theorem.

Fubini’s theorem. If ff is continuous on a closed rectangular box D=[a1,b1]××[an,bn]RnD = [a_1, b_1] \times \cdots \times [a_n, b_n] \subseteq \mathbb{R}^n, then all iterated integrals of ff over DD have the same value, regardless of the order of integration. For example in 2D:

cdabf(x,y)dxdy=abcdf(x,y)dydx\int_c^d \int_a^b f(x, y) \, dx \, dy = \int_a^b \int_c^d f(x, y) \, dy \, dx

What this buys us in practice: if one order makes the inner integral easier — perhaps because the integrand factors more cleanly when one variable is treated as constant — we can switch freely. The two orderings are not just numerically equal; they’re interchangeable as a computational strategy, and we will exploit that whenever we have the freedom to choose.

Standard domains

Rectangles and boxes are the easy targets — every integration bound is a constant. Most regions of practical interest aren’t that clean. The next-simplest class is the standard domain (or simple domain): one variable still spans a fixed interval, and the other slides between bounds that depend on the first.

A region DR2D \subseteq \mathbb{R}^2 is a standard domain if it can be written in one of the two forms

D={(x,y)axb,  l(x)yu(x)}D = \{\,(x, y) \mid a \leq x \leq b, \; l(x) \leq y \leq u(x)\,\}

or, with the variables swapped,

D={(x,y)cyd,  l(y)xu(y)}D = \{\,(x, y) \mid c \leq y \leq d, \; l(y) \leq x \leq u(y)\,\}

where ll and uu are continuous functions defining the lower and upper boundaries of DD.

The figure below shows both forms side by side. In either case, one variable spans a fixed interval, while the other lies between a lower and upper bound that shift as the first variable moves.

Rectangles are a special case. Take constant boundary functions l(x)cl(x) \equiv c and u(x)du(x) \equiv d; the first form collapses to {(x,y)axb,  cyd}\{(x, y) \mid a \leq x \leq b, \; c \leq y \leq d\} — a plain rectangle. So everything we did in the previous section sits inside this framework as the trivially-constant case.

The construction lifts to three variables, with each inner layer’s bounds allowed to depend on every variable from the layers outside it:

A region DR3D \subseteq \mathbb{R}^3 is a standard domain if it can be written as

D={(x,y,z)axb,  l(x)yu(x),  l~(x,y)zu~(x,y)}D = \{\,(x, y, z) \mid a \leq x \leq b, \; l(x) \leq y \leq u(x), \; \tilde l(x, y) \leq z \leq \tilde u(x, y)\,\}

where l,ul, u and l~,u~\tilde l, \tilde u are continuous.

The pattern generalizes: each layer’s bounds may depend on all the variables from layers outside it; the outermost layer always lands on constants. Permuting which variable goes outermost gives equally valid standard-domain forms.

Splitting non-standard domains. Not every region fits the standard mold — take an annulus (a ring-shaped region between two concentric circles): any vertical line through the inner hole cuts it into two segments, not one. But many such regions can be partitioned into a finite collection of disjoint standard sub-domains, which is enough to bring them back inside the framework.

Integration over standard domains

With a standard domain, the bounds of the inner integrals are no longer constants — they depend on the variables from the layers outside. Otherwise the iterated structure is unchanged: peel from the inside out, treating outer variables as constants while evaluating each inner integral.

For a scalar field ff on a 2D standard domain DR2D \subseteq \mathbb{R}^2, the double integral of ff over DD is

Df(x,y)dydx=abl(x)u(x)f(x,y)dydx\iint_D f(x, y) \, dy \, dx = \int_a^b \int_{l(x)}^{u(x)} f(x, y) \, dy \, dx

The outermost integral lands on the constant interval [a,b][a, b]; the inner integral has bounds that move with xx. The same recipe lifts to three variables, where each layer’s bounds may reference every variable from the layers outside it:

For a scalar field ff on a 3D standard domain DR3D \subseteq \mathbb{R}^3, the triple integral of ff over DD is

Df(x,y,z)dzdydx=abl(x)u(x)l~(x,y)u~(x,y)f(x,y,z)dzdydx\iiint_D f(x, y, z) \, dz \, dy \, dx = \int_a^b \int_{l(x)}^{u(x)} \int_{\tilde l(x, y)}^{\tilde u(x, y)} f(x, y, z) \, dz \, dy \, dx

Swapping order takes work. Fubini still applies — all integration orders give the same value — but unlike the rectangular case the swap isn’t a mere relabeling of differentials. The inner bounds reference the outer variables, so changing order means re-describing DD in its alternative standard form (e.g. yy-outermost instead of xx-outermost in 2D) and deriving new boundary functions l,ul, u from the geometry. Worth doing whenever the alternative order makes the inner integral simpler or more intuitive to evaluate.

Splitting, formalized. When D=D1DrD = D_1 \cup \cdots \cup D_r is a disjoint split of DD into standard sub-domains D1DrD_1 \cup \cdots \cup D_r, then integral splits along the partition:

Df=D1f++Drf\int_D f = \int_{D_1} f + \cdots + \int_{D_r} f

This is the splitting workaround previewed earlier, made concrete — partition the awkward region into standard pieces, integrate over each, sum.

Area and volume of a standard domain

To calculate the area or volume of DD itself, we set ff to be the constant 11 (f1f \equiv 1) — that is, f(x,y)=1f(x, y) = 1 at every point of a 2D domain, or f(x,y,z)=1f(x, y, z) = 1 at every point of a 3D one. With the integrand contributing the same constant weight everywhere, accumulating over DD adds up all the infinitesimal pieces that fill it, and out comes the area or volume:

area(D)=D1dxdy,vol(D)=D1dxdydz\text{area}(D) = \iint_D 1 \, dx \, dy, \qquad \text{vol}(D) = \iiint_D 1 \, dx \, dy \, dz

On a 2D standard domain BB, the area integral collapses cleanly:

B1dydx=abl(x)u(x)1dydx=ab(u(x)l(x))dx=abu(x)dxabl(x)dx\begin{aligned} \iint_B 1 \, dy \, dx &= \int_a^b \int_{l(x)}^{u(x)} 1 \, dy \, dx \\ &= \int_a^b \bigl( u(x) - l(x) \bigr) \, dx \\ &= \int_a^b u(x) \, dx - \int_a^b l(x) \, dx \end{aligned}

The double integral reduces to the area between two curves from 1D calculus — the area under the upper curve minus the area under the lower curve.

Integration under coordinate transformations

The integrals defined so far run in Cartesian coordinates: every region is described by (x1,,xn)(x_1, \dots, x_n), and every infinitesimal slice has volume dx1dxndx_1 \cdots dx_n. But sometimes a region’s geometry fits other coordinates much better — a circular disk wants polar, a sphere wants spherical — and an integral that’s painful in xx can become almost trivial after a rewrite. The price is a correction factor that accounts for how the new coordinates locally stretch or compress space.

That factor is the absolute value of the Jacobian determinant of the transformation, and the rule for swapping coordinates inside an integral is:

Let B,DRnB, D \subseteq \mathbb{R}^n be open and let ψ:BD\psi : B \to D be a coordinate transformation, so each point x=(x1,,xn)D\mathbf{x} = (x_1, \dots, x_n) \in D has a unique address y=(y1,,yn)B\mathbf{y} = (y_1, \dots, y_n) \in B with x=ψ(y)\mathbf{x} = \psi(\mathbf{y}). For an integrable scalar field ff on DD,

Df(x)dx1dxn=Bf(ψ(y))detJψ(y)dy1dyn\int_D f(\mathbf{x}) \, dx_1 \cdots dx_n = \int_B f(\psi(\mathbf{y})) \, |\det J_\psi(\mathbf{y})| \, dy_1 \cdots dy_n

The formula reads naturally: integrate the same field over BB using the new coordinates yiy_i, but at every point reweight by how much ψ\psi distorts an infinitesimal volume there. Where detJψ|\det J_\psi| is small, ψ\psi is compressing — many yy-points crowd into a small region of DD, so each contributes proportionally less to the total. Where detJψ|\det J_\psi| is large, ψ\psi is stretching — each yy-point covers a larger patch of DD and counts for more.

The 1D analog. With n=1n = 1, a coordinate transformation is just a smooth bijection ψ:[a,b]ψ([a,b])\psi : [a, b] \to \psi([a, b]), and JψJ_\psi collapses to the single number ψ\psi'. The formula becomes

ψ([a,b])f(x)dx=abf(ψ(t))ψ(t)dt\int_{\psi([a, b])} f(x) \, dx = \int_a^b f(\psi(t)) \, |\psi'(t)| \, dt

— exactly the substitution rule from 1D calculus. The absolute value ψ(t)|\psi'(t)| measures how much ψ\psi stretches an infinitesimal interval; that’s the role detJψ|\det J_\psi| plays in higher dimensions.

Area of a disk. The open disk D={(x,y)x2+y2<R2}D = \{(x, y) \mid x^2 + y^2 < R^2\} has awkward Cartesian bounds — the inner integral runs from R2x2-\sqrt{R^2 - x^2} to R2x2\sqrt{R^2 - x^2}. In polar coordinates, ψ(r,φ)=(rcosφ,rsinφ)\psi(r, \varphi) = (r \cos \varphi, r \sin \varphi) maps the rectangle B=(0,R)×(0,2π)B = (0, R) \times (0, 2\pi) onto DD, with Jacobian determinant detJψ=r\det J_\psi = r. Change of variables gives

D1dxdy=02π0Rrdrdφ=πR2\iint_D 1 \, dx \, dy = \int_0^{2\pi} \int_0^R r \, dr \, d\varphi = \pi R^2

— the disk area, no awkward bounds in sight.

Surface integrals

Every integral so far has accumulated a field over a flat region — an interval, a 2D area in R2\mathbb{R}^2, a 3D solid in R3\mathbb{R}^3. Surface integrals are a different kind of integration: the integration domain is a curved 2D patch sitting inside 3D space. The setup pairs a regular surface ϕ:BR2R3\phi : B \subseteq \mathbb{R}^2 \to \mathbb{R}^3 with a field function (scalar field or vector field) defined on a 3D region that contains the surface. The integral evaluates the function at each surface point ϕ(u,v)\phi(u, v) and accumulates those values, weighted by a local distortion factor built from the tangent vectors ϕu,ϕv\phi_u, \phi_v. Two flavors arise depending on whether the field is scalar- or vector-valued.

Scalar surface integrals

When the integrand is a scalar field f:DR3Rf : D \subseteq \mathbb{R}^3 \to \mathbb{R} with ϕ(B)D\phi(B) \subseteq D, each parameter point (u,v)B(u, v) \in B contributes the reading f(ϕ(u,v))f(\phi(u, v)), weighted by how much surface area the tiny parameter patch around (u,v)(u, v) sweeps out.

For a regular surface ϕ:BR2R3\phi : B \subseteq \mathbb{R}^2 \to \mathbb{R}^3 and a scalar field f:DR3Rf : D \subseteq \mathbb{R}^3 \to \mathbb{R} with ϕ(B)D\phi(B) \subseteq D, the scalar surface integral of ff over ϕ\phi is

ϕfds=Bf(ϕ(u,v))ϕu(u,v)×ϕv(u,v)dudv\iint_\phi f \, ds = \iint_B f(\phi(u, v)) \, \|\phi_u(u, v) \times \phi_v(u, v)\| \, du \, dv

We multiply by ϕu(u,v)×ϕv(u,v)\|\phi_u(u, v) \times \phi_v(u, v)\| — the local area-stretch factor — because the integral runs over ϕ\phi‘s parameter domain BB, naturally weighting each ff-value by the parameter-patch area dudvdu \, dv. But the values come from the surface, and each tiny parameter patch around (u,v)(u, v) maps to a surface patch of a different size — so weighting by dudvdu \, dv alone doesn’t reflect how large the corresponding patch on the surface is. Scaling by ϕu×ϕv\|\phi_u \times \phi_v\| swaps in the surface-patch area as the weight, so each ff-value is weighted by the actual amount of surface it represents. For the geometric derivation, see the in-depth note below.

In-depth: where ϕu(u,v)×ϕv(u,v)\|\phi_u(u, v) \times \phi_v(u, v)\| comes from

Pick a tiny rectangle in parameter space at (u0,v0)(u_0, v_0) with sides dudu and dvdv. Under ϕ\phi, its four corners land on four nearby points of the surface, bounding a tiny curved patch — and we need its area.

A small step of size dudu along the uu-parameter (with vv fixed) shifts the surface point by approximately ϕu(u0,v0)du\phi_u(u_0, v_0) \, du — that is precisely the role of the tangent vectors: they record the 3D rate of change of ϕ\phi when we wiggle one parameter at a time. Similarly a dvdv-step shifts the point by ϕv(u0,v0)dv\phi_v(u_0, v_0) \, dv. The two displacements span a parallelogram in 3D with edges ϕudu\phi_u \, du and ϕvdv\phi_v \, dv, and to first-order Taylor approximation the curved patch is indistinguishable from that parallelogram — same shape, and to that order, the same area.

So computing the patch’s area reduces to computing the parallelogram’s. By a standard property of the cross product, the area of a parallelogram spanned by two vectors in R3\mathbb{R}^3 is the cross-product norm of its edges — applied here, the patch carries surface area ϕu×ϕvdudv\|\phi_u \times \phi_v\| \, du \, dv — i.e. the magnitude of the cross product of the tangent vectors, times the parameter-area element dudvdu \, dv. The cross-product vector ϕu×ϕv\phi_u \times \phi_v itself is the surface normal at ϕ(u0,v0)\phi(u_0, v_0) — the perpendicular vector named when we introduced the tangent vectors — and its magnitude is exactly the area element we just derived.

So ϕu×ϕv\|\phi_u \times \phi_v\| is the local area-stretch factor: it converts one unit of parameter-space area at (u0,v0)(u_0, v_0) into the corresponding amount of true surface area. Where ϕu×ϕv\|\phi_u \times \phi_v\| is large, ϕ\phi is stretching — each parameter point covers a wide swath of surface, so its ff-reading should count for more in the total. Where it is small, ϕ\phi is compressing — many parameter points crowd onto a tiny surface patch, so each one should count for less. Multiplying f(ϕ(u,v))f(\phi(u, v)) by the factor before integrating accounts for that distortion, leaving a value that depends only on the surface’s geometry, not on how we chose to parametrize it.

Cousin of change of variables. The shape mirrors the change-of-variables formula: both rewrite an integral over a curved or differently-coordinatized region as one over a flat parameter region, paying a local distortion factor at each point.

The difference is dimensional. Change of variables maps RnRn\mathbb{R}^n \to \mathbb{R}^n — the Jacobian is square, and detJψ|\det J_\psi| measures the local volume distortion. A surface parametrization maps R2R3\mathbb{R}^2 \to \mathbb{R}^3 — the Jacobian JϕJ_\phi is 3×23 \times 2, non-square, so it has no determinant. But its two columns are precisely ϕu\phi_u and ϕv\phi_v, and the natural replacement for det|\det| is the area of the parallelogram those columns span — exactly ϕu×ϕv\|\phi_u \times \phi_v\|. Same role (local distortion factor); the codimension forces a cross product in place of a determinant.

Vector surface integrals

When the integrand is a vector field v:DR3R3\boldsymbol{v} : D \subseteq \mathbb{R}^3 \to \mathbb{R}^3 with ϕ(B)D\phi(B) \subseteq D, each parameter point (u,v)B(u, v) \in B contributes the inner product of v(ϕ(u,v))\boldsymbol{v}(\phi(u, v)) with the vector ϕu×ϕv\phi_u \times \phi_v — the surface normal itself, not its norm as in the scalar case.

For a regular surface ϕ:BR2R3\phi : B \subseteq \mathbb{R}^2 \to \mathbb{R}^3 and a vector field v:DR3R3\boldsymbol{v} : D \subseteq \mathbb{R}^3 \to \mathbb{R}^3 with ϕ(B)D\phi(B) \subseteq D, the vector surface integral of v\boldsymbol{v} over ϕ\phi — also called the flux of v\boldsymbol{v} across ϕ\phi, in the direction ϕu×ϕv\phi_u \times \phi_v — is

ϕvds=Bv(ϕ(u,v))(ϕu(u,v)×ϕv(u,v))dudv\iint_\phi \boldsymbol{v} \cdot ds = \iint_B \boldsymbol{v}(\phi(u, v))^\top (\phi_u(u, v) \times \phi_v(u, v)) \, du \, dv

At each surface point, v(ϕu×ϕv)\boldsymbol{v} \cdot (\phi_u \times \phi_v) computes the scalar projection of v\boldsymbol{v} onto the surface normal weighted by the length of the normal, which is the area-stretch factor from the scalar case. (For the general inner-product principle behind splitting a dot product into “projection × magnitude”, see the weighted-projection remark.) To picture what this measures, think of v\boldsymbol{v} as a fluid’s velocity field — at each point in space, an arrow giving the direction and speed of the flow. At a surface point, split v\boldsymbol{v} into a perpendicular and a tangential part: only the perpendicular part crosses the surface (carries fluid from one side to the other), while the tangential part just slides along the surface, never crossing. A flow heading obliquely into the surface contributes its perpendicular share to crossing, not zero — only a flow that’s exactly parallel to the surface contributes nothing. So v(ϕu×ϕv)\boldsymbol{v} \cdot (\phi_u \times \phi_v) — projection onto the normal, scaled by the patch’s surface area — measures the rate at which fluid is crossing the surface patch around (u,v)(u, v). That is what we call the flux through the local patch. Integrating over BB sums these per-patch contributions into the total flux of v\boldsymbol{v} through ϕ\phi — the overall rate at which the fluid crosses the surface. For the geometric breakdown, see the in-depth note below.

In-depth: how v(ϕu×ϕv)\boldsymbol{v} \cdot (\phi_u \times \phi_v) encodes flux

Using the geometric form of the inner product:

v(ϕ(u,v))(ϕu×ϕv)=vϕu×ϕvcosθ,\boldsymbol{v}(\phi(u, v)) \cdot (\phi_u \times \phi_v) = \|\boldsymbol{v}\| \, \|\phi_u \times \phi_v\| \, \cos\theta,

where θ\theta is the angle between v\boldsymbol{v} and the surface normal. Group the factors:

vcosθcomponent of v along normal    ϕu×ϕvarea element\underbrace{\|\boldsymbol{v}\| \cos\theta}_{\text{component of } \boldsymbol{v} \text{ along normal}} \;\cdot\; \underbrace{\|\phi_u \times \phi_v\|}_{\text{area element}}

— a product of two pieces we already understand. The first is the scalar projection of v\boldsymbol{v} onto the unit normal n^=(ϕu×ϕv)/ϕu×ϕv\hat{\mathbf{n}} = (\phi_u \times \phi_v)/\|\phi_u \times \phi_v\|: the signed length of v\boldsymbol{v}‘s shadow on the normal direction. It is positive when v\boldsymbol{v} points through the surface in the ϕu×ϕv\phi_u \times \phi_v direction, negative when it points the other way, and zero when v\boldsymbol{v} is tangent to the surface — purely tangential motion slides along the surface rather than crossing it. The second factor is the same area-stretch weight that appeared in the scalar case. Multiplied, they give the flux through the local surface patch — how much of v\boldsymbol{v} crosses that patch, accounting for both alignment and patch size.

Sign depends on orientation. ϕu×ϕv\phi_u \times \phi_v has a definite direction set by the parameter order; swapping uu and vv flips it, and with it the sign of the integral. Vector surface integrals are signed, with the sign tied to which side of ϕ\phi counts as “outward” — that is what the qualifier “in the direction ϕu×ϕv\phi_u \times \phi_v” records.

Physical readings. When v\boldsymbol{v} is a fluid-velocity field, the integral measures the volume of fluid crossing ϕ\phi per unit time — the literal flux. When v\boldsymbol{v} is an electric field, it gives the electric flux through ϕ\phi, the quantity that appears in Gauss’s law. In both cases the vanishing of tangential contributions matches physical intuition: fluid sliding parallel to a wall doesn’t cross it.

Surface area

The same trick that gave us the area or volume of a standard domain — set the integrand to the constant 11 — lifts to surfaces. With f1f \equiv 1 in the scalar surface integral, the integrand contributes nothing of its own and what remains is a pure sum of area-stretch factors over the parameter domain. The result is the surface area of ϕ\phi.

For a regular surface ϕ:BR2R3\phi : B \subseteq \mathbb{R}^2 \to \mathbb{R}^3, the surface area of ϕ\phi is the scalar surface integral of the unit function over ϕ\phi:

O(ϕ)=ϕ1ds=ϕds=Bϕu(u,v)×ϕv(u,v)dudvO(\phi) = \iint_\phi 1 \, ds = \iint_\phi ds = \iint_B \|\phi_u(u, v) \times \phi_v(u, v)\| \, du \, dv

Each parameter patch of area dudvdu \, dv maps under ϕ\phi to a surface patch of area ϕu×ϕvdudv\|\phi_u \times \phi_v\| \, du \, dv — the per-patch area element built earlier. Summing those elements across all of BB recovers the total area of the surface in 3D space.

Gauss’s divergence theorem

The two integrals built so far in this chapter look like they belong to different worlds: the triple integral accumulates a scalar density throughout a 3D solid, while the vector surface integral accumulates a vector field’s flow across a 2D boundary. Gauss’s divergence theorem says they are the same number in a precise case — integrating the divergence of a vector field over a 3D region equals the flux of that field through the region’s boundary surface.

Gauss’s divergence theorem. Let BR3B \subseteq \mathbb{R}^3 be a domain with piecewise continuously differentiable boundary B\partial B. For any continuously differentiable vector field v:DR3R3\boldsymbol{v} : D \subseteq \mathbb{R}^3 \to \mathbb{R}^3 with BDB \subseteq D:

Bdivvdxdydz=Bvds\iiint_B \operatorname{div} \boldsymbol{v} \, dx \, dy \, dz = \iint_{\partial B} \boldsymbol{v} \cdot ds

Each continuously differentiable piece of B\partial B must be parametrized by some ϕ\phi with ϕu×ϕv\phi_u \times \phi_v pointing outward from BB.

The divergence referenced here is exactly the differential operator built earlier: divv=v=iivi\operatorname{div} \boldsymbol{v} = \nabla \cdot \boldsymbol{v} = \sum_i \partial_i v_i, the trace of the Jacobian of v\boldsymbol{v}.

Reading both sides — divergence as source density, flux as net escape

The two integrals are different views on the same physical accounting:

  • Volume side (Bdivv\iiint_B \operatorname{div} \boldsymbol{v}). The divergence at a point x\mathbf{x} measures how much the vector field is locally creating per unit volume there: positive divergence means v\boldsymbol{v} has a tiny source at x\mathbf{x} pumping new flow out into space (a faucet); negative means a sink absorbing flow (a drain); zero means whatever flows in immediately flows out (incompressible flow). The triple integral sums those local source strengths over every point of BBthe total amount of stuff being generated inside BB per unit time.
  • Boundary side (Bvds\iint_{\partial B} \boldsymbol{v} \cdot ds). The flux through B\partial B, with the outward orientation, measures the net amount of stuff crossing the boundary outward per unit time — what escapes minus what re-enters.

The theorem says the two match exactly: every drop of fluid produced by a source inside BB has to cross the boundary on its way out (it has nowhere else to go), and every drop absorbed by a sink must have entered through the boundary first. Local generation inside ↔ net escape through the wall — a conservation principle dressed in calculus.

The “outward” condition — what orientation each boundary patch must have

Recall from vector surface integrals that ϕu×ϕv\phi_u \times \phi_v has a definite direction fixed by the parameter order — swap uu and vv and the normal flips. For the theorem to balance, every boundary patch must be parametrized so its normal points away from the solid material of BB, never into it; the wrong orientation on any patch flips a sign on the right-hand side and breaks the equation.

The condition becomes subtle when B\partial B has more than one connected piece. Picture BB as the rubber wall of a hollow ball — material trapped between an outer sphere and an inner sphere — so B\partial B is the union of those two spheres. On the outer sphere, “out of the rubber” points away from the center, as you’d expect. On the inner sphere, “out of the rubber” points into the cavity — toward the center, which looks like the opposite direction. Both normals correctly leave BB‘s material; the lesson is that “outward” is relative to where the material lives, not to any fixed origin.

Why the theorem is useful — computational shortcut and backbone of conservation laws

Two distinct payoffs:

  • A computational shortcut. The two sides of the equation often differ wildly in difficulty for the same problem, and the theorem lets us pick the easier one. If a triple integral Bdivv\iiint_B \operatorname{div} \boldsymbol{v} over an awkward 3D region is the painful side, swap it for the flux Bvds\iint_{\partial B} \boldsymbol{v} \cdot ds through the boundary. Going the other way, a flux integral over a complicated closed surface — say one stitched from many patches that would each need separate parametrizations — can be rewritten as a single triple integral of divv\operatorname{div} \boldsymbol{v} over the enclosed solid. Whichever side is friendlier wins.
  • The mathematical backbone of conservation laws. Most fundamental conservation statements in physics — of mass, of electric charge, of energy — are this theorem applied to a specific vector field. Gauss’s law of electromagnetism — electric flux through any closed surface equals the enclosed charge (up to a constant) — is the divergence theorem applied to the electric field. The continuity equation in fluid dynamics — “the rate of mass change inside a region equals the net mass flow across its boundary” — is the same identity rearranged into local form. The theorem is what lets physics translate freely between “what’s happening at every point inside” and “what’s crossing the boundary”.

Applications

Two short examples of where high-dimensional integrals show up in practice — one from statistics, one from machine learning.

Statistics — computing moments

Probability theory uses integrals to extract single-number summaries from a distribution. Given a random variable XX — a quantity whose value is uncertain, like a measurement or a coin flip — its probability density f:RRf : \mathbb{R} \to \mathbb{R} is a non-negative function (with f=1\int f = 1) encoding how likely each value of XX is. From it we extract single-number summaries — moments — by integrating powers of the offset from a reference point against the density.

For a probability density ff on R\mathbb{R} and a reference point cRc \in \mathbb{R}, the qq-th moment of ff about cc is the definite integral

μq=(xc)qf(x)dx\mu_q = \int_{-\infty}^{\infty} (x - c)^q \, f(x) \, dx

Two specific (q,c)(q, c) choices recover the most familiar summary statistics:

  • q=1q = 1, c=0c = 0: μ1=xf(x)dx\mu_1 = \int x \, f(x) \, dx — the expected value (or mean) E[X]\mathbb{E}[X], the average of XX weighted by how likely each value is.
  • q=2q = 2, c=μ1c = \mu_1: μ2=(xμ1)2f(x)dx\mu_2 = \int (x - \mu_1)^2 \, f(x) \, dx — the variance, the average squared deviation of XX from its mean.

Reading the variance form. You may know variance from a finite sample as the average squared deviation, 1ni(xixˉ)2\frac{1}{n}\sum_i (x_i - \bar{x})^2 — every sample contributes equally with weight 1n\frac{1}{n}. In the integral, the density f(x)dxf(x) \, dx takes that weighting role: each squared deviation (xμ1)2(x - \mu_1)^2 is scaled by how likely xx is to occur, then accumulated. Same averaging operation, lifted from a discrete sample to a continuous distribution — and the expected value formula xf(x)dx\int x \, f(x) \, dx reads as the continuous version of 1nixi\frac{1}{n}\sum_i x_i.

The multivariate case. When the random object is a vector xRn\mathbf{x} \in \mathbb{R}^n with joint density f:RnRf : \mathbb{R}^n \to \mathbb{R}, the same recipe runs over Rn\mathbb{R}^n — every moment integral becomes an nn-fold integral, exactly the kind of high-dimensional quadrature this chapter has been about.

Machine learning — kernel density estimation

The previous example assumed the probability density ff was known. The reverse problem — given a sample x1,,xnx_1, \dots, x_n drawn from an unknown distribution, estimate its density ff — is kernel density estimation (KDE). The idea is to place a small smooth “bump” at each data point and average them all: where the data clusters, the bumps pile up and the estimated density is high; where data is sparse, it is low.

f^h(x)=1ni=1nK ⁣(xxih)\hat{f}_h(x) = \frac{1}{n} \sum_{i=1}^n \mathcal{K}\!\left(\frac{x - x_i}{h}\right)

The kernel K\mathcal{K} is a non-negative function (typically a Gaussian centered at zero), and the bandwidth hh controls each bump’s width — small hh gives narrow spikes that overfit individual points, large hh gives wide bumps that oversmooth fine structure. The catch is picking hh.

A standard selection criterion minimizes the mean integrated squared error:

MISE(h)=E ⁣[(f^h(x)f(x))2dx]\text{MISE}(h) = \mathbb{E}\!\left[ \int_{-\infty}^{\infty} (\hat{f}_h(x) - f(x))^2 \, dx \right]

— the average (over random samples) of how far f^h\hat{f}_h deviates from the true density ff, integrated across all xx. In practice ff is unknown, so MISE cannot be computed directly; practical methods such as cross-validation or Silverman’s rule of thumb approximate it instead. However, analyzing MISE theoretically reveals that finding the optimal hh requires evaluating an nn-fold integral over the data’s joint distribution — a high-dimensional quadrature problem that explains why no simple closed-form solution exists.